PSG is a New Task for AIs Requiring Higher Levels of Understanding

by Louis Bouchard, September 3rd, 2022

Too Long; Didn't Read

Panoptic scene graph generation, or PSG, is a new task that aims to generate a more comprehensive graph representation of an image or scene based on panoptic segmentation rather than bounding boxes. It can be used to understand images and to generate sentences describing what's happening in them. This may be the most challenging task for an AI! Learn more in the video...

References

►Read the full article: https://www.louisbouchard.ai/psg/
►Yang, J., Ang, Y.Z., Guo, Z., Zhou, K., Zhang, W. and Liu, Z., 2022. Panoptic Scene Graph Generation. arXiv preprint arXiv:2207.11247.
►Code: https://github.com/Jingkang50/OpenPSG
►Project page (PSG dataset): https://psgdataset.org/
►Try it: https://replicate.com/cjwbw/openpsg, https://huggingface.co/spaces/ECCV2022/PSG
►My Newsletter (a new AI application explained weekly, straight to your inbox!): https://www.louisbouchard.ai/newsletter/

Video Transcript

0:00

You can use AI to identify what's in an image, like finding out whether or not there's a cat in this scene. If there's one, you can use another AI to find where it is in the image, and you can even find it very precisely. These tasks are called image classification, object detection, and finally, instance segmentation. Then you can build cool applications to extract your cat from an image and put it into a fun gift card or a meme.

But what if you want an application that understands the scene in an image: not only identifying whether there's an object and where it is, but also what's happening? You don't just want to identify whether there's a customer in your shop; you might want to identify whether the customer in question is stealing. Whether using such surveillance is ethical is a whole other question you also need to consider.

Still, suppose we focus on finding out what's happening in a scene or a particular image. In that case, you'd want to use a task called scene graph generation, where objects are detected using bounding boxes, as shown previously with object detection. These detections are then used to create a graph of each object's relationships to every other object. It basically tries to understand what's happening from all the principal objects of the scene. It works quite well and finds the main characteristics of the image, but there's a big problem: it relies on the accuracy of the bounding boxes and completely disregards the background, which is often crucial to understanding what's happening, or at least to giving a more realistic summary.
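To make the box-based scene graph described above more concrete, here is a minimal Python sketch of the kind of structure it produces: detected objects, each with a label and a bounding box, plus (subject, predicate, object) relations between them. The class names, labels, predicate, and box coordinates are illustrative assumptions, not the output of a real detector or the format of any particular dataset. The `box_coverage` helper shows the background problem: in this toy 10x10 image the boxes cover only a quarter of the pixels, and the rest (the field, for instance) never becomes a node in the graph.

```python
from dataclasses import dataclass, field


@dataclass
class BoxObject:
    label: str
    # Bounding box as (x0, y0, x1, y1) with half-open pixel ranges (assumed convention).
    box: tuple[int, int, int, int]


@dataclass
class SceneGraph:
    objects: list[BoxObject] = field(default_factory=list)
    # Each relation is (subject index, predicate, object index) into `objects`.
    relations: list[tuple[int, str, int]] = field(default_factory=list)

    def triplets(self) -> list[tuple[str, str, str]]:
        """Resolve index-based relations into readable (subject, predicate, object) labels."""
        return [(self.objects[s].label, p, self.objects[o].label) for s, p, o in self.relations]


def box_coverage(objects: list[BoxObject], width: int, height: int) -> float:
    """Fraction of image pixels that fall inside at least one bounding box."""
    covered = set()
    for obj in objects:
        x0, y0, x1, y1 = obj.box
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                covered.add((x, y))
    return len(covered) / (width * height)


# Hypothetical detections on a 10x10 image: only foreground objects get boxes.
graph = SceneGraph(
    objects=[BoxObject("person", (2, 1, 5, 8)), BoxObject("ball", (6, 6, 8, 8))],
    relations=[(0, "kicking", 1)],
)
print(graph.triplets())                      # [('person', 'kicking', 'ball')]
print(box_coverage(graph.objects, 10, 10))   # 0.25
```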

1:33

Instead, you might want to use this new task called panoptic scene graph generation, or PSG. PSG is a new task aiming to generate a more comprehensive graph representation of an image or scene based on panoptic segmentation rather than bounding boxes: something much more precise, taking into account all the pixels of an image, as we saw.
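The difference between boxes and panoptic segmentation can be illustrated with a toy example. A panoptic map assigns every pixel to exactly one segment, background "stuff" such as sky and grass included, whereas a bounding box only approximates a region. The Python sketch below uses an invented 4x6 map (the segment ids and labels are assumptions, not PSG data) and compares each segment's exact mask area with the area of its enclosing box. The masks add up to every pixel of the image, while the person's box covers 4 pixels for a 3-pixel mask and the sky's box covers 18 pixels for a 14-pixel mask, sweeping in pixels that belong to other segments.

```python
from collections import defaultdict

# Toy 4x6 panoptic map: every pixel holds the id of exactly one segment,
# covering "things" (person, ball) and "stuff" (sky, grass) alike.
PANOPTIC_MAP = [
    [3, 3, 3, 3, 3, 3],
    [3, 1, 3, 3, 3, 3],
    [3, 1, 1, 3, 2, 3],
    [4, 4, 4, 4, 4, 4],
]
SEGMENT_LABELS = {1: "person", 2: "ball", 3: "sky", 4: "grass"}


def mask_vs_box(panoptic_map):
    """For each segment id, return (mask area, area of its enclosing bounding box)."""
    pixels = defaultdict(list)
    for y, row in enumerate(panoptic_map):
        for x, seg_id in enumerate(row):
            pixels[seg_id].append((x, y))
    stats = {}
    for seg_id, pts in pixels.items():
        xs = [x for x, _ in pts]
        ys = [y for _, y in pts]
        box_area = (max(xs) - min(xs) + 1) * (max(ys) - min(ys) + 1)
        stats[seg_id] = (len(pts), box_area)
    return stats


for seg_id, (mask_area, box_area) in sorted(mask_vs_box(PANOPTIC_MAP).items()):
    print(f"{SEGMENT_LABELS[seg_id]:>6}: mask={mask_area:2d} px, box={box_area:2d} px")
```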

1:57

The creators of this task didn't only invent it; they also created a dataset as well as a baseline model to test your results against, which is really cool. This task has a lot of potential, as understanding what is happening in an image is incredibly useful yet complex for machines, even though humans do it automatically. It brings some much-needed intelligence to machines, making the difference between a cool, funny app like Snapchat and a product you'd use to save time or fulfill a need, like understanding when your cat wants to play and using a robot to play with it automatically so it isn't bored all the time.

2:34

Understanding a scene is really cool, but how can a machine do that? Well, you need two things: a dataset and a powerful model. We already have the dataset, since they built it for us. Now for the second thing: how to learn from this dataset, which means how to build this AI model and what it should do. There are multiple ways to approach this problem, and I invite you to read their paper to find out more, but here's one way to do it.

3:03

Before getting into it, give me a few seconds to be my own sponsor and talk about our community. Since you are watching this video, I know you will love it, as it was basically created for you. Of course, we have the YouTube community, which you should definitely join by clicking the little subscribe button and commenting below. For instance, I'd love to know what you think about this task and whether it's interesting to the AI community. I also wanted to share our Discord community, Learn AI Together. It's a place to connect with fellow AI enthusiasts of any skill level, find people to learn with, find people to work with, ask your questions, or even find interesting job offers. We are organizing a lot of very cool events and Q&As, like the one we are currently running with the MineRL organizers from DeepMind and OpenAI. The link is in the description below, and I'd love to see you join and exchange with us there.

3:55

As we said, the model needs to find the class for each pixel of the image, meaning that it has to classify every pixel. The first stage of the model is responsible for this. It is a model called Panoptic FPN, already trained to classify each pixel. Such a model is already available online and is quite powerful. It takes an image and returns what we call a mask, with each pixel matched to an existing object, like a ball, a human, or grass in this case. You now have the segmentation, and you know what's in the image and where. If you are not familiar with how such a model works, I invite you to watch one of the videos I made covering similar approaches, like this one. The next step is to find out what's happening with those objects.
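The first stage's output, as described above, can be pictured as a per-pixel map of segment ids plus a label for each segment. The Python sketch below assumes exactly that (a plain nested list and a dictionary; it does not reproduce the actual Panoptic FPN output format or the PSG annotation format) and collapses it into one node per segment holding its label, pixel area, and centroid. That is "what's in the image and where", the kind of information a second stage can build relations from. The toy mask and labels are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentNode:
    seg_id: int
    label: str
    area: int                      # number of pixels in the segment's mask
    centroid: tuple[float, float]  # mean (x, y) position of the mask pixels


def extract_nodes(panoptic_map, segment_labels):
    """Turn a per-pixel segment-id map into one node per segment ("what" and "where")."""
    acc = {}  # seg_id -> (pixel count, sum of x, sum of y)
    for y, row in enumerate(panoptic_map):
        for x, seg_id in enumerate(row):
            count, sx, sy = acc.get(seg_id, (0, 0, 0))
            acc[seg_id] = (count + 1, sx + x, sy + y)
    return [
        SegmentNode(seg_id, segment_labels[seg_id], n, (sx / n, sy / n))
        for seg_id, (n, sx, sy) in sorted(acc.items())
    ]


# Hypothetical first-stage output: a person and a ball on a grass field.
MASK = [
    [1, 3, 3, 3],
    [1, 3, 2, 3],
    [3, 3, 3, 3],
]
LABELS = {1: "person", 2: "sports ball", 3: "grass"}

for node in extract_nodes(MASK, LABELS):
    print(node.label, node.area, node.centroid)
```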

4:41

Here, you already know it's a man playing soccer on the field, but the machine actually has no idea. The only thing it knows, with a lot of confidence, is that there is a man, a ball, and a field, but it doesn't understand anything and cannot connect the dots as we do with ease. We need a second model trained just to take those objects and figure out why they are in the same picture. This is the scene graph generation step, where a model learns to match a dictionary of words and concepts covering multiple possible object relations to the objects in a scene, using the information extracted from the first stage to learn how each object relates to every other object. And voilà, you end up with a clear graph that you can use to build sentences describing what's happening in your image.

You can now use this approach in your next application and give it a few extra IQ points, bringing it closer to something intelligent. If you'd like to learn more about this new task, I strongly invite you to read the paper linked below. Thank you for watching until the end, and I will see you next week with another amazing paper!
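The second stage described in the transcript can be sketched in Python as follows: score every ordered pair of segments against a fixed dictionary of predicates, keep the confident (subject, predicate, object) triplets as the edges of the scene graph, and turn those triplets into template sentences. Everything here is an assumption for illustration: the predicate list is a tiny made-up subset (PSG defines its own vocabulary, not reproduced here), the 0.5 threshold is arbitrary, and `lookup_scorer` is a hard-coded table standing in for the trained relation model, which in practice would use the features extracted by the first stage.

```python
from itertools import permutations

# Hypothetical predicate dictionary (not the real PSG predicate list).
PREDICATES = ["kicking", "standing on", "on", "beside"]

# Stand-in for the trained relation model: a fixed table of made-up scores.
FAKE_SCORES = {
    ("person", "sports ball"): {"kicking": 0.9, "beside": 0.3},
    ("person", "grass"): {"standing on": 0.8},
    ("sports ball", "grass"): {"on": 0.6},
}


def lookup_scorer(subj, obj):
    """Return predicate -> score for an ordered (subject, object) pair."""
    return FAKE_SCORES.get((subj, obj), {})


def predict_relations(nodes, score_fn, threshold=0.5):
    """For every ordered pair, keep the best-scoring predicate if it clears the threshold.

    The returned triplet list is the edge list of the scene graph.
    """
    triplets = []
    for subj, obj in permutations(nodes, 2):
        scores = score_fn(subj, obj)
        best = max(PREDICATES, key=lambda p: scores.get(p, 0.0))
        if scores.get(best, 0.0) >= threshold:
            triplets.append((subj, best, obj))
    return triplets


def to_sentences(triplets):
    """Render each triplet with a simple template sentence."""
    return [f"The {s} is {p} the {o}." for s, p, o in triplets]


nodes = ["person", "sports ball", "grass"]
for sentence in to_sentences(predict_relations(nodes, lookup_scorer)):
    print(sentence)
# The person is kicking the sports ball.
# The person is standing on the grass.
# The sports ball is on the grass.
```

Scoring ordered pairs rather than unordered ones matters because relations are directional: "person standing on grass" is plausible, while the reverse pair simply gets no confident predicate.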