The PSG dataset has 48,749 images with 133 object classes (80 thing classes and 53 stuff classes) and 56 predicate classes. It annotates inter-segment relations on top of COCO panoptic segmentation.


We believe that the biggest problem of classic scene graph generation (SGG) comes from noisy datasets. Classic SGG datasets ground objects with bounding boxes, which inevitably causes a number of issues:
  • Coarse localization: bounding boxes cannot reach pixel-level accuracy.
  • Inability to ground comprehensively: bounding boxes cannot ground backgrounds.
  • Tendency to provide trivial information: because bounding box annotation leaves annotators a lot of freedom, current datasets often capture object parts such as a head, which yields trivial relations like person-has-head.
  • Duplicate groundings: the same object can be grounded by multiple separate bounding boxes.
All of the problems above can be readily addressed by the PSG dataset, in which we ground objects using panoptic segmentation with an appropriate granularity of object categories (adopted from COCO).
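To make the coarse-localization point concrete, here is a minimal sketch (not part of the PSG tooling) that derives the tightest bounding box from a binary mask and measures how much of that box the mask actually covers. The helper names `mask_to_box` and `box_fill_ratio` are hypothetical and the toy mask is made up for illustration.

```python
from typing import List, Tuple

Mask = List[List[int]]  # binary mask: 1 = pixel belongs to the segment


def mask_to_box(mask: Mask) -> Tuple[int, int, int, int]:
    """Return the tightest inclusive box (x_min, y_min, x_max, y_max)."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, value in enumerate(row) if value]
    if not ys:
        raise ValueError("empty mask")
    return min(xs), min(ys), max(xs), max(ys)


def box_fill_ratio(mask: Mask) -> float:
    """Fraction of the tightest box that the mask actually covers."""
    x0, y0, x1, y1 = mask_to_box(mask)
    box_area = (x1 - x0 + 1) * (y1 - y0 + 1)
    mask_area = sum(sum(row) for row in mask)
    return mask_area / box_area


# A thin diagonal object: 4 mask pixels inside a 4x4 box.
diagonal = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
]

print(mask_to_box(diagonal))     # (0, 0, 3, 3)
print(box_fill_ratio(diagonal))  # 0.25
```

In this toy case three quarters of the box area does not belong to the object at all, which is the kind of imprecision that mask-based grounding avoids.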

The PSG dataset contains the 49k images that appear in both COCO and Visual Genome. In a nutshell, we ask annotators to annotate relations on top of COCO panoptic segmentation, i.e., relations are mask-to-mask.
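As a rough illustration of what "mask-to-mask" means, the sketch below models a relation as a triplet that points at two panoptic segments of the same image. The class and field names (`Segment`, `Relation`, `subject_idx`, `object_idx`) and the example categories are assumptions made for this example; they are not the dataset's actual file schema.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Segment:
    segment_id: int  # id of the panoptic mask within the image (hypothetical field)
    category: str    # a COCO panoptic class name (illustrative values below)
    is_thing: bool   # True for thing classes, False for stuff classes


@dataclass(frozen=True)
class Relation:
    subject_idx: int  # index into the image's segment list
    object_idx: int   # index into the image's segment list
    predicate: str    # one of the 56 PSG predicates


def to_triplets(segments: List[Segment],
                relations: List[Relation]) -> List[Tuple[str, str, str]]:
    """Resolve index-based relations into readable (subject, predicate, object)."""
    return [
        (segments[r.subject_idx].category, r.predicate, segments[r.object_idx].category)
        for r in relations
    ]


# Toy image: both things and stuff can take part in a relation.
segments = [
    Segment(1, "person", True),
    Segment(2, "horse", True),
    Segment(3, "grass", False),
]
relations = [
    Relation(0, 1, "riding"),
    Relation(1, 2, "standing on"),
]

for triplet in to_triplets(segments, relations):
    print(triplet)
```

Because both ends of a relation are segments, a stuff region such as the grass can serve as subject or object, which a box-based annotation cannot express well.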

Clear Predicate Definition

We also find that previous SGG datasets unfortunately neglect a clear definition of predicates. To better formulate the PSG task, we carefully define 56 predicates for the PSG dataset. We try hard to avoid trivial or duplicated relations, and find that these 56 predicates are enough to cover the entire PSG dataset (and common everyday scenarios). Readers can consult the short version of the PSG annotation documentation.

Positional Relations (6): over, in front of, beside, on, in, attached to.
Common Object-Object Relations (5): hanging from, on the back of, falling off, going down, painted on.
Common Actions (31): walking on, running on, crossing, standing on, lying on, sitting on, leaning on, flying over, jumping over, jumping from, wearing, holding, carrying, looking at, guiding, kissing, eating, drinking, feeding, biting, catching, picking (grabbing), playing with, chasing, climbing, cleaning (washing, brushing), playing, touching, pushing, pulling, opening.
Human Actions (4): cooking, talking to, throwing (tossing), slicing.
Actions in Traffic Scene (4): driving, riding, parked on, driving on.
Actions in Sports Scene (3): about to hit, kicking, swinging.
Interaction between Background (3): entering, exiting, enclosing (surrounding, wrapping in).
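The taxonomy above can be written down as a small lookup table, which also makes it easy to check that the seven groups add up to 56 predicates. This is only a sketch: the names `PREDICATE_GROUPS` and `group_of` are hypothetical, and the parenthesized synonyms from the list are omitted so that each predicate appears under a single name.

```python
from typing import Dict, List

# Predicate groups transcribed from the list above (synonyms in parentheses dropped).
PREDICATE_GROUPS: Dict[str, List[str]] = {
    "Positional Relations": [
        "over", "in front of", "beside", "on", "in", "attached to",
    ],
    "Common Object-Object Relations": [
        "hanging from", "on the back of", "falling off", "going down", "painted on",
    ],
    "Common Actions": [
        "walking on", "running on", "crossing", "standing on", "lying on",
        "sitting on", "leaning on", "flying over", "jumping over", "jumping from",
        "wearing", "holding", "carrying", "looking at", "guiding", "kissing",
        "eating", "drinking", "feeding", "biting", "catching", "picking",
        "playing with", "chasing", "climbing", "cleaning", "playing", "touching",
        "pushing", "pulling", "opening",
    ],
    "Human Actions": ["cooking", "talking to", "throwing", "slicing"],
    "Actions in Traffic Scene": ["driving", "riding", "parked on", "driving on"],
    "Actions in Sports Scene": ["about to hit", "kicking", "swinging"],
    "Interaction between Background": ["entering", "exiting", "enclosing"],
}

ALL_PREDICATES: List[str] = [
    predicate for group in PREDICATE_GROUPS.values() for predicate in group
]


def group_of(predicate: str) -> str:
    """Return the name of the group a predicate belongs to."""
    for name, members in PREDICATE_GROUPS.items():
        if predicate in members:
            return name
    raise KeyError(f"unknown predicate: {predicate!r}")


# Sanity checks: 56 predicates in total, no duplicates across groups.
assert len(ALL_PREDICATES) == 56
assert len(set(ALL_PREDICATES)) == 56

for name, members in PREDICATE_GROUPS.items():
    print(f"{name}: {len(members)}")
print(group_of("parked on"))  # Actions in Traffic Scene
```

Note that the order of `ALL_PREDICATES` here simply follows the list above; it should not be assumed to match the predicate indices used in the released annotation files.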

Distribution of Number of Objects per image

Distribution of Number of Relations per image

Distribution of Density of Relations per image

Frequency of Objects

Frequency of Objects (Thing)

Frequency of Objects (Stuff)

Frequency of Predicates

Word Clouds of Predicates

Word Clouds of Things and Stuff



