The role of semantics in the perceptual organization of shape

Psychology

F. Schmidt, J. Kleis, et al.

This study by Filipp Schmidt, Jasmin Kleis, Yaniv Morgenstern, and Roland W. Fleming examines how we establish correspondence between objects with vastly different shapes by focusing on their semantic parts, showing how similarities among heads, wings, and legs shape object recognition and perception.
Introduction

The study investigates how humans establish point-to-point correspondences between object shapes, a process crucial for object constancy, similarity judgments, and inferring transformations. Prior work shows correspondence can be robust to rigid and non-rigid transformations and can be guided by geometric heuristics such as curvature landmarks. The authors hypothesize that when available, high-level semantic information about parts (e.g., head, legs, wings) contributes to or even overrides purely geometric cues in determining correspondences across very different shapes or ambiguous figures. They test whether semantic part organization is sufficient to guide correspondence when geometric similarity is low (very different shapes) or uninformative (identical shapes with different semantic interpretations), thereby probing the interplay between perceptual organization and cognition.

Literature Review

Previous studies demonstrated that point-to-point correspondence can be maintained across changes in viewpoint and complex transformations, including non-rigid deformations. Earlier work by the authors showed that humans are accurate at dot-to-dot correspondence across transformations in 2D shapes, and a simple curvature-based heuristic often predicted human data better than ground truth transformations. However, geometry-based heuristics alone cannot explain correspondences between highly dissimilar objects (e.g., elephant vs. anteater) or identical contours with different semantic interpretations (e.g., duck–rabbit). Classic research in vision highlights that objects are perceived in terms of parts (Hoffman & Richards; Biederman; Siddiqi et al.), suggesting that semantic part structure could provide anchor points for correspondence. The paper situates its contribution by contrasting semantic part-based correspondence with curvature-based and uniform sampling baselines, extending prior models that relied on geometric landmarks.

Methodology

Design: Three experiments and computational modeling.

  • Experiment 1 (Different geometry, similar parts): 15 participants (11 women, 4 men; mean age 22.5). Task: dot-matching. Stimuli: six pairs of animal silhouettes with different contours but similar semantic part organization (Elephant–Anteater; Ostrich–Flamingo; Antelope–Giraffe; Lama–Fox; Butterfly–Owl; Lizard–Whale). For each base shape, 50 probe points were sampled at equidistant intervals from a random starting point. Procedure: On each trial, a red probe dot appeared on the base shape; participants placed a green bullseye at the corresponding point on the test shape’s contour. Probe order was randomized; each pair remained visible until all 50 matches were made. Orientation (same vs. opposing headings) was counterbalanced. Stimuli were scaled to equal bounding-box area and presented on an EIZO CG277 monitor (2560×1440, 59 Hz). Analyses: With no ground truth available, the authors quantified (i) inter-participant congruity (from 0 = random to 1 = perfect, based on average along-contour distances as a percentage of perimeter), (ii) preservation of point ordering (reversals vs. random), and (iii) semantic alignment of matched parts (evaluated in Experiment 3).
  • Experiment 2 (Identical geometry, different parts): 15 participants (12 women, 3 men; mean age 23.7). Task: same dot-matching. Stimuli: five ambiguous figures with identical contours but different labels prompting different semantic part organizations (Swan–Squirrel; Parrot–Goose; Whale–Snail; Duck–Rabbit; Swan–Cat), plus baseline identical-label pairs (Whale–Whale; Antelope–Antelope, from a separate n=15). Procedures matched Experiment 1; for ambiguous pairs, the differing labels appeared beneath otherwise identical shapes. Analyses paralleled Experiment 1, including subgroup analyses by perceived orientation/interpretation.
  • Experiment 3 (Semantic labeling and correspondence): Two parts with separate cohorts. 3A (n=12): Participants segmented each shape into parts by placing non-intersecting cuts and assigned labels from a provided list (Head, Body, Eye/s, Neck, Front leg/s, Hind leg/s, Foot/Feet, Ear/s, Trunk, Mouth, Antenna, Horn/s, Beak, Wing/s, Tail, Fin/s, None). For each contour point, the most frequent label across participants was assigned, yielding dense semantic maps per shape. 3B (n=9): Using printed cards for the most frequent labels from 3A (color-coded for base and test shapes), participants sorted the labels to establish correspondences between semantic parts across shape pairs, purely from the labels (no images were shown); many-to-one mappings were allowed, many-to-many were not.

Modeling: The core semantic organization model uses the Experiment 3 data. For a probe point on the base shape, it (1) identifies the point’s semantic part label (from 3A), (2) maps this part to the corresponding semantic part on the test shape (from 3B), (3) computes the probe’s relative position along the base part’s contour segment (the fraction of the part’s perimeter from part start to probe, respecting heading/order direction), and (4) places the predicted corresponding point at the same relative fraction along the mapped part on the test shape. The sole inputs are semantic labels and part correspondences; the model has no free parameters. Alternative models:

  • Uniform sampling: predicts equidistant placements around the test contour that maintain the probe order; starting points were aligned by the leftmost semantic part, the assignment most favorable to this model.
  • Curvature-based: computes contour surprisal (an unsigned curvature-related information measure based on a von Mises distribution over turning angles; window size 5% of perimeter), aligns the base and test surprisal profiles via dynamic time warping, then maps probe points accordingly.
  • Combined model: within each mapped semantic part, positions are adjusted relative to salient surprisal landmarks (local extrema) when unequivocal assignments are possible; otherwise the model defaults to the semantic prediction.

Statistical evaluation: For each probe, distances between individual (or model) predictions and the median human response were computed as a percentage of the test contour’s perimeter. T-tests across the 50 probes per pair and Bayes factors (JZS prior, scale 0.707) compared model and human distances. Congruity and order-preservation metrics complemented these analyses.

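The four-step semantic mapping can be sketched in a few lines of Python. This is an illustrative reconstruction, not the authors' code: it assumes each semantic part occupies a contiguous run of contour points, and it ignores wrap-around and heading direction, which the full model handles.

```python
def semantic_correspondence(probe_idx, base_labels, test_labels, part_map):
    """Predict the test-contour point corresponding to a base-contour probe.

    base_labels / test_labels: per-point part labels in contour order
    (as in Experiment 3A), e.g. ["head", "head", ..., "body", ...].
    part_map: base-part -> test-part label mapping (as in Experiment 3B).
    All names and data structures here are illustrative.
    """
    # Step 1: semantic part of the probe point.
    part = base_labels[probe_idx]
    # Step 2: corresponding part on the test shape.
    target = part_map[part]
    # Step 3: relative position of the probe within its part's segment.
    base_span = [i for i, lab in enumerate(base_labels) if lab == part]
    frac = base_span.index(probe_idx) / max(len(base_span) - 1, 1)
    # Step 4: same relative fraction along the mapped part on the test shape.
    test_span = [i for i, lab in enumerate(test_labels) if lab == target]
    return test_span[round(frac * (len(test_span) - 1))]
```

Because the mapping works in per-part fractions, the two parts may differ arbitrarily in length, which is exactly what lets the model bridge very different geometries.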
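The curvature-based baseline can likewise be sketched: compute a von Mises surprisal profile over turning angles, then align the base and test profiles with dynamic time warping. The snippet below uses a textbook DTW routine and a fixed concentration `kappa`, and omits the 5%-of-perimeter smoothing window; all of these are our simplifications, not the paper's implementation.

```python
import numpy as np

def turning_angles(contour):
    """Signed turning angle at each vertex of a closed 2D contour (n x 2)."""
    pts = np.vstack([contour[-1:], contour, contour[:1]])
    d = np.diff(pts, axis=0)                      # n+1 edge vectors
    ang = np.arctan2(d[:, 1], d[:, 0])            # edge headings
    return np.angle(np.exp(1j * np.diff(ang)))    # n angles, wrapped to (-pi, pi]

def surprisal(contour, kappa=4.0):
    """Contour surprisal: -log von Mises density of each turning angle,
    centered on zero (straight continuation). kappa is illustrative."""
    theta = turning_angles(contour)
    logp = kappa * np.cos(theta) - np.log(2 * np.pi * np.i0(kappa))
    return -logp

def dtw_map(a, b):
    """Align 1-D profiles a and b by dynamic time warping; return, for each
    index of a, one matched index of b (a minimal textbook DTW)."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    # Backtrack the optimal warping path.
    i, j, match = n, m, {}
    while i > 0 and j > 0:
        match.setdefault(i - 1, j - 1)
        step = int(np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]]))
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return [match.get(k, 0) for k in range(n)]
```

A probe at base index k would then be mapped to test index `dtw_map(surprisal(base), surprisal(test))[k]`, which is why this baseline succeeds only when the two surprisal profiles are genuinely similar.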
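The error measure used throughout the evaluation, along-contour distance expressed as a percentage of perimeter, is worth stating explicitly; the function below is a sketch with hypothetical names.

```python
def along_contour_distance_pct(s_i, s_j, perimeter):
    """Shortest distance between two points measured along a closed contour,
    as a percentage of total perimeter. s_i and s_j are arc-length positions
    from an arbitrary fixed origin (names are illustrative)."""
    d = abs(s_i - s_j) % perimeter
    return 100.0 * min(d, perimeter - d) / perimeter  # take the shorter way around
```

Because the contour is closed, the distance is taken in whichever direction is shorter, so the maximum possible error is 50% of perimeter (diametrically opposite points).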
Key Findings

Experiment 1 (Different geometry, similar parts):

  • High inter-participant congruity across pairs (A–F): 0.90, 0.86, 0.90, 0.88, 0.48, 0.71 (all > random, Wilcoxon Z between −5.61 and −6.15, p<0.001). Lower congruity for Butterfly–Owl (0.48) was due to 3D orientation ambiguity; subgroup congruities were 0.76 and 0.81 with preserved internal consistency.
  • Order preservation exceeded random for all pairs: 100%, 80%, 94%, 92%, 16%, 62% (Wilcoxon −6.94 ≤ Z ≤ −2.23, p<0.026); Butterfly–Owl subgroups: 56% and 68%.
  • Distances to median human responses (% perimeter): Humans: Elephant–Anteater 2.6; Ostrich–Flamingo 3.6; Antelope–Giraffe 2.5; Lama–Fox 3.0; Butterfly–Owl 5.5 (13.0 across the two interpretations); Lizard–Whale 7.2. Semantic model: 2.7; 6.5; 2.1; 3.9; 4.9 (12.1); 5.2. T-tests showed no significant differences between semantic-model and human distances for any pair (Bonferroni-corrected threshold p<.008; see Table 1). Uniform sampling and curvature-based models yielded substantially larger errors for most pairs (many p<.001, BF10>30), with isolated exceptions (e.g., the curvature model matched human variability for Elephant–Anteater and Lama–Fox but failed elsewhere).

Experiment 2 (Identical geometry, different parts):

  • Participants remained more congruent than random but less than in Experiment 1: congruity (A–E): 0.75, 0.28, 0.40, 0.37, 0.76 (all p<0.001 vs. random). Ambiguous orientations influenced ordering (e.g., Parrot–Goose subgroups congruity 0.86 vs. 0.51; Duck–Rabbit subgroups reflecting different rabbit interpretations).
  • Ordering preservation exceeded random: 82%, 90%, 82%, 58%, 82%; subgroup ordering similar for Parrot–Goose (88%, 94%) and Duck–Rabbit (56%, 68%).
  • Human distances: Swan–Squirrel 6.2; Parrot–Goose 7.9 (18.0); Whale–Snail 15.0; Duck–Rabbit 10.0 (15.8); Swan–Cat 6.0. Semantic model: 4.9; 5.3 (6.8); 16.5; 6.2 (6.6); 4.8. Only Duck–Rabbit showed a significant difference (T=2.85, p=.006, BF10=7.20); the other pairs did not differ significantly. Uniform sampling and curvature-based models performed markedly worse (e.g., uniform 32.7–35.4% for several pairs; curvature ~23–27% across pairs; many p<.001, BF10>30).

Overall modeling:

  • The semantic organization model matched human responses within human–human variability in most conditions and far outperformed uniform sampling and curvature-based baselines, indicating that semantic part mappings and within-part positional interpolation capture human correspondence judgments.
  • Adding curvature landmarks to the semantic model did not improve performance meaningfully (no consistent gains; one poorer case in Parrot–Goose).
Discussion

Findings support the hypothesis that semantic part organization guides point-to-point correspondence when geometric similarity is low or uninformative. In Experiment 1, despite dramatic geometric differences, observers produced consistent correspondences aligned by analogous semantic parts (e.g., elephant trunk to anteater snout). In Experiment 2, identical contours with different labels yielded different correspondences, demonstrating that semantic interpretation can override geometry. The semantic organization model, which anchors correspondences to labeled parts and transfers relative within-part positions, predicts human behavior to within inter-observer variability, while purely geometric models (uniform spacing or curvature-based surprisal alignment) fail on these stimuli. This suggests a flexible interplay: humans rely on geometric landmarks when shapes are similar or unfamiliar, but shift to semantic part cues when objects are familiar or share part structures. The results also speak to cognitive-perceptual integration: top-down semantic knowledge can structure local spatial correspondence decisions, potentially reflecting interactions between higher-level object representations and lower-level contour processing.

Conclusion

The paper demonstrates that humans establish correspondences between very different shapes by evaluating similarity between semantic parts, and that semantics can override purely geometric cues. A zero-parameter semantic organization model using independently collected part labels and part-to-part correspondences predicts human dot-matching with accuracy comparable to human–human variability and surpasses uniform and curvature-based baselines. Contributions include: (1) behavioral evidence for semantic guidance in correspondence across dissimilar shapes and ambiguous figures; (2) a simple, effective semantic part-based model for fine-grained point predictions; (3) a comparison showing limited utility of curvature cues in these contexts. Future directions include integrating semantic part segmentation from modern machine learning for large-scale correspondence and perceptually plausible shape morphing, extending to 3D shapes to resolve orientation ambiguities and enrich spatial context, exploring probabilistic modeling of multiple semantic interpretations, and investigating neural mechanisms of top-down semantic influences on perceptual organization.

Limitations

Model performance depends on the availability, granularity, and quality of semantic part labels; the level at which observers label parts (e.g., a coarse ‘Head’ vs. finer distinctions such as ‘Ears’ and ‘Mouth’) may vary across observers and stimuli. The approach does not predict inter-individual differences or ambiguous interpretations (e.g., Butterfly–Owl wing orientation; Duck–Rabbit rabbit heading), and ambiguities in 3D orientation reduce congruity and ordering. Current work is limited to 2D contours; extending to 3D surfaces would require modeling spatial relations among surrounding parts and might alleviate orientation ambiguities. A probabilistic mixture over alternative semantic labelings could better capture variability and ambiguous cases.
