Psychology
Domain-specific functional coupling between dorsal and ventral systems during action perception
H. Yang, C. He, et al.
This fMRI study examines how the brain differentiates social-communicative actions from manipulation actions, and how the corresponding action systems interact with the ventral occipitotemporal cortex. Conducted by Huichao Yang, Chenxi He, Zaizhu Han, and Yanchao Bi, the research reports domain-specific activations and matching connectivity patterns, supporting a unified framework for action perception.
Introduction
Object recognition primarily relies on the ventral occipitotemporal cortex (VOTC), which exhibits a robust domain organization (e.g., faces, animals, scenes, tools) including an animate/inanimate distinction. Action observation predominantly engages dorsal stream regions including occipitoparietal cortex, posterior dorsal temporal gyrus, and inferior frontal gyrus. While dorsal–ventral interactions are well documented, it remains unclear whether the action perception system itself is organized by analogous domains and whether its communication with ventral object regions follows domain-specific principles. The authors test whether action perception is organized along social-communicative versus manipulation domains—beyond object-domain properties—and whether domain-specific dorsal action systems dynamically couple with ventral object-selective regions during perception.
Literature Review
Prior work shows domain-related interactions: manipulable objects engage inferior parietal regions implicated in manipulation knowledge and hand–object action perception, whereas faces/animals engage posterior superior temporal sulcus (pSTS) linked to biological motion and social interaction processing. Structural and resting-state connectivity indicates that tool-preferring ventral regions connect with frontoparietal manipulation networks, and face-preferring ventral regions connect with pSTS. Task-based FC between tool-preferring ventral and dorsal regions is enhanced during action tasks. Causal evidence shows left inferior parietal lesions/stimulation modulate tool representations in ventral medial tool-preferring areas. However, much of this evidence involves object stimuli, potentially confounding domain effects with object-domain properties. It is unclear whether action domains (social-communicative vs. manipulation) exist independent of object properties, whether pSTS differentiates types of biological motion by goal (human-directed vs. object-directed), and whether dorsal–ventral communication is dynamically domain-specific during action perception.
Methodology
Participants: Forty-four right-handed adults (20 males; 22.4±2.4 years, range 18–28) with normal or corrected vision were recruited. Thirty-six participants (15 males; 22.2±2.4 years) were included in the analyses; eight were excluded because of excessive head motion or to balance the action–shape matching rules. All participants gave informed consent, and the study was IRB-approved.
Stimuli and design: Animated videos depicted a human cartoon figure performing two action domains toward the same set of meaningless shapes: social-communicative actions (waving, saluting, bowing, kissing, clapping, greeting) and manipulation actions (folding, tearing, overturning, rotating, pressing up/down, pressing left/right). A navigation condition (not of primary interest) was also included. Meaningless shapes combined three outlines (hexagon, circle, square) with six interior shapes; action–shape correspondences were counterbalanced across three participant groups to match shapes across action domains at the group level. Sociality ratings (N=23) confirmed higher person-directedness for social-communicative vs. manipulation actions (6.04±0.65 vs. 3.02±1.55; t(22)=9.104, p=6.468×10⁻⁹). Manipulation actions intrinsically induced more cumulative movement/shape changes than social actions (p=1.644×10⁻⁷). The navigation condition, the third condition, had even greater movement (p=1.185×10⁻²⁹) and was used to assess movement confounds.
Task paradigm: Long-block fMRI design optimized for both univariate activation and task-based FC. Four runs: each began with 10 s fixation, followed by three blocks (one per condition). Each block: 24 trials from the same action condition, then 10 s fixation. Trial: 2000 ms action video + 1000 ms static (or running without turning for navigation). TR=2000 ms; 36 volumes per block. Participants learned action–shape mappings and later reported and simulated them post-scan.
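The block timing above can be checked with a few lines of arithmetic. The sketch below uses only the durations stated in the task description; the variable names are illustrative, and the run length it derives is a computed value rather than one reported in the source.

```python
# Timing values taken from the task description; names are illustrative only.
TR_S = 2.0               # repetition time in seconds
TRIALS_PER_BLOCK = 24
VIDEO_S = 2.0            # action video duration
STATIC_S = 1.0           # static period after each video
FIXATION_S = 10.0        # fixation at run start and after every block
BLOCKS_PER_RUN = 3       # one block per condition

trial_s = VIDEO_S + STATIC_S                # 3 s per trial
block_s = TRIALS_PER_BLOCK * trial_s        # 72 s per block
volumes_per_block = int(block_s / TR_S)     # 36 volumes, matching the text

# Initial fixation, then each block followed by its own fixation period.
# This total is derived here, not reported in the source.
run_s = FIXATION_S + BLOCKS_PER_RUN * (block_s + FIXATION_S)
volumes_per_run = int(run_s / TR_S)

print(volumes_per_block, run_s, volumes_per_run)
```

Under these assumptions each run lasts 256 s (128 volumes) before any volumes are discarded in preprocessing.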
MRI acquisition: Siemens Trio Tim 3T. Structural T1 MPRAGE: TR=2530 ms, TE=3.39 ms, flip=7°, slice thickness 1.3 mm, gap 0.65 mm, in-plane 1.3×1.0 mm², FOV 256×256 mm², 144 slices. Functional EPI: TR=2000 ms, TE=30 ms, flip=90°, slice thickness 3.5 mm, gap 0.7 mm, in-plane 3.1×3.1 mm², FOV 200×200 mm², 33 slices (32 after upgrade). Results were similar pre/post-upgrade, so data combined.
Preprocessing: SPM12. Discard first 5 volumes/run; motion correction; co-register T1 to mean EPI; segmentation; normalization to MNI; resample to 3×3×3 mm³; 6 mm FWHM smoothing. Additional for FC: linear trend removal, bandpass 0.01–0.1 Hz, regress 6 motion parameters + WM + CSF. Main analyses without global signal regression (GSR); validation analyses with GSR.
Univariate analysis: First-level GLM with predictors for the three video conditions plus six motion parameters; high-pass 128 s. Second-level one-sample t-tests for the contrasts SA>MA and MA>SA (SA: social-communicative actions; MA: manipulation actions), and for each condition vs. baseline. Threshold: voxel p<0.0001, cluster-extent FWE p<0.05, within a gray matter mask (SPM5, probability >0.4).
Definition of action-system ROIs: Leave-one-participant-out (LOPO) approach using whole-brain SA>MA and MA>SA contrasts to define peaks per participant; 3 mm radius spheres formed around peaks; ROIs within VOTC excluded. Peaks listed in Supplementary Table S4.
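The leave-one-participant-out logic can be illustrated with a small sketch. It is a simplified stand-in, not the authors' pipeline: the group map is a plain mean of the remaining participants' contrast values (the study used thresholded group contrasts), and the function names `lopo_peaks` and `sphere`, the input layout, and the sphere inclusion rule are all assumptions made for illustration.

```python
def lopo_peaks(contrast_maps):
    """contrast_maps: {participant_id: {(x, y, z): contrast value}} on a shared grid.

    Returns, for each participant, the peak coordinate of a group map built
    WITHOUT that participant, so the ROI is independent of their own data.
    A plain mean is used here as a stand-in for a group-level statistic.
    """
    peaks = {}
    for left_out in contrast_maps:
        others = [m for pid, m in contrast_maps.items() if pid != left_out]
        group_mean = {
            voxel: sum(m[voxel] for m in others) / len(others)
            for voxel in others[0]
        }
        peaks[left_out] = max(group_mean, key=group_mean.get)
    return peaks


def sphere(center, radius_mm=3.0, voxel_mm=3.0):
    """Voxel-centre coordinates (mm) lying within radius_mm of center.

    Assumes an isotropic grid with spacing voxel_mm and an inclusive
    distance rule (distance <= radius); both are illustrative choices.
    """
    steps = int(radius_mm // voxel_mm)
    cx, cy, cz = center
    coords = []
    for i in range(-steps, steps + 1):
        for j in range(-steps, steps + 1):
            for k in range(-steps, steps + 1):
                dx, dy, dz = i * voxel_mm, j * voxel_mm, k * voxel_mm
                if dx * dx + dy * dy + dz * dz <= radius_mm ** 2:
                    coords.append((cx + dx, cy + dy, cz + dz))
    return coords


# Toy example: three participants, three voxels.
toy_maps = {
    "p1": {(0, 0, 0): 1.0, (3, 0, 0): 5.0, (6, 0, 0): 0.5},
    "p2": {(0, 0, 0): 2.0, (3, 0, 0): 1.0, (6, 0, 0): 0.5},
    "p3": {(0, 0, 0): 2.5, (3, 0, 0): 1.5, (6, 0, 0): 0.2},
}
toy_peaks = lopo_peaks(toy_maps)
toy_roi = sphere(toy_peaks["p1"])
```

In the toy data, participant p1 has a strong response at (3, 0, 0), yet the peak assigned to p1 comes only from p2 and p3, which is the point of the procedure. Under the inclusive rule used here, a 3 mm sphere on a 3 mm grid contains the centre voxel and its six face neighbours.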
Ventral object ROIs: Two approaches. (1) Theory-driven: Neurosynth association maps (FDR p<0.01) for “face” (896 studies) and “tools” (115 studies). Peaks within VOTC used to define 3 mm spheres: bilateral FFA for faces; left LOTC for tools; also included a small left medial FG tool-preferring cluster (medFG) based on prior literature. (2) Data-driven: a whole VOTC mask from prior lab dataset combining functional and anatomical localization (OTC regions active during object picture perception with z<10).
FC computation: For each participant and run, residual time series were segmented by condition: the first 4 volumes of each block were discarded, and 2 volumes of the following fixation were included to account for hemodynamic delay. Voxel time series were averaged within each action-system ROI sphere and then across the ROIs of each system to form one seed per system. Pearson correlations were computed between each action-system seed and: (a) each Neurosynth ventral ROI (ROI analysis) or (b) each voxel in the VOTC mask (voxel-wise analysis), separately for SA and MA viewing. This yielded four FC measures per ventral ROI/voxel: with SA-system during SA viewing, with SA-system during MA viewing, with MA-system during SA viewing, with MA-system during MA viewing. Correlations were Fisher z-transformed and averaged across runs.
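The windowing and correlation steps described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the input layout (plain lists of residual signal values indexed by volume), and the `block_start` convention are assumptions. The 36-volume block length, the 4 discarded volumes, and the 2 added fixation volumes come from the text.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def block_window(block_start, block_len=36, skip=4, extend=2):
    """Volume indices kept for one block.

    Drops the first `skip` volumes and adds `extend` volumes from the
    following fixation. `block_start` (index of the block's first volume
    in the residual series) is a hypothetical convention.
    """
    return range(block_start + skip, block_start + block_len + extend)


def condition_fc(seed_ts, target_ts, block_start):
    """Fisher z-transformed correlation within one condition block."""
    idx = block_window(block_start)
    seed = [seed_ts[i] for i in idx]
    target = [target_ts[i] for i in idx]
    return math.atanh(pearson(seed, target))  # Fisher z = atanh(r)


def mean_fc_across_runs(runs):
    """runs: list of (seed_ts, target_ts, block_start), one tuple per run."""
    zs = [condition_fc(s, t, b) for s, t, b in runs]
    return sum(zs) / len(zs)


# Synthetic example signals (not real data).
demo_seed = [math.sin(0.3 * i) for i in range(60)]
demo_target = [v + 0.5 * math.cos(1.1 * i) for i, v in enumerate(demo_seed)]
demo_z = condition_fc(demo_seed, demo_target, block_start=5)
```

Each window holds 34 volumes (36 − 4 + 2). Averaging in Fisher z space rather than on raw r values is the usual reason for transforming before combining runs.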
Statistics on FC: Repeated-measures 2×2 ANOVAs (Action system: SA vs. MA) × (Action condition: SA vs. MA). For Neurosynth ROIs, ANOVAs run in SPSS; simple-effect paired t-tests Bonferroni-corrected. For VOTC voxel-wise analysis, SPM12 cluster-level FWE correction at voxel p<0.001, cluster p<0.05; only clusters significant in both primary (no GSR) and validation (with GSR) analyses were reported.
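For a fully within-subject 2×2 design, the interaction term is equivalent to a one-sample t-test on each participant's difference of differences, with F(1, n−1) = t². The sketch below illustrates that equivalence on made-up numbers; it is not the SPSS procedure the authors ran, and the function name and the dictionary layout keyed by (system, condition) are assumptions.

```python
import math


def interaction_test(fc):
    """Interaction of Action system x Action condition in a 2x2 within-subject design.

    fc: list of per-participant dicts keyed by (system, condition), each value a
    Fisher z FC estimate. Returns (t, F, (df1, df2)) where F = t**2.
    """
    # Difference of differences: condition effect for the SA-system seed
    # minus condition effect for the MA-system seed.
    d = [
        (p[("SA", "SA")] - p[("SA", "MA")]) - (p[("MA", "SA")] - p[("MA", "MA")])
        for p in fc
    ]
    n = len(d)
    mean = sum(d) / n
    var = sum((v - mean) ** 2 for v in d) / (n - 1)  # sample variance
    t = mean / math.sqrt(var / n)
    return t, t * t, (1, n - 1)


# Made-up values for four hypothetical participants.
toy_fc = [
    {("SA", "SA"): 0.50, ("SA", "MA"): 0.30, ("MA", "SA"): 0.20, ("MA", "MA"): 0.40},
    {("SA", "SA"): 0.60, ("SA", "MA"): 0.40, ("MA", "SA"): 0.30, ("MA", "MA"): 0.40},
    {("SA", "SA"): 0.40, ("SA", "MA"): 0.35, ("MA", "SA"): 0.25, ("MA", "MA"): 0.30},
    {("SA", "SA"): 0.55, ("SA", "MA"): 0.30, ("MA", "SA"): 0.20, ("MA", "MA"): 0.35},
]
toy_t, toy_F, toy_df = interaction_test(toy_fc)
```

With the 36 participants in the final sample, the interaction would have degrees of freedom (1, 35). A positive t here means each seed couples more strongly with the ventral region during viewing of its own action domain.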
Relationship between dorsal–ventral FC and VOTC activation: For each VOTC voxel, averaged FC measures and activation strengths (SA and MA) across participants were computed. Pearson correlations tested the relation between domain-specific FC preference (e.g., FC with SA-system minus MA-system during SA viewing) and domain-specific activation differences (SA minus MA), across VOTC voxels.
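The across-voxel comparison can be sketched in the same spirit. The inputs are assumed to be group-averaged values per VOTC voxel for the SA domain (FC with each action system during SA viewing, and activation for SA and MA); the function names and toy numbers are illustrative only.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fc_activation_coupling(fc_sa_system, fc_ma_system, act_sa, act_ma):
    """Correlate FC preference with activation difference across voxels.

    fc_sa_system / fc_ma_system: per-voxel group-mean FC with each action
    system during SA viewing. act_sa / act_ma: per-voxel group-mean activation.
    """
    fc_preference = [a - b for a, b in zip(fc_sa_system, fc_ma_system)]
    activation_diff = [a - b for a, b in zip(act_sa, act_ma)]
    return pearson(fc_preference, activation_diff)


# Made-up values for five hypothetical voxels.
toy_r = fc_activation_coupling(
    fc_sa_system=[0.3, 0.5, 0.2, 0.6, 0.4],
    fc_ma_system=[0.2, 0.2, 0.3, 0.1, 0.3],
    act_sa=[1.0, 1.5, 0.5, 2.0, 1.2],
    act_ma=[0.9, 1.0, 0.8, 1.1, 1.0],
)
```

The MA-domain version follows by swapping the roles of the two systems and the two conditions.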
Key Findings
- Univariate activations (vs. baseline): Both action conditions activated bilateral inferior frontal gyrus (IFG), superior parietal gyri, and posterior superior to inferior temporal gyri (voxel p<0.0001, cluster-extent FWE p<0.05).
- Social-communicative actions (SA) > Manipulation actions (MA): Greater activation in right precentral gyrus (Prec) and bilateral pSTS/pMTG. Table 1 peaks include: Right Prec (x=42, y=3, z=45, t=7.47, size=43); Right STG/MTG (57, -42, 15, t=5.73, size=71); Left MTG (-66, -42, 9, t=5.09, size=20).
- Manipulation actions (MA) > Social actions (SA): Greater activation in bilateral SMG, bilateral IPL, bilateral SPL, bilateral postcentral gyri (Posc), bilateral Prec, right SFG/IFG, and left insula. Example peaks: Right Posc (60, -18, 33, t=8.56, size=459); Left SPL (-36, -45, 60, t=8.17, size=462); Right SFG/Prec (27, -9, 63, t=6.80, size=52); Left Prec (-54, 6, 36, t=5.76, size=15); Right IF oper/Prec (54, 9, 24, t=5.46, size=21); Left insula/rolandic oper (-39, -6, 12, t=5.10, size=16).
- Control for movement confound: A navigation condition with even higher cumulative movement than MA showed equal or lower activation than MA in all but one manipulation-specific cluster (right IFG/Prec at 54, 9, 24), indicating MA-specific activations were not due to movement alone. FC analyses were replicated excluding this cluster with the same pattern.
- ROI-based FC (Neurosynth-defined ventral ROIs): Significant Action system × Action condition interactions for bilateral FFA and left LOTC.
• Left FFA: F(1,35)=7.699, p=0.009. Increased FC with SA-system during SA viewing vs. MA viewing (t(35)=2.474, uncorrected p=0.018). During SA viewing, FC with SA-system > FC with MA-system (t(35)=4.878, adjusted p=9.259×10⁻⁶).
• Right FFA: F(1,35)=8.773, p=0.005. Increased FC with SA-system during SA viewing vs. MA viewing (t(35)=2.737, adjusted p=0.039). During SA viewing, FC with SA-system > FC with MA-system (t(35)=5.235, adjusted p=3.147×10⁻⁵).
• Left LOTC (tool-preferring): F(1,35)=11.581, p=0.002. Stronger FC with MA-system during MA viewing vs. SA viewing (t(35)=2.348, uncorrected p=0.025; marginal at the corrected level).
• Left medial FG (tool-preferring): No significant main or interaction effects (ps≥0.086).
- VOTC voxel-wise FC: A right ITG/FG cluster showed significant interaction (voxel p<0.001, cluster FWE p<0.05). This cluster connected more strongly with MA-system during MA viewing than SA viewing (t(35)=3.308, adjusted p=0.009) and more strongly with SA-system during SA viewing than MA viewing (t(35)=3.636, adjusted p=0.004). A right superior temporal pole cluster showed a main effect favoring SA-system connectivity, but this effect was attributable to anatomical proximity and did not persist after regressing out Euclidean distance.
- FC–activation relationship in VOTC: Across 3915 VOTC voxels, domain-specific FC preferences correlated with domain-specific activation differences: SA domain (FC_SA-system−FC_MA-system during SA viewing vs. activation SA−MA): R=0.480, p=5.550×10⁻²²⁵. MA domain (FC_MA-system−FC_SA-system during MA viewing vs. activation MA−SA): R=0.486, p=3.934×10⁻²³¹.
Discussion
Findings demonstrate that action perception is organized along social-communicative and manipulation domains within dorsal and dorsal-temporal regions: SA preferentially engages bilateral pSTS and right precentral cortex, while MA preferentially engages SMG/IPL, SPL, pre/postcentral gyri, and frontal regions. Importantly, these domain-specific dorsal systems dynamically communicate with ventral object-selective cortex in a matching, domain-specific manner: SA-system coupling with bilateral FFA is enhanced during social-action viewing, and MA-system coupling with left LOTC is enhanced during manipulation-action viewing. These connectivity modulations are not reducible to dorsal activation differences and persist after controls including excluding a movement-sensitive IFG/Prec cluster and validation with GSR. Whole-VOTC analysis corroborates domain-specific modulation in a right ITG/FG cluster. The moderate correlations between domain-specific dorsal–ventral FC and local VOTC activation suggest that VOTC responses during action perception may, in part, reflect top-down or cross-stream inputs from action systems, beyond bottom-up object properties. Results align with resting-state and causal studies showing dorsal–ventral domain-linked connectivity (e.g., pSTS–FFA, IPL–tool regions), but extend them by demonstrating task-driven, domain-specific enhancements. Lack of stable task-driven effects for left medial FG suggests it may be intrinsically connected with IPL without additional modulation by action viewing in the absence of object-related bottom-up inputs. Overall, the data support a connectivity-constrained domain representation spanning dorsal action and ventral object systems, organized around social versus manipulation domains.
Conclusion
The study shows that action perception follows a domain organization (social-communicative vs. manipulation) within dorsal action networks and that these domain-specific systems functionally couple with ventral object-selective regions in a matching, domain-specific manner (SA-system with FFA during social action viewing; MA-system with left LOTC during manipulation action viewing). Across VOTC voxels, stronger domain-matched dorsal–ventral coupling relates to stronger domain-specific local activation. Together with classical object-domain organization in VOTC, these results support a unified, connectivity-constrained principle of social versus manipulation domains across perception, and demonstrate domain-based dynamic functional communication between dorsal and ventral systems.
Limitations
- Potential visual/movement confound: Manipulation videos contained greater cumulative movement and shape changes than social videos. Control analyses using a navigation condition with even greater movement indicated that manipulation-specific activations were not solely driven by movement; one right IFG/Prec cluster was movement-sensitive and excluded in FC replications.
- Directionality: fMRI FC analyses cannot establish causal direction; inferences about dorsal-to-ventral influence are indirect.
- Global signal regression: GSR is controversial; primary analyses were without GSR with validation including GSR. Some effects (e.g., medFG) were not stable across preprocessing choices.
- Anatomical proximity confound: A right superior temporal pole cluster’s main effect favored SA-system connectivity but was attributable to shorter Euclidean distance; effect disappeared after distance regression.
- VOTC univariate sensitivity: No robust above-threshold VOTC activation differences between action conditions, potentially limiting detection of domain-specific univariate effects in ventral areas.
- ROI definition: No independent localizer; action-system ROIs were defined via LOPO procedure to maintain independence, but this may differ from standard localizers.
- Scanner upgrade: Slice number changed (33 to 32) during the study; separate analyses showed similar patterns, but heterogeneity remains a potential limitation.
- Sample exclusions: Eight participants excluded (motion and counterbalancing), reducing final N to 36.

