Psychology
Decoding predicted future states from the brain's "physics engine"
R. T. Pramod, E. Mieczkowski, et al.
This preregistered study by R. T. Pramod, Elizabeth Mieczkowski, Cyn X. Fang, Joshua B. Tenenbaum, and Nancy Kanwisher presents evidence that a network in the human parietal and frontal lobes runs forward simulations to predict physical events: it encodes object contact and predicts future collisions, supporting the brain’s "physics engine" hypothesis.
~3 min • Beginner • English
Introduction
To plan even simple actions, humans must predict future states of the world based on prior knowledge of physical laws. The authors focus on the prediction of physical events and test the hypothesis that a set of bilateral parietal and frontal regions, the physics network (PN), spontaneously generates predictions of future states during passive viewing of object motion, consistent with a forward-simulation "physics engine." Prior neuroimaging work identified PN via stronger responses during physical reasoning than during perceptual judgments, and showed that PN encodes abstract properties such as object mass and stability and responds to violations of physical (but not social) expectations. Yet explicit representation of predicted future states in PN, before those states occur, had not been demonstrated. The study uses object contact relations (containment, support, and attachment, contrasted with noncontact/occlusion) as critical cues for dynamic physical prediction, given that contact interlinks object fates and constrains motion. The authors hypothesize that PN encodes current contact relations (Experiment 1) and, crucially, predicted future contact (collisions) before it occurs (Experiment 2). They also examine the level of abstraction of PN representations, testing generalization across object identities, shapes, configurations, motion trajectories, and scenarios, to assess whether PN represents contact at an abstract relational level suitable for efficient forward simulation.
Literature Review
Prior work identified a frontoparietal "physics network" (PN) engaged in intuitive physical inference tasks (e.g., predicting tower falls) relative to perceptual judgments (Fischer et al., 2016). PN carries information about object mass (Schwettmann et al., 2019) and physical stability (Pramod et al., 2022) that generalizes across scenarios and shows increased response to violated physical expectations (Liu et al., 2024). Developmental and behavioral literature underscores the primacy of object contact relations: infants expect containees to move with containers and are sensitive to support constraints and causal contact (Hespos & Baillargeon, 2001; Baillargeon, 1998; Spelke et al., 1996). Adults rapidly and categorically encode object relations and deploy attention accordingly, with evidence for automatic, abstract coding (Hafri & Firestone, 2021; Hafri et al., 2024). Neuroimaging has linked parietal and frontal regions to spatial relations and ventral visual areas to relative positions and multi-object configurations (Amorapanth et al., 2010; Hayworth et al., 2011; Kaiser & Peelen, 2018; Karakose-Akbiyik et al., 2023). Theoretical accounts propose mental game engines that run forward simulations (Battaglia et al., 2013; Ullman et al., 2017), though debates exist about whether prediction relies on pattern recognition rather than simulation (e.g., Ludwin-Peery et al., 2020; Davis & Marcus, 2015). This study builds on these lines by directly probing whether PN represents predicted future states (contact/collision) before they occur, and whether such representations are abstract across scenarios and trajectories.
Methodology
Design: Two preregistered fMRI experiments tested whether PN represents (1) current object contact vs noncontact and (2) predicted future contact (collision) vs noncontact before occurrence. Functional ROIs (fROIs) included PN (frontoparietal), lateral occipital complex (LOC), ventral temporal cortex (VTC), and V1. Analyses used correlation-based multivoxel pattern analysis (MVPA) decoding indices and representational similarity analysis (RSA).
Participants: Experiment 1: n=14 (22–38 years; 8 females); Experiment 2: n=14 (20–35 years; 5 females). All had normal or corrected-to-normal vision and provided informed consent (MIT IRB no. 0403000096). Additional participants were excluded from Experiment 1 for excessive motion or failure to localize PN.
Imaging: Siemens 3T Prisma with 32-channel head coil. Structural: T1 MPRAGE (TR=2.53 s, TE=3.57 ms, flip angle 9°, FOV 256 mm, 1 mm isotropic). Functional: T2*-EPI (TR=2 s, TE=30 ms, flip angle 90°, FOV 204 mm, matrix 102×102, 2 mm isotropic voxels, 66 slices, no gap). Preprocessing and GLMs followed prior work. Experiment 1 GLM included regressors for each of 27 conditions; Experiment 2 used separate GLMs for perceived and predicted runs with regressors for each of 48 stimuli plus one-back events, plus nuisance regressors.
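To make the regressor structure concrete, here is a minimal sketch (Python with NumPy/SciPy) of how per-condition regressors of the kind described above could be built by convolving condition boxcars with a canonical hemodynamic response function. The function names, number of volumes, and onset values are illustrative assumptions, not the authors' actual pipeline.

```python
import numpy as np
from scipy.stats import gamma

# Illustrative values only; not the authors' exact pipeline.
TR = 2.0       # repetition time in seconds, as reported above
N_VOLS = 300   # hypothetical number of volumes in a run

def canonical_hrf(tr, duration=32.0):
    """SPM-style double-gamma hemodynamic response function."""
    t = np.arange(0.0, duration, tr)
    return gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0

def condition_regressor(onsets, durations, tr=TR, n_vols=N_VOLS):
    """Boxcar for one condition's events, convolved with the canonical HRF."""
    frame_times = np.arange(n_vols) * tr
    boxcar = np.zeros(n_vols)
    for onset, dur in zip(onsets, durations):
        boxcar[(frame_times >= onset) & (frame_times < onset + dur)] = 1.0
    return np.convolve(boxcar, canonical_hrf(tr))[:n_vols]

# One regressor per condition (27 conditions in Experiment 1; 48 stimuli plus
# one-back events in Experiment 2), stacked into a design matrix for GLM fitting.
onsets_per_condition = {"contain_bowl_natural": [15.0, 200.0]}  # hypothetical onsets
X = np.column_stack([
    condition_regressor(onsets, [15.4] * len(onsets))
    for onsets in onsets_per_condition.values()
])
```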
ROI localization: PN defined per participant using a physics>color localizer (uncorrected p<0.001) intersected with group parcels. LOC localized via dynFOSS (objects>scrambled; p<0.001) with anatomical masks. VTC via visual>fixation (p<0.001) intersected with Desikan-Killiany atlas. V1 via anatomical labels intersected with scrambled>objects. Analyses pooled hemispheres; PN subregions (frontal/parietal; left/right) also examined. Robustness checks used top 10% or top 100 voxels by localizer t-values.
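As a rough illustration of the fROI procedure, the following sketch intersects a subject-level localizer t-map with a binary group parcel, either by an uncorrected threshold or by taking the top-N voxels, mirroring the robustness checks mentioned above. The helper name and argument layout are hypothetical.

```python
import numpy as np
from scipy.stats import t as t_dist

def define_froi(t_map, group_parcel, dof, p_thresh=1e-3, top_n=None):
    """Define a subject-specific fROI (hypothetical helper, for illustration).

    t_map        : 1-D array of localizer t-values per voxel (e.g., physics > color)
    group_parcel : boolean array marking voxels inside the group parcel
    top_n        : if given, keep the top_n voxels in the parcel by t-value
                   instead of thresholding (the robustness check noted above)
    """
    within = group_parcel.astype(bool)
    if top_n is not None:
        order = np.argsort(np.where(within, t_map, -np.inf))[::-1]
        mask = np.zeros_like(within)
        mask[order[:top_n]] = True
        return mask
    t_cutoff = t_dist.ppf(1 - p_thresh, dof)   # one-tailed, uncorrected p < 0.001
    return within & (t_map > t_cutoff)
```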
Experiment 1 (Contact decoding): Stimuli comprised naturalistic and rendered 3 s videos depicting contact relations (containment, support, attachment) vs noncontact (occlusion) across three scenarios (natural-create, natural-consequence, rendered) and two base objects (bowl, mug), plus single-object baselines; 768 clips in total. Block design: each block contained four 3 s videos from one condition, with one immediate repetition serving as the one-back target, and lasted 15.4 s. Each run comprised 28 stimulus blocks and five 15 s fixation blocks. Participants fixated a central red dot.
Exp. 1 MVPA: For each fROI, voxel response patterns for contact (averaged over the three contact relations) and noncontact (occlusion) were computed per scenario. The decoding index was the mean Fisher z-transformed within-condition correlation (contact-contact and noncontact-noncontact) minus the mean Fisher z-transformed between-condition correlation (contact-noncontact), computed across scenarios (natural vs rendered), as sketched below. An additional analysis restricted to contain vs occlude (shape-matched) reduced shape confounds. Contact-type decoding was tested pairwise among containment, support, and attachment.
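The decoding index amounts to a within-minus-between pattern-correlation measure. Below is a minimal sketch under that reading; the function and variable names are hypothetical, and the inputs would be 1-D voxel response patterns from the two generalization sets.

```python
import numpy as np

def fisher_corr(x, y):
    """Fisher z-transformed Pearson correlation between two voxel patterns."""
    return np.arctanh(np.corrcoef(x, y)[0, 1])

def decoding_index(contact_a, noncontact_a, contact_b, noncontact_b):
    """Within-minus-between pattern correlation across generalization sets A and B.

    For Experiment 1, A and B would be the natural and rendered scenarios.
    Values > 0 indicate that contact vs noncontact information generalizes
    from set A to set B.
    """
    within = np.mean([fisher_corr(contact_a, contact_b),
                      fisher_corr(noncontact_a, noncontact_b)])
    between = np.mean([fisher_corr(contact_a, noncontact_b),
                       fisher_corr(noncontact_a, contact_b)])
    return within - between
```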
Experiment 2 (Future contact prediction): Preregistered 2×2 design: contact vs noncontact × perceived vs predicted, with two scenarios (roll, throw), six background scenes (3 indoor, 3 outdoor), and two motion directions (left/right). Stimuli: 96 clips (1.5 s) rendered in Blender; perceived clips showed explicit contact/noncontact; predicted clips cut earlier to imply impending contact/noncontact. Objects differed across perceived vs predicted within each scenario to minimize low-level confounds (roll perceived: bowl/cylinder; roll predicted: mug/sphere; throw perceived: mug/cube; throw predicted: bowl/icosphere). Event-related design: 8 runs (4 perceived, 4 predicted), each with 144 trials (48 unique videos × 3 repetitions), variable ISIs (0.5, 2.5, 4.5 s) optimized via optseq2, 10 one-back trials per run, and fixation blocks at the start and end. Run order was interleaved to reduce priming/fatigue effects.
Exp. 2 MVPA: Within each fROI, a decoding index for contact vs noncontact was computed by correlating voxel patterns across the perceived and predicted conditions within a scenario (roll or throw), testing whether the contact/noncontact distinction generalizes from perceived to predicted events; indices were averaged over scenarios. Cross-scenario decoding correlated perceived patterns in one scenario with predicted patterns in the other (roll↔throw), testing generalization across both condition and scenario; see the sketch below. A stronger generalization test additionally crossed motion direction (left/right). Group-level ANOVAs compared decoding across fROIs.
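The cross-scenario generalization test can be sketched the same way, correlating perceived patterns from one scenario with predicted patterns from the other in both directions and averaging. The dictionary layout and names below are assumptions for illustration, not the authors' code.

```python
import numpy as np
from itertools import product

def fisher_corr(x, y):
    """Fisher z-transformed Pearson correlation between two voxel patterns."""
    return np.arctanh(np.corrcoef(x, y)[0, 1])

def cross_scenario_decoding(patterns):
    """Contact decoding generalizing across perceived/predicted and scenario.

    `patterns[(condition, label, scenario)]` is a hypothetical dict of voxel
    patterns, e.g. patterns[("perceived", "contact", "roll")].
    """
    indices = []
    for s_a, s_b in [("roll", "throw"), ("throw", "roll")]:
        within = np.mean([
            fisher_corr(patterns[("perceived", lab, s_a)],
                        patterns[("predicted", lab, s_b)])
            for lab in ("contact", "noncontact")])
        between = np.mean([
            fisher_corr(patterns[("perceived", a, s_a)],
                        patterns[("predicted", b, s_b)])
            for a, b in product(("contact", "noncontact"), repeat=2) if a != b])
        indices.append(within - between)
    return np.mean(indices)
```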
RSA: fMRI representational dissimilarity matrices (RDMs; correlation distance) were computed over the 96 stimuli for each fROI and compared to (i) an Ideal Contact RDM (IC-RDM; 1 for stimulus pairs with differing contact labels, 0 otherwise) and (ii) a Video Model RDM (VM-RDM) derived from a video foundation model (pfVC1_CTRNN_physion) trained on Physion, whose features were used to train linear SVMs and to construct a distance-based RDM. Partial correlations assessed contact information beyond visual features.
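A compact sketch of this RSA pipeline follows, with a simple residualization-based rank correlation standing in for the partial-correlation analysis (the authors' exact correlation measure is not specified here); all names are illustrative.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import spearmanr

def fmri_rdm(responses):
    """Correlation-distance RDM from a (stimuli x voxels) response matrix."""
    return squareform(pdist(responses, metric="correlation"))

def ideal_contact_rdm(contact_labels):
    """IC-RDM: 1 where two stimuli differ in contact label, 0 where they match."""
    labels = np.asarray(contact_labels)
    return (labels[:, None] != labels[None, :]).astype(float)

def partial_rdm_correlation(target_rdm, model_rdm, control_rdm):
    """Rank correlation of target and model RDMs after regressing out a control RDM.

    A residualization-based stand-in for the partial-correlation analysis;
    all inputs are square RDMs over the same stimuli.
    """
    iu = np.triu_indices_from(target_rdm, k=1)      # unique off-diagonal cells
    t, m, c = target_rdm[iu], model_rdm[iu], control_rdm[iu]

    def residualize(y, x):
        slope, intercept = np.polyfit(x, y, 1)
        return y - (slope * x + intercept)

    return spearmanr(residualize(t, c), residualize(m, c))[0]
```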
Key Findings
Experiment 1 (current contact):
- PN shows significant scenario-invariant decoding of contact vs noncontact (decoding index mean ± SEM = 0.045 ± 0.01; P = 0.0002, Wilcoxon signed-rank). Positive in 13/14 participants. LOC also significant (0.022 ± 0.007; P = 0.0085); VTC not significant (0.016 ± 0.007; P = 0.24); V1 not significant (0.0044 ± 0.0031; P = 0.13). ANOVA across fROIs: F(3) = 4.01, P = 0.0085; PN > VTC and V1 (P = 0.005, 0.004) and marginally > LOC (P = 0.06). No hemispheric or lobe subregional differences in PN.
- Univariate activations: No significant contact vs noncontact differences in any fROI (P > 0.1), indicating MVPA effects not driven by mean amplitude.
- Shape-controlled analysis (contain vs occlude): Significant in PN (0.026 ± 0.022; P = 0.016); not significant in other fROIs; PN > others (P < 0.05), arguing against shape confounds.
- Contact-type decoding (contain vs support vs attach): Not significant in PN (0.0086 ± 0.013; P = 0.81) or VTC (0.0198 ± 0.012; P = 0.1); significant in LOC (0.025 ± 0.0084; P = 0.0023) and V1 (0.015 ± 0.0068; P = 0.037), suggesting that the LOC/V1 effects may reflect low-level visual features and that PN represents the presence vs absence of contact rather than specific contact types.
Experiment 2 (predicted future contact):
- Within-scenario perceived↔predicted decoding: PN significant (0.051 ± 0.017; P = 0.013); LOC (−0.007 ± 0.015; P = 0.9), VTC (−0.018 ± 0.01; P = 0.11), and V1 (0.0147 ± 0.0136; P = 0.38) not significant. PN > V1/LOC/VTC (P < 0.05). ANOVA: F(3) = 6.86, P = 0.0002.
- Cross-scenario perceived↔predicted decoding (roll↔throw): PN significant (0.032 ± 0.01; P = 0.0052); LOC (−0.009 ± 0.007; P = 0.33), VTC (−0.013 ± 0.005; P = 0.07), V1 (0.008 ± 0.016; P = 0.9) not significant. PN > V1 and LOC (P < 0.05); marginal vs VTC (P = 0.09). ANOVA: F(3) = 3.71, P = 0.0125. Searchlight localized decoding primarily to frontoparietal cortices overlapping PN.
- Decoding across condition, scenario, and motion direction (left/right): PN significant (0.033 ± 0.02; P = 0.008); LOC (−0.01 ± 0.01; P = 0.54), VTC (−0.002 ± 0.007; P = 0.64), and V1 (0.0003 ± 0.012; P = 0.52) not significant; PN > all others (P < 0.05).
- Univariate activations: Contact vs noncontact not different in PN for perceived or predicted (all P > 0.1). Only perceived runs showed differences in V1 (noncontact > contact) and LOC (contact > noncontact) (P < 0.05), supporting MVPA specificity in PN.
RSA:
- The IC-RDM correlates significantly with the PN fMRI RDM (avg r = 0.023; P < 0.05), and more strongly than with the LOC RDM (P < 0.05), indicating abstract contact structure in PN.
- VM-RDM marginally correlates with PN (avg r = 0.012; P = 0.06), but PN still significantly correlates with IC-RDM after partialling out VM-RDM (avg r = 0.022; P < 0.05), showing contact information beyond visual features.
Overall: PN encodes abstract contact vs noncontact for current scenes and predicted future events, generalizing across scenarios and motion directions, with effects absent or weaker in ventral visual areas and V1, supporting PN’s role in forward simulation of physical events.
Discussion
The findings directly support the hypothesis that the human frontoparietal physics network runs forward simulations of the physical world. PN represents whether objects are in contact in a scenario-invariant and shape-robust way (Experiment 1), and crucially, PN’s multivoxel patterns for predicted contact mirror those for actually perceived contact events (Experiment 2). This similarity generalizes across distinct scenarios (roll vs throw) and motion directions, indicating an abstract relational code rather than trivial trajectory extrapolation or low-level visual features. The absence of predicted-contact information in V1 and ventral visual fROIs (LOC, VTC) suggests that ventral computations alone are insufficient for generalizable physical prediction; instead, dorsal/frontoparietal mechanisms implement a generative predictive model.
The representational profile indicates abstraction in PN (contact vs noncontact), with specificity for contact-type distinctions appearing in LOC and even V1, likely reflecting visual feature differences. This aligns with a hierarchical view wherein abstract relational states are encoded in PN to efficiently constrain and guide more detailed simulations, potentially complemented by ventral representations of specific object configurations and shapes. RSA further shows that PN’s representational geometry reflects contact structure beyond what is captured by visual model features, strengthening the inference of simulation-related coding.
These results advance debates about whether physical prediction relies on pattern recognition versus forward simulation by demonstrating explicit representation of predicted content in PN during an orthogonal task, consistent with automatic or spontaneous prediction. The work invites future tests of how PN integrates abstract and precise quantitative variables (contact type, mass, forces) within a hierarchical simulation framework, the temporal horizon of predictions (event boundaries vs multistep plans), task dependence and automaticity, and causal roles via lesion or stimulation studies. The relationship between PN and overlapping networks for action planning, tool use, and the multiple-demand system remains to be disentangled, potentially revealing fractionated subsystems for physical reasoning within broader control networks.
Conclusion
This study provides convergent evidence that the human physics network (PN) in frontoparietal cortex contains abstract information about object contact and represents predicted contact events before they occur, with patterns resembling those for perceived contact. The effects generalize across objects, scenarios, and motion directions and are not explained by low-level visual features or univariate differences, supporting the view that PN implements a generative model running forward simulations of imminent physical states.
Main contributions: (i) scenario-invariant decoding of contact vs noncontact in PN; (ii) cross-condition (perceived↔predicted) and cross-scenario decoding of future contact unique to PN; (iii) RSA evidence that PN’s representational geometry encodes contact structure beyond visual features.
Future directions include: testing hierarchical representations that combine abstract relations with precise physical parameters; mapping prediction horizons and event-boundary structure; probing automaticity, attentional dependence, and capacity for multi-object prediction; establishing causal roles of PN via patient studies or neuromodulation; and benchmarking PN activity against computational models of simulation operating at multiple timescales and abstraction levels.
Limitations
- Causality: fMRI cannot establish causal necessity; lesion or stimulation studies are needed to confirm PN’s role in physical prediction.
- Scope of predictions: Stimuli focused on contact/collision; results may not generalize to other physical properties (e.g., mass, friction, elasticity) without further testing.
- Contact-type specificity: PN did not significantly decode specific contact types (contain vs support vs attach); whether PN encodes finer-grained details may depend on task demands and was not tested here.
- Visual confounds: Although multiple controls and V1 null effects argue against low-level confounds, some LOC/V1 effects and marginal VM-RDM correlations suggest residual visual feature contributions cannot be fully excluded.
- Task and automaticity: Participants performed an orthogonal one-back task; the degree to which predictions occur without attention or under higher cognitive load remains untested.
- Sample size and generalizability: Modest N (n=14 per experiment) typical for fMRI; replication across larger and more diverse samples would strengthen generalizability.
- Temporal resolution: fMRI BOLD limits inferences about the fine temporal dynamics of prediction; electrophysiological studies could refine timing and mechanisms.