
Quantitative models reveal the organization of diverse cognitive functions in the brain
T. Nakai and S. Nishimoto
This study by Tomoya Nakai and Shinji Nishimoto uses fMRI and voxel-wise encoding models to examine how the human cortex organizes many cognitive processes. It finds a hierarchical organization of cognitive tasks and shows that a model built on latent cognitive features generalizes to novel, untrained tasks.
Introduction
The study addresses how diverse active cognitive processes are represented and organized across the human cortex. Prior work has modeled perceptual experiences (visual, auditory, linguistic) using voxel-wise encoding/decoding during passive viewing or listening, but comprehensive quantitative models spanning many active cognitive tasks and revealing their cortical organization have been lacking. The authors aim to estimate a cognitive space that captures relationships among a broad set of tasks and to map this space onto cortex. They introduce two models: (1) a sparse task-type encoding model to elucidate hierarchical relationships among 103 tasks and their cortical mapping, and (2) a continuous cognitive factor model derived from metadata (Neurosynth) to capture latent features and enable prediction and decoding for novel, untrained tasks. This framework seeks to advance understanding of the comprehensive cortical organization underlying human cognition.
Literature Review
Voxel-wise encoding and decoding approaches have modeled brain responses using visual features, categories, auditory features, and linguistic information, revealing semantic spaces and cortical maps of categorical dimensions. However, these efforts largely used passive paradigms and did not clarify representations underlying active, multi-domain cognitive processes. Prior studies identified clusters/components related to sensory and higher-level functions and examined task networks, but without a unified, quantitative model covering many naturalistic tasks or generalizable decoding across novel tasks. The present work builds on these literatures by integrating encoding models with metadata-based reverse inference to objectively interpret cognitive factors and to generalize across tasks.
Methodology
Participants: Six healthy, right-handed adults (22–33 years; 2 females), normal vision/hearing, provided informed consent. Approved by NICT ethics committee.
Task battery: 103 naturalistic tasks spanning diverse domains (e.g., visual detection, auditory judgments, motor responses, language, memory, introspection, calculation). Each task had 12 instances (8 for training runs, 4 for test runs). Tasks were performed without pre-experimental training; instructions were presented on-screen.
Experimental design: 18 runs over 3 days (6 runs/day). Each run: 77–83 trials (6–12 s/trial), with intermittent 2 s feedback events to equalize run length. Each run included 6 s no-task at start and end (start omitted in analysis). Training runs had pseudorandomized order accounting for dependencies (e.g., MemoryDigit/MatchDigit). Test runs presented all 103 tasks four times in the same order across six runs (with different instances; no overlap with training instances). Button responses recorded via MR-compatible pads.
MRI acquisition: 3T Siemens TIM Trio, 32-channel head coil. Functional MB-EPI: TR=2000 ms, TE=30 ms, FA=62°, FOV=192×192 mm², 72 axial slices, 2.0-mm thick, no gap, resolution 2×2 mm², multiband factor=3; 275 volumes per run. Anatomical T1 MPRAGE: TR=2530 ms, TE=3.26 ms, FA=9°, FOV=256×256 mm², 1 mm³ voxels.
Preprocessing: SPM8 motion correction (aligned to first EPI), 240 s median filter detrending, voxel-wise normalization (demean, unit variance). FreeSurfer used for cortical surface identification and registration; cortical voxels used (53,345–66,695 per subject).
Encoding framework (general): Finite impulse response design with hemodynamic delays of 2, 4, 6 s. Feature matrix F_E [T×3N] formed by concatenating delayed features; voxel response R_E [T×V] modeled as R_E = F_E W_E. L2-regularized linear regression fit on training data; regularization selected via 10-fold CV over 18 λ values (100 to 100×2^17). Prediction accuracy: Pearson correlation between predicted and measured test signals; significance via null distribution of correlations for independent Gaussian vectors; FDR-corrected p<0.05.
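The encoding framework above can be sketched in NumPy as follows. This is a minimal illustration on toy data, not the authors' MATLAB implementation: the function names (`add_delays`, `fit_ridge`, `columnwise_pearson`, `select_lambda`) are hypothetical, the cross-validation folds are assumed to be contiguous blocks of samples, and the closed-form ridge solution is one common way to fit an L2-regularized linear model.

```python
import numpy as np

def add_delays(features, delays=(1, 2, 3)):
    """Concatenate copies of features [T x N] shifted by each delay (in samples).

    With TR = 2 s, delays of 1, 2, 3 samples correspond to 2, 4, 6 s,
    giving a delayed feature matrix of shape [T x 3N].
    """
    T, N = features.shape
    delayed = np.zeros((T, N * len(delays)))
    for i, d in enumerate(delays):
        delayed[d:, i * N:(i + 1) * N] = features[:T - d]
    return delayed

def fit_ridge(F, R, lam):
    """Closed-form L2-regularized least squares: W = (F'F + lam I)^-1 F'R."""
    return np.linalg.solve(F.T @ F + lam * np.eye(F.shape[1]), F.T @ R)

def columnwise_pearson(A, B):
    """Pearson correlation between matching columns (voxels) of A and B."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    return (A * B).sum(axis=0) / (np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=0))

def select_lambda(F, R, lambdas, n_folds=10):
    """Pick the lambda with the best mean held-out correlation (contiguous folds assumed)."""
    idx = np.arange(F.shape[0])
    folds = np.array_split(idx, n_folds)
    mean_r = []
    for lam in lambdas:
        fold_r = []
        for held in folds:
            train = np.setdiff1d(idx, held)
            W = fit_ridge(F[train], R[train], lam)
            fold_r.append(columnwise_pearson(F[held] @ W, R[held]).mean())
        mean_r.append(np.mean(fold_r))
    return lambdas[int(np.argmax(mean_r))]

# 18 candidate values: 100 * 2^0 ... 100 * 2^17
lambdas = 100.0 * 2.0 ** np.arange(18)

# Toy data (assumed sizes, Gaussian features instead of real task features)
rng = np.random.default_rng(0)
T, N, V = 200, 5, 4
feats = rng.standard_normal((T, N))
F = add_delays(feats)
W_true = rng.standard_normal((F.shape[1], V))
R = F @ W_true + 0.1 * rng.standard_normal((T, V))

best = select_lambda(F, R, lambdas)
W = fit_ridge(F, R, best)
r = columnwise_pearson(F @ W, R)  # per-voxel prediction accuracy
```

In the study the accuracy is computed on separate test runs; the toy example reuses the training data only to keep the sketch short.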
Task-type model: One-hot features (N=103). Training dataset: 3336 samples (6672 s). Test dataset: 412 samples (824 s), repeated four times; four repetitions averaged to boost SNR, with adjustments to remove end-of-run no-task and some feedback periods.
Hierarchical cluster analysis (HCA): Predictive voxels selected per subject (FDR p<0.05; 39,485–56,634 voxels), three delays averaged. Weights concatenated across subjects; representational similarity matrix (Pearson r across task weights) computed; dissimilarity = 1−r; minimum-linkage clustering to generate dendrogram. Clusters labeled by included task types. Also validated with HCA on brain activity directly and MDS (Supplementary).
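A small sketch of the clustering step, using SciPy on toy weights. It reads "minimum-linkage" as single (nearest-neighbor) linkage, which is an assumption; the toy matrix sizes and the two-cluster cut are illustrative only.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)

# Toy weight matrix: 6 tasks x 50 voxels, two groups of 3 tasks sharing a pattern
base = rng.standard_normal((2, 50))
weights = np.vstack([
    base[0] + 0.1 * rng.standard_normal((3, 50)),
    base[1] + 0.1 * rng.standard_normal((3, 50)),
])

# Dissimilarity = 1 - Pearson r between task weight vectors (condensed form)
dissim = pdist(weights, metric="correlation")

# Single linkage merges clusters by their minimum pairwise dissimilarity
Z = linkage(dissim, method="single")

# Cut the dendrogram into two clusters for inspection
labels = fcluster(Z, t=2, criterion="maxclust")
```

`Z` is the linkage matrix that encodes the dendrogram; each of its rows corresponds to one non-terminal node.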
Hierarchical model: Features defined by 102 non-terminal dendrogram nodes (binary indicators of whether any subordinated tasks occurred), built using HCA derived from five subjects (excluding target subject). Compared prediction accuracy versus task-type model.
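One way to turn a dendrogram into the binary node features described above is sketched below. A dendrogram over n tasks has n − 1 non-terminal nodes (102 for 103 tasks). The helper names `node_membership` and `hierarchical_features` are hypothetical, and the code relies on SciPy's linkage convention that the cluster created in row i has index n + i.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

def node_membership(Z, n_leaves):
    """Boolean [n_leaves-1 x n_leaves] matrix: which tasks sit under each merge node."""
    members = np.zeros((n_leaves - 1, n_leaves), dtype=bool)
    for i, (a, b) in enumerate(Z[:, :2].astype(int)):
        for child in (a, b):
            if child < n_leaves:          # child is an original task (leaf)
                members[i, child] = True
            else:                         # child is an earlier merge node
                members[i] |= members[child - n_leaves]
    return members

def hierarchical_features(task_onehot, members):
    """[T x n_tasks] one-hot -> [T x n_nodes]: 1 if any subordinate task is active."""
    return (task_onehot @ members.T.astype(int) > 0).astype(int)

# Toy example: 4 tasks on a line, forming two tight pairs
points = np.array([[0.0], [1.0], [10.0], [11.0]])
Z = linkage(points, method="single")
members = node_membership(Z, n_leaves=4)

task_ids = np.array([0, 2, 3])            # task shown at each time point
onehot = np.eye(4, dtype=int)[task_ids]
H = hierarchical_features(onehot, members)
```

The last node is the root, so its column is 1 whenever any task is being performed.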
Principal component analysis (PCA): PCA on concatenated task-type weight matrix (predictive voxels). Tasks mapped onto 2D by PC1 (x) and PC2 (y); colors reflect loadings on PC1–PC3 (RGB). PCs interpreted by task loadings and metadata-based inference. Voxel-wise PCA score maps created and combined (RGB) for cortical visualization.
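The PCA step can be illustrated with a plain SVD. This sketch assumes voxels are treated as observations and tasks as variables, so that each PC has one loading per task and one score per voxel; the min-max rescaling used for RGB coloring is an assumed convention, and `pca` and `to_rgb` are hypothetical helper names.

```python
import numpy as np

def pca(weights_vox_by_task, n_components=3):
    """PCA via SVD on a [n_voxels x n_tasks] weight matrix."""
    X = weights_vox_by_task - weights_vox_by_task.mean(axis=0)
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    explained = S ** 2 / np.sum(S ** 2)   # fraction of variance per PC
    loadings = Vt[:n_components].T        # [n_tasks x k] task loadings
    scores = X @ loadings                 # [n_voxels x k] voxel-wise scores
    return loadings, scores, explained[:n_components]

def to_rgb(values):
    """Rescale each of three columns to [0, 1] so rows can be used as RGB colors."""
    lo = values.min(axis=0)
    hi = values.max(axis=0)
    return (values - lo) / (hi - lo)

# Toy weights: 200 voxels x 6 tasks with correlated task columns
rng = np.random.default_rng(0)
weights = rng.standard_normal((200, 6)) @ rng.standard_normal((6, 6))

loadings, scores, explained = pca(weights, n_components=3)
task_xy = loadings[:, :2]        # task positions on PC1 (x) and PC2 (y)
task_rgb = to_rgb(loadings)      # task colors from PC1-PC3 loadings
voxel_rgb = to_rgb(scores)       # voxel colors for a cortical map
```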
Metadata-based reverse inference: 715 Neurosynth reverse-inference maps (selected from ~3000 terms; redundant/plural/past-tense and anatomical terms excluded) registered to subject EPI space. For cluster maps, correlations with each term yielded cognitive factor vectors; top terms interpreted as associated factors. Also used to construct cognitive factor features.
Cognitive factor model (latent features): For each subject, computed correlations between each task’s task-type weight map and each of the 715 Neurosynth maps, producing a 103×715 coefficient matrix. The cognitive transform function (CTF) for a target subject was the average of coefficient matrices from the other five subjects (data-independent for target). Multiplying the CTF by the task-type feature time series yielded a 715-dim latent feature matrix (N=715). Encoding model trained as above.
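The construction of the cognitive transform function can be sketched as below, using toy sizes in place of 103 tasks and 715 terms. The helper names `correlate_rows` and `cognitive_transform` are hypothetical, and the random maps stand in for real task-type weight maps and Neurosynth maps.

```python
import numpy as np

def correlate_rows(A, B):
    """Pearson r between every row of A [a x V] and every row of B [b x V] -> [a x b]."""
    A = A - A.mean(axis=1, keepdims=True)
    B = B - B.mean(axis=1, keepdims=True)
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    B = B / np.linalg.norm(B, axis=1, keepdims=True)
    return A @ B.T

def cognitive_transform(coef_by_subject, target):
    """Average the task-by-term coefficient matrices of all subjects except the target."""
    others = [c for s, c in enumerate(coef_by_subject) if s != target]
    return np.mean(others, axis=0)

rng = np.random.default_rng(0)
n_tasks, n_terms = 4, 5                    # stand-ins for 103 tasks and 715 terms
coefs = []
for n_vox in (30, 40, 50):                 # subjects may differ in voxel count
    task_maps = rng.standard_normal((n_tasks, n_vox))
    term_maps = rng.standard_normal((n_terms, n_vox))
    coefs.append(correlate_rows(task_maps, term_maps))

ctf = cognitive_transform(coefs, target=0)  # [n_tasks x n_terms], excludes subject 0

# One-hot task time series [T x n_tasks] -> latent cognitive features [T x n_terms]
task_ids = np.array([2, 2, 0, 3])
onehot = np.eye(n_tasks)[task_ids]
latent = onehot @ ctf
```

Because the task-type features are one-hot, each row of `latent` is simply the CTF row of the task performed at that time point.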
Generalization to novel tasks: Five-fold task-wise split: 103 tasks partitioned into five groups (20–21 tasks each). For each fold, models were trained on 80% of the tasks (82–83 tasks), excluding target-task time points and 6 s post-onset, and tested on the held-out tasks’ time points (and 6 s after). Predicted responses or decoded features were concatenated across folds; duplicate time points were averaged.
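The task-wise split can be sketched as follows. Splitting 103 tasks into five groups gives three groups of 21 and two of 20, so each training set holds 82 or 83 tasks. The sketch assumes that the 6 s window is handled by extending each held-out period by 3 samples (TR = 2 s); the random assignment of tasks to folds and the helper names `task_folds` and `heldout_mask` are also assumptions.

```python
import numpy as np

def task_folds(n_tasks=103, n_folds=5, seed=0):
    """Randomly partition task indices into n_folds groups of near-equal size."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_tasks), n_folds)

def heldout_mask(task_per_sample, held_tasks, extra=3):
    """True for samples of held-out tasks, extended by `extra` samples afterwards."""
    mask = np.isin(task_per_sample, held_tasks)
    out = mask.copy()
    for k in range(1, extra + 1):
        out[k:] |= mask[:-k]   # also mark the sample k steps after a held-out sample
    return out

folds = task_folds()
fold_sizes = sorted(len(f) for f in folds)

# Toy time series: task index at each sample
task_per_sample = np.array([0, 0, 1, 1, 1, 1, 2, 2])
test_mask = heldout_mask(task_per_sample, held_tasks=[0])
train_mask = ~test_mask        # samples usable for training in this fold
```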
Control for sensorimotor confounds: During training, concatenated visual Motion Energy (ME, 1395 features), auditory Modulation Transfer Function (MTF, 1000 features), and Button Response (BR, 4 features) with cognitive factor features; during testing, sensorimotor regressors excluded. ME: 3D spatiotemporal Gabor filters on grayscale LAB-L, multiple spatial/temporal frequencies; log-transformed, 0.5 Hz sampling. MTF: cochleogram (128 bands, 20–10,000 Hz), convolved with spectrotemporal modulation filters (5 spectral × 5 temporal rates; up/down sweeps), log-transformed and averaged; aggregated into 10 frequency ranges. BR: counts per second for 4 buttons.
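The control analysis fits the model on cognitive and sensorimotor features together and then predicts with the cognitive weights alone. A minimal sketch on toy data follows; the feature counts, the regularization value, and the omission of hemodynamic delays are simplifications, not the study's settings.

```python
import numpy as np

def fit_ridge(F, R, lam):
    """Closed-form L2-regularized least squares."""
    return np.linalg.solve(F.T @ F + lam * np.eye(F.shape[1]), F.T @ R)

rng = np.random.default_rng(0)
T, n_cog, n_nuis, V = 300, 4, 3, 2

cog = rng.standard_normal((T, n_cog))     # cognitive factor features
nuis = rng.standard_normal((T, n_nuis))   # stand-in for ME, MTF and BR regressors
W_cog_true = rng.standard_normal((n_cog, V))
W_nuis_true = rng.standard_normal((n_nuis, V))
R = cog @ W_cog_true + nuis @ W_nuis_true + 0.1 * rng.standard_normal((T, V))

# Training: fit on the concatenated feature matrix
W = fit_ridge(np.hstack([cog, nuis]), R, lam=1.0)

# Testing: keep only the cognitive rows of W, dropping the sensorimotor regressors
W_cog = W[:n_cog]
cog_test = rng.standard_normal((50, n_cog))
pred = cog_test @ W_cog
```

Variance explained by the sensorimotor regressors during training is absorbed by their own weights, so the test prediction reflects only the cognitive factor features.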
Decoding model: Feature estimation from cortical responses with 2/4/6 s delays: F_D = R_D W_D (L2-regularized). For novel tasks, at each time point, the correlation between the decoded cognitive feature vector and the template CTF vector of each of the 103 tasks was computed (templates derived excluding the target subject) and treated as a task likelihood. One-vs.-one binary comparisons were made between the target task and each other task; accuracy = percentage of comparisons in which the target task's likelihood exceeded the competitor's. Significance assessed with one-sided sign tests, FDR p<0.05.
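The one-vs.-one scoring for a single decoded feature vector can be sketched as below. The function name `one_vs_one_accuracy` is hypothetical, and the toy templates stand in for the 103 × 715 CTF matrix.

```python
import numpy as np

def one_vs_one_accuracy(decoded, templates, target):
    """Percentage of competitor tasks whose template correlates less with
    the decoded vector than the target task's template does."""
    d = decoded - decoded.mean()
    t = templates - templates.mean(axis=1, keepdims=True)
    sims = (t @ d) / (np.linalg.norm(t, axis=1) * np.linalg.norm(d))
    others = np.delete(sims, target)
    return 100.0 * np.mean(sims[target] > others)

rng = np.random.default_rng(0)
templates = rng.standard_normal((10, 50))   # toy: 10 tasks x 50 cognitive factors

# A decoded vector close to task 3's template wins every comparison
decoded_good = templates[3] + 0.1 * rng.standard_normal(50)
acc_good = one_vs_one_accuracy(decoded_good, templates, target=3)

# An anti-correlated decoded vector loses every comparison
acc_bad = one_vs_one_accuracy(-templates[3], templates, target=3)
```

With this scoring, chance level is 50%, since a random decoded vector would beat each competitor about half of the time.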
Statistical notes: Group-level regularization parameter for HCA/PCA selected via resampling (50 repeats of 80/20 splits) averaging performance across subjects. Additional null distributions for encoding/decoding created by element-wise shuffling of feature or CTF matrices (1000 permutations). Visualization used pycortex; analyses implemented in MATLAB.
Key Findings
- Task representational hierarchy: HCA of task-type model weights (concatenated across subjects) revealed six prominent clusters: visual, auditory, motor, language, memory, and introspection. Subclusters captured finer distinctions (e.g., food vs. negative images in visual; calculation vs. digit matching in memory; imagining future/recalling past vs. places/faces in introspection).
- Metadata-based interpretation: Correlations between cluster maps and 715 Neurosynth maps yielded top associated terms consistent with cluster labels. Examples (Table 1): Visual—“visual”, “object”, “face”, “motion”, “perceptual”; Memory—“working memory”, “calculation”, “executive”; Language—“reading”, “language”, “semantic”; Motor—“motor”, “movement”, “sensorimotor”; Introspection—“default mode”, “autobiographical”; Auditory—“auditory”, “listening”, “music”. Reverse inference also illuminated subclusters (e.g., time perception within auditory linked to “timing”, “monitoring”).
- Hierarchical model improves prediction: A model using features from non-terminal dendrogram nodes outperformed the task-type model in predicting brain activity (hierarchical: mean ± SD r = 0.313 ± 0.046; task-type: 0.293 ± 0.053; one-sided Wilcoxon signed-rank, p < 0.001 for all subjects).
- Cognitive space via PCA: Each of the first four PCs explained more than 5% of the variance. The PCs were interpreted as PC1 (auditory), PC2 (audiovisual), PC3 (language), and PC4 (introspection). Cluster-average loadings matched related PCs (two-sided sign test, p < 0.05, FDR). Tasks were arranged from movie-related/image/auditory toward complex cognition (language, memory, logic, calculation). Voxel-wise PCA projections showed consistent cortical topographies across subjects (e.g., occipital green for movie/image; left lateral frontal blue for language).
- Topographic selectivity: Voxel-wise task weight visualizations on the cognitive space identified selectivities (e.g., language in middle temporal, introspection in medial frontal, auditory in superior temporal). Left inferior parietal lobule exhibited a gradient from motor (inferior) to visual (superior), with calculation/logic across positions.
- Predicting novel tasks (cognitive factor model): Training on ~80% of tasks and predicting held-out ~20% yielded significant cortical prediction accuracy broadly (mean ± SD r = 0.322 ± 0.042; 86.2 ± 5.1% voxels significant at FDR p < 0.05). Example subject (ID01) mean r = 0.323 with 87.2% voxels significant; significance threshold r = 0.0846.
- Robust beyond low-level features: With visual (ME), auditory (MTF), and motor (BR) regressors included during training and excluded at test, prediction remained significant and widespread (mean ± SD r = 0.285 ± 0.035; 82.4 ± 4.9% voxels significant; FDR p < 0.05), indicating higher-order cognitive contributions beyond simple sensorimotor effects.
- Decoding novel tasks: Binary one-vs.-one decoding among 103 tasks using decoded latent features achieved high accuracy for novel tasks (mean ± SD 96.0 ± 0.8%; 99.5 ± 0.5% of tasks significant; one-sided sign tests, FDR p < 0.05). Example subject showed mean 96.8% with all tasks significant.
- Coverage and generalizability: Compared to prior passive-viewing modeling (significant predictions in ~22% of cortex), the present approach achieved significant predictions in ~86% of cortical voxels and task-specific decoding, suggesting broad coverage of human cognitive space by the task battery.
Discussion
The findings demonstrate that brain activity during a broad set of active, naturalistic tasks can be modeled quantitatively to reveal a hierarchical and continuous organization of human cognitive functions across cortex. The task-type model captured representational similarities, forming six major clusters aligned with intuitive domains and validated via reverse-inference from large-scale neuroimaging metadata. PCA provided orthogonal dimensions (auditory, audiovisual, language, introspection) that correspond to cluster structure and mapped consistently onto cortical topographies, revealing gradients and fine-grained topographic organization, particularly in association cortex (e.g., IPL).
The cognitive factor model, by transforming sparse task labels into a 715-dimensional latent cognitive feature space derived from Neurosynth, enabled accurate prediction and decoding of brain activity for novel, untrained tasks. Control analyses that accounted for low-level visual, auditory, and motor features indicated that this generalization reflects higher-order cognitive components and not only simple sensorimotor effects. Together, these results address the central question of how diverse cognitive processes are represented in the brain by providing a unifying, generalizable modeling framework and an interpretable cognitive space that is broadly mapped across cortex.
Conclusion
This study introduces a quantitative framework that integrates voxel-wise encoding with metadata-based latent cognitive features to model and map the organization of diverse cognitive functions across the human cortex. It reveals hierarchical task clusters, interpretable principal components aligned with auditory, audiovisual, language, and introspection processes, and consistent cortical topographies. The cognitive factor model generalizes to novel tasks, enabling both accurate prediction and high-accuracy task decoding.
Future work could expand the task battery to cover underrepresented domains (e.g., olfaction, speech production, social interaction), refine and extend metadata-based feature spaces, and apply subject-wise models to probe individual differences and cognitive traits. Extending this approach to larger cohorts and additional modalities could further chart a comprehensive, high-resolution map of the human cognitive space.
Limitations
- Task coverage: The 103-task battery does not span the entire domain of human perception and cognition; certain modalities and contexts (e.g., odor perception, speech production, social interaction) were not included.
- Metadata feature space: The 715-term Neurosynth-based feature set was manually curated to avoid redundancy and exclude anatomical terms; while broad, it may omit relevant constructs and depends on literature-derived associations.
- Sample size: Analyses were performed on six subjects, which may limit generalizability across populations despite consistent patterns observed.
- Stimulus modality influence: Although control analyses regressed out low-level sensorimotor features and decoding discriminated tasks within the same modality, modality remains a prominent factor in the cognitive space and may influence representations.