Biology
Keeping time and rhythm by internal simulation of sensory stimuli and behavioral actions
V. D. Lafuente, M. Jazayeri, et al.
The study investigates how the brain encodes and maintains rhythmic timing in the absence of external stimuli and movements. Building on the observation that timing engages distributed cortical and subcortical regions, the authors question whether a specialized, central timing mechanism exists. They review competing theories (pacemaker-accumulator, multiple oscillators, and timing-through-dynamics in recurrent neural populations) and note task-dependent diversity in neural dynamics (ramping, cycling, low- vs. high-dimensional trajectories) across areas such as motor cortex, hippocampus, basal ganglia, and cerebellum. They hypothesize that temporal information is represented by internally recreating and simulating the neuronal dynamics associated with sensory stimuli and motor actions required for time-critical behavior. To test this, they use a visual metronome paradigm requiring monkeys to perceive and then internally maintain the rhythm across tempos while recording neural activity across the sensory–motor hierarchy.
Prior work shows time perception and reproduction rely on distributed mechanisms recruiting premotor/motor, association, and subcortical areas (motor system ramping/cycling dynamics; hippocampus, basal ganglia, cerebellum neurons selective for specific time points/intervals). Classical timing models include pacemaker-accumulator mechanisms and arrays of oscillators. Recent frameworks emphasize timing-through-dynamics: either high-dimensional, sequentially unique states (population clocks) or low-dimensional geometries whose rate reflects elapsed time. Different tasks reveal distinct neural signatures (e.g., frontal/parietal low-dimensional trajectories scale speed during single-interval production, and both speed and amplitude during rhythmic actions). This task-dependence suggests no single dedicated timing mechanism. Psychophysics and neuroimaging also link timing with spatial processing, often engaging parietal cortex, and demonstrate interactions and interference between space and time representations.
Subjects: Two rhesus monkeys (Macaca mulatta) participated (Monkey M: 8–9 kg, 9 years; Monkey I: 10–12 kg, 10 years). Procedures were approved by the relevant ethics committee. Monkeys were implanted with titanium head bolts and recording chambers guided by stereotactic coordinates and MRI.
Task: A visual metronome (gray circle, 10° diameter, 25° eccentricity) alternated between left and right of a central fixation point at tempos of 500, 750, or 1000 ms. Each trial began with three entrainment intervals, after which the stimulus disappeared. During one to six maintenance intervals (SMA sessions: one to four), a go-cue (the hand fixation disappearing at the midpoint of a selected interval) instructed the monkey to touch the estimated location (left/right). Eye and hand fixation were enforced until the go-cue. In a control task, interleaved trials required only passive observation (no tracking; reward for fixation; 750-ms tempo only).
Recordings: Extracellular spikes and LFPs were recorded using seven independently movable platinum–tungsten electrodes (2–3 MΩ; Thomas Recording). For hippocampus, a guide tube delivered a single electrode to 1 cm above the structure; the remaining 1 cm was traversed by the electrode. Spikes were sampled at 30 kHz; LFPs were band-pass filtered at 0.5–500 Hz and sampled at 1 kHz, then filtered offline to 1–50 Hz. Areas: V4, LIP, MIP, SMA, PFC, hippocampus. PFC and hippocampus were recorded only in Monkey I. Inclusion criteria: stable units with ≥3 trials per condition (~100–350 trials).
EMG: Surface EMG (gold cup electrodes, 1 kHz sampling, 500 Hz low-pass) from right dorsal, deltoid, biceps, pectoralis, triceps, lower and upper trapezius; signals were rectified and normalized.
Behavioral analysis: Psychometric curves p(correct) were fit with a timing model implementing scalar variability; performance was examined across elapsed maintenance intervals and tempos.
Neural analyses: Mean firing rates were computed with sliding windows (50 ms, step 10 ms) and normalized (z-scores).
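The sliding-window rate estimate described above (50-ms window, 10-ms step, z-scored) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the toy spike train, and the choice to z-score across time bins within one trace are all assumptions, since the summary does not say over which dimension the normalization was applied.

```python
import statistics


def sliding_rate(spike_times_ms, t_start_ms, t_stop_ms, window_ms=50.0, step_ms=10.0):
    """Firing rate (spikes/s) in windows that lie fully inside [t_start, t_stop]."""
    centres, rates = [], []
    left = t_start_ms
    while left + window_ms <= t_stop_ms:
        right = left + window_ms
        count = sum(1 for t in spike_times_ms if left <= t < right)
        centres.append(left + window_ms / 2)
        rates.append(count * 1000.0 / window_ms)  # spike count -> spikes per second
        left += step_ms
    return centres, rates


def zscore(values):
    """Z-score a sequence (assumed here: across time bins of a single trace)."""
    mu = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


# Hypothetical spike times (ms) for a single trial, for illustration only.
spikes = [5, 15, 25, 100, 105, 300]
centres, rates = sliding_rate(spikes, 0.0, 400.0)
z_rates = zscore(rates)
```

Because consecutive windows overlap by 40 ms, neighbouring bins are not independent, which matters for any statistic computed across bins.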
Preferred spatial location per neuron was determined via the cross-correlogram of firing rate and stimulus position. Oscillation periods were estimated via autocorrelation of firing rate differences (preferred vs. nonpreferred), taking the lag of the first minimum as the half-period (i.e., the tempo). LFP spectrograms (MATLAB spectrogram; 500-ms Hamming window, 50-ms step; Chronux confirmatory analysis) were computed separately for left/right starts and subtracted; mean power across frequencies was tracked over time, and the autocorrelation of this power was used to estimate half-periods.
Decoding: SVM classifiers (linear, 10-fold cross-validation) were trained on single-trial firing rate time series (maintenance epoch) to predict left/right choices; trials with go-cues after three to six maintenance intervals (two to four for SMA) were analyzed with separate classifiers per go-cue.
Dimensionality reduction: PCA was applied to mean firing rate matrices (20-ms nonoverlapping bins), concatenating conditions; PC trajectories were analyzed for effects of space and tempo, with Euclidean distances between trajectories computed via bootstrapping (n=150 neurons; 300 repetitions) using the first three PCs. dPCA was applied to all neurons across areas in a single transformation to estimate encoding weights for tempo, space, and total elapsed time (condition-independent dynamics).
MAP decoding (cross-validated) estimated spatial location, tempo, and elapsed time using Poisson GLMs, with training/testing splits and bootstrap resampling (n=150 neurons; 300 repetitions).
Eye movement correlations were assessed by Pearson’s r between firing rate and eye position during maintenance; eye-correlated LIP neurons were removed to assess population effects.
Statistics: t-tests (one-/two-tailed), z-tests for proportions, Bonferroni corrections, bootstrap resampling for across-area comparisons, and linear regressions assessed by F-tests. Analyses were performed in MATLAB.
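The autocorrelation step above (first minimum of the autocorrelation taken as the half-period) can be illustrated with a small sketch. It assumes a noise-free synthetic rate difference that flips sign every tempo, so its full period is twice the tempo; the trace is made deliberately long (30 s) to keep finite-length effects small. The function names and the synthetic signal are illustrative assumptions, not the authors' implementation.

```python
import math


def autocorrelation(x, max_lag):
    """Normalized autocorrelation of x for lags 0..max_lag (in samples)."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    denom = sum(v * v for v in d)
    return [sum(d[i] * d[i + k] for i in range(n - k)) / denom
            for k in range(max_lag + 1)]


def first_minimum_lag(acf):
    """Index of the first local minimum of the autocorrelation, or None."""
    for k in range(1, len(acf) - 1):
        if acf[k] < acf[k - 1] and acf[k] <= acf[k + 1]:
            return k
    return None


def estimate_half_period_ms(rate_diff, step_ms, max_lag_ms):
    """Half-period estimate: lag of the first autocorrelation minimum, in ms."""
    acf = autocorrelation(rate_diff, int(max_lag_ms / step_ms))
    k = first_minimum_lag(acf)
    return None if k is None else k * step_ms


STEP_MS = 10         # matches the 10-ms step of the sliding windows
TEMPO_MS = 750       # one of the three tempos used in the task
DURATION_MS = 30000  # assumption: long synthetic trace, not a real trial length

# Synthetic preferred-minus-nonpreferred rate difference (arbitrary units).
rate_diff = [math.cos(math.pi * (i * STEP_MS) / TEMPO_MS)
             for i in range(DURATION_MS // STEP_MS)]

estimate_ms = estimate_half_period_ms(rate_diff, STEP_MS, max_lag_ms=2000)
print(f"estimated half-period: {estimate_ms} ms (true tempo: {TEMPO_MS} ms)")
```

On short or noisy traces the first local minimum can shift by a few samples or be triggered by noise, so real data would likely need smoothing or a more robust minimum search before this estimate is trusted.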
Behavior: Monkeys maintained internal metronomes across fast, medium, and slow tempos, with high early accuracy (first maintenance interval: 96.5 ± 0.2% correct; mean ± SEM across tempos). p(correct) declined with elapsed maintenance time, consistent with scalar timing.
Neural spiking: Across V4, LIP, MIP, SMA, PFC, and hippocampus, neurons exhibited oscillatory firing during entrainment and maintenance (in the absence of stimuli and movements), with oscillation periods scaling to match the metronome's tempo. The periods of firing rate oscillations during maintenance strongly correlated with those during entrainment (Pearson r = 0.93, P < 0.01) and with true tempo (r = 0.96, P < 0.01).
Choice decoding (SVM): Single-trial, single-neuron activity during maintenance predicted left/right choices significantly above chance in all areas (P < 0.01). Mean validation accuracies and significant neuron counts: V4: 271/483, μ = 64%; LIP: 229/294, μ = 66%; MIP: 522/841, μ = 65%; SMA: 510/1134, μ = 67%; PFC: 64/154, μ = 63%; Hippocampus: 174/447, μ = 62%.
LFP power: Broadband LFP power oscillations tracked the metronome's position and tempo during maintenance; autocorrelation-derived half-periods matched entrained tempos (r = 0.97, P < 0.01). Oscillatory LFP power was stronger during maintenance (suggesting increased within-circuit processing). In passive control trials, oscillations in firing rates and LFP power during maintenance were significantly reduced (main text: P < 0.05; supplementary paired t-tests: P < 0.01).
Population dynamics (PCA/dPCA): PCA showed area-specific representations: V4 and parietal areas (LIP, MIP) strongly encoded spatial location; SMA distinguished tempos more than space. Quantitatively, V4 exhibited larger trajectory separations for space than tempo (paired t-tests, P < 0.01), whereas SMA showed the opposite (P < 0.01).
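As a quick arithmetic check on the choice-decoding counts reported above, the per-area fractions can be computed directly. The sketch assumes each pair reads as "neurons with significant decoding / neurons recorded"; that reading is an interpretation of the summary, and the variable names are illustrative.

```python
# (significant, recorded) pairs copied from the choice-decoding results above.
# Assumption: the first number counts neurons with above-chance decoding.
counts = {
    "V4": (271, 483),
    "LIP": (229, 294),
    "MIP": (522, 841),
    "SMA": (510, 1134),
    "PFC": (64, 154),
    "Hippocampus": (174, 447),
}

fractions = {area: sig / total for area, (sig, total) in counts.items()}

# Pooled fraction across all six areas.
pooled = (sum(sig for sig, _ in counts.values())
          / sum(total for _, total in counts.values()))

for area, frac in sorted(fractions.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{area:12s} {frac:6.1%}")
print(f"{'Pooled':12s} {pooled:6.1%}")
```

Under this reading, the share of choice-predictive neurons is largest in LIP (about 78%) and smallest in hippocampus (about 39%), even though the mean validation accuracies are similar across areas (62–67%).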
MAP decoding corroborated that parietal cortices excel in spatial decoding, while SMA best decoded tempo; tempo decoding and elapsed-time decoding were highly correlated across areas (MAP: r = 0.87, P < 0.05). dPCA revealed that during entrainment, V4 and LIP had the largest space encoding weights; SMA had the largest tempo encoding, followed by LIP and MIP. During maintenance, V4’s space encoding decreased significantly while MIP’s increased (paired t-tests, P < 0.01). Tempo encoding capacity correlated strongly with total elapsed-time encoding (dPCA linear regression, P < 0.01).
Consistency across epochs: Area-specific capacities to encode space and tempo during maintenance were significantly correlated with their capacities during entrainment (linear regressions, P < 0.01), arguing against a central timekeeper broadcasting uniform signals.
Dimensionality: Encoding capacity for space and time was inversely correlated with dimensionality. Number of PCs to explain 80% variance (mean ± SD): SMA 9.7 ± 1.0; V4 11.8 ± 1.7; LIP 14.2 ± 1.3; PFC 14.3 ± 1.2; MIP 20.4 ± 1.7; Hippocampus 30.6 ± 2.1.
Controls: EMG showed no evidence of movements during entrainment or maintenance. Eye movement correlations were rare; removing LIP neurons correlated with eye position (26/294) did not affect population dynamics.
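The dimensionality measure above (number of PCs needed to explain 80% of the variance) reduces to a cumulative sum over the PCA eigenvalue spectrum. The sketch below shows only that counting step; the two eigenvalue spectra are hypothetical and chosen purely to contrast a concentrated spectrum with a flat one. They are not data from the study.

```python
def n_components_for_variance(eigenvalues, threshold=0.80):
    """Smallest number of leading PCs whose summed variance reaches `threshold`."""
    total = sum(eigenvalues)
    running = 0.0
    for k, ev in enumerate(sorted(eigenvalues, reverse=True), start=1):
        running += ev
        if running / total >= threshold:
            return k
    return len(eigenvalues)


# Hypothetical PCA eigenvalues (variance per component), illustration only.
concentrated = [50.0, 25.0, 10.0, 5.0, 4.0, 3.0, 2.0, 1.0]                # few PCs dominate
spread_out = [12.0, 11.0, 10.0, 10.0, 10.0, 10.0, 10.0, 9.0, 9.0, 9.0]    # nearly flat

k_low = n_components_for_variance(concentrated)
k_high = n_components_for_variance(spread_out)
print(k_low, k_high)
```

A population whose variance is concentrated in a few components crosses the 80% threshold quickly, which is the sense in which the lower counts reported for SMA and V4 indicate lower-dimensional activity than those for MIP or hippocampus.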
Results demonstrate that broad networks spanning visual, parietal, premotor, prefrontal, and hippocampal regions exhibit oscillatory spiking and broadband LFP power that encode the spatial and temporal attributes of an internally maintained metronome. The tempo-flexible oscillations persist without sensory input or overt movement, and neural signals predict behavior. The area-specific encoding profiles (parietal for space, SMA for tempo) and strong links between tempo and elapsed time encoding indicate no single, central timing module; rather, timing arises from internally simulating sensory stimuli and motor plans relevant to the task. Cognitive processes such as attention, motor preparation, cognitive control, and contextual representations are likely engaged as part of this internal simulation strategy. The findings situate timing within a general-purpose framework where the brain dynamically recreates time-dependent representations to coordinate actions with anticipated external events.
The study supports a unifying framework in which timekeeping and rhythm maintenance are achieved via internal simulation of sensory stimuli and motor actions, not via a dedicated central timekeeper. Neural activity across the sensory–motor hierarchy encodes tempo and spatial position of an internal metronome, with consistent, area-specific encoding capacities across entrainment and maintenance. The strong association between tempo and elapsed-time encoding suggests shared mechanisms for timing and rhythm. Future work should perform causal perturbations to test behavioral relevance, record additional areas to map the full network, design nonspatial metronome tasks (e.g., alternating auditory cues), vary target eccentricity to probe space–time interactions, and use densely sampled go-cue times to assess potential delta-band contributions. The internal simulation framework may generalize to diverse cognitive operations, including decision-making that compares current inputs against predicted states.
Causal mechanisms were not tested; no perturbations were performed to establish necessity of specific areas. The task involved a spatial component (left–right alternation), potentially confounding timing with spatial attention and motor planning; control trials used only the 750-ms tempo. PFC and hippocampus data were recorded from a single monkey. MAP decoding assumes linear relationships that may not capture nonlinear neural dynamics, contributing to reduced accuracy at long durations. LFP analyses showed broadband oscillations without selective delta-band peaks; the temporal sampling of behavior (go-cue at interval midpoints) may have limited detection of endogenous delta modulation. The study focused on three tempos and specific visual stimuli, potentially limiting generalizability across modalities and timing contexts.