Consciousness is learning: predictive processing systems that learn by binding may perceive themselves as conscious

V. A. Aksyuk and V. Aksyuk

Explore the groundbreaking research by V A Aksyuk and Vladimir Aksyuk, as they delve into how predictive processing systems can achieve flexible generalization through the formation of working memories. This study connects the dots between machine learning, consciousness, and the evolution of perceptual value prediction, offering a perspective that could redefine our understanding of action policies....
Introduction

The paper addresses why humans can learn online from few examples, form declarative memories, and generalize flexibly across domains (capacities tightly linked to consciousness), while current machine learning systems struggle with such tasks. It reviews the proliferation of theories of consciousness and the absence of a unified computational account. The central hypothesis is that consciousness arises as a functional consequence of a predictive processing (PP) system augmented with a new learning mechanism, hierarchical binding of unpredicted inferences, which enables rapid, compositional, and associative learning that forms working and declarative memories and supports flexible, generalizable behavior.

Literature Review

The work builds on predictive processing and active inference as Bayesian generative accounts of perception and action, and on reinforcement learning (RL), especially temporal-difference RL, as a mechanism for learning action policies. It engages with theories of consciousness including global workspace, recurrent processing, feature binding accounts, and higher-order theories, and with empirical paradigms such as masking, postdiction, and iconic memory. It integrates insights about attention, working memory, affect and value systems (dopaminergic signaling), and their roles in learning and perception. Prior neural and computational proposals for PP, RL, and binding are used as scaffolding for the new functional architecture.

Methodology

This is a theoretical and functional proposal rather than an empirical experiment. The methodology is a qualitative, mechanism-level specification of an information-processing architecture that integrates seven components:

(1) Predictive Processing (PP): a hierarchical generative model of discrete (categorical) and continuous causes, with bidirectional prediction and error correction across time and levels, including temporal dynamics that handle latencies and persistence.
(2) Learning by Binding: an added inductive bias that attributes temporally correlated, unexplained prediction errors to previously unknown causes and provisionally adds such causes to the model. Binding operates over windows (≈200 ms to ≈1 s depending on level/modality) and across modalities, allowing near-synchronous and time-sequenced features to be bound into new causes (objects, sequences, episodes). Temporary causes consolidate with repeated inference (short- to long-term memory) or decay if not reactivated. Binding is hierarchical: new causes, themselves unpredicted, can be further bound into shallow trees culminating in unified episodic causes.
(3) Global Workspace via Cross-Prediction: once bound, features cross-predict, making them immediately available to influence many other inferences and actions (global availability) without invoking a separate broadcast mechanism.
(4) Working Memory (WM) via Selective Attention: actions of selective attention (encoded in the same PP hierarchy) increase precision/likelihood for targeted causes (e.g., spatial locations), maintaining bound objects even without direct sensory support (imagination/WM maintenance).
(5) Reinforcement Learning of Action Policies: the system estimates future value via temporal-difference RL (TDRL) by attributing reward prediction errors backward to recently active causes, and separately tunes action policies by strengthening or weakening action predictions based on dopaminergic value surprise. Multiple reward types define a multidimensional affective space, ultimately combined for action learning.
(6) Exploration–Exploitation: action thresholds and perceptual/binding hyperparameters can be modulated (hypothesized roles for tonic dopamine/serotonin) to explore policy modifications (including binding-driven new perception–action couplings) and retain those that increase future value.
(7) Higher-Order Perceptions and Transparency: the system learns transparent higher-order causes (e.g., ‘experience occurring’, ‘self’) that predict its own modeling contingencies, explaining phenomenality and the meta-problem as learned inferences similar in kind to object perception.

The architecture yields testable predictions about timing (binding thresholds), masking/postdiction dependencies, attention’s role in WM, and dopaminergic contributions to policy learning.
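
The learning-by-binding step can be illustrated with a minimal Python sketch. It is not the authors' implementation (the proposal is qualitative and gives no algorithm); it only shows the idea of grouping large, unexplained prediction errors that fall within one temporal window into a provisional new cause. The names `PredictionError` and `bind_unexplained`, the 0.5 s window, and the magnitude threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PredictionError:
    time_s: float     # when the unexplained error occurred (seconds)
    feature: str      # hypothetical feature label
    magnitude: float  # size of the unexplained prediction error


def bind_unexplained(errors, window_s=0.5, min_magnitude=1.0):
    """Group large errors that fall within one time window into
    provisional causes, each represented as a tuple of feature names.

    window_s and min_magnitude are illustrative values, not taken
    from the paper.
    """
    # Keep only errors large enough to count as "unexplained".
    salient = sorted(
        (e for e in errors if e.magnitude >= min_magnitude),
        key=lambda e: e.time_s,
    )
    causes, group = [], []
    for e in salient:
        # Close the current group once the window since its first error ends.
        if group and e.time_s - group[0].time_s > window_s:
            if len(group) > 1:  # a lone feature is not bound to anything
                causes.append(tuple(x.feature for x in group))
            group = []
        group.append(e)
    if len(group) > 1:
        causes.append(tuple(x.feature for x in group))
    return causes


errors = [
    PredictionError(0.00, "red", 2.0),
    PredictionError(0.10, "round", 1.5),
    PredictionError(0.15, "hum", 0.2),    # too small: treated as explained
    PredictionError(2.00, "click", 3.0),  # isolated in time: nothing to bind
]
print(bind_unexplained(errors))  # [('red', 'round')]
```

The sketch omits consolidation, decay, and hierarchical re-binding of new causes, all of which the proposal describes as part of the mechanism.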

Key Findings
  • A PP system augmented with hierarchical learning-by-binding can rapidly create new causes from temporally correlated, unexplained features, enabling single/few-shot, compositional, and generalizable learning that mirrors human declarative memory formation.
  • Bound contents are unified yet differentiated: hierarchical binding produces shallow trees culminating in episodic causes, explaining unity of conscious contents while retaining structure.
  • Binding thresholds in time and error magnitude predict masking and postdiction phenomena: short presentations or masking reduce prediction-error persistence and prevent binding; postdictive integration occurs when known sequence causes are inferred before features can bind individually, yielding apparent discrete perception windows (~hundreds of ms up to ~1 s at higher levels).
  • Global workspace emerges from cross-prediction among bound features, immediately linking perceptions to previously learned actions and enabling flexible generalization to novel combinations (e.g., answering queries about a newly encountered object’s attributes).
  • Working memory maintenance arises from learned selective attention actions that increase the likelihood (precision) of generic features (e.g., spatial maps), sustaining novel bound objects even without sensory input; imagination reuses predictive pathways.
  • Value learning via TDRL assigns future values to causes and uses dopaminergic reward prediction error to shape action policies; perception learning is biased toward structures predictive of future value and action guidance.
  • The architecture entails higher-order inferences (‘experience occurring’, ‘self’) learned to predict perceptual contingencies, offering an illusionist account of phenomenality and a proposed solution to the meta-problem.
  • States of consciousness reflect global modulation of hyperparameters (inference/binding thresholds, plasticity, consolidation/forgetting, exploration–exploitation), linking affect, mood, meditation, sleep/dreaming, and psychopathology to functional changes in the architecture.
  • Consciousness is functionally identified with the system’s capacity for efficient declarative learning via binding; proposed empirical measures should target newly formed causal links between previously unrelated features and their associative recall effects.
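
The value-learning finding (attributing reward prediction errors backward to recently active causes) can be sketched in Python. This is a minimal illustration that assumes tabular TD(λ) with accumulating eligibility traces as one standard way to assign credit backward in time; the paper names TDRL but does not specify this variant. The function name `td_lambda`, the cause labels, and the parameter values are hypothetical, and the separate tuning of action policies is not modeled.

```python
from collections import defaultdict


def td_lambda(episode, alpha=0.5, gamma=0.9, lam=0.8):
    """Tabular TD(lambda) over a single episode.

    episode: list of (cause, reward) pairs, where reward is received
    on leaving that cause. Returns a dict of learned values.
    alpha, gamma, and lam are illustrative, not from the paper.
    """
    values = defaultdict(float)  # estimated future value per cause
    traces = defaultdict(float)  # how recently each cause was active
    for i, (cause, reward) in enumerate(episode):
        is_last = i + 1 == len(episode)
        next_value = 0.0 if is_last else values[episode[i + 1][0]]
        # Reward prediction error (the "value surprise" signal).
        delta = reward + gamma * next_value - values[cause]
        traces[cause] += 1.0
        # Credit every recently active cause in proportion to its trace.
        for c in list(traces):
            values[c] += alpha * delta * traces[c]
            traces[c] *= gamma * lam  # traces fade with time
    return dict(values)


episode = [("cue", 0.0), ("approach", 0.0), ("food", 1.0)]
values = td_lambda(episode)
# Causes closer in time to the reward receive more credit:
# values["food"] > values["approach"] > values["cue"] > 0
print(values)
```

After one rewarded episode, even the earliest cause acquires some positive value, which is the backward attribution the finding describes.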
Discussion

The proposal addresses the research question of how consciousness relates to rapid, flexible, declarative learning by positing learning-by-binding atop PP as the core mechanism producing unified, globally available, associatively recallable contents, while RL tunes actions toward evolutionary value. This explains why conscious processing is closely tied to declarative memory and flexible generalization, and why unconscious processing can be sophisticated yet lacks rapid compositional learning and global availability. The account reconciles insights from global workspace, recurrent processing, feature binding, and higher-order theories within one functional framework. It generates testable predictions about timing windows for binding, effects of masking and postdiction, the role of selective attention in WM maintenance, dopaminergic signals in action learning, and differences between low-level binding and high-level episodic unification. It reframes the hard/meta problems by treating phenomenality as a learned, transparent higher-order perception. The framework has implications for understanding mood, meditation, sleep/dreams, imagination, and thought as specific configurations of hyperparameters and learned policies within the same architecture.

Conclusion

The paper advances a unified, functional architecture in which predictive processing combined with hierarchical learning-by-binding and reinforcement learning yields the hallmark properties ascribed to consciousness: unified yet differentiated contents, global availability, associative recall, working memory maintenance, flexible generalization, and higher-order self/experience inferences. It proposes mechanistic explanations for masking, postdiction, attention/WM interactions, imagination, and state-dependent phenomena (mood, meditation, dreaming). The author outlines clear, falsifiable predictions and calls for theoretical, computational, and empirical programs to test stability, learning efficiency, timing constraints, neuromodulatory roles, and neural mappings. Future work should: (1) formalize the architecture within statistical learning theory; (2) implement numerical models demonstrating stable, sample-efficient, compositional online learning; (3) design experiments that manipulate novel feature combinations, timing, and attention to measure new causal links and their recall; and (4) carefully consider ethical implications of potential artificial systems with conscious-like properties.

Limitations
  • The proposal is qualitative and conjectural; it lacks formal proofs and quantitative models demonstrating stability, scalability, and learning efficiency.
  • No empirical experiments are presented; predictions require targeted studies (e.g., controlled novel feature combinations, timing thresholds, dopaminergic manipulations).
  • Neural mappings are suggestive (e.g., dopaminergic value surprise, thalamo-cortical PP, hippocampal memory consolidation) but not specified at circuit/mechanism level.
  • Hyperparameter settings (binding windows, plasticity/attenuation schedules, consolidation timescales) and their neuromodulatory control remain to be characterized and validated.
  • Ethical concerns are highlighted regarding potential numerical implementations that might instantiate conscious-like processing; guidance is not fully developed.