
Interdisciplinary Studies

Quantifying empirical support for theories of consciousness: a tentative methodological framework

Asger Kirkeby-Hinrup

Consciousness studies are crowded with competing theories, but which truly deserve our attention? This paper sketches a novel methodology to quantify the divergent empirical support for existing theories and proposes an inference-to-the-best-explanation approach, inspired by Bayesian confirmation theory, to prioritize the most promising contenders.
Introduction

The paper situates itself within interdisciplinary consciousness studies (ICS), noting a proliferation of competing theories and a lack of noncontentious methods to decide among them. While ICS largely agrees that understanding the brain’s role is central and that empirical data carries evidential weight, conceptual debates remain gridlocked due to disagreements about the intension of key concepts (e.g., phenomenality) despite shared assumptions about the extension of consciousness. The field has increasingly turned to empirical evidence to arbitrate between theories, but questions persist about how empirical evidence should be collected, deployed, assessed, and compared to inform long-term plausibility. The author proposes a novel methodology to quantify empirical support for theories to enable inference to the best explanation (IBE), aiming to identify the most promising theories and provide a practical path forward beyond conceptual stalemate.

Literature Review

Section 2 examines two current approaches to comparing theories. The ARC (Accelerating Research on Consciousness) adversarial collaborations test contradictory predictions between selected theories (e.g., GNWT vs. IIT), but initial results (Ferrante et al., 2023; Melloni et al., 2023) were inconclusive. ARC faces several problems: it targets only small subsets of theories, tests only a few experimental paradigms (narrow scope), relies on paradigm-specific comparisons that generalize poorly, is fragile to theory revisions (each revision would require a new large-scale project), and is prohibitively costly. The criteria-based approach (CRIT) of Doerig et al. (2020) evaluates theories against principled criteria and scope distinctions. However, it is insufficiently sensitive to the amount and kind of empirical support, struggles to arbitrate when theories tie or satisfy different criteria, and raises concerns about flexibility (how many criteria to add, and which) and arbitrariness (who decides the criteria and their weights). These analyses motivate a third approach that can incorporate diverse evidence, avoid arbitrariness, and enable fine-grained, theory-neutral comparison.

Methodology

The proposed approach is Quantification to the Best Explanation (QBE), an IBE-inspired framework grounded in Bayesian confirmation theory. Because competing theories differ in the intension of the explanandum, QBE compares theories by their ability to explain and predict empirical data rather than by explanatory power over a shared conceptual target. Evidence is defined as claims of empirical support: arguments connecting empirical phenomena to theoretical claims via interpretations consistent with each theory's own conceptual framework (to avoid conceptual bleed and question-begging).

Likelihood is treated as the extent to which a theory explains or predicts a phenomenon, operationalized through an ordinal argument score (A-score): Accepted; Coherent and testable; Coherent but untestable; Rejected. The marginal (the probability of the evidence) is anchored by replication via an ordinal replication score (R-score): Low, Medium, or High (with possible expansion). Priors are assigned equally across theories to avoid arbitrariness (e.g., 0.04, assuming roughly 25 viable theories), with scaling constrained to keep posteriors within [0, 1].

Ordinals are converted to numbers using the highest and lowest categories as anchors. A-values lie on a 1–10% scale (Accepted = 10%; Coherent but untestable = 1%), and the middle category varies across 2–9% to avoid fixing an arbitrary single value. R-values are scaled analogously (e.g., 1–10 in hundreds) and enter the updating function so that higher replication increases support (e.g., by dividing by 1 − R-value). The updating iterates through the phenomena, turning the prior into a new posterior for each piece of evidence. To avoid arbitrariness in the middle-category values, QBE computes posteriors across all combinations of possible middle A- and R-values, producing a set of posteriors per theory (see the sketch below).

Comparison can then proceed via mean posteriors, areas under the curve, Z-scores (standardizing support relative to the field), t-tests for pairwise comparisons, or pairwise ratios of mean posteriors. Arbitrariness is further mitigated by engaging original authors to corroborate A-scores and clarify testability, by crowdsourcing or data-mining to set ordinal categories and thresholds (e.g., clustering replication counts), and by involving statisticians and philosophers of science to refine the updating mechanism. The method is designed to be flexible, allowing additional ordinals (e.g., physical/functional closeness, distribution, scope of measurement) and straightforward dataset updates as new evidence emerges.
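The paper does not fix a single updating function, but the moving parts above are concrete enough to sketch in code. The following Python sketch is a minimal illustration, not the author's implementation: the anchor values (Accepted = 10%, Coherent but untestable = 1%, R-values read as 0.01–0.10, equal priors of 0.04) and the 2–9 sweep of middle values come from the text, while the multiplicative form of `update`, the clipping to [0, 1], and the value assigned to Rejected evidence are assumptions.

```python
from itertools import product
from statistics import mean

# A-score anchors on the paper's 1-10% scale: "Accepted" = 10%,
# "Coherent but untestable" = 1%. Treating "Rejected" as contributing
# nothing (0.0) is our assumption, not a value fixed in the text.
A_ANCHORS = {"accepted": 0.10, "untestable": 0.01, "rejected": 0.0}

# R-score anchors, reading "1-10 in hundreds" as 0.01-0.10.
R_ANCHORS = {"low": 0.01, "high": 0.10}

def update(prior, a_value, r_value):
    """One Bayesian-inspired update for a single phenomenon.

    The paper leaves the exact updating function open; this multiplicative
    form (boost the prior by the A-value, divide by 1 - R-value so higher
    replication increases support, clip to [0, 1]) is one plausible reading.
    """
    return min(prior * (1.0 + a_value) / (1.0 - r_value), 1.0)

def posteriors_for_theory(evidence, prior=0.04):
    """Posterior set for one theory across all middle-category combinations.

    `evidence` is a list of (a_category, r_category) pairs. Sweeping the
    middle A-value and middle R-value over 2-9 (in hundredths) yields the
    8 x 8 = 64 combinations used in the paper's demonstration.
    """
    results = []
    for a_mid, r_mid in product(range(2, 10), repeat=2):
        a_map = dict(A_ANCHORS, testable=a_mid / 100)  # middle A category
        r_map = dict(R_ANCHORS, medium=r_mid / 100)    # middle R category
        p = prior
        for a_cat, r_cat in evidence:
            p = update(p, a_map[a_cat], r_map[r_cat])
        results.append(p)
    return results

# Toy usage: three scored phenomena for a hypothetical theory.
evidence = [("accepted", "high"), ("testable", "medium"), ("untestable", "low")]
posts = posteriors_for_theory(evidence)
print(f"64 posteriors; mean = {mean(posts):.4f}")
```

In this form, each piece of evidence nudges the running posterior up by an amount reflecting how well the theory accommodates it (A-value) and how well replicated it is (R-value); other update rules (e.g., additive ones, or a different treatment of the marginal) would slot into `update` without changing the surrounding machinery.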

Key Findings

This Hypothesis and Theory paper offers a proof-of-concept demonstration using simulated datasets for four hypothetical theories (A, B, C, D), each with 20–40 phenomena categorized by A-score and R-score. Posteriors were computed across 64 combinations of middle A- and R-values (2–9). Reported summary statistics include: mean posteriors (A: 0.4409; B: 0.2157; C: 0.2920; D: 0.1329) and Z-scores (A: 1.6555; B: −0.5309; C: 0.2099; D: −1.3345). Pairwise ratios of mean posteriors further illustrate comparative strength (e.g., A vs B: 2.0442; A vs D: 3.3175). These results show that QBE can differentiate theories quantitatively under varying middle-category parameterizations, enabling robust comparisons while minimizing arbitrariness. The framework demonstrates sensitivity to the total amount of evidence and its quality (via A- and R-scores), and suggests that ties become unlikely once full evidence sets are incorporated and posteriors are aggregated across parameter combinations.
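Given each theory's set of 64 posteriors, the comparison modalities above are straightforward to compute. The sketch below is illustrative only: the paper does not specify the reference distribution for its Z-scores, so standardizing each theory's mean posterior against the pooled posteriors of all theories is an assumption, and the input posterior sets here are fabricated rather than drawn from the paper's simulation.

```python
from itertools import combinations
from statistics import mean, stdev

def compare(posterior_sets):
    """Summarize per-theory posterior sets with QBE's comparison modalities.

    `posterior_sets` maps theory name -> list of posteriors (e.g., the 64
    values from the middle-category sweep). Z-scores standardize each
    theory's mean against the pooled posterior distribution -- an assumed
    choice of reference distribution.
    """
    means = {t: mean(ps) for t, ps in posterior_sets.items()}
    pooled = [p for ps in posterior_sets.values() for p in ps]
    mu, sigma = mean(pooled), stdev(pooled)
    z_scores = {t: (m - mu) / sigma for t, m in means.items()}
    # Pairwise ratios of mean posteriors, e.g., "A vs B" above.
    ratios = {f"{a} vs {b}": means[a] / means[b]
              for a, b in combinations(means, 2)}
    return means, z_scores, ratios

# Fabricated posterior sets for two hypothetical theories.
sets = {"A": [0.42, 0.44, 0.46, 0.48], "B": [0.20, 0.22, 0.24, 0.26]}
means, z, ratios = compare(sets)
print(means, z, ratios["A vs B"], sep="\n")
```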

Discussion

QBE addresses the core challenge of how empirical evidence can practically arbitrate among competing theories of consciousness by quantifying claims of support and aggregating them via a Bayesian-inspired updating process. The approach avoids key shortcomings of ARC and CRIT: it scales to many theories simultaneously (avoiding targeted-theory limitations), considers broad bodies of evidence rather than narrow paradigms (addressing scope), uses a unified methodology across theories (improving generalizability), is robust to theory revisions and new evidence (easy re-scoring and re-updating), and is sensitive to all available empirical support (countering CRIT’s insensitivity). Arbitration is facilitated by continuous measures (posteriors, Z-scores, ratios), making ties unlikely. Arbitrariness is minimized by evaluating evidence on each theory’s own terms, engaging original authors to corroborate A-scores, crowdsourcing or data-mining to set ordinal thresholds, and computing results across all plausible middle-category values. The method reframes disagreements towards objective, mathematical questions about updating mechanisms and scoring, helping cauterize conceptual bleed and enabling constructive synergy with ARC (e.g., QBE can identify promising theory pairs or paradigms) and CRIT (e.g., meta-theoretic criteria can complement QBE’s empirical quantification).

Conclusion

The paper introduces Quantification to the Best Explanation (QBE), a novel methodology to quantify empirical support for theories of consciousness and compare them using Bayesian-inspired confirmation. It provides a transparent, flexible, and updateable process that aggregates diverse evidence via structured scoring (A-score, R-score), ordinal-to-numeric conversion, and multi-parameter posterior estimation, enabling several comparison modalities. QBE is presented as a complement to ARC and CRIT, with potential for synergy: ARC findings can be incorporated as scored evidence, and CRIT’s meta-level considerations can guide what aspects theories should explain. Future directions include compiling comprehensive datasets of proposed evidence per theory, refining and possibly expanding ordinals (e.g., closeness, distribution, scope), crowdsourcing or data-mining thresholds, and collaborating with statisticians and philosophers of science to formalize and justify updating mechanisms. These developments aim to move the field beyond conceptual stalemates toward data-informed, transparent comparisons, focusing effort on the most promising theories.

Limitations

The methodology is explicitly a first sketch rather than a finalized solution. There are currently no complete datasets of all phenomena claimed in favor of each theory, necessitating future compilation and case-by-case scoring. Choices in scaling and updating mechanisms (e.g., multiplication vs addition; handling of the marginal) can influence results and require justification to avoid arbitrariness; multiple competing implementations may emerge. Ordinal category numbers and thresholds (for replication or other proposed ordinals) need community consensus or data-driven derivation. Despite measures to minimize arbitrariness, some decisions (e.g., prior equalization across theories; parameter ranges for middle categories) remain provisional. The demonstration uses simulated data rather than empirical datasets, so empirical validation and refinement are needed.
