Political Science
A numerical study on efficient jury size
T. Watanabe
Jury trials are widely regarded as embodiments of democracy, yet practical constraints prevent involving entire communities in every trial. Historically, juries have commonly comprised about 12 lay citizens, particularly when unanimity is required, and this convention persists despite arguments for smaller juries. This study investigates a statistical explanation for why approximately 12 jurors are often chosen. The central hypothesis is that juries of around 12 individuals optimally balance two quantities: verdict accuracy (how well a jury’s unanimous verdict represents the decision of the full community) and deliberation time (the number of voting steps needed to reach unanimity). Using a majority-vote model with noise on a fully connected network, the study quantifies these two metrics, explores how community opinion homogeneity and anti-conformity shape the efficient jury size, and tests whether these factors together can explain the prevalence of ~12-person juries. The work aims to provide a parsimonious, statistically grounded account of efficient jury size with implications for judicial practice and democratic representation.
Prior research has debated whether jury decisions adequately represent community opinions and examined properties such as impartiality, consistency, and accuracy. Sociological and historical accounts suggest that jury sizes have been influenced by political and societal events, while a meta-analysis of 17 studies reported democratic and sociological advantages of 12-person over 6-person juries. Some studies argue for smaller juries, yet the conventional ~12-juror system persists across jurisdictions. Theoretical and statistical models of opinion dynamics, including the majority-vote model with noise and related voter/Ising-like models, provide tools to study group decision-making, consensus formation, and phase transitions in social systems. Analytical results on critical noise thresholds for majority-vote dynamics on complete graphs inform parameter choices to ensure convergence toward consensus. This paper builds on these strands by applying a simple, widely used opinion dynamics model to the specific question of efficient jury size and by linking model parameters to measurable community characteristics.
Model and setting: The study uses the majority-vote model with noise q (0 ≤ q ≤ 0.5) on a complete (fully connected) graph to simulate deliberations among N jurors (N ≥ 2), each of whom interacts with all the others. Each juror i at time t holds a binary opinion σ_i(t) ∈ {−1, +1} (e.g., not guilty or guilty). Initial opinions σ_i(0) are sampled from a large community of 10^5 individuals with opinion bias F_Major (0.5 < F_Major ≤ 1), such that the ratio of majority to minority opinions is F_Major:(1 − F_Major).
Opinion update rule: At each discrete time step t → t+1, each juror adopts the jury’s majority opinion at time t with probability 1 − q or the opposite (minority) opinion with probability q. In case of a tie at time t, opinions are updated randomly. The noise q reflects anti-conformity or social temperature. Updates proceed until all jurors share the same opinion.
Deliberation time: The deliberation time T is the number of steps until unanimity is reached. Because the distribution of T is skewed, its median, denoted (T), is used to summarize deliberation time.
Verdict accuracy: Jury verdict accuracy is defined relative to the decision that would be made by the full community (10^5 individuals). Community dynamics are simulated with the same update rule but among all 10^5 individuals. Since community opinions are unlikely to become unanimously aligned, the community verdict is defined as the majority opinion at t = 100. For each jury trial, the accuracy score is 1 if the unanimous jury verdict matches the community verdict and 0 otherwise. The mean accuracy over trials is (Accuracy). To quantify the benefit of increasing jury size, the improvement over a baseline 2-juror system is Δ(Accuracy)_N = (Accuracy)_N − (Accuracy)_2.
Jury efficiency metric: Jury efficiency for a jury of size N is defined as Δ(Accuracy)_N / (T)_N.
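The deliberation procedure described above can be sketched in Python as follows. This is a minimal illustration, not the study's code, and it rests on several assumptions that the text does not specify: sampling from the 10^5-person community is approximated by independent draws with probability F_Major, all jurors update synchronously, a jury that is already unanimous at t = 0 counts as T = 0, and the community verdict is taken to be the initial majority opinion (+1) instead of being simulated up to t = 100. The function names `deliberate` and `estimate` are hypothetical, and the sketch is not expected to reproduce the reported values exactly.

```python
import random
import statistics


def deliberate(n_jurors, f_major, q, rng, max_steps=10**6):
    """Run one jury deliberation; return (steps_to_unanimity, verdict)."""
    # Assumption: independent draws stand in for sampling from a large
    # community; +1 is the community's majority opinion.
    opinions = [1 if rng.random() < f_major else -1 for _ in range(n_jurors)]
    steps = 0
    while abs(sum(opinions)) != n_jurors:
        if steps >= max_steps:
            raise RuntimeError("no unanimity within max_steps")
        total = sum(opinions)
        if total == 0:
            # Tie: every juror picks an opinion at random.
            opinions = [rng.choice((-1, 1)) for _ in range(n_jurors)]
        else:
            majority = 1 if total > 0 else -1
            # Each juror follows the majority with probability 1 - q.
            opinions = [majority if rng.random() >= q else -majority
                        for _ in range(n_jurors)]
        steps += 1
    return steps, opinions[0]


def estimate(n_jurors, f_major, q, trials=10_000, seed=0):
    """Return (median deliberation time, mean accuracy) over many trials."""
    rng = random.Random(seed)
    times, hits = [], 0
    for _ in range(trials):
        steps, verdict = deliberate(n_jurors, f_major, q, rng)
        times.append(steps)
        # Assumption: the community verdict is +1 (the initial majority).
        hits += verdict == 1
    return statistics.median(times), hits / trials


if __name__ == "__main__":
    median_t, accuracy = estimate(12, 0.6, 0.075, trials=2000)
    print(f"median T = {median_t}, accuracy = {accuracy:.3f}")
```

Calling `estimate` for N = 2 and for a larger N with the same F_Major and q gives the two accuracies needed for Δ(Accuracy)_N, and dividing that difference by the median time gives the efficiency.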
For each parameter setting, the efficiency is computed for N from 2 to 50, and the most efficient jury size N_efficient is identified as the N at which the fitted efficiency curve attains a local maximum (via polynomial fit and differentiation).
Simulation protocol: For each combination of parameters (N, F_Major, q), the simulation is repeated 10^6 times to obtain robust estimates of (T) and (Accuracy).
Parameter ranges and rationale: The search spans F_Major in [0.52, 0.66], focusing on cases without overwhelming supermajorities (which would yield quick, accurate unanimity regardless of N), and q in [0.05, 0.2], chosen to lie below the critical noise threshold q_c for complete graphs so that the dynamics favor ordering and unanimity is likely to be reached even for smaller N.
Analytical approximations for scaling: Across the parameter space, the median deliberation time grows approximately exponentially with N: ln (T) = β_Time N + ε_Time (R^2 ≥ 0.92), while the accuracy gain scales sublinearly: ln Δ(Accuracy) = β_Acc ln N + ε_Acc (R^2 ≥ 0.92). Combining the two gives ln(Δ(Accuracy)/(T)) = β_Acc ln N − β_Time N + (ε_Acc − ε_Time), whose derivative with respect to N is β_Acc/N − β_Time. Setting this derivative to zero implies that efficiency peaks near N ≈ β_Acc / β_Time. This approximation is validated by a strong correlation between β_Acc/β_Time and the empirically obtained N_efficient (R^2 = 0.78; coefficient of variation = 0.14).
Determinants of β coefficients: Regression analyses across the parameter grid show that β_Acc is strongly and negatively associated with F_Major (r ≤ −0.96, P < 10^−3, Bonferroni-corrected P < 0.05) and not associated with q (|r| < 0.14, P > 0.76). Conversely, β_Time is strongly and positively associated with q (r ≥ 0.97, P < 10^−3, Bonferroni-corrected P < 0.05) and not associated with F_Major (|r| < 0.086, P > 0.85).
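The step from the two scaling fits to the peak location can be written out explicitly. The derivation below is a sketch that treats N as a continuous variable and the intercepts ε_Time and ε_Acc as constants; it uses the same (T) and Δ(Accuracy) notation as the text.

```latex
\begin{align}
\ln (T)_N &\approx \beta_{\mathrm{Time}}\, N + \varepsilon_{\mathrm{Time}}, \\
\ln \Delta(\mathrm{Accuracy})_N &\approx \beta_{\mathrm{Acc}} \ln N + \varepsilon_{\mathrm{Acc}}, \\
\ln \frac{\Delta(\mathrm{Accuracy})_N}{(T)_N}
  &\approx \beta_{\mathrm{Acc}} \ln N - \beta_{\mathrm{Time}}\, N
     + (\varepsilon_{\mathrm{Acc}} - \varepsilon_{\mathrm{Time}}), \\
\frac{d}{dN} \ln \frac{\Delta(\mathrm{Accuracy})_N}{(T)_N}
  &\approx \frac{\beta_{\mathrm{Acc}}}{N} - \beta_{\mathrm{Time}} = 0
  \quad\Longrightarrow\quad
  N \approx \frac{\beta_{\mathrm{Acc}}}{\beta_{\mathrm{Time}}}.
\end{align}
```

The second derivative, −β_Acc/N^2, is negative whenever β_Acc > 0, so under these assumptions the stationary point is a maximum of the efficiency.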
Real-life network analysis: To relate F_Major and q in real settings, the model is simulated on three empirical undirected, unweighted networks from SNAP: a co-authorship (CondMat) network, an email network (EU-core), and a Facebook ego network. For each network, q is set in [0.05, 0.15] and initial opinion homogeneity F_Major_initial in [0.52, 0.66]. Initial opinions are assigned to match F_Major_initial. Dynamics proceed until opinion homogeneity reaches a plateau, defined by fluctuations within 0.1 over 10 updates. For fixed q, the converged homogeneity F_Major_converge is recorded and found to be largely independent of F_Major_initial. Correlations between F_Major_converge and q are then estimated. The empirically observed inverse relationship between F_Major and q is combined with the brute-force N_efficient results to narrow the predicted efficient jury size range in real-life contexts.
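The network procedure can be illustrated with a short Python sketch. It is not the study's implementation, and it makes several assumptions: each node follows the majority of its neighbors (with probability 1 − q) under synchronous updates, homogeneity is measured as the share of the currently larger opinion group, and the plateau criterion is read as "the range of homogeneity over the last 10 updates is at most 0.1". A small synthetic random graph stands in for the SNAP networks; `random_graph` and `run_to_plateau` are hypothetical names.

```python
import random


def random_graph(n, p, rng):
    """Undirected, unweighted random graph as an adjacency dict (stand-in data)."""
    adjacency = {v: set() for v in range(n)}
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:
                adjacency[u].add(v)
                adjacency[v].add(u)
    return adjacency


def run_to_plateau(adjacency, f_major_initial, q, rng,
                   window=10, tolerance=0.1, max_updates=10_000):
    """Return the plateau value of opinion homogeneity (F_Major_converge)."""
    nodes = list(adjacency)
    # Assign exactly the target share of nodes to +1 at t = 0.
    n_plus = round(f_major_initial * len(nodes))
    plus_nodes = set(rng.sample(nodes, n_plus))
    opinion = {v: (1 if v in plus_nodes else -1) for v in nodes}
    history = []
    for _ in range(max_updates):
        new_opinion = {}
        for v in nodes:
            total = sum(opinion[u] for u in adjacency[v])
            if total == 0:
                # Local tie (or isolated node): choose at random.
                new_opinion[v] = rng.choice((-1, 1))
            else:
                local_majority = 1 if total > 0 else -1
                follows = rng.random() >= q
                new_opinion[v] = local_majority if follows else -local_majority
        opinion = new_opinion
        share_plus = sum(1 for s in opinion.values() if s == 1) / len(nodes)
        history.append(max(share_plus, 1 - share_plus))
        recent = history[-window:]
        # Assumed plateau rule: range over the last `window` updates <= tolerance.
        if len(recent) == window and max(recent) - min(recent) <= tolerance:
            return sum(recent) / window
    raise RuntimeError("no plateau within max_updates")


if __name__ == "__main__":
    rng = random.Random(0)
    graph = random_graph(200, 0.1, rng)
    for q in (0.05, 0.10, 0.15):
        print(q, round(run_to_plateau(graph, 0.6, q, rng), 3))
```

Running the loop over several values of q and F_Major_initial yields the pairs (q, F_Major_converge) from which a correlation could then be estimated.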
- Example case (F_Major = 0.6, q = 0.075): The median deliberation time (T)_12 = 3.76 steps. Jury accuracy (Accuracy)_12 = 0.75 versus (Accuracy)_2 = 0.60, yielding efficiency Δ(Accuracy)_12/(T)_12 = (0.75 − 0.60)/3.76 = 0.040. Efficiency peaks at N_efficient = 12.1.
- Across the explored parameter space (0.52 ≤ F_Major ≤ 0.66; 0.05 ≤ q ≤ 0.2), the most efficient jury size ranges from 6.70 (F_Major = 0.66, q = 0.2) to 18.40 (F_Major = 0.52, q = 0.05).
- Scaling relations: ln(T) grows approximately linearly with N (exponential time increase), while ln Δ(Accuracy) grows approximately linearly with ln N (sublinear accuracy gain). The theoretical approximation N_efficient ≈ β_Acc/β_Time correlates strongly with empirically obtained N_efficient (R^2 = 0.78; coefficient of variation = 0.14).
- Determinants: β_Acc is strongly negatively correlated with F_Major (r ≤ −0.96, Bonferroni-corrected P < 0.05) and not with q; β_Time is strongly positively correlated with q (r ≥ 0.97, Bonferroni-corrected P < 0.05) and not with F_Major. Interpretation: more homogeneous communities reduce the accuracy benefit of larger juries (reducing N_efficient), while stronger anti-conformity increases deliberation time growth (also reducing N_efficient).
- Real networks: In collaboration, email, and Facebook networks, opinion homogeneity converges to a q-dependent value independent of initial conditions. F_Major_converge is negatively correlated with q (r^2 ≈ 0.98; P ≤ 0.0034; Bonferroni-corrected P < 0.05). Incorporating this inverse relationship with the brute-force map narrows the efficient jury size to 8.8–14.7, i.e., 11.8 ± 3.0, aligning with conventional 12-person juries.
- Overall, an inverse correlation between community opinion homogeneity and anti-conformity tendency helps prevent extreme shrinkage or expansion of efficient jury sizes.
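The example-case arithmetic and the scaling approximation in the findings above can be checked with a few lines of Python. The accuracy and time values passed to `efficiency` are the ones reported for F_Major = 0.6 and q = 0.075; the β values passed to `peak_from_scaling` are purely illustrative placeholders, not coefficients from the study, and both function names are hypothetical.

```python
import math


def efficiency(acc_n, acc_2, median_t_n):
    """Accuracy gain over the 2-juror baseline per deliberation step."""
    return (acc_n - acc_2) / median_t_n


def peak_from_scaling(beta_acc, beta_time, sizes=range(2, 51)):
    """Return (best integer N on the grid, closed-form beta_acc / beta_time)."""
    # Log-efficiency up to an additive constant: beta_acc*ln(N) - beta_time*N.
    best = max(sizes, key=lambda n: beta_acc * math.log(n) - beta_time * n)
    return best, beta_acc / beta_time


if __name__ == "__main__":
    # Reported example case: (0.75 - 0.60) / 3.76
    print(round(efficiency(0.75, 0.60, 3.76), 3))  # 0.04
    # Illustrative (not fitted) coefficients.
    print(peak_from_scaling(beta_acc=0.6, beta_time=0.05))
```

Comparing the grid maximum with the ratio β_Acc/β_Time for a given pair of coefficients shows how closely the closed-form estimate tracks the discrete optimum.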
The study addresses the question of why ~12-person juries are prevalent by formalizing jury deliberations as majority-vote dynamics with anti-conformist noise and by defining efficiency as the accuracy gain per unit deliberation time. The findings show that two community-level properties independently shape the efficient jury size: higher opinion homogeneity diminishes the marginal accuracy benefits of enlarging juries, pushing the efficient size downward, while higher anti-conformity accelerates the growth of deliberation time with jury size, likewise reducing the efficient size. Crucially, empirical network simulations reveal an inverse relationship between opinion homogeneity and anti-conformity, which stabilizes efficient jury sizes within a moderate range and explains the robustness of ~12-person juries. The alignment of the model’s predicted efficient size (11.8 ± 3.0) with common real-world jury sizes suggests that simple statistical mechanics of opinion formation can capture essential features of jury decision-making under unanimity constraints. These results are relevant to policy discussions on jury composition, indicating that expanding or shrinking juries carries trade-offs that depend on underlying community characteristics.
A simple majority-vote model with noise, used to weigh verdict accuracy against deliberation time, explains why jury sizes around 12 can be efficient under unanimity rules. The efficient jury size is governed by two determinants: community opinion homogeneity (which curtails accuracy gains from larger juries) and anti-conformity (which steepens the growth of deliberation time with jury size). An inverse correlation between these determinants, observed in simulations on empirical networks, prevents extreme optimal sizes and yields a predicted efficient range of 8.8–14.7 (11.8 ± 3.0), consistent with prevailing jury practices. Future research should incorporate sociopolitical factors influencing jury institutions, extend models to non-unanimous verdict rules and mixed lay-professional panels, introduce heterogeneity (e.g., strong opinion holders, individual variation in anti-conformity, graded opinions and confidence), and develop analytical solutions linking parameters to accuracy and consensus times on various network topologies.
- Sociological scope: The work does not model historical, political, or institutional processes that have shaped jury sizes across jurisdictions.
- Legal system variability: Many real-world systems use majority or supermajority verdicts and some involve professional judges; the study focuses on unanimity-only lay juries.
- Model simplifications: The model assumes binary opinions, a homogeneous noise parameter q across individuals, and fully connected interactions among jurors, and it ignores strong opinion holders or influencers and individual differences in anti-conformity.
- Measurement of community decision: Community verdicts are approximated via majority at a fixed time (t = 100), which may not capture all realistic dynamics.
- Analytical generality: While supported by regressions and simulations, deeper analytical derivations could more precisely relate q, F_Major, accuracy, and consensus times; suggested approaches include heterogeneous mean-field and master equation methods from prior work.
- Parameterization uncertainties: Critical noise thresholds and some quantitative relationships are taken from prior studies and applied qualitatively; exact functional forms may vary across models and network structures.

