Interdisciplinary Studies
Ethnic markers and the emergence of group-specific norms
J. Ozaita, A. Baronchelli, et al.
The study addresses whether observable ethnic markers foster cooperation or primarily facilitate coordination in social interactions. While some theories view ethnic groups as loci for cooperation based on observable traits, that account struggles with free-riding. An alternative line of work suggests markers help individuals coordinate on shared norms and conventions. Empirical evidence is mixed, though experiments designed to discriminate between cooperation and coordination generally support the coordination hypothesis. The paper investigates how ethnic markers can underpin the emergence of group-specific norms via coordination games when individuals adapt via reinforcement learning rather than imitation, a more realistic mechanism when global payoff information is unavailable. The key research question is under what conditions reinforcement learning creates marker–behavior correlations that enable successful coordination within and across marked groups.
Foundational work argues that ethnic groups are maintained through social categorization and parochialism, with markers like language, dress, and cuisine guiding social interaction. Earlier hypotheses proposed markers promote cooperation within groups, but that view faces the free-rider problem. Alternative theories propose markers solve coordination problems by signaling norms and focal points; the empirical record is mixed, but targeted experiments tend to support coordination over cooperation as the role of markers. McElreath et al. introduced a model where a binary marker guides behavior selection under imitation-driven evolutionary dynamics. Related agent-based and bargaining models show tags can support self-enforcing social norms and, in some settings, cooperation. This paper departs from imitation and memory-based frameworks by focusing on reinforcement learning with fixed, external markers, assessing its ability to generate marker-dependent conventions in pure coordination games.
- Model overview: A population of N agents repeatedly plays a two-action coordination game in randomly formed pairs. Each agent carries an immutable, observable binary marker (0 or 1) and chooses an action (0 or 1) in each interaction. Agents hold a marker-contingent probability vector: the probabilities of choosing each action when facing a same-marker partner (P_{=,0}, P_{=,1}) and when facing a different-marker partner (P_{≠,0}, P_{≠,1}), with P_{=,0} + P_{=,1} = 1 and P_{≠,0} + P_{≠,1} = 1.
- Payoff structure: Symmetric pure coordination game with payoffs 1 for mismatch and 1 + δ for coordination (δ = 0.5 by default). Markers do not alter payoffs, only action-selection probabilities.
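- Interaction bias (homophily): With probability e, the focal agent is paired with a partner drawn uniformly at random; with probability 1−e, it is paired with a partner sharing its marker. With two equally frequent markers, a same-marker interaction therefore occurs with probability (1−e) + e/2 = 1 − e/2, and a different-marker interaction with probability e/2 (0.75 and 0.25 at e = 0.5).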
- Reinforcement learning: Each agent has an aspiration level A. After each round, the agent compares its realized payoff π_t with A, producing a stimulus s that is non-negative if π_t ≥ A and negative if π_t < A. Only the probabilities of the context actually faced (same- or different-marker partner) are updated. With a the action just played and l the learning rate:
  • If s ≥ 0: P_{a,t+1} = P_{a,t} + l (1 − P_{a,t}) s
  • If s < 0: P_{a,t+1} = P_{a,t} + l P_{a,t} s
  The probability of the other action in that context is then 1 − P_{a,t+1}. Larger l means faster adaptation (l = 0.5 in the baseline; slower learning with l = 0.05 is also tested). An action thus becomes more (less) likely after a positive (negative) stimulus.
- Simulation protocol (baseline):
  • N = 500 agents; markers and initial behaviors (action probabilities) are assigned at random with equal probability.
  • δ = 0.5; e = 0.5 (interactions biased toward same-marker partners); l = 0.5 (fast learning).
  • Baseline sweeps use fixed, homogeneous aspirations, A ∈ {1, 1+δ/2, 1+δ, 1+2δ}, plus continuous ranges around the regime thresholds; further experiments use heterogeneous aspirations and dynamic aspirations (habituation).
  • Each run lasts 4000 interactions per agent, and results are averaged over 100 independent simulations.
- Variants explored: (i) imperfect coordination (asymmetric payoffs creating a Pareto-dominant equilibrium); (ii) different learning rates l; (iii) different biases e, including the unbiased case e = 1; (iv) mixtures of heterogeneous aspirations; (v) multiple groups with migration (two groups of 250 agents each, parameterized by the fraction of migrating agents m, not to be confused with the coordination ratio m, and by β, which governs how often migration occurs, with smaller β meaning more frequent migration); (vi) habituation (dynamic aspirations, A_{t+1} = (1−h)A_t + h π_t); and (vii) finite-size effects (N ∈ {200, 500, 1000, 1250, 1500}).
- Outcome measures: Coordination ratio m = (number of coordinated interactions)/(total number of interactions); the distributions and variance of the marker-contingent probabilities are used to assess intra- and inter-marker correlations.
- Three behavioral regimes driven by the aspiration level A (δ = 0.5):
  • Frequentists (A ≤ 1): No payoff falls below A, so stimuli are never negative; every action played is reinforced (or, at A = 1, left unchanged after a mismatch). Agents converge to deterministic but idiosyncratic actions that are not correlated with markers; the coordination ratio is m ≈ 0.5.
  • Learning agents (1 < A < 1 + δ/2 = 1.25): Coordination produces positive stimuli and miscoordination negative ones, with the positive stimuli larger in magnitude. Agents develop strong intra-marker and noticeable inter-marker correlations (marker-dependent strategies), and the population concentrates at the corners of probability space that correspond to consistent rules for each marker context. The coordination ratio is m ≈ 1 (slightly below because of fluctuations and occasional errors).
  • Random walkers (A ≥ 1 + δ/2): Negative stimuli dominate (for A ≥ 1 + δ, no outcome yields a positive stimulus). Probabilities stay broadly distributed around 0.5, no stable marker–behavior correlations form, and m ≈ 0.5.
- Imperfect coordination (asymmetric payoffs): When the two coordinated outcomes have different payoffs, reinforcement learning drives agents to the Pareto-dominant equilibrium regardless of markers; markers become irrelevant for equilibrium selection.
- Learning rate effects: Slower learning (l = 0.05) smooths the transition near A = 1 and keeps random walkers near quasi-random probabilities (≈ 0.5) for longer. The qualitative regimes persist, and populations with low aspirations reach higher coordination because they have more time to settle.
- Interaction bias e: With bias (e = 0.5), correlations near the learning/random-walker transition break down in two stages: the inter-marker correlation breaks first, while the intra-marker correlation persists longer. Without bias (e = 1), both correlations degrade simultaneously. Near the transition, the distribution of coordination ratios is positively skewed with bias and negatively skewed without it.
- Heterogeneous aspirations: Mixing learning agents (A = 1.1) with frequentists (A = 0.8) or random walkers (A = 1.5) reduces overall coordination relative to homogeneous learning populations.
  • If learning agents are the majority (75%), partial intra- and inter-marker correlations persist (probabilities lie between 0.5 and 1 in the correlated direction). Frequentists split into four deterministic subgroups, aligned or misaligned with the learning agents’ conventions; random walkers cluster around the learning agents’ conventions.
  • In 50–50 mixes, the inter-marker correlation typically breaks while the intra-marker correlation partially remains; m stays above 0.5.
  • If learning agents are a minority (25%), correlations are very weak or vanish; the learning agents gravitate to the frequentists’ equilibria and achieve modestly better m than random mixtures.
  • In a three-way mix (A = 0.8, 1.1, 1.5), outcomes resemble the minority-learning case.
- Multiple groups and migration:
  • Assigning different aspirations to the two groups reproduces the within-group regimes described above; spatial separation by itself does not create new phenomena.
  • With two groups of learning agents (A = 1.1), migration makes conventions more homogeneous across groups.
  • For a fixed migrant fraction m = 1/N, more frequent migration (smaller β) increases both intra and inter homogeneity. Table 1: no migration → intra 20%, inter 45%; β = 0.1 → intra 50%, inter 100%; β = 0.01 → intra 100%, inter 100%.
  • For fixed β = 0.1, a larger migrant fraction raises homogeneity. Table 2: m = 1/N → intra 50%, inter 100%; m = 10/N → intra 100%, inter 100%.
- Habituation (dynamic aspirations):
  • Low habituation (h < 5×10⁻³) starting from low A preserves the correlations and high coordination, while aspirations drift toward A_eq ≈ 1 + δ.
  • Intermediate habituation (h from about 5×10⁻³ to 0.5) progressively destroys the correlations and reduces m, often through the two-stage breakdown seen under biased interaction.
  • High habituation (h ≈ 0.5) yields aspirations distributed uniformly over [1, 1+δ], producing random behavior and m ≈ 0.5.
  • When the initial A exceeds 1 + δ/2, behavior remains randomized regardless of h.
- Finite-size analysis: Qualitative regimes persist across N ∈ [200, 1500]. Variance peaks near A ≈ 1+δ/2 diminish with size, but standard finite-size scaling signatures of a phase transition are not observed.
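The same-marker probability in the interaction-bias bullet follows from adding the biased and unbiased branches of the pairing rule. This assumes the two markers are equally frequent and the population is large enough that the chance of drawing oneself is negligible:

```latex
P(\text{same}) = \underbrace{(1-e)\cdot 1}_{\text{biased draw}} + \underbrace{e\cdot\tfrac{1}{2}}_{\text{uniform draw}} = 1 - \frac{e}{2},
\qquad
P(\text{different}) = \frac{e}{2}.
```

For the baseline e = 0.5, this gives 0.75 and 0.25. For e = 1 (no bias), both contexts are equally likely.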
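The model and protocol bullets above can be condensed into a short simulation. The following is a minimal Python sketch, not the authors' implementation. It rests on assumptions that the summary does not state:

- The stimulus is normalized as s = (π − A)/max|π′ − A| over the two possible payoffs, a common Bush–Mosteller choice that keeps |s| ≤ 1.
- Both partners update after every round.
- All initial action probabilities are 0.5.
- Each round draws a uniformly random focal agent.

The function name `run_simulation` and its parameters are hypothetical.

```python
import random


def run_simulation(n_agents=500, delta=0.5, e=0.5, l=0.5, aspiration=1.1,
                   interactions_per_agent=4000, seed=None):
    """Sketch of the marker + reinforcement-learning coordination model.

    Returns (coordination_ratio, same_marker_fraction, probs), where
    probs[i] = [P(action 1 | same marker), P(action 1 | different marker)].
    """
    rng = random.Random(seed)
    markers = [rng.randint(0, 1) for _ in range(n_agents)]
    by_marker = {0: [], 1: []}
    for i, mk in enumerate(markers):
        by_marker[mk].append(i)
    # Assumption: both contexts start at P = 0.5 for every agent.
    probs = [[0.5, 0.5] for _ in range(n_agents)]

    miss_payoff, coord_payoff = 1.0, 1.0 + delta
    # Assumed normalisation so that |s| <= 1 and probabilities stay in [0, 1].
    norm = max(abs(miss_payoff - aspiration), abs(coord_payoff - aspiration))

    def pick_partner(i):
        same_group = by_marker[markers[i]]
        # With probability e the partner is drawn from everyone (unbiased);
        # otherwise from agents sharing i's marker (if anyone else has it).
        if rng.random() < e or len(same_group) < 2:
            pool = range(n_agents)
        else:
            pool = same_group
        j = rng.choice(pool)
        while j == i:
            j = rng.choice(pool)
        return j

    def update(i, ctx, action, payoff):
        s = (payoff - aspiration) / norm
        # Probability of the action that was actually played in this context.
        p_played = probs[i][ctx] if action == 1 else 1.0 - probs[i][ctx]
        if s >= 0:
            p_played += l * (1.0 - p_played) * s
        else:
            p_played += l * p_played * s
        probs[i][ctx] = p_played if action == 1 else 1.0 - p_played

    rounds = n_agents * interactions_per_agent // 2  # two agents per round
    coordinated = same_marker = 0
    for _ in range(rounds):
        i = rng.randrange(n_agents)
        j = pick_partner(i)
        ctx = 0 if markers[i] == markers[j] else 1  # 0: same, 1: different
        a_i = 1 if rng.random() < probs[i][ctx] else 0
        a_j = 1 if rng.random() < probs[j][ctx] else 0
        payoff = coord_payoff if a_i == a_j else miss_payoff
        update(i, ctx, a_i, payoff)
        update(j, ctx, a_j, payoff)
        coordinated += a_i == a_j
        same_marker += ctx == 0
    return coordinated / rounds, same_marker / rounds, probs
```

The baseline sizes (500 agents, 4000 interactions per agent) are slow in pure Python. A reduced call such as `run_simulation(n_agents=100, interactions_per_agent=1000, seed=1)` is enough to compare how the coordination ratio responds to different `aspiration` values. The returned same-marker fraction should sit near 1 − e/2 when the two markers are roughly balanced.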
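The regime boundaries in the behavioral-regimes bullet follow from comparing the stimuli produced by the two possible payoffs. The derivation takes the stimulus proportional to π − A; any common positive normalization leaves the comparison unchanged:

```latex
\begin{aligned}
s_{\text{coord}} &\propto (1+\delta) - A, \qquad s_{\text{mis}} \propto 1 - A,\\
A \le 1 &\;\Rightarrow\; s_{\text{coord}} > 0,\; s_{\text{mis}} \ge 0 \quad \text{(nothing is punished: frequentists)},\\
1 < A < 1+\tfrac{\delta}{2} &\;\Rightarrow\; s_{\text{coord}} > 0 > s_{\text{mis}},\; |s_{\text{coord}}| > |s_{\text{mis}}| \;\text{ since } 1+\delta-A > A-1,\\
A \ge 1+\tfrac{\delta}{2} &\;\Rightarrow\; |s_{\text{mis}}| \ge |s_{\text{coord}}|, \;\text{with } s_{\text{coord}} \le 0 \text{ once } A \ge 1+\delta.
\end{aligned}
```

With δ = 0.5, the learning window is 1 < A < 1.25.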
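The habituation rule A_{t+1} = (1−h)A_t + h π_t is an exponential moving average of realized payoffs, so its fixed point is the mean payoff. If a fraction m of interactions is coordinated, that mean is m(1 + δ) + (1 − m)·1 = 1 + δm. Near-perfect coordination (m ≈ 1) therefore pulls A toward 1 + δ, which is consistent with the reported A_eq ≈ 1 + δ. The sketch below checks this numerically. `simulate_aspiration` is a hypothetical helper that draws coordination independently with probability m each round. That is a simplifying assumption, since it ignores the feedback from A to behavior.

```python
import random


def habituate(aspiration, payoff, h):
    """One habituation step: A_{t+1} = (1 - h) * A_t + h * pi_t."""
    return (1.0 - h) * aspiration + h * payoff


def mean_payoff(m, delta):
    """Average payoff when a fraction m of interactions is coordinated."""
    return m * (1.0 + delta) + (1.0 - m) * 1.0


def simulate_aspiration(a0, m, delta, h, steps, seed=None):
    """Track A when each round is coordinated with probability m (hypothetical helper)."""
    rng = random.Random(seed)
    a = a0
    for _ in range(steps):
        payoff = 1.0 + delta if rng.random() < m else 1.0
        a = habituate(a, payoff, h)
    return a
```

Starting from a learning-agent aspiration such as A = 1.1, `simulate_aspiration(1.1, 1.0, 0.5, 0.1, 200)` ends essentially at 1.5 = 1 + δ. That value lies above the learning window's upper edge of 1 + δ/2.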
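The migration bullets describe exchanges between two groups of 250 agents but do not spell out the mechanics. The sketch below is minimal and rests on three assumptions:

- Each migration event swaps the same number of randomly chosen agents in each direction, so group sizes stay fixed.
- Migrants keep their markers and learned probabilities.
- The migrant fraction converts to a head count as m·N.

`migrate` and `migrants_per_event` are hypothetical names. How often the event is triggered (the role of β) is left to the caller.

```python
import random


def migrate(group_a, group_b, n_migrants, rng):
    """Hypothetical migration event: swap n_migrants random agents between two groups.

    Swapping keeps both groups at their original size; each list entry stands
    for an agent carrying its marker and marker-contingent probabilities.
    """
    leave_a = rng.sample(range(len(group_a)), n_migrants)
    leave_b = rng.sample(range(len(group_b)), n_migrants)
    for ia, ib in zip(leave_a, leave_b):
        group_a[ia], group_b[ib] = group_b[ib], group_a[ia]


def migrants_per_event(fraction, n_total):
    """Convert a migrant fraction (e.g. m = 1/N or 10/N) into a head count."""
    return round(fraction * n_total)
```

In a full run, the caller would invoke `migrate` between blocks of interactions. Smaller gaps between calls correspond to the more frequent migration (smaller β) that the summary associates with higher homogeneity.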
The findings show that reinforcement learning with appropriately tuned aspirations can generate robust marker–behavior correlations that resolve pure coordination problems by enabling different, consistent conventions within and between marked subgroups. This supports the coordination (rather than cooperation) role of ethnic markers: when multiple equivalent equilibria exist, markers act as focal cues that, combined with individual learning, lead to high coordination. However, aspiration levels must be moderate; if too low, agents become path-dependent frequentists without marker linkage, and if too high, they randomize, both cases yielding m ≈ 0.5. When coordinated outcomes differ in payoff, the payoff gradient dominates learning and renders markers irrelevant, aligning with the notion that markers mainly matter for equilibrium selection among equivalent options. Migration, rather than fragmenting conventions, can synchronize them across groups, producing intra- and inter-group homogeneity given sufficient flow or frequency. Compared with imitation-driven models, reinforcement learning requires only private payoff information and fixed markers, leading to less frequent but clearer marker correlations contingent on aspiration tuning. The results delineate when and how observable social traits can structure norms and conventions via individual learning dynamics.
This work introduces and analyzes a reinforcement-learning model showing that ethnic markers can facilitate the emergence of group-specific norms in pure coordination settings. Main contributions: (i) identification of aspiration-dependent regimes, with a mid-range enabling near-perfect coordination via marker-dependent conventions; (ii) demonstration that when equilibria are payoff-asymmetric, markers become irrelevant under reinforcement learning; (iii) characterization of how learning rate, interaction bias, heterogeneous aspirations, migration, and habituation shape or erode coordination; and (iv) robustness across population sizes. Future directions include: extending to richer game structures (e.g., bargaining, anti-coordination), multi-valued or continuous markers, memory-based reinforcement, co-evolution of markers and strategies, endogenizing aspiration dynamics with environmental feedback, and experimental tests with small groups to validate model predictions.
- The study relies on agent-based simulations; analytical characterization is limited.
- Markers and actions are binary and markers are immutable; real-world markers can be multi-dimensional or evolve.
- Results depend on aspiration calibration; dynamic aspirations (habituation) can undermine coordination depending on h.
- Payoff structures are stylized; noise, observation errors, or richer interaction networks are not explicitly modeled.
- Coordination success is measured by aggregate ratios; individual-level heterogeneity beyond aspirations is limited.
- External validation via experiments or field data is not provided within the study.