Habit formation viewed as structural change in the behavioral network

K. Yamada and K. Toda

What if habits aren't driven by a separate system but by a reshaping of a single behavioral network? This study demonstrates that goal-directed actions can become habits through changes in a network of responses and confirms the model with animal behavioral data. The research was conducted by Kota Yamada and Koji Toda.

Introduction

The study examines how habitual behavior emerges from goal-directed actions. Goal-directed behaviors require evaluating consequences and thus higher computational effort, whereas habits are elicited by situational cues and are stereotyped and computationally cheaper. Canonical accounts posit two distinct systems (goal-directed vs habitual) whose relative influence determines control over behavior. However, recent evidence challenges this dichotomy, suggesting habits may still be consequence-sensitive and that a single integrated mechanism may underlie both behaviors. The authors propose viewing behavior as a network of interrelated responses (nodes) and transitions (edges). They hypothesize that structural changes in this behavioral network (e.g., edge concentration onto particular responses) can explain the shift from goal-directed behavior to habit within a single system.

Literature Review

Classical theoretical frameworks in psychology and neuroscience describe two systems controlling instrumental actions: a goal-directed system sensitive to outcomes and a habit system driven by stimulus-response associations. Canonical models weight the contributions of these two systems to explain behavior and treat habit formation as diminished sensitivity to reward. Yet integrated and planning-based models (e.g., Pezzulo et al., Keramati et al.) challenge the dichotomy by mixing goal-directed and habitual control within one continuum or controller, often in multistage decision tasks. Dezfouli and Balleine introduced habits as acquired action sequences, emphasizing sequence shaping rather than loss of reward sensitivity. Empirical reviews further question the canonical view by showing that some habitual behaviors remain sensitive to consequences. Advances in computational ethology reveal rich behavioral repertoires beyond experimenter-defined responses, motivating models that consider broader behavioral structure.

Methodology

Behavior is modeled as a network in which nodes are distinct responses (e.g., lever press, grooming, walking) and edges are transitions between responses. The agent aims to maximize reward: it selects responses based on reward values and transitions to them via shortest paths in the network. Q-learning encodes the history of reward for transitions, treating the immediately prior response as the state and the next response as the action (learning rate α = 0.1; discount factor γ = 0.5). Edge attachment probabilities depend on learned Q-values via a softmax with inverse temperature β = 50, with the procedure ensuring at least two edges per node; networks were generated with NetworkX. Response selection probability is proportional to reward value (operant response r = 1.0; other responses r = 0.001). Shortest paths are computed using Dijkstra’s algorithm; if multiple shortest paths exist, one is chosen at random.

Simulation 1: Arbitrary networks were generated from a hypothetical Q-matrix constructed as the outer product of a Q-vector. The operant element’s Q-value (Q-operant) varied from 0.0 to 1.0; other responses were fixed at 0.001. Habit was assessed via a standard reward devaluation procedure: a baseline phase with operant reward = 1.0, then devaluation (operant reward set to 0.0) without updating reward within the test. Agents repeatedly selected goals by reward values and traversed shortest paths; the proportion of operant responses was measured before and after devaluation.

Simulation 2: Agents learned Q-values through interaction under free-operant conditions with different reinforcement schedules and training amounts, then underwent baseline, devaluation, and post-devaluation tests as in Simulation 1. Schedules included variable interval (VI), variable ratio (VR), concurrent VI VI (choice condition; two operants), and VI with non-contingent rewards (no-choice condition, implemented with a VT schedule). Other responses delivered small rewards under FR 1.
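The Q-learning update and the softmax edge attachment described at the start of this section can be sketched as follows. This is a minimal illustration, not the authors' code: it uses only the Python standard library instead of NetworkX, and the attachment rule (each node draws two distinct neighbours, edges treated as undirected) and the names `q_update`, `softmax`, and `build_network` are assumptions made for the example. Only α, γ, and β come from the summary.

```python
import math
import random

ALPHA, GAMMA, BETA = 0.1, 0.5, 50.0  # values quoted in the summary


def q_update(Q, prev, nxt, reward, alpha=ALPHA, gamma=GAMMA):
    """One tabular Q-learning step.

    State = the response just emitted (prev); action = the next response (nxt).
    """
    target = reward + gamma * max(Q[nxt])
    Q[prev][nxt] += alpha * (target - Q[prev][nxt])


def softmax(values, beta):
    """Softmax with inverse temperature beta (max-subtracted for stability)."""
    top = max(values)
    weights = [math.exp(beta * (v - top)) for v in values]
    total = sum(weights)
    return [w / total for w in weights]


def build_network(Q, beta=BETA, edges_per_node=2, seed=0):
    """Assumed attachment rule: every node draws `edges_per_node` distinct
    neighbours, one at a time, with softmax(Q-row) probabilities.
    Edges are stored as an undirected adjacency dict."""
    rng = random.Random(seed)
    n = len(Q)
    adjacency = {i: set() for i in range(n)}
    for i in range(n):
        candidates = [j for j in range(n) if j != i]
        for _ in range(edges_per_node):
            probs = softmax([Q[i][j] for j in candidates], beta)
            j = rng.choices(candidates, weights=probs, k=1)[0]
            candidates.remove(j)  # sample without replacement
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency


# Toy run: pretend every transition into the operant (node 0) was rewarded.
n, operant = 8, 0
Q = [[0.0] * n for _ in range(n)]
for prev in range(1, n):
    for _ in range(50):
        q_update(Q, prev, operant, reward=1.0)

net = build_network(Q)
print("edges attached to the operant:", len(net[operant]))
```

With a high inverse temperature, even modest differences in Q-values make the attachment nearly deterministic, which is how reinforced transitions translate into edges concentrating on the operant node in this sketch.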
Learning in Simulation 2 used softmax choice (β = 3.0) and schedule-specific reward timing: VR was implemented as a geometric/Bernoulli process (VR 15), and VI as a Poisson process with exponential inter-reward intervals matched to the mean interval under VR. Comparing concurrent VI 60 VI 60 with VI 60 VT 60 controlled the presence of alternatives.

Simulation 3: Tandem schedules tested competing predictions about habit formation: tandem VI VR versus tandem VR VI (e.g., VI 15 then VR 3; VR 10 then VI 5), mimicking procedures that dissociate response-reward correlation from contiguity. After training under each schedule, habit was assessed as before.

Additional analyses assessed network features (number of edges to the operant, betweenness centrality, average path length) and computational efficiency (simulation run time). Robustness checks varied node counts, path search algorithms, and learning algorithms (SARSA), with consistent results.
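The difference between ratio and interval reward timing mentioned above can be illustrated with a small sketch. The classes `VariableRatio` and `VariableInterval` and the helper `count_rewards` are hypothetical names, discrete time steps are an assumption, and the code is not taken from the paper. It only shows the general idea that a VR schedule rewards each response with a fixed probability, while a VI schedule arms a reward after an exponentially distributed delay and delivers it on the next response.

```python
import random


class VariableRatio:
    """VR n: each response is rewarded with probability 1/n (Bernoulli)."""

    def __init__(self, mean_ratio, rng):
        self.p = 1.0 / mean_ratio
        self.rng = rng

    def respond(self, t):
        return self.rng.random() < self.p


class VariableInterval:
    """VI T: a reward is armed after an exponential delay with mean T.
    The first response at or after the arming time collects it and
    restarts the timer."""

    def __init__(self, mean_interval, rng):
        self.mean = mean_interval
        self.rng = rng
        self.armed_at = rng.expovariate(1.0 / mean_interval)

    def respond(self, t):
        if t >= self.armed_at:
            self.armed_at = t + self.rng.expovariate(1.0 / self.mean)
            return True
        return False


def count_rewards(schedule, n_steps, every=1):
    """Emit one operant response every `every` time steps; count rewards."""
    return sum(schedule.respond(t) for t in range(every, n_steps + 1, every))


N_STEPS = 30_000
vr_dense = count_rewards(VariableRatio(15, random.Random(1)), N_STEPS, every=1)
vi_dense = count_rewards(VariableInterval(15, random.Random(2)), N_STEPS, every=1)
vr_sparse = count_rewards(VariableRatio(15, random.Random(3)), N_STEPS, every=5)
vi_sparse = count_rewards(VariableInterval(15, random.Random(4)), N_STEPS, every=5)

print("responding every step    VR:", vr_dense, " VI:", vi_dense)
print("responding every 5 steps VR:", vr_sparse, " VI:", vi_sparse)
```

In this sketch, responding less often cuts VR earnings roughly in proportion, whereas VI earnings fall only slightly, because the reward keeps waiting while the agent does something else. That time dependency is what lets a return to the operant from other responses be rewarded under interval schedules.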

Key Findings

Simulation 1: Increasing Q-operant concentrated edges onto the operant response, producing habit (higher resistance to devaluation). As Q-operant increased: (a) the number of edges attached to the operant grew, (b) operant betweenness centrality rose (greater inclusion in shortest paths), (c) average path length across the network decreased, and (d) baseline simulation time shortened, indicating reduced computational costs and more efficient transitions. Habit thus emerged as a structural network change.

Simulation 2: Habit formation increased with training amount and was promoted more under VI than VR schedules. With more training, the operant response acquired more edges and higher betweenness centrality in VI compared to VR. In choice settings (concurrent VI VI), habit formation was disrupted: edges were distributed across the two operants, and devaluation reduced selection of the devalued operant because the alternative remained intact. In the no-choice condition (VI VT), a single operant captured more edges and habit was stronger. The Q-value for operant self-transition increased with training and was higher under VR than VI, consistent with higher response rates in VR due to bouts of repeated operant responding.

Simulation 3: Tandem VR VI promoted habit formation (higher resistance to devaluation, more edges to the operant, higher betweenness centrality) compared to tandem VI VR and simple VR, aligning with a contiguity-based account rather than response-reward correlation. Q-values for operant self-transitions were higher in VR and tandem VI VR than in VI and tandem VR VI.

Collectively, the results indicate: (1) habit corresponds to edge concentration on, and network centrality of, the operant; (2) VI-like time dependency reinforces transitions from other responses to the operant, strengthening habit; (3) alternatives dilute edge concentration and prevent habit; and (4) structural changes reduce average path lengths and computational costs.
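The three network features reported above (edges attached to the operant, betweenness centrality, average path length) can be computed for small unweighted graphs with a breadth-first search. The sketch below is illustrative only: the star and ring graphs are toy networks chosen for contrast, not networks from the study, and `bfs`, `metrics`, `star`, and `ring` are hypothetical helper names. NetworkX provides equivalent functions (`betweenness_centrality`, `average_shortest_path_length`); the standard-library version is used here to keep the example self-contained.

```python
from collections import deque
from itertools import combinations


def bfs(adj, source):
    """Distances and shortest-path counts from source (unweighted graph)."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma


def metrics(adj, node):
    """Degree and normalised betweenness of `node`, plus the average
    shortest-path length. Assumes a connected, undirected graph."""
    info = {s: bfs(adj, s) for s in adj}
    pairs = list(combinations(adj, 2))
    avg_len = sum(info[s][0][t] for s, t in pairs) / len(pairs)
    dist_n, sigma_n = info[node]
    between = 0.0
    for s, t in pairs:
        if node in (s, t):
            continue
        dist_s, sigma_s = info[s]
        if dist_s[node] + dist_n[t] == dist_s[t]:  # node lies on a shortest path
            between += sigma_s[node] * sigma_n[t] / sigma_s[t]
    n = len(adj)
    between /= (n - 1) * (n - 2) / 2  # fraction of the other node pairs
    return {"degree": len(adj[node]), "betweenness": between,
            "avg_path_length": avg_len}


def star(n):
    """Node 0 is a hub connected to every other node."""
    adj = {i: set() for i in range(n)}
    for leaf in range(1, n):
        adj[0].add(leaf)
        adj[leaf].add(0)
    return adj


def ring(n):
    """Every node is connected to its two neighbours on a cycle."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


hub_metrics = metrics(star(8), node=0)
ring_metrics = metrics(ring(8), node=0)
print("hub :", hub_metrics)
print("ring:", ring_metrics)
```

For the eight-node star, the hub has degree 7, normalised betweenness 1.0, and an average path length of 1.75; in the ring, node 0 has degree 2 and the average path length is 16/7 (about 2.29). The direction of the contrast mirrors the reported pattern: concentrating edges on one node raises its centrality and shortens paths overall.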

Discussion

The findings address the research question by showing that habit formation can be understood as a structural change in a single behavioral network rather than shifting control between two distinct systems. When edges concentrate onto a specific response, that response becomes central to shortest paths, making it frequently engaged even after devaluation (habit). This network perspective parsimoniously explains classic free-operant devaluation results: training and VI schedules promote habit by selectively reinforcing transitions from other responses to the operant, while explicit choice alternatives prevent edge monopolization. The model reconciles evidence that behaviors remain goal-directed in choice evaluation (reward-based selection) while habits emerge from network topology. It aligns with planning-based integrated models in multistage tasks, extending them to free-operant settings by situating planning in behavioral (response) space. The authors also relate the model to the roles of the dorsolateral and dorsomedial striatum (DLS/DMS), arguing that these are consistent with the notion that sequential response patterns and network transitions underlie habit development.
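The claim that a central response stays engaged after devaluation can be illustrated with a toy devaluation test. The sketch below rests on several assumptions that are not taken from the paper: the star and ring graphs are hypothetical networks, the current response is excluded from goal selection, breadth-first search stands in for Dijkstra's algorithm (equivalent when all edges have unit weight), and `shortest_paths` and `operant_share` are illustrative names. Reward values (1.0 for the operant, 0.001 otherwise, 0.0 after devaluation) follow the summary.

```python
import random
from collections import deque


def shortest_paths(adj, source, target):
    """All shortest paths from source to target in an unweighted graph."""
    dist, parents = {source: 0}, {source: []}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                parents[v] = []
                queue.append(v)
            if dist[v] == dist[u] + 1:
                parents[v].append(u)

    def expand(node):
        if node == source:
            return [[source]]
        return [path + [node] for p in parents[node] for path in expand(p)]

    return expand(target)


def operant_share(adj, rewards, operant, n_goals, rng):
    """Fraction of emitted responses that are the operant when goals are
    drawn in proportion to reward and reached along a random shortest path."""
    current = rng.choice(sorted(adj))
    emitted = []
    for _ in range(n_goals):
        options = [v for v in adj if v != current]  # assumed: no self-goal
        weights = [rewards[v] for v in options]
        goal = rng.choices(options, weights=weights, k=1)[0]
        path = rng.choice(shortest_paths(adj, current, goal))
        emitted.extend(path[1:])  # responses on the way, goal included
        current = goal
    return emitted.count(operant) / len(emitted)


def star(n):
    adj = {i: set() for i in range(n)}
    for leaf in range(1, n):
        adj[0].add(leaf)
        adj[leaf].add(0)
    return adj


def ring(n):
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


n, operant = 8, 0
baseline = {v: 0.001 for v in range(n)}
baseline[operant] = 1.0
devalued = dict(baseline)
devalued[operant] = 0.0  # the operant is never chosen as a goal any more

results = {}
for name, adj in (("hub", star(n)), ("ring", ring(n))):
    rng = random.Random(1)
    before = operant_share(adj, baseline, operant, 5000, rng)
    after = operant_share(adj, devalued, operant, 5000, rng)
    results[name] = (before, after)
    print(f"{name}: operant share before={before:.3f} after={after:.3f}")
```

In the hub network the operant is still emitted on about half of all responses after devaluation, because every path between other responses passes through it. In the ring its share drops to well under half of its baseline value. Goal selection is reward-sensitive in both cases; only the topology differs.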

Conclusion

Viewing behavior as a network of responses yields a unified account in which both goal-directed actions and habits arise from a single system via structural changes. Habit formation emerges when the operant response becomes a central hub, acquiring most incoming edges, thereby increasing its inclusion in shortest paths and resistance to devaluation. This framework reproduces major empirical phenomena in free-operant experiments and supports contiguity-based explanations over pure response-reward correlation. The model bridges psychological, neuroscientific, and ethological perspectives by incorporating broader behavioral structure and planning within behavioral space. Future research should integrate innate behavioral constraints (e.g., priors on edges), enable self-transitions and bout dynamics via alternative sequence-generation algorithms, and extend the approach to multistage decision tasks and spatial navigation. Advances in behavioral quantification will further validate network structures and parameterization in real organisms.

Limitations

The model currently: (1) omits innate constraints and predispositions among specific responses, which could be incorporated as priors on edge susceptibility; (2) cannot represent self-transitions or bout-and-pause patterns due to reliance on shortest-path algorithms, necessitating alternative sequence generation methods; (3) is validated only in free-operant tasks, not in multistage Markov decision tasks. Additionally, behavioral quantification challenges (e.g., node granularity, timescale for defining responses) limit direct mapping to real animal networks, though results were robust across network sizes and algorithms.
