Biology
Social networks predict the life and death of honey bees
B. Wild, D. M. Dormagen, et al.
The study addresses how individual roles and task allocation within complex animal societies, specifically honey bee colonies, are reflected in social interaction networks. While honey bee workers typically exhibit temporal polyethism—shifting from brood care when young to foraging when older—there is substantial variability in individual developmental trajectories driven by genetics, physiology, colony state, and environmental conditions. Existing studies often focus on specific interactions or roles and may disturb the system or rely on limited tracking, leaving open the challenge of measuring comprehensive, longitudinal social networks without interference. The authors aim to develop a scalable, noninvasive, low-dimensional descriptor of an individual’s position within the colony’s social network—termed network age—that captures task allocation, predicts behavior and survival, and reveals developmental pathways.
Prior work has used social network descriptors to study pair bonding, leadership, cultural diffusion, and collective behavior across taxa. In social insects, division of labor has been hypothesized to emerge from interactions, but comprehensive links between network structure and roles remain underexplored. Honey bee task allocation follows age-related polyethism but varies due to internal factors (genotype, ovary size, sucrose responsiveness), colony state (brood and food stores), and environment (season, resource availability). Spatial organization of tasks within the nest further couples behavior with location. Automated tracking has renewed interest in colony-wide interaction changes and spatial predictors of tasks, yet most studies track only subsets of individuals or short intervals. There is a need for methods capturing multiple interaction modes over entire lifetimes to understand stability and transitions in task allocation.
Study system and tracking: A full colony of individually tagged Apis mellifera in a two-sided single-frame observation hive was recorded at 3 Hz for 25 consecutive days (1–25 Aug 2016), with tracking extending from 24 Jul to 19 Sep 2016. Bees had outside access via a tube. A total of 3166 bees were tagged; 1920 individuals from 30 cohorts (ages 0 days to ~8 weeks) were recorded during the focal period. Video capture (two cameras per side) yielded 46 TB of data, processed to decode tags and track positions/rotations over time. Homographies aligned image coordinates to comb coordinates. Low-confidence decodes and implausible detections were filtered; Otsu’s method separated real from spurious IDs.
Nest map and task descriptor: For each day, the capped brood and honey storage areas were annotated from background images; open brood was inferred as areas that became capped within 8 days. The dance floor was estimated from high-confidence waggle run detections and fit by an ellipse; an exit zone was defined as a 7.5 cm region around the tube. For each bee, one high-confidence detection per minute was sampled, and the fractions of detections within the brood, honey storage, dance floor, and near-exit regions were computed to form a per-day task-location descriptor.
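The per-day task-location descriptor can be illustrated with a minimal sketch. The region names, the `task_location_descriptor` helper, and the choice to keep detections outside all four regions in the denominator are assumptions made for illustration; the source does not specify these details.

```python
from collections import Counter

# Hypothetical region labels; the source names the four areas but not identifiers.
REGIONS = ("brood", "honey_storage", "dance_floor", "near_exit")


def task_location_descriptor(region_labels):
    """Return the fraction of sampled detections that fall in each region.

    region_labels holds one label per sampled detection (e.g. one per minute).
    Labels outside REGIONS (such as "other") count toward the total only,
    which is an assumption of this sketch.
    """
    total = len(region_labels)
    if total == 0:
        return {region: 0.0 for region in REGIONS}
    counts = Counter(region_labels)
    return {region: counts[region] / total for region in REGIONS}


# Made-up day for one bee: 10 sampled detections.
day = ["brood"] * 6 + ["honey_storage"] * 3 + ["other"]
descriptor = task_location_descriptor(day)
```

Here `descriptor` holds four fractions for one bee on one day; stacking these vectors across bees would give the kind of target used to guide the embedding.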
Interaction networks: Multiple daily, undirected, weighted networks were constructed and aggregated over 24 h: (1) proximity contacts (tags <2 cm apart for ≥0.9 s), (2) Euclidean proximity similarity networks based on daily mean pairwise distances (Gaussian similarity and max-distance transforms), (3) trophallaxis (food exchange) events detected via a two-step classifier: initial logistic regression on distances/orientations followed by a CNN on 5 s trajectories; combined filters achieved 60% recall at 47% precision while discarding 99.97% of negatives on an unbiased test set, and (4) interaction effect networks based on pre/post-contact changes in movement speed (four matrices capturing positive/negative mean and cumulative changes for interactions ≤60 s separated by ≥5 s). Affinity matrices were rank-transformed and normalized to [0,1].
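The final normalization step (rank-transforming an affinity matrix and scaling it to [0,1]) might look like the following sketch. It assumes a symmetric matrix, ranks each undirected pair once, and breaks ties by position; the source does not say how ties or the diagonal were handled, so those choices and the `rank_normalize` name are illustrative only.

```python
import numpy as np


def rank_normalize(affinity):
    """Rank-transform the off-diagonal entries of a symmetric matrix to [0, 1]."""
    affinity = np.asarray(affinity, dtype=float)
    n = affinity.shape[0]
    iu = np.triu_indices(n, k=1)  # each undirected pair exactly once
    values = affinity[iu]
    # Ordinal ranks 0..m-1; ties are broken by position (an assumption).
    ranks = np.empty(len(values))
    ranks[np.argsort(values, kind="stable")] = np.arange(len(values))
    scaled = ranks / max(len(values) - 1, 1)
    out = np.zeros_like(affinity)
    out[iu] = scaled
    return out + out.T  # restore symmetry; the diagonal stays zero


# Made-up contact counts between three bees.
counts = [[0, 5, 1],
          [5, 0, 20],
          [1, 20, 0]]
normalized = rank_normalize(counts)
```

Working on ranks rather than raw weights makes interaction modes with very different scales (contact counts, distances, speed changes) comparable before they are embedded.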
Network age computation: For each day and interaction mode, spectral embeddings (first eight dimensions) were computed (bispectral decomposition for non-symmetric effect matrices). Signs of eigenvectors were aligned across days by Spearman correlation. For each day, embeddings across interaction types were concatenated to yield a per-bee feature vector. Canonical correlation analysis (CCA) learned a linear mapping from embeddings to a three-dimensional space maximizing correlation with a projection of the task-location descriptor; the first CCA dimension was defined as network age. Robust scaling mapped the daily distribution so that the 5th percentile equaled 0 and the 95th percentile equaled 40, aligning directionality with biological age. An unsupervised variant replaced CCA with PCA on concatenated spectral embeddings.
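A simplified sketch of this pipeline, under several stated assumptions, is given below. It uses the symmetric normalized Laplacian for the spectral embedding (the source does not name the variant), omits the sign alignment across days, and replaces the three-dimensional CCA with an ordinary least-squares fit to a one-dimensional task score; for a single target variable the least-squares direction coincides with the first canonical direction. All function names and the synthetic data are hypothetical.

```python
import numpy as np


def spectral_embedding(affinity, n_dims):
    """Embed nodes using eigenvectors of the symmetric normalized Laplacian."""
    a = np.asarray(affinity, dtype=float)
    deg = a.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    lap = np.eye(len(a)) - d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)      # eigenvalues in ascending order
    return vecs[:, 1:n_dims + 1]       # skip the trivial first eigenvector


def robust_scale(x, lo=5, hi=95, target=40.0):
    """Map the lo-th percentile to 0 and the hi-th percentile to target."""
    p_lo, p_hi = np.percentile(x, [lo, hi])
    return (x - p_lo) / (p_hi - p_lo) * target


def network_age_sketch(affinities, task_score, n_dims=8):
    """Concatenate embeddings, project onto the task score, then rescale."""
    feats = np.hstack([spectral_embedding(a, n_dims) for a in affinities])
    feats = feats - feats.mean(axis=0)
    y = task_score - task_score.mean()
    # Least squares stands in for CCA here (valid for a 1-D target only).
    w, *_ = np.linalg.lstsq(feats, y, rcond=None)
    return robust_scale(feats @ w)


# Synthetic colony: two groups with stronger within-group affinity.
rng = np.random.default_rng(0)
n = 60
group = np.repeat([0, 1], n // 2)
same = group[:, None] == group[None, :]


def random_affinity():
    a = rng.random((n, n)) * np.where(same, 1.0, 0.2)
    a = (a + a.T) / 2
    np.fill_diagonal(a, 0.0)
    return a


affinities = [random_affinity() for _ in range(2)]   # two interaction modes
task_score = group + 0.1 * rng.standard_normal(n)    # made-up 1-D task label
age = network_age_sketch(affinities, task_score)
```

Because the projection is fit against the task score, the resulting values increase with it, and the percentile-based rescaling puts each day on the same 0 to 40 scale regardless of the raw spread.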
Task prediction and statistics: Models (GLM or small neural network) were trained to predict either individual location fractions (binomial likelihood) or the full distribution (multinomial) using biological age or network age as predictors. Model performance was evaluated via McFadden’s pseudo R^2 with 128 bootstrap samples; effect sizes were compared by bootstrapped CIs and likelihood ratio χ^2 tests for nested models. Repeatability of network age across days for individuals was computed as R = Var_r/(Var_r + Var_w). Developmental trajectory clustering used hierarchical clustering (Ward’s method) on daily network age time series (missing values linearly interpolated) with three clusters chosen to separate early and late transitions. The onset of bimodality was quantified by daily K-means clustering of network age into two modes and identifying the biological age when individuals first joined the higher mode.
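McFadden's pseudo R^2 compares the log-likelihood of a fitted model with that of an intercept-only model: R^2 = 1 − ln L_model / ln L_null. The sketch below computes it for binary outcomes, a simplification of the binomial and multinomial likelihoods described above; the helper names and example values are made up.

```python
import math


def bernoulli_log_likelihood(y, p):
    """Sum of log-likelihoods of binary outcomes y under probabilities p."""
    eps = 1e-12  # guards against log(0)
    return sum(
        yi * math.log(max(pi, eps)) + (1 - yi) * math.log(max(1 - pi, eps))
        for yi, pi in zip(y, p)
    )


def mcfadden_r2(y, p_model):
    """McFadden's pseudo R^2 relative to an intercept-only (base-rate) model."""
    base_rate = sum(y) / len(y)
    ll_null = bernoulli_log_likelihood(y, [base_rate] * len(y))
    ll_model = bernoulli_log_likelihood(y, p_model)
    return 1.0 - ll_model / ll_null


# Made-up outcomes and predicted probabilities.
y = [0, 0, 0, 1, 1, 1]
r2_informative = mcfadden_r2(y, [0.1, 0.2, 0.3, 0.7, 0.8, 0.9])
r2_uninformative = mcfadden_r2(y, [0.5] * 6)
```

A model that only reproduces the base rate scores 0, and values approach 1 as predicted probabilities concentrate on the observed outcomes; bootstrapping this statistic over resampled data would give confidence intervals of the kind reported in the results.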
Mortality and behavior measures: Death dates were inferred via a Bayesian changepoint model on normalized daily detection counts with weakly informative priors and sampling by PyMC3 NUTS. Additional behavioral measures included daily and nightly average speed, circadian rhythmicity (Lomb-Scargle; summarized via sine fit power), and time of peak activity. Predictive models for these measures used identity link and Gaussian residuals.
Targeted embeddings: The CCA step was repurposed to optimize for specific outcomes (e.g., days until death, velocities, circadian metrics) by maximizing correlation with the target variable.
Future predictability: To predict future task allocation, spectral embeddings were computed per day; CCA mappings and regression parameters were trained on a sliding 12-day window preceding the validation window and applied to held-out future days (1–11 days ahead). A paired binomial test assessed whether network age improved mean-squared error over biological age when predicting 7 days ahead.
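The sliding-window evaluation can be sketched as an index-generation step. The exact windowing scheme is not given in the source, so the function below, its name, and its treatment of day indices are assumptions: each split pairs a 12-day training window with the target days that lie 1 to 11 days after the last training day.

```python
def sliding_windows(n_days, train_len=12, max_horizon=11):
    """Return (train_days, {horizon: target_day}) for each window position.

    Days are 0-based indices. A window is only emitted when every horizon
    up to max_horizon still falls inside the recording (an assumption).
    """
    splits = []
    for start in range(0, n_days - train_len - max_horizon + 1):
        train_days = list(range(start, start + train_len))
        last_train = train_days[-1]
        targets = {h: last_train + h for h in range(1, max_horizon + 1)}
        splits.append((train_days, targets))
    return splits


# With a hypothetical 25-day recording, only a few windows fit.
splits = sliding_windows(25)
first_train, first_targets = splits[0]
```

Keeping the training window strictly before every target day ensures that the mapping and regression parameters never see data from the days they are asked to predict.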
Forager validation: A feeder experiment (28 Jul–22 Aug 2016) trained foragers to sucrose stations (F1–F4). Bees landing at the feeder were photographed and identified, producing a set of N=40 known foragers for age comparisons.
- Network age accurately captures task allocation and spatial preferences, outperforming biological age. In multinomial regression, network age achieved median McFadden’s R^2 = 0.682 (95% CI [0.678, 0.687]) vs biological age 0.342 (95% CI [0.335, 0.349]); effect significant (likelihood ratio χ^2, p < 0.001, N = 26,403; N = 128 bootstraps).
- Known foragers (N = 40) had high network ages, and the variance of their network ages was significantly lower than that of their biological ages (Levene’s test on z-transformed values, p < 0.001, N = 200). A 12-day-old forager had network age 25.5 (z = 0.61) despite low biological age (z = −0.46), indicating that network age reflects role better than chronological age does.
- Subsampling robustness: With only 1% of bees tracked, network age still predicted tasks well (median R^2_MCF = 0.516, 95% CI [0.135, 0.705]); at 5% tracking, performance approached full-colony levels (median 0.650, 95% CI [0.578, 0.705]). An unsupervised PCA variant achieved median R^2_MCF = 0.646 (95% CI [0.641, 0.650]).
- Population-level developmental bifurcation: After ~6 biological days, network age distributions became bimodal. Bees with high network age spent most time on the dance floor (foragers), while low network age bees were predominantly in honey storage; transitions from high to low were rare.
- Distinct developmental trajectories within same-aged cohorts: Clustering of network age time series revealed at least three patterns—early transition to high network age (~11 days), later transition (~21 days), and sustained low network age through the focal period—consistent across cohorts.
- Individual transitions are gradual and tasks are stable: Spatial distributions shifted smoothly over several days during transitions; network age was highly repeatable across days (median R = 0.612, 95% CI [0.199, 0.982]).
- Future prediction: Network age predicted task allocation up to 10 days ahead. Predicting task 7 days in the future using current network age yielded better performance than biological age predicting current tasks (paired binomial test, p < 0.001, N = 55,390; median improvement in R^2 = 0.080, 95% CI [0.055, 0.090], N = 12 time windows).
- Mortality prediction: Network age predicted impending death better than biological age (median R^2 = 0.165, 95% CI [0.158, 0.172] vs 0.064, 95% CI [0.059, 0.068]; p < 0.001). Biologically young but network-old bees had a 7-day mortality of 80.6% (N = 139) vs 42.1% (N = 390) for biologically old but network-young bees (χ^2 test p < 0.001, N = 529), consistent with risks of precocious foraging.
- Movement and circadian behavior: Network age outperformed biological age in predicting daytime/nighttime speeds, circadian rhythm power, and time of peak activity (likelihood ratio χ^2 tests p < 0.001, N = 26,403 for all except where noted).
- Targeted embeddings: Replacing the task descriptor with specific targets in CCA improved corresponding predictions. A mortality-optimized embedding improved death-date prediction by 31% over standard network age (median ΔR^2 = 0.05, 95% CI [0.04, 0.06], N = 128); similar gains were found for velocity and circadian measures (except time of peak activity).
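The mortality comparison above can be checked with a short worked example. The death counts below (112 of 139 and 164 of 390) are reconstructed from the reported percentages and group sizes, not taken from the source, and the test is an uncorrected Pearson chi-square; the source does not state whether a continuity correction was applied.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Reconstructed counts (assumption): round(0.806 * 139) and round(0.421 * 390).
died_a, n_a = 112, 139   # biologically young, network-old
died_b, n_b = 164, 390   # biologically old, network-young

stat, p_value = chi2_2x2(died_a, n_a - died_a, died_b, n_b - died_b)
mortality_a = died_a / n_a
mortality_b = died_b / n_b
```

Under these reconstructed counts the p-value falls far below 0.001, in line with the reported test result.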
The results show that the multimodal social interaction network of a honey bee colony encodes rich information about individual roles and states. By extracting a low-dimensional descriptor (network age) guided by spatial task labels yet constrained to information inherent in the social network, the study demonstrates that social interactions alone capture task allocation, developmental progression, mortality risk, and movement patterns. Network age distinguishes task groups (e.g., nurses vs foragers) and reveals colony-level heterogeneity in developmental trajectories, while transitions at the individual level are gradual and roles are stable over days. Comparisons to models based solely on spatial location indicate that social networks contain predictive information beyond location. The unsupervised PCA variant remains predictive, suggesting location signals dominate but do not exhaust the information content. The targeted embedding framework further illustrates that specific behavioral or physiological outcomes (e.g., mortality risk) can be extracted from social networks. These findings underscore the value of continuous, whole-colony, multi-interaction tracking for understanding how social organization emerges and is maintained, and they suggest broad applicability to other complex social systems.
This work introduces network age, a scalable, noninvasive, one-dimensional descriptor derived from multimodal social interaction networks that accurately captures and predicts honey bee task allocation, developmental pathways, mortality risk, and movement patterns. Network age outperforms biological age across tasks, generalizes under subsampling, and can be adapted via targeted embeddings to address specific questions (e.g., mortality, circadian activity). The method opens avenues for real-time monitoring and manipulation (e.g., selective removal at transition points), testing environmental or disease impacts on development, and linking internal physiological mechanisms (e.g., juvenile hormone, vitellogenin regulation) to emergent social structure. Future research could extend to multi-colony comparisons, real-time computation, inclusion of intraday dynamics and additional interaction types, and applications in other animal societies and contexts such as disease transmission and environmental stressors.
- Single-colony study: Although thousands of individuals and many overlapping cohorts were analyzed, results (e.g., timing of transitions) may vary with environmental conditions and colony idiosyncrasies. Cross-colony embeddings are nontrivial due to non-overlapping individuals, necessitating within-trial control comparisons for treatment effects.
- Interaction coverage and temporal aggregation: Only a subset of possible interaction modalities was captured, and daily aggregation may miss informative intraday dynamics; including more behaviors and finer temporal resolution may enhance discrimination and prediction.
- Dependence on spatial guidance: The primary network age used CCA guided by annotated task-associated locations; while the method extracts only information present in the networks, guidance may bias the extracted dimension toward spatially coupled behavior. The PCA variant mitigates this but is slightly less predictive.
- Tracking constraints: Despite robust pipelines, occlusions, decoding errors, and visibility gaps can affect data completeness; mortality estimation required a Bayesian model due to imperfect detections.
- Generalization and standardization: There is no straightforward common embedding across colonies that share no individuals; external validation across seasons, genotypes, or stressors would strengthen generality.