Medicine and Health
Infant cries convey both stable and dynamic information about age and identity
M. Lockhart-Bouron, A. Anikin, et al.
This study shows that human infant cries encode information about age and identity: cries become more tonal as babies grow, and each infant carries a distinctive acoustic signature that remains stable while drifting predictably with age. Conducted by Marguerite Lockhart-Bouron and colleagues, the research also finds that neither algorithms nor trained adult listeners can reliably identify why an infant is crying from the sound alone.
~3 min • Beginner • English
Introduction
Human infant crying is an evolutionarily conserved signal that elicits caregiver attention and aid. Although cries are known to reflect distress level, it remains unclear whether they encode discrete causes (for example, hunger, isolation, or discomfort) and whether adults can reliably infer such causes from acoustics alone. Prior work and popular beliefs propose distinct cry types, but the scientific evidence is mixed, often confounded by small or heterogeneous datasets and a lack of control for infant identity and age. Alternatively, the graded cry hypothesis posits that cry acoustics vary with distress severity rather than context. A further complication is strong inter-individual variation in infant cries, which may serve identity signaling. The present study asks: Do infant cries carry robust information about sex, age, and identity? Do they encode cause (hunger, isolation, discomfort) universally or within individual infants? Can adults learn to classify cry causes? To address these questions, the study uses a large, longitudinal, naturalistic corpus paired with acoustic modeling, machine learning, and listener experiments.
Literature Review
Classical views posited discrete cry types tied to specific causes, with early reports that caregivers and nurses could identify them without context. More recent studies indicate limited discrimination between causes when distress levels are similar and show that experience influences categorization. Automated classification efforts have reported successes mainly distinguishing pain versus non-pain cries, but common-context differentiation (hunger, separation, discomfort) remains under-tested due to scarce, well-annotated datasets. Existing corpora often mix ages, include clinical/pathological cries, have unknown infant identities, or suffer from small sample sizes, risking overfitting and pseudo-replication. The graded distress hypothesis emphasizes that acoustic features (e.g., nonlinear phenomena, pitch spikes) scale with distress intensity rather than cause. Individual differences in cries are well documented; pitch is a key, stable indexical cue from infancy through adulthood, suggesting potential identity signatures. However, standardized, fine-grained analyses controlling for age and identity have been lacking, and sex differences are not expected pre-puberty.
Methodology
Design and ethics: Preregistered study (NCT03716882) approved by French ethics committees; informed consent obtained. Participants: 24 healthy term infants (10 girls, 14 boys) from 24 families near Saint-Etienne, France. Recordings: In-home continuous 48-hour sessions at four ages post-birth: ~15 days (0.5 m), 1.5 m, 2.5 m, 3.5 m. Due to availability and decreasing crying with age, recordings obtained for 17 infants at 0.5 m, 24 at 1.5 and 2.5 m, and 12 at 3.5 m. Equipment: Wildlife Acoustics Song Meter SM4 with omnidirectional mics, 44.1 kHz, 16-bit WAV, recorder positioned 1.2–2.1 m high and 1–4 m from infant. Total ~3600 hours recorded.
Parental questionnaire: For each crying bout, parents logged its onset, the perceived cause (hunger, isolation, discomfort, pain, unknown, other), the actions taken, and the action that stopped the cry. In 75% of cases, the perceived cause matched the effective action; when the two disagreed, the cry was labeled by the action that stopped it.
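A minimal sketch of this labeling rule in R (field names are hypothetical; the paper's actual data schema is not given in this summary):

```r
# Hypothetical labeling rule: when the parent's perceived cause and the
# action that effectively stopped the cry disagree, the action wins.
label_cause <- function(perceived, effective_action) {
  ifelse(!is.na(effective_action) & effective_action != perceived,
         effective_action,  # action that stopped the cry takes priority
         perceived)         # otherwise keep the perceived cause
}

label_cause("hunger", "isolation")  # -> "isolation"
label_cause("hunger", "hunger")     # -> "hunger"
```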
Signal processing and selection (six-step pipeline): (1) Manual extraction of cry sequences from 48-h sessions (Praat), linked to parental entries, yielding 3308 sequences. (2) Quality rating: excellent, acceptable, noisy; retain excellent/acceptable, totaling 676 sequences (mean 49 ± 74 s). (3) Manual cleaning to remove brief noise bursts. (4) Automatic segmentation into 78094 vocalizations with soundgen::segment (≥50 ms segments, ≥100 ms inter-segment gaps). (5) Inclusion criteria to retain cries and remove non-cry infant vocalizations: ≥20% voiced, median f0 >150 Hz, duration >250 ms, Wiener entropy <0.6, resulting in 44605 cries (estimated 2% false positives in validation sample). (6) Exclusion of rare/ambiguous contexts: remove pain (95 cries), other (76), and unknown (11.7% of cries), leaving three main causes: isolation 15609 (39.8%), hunger 13095 (33.4%), discomfort 10497 (26.8%), mean duration 0.86 ± 0.59 s (range 0.22–12 s). Final dataset for analysis: 39201 cries.
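A sketch of steps 4–5 in R, under stated assumptions: soundgen::segment's shortestSyl and shortestPause arguments (in ms) are taken to correspond to the reported segment and gap thresholds, and the inclusion criteria are applied to a table of per-vocalization summaries with illustrative column names:

```r
library(soundgen)  # v2.0.0, as used in the study

# Step 4: automatic segmentation (>=50 ms segments, >=100 ms gaps)
seg <- segment("sequence_001.wav",
               shortestSyl = 50, shortestPause = 100, plot = FALSE)

# Step 5: inclusion criteria to keep cries, drop non-cry vocalizations
# (d is an assumed data frame of per-vocalization acoustic summaries)
keep_cries <- function(d) {
  subset(d,
         voiced_prop    >= 0.20 &  # at least 20% voiced frames
         f0_median      > 150   &  # median pitch above 150 Hz
         duration_s     > 0.25  &  # longer than 250 ms
         wiener_entropy < 0.6)     # not too noise-like
}
```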
Acoustic features: Ten predictors were computed, primarily in soundgen 2.0.0, with jitter and shimmer in Praat: median pitch (f0), pitch IQR, voicing proportion, spectral centroid, Wiener entropy, harmonicity (HNR), jitter, shimmer, roughness (amplitude modulation in the 30–150 Hz band), and duration. Amplitude was not analyzed because microphone distance varied. For modeling by cause, features were z-normalized within each infant; for identity classification, they were also normalized across all observations.
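A sketch of the within-infant z-normalization in R, assuming a data frame cries holding the ten predictors and a baby identifier (column names are illustrative):

```r
# Ten acoustic predictors, named as in the model formula below
features <- c("duration", "entropy", "HNR", "jitter", "pitch_iqr",
              "pitch_median", "roughness", "shimmer",
              "spectralCentroid", "voiced")

z_score <- function(x) (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)

# z-normalize each predictor within each infant (for cause models)
cries_z <- do.call(rbind, lapply(split(cries, cries$baby), function(d) {
  d[features] <- lapply(d[features], z_score)
  d
}))
```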
Statistical modeling: Multivariate Bayesian mixed models (R, brms 2.17.0) assessed the effects of sex and age, and acoustic differences by cause and age: mvbind(duration, entropy, HNR, jitter, pitch_iqr, pitch_median, roughness, shimmer, spectralCentroid, voiced) ~ cause * age + (cause * age | baby) + (1 | sequence). Random slopes and intercepts per infant, plus random intercepts per sequence, controlled for repeated measures.
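A runnable sketch of this model in brms, using the data frame from the previous step; priors and sampler settings are not reported in this summary, so illustrative defaults are shown:

```r
library(brms)  # v2.17.0, as used in the study

fit <- brm(
  mvbind(duration, entropy, HNR, jitter, pitch_iqr, pitch_median,
         roughness, shimmer, spectralCentroid, voiced) ~
    cause * age + (cause * age | baby) + (1 | sequence),
  data = cries_z,
  chains = 4, cores = 4  # illustrative sampler settings
)
summary(fit)
```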
Machine learning: Random Forests (randomForest 4.7-1) using the 10 acoustic predictors; stratified by class; 2/3 training, 1/3 testing; training and testing from different recording sessions; 100 runs with different splits; reported median accuracy and 95% coverage intervals. Additional analyses included models with age as predictor, per-age training/testing, identity classification across 24 infants, and single-infant cause models with transfer testing to other infants.
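A sketch of one evaluation run in R, assuming cries_z also carries cause and session columns; class stratification and feature-importance extraction are omitted for brevity:

```r
library(randomForest)  # v4.7-1, as used in the study

run_once <- function(d) {
  # Split by recording session so training and testing cries
  # never come from the same session
  sessions <- unique(d$session)
  train_s  <- sample(sessions, round(length(sessions) * 2 / 3))
  train    <- d[d$session %in% train_s, ]
  test     <- d[!(d$session %in% train_s), ]
  rf <- randomForest(x = train[features], y = factor(train$cause))
  mean(predict(rf, test[features]) == test$cause)  # accuracy
}

acc <- replicate(100, run_once(cries_z))      # 100 random splits
median(acc); quantile(acc, c(0.025, 0.975))   # median + 95% coverage
```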
Visualization of acoustic similarity: Frame-level DTW distances (for features such as f0, HNR) with UMAP (uwot 0.1.11) to project high-dimensional distances into 2D spaces preserving global and within-cluster structure. Also computed pairwise infant distance matrices per context to test for shared coding strategies.
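A sketch of this projection in R, assuming the dtw package for pairwise frame-level distances over f0 contours (contour extraction is elided) and uwot for the 2-D embedding:

```r
library(dtw)   # dynamic time warping (assumed implementation)
library(uwot)  # v0.1.11, as used in the study

# `contours` is assumed: a list of per-cry f0 contours (numeric vectors)
n <- length(contours)
D <- matrix(0, n, n)
for (i in seq_len(n - 1)) {
  for (j in (i + 1):n) {
    D[i, j] <- D[j, i] <- dtw(contours[[i]], contours[[j]])$normalizedDistance
  }
}

# Project the pairwise DTW distances into two dimensions
xy <- umap(as.dist(D), n_neighbors = 15)
plot(xy, pch = 16, cex = 0.4, xlab = "UMAP-1", ylab = "UMAP-2")
```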
Perception experiments: Two online playback studies hosted on Labvanced with participants recruited via Prolific. Experiment 1 (implicit training): n=146 adults (36 mothers, 37 fathers, 38 non-mother women, 35 non-father men; mean age 28.0 ± 6.4 years). Participants rated the intensity of one infant's cries during training (sliders for hunger/discomfort/isolation), then performed 3-AFC cause classification on new cries from the same infant. Experiment 2 (explicit training): n=102 (26 mothers, 25 fathers, 24 non-mother women, 27 non-father men; mean age 27.1 ± 5.6 years). Participants received feedback on cause during training, then classified new cries (3-AFC). Stimuli: cries from 1.5-month-old male infants only (to avoid sex confounds), selecting infants with ≥36 cries per context, each ≥0.7 s long; 7 male infants were used, totaling 2430 cries (mean duration 1.34 ± 0.51 s). Analyses: Bayesian mixed models of trial-wise accuracy with fixed effects of cause, participant sex, parental status, and exposure count within session, and random effects of participant and infant.
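A sketch of the trial-wise accuracy model in brms, assuming a trial-level data frame with one row per classification trial (column names are illustrative):

```r
library(brms)

fit_listen <- brm(
  correct ~ cause + participant_sex + parental_status + exposure_n +
    (1 | participant) + (1 | infant),
  data   = trials,
  family = bernoulli()  # trial-wise accuracy coded 0/1
)
fixef(fit_listen)  # fixed effects on the log-odds scale
```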
Key Findings
Sex: No consistent acoustic differences between boys’ and girls’ cries at any age or across ages (multivariate Bayesian mixed models; UMAP showed overlapping distributions).
Age: Cries became more tonal and less shrill over the first 4 months. Per-month effects (z-units): HNR +0.17 SD [0.05, 0.29], entropy −0.42 SD [−0.55, −0.29], voicing +0.24 SD [0.12, 0.36], spectral centroid −0.19 SD [−0.34, −0.05], median pitch +0.11 SD [0.04, 0.17]. Classifying age group from a single cry achieved ~40% accuracy [35, 43] against a 25% chance level (four age groups), odds ratio (OR) ≈ 2.0 relative to chance.
Identity: A Random Forest classified infant identity among the 24 infants with ~28% accuracy [23, 31] from single cries across ages (chance ≈ 4.2%; OR ≈ 8.7 [6.8, 10.2]), with median pitch the top predictor, though all features contributed. Identity signatures were relatively stable but drifted with age: cross-age training and testing showed accuracy decreasing as the age gap widened.
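The odds ratios reported in these findings can be checked from accuracy and chance level, assuming the standard definition of an odds ratio against chance:

```r
# OR relative to chance: odds(accuracy) / odds(chance)
or_vs_chance <- function(acc, chance) {
  (acc / (1 - acc)) / (chance / (1 - chance))
}

or_vs_chance(0.40, 1 / 4)   # age group, 4 classes  -> 2.0
or_vs_chance(0.28, 1 / 24)  # identity, 24 infants  -> ~8.9
# (~8.9 vs the reported ~8.7 reflects rounding of the accuracy estimate)
```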
Cause of crying: Across all infants and ages, Random Forest achieved ~36% accuracy (OR ≈ 1.1 [1.0, 1.2]) for hunger vs isolation vs discomfort, near chance (33.3%). Using parental perceived cause vs action that stopped crying produced similar results (35% [33, 36] vs 36% [33, 38]). Including age, or training/testing within the same age group, did not raise accuracy (31–37%, OR 1.0–1.2). Single-infant models modestly improved median accuracy to 38% [17, 65] when trained and tested within the same infant (best OR ≈ 2.2 [0.9, 4.8]), with no generalization to other infants (34% [30, 54], OR 1.0 [0.9, 1.2]). Pairwise infant distance matrices by context were uncorrelated (max r ≈ .06), indicating no shared sub-group coding strategies.
Human listeners: Experiment 1 (implicit training): overall accuracy 34.8% [29.4, 40.2], with no advantage for parents over non-parents (−0.9% [−4.4, 2.7]) and a minimal sex effect (women +3.6% [−0.2, 7.3]). Experiment 2 (explicit training with feedback): overall accuracy 35.4% [31.7, 39.1], no parental-status effect (+1.0% [−2.5, 4.6]), and a slight female advantage (+4.0% [0.3, 7.6]), likely related to higher reported infant-care experience among non-parent women; exposure count had a negligible effect (OR 1.01 [1.00, 1.02]).
Discussion
The findings indicate that infant cries do not encode sex but do convey age-related changes and robust individual identity signatures that drift predictably during early development. Despite substantial data and advanced modeling, cries produced during hunger, isolation, and discomfort lack consistent, universal acoustic differences, supporting the graded distress hypothesis rather than discrete context-specific cry types. The stable yet dynamically evolving identity signatures may facilitate caregiver recognition and tracking of an infant’s state against a familiar acoustic baseline, potentially serving adaptive functions in caregiver–infant communication. The inability of both machine learning models and trained adult listeners to infer cause from brief cries underscores that contextual or multimodal cues (visual, temporal routines, caregiving history) are likely critical for real-world cause inference, with pain possibly an exception (too rare here to test).
Conclusion
Human infant cries provide reliable indexical information about the caller's identity and developmental stage but do not form a universal discrete code for common causes such as hunger, isolation, or discomfort. Identity signatures are strong and shift predictably with age, while age-related acoustic changes reflect increasing tonality and decreasing spectral noisiness. Adults, including parents, cannot reliably decode the cause of crying from short cry segments, even after brief training. Future work should examine longer cry bouts and their temporal dynamics, include older ages and rarer causes (e.g., pain), leverage multimodal and contextual information, and test whether extended training or richer acoustic descriptors (e.g., pitch contours) improve cause decoding.
Limitations
Cause labels were inferred from parental reports and the action that stopped crying, which may include biases despite a 75% match rate between perceived cause and effective action. Rare causes (especially pain) were underrepresented (0.2%) and excluded, limiting generalization to high-distress contexts. Analysis focused on short, noise-free cry segments extracted from longer bouts; temporal organization and prosodic dynamics over longer sequences were not assessed. The number of infants recorded at 3.5 months was smaller due to reduced crying frequency with age, potentially reducing power at that age. Acoustic amplitude was not analyzed due to variable microphone distance. Listener experiments used cries from 1.5-month-old infants only and involved brief training; performance with older infants, longer exposure, or longer cry sequences may differ.