Uncovering developmental time and tempo using deep learning
N. Toulany, H. Morales-Navarrete, et al.
An automated deep learning approach, developed by Nikan Toulany and colleagues, that uses Twin Networks to analyze embryonic development. The method stages embryos accurately, quantifies temperature-dependent developmental tempo, detects developmental abnormalities, and supports the construction of staging atlases across species.
Introduction
Embryogenesis progresses through conserved stages (cleavage, blastula, gastrula, organogenesis, segmentation, larval), but morphological transitions are continuous and variable across embryos. Classical staging atlases, derived from manual microscopy, rely on idealized images and assumed correspondence between stage and absolute time, which is often violated by smooth transitions, overlapping phenotypes, imaging variability, embryo orientation, and noise. Environmental factors (for example, temperature) further decouple developmental stage from absolute time, complicating objective staging. Existing computer-driven methods, especially supervised machine learning, require extensive labeled datasets and predefined classes and are not easily generalizable. The study aims to develop an automated, unbiased approach that captures smooth morphological dynamics by computing similarities between embryo images over time to infer developmental age, tempo, variability, and stages without manual annotations.
Literature Review
Foundational developmental staging work spans multiple species (for example, zebrafish, medaka, chick, human, Drosophila, Hydra, mouse, C. elegans). Manual or semi-automated staging is labor-intensive and prone to observer bias. Prior computational approaches include supervised phenotype classification and automated embryo staging, both of which need large annotated datasets and predefined classes, limiting detection of nuanced, time-dependent features. Temperature scaling of developmental rates has been studied using Arrhenius and van’t Hoff frameworks, with reports of species-specific ranges and deviations at thermal extremes. Recent machine learning applications in embryo phenotyping and tracking demonstrate feasibility but still rely on class labels or registration methods (for example, vector diffusion maps) with varying precision.
Methodology
Data acquisition and preprocessing: A high-throughput imaging pipeline generated >2 million raw images of zebrafish (each image with 1–30 embryos), yielding >3 million embryo segments after segmentation and quality control from >15,000 embryos across the first day of development. For temperature studies, zebrafish (23.5–35.5 °C) and medaka (18–36 °C) embryos were imaged every 2–5 min for ~24 h; datasets included 100–200 zebrafish embryos or 20–100 medaka embryos per temperature. Additional open-source datasets were used for drug perturbations and for medaka, stickleback, and C. elegans.
Segmentation: An SSD ResNet101 v1-FPN model (pretrained on COCO; TensorFlow 2.2) was trained on 877 annotated images and evaluated on 36 test images (230 embryos), achieving a positive predictive value of 99% for embryo detection. Embryo segments were extracted and time-indexed.
Twin Network (Siamese) architecture: A ResNet50 backbone (ImageNet pretrained) with a custom head (three dense layers, batch normalization, 256-D embeddings) formed the embedding model. For transfer learning, all ResNet50 layers were frozen except conv block 5 and the head. Training used triplet loss with image triplets (anchor, positive, negative) to minimize anchor–positive and maximize anchor–negative embedding distances; margin α enforced separation. Two zebrafish models were trained (300k triplets ×10 epochs; 1M triplets ×2 epochs). Temperature models: zebrafish (1M triplets ×40 epochs), medaka (100k triplets ×70 epochs). Other species: medaka, stickleback, C. elegans trained with 150k/150k/100k triplets (30 epochs). Training used NVIDIA RTX 3070/3090 GPUs (zebrafish ~18 h; medaka ~12 h; stickleback ~10 h; C. elegans ~2 h).
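The triplet objective described above can be illustrated with a minimal sketch. It assumes plain Euclidean distance between embeddings and a margin of 0.2; neither the distance variant nor the value of α is stated in this summary, and the function names are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss for a single (anchor, positive, negative) triplet.

    The loss is zero once the negative is at least `margin` farther from the
    anchor than the positive is. The margin value here is an assumption.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)


# A well-separated triplet contributes no loss; a violated one is penalized.
print(triplet_loss([0.0, 0.0], [0.0, 0.1], [3.0, 4.0]))  # separated triplet
print(triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 0.1]))  # violated triplet
```

In training, this quantity would be averaged over a batch of triplets and minimized with respect to the embedding model's weights.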
Similarity computation: Embeddings for image pairs were compared using cosine similarity (ρ = a·b / (||a|| ||b||)). Similarity profiles were generated by comparing a test image against time-ordered reference sequences or against earlier frames of the same embryo (self-similarity).
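As a minimal sketch of the similarity computation, the following implements the cosine formula given above and builds a similarity profile against a time-ordered reference. The function names and the list-based embedding representation are illustrative assumptions.

```python
import math


def cosine_similarity(a, b):
    """rho = a.b / (||a|| ||b||) for two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def similarity_profile(test_embedding, reference_embeddings):
    """Similarity of one test embedding to each frame of a time-ordered reference."""
    return [cosine_similarity(test_embedding, ref) for ref in reference_embeddings]


# Toy 2-D embeddings standing in for the 256-D vectors described above.
reference = [(1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(similarity_profile((1.0, 1.0), reference))
```

Passing the earlier frames of the same embryo as `reference_embeddings` gives the self-similarity variant.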
Age and trajectory estimation: For each test image, similarities were computed against n images from ten randomly chosen reference embryos with known acquisition times; the predicted age is the time of maximum similarity. Trajectories were constructed by repeating this for all frames of a test embryo.
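The age-estimation step can be sketched as an argmax over a similarity profile. How the ten reference embryos are combined is not spelled out in this summary, so the median across references used below is an assumption, as are the function names and the (time, embedding) data layout.

```python
import math
from statistics import median


def cosine_similarity(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def predict_age(test_embedding, reference):
    """Return the acquisition time of the most similar reference frame.

    `reference` is a list of (time, embedding) pairs for one reference embryo.
    """
    best_time, _ = max(
        ((t, cosine_similarity(test_embedding, emb)) for t, emb in reference),
        key=lambda pair: pair[1],
    )
    return best_time


def predict_age_consensus(test_embedding, references):
    """Median of per-reference predictions (the aggregation rule is an assumption)."""
    return median(predict_age(test_embedding, ref) for ref in references)


# Toy references: the embedding rotates slowly with time (minutes).
refs = [
    [(t, (math.cos(t / 100), math.sin(t / 100))) for t in range(0, 60, 10)]
    for _ in range(3)
]
probe = (math.cos(0.2), math.sin(0.2))
print(predict_age_consensus(probe, refs))
```

Applying `predict_age_consensus` to every frame of a test embryo, in acquisition order, yields the predicted-age trajectory.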
Temperature-dependent tempo: Developmental age vs experimental time was fitted via RANSAC linear models to estimate growth rate (slope). Arrhenius analysis: ln(g) vs 1/T linear fits yielded apparent activation energies (Ea); 99.99% confidence intervals were bootstrapped (100 samples).
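A minimal sketch of the Arrhenius step follows. It fits ln(g) against 1/T by ordinary least squares and reads the apparent activation energy off the slope (slope = -Ea/R). Ordinary least squares stands in for the full pipeline here: the RANSAC fitting of growth rates and the bootstrap are not reproduced, and the synthetic rates, the prefactor, and the function names are assumptions made for illustration.

```python
import math

R = 8.314462618  # molar gas constant, J mol^-1 K^-1


def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def activation_energy(temps_celsius, rates):
    """Apparent Ea in J/mol from growth rates measured at several temperatures."""
    inv_kelvin = [1.0 / (t + 273.15) for t in temps_celsius]
    log_rates = [math.log(g) for g in rates]
    slope, _ = linear_fit(inv_kelvin, log_rates)
    return -slope * R


def q10(ea_joule_per_mol, t_celsius):
    """Fold change in rate between t and t + 10 degrees C under ideal Arrhenius scaling."""
    t1 = t_celsius + 273.15
    t2 = t1 + 10.0
    return math.exp(ea_joule_per_mol / R * (1.0 / t1 - 1.0 / t2))


# Synthetic rates generated from Ea = 65 kJ/mol, then recovered by the fit.
temps = [24.0, 26.0, 28.0, 30.0, 32.0]
rates = [1e11 * math.exp(-65000.0 / (R * (t + 273.15))) for t in temps]
print(activation_energy(temps, rates) / 1000.0, "kJ/mol")
print(q10(65000.0, 23.5))
```

Under ideal Arrhenius scaling, an Ea of about 65 kJ/mol corresponds to a Q10 somewhat above 2 in this temperature range, in line with the approximately twofold change per 10 °C reported in the findings.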
Variability and abnormality detection: Among sibling embryos at the same nominal age, all-by-all similarities were computed at each timepoint; per-embryo mean similarity indices and z-scores (both instantaneous and cumulative) flagged deviating embryos. A normal range was defined from aphenotypic trajectories; embryos falling outside were considered abnormal.
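The instantaneous part of this analysis can be sketched for a single timepoint as follows. The z-score cutoff of -2 and the function names are assumptions; the summary does not state the threshold used, and the cumulative z-score variant is not shown.

```python
import math
from statistics import mean, pstdev


def cosine_similarity(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mean_similarity_indices(embeddings):
    """Per-embryo mean similarity to all siblings at one timepoint."""
    n = len(embeddings)
    return [
        mean(cosine_similarity(embeddings[i], embeddings[j]) for j in range(n) if j != i)
        for i in range(n)
    ]


def z_scores(values):
    """Standard scores; all zeros if the values do not vary."""
    mu, sd = mean(values), pstdev(values)
    if sd == 0.0:
        return [0.0] * len(values)
    return [(v - mu) / sd for v in values]


def flag_deviating(embeddings, z_threshold=-2.0):
    """Indices of embryos whose mean similarity is unusually low (threshold assumed)."""
    scores = z_scores(mean_similarity_indices(embeddings))
    return [i for i, z in enumerate(scores) if z < z_threshold]


# Nine similar siblings plus one embryo with a very different embedding.
siblings = [(1.0, 0.01 * i) for i in range(9)] + [(0.0, 1.0)]
print(flag_deviating(siblings))
```

Because the mean and standard deviation include the outlier itself, small batches cap how extreme a z-score can get, so the cutoff would need tuning to cohort size.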
Drug-induced phenotypes: Similarity of treated embryos (BMP, PCP, FGF, Shh, Nodal, Wnt inhibitors; RA exposure) to untreated references was computed over time; at each timepoint, one-sided Mann–Whitney U tests (P<0.01 threshold; no multiple-comparison correction) assessed significance. A group was labeled abnormal if ≥30% of frames differed significantly. Detection accuracy vs cohort size was evaluated by subsampling (3–44 embryos) repeatedly. BMP dose-response classes (C2–C5; bmp2b mutant) probed sensitivity.
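The per-timepoint testing and the 30% rule can be sketched as below. The one-sided Mann–Whitney U test is implemented with a normal approximation (continuity correction, no tie correction) so that the example needs only the standard library; in practice a statistics library such as SciPy (`scipy.stats.mannwhitneyu`) would be the more usual choice. The input layout, one list of similarity values per group per timepoint, and the function names are assumptions.

```python
import math


def mann_whitney_less(x, y):
    """Approximate one-sided p-value for H1: values in x tend to be lower than in y."""
    n1, n2 = len(x), len(y)
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average 1-based rank for tied values
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mu + 0.5) / sigma
    return 0.5 * math.erfc(-z / math.sqrt(2))  # standard normal CDF at z


def group_is_abnormal(treated_by_time, control_by_time, alpha=0.01, min_fraction=0.30):
    """True if at least `min_fraction` of timepoints differ significantly."""
    significant = sum(
        mann_whitney_less(treated, control) < alpha
        for treated, control in zip(treated_by_time, control_by_time)
    )
    return significant / len(treated_by_time) >= min_fraction


# Toy data: the treated group is clearly less similar at 3 of 10 timepoints.
control = [0.90 + 0.001 * i for i in range(20)]
low = [0.50 + 0.001 * i for i in range(20)]
print(group_is_abnormal([low] * 3 + [control] * 7, [control] * 10))
```

As in the described analysis, no multiple-comparison correction is applied across timepoints.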
Automatic epoch detection and staging atlases: Self-similarity matrices (cosine similarities between a frame and all earlier frames) revealed local high-similarity plateaus (autostages). Noise masking thresholds were derived from similarity histograms; boundaries in inverse diagonal sums were identified via peak finding (scipy.signal.find_peaks). This was applied to zebrafish, medaka, stickleback, and early C. elegans cleavage stages.
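To make the idea of plateaus and boundaries concrete, here is a deliberately simplified sketch. It builds the self-similarity matrix (each frame against itself and all earlier frames) and then marks a boundary wherever the drop in similarity to the preceding frame is a local maximum. This is a stand-in, not the procedure described above: the histogram-based noise masking and the inverse-diagonal-sum signal are not reproduced, a hand-written local-maximum search replaces `scipy.signal.find_peaks`, and the `min_height` value and function names are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def self_similarity_matrix(embeddings):
    """Lower-triangular matrix: row i holds similarities of frame i to frames 0..i."""
    return [
        [cosine_similarity(embeddings[i], embeddings[j]) for j in range(i + 1)]
        for i in range(len(embeddings))
    ]


def epoch_boundaries(matrix, min_height=0.05):
    """Frame indices where the dissimilarity to the previous frame peaks locally."""
    # novelty[i] = 1 - similarity(frame i, frame i - 1); frame 0 has no predecessor.
    novelty = [0.0] + [1.0 - matrix[i][i - 1] for i in range(1, len(matrix))]
    return [
        i
        for i in range(1, len(novelty) - 1)
        if novelty[i] >= min_height
        and novelty[i] > novelty[i - 1]
        and novelty[i] > novelty[i + 1]
    ]


# Toy sequence with three stable epochs of five frames each.
frames = [(1.0, 0.0)] * 5 + [(0.0, 1.0)] * 5 + [(-1.0, 0.0)] * 5
print(epoch_boundaries(self_similarity_matrix(frames)))
```

Each returned index is the first frame of a new plateau, so consecutive boundaries delimit one candidate autostage.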
Image sorting and benchmarking: Images were ordered using combined z-scores from Euclidean distances and cosine similarities to improve temporal ordering; comparisons with vector diffusion maps used Kolmogorov–Smirnov and Wilcoxon signed-rank tests on absolute deviations from ground truth.
Key Findings
- The Twin Network (Siamese/ResNet50 with triplet loss) learns phenotypic embeddings that enable accurate automatic staging by comparing test images to time-ordered references; trajectories closely match ground truth over the first 24 h of zebrafish development.
- Large-scale dataset: >15,000 zebrafish embryos imaged, >2 million raw images, >3 million high-quality embryo segments after QC; segmentation positive predictive value ~99%.
- Temperature dependence of developmental tempo: For zebrafish (23.5–35.5 °C) and medaka (18–36 °C), developmental tempo slows at low temperatures and accelerates at higher temperatures relative to reference (28.5 °C zebrafish; 28.0 °C medaka). Approximately twofold change in tempo per 10 °C (Q10 ~2). Arrhenius analysis within species-specific ranges yielded apparent activation energies: zebrafish Ea ≈ 65 kJ mol−1; medaka Ea ≈ 77 kJ mol−1, consistent with poikilotherms and distinct from homeotherms.
- Deviations at thermal extremes: At high temperatures, rates plateaued (deviation from ideal Arrhenius). At low temperatures, zebrafish slowed linearly with lethality below ~23 °C; medaka exhibited nonlinear slowing and partial arrest, spending disproportionately long in blastula at the coldest conditions.
- Natural variability: Among 77 sibling embryos, early stages showed narrow distributions of predicted stages; variability increased after segmentation onset. Average similarities decreased after gastrulation while similarity distribution width increased stepwise.
- Abnormal development detection: ~1% of embryos in the dataset developed abnormally (for example, disintegration, dorsal–ventral defects). Abnormal embryos deviated early from the normal predicted-stage range and showed low average similarity values, enabling early detection within sibling batches.
- Drug-induced phenotypes: Compared to untreated embryos (n=44), embryos treated with BMP, PCP, FGF, Shh, Nodal, Wnt inhibitors, or RA exposure showed persistently lower similarities and significant differences over time (Mann–Whitney U, P<0.01). Detection accuracy increased with the number of embryos and the severity/penetrance; strongly dorsalized BMP phenotypes and bmp mutants needed few embryos, while milder phenotypes were detectable with ~30 embryos.
- Automated staging atlases and epochs: Self-similarity matrices revealed plateaus corresponding to classical developmental epochs (cleavage, blastula, gastrula, organogenesis/segmentation) and sharp transitions between them, enabling de novo atlas construction without prior labels. The approach generalized to medaka, stickleback, and early C. elegans cleavage cycles using open datasets.
- Image ordering: The Twin Network provided high-precision temporal ordering without a priori knowledge and compared favorably to vector diffusion maps.
Discussion
The study addresses the challenge of objective, fine-grained staging of embryogenesis by focusing on similarity computations rather than predefined classifications. By learning dynamic phenotypic embeddings, the Twin Network captures smooth transitions and overlapping morphologies, enabling accurate age estimation and trajectory reconstruction from minimal prior information. Temperature analyses validate classical physical biology predictions within species-specific Arrhenius ranges and quantify species differences (Ea) and distinct low-temperature behaviors, informing ecological and evolutionary interpretations. The method quantifies increasing phenotypic variability over time and robustly detects spontaneous and pharmacologically induced deviations without training on abnormal classes, supporting scalable unbiased screening. Self-similarity analyses reveal stereotypic alternations of stable epochs and rapid transitions, producing de novo staging atlases across diverse taxa and data qualities. Collectively, these findings standardize embryo staging, connect environmental factors to developmental tempo, and provide a general framework for analyzing processes that unfold over time, with computational efficiency suitable for high-throughput settings.
Conclusion
Twin Networks that compute image similarities provide an automated, unbiased, and generalizable framework to measure developmental time and tempo, detect deviations, and derive staging atlases de novo. The approach accurately stages embryos, quantifies temperature-dependent growth rates and apparent activation energies, detects natural and drug-induced phenotypic variability, and identifies developmental epochs across species using heterogeneous datasets. Future work could extend cross-domain robustness via fine-tuning or retraining for new species/imaging conditions, expand training through data augmentation or generative models, and integrate multi-omics to elucidate mechanisms underlying deviations and tempo control.
Limitations
Direct transfer of trained models to different species or imaging conditions is limited; fine-tuning or retraining is needed for new domains. At thermal extremes, deviations from Arrhenius behavior complicate simple model fits. The statistical comparisons for drug screens used per-timepoint tests without multiple-comparison correction. Detection sensitivity depends on cohort size and phenotype penetrance. Imaging quality, orientation, and dataset curation still influence performance.