Predicting 3D soft tissue dynamics from 2D imaging using physics informed neural networks


M. Movahhedi, X. Liu, et al.

Discover how a hybrid physics-informed neural network algorithm developed by Mohammadreza Movahhedi and colleagues can reconstruct 3D flow-induced tissue dynamics from sparse 2D images.

Introduction
The study addresses the challenge of reconstructing high-fidelity, real-time 3D tissue dynamics from limited 2D imaging, a problem critical to diagnosis and treatment across organ systems. Vocal-fold vibration during phonation exemplifies the difficulty: clinical endoscopy typically provides only a 2D top view without vertical motion, yet vertical kinematics are important for phonation. Existing 3D imaging strategies (e.g., marker tracking, ultrasound, laser-based tracking, stereo endoscopy, optical coherence tomography) face trade-offs in temporal/spatial resolution and practicality, limiting real-time 3D measurements in vivo. Physics-informed neural networks (PINNs) offer promise for reconstructing 3D fields from sparse data by embedding physical laws into training; however, traditional PINNs struggle with 3D flow–structure interaction (FSI) due to computational scaling, non-smooth FSI interfaces, large-deformation soft-tissue dynamics, lack of explicit temporal modeling, and the absence of direct correspondence between 2D projections and 3D fields. This work proposes a hybrid PINN–differentiable programming algorithm that integrates a recurrent soft-tissue model with a differentiable fluid solver and projects solid dynamics onto an eigenmode basis, enabling efficient, scalable, and temporally coherent inference of 3D tissue dynamics from 2D profiles. Validation is performed on synthetic canine larynx simulations and in vitro pigeon syrinx experiments, with evaluation focused on 3D kinematics and derived aerodynamic/acoustic quantities.
Literature Review
The paper situates its contribution among prior efforts in measuring and modeling vocal-fold and related tissue dynamics. Imaging techniques for 3D vocal-fold motion include high-speed marker tracking, high-frame-rate ultrasound, laser-based point-wise 3D measurements, high-speed stereo endoscopy, and OCT, but each faces limitations in temporal/spatial resolution and robustness across wide phonatory frequencies. PINNs have been applied across domains (aerodynamics, biomechanics, chemical systems, heat transfer) with successes mainly in 2D or steady 3D cases and improved scalability via discrete PINN schemes combining numerical methods with deep learning. Challenges remain for 3D FSI due to large problem sizes, nonlinearity and non-smoothness at interfaces, and poor convergence on complex temporal dynamics with standard MLP-based PINNs. Prior data-driven laryngeal methods can reconstruct vibratory parameters from endoscopy but do not infer unmeasurable physical quantities. Some data assimilation approaches using simplified vocal-fold models (lumped element or 2D FEA) have attempted to estimate physical quantities, but they lack full 3D realism. The present work advances the state-of-the-art by integrating reduced-order modal dynamics, recurrent sequence modeling, and differentiable fluid solvers to infer full 3D physical fields from sparse 2D data.
Methodology
Overview: The algorithm integrates a recurrent neural network (LSTM encoder–decoder plus an FCNN) modeling 3D soft-tissue modal dynamics with a fully differentiable 1D fluid solver and a differentiable projection operator that matches predicted 3D shapes to observed 2D profiles. Physics constraints are enforced via equation residuals in modal space, and data consistency via a projection-based 2D profile loss. Training minimizes a weighted sum of the equation and data losses with end-to-end differentiability.

Solid dynamics and modal reduction: The tissue is modeled as a damped continuum in the semi-discrete finite element form [M]ü + [C]u̇ + [K]u = F(t), with Rayleigh damping [C] = α[M] + β[K]. The displacement field is expanded in eigenmodes, u(t) = Σ_j b_j(t) U_j, truncated to a finite number of dominant modes (typically 10–100). Projecting onto the eigenbasis yields decoupled modal ODEs: b̈_j + (α + β ω_j²) ḃ_j + ω_j² b_j = U_jᵀ F(t). Given the material properties, eigenfrequencies ω_j, and modes U_j (computed numerically via ARPACK shift-invert), solving for the modal coefficients b_j(t) reconstructs the 3D shapes X(t) = X_0 + u(t).

Differentiable flow solver: For phonation, a modified 1D Bernoulli model computes the intraglottal pressure as P(y) = P_sub − 0.5 ρ (Q/A(y))², with flow separation at the minimum area and the downstream pressure set to ambient. The flow rate is Q = A_min √(2 P_sub/ρ). The solver is implemented in PyTorch for automatic differentiation, so gradients pass through the fluid computations to the network during sequence-to-sequence training, reducing error accumulation and improving stability (see the sketch after this subsection).

Contact model: A symmetric penalty contact force at the midline models tissue collision during closure, with lateral contact pressure P_c = k_d dx (1 + k_a dx²), where dx is the penetration distance and k_d, k_a are contact coefficients.

Discrete PINN architecture: Inputs are sequential 2D profiles extracted from high-speed images. An LSTM encoder compresses the sequence into a hidden vector; an LSTM decoder, conditioned on this state and receiving the 2D profiles stepwise, outputs hidden vectors that an FCNN maps to time histories of the modal coefficients b_j(t). Reconstructed 3D shapes are differentiably projected to 2D profiles to compute the data loss L_d (profile mismatch). The 3D shapes also provide A(y) and dx for computing the fluid and contact pressures and thus F(t). The equation loss L_e is the sum of the residuals of the modal ODEs over all modes. The total loss is L = W_e L_e + W_d L_d, with all subroutines differentiable.

Hyperparameters: LSTM with one hidden layer of 128 features. The FCNN is a 4-layer MLP with residual connections and layer norms, 128 neurons per layer, and ReLU activations; the output layer predicts N_j modal coefficients. Optimizer: Adam with ReduceLROnPlateau; initial learning rate 1e-2, minimum 5e-5. Loss weights: W_e = 1e-1, W_d = 1e-5, chosen to balance loss magnitudes; no exhaustive tuning was performed.
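To make the physics losses concrete, here is a minimal PyTorch sketch of the two differentiable components described above: the 1D Bernoulli pressure model with separation at the minimum area, and the modal-ODE residual used as the equation loss. Function names, tensor shapes, and the zero-gauge ambient-pressure convention are our illustrative assumptions, not the authors' code.

```python
import math
import torch

def bernoulli_pressure(area, p_sub=1000.0, rho=1.1):
    """Differentiable 1D Bernoulli flow model (illustrative sketch).

    area : (n_sections,) tensor of cross-sectional areas A(y) along the
           flow direction, derived from the reconstructed 3D shape.
    Returns the intraglottal pressure profile P(y) and the flow rate Q.
    """
    sep = torch.argmin(area)                      # separation at the minimum area
    q = area[sep] * math.sqrt(2.0 * p_sub / rho)  # Q = A_min * sqrt(2 * P_sub / rho)
    p = p_sub - 0.5 * rho * (q / area) ** 2       # P(y) = P_sub - 0.5 * rho * (Q/A(y))^2
    # Downstream of the separation point the pressure is ambient (0 gauge).
    mask = torch.arange(area.numel(), device=area.device) <= sep
    return torch.where(mask, p, torch.zeros_like(p)), q

def modal_equation_loss(b, b_dot, b_ddot, omega, alpha, beta, forcing):
    """Sum of squared residuals of the decoupled modal ODEs.

    b, b_dot, b_ddot : (n_modes,) modal coefficients and time derivatives
    omega            : (n_modes,) eigenfrequencies omega_j
    forcing          : (n_modes,) modal forcing U_j^T F(t)
    """
    res = b_ddot + (alpha + beta * omega**2) * b_dot + omega**2 * b - forcing
    return torch.sum(res**2)
```

In training, this equation loss would be combined with the projection-based profile loss using the weights reported above, L = 1e-1·L_e + 1e-5·L_d, and gradients flow through the flow solver back to the network.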
Datasets: (1) Synthetic canine larynx: the left vocal fold is modeled assuming left–right symmetry, with 1D Bernoulli flow coupled to the 3D Navier equation. Mesh: 20,643 4-node tetrahedra. Two-layer vocal fold (cover and body) with transversely isotropic materials (properties in Supplementary Table 1). Rayleigh damping α = 60.0 s⁻¹, β = 6.0×10⁻⁵ s. The flow channel is discretized into 100 horizontal sections. P_sub = 1.0 kPa, ρ = 1.1 kg/m³. Simulation duration 200 ms. Twenty time-labeled 2D glottal shapes per cycle (sampling ~2.5 kHz) are used for training, and the lowest 100 eigenmodes are provided to compute the equation loss. (2) Experimental pigeon syrinx: four excised rock pigeon syringes, with anatomy from DiceCT scans used to build 3D models (a pair of lateral vibratory masses, LVMs, and the surrounding cartilages). High-speed videos provided frontal-view 2D LVM profiles (manually annotated), with simultaneous acoustic measurements. The network predicts the first 50 vibration modes; other parameters (P_sub, ρ, number of flow sections, epochs) matched the canine case. Training was performed on a single NVIDIA A100 GPU (~7 hours per subject); inference takes <1 s.

Acoustic analysis: Linear source–filter theory is used with a monopole source, p = (ρ_air/(4π r)) dQ/dt. The acoustic pressure is resampled to 48 kHz and low-pass filtered at 20 kHz. SPL at 1 m is computed as SPL = 20 log₁₀(p/P_ref) + TL, with P_ref = 2×10⁻⁵ Pa and TL = 20 log₁₀(d) for d = 12 cm. Acoustic power is P = A I, with A = 4π a² the area of a sphere of radius a.
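The acoustic post-processing also lends itself to a short sketch. Below is a minimal numpy version of the monopole source model and the SPL computation described above; the RMS pressure definition, the air density and sound speed values, and the function signatures are assumptions made for illustration.

```python
import numpy as np

def spl_from_flow(q, dt, d=0.12, rho_air=1.2, p_ref=2e-5):
    """Monopole-source SPL from a flow-rate waveform (illustrative sketch).

    q  : flow-rate samples Q(t) [m^3/s]
    dt : sample spacing [s]
    d  : measurement distance [m] (12 cm in the syrinx experiments)
    """
    dq_dt = np.gradient(q, dt)                 # dQ/dt
    p = rho_air / (4.0 * np.pi * d) * dq_dt    # p = rho_air / (4*pi*d) * dQ/dt
    p_rms = np.sqrt(np.mean(p**2))
    # Reference to 1 m via the distance correction TL = 20*log10(d).
    return 20.0 * np.log10(p_rms / p_ref) + 20.0 * np.log10(d)

def acoustic_power(p_rms, a=1.0, rho_air=1.2, c=343.0):
    """Acoustic power P = A * I over a sphere of radius a [m], using the
    standard far-field intensity I = p_rms^2 / (rho_air * c)."""
    intensity = p_rms**2 / (rho_air * c)
    return 4.0 * np.pi * a**2 * intensity
```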
Key Findings
Synthetic canine validation: Training converged after ~6×10⁴ epochs for both the data and equation losses. The 3D shape reconstruction error (L2 norm of the displacement difference normalized by the ground truth; see the expression below) over one cycle ranged from 2.0% to 5.1%, with mean 3.8% and SD 0.97%. Sensitivity to modal truncation: the mean error decreased from 7.3% (20 modes) to 3.8% (100 modes), with rapid improvement as modes were added. Predicted 3D shapes and vertical velocity contours closely matched the ground truth. Maximum lateral and vertical medial-surface displacements: ground truth 2.57 mm and 1.75 mm; PINN 2.53 mm and 1.68 mm; errors −1.6 ± 2.9% and −3.6 ± 3.7%, respectively. Maximum vertical velocity: ground truth 1.03 m/s; PINN 0.95 m/s; error −7.7 ± 7.6%.

Aerodynamics and acoustics (synthetic): The glottal flow rate waveform was accurately reproduced, including the opening/closing quotients and peak flow. Time-mean flow rate error 1.7% with 2.4% SD. The intraglottal pressure along the streamline was accurately predicted; time-mean error of the mean intraglottal pressure 2.1% with 1.6% SD. Key quantities: peak flow rate error 1.35%; mean flow rate error −4.72%; mean intraglottal pressure error 2.10 ± 1.6%; SPL error 0.35%; acoustic power error −0.39%.

Experimental pigeon syrinx cross-validation: The network converged (in one example, the data loss at ~27k epochs and the equation loss at ~70k epochs). Direct 3D kinematics were not available, so acoustics were validated: across the four syringes, mean differences between PINN and experiment were 1.6% (SPL) and 1.1% (acoustic power). Experimental SDs: ±1.4 dB (both SPL and power); PINN SDs: ±4.3 dB (both). Predicted 3D LVM dynamics showed strong inferior–superior wave propagation with the inferior aspect leading, and notable longitudinal motion (approximately a half-wavelength mode) with phase differences during closing. The PINN also provided 3D LVM shapes, lateral velocity fields, the evolution of the syringeal opening area, and flow rate waveforms.

Computational performance: Training took ~7 hours on a single NVIDIA A100 GPU; inference takes <1 second per query. Compared with traditional forward FSI simulations (~2 CPU-hours per run), the trained model offers significant advantages in many-query scenarios.
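For reference, the relative error metric described above can be written as follows (our notation, inferred from the verbal definition):

```latex
\varepsilon(t) \;=\; \frac{\lVert \mathbf{u}_{\mathrm{PINN}}(t) - \mathbf{u}_{\mathrm{GT}}(t) \rVert_2}{\lVert \mathbf{u}_{\mathrm{GT}}(t) \rVert_2} \times 100\%
```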
Discussion
The research question was whether accurate 3D soft-tissue dynamics and derived physical quantities can be inferred from sparse 2D imaging by embedding physics into deep learning. The hybrid PINN–differentiable programming approach addressed key obstacles for 3D FSI inference: (i) dimensionality and scalability, via projection of solid dynamics onto a reduced eigenmode space; (ii) nonlinearity and interface non-smoothness, via physics-based losses and a modal formulation; (iii) temporal coherence and stiffness, via a recurrent encoder–decoder discrete PINN architecture that explicitly models time dependencies and stabilizes training; and (iv) lack of direct 2D–3D correspondence, via differentiable projection operators enabling data loss computation on 2D profiles. Validation on synthetic canine data demonstrated low 3D reconstruction errors (~3.8% mean) and high-fidelity prediction of aerodynamics and acoustics (all within ~5%). Cross-validation on experimental pigeon syrinx showed that acoustic outputs (SPL, acoustic power), which are sensitive to underlying dynamics, agreed within ~1–2% on mean values, implying correct reconstruction of dominant dynamics. The approach can infer otherwise unmeasurable quantities (e.g., intraglottal pressure distribution, glottal flow rate, tissue stresses, contact areas) due to embedded physics, expanding diagnostic metrics beyond kinematics. Computationally, once trained, the model offers rapid many-query predictions compared to traditional FSI solvers, making it attractive for inverse modeling and uncertainty quantification. The methodology is generalizable to other 3D FSI problems (e.g., cardiovascular and valve dynamics) and can be extended with richer physics and multimodal data.
Conclusion
This work introduces a hybrid physics-informed, differentiable learning framework that reconstructs high-resolution 3D soft-tissue dynamics and associated aerodynamics/acoustics from sparse 2D imaging. By combining modal reduction of solid mechanics, a differentiable 1D fluid solver, recurrent sequence modeling, and differentiable projection, the method achieves accurate, temporally coherent reconstructions and inference of additional physical quantities. Synthetic validation in a canine larynx yielded ~3.8% mean 3D displacement error and <5% errors in key aerodynamic/acoustic metrics; experimental pigeon syrinx cross-validation achieved mean differences of ~1–2% for SPL and acoustic power. The approach enables fast many-query predictions post-training and is applicable to a broad class of 3D FSI problems. Future work includes: inferring subject-specific material properties by integrating eigenmode computation into training; replacing/augmenting the 1D flow model with differentiable Navier–Stokes solvers; integrating multimodal inputs (e.g., acoustics) to improve accuracy and reduce reliance on dense 2D profiles; exploring advanced sequence models (e.g., Transformers) and PINN-based denoising for noisy inputs; and conducting rigorous validations on human laryngeal datasets with direct 3D measurements.
Limitations
- Dependence on known material properties: The current implementation requires a priori tissue material properties to compute eigenmodes; in vivo properties are typically unknown. Future integration of eigenmode computation into training could enable material parameter inference.
- Simplified flow model: A 1D Bernoulli-based flow model may be insufficient for applications where 3D vortex dynamics are critical (e.g., heart valves). Incorporating differentiable Navier–Stokes solvers with trainable components would improve generality and accuracy.
- Input modality constraints: Present training uses only 2D profile sequences; performance may degrade with sparse or noisy segmentations. Incorporating synchronized multimodal data (e.g., acoustics) and PINN-based denoising could enhance robustness and accuracy.
- Network architecture: While recurrent modeling improves convergence for stiff ODEs, exploring alternative Seq2Seq architectures (e.g., Transformers with temporal attention) may further enhance performance.
- Validation scope: Direct 3D validation was performed only on synthetic canine data; experimental validation relied on acoustics in the pigeon syrinx. More comprehensive validations on human laryngeal datasets with high-speed 3D measurements are needed for clinical translation.