Practical parameter identifiability and handling of censored data with Bayesian inference in mathematical tumour models

J. Porthiyas, D. Nussey, et al.

This paper, by Jamie Porthiyas, Daniel Nussey, Catherine A. A. Beauchemin, Donald C. Warren, Christian Quirouette, and Kathleen P. Wilkie, shows how methodological decisions shape parameter estimation from experimental tumour growth data, and proposes a framework that handles censored data to improve the accuracy of the analysis.

Introduction

The study investigates how methodological choices in parameter estimation affect inference and prediction in mechanistic tumour growth models. It focuses on practical identifiability when fitting ordinary differential equation-based models to noisy experimental data, and examines how discarding censored measurements (below lower or above upper limits of detection) and the choice of priors in Bayesian inference influence parameter posteriors and model predictions. The context emphasizes the importance of accurately estimating parameters such as initial tumour volume, growth rate, and carrying capacity to make reliable patient-specific and population-level predictions, and to discriminate between competing biological hypotheses. The work underscores that inter-mouse variability in tumour volumes is consistent with log-normal error structure, and that model complexity often outpaces data informativeness, necessitating careful treatment of priors and censored observations.

Literature Review

The paper situates its work within prior research on model identifiability and tumour growth laws (e.g., logistic, Gompertz, generalized logistic/Richards), highlighting the distinction between structural and practical identifiability and strategies to improve identifiability (more data, broader ranges, varied inputs). It references evidence that tumour volume measurement errors are log-normally distributed (Benzekry et al.), prior uses of profile likelihood to assess identifiability (Raue et al.; Simpson et al.), and discussions on model selection and complexity (Gerlee). It also connects to applications in virtual clinical trials and treatment response modelling, where posterior ensembles represent inter-patient heterogeneity. The literature indicates that fixing parameters like initial volume can bias model selection and predictions, and that choice of priors profoundly impacts Bayesian posteriors when data are limited.

Methodology

- Models: Five ODE-based tumour growth models with increasing complexity were considered: Exponential (Exp: dC/dt=μC; parameters μ, C0), Exponential with cap (ExpCap: solution C(t)=min(C0 e^{μt}, κ); parameters κ, μ, C0), Gompertz (Gomp: dC/dt=−μ C ln(C/κ); parameters κ, μ, C0), Logistic (Logis: dC/dt=μ C(1−C/κ); parameters κ, μ, C0), and a generalized logistic (Richards/Rich: dC/dt= μ/min(α,1) · C · [1−(C/κ)^{max(1/α,1)}]; parameters κ, μ, α, C0). The Rich model recovers Logis (α=1), Gomp (α→0), and ExpCap (α→∞) limits.
- Data: Control group tumour volume time series from Benzekry et al. (10 mice; caliper volumes; measurement times 5–22 dpi; upper limit of detection ULD=1500 mm^3; early lower unmeasurability in some mice; euthanasia at ULD leads to right-censoring).
- Likelihood: Residuals computed on log10(volume) due to approximately log-normal inter-mouse variability. Baseline likelihood L_measured uses exp(−SSR/(2σ^2)) with SSR the sum of squared residuals between log10 model predictions and measurements across all mice and times; fixed σ=0.16 (estimated as average SD of log10 volumes across times).
- Censored data handling: Revised likelihood multiplies L_measured by L_unmeasured to include contributions from unmeasurable observations (below LLD or above ULD). For each time t_k, with U_k unmeasurable mice, L_unmeasured includes [P(volume<LLD)+P(volume≥ULD)]^{U_k} based on the normal CDF in log10-space (error function). ULD fixed at 1500 mm^3; LLD treated as an additional parameter constrained to (0,19] mm^3.
- Priors: Four posterior scenarios compared: likelihood from measured-only vs measured+unmeasured, combined with either linear-uniform or log-uniform priors. Log-uniform prior proportional to 1/(C0 μ κ α LLD) where applicable. Parameter bounds: C0∈(10^−3,10^6] mm^3 (effectively unconstrained above by data), μ∈(10^−5,10^5) d^−1, κ∈[0,10^6] mm^3, α∈(10^−5,10^5), LLD∈(0,19] mm^3. The prior is set to zero outside these bounds. The Discussion notes that artificial upper bounds (e.g., on κ) were used to facilitate MCMC convergence.
- Inference: Markov chain Monte Carlo using emcee (affine-invariant sampler; stretch scale 2) via phymcmc wrapper. Initial chain positions log-normally perturbed around a steepest-descent SSR-minimized set. Posterior estimation with 300 chains × 10,000 steps (≥10,000 burn-in), yielding 3,000,000 accepted samples; at least ~1% independent per autocorrelation times.
- Profile likelihoods: For each parameter, profile log-likelihood computed by fixing that parameter over a grid and maximizing likelihood over others via MCMC (40 chains×10,000 steps; picking the best sample). Used to assess practical identifiability and relationships among parameters.
- Sensitivity analysis: Local sensitivity of log10 C to relative parameter perturbations computed via central differences about the MLE, to assess which times inform which parameters (C0 early, μ mid, κ late).
- Reporting: Compared maximum likelihood estimate (MLE) vs maximum a posteriori (MAP), and marginal posterior distributions (MPDs) with highest-density 95% credible intervals (single contiguous interval).
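The likelihood described in the bullets above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the array layout (rows are time points, columns are mice, `NaN` marks an unmeasurable volume), and the use of the closed-form Gompertz and logistic solutions are all choices made here for illustration. Only σ=0.16, ULD=1500 mm^3, the log10 residuals, and the form of the censored term come from the summary. Constant factors of the likelihood are dropped because σ is fixed.

```python
import math
import numpy as np

SIGMA = 0.16   # fixed SD of log10(volume), value quoted in the summary
ULD = 1500.0   # upper limit of detection in mm^3, value quoted in the summary

# Standard normal CDF written with the complementary error function.
_norm_cdf = np.vectorize(lambda z: 0.5 * math.erfc(-z / math.sqrt(2.0)),
                         otypes=[float])


def gompertz(t, c0, mu, kappa):
    """Closed-form solution of dC/dt = -mu * C * ln(C / kappa), C(0) = c0."""
    return kappa * (c0 / kappa) ** np.exp(-mu * t)


def logistic(t, c0, mu, kappa):
    """Closed-form solution of dC/dt = mu * C * (1 - C / kappa), C(0) = c0."""
    return kappa / (1.0 + (kappa / c0 - 1.0) * np.exp(-mu * t))


def log_likelihood(model, params, lld, times, volumes):
    """Log of L_measured * L_unmeasured, up to an additive constant.

    volumes has shape (n_times, n_mice); NaN marks a volume that could not
    be measured (layout assumed for this sketch).
    """
    times = np.asarray(times, dtype=float)
    volumes = np.asarray(volumes, dtype=float)
    log_pred = np.log10(model(times, *params))           # one value per time
    measured = ~np.isnan(volumes)

    # Measured part: -SSR / (2 sigma^2) on log10 residuals.
    resid = np.log10(volumes) - log_pred[:, None]
    ssr = np.sum(resid[measured] ** 2)
    log_l_measured = -ssr / (2.0 * SIGMA ** 2)

    # Unmeasured part: each missing mouse at time t_k contributes
    # log[P(volume < LLD) + P(volume >= ULD)] under the log10-normal error.
    p_low = _norm_cdf((np.log10(lld) - log_pred) / SIGMA)
    p_high = _norm_cdf((log_pred - np.log10(ULD)) / SIGMA)
    n_unmeasured = (~measured).sum(axis=1)
    has_unmeasured = n_unmeasured > 0
    log_l_unmeasured = np.sum(
        n_unmeasured[has_unmeasured]
        * np.log(p_low[has_unmeasured] + p_high[has_unmeasured])
    )
    return log_l_measured + log_l_unmeasured
```

Calling `log_likelihood` with a data array that has no `NaN` entries reduces to the measured-only likelihood, so the same function covers both the "measured" and "measured+unmeasured" scenarios compared in the paper.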

Key Findings

- Excluding censored data biases inference: Omitting unmeasurable volumes leads to overestimation of model-predicted tumour volumes at early times and underestimation at late times, causing overestimation of initial tumour volume C0 and underestimation of carrying capacity κ. Including censored data reverses these biases: predicted curves are lower early and higher late; best-fit C0 decreases and κ increases relative to analyses excluding censored data.
- Estimated LLD and model preference: When accounting for censored data, the most likely lower limit of detection is ~18.9 mm^3 (near the upper bound allowed). Including censored data favours Gomp-like behaviour (α→0 in the Rich model), whereas excluding them favours Logis/ExpCap-like behaviour (α≈1.4).
- Parameter shifts across models: Without censored data, MLE C0 exceeded the nominal 1 mm^3 (10^6 cells) across models (~5 mm^3 Gomp; ~10 mm^3 Logistic/Rich; up to ~20 mm^3 Exp). With censored data included, MLE C0 values moved closer to biological expectations (~1.5 mm^3 Gomp/Rich; ~4.5 mm^3 Logistic/ExpCap; ~8 mm^3 Exp) and κ estimates increased.
- Practical identifiability varies by model: The Exp model (2 parameters) exhibits narrow posteriors (better constrained), whereas the Rich model (4 parameters) shows wide posteriors and limited sensitivity to α outside a moderate range (10^−1–10^1), indicating that the added complexity is weakly identifiable without richer data.
- Prior choice materially impacts posteriors: Linear-uniform vs log-uniform priors can shift MAP and MPD modes and can induce multimodality in marginal posteriors (e.g., κ in ExpCap under linear prior), especially when data do not tightly constrain parameters. Reporting only MAP and 95% CI can be misleading for multi-modal MPDs.
- Time-point informativeness: Sensitivity analysis confirms early times inform C0, mid-times inform μ, and late times inform κ, consistent with intuition and profile likelihoods.
- Methodological framework: A Bayesian approach with log-residual likelihood and explicit censored-data likelihood yields more physically consistent parameter estimates and predictive distributions; ensemble posteriors can represent virtual cohorts.
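The time-point informativeness finding can be reproduced qualitatively with a central-difference sensitivity of log10 C to a relative change in each parameter. The sketch below is an illustration only: it uses the closed-form Gompertz solution, and the parameter values (C0=1.5 mm^3, μ=0.1 d^−1, κ=3000 mm^3), the time grid, and the step size are hypothetical choices made here, not the paper's MLE or settings.

```python
import numpy as np


def gompertz(t, c0, mu, kappa):
    """Closed-form solution of dC/dt = -mu * C * ln(C / kappa), C(0) = c0."""
    return kappa * (c0 / kappa) ** np.exp(-mu * t)


def relative_sensitivity(model, t, params, index, rel_step=1e-3):
    """Central-difference estimate of d log10(C) / d ln(p_index) at each time."""
    up = list(params)
    down = list(params)
    up[index] *= 1.0 + rel_step      # perturb the chosen parameter upwards
    down[index] *= 1.0 - rel_step    # and downwards by the same relative step
    d_log_c = np.log10(model(t, *up)) - np.log10(model(t, *down))
    d_ln_p = np.log(1.0 + rel_step) - np.log(1.0 - rel_step)
    return d_log_c / d_ln_p


# Hypothetical parameter values and time grid, chosen only for illustration.
params = (1.5, 0.1, 3000.0)          # (C0 in mm^3, mu in 1/day, kappa in mm^3)
t = np.arange(0.0, 41.0)             # days

s_c0 = relative_sensitivity(gompertz, t, params, index=0)
s_mu = relative_sensitivity(gompertz, t, params, index=1)
s_kappa = relative_sensitivity(gompertz, t, params, index=2)

for name, s in [("C0", s_c0), ("mu", s_mu), ("kappa", s_kappa)]:
    print(f"{name}: |sensitivity| is largest at t = {t[np.argmax(np.abs(s))]:.0f} d")
```

For the Gompertz model the pattern follows directly from the solution: the C0 sensitivity decays as e^{−μt}, the κ sensitivity grows as 1−e^{−μt}, and the μ sensitivity is proportional to μt e^{−μt}, which peaks at an intermediate time.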

Discussion

The findings show that methodological decisions, specifically the handling of censored observations and the specification of priors, directly affect practical identifiability and the biological interpretation of tumour growth parameters. Including censored data corrects biases introduced by selecting only measurable extremes (largest early, smallest late), leading to more plausible estimates of initial volume and carrying capacity and altering which model behaviours are supported by the data (Gompertz-like dynamics are favoured when censored data are included). When data are insufficient to constrain higher-dimensional models, prior choice substantially shapes the posterior; linear-uniform priors can overweight large-scale regions in log-space, producing multimodal marginals and shifting MAPs. Consequently, summaries limited to point estimates and contiguous 95% CIs can obscure multi-modality and uncertainty structure. The results reinforce that identifiability depends both on model structure and data coverage, and that presenting full posterior information and profile likelihoods improves transparency. The framework better constrains predictions beyond observed times by using all information in the dataset, including censored points, and by aligning the likelihood with the log-normal error structure of tumour volume measurements.
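The statement that a linear-uniform prior overweights large values in log-space can be checked with a short calculation. The sketch below compares how much prior mass each prior assigns to a given interval, using the C0 bounds quoted in the Methodology, (10^−3, 10^6] mm^3. The helper names are hypothetical and the calculation is an illustration, not part of the paper's code.

```python
import math


def linear_uniform_mass(a, b, lo, hi):
    """Prior mass on [a, b] for a prior that is uniform in x over (lo, hi]."""
    return (b - a) / (hi - lo)


def log_uniform_mass(a, b, lo, hi):
    """Prior mass on [a, b] for a prior proportional to 1/x over (lo, hi]."""
    return math.log(b / a) / math.log(hi / lo)


LO, HI = 1e-3, 1e6   # C0 bounds in mm^3, as quoted in the Methodology

# Mass in the top decade of the allowed range versus a biologically
# plausible decade around a few mm^3.
for a, b in [(1e5, 1e6), (1.0, 10.0)]:
    lin = linear_uniform_mass(a, b, LO, HI)
    log = log_uniform_mass(a, b, LO, HI)
    print(f"[{a:g}, {b:g}] mm^3: linear-uniform {lin:.3g}, log-uniform {log:.3g}")
```

Under the linear-uniform prior about 90% of the mass sits in the top decade, while the log-uniform prior gives every one of the nine decades the same weight of 1/9. When the data constrain a parameter only weakly, this difference alone can move the MAP and reshape the marginal posterior.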

Conclusion

This work introduces a reusable Bayesian framework for parameter estimation in tumour growth models that (1) uses a likelihood consistent with log-normal measurement variability, (2) incorporates censored observations via an explicit likelihood term, and (3) systematically evaluates the impact of prior choices. Applying the framework across five models shows that excluding censored data biases C0 and κ and can mislead model preference and extrapolations, while prior selection can materially affect posteriors when data are limited. The authors recommend including censored data in likelihoods, favouring physically justified (often log-uniform) priors with well-supported bounds, and reporting full posterior distributions (not just MAP and 95% CI). Future research should design experiments to better inform early and late growth regimes (improving constraints on C0 and κ), refine prior choices and parameterizations (e.g., reparameterizing α to avoid improper posteriors), and perform robust model selection once proper priors and bounds are established.

Limitations

- Data limitations: The dataset lacks very early and very large tumour measurements, precisely where models diverge and parameters C0 and κ are most informed, limiting practical identifiability and model selection.
- Prior bounds and improper posteriors: Artificial upper bounds (e.g., κ≤10^6 mm^3) were imposed to ensure MCMC convergence, which can yield improper true posteriors and affect correlated parameters’ MPDs and MAPs.
- Weak identifiability in complex models: Parameters such as α in the Rich model have limited influence outside moderate ranges, leading to broad or multi-modal posteriors.
- Reporting CIs: Highest-density 95% CIs were constrained to contiguous intervals; for multi-modal MPDs this can misrepresent uncertainty.
- Approximate profile likelihoods: Profile likelihoods were computed via MCMC without stringent convergence diagnostics for that step; values are approximate though smoothness suggests adequacy.
- Simplified LLD modelling: A fixed LLD was estimated rather than a probabilistic detectability function over volume, due to limited data.
- No formal model selection: Arbitrary bounds and potential improper posteriors preclude reliable use of information criteria for model selection in this analysis.
