Fundamental limits to learning closed-form mathematical models from data

Mathematics


O. Fajardo-Fontiveros, I. Reichardt, et al.

This research by Oscar Fajardo-Fontiveros, Ignasi Reichardt, Harry R. De Los Ríos, Jordi Duch, Marta Sales-Pardo, and Roger Guimerà uncovers fundamental limits to learning mathematical models from noisy data. Discover the pivotal phase transition that determines whether the true model can be learned at all, along with the use of probabilistic model selection to achieve near-optimal predictions.

Introduction
The study asks under what conditions the true closed-form generating model can be identified from noisy, finite data when the model structure is unknown. Motivated by the long tradition of inductively deriving interpretable mathematical laws and recent advances in symbolic regression and model discovery, the authors consider datasets generated by closed-form models with additive Gaussian noise and relatively low-dimensional feature spaces. The focus is on learning the model structure rather than parameter values. They formulate a probabilistic model selection framework to determine when the true model is recoverable and to assess generalization to unseen data, discovering a learnability transition as observation noise varies.
Literature Review
The paper situates its contribution within recent methods that automatically uncover closed-form models from data (symbolic regression/model discovery) and their applications in physics (quantum systems, nonlinear/chaotic dynamics, fluid mechanics, astrophysics). It contrasts this with standard machine learning, including neural networks, which may struggle in low-noise regimes due to interpolation limits. From a statistical physics perspective, prior work has largely focused on parameter learning, graphical and network models, and phase transitions in inference problems. Rigorous probabilistic model selection and MDL/BIC principles provide a foundation, but the structural learning of closed-form models and associated transitions has received less attention.
Methodology
The problem is posed probabilistically via the posterior over model structures p(m|D), obtained by marginalizing over the parameters: p(m|D) ∝ ∫ dθ p(D|m,θ) p(θ|m) p(m). This posterior is expressed as a Boltzmann distribution, p(m|D) ∝ exp[−H(m)], with energy H(m) = −ln[p(D|m) p(m)] ≈ B(m)/2 − ln p(m), where B(m) is the Bayesian information criterion (BIC) obtained from a Laplace approximation; this yields a minimum description length (MDL) interpretation. Sampling from p(m|D) is performed with a Metropolis Markov chain using the Bayesian machine scientist, which samples closed-form expressions under a prior over models and parameters; the MDL (maximum a posteriori) model among the sampled candidates is then selected. Synthetic datasets are generated from known closed-form models m′ with additive Gaussian noise ε ~ N(0, σ²), inputs x sampled uniformly in [−2, 2] along each dimension, dataset sizes N ∈ {25, 50, 100, 200, 400}, and a range of noise levels σ. For each dataset, models are sampled from p(m|D), the MDL model is selected, and its predictions are evaluated on independent noisy test sets. Learnability is assessed by comparing description-length gaps ΔH = H(m) − H(m′) across sampled models; the true model is deemed learnable if it is the MDL model, that is, if no sampled model has a lower description length. The phenomenology of the learnability transition is further analyzed by contrasting the true model with the trivial, a priori most plausible model m⁰ (e.g., a constant function), deriving description lengths for both under the BIC approximation and an upper bound for the critical noise at which H(m⁰) = H(m′) on average. Scaling analyses collapse the learnability curves when the noise is rescaled by this estimated critical value.
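To make the selection criterion concrete, here is a minimal sketch of the BIC-based description length and the ΔH learnability check described above. It is an illustration, not the authors' implementation: the candidate models, the curve_fit-based fitting, and the log-prior values are assumptions made for this example (the Bayesian machine scientist uses a prior over expression trees learned from a corpus of published equations).

```python
import numpy as np
from scipy.optimize import curve_fit

def description_length(f, theta0, x, y, k, log_prior):
    """BIC-based description length H(m) ≈ B(m)/2 − ln p(m) for a
    candidate model y = f(x; θ) under additive Gaussian noise."""
    n = len(y)
    theta, _ = curve_fit(f, x, y, p0=theta0)      # maximum-likelihood parameter fit
    sigma2 = np.mean((y - f(x, *theta)) ** 2)     # ML estimate of the noise variance
    neg_log_lik = 0.5 * n * (np.log(2 * np.pi * sigma2) + 1.0)
    bic = 2.0 * neg_log_lik + k * np.log(n)       # B(m)
    return 0.5 * bic - log_prior                  # H(m) = B(m)/2 − ln p(m)

# Toy data from a "true" model m' (here a*sin(bx), an illustrative choice),
# with x uniform in [-2, 2] and Gaussian noise of scale sigma.
rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=100)
sigma = 0.3
y = 1.5 * np.sin(2.0 * x) + rng.normal(0.0, sigma, size=x.size)

# Candidate structures: the true form vs. the trivial constant model m0.
# The log-prior values are placeholders, not the paper's actual prior;
# initial guesses are chosen near the optimum to keep the toy fit stable.
H_true = description_length(lambda x, a, b: a * np.sin(b * x),
                            [1.0, 2.0], x, y, k=2, log_prior=-5.0)
H_triv = description_length(lambda x, c: c + 0.0 * x,
                            [0.0], x, y, k=1, log_prior=-1.0)
print(f"ΔH = H(m0) − H(m') = {H_triv - H_true:.1f}  (positive ⇒ learnable)")
```

At low σ the gap ΔH is large and positive, so the true structure is the MDL model; raising σ in this toy shrinks the gap, mimicking the approach to the transition.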
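The Metropolis step itself reduces to a standard acceptance rule on description-length differences. The sketch below assumes a small, fixed set of candidate structures with precomputed H values and uniform proposals; the actual sampler explores the open-ended space of expression trees via local tree edits, so this is only a schematic of the acceptance logic.

```python
import math
import random

def metropolis_over_models(H, steps=20000, seed=0):
    """Toy Metropolis chain over a finite set of candidate structures,
    targeting p(m|D) ∝ exp(−H(m)).  Proposals are uniform (symmetric)
    jumps between the listed candidates."""
    rng = random.Random(seed)
    models = list(H)
    current = rng.choice(models)
    visits = {m: 0 for m in models}
    for _ in range(steps):
        proposal = rng.choice(models)
        # Metropolis rule: accept with prob. min(1, exp(H(current) − H(proposal)))
        if rng.random() < math.exp(min(0.0, H[current] - H[proposal])):
            current = proposal
        visits[current] += 1
    return visits

# Hypothetical description lengths for three candidate structures.
H = {"m0: c": 12.0, "m1: a*x + b": 11.0, "m2: a*sin(b*x)": 9.5}
visits = metropolis_over_models(H)
print(max(visits, key=visits.get))  # the MDL (lowest-H) model dominates the chain
```

Because the proposal is symmetric, the chain's stationary distribution is p(m|D) ∝ exp(−H(m)), and the most visited model coincides with the MDL (MAP) candidate.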
Key Findings
- Probabilistic model selection (MDL via the posterior/BIC) yields quasi-optimal generalization to unseen data, attaining the irreducible error σ except at small N and intermediate noise. In the low-noise limit, MDL recovers the true generating model and interpolates perfectly; in the high-noise limit, prediction error is dominated by observation noise for all models. Standard machine learning baselines (e.g., artificial neural networks) are suboptimal at low noise because of their interpolation limitations, despite performing well at high noise.
- A learnability transition exists: at low observation noise, the true model is typically the MDL model (learnable); beyond a critical noise, other models with shorter description lengths dominate, making the true model unlearnable by any method. The region where prediction error deviates from σ coincides with the transition, indicating a hard phase for generalization. In one example, negative description-length gaps for alternative models begin to appear around σ ≥ 0.6 for some datasets.
- Three regimes are identified: (i) a learnable phase with optimal predictions; (ii) a transition (hard) regime with suboptimal average predictions and partial learnability; and (iii) an unlearnable phase with optimal predictions (dominated by observation noise) but an inability to identify the true model.
- An upper bound for the transition noise is derived by comparing the true model m′ against the trivial, a priori most plausible model m⁰. Under the BIC approximation, H(m′) = (N/2)[ln(2πσ²) + 1] + (k/2) ln N − ln p(m′) and H(m⁰) = (N/2)[ln(2π(σ² + δ_p²)) + 1] + (1/2) ln N − ln p(m⁰), where k is the number of parameters of the true model and δ_p² is the reducible error variance of the trivial model over the observation interval. Setting H(m⁰) = H(m′) yields an estimate of the critical noise σ_c (Eq. 8); for large N and O(1) prior differences this gives, up to constants, σ_c ≈ δ_p √(N/((k−1) ln N)) (Eq. 9); see the derivation sketch after this list. The bound closely matches the empirical transition and the peak of the scaled RMSE.
- The description length of the MDL model matches that of the true model below the transition and that of the trivial model above it. Around the transition, and especially for smaller N, the observed MDL can be lower than both H(m′) and H(m⁰), indicating that multiple competing models (a Rashomon set) are relevant in the hard phase.
- Learnability curves collapse when plotted against the scaled noise σ/σ_c, suggesting potential universality of the transition. The transition sharpens with increasing N as fluctuations diminish, and σ_c grows with N, implying consistency: for any finite noise level, a sufficiently large N ensures learnability under the BIC approximation.
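To make the origin of the bound explicit, the following sketch reconstructs how the critical noise follows from setting H(m⁰) = H(m′), using the two description lengths quoted above. The notation (m′, m⁰, k, δ_p) is this summary's, and the exact constants in the paper's Eqs. 8–9 should be checked against the original.

```latex
% Derivation sketch: critical noise from H(m^0) = H(m'),
% reconstructed from the BIC description lengths quoted above.
\begin{align}
  H(m')  &= \tfrac{N}{2}\left[\ln(2\pi\sigma^2) + 1\right]
            + \tfrac{k}{2}\ln N - \ln p(m'), \\
  H(m^0) &= \tfrac{N}{2}\left[\ln\bigl(2\pi(\sigma^2 + \delta_p^2)\bigr) + 1\right]
            + \tfrac{1}{2}\ln N - \ln p(m^0).
\end{align}
% Setting H(m^0) = H(m') at \sigma = \sigma_c:
\begin{equation}
  \frac{N}{2}\,\ln\!\left(1 + \frac{\delta_p^2}{\sigma_c^2}\right)
  = \frac{k-1}{2}\,\ln N + \ln\frac{p(m^0)}{p(m')}.
\end{equation}
% For large N the right-hand side is small compared with N, so
% \ln(1+x) \approx x, giving
\begin{equation}
  \sigma_c^2 \;\approx\;
  \frac{N\,\delta_p^2}{(k-1)\ln N + 2\ln\bigl[p(m^0)/p(m')\bigr]}
  \;\sim\; \delta_p^2\,\frac{N}{(k-1)\ln N},
\end{equation}
% which grows with N: for any fixed noise level, enough data
% eventually makes the true model learnable again.
```

For illustration, with hypothetical values k = 3, N = 400, and δ_p = 1, the approximation gives σ_c ≈ √(400/(2 ln 400)) ≈ 5.8, showing how the learnable phase widens as N grows.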
Discussion
The work bridges statistical learning theory and statistical physics by casting model structure learning as sampling over a discrete configuration space with an energy-like description length. It demonstrates a phase transition in learnability analogous to hard phases seen in satisfiability and Bayesian inference problems. The MDL/BIC framework provides consistent selection and near-optimal generalization in both low- and high-noise limits, while the transition regime is characterized by multiple near-optimal models and degraded generalization. The derived upper bound for the transition noise, based on competition between the true and trivial models, accurately approximates the empirical transition and enables scaling collapse, hinting at universal behavior. The results suggest rich phenomenology when considering interactions between model-structure and parameter-learning transitions, and they offer a principled way to reason about data requirements (N) versus noise to recover interpretable closed-form models.
Conclusion
The paper formalizes the problem of discovering closed-form models from data within a probabilistic MDL/BIC framework, showing that: (i) MDL-based model selection generalizes quasi-optimally; (ii) a learnability transition separates regimes where the true model is identifiable from those where it is not; and (iii) an analytical upper bound accurately estimates the critical noise, enabling scaling collapse across models and dataset sizes. These findings clarify fundamental limits to learning interpretable models under noise and finite data. Future directions include: tightening bounds and characterizing the full landscape of competing models in the transition; studying interactions between structure- and parameter-learning transitions (including noiseless yet unlearnable parameter regimes); extending beyond additive Gaussian noise and to higher-dimensional features; and assessing the expressivity and generalization of closed-form models when the data-generating process is not closed form (e.g., complex PDE solutions).
Limitations
The approach relies on additive Gaussian noise and a BIC (Laplace) approximation, which assumes well-peaked likelihoods and smooth priors; deviations from these assumptions may affect performance. The prior over model structures influences plausibility and the identity of the trivial model; different priors could shift transition estimates. The derived critical noise is an upper bound based on competition with a single trivial model; other models can dominate earlier, especially in the transition region where a Rashomon set emerges. Results are demonstrated on synthetic data with relatively low-dimensional inputs and known priors; real-world generative processes or non-closed-form dynamics may not conform. Sampling with the Bayesian machine scientist approximates the posterior and may miss rare but important models for small N.