Efficient Meta-Learning via Error-based Context Pruning for Implicit Neural Representations

Computer Science

J. Tack, S. Kim, et al.

This paper introduces Error-based Context Pruning (ECoP), a meta-learning technique that scales optimization-based meta-learning of large implicit neural representations (INRs) by selecting context points according to their predictive error during adaptation. The approach improves reconstruction quality and enables learning on high-dimensional signals, with consistent gains across modalities. This research was conducted by Jihoon Tack, Subin Kim, Sihyun Yu, Jaeho Lee, Jinwoo Shin, and Jonathan Richard Schwarz.

Introduction
The paper addresses the scalability challenges of optimization-based meta-learning for implicit neural representations (INRs), especially for high-dimensional signals where the context set size and second-order meta-learning lead to prohibitive memory usage. The research question is whether one can reduce the context set used during adaptation without sacrificing reconstruction quality, thereby enabling longer adaptation horizons and learning on higher-resolution signals. The authors motivate the need by noting that context size grows super-linearly with signal dimensionality, and prior patch-based solutions increase adaptation time and ignore cross-patch statistics. The proposed solution, Error-based Context Pruning (ECoP), aims to maintain performance by adaptively selecting high-error context points online during adaptation while correcting for potential information loss and enabling effective use of full contexts at test time. This is important for making meta-learning practical on large images, videos, audio, and climate data.
Literature Review
The work builds on multiple strands of research: (1) Implicit Neural Representations (INRs) for images, videos, 3D scenes, and audio (e.g., SIREN, NeRF, ACORN), which provide continuous signal representations but are costly to fit. (2) Optimization-based meta-learning for INRs, such as MAML-based Learnit and Transformer-based TransINR, which accelerates fitting but faces memory scaling issues and often requires second-order gradients; first-order alternatives (FOMAML, Reptile) have underperformed on INR tasks. (3) Memory and efficiency in meta-learning, including sparse meta-learning methods and amortized approaches such as Neural/Conditional Neural Processes and Prototypical Networks, which can be modality- or architecture-specific. (4) Data pruning and selection, especially EL2N scores that identify important data early in training, active learning, and continual-learning replay, which inspire selecting high-loss coordinates. The authors position ECoP as an online, error-based context pruning scheme integrated with second-order optimization-based meta-learning for INRs, distinct from one-shot dataset pruning or patch-based approaches, and compatible across modalities and architectures.
Methodology
ECoP is an optimization-based meta-learning framework designed to reduce memory usage via online, error-based context pruning during inner-loop adaptation, complemented by bootstrapped correction and test-time gradient scaling.

Problem setup: Given N signals, each represented by a context set C = {(x_j, y_j)} of M coordinate-value pairs, the goal is to learn a shared initialization θ0 for an INR f_θ that rapidly adapts to each signal. Standard MAML optimizes θ0 such that a few inner steps on C minimize the mean squared reconstruction error, but this is memory-inefficient for large M and typically requires second-order gradients.

Core components:
- Error-based online context pruning: At each inner adaptation step k, compute a per-sample error score R_k(x, y) = ||f_{θ_k}(x) − y||^2. Form C_high^k by selecting the top-γ fraction of samples from the full context set C_full according to R_k, then update θ_{k+1} = θ_k − α ∇_{θ_k} L_MSE(θ_k; C_high^k). Re-ranking and pruning the subset at every step focuses adaptation first on global structure and later on high-frequency details.
- Bootstrapped correction: After K inner steps with pruned contexts yielding θ_K, continue adapting for L additional steps on the full context set C_full to produce a target θ_boot^{K+L}. The meta-objective adds a parameter-distance regularizer μ(θ_K, θ_boot^{K+L}) (squared L2 distance, with stop-gradient on θ_boot^{K+L}) to the reconstruction error of θ_K on C_full: L_total(θ0; C_full) = L_MSE(θ_K; C_full) + λ μ(θ_K, θ_boot^{K+L}). This reduces information loss due to pruning and extends the effective horizon beyond K without backpropagating through the extra L steps.
- Meta-test with the full context set and gradient scaling: Using the full context at test time yields smaller gradient norms than during training, where the pruned sets concentrate high-loss points.
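The two training-time components can be sketched in NumPy on a toy task. This is a minimal illustration, not the authors' code: a noiseless linear model stands in for the SIREN INR, plain gradient descent for the second-order outer loop, and the names `inner_adapt_pruned` and `bootstrapped_target` (plus all toy data) are assumptions of this sketch.

```python
import numpy as np

def mse_grad(theta, X, y):
    """Gradient of the mean squared error for a linear stand-in f(x) = x . theta."""
    return 2.0 * X.T @ (X @ theta - y) / len(y)

def inner_adapt_pruned(theta0, X, y, gamma=0.25, K=4, alpha=0.05):
    """K inner steps; each step re-ranks the context by per-sample error
    R_k(x, y) = (f(x) - y)^2 and updates only on the top-gamma fraction."""
    theta = theta0.copy()
    m = max(1, int(gamma * len(y)))
    for _ in range(K):
        scores = (X @ theta - y) ** 2          # per-sample error R_k
        idx = np.argsort(scores)[-m:]          # C_high^k: top-gamma by error
        theta = theta - alpha * mse_grad(theta, X[idx], y[idx])
    return theta

def bootstrapped_target(theta_K, X, y, L=5, alpha=0.05):
    """L extra steps on the full context from theta_K; the result is treated
    as a fixed (stop-gradient) target in the meta-objective."""
    theta = theta_K.copy()
    for _ in range(L):
        theta = theta - alpha * mse_grad(theta, X, y)
    return theta

# Toy "signal": a noiseless linear regression problem standing in for one INR task.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))
y = X @ rng.normal(size=8)
theta0 = np.zeros(8)

theta_K = inner_adapt_pruned(theta0, X, y)
theta_boot = bootstrapped_target(theta_K, X, y)

# Meta-loss pieces: full-context reconstruction plus distance to the bootstrapped target.
recon = np.mean((X @ theta_K - y) ** 2)
mu = np.sum((theta_K - theta_boot) ** 2)       # squared L2 parameter distance
lam = 100.0
total_meta_loss = recon + lam * mu
```

In the real method, `total_meta_loss` would be differentiated with respect to θ0 through the K pruned inner steps (second order), while the bootstrapped target stays detached.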
To match effective step sizes, the test-time gradient at step k is scaled by the ratio of the pruned-context gradient norm to the full-context gradient norm: g_test^k = (||∇ L(θ_k; C_high^k)|| / ||∇ L(θ_k; C_full)||) ∇ L(θ_k; C_full), and θ_k is updated with g_test^k. A loss-ratio-based scaling variant yields similar gains with lower overhead.

Optimization details:
- The outer loop uses second-order MAML-style updates; the inner loop uses gradient descent with step size α. Bootstrapped target generation does not store second-order gradients (stop-gradient), limiting overhead.
- The sampling ratio γ controls the memory-performance trade-off; γ = 0.25 is typical (LibriSpeech and ERA5 use 0.5; NeRV on UCF-101 uses 0.5 at 128 resolution and 0.2 at 256).
- Other hyperparameters: the inner step count K is scaled roughly by 1/γ to keep memory comparable to baselines; bootstrapping steps L = 5; regularization weight λ = 100; the outer optimizer is Adam with dataset-specific learning rates.

Architectures and datasets:
- Primary INR: SIREN MLPs (5 layers, 256 hidden units; 7 layers for video). NeRV is also used for video experiments.
- Modalities: images (CelebA, Imagenette, Text, ImageNet-100, CelebA-HQ, AFHQ), video (UCF-101 at various resolutions and clip lengths), audio (LibriSpeech), and manifold/climate data (ERA5). Preprocessing includes coordinate normalization and standard resizing/cropping.

Algorithmic overview: For each batch of signals, extract C_full; for k = 0..K−1, compute R_k, select C_high^k = Top-γ(C_full; R_k), and perform an inner update on C_high^k. From θ_K, perform L additional full-context updates to obtain θ_boot^{K+L} (stop-gradient). Compute L_total and update θ0 with its (second-order) gradient. At test time, adapt with the full context using gradient scaling.

Implementation notes: Online pruning resembles EL2N-style selection but is integrated within meta-training to avoid per-signal pretraining costs. Pruning yields significant memory savings per inner step, enabling longer horizons and reducing myopia.
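Test-time gradient scaling admits an equally small sketch, again on a toy linear stand-in rather than an INR; `scaled_test_step` and the 1e-12 stabilizer in the norm ratio are illustrative choices of this sketch, not the paper's implementation.

```python
import numpy as np

def mse_grad(theta, X, y):
    """Gradient of the mean squared error for a linear stand-in f(x) = x . theta."""
    return 2.0 * X.T @ (X @ theta - y) / len(y)

def scaled_test_step(theta, X, y, gamma=0.25, alpha=0.05):
    """One full-context test-time step, rescaled so its magnitude matches the
    pruned-context gradients the initialization was meta-trained with."""
    scores = (X @ theta - y) ** 2                 # per-sample error R_k
    m = max(1, int(gamma * len(y)))
    idx = np.argsort(scores)[-m:]                 # C_high: top-gamma by error
    g_high = mse_grad(theta, X[idx], y[idx])      # pruned-context gradient
    g_full = mse_grad(theta, X, y)                # full-context gradient
    scale = np.linalg.norm(g_high) / (np.linalg.norm(g_full) + 1e-12)
    return theta - alpha * scale * g_full

# Toy task to exercise one scaled update.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 8))
y = X @ rng.normal(size=8)
theta = np.zeros(8)
loss_before = np.mean((X @ theta - y) ** 2)
theta = scaled_test_step(theta, X, y)
loss_after = np.mean((X @ theta - y) ** 2)
```

Because the high-error subset concentrates large residuals, the norm ratio is typically greater than 1, restoring the larger effective step sizes seen during pruned meta-training.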
Key Findings
- Consistent reconstruction improvements across modalities and architectures:
  - Video (UCF-101, 128×128×16):
    - SIREN: Learnit PSNR 25.46 / SSIM 0.720 / LPIPS 0.363; ECoP 26.59 / 0.769 / 0.237.
    - NeRV: Learnit 28.86 / 0.871 / 0.140; ECoP 33.99 / 0.949 / 0.019.
  - Video (UCF-101, 256×256×32):
    - SIREN: Learnit and TransINR run out of memory; ECoP 22.76 / 0.621 / 0.549.
    - NeRV: Learnit 23.75 / 0.659 / 0.422; ECoP 28.58 / 0.834 / 0.207.
  - Audio (LibriSpeech), PSNR (dB): Learnit 39.55 (1 s) / 31.39 (3 s) vs. ECoP 43.40 (1 s) / 36.45 (3 s).
  - Climate/manifold (ERA5, 181×360), PSNR: Learnit 64.91 vs. ECoP 74.10.
- Cross-domain adaptation (trained on UCF-101 128×128×16, adapted elsewhere):
  - CelebA 128×128: Learnit 27.74 dB vs. ECoP 28.45 dB.
  - Imagenette 128×128: 25.18 vs. 26.25 dB.
  - Kinetics-400 128×128×16: 26.42 vs. 27.32 dB.
- Comparison with efficient meta-learners on CelebA 178×178 (SIREN):
  - FOMAML: 25.85 PSNR / 0.669 SSIM / 0.342 LPIPS.
  - Reptile: 33.41 / 0.918 / 0.084.
  - ECoP: 40.54 / 0.975 / 0.005.
- High-resolution capability: ECoP enables meta-learning on 1024×1024 images and 256×256×32 videos on a single GPU where prior methods run out of memory.
- Ablations:
  - Each component (error-based pruning, bootstrapped correction, gradient scaling) contributes to performance; pruning plus scaling alone already outperforms Learnit and random pruning, while bootstrapping stabilizes training on high-loss samples.
  - Longer adaptation horizons significantly improve ECoP, mitigating short-horizon myopia; Learnit shows limited gains with increased steps.
  - Generating the bootstrapped target with the full context yields the best gains (e.g., PSNR 38.72 vs. 37.49 without bootstrapping on CelebA 178×178), indicating information recovery from full contexts beyond just longer horizons.
  - Gradient-norm analysis shows pruned-context gradients can be up to ~3× larger than full-context gradients at certain steps, validating the need for test-time gradient scaling.
- Training-time efficiency: Despite the added bootstrapping and longer horizons (roughly 2× the time per step vs. Learnit under matched memory), in wall-clock comparisons ECoP reaches Learnit's best PSNR more than 2× faster and ultimately attains higher PSNR.
Discussion
The findings demonstrate that online, error-based context pruning can substantially reduce memory usage during optimization-based meta-learning of INRs without compromising, and often improving, reconstruction quality. By focusing inner-loop updates on high-error samples, ECoP first captures global structure and then high-frequency details, yielding better adaptations. The memory savings enable longer adaptation horizons per signal, addressing the myopia inherent in short-horizon meta-learning and delivering stronger meta-initializations. Bootstrapped correction with a full-context target recovers information lost due to pruning and extends the effective learning horizon without incurring second-order costs over the extra steps. Test-time gradient scaling resolves the mismatch in update magnitudes between pruned training and full-context testing, which is critical for realizing the gains at inference. Across images, videos, audio, and climate data, ECoP’s model- and modality-agnostic design shows broad applicability and clear advantages over both second-order (Learnit) and first-order (FOMAML, Reptile) baselines, as well as greater scalability to high-resolution settings previously inhibited by memory constraints.
Conclusion
ECoP introduces a practical, scalable meta-learning framework for INRs that integrates online error-based context pruning, bootstrapped correction to a full-context target, and test-time gradient scaling. This combination delivers substantial reconstruction improvements, enables longer adaptation horizons under fixed memory budgets, and makes meta-learning feasible for high-resolution images and videos on single GPUs. The approach is model-agnostic and straightforward to implement, with consistent benefits across multiple modalities and architectures. Future work includes extending ECoP to scenarios with disjoint context and target sets (e.g., scene rendering), and to extreme resolutions where even a single full forward pass is infeasible. Potential directions involve iterative, hierarchical selection strategies (e.g., tree-search over grids) to locate high-loss regions efficiently.
Limitations
- The current formulation and experiments primarily consider the case where the context set used for inner-loop adaptation is also used for the outer/meta objective; handling disjoint context/target sets is deferred to future work.
- Naively using full contexts at test time degrades performance due to gradient-norm mismatches; explicit gradient scaling is required to retain the gains.
- Training introduces additional computation from longer inner-loop horizons and bootstrapped target generation, although wall-clock analyses show it can still reach higher accuracy faster than baselines.
- Error-based sampling can be unstable when learning only from high-loss examples; bootstrapped correction was found necessary to stabilize and improve training.
- The method relies on second-order optimization during meta-training, which has higher memory/computation requirements than first-order approaches, though pruning mitigates the memory burden.