Identifying Exoplanets with Deep Learning. V. Improved Light Curve Classification for TESS Full Frame Image Observations
E. Tey, D. Moldovan, et al.
Astronet-Triage-v2 is a deep learning model, developed by a team including Evan Tey, Dan Moldovan, and Michelle Kunimoto, that classifies TESS light curves for exoplanet candidate triage. On the test set it reaches 99.6% recall at 39.7% precision with the looser E-score threshold (0.0105), or 97.2% recall at 75.7% precision with the stricter one (0.215). It is already deployed in the Quick-Look Pipeline.
Introduction
The study addresses the challenge of efficiently and reliably distinguishing true exoplanet transit signals from false positives in the vast volume of TESS Full Frame Image (FFI) time series data. Human vetting, historically central to exoplanet candidate classification, is slow and inconsistent, making it impractical for the scale of TESS data. The research aims to improve automated triage by enhancing a deep neural network classifier, Astronet-Triage, to reduce losses of viable planet candidates while discarding more false positives. The context includes prior reliance on human inspection and automated tools (e.g., Kepler Robovetter, Autovetter) and the success of convolutional neural networks (e.g., Astronet) in similar tasks for Kepler, K2, and TESS. The purpose is to deliver a more accurate, generalizable, and informative classifier, Astronet-Triage-v2, trained on a larger, high-quality, human-labeled dataset from TESS FFIs, thereby improving candidate recovery and streamlining the Quick-Look Pipeline (QLP).
Literature Review
Automated vetting in exoplanet surveys has evolved from decision trees (Kepler Robovetter) and random forests (Autovetter) to convolutional neural networks such as Astronet. Astronet was first applied to Kepler and later adapted to K2 and TESS, with various studies introducing new inputs and data representations to improve performance. For TESS, Yu et al. (2019) developed Astronet-Triage for FFI data as a binary transit-like versus non-transit-like classifier. Other ML efforts have targeted two-minute SPOC postage-stamp data (e.g., Osborn 2020; Rao 2021; Valizadegan 2021; Fiscale 2021) and detection approaches (Pearson 2018; Zucker & Giryes 2018; Cui 2021). The TESS FFI context presents higher false positive rates due to larger pixels and shorter baselines. This work builds on Astronet-Triage by expanding training data across Sectors 1–39, using multi-vetter labels and a richer set of input views, and retaining multi-label outputs (E, S, B, J, N) to support nuanced triage.
Methodology
Data: The data set comprises ~24,926 human-vetted threshold crossing events (TCEs) from TESS FFIs, detected by the QLP across Sectors 1–39 (Primary Mission: 30 min cadence; 1st Extended Mission: 10 min cadence). TCEs were selected in three batches: Y1 (Sectors 1–13, ~8,992 TCEs), Y2 (Sectors 14–26, ~13,372 TCEs, brightest targets), and Y3 (Sectors 27–39, ~2,588 TCEs). QLP light curves were extracted from five circular apertures, detrended per orbit with basis splines (spacing 0.3–1.5 d), merged across sectors, and searched with Box Least Squares (BLS). A TCE required signal-to-pink-noise > 9 and BLS peak significance > 5 (T < 12) or > 9 (T > 12); signals with a/R* < 1 were filtered out as non-physical.
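The selection cuts described above can be sketched as a simple filter. This is a minimal illustration, not the QLP implementation: the `TCE` class and its field names are hypothetical, and treating T = 12 exactly as a faint target is an assumption, since the summary does not state which side that boundary falls on.

```python
from dataclasses import dataclass


@dataclass
class TCE:
    """Hypothetical container for the quantities used by the cuts."""
    tmag: float              # TESS magnitude T
    spn: float               # signal-to-pink-noise ratio
    bls_significance: float  # BLS peak significance
    a_over_rstar: float      # scaled semi-major axis a/R*


def passes_selection(tce: TCE) -> bool:
    """Apply the thresholds quoted in the text to one TCE."""
    # Bright targets (T < 12) use the looser significance cut.
    # Assumption: T == 12 is treated like the faint case.
    significance_min = 5.0 if tce.tmag < 12.0 else 9.0
    return (
        tce.spn > 9.0
        and tce.bls_significance > significance_min
        and tce.a_over_rstar >= 1.0  # a/R* < 1 is non-physical
    )
```

For example, a T = 10 target with significance 6 passes, while the same signal on a T = 13 target does not.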
Labels: Five labels were assigned via multi-vetter visual assessment: E (periodic eclipsing signals, including planets and non-contact EBs), S (single-transit or incorrect/aliased periods), B (contact EBs), J (junk: stellar variability/instrumental effects), N (not sure). Ambiguity rules favored E over S for borderline period errors and B over S for contact EBs with incorrect periods. Each week, 3–7 authors vetted TCEs independently and resolved disagreements by consensus; fractional weights were used for non-unanimous cases.
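One plausible way to turn non-unanimous votes into fractional label weights is to use the share of vetters who chose each label. The summary does not give the exact weighting formula, so the function below is an assumption for illustration only, and `label_weights` is a hypothetical name.

```python
from collections import Counter

LABELS = ("E", "S", "B", "J", "N")


def label_weights(votes):
    """Return the fraction of vetters who chose each label.

    Assumption: a label's weight is simply its vote share; the
    study's actual weighting scheme may differ.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    counts = Counter(votes)
    unknown = set(counts) - set(LABELS)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    total = len(votes)
    # Counter returns 0 for labels nobody voted for.
    return {label: counts[label] / total for label in LABELS}
```

With three vetters voting E, E, and J, this gives E a weight of 2/3, J a weight of 1/3, and the remaining labels zero.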
Preprocessing and inputs: Transits were masked in the raw flux using the BLS ephemerides, and three detrended light curves were produced with basis splines of spacing 0.3 d, 5.0 d, and a BIC-optimized value. For each detrended light curve, seven robustly binned and normalized views were generated:
- Global: 201 bins, with per-bin standard deviation, empty-bin mask, and in-transit mask.
- Local: ±2 durations, 61 bins, with standard deviation and normalization scale.
- Secondary: local view around the best out-of-transit secondary, with scale and secondary phase.
- Local Half-Period: folded at P/2, standard deviation only.
- Global Double Period: folded at 2P.
- Sample Global Segments: up to 7 individual folds, 201 bins each.
- Sample Local Segments: up to 8 local folds, 61 bins each.
Exposure-time weighting accounted for cadence changes. Scalar features included P, duration, depth, number of observed folds (log-scaled, capped at 100), TESS magnitude, stellar mass and radius (from TIC v8.2; ~2,400 radii estimated via Gaia-based SED and MIST relations when missing), number of light-curve points, secondary phase, and normalization scale factors, all normalized to zero mean and unit variance as appropriate.
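The Global and Local views can be illustrated with a short NumPy sketch: fold the light curve on the BLS period, then take a per-bin median. This is a simplified illustration, not the study's code. Using the median as the "robust" statistic is an assumption, and normalization, per-bin standard deviation, exposure-time weighting, and the other five views are omitted. All function names are hypothetical.

```python
import numpy as np


def phase_fold(time, period, t0):
    """Phase in days, centered on transit: values in [-period/2, period/2)."""
    return (time - t0 + 0.5 * period) % period - 0.5 * period


def binned_view(phase, flux, lo, hi, num_bins):
    """Median-bin flux over [lo, hi); also return an empty-bin mask."""
    edges = np.linspace(lo, hi, num_bins + 1)
    # digitize returns 1..num_bins for in-range points; shift to 0-based.
    idx = np.digitize(phase, edges) - 1
    view = np.zeros(num_bins)
    empty = np.ones(num_bins, dtype=bool)
    for b in range(num_bins):
        in_bin = flux[idx == b]
        if in_bin.size:
            view[b] = np.median(in_bin)
            empty[b] = False
    return view, empty


def global_and_local_views(time, flux, period, t0, duration):
    """Build a 201-bin global view and a 61-bin local view (+/- 2 durations)."""
    phase = phase_fold(time, period, t0)
    g_view, g_empty = binned_view(phase, flux, -period / 2, period / 2, 201)
    l_view, l_empty = binned_view(phase, flux, -2 * duration, 2 * duration, 61)
    return {"global": g_view, "global_empty": g_empty,
            "local": l_view, "local_empty": l_empty}
```

Because the fold depends directly on `period` and `t0`, an error in the BLS ephemeris smears the transit across bins, which is the sensitivity noted under Limitations.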
Neural network: The model is a convolutional neural network derived from Astronet. The time-series views are grouped into multi-channel 1D inputs and passed through separate convolutional towers (ReLU activations with pooling); the outputs are flattened, concatenated with the scalar features, and processed by a fully connected tower with dropout regularization. The final layer outputs five sigmoid scores (for E, S, B, J, N) and is trained with binary cross-entropy, which allows multi-label scoring, using label weights that reflect vetter consensus. Training used the Adam optimizer for 20,000 steps, with hyperparameters tuned using Vizier. No data augmentation was applied. Final predictions come from an ensemble of 10 independently trained models: if any model predicts E above threshold, the ensemble outputs E; otherwise the label is chosen by majority vote among the models' label predictions.
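The ensemble rule described above can be sketched as follows. Two details are assumptions, because the summary does not specify them: each model's label prediction is taken as the argmax of its five scores, and ties in the majority vote go to the label that comes first in the order E, S, B, J, N. The default threshold of 0.215 is one of the E-score thresholds quoted in the results, used here only as an example.

```python
import numpy as np

LABELS = ("E", "S", "B", "J", "N")


def ensemble_label(scores, e_threshold=0.215):
    """Combine per-model scores of shape (n_models, 5) into one label."""
    scores = np.asarray(scores, dtype=float)
    if scores.ndim != 2 or scores.shape[1] != len(LABELS):
        raise ValueError("scores must have shape (n_models, 5)")
    # Rule 1: any single model scoring E above threshold wins outright.
    if np.any(scores[:, 0] > e_threshold):
        return "E"
    # Rule 2 (assumed details): each model votes for its top-scoring
    # label; ties resolve to the earliest label in LABELS.
    votes = np.argmax(scores, axis=1)
    counts = np.bincount(votes, minlength=len(LABELS))
    return LABELS[int(np.argmax(counts))]
```

The "any model" rule deliberately favors recall: one confident E vote is enough to keep a candidate, even if the other nine models call it junk.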
Evaluation: The data were split into training, validation, and test sets. Metrics emphasized precision and recall for the E label, with the area under the precision-recall curve (AUC-PR) used for threshold-independent comparison. Generalization was assessed on held-out 1st Extended Mission Sector 33 TCEs (1,349 TCEs labeled by a single vetter for this test, with S-labeled TCEs excluded for direct comparison to Astronet-Triage), split by camera and magnitude. Performance was also assessed on the TOI catalog (4,140 PCs and Ps through Sector 47) using E-score thresholds.
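As a reminder of how precision and recall depend on the E-score threshold, here is a minimal NumPy sketch. The function name and the toy scores are made up for illustration and are not from the study.

```python
import numpy as np


def precision_recall_at(e_scores, is_e, threshold):
    """Precision and recall for the E label at a given score threshold."""
    e_scores = np.asarray(e_scores, dtype=float)
    is_e = np.asarray(is_e, dtype=bool)
    predicted = e_scores > threshold
    tp = int(np.sum(predicted & is_e))    # true E, kept
    fp = int(np.sum(predicted & ~is_e))   # not E, kept
    fn = int(np.sum(~predicted & is_e))   # true E, discarded
    # Convention: report 1.0 when a denominator is empty.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return precision, recall
```

Lowering the threshold keeps more TCEs, which raises recall at the cost of precision. On toy scores `[0.9, 0.5, 0.1, 0.02, 0.001]` with true labels `[1, 1, 0, 1, 0]`, a threshold of 0.215 gives precision 1.0 and recall 2/3, while 0.0105 gives precision 0.75 and recall 1.0.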
Key Findings
- Validation performance: AUC-PR = 0.977. At threshold 0.0105: 100% recall at 41% precision. At threshold 0.215: 96.9% recall at 79.8% precision.
- Test performance: AUC-PR = 0.965. At threshold 0.0005: 100% recall at 15% precision. With thresholds from validation: at 0.0105, 99.6% recall at 39.7% precision; at 0.215, 97.2% recall at 75.7% precision.
- Generalization to 1st Extended Mission (Sector 33, excluding S-labeled TCEs; 1,315 TCEs): Astronet-Triage-v2 outperforms Astronet-Triage across the precision-recall curve, with AUC-PR 0.961 vs 0.927. Models trained only on the Y1, Y2, or Y3 subset achieved AUC-PR 0.954, 0.960, and 0.917, respectively. The Y1-only and Y2-only models, which saw no 1st Extended Mission training data, therefore still generalize well.
- Sector 33 camera-wise metrics (precision, recall):
  - Astronet-Triage-v2 at threshold 0.0105: Cam 1: (0.64, 0.98); Cam 2: (0.53, 1.00).
  - Astronet-Triage-v2 at threshold 0.215: Cam 1: (0.89, 0.91); Cam 2: (0.84, 0.99).
  - Astronet-Triage at threshold 0.08: Cam 1: (0.89, 0.85); Cam 2: (0.82, 0.90).
- TOI catalog recovery (4,140 PCs/Ps): At precision-matched thresholds (Astronet-Triage 0.09 vs Astronet-Triage-v2 0.2), Astronet-Triage-v2 recovers 3,577 TOIs vs 3,349 for Astronet-Triage, a gain of 228 planet candidates at equal precision. 93% of TOIs have E-score > 0.0105; 86% pass at 0.215. Recall is higher for confirmed/validated planets than for planet candidates.
- Operational impact: Astronet-Triage-v2 deployed in the QLP starting Sector 34, improving candidate triage without increasing human vetting load.
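The TOI recovery figures above can be checked with a few lines of arithmetic. The counts are taken from the findings; the variable names are illustrative.

```python
# Counts quoted in the TOI catalog comparison.
total_tois = 4140
recovered_v2 = 3577   # Astronet-Triage-v2 at threshold 0.2
recovered_v1 = 3349   # Astronet-Triage at threshold 0.09

extra_candidates = recovered_v2 - recovered_v1
recall_v2 = recovered_v2 / total_tois
recall_v1 = recovered_v1 / total_tois

print(f"extra TOIs recovered: {extra_candidates}")
print(f"v2 recovery fraction: {recall_v2:.1%}")
print(f"v1 recovery fraction: {recall_v1:.1%}")
```

This works out to 228 additional TOIs, with recovery fractions of about 86.4% for Astronet-Triage-v2 and 80.9% for Astronet-Triage.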
Discussion
The results demonstrate that Astronet-Triage-v2 substantially improves the triage of TESS FFI transit-like signals, achieving higher recall at equal or better precision than Astronet-Triage and thus reducing the number of viable planet candidates lost during automated screening. The expanded and higher-quality training set, multi-view detrending and folding, additional scalar features, and multi-label supervision enable better discrimination of eclipsing signals from noise/systematics and contact binaries. The strong AUC-PR and favorable precision-recall trade-offs on both validation/test sets and held-out Extended Mission data (Sector 33) confirm good generalization despite changes in cadence and noise characteristics. Application to the TOI catalog shows that, at the same precision, Astronet-Triage-v2 recovers hundreds more candidates, directly addressing the study’s goal of preserving planet candidates while controlling false positives. Limitations in precision are linked to ambiguous morphologies (e.g., borderline E vs B, noisy transits amid stellar variability), sensitivity to BLS period/duration errors that can distort phase-folded representations, and the inherently lossy binning/folding process. Compared with prior methods, including the original Astronet-Triage, the new model’s broader inputs and richer labeling scheme improve robustness and interpretability, supporting its adoption in QLP and laying groundwork for more automated and statistically characterizable planet catalogs.
Conclusion
Astronet-Triage-v2, a convolutional neural network trained on ~25,000 high-quality, human-labeled TESS FFI TCEs (Sectors 1–39), improves light curve classification for exoplanet candidate triage. The model uses three detrending strategies, 21 distinct phase-folded and segment views, and auxiliary scalar features, and outputs five label scores (E, S, B, J, N). It achieves state-of-the-art precision-recall performance on validation/test data and generalizes well to 1st Extended Mission observations, outperforming Astronet-Triage and recovering more TOIs at matched precision. Deployed in the QLP from Sector 34 onward, Astronet-Triage-v2 reduces losses of viable candidates without increasing vetting burden. Future work will target fully automated, uniform FFI planet vetting suitable for demographics, including distinguishing planets from eclipsing binaries, improved handling of BLS ephemeris errors, leveraging TOI vetting labels, and data augmentation to mitigate class imbalance and enhance robustness.
Limitations
- The classifier does not distinguish planets from non-contact eclipsing binaries; both are labeled E, limiting immediate planet specificity.
- Sensitivity to inaccuracies in BLS-derived period and duration can distort detrending and phase-folded views, degrading classification and causing false negatives, especially over multi-year baselines.
- Phase folding and binning are lossy representations that may remove informative time-domain features.
- Class imbalance, particularly the scarcity of S-labeled examples, can affect training; no data augmentation was used in this work.
- Differences between SPOC two-minute and QLP binned light curves imply some TOIs may be undetectable in FFI-based inputs, affecting TOI recall comparisons.
- Training set selections (e.g., brightness cuts, sector coverage) may introduce distributional biases; some CCDs/sectors were excluded due to data unavailability during vetting.