Predicting standardized uptake value of brown adipose tissue from CT scans using convolutional neural networks

E. Erdil, A. S. Becker, et al.

This study by Ertunc Erdil, Anton S. Becker, and colleagues uses convolutional neural networks to identify active brown adipose tissue (BAT) from unenhanced CT scans. In cold-exposure cohorts, the approach segments active BAT more accurately than conventional HU thresholding, which could make large-scale BAT imaging cheaper and less dependent on PET.

Introduction
Personalized treatment strategies are increasingly relevant for metabolic diseases, where genetic and phenotypic heterogeneity influences both disease risk and response to therapies (e.g., GLP-1 agonists, ACE inhibitors). Adipose tissue biology, particularly the role of brown adipose tissue (BAT) in non-shivering thermogenesis, has emerged as a promising target for stratified and personalized approaches. The presence and activity of BAT are highly heterogeneous across the population and may have genetic underpinnings; higher BAT activity is associated with more favorable cardiometabolic profiles. However, large-scale validation of these associations is limited by the standard measurement method—[18F]-FDG PET/CT—which is costly and involves ionizing radiation, limiting feasibility for population-level studies. Prior work indicates that BAT exhibits higher CT density (HU) than white adipose tissue, and supraclavicular BAT HU correlates with PET-measured standardized uptake value (SUV), motivating computational approaches to infer BAT activity from CT alone. This study investigates whether convolutional neural networks (CNNs) can predict PET SUV of BAT from unenhanced CT in predefined BAT-rich regions, enabling segmentation and subject stratification without PET.
Literature Review
Prior studies have established: (1) BAT’s distinct physiological role and links to metabolic health; (2) imaging characteristics differentiating BAT from white adipose tissue, including higher CT attenuation (HU) in BAT; and (3) moderate correlations between supraclavicular BAT HU and PET SUV (e.g., R≈0.66 in cold-stimulated settings). Population studies have shown voxel-wise HU–SUV associations in BAT depots, suggesting potential for predictive modeling. However, PET/CT-based assessments are constrained by radiation and cost. Conventional CT-based BAT segmentation uses HU thresholding (e.g., −180 to −10 HU) but suffers from limited accuracy and false positives. Deep learning, particularly CNNs and U-Net/Attention U-Net architectures, has demonstrated strong performance in medical image prediction and segmentation tasks without manual feature engineering. The literature also highlights domain shift challenges for CNNs across institutions and acquisition protocols, necessitating careful evaluation of intra-/inter-cohort generalization. Moreover, work comparing cold-stimulated versus thermoneutral conditions indicates HU varies little while SUV changes substantially with activation, implying potential label ambiguity when training on non-stimulated cohorts.
Methodology
Design: Train CNNs to predict PET SUV maps of BAT from unenhanced CT in regions likely to contain BAT, focusing on the supraclavicular depot. The predicted PET is then used to segment active BAT and to derive BAT volumes for downstream classification and stratification.
Datasets: Four cohorts of paired [18F]-FDG PET/CT scans (841 scans from 718 subjects in total): two cold-exposure interventional cohorts (Basel: 32 volumes; Granada: 244 volumes) and two retrospective clinical cohorts without controlled cold stimulation (Zurich: 480; MSKCC: 85). Demographics and acquisition protocols follow the individual cohort descriptions. The cold-exposure protocols ensured BAT activation; the clinical cohorts followed standard fasting and uptake procedures without controlled cooling.
Pre-processing: Convert CT to HU using the DICOM rescale slope and intercept; convert PET from MBq/mL to SUV. Resample PET and CT to 0.976×0.976×1.5 mm³ (the Granada default). Crop a 320×480×C volume around the supraclavicular region, where C varies per subject based on expert selection. Normalize CT per volume using min–max scaling between the minimum and the 99th percentile; normalize PET per cohort by the 99th percentile SUV.
Model: 2D Attention U-Net operating on axial slices. Inputs are CT slices from the cropped ROI; targets are the corresponding PET SUV slices. The 2D predictions are stacked to form 3D predicted PET volumes. The network is trained with MSE loss at a learning rate of 0.003 for 1000 epochs. Data augmentation: translation, rotation, random cropping, scaling, and horizontal/vertical flips.
Training/evaluation protocol: For each dataset, create 5 random splits with non-overlapping train/validation/test sets, holding out approximately 20% for validation and 20% for testing in each split. Train 5 models per dataset (one per split) and report averages. Split sizes (Train/Val/Test): Basel 20/6/6; Granada 148/48/48; Zurich 282/94/94; MSKCC 51/17/17.
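The two normalization steps in the pre-processing description can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it works on flat Python lists rather than image arrays, the helper names (`percentile`, `normalize_ct`, `normalize_pet`) are hypothetical, the percentile uses linear interpolation, and values above the 99th percentile are left unclipped because the summary does not say how they are handled.

```python
def percentile(values, q):
    """Linearly interpolated percentile (q in [0, 100]) of a non-empty sequence."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def normalize_ct(hu_values):
    """Min-max scale ONE CT volume between its minimum and its 99th percentile.

    Assumption: voxels above the 99th percentile are not clipped, so they
    map to values slightly greater than 1.
    """
    lo = min(hu_values)
    span = percentile(hu_values, 99) - lo
    if span == 0:
        return [0.0 for _ in hu_values]
    return [(v - lo) / span for v in hu_values]


def normalize_pet(cohort_volumes):
    """Divide every PET volume of a cohort by the cohort-wide 99th percentile SUV."""
    p99 = percentile([v for volume in cohort_volumes for v in volume], 99)
    if p99 == 0:
        raise ValueError("cohort 99th percentile SUV is zero")
    return [[v / p99 for v in volume] for volume in cohort_volumes]
```

Note the asymmetry the text describes: the CT statistics are computed per volume, whereas the PET scale factor is shared by all volumes of a cohort, which keeps SUVs comparable between subjects of the same cohort.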
Segmentation of active BAT: Threshold PET volumes at SUV=1.5, then mask with the expert-drawn supraclavicular ROI to suppress non-depot false positives. The same procedure is applied to both ground-truth PET and predicted PET.
Baseline comparator: HU thresholding on CT using −180 to −10 HU to estimate BAT regions, followed by ROI masking to reduce false positives.
Metrics and analyses: Dice score for the overlap between predicted- and PET-derived active BAT segmentations. Inter-cohort tests: train on one cold-exposure cohort and test on the other. Statistical significance is assessed via permutation tests. Subjects are classified as BAT+ or BAT− using BAT volumes derived from predicted PET versus HU thresholding, and AUC is computed across a range of BAT volume thresholds used for ground-truth labeling. Stratification experiment: form cohorts of 50 BAT+ or 50 BAT− subjects based on predicted BAT volumes, repeated 100 times to estimate the average number of mistakenly included subjects.
Additional analyses: relationship between Dice and predicted/target BAT volume; correlation of Dice with BMI; qualitative visualization; exploration of other BAT depots in supplementary material.
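The segmentation rule, the HU baseline, and the Dice metric described above can be sketched in a few lines. This is an illustrative sketch on flat lists of voxel values, not the study's implementation; the function names are hypothetical, and two details are assumptions because the summary does not state them: the SUV threshold is treated as inclusive (`>=`), and Dice is defined as 1.0 when both masks are empty.

```python
SUV_THRESHOLD = 1.5            # from the text: active BAT is SUV thresholded at 1.5
HU_WINDOW = (-180.0, -10.0)    # from the text: baseline HU range for BAT


def segment_by_suv(suv, roi, threshold=SUV_THRESHOLD):
    """Active-BAT mask: SUV at or above threshold AND inside the expert ROI.

    Assumption: inclusive comparison (>=); the summary only says "SUV=1.5".
    """
    return [bool(r) and s >= threshold for s, r in zip(suv, roi)]


def segment_by_hu(hu, roi, window=HU_WINDOW):
    """Baseline mask: CT value inside the HU window AND inside the expert ROI."""
    lo, hi = window
    return [bool(r) and lo <= h <= hi for h, r in zip(hu, roi)]


def dice(mask_a, mask_b):
    """Dice overlap 2|A∩B| / (|A| + |B|) between two boolean masks.

    Assumption: two empty masks count as perfect agreement (1.0).
    """
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy example with four voxels; the last voxel lies outside the ROI.
roi = [1, 1, 1, 0]
target = segment_by_suv([2.0, 1.0, 1.6, 3.0], roi)     # from ground-truth PET
predicted = segment_by_suv([1.7, 1.6, 1.0, 2.0], roi)  # from CNN-predicted PET
baseline = segment_by_hu([-100.0, -5.0, -50.0, -100.0], roi)
```

Applying the same threshold and ROI mask to both the ground-truth and the predicted PET, as the text specifies, means the Dice score compares like with like; the HU baseline differs only in the voxel-level rule.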
Key Findings
- Intra-cohort segmentation (Dice): CNNs substantially outperform HU thresholding in the cold-exposure cohorts, while both methods perform poorly in the clinical cohorts:
  • Basel: CNN 0.745 vs HU 0.427 (~75% relative improvement), p << 1%.
  • Granada: CNN 0.521 vs HU 0.421 (~23% improvement), p << 1%.
  • Clinical cohorts: Zurich HU 0.248 vs CNN 0.189; MSKCC HU 0.115 vs CNN 0.130 (very low accuracy for both methods).
- Inter-cohort segmentation: Training on one cold-exposure cohort and testing on the other shows partial generalization:
  • Basel-trained → Granada test: Dice 0.486 (~15% better than the HU thresholding baseline on Granada).
  • Granada-trained → Basel test: Dice 0.538 (~25% better than the HU thresholding baseline on Basel).
  • Permutation tests: Granada model, intra vs inter, not significant (p=0.77); Basel model, significant degradation (p << 1%).
- Classification of BAT activity (AUC): Using CNN-predicted BAT volume improves intra-cohort AUC from ~0.6 (HU thresholding) to ~0.8 (CNN). Inter-cohort AUCs are high and broadly comparable across methods over a range of volume thresholds.
- Stratification utility: When selecting 50 subjects predicted as BAT+, CNN-based preselection mistakenly includes ~17 BAT− subjects versus ~27 with HU thresholding (~37% reduction in misassignments). Similar reductions are observed when selecting BAT− cohorts.
- Segmentation accuracy scales with BAT volume: In Granada intra-cohort, restricting to predicted BAT volume >20 ml increases average Dice from 0.521 to 0.598; >40 ml increases it to 0.698.
- BMI confounding is minimal: The Pearson correlation between Dice and BMI is −0.131 (Granada) and −0.076 (Basel), indicating negligible influence of adiposity on predictive accuracy.
- Qualitatively, predicted PET aligns well with ground truth in higher-activity cases; over-prediction occurs in low-activity cases, especially with training-set bias toward high BAT activity.
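The relative improvements quoted in the Key Findings follow directly from the reported Dice scores and misassignment counts. The short check below re-derives them; the helper names are hypothetical, and the inputs are simply the figures listed above.

```python
def relative_improvement(new, baseline):
    """Percentage gain of `new` over `baseline` (higher is better)."""
    return 100.0 * (new - baseline) / baseline


def relative_reduction(new, baseline):
    """Percentage drop of `new` below `baseline` (lower is better)."""
    return 100.0 * (baseline - new) / baseline


# HU-thresholding Dice baselines on the two cold-exposure cohorts.
HU_DICE = {"Basel": 0.427, "Granada": 0.421}

# Intra-cohort: CNN trained and tested on the same cohort.
basel_intra = relative_improvement(0.745, HU_DICE["Basel"])      # about 74.5%
granada_intra = relative_improvement(0.521, HU_DICE["Granada"])  # about 23.8%

# Inter-cohort: CNN trained on one cohort, compared with HU on the test cohort.
basel_to_granada = relative_improvement(0.486, HU_DICE["Granada"])  # about 15.4%
granada_to_basel = relative_improvement(0.538, HU_DICE["Basel"])    # about 26.0%

# Stratification: ~17 vs ~27 mistakenly included subjects out of 50.
misassignment_drop = relative_reduction(17, 27)                     # about 37.0%
```

The computed values agree with the rounded figures in the list to within about a percentage point.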
Discussion
Findings show that CNNs trained on cold-stimulated paired PET/CT can predict PET SUVs of BAT from CT in the supraclavicular region with significantly improved segmentation performance over HU thresholding and with useful discrimination between BAT+ and BAT− subjects. The method enables CT-only preselection for stratified cohorts, reducing unnecessary PET scans. Training on retrospective clinical cohorts without controlled cold exposure performed poorly due to HU–SUV label ambiguity: HU does not substantially differ between active and inactive BAT, whereas SUV increases with activation; thus similar CT appearances map to divergent PET activities, destabilizing learning. Dataset bias toward high BAT activity in cold-exposure cohorts leads to over-predictions in low-activity subjects. Inter-cohort generalization is asymmetric: the larger, more diverse Granada model generalizes better to Basel than vice versa, underscoring the need for balanced training sets across activity levels and harmonized cohort characteristics (e.g., lifestyle/cooling protocols) to improve robustness. Operationally, performance improves with larger BAT volumes, suggesting the use of an operating threshold on predicted BAT volume to obtain more reliable segmentations when precise delineation is required. Correlation analyses indicate that model accuracy is not materially confounded by BMI. While results extend to additional BAT depots (shown in supplementary material), the approach should not be assumed to generalize to predicting PET activity in other tissues (e.g., tumors) without validated CT signatures linking structure to metabolic function. 2D CNNs were chosen due to data size; 3D models and transformer-based architectures could yield further gains given sufficiently large, high-quality, cold-exposed datasets.
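The operating threshold on predicted BAT volume mentioned above could be applied as in the sketch below. It assumes the resampled voxel spacing of 0.976×0.976×1.5 mm³ from the Methodology and uses the 20 ml cut-off from the Key Findings as an example default; the function names and the flat boolean mask representation are hypothetical.

```python
VOXEL_SPACING_MM = (0.976, 0.976, 1.5)  # resampled spacing stated in the Methodology


def voxel_volume_ml(spacing=VOXEL_SPACING_MM):
    """Volume of one voxel in millilitres (1 ml = 1000 mm^3)."""
    dx, dy, dz = spacing
    return dx * dy * dz / 1000.0


def bat_volume_ml(mask, spacing=VOXEL_SPACING_MM):
    """Active-BAT volume of a boolean mask: voxel count times voxel volume."""
    return sum(1 for m in mask if m) * voxel_volume_ml(spacing)


def passes_volume_threshold(mask, min_volume_ml=20.0):
    """True if the predicted BAT volume exceeds the chosen operating threshold.

    The 20 ml default is only an example taken from the reported analysis;
    a deployment would pick its own trade-off between coverage and accuracy.
    """
    return bat_volume_ml(mask) > min_volume_ml


# At this spacing, 14,000 active voxels are just over 20 ml.
large_mask = [True] * 14000
small_mask = [True] * 13000
```

Raising the threshold keeps fewer subjects but, according to the reported Granada results, yields a higher average Dice among those that remain.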
Conclusion
CNNs (Attention U-Net) can predict PET SUV activity of supraclavicular BAT from unenhanced CT, enabling accurate segmentation of active BAT and effective classification of subjects into BAT+ and BAT− using CT alone. Compared to HU thresholding, CNNs improve segmentation Dice by ~23–75% in cold-exposed cohorts and increase classification AUC to ~0.8, supporting cost-effective, low-radiation preselection for stratified cohorts. Inter-cohort tests demonstrate partial generalization when training on cold-exposure data. Future work should (1) assemble larger, balanced, cold-stimulated paired PET/CT datasets; (2) standardize cohort characteristics to reduce domain shift; (3) explore 3D CNNs and transformer-based models; (4) refine operating strategies (e.g., volume thresholds) for reliable deployment; and (5) extend analyses to additional BAT depots with rigorous validation.
Limitations
- Poor performance and training instability on non–cold-stimulated clinical cohorts due to HU–SUV ambiguities (inactive vs active BAT with similar HU but different SUV).
- Domain generalization limitations across cohorts; performance degrades with distribution shifts in imaging protocols and population characteristics.
- Dataset bias toward high BAT activity in cold-exposure cohorts causes over-prediction in low-activity subjects.
- Use of 2D CNNs limits exploitation of 3D contextual information; small to medium dataset sizes precluded effective 3D or transformer training.
- Heterogeneity in cohort characteristics (e.g., activity level, cold exposure protocols, demographics) was not harmonized, complicating fair generalization assessments.
- Reliance on expert-defined ROI cropping and SUV thresholding (1.5) for segmentation may introduce procedural bias.
- Small sample size in the Basel cohort relative to Granada; datasets are not publicly available (privacy constraints), potentially limiting external validation.