Deep learning-enabled point-of-care sensing using multiplexed paper-based sensors

Z. S. Ballard, H. Joung, et al.

Researchers Zachary S. Ballard, Hyou-Arm Joung, Artem Goncharov, Jesse Liang, Karina Nugroho, Dino Di Carlo, Omai B. Garner, and Aydogan Ozcan present a deep learning framework for high-sensitivity C-reactive protein (hsCRP) testing. Paired with a low-cost, paper-based vertical flow assay and a mobile-phone reader, it targets point-of-care cardiovascular disease risk testing and agreed closely with a clinical reference analyzer in blind testing (R²=0.95; average CV=11.2%).

Introduction

The study addresses the challenge of improving the accuracy and robustness of point-of-care (POC) diagnostics, particularly paper-based immunoassays, which often suffer from limited sensitivity and specificity and from high variability caused by low-cost materials and operational constraints. The authors hypothesize that computational sensing with deep learning can jointly optimize multiplexed sensor design and inference algorithms to accurately quantify biomarkers from noisy, variable signals. As a use case, they target high-sensitivity C-reactive protein (hsCRP) quantification (0–10 mg/L) for cardiovascular disease (CVD) risk stratification, which demands high accuracy near the clinical cutoffs (1 and 3 mg/L) and is susceptible to the hook effect because of CRP's wide dynamic range. The purpose is to create a low-cost, rapid, mobile, paper-based vertical flow assay (VFA) platform and a deep learning framework that (1) select robust subsets of spatially multiplexed immunoreaction spots and conditions, (2) infer CRP concentration accurately, and (3) extend the dynamic range and mitigate the hook effect, enabling reliable POC hsCRP testing.

Literature Review

The paper situates its work within prior advances in deep learning for medical imaging (tumor detection, virtual staining, super-resolution) and in mobile/POC diagnostics. Traditional paper-based rapid diagnostic tests (RDTs) for infectious diseases are affordable and easy to use but are limited by reagent stability, fabrication variability, matrix effects, and the hook effect, which can cause falsely low readings at high analyte concentrations. Prior POC demonstrations that achieved accurate hsCRP quantification generally relied on fluorescent or chemiluminescent assays and benchtop readers rather than simple colorimetric formats. Recent machine learning approaches have enhanced POC assays, including neural networks for Lyme disease diagnostics using a VFA. However, quantitative protein biomarker measurement at the POC with colorimetric multiplexed assays, combined with a data-driven joint optimization of sensing hardware and computational algorithms, remains an unmet need, which this work addresses.

Methodology

Study design and platform: The authors developed a multiplexed paper-based vertical flow assay (VFA) contained in a 3D-printed cassette, imaged by a custom mobile-phone reader. The sensing membrane is a nitrocellulose (NC) substrate patterned with up to 81 spatially isolated immunoreaction spots (9×9 grid, 1.3 mm pitch). Each spot belongs to one of seven spotting conditions (capture protein plus buffer), distributed via an algorithm to avoid positional bias from flow non-uniformities.

Assay operation: The hsCRP assay uses 5 µL serum diluted 10× in running buffer (3% Tween 20, 1.6% BSA in PBS) to 50 µL, mixed 1:1 with 50 µL Au nanoparticle (AuNP)-secondary Ab conjugate. Steps: (1) 200 µL running buffer preload; (2) add 100 µL total sample/conjugate mix; (3) 400 µL wash. After a 10-minute wait, the cassette is opened and the NC membrane is imaged with the Android camera (ISO 50, 1/125 s, autofocus). The assay time is under 12 minutes.

Fabrication: Proteins were dispensed (0.1 µL/spot) onto NC using an automated liquid dispenser (Formulatrix MANTIS), generating up to 24 membranes per sheet (one fabrication batch), with up to three batches per day. Two reagent batches (different lots/storage) were used. Each membrane is labeled with a fabrication batch ID (FID ∈ {1,2,3}) and a reagent batch ID (RID ∈ {1,2}). Post-spotting: 4 h room-temperature incubation; 1% BSA block (30 min); drying at 37 °C (10 min); cutting to 1.2×1.2 cm; assembly with the other paper layers in a 3D-printed cassette.

Image processing and feature extraction: Custom software segments spots from raw DNG images. For each spot, the local background (BSA-blocked NC) is subtracted, and the signal is normalized by the sum over all spots to mitigate inter-test variability (pipetting, fabrication, operation). Spot signal S_mp is defined by background subtraction and normalization; condition-level feature X_m is computed as the average over spot redundancies for that condition. Additional integer features include RID and FID.
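
The feature-extraction step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' software: the summary does not give the exact formula for S_mp, so the sketch assumes the signal is the local background intensity minus the spot intensity, and the function name `condition_features`, the spot IDs, and all intensity values are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def condition_features(spot_intensity, background, spot_condition):
    """Turn raw spot readings into condition-level features X_m.

    spot_intensity: {spot_id: mean pixel intensity inside the spot}
    background:     {spot_id: mean intensity of the blocked membrane nearby}
    spot_condition: {spot_id: name of the spotting condition}
    """
    # Assumption: a darker spot means more bound AuNP, so signal = background - spot.
    raw = {s: background[s] - spot_intensity[s] for s in spot_intensity}

    # Normalize each spot by the sum over all spots on the membrane.
    total = sum(raw.values())
    if total == 0:
        raise ValueError("no signal on membrane; cannot normalize")
    normalized = {s: v / total for s, v in raw.items()}

    # Average the redundant spots that share a spotting condition.
    by_condition = defaultdict(list)
    for s, v in normalized.items():
        by_condition[spot_condition[s]].append(v)
    return {m: mean(vals) for m, vals in by_condition.items()}


# Toy membrane with four spots and two conditions (all values are made up).
spots = {"a1": 180.0, "a2": 170.0, "b1": 150.0, "b2": 140.0}
bg = {"a1": 200.0, "a2": 200.0, "b1": 200.0, "b2": 200.0}
cond = {"a1": "Ab", "a2": "Ab", "b1": "Ag", "b2": "Ag"}
features = condition_features(spots, bg, cond)
print(features)

# A uniform change in overall signal strength leaves the features unchanged.
dimmer = condition_features(
    {s: 0.5 * v for s, v in spots.items()},
    {s: 0.5 * v for s, v in bg.items()},
    cond,
)
print(dimmer)
```

Under these assumptions, dividing by the membrane-wide sum cancels any factor that scales every spot equally, which is one way to see why the normalization dampens test-to-test variation.
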
Data and clinical testing: Remnant human serum samples (UCLA IRB #19-000172; consent waived) previously quantified by the Siemens Dimension Vista hsCRP Flex were measured with the VFA. Dataset: 85 clinical samples (triplicate VFAs), mostly within 0–10 mg/L, with one at 83.6 mg/L. An additional nine CRP-free sera and nine artificial acute samples spiked at 200, 500, and 1000 mg/L were included. In total, 273 VFAs were fabricated; one test and two triplicates were excluded due to fabrication issues or non-specific binding artifacts. The remaining data were split into training (N_train=209) and blind testing (N_test=57) sets, stratified across the hsCRP range and FIDs.

Machine learning framework and feature selection: A two-stage feature selection was performed on the training set using 5-fold cross-validation.

- Spot selection: Define a per-spot cost j_m,p capturing deviation from the mean of like-spots across samples; iteratively eliminate the spots with the highest j_m,p, evaluating a fully connected neural network's mean-squared logarithmic error (MSLE) versus the number of spots. The curve indicated improved performance after removing the ~30–40 least stable spots; a subset of 38 spots was selected.
- Condition selection: Iteratively eliminate entire spotting conditions and assess cross-validation MSLE/R². Eliminating Mix 1 (Ab/Ag mixture 1) and Ag-low yielded slightly better or equivalent performance, resulting in five retained conditions.

Neural network models: A tiered fully connected architecture (two hidden layers per tier with 512 and 64 nodes, ReLU activations, dropout 0.5) was optimized with MSLE as the loss. The architecture was selected via a random hyperparameter search on the training set (varying nodes, layers, regularization, batch size, and cost function). Regularization included L2 and dropout to mitigate overfitting.

Final training and inference: After feature selection, a quantification network was trained on the full training set using the selected spot/condition features plus RID and FID.
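
The stability-based spot elimination in this section can be illustrated with a small sketch. The summary only says that j_m,p captures deviation from the mean of like-spots across samples, so the cost used here (mean absolute deviation from the same-condition mean, averaged over samples) is an assumed form. The names `spot_costs` and `keep_most_stable` are hypothetical, the data are made up, and the cross-validated network that the authors used to decide how many spots to remove is replaced by a fixed `n_keep`.

```python
from statistics import mean


def spot_costs(samples, spot_condition):
    """Per-spot instability cost (assumed form, see lead-in).

    samples:        list of {spot_id: normalized signal}, one dict per test
    spot_condition: {spot_id: condition name}
    Returns {spot_id: mean absolute deviation from the like-spot mean}.
    """
    costs = {s: 0.0 for s in spot_condition}
    for sample in samples:
        # Mean signal of each spotting condition within this sample.
        groups = {}
        for s, m in spot_condition.items():
            groups.setdefault(m, []).append(sample[s])
        cond_mean = {m: mean(v) for m, v in groups.items()}
        # Accumulate each spot's deviation, averaged over samples.
        for s, m in spot_condition.items():
            costs[s] += abs(sample[s] - cond_mean[m]) / len(samples)
    return costs


def keep_most_stable(costs, n_keep):
    """Drop the highest-cost spots until n_keep remain."""
    ranked = sorted(costs, key=costs.get)  # most stable first
    return ranked[:n_keep]


# Toy data: three "Ab" spots and two "Ag" spots over two tests (values made up).
layout = {"p1": "Ab", "p2": "Ab", "p3": "Ab", "q1": "Ag", "q2": "Ag"}
tests = [
    {"p1": 0.10, "p2": 0.10, "p3": 0.16, "q1": 0.30, "q2": 0.30},
    {"p1": 0.20, "p2": 0.22, "p3": 0.12, "q1": 0.25, "q2": 0.25},
]
costs = spot_costs(tests, layout)
kept = keep_most_stable(costs, n_keep=4)
print(costs)
print(kept)
```

In this toy layout, spot p3 strays furthest from its like-spots and is the one removed. In the study, the number of spots to keep was chosen from the cross-validation MSLE curve rather than fixed in advance.
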
A separate classification network first determines if a sample is within hsCRP range (<10 mg/L) or represents acute inflammation (>10 mg/L); hsCRP-range samples are then passed to the quantification network. Blind testing used the computationally selected subset of 28 spots across 5 conditions.
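
The tiered inference flow (classify first, then quantify only hsCRP-range samples) can be sketched as follows. The two callables stand in for the trained networks; `toy_classifier`, `toy_quantifier`, their thresholds, and the feature names are hypothetical placeholders rather than the authors' models.

```python
def infer_crp(features, classify_acute, quantify):
    """Tiered inference: classify first, quantify only hsCRP-range samples.

    classify_acute: callable returning True when the sample looks acute (>10 mg/L)
    quantify:       callable returning a concentration in mg/L
    """
    if classify_acute(features):
        # Acute samples are flagged and never sent to the quantification stage.
        return ("acute", None)
    return ("hsCRP", quantify(features))


# Hypothetical stand-ins for the two trained networks (not real models).
def toy_classifier(features):
    return features["Ag"] > 0.5


def toy_quantifier(features):
    return 20.0 * features["Ab"]


in_range = infer_crp({"Ab": 0.10, "Ag": 0.20}, toy_classifier, toy_quantifier)
acute = infer_crp({"Ab": 0.05, "Ag": 0.80}, toy_classifier, toy_quantifier)
print(in_range)
print(acute)
```

Keeping the range decision separate means a sample flagged as acute never receives a numeric hsCRP value, even when one of its channels alone would suggest a low concentration.
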

Key Findings

- Blind testing performance (N_test=57):
  - Initial classification into hsCRP (<10 mg/L) vs acute (>10 mg/L): 100% accuracy; 6 acute samples correctly identified.
  - Quantification within hsCRP range (51 samples): R²=0.95; linear fit slope=0.98, intercept=0.074 mg/L; overall average coefficient of variation (CV)=11.2%.
  - CV by CVD risk strata: low risk (<1 mg/L): 11.5%; intermediate (1–3 mg/L): 10.1%; high (>3–10 mg/L): 12.2%.
  - Compared to multivariable linear regression: neural network markedly improved accuracy (linear regression yielded average %CV ~47% and R²=0.79).
  - Incorporating batch metadata (RID, FID) as input features improved blind test performance: MSLE reduced by 12.9%, overall %CV reduced from 16.6% to 11.2%, and R² increased from 0.92 to 0.95.
- Feature selection results: reducing from 81 spots to 38 based on stability, then eliminating two conditions (Mix 1 and Ag-low) improved or maintained performance; blind testing used 28 spots across 5 conditions.
- Hook-effect mitigation: Multiplexed conditions (including an antigen capture channel with monotonic response) enabled correct identification of high-CRP (acute) samples; models restricted to fewer conditions (e.g., only Ab or Ab plus secondary Ab) would falsely report high-CRP samples as within hsCRP (e.g., 83.6, 200, 1000 mg/L misreported as 7.81, 7.34, 3.84 mg/L).
- Cost implications: Implementing only computationally selected chemistries reduced reagent cost for immunoreaction spots by 62% (from $2.61 to $0.97 per test); total cost per test projected to < $0.5 at scale.
- Operational metrics: Low-cost, rapid colorimetric assay, mobile phone reader, assay time <12 minutes, analytical range 0–10 mg/L for hsCRP, with detection of acute samples beyond this range.
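
To make the per-stratum CV figures concrete, here is a minimal sketch of how such a metric could be computed from triplicate predictions. It assumes CV is the sample standard deviation divided by the mean of the replicates, averaged within each risk stratum; the summary does not state which convention the authors used. The function names and all numbers are illustrative and do not reproduce the reported values.

```python
from statistics import mean, stdev


def percent_cv(replicates):
    """Coefficient of variation (%) of replicate predictions for one sample."""
    return 100.0 * stdev(replicates) / mean(replicates)


def risk_stratum(reference_crp):
    """CVD risk stratum from the reference hsCRP value (mg/L), hsCRP range only."""
    if reference_crp < 1.0:
        return "low"
    if reference_crp <= 3.0:
        return "intermediate"
    return "high"


# Made-up triplicate predictions keyed by reference concentration (mg/L).
triplicates = {
    0.8: [0.7, 0.8, 0.9],
    2.0: [1.8, 2.0, 2.2],
    5.0: [4.5, 5.0, 5.5],
    6.0: [5.4, 6.0, 6.6],
}

# Group each sample's CV by the stratum of its reference value.
cv_by_stratum = {}
for reference, predictions in triplicates.items():
    cv_by_stratum.setdefault(risk_stratum(reference), []).append(percent_cv(predictions))

average_cv = {stratum: mean(values) for stratum, values in cv_by_stratum.items()}
print(average_cv)
```
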

Discussion

The findings support the hypothesis that computational sensing with deep learning can significantly enhance POC paper-based assays. Joint optimization of multiplexed sensing channels and inference models improved quantitative accuracy and precision within the clinically relevant hsCRP range, meeting or approaching FDA guidance on precision while maintaining low cost and simplicity. Deep neural networks, with non-linear modeling capacity and regularization, outperformed linear methods in extracting robust patterns from multiplexed spot signals subject to fabrication and operational variability. Incorporating fabrication (FID) and reagent (RID) batch identifiers as model inputs allowed the network to learn batch-specific variations, improving generalization and enabling quality assurance workflows. The multiplexed VFA design, analyzed computationally, mitigated the hook effect by combining channels with complementary response profiles, extending the effective dynamic range and preventing false low readings at high CRP. Feature selection not only improved performance but also informed sensor design and manufacturing by identifying spatially unstable regions and redundant/unstable chemistries, reducing cost without sacrificing performance. The approach exemplifies a broader framework where cost function choice aligns with clinical priorities (e.g., using MSLE to weight errors relative to clinical cutoffs), and data-driven selection guides assay chemistry and layout for future POC diagnostics.
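
The point about cost-function choice can be illustrated numerically. The sketch below assumes the common definition of MSLE, the mean of (log(1 + y) − log(1 + ŷ))², which may differ in detail from the authors' implementation, and compares it with plain mean squared error (MSE) for the same absolute error at two concentrations. The function names and example values are illustrative.

```python
import math


def msle(y_true, y_pred):
    """Mean squared logarithmic error, using log(1 + y) so that 0 mg/L is valid."""
    squared = [(math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(squared) / len(squared)


def mse(y_true, y_pred):
    """Ordinary mean squared error, for comparison."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return sum(squared) / len(squared)


# The same 0.5 mg/L error, once near the 1 mg/L cutoff and once near 8 mg/L.
low_msle = msle([1.0], [1.5])
high_msle = msle([8.0], [8.5])
low_mse = mse([1.0], [1.5])
high_mse = mse([8.0], [8.5])

print(low_msle, high_msle)
print(low_mse, high_mse)
```

MSE scores the two errors identically, while MSLE penalizes the one near the 1 mg/L cutoff considerably more. That behavior suits a range where the clinical decision thresholds sit at the low end.
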

Conclusion

This work demonstrates a data-driven, deep learning-enabled framework for designing and reading multiplexed paper-based POC sensors, validated on hsCRP testing for CVD risk stratification. By jointly optimizing spot selection, condition selection, and inference networks, the platform achieved strong agreement with a clinical gold standard (R²=0.95; overall CV=11.2%), correctly classified acute inflammation samples, and mitigated hook-effect artifacts, all using a simple colorimetric assay and mobile reader. The framework reduces reagent costs through informed feature selection and can guide fabrication improvements by revealing spatial/batch variability. Future research directions include: developing clinically tailored loss functions that emphasize accuracy near decision thresholds; expanding datasets to further improve generalization, especially for extreme concentrations; integrating batch metadata capture (e.g., via QR codes) into workflows; and applying this computational sensing paradigm to other multiplexed assays (microarrays, multi-channel fluidics) to engineer robust, low-cost POC diagnostics.

Limitations

- Data size and class balance: Limited number of acute (>10 mg/L) samples (N_train=6, N_test=6), with some high-CRP samples created by spiking, may constrain generalization for extreme concentrations.
- Potential overfitting risk inherent to deep learning with limited datasets; mitigated via cross-validation, L2 regularization, and 50% dropout, but larger, more diverse datasets would strengthen validation.
- Fabrication variability: Tests were fabricated without industry-grade environmental controls and involved manual steps, contributing to batch effects (though partially modeled via RID/FID features).
- Generalizability and regulatory status: Although performance is competitive, no FDA-approved POC hsCRP test exists; further clinical validation and regulatory pathways are required.
- Data and code are available upon reasonable request rather than fully open, which may limit independent replication.
