Library adaptors with integrated reference controls improve the accuracy and reliability of nanopore sequencing
H. M. Gunter, S. E. Youlten, et al.
Next-generation sequencing (NGS) library adaptors are essential oligonucleotides that carry primer- and flowcell-binding sites and can incorporate functional elements, such as barcodes and unique molecular identifiers, to enhance analysis. Reference standards are needed to measure accuracy and performance, but common natural controls (e.g., NA12878) are unsuitable as internal per-sample controls, and spike-in controls add protocol steps and can consume sequencing capacity, especially in low-input or degraded samples. The study introduces Control Library Adaptors (CAPTORs), which embed qualitative and quantitative reference sequences directly into library adaptors. This enables real-time assessment of sequencing accuracy and quantitative performance within each read and library, with applications to benchmarking Oxford Nanopore Technologies (ONT) nanopore sequencing, improving cross-sample normalisation, and enhancing clinical variant detection in BRCA genes.
Prior approaches to improving library analysis include index barcodes for multiplexing and unique molecular identifiers (UMIs), which enable consensus error correction and mitigate PCR duplication artefacts. Reference standards used to assess sequencing performance include natural materials (e.g., NA12878) and synthetic spike-ins (e.g., ERCC, SIRVs, sequins); these provide ground truth but require careful addition and calibration, and may overwhelm low-input samples. ONT sequencing is known to suffer from systematic errors, particularly in repetitive and GC-rich regions, and numerous error-correction tools model these errors using machine learning or homology-based polishing. CAPTORs are proposed to combine the benefits of internal controls with seamless integration into the library workflow, without extra steps. This addresses limitations of existing controls and enables per-read and per-pore accuracy assessment, quantitative benchmarking, and targeted clinical error correction.
Design and synthesis: Seventy-two custom 90-nt CAPTOR adaptors were designed with three regions: a 5′ constant 30-nt burn-in sequence (to traverse the initial high-error cycles), a 30-nt variable region unique to each CAPTOR that encompasses a designed diversity of 6-mers (generated with Shortcake), and a 3′ constant region common to all CAPTORs to prevent preferential ligation. Sequences were screened with BLAST to exclude extended homology (>20 nt) to natural sequences and checked for secondary structure to exclude hairpins (>8 nt). Adaptors were synthesized enzymatically using the DNA Script SYNTAX system, desalted, quantified to 2 μM, and pooled.
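The design rules above can be sketched in Python. This is a minimal illustration, not the authors' pipeline: it assumes the 3′ constant region occupies the remaining 30 nt of each 90-nt adaptor, and `has_hairpin` is a hypothetical, crude sequence-only screen (exact reverse-complement stems of at least 9 nt, matching the >8 nt rule) that ignores thermodynamics, so a real screen would use a proper folding tool.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")
ADAPTOR_LEN = 90
BURN_IN_LEN = 30   # 5' constant burn-in region (from the text)
VARIABLE_LEN = 30  # unique variable region (from the text)
# Assumption: the 3' constant region is the remaining 30 nt.


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase ACGT sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def split_regions(adaptor: str) -> tuple[str, str, str]:
    """Return (burn-in, variable, 3' constant) for a 90-nt CAPTOR."""
    if len(adaptor) != ADAPTOR_LEN:
        raise ValueError(f"expected {ADAPTOR_LEN} nt, got {len(adaptor)}")
    var_end = BURN_IN_LEN + VARIABLE_LEN
    return adaptor[:BURN_IN_LEN], adaptor[BURN_IN_LEN:var_end], adaptor[var_end:]


def kmers(seq: str, k: int = 6) -> list[str]:
    """All overlapping k-mers; a 30-nt region yields 25 six-mers."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def has_hairpin(seq: str, stem: int = 9, min_loop: int = 3) -> bool:
    """Hypothetical crude screen: True if any stem-length substring has its exact
    reverse complement further downstream, separated by at least min_loop nt."""
    for i in range(len(seq) - stem + 1):
        partner = revcomp(seq[i:i + stem])
        if seq.find(partner, i + stem + min_loop) != -1:
            return True
    return False


def pool_kmer_diversity(adaptors: list[str], k: int = 6) -> int:
    """Number of distinct k-mers covered by the variable regions of a pool."""
    return len({km for a in adaptors for km in kmers(split_regions(a)[1], k)})
```

Counting distinct 6-mers across the pool is one simple way to check how much sequence-context diversity a set of variable regions covers.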
Library preparation and sequencing: CAPTORs were ligated to DNA or cDNA fragments following the ONT Ligation Sequencing Kit (SQK-LSK109) protocol. DNA samples were sheared to ~25 kb using Covaris g-TUBEs. Libraries were sequenced on ONT MinION/GridION instruments with R9.4.1 or R10.3 flowcells and live basecalling (Guppy v4). CAPTOR sequences at read termini were identified and classified by their unique variable regions.
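Classifying reads by their CAPTOR variable region amounts to approximate matching near the read end. The sketch below is hypothetical: it scans only the first `window` bases of a read in forward orientation, uses a semi-global (fitting) edit distance so the variable region may sit anywhere in that window, and the `window=120` and `max_dist=6` defaults are illustrative choices, not values from the study.

```python
from typing import Optional


def fitting_distance(pattern: str, text: str) -> int:
    """Smallest edit distance between `pattern` and any substring of `text`
    (semi-global alignment: leading and trailing text are free)."""
    prev = [0] * (len(text) + 1)          # row 0: alignment may start anywhere in text
    for i, p in enumerate(pattern, start=1):
        curr = [i] + [0] * len(text)      # column 0: pattern prefix vs empty text
        for j, t in enumerate(text, start=1):
            curr[j] = min(prev[j] + 1,             # pattern base missing from read
                          curr[j - 1] + 1,         # extra base in read
                          prev[j - 1] + (p != t))  # match or mismatch
        prev = curr
    return min(prev)                      # alignment may end anywhere in text


def classify_read(read: str, variable_regions: dict[str, str],
                  window: int = 120, max_dist: int = 6) -> Optional[str]:
    """Return the name of the uniquely best-matching CAPTOR, or None."""
    head = read[:window]
    scores = {name: fitting_distance(var, head) for name, var in variable_regions.items()}
    best = min(scores.values())
    winners = [name for name, score in scores.items() if score == best]
    return winners[0] if best <= max_dist and len(winners) == 1 else None
```

Requiring a unique best hit avoids assigning reads whose termini match two CAPTORs equally well; a fuller version would also search the reverse complement at the 3′ end.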
Accuracy analysis: Reads were clipped to the first 500 nt (fastp), aligned to a custom CAPTOR reference (minimap2 -ax map-ont), sorted and indexed (samtools), and per-nucleotide error profiles were computed (pysamstats). Variable regions (30 nt) were parsed into overlapping 6-mers (IRanges in R), and per-6-mer mismatch/indel error rates were aggregated across CAPTOR instances. Sequence-context effects (GC content, repeats, homopolymers) were assessed using Brown-Forsythe and Welch’s ANOVA. Per-read, per-pore, and time-resolved error statistics were extracted using FASTQ header metadata and bamtools splitting.
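Per-6-mer aggregation can be illustrated with a short sketch. It assumes per-position counts are already available as dictionaries whose keys (`reads_all`, `mismatches`, `insertions`, `deletions`) are modelled loosely on pysamstats variation output, that reference coordinates start at the CAPTOR 5′ end, defines a position's error rate as (mismatches + insertions + deletions) / coverage, and scores each 6-mer as the mean rate over its six positions, averaged across every CAPTOR instance containing it. These definitions are assumptions for illustration, not necessarily the study's exact calculation.

```python
from collections import defaultdict
from statistics import mean

VAR_START, VAR_LEN = 30, 30  # variable region follows the 30-nt burn-in


def position_error_rates(rows: list[dict]) -> list[float]:
    """Per-position error rate = (mismatches + insertions + deletions) / coverage."""
    rates = []
    for r in rows:
        if r["reads_all"] == 0:
            raise ValueError("position with zero coverage")
        rates.append((r["mismatches"] + r["insertions"] + r["deletions"]) / r["reads_all"])
    return rates


def kmer_error_rates(captors: dict[str, tuple[str, list[dict]]], k: int = 6) -> dict[str, float]:
    """captors maps a CAPTOR name to (reference sequence, per-position count rows).
    Returns the mean error rate of each 6-mer across all CAPTOR instances."""
    collected = defaultdict(list)
    for ref_seq, rows in captors.values():
        rates = position_error_rates(rows)
        # Only windows lying wholly inside the variable region are scored.
        for i in range(VAR_START, VAR_START + VAR_LEN - k + 1):
            collected[ref_seq[i:i + k]].append(mean(rates[i:i + k]))
    return {kmer: mean(values) for kmer, values in collected.items()}
```

Splitting mismatches and indels into separate rates, rather than summing them, would give the per-error-type profiles reported in the results.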
Quantitative ladder: The 72 CAPTORs were divided into eight groups of nine, and each group was assigned one step of an eight-level twofold dilution series (undiluted to 1:128). Dilution concentrations were verified by Qubit, and the groups were mixed in equal proportions to form a staggered master mix. During sequencing, observed CAPTOR counts were compared with expected concentrations to form a per-library quantitative reference ladder. Library subsampling (seqtk) was used to evaluate the effect of read depth on quantitative accuracy and the limit of quantification (LOQ).
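The expected ladder follows from the design: eight groups of nine CAPTORs at twofold steps from undiluted to 1:128, mixed in equal proportions. The sketch below computes each CAPTOR's expected fraction and fits log2(observed fraction) against log2(expected fraction). It assumes observed counts are ordered by dilution group (undiluted first) and that R^2 is taken on the log scale; both are illustrative assumptions rather than the study's documented procedure.

```python
import numpy as np

N_GROUPS, PER_GROUP = 8, 9  # 72 CAPTORs in eight groups of nine


def expected_fractions() -> np.ndarray:
    """Expected share of CAPTOR reads: group g is diluted 2**-g (g = 0..7)."""
    conc = np.repeat(2.0 ** -np.arange(N_GROUPS), PER_GROUP)
    return conc / conc.sum()


def ladder_fit(observed_counts) -> tuple[float, float, float]:
    """Return (slope, intercept, R^2) of log2 observed vs log2 expected fraction.
    CAPTORs with zero counts are dropped because log2(0) is undefined."""
    obs = np.asarray(observed_counts, dtype=float)
    if obs.shape != (N_GROUPS * PER_GROUP,):
        raise ValueError("expected one count per CAPTOR (72 values)")
    keep = obs > 0
    x = np.log2(expected_fractions()[keep])
    y = np.log2(obs[keep] / obs.sum())
    slope, intercept = np.polyfit(x, y, 1)
    r_squared = np.corrcoef(x, y)[0, 1] ** 2
    return float(slope), float(intercept), float(r_squared)
```

Under this design each undiluted CAPTOR is expected at 128/2295 (about 5.6%) of CAPTOR reads and each 1:128 CAPTOR at 1/2295 (about 0.044%), so the ladder spans well below 1% frequency. A slope near 1 indicates proportional recovery across the range.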
Metagenome experiment: Triplicate ONT libraries were prepared from two synthetic mock microbial communities (Mixtures A and B). Reads were aligned to CAPTORs and synthetic microbial sequences (metasequins). Observed read counts and fold-changes between Mixtures A and B were compared with expected values. The normalisation methods compared included TMM (edgeR) and RUVg (RUVSeq) with CAPTORs as negative control features; performance was assessed by true-positive and true-negative detection of known fold-changes.
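RUVg estimates factors of unwanted variation from negative-control features and regresses them out. The numpy sketch below follows that idea with CAPTORs as the controls: log-transform counts, centre each feature across samples, take the leading left singular vectors of the control columns as the factors W, and remove their fitted contribution. It is a simplified illustration, not the RUVSeq implementation; the `k` default and the `log(count + 1)` transform are assumptions, and in a differential-abundance analysis W is more typically added as covariates to the edgeR design than subtracted.

```python
import numpy as np


def ruvg_like(counts, control_idx, k: int = 1):
    """counts: samples x features matrix of raw counts.
    control_idx: column indices of negative-control features (e.g. CAPTORs).
    Returns (W, corrected_log_counts)."""
    Y = np.log(np.asarray(counts, dtype=float) + 1.0)
    Yc = Y - Y.mean(axis=0)                         # centre each feature across samples
    U, _, _ = np.linalg.svd(Yc[:, control_idx], full_matrices=False)
    W = U[:, :k]                                    # samples x k unwanted-variation factors
    alpha, *_ = np.linalg.lstsq(W, Yc, rcond=None)  # k x features loadings
    return W, Y - W @ alpha
```

Because the controls are assumed to carry no biological signal, any structure they share across samples is attributed to technical effects (for example, library size or ligation efficiency) and removed from every feature.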
RNA-seq experiment: UHRR RNA was converted to double-stranded cDNA (SuperScript IV), ligated to CAPTORs, and sequenced on R9.4.1 or R10.3 flowcells. CAPTOR ladders were used to assess quantitative accuracy and to determine the LOQ by segmental regression; gene expression levels were then compared with the CAPTOR ladder to identify genes above the LOQ and estimate measurement uncertainty.
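A one-breakpoint segmented regression can be sketched as a grid search over candidate breakpoints c, fitting y = b0 + b1·x + b2·max(0, x − c) by least squares at each c and keeping the c with the smallest residual sum of squares. Here x and y would be log-scale expected and observed CAPTOR abundances; the grid range and resolution are illustrative assumptions, and dedicated tools (for example, the R `segmented` package) also estimate uncertainty in the breakpoint. Genes whose log abundance lies above the fitted breakpoint would then be treated as above the LOQ.

```python
import numpy as np


def fit_hinge(x: np.ndarray, y: np.ndarray, c: float) -> float:
    """Residual sum of squares for y = b0 + b1*x + b2*max(0, x - c)."""
    X = np.column_stack([np.ones_like(x), x, np.maximum(0.0, x - c)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    return float(resid @ resid)


def segmented_breakpoint(x, y, n_grid: int = 200) -> float:
    """Grid-search the breakpoint within the central 80% of x values."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    grid = np.linspace(np.quantile(x, 0.1), np.quantile(x, 0.9), n_grid)
    sse = [fit_hinge(x, y, c) for c in grid]
    return float(grid[int(np.argmin(sse))])


def above_loq(gene_log_abundance: dict[str, float], breakpoint: float) -> list[str]:
    """Genes whose log-scale abundance exceeds the fitted breakpoint."""
    return [g for g, v in gene_log_abundance.items() if v > breakpoint]
```

The hinge term keeps the two segments joined at c, which matches the idea of a linear response that flattens into a noise floor below the LOQ.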
Clinical BRCAPTORs: Custom BRCAPTORs encoding one BRCA1 exon (5172 nt) and two BRCA2 exons (2054 and 2254 nt) were synthesized. BRCA1/2 regions were PCR-amplified from NA12878 genomic DNA, ligated to BRCAPTORs, and sequenced (R9.4.1). Reads were aligned to a combined BRCA/BRCAPTOR reference (minimap2), and per-nucleotide error profiles were computed (pysamstats). Error correction was performed by subtracting the BRCAPTOR per-nucleotide error profile from the corresponding BRCA positions, and the correction was evaluated across all bases and at COSMIC-listed pathogenic sites.
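The subtraction-based correction can be sketched as below. It assumes per-nucleotide error rates are already available as arrays, that a BRCAPTOR maps to a contiguous BRCA region starting at a known `offset` (a hypothetical parameter), and that corrected rates are floored at zero; the clipping step is an illustrative choice rather than a documented detail of the study.

```python
import numpy as np


def correct_region(sample_rates, control_rates, offset: int = 0, clip: bool = True) -> np.ndarray:
    """Subtract a BRCAPTOR's per-nucleotide error profile from the matching
    BRCA positions [offset, offset + len(control)); other positions are unchanged."""
    corrected = np.asarray(sample_rates, dtype=float).copy()
    control = np.asarray(control_rates, dtype=float)
    end = offset + len(control)
    if offset < 0 or end > len(corrected):
        raise ValueError("control profile does not fit inside the sample region")
    corrected[offset:end] -= control
    if clip:
        # Assumption: a negative error rate is not meaningful, so floor at zero.
        corrected[offset:end] = np.clip(corrected[offset:end], 0.0, None)
    return corrected
```

Comparing `np.median` of the rates before and after correction, over all positions or only over a list of clinically flagged indices, mirrors the two evaluations described above.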
- CAPTOR-derived per-6-mer error profiling revealed a mean per-base error rate of 0.089 (SD 0.035) in ONT libraries, consistent with prior MinION performance. Mismatch errors averaged 0.034 (SD 0.021) and indels 0.062 (SD 0.033). Error rates varied about 7-fold among 6-mers (most accurate: AATCGA at 0.030 errors/nt; least accurate: CGGGGG at 0.219 errors/nt). Errors were higher in repetitive and GC-rich sequences, with cumulative effects in GC-rich homopolymers.
- Error profiles were highly reproducible across technical replicates: mean 8.4% difference across k-mer error rates; indels were more reproducible (6.7% difference) than mismatches (12.1%), indicating predominantly systematic errors suitable for modeling/normalisation.
- Within reads, CAPTOR accuracy matched that of the adjacent microbial DNA, and CAPTOR error rates were higher in failed than in passed reads (median 0.068 vs 0.045; p<0.0001). Pore performance varied ~3.2-fold across experiments, and poorly performing pores had lower throughput. Active pores showed no significant temporal trend over 72 h (two-way ANOVA, p=0.1308), and pore location had no apparent impact.
- Benchmarking nanopore chemistries: R10.3 pores showed lower mean error (0.037 errors/nt) than R9.4.1 (0.045), largely due to lower insertion rates (0.021 vs 0.032 errors/nt). At low-complexity repeats, R10.3 had notably lower error (mean 0.048) than R9.4.1 (0.083).
- Quantitative ladder: Observed CAPTOR counts closely matched expected concentrations (per-library ladder R^2=0.9779), enabling assessment of quantitative accuracy and uncertainty as a function of abundance and coverage. CAPTOR ladder counts correlated with synthetic microbe abundances in the mock communities. The read depth required for reliable quantification across the CAPTOR dynamic range (including CAPTORs at <1% frequency) was ~5×10^4 reads, typically reached within ~2 h; below this threshold, uncertainty increased.
- Normalisation: Using CAPTORs as negative controls with RUVg improved detection of known fold-changes between microbial communities compared with TMM, reducing unwanted technical variation and improving agreement between expected and observed fold-changes.
- RNA-seq: The CAPTOR ladder in UHRR cDNA libraries showed strong linearity (R^2=0.9552) down to an LOQ inflection point identified by segmental regression (reported at a log-scale value of −7.1), below which measurements became variable (R^2=0.2210). In total, 5,903 human genes (36.1% of 16,354 detected GENCODE genes) exceeded the LOQ and could be confidently quantified, and the CAPTORs provided gene-level uncertainty estimates.
- Clinical error correction (BRCAPTORs): BRCAPTOR error profiles correlated with corresponding BRCA1/2 errors, enabling per-nucleotide normalisation. Median BRCA error rate reduced from 0.042 to 0.018 after correction; deletion errors reduced from mean 0.020 to 0.007. At COSMIC-listed pathogenic sites, median error decreased from 0.032 to 0.012, improving interpretability for clinical diagnostics.
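The sequence-context and reproducibility comparisons above rely on simple per-6-mer summaries. The sketch below computes GC fraction and longest homopolymer run for a 6-mer, the fold range of a set of 6-mer error rates, and a mean percent difference between replicate profiles. The percent-difference formula (absolute difference divided by the pair mean) is one plausible definition assumed for illustration and may differ from the metric used in the study.

```python
def gc_fraction(kmer: str) -> float:
    """Fraction of G or C bases."""
    return sum(base in "GC" for base in kmer) / len(kmer)


def longest_homopolymer(kmer: str) -> int:
    """Length of the longest run of a single repeated base."""
    best = run = 1
    for prev, cur in zip(kmer, kmer[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


def fold_range(rates: dict[str, float]) -> float:
    """Ratio of the highest to the lowest 6-mer error rate."""
    return max(rates.values()) / min(rates.values())


def mean_percent_difference(rep1: dict[str, float], rep2: dict[str, float]) -> float:
    """Assumed metric: mean over shared k-mers of |a - b| / ((a + b) / 2) * 100.
    K-mers with zero error in both replicates are skipped to avoid division by zero."""
    shared = [k for k in rep1.keys() & rep2.keys() if rep1[k] + rep2[k] > 0]
    if not shared:
        raise ValueError("no shared k-mers with non-zero error")
    diffs = [abs(rep1[k] - rep2[k]) / ((rep1[k] + rep2[k]) / 2) * 100 for k in shared]
    return sum(diffs) / len(diffs)
```

Applied to the two extremes reported above, `fold_range({"AATCGA": 0.030, "CGGGGG": 0.219})` gives about 7.3, and CGGGGG has a GC fraction of 1.0 with a 5-nt G homopolymer.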
The study demonstrates that embedding reference controls directly in library adaptors (CAPTORs) enables real-time, per-read and per-pore measurement of ONT sequencing accuracy and quantitative performance without additional preparation steps. CAPTOR k-mer error profiles reveal systematic, reproducible error patterns driven by sequence context (repeats, GC content), which can be benchmarked across chemistries (e.g., R10.3 vs R9.4.1), instruments, and runs. The quantitative CAPTOR ladder validates and tracks library sensitivity, dynamic range, and required coverage thresholds, guiding decisions about sequencing duration and depth. Using CAPTORs as negative controls with RUVg effectively normalises unwanted technical variation between samples, improving detection of true fold-changes in metagenomics use cases. In RNA-seq, CAPTOR ladders provide an empirical LOQ and confidence metrics within a single library. Clinically, gene-specific CAPTORs (BRCAPTORs) enable empirical modeling of systematic errors along clinically relevant sequences, facilitating per-nucleotide error correction and improving diagnostic accuracy for BRCA variants. Overall, CAPTORs address the need for integrated qualitative and quantitative controls in NGS, supporting reproducibility and cross-sample comparability and enabling more reliable use of ONT sequencing in research and clinical contexts.
CAPTORs are a new class of library adaptors that integrate qualitative (accuracy) and quantitative (abundance) reference controls directly into NGS libraries. They enable per-read and per-pore accuracy monitoring, quantitative benchmarking via an internal staggered ladder, improved cross-sample normalisation using RUVg, and per-nucleotide error correction for targeted clinical genes. CAPTORs distinguished performance between nanopore chemistries and provided actionable coverage thresholds for reliable quantification. Gene-targeted BRCAPTORs reduced median BRCA error rates and improved interpretability at clinically relevant sites. Future work could extend CAPTORs to short-read platforms (with shorter control elements) and broader clinical panels, and integrate them into adaptive sequencing strategies for real-time quality control and decision-making.
- Gene-specific CAPTORs (e.g., BRCAPTORs) are practical for small, high-priority panels but do not scale to whole genomes or very large gene sets; traditional spike-ins may be preferable for broad coverage.
- Short-read platforms would require much shorter control elements (e.g., ~12 nt), which limits the richness of accuracy profiling and requires careful design for cluster discrimination; the comprehensive accuracy analysis possible with long reads would therefore be constrained.
- Quantitative reliability depends on achieving sufficient read depth; below ~5×10^4 reads, uncertainty increases and the LOQ is not reached across the full dynamic range.
- Systematic error profiles and minimal thresholds may vary by experiment, chemistry, basecaller, and analysis objectives, necessitating per-run calibration.
- Experiments used technical replicates and specific mock communities/reference samples; generalisability to all sample types and workflows may require further validation.