
Overloading And unpacking (OAK) - droplet-based combinatorial indexing for ultra-high throughput single-cell multiomic profiling
B. Wu, H. M. Bennett, et al.
OAK, developed by researchers from Genentech and Harvard Medical School, is a droplet-based barcoding technique that streamlines single-cell multiomic profiling. It resolves cellular diversity in complex samples, including melanoma responses to RAF inhibitors.
Introduction
Single-cell sequencing technologies are rapidly evolving to enable precise identification of cell types and states, capture of rare populations, and large-scale perturbation screens at reduced cost. Droplet-based microfluidic approaches co-encapsulate cells with barcoded beads for massively parallel single-cell profiling, but they require low cell loading to avoid multiplets, which leaves most droplets cell-free and most barcoding capacity unused. In contrast, combinatorial indexing on microwell plates achieves high-efficiency barcoding and can scale to over 100,000 cells, but involves lengthy, labor-intensive split-pool protocols. To combine the strengths of both approaches, the study introduces OAK (Overloading And unpacking), which replaces the first split-pool step with droplet-based barcoding on a commercial system and then performs a second round of indexing by aliquoting. The aim is to deliver ultra-high throughput, sensitivity, and experimental simplicity across transcriptome and chromatin accessibility modalities.
Literature Review
Prior droplet methods (e.g., Drop-seq, inDrops, 10x Chromium) revolutionized single-cell RNA-seq but sacrifice efficiency because Poisson loading yields a high fraction of empty droplets, wasting reagents and bead barcodes. Combinatorial indexing strategies (e.g., sci-RNA-seq, SPLIT-seq, sci-CAR, Paired-seq, scifi-RNA-seq) scale to very large cell numbers without microfluidics but require multiple split-pool cycles and complex reagent preparation, increasing hands-on time and cost. Additional droplet-based combinatorial indexing methods for chromatin (e.g., dscATAC) and emerging systems with dissolvable hydrogel beads (Hydrop) have expanded modality options. Despite these advances, options for paired multiome profiling at ultra-high throughput with a simple workflow remain limited. The literature highlights trade-offs between throughput, sensitivity, cost, and protocol complexity that motivate OAK’s hybrid droplet-plus-combinatorial indexing design.
Methodology
Overview: OAK uses fixed cells or nuclei as reaction chambers for two rounds of indexing. First, cells/nuclei are overloaded into droplets on a commercial 10x Genomics Chromium microfluidic system for in-droplet barcoding (primary index) via reverse transcription (for RNA) and, in multiome, tagmentation (for ATAC). After barcoding, emulsions are broken (“unpacking”), fixed cells/nuclei are recovered and pooled, then randomly redistributed into multiple aliquots for a secondary indexing PCR, forming combinatorial barcodes (droplet index × secondary index). A user-defined number of aliquots are converted into sub-libraries to enable stepwise sequencing.
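The way a secondary index resolves cells that shared a droplet can be illustrated with a toy example. This is a minimal sketch, not the authors' pipeline: the cell names, the four-letter droplet barcodes, and the `group_by_barcode` helper are all hypothetical and exist only to show the combinatorial barcode (droplet index × secondary index).

```python
from collections import defaultdict

# Hypothetical toy data: (cell id, droplet barcode, aliquot index).
# cell_A and cell_B shared one overloaded droplet; so did cell_C and cell_D.
cells = [
    ("cell_A", "ACGT", 1),
    ("cell_B", "ACGT", 2),
    ("cell_C", "TTAG", 1),
    ("cell_D", "TTAG", 1),
]

def group_by_barcode(cells, use_secondary):
    """Group cells by droplet barcode alone, or by (droplet, aliquot) pair."""
    groups = defaultdict(list)
    for cell_id, droplet_bc, aliquot_idx in cells:
        key = (droplet_bc, aliquot_idx) if use_secondary else (droplet_bc,)
        groups[key].append(cell_id)
    return dict(groups)

primary_only = group_by_barcode(cells, use_secondary=False)
combinatorial = group_by_barcode(cells, use_secondary=True)

for label, groups in (("droplet only", primary_only), ("droplet x aliquot", combinatorial)):
    print(label)
    for key, members in groups.items():
        print("  ", key, members)
```

With the droplet barcode alone, both droplets look like multiplets. Adding the aliquot index separates cell_A from cell_B because they landed in different aliquots, while cell_C and cell_D remain a residual multiplet because they shared both the droplet and the aliquot.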
Experimental design and controls: Species-mixing (K562 human + NIH/3T3 mouse) experiments at different loading levels (150,000 vs 450,000 cells per channel) assessed recovery, sensitivity, and multiplet rates against theoretical expectations. Standard Chromium NextGEM 3' scRNA-seq served as a benchmark for sensitivity and throughput. For multiplexing, hashed human bronchial epithelial cells were profiled with both OAK and standard Chromium. For multiome, fixation conditions (methanol vs formaldehyde) were compared in K562. For tissue-scale multiome, human retinal peripheral nuclei were profiled by OAK and a standard Chromium multiome run for comparison. For lineage tracing, IPC-298 melanoma cells were transduced with a lentiviral library of 100,000 lineage barcodes (TraCe-seq), expanded, and sampled at multiple time points during belvarafenib treatment.
OAK scRNA-seq workflow: Cells were methanol-fixed (−20 °C, 30 min), washed in SSC buffer with RNase inhibitor and DTT, and loaded (typically ≥150,000 cells/channel) onto the Chromium 3' RNA-seq system, intentionally overloaded to reduce empty droplets. After droplet generation and reverse transcription (53 °C, 45 min), emulsions were broken with recovery agent, fixed cells recovered and washed. The suspension was adjusted and evenly distributed into multiple aliquots (e.g., ~20 aliquots per 150,000 loaded cells targeting ~4,000 cells/aliquot) and stored at −80 °C. For library prep, selected aliquots were heated (80 °C, 5 min), first-strand cDNA purified (silane beads), and PCR-amplified using a TSO-recognition primer plus a secondary indexing primer to add the secondary barcode. For hashing, an appropriate HTO primer was included. Pooled cDNA from 2–4 aliquots formed a sub-library following standard Chromium library construction with a partial P5 and i7 indexing primers. Sequencing used Illumina platforms with 28 cycles Read 1, 10 cycles i7, 8 cycles i5, and 90 cycles Read 2; target depth ~20,000 read pairs/cell.
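The aliquot size and target depth quoted above translate directly into a sequencing budget. The following is a minimal sketch of that arithmetic, assuming the nominal ~4,000 cells per aliquot and ~20,000 read pairs per cell; the function name is illustrative, and actual cell yields per aliquot will vary with recovery.

```python
def read_pairs_needed(n_aliquots, cells_per_aliquot=4_000, read_pairs_per_cell=20_000):
    """Nominal read pairs required to sequence a set of OAK aliquots."""
    return n_aliquots * cells_per_aliquot * read_pairs_per_cell

# A sub-library pools 2-4 aliquots; 20 aliquots is one full 150,000-cell channel.
for n in (2, 4, 20):
    print(f"{n:>2} aliquots: {read_pairs_needed(n) / 1e6:,.0f} million read pairs")
```

Under these assumptions, a sub-library of 2 to 4 aliquots needs roughly 160 to 320 million read pairs, which is why sequencing a subset of aliquots first is a practical way to check quality before committing to the full channel.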
OAK multiome (paired snRNA-seq + snATAC-seq): Nuclei from human retina were fixed in 0.3% formaldehyde, washed, and resuspended to ≥2,400 nuclei/μL. For large-scale transposition, TDE1 (Illumina) was used alongside 10x buffers (e.g., 12,000 nuclei per 15 μL reaction; 8 reactions for ~100,000 nuclei). Reactions were pooled and washed, and the entire pool was loaded onto one Chromium Multiome channel. After GEM generation, barcoding, and quenching, emulsions were broken, and nuclei were washed and aliquoted to ~4,000 nuclei/aliquot. Selected aliquots were heated (80 °C, 5 min); first-strand cDNA and ATAC fragments were purified, pre-amplified (10 cycles), and split for snATAC and snRNA library construction. snATAC libraries used partial P5 and 10x sample index primers with recommended cycling; standard double-sided size selection was applied. The snATAC target depth was ~25,000 read pairs/cell (R1 50 cycles, i7 8, i5 24, R2 49). Cost analysis for the retinal multiome indicated ~$0.09 per nucleus for OAK versus ~$0.39 per nucleus for standard Chromium multiome.
Hashed multiplexing: Human bronchial epithelial cells were stained with TotalSeq-A hashtag antibodies across nine samples, pooled, live-sorted, and processed in parallel by OAK (with methanol fixation post-stain) and standard Chromium (unfixed). Four of 22 OAK aliquots were sequenced and compared to the standard workflow for hashing assignment and cell-type composition.
Lineage tracing and drug treatment: IPC-298 melanoma cells were transduced with a 100,000-barcode lentiviral library (MOI 0.05–0.1), sorted for eGFP, expanded (~17 doublings), and sampled for Day 0 OAK scRNA-seq (two channels; 39 aliquots; 20 sequenced). Belvarafenib (10 μM) was then applied. Day 10 cells were profiled by OAK (three channels; 44 aliquots; 12 sequenced), and standard Chromium scRNA-seq was used at Days 20 and 90, as lineage diversity had declined by then. Lineage barcode libraries were generated from pooled OAK cDNA using semi-nested PCR.
Data processing and analysis: Reads were demultiplexed per sub-library with Illumina bcl2fastq; scRNA-seq data were processed with Cell Ranger v6 and multiome data with Cell Ranger ARC v2. Droplet occupancy was modeled with a Poisson distribution. The theoretical multiplet rate followed a birthday-problem-based expectation with D = number of droplets × number of aliquots (e.g., ~100,000 droplets/channel). In species-mixing experiments, the true multiplet rate was estimated by doubling the observed human–mouse multiplet rate, because in a 1:1 mixture about half of all multiplets are same-species and therefore undetectable. Hashtag assignment used Cell Ranger; OAK and standard datasets were integrated with Harmony and annotated in Seurat. Retinal multiome analysis used Seurat for snRNA-seq QC/annotation and ArchR for snATAC analysis (hg38, TSS enrichment >4, nFrags >1000), with MACS3-based peak calling, marker peak detection, and browser tracks. Epiregulon inferred TF activity by integrating expression and accessibility with TF motif/ChIP-seq information. TraCe-seq analysis used Scanpy for expression analysis, decoupleR for hallmark enrichment, and PROGENy for pathway scores; differentiation scores followed a melanoma four-stage model.
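The three statistical ingredients named above can be sketched in a few lines. This is a sketch under stated assumptions rather than the authors' code: the source does not give the exact birthday-problem formula, so `expected_collision_fraction` uses one common form (the chance that a given cell shares its droplet × aliquot compartment with at least one other cell, assuming uniform random assignment), and all function names are illustrative.

```python
import math

def poisson_pmf(k, lam):
    """Probability that a droplet holds exactly k cells at mean occupancy lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def expected_collision_fraction(n_cells, n_droplets, n_aliquots):
    """Assumed birthday-problem form: fraction of cells that share a
    (droplet, aliquot) compartment with at least one other cell,
    with D = n_droplets * n_aliquots equally likely compartments."""
    d = n_droplets * n_aliquots
    return 1.0 - (1.0 - 1.0 / d) ** (n_cells - 1)

def species_mixing_multiplet_rate(observed_mixed_fraction):
    """In a 1:1 human/mouse mix, only cross-species multiplets are visible,
    so the overall rate is estimated as twice the observed mixed fraction."""
    return 2.0 * observed_mixed_fraction

# Illustrative inputs only: mean occupancy 1.5 cells per droplet.
lam = 1.5
print("empty droplets:", round(poisson_pmf(0, lam), 3))
print("droplets with >1 cell:", round(1 - poisson_pmf(0, lam) - poisson_pmf(1, lam), 3))

# More aliquots enlarge the barcode space D and lower the expected collisions.
for aliquots in (1, 10, 20):
    frac = expected_collision_fraction(100_000, 100_000, aliquots)
    print(f"{aliquots:>2} aliquots: {frac:.3f}")
```

The point of the sketch is the scaling: every additional aliquot multiplies the number of distinguishable compartments, so the expected collision fraction falls as aliquots are added even though the droplets themselves stay overloaded.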
Key Findings
- OAK principle and throughput: By overloading 10x Chromium droplets and then unpacking and aliquoting for secondary indexing, OAK converts droplet barcodes into combinatorial indices, dramatically increasing usable barcodes and throughput.
- Recovery and multiplets: With 150,000 cells/channel, projected recovery was 87,864 cells (59%); with 450,000 cells/channel, 223,680 cells (50%). Species-mixing estimated overall multiplet rates of 6.6% (150,000 cells; 12 aliquots) and 10.6% (450,000 cells; 40 aliquots), consistent with theoretical expectations.
- Sensitivity vs standard Chromium and other ultra-high-throughput methods: At ~15,000 reads/cell in K562, OAK detected a mean of 3,014 genes/cell versus 3,905 for standard Chromium (mildly reduced sensitivity). Reduced detection was concentrated among lowly expressed genes. OAK had lower mitochondrial read percentages and higher intronic read fractions; intronic and exonic UMI counts per gene correlated (Spearman 0.65). Mean UMI counts per gene across cells correlated strongly between OAK and Chromium (Spearman 0.92). Compared with sci-RNA-seq, SPLIT-seq, sci-CAR, Paired-seq, and scifi-RNA-seq, OAK detected more genes and UMIs per cell.
- Sample multiplexing: In hashed human bronchial epithelial cells, OAK assigned hashtags to 80% of cells vs 81% with standard Chromium; hashtag abundance correlated strongly (Pearson 0.98). Cell-type compositions were similar, indicating compatibility and no evident compositional bias.
- Multiome flexibility and fixation: OAK was adapted to the Chromium Multiome reagents with only secondary indexing primer changes. In K562, methanol fixation reduced TSS fragment percentages (chromatin denaturation), while formaldehyde fixation produced high-quality gene expression and accessibility metrics.
- Human retina paired snRNA-seq/snATAC-seq: From ~100,000 transposed nuclei, OAK recovered 42,632 snATAC nuclei and 46,487 snRNA nuclei, with 40,691 paired. A parallel standard Chromium multiome run recovered 5,655 snATAC, 6,510 snRNA, with 5,551 paired. OAK snRNA-seq detected a mean of 1,666 genes/cell vs 2,029 with standard; OAK snATAC-seq detected a mean of 12,539 fragments/cell vs 14,217 with standard; mean TSS enrichment was 14.71. Major retinal cell classes and subtypes were recovered, with cell-type-specific open chromatin regions, including expected accessibility at ARR3 in cones and DOK5 in DB5 bipolar cells. Epiregulon highlighted cell-type TF activities, e.g., elevated BLIMP1/PRDM1 activity in cones and ONECUT1/2 in horizontal cells.
- Lineage tracing in melanoma under belvarafenib treatment: At Day 0, 144,300 cells were profiled; lineage representation in single-cell data correlated with bulk (Spearman 0.93), improving with larger sampled populations. At Day 10 and Day 20, five lineages showed >10-fold enrichment (drug tolerance), and 61 lineages were depleted (<1% of Day 20 cells). A resistant clone emerged post-Day 20 and dominated by Day 90. Resistant lineage frequency across time: 0.12% (Day 0), 0.15% (Day 10), 2.4% (Day 20), 100% (Day 90). FN1 was overexpressed in enriched lineages by Day 20; EMT hallmark genes were enriched by Day 90. Many Day 20 differentially expressed genes already differed at Day 0, suggesting pre-existing programs. Pathway dynamics within the resistant lineage showed early downregulation of MAPK, PI3K, and EGFR signatures, followed by EGFR rebound and TGF-β activation by Day 20, and subsequent MAPK/PI3K reactivation through Day 90. Resistant cells transiently shifted toward a more differentiated state at Day 10 but ultimately de-differentiated by Day 90.
- Cost and practicality: For retinal multiome, OAK’s per-nucleus cost was ~$0.09 versus ~$0.39 for standard Chromium. The aliquot-based, stepwise sequencing allows QC on subsets and preserves aliquots for future sequencing.
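The recovery percentages and the cost comparison reported in the findings above follow from simple ratios. A minimal sketch of that arithmetic, using only the figures quoted in this summary (the helper names are illustrative):

```python
def recovery_fraction(recovered, loaded):
    """Fraction of loaded cells that are recovered as usable profiles."""
    return recovered / loaded

def fold_reduction(standard_cost, oak_cost):
    """How many times cheaper OAK is per nucleus than the standard workflow."""
    return standard_cost / oak_cost

# Projected recovery at the two loading levels reported for species mixing.
for loaded, recovered in ((150_000, 87_864), (450_000, 223_680)):
    pct = 100 * recovery_fraction(recovered, loaded)
    print(f"{loaded:,} loaded -> {recovered:,} recovered ({pct:.1f}%)")

# Approximate per-nucleus costs for the retinal multiome experiment.
print(f"cost reduction: {fold_reduction(0.39, 0.09):.1f}x")
```

The higher loading level recovers more cells in absolute terms but a smaller fraction of what was loaded, which is the trade-off to weigh against the multiplet rate when choosing how heavily to overload a channel.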
Discussion
OAK addresses inefficiencies of standard droplet loading by deliberately overloading droplets and subsequently resolving individual cells via combinatorial indexing across aliquots. This strategy increases usable barcodes and throughput while maintaining compatibility with commercial Chromium chemistries and analysis pipelines. The method’s sensitivity is modestly reduced vs standard Chromium for lowly expressed genes but remains competitive and surpasses other ultra-high-throughput combinatorial methods in genes and UMIs detected per cell. OAK’s compatibility with cell hashing facilitates large experimental designs across donors and conditions without biasing cell-type composition. Adapting OAK to paired snRNA-seq and snATAC-seq enables generation of high-resolution multiomic maps from complex primary tissues, as demonstrated in human retina with broad cell-type coverage and TF activity inference. The lineage tracing experiment exemplifies OAK’s power for rare-event biology: only ultra-high sampling captured a 0.12% baseline resistant lineage, revealing early tolerance programs (FN1 upregulation, EGFR/TGF-β activation) followed by MAPK/PI3K reactivation and de-differentiation. Collectively, OAK provides a scalable, cost-efficient, and flexible framework for multiomic single-cell studies, particularly valuable for rare populations and longitudinal perturbation analyses.
Conclusion
The study introduces OAK, a droplet-based combinatorial indexing method that couples microfluidic overloading with post-emulsion aliquoting to deliver ultra-high throughput single-cell profiling across RNA and chromatin modalities. OAK achieves high cell recovery, stepwise scalable sequencing, broad modality compatibility (including hashing and multiome), and cost effectiveness, while retaining strong concordance with standard workflows. Applications to human retina and melanoma lineage tracing highlight OAK’s ability to resolve rare cell types and rare resistant lineages and to dissect regulatory programs across modalities and time. Future directions include optimizing fixation and detergent conditions for fragile cells (e.g., PBMCs), expanding compatibility with emerging droplet platforms (e.g., GEM-X, Flex, inDrops, Hydrop), and extending to additional modalities such as immune repertoire, surface proteomics, and CRISPR perturbation screens.
Limitations
While OAK generally produces high-quality data, sensitivity for lowly expressed genes is modestly reduced compared to standard Chromium. Methanol fixation lowers mitochondrial reads and increases intronic content; for multiome, methanol decreased TSS fragment percentages, favoring formaldehyde fixation. The protocol currently shows limited library complexity for PBMCs, likely due to fragility and detergent sensitivity, necessitating further optimization of fixation and detergent conditions. The work primarily validates OAK on the Chromium platform; broader performance across different droplet systems remains to be systematically benchmarked. Trade-offs between loading level, multiplet rate, and sensitivity should be tuned per study objectives.