Dynamic and scalable DNA-based information storage

K. N. Lin, K. Volkel, et al.

DORIS is a DNA-based information storage system designed to improve data access and manipulation. This research by Kevin N. Lin, Kevin Volkel, James M. Tuck, and Albert J. Keung explores how DORIS supports greater scalability and storage density while enabling file operations such as locking and deleting, all without damaging the DNA.

Introduction

The study addresses the challenge of dynamically accessing and manipulating data in DNA-based storage systems, which are poised for archival applications due to DNA’s density, longevity, and energy efficiency. Traditional systems rely heavily on PCR for random access, which limits scalability, reduces encoding density due to off-target primer binding in payload regions, and consumes the database during access. The authors propose an alternative architecture using double-stranded DNA with a single-stranded overhang (ss-dsDNA) and an internal T7 promoter to satisfy three key criteria: physical scalability to extreme capacities, compatibility with dense encodings, and repeatable (non-destructive) access. The overhang serves as a physical address for file-specific separation and in-storage operations; the T7 promoter enables transcription-based, reusable reads. The work introduces DORIS (Dynamic Operations and Reusable Information Storage) as a framework to improve access specificity, increase theoretical density and capacity, reduce computational burden for address design, and support dynamic operations in DNA storage.

Literature Review

Prior work established the four steps of DNA storage (encoding, synthesis, sequencing, decoding) and focused on improving each. PCR is the predominant method for information access and scales with modifications but suffers from off-target primer binding after thermal melting, database consumption, and constraints that reduce encoding density. DNA strand displacement and toeholds have enabled computation, search, detection, and rewritable storage, but existing approaches trade off scalability, density, or reusability. Hierarchical addressing and constrained encodings attempt to prevent address-payload conflicts (e.g., using Hamming distance constraints), but at large database sizes these become computationally intractable and limit capacity. The proposed ss-dsDNA approach leverages known molecular biology techniques (magnetic separations, T7 IVT, toehold-mediated strand displacement) to overcome these limitations.
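To make the address-payload constraint concrete, the following minimal Python sketch checks whether a candidate address comes too close, in Hamming distance, to any window of a payload. The function names, example sequences, and threshold are illustrative assumptions, not the authors' code; a realistic check would also consider reverse complements and hybridization thermodynamics.

```python
from typing import Iterable


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_distance_to_payload(address: str, payload: str) -> int:
    """Smallest Hamming distance between the address and any payload window."""
    k = len(address)
    if len(payload) < k:
        raise ValueError("payload is shorter than the address")
    return min(hamming(address, payload[i:i + k])
               for i in range(len(payload) - k + 1))


def conflicts(address: str, payloads: Iterable[str], min_dist: int) -> bool:
    """True if any payload window lies closer than min_dist to the address.

    min_dist is a hypothetical design threshold chosen for illustration.
    """
    return any(min_distance_to_payload(address, p) < min_dist for p in payloads)


# Toy example: the second payload contains the address verbatim.
print(conflicts("ACGT", ["TTTTTTTT", "TTACGTTT"], min_dist=2))
```

Because every candidate address has to be screened against every window of every payload, the cost of this kind of check grows with the size of the database, which is the scaling problem described above.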

Methodology

ss-dsDNA construction: 160-nt ssDNAs with a T7 promoter sequence inset 20 nt from the 3′ end were converted to ss-dsDNA with a 20-nt single-stranded overhang (file address) via single-primer extension (Q5 polymerase). Optimization identified a 1:20 ssDNA:primer ratio and only 4 PCR-like cycles as sufficient to maximize conversion.
One-pot creation and file-specific separation: Mixed pools (e.g., three files A/B/C) were converted in one pot. File access/separation was performed isothermally by hybridizing a 5′-biotinylated oligo complementary to the overhang and retrieving the target with streptavidin magnetic beads. Separation worked efficiently at room temperature; performance depended on overhang length and temperature (longer addresses and lower temperatures improved separation).
Blocking off-target interactions: DORIS and PCR access were compared using strands whose payloads contained internal sequences complementary to the access oligos. In DORIS, which involves no denaturation, these internal sites stayed double-stranded and were physically blocked; in PCR, which melts the strands, they allowed off-target priming and produced truncated amplicons.
Density/capacity simulations: Monte Carlo simulations assessed address availability and system capacity across encoding densities (payload codeword lengths determining bytes per strand) for a database of 10^9 strands. DORIS addresses were only constrained to be mutually orthogonal, not to avoid payload matches; PCR addresses had to avoid payload matches.
Repeatable access cycle: Workflow included (1) file separation via biotin-oligo and beads, (2) in vitro transcription (IVT) by T7 RNA polymerase (typically 37 °C, 2–48 h) on bead-immobilized DNA, (3) returning the file DNA to the database, (4) reverse transcription of RNA to cDNA for downstream analysis. Retention of database and file fractions was quantified by qPCR across repeated access rounds.
IVT quality/time course: Six template lengths (110–180 nt) were assessed for RNA yield and product uniformity across IVT times (2–48 h), with RT-PCR and gel analysis confirming correct lengths and increasing yields with time.
Promoter context library: An oligo pool (1088 sequences) tested all 5-nt variants upstream (NNNNN-T7, 1024 variants) and all 3-nt variants downstream (T7-NNN, 64 variants) while keeping the core T7 promoter constant. Libraries were prepared as ss-dsDNA or dsDNA, transcribed (8 h), RT-PCR amplified, and sequenced. Barcodes in payloads mapped reads to variant sequences; normalized abundances quantified transcriptional efficiency. Motif and AT-content preferences were analyzed (WebLogo, ANOVA). Error profiles were assessed.
In-storage operations: Implemented at or near room temperature using toehold-mediated strand displacement on the overhang: (i) Locking with a 50-nt lock oligo partially complementary to the overhang; (ii) Unlocking with a fully complementary 50-nt key exploiting a 30-nt toehold; (iii) Renaming by hybridizing a 40-nt oligo that converts address A to B; (iv) Deletion by blocking the address with a fully complementary 20-nt oligo. Operational temperatures (25–98 °C) and molar ratios were optimized (typical file:lock:key:accessing oligo 10:10:15; renaming 1:10:15).
Analytical methods: qPCR quantification, agarose gel electrophoresis for DNA/RNA, NanoDrop/Qubit quantitation, and NGS (Amplicon-EZ) with computational analysis in Python. Equilibrium binding fractions were modeled from ΔG values obtained via OligoCalc to predict separation efficiencies across oligo lengths and temperatures. Capacity and density were computed from codeword-based encodings and address availability.
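The summary does not give the exact thermodynamic model behind the equilibrium binding fractions, so the sketch below assumes a simple two-state equilibrium (file overhang + oligo ⇌ duplex) with a 1 M standard state. The function name, the concentrations, and the ΔG values are illustrative assumptions; in practice ΔG itself changes with temperature and would be recomputed for each condition.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def bound_fraction(delta_g_kcal: float, temp_c: float,
                   file_conc_m: float, oligo_conc_m: float) -> float:
    """Fraction of file strands hybridized to the access oligo at equilibrium.

    Assumes a two-state model: K = exp(-dG / (R*T)) = [FO] / ([F][O]).
    delta_g_kcal is the hybridization free energy at temp_c (kcal/mol).
    Concentrations are total molar concentrations (assumed values).
    """
    temp_k = temp_c + 273.15
    k_eq = math.exp(-delta_g_kcal / (R_KCAL * temp_k))
    s = file_conc_m + oligo_conc_m + 1.0 / k_eq
    disc = math.sqrt(s * s - 4.0 * file_conc_m * oligo_conc_m)
    # Numerically stable form of the smaller quadratic root.
    complex_conc = 2.0 * file_conc_m * oligo_conc_m / (s + disc)
    return complex_conc / file_conc_m


# Hypothetical inputs: 10 nM file strands, 100 nM access oligo.
for dg in (-8.0, -12.0, -20.0):
    print(dg, round(bound_fraction(dg, 25.0, 1e-8, 1e-7), 4))
```

Under these assumptions, a more negative ΔG (for example, from a longer overhang) or a lower temperature pushes the bound fraction toward 1, which is the qualitative trend the separation experiments report.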

Key Findings

Efficient one-pot ss-dsDNA construction: A 1:20 ssDNA:primer ratio and 4 cycles maximized conversion to ss-dsDNA (clear gel shift), enabling scalable preparation of addressed strands.
Specific, isothermal file access: Biotinylated oligos targeting the 20-nt overhang separated only the target file from a three-file mixture with high specificity at room temperature. Longer overhangs (20–25 nt) and lower temperatures (15–25 °C) improved separation efficiency, matching thermodynamic predictions.
Elimination of off-target priming: DORIS blocked internal payload binding and prevented truncated products, whereas PCR access produced undesired amplicons due to payload priming.
Higher theoretical density and capacity: Simulations with a 10^9-strand database showed DORIS address availability was independent of payload encoding density, allowing capacity to increase monotonically with denser encodings. PCR-compatible addresses collapsed to zero at higher densities due to payload conflicts, causing a sharp capacity drop.
Repeatable access with high retention: After one access, approximately 90% of file A's strands were recovered, counting both the retained database and the retained file fractions. Over five repeated accesses, about 50% of file A copies remained in the database while the non-accessed files B and C remained stable, implying that only about 2 copies per strand are needed for every five accesses (whereas PCR consumes copies of all files).
IVT yields and fidelity: T7 IVT produced uniform-length RNA matching templates from 110–180 nt. RNA yields increased with IVT time up to 48 h; clear products appeared by 2 h. Presence of RNAP was required to obtain cDNA, confirming specificity.
Promoter context controls yield: A comprehensive 1088-variant library revealed sequence preferences influencing transcriptional efficiency: generally G or A at the −5 position upstream and C or T at the +3 position downstream of the T7 promoter yielded higher abundances; upstream regions favored ~50% A/T and downstream favored low A/T content. A broad, near-continuous dynamic range of normalized abundances demonstrated potential for compositional information encoding. Error analysis showed no systematic indels/substitutions and overall error rates below synthesis errors.
In-storage operations at low temperatures: Locking blocked access to file A; unlocking with a key restored access even at room temperature due to a long toehold. Renaming converted address A to B with near-complete fidelity (only B’ accessed the file after renaming). Deletion via a 20-nt blocking oligo effectively prevented future access. Locking fidelity improved when locks were applied at elevated temperatures (best at 98 °C, with 45 °C still acceptable).
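The "about 2 copies per five accesses" figure in the repeatable-access finding follows from simple arithmetic if each access is assumed to retain a constant fraction of the file. The sketch below works this through; the constant-retention assumption and the function names are illustrative and are not taken from the paper.

```python
def per_access_retention(remaining_fraction: float, n_accesses: int) -> float:
    """Per-access retention r such that r**n_accesses == remaining_fraction.

    Assumes the same fraction of the file survives every access.
    """
    if not 0.0 < remaining_fraction <= 1.0 or n_accesses <= 0:
        raise ValueError("need 0 < remaining_fraction <= 1 and n_accesses > 0")
    return remaining_fraction ** (1.0 / n_accesses)


def remaining_after(n_accesses: int, retention: float) -> float:
    """Fraction of the file left in the database after n_accesses."""
    return retention ** n_accesses


def initial_copies_needed(n_accesses: int, retention: float) -> float:
    """Starting copies per strand so that, on average, one copy survives."""
    return 1.0 / remaining_after(n_accesses, retention)


# About 50% of file A remained after five accesses (from the findings above).
r = per_access_retention(0.5, 5)
print(f"per-access retention: {r:.3f}")
print(f"copies needed for 5 accesses: {initial_copies_needed(5, r):.2f}")
print(f"fraction left after 10 accesses: {remaining_after(10, r):.2f}")
```

Under this assumption each access retains roughly 87% of the file in the database, so two starting copies per strand cover five accesses on average.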

Discussion

The DORIS architecture directly addresses key limitations of PCR-based DNA storage access by enabling isothermal, specific, and reusable file retrieval. The ss-dsDNA overhang provides a physical handle that prevents payload cross-hybridization, thereby improving access specificity and simplifying address design. This physical separation from the payload eliminates the need for restrictive encodings that limit density and capacity, and thus DORIS can support denser encodings and higher system capacities. Transcription-based readout enables multiple non-destructive accesses, preserving the database and extending its usable lifespan compared to PCR amplification workflows that consume and bias the database. The promoter context study offers a tunable layer to modulate transcript abundance, suggesting that quantitative mixture compositions could encode auxiliary information. Furthermore, DORIS supports in-storage operations (lock/unlock, rename, delete) using strand displacement principles, bringing file-system-like dynamics to DNA storage. Collectively, these capabilities position DORIS as a practical and scalable approach amenable to automation, magnetic actuation, and room-temperature operation, with significant implications for future high-capacity, dynamic DNA archives.
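The file-system analogy can be illustrated with a purely logical model of a file's address state. The sketch below is hypothetical: the class and method names are invented for illustration, it does not simulate hybridization or strand displacement, and the rule that renaming requires an exposed address is an assumption rather than something stated in the summary.

```python
from dataclasses import dataclass


@dataclass
class FileStrand:
    """Logical state of one file's overhang address (toy model, not chemistry)."""
    address: str           # label of the exposed overhang, e.g. "A"
    locked: bool = False   # True while a lock oligo covers the overhang
    deleted: bool = False  # True once a blocking oligo covers the overhang

    def accessible_by(self, probe: str) -> bool:
        """A probe retrieves the file only if it matches an exposed address."""
        return probe == self.address and not self.locked and not self.deleted

    def lock(self) -> None:
        self.locked = True

    def unlock(self) -> None:
        # Models the key removing the lock via toehold-mediated displacement.
        self.locked = False

    def rename(self, new_address: str) -> None:
        # Assumption: the renaming oligo needs an exposed overhang to bind.
        if self.locked or self.deleted:
            raise ValueError("address is not exposed; cannot rename")
        self.address = new_address

    def delete(self) -> None:
        # Blocking the address prevents any future access in this model.
        self.deleted = True


f = FileStrand("A")
f.lock()
print(f.accessible_by("A"))   # locked, so not accessible
f.unlock()
f.rename("B")
print(f.accessible_by("B"), f.accessible_by("A"))
f.delete()
print(f.accessible_by("B"))
```

The model captures only the access semantics described above: a locked or deleted file cannot be retrieved, and after renaming only the new address retrieves it.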

Conclusion

This work introduces DORIS, a simple ss-dsDNA architecture with a T7 promoter and an overhang address that enables repeatable, isothermal, and specific file access, increases theoretical density and capacity by removing address–payload conflicts, reduces computational burdens for address design, and supports in-storage operations. Experimental validation demonstrated efficient one-pot construction, high-specificity separations, repeated access with substantial retention, tunable transcription yields via promoter-proximal sequence variations, and robust file operations (lock/unlock, rename, delete). Future research should focus on scaling to much larger and more diverse strand pools, optimizing enzymes and reaction conditions to maximize file retention and minimize degradation during IVT, integrating with automated microfluidic/magnetic systems, refining encoding/error-correction schemes under DORIS’s relaxed constraints, and expanding the repertoire of in-storage computations and operations.

Limitations

Demonstrations used small databases (e.g., three files experimentally and 10^9 strands in simulation); behavior in highly diverse pools (≥10^12 strands) requires empirical validation.
IVT at 37 °C and prolonged reaction times reduced retained file amounts due to unbinding and potential DNA degradation; recovery is not yet fully lossless.
Inclusion of a T7 promoter reduces raw payload length per strand, although overall density/capacity gains from relaxed constraints likely outweigh this cost.
Operational fidelity for locking depends on application temperature; secondary structures can cause leakage at low-temperature locking.
Capacity simulations used conservative codeword densities and limited address searches; real-world synthesis/sequencing constraints and larger search spaces may alter absolute capacities.
Error and efficiency metrics were characterized for specific lengths and conditions; broader parameter sweeps and end-to-end decoding performance in large-scale systems remain to be assessed.
