Researchers Develop a File System for DNA-Based Storage

Most of your cells contain a complete set of instructions for building a person, stored in DNA. Scientists have worked for years on a storage technology that could harness DNA's incredible density to store other kinds of data, but progress has been slow. Now, a team from Microsoft Research and the University of Washington may have cracked the code to make DNA a viable storage medium.

DNA's coding sequence is written in four bases: adenine, cytosine, guanine, and thymine. Those are the A, C, G, and T you always see in DNA sequences. In your cells, bases are read three at a time, and each set of three (a codon) specifies an amino acid or a signal to stop. Put amino acids together and you get a protein. To store something else in DNA, you need a different encoding scheme, and there are several ways to do that. The real problem is how you read and retrieve the data.
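The article doesn't say which encoding the team used. As a purely illustrative sketch, the simplest scheme maps every two bits to one base, so each byte becomes four bases. The mapping table and the `encode`/`decode` names below are hypothetical; practical schemes usually add constraints and error correction on top of something like this.

```python
# Hypothetical 2-bits-per-base mapping; NOT the encoding used in the study.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn each byte into four bases, two bits per base."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Reverse encode(): every four bases become one byte again."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand)  # CAGACGGC
    assert decode(strand) == b"Hi"
```

With this mapping, "H" is 0x48 (01 00 10 00 → C A G A) and "i" is 0x69 (01 10 10 01 → C G G C), so `encode(b"Hi")` yields `CAGACGGC`.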

To read the data you've encoded in DNA, you need to chop it up into shorter sequences, as current technology can't reliably handle a long, unbroken strand. Thus, a DNA storage system needs markers that tell you where each sequence fits. You can probably see where this is going: without a way to pick out individual files, you have to sequence the entire pool to retrieve a single file. The work from Microsoft and the University of Washington adds random access to DNA storage. The researchers designed new sequence markers that can target specific files without accessing unneeded ones.
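To make the marker idea concrete, here is a toy model that chops a file into short strands, prefixes each with a per-file tag and a position index, and then reassembles one file from a shuffled pool containing strands from several files. The tag plays the role of the team's marker sequences; the lengths, tag strings, and function names are hypothetical, and filtering by string prefix stands in for the amplification step that does the selection chemically.

```python
import random

TAG_LEN = 6       # hypothetical length of the per-file marker (stand-in for a primer site)
INDEX_LEN = 4     # bases used for the fragment index: 4 bases = 256 positions
PAYLOAD_LEN = 20  # hypothetical payload bases per strand (toy value)

BASES = "ACGT"


def index_to_bases(i: int) -> str:
    """Write an integer as INDEX_LEN base-4 digits (A=0, C=1, G=2, T=3)."""
    digits = []
    for _ in range(INDEX_LEN):
        digits.append(BASES[i % 4])
        i //= 4
    return "".join(reversed(digits))


def bases_to_index(s: str) -> int:
    """Inverse of index_to_bases()."""
    value = 0
    for base in s:
        value = value * 4 + BASES.index(base)
    return value


def make_strands(tag: str, payload: str) -> list[str]:
    """Chop a file's bases into short strands, each prefixed with tag + index."""
    chunks = [payload[i:i + PAYLOAD_LEN] for i in range(0, len(payload), PAYLOAD_LEN)]
    if len(chunks) > 4 ** INDEX_LEN:
        raise ValueError("file too large for the index field")
    return [tag + index_to_bases(n) + chunk for n, chunk in enumerate(chunks)]


def retrieve(pool: list[str], tag: str) -> str:
    """Keep only strands carrying `tag` (the lab does this by amplification),
    sort them by their index, and rejoin the payloads."""
    selected = [s for s in pool if s.startswith(tag)]
    selected.sort(key=lambda s: bases_to_index(s[TAG_LEN:TAG_LEN + INDEX_LEN]))
    return "".join(s[TAG_LEN + INDEX_LEN:] for s in selected)


if __name__ == "__main__":
    file_a = "ACGT" * 15   # 60 bases -> 3 strands
    file_b = "TTGCA" * 9   # 45 bases -> 3 strands
    pool = make_strands("AACCGG", file_a) + make_strands("TTGGCC", file_b)
    random.shuffle(pool)   # a DNA pool has no inherent order
    assert retrieve(pool, "AACCGG") == file_a
    assert retrieve(pool, "TTGGCC") == file_b
```

Because every strand carries its own index, the order of the pool doesn't matter, and because every strand carries a file tag, one file can be pulled out without decoding the others.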

The key is finding enough marker sequences to tag all your files, and the team identified thousands that will work. That means you can amplify only the strands carrying the marker for the files you want, and sequence just those. If you want to store more files than you have markers, you have to keep additional, separate pools of DNA. The other innovation in the new study is the use of a bit-flipping operation (XOR) to deal with long strings of identical bases. DNA sequencing tends to get messy when the same base repeats too many times in a row. The team used XOR to combine the data with a random sequence, breaking up these long runs and making the data easier to read.
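The XOR trick can be sketched as XORing the data with a pseudorandom keystream before mapping bits to bases. XOR is its own inverse, so applying the same keystream again recovers the original. This sketch assumes a seeded Python PRNG as the keystream source and reuses a hypothetical 2-bits-per-base mapping; the study's actual random sequence and encoding may differ. Randomizing makes long runs unlikely rather than impossible.

```python
import random

# Hypothetical 2-bits-per-base mapping (illustrative stand-in only).
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}


def to_bases(data: bytes) -> str:
    bits = "".join(f"{b:08b}" for b in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def longest_run(strand: str) -> int:
    """Length of the longest stretch of one repeated base (a homopolymer)."""
    best = run = 0
    prev = ""
    for base in strand:
        run = run + 1 if base == prev else 1
        prev = base
        best = max(best, run)
    return best


def whiten(data: bytes, seed: int) -> bytes:
    """XOR data with a seeded pseudorandom keystream (assumed source; needs
    Python 3.9+ for randbytes). Calling it again with the same seed undoes it."""
    keystream = random.Random(seed).randbytes(len(data))
    return bytes(d ^ k for d, k in zip(data, keystream))


if __name__ == "__main__":
    data = bytes(32)                         # 32 zero bytes -> 128 'A's in a row
    print(longest_run(to_bases(data)))       # 128
    whitened = whiten(data, seed=7)
    print(longest_run(to_bases(whitened)))   # far shorter than 128
    assert whiten(whitened, seed=7) == data  # XOR with the same keystream restores it
```

The reader only needs the seed (or the stored keystream) to undo the randomization, so the cost of this step is small compared with the sequencing errors it helps avoid.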

Microsoft Research and the University of Washington have essentially described a file system for DNA. This gets us closer to using DNA for storage, but it's not likely to replace your SSD. Even with these improvements, DNA storage is slower and vastly more complicated to use than electronic storage. Still, with data densities measured in hundreds of petabytes per gram, DNA could be valuable for archival storage.
