Researchers Develop a File System for DNA-Based Storage

Most of your cells contain, encoded in DNA, a complete set of instructions for building a person. Scientists have worked for years on a storage technology that could harness DNA’s incredible density to store other kinds of data, but progress has been slow. Now, a team from Microsoft Research and the University of Washington may have cracked the code to make DNA a viable storage medium.

DNA’s coding sequence is written with four bases: adenine, cytosine, guanine, and thymine. Those are the A, C, G, and T you always see in DNA sequences. In your cells, bases are read three at a time, and each three-base set (a codon) specifies an amino acid or a stop signal. String amino acids together and you get a protein. To store something else in DNA, you need a different encoding scheme, and there are several ways to do that. The real problem is how you read and retrieve the data.
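As a concrete illustration of what an encoding scheme can look like, the Python sketch below uses the simplest possible mapping, assuming two bits per base (00 as A, 01 as C, 10 as G, 11 as T). That mapping is an assumption for illustration, not the scheme the Microsoft and University of Washington team used; practical systems layer extra constraints and error correction on top of something like this.

```python
# Hypothetical mapping for illustration: each pair of bits becomes one base.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a DNA string, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(dna: str) -> bytes:
    """Reverse encode(): every four bases become one byte again."""
    bits = "".join(BASE_TO_BITS[base] for base in dna)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


if __name__ == "__main__":
    strand = encode(b"Hi")
    print(strand)  # CAGACGGC
    assert decode(strand) == b"Hi"
```

At two bits per base, every byte costs four bases, so the string "Hi" becomes an eight-base strand.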

Data encoded in DNA has to be split across many shorter sequences, as there’s no practical way to write or read one long, unbroken piece of DNA. Thus, a DNA storage system needs markers that tell you where each sequence fits. You can probably see where this is going: without a way to pick out individual files, you have to sequence the entire pool to retrieve a single file. The work from Microsoft and the University of Washington adds random access to DNA storage. The researchers designed new sequence markers that can target specific files without accessing the ones you don’t need.
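The idea of splitting a file across short strands and tagging each one can be sketched in Python. Everything here is hypothetical: the two tag sequences in `FILE_TAGS`, the strand layout (tag, then a chunk index, then payload), and the tiny `PAYLOAD_BASES` length were chosen for readability, not taken from the study. A string prefix check stands in for the lab step, where only strands carrying the right marker are amplified and then sequenced.

```python
import random

# Hypothetical per-file marker sequences; the study designed its own set.
FILE_TAGS = {"photo.jpg": "ACGTTGCA", "notes.txt": "TGCAACGT"}
INDEX_BASES = 4     # base-4 digits reserved for the chunk number
PAYLOAD_BASES = 12  # data bases per strand (kept tiny for readability)


def to_base4(n: int, width: int) -> str:
    """Write n as a fixed-width base-4 number using A, C, G, T as digits."""
    if n >= 4 ** width:
        raise ValueError("chunk index does not fit in the index field")
    digits = []
    for _ in range(width):
        digits.append("ACGT"[n % 4])
        n //= 4
    return "".join(reversed(digits))


def from_base4(s: str) -> int:
    """Inverse of to_base4()."""
    n = 0
    for base in s:
        n = n * 4 + "ACGT".index(base)
    return n


def make_strands(name: str, dna: str) -> list[str]:
    """Split one file's DNA into strands laid out as tag + index + payload."""
    tag = FILE_TAGS[name]
    return [
        tag + to_base4(i, INDEX_BASES) + dna[start:start + PAYLOAD_BASES]
        for i, start in enumerate(range(0, len(dna), PAYLOAD_BASES))
    ]


def retrieve(pool: list[str], name: str) -> str:
    """Select one file's strands by tag, order them by index, join payloads."""
    tag = FILE_TAGS[name]
    header = len(tag) + INDEX_BASES
    hits = [s for s in pool if s.startswith(tag)]  # stands in for amplification
    hits.sort(key=lambda s: from_base4(s[len(tag):header]))
    return "".join(s[header:] for s in hits)


if __name__ == "__main__":
    pool = make_strands("photo.jpg", "ACGT" * 10) + make_strands("notes.txt", "TTAGGC" * 5)
    random.shuffle(pool)  # a real pool has no physical order
    print(retrieve(pool, "notes.txt") == "TTAGGC" * 5)  # True
```

Because each strand carries its own index, the selected strands can come back in any order and still reassemble correctly, and the other file’s strands are never touched.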

The key is finding enough marker sequences to tag all your files, and the team identified thousands that will work. That means you can amplify only the strands carrying the marker for the files you want, and sequence just those. If you want to keep more files than you have markers, you simply keep additional, separate pools of DNA. The other innovation in the new study is a bit-flipping operation (XOR) that breaks up long strings of identical bases. DNA sequencing tends to get messy when there are too many repeated bases, so the team used XOR to mix a random sequence into the data, breaking up these long runs and making the data easier to read reliably.
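The XOR trick can be sketched the same way. This version assumes each base maps to a 2-bit value and that the "random sequence" is a pseudorandom key regenerated from a seed (the hypothetical `seed` argument), so a reader can undo the scrambling by XORing with the same key. It illustrates the general technique, not the study’s exact procedure.

```python
import random

BASES = "ACGT"  # base i stands for the 2-bit value i


def xor_with_key(dna: str, seed: int) -> str:
    """XOR each base (as a 2-bit value) with a pseudorandom 2-bit key.

    Applying the function twice with the same seed restores the input,
    because x ^ k ^ k == x.
    """
    rng = random.Random(seed)  # same seed -> same key stream when reading
    return "".join(BASES[BASES.index(base) ^ rng.randrange(4)] for base in dna)


def longest_run(dna: str) -> int:
    """Length of the longest stretch of one repeated base (a homopolymer)."""
    best = current = 0
    previous = None
    for base in dna:
        current = current + 1 if base == previous else 1
        previous = base
        best = max(best, current)
    return best


if __name__ == "__main__":
    original = "A" * 40  # worst case: one long homopolymer
    scrambled = xor_with_key(original, seed=2018)
    print(longest_run(original), longest_run(scrambled))
    assert xor_with_key(scrambled, seed=2018) == original
```

XOR with a random key doesn’t guarantee that no repeats remain; it makes long runs of identical bases statistically unlikely, even for a worst-case input like 40 consecutive A’s.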

Microsoft Research and the University of Washington have essentially described a file system for DNA. This gets us closer to using DNA for storage, but it’s not likely to replace your SSD. Even with these improvements, DNA is slower and vastly more complicated to use than electronic storage. Still, with data densities measured in hundreds of petabytes per gram, DNA could be valuable for archival storage.
