DNA thumb drives: How we’ll store holographic videos of cute cats in 2030.

I was reading the news today and came across an article reporting that researchers have demonstrated the ability to store 700 terabytes of data in a gram of DNA.

This is amazing! I do not think that anybody as recently as twenty-five years ago would have dreamed of where this is going. Back in 1990 the Human Genome Project set out to sequence the entire human genome. It wasn’t until 1995 that any genome had been sequenced, and that was only a bacterial genome much smaller than ours! Finally in 2003, after spending $3 billion, the NIH project had sequenced the human genome. Some people did have an idea that gene sequencing could be done more quickly and efficiently, and Celera Genomics set out to race the Human Genome Project and sequence the genome first. At a cost of only $300 million, Celera was able to reach a draft sequence in January of 2000.

It’s not surprising that a private company taking some risks could beat a government program on cost and speed, but what is really amazing is that the drop in price didn’t slow down; it sped up. Recently it sped up a lot. For nearly a decade the cost of sequencing a given number of base pairs of DNA fell at about the same rate as the cost of putting transistors onto microchips, keeping pace with Moore’s Law. In early 2008, a new generation of gene sequencers was introduced, and then things went nuts.

Today it is possible to have your entire genome, your personal genome, sequenced for as little as $4,000, and it can be done in as little as twenty-seven hours. That’s on Illumina equipment, and they have many competitors working to unseat them with new techniques. Regardless of whether those techniques come along this year or next, I have heard talks where folks who know the workings of the Illumina systems say that they can squeeze more performance and lower cost out of their existing technology with fairly boring incremental improvements. Most people in the business recognize that the $1,000 genome is just around the corner, and the $100 genome is not a crazy idea.

Probably the coolest part of all of this is what it could mean for medicine. The ability to quickly and cheaply read somebody’s entire genome would make it far easier to test for genetic diseases. Today you have to suspect a particular disease and order a test that targets just that gene. I once had a blood test to determine whether or not I was a carrier for Spinal Muscular Atrophy. The geneticist ordered a test to detect the most common mutation of a specific gene. Under certain circumstances I would have been referred for a second test that would do more extensive sequencing of my copies of the genes that produce survival motor neuron protein. These tests will not tell me anything else about my health or what I may or may not pass on to offspring.

When whole-genome diagnostic sequencing becomes common, it will be possible to say “While we’re looking at your DNA, let’s go ahead and see if you’re a carrier for anything else... or if you’re pre-symptomatic with anything serious.” There are certainly ethical issues when it comes to messing around with our genes, but how much could it improve our quality of life if we could learn ten years before symptoms appear that we’re likely to be quite ill by the age of 40 with some rare disorder? What if we can devise personalized approaches that can nip the disease in the bud and let us live long and healthy lives? This could mean either chemical drugs or therapeutic insertion of genes into our living cells to change our genetic fate. My only hope is that if such things become possible, the prices will continue to drop at an extraordinary pace so that these treatments do not remain a privilege of the super rich.

Saving lives and improving quality of life is great, but I mentioned holographic videos of cats earlier and I don’t want to disappoint anybody. As the article about the Harvard team said, we are now able to store 700 terabytes of data in one gram of DNA. Isn’t that cool?

I need to go XKCD on you for a minute to explain this data density. The highest-capacity USB drive that you can buy today (or at least pre-order) holds one terabyte and weighs about 30 grams. Imagine that you have a DNA-based thumb drive that holds about 12 grams of DNA in an 18-gram container. You could put that DNA drive on your keychain, it wouldn’t be bothered by magnets, and within certain limits the data would persist for thousands of years. Such a drive would store about 8,400 terabytes, and matching that with flash-memory thumb drives would take 252 kilograms’ worth of them. That’s equivalent to the weight of Andre the Giant at his peak fighting weight holding this rock:


How much data is that? The Blu-Ray FAQ says a 50 GB dual-layer disc can hold “over nine hours” of HD video. Since there are 1,024 gigabytes in a terabyte and we’re storing 8,400 terabytes, that works out to 172,032 discs, or 1,548,288 hours of full HD video! You could watch cute cat videos for the next 176 years without pausing or seeing the same thing twice.
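
If you want to check my napkin math, here is a minimal Python sketch of the arithmetic. Every number comes from the figures quoted in this post (700 TB per gram, a 1 TB flash drive weighing 30 grams, a 50 GB disc holding nine hours); the constant and function names are just ones I made up for illustration. Because the Blu-Ray FAQ says "over" nine hours, treat the video total as a lower bound.

```python
# Back-of-the-envelope check of the DNA thumb drive numbers.
# All figures are the ones quoted in the post; names are illustrative.

TB_PER_GRAM_DNA = 700     # reported storage density of DNA
DNA_GRAMS = 12            # DNA payload in the imagined thumb drive
FLASH_DRIVE_TB = 1        # capacity of the largest flash thumb drive
FLASH_DRIVE_GRAMS = 30    # approximate weight of that flash drive
GB_PER_TB = 1024
DISC_GB = 50              # dual-layer Blu-ray disc
HOURS_PER_DISC = 9        # "over nine hours", so this is a lower bound


def dna_drive_capacity_tb(grams=DNA_GRAMS):
    """Terabytes stored in the given mass of DNA."""
    return grams * TB_PER_GRAM_DNA


def flash_weight_kg(capacity_tb):
    """Kilograms of flash thumb drives needed to hold capacity_tb."""
    drives = capacity_tb / FLASH_DRIVE_TB
    return drives * FLASH_DRIVE_GRAMS / 1000


def hd_video_hours(capacity_tb):
    """Hours of HD video that fit, at nine hours per 50 GB disc."""
    discs = capacity_tb * GB_PER_TB / DISC_GB
    return discs * HOURS_PER_DISC


capacity = dna_drive_capacity_tb()
hours = hd_video_hours(capacity)
years = hours / (24 * 365)  # ignoring leap years

print(f"DNA drive capacity: {capacity:,} TB")
print(f"Equivalent flash drives: {flash_weight_kg(capacity):,.0f} kg")
print(f"HD video: {hours:,.0f} hours, about {years:.1f} years")
```

Swap in a different DNA payload or disc size and the cat video supply scales linearly, since every step is a straight multiplication or division.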