Amazon, NIH put 1000 Genomes Project in the cloud

It took 10 years and billions of dollars to sequence and publish the first human genome. Now, anyone with an Internet connection can, in theory, access 200 terabytes of genomic data, including DNA sequenced from more than 1,700 individuals.

Amazon and the U.S. National Institutes of Health (NIH) announced today that the complete 1000 Genomes Project will be available on Amazon Web Services as a public data set. The move, announced at the White House Big Data Summit, makes the largest collection of human genetic data available free of charge, according to a TechCrunch article.

"Previously, researchers wanting access to public data sets such as the 1000 Genomes Project had to download them from government data centers to their own systems, or have the data physically shipped to them on discs," Lisa D. Brooks, Ph.D., Program Director for NIH's Genetic Variation Program, said in an announcement about the project. "This process took a long time, and that's assuming a lab had the bandwidth to download the data and sufficient storage and compute infrastructure to hold and analyze the data once they had it."

Putting the data in the cloud "means researchers and labs of all sizes and budgets have access to the complete 1000 Genomes Project data and can immediately start analyzing and crunching the data without the investment it would normally require in hardware, facilities and personnel," Deepak Singh, Ph.D., principal product manager for Amazon Web Services, added.

Amazon is providing a useful service but could end up making money on the deal, notes The New York Times' Bits blog.

"Manipulating this much information requires a lot of computing power, and Amazon will be charging its regular rates for use of computers," according to the post.

"AWS recently created for a pharmaceutical client a virtual supercomputer of 30,000 semiconductor 'cores,' for which it charged $1,279 an hour," the post adds. Put into perspective, though, the price is still a bargain, the post says: Amazon Web Services' machine executed in a few hours the equivalent of 11 years of work on the company's computers.

"That is still significantly less than buying the kind of supercomputers needed for most big genetic research," the author notes.  
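For a rough sense of scale, the figures quoted in the post can be turned into a per-core price. The sketch below uses only the two numbers from the quote ($1,279 an hour, 30,000 cores). The run length is an assumption: the post says only "a few hours," so the three-hour value is purely illustrative, and the function names are hypothetical.

```python
# Figures quoted in the Bits post.
HOURLY_RATE_USD = 1279.0   # price charged per hour for the whole cluster
CORES = 30_000             # number of cores in the virtual supercomputer

# Assumption: the post says only "a few hours"; 3 is an illustrative value.
ASSUMED_RUN_HOURS = 3


def cost_per_core_hour(hourly_rate: float, cores: int) -> float:
    """Return the price of one core for one hour."""
    return hourly_rate / cores


def total_cost(hourly_rate: float, hours: float) -> float:
    """Return the bill for running the whole cluster for `hours` hours."""
    return hourly_rate * hours


if __name__ == "__main__":
    per_core = cost_per_core_hour(HOURLY_RATE_USD, CORES)
    bill = total_cost(HOURLY_RATE_USD, ASSUMED_RUN_HOURS)
    print(f"Per core-hour: ${per_core:.4f}")
    print(f"Assumed {ASSUMED_RUN_HOURS}-hour run: ${bill:,.2f}")
```

At the quoted rate, each core costs a little over four cents an hour, so even a run of several hours would bill in the thousands of dollars rather than the cost of a dedicated machine.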

To learn more:
- see the Amazon/NIH announcement
- check out the Bits blog post
- read the TechCrunch article
- find the genomic dataset on Amazon