U.S. cyber infrastructure can handle big data challenges of genomics research

The cyber infrastructure used by researchers in other disciplines can be put to work to handle the "big data" problems associated with genomic research--with some customization, according to an article published this week in the Journal of the American Medical Informatics Association.

This system of research supercomputers and other IT facilities, along with the high-speed networks that connect them, is heavily used in disciplines such as high-energy physics, astronomy, and climatology, but less so in biomedical research.

"Resources such as the Extreme Science and Discovery Environment, the Open Science Grid, and Internet2 provide economical and proven infrastructures for big data challenges, but these resources can be difficult to approach," the authors write. "Specialized web portals, support centers, and virtual organizations can be constructed on these resources to meet defined computational challenges, specifically for genomics."

They say the scale offered by this system will be vital to the future of disease research.

They point to a number of projects outlining the possibilities, including:

  • The iPlant Collaborative, which brings together plant biologists, bioinformaticians, computational scientists, and HPC professionals to address grand challenges in plant and animal sciences.
  • The Extreme Science and Engineering Discovery Environment (XSEDE), which combines supercomputers and the resources of 13 supercomputing centers to further research and education across the USA.
  • The National Center for Genome Analysis Support (NCGAS), which relies on XSEDE and the Open Science Grid, which uses advanced networks like Internet2 and National LambdaRail to provide access to advanced computing resources.

Biomedical data comes from an array of different instruments, each with an individual workflow, the authors note. However, they add, technologies are being built to solve the problems of storing, moving and effectively analyzing all that data.

More massive databases are storing reams of data in disease research, including efforts focused on cancer and Alzheimer's. Data-sharing among researchers, however, presents an array of challenges. More than 70 major research and healthcare organizations from 41 countries recently formed an alliance aimed at creating a framework for sharing data from massive genomics databases worldwide.

To learn more:
- here's the article
- read about iPlant
- check out the National Center for Genome Analysis Support
- find info about XSEDE