The medical community has increasingly turned to genetic information to understand, treat and prevent disease in humans, but analyzing a single genome can take many months. Now, researchers working with one of the fastest supercomputers in the world are able to analyze 240 complete genomes in only 2 days.
The researchers, from the University of Chicago, have published results of their analysis in the journal Bioinformatics.
Aptly named Beagle, in reference to the ship that carried Charles Darwin on his well-known scientific journey in 1831, the computer is based at Argonne National Laboratory in Illinois. Housed in the Theory and Computing Sciences building, Beagle supports computation, simulation and data analysis for the biomedical research community.
The team notes that the declining cost of producing DNA sequences is resulting in an increase in whole genome sequencing. But this currently creates a "computational bottleneck" because of the limited capacity to analyze several genomes at once.
Rather than looking at genomes one at a time, the supercomputer can process many genomes simultaneously.
"It converts whole genome sequencing, which has primarily been used as a research tool, into something that is immediately valuable for patient care," says first author Megan Puckelwartz.
Dr. Elizabeth McNally, the AJ Carlson Professor of Medicine and Human Genetics and director of the Cardiovascular Genetics clinic at the University of Chicago Medicine, says:
"This is a resource that can change patient management and, over time, add depth to our understanding of the genetic causes of risk and disease."
Why is whole genome sequencing so useful?
The team says that because the genome is so extensive, clinical geneticists have opted for exome sequencing, which involves looking closely at less than 2% of the genome, in regions that code for proteins.
Although 85% of mutations that cause diseases are located in these regions, the other 15% of clinically important mutations come from non-coding regions. Formerly referred to as "junk DNA," these non-coding regions are now known to have significance.
But analyzing these regions requires sequencing the whole genome.
In order to test Beagle, Dr. McNally and colleagues used raw sequencing data from 61 human genomes and analyzed it on the supercomputer.
Using only one quarter of Beagle's total capacity and publicly available software, the team found that analysis on the supercomputer improved accuracy and greatly increased speed.
Dr. McNally says these improvements reduce the price per genome, adding that "the price for analyzing an entire genome is less than the cost of looking at just a fraction of a genome."
Additionally, the team says this method of analysis will relieve the bottleneck that has emerged as genetic sequencing has become cheaper and faster.
Findings have 'immediate medical applications'
Dr. McNally says their findings have medical applications that can immediately be applied at the Cardiovascular Genetics clinic, where they rely on looking at genes from an initial patient and their family members to understand, treat or prevent disease.
"We start genetic testing with the patient, but when we find a significant mutation we have to think about testing the whole family to identify individuals at risk," she says.
Dr. McNally adds:
"In 2007, we did our first five-gene panel. Now we order 50 to 70 genes at a time, which usually gets us an answer. At that point, it can be more useful and less expensive to sequence the whole genome."
By studying these genomes in light of patient and family histories, she says they can gain more knowledge about inherited disorders.
"By paying close attention to family members with genes that place them at increased risk, but who do not yet show signs of disease, we can investigate early phases of a disorder. In this setting, each patient is a big-data problem," she adds.
Medical News Today recently reported on a study published in PLOS Genetics that suggested some genetic variants could indicate the presence of rare genetic mutations that have yet to be discovered. The researchers said their discovery may point to a genetic "missing link" that could uncover previously unknown mechanisms behind common diseases.