An international team of scientists has mapped 95 per cent of the genetic variation that can occur in any human, completing the first phase of the 1000 Genomes Project, which aims to produce a comprehensive public resource to help researchers study all types of genetic variation that might cause disease in humans.

The map, and an account of how the consortium behind the project compiled it, appears in a paper published in the 28 October issue of the journal Nature.

Many of the variants had already been identified, but more than half are new. So far the project has found 95 per cent of the currently measurable variants in any person, and when it is complete it will have identified more than 99 per cent of all human variants.

The new map also includes some surprises.

For instance, the researchers found that on average, each of us has between 250 and 300 genetic changes that would cause a gene to stop working properly, confirming that none of us has a “perfect” genome. Another surprise was that each person carries between 50 and 100 genetic variations that have already been linked with an inherited disease.

However, because we each carry two copies of a gene, one from each biological parent, the chances are that we stay healthy as long as one of the copies works properly.

As well as looking at common variants, the researchers looked in detail at the genomes of two families of three members each: mother, father and daughter.

They located new variants in the daughters that were not present in either of their parents. This helped them determine the rate of mutation of DNA in humans, which they worked out to be about 60 new mutations per generation (i.e. variants that are not inherited from the parents, but are still “faults” compared to the “perfect” genome).
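The idea behind the family comparison can be sketched in a few lines of Python. This is only an illustration, not the consortium's method: the function name `find_de_novo` and the variant labels are hypothetical, and a real analysis must also separate genuine new mutations from sequencing errors, which this sketch ignores.

```python
def find_de_novo(child, mother, father):
    """Return the variants seen in the child but in neither parent."""
    return child - (mother | father)


# Hypothetical variant labels in "chromosome:position:change" form.
mother = {"chr1:1000:A>G", "chr2:500:C>T"}
father = {"chr1:1000:A>G", "chr3:42:G>A"}
daughter = {"chr1:1000:A>G", "chr3:42:G>A", "chr7:77:T>C"}

# Only the variant that appears in neither parent is reported.
print(sorted(find_de_novo(daughter, mother, father)))
```

Counting the variants that survive this kind of filter in each daughter is, in principle, how a per-generation mutation rate can be estimated.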

The work was spread across three pilot studies where the researchers used next-generation DNA sequencing technologies to map genetic variation in 180 people.

Dr Richard Durbin, of the Wellcome Trust Sanger Institute and co-Chair of the consortium, told the media that the pilot studies have laid a “critical foundation” for studying genetic variation among humans.

The pilot studies have proven the technologies and principles, so the scientists can now press on and sequence the DNA of 2,500 people from different populations worldwide, to produce a comprehensive, publicly available map of genetic variation to support future genetics research, he added.

The 1000 Genomes Project started in 2008 and is funded by numerous foundations and national governments; it will cost about $120 million over five years, and is due to end in 2012.

The project is different to many large scale genome projects in that its main aim is to create a public resource. It uses samples from many human populations, obtained with informed consent that allows the data to be released freely, without restriction on use. The data already obtained have been used in numerous studies looking at the genetics of diseases.

It brings to mind a “Google map” of the human genome, one that over the next two years or so will be filled in with more and more detail.

Dr David Altshuler, Deputy Director of the Broad Institute of Harvard and MIT, and also a co-chair of the project, said that making the data freely available to scientists is already impacting research on rare and common diseases.

“Biotech companies have developed genotyping products to test common variants from the project for a role in disease,” said Altshuler, adding that:

“Every published study using next-generation sequencing to find rare disease mutations, and those in cancer, used project data to filter out variants that might obscure their results.”

The next phase of the project, designed as a larger scale version of the pilots, is already under way using data collected from more than 1,000 people.

Human DNA comprises four chemical units or “bases”: adenine (A), cytosine (C), guanine (G) and thymine (T). Imagine an alphabet with only four letters, and the human genome being a huge instruction manual for creating a person, but written only in this alphabet.

Genetic variations in the instruction manual or genome occur when the “bases” appear in a different order. The variations can be small scale, the equivalent of two different spellings of the same “word” in the same place in the “instruction manual”, or they can be large scale, like a section from one chapter of the manual being taken out and inserted in the middle of a different chapter.

A small scale variation, where a single base is replaced by a different one, is called an SNP (pronounced “snip”), short for single nucleotide polymorphism. Larger scale variations are where, for example, whole sections of a chromosome are duplicated or relocated to another position on the genome.
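To make the single-letter case concrete, here is a minimal Python sketch that compares two stretches of DNA of equal length and reports every position where one base has been swapped for another. The function name `find_snps` and both sequences are made up for illustration. Real variant discovery works from millions of short, error-prone sequencing reads and is far more involved than this letter-by-letter comparison.

```python
def find_snps(reference, sample):
    """List (position, reference_base, sample_base) for each single-base difference.

    Positions are counted from 1. Both sequences must have the same length,
    so insertions and deletions are outside the scope of this sketch.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be the same length")
    return [
        (position, ref_base, sample_base)
        for position, (ref_base, sample_base) in enumerate(zip(reference, sample), start=1)
        if ref_base != sample_base
    ]


# Two invented 14-letter stretches that differ at a single position.
reference = "GATTACAGATTACA"
sample = "GATTACAGCTTACA"

print(find_snps(reference, sample))  # [(9, 'A', 'C')]
```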

Such genetic variations among humans, even small ones, can help explain why some of us have a higher risk of developing certain diseases like diabetes or cancer than others. Also, some variations are common, and some are rare.

There are two ways that scientists can produce a “map” of all these variations: by comparing individuals, and by comparing populations.

So far the consortium has studied populations with European, West African and East Asian ancestors.

In the three pilots they sequenced the whole genome of 179 people and the protein-coding genes of 697 people.

Using the latest DNA sequencing technologies, and doing the work at the project’s nine centres on different continents, they sequenced each region several times, collecting over 4.5 terabases of DNA sequence (that’s 4.5 million million base letters).
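For a sense of scale, the arithmetic can be written out as a short Python sketch. The genome length used here, roughly 3 billion base letters, is a commonly quoted round figure and an assumption of this example, not a number from the paper. The result is only a rough yardstick, since the sequencing was not spread evenly across people or across regions of the genome.

```python
BASES_PER_TERABASE = 10**12

# Assumption for illustration: a human genome is roughly 3 billion letters long.
APPROX_GENOME_LENGTH = 3 * 10**9


def terabases_to_bases(terabases):
    """Convert a quantity in terabases to a whole number of base letters."""
    return round(terabases * BASES_PER_TERABASE)


total_bases = terabases_to_bases(4.5)
genome_equivalents = total_bases // APPROX_GENOME_LENGTH

print(f"{total_bases:,} base letters")
print(f"about {genome_equivalents:,} genome lengths of raw sequence")
```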

The map of variations has about 15 million SNPs, 1 million short insertion/deletion changes, and more than 20,000 larger scale structural variations.

The technologies the researchers used employ innovative computer methods to organize, store, analyze and share DNA sequencing data.

When the project began, DNA sequencing technology was expensive and the scientists did two things to keep the costs down: they brought together partial data on many people, and they focused only on that part of the genome that codes for proteins.

They then calibrated these methods against “gold standard” data produced more completely and at greater accuracy, and showed that they worked well enough, and complemented each other, so the work could go ahead at a cost-effective pace.

They have decided to continue with these two methods in the full-scale project because even though the cost of DNA sequencing is coming down, it is still quite expensive.

The researchers have made the data from the pilots and the full-scale project freely available at the project website www.1000genomes.org.

“A map of human genome variation from population-scale sequencing.”
The 1000 Genomes Project Consortium.
Nature, Volume 467, pp 1061-1073, published 28 October 2010.
DOI:10.1038/nature09534

Additional sources: Nature News, Wellcome Trust Sanger Institute.

Written by: Catharine Paddock, PhD