Genetics News

New Gene Prediction Method Capitalizes On Multiple Genomes

Main Category: Genetics
Also Included In: Biology / Biochemistry
Article Date: 22 Dec 2007 - 4:00 PDT


Researchers at Stanford University have reported, in the online open access journal Genome Biology, a new approach to computationally predicting the locations and structures of protein-coding genes in a genome. Gene finding remains an important problem in biology, as scientists are still far from fully mapping the set of human genes. Furthermore, gene maps for other vertebrates, including important model organisms such as mouse, are far less complete than the human annotation. The new technique, known as CONTRAST (CONditionally TRAined Search for Transcripts), works by comparing a genome of interest to the genomes of several related species.

CONTRAST exploits the fact that protein-coding genes play a specific functional role within a cell and are therefore subject to characteristic evolutionary pressures. For example, mutations that alter an important part of a protein's structure are likely to be deleterious and thus selected against. On the other hand, mutations that preserve a protein's amino acid sequence are normally well tolerated. Thus, protein-coding genes can be identified by searching a genome for regions that show evidence of such patterns of selection. However, learning to recognize such patterns when more than two species are compared has proved difficult.
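The selection pattern described above can be illustrated with a small Python sketch. This is not CONTRAST's algorithm, which learns its features directly rather than counting substitutions; it is only a toy illustration of the signal. It assumes two already aligned, gap-free coding sequences in the same reading frame and uses the standard genetic code. The function name `classify_codon_changes` and the example sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ('*' marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_changes(target, informant):
    """Count codon differences between two aligned, gap-free coding sequences.

    Returns (synonymous, nonsynonymous): differing codons that keep the
    amino acid versus differing codons that change it.
    """
    if len(target) != len(informant) or len(target) % 3 != 0:
        raise ValueError("sequences must have equal length, divisible by 3")
    synonymous = nonsynonymous = 0
    for i in range(0, len(target), 3):
        codon_a, codon_b = target[i:i + 3], informant[i:i + 3]
        if codon_a == codon_b:
            continue  # identical codons carry no substitution
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1  # amino acid preserved: usually well tolerated
        else:
            nonsynonymous += 1  # amino acid altered: more likely selected against
    return synonymous, nonsynonymous


# Hypothetical aligned fragments: every difference preserves the amino acid,
# the kind of pattern expected in a region under protein-coding selection.
print(classify_codon_changes("ATGGCTCTGAAA", "ATGGCCTTGAAG"))
```

A region where most differences between species are synonymous is consistent with protein-coding selection, whereas non-coding sequence shows no such preference for codon-preserving changes.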

Previous systems for gene prediction were able to make effective use of one additional 'informant' genome. For example, when searching for human genes, taking into account information from the mouse genome led to a substantial increase in accuracy. But no system was able to leverage additional informant genomes to improve upon state-of-the-art performance using mouse alone, although it was expected that adding informants would make patterns of selection clearer. CONTRAST solves this problem by learning to recognize the signature of protein-coding gene selection in a fundamentally different way from previous approaches. Instead of constructing a model of sequence evolution, CONTRAST directly 'learns' which features of a genomic alignment are most useful for recognizing genes. This approach leads to higher overall accuracy and is able to extract useful information from several informant sequences.

In a test on the human genome, CONTRAST exactly predicted the full structure of 59% of the genes in the test set, compared with the previous best result of 36%. Its exact exon sensitivity of 93%, compared with a previous best of 84%, translates into many thousands of exons correctly predicted by CONTRAST but missed by previous methods. Importantly, CONTRAST's accuracy using a combination of eleven informant genomes was significantly higher than its accuracy using any single informant. The substantial advance in predictive accuracy represented by CONTRAST will further efforts to complete protein-coding gene maps for human and other organisms.
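For readers unfamiliar with the metric, exact exon sensitivity is conventionally the fraction of annotated exons that a predictor reproduces with both boundaries exactly right. The sketch below shows that calculation in Python under this assumed definition; the function name and the coordinates are hypothetical and are not data from the study.

```python
def exact_exon_sensitivity(annotated, predicted):
    """Fraction of annotated exons predicted with both boundaries exact.

    Exons are (start, end) coordinate pairs; a prediction counts only if
    both coordinates match an annotated exon.
    """
    annotated = set(annotated)
    if not annotated:
        raise ValueError("annotation must contain at least one exon")
    return len(annotated & set(predicted)) / len(annotated)


# Hypothetical example: two of the four annotated exons are matched exactly;
# (600, 710) misses the true end at 700, so it does not count.
annotated_exons = [(100, 200), (300, 450), (600, 700), (900, 1000)]
predicted_exons = [(100, 200), (300, 450), (600, 710), (1200, 1300)]
print(exact_exon_sensitivity(annotated_exons, predicted_exons))
```

Because every boundary must match exactly, a prediction that is off by a few bases counts as a miss, which makes this a strict measure.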

Further information about existing gene-prediction methods and the advance CONTRAST brings to the field can be found in a minireview by Paul Flicek, which accompanies the article by Batzoglou and colleagues.

----------------------------
Article adapted by Medical News Today from original press release.
----------------------------

1) Samuel S Gross, Chuong B Do, Marina Sirota and Serafim Batzoglou. CONTRAST: A discriminative, phylogeny-free approach to multiple informant de novo gene prediction. Genome Biology (in press).

2) All articles are available free of charge, according to BioMed Central's open access policy.

3) Genome Biology publishes articles from the full spectrum of biology. Subjects covered include any aspect of molecular, cellular, organismal or population biology studied from a genomic perspective, as well as genomics, proteomics, bioinformatics, genomic methods (including structure prediction), computational biology, sequence analysis (including large-scale and cross-genome analyses), comparative biology and evolution. Genome Biology has an impact factor of 7.12.

4) BioMed Central is an independent online publishing house committed to providing immediate access without charge to the peer-reviewed biological and medical research it publishes. This commitment is based on the view that open access to research is essential to the rapid and efficient communication of science.

Source: Charlotte Webber
BioMed Central



