A new initiative reported by the Public Library of Science is attempting to harness web technology to create an efficient process for archiving information about the human genetic sequence. This work is reported in an article released on July 7, 2008 in the open access journal PLoS Biology.

In sequencing DNA, scientists determine a long string of information written in a four-letter alphabet, with each letter corresponding to one of the four chemical bases in the DNA molecule. These sequences can be very long and complicated: the human genome, for example, consists of about three billion of these letters. Once a sequence is determined, scientists can work out which biochemical products it encodes, usually by applying principles of biochemistry and by comparing it with the genetic sequences of similar organisms. However, this work is painstaking, slow, and requires a high level of expertise.

The information associated with any given gene is extensive. Each identified gene in the genome has a name, a sequence, a position on a specific chromosome, a protein product, interaction partners, and many other characteristics that can influence its function and structure. Even once this information has been obtained, the options for accessing the work scientists have done on a gene can be limited. Existing libraries of gene information, sometimes called gene portals, are considered extremely reliable, even definitive, but they are usually populated with information from only a few major contributors, and their content must be reviewed and updated by specialized experts. Because information about these genes actually comes from many different researchers working independently, resources that bring this information together for efficient access are important.

In this project, researchers in San Diego, California, and St. Louis, Missouri, are attempting to use Wikipedia to collect information, including citations, about specific genes in the human genome and their associated proteins. Wikipedia is a web-based information system that relies on contributions and reviews from its users to accumulate and edit information. The team has set out to establish a "Gene Wiki," a network of articles created by a computer program and then expanded by user edits. Cumulatively, this information would work toward describing the functions of all human genes and the relationships between them. The researchers hope this will allow scientific information to accumulate more flexibly, since every reader will also be able to edit and add to the Gene Wiki pages.

To kick-start the project, the team developed a system that automatically posts information from the existing gene portals on Wikipedia as "stubs," short starter articles. The program downloads information from one system, converts it into Wikipedia's format, and then posts it on Wikipedia as needed. The authors expect these stubs to attract more detailed contributions from scientists who come across them on Wikipedia. Since the start of their efforts, the absolute number of edits to articles on genes in the mammalian genome has doubled. The authors encourage readers to look up gene information on Wikipedia to see this new phenomenon for themselves.

About PLoS Biology

All works published in PLoS Biology are open access. Everything is immediately available to read, download, redistribute, include in databases, and otherwise use, without cost to anyone, anywhere, subject only to the condition that the original authorship and source are properly attributed. Copyright is retained by the authors. The Public Library of Science uses the Creative Commons Attribution License.

A gene wiki for community annotation of gene function.
Huss JW III, Orozco C, Goodale J, Wu C, Batalov S, et al.
PLoS Biol 6(7):e175.
doi:10.1371/journal.pbio.0060175

Written by Anna Sophia McKenney