Researchers at the U.S. Department of Energy's (DOE) Lawrence Berkeley National Laboratory (Berkeley Lab) have achieved a major advance in understanding how genetic information is transcribed from DNA to RNA by providing the first step-by-step look at the biomolecular machinery that reads the human genome.

"We've provided a series of snapshots that shows how the genome is read one gene at a time," says biophysicist Eva Nogales, who led this research. "For the genetic code to be transcribed into messenger RNA, the DNA double helix has to be opened and the strand of gene sequences has to be properly positioned so that RNA polymerase, the enzyme that catalyzes transcription, knows where the gene starts. The electron microscopy images we produced show how this is done."

Says Paula Flicker of the National Institutes of Health's National Institute of General Medical Sciences, which partly funded the research, "The process of transcription is essential to all living things so understanding how it initiates is enormously important. This work is a beautiful example of integrating multiple approaches to reveal the structure of a large molecular complex and provide insight into the molecular basis of a fundamental cellular process."

Nogales, who holds joint appointments with Berkeley Lab, the University of California (UC) at Berkeley, and the Howard Hughes Medical Institute (HHMI), is the corresponding author of a paper describing this study in the journal Nature. The paper is titled "Structural visualization of key steps in human transcription initiation." Co-authors are Yuan He, Jie Fang and Dylan Taatjes.

The fundamental process of life by which information in the genome of a living cell is used to generate biomolecules that carry out cellular activities is the so-called "central dogma of molecular biology." It states that genetic information flows from DNA to RNA to proteins. This straightforward flow of information is initiated by an elaborate system of proteins that operate in a highly choreographed fashion with machine-like precision. Understanding how this protein machinery works in the context of passing genetic information from DNA to RNA (transcription) is a must for identifying malfunctions that can turn cells cancerous or lead to a host of other problems.

Nogales and members of her research group used cryo-electron microscopy (cryo-EM), in which protein samples are flash-frozen at liquid nitrogen temperatures to preserve their structure, to carry out in vitro studies of reconstituted and purified versions of the "transcription pre-initiation complex." This complex is a large assemblage of proteins composed of RNA polymerase II (Pol II) plus a class of proteins known as general transcription factors that includes the TATA-binding protein (TBP), TFIIA, TFIIB, TFIIF, TFIIE and TFIIH. All of the components in this complex work together to ensure the accurate loading of DNA into Pol II at the start of a gene sequence.

"There's been a lack of structural information on how the transcription pre-initiation complex is assembled, but with cryo-EM and our in vitro reconstituted system we've been able to provide pseudo-atomic models at various stages of transcription initiation that illuminate critical molecular interactions during this step-by-step process," Nogales says.

The in vitro reconstituted transcription pre-initiation complex was developed by Yuan He, lead author on the Nature paper and a post-doctoral student in Nogales's research group.

"This reconstituted system provided a model for the sequential assembly pathway of transcription initiation and was essential for us to get the most biochemically homogeneous samples," Nogales says. "Also essential was our ability to use automated data collection and processing so that we could generate all our structures in a robust manner."

Among the new details revealed in the step-by-step cryo-EM images was how the transcription factor protein TFIIF engages Pol II and promoter DNA to stabilize both a closed DNA pre-initiation complex and an open DNA-promoter complex, and also how it regulates the selection of a transcription start-site.

"Comparing the closed versus open DNA states led us to propose a model that describes how DNA is moved during the process of promoter opening," says He. "Our studies provide insight into how TFIIH uses ATP hydrolysis as a source of energy to actually open and push the DNA to the active site of Pol II."

Nogales and her colleagues plan to further investigate the process of DNA loading into Pol II, and to add to the assembly further transcription factors that are required for regulation of gene expression.

"Our goal is to actually build a structural model of the entire - more than two million daltons - protein machinery that recognizes and regulates all human DNA promoters," Nogales says. "For now we have the structural framework that's been needed to integrate biochemical and structural data into a unified mechanistic understanding of transcription initiation."

This research was funded by the National Institute of General Medical Sciences and the National Cancer Institute under NIH grant numbers GM063072 and CA127364.
