Genome Research has published six articles describing recent advances from the modENCODE Project. Launched in 2007, the modENCODE Project aims to comprehensively characterize functional genomic elements in two model organisms, the fly Drosophila melanogaster and the worm Caenorhabditis elegans. Comparative analyses in these well-established systems are expected to guide efforts to further our understanding of human biology. The published articles present new findings that shed light on embryonic development, DNA replication, transcriptional regulation, and more. Highlights from the modENCODE Project articles can be found below.

1. Duplicated development after metamorphosis

It has long been hypothesized that embryonic development is highly conserved across organisms, but the extent of that conservation, especially over large evolutionary time scales, is unknown. Jingyi Jessica Li and colleagues compared developmental gene expression between two very different model organisms that diverged 600 million years ago, the fruit fly D. melanogaster and the nematode worm C. elegans, across 30 and 35 distinct developmental stages, respectively.

Despite many differences between the two species, including morphology, size, lifecycle, and the relative proportion of each sex (99.5% of adult C. elegans are hermaphrodites), the authors found that worms and flies share conserved gene expression patterns during development. To make this comparison, the authors developed a novel statistical approach to identify "stage-associated" genes, or genes that are highly expressed in a given developmental stage relative to other stages, and then looked for their counterparts in the other organism. Developmental stages in the worm could be matched to corresponding developmental stages in the fly, and surprisingly, in some cases one stage in the worm corresponded to two stages in the fly. Flies, unlike worms, undergo metamorphosis, during which most of the larval cells die and cells must proliferate to form the adult animal. "It appears that the fly recapitulates a portion of the developmental program during development, which was thoroughly surprising, though in retrospect one can see how it makes sense," said co-corresponding author Steven Brenner.
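
The idea of matching stages through stage-associated genes can be illustrated with a small Python sketch. It is not the authors' statistical method, which the article does not describe; it assumes a simple z-score cutoff to call a gene stage-associated and a one-to-one ortholog table, and all gene names, expression values, and function names are hypothetical.

```python
from statistics import mean, pstdev


def stage_associated_genes(expression, z_cutoff=1.0):
    """Map each stage index to the genes that peak in that stage.

    `expression` maps gene name -> list of expression values, one per stage.
    A gene counts as stage-associated when its value in a stage lies at least
    `z_cutoff` standard deviations above its own mean (an assumed criterion).
    """
    n_stages = len(next(iter(expression.values())))
    associated = {stage: set() for stage in range(n_stages)}
    for gene, values in expression.items():
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            continue  # flat profile: not associated with any stage
        for stage, value in enumerate(values):
            if (value - mu) / sigma >= z_cutoff:
                associated[stage].add(gene)
    return associated


def match_stages(worm_assoc, fly_assoc, orthologs, min_shared=1):
    """For each worm stage, list the fly stages that share at least
    `min_shared` stage-associated genes through the ortholog table."""
    matches = {}
    for w_stage, w_genes in worm_assoc.items():
        fly_orthologs = {orthologs[g] for g in w_genes if g in orthologs}
        matches[w_stage] = [
            f_stage
            for f_stage, f_genes in sorted(fly_assoc.items())
            if len(fly_orthologs & f_genes) >= min_shared
        ]
    return matches


# Hypothetical toy data: three worm stages, five fly stages.
worm = {"wA": [10, 1, 1], "wB": [1, 10, 1], "wC": [1, 1, 10]}
fly = {
    "fA": [10, 1, 1, 10, 1],  # peaks twice, e.g. embryo and a later stage
    "fB": [1, 10, 1, 1, 1],
    "fC": [1, 1, 10, 1, 1],
}
orthologs = {"wA": "fA", "wB": "fB", "wC": "fC"}

matches = match_stages(
    stage_associated_genes(worm), stage_associated_genes(fly), orthologs
)
print(matches)
```

In this toy example the first worm stage pairs with two fly stages because the hypothetical fly gene fA peaks twice, mirroring the one-to-two correspondence described above. A real analysis would also need a significance test for the overlap, which this sketch omits.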

Reference: Li JJ, Huang H, Bickel PJ, Brenner SE. 2014. Comparison of D. melanogaster and C. elegans developmental stages, tissues, and cells by modENCODE RNA-seq data. Genome Research, doi: 10.1101/gr.170100.113

2. Chromatin cues for DNA replication

Our genomes are passed on to the next generation by replicating our DNA. The fidelity of this process is critical to avoid mutations that cause diseases, including predisposition to cancer. Replication is initiated independently and non-simultaneously at hundreds of thousands of sites in our genome by origin recognition complexes (ORC), and a stretch of DNA can be categorized as either 'early' or 'late' replicating. DNA is assembled into chromatin, which includes histone proteins carrying various chemical modifications that direct transcription and compaction of the DNA. The effect of these modifications on DNA replication, however, is unclear. "DNA replication has to be excruciatingly accurate and coordinated with other ongoing DNA-templated processes like transcription," said David MacAlpine.

MacAlpine and colleagues used a technique known as Repli-seq to characterize newly replicated regions of DNA with next-generation sequencing. Early replicating DNA segments, which make up about one third of the genome, are correlated with active histone marks, high gene density, and gene expression. Late replicating segments lack active histone marks, are enriched in repressive histone modifications, and lie in gene-poor regions. Importantly, reducing levels of the histone mark H4K16 acetylation on the male X chromosome, a mark that upregulates transcription from the entire X chromosome for dosage compensation, not only reduces transcription but also shifts DNA replication from early to late. The amount of ORC binding to the X chromosome remains unchanged, leading the authors to attribute this shift in timing to reduced origin activation at ORC binding sites.

"The modENCODE Project, and the tremendous resource of high quality genomic datasets associated with the Project, have been instrumental in enabling us to begin to understand the rules by which the local chromatin environment regulates the DNA replication program in higher eukaryotes," said MacAlpine.

Reference: Lubelsky Y, Prinz JA, DeNapoli L, Li Y, Belsky JA, MacAlpine DM. 2014. The DNA replication and transcription programs respond to the same chromatin cues. Genome Research, doi: 10.1101/gr.160010.113

3. Genomic blueprint for gene expression

Although genome sequences for a growing number of organisms are available, the function of the vast majority of the genome is unknown. Specifically, the functions of noncoding regions of the genome, which can regulate gene expression, are poorly understood.

Building upon previous work from the modENCODE Consortium, Kevin White and colleagues surveyed genome-wide binding profiles of 84 diverse transcription regulatory factors, or TRFs. The authors identified over 400 million binding sites in the fly genome, most of which occurred near gene promoters, although a considerable fraction occurred at distal sites. "Annotation of regulatory elements and identification of the transcriptional regulators targeting these elements are key steps in understanding how a given cell interprets its genetic blueprint," said Matthew Slattery, first author of the study.

Approximately 10% of the identified TRF regions were bound by 14 or more different factors; these regions, known as 'HOT' regions, are associated with active chromatin and with genes that are highly and ubiquitously expressed, whereas 'COLD' regions (regions bound by 1-3 TRFs) are associated with inactive chromatin environments. HOT regions were also more likely to drive gene expression in multiple cell lines. Interestingly, unlike most TRFs, which bind in active chromatin regions, several important developmental TRFs bind at inactive regions, including regions that are repressed by Polycomb protein. The genes encoding these TRFs are themselves repressed by Polycomb, suggesting a self-contained regulatory network in embryonic development.
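
The HOT and COLD definitions above amount to a simple threshold rule on the number of distinct factors bound at a region, sketched below in Python. The two thresholds come from the text; the function name, the "intermediate" and "unbound" labels for regions outside both definitions, and the example factor names are assumptions made for illustration only.

```python
HOT_MIN_FACTORS = 14  # "14 or more different factors" (from the text)
COLD_MAX_FACTORS = 3  # "regions bound by 1-3 TRFs" (from the text)


def classify_region(bound_factors):
    """Classify a region by the number of distinct TRFs bound to it.

    `bound_factors` is any iterable of factor names; duplicates are ignored.
    The labels "intermediate" and "unbound" are placeholders for regions the
    article does not name.
    """
    n_factors = len(set(bound_factors))
    if n_factors >= HOT_MIN_FACTORS:
        return "HOT"
    if 1 <= n_factors <= COLD_MAX_FACTORS:
        return "COLD"
    if n_factors == 0:
        return "unbound"
    return "intermediate"


# Hypothetical regions keyed by made-up identifiers.
regions = {
    "region_1": [f"TRF{i}" for i in range(16)],
    "region_2": ["TRF1", "TRF2"],
    "region_3": ["TRF1", "TRF2", "TRF3", "TRF4", "TRF5"],
}
labels = {name: classify_region(factors) for name, factors in regions.items()}
print(labels)
```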

"The regulatory networks that convert DNA sequence into a functional multicellular organism are vast and complex. Our work highlights select nodes and connections within this network, but most of the topology remains unexplored," said Slattery.

Reference: Slattery M, Ma L, Spokony RF, Arthur RK, Kheradpour P, Kundaje A, Negre N, Crofts A, Ptashkin R, Zieba J, Ostapenko A, Suchy S, Victorsen A, Jameel M, Grundstad AJ, Gao W, Moran JR, Rehm EJ, Grossman RL, Kellis M, White KP. 2014. Diverse patterns of genomic targeting by transcriptional regulators in Drosophila melanogaster. Genome Research, doi: 10.1101/gr.168807.113

4. Comparative Drosophila transcriptomics

Also in this issue, the modENCODE Consortium presents an improved genome-wide annotation of fly transcripts. RNA sequencing from multiple tissues and developmental stages of 15 different Drosophila species demonstrated that a large fraction of transcripts are evolutionarily conserved across species. This study also presents comparative methods that may prove useful in improving human genome annotation.

Reference: Chen ZX, Sturgill D, Qu J, Jiang H, Park S, Boley N, Suzuki AM, Fletcher AR, Plachetzki DC, FitzGerald PC, et al. 2014. Comparative validation of the D. melanogaster modENCODE transcriptome annotation. Genome Research, doi: 10.1101/gr.159384.113

In addition to the four articles highlighted above, the following modENCODE articles will also be published in the issue:

Arthur RK, Ma L, Slattery M, Spokony RF, Ostapenko A, Negre N, White KP. 2014. Evolution of H3K27me3-marked chromatin is linked to gene expression evolution and to patterns of gene duplication and diversification. Genome Research, doi: 10.1101/gr.162008.113

Wen J, Mohammed J, Bortolamiol-Becet D, Tsai H, Robine N, Westholm JO, Ladewig E, Dai Q, Okamura K, Flynt AS, et al. 2014. Diversity of miRNAs, siRNAs, and piRNAs across 25 Drosophila cell lines. Genome Research, doi: 10.1101/gr.161554.113