Small non-coding RNAs can be used to predict whether individuals have breast cancer, conclude researchers who contribute to The Cancer Genome Atlas project. The results, which are published in EMBO Reports, indicate that differences in the levels of specific types of non-coding RNAs can be used to distinguish between cancerous and non-cancerous tissues. These RNAs can also be used to classify cancer patients into subgroups with different survival outcomes.

Small non-coding RNAs are RNA molecules that do not give rise to proteins but which may have other important functions in the cell. "For many years, small non-coding RNAs near transcriptional start sites have been regarded as 'transcriptional noise' due to their apparent chaotic distribution and an inability to correlate these molecules with known functions or disease," explains Steven Jones, one of the lead authors of the study, a professor at Simon Fraser University and the University of British Columbia, and a distinguished scientist at the BC Cancer Agency. "By using a computational approach to analyze small RNA sequence information that we generated as part of The Cancer Genome Atlas project, we have been able to filter through this noise to find clinically useful information," adds Jones. "The data from our experiments show that genome-wide changes in the expression levels of small non-coding RNAs in the first exons of protein-coding genes are associated with breast cancer."

The scientists were able to distinguish between the many different small non-coding RNAs that are found near the transcriptional start sites of genes in healthy individuals and patients with breast cancer (in this case, breast invasive carcinoma). They mapped these RNA molecules to specific locations on the DNA sequence and looked for correlations between the non-coding RNAs that were strongly expressed and the disease status of the patients from whom the tissue samples were isolated. The researchers then tested whether the expression of small RNAs at the genomic locations they had identified could be used to predict the presence of disease in a second group of tissue samples obtained from patients known to have breast cancer. The test efficiently predicted the correct disease status for the samples in the new study group. "The potential to predict cancer status is restricted to only a subset of the many small non-coding RNAs found near transcription start sites of the genes. What's more, these RNA locations are highly enriched with CpG islands," says Athanasios Zovoilis, the first author of the study. CpG islands are genomic regions that contain a high frequency of CpG sites, where a cytosine is directly followed by a guanine. The presence of these RNAs in these islands may point to their involvement in DNA methylation processes and the onset of disease, but additional experiments are needed to explore and prove this link.
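
For readers who want a concrete sense of what makes a stretch of DNA a CpG island, the short Python sketch below computes two quantities that are commonly used for this purpose: the GC fraction of a sequence and the ratio of observed to expected CpG sites. The function names and the example sequences are hypothetical, and the default thresholds (at least 200 bases, GC fraction above 0.5, ratio above 0.6) follow a widely cited rule of thumb rather than anything reported in the study, so this should be read as an illustration and not as the researchers' method.

```python
def cpg_stats(seq: str) -> tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for a DNA sequence."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")  # a CpG site is a C directly followed by a G
    gc_fraction = (c_count + g_count) / n
    # Number of CpG sites expected if C and G were placed independently.
    expected = c_count * g_count / n
    ratio = cpg_count / expected if expected else 0.0
    return gc_fraction, ratio


def looks_like_cpg_island(seq: str, min_len: int = 200,
                          min_gc: float = 0.5, min_ratio: float = 0.6) -> bool:
    """Apply illustrative thresholds (assumed here, not taken from the study)."""
    gc_fraction, ratio = cpg_stats(seq)
    return len(seq) >= min_len and gc_fraction > min_gc and ratio > min_ratio


if __name__ == "__main__":
    cpg_rich = "CCGG" * 50   # made-up 200-base sequence, rich in CpG sites
    cpg_poor = "AT" * 100    # made-up 200-base sequence with no C or G
    print(cpg_stats(cpg_rich), looks_like_cpg_island(cpg_rich))
    print(cpg_stats(cpg_poor), looks_like_cpg_island(cpg_poor))
```

In practice, a check of this kind would be run over sliding windows along the genome rather than on a single fixed sequence.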

"This is the first time that small non-coding RNAs near the transcription start site of genes have been associated with disease," says Jones. "Further work is required but based on our data we believe there is considerable diagnostic potential for these small non-coding RNAs as a predictive tool for cancer. In addition, they may help us understand better the mechanisms underlying oncogenesis at the epigenetic level and lead to potential new drugs employing small non-coding RNAs." The researchers also note that this class of small non-coding RNAs may be useful in predicting the existence of other types of cancer or disease.

The generation of data by The Cancer Genome Atlas project, which now provides access to large amounts of sequencing information for diseased and normal tissues, made the work possible. The Cancer Genome Atlas is now one of the largest resources for small non-coding RNAs in existence.