Over the last ten years, a number of studies have suggested that, in animal cells, translation and protein turnover play a larger role than transcription in determining the different levels at which proteins are expressed.

The major evidence supporting these claims is a weak correlation between system-wide protein and mRNA abundance measurements. A highly cited Nature article by Schwanhausser et al. in 2011 provides the most comprehensive example of such analyses. A new study just published in PeerJ by Li et al., however, questions the conclusions of these papers. It suggests that protein and mRNA abundance measurements are poorly correlated mainly because of various types of measurement error in both data sets, rather than because transcription has minimal impact on protein expression levels.
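To make the measurement-error argument concrete, here is a minimal simulation sketch. It is not Li et al.'s analysis: the "true" R² of 0.8, the noise levels, and the function names `pearson_r` and `simulate` are all assumptions chosen for illustration. It generates log-scale mRNA and protein abundances that are tightly coupled, adds independent measurement noise to each, and compares the squared correlation before and after.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate(n_genes=20000, true_r2=0.8, mrna_noise_sd=0.5,
             protein_noise_sd=1.0, seed=0):
    """Return (R^2 of true values, R^2 of noisy measurements).

    All parameter values are illustrative assumptions, not estimates
    taken from any of the studies discussed in the text.
    """
    rng = random.Random(seed)
    # True log mRNA has unit variance; true log protein is mRNA plus a
    # residual (translation, degradation) sized to give the chosen R^2.
    residual_sd = math.sqrt((1 - true_r2) / true_r2)
    true_mrna = [rng.gauss(0, 1) for _ in range(n_genes)]
    true_protein = [m + rng.gauss(0, residual_sd) for m in true_mrna]
    # Independent measurement error is added to each data set.
    measured_mrna = [m + rng.gauss(0, mrna_noise_sd) for m in true_mrna]
    measured_protein = [p + rng.gauss(0, protein_noise_sd) for p in true_protein]
    r2_true = pearson_r(true_mrna, true_protein) ** 2
    r2_measured = pearson_r(measured_mrna, measured_protein) ** 2
    return r2_true, r2_measured


if __name__ == "__main__":
    r2_true, r2_measured = simulate()
    print(f"R^2 between true abundances:     {r2_true:.2f}")
    print(f"R^2 between measured abundances: {r2_measured:.2f}")
```

Under these assumed noise levels, the expected R² falls from about 0.80 for the true values to about 0.36 for the measured ones, even though nothing about the underlying biology has changed. Random noise of this kind only ever lowers the expected correlation, which is why ignoring it makes mRNA look less important than it is.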

Li et al. first show that Schwanhausser et al.'s protein abundances have a nonlinear error that leads to a dramatic underestimation of low-abundance proteins, a result that has been independently supported by a separate benchmarking study by Ahrne et al. Li et al. rescale Schwanhausser et al.'s protein abundance estimates using data for housekeeping proteins and show that the rescaled data correlate more strongly with mRNA abundances than the uncorrected protein data do. In addition, they use direct experimental data to estimate the impact of other sources of error on the mRNA and protein abundance measurements, and they find that, when error is explicitly measured and modeled, an even greater correlation between mRNA and protein is expected. Li et al. also use a second, independent strategy to determine the contribution of mRNA levels to protein expression: they show that the variance in translation rates directly measured by ribosome profiling is dramatically lower than that inferred by Schwanhausser et al., and that the measured and inferred translation rates correlate poorly. When protein and mRNA turnover data are incorporated into this analysis, the results from Li et al. suggest that mRNA levels explain ~81% of the variance in protein levels (transcription 71% and RNA degradation 10%), with translation explaining 11% and protein degradation 8%. This conclusion differs dramatically from previous estimates in the literature, in which differences in mRNA levels explain only 10-40% of the differences in protein levels.
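The arithmetic behind these percentages can be checked in a few lines. The sketch below uses only the figures quoted above and assumes the four contributions are additive, which the numbers suggest since they sum to 100; the variable names are illustrative.

```python
# Percent of variance in protein levels attributed to each step,
# as quoted in the text (rounded values).
contributions = {
    "transcription": 71,
    "RNA degradation": 10,
    "translation": 11,
    "protein degradation": 8,
}

# mRNA levels are set by transcription and RNA degradation together.
mrna_level = contributions["transcription"] + contributions["RNA degradation"]
total = sum(contributions.values())

print(f"mRNA levels: {mrna_level}%")  # 71 + 10 = 81
print(f"All steps:   {total}%")       # 81 + 11 + 8 = 100
```

Read this way, the ~81% attributed to mRNA levels is the combined share of the two processes that set mRNA abundance, and the remaining ~19% is split between translation and protein degradation.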

Li et al.'s analysis provides an accurate framework for quantifying gene expression and protein abundance levels by explicitly considering sources of error. This work highlights the importance of appropriate statistical analyses of the large quantitative data sets that are increasingly being produced by experimentalists and are being used to study fundamental cellular mechanisms.