Research Paper Highlight:
Pervasive functional translation of noncanonical human open reading frames
Jin Chen, J. Zachery Cogan, James K. Nuñez, Alexander P. Fields, Britt Adamson, Daniel N. Itzhak, Jason Y. Li, Matthias Mann, Manuel D. Leonetti, Jonathan S. Weissman.
Science volume 367, Issue 6482, pp. 1140-1146 (2020); DOI: 10.1126/science.aay0262
From the perspective of gene prediction, it is essential to correctly identify open reading frames (ORFs) that are likely to encode functional proteins. In the simplest case, an ORF starts with an AUG codon, ends with a stop codon (UAA, UGA, or UAG), and has a nucleotide count divisible by three. Even then, one of the biggest challenges for curation and functional annotation has been dealing with the exceedingly high number of predicted small ORFs (those containing fewer than 100 codons). More than 260,000 small ORFs are predicted in budding yeast alone, and any given one is quite likely to be biologically meaningless. At the same time, several really interesting microproteins (encoded by small ORFs) have been identified and characterized in diverse cellular pathways. The problem is one of finding a needle in a haystack: how do we identify the functional small ORFs among the thousands that are predicted?
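To make the "simplest case" definition concrete, here is a minimal sketch of a naive ORF scan over the three forward reading frames. It is only an illustration, not the pipeline used in the paper. The function name find_orfs, its parameters, and the choice to exclude the stop codon from the codon count are all assumptions made for this example. Because the start codon set is a parameter, the same scan can also be tried with non-AUG starts.

```python
STOP_CODONS = {"UAA", "UGA", "UAG"}


def find_orfs(rna, start_codons=("AUG",), max_codons=None):
    """Naive forward-strand ORF scan (hypothetical helper, for illustration only).

    Returns a sorted list of (start, end, n_codons) tuples, where start is the
    0-based index of the start codon, end is the exclusive index just past the
    stop codon, and n_codons counts the start codon through the last sense
    codon (the stop codon is not counted; this is an assumed convention).
    Only the most upstream start codon in each frame is used, so in-frame
    starts nested inside an open ORF are ignored.
    """
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    orfs = []
    for frame in range(3):
        start = None
        # Stepping by three keeps every codon in the same reading frame, so
        # the ORF length is divisible by three by construction.
        for i in range(frame, len(rna) - 2, 3):
            codon = rna[i:i + 3]
            if start is None:
                if codon in start_codons:
                    start = i
            elif codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if max_codons is None or n_codons < max_codons:
                    orfs.append((start, i + 3, n_codons))
                start = None
    return sorted(orfs)


if __name__ == "__main__":
    # Keep only "small" ORFs, i.e. fewer than 100 codons.
    print(find_orfs("GGAUGAAAUAACC", max_codons=100))
```

A scan like this reports every start-to-stop stretch it finds, which is exactly why the number of predicted small ORFs explodes: short ORFs arise by chance all the time, and sequence alone cannot say which ones matter.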
One of the biggest strides in the quest to find these tiny protein-coding ORFs came from ribosome profiling, a highly sensitive and quantitative tool for assessing active translation by ribosomes, developed by Nicholas Ingolia and Jonathan Weissman and described in their 2009 paper. Right away, it became obvious that ribosomes actively translate non-canonical small ORFs, many of which don’t even start with conventional start codons, an observation that has since been made in every model organism and cell system examined. But proof of the actual protein products and of their functional significance was still lacking.
In this work, the authors attempt to address that gap systematically. They perform ribosome profiling in four different cell lines and detect thousands of overlapping small ORFs with ribosome footprints. But they go further and find evidence for the existence of at least a subset of the protein products by total-proteome mass spectrometry (MS) as well as by HLA-I peptidomics. The coolest part of the paper, though, is the preliminary functional analysis they present. By performing pooled CRISPR knockout (CRISPR-KO) screens, they provide evidence that loss of many of these small ORF products results in a clear disadvantage to cellular fitness. They follow up candidates from two categories of non-canonical ORFs: those embedded in annotated long non-coding RNAs (lncRNAs), and upstream ORFs (uORFs), small ORFs that lie upstream of a main (annotated, canonical) ORF. The most intriguing finding of all is that, in most cases, the protein products of the small uORFs localize to the same cellular location as the protein product of the main ORF and show physical association with it. The protein products of lncRNAs also localize to discrete cellular locations, suggestive of unique functions.
Intriguing as it is, this work opens the door to more detailed characterization of the function and regulation of these new microproteins. Do the non-AUG start codons in many of these small ORFs provide a unique mechanism for regulating their expression? Are there conserved features in these microproteins that might allow identification of their homologs (if they exist) in other organisms? What is the function of these microproteins? Do the protein products of the small uORFs promote or inhibit the functions of the protein products of the main ORFs? Or is the protein product truly needed at all? Could translation instead be another mechanism to regulate the function of the RNA that encodes it, for example by preventing a lncRNA from performing its regulatory function? Lots of new questions to explore!