Common and phylogenetically widespread coding for peptides by bacterial small RNAs [article]

Robin C Friedman, Stefan Kalkhof, Olivia Doppelt-Azeroual, Stephan Mueller, Martina Chovancova, Martin von Bergen, Benno Schwikowski
2015 bioRxiv pre-print
While eukaryotic noncoding RNAs have recently received intense scrutiny, it is becoming clear that bacterial transcription is at least as pervasive. Bacterial small RNAs and antisense RNAs (sRNAs) are often assumed to be noncoding, due to their lack of long open reading frames (ORFs). However, there are numerous examples of sRNAs encoding small proteins, whether or not they also have a regulatory role at the RNA level. Here, we apply flexible machine learning techniques based on sequence features and comparative genomics to quantify the prevalence of sRNA ORFs under natural selection to maintain protein-coding function in phylogenetically diverse bacteria. A majority of annotated sRNAs have at least one ORF between 10 and 50 amino acids long, and we conservatively predict that 188 ± 25.5 unannotated sRNA ORFs are under selection to maintain coding, an average of 13 per species considered here. This implies that overall at least 7.5 ± 0.3% of sRNAs have a coding ORF, and in some species at least 20% do. 84 ± 9.8 of these novel coding ORFs have some antisense overlap to annotated ORFs. As experimental validation, many of our predictions are translated according to ribosome profiling data and are identified via mass spectrometry shotgun proteomics. B. subtilis sRNAs with coding ORFs are enriched for high expression in biofilms and confluent growth, and two S. pneumoniae sRNAs with coding ORFs are involved in virulence. sRNA coding ORFs are enriched for transmembrane domains and many are novel components of type I toxin/antitoxin systems. Our predictions for sRNA coding ORFs, including novel type I toxins, are freely available in a user-friendly format at http://disco-bac.web.pasteur.fr.
doi:10.1101/030619 fatcat:zkylcb5yczhsjosvogmn6bbvga