Origins of introns based on the definition of exon modules and their conserved interfaces

A. D. G. de Roos
2004 Bioinformatics  
Central to unraveling the early evolution of the genome is the origin and role of introns. The evolution of the genome can be characterized by a continuous expansion of functional modules that occurs without interruption of existing processes. The design-by-contract methodology of software development offers a modular approach to design that seeks to increase flexibility by focusing on the design of constant interfaces between functional modules. Here, it is shown that design-by-contract can offer a framework for genome evolution. The definition of an ancient exon module with identical splice sites leads to a relatively simple sequence of events that explains the role of introns, intron phase differences and the evolution of multi-exon proteins in an RNA world. An interaction of the experimentally defined six-nucleotide splicing consensus sequence with a limited number of primitive ribozymes can account for a rapid creation of protein diversity.

... 2003; Rzhetsky and Ayala, 1999). There are now two main competing theories that try to explain the role of introns, both based on the involvement of DNA-based introns and exons. The 'introns early' theory, or exon theory of genes, states that introns are ancient and were subsequently lost in prokaryotes (Gilbert, 1987; Gilbert et al., 1996; Gilbert et al., 1997). In this theory, the first exons coded for ancient protein modules from which multi-modular proteins were assembled by means of exon shuffling and recombination. Introns facilitated this process by providing the actual sites of recombination. The 'introns late' theory, on the other hand, maintains that spliceosomal introns were inserted into eukaryotic genes later in evolution (Palmer and Logsdon, 1998; Cavalier-Smith, 1991; Cho and Doolittle, 1997; Logsdon, 1998), after the evolution of multi-modular proteins. In introns-late, the appearance of introns could also have aided in the creation of diversity by facilitating recombination. No conclusive evidence has been found to prove or disprove introns-early or introns-late, although these theories are based on completely different genome architectures and mechanisms of evolution.

The genome has evolved from a simple RNA-based self-replicating system, the RNA world (Gilbert, 1986; Joyce, 2002), to a complex system of multi-exon genes coding for multi-modular proteins.
During this evolutionary process, numerous new functions were added or modified without disrupting the functioning of older systems. The evolution from strands of RNA to multi-exon genes with sophisticated expression systems implies that the genome was able to increase in size and complexity by many orders of magnitude without losing flexibility. Any genome architecture that would form the basis of genome evolution should therefore be flexible and robust in order to meet the requirements for virtually unlimited expansion of size and function.

Modern software designs seek to increase flexibility by using a modular approach that allows operations within individual modules to be added, replaced and changed. Complex software architectures are based on a methodology in which a software system is viewed as a set of communicating modules whose interaction is based on precisely defined interfaces. The interfaces can be viewed as specifications of mutual obligations, or contracts. Keeping interfaces constant reduces the interdependencies between modules or components and reduces the risk that changes within one module will create unanticipated changes in other modules. This methodology is known as design-by-contract (Meyer, 1997). Since the characteristics of the design-by-contract methodology are similar to those required in genome evolution, it is hypothesized here that genome architecture reflects the paradigms of design-by-contract: the definition of functional modules that interact with each other through well-defined interfaces.

Modularity and interfaces in the genome

The basic unit of genetic information, the gene, can be regarded as a self-contained module with a well-defined interface. A gene contains all the necessary information from which the encoded protein can be generated, whereas the highly conserved genetic code functions as the interface between gene and protein.
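The analogy between a software contract and the genetic code can be made concrete with a small sketch. The Python below is an illustration only and is not taken from the paper: the function name `translate`, the reduced codon table and the choice of assertions as pre- and postconditions are all assumptions made for this example. The codon assignments themselves follow the standard genetic code.

```python
# Toy illustration (not the author's method): the genetic code as a fixed
# interface, with the contract written as pre- and postconditions.

# Hypothetical, deliberately small subset of the standard genetic code.
CODON_TABLE = {
    "AUG": "M",                          # methionine (start)
    "UUU": "F",                          # phenylalanine
    "GGC": "G", "GGU": "G",              # glycine, two synonymous codons
    "GCU": "A",                          # alanine
    "AAA": "K",                          # lysine
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}


def translate(mrna: str) -> str:
    """Translate an mRNA string into one-letter amino acid codes."""
    # Preconditions: what the caller (the "gene module") must guarantee.
    assert set(mrna) <= set("ACGU"), "precondition: RNA alphabet only"
    assert len(mrna) % 3 == 0, "precondition: whole codons only"
    codons = [mrna[i:i + 3] for i in range(0, len(mrna), 3)]
    assert all(c in CODON_TABLE for c in codons), "precondition: codon in toy table"

    protein = []
    for codon in codons:
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    result = "".join(protein)

    # Postcondition: what the translation side guarantees in return.
    assert len(result) <= len(codons), "postcondition: at most one residue per codon"
    return result


if __name__ == "__main__":
    print(translate("AUGUUUGGCUAA"))
    print(translate("AUGUUUGGUUAA"))
```

The two calls differ in one codon (GGC versus GGU) yet yield the same protein, which mirrors the point of the paragraph above: as long as the interface is held constant, the contents of a module can change without affecting the modules that depend on it.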
Eukaryotic genes themselves consist of coding sequences, the exons, interrupted by noncoding sequences, the introns (Fig. 1A). The introns have to be spliced out in order to form a continuous coding sequence, the mRNA, that can be recognized by the translation machinery. In principle, an intron contains all the information necessary for it to be spliced out, which enables it to function independently of the exon sequence. The intron can therefore be regarded as a self-contained module with a well-defined (conserved) interface, the splice recognition site (Fig. 1B), which is located exclusively in the intron. This configuration enables the excision of introns independently of exon sequence. Exons, in contrast to introns, depend on information that lies outside the exons, since the splice recognition sites of the intron determine the span of the exon. A dependence on intron sequences would severely hamper independent movement and exchange of coding sequences between genes. However, extensive recombination of exons by exon shuffling is believed to have played an important role ...

... not random loss. J. Mol. Evol. 44: 573-584.
Clark F. (2003) Gene data sets derived from GenBank [web page]; http://www.maths.uq.edu.au/~fc/datasets/.
Dibb N.J., Newman A.J. (1989) Evidence that introns arose at proto-splice sites. EMBO J. 8: 2015-21.
Ekland E.H., Szostak J.W., Bartel D.P. (1995) Structurally complex and highly active RNA ligases derived from random RNA sequences.
... The yeast splice site revisited: new exon consensus from genomic analysis. Cell 12: 739-40.
Long M., Rosenberg C., Gilbert W. (1995) Intron phase correlations and the evolution of the intron/exon structure of genes.
Tomita M., Shimizu N., Brutlag D.L. (1996) Introns and reading frames: correlation between splicing sites and their codon positions. Mol. Biol. Evol. 13: 1219-23.
Yoshida H. (2001) The ribonuclease T1 family. Methods Enzymol. 341: 28-41.
doi:10.1093/bioinformatics/bth475 pmid:15308547 fatcat:rdj72losrnc7fi33fjybkkjpx4