FROM PLANT GENOMES TO PROTEIN FAMILIES: COMPUTATIONAL TOOLS

Manuel Martinez
2013 Computational and Structural Biotechnology Journal  
Access to primary DNA sequence has become a fundamental resource in biology. Recent advances in sequencing technologies and the associated bioinformatic and computational tools have substantially increased our knowledge of plant genomes [1] [2] [3]. According to the Genomes On-Line Database (GOLD), more than 20 plant genomes have already been completed and there are more than 200 ongoing plant genome projects. Searches of the NCBI genomes database raise the number of species with
... t DNA nuclear genomic sequences to more than sixty. Information on about 50 species of land plants with draft genome sequences is compiled on the CoGepedia web page (Table 1). The genomes of the eudicot model for plant biology, Arabidopsis thaliana, and of the monocot crop model rice (Oryza sativa) were the first plant genomes to be sequenced. Several other plant species from both the eudicot and monocot clades have since been completely sequenced, and their sequences are publicly available (Figure 1). Among eudicot species, there are examples from the most important orders in the subclasses Rosids and Asterids, as well as the genome of the basal eudicot Amborella trichopoda. In contrast, because of their global agronomic value, most sequenced monocot species belong to the order Poales. In addition, great efforts have been made to sequence basal plant species in order to address evolutionary questions. Several algal genomes belonging to the main algal orders, a moss (Physcomitrella patens), and a pseudofern (Selaginella moellendorffii) have been completely sequenced. Moreover, technological advances have now made it feasible to sequence the extremely large conifer genomes, and very recently the first gymnosperm genome was sequenced and published [4].
doi:10.5936/csbj.201307001 pmid:24688740 pmcid:PMC3962197 fatcat:pz33c37r2jdrpdzccpqo4txm5i