Resequencing of the Leishmania infantum (strain JPCM5) genome and de novo assembly into 36 contigs
Leishmania parasites are the causative agents of leishmaniasis, a group of potentially fatal human diseases. Control strategies for leishmaniasis can be enhanced by genome-based investigations. The publication in 2005 of the Leishmania major genome sequence and, two years later, of the genomes of Leishmania braziliensis and Leishmania infantum were major milestones. Since then, the L. infantum genome, although highly fragmented and incomplete, has been used widely as the reference genome to
address whole transcriptomics and proteomics studies. Here, we report the sequencing of the L. infantum genome by two NGS methodologies and, as a result, the complete assembly of the genome in 36 contigs (chromosomes). In the present L. infantum genome draft, 495 new genes have been annotated, a hundred have been corrected, and 75 previously annotated genes have been discontinued. These changes result not only from an increase in genome size; a significant contribution also derives from the large number of incorrectly assembled regions in the current chromosomal scaffolds. Furthermore, an improved assembly of tandemly repeated genes has been obtained. Together, these analyses indicate that the de novo assembled L. infantum genome is a robust assembly and should replace the one currently available in the databases.

Protists of the genus Leishmania belong to the order Trypanosomatida, an early-branching lineage of the eukaryotic tree 1. Many species of the genus are highly pathogenic for humans and other mammals, causing several clinical manifestations that are collectively known as leishmaniasis. These pathogenic Leishmania species are transmitted by phlebotomine sand flies 2. Although it is not absolute, there is an association between the clinical forms of leishmaniasis and the infecting Leishmania species 3. Thus, the clinical spectrum of leishmaniasis encompasses subclinical (asymptomatic) infections, self-healing cutaneous lesions, and disseminated forms (diffuse cutaneous, mucosal, or visceral leishmaniasis). Leishmania major is the prototypical species associated with cutaneous leishmaniasis in the Old World; mucosal affections (also known as mucocutaneous leishmaniasis) are hallmarks of Leishmania braziliensis infection; and Leishmania donovani and Leishmania infantum are the causative agents of visceral leishmaniasis (VL).
The latter species are closely related according to molecular genetic criteria 4, even though they are found in different geographical regions: L. donovani is the primary cause of VL in the Indian subcontinent and East Africa, and L. infantum is the causative agent of VL in the Mediterranean basin, the Middle East, and Latin America 5. The medical relevance of these microorganisms, together with the peculiarities of their molecular mechanisms and biological structures 6, justified the efforts to determine their precise genome sequences. L. major was the first of these species to have its genome sequenced 7, and it provided the model and template for subsequent genomic analyses of other Leishmania species. Afterwards, in 2007, the sequences of the L. braziliensis and L. infantum genomes were published 8. During the last decade, the extraordinary progress in genome ...

www.nature.com/scientificreports/

... genes uncovered in the newly assembled L. infantum genome. The complete annotation of the new genome (GFF3 file) is provided in Supplementary file 1 in Excel format.

Synteny analysis. Synteny was evaluated with the SyMAP 45 and progressive MAUVE 46 algorithms, using the current L. infantum (v.9, GeneDB.org) and L. major 24 genomes as references. Synteny graphs were prepared with genoPlotR 47 and are provided as Supplementary Figures S1-S36.

Data availability. The Illumina paired-end reads (FASTQ) and PacBio bax.h5 reads of L. infantum (JPCM5 strain) generated for this study are available at the European Nucleotide Archive (ENA; http://www.ebi.ac.uk/ena/). The assembled genome sequence and an annotation file were also uploaded. All data have been deposited under the Study accession number PRJEB20254 and the Study unique name ena-STUDY-CBMSO-04-04-2017-10:39:08:689-498. The new L. infantum genome sequence will also be available at the Leish-ESP web site (https://leishseq.neocities.org/).
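For readers who retrieve the deposited annotation, a simple first check is to tally annotated genes per contig. The sketch below is only an illustration and is not part of the pipeline used in this study. It assumes a standard nine-column, tab-separated GFF3 file in which gene records carry the feature type "gene". The function name, the toy records, and the sequence identifiers (chr1, chr2) are hypothetical.

```python
from collections import Counter
from typing import Iterable


def count_features_per_seqid(gff3_lines: Iterable[str],
                             feature_type: str = "gene") -> Counter:
    """Count records of one feature type per sequence ID in GFF3 text."""
    counts: Counter = Counter()
    for raw in gff3_lines:
        line = raw.rstrip("\n")
        # An embedded FASTA section, if present, ends the annotation part.
        if line.startswith("##FASTA"):
            break
        # Skip blank lines, directives, and comments.
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        # GFF3 records have exactly nine tab-separated columns.
        if len(cols) != 9:
            continue
        seqid, ftype = cols[0], cols[2]
        if ftype == feature_type:
            counts[seqid] += 1
    return counts


# Toy records invented for illustration; they are NOT taken from the
# deposited annotation.
example = [
    "##gff-version 3",
    "chr1\tsketch\tgene\t100\t900\t.\t+\t.\tID=geneA",
    "chr1\tsketch\tmRNA\t100\t900\t.\t+\t.\tID=mrnaA;Parent=geneA",
    "chr1\tsketch\tgene\t1500\t2400\t.\t-\t.\tID=geneB",
    "chr2\tsketch\tgene\t50\t700\t.\t+\t.\tID=geneC",
]

per_contig = count_features_per_seqid(example)
for seqid in sorted(per_contig):
    print(seqid, per_contig[seqid])
```

Applied to an open file handle for the real annotation, the number of keys in the returned counter gives the number of contigs carrying at least one gene, which for a complete assembly of this genome would be expected to be 36.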