Clustering of reads with alignment-free measures and quality values

Matteo Comin, Andrea Leoni, Michele Schimd
2015 Algorithms for Molecular Biology  
Also results on de novo assembly and metagenomic reads classification show that the introduction of quality values improves over standard alignment-free measures.  ...  To the best of our knowledge this is the first study that incorporates quality value information and k-mers counts, in the context of alignment-free measures, for the comparison of reads data.  ...  In this paper we presented a family of alignment-free measures, called D^q-type, that incorporate quality value information and k-mers counts for the comparison of reads data.  ... 
doi:10.1186/s13015-014-0029-x pmid:25691913 pmcid:PMC4331138 fatcat:gjj5whpcezblvgglurwux4tjre

Fast comparison of genomic and meta-genomic reads with alignment-free measures based on quality values

Matteo Comin, Michele Schimd
2016 BMC Medical Genomics  
Results: In this paper we present a family of alignment-free measures, called d^q-type, that are based on k-mer counts and quality values.  ...  In this context it will be fundamental to exploit quality values information within the framework of alignment-free measures.  ...  The use of quality values within alignment-free measures on average improves the classification accuracy and the impact of quality values increases when the reads are more noisy and the coverage is low  ... 
doi:10.1186/s12920-016-0193-6 pmid:27535823 pmcid:PMC4989896 fatcat:dg6ox7nwwrcijakfk5bvct3eye

De novo clustering of long-read transcriptome data using a greedy, quality-value based algorithm [article]

Kristoffer Sahlin, Paul Medvedev
2018 bioRxiv   pre-print
To address this challenge, we develop isONclust, a clustering algorithm that is greedy (in order to scale) and makes use of quality values (in order to handle variable error rates).  ...  A common bottleneck is the dearth of scalable and accurate algorithms for clustering long reads according to their gene family of origin.  ...  Evaluation metrics: There exists several metrics to measure quality of clustering. We mainly use the V-measure and its two components completeness and homogeneity [45].  ... 
doi:10.1101/463463 fatcat:eqh6gmqp6rcyzabdeqv3mpnoa4

MeShClust2: Application of alignment-free identity scores in clustering long DNA sequences [article]

Benjamin T. James, Hani Z. Girgis
2018 bioRxiv   pre-print
Abstract: Grouping sequences into similar clusters is an important part of sequence analysis. Widely used clustering tools sacrifice quality for speed.  ...  Although MeShClust outperformed related tools in terms of cluster quality, the alignment algorithm used for generating training data for the classifier was not scalable to longer sequences.  ...  This research was supported mainly by funds from the Oklahoma Center for the Advancement of Science and Technology [PS17-015] and in part by internal funds provided by the College of Engineering and Natural  ... 
doi:10.1101/451278 fatcat:rxlexall6rd33dlh44kfkreika

Transcriptome Analysis for Non-Model Organism: Current Status and Best-Practices [chapter]

Vahap Eldem, Gokmen Zararsiz, Tunahan Taşçi, Izzet Parug Duru, Yakup Bakir, Melike Erkan
2017 Applications of RNA-Seq and Omics Strategies - From Microorganisms to Human Health  
In this chapter, we aim to provide an overview of the state-of-the-art methods including (i) quality check and pre-processing of raw reads, (ii) the pros and cons of de novo transcriptome assemblers, (  ...  In spite of immense potential of RNA-Seq-based methods, particularly in recovering full-length transcripts and spliced isoforms from short-reads, the accurate results can be only obtained by the procedures  ...  Acknowledgements All authors contributed to the editing of the manuscript and the content is solely the responsibility of the authors.  ... 
doi:10.5772/intechopen.68983 fatcat:vatg4hbanrchxhuxbf3meb3hye

Searching for SNPs with cloud computing

Ben Langmead, Michael C Schatz, Jimmy Lin, Mihai Pop, Steven L Salzberg
2009 Genome Biology  
Crossbow is a cloud-computing software tool that combines the aligner Bowtie and the SNP caller SOAPsnp.  ...  Executing in parallel using Hadoop, Crossbow analyzes data comprising 38-fold coverage of the human genome in three hours using a 320-CPU cluster rented from a cloud computing service for about $85.  ...  We also thank Miron Livny and his team for providing access to their compute cluster.  ... 
doi:10.1186/gb-2009-10-11-r134 pmid:19930550 pmcid:PMC3091327 fatcat:pppdfms72fe4lbfa25blly4l3i

Mapping single molecule sequencing reads using basic local alignment with successive refinement (BLASR): application and theory

Mark J Chaisson, Glenn Tesler
2012 BMC Bioinformatics  
Results: We describe the method BLASR (Basic Local Alignment with Successive Refinement) for mapping Single Molecule Sequencing (SMS) reads that are thousands of bases long, with divergence between the  ...  Conclusions: The results indicate that it is possible to map SMS reads with high accuracy and speed.  ...  Acknowledgements We thank Jon Sorenson, James Bullard, Eric Schadt, and Jonas Korlach for useful comments in writing this manuscript. Author details  ... 
doi:10.1186/1471-2105-13-238 pmid:22988817 pmcid:PMC3572422 fatcat:yap5l2k3w5dabpjwl3zois7zzu

HUGO: Hierarchical mUlti-reference Genome cOmpression for aligned reads

Pinghao Li, Xiaoqian Jiang, Shuang Wang, Jihoon Kim, Hongkai Xiong, Lucila Ohno-Machado
2014 JAMIA Journal of the American Medical Informatics Association  
free content of genomic fragments without quality values and 5× CR when quality values are included.  ...  The k-means clustering aims at partitioning L r quality values into k clusters, so that the quality values within the same cluster can be replaced by the quality values of the cluster center [31]. We can  ...  Provenance and peer review Not commissioned; externally peer reviewed.  ... 
doi:10.1136/amiajnl-2013-002147 pmid:24368726 pmcid:PMC3932469 fatcat:xpdcsupzxjg5fhc6i667fcyiti

Using expected sequence features to improve basecalling accuracy of amplicon pyrosequencing data

Thomas S. Rask, Bent Petersen, Donald S. Chen, Karen P. Day, Anders Gorm Pedersen
2016 BMC Bioinformatics  
provides sequence characteristics that allow generation of a set of high confidence error-free sequences.  ...  Pyrosequencing errors, consisting mainly of nucleotide insertions and deletions, are on the other hand likely to disrupt open reading frames.  ...  using Usearch v5.2.32 with seeds (cluster member with highest number of replicate reads) as output [15].  ... 
doi:10.1186/s12859-016-1032-7 pmid:27102804 pmcid:PMC4841065 fatcat:we7tkmxsufbe5ntsad5ntyzdsy

Open-Source Sequence Clustering Methods Improve the State Of the Art

Evguenia Kopylova, Jose A. Navas-Molina, Céline Mercier, Zhenjiang Zech Xu, Frédéric Mahé, Yan He, Hong-Wei Zhou, Torbjørn Rognes, J. Gregory Caporaso, Rob Knight, Nicola Segata
2016 mSystems  
Sequence clustering is a common early step in amplicon-based microbial community analysis, when raw sequencing reads are clustered into operational taxonomic units (OTUs) to reduce the run time of subsequent  ...  Furthermore, we observed that stringent quality filtering, such as is done in UPARSE, can cause a significant underestimation of species abundance and diversity, leading to incorrect biological results  ...  ACKNOWLEDGMENTS We thank William Walters, Amnon Amir, Amanda Birmingham, Embriette Hyde, and Daniel McDonald for their time and valuable suggestions to improve the quality of the manuscript.  ... 
doi:10.1128/msystems.00003-15 pmid:27822515 pmcid:PMC5069751 fatcat:vtt6hkurmzbrrektwi3xl5qh7e

Alignment-free sequence comparison: benefits, applications, and tools

Andrzej Zielezinski, Susana Vinga, Jonas Almeida, Wojciech M. Karlowski
2017 Genome Biology  
Alignment-free sequence analyses have been applied to problems ranging from whole-genome phylogeny to the classification of protein families, identification of horizontally transferred genes, and detection  ...  We address these questions and provide a guide to the currently available alignment-free sequence analysis tools.  ...  with alignment-free measures (k-mer based) and quality values Software (C++) [159] ciompin/main/ qcluster.html Reads error correction Lighter Correction of sequencing  ... 
doi:10.1186/s13059-017-1319-7 pmid:28974235 pmcid:PMC5627421 fatcat:5s7yd22l7bbmpljqc7fj4cbifm

IMSEQ—a fast and error aware approach to immunogenetic sequence analysis

Leon Kuchenbecker, Mikalai Nienen, Jochen Hecht, Avidan U. Neumann, Nina Babel, Knut Reinert, Peter N. Robinson
2015 Bioinformatics  
This type of analysis requires efficient and unambiguous clonotype assignment to a large number of NGS read sequences, including the identification of the incorporated V and J gene segments and the CDR3  ...  Current tools have deficits with respect to performance, accuracy and documentation of their underlying algorithms and usage.  ...  Funding This research was funded by the German Federal Ministry of Education and Research (BMBF) within the grants "Primage" (0315895A) to N.B. and "eKid" (01ZX1312A) to N.B. as well as by the Investitionsbank  ... 
doi:10.1093/bioinformatics/btv309 pmid:25987567 fatcat:lnequrncmrdqfhmchpf7fe4nqq

Reader preferences and behavior on Wikipedia

Janette Lehmann, Claudia Müller-Birn, David Laniado, Mounia Lalmas, Andreas Kaltenbrunner
2014 Proceedings of the 25th ACM conference on Hypertext and social media - HT '14  
We show that the most read articles do not necessarily correspond to those frequently edited, suggesting some degree of non-alignment between user reading preferences and author editing preferences.  ...  Wikipedia is a collaboratively-edited online encyclopaedia that relies on thousands of editors to both contribute articles and maintain their quality.  ...  For each longevity value, we plot the percentage of articles with that value.  ... 
doi:10.1145/2631775.2631805 dblp:conf/ht/LehmannMLLK14 fatcat:ex6qa5pq7bd7pp6mx3kwtrhwoa

Computational Methods for DNA Copy-Number Analysis of Tumors [chapter]

Jude Kendall, Alexander Krasnitz
2014 Msphere  
With the help of a comprehensive multistep computational procedure described here, copy-number profiles of tumor tissues or individual tumor cells may be generated and interpreted, starting with data acquired  ...  These include accounting for variation of ploidy and distilling somatic copy number alterations from the inherited background.  ...  There is one tab-delimited line for each read aligned, showing the read ID, read sequence and quality scores and the alignment position in the reference genome.  ... 
doi:10.1007/978-1-4939-0992-6_20 pmid:25030933 pmcid:PMC5136461 fatcat:ylc5pd5wfra5vbvac3nb7ercwi

FreePSI: an alignment-free approach to estimating exon-inclusion ratios without a reference transcriptome

Jianyu Zhou, Shining Ma, Dongfang Wang, Jianyang Zeng, Tao Jiang
2017 Nucleic Acids Research  
In this paper, we propose an alignment-free method, FreePSI, to perform genomewide estimation of exon-inclusion ratios from RNA-Seq data without relying on the guidance of a reference transcriptome.  ...  We compare FreePSI with the existing methods on simulated and real RNA-seq data in terms of both accuracy and efficiency and show that it is able to achieve very good performance even though a reference  ...  We would like to thank Dr Rui Jiang (Tsinghua University) for the support of computational resources, and the anonymous referees for many constructive suggestions.  ... 
doi:10.1093/nar/gkx1059 pmid:29136203 pmcid:PMC5778508 fatcat:4iyj2ivgybdbvlj3hvczpnzule