GenMap: Ultra-fast Computation of Genome Mappability

Christopher Pockrandt, Mai Alzamel, Costas S Iliopoulos, Knut Reinert
2020 Bioinformatics  
Motivation Computing the uniqueness of k-mers for each position of a genome while allowing for up to e mismatches is computationally challenging.  ...  More formally, the uniqueness or (k, e)-mappability can be described for every position as the reciprocal value of how often this k-mer occurs approximately in the genome, i.e., with up to e mismatches  ...  Acknowledgements The authors acknowledge the support of the de.NBI network for bioinformatics infrastructure, the Intel SeqAn IPCC and the IMPRS for Computational Biology and Scientific Computing.  ... 
doi:10.1093/bioinformatics/btaa222 pmid:32246826 pmcid:PMC7320602 fatcat:nlhqhjtokzbanimmj5zsnviite

GenMap: Fast and Exact Computation of Genome Mappability [article]

Christopher Pockrandt, Mai Alzamel, Costas S. Iliopoulos, Knut Reinert
2019 bioRxiv   pre-print
We present a fast and exact algorithm to compute the (k,e)-mappability. Its inverse, the (k,e)-frequency counts the number of occurrences of each k-mer with up to e errors in a sequence.  ...  The algorithm we present is a magnitude faster than the algorithm in the widely used GEM suite while not relying on heuristics, and can even compute the mappability for short k-mers on highly repetitive  ...  Acknowledgements The authors acknowledge the support of the de.NBI network for bioinformatics infrastructure, the Intel SeqAn IPCC and the IMPRS for Computational Biology and Scientific Computing.  ... 
doi:10.1101/611160 fatcat:h7vr3jtvezaxjpywrblg52sjwm
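
The two GenMap entries above define the (k,e)-frequency as the number of occurrences of each k-mer with up to e mismatches, and the (k,e)-mappability as its reciprocal. A minimal brute-force sketch of those definitions follows. It is not GenMap's algorithm (which is index-based and far faster), and the function names, the forward-strand-only comparison, and the choice to count a k-mer's own occurrence are assumptions made for illustration.

```python
def hamming_within(a: str, b: str, e: int) -> bool:
    """Return True if equal-length strings a and b differ in at most e positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > e:
                return False
    return True


def frequency(seq: str, k: int, e: int) -> list[int]:
    """(k,e)-frequency per start position: how many k-mers of seq match
    the k-mer at that position with up to e mismatches.
    Assumption: the k-mer's own occurrence is counted, so every value is >= 1.
    Assumption: only the forward strand is considered."""
    n = len(seq) - k + 1
    kmers = [seq[i:i + k] for i in range(n)]
    return [
        sum(hamming_within(kmers[i], kmers[j], e) for j in range(n))
        for i in range(n)
    ]


def mappability(seq: str, k: int, e: int) -> list[float]:
    """(k,e)-mappability: reciprocal of the (k,e)-frequency at each position."""
    return [1.0 / f for f in frequency(seq, k, e)]
```

For example, `mappability("ACGTACGT", 4, 0)` gives `[0.5, 1.0, 1.0, 1.0, 0.5]`, because `ACGT` occurs twice and every other 4-mer once. The quadratic pairwise comparison is only practical for toy inputs.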

BS-Seeker2: a versatile aligning pipeline for bisulfite sequencing data

Weilong Guo, Petko Fiziev, Weihong Yan, Shawn Cokus, Xueguang Sun, Michael Q Zhang, Pao-Yang Chen, Matteo Pellegrini
2013 BMC Genomics  
We also defined CGmap and ATCGmap file formats for full representations of DNA methylomes, as part of the outputs of BS-Seeker2 pipeline together with BAM and WIG files.  ...  BS-Seeker2 improves mappability over existing aligners by using local alignment. It can also map reads from RRBS library by building special indexes with improved efficiency and accuracy.  ...  Our comparisons with respect to two other popular BS aligners showed that BS-Seeker2 has a comparable performance on WGBS data and outperforms on RRBS data with the others.  ... 
doi:10.1186/1471-2164-14-774 pmid:24206606 pmcid:PMC3840619 fatcat:wdo4csgmozbyvmgia6oy2z67bi

Faster Algorithms for 1-Mappability of a Sequence [chapter]

Mai Alzamel, Panagiotis Charalampopoulos, Costas S. Iliopoulos, Solon P. Pissis, Jakub Radoszewski, Wing-Kin Sung
2017 Lecture Notes in Computer Science  
In the k-mappability problem, we are given a string x of length n and integers m and k, and we are asked to count, for each length-m factor y of x, the number of other factors of length m of x that are  ...  We focus here on the version of the problem where k = 1. The fastest known algorithm for k = 1 requires time O(mn log n/ log log n) and space O(n).  ...  By computing the mappability of the reference genome, we can then assemble the genome of an individual with greater confidence by first mapping the segments of the DNA that correspond to regions with low  ... 
doi:10.1007/978-3-319-71147-8_8 fatcat:hgqbvfm24bep5p67yqx2z5b5xa

Faster algorithms for 1-mappability of a sequence [article]

Mai Alzamel, Panagiotis Charalampopoulos, Costas S. Iliopoulos, Solon P. Pissis, Jakub Radoszewski, Wing-Kin Sung
2017 arXiv   pre-print
In the k-mappability problem, we are given a string x of length n and integers m and k, and we are asked to count, for each length-m factor y of x, the number of other factors of length m of x that are  ...  We focus here on the version of the problem where k = 1. The fastest known algorithm for k = 1 requires time O(mn log n/ log log n) and space O(n).  ...  By computing the mappability of the reference genome, we can then assemble the genome of an individual with greater confidence by first mapping the segments of the DNA that correspond to regions with low  ... 
arXiv:1705.04022v1 fatcat:xh4iqa7ufvbgzfgofmizebfiie

BiSpark: a Spark-based highly scalable aligner for bisulfite sequencing data

Seokjun Soe, Yoonjae Park, Heejoon Chae
2018 BMC Bioinformatics  
Due to the selective nucleotide conversion on unmethylated cytosines after treatment with sodium bisulfite, processing bisulfite-treated sequencing reads requires additional steps which need high computational  ...  Conclusions: Experimental results on methylome datasets show that BiSpark significantly outperforms other state-of-the-art bisulfite sequencing aligners in terms of alignment speed and scalability with  ...  Availability of data and materials The implementation of BiSpark software package, source code, and test data sets are available at https://bhi-kimlab.github.io/BiSpark/.  ... 
doi:10.1186/s12859-018-2498-2 fatcat:fyolmxaegnds7bqiwtp4pityge

FANSe2: A Robust and Cost-Efficient Alignment Tool for Quantitative Next-Generation Sequencing Applications

Chuan-Le Xiao, Zhi-Biao Mai, Xin-Lei Lian, Jia-Yong Zhong, Jing-jie Jin, Qing-Yu He, Gong Zhang, Zhang Zhang
2014 PLoS ONE  
Correct and bias-free interpretation of the deep sequencing data is inevitably dependent on the complete mapping of all mappable reads to the reference sequence, especially for quantitative RNA-seq applications  ...  With three normal office computers, we demonstrated that FANSe2 mapped an RNA-seq dataset generated from an entire Illumina HiSeq 2000 flowcell (8 lanes, 608 M reads) to masked human genome within 4.1  ...  Hanqi Yin (Shanghai Biotechnology Corporation) for his help on analyzing the microarray data.  ... 
doi:10.1371/journal.pone.0094250 pmid:24743329 pmcid:PMC3990525 fatcat:rsozob7ut5brfb3oglu2rjhmli

Mapping Billions of Short Reads to a Reference Genome

Jui-Hung Hung, Zhiping Weng
2016 Cold Spring Harbor Protocols  
To account for polymorphism, one can allow for a small number of mismatches (e.g., one mismatch per 36 bases).  ...  BW transform-based algorithms align a query sequence one base at a time with the BW transform of a reference genome, with each base growth winnowing down the list of possible matches in the genome.  ... 
doi:10.1101/pdb.top093153 pmid:27574203 fatcat:ce62fskuy5aqtb2tlkypnhfrnu

PASS: a program to align short sequences

D. Campagna, A. Albiero, A. Bilardi, E. Caniato, C. Forcato, S. Manavski, N. Vitulo, G. Valle
2009 Bioinformatics  
The algorithm is based on a data structure that holds in RAM the index of the genomic positions of "seed" words (typically 11-12 bases) as well as an index of the precomputed scores of short words (typically  ...  For instance, gap alignment is achieved hundreds of times faster than BLAST and several times faster than SOAP, especially when gaps are allowed.  ...  However, even with words of 14, PASS has a better sensitivity than SOAP with words of 11, and it runs at least 10 times faster.  ... 
doi:10.1093/bioinformatics/btp087 pmid:19218350 fatcat:zigaeyvftzevvmug3mvvqv6bpq

Using quality scores and longer reads improves accuracy of Solexa read mapping

Andrew D Smith, Zhenyu Xuan, Michael Q Zhang
2008 BMC Bioinformatics  
Second-generation sequencing has the potential to revolutionize genomics and impact all areas of biomedical science.  ...  The increase in speed, sensitivity and availability of sequencing technology brings demand for advances in computational technology to perform associated analysis tasks.  ...  Acknowledgements The authors would like to thank Richard McCombie, Vivekanand Balija, Melissa Kramer in Genome center of Cold Spring Harbor Laboratory for providing us the Solexa sequencing data and helpful  ... 
doi:10.1186/1471-2105-9-128 pmid:18307793 pmcid:PMC2335322 fatcat:tzzpgve2pzhjbnm6mwm2lrfwem

Diminishing return for increased Mappability with longer sequencing reads: implications of the k-mer distributions in the human genome

Wentian Li, Jan Freudenberg, Pedro Miramontes
2014 BMC Bioinformatics  
Although a greater length increases the chance for reads being uniquely mapped to the reference genome, a quantitative analysis of the influence of read lengths on mappability has been lacking.  ...  The location of the most frequent 1000-mers comprises 172 kilobase-ranged regions, including four large stretches on chromosomes 1 and X, containing genes with biomedical implications.  ...  WL acknowledges the support from the Robert S Boas Center for Genomics and Human Genetics, and JF was supported by the National Institute of Arthritis and Musculoskeletal and Skin Diseases of the National  ... 
doi:10.1186/1471-2105-15-2 pmid:24386976 pmcid:PMC3927684 fatcat:war6ll3apffbleo7qqy5a366du

Mappability and read length

Wentian Li, Jan Freudenberg
2014 Frontiers in Genetics  
Distribution of approximate repeats in the human reference genome: The distribution P_{D,C,M} or P_{D_0,C,M} allowing up to M mismatches is much harder to obtain due to computational constraints (Derrien  ...  Discussion: The central thesis of this paper is that if the sequencing produces shorter reads, the length of any repeat unit in the genome sets an upper limit on mappability (a concept applicable to both  ... 
doi:10.3389/fgene.2014.00381 pmid:25426137 pmcid:PMC4226227 fatcat:bql46z6hnngojfghyz6ogvffbm

FANSe: an accurate algorithm for quantitative mapping of large scale sequencing reads

Gong Zhang, Ivan Fedyunin, Sebastian Kirchner, Chuanle Xiao, Angelo Valleriani, Zoya Ignatova
2012 Nucleic Acids Research  
millions of reads to small or large reference genomes.  ...  We developed a new, fast and accurate algorithm for nucleic acid sequence analysis, FANSe, with adjustable mismatch allowance settings and ability to handle indels to accurately and quantitatively map  ...  ACKNOWLEDGEMENTS We are grateful to Wei Chen and Na Li (MDC Berlin) for their help with the Illumina GAIIx sequencing instrument.  ... 
doi:10.1093/nar/gks196 pmid:22379138 pmcid:PMC3367211 fatcat:hpqoblv265evxoe22u6bvfnvuy

WALT: fast and accurate read mapping for bisulfite sequencing

Haifeng Chen, Andrew D. Smith, Ting Chen
2016 Bioinformatics  
Whole-genome bisulfite sequencing (WGBS) has emerged as the gold-standard technique in genome-scale studies of DNA methylation.  ...  WALT uses a strategy of hashing periodic spaced seeds, which leads to significant speedup compared with the most efficient methods currently available.  ...  Funding This work was partially supported by National Institutes of Health grant HG006015 (ADS)  ... 
doi:10.1093/bioinformatics/btw490 pmid:27466624 pmcid:PMC5181568 fatcat:4kfyagcytnghthjpkanofcfqoy

MaSC: mappability-sensitive cross-correlation for estimating mean fragment length of single-end short-read sequencing data

Parameswaran Ramachandran, Gareth A. Palidwor, Christopher J. Porter, Theodore J. Perkins
2013 Bioinformatics  
We observe that the mappability of different parts of the genome can introduce an artificial bias into cross-correlation computations, resulting in incorrect fragment-length estimates.  ...  We analyze the computational complexity of this approach, and evaluate its performance on a test suite of NGS datasets, demonstrating its superiority to traditional cross-correlation analysis.  ...  If the mappable intervals are not available at all, then they need to be computed based on the reference genome.  ... 
doi:10.1093/bioinformatics/btt001 pmid:23300135 pmcid:PMC3570216 fatcat:2g6xschejnaurpshsbqkuvr2rm
Showing results 1-15 out of 375 results