Efficient Computation of Sequence Mappability

2022
*Algorithmica*

In the (k, m)-mappability problem, for a given sequence T of length n, the goal is to compute a table whose ith entry is the number of indices j ≠ i such that the length-m substrings of T starting ... We present several efficient algorithms for the general case of the problem. ... help of a reference sequence. ...

doi:10.1007/s00453-022-00934-y
fatcat:x6frpxpvlnbwtczhvcid2nj6zi
###
Efficient Computation of Sequence Mappability
[article]

2021
*arXiv*
pre-print

In the (k,m)-mappability problem, for a given sequence T of length n, the goal is to compute a table whose ith entry is the number of indices j ≠ i such that the length-m substrings of T starting at positions ... Previous works on this problem focused on heuristics computing a rough approximation of the result or on the case of k=1. We present several efficient algorithms for the general case of the problem. ... A first step of these techniques is to compute the distances between all pairs of sequences representing the set of species or taxa under study. ...

arXiv:1807.11702v3
fatcat:czz74g3soje6dgkqogb5jrsaaq
###
Efficient Computation of Sequence Mappability
[chapter]

2018
*Lecture Notes in Computer Science*

Sequence mappability is an important task in genome re-sequencing. ... In the (k, m)-mappability problem, for a given sequence T of length n, our goal is to compute a table whose ith entry is the number of indices j ≠ i such that length-m substrings of T starting at positions ... In turn, the process of re-sequencing depends heavily on how mappable a genome is given a set of reads of some fixed length m. ...
###
Fast Computation and Applications of Genome Mappability

2012
*PLoS ONE*

We present a fast mapping-based algorithm to compute the mappability of each region of a reference genome up to a specified number of mismatches. ... Knowing the mappability of a genome is crucial for the interpretation of massively parallel sequencing experiments. ... Acknowledgments: We would like to thank Rachel Harte from the University of California Santa Cruz for her substantial help in the integration of our mappability tracks into the UCSC Genome Browser. ...

doi:10.1371/journal.pone.0030377
pmid:22276185
pmcid:PMC3261895
fatcat:suu3w7k7qjfknetlv2p2exc5da
###
GenMap: Ultra-fast Computation of Genome Mappability

2020
*Bioinformatics*

Motivation: Computing the uniqueness of k-mers for each position of a genome while allowing for up to e mismatches is computationally challenging. ... This allows for the computation of marker sequences or finding candidates for probe design by identifying approximate k-mers that are unique to a genome or that are present in all genomes. ... Acknowledgements: The authors acknowledge the support of the de.NBI network for bioinformatics infrastructure, the Intel SeqAn IPCC and the IMPRS for Computational Biology and Scientific Computing. ...

doi:10.1093/bioinformatics/btaa222
pmid:32246826
pmcid:PMC7320602
fatcat:nlhqhjtokzbanimmj5zsnviite
###
GenMap: Fast and Exact Computation of Genome Mappability
[article]

2019
*bioRxiv*
pre-print

We present a fast and exact algorithm to compute the (k,e)-mappability. Its inverse, the (k,e)-frequency counts the number of occurrences of each k-mer with up to e errors in a sequence. ... We also show that mappability can be computed on multiple sequences to identify marker genes illustrated by the example of E. coli strains. ... Acknowledgements: The authors acknowledge the support of the de.NBI network for bioinformatics infrastructure, the Intel SeqAn IPCC and the IMPRS for Computational Biology and Scientific Computing. ...

doi:10.1101/611160
fatcat:h7vr3jtvezaxjpywrblg52sjwm
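
As a reading aid for the definition quoted in this entry, the following is a minimal brute-force sketch of the (k,e)-frequency and its inverse. It assumes that "errors" means mismatches (Hamming distance) and that a k-mer counts as an occurrence of itself; the function names `hamming_within`, `frequency` and `mappability` are hypothetical and chosen for illustration. It runs in quadratic time and is not the GenMap algorithm.

```python
def hamming_within(a, b, e):
    """Return True if equal-length strings a and b differ in at most e positions."""
    mismatches = 0
    for x, y in zip(a, b):
        if x != y:
            mismatches += 1
            if mismatches > e:
                return False
    return True


def frequency(seq, k, e):
    """Naive (k, e)-frequency (assumption: errors are mismatches).

    Entry i is the number of k-mers of seq, including the one at
    position i itself, within e mismatches of seq[i:i + k].
    """
    n = max(len(seq) - k + 1, 0)
    kmers = [seq[i:i + k] for i in range(n)]
    return [sum(hamming_within(a, b, e) for b in kmers) for a in kmers]


def mappability(seq, k, e):
    """Inverse of the frequency; every frequency is at least 1."""
    return [1.0 / f for f in frequency(seq, k, e)]
```

For example, `frequency("AAAT", 2, 0)` compares the 2-mers `AA`, `AA`, `AT` exactly, while `frequency("AAAT", 2, 1)` also counts `AT` as an approximate occurrence of `AA`.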
###
BS-Seeker2: a versatile aligning pipeline for bisulfite sequencing data

2013
*BMC Genomics*

BS-Seeker2 improves mappability over existing aligners by using local alignment. It can also map reads from RRBS library by building special indexes with improved efficiency and accuracy. ... Libraries such as whole genome bisulfite sequencing (WGBS) and reduced represented bisulfite sequencing (RRBS) are widely used for generating DNA methylomes, demanding efficient and versatile tools for ... Conclusions: We provide a BS alignment pipeline, BS-Seeker2, for fast and accurate mapping of BS reads from various types of library. ...

doi:10.1186/1471-2164-14-774
pmid:24206606
pmcid:PMC3840619
fatcat:wdo4csgmozbyvmgia6oy2z67bi
###
Faster Algorithms for 1-Mappability of a Sequence
[chapter]

2017
*Lecture Notes in Computer Science*

In the k-mappability problem, we are given a string x of length n and integers m and k, and we are asked to count, for each length-m factor y of x, the number of other factors of length m of x that are ... We focus here on the version of the problem where k = 1. The fastest known algorithm for k = 1 requires time O(mn log n / log log n) and space O(n). ... Another direction of practical interest is thus to devise efficient algorithms for the problems of 1-mappability and k-mappability for the External Memory model of computation. ...

doi:10.1007/978-3-319-71147-8_8
fatcat:hgqbvfm24bep5p67yqx2z5b5xa
###
Faster algorithms for 1-mappability of a sequence
[article]

2017
*arXiv*
pre-print

In the k-mappability problem, we are given a string x of length n and integers m and k, and we are asked to count, for each length-m factor y of x, the number of other factors of length m of x that are ... We focus here on the version of the problem where k = 1. The fastest known algorithm for k = 1 requires time O(mn log n / log log n) and space O(n). ... Another direction of practical interest is thus to devise efficient algorithms for the problems of 1-mappability and k-mappability for the External Memory model of computation. ...

arXiv:1705.04022v1
fatcat:xh4iqa7ufvbgzfgofmizebfiie
###
PICS: Probabilistic Inference for ChIP-seq

2010
*Biometrics*

In order to improve the computational efficiency of the PICS package, we recommend the utilisation of the parallel package, which allows for easy parallel computations. ... which stores the sequences of genome locations and associated annotations. ...

doi:10.1111/j.1541-0420.2010.01441.x
pmid:20528864
fatcat:lhbuhiji2reupdbopdbhfdfjo4
###
False positives in trans-eQTL and co-expression analyses arising from RNA-sequencing alignment errors

2019
*F1000Research*

Sequence similarity among distinct genomic regions can lead to errors in alignment of short reads from next-generation sequencing. ... Over 75% of trans-eQTLs using a standard pipeline occurred between regions of sequence similarity and therefore could be due to alignment errors. ... They also provide software for efficiently computing cross-mappability available at a GitHub link. The command line software has detailed instructions online. ...
###
Efficient and Comprehensive Representation of Uniqueness for Next-Generation Sequencing by Minimum Unique Length Analyses

2013
*PLoS ONE*

We have developed the Minimum Unique Length Tool (MULTo), a framework for efficient and comprehensive representation of mappability information, through identification of the shortest possible length required ... As next generation sequencing technologies are getting more efficient and less expensive, RNA-Seq is becoming a widely used technique for transcriptome studies. ... In this paper we present a novel approach to efficiently and comprehensively describe mappability of a genome or transcriptome. ...

doi:10.1371/journal.pone.0053822
pmid:23349747
pmcid:PMC3548888
fatcat:rmjnw25v7zflleib5uxi2bmnga
###
CLImAT: accurate detection of copy number alteration and loss of heterozygosity in impure and aneuploid tumor samples using whole-genome sequencing data

2014
*Computer applications in the biosciences : CABIOS*

Therefore, efficient computational methods are required to address these issues. ... Motivation: Whole-genome sequencing of tumor samples has been demonstrated as an efficient approach for comprehensive analysis of genomic aberrations in cancer genome. ... Funding: National Natural Science Foundation of China (31100955, 61101061). Conflict of Interest: none declared. ...

doi:10.1093/bioinformatics/btu346
pmid:24845652
pmcid:PMC4155249
fatcat:6v765jmdfzeurifsilmwruah34
###
MaSC: mappability-sensitive cross-correlation for estimating mean fragment length of single-end short-read sequencing data

2013
*Computer applications in the biosciences : CABIOS*

We observe that the mappability of different parts of the genome can introduce an artificial bias into cross-correlation computations, resulting in incorrect fragment-length estimates. ... Motivation: Reliable estimation of the mean fragment length for next-generation short-read sequencing data is an important step in next-generation sequencing analysis pipelines, most notably because of ... Naïve cross-correlation, on the other hand, simply computes correlation between rows 1 and 4, regardless of mappability more efficient, especially if the lists of reads and mappable intervals are short ...

doi:10.1093/bioinformatics/btt001
pmid:23300135
pmcid:PMC3570216
fatcat:2g6xschejnaurpshsbqkuvr2rm
###
From Wet-Lab to Variations: Concordance and Speed of Bioinformatics Pipelines for Whole Genome and Whole Exome Sequencing

2016
*Human Mutation*

Therefore, it is essential to evaluate the robustness of the variant detection process taking into account the computing resources required. ... We have benchmarked six combinations of state-of-the-art read aligners (BWA-MEM and GEM3) and variant callers (FreeBayes, GATK HaplotypeCaller, SAMtools) on whole genome and whole exome sequencing data ... Acknowledgments: We thank Raul Tonda for help with pipeline implementation and figure generation, and Nvidia for their donation of part of the systems used in this work. ...

doi:10.1002/humu.23114
pmid:27604516
pmcid:PMC5129537
fatcat:5nsmszxr6bhi5dp6j63oh7m5fe
