Efficient Computation of Sequence Mappability

Panagiotis Charalampopoulos, Costas S. Iliopoulos, Tomasz Kociumaka, Solon P. Pissis, Jakub Radoszewski, Juliusz Straszyński
2022 Algorithmica
In the (k, m)-mappability problem, for a given sequence T of length n, the goal is to compute a table whose ith entry is the number of indices j ≠ i such that the length-m substrings of T starting  ...  We present several efficient algorithms for the general case of the problem.  ...  help of a reference sequence.  ...
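A brute-force reference helps pin down what the (k, m)-mappability table contains. This is a minimal sketch, not the authors' algorithm. It assumes the truncated definition compares the two length-m substrings by Hamming distance (at most k mismatches), and the function name `mappability` is hypothetical. It runs in O(n²m) time, which is the baseline the efficient algorithms improve on.

```python
def mappability(text: str, k: int, m: int) -> list[int]:
    """Naive O(n^2 * m) (k, m)-mappability table (hypothetical helper).

    Entry i counts the positions j != i whose length-m substring is within
    Hamming distance k of the length-m substring starting at i.
    """
    n_factors = len(text) - m + 1
    if n_factors <= 0:
        return []
    table = [0] * n_factors
    for i in range(n_factors):
        for j in range(i + 1, n_factors):
            mismatches = 0
            for p in range(m):
                if text[i + p] != text[j + p]:
                    mismatches += 1
                    if mismatches > k:  # early exit: pair cannot qualify
                        break
            if mismatches <= k:
                # Hamming distance is symmetric, so credit both positions.
                table[i] += 1
                table[j] += 1
    return table


print(mappability("aabaa", 1, 2))
```

For "aabaa" with k = 1 and m = 2, the factors are aa, ab, ba, aa, and the call prints `[3, 2, 2, 3]`.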

Efficient Computation of Sequence Mappability [chapter]

Mai Alzamel, Panagiotis Charalampopoulos, Costas S. Iliopoulos, Tomasz Kociumaka, Solon P. Pissis, Jakub Radoszewski, Juliusz Straszyński
2018 Lecture Notes in Computer Science
Sequence mappability is an important task in genome re-sequencing.  ...  In the (k, m)-mappability problem, for a given sequence T of length n, our goal is to compute a table whose ith entry is the number of indices j ≠ i such that length-m substrings of T starting at positions  ...  In turn, the process of re-sequencing depends heavily on how mappable a genome is given a set of reads of some fixed length m.  ...

Fast Computation and Applications of Genome Mappability

Thomas Derrien, Jordi Estellé, Santiago Marco Sola, David G. Knowles, Emanuele Raineri, Roderic Guigó, Paolo Ribeca, Christos A. Ouzounis
2012 PLoS ONE
We present a fast mapping-based algorithm to compute the mappability of each region of a reference genome up to a specified number of mismatches.  ...  Knowing the mappability of a genome is crucial for the interpretation of massively parallel sequencing experiments.

GenMap: Ultra-fast Computation of Genome Mappability

Christopher Pockrandt, Mai Alzamel, Costas S Iliopoulos, Knut Reinert, Jinbo Xu
2020 Bioinformatics
Motivation: Computing the uniqueness of k-mers for each position of a genome while allowing for up to e mismatches is computationally challenging.  ...  This allows for the computation of marker sequences or finding candidates for probe design by identifying approximate k-mers that are unique to a genome or that are present in all genomes.

BS-Seeker2: a versatile aligning pipeline for bisulfite sequencing data

Weilong Guo, Petko Fiziev, Weihong Yan, Shawn Cokus, Xueguang Sun, Michael Q Zhang, Pao-Yang Chen, Matteo Pellegrini
2013 BMC Genomics
BS-Seeker2 improves mappability over existing aligners by using local alignment. It can also map reads from RRBS library by building special indexes with improved efficiency and accuracy.  ...  Libraries such as whole genome bisulfite sequencing (WGBS) and reduced representation bisulfite sequencing (RRBS) are widely used for generating DNA methylomes, demanding efficient and versatile tools for  ...  Conclusions: We provide a BS alignment pipeline, BS-Seeker2, for fast and accurate mapping of BS reads from various types of library.

Faster Algorithms for 1-Mappability of a Sequence [chapter]

Mai Alzamel, Panagiotis Charalampopoulos, Costas S. Iliopoulos, Solon P. Pissis, Jakub Radoszewski, Wing-Kin Sung
2017 Lecture Notes in Computer Science
In the k-mappability problem, we are given a string x of length n and integers m and k, and we are asked to count, for each length-m factor y of x, the number of other factors of length m of x that are  ...  We focus here on the version of the problem where k = 1. The fastest known algorithm for k = 1 requires time O(mn log n / log log n) and space O(n).  ...  Another direction of practical interest is thus to devise efficient algorithms for the problems of 1-mappability and k-mappability for the External Memory model of computation.  ...
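For the k = 1 case, a standard pigeonhole filter shows why 1-mappability is easier than the general problem: if two length-m factors differ in at most one position, then one of their two halves must match exactly, so only factors that share a half need to be compared. This is a hypothetical illustration (`one_mappability` is an invented name), not the O(mn log n / log log n) algorithm mentioned above nor the authors' faster methods. Its worst case is still quadratic, for example on highly repetitive strings.

```python
from collections import defaultdict
from itertools import combinations


def one_mappability(text: str, m: int) -> list[int]:
    """1-mappability via an exact-half pigeonhole filter (hypothetical helper)."""
    n_factors = len(text) - m + 1
    if n_factors <= 0:
        return []
    half = m // 2
    left = defaultdict(list)   # left half -> start positions sharing it
    right = defaultdict(list)  # right half -> start positions sharing it
    for i in range(n_factors):
        left[text[i:i + half]].append(i)
        right[text[i + half:i + m]].append(i)

    # A pair with at most one mismatch agrees exactly on at least one half,
    # so it appears in some bucket. The set removes pairs found in both.
    candidates = set()
    for buckets in (left, right):
        for positions in buckets.values():
            candidates.update(combinations(positions, 2))

    table = [0] * n_factors
    for i, j in candidates:
        # Verify each candidate pair with a full Hamming-distance check.
        mismatches = sum(a != b for a, b in zip(text[i:i + m], text[j:j + m]))
        if mismatches <= 1:
            table[i] += 1
            table[j] += 1
    return table


print(one_mappability("aabaa", 2))
```

The pair of factors ab and ba is never compared because they share neither half, and the call prints `[3, 2, 2, 3]`.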

PICS: Probabilistic Inference for ChIP-seq

Xuekui Zhang, Gordon Robertson, Martin Krzywinski, Kaida Ning, Arnaud Droit, Steven Jones, Raphael Gottardo
2010 Biometrics
In order to improve the computational efficiency of the PICS package, we recommend the utilisation of the parallel package, which allows for easy parallel computations.  ...  which stores the sequences of genome locations and associated annotations.

False positives in trans-eQTL and co-expression analyses arising from RNA-sequencing alignment errors

Ashis Saha, Alexis Battle
2019 F1000Research
Sequence similarity among distinct genomic regions can lead to errors in alignment of short reads from next-generation sequencing.  ...  Over 75% of trans-eQTLs using a standard pipeline occurred between regions of sequence similarity and therefore could be due to alignment errors.  ...  They also provide software for efficiently computing cross-mappability available at a GitHub link. The command line software has detailed instructions online.

Efficient and Comprehensive Representation of Uniqueness for Next-Generation Sequencing by Minimum Unique Length Analyses

Helena Storvall, Daniel Ramsköld, Rickard Sandberg, Noam Shomron
2013 PLoS ONE
We have developed the Minimum Unique Length Tool (MULTo), a framework for efficient and comprehensive representation of mappability information, through identification of the shortest possible length required  ...  As next generation sequencing technologies are getting more efficient and less expensive, RNA-Seq is becoming a widely used technique for transcriptome studies.  ...  In this paper we present a novel approach to efficiently and comprehensively describe mappability of a genome or transcriptome.  ...

CLImAT: accurate detection of copy number alteration and loss of heterozygosity in impure and aneuploid tumor samples using whole-genome sequencing data

Zhenhua Yu, Yuanning Liu, Yi Shen, Minghui Wang, Ao Li
2014 Computer applications in the biosciences : CABIOS
Therefore, efficient computational methods are required to address these issues.  ...  Motivation: Whole-genome sequencing of tumor samples has been demonstrated as an efficient approach for comprehensive analysis of genomic aberrations in cancer genome.

MaSC: mappability-sensitive cross-correlation for estimating mean fragment length of single-end short-read sequencing data

Parameswaran Ramachandran, Gareth A. Palidwor, Christopher J. Porter, Theodore J. Perkins
2013 Computer applications in the biosciences : CABIOS
We observe that the mappability of different parts of the genome can introduce an artificial bias into cross-correlation computations, resulting in incorrect fragment-length estimates.  ...  Motivation: Reliable estimation of the mean fragment length for next-generation short-read sequencing data is an important step in next-generation sequencing analysis pipelines, most notably because of  ...  Naïve cross-correlation, on the other hand, simply computes correlation between rows 1 and 4, regardless of mappability  ...  more efficient, especially if the lists of reads and mappable intervals are short

From Wet-Lab to Variations: Concordance and Speed of Bioinformatics Pipelines for Whole Genome and Whole Exome Sequencing

Steve Laurie, Marcos Fernandez-Callejo, Santiago Marco-Sola, Jean-Remi Trotta, Jordi Camps, Alejandro Chacón, Antonio Espinosa, Marta Gut, Ivo Gut, Simon Heath, Sergi Beltran
2016 Human Mutation
Therefore, it is essential to evaluate the robustness of the variant detection process taking into account the computing resources required.  ...  We have benchmarked six combinations of state-of-the-art read aligners (BWA-MEM and GEM3) and variant callers (FreeBayes, GATK HaplotypeCaller, SAMtools) on whole genome and whole exome sequencing data
