Large-scale compression of genomic sequence databases with the Burrows-Wheeler transform

A. J. Cox, M. J. Bauer, T. Jakobi, G. Rosone
2012 Bioinformatics  
Motivation The Burrows-Wheeler transform (BWT) is the foundation of many algorithms for compression and indexing of text data, but the cost of computing the BWT of very large string collections has prevented  ...  building of compressed full text indexes such as the FM-index on large-scale DNA sequence collections.  ...  ACKNOWLEDGEMENT The authors would like to thank Dirk Evers for his support throughout this project.  ... 
doi:10.1093/bioinformatics/bts173 pmid:22556365 fatcat:fz2icliuazaifkskeyvvzgbmce

25 Years of the Burrows-Wheeler Transform (Dagstuhl Seminar 19241)

Travis Gagie, Giovanni Manzini, Gonzalo Navarro, Jens Stoye, Michael Wagner
2019 Dagstuhl Reports  
Dagstuhl Seminar 19241 ("25 Years of the Burrows-Wheeler Transform") took place from June 10th to 14th, 2019, and was attended by 45 people from 13 countries and the three fields of Algorithms and Data  ...  Feedback was generally positive and we are confident the seminar fostered interdisciplinary connections and will eventually result in noteworthy joint publications. License Creative Commons BY 3.0  ...  We present an index data structure for the bijective Burrows-Wheeler transform [1]. The index data structure is based on the FM index [2].  ... 
doi:10.4230/dagrep.9.6.55 dblp:journals/dagstuhl-reports/GagieMNS19 fatcat:vw3nruzqkzbrdi3rpxpqtjkosa

Burrows-Wheeler transform for terabases [article]

Jouni Sirén
2016 arXiv   pre-print
A key method is the Burrows-Wheeler transform (BWT), which is widely used for compressing and indexing reads.  ...  With large projects sequencing thousands of individuals, this raises the need for tools capable of handling terabases of sequence data.  ...  We can also sort the reads by their likely positions in a reference genome. This position order is useful for both compression and storing the pairing information for the reads.  ... 
arXiv:1511.00898v2 fatcat:ohalmhyi5jhihifc7jq22vphcy

Annotating Large Genomes With Exact Word Matches

J. Healy
2003 Genome Research  
We create a Burrows-Wheeler transform of the genome, which together with auxiliary data structures facilitating counting, can reside in about one gigabyte of RAM.  ...  Thus we can readily annotate any sequence, including the entire human genome, with the counts of its constituent words.  ...  ACKNOWLEDGMENTS The publication costs of this article were defrayed in part by payment of page charges.  ... 
doi:10.1101/gr.1350803 pmid:12975312 pmcid:PMC403711 fatcat:ji7w5kiwa5hjfiyf6n746uke4e

BWtrs: A tool for searching for tandem repeats in DNA sequences based on the Burrows–Wheeler transform

Rafal Pokrzywa, Andrzej Polanski
2010 Genomics  
We introduce a very efficient, web-based tool for large scale searching for exact tandem repeats in genomes, based on the use of the Burrows-Wheeler Transform.  ...  The Burrows-Wheeler Tandem Repeat Searcher (BWtrs) is an on-line application that searches for the exact occurrences of tandem repetitions in DNA sequences.  ...  N516 441938 "Efficient methods of genome browsing based on the Burrows Wheeler Transform" and by the European Community from the European Social Fund.  ... 
doi:10.1016/j.ygeno.2010.08.001 pmid:20709168 fatcat:gscb3epo3bf65dwfc5qdbfvmw4

Computational biology in the 21st century

Bonnie Berger, Noah M. Daniels, Y. William Yu
2016 Communications of the ACM  
Acknowledgments This work is supported by the National Institutes of Health, under grant GM108348. Y.W.Y. is also supported by a Hertz Fellowship.  ...  techniques such as the Burrows-Wheeler Transform (BWT) take advantage of aspects of sequence structure 3 to speed up computation and save storage.  ...  rely on algorithmic approaches such as the Burrows-Wheeler transform (BWT), which provides efficient string compression through a reversible transformation, while the FM-index data structure is a compressed  ... 
doi:10.1145/2957324 pmid:28966343 pmcid:PMC5615407 fatcat:h33qu34kdvehjldnbvvvgnrnqq

Next Generation Sequencing Data and its Compression

Bruno Carpentieri
2019 IOP Conference Series: Earth and Environmental Science  
Thanks to the large-scale sequencing of samples of DNA, the interest and the new research in these areas by the scientific community are suddenly grown.  ...  of generic compression tools such as gzip and bzip2 by confronting them with a specific system that was designed specifically for genomic file compression: quip.  ...  Acknowledgment The author thanks his student Felice D'Avino for conducting preliminary tests on the compression of FASTQ and SAM/BAM files.  ... 
doi:10.1088/1755-1315/362/1/012059 fatcat:ufnk3r4r25aq3ko275brwqxlca

Distributed hybrid-indexing of compressed pan-genomes for scalable and fast sequence alignment

Altti Ilari Maarala, Ossi Arasalo, Daniel Valenzuela, Veli Mäkinen, Keijo Heljanko, Yanbin Yin
2021 PLoS ONE  
Computational pan-genomics utilizes information from multiple individual genomes in large-scale comparative analysis.  ...  (n = 599) were blasted to the compressed index of 488 GB GenBank database (n = 13,375,031) in 26 minutes on 25 nodes. 78 MB mixed sequences (n = 4,167) were blasted to the compressed index of 18 GB E.  ...  Acknowledgments CSC-IT Center for Science and the Finnish Grid and Cloud Infrastructure2 (FGCI2) are gratefully acknowledged for providing the computing capacity and their expertise.  ... 
doi:10.1371/journal.pone.0255260 pmid:34343181 pmcid:PMC8330939 fatcat:wq2tg5obivbzhpnneb3nylpbnm

NINJA-OPS: Fast Accurate Marker Gene Alignment Using Concatenated Ribosomes

Gabriel A. Al-Ghalith, Emmanuel Montassier, Henry N. Ward, Dan Knights, Jonathan A. Eisen
2016 PLoS Computational Biology  
NINJA takes advantage of the Burrows-Wheeler (BW) alignment using an artificial reference chromosome composed of concatenated reference sequences, the "concatesome," as the BW input.  ...  We present an alternative technique that takes advantage of a high-speed Burrows-Wheeler alignment procedure combined with rapid filtering and parsing of the data to remove bottlenecks in the pipeline.  ...  Originally conceived as a means to make data more compressible, the Burrows-Wheeler transform (BWT) [11] is a lossless, reversible transformation that effectively positions series of like characters  ... 
doi:10.1371/journal.pcbi.1004658 pmid:26820746 pmcid:PMC4731464 fatcat:vypfem2cefgnrg4baetkrf223y

metaBEETL: high-throughput analysis of heterogeneous microbial populations from shotgun DNA sequences

Christina Ander, Ole B Schulz-Trieglaff, Jens Stoye, Anthony J Cox
2013 BMC Bioinformatics  
It uses a Burrows-Wheeler Transform (BWT) index of the sequencing reads and an indexed database of microbial reference sequences.  ...  The advent of next-generation sequencing has made it feasible for the Human Microbiome Project and other initiatives to generate ESS data on a large scale, but computationally efficient methods for analysing  ...  Acknowledgements C.A. receives a scholarship from the CLIB Graduate Cluster Industrial Biotechnology. Declarations The fees for the publication of this article were funded by Illumina Inc.  ... 
doi:10.1186/1471-2105-14-s5-s2 pmid:23734710 pmcid:PMC3622627 fatcat:bakccnlzlzglpkosw6c6x3whgq

The real cost of sequencing: scaling computation to keep pace with data generation

Paul Muir, Shantao Li, Shaoke Lou, Daifeng Wang, Daniel J Spakowicz, Leonidas Salichos, Jing Zhang, George M. Weinstock, Farren Isaacs, Joel Rozowsky, Mark Gerstein
2016 Genome Biology  
Communal sequence databases were developed in the 1980s [5, 6], but most investigators worked with data of a scale that allowed transfer to and processing on a local client.  ...  The relative scaling behavior of these evolving technologies will impact genomics research moving forward.  ...  (BLAST-like Alignment Tool) [23], MAQ [24], and Novoalign [25]) or suffix arrays with the Burrows-Wheeler transform (for example, STAR (Spliced Transcripts Alignment to a Reference) [26], BWA  ... 
doi:10.1186/s13059-016-0917-0 pmid:27009100 pmcid:PMC4806511 fatcat:nfbkfi3q55dxfg7fsozykzdtr4

A Very Fast Algorithm for Detecting Partially Plagiarized Documents Using FM-Index

Chang Seok Ock, JongKyu Seo, Sung-Hwan Kim, Hwan-Gue Cho
2013 International Journal of Computer and Communication Engineering  
The method is based on the Burrows-Wheeler Transform (BWT) and the FM-index for BWT search.  ...  Index Terms: Burrows-Wheeler transform, FM-index, plagiarism detection.  ...  Generation Sequencing (NGS) [1], [2]. Burrows-Wheeler Transform (BWT), which is a block-sorting algorithm [3], and FM-index data structures [4] are used to index the corpus.  ... 
doi:10.7763/ijcce.2013.v2.194 fatcat:lvt5msu6gnbz5jnwrlmfr3m4aq

Optimizing Burrows-Wheeler Transform-Based Sequence Alignment on Multicore Architectures

Jing Zhang, Heshan Lin, Pavan Balaji, Wu-Chun Feng
2013 2013 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing  
Computational biology sequence alignment tools using the Burrows-Wheeler Transform (BWT) are widely used in next-generation sequencing (NGS) analysis.  ...  However, despite extensive optimization efforts, the performance of these tools still cannot keep up with the explosive growth of sequencing data.  ...  ACKNOWLEDGMENT This work is in part supported by NSF Grant 0916719 "Collaborative Research: Hybrid Opportunistic Computing for Green Clouds" and NSF Grant 1048253 "Commoditizing Data-Intensive Biocomputing in the  ... 
doi:10.1109/ccgrid.2013.67 dblp:conf/ccgrid/ZhangLBF13 fatcat:6olzfr37wfcjvp62vboodprqfa

Burrows Wheeler Transform on a Large Scale: Algorithms Implemented in Apache Spark [article]

Ylenia Galluzzo, Raffaele Giancarlo, Mario Randazzo, Simona E. Rombo
2021 arXiv   pre-print
Indexing and compressing large sequences datasets are some of the most important tasks in this context.  ...  Here we propose algorithms for the computation of Burrows Wheeler transform relying on Big Data technologies, i.e., Apache Spark and Hadoop.  ...  BWT The Burrows-Wheeler transform of S is useful in order to rearrange it into runs of similar characters. This may have advantages both for indexing and for compressing more efficiently S.  ... 
arXiv:2107.03341v1 fatcat:cjnjnkpoovfdzpuqcef63jceq4

Computational solutions for omics data

Bonnie Berger, Jian Peng, Mona Singh
2013 Nature Reviews Genetics  
This trend towards the democratization of genome-scale technologies means that large data sets are being generated and used by individual bench biologists.  ...  Further complicating matters is that new genomic data are often best interpreted in the context of the heterogeneous large-scale data sets that have already been deposited in publicly available repositories  ...  Acknowledgments The authors thank and L. Cowen for valuable feedback. B.B. thanks the US National Institutes of Health (NIH) for grant GM081871.  ... 
doi:10.1038/nrg3433 pmid:23594911 pmcid:PMC3966295 fatcat:b7n6xwzyc5gqzo7plgyoe257iq