BEETL-fastq: a searchable compressed archive for DNA reads

L. Janin, O. Schulz-Trieglaff, A. J. Cox
2014 Bioinformatics  
Here, we present BEETL-fastq, a tool that not only compresses FASTQ-formatted DNA reads more compactly than gzip but also permits rapid search for k-mer queries within the archived sequences.  ...  Motivation: FASTQ is a standard file format for DNA sequencing data, which stores both nucleotides and quality scores.  ...  De novo assembly of insertions One of the advantages of BEETL-fastq is its ability to extract not only the read containing the query k-mer but also its read partner from the same DNA fragment.  ... 
doi:10.1093/bioinformatics/btu387 pmid:24950811 fatcat:eol2df6h75faxlp6vxenk7b37e

BEETL-fastq: a searchable compressed archive for DNA reads [article]

Lilian Janin and Ole Schulz-Trieglaff and Anthony J. Cox
2014 arXiv (pre-print)
Here we present BEETL-fastq, a tool that not only compresses FASTQ-formatted DNA reads more compactly than gzip, but also permits rapid search for k-mer queries within the archived sequences.  ...  Motivation: FASTQ is a standard file format for DNA sequencing data which stores both nucleotides and quality scores.  ...  De-novo assembly of insertions One of the advantages of BEETL-fastq is its ability to extract not only the read containing the query k-mer but also its read partner from the same DNA fragment.  ... 
arXiv:1406.4376v1 fatcat:k2n33cm34be4jpfknal5ca7eum

SFQ: Constructing and Querying a Succinct Representation of FASTQ Files

Robert Bakarić, Damir Korenčić, Dalibor Hršak, Strahil Ristov
2022 Electronics  
We provide SFQ, a software for the construction and usage of the sFASTQ format that supports variable length reads, pairing of records, and both lossless and lossy compression of quality scores.  ...  The searchable sFASTQ archive is of comparable size to the corresponding Gzip file. sFASTQ format outputs (interleaved) FASTQ records to the STDOUT stream.  ...  Acknowledgments: We are grateful to Szymon Grabowski for his useful comments and his contribution to the discussions on this work.  ... 
doi:10.3390/electronics11111783 fatcat:jojdnfdb2zfebbnxxseivkkrua

Reference-free compression of high throughput sequencing data with a probabilistic de Bruijn graph

Gaëtan Benoit, Claire Lemaitre, Dominique Lavenier, Erwan Drezen, Thibault Dayris, Raluca Uricaru, Guillaume Rizk
2015 BMC Bioinformatics  
Each read is encoded as a path in this graph, by memorizing an anchoring kmer and a list of bifurcations.  ...  The method is based on a reference probabilistic de Bruijn Graph, built de novo from the set of reads and stored in a Bloom filter.  ...  The GenOuest BioInformatics Platform provided the computing resources necessary for benchmarking.  ... 
doi:10.1186/s12859-015-0709-7 pmid:26370285 pmcid:PMC4570262 fatcat:3oohunodhbdzfduf5yfmll7wga

Using reference-free compressed data structures to analyze sequencing reads from thousands of human genomes

Dirk D. Dolle, Zhicheng Liu, Matthew Cotten, Jared T. Simpson, Zamin Iqbal, Richard Durbin, Shane A. McCarthy, Thomas M. Keane
2016 Genome Research  
In recent years, the Burrows-Wheeler transform (BWT) and FM-index have been widely employed as a full text searchable index for read alignment and de novo assembly.  ...  A key feature is that as more genomes are added, identical read sequences are increasingly observed and compression becomes more efficient.  ...  We thank John Marshall (Wellcome Trust Sanger Institute) for providing technical help and support for several aspects of this project.  ... 
doi:10.1101/gr.211748.116 pmid:27986821 pmcid:PMC5287235 fatcat:agiljybbfraljdnhhgiufoxkae

Using reference-free compressed data structures to analyse sequencing reads from thousands of human genomes [article]

Dirk-Dominic Dolle, Zhicheng Liu, Matthew L Cotten, Jared T Simpson, Zamin Iqbal, Richard Durbin, Shane McCarthy, Thomas Keane
2016 bioRxiv (pre-print)
In recent years, the Burrows-Wheeler transform (BWT) and FM-index have been widely employed as a full text searchable index for read alignment and de novo assembly.  ...  A key feature is that as more genomes are added, identical read sequences are increasingly observed and compression becomes more efficient.  ...  We thank John Marshall (Wellcome Trust Sanger Institute) for providing technical help and support for several aspects of this project.  ... 
doi:10.1101/060186 fatcat:n74plfbhmfd3xhi7rcwwm5odly