Compression and fast retrieval of SNP data
2014
Bioinformatics
This, in turn, is leading to a compelling need for new methods for compression and fast retrieval of SNP data. ...
Results: We present a novel algorithm and file format for compressing and retrieving SNP data, specifically designed for large-scale association studies. ...
Conflict of interest: none declared. ...
doi:10.1093/bioinformatics/btu495
pmid:25064564
pmcid:PMC4609015
fatcat:x2rzzvaf3ze25hidwqml5oqn7a
A database for efficient storage and management of multi panel SNP data
2013
Archives Animal Breeding
Due to its vector-based database storage, data imports and exports are much faster than those of other SNP databases. ...
A new strategy using SNP and individual selection vectors allows us to view SNP data as matrices or sets. ...
Secondly, for each individual the compressed genotype vector is retrieved by one SQL select and shrunk on the basis of the snp_sel_vec which can be implemented as fast shifts. ...
doi:10.7482/0003-9438-56-103
fatcat:ivj56rlnlrfcfccczksxdqkp2e
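The retrieval step quoted above, where a compressed genotype vector is shrunk using `snp_sel_vec` through shifts, can be illustrated with a small sketch. It assumes 2-bit genotype codes (0, 1, 2, and 3 for missing) packed into a Python integer. The encoding, the `MISSING` constant and the helper names `pack`, `unpack` and `shrink` are hypothetical and are not taken from the paper or its SQL implementation.

```python
from typing import List, Sequence

MISSING = 3  # hypothetical code for a missing genotype


def pack(genotypes: Sequence[int]) -> int:
    """Pack genotypes (0, 1, 2 or MISSING) at 2 bits each; SNP i uses bits 2i and 2i+1."""
    vec = 0
    for i, g in enumerate(genotypes):
        if not 0 <= g <= 3:
            raise ValueError(f"genotype {g} does not fit in 2 bits")
        vec |= g << (2 * i)
    return vec


def unpack(vec: int, n: int) -> List[int]:
    """Recover n genotypes from a packed vector."""
    return [(vec >> (2 * i)) & 0b11 for i in range(n)]


def shrink(vec: int, snp_sel_vec: int, n: int) -> int:
    """Keep only SNPs whose bit is set in snp_sel_vec, compacting them with shifts and masks."""
    out, j = 0, 0
    for i in range(n):
        if (snp_sel_vec >> i) & 1:
            out |= ((vec >> (2 * i)) & 0b11) << (2 * j)
            j += 1
    return out
```

Consider genotypes `[0, 2, 1, MISSING, 2]` with `snp_sel_vec = 0b10110`, which selects SNPs 1, 2 and 4. Here `unpack(shrink(pack(...), 0b10110, 5), 3)` gives `[2, 1, 2]`.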
Indexing k-mers in Linear-space for Quality Value Compression
2019
Proceedings of the 12th International Joint Conference on Biomedical Engineering Systems and Technologies
Most of the entropy of sequencing data lies in the quality scores, and thus they are difficult to compress. ...
We show how a dictionary of significant k-mers, obtained from SNP databases and multiple genomes, can be indexed in linear space and used to improve the compression of quality values. ...
The next step usually involves some sort of sorting and indexing to allow fast retrieval of a particular k-mer (or its neighbors) and to link this information to the position in the original sequence ...
doi:10.5220/0007369100210029
dblp:conf/biostec/ShibuyaC19
fatcat:lcqqjj3ffvf2hdeywhrcjm5lmu
Fast randomized approximate string matching with succinct hash data structures
2015
BMC Bioinformatics
We point out that our data structure reaches its goals without compressing its input. This is another positive feature, since in biological applications data are often very close to incompressible. ...
In this work we show that, by combining hashing and succinct indexing techniques, we can attain good performance and accuracy with a memory footprint comparable to that of the most popular compressed indexes ...
Tests on both simulated and real data, using the most popular short-read aligners, also allowed us to validate in practice the efficiency of our algorithm, which proved to be extremely accurate and fast ...
doi:10.1186/1471-2105-16-s9-s4
pmid:26051265
pmcid:PMC4464037
fatcat:4yccrahs2jd5vd6condhlvsceu
Tabix: fast retrieval of sequence features from generic TAB-delimited files
2011
Bioinformatics
Tabix features include few seek function calls per query, data compression with gzip compatibility and direct FTP/HTTP access. ...
Tabix is the first generic tool that indexes position sorted files in TAB-delimited formats such as GFF, BED, PSL, SAM and SQL export, and quickly retrieves features overlapping specified regions. ...
of direct FTP/HTTP access and Jim Kent, James Bonfield and Richard Durbin for their helpful discussions on general indexing techniques. ...
doi:10.1093/bioinformatics/btq671
pmid:21208982
pmcid:PMC3042176
fatcat:5pshpfozwnb75piffwhkpd7agq
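Tabix generalizes BAM indexing, so its region queries rest on the same binning scheme. The sketch below keeps the `reg2bin`/`reg2bins` logic as given in the SAM specification, for 0-based half-open coordinates below 2^29, but stores records in memory. It omits Tabix's linear index, BGZF compression and on-disk format. The `BinIndex` class is hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

# (shift, first bin id) for 16 kb, 128 kb, 1 Mb, 8 Mb and 64 Mb bins, as in the SAM spec.
LEVELS = ((14, 4681), (17, 585), (20, 73), (23, 9), (26, 1))


def reg2bin(beg: int, end: int) -> int:
    """Smallest bin fully containing the 0-based half-open interval [beg, end)."""
    end -= 1
    for shift, offset in LEVELS:
        if beg >> shift == end >> shift:
            return offset + (beg >> shift)
    return 0  # bin 0 spans the whole 512 Mb range


def reg2bins(beg: int, end: int) -> List[int]:
    """All bins that may hold features overlapping [beg, end)."""
    end -= 1
    bins = [0]
    for shift, offset in reversed(LEVELS):
        bins.extend(range(offset + (beg >> shift), offset + (end >> shift) + 1))
    return bins


class BinIndex:
    """Hypothetical in-memory index: records are filed by bin, then filtered by exact overlap."""

    def __init__(self) -> None:
        self.bins: Dict[int, List[Tuple[int, int, Any]]] = defaultdict(list)

    def add(self, beg: int, end: int, record: Any) -> None:
        self.bins[reg2bin(beg, end)].append((beg, end, record))

    def query(self, beg: int, end: int) -> List[Any]:
        hits = []
        for b in reg2bins(beg, end):
            for rbeg, rend, record in self.bins.get(b, ()):
                if rbeg < end and rend > beg:  # half-open overlap test
                    hits.append(record)
        return hits
```

Each feature is filed under the smallest bin that contains it. A query therefore scans only the few bins returned by `reg2bins` and then filters by exact overlap, rather than reading every record.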
The variant call format and VCFtools
2011
Bioinformatics
VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome. ...
The variant call format (VCF) is a generic format for storing DNA polymorphism data such as SNPs, insertions, deletions and structural variants, together with rich annotations. ...
Conflict of Interest: none declared. ...
doi:10.1093/bioinformatics/btr330
pmid:21653522
pmcid:PMC3137218
fatcat:bu6imoalw5hypbfua45gzlsnpy
TheSNPpit—A High Performance Database System for Managing Large Scale SNP Data
2016
PLoS ONE
TheSNPpit has implemented three ideas to also accommodate such large-scale experiments: highly compressed vector storage in a relational database, set-based data manipulation, and a very fast export written ...
The fast development of high throughput genotyping has opened up new possibilities in genetics while at the same time producing considerable data handling issues. ...
Discussion and Conclusions: TheSNPpit is a fast database system for storage and management of large volumes of SNP data. It can handle panels of any size, even those derived from whole genome scans. ...
doi:10.1371/journal.pone.0164043
pmid:27780248
pmcid:PMC5079601
fatcat:johokbs5vnddphcxw3qptptz3y
Relative Lempel-Ziv Compression of Genomes for Large-Scale Storage and Retrieval
[chapter]
2010
Lecture Notes in Computer Science
Self-indexes (data structures that simultaneously provide fast search of and access to compressed text) are promising for genomic data but in their usual form are not able to exploit the high level of ...
Our 'RLZ' approach is to store a self-index for a base sequence and then compress every other sequence as an LZ77 encoding relative to the base. ...
However, large resources must be shared among users of the compressed data (4.2 GB of reference and SNPs in Christley et al.'s software). ...
doi:10.1007/978-3-642-16321-0_20
fatcat:uo2k4572obdlfat5cx6qo7sasm
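The RLZ idea quoted above encodes each sequence as LZ77-style factors that point into a base sequence. A greedy parser illustrates it. This sketch is a simplification. It finds each longest match by brute force instead of using a self-index over the base, and it emits a literal when a character does not occur in the base at all. The function names are hypothetical.

```python
from typing import List, Tuple, Union

Factor = Union[Tuple[int, int], str]  # (position in base, length) or a literal character


def longest_match(base: str, target: str, i: int) -> Tuple[int, int]:
    """Brute-force search for the longest prefix of target[i:] occurring in base."""
    best_pos, best_len = 0, 0
    for p in range(len(base)):
        k = 0
        while p + k < len(base) and i + k < len(target) and base[p + k] == target[i + k]:
            k += 1
        if k > best_len:
            best_pos, best_len = p, k
    return best_pos, best_len


def rlz_encode(base: str, target: str) -> List[Factor]:
    """Greedily parse target into factors relative to base."""
    factors: List[Factor] = []
    i = 0
    while i < len(target):
        pos, length = longest_match(base, target, i)
        if length == 0:
            factors.append(target[i])  # character absent from base: store it literally
            i += 1
        else:
            factors.append((pos, length))
            i += length
    return factors


def rlz_decode(base: str, factors: List[Factor]) -> str:
    """Rebuild the target by copying substrings of base and inserting literals."""
    parts = []
    for f in factors:
        if isinstance(f, str):
            parts.append(f)
        else:
            pos, length = f
            parts.append(base[pos:pos + length])
    return "".join(parts)
```

Take base `ACGTACGTTT` and target `ACGTTTACGN`. The target parses into `[(4, 6), (0, 3), 'N']`: two copies from the base and one literal. On genome-sized bases the brute-force `longest_match` would have to be replaced by an indexed search over the base.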
De Novo NGS Data Compression
[chapter]
2017
Algorithms for Next-Generation Sequencing Data
Compression and decompression are performed as a standalone process, independently of external knowledge. ...
The chapter explains the main NGS compression techniques, including lossless and lossy compression. ...
Introduction: During the last decade, the fast evolution of sequencing technologies has led to an explosion of DNA data. Every field of life science is now affected. ...
doi:10.1007/978-3-319-59826-0_4
fatcat:pjctkfpul5cqvejdr5mtymfjm4
PanTools: representation, storage and exploration of pan-genomic data
2016
Bioinformatics
We define the pan-genome as a comprehensive representation of multiple annotated genomes, facilitating analyses on the similarity and divergence of the constituent genomes at the nucleotide, gene and genome ...
We demonstrate the performance of the tool using datasets of 62 E. coli genomes, 93 yeast genomes and 19 Arabidopsis thaliana genomes. ...
Acknowledgements: We thank Maria-Anna Misiakou and Salvador Casani Galdon for valuable input. ...
doi:10.1093/bioinformatics/btw455
pmid:27587666
fatcat:sa6pinj46rgitb7ofrlum4p2jq
A SUPER Powerful Method for Genome Wide Association Study
2014
PLoS ONE
This restriction potentially leads to less statistical power when compared to using all SNPs. We developed a method to extract a small subset of SNPs and use them in FaST-LMM. ...
This method not only retains the computational advantage of FaST-LMM, but also remarkably increases statistical power even when compared to using the entire set of SNPs. ...
Miller and Linda R. Klein for editing the manuscript.
Conceived and designed the experiments: ZZ YP ESB. Performed the experiments: QW FT. Analyzed the data: QW ZZ. ...
doi:10.1371/journal.pone.0107684
pmid:25247812
pmcid:PMC4172578
fatcat:eddmwp3f6raw5kqhhzm6m3zkde
SPSmart: adapting population based SNP genotype databases for fast and comprehensive web access
2008
BMC Bioinformatics
A fast pipeline creates and maintains a data mart from the most commonly accessed databases of genotypes containing population information: data is mined, summarized into the standard statistical reference ...
Results: We have developed a novel tool for accessing and combining large-scale genomic databases of single nucleotide polymorphisms (SNPs) in widespread use in human population genetics: SPSmart (SNPs ...
Thanks to Albert Vernon Smith, Lalitha Krishnan and Marcela K Tello-Ruiz of HapMap for their long-standing interest and support, and to Juan Villasuso and Natalia Costas of Centro de Supercomputación de ...
doi:10.1186/1471-2105-9-428
pmid:18847484
pmcid:PMC2576268
fatcat:nd3ijfze4bbjvkczoq3tkpd24i
Better quality score compression through sequence-based quality smoothing
2019
BMC Bioinformatics
We use the FM-Index, a type of compressed suffix array, to reduce the storage requirements of a dictionary of k-mers and an effective smoothing algorithm to maintain high precision for SNP calling pipelines ...
As a result, genomic data are growing exponentially, unfortunately without a matching exponential growth in storage, which makes compression necessary. ...
of YALFF's inner structure widely used for benchmarking in other papers, because the list of known SNPs is available and can be retrieved from ftp://ussd-ftp.illumina.com/2017-1.0/hg38.
doi:10.1186/s12859-019-2883-5
pmid:31757199
pmcid:PMC6873394
fatcat:ec6a5zsokrfbpdjcsfynxd4ysy
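The FM-index mentioned above can tell whether a k-mer occurs in the indexed text, and how often, by running backward search over the Burrows-Wheeler transform. The sketch below is minimal and uncompressed. It builds the suffix array naively, stores full occurrence tables instead of sampled ones, and assumes `$` as a sentinel smaller than every base. It does not reproduce YALFF's actual data layout, and the names `bwt_index` and `count` are hypothetical.

```python
from typing import Dict, List, Tuple

Index = Tuple[Dict[str, int], Dict[str, List[int]], int]


def bwt_index(text: str) -> Index:
    """Build C and occurrence tables for backward search (naive construction)."""
    text += "$"  # sentinel, lexicographically smaller than A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # text[-1] is '$' when i == 0
    alphabet = sorted(set(text))
    # C[c]: number of characters in text strictly smaller than c
    C: Dict[str, int] = {}
    total = 0
    for c in alphabet:
        C[c] = total
        total += text.count(c)
    # occ[c][i]: occurrences of c in bwt[:i]
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (ch == c)
    return C, occ, len(bwt)


def count(index: Index, pattern: str) -> int:
    """Number of occurrences of pattern, processing its characters right to left."""
    C, occ, n = index
    lo, hi = 0, n
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + occ[c][lo]
        hi = C[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo
```

On the text `ACGTACGTAC`, `count(idx, "ACG")` is 2 and `count(idx, "TTT")` is 0. A production FM-index samples the `occ` tables and stores the BWT compactly, which is where its small memory footprint comes from.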
SNPchiMp: a database to disentangle the SNPchip jungle in bovine livestock
2014
BMC Genomics
In addition, SNPchiMp can retrieve this information on subsets of SNPs, accessing such data either via physical position on a supported assembly, or by a list of SNP IDs, rs or ss identifiers. ...
Most researchers and breed associations manage SNP data in real-time and thus require tools to standardise data in a user-friendly manner. ...
Williams for his important feedback on the early version of this tool. ...
doi:10.1186/1471-2164-15-123
pmid:24517501
pmcid:PMC3923093
fatcat:pokwyijpyjebrepjr27wr3scfy
EnsMart: A Generic System for Fast and Flexible Access to Biological Data
2003
Genome Research
The EnsMart system (www.ensembl.org/EnsMart) provides a generic data warehousing solution for fast and flexible querying of large biological data sets and integration with third-party data and tools. ...
Both tabulated list data and biological sequence output can be generated dynamically, in HTML, text, Microsoft Excel, and compressed formats. ...
We thank the following for providing data sets: South African National Bioinformatics Institute (SANBI) and Electric Genetics, Genomics Institute of the Novartis Research Foundation (GNF), Affymetrix, ...
doi:10.1101/gr.1645104
pmid:14707178
pmcid:PMC314293
fatcat:u5twj5oxfncwjn2mkpicfx4hga