Tabix: fast retrieval of sequence features from generic TAB-delimited files

H. Li
2011 Bioinformatics  
Tabix is the first generic tool that indexes position sorted files in TAB-delimited formats such as GFF, BED, PSL, SAM and SQL export, and quickly retrieves features overlapping specified regions.  ...  Tabix features include few seek function calls per query, data compression with gzip compatibility and direct FTP/HTTP access.  ...  of direct FTP/HTTP access and Jim Kent, James Bonfield and Richard Durbin for their helpful discussions on general indexing techniques.  ... 
doi:10.1093/bioinformatics/btq671 pmid:21208982 pmcid:PMC3042176 fatcat:5pshpfozwnb75piffwhkpd7agq
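
This is a minimal sketch of the binning idea behind region queries. It assumes the UCSC/BAM-style hierarchical binning that tabix adopts and is for illustration only. Coordinates are 0-based half-open and everything sits in memory. The real index's linear index and BGZF virtual file offsets are omitted. Each feature goes into the smallest bin that contains it. A query visits only the bins that could hold overlapping features, then applies an exact overlap test.

```python
# Illustrative in-memory binning index (UCSC/BAM-style bins, as adopted by tabix).
# Assumptions: 0-based half-open [beg, end), a single sequence, no linear index,
# no BGZF offsets; records are plain (beg, end, payload) tuples.
from collections import defaultdict
from typing import List, Tuple

Record = Tuple[int, int, str]

def reg2bin(beg: int, end: int) -> int:
    """Smallest bin that fully contains [beg, end)."""
    end -= 1
    if beg >> 14 == end >> 14:
        return ((1 << 15) - 1) // 7 + (beg >> 14)
    if beg >> 17 == end >> 17:
        return ((1 << 12) - 1) // 7 + (beg >> 17)
    if beg >> 20 == end >> 20:
        return ((1 << 9) - 1) // 7 + (beg >> 20)
    if beg >> 23 == end >> 23:
        return ((1 << 6) - 1) // 7 + (beg >> 23)
    if beg >> 26 == end >> 26:
        return ((1 << 3) - 1) // 7 + (beg >> 26)
    return 0

def reg2bins(beg: int, end: int) -> List[int]:
    """Every bin that may hold a feature overlapping [beg, end)."""
    end -= 1
    bins = [0]
    for first_bin, shift in ((1, 26), (9, 23), (73, 20), (585, 17), (4681, 14)):
        bins.extend(range(first_bin + (beg >> shift), first_bin + (end >> shift) + 1))
    return bins

class BinIndex:
    def __init__(self, records: List[Record]):
        self.bins = defaultdict(list)
        for rec in records:
            self.bins[reg2bin(rec[0], rec[1])].append(rec)

    def query(self, beg: int, end: int) -> List[Record]:
        hits = []
        for b in reg2bins(beg, end):
            for rec in self.bins.get(b, ()):
                if rec[0] < end and rec[1] > beg:  # exact overlap test on candidates
                    hits.append(rec)
        return sorted(hits)
```

A query touches only a few bins per level, so few candidate records are examined even for whole-genome files. Tabix pairs bins with file offsets, which is how it keeps the number of seeks per query small.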

The Biological Reference Repository (BioR): a rapid and flexible system for genomics annotation

Jean-Pierre A. Kocher, Daniel J. Quest, Patrick Duffy, Michael A. Meiners, Raymond M. Moore, David Rider, Asif Hossain, Steven N. Hart, Valentin Dinu
2014 Bioinformatics  
The BioR toolkit provides the functionality to combine and retrieve annotation from these catalogs via the command-line interface.  ...  Commands from the toolkit can be combined with other UNIX commands for advanced annotation processing. We also provide instructions for the development of custom annotation pipelines.  ...  ACKNOWLEDGEMENT The authors thank the Center for Individualized Medicine at Mayo Clinic for funding the development of BioR. Conflict of Interest: none declared.  ... 
doi:10.1093/bioinformatics/btu137 pmid:24618464 pmcid:PMC4071205 fatcat:x6dcfudyynfa5pqqnj4ba3sle4

The variant call format and VCFtools

P. Danecek, A. Auton, G. Abecasis, C. A. Albers, E. Banks, M. A. DePristo, R. E. Handsaker, G. Lunter, G. T. Marth, S. T. Sherry, G. McVean, R. Durbin
2011 Bioinformatics  
VCF is usually stored in a compressed manner and can be indexed for fast data retrieval of variants from a range of positions on the reference genome.  ...  VCFtools is a software suite that implements various utilities for processing VCF files, including validation, merging, comparing and also provides a general Perl API. Availability:  ...  Conflict of Interest: none declared.  ... 
doi:10.1093/bioinformatics/btr330 pmid:21653522 pmcid:PMC3137218 fatcat:bu6imoalw5hypbfua45gzlsnpy
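
VCF data lines carry eight fixed, tab-separated columns (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO) with 1-based positions. The sketch below parses those columns and keeps variants in an inclusive region. It is an illustration, not VCFtools' Perl API, and it scans every line. With a bgzip-compressed, tabix-indexed file, the same region can instead be fetched without reading the whole file.

```python
# Sketch: parse VCF data lines and keep variants inside a 1-based, inclusive region.
from typing import Iterable, Iterator

def parse_info(field: str) -> dict:
    """Split INFO into key/value pairs; flags without '=' map to True."""
    if field == ".":
        return {}
    info = {}
    for item in field.split(";"):
        key, sep, value = item.partition("=")
        info[key] = value if sep else True
    return info

def variants_in_region(lines: Iterable[str], chrom: str, start: int, end: int) -> Iterator[dict]:
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and the header line
        cols = line.rstrip("\n").split("\t")
        pos = int(cols[1])
        if cols[0] == chrom and start <= pos <= end:
            yield {"chrom": cols[0], "pos": pos, "id": cols[2], "ref": cols[3],
                   "alt": cols[4].split(","), "info": parse_info(cols[7])}
```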

GORpipe: a query tool for working with sequence data based on a Genomic Ordered Relational (GOR) architecture

Hákon Guðbjartsson, Guðmundur Fr. Georgsson, Sigurjón A. Guðjónsson, Ragnar þór Valdimarsson, Jóhann H. Sigurðsson, Sigmar K. Stefánsson, Gísli Másson, Gísli Magnússon, Vilmundur Pálmason, Kári Stefánsson
2016 Bioinformatics  
Motivation: Our aim was to create a general-purpose relational data format and analysis tools to provide an efficient and coherent framework for working with large volumes of DNA sequence data.  ...  The system can for instance be used to annotate sequence variants, find genomic spatial overlap between various types of genomic features, filter and aggregate them in various ways.  ...  In the rest of this paper we introduce the GOR architecture, briefly describe our tab-delimited storage format and explain how other genomic ordered tabular formats can be used with our system.  ... 
doi:10.1093/bioinformatics/btw199 pmid:27339714 pmcid:PMC5048061 fatcat:rxxgwarcorgidb2fo3go5chzly
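
Ordering every table by genomic position is what makes spatial overlap cheap: two sorted inputs can be joined in a single forward pass. The following is a generic sweep-line merge join, not GORpipe's implementation. It assumes both lists are sorted by (chrom, start) with plain string order for chromosome names, and that intervals are 0-based half-open.

```python
# Generic sweep-line overlap join over two position-sorted interval lists.
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open

def overlap_join(left: List[Interval], right: List[Interval]) -> List[Tuple[Interval, Interval]]:
    """Return every overlapping (left, right) pair in one pass over both inputs."""
    out = []
    active: List[Interval] = []  # right intervals that may still overlap later left intervals
    j = 0
    for l in left:
        lchrom, lstart, lend = l
        # Admit right intervals that start before this left interval ends.
        while j < len(right) and (right[j][0], right[j][1]) < (lchrom, lend):
            active.append(right[j])
            j += 1
        # Left starts never decrease, so intervals ending at or before lstart are done.
        active = [r for r in active if r[0] == lchrom and r[2] > lstart]
        out.extend((l, r) for r in active if r[1] < lend)
    return out
```

Memory stays proportional to the number of simultaneously open intervals rather than to file size, which is the usual motivation for ordered formats.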

ClinVar data parsing

Xiaolei Zhang, Eric V. Minikel, Anne H. O'Donnell-Luria, Daniel G. MacArthur, James S. Ware, Ben Weisburd
2017 Wellcome Open Research  
Li H: Tabix: fast retrieval of sequence features from generic TAB-delimited files. Bioinformatics. 2011; 27(5): 718-9.  ...  This software repository provides a pipeline for converting raw ClinVar data files into analysis-friendly tab-delimited tables, and also provides these tables for the most recent ClinVar release.  ...  Join the TXT file to aggregate the clinical significances from multiple submitters and generate VCF files. • Join with ExAC or gnomAD data and generate table files.  ... 
doi:10.12688/wellcomeopenres.11640.1 pmid:28630944 pmcid:PMC5473414 fatcat:ahkz2duzebdtnoeaksockxplc4
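
Aggregating clinical significances from multiple submitters amounts to grouping rows by variant and collecting their labels. This is a sketch under stated assumptions. The column names (chrom, pos, ref, alt, clinical_significance) and the semicolon-joined output are hypothetical and do not reflect the pipeline's actual schema.

```python
# Sketch: merge per-submitter ClinVar-style rows into one label set per variant.
# Column names and the ';' output convention are hypothetical.
import csv
from collections import defaultdict
from typing import Dict, Iterable, Tuple

VariantKey = Tuple[str, int, str, str]

def aggregate_significance(lines: Iterable[str]) -> Dict[VariantKey, str]:
    """Group tab-delimited rows by (chrom, pos, ref, alt) and merge submitter labels."""
    grouped = defaultdict(set)
    for row in csv.DictReader(lines, delimiter="\t"):
        key = (row["chrom"], int(row["pos"]), row["ref"], row["alt"])
        grouped[key].add(row["clinical_significance"].strip())
    # Distinct labels, sorted for a stable output, joined by ';'
    return {key: ";".join(sorted(labels)) for key, labels in grouped.items()}
```

For a file on disk, pass the open handle directly, e.g. `with open(path, newline="") as fh: aggregate_significance(fh)`.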

HTSlib: C library for reading/writing high-throughput sequencing data

James K Bonfield, John Marshall, Petr Danecek, Heng Li, Valeriu Ohan, Andrew Whitwham, Thomas Keane, Robert M Davies
2021 GigaScience  
Considerable improvements have been made to the original code plus many new features including newer access protocols, the addition of the CRAM file format, better indexing and iterators, and better use  ...  Since the original publication of the VCF and SAM formats, an explosion of software tools have been created to process these data files.  ...  HTSlib includes 2 standalone programs that work with BGZF; bgzip is a general-purpose compression tool while tabix works on tab-delimited genome coordinate files (e.g., BED and GFF) and provides indexing  ... 
doi:10.1093/gigascience/giab007 pmid:33594436 pmcid:PMC7931820 fatcat:sxbk4myxvzajbl2zg27vf3jqzu
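
BGZF, the format bgzip writes, is a series of independent gzip members. Each member carries a "BC" extra subfield that records the compressed block size, and the stream ends with an empty block. Ordinary gzip tools can therefore still decompress the file, while an index can jump straight to any block. The encoder below is an illustrative reimplementation using only Python's standard library. It assumes a 0xFF00-byte payload limit per block; real pipelines should use bgzip or HTSlib itself.

```python
# Illustrative BGZF writer: gzip members with a 'BC' extra subfield plus an empty EOF block.
import gzip  # used only to show that standard gzip readers accept the output
import struct
import zlib

BLOCK_PAYLOAD = 0xFF00  # assumed uncompressed bytes per block, keeping blocks under 64 KiB

def bgzf_block(data: bytes) -> bytes:
    """Encode one BGZF block; BSIZE stores the total block length minus one."""
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)  # raw DEFLATE, no zlib wrapper
    cdata = comp.compress(data) + comp.flush()
    block_size = 12 + 6 + len(cdata) + 8  # gzip header with XLEN, extra subfield, data, CRC32/ISIZE
    header = struct.pack("<BBBBIBBH", 0x1F, 0x8B, 8, 4, 0, 0, 0xFF, 6)  # FLG.FEXTRA set, XLEN=6
    extra = struct.pack("<BBHH", ord("B"), ord("C"), 2, block_size - 1)
    trailer = struct.pack("<II", zlib.crc32(data), len(data))
    return header + extra + cdata + trailer

def bgzf_compress(data: bytes) -> bytes:
    blocks = [bgzf_block(data[i:i + BLOCK_PAYLOAD]) for i in range(0, len(data), BLOCK_PAYLOAD)]
    blocks.append(bgzf_block(b""))  # empty block acts as the end-of-file marker
    return b"".join(blocks)
```

Because every block is a complete gzip member, `gzip.decompress(bgzf_compress(data))` returns the original bytes. An index only needs a block's file offset plus an offset within its uncompressed payload to resume reading anywhere.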

HTSlib - C library for reading/writing high-throughput sequencing data [article]

James K Bonfield, John Marshall, Petr Danecek, Heng Li, Valeriu Ohan, Andrew Whitwham, Thomas Keane, Robert M Davies
2020 bioRxiv pre-print
Considerable improvements have been made to the original code plus many new features including newer access protocols, the addition of the CRAM file format, better indexing and iterators, and better use  ...  Since the original publication of the VCF and SAM formats, an explosion of software tools have been created to process these data files.  ...  HTSlib includes two standalone programs that work with BGZF; bgzip is a general purpose compression tool while tabix works on tab delimited genome coordinate files (e.g.  ... 
doi:10.1101/2020.12.16.423064 fatcat:6e6lbp36zral7kpezgkhqrc3l4

FEATnotator: A tool for integrated annotation of sequence features and variation, facilitating interpretation in genomics experiments

Ram Podicheti, Keithanne Mockaitis
2015 Methods  
Association of genomic positional information, such as results from an expansive variety of next-generation sequencing experiments, with annotated reference features such as genes or predicted protein  ...  When the experimental system includes polymorphic genomic inputs, rapid calculation of gene structural and protein translational effects of sequence variation from the reference can be invaluable.  ...  inputs for FEATnotator are tab delimited text files in which each row represents a single locus.  ... 
doi:10.1016/j.ymeth.2015.04.028 pmid:25934264 fatcat:5bdztvdijjaufpgbxwc5nj5xge

The Pancreatic Islet Regulome Browser

Loris Mularoni, Mireia Ramos-Rodríguez, Lorenzo Pasquali
2017 Frontiers in Genetics  
We herein present the Islet Regulome Browser, a tool that allows fast access and exploration of pancreatic islet epigenomic and transcriptomic data produced by different labs worldwide.  ...  of the non-coding genome.  ...  We would also like to thank Iñaki Martinez, System Administrator at the Program for Predictive and Personalized Medicine of Cancer at the Institute Germans Trias i Pujol (IGPT) Bioinformatics Core, for  ... 
doi:10.3389/fgene.2017.00013 pmid:28261261 pmcid:PMC5306130 fatcat:ramx2joutrco3kumparlxzkgea

VAS: a convenient web portal for efficient integration of genomic features with millions of genetic variants

Eric Ho, Qin Cao, Sau Lee, Kevin Y Yip
2014 BMC Genomics  
Conclusions: VAS is specially designed to handle annotation tasks with long lists of genetic variants and large numbers of annotating features efficiently.  ...  High-throughput experimental methods have fostered the systematic detection of millions of genetic variants from any human genome.  ...  The integration results are stored in a tab-delimited file. The user will then be shown a summary page of the integration results.  ... 
doi:10.1186/1471-2164-15-886 pmid:25306238 pmcid:PMC4210471 fatcat:wflio4bi6nhdpnfwu7aqxeycle

VIVA (VIsualization of VAriants): A VCF file visualization tool [article]

George A. Tollefson, Jessica Schuster, Fernando Gelin, Ashok Ragavendran, Isabel Restrepo, Paul Stey, James Padbury, Alper Uzun
2019 bioRxiv pre-print
The volume and pace of data accumulation from high-throughput sequencing studies have been amplified by recent rapid technological advances in biological sciences.  ...  Visualization of genomic data is essential for quality control, exploration, and interpretation.  ...  Since next generation sequencing is becoming increasingly accessible to researchers and clinicians, the ability to easily retrieve and visualize genomic data from VCF files is needed.  ... 
doi:10.1101/589879 fatcat:2amqyvoq6ra3ffoklwelfzkvye

FASTAFS: file system virtualisation of random access compressed FASTA files [article]

Youri Hoogstrate, Guido Jenster, Harmen JG van de Werken
2020 bioRxiv pre-print
The relatively large files require additional files beyond the scope of the original format, to identify sequences and provide random access.  ...  This guarantees in-sync virtualised metadata files and offers fast random-access decompression using Zstandard (zstd).  ...  Li, "Tabix: fast retrieval of sequence features from generic TAB-delimited files," Bioinformatics, vol. 27, no. 5, pp. 718-719, 2011, doi: 10.1093/bioinformatics/btq671. [20] C.  ... 
doi:10.1101/2020.11.11.377689 fatcat:4es44ocbanhhnko3nt4ll4o4ya
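
For FASTA, the "additional files" that provide random access are typically a samtools-style .fai index. Its columns record each sequence's length, the byte offset of its first base, bases per line, and bytes per line. From those numbers, the byte position of any 0-based base is offset + (pos // line_bases) * line_width + pos % line_bases, so a subsequence can be read with one seek. The minimal sketch below assumes uncompressed FASTA with '\n' line endings and a uniform line width within each record. The Zstandard-compressed storage that FASTAFS adds is not modelled here.

```python
# Sketch of .fai-style random access into an uncompressed FASTA ('\n' line endings assumed).
import io
from typing import Dict, NamedTuple

class FaiEntry(NamedTuple):
    length: int      # sequence length in bases
    offset: int      # byte offset of the first base
    line_bases: int  # bases per full line
    line_width: int  # bytes per full line, including the newline

def build_fai(fasta: bytes) -> Dict[str, FaiEntry]:
    index = {}
    pos = 0
    name = None
    for line in fasta.splitlines(keepends=True):
        if line.startswith(b">"):
            name = line[1:].split()[0].decode()  # sequence name is the first word
            index[name] = [0, pos + len(line), 0, 0]
        elif name is not None:
            entry = index[name]
            bases = len(line.rstrip(b"\n"))
            if entry[2] == 0:  # first sequence line fixes the layout
                entry[2], entry[3] = bases, len(line)
            entry[0] += bases
        pos += len(line)
    return {k: FaiEntry(*v) for k, v in index.items()}

def fetch(handle: io.BufferedIOBase, entry: FaiEntry, start: int, end: int) -> str:
    """Return bases [start, end) by seeking straight to the first byte needed."""
    end = min(end, entry.length)
    first = entry.offset + (start // entry.line_bases) * entry.line_width + start % entry.line_bases
    last = entry.offset + (end // entry.line_bases) * entry.line_width + end % entry.line_bases
    handle.seek(first)
    return handle.read(last - first).replace(b"\n", b"").decode()
```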

Vcfanno: fast, flexible annotation of genetic variants [article]

Brent Pedersen, Ryan Layer, Aaron Quinlan
2016 bioRxiv pre-print
Vcfanno can extract and summarize multiple attributes from one or more annotation files and append the resulting annotations to the INFO field of the original VCF file.  ...  However, comprehensive variant annotation with diverse file formats is difficult with existing methods.Results: We have developed vcfanno as a flexible toolset that simplifies the annotation of genetic  ...  A simple configuration file is used to specify both the source files and the set of attributes (in the case of VCF) or columns (in the case of BED or other tab-delimited) that should be added to the query  ... 
doi:10.1101/041863 fatcat:aqnglywf4rhctkwhk4efy2uj7y

Security Provisioning and Compression of Diverse Genomic Data based on Advanced Encryption Standard (AES) Algorithm

Raveendra Gudodagi, R. Venkata Siva Reddy
2021 International Journal of Biology and Biomedical Engineering  
The paper discusses sequenced DNA, which may take the form of raw data obtained from sequencing.  ...  One of the main issues faced by genomic laboratories is the 'cost of storage' due to the large data file of the human genome (ranging from 30 GB to 200 GB).  ...  The SAM Format is a text format used in a series of ASCII columns delimited by tab to store the sequence data.  ... 
doi:10.46300/91011.2021.15.14 fatcat:qwaxau5ia5bsnns53unacgc7wq

Vcfanno: fast, flexible annotation of genetic variants

Brent S. Pedersen, Ryan M. Layer, Aaron R. Quinlan
2016 Genome Biology  
Here we describe vcfanno, which flexibly extracts and summarizes attributes from multiple annotation files and integrates the annotations within the INFO column of the original VCF file.  ...  The integration of genome annotations is critical to the identification of genetic variants that are relevant to studies of disease or other traits.  ...  A simple configuration file is used to specify both the source files and the set of attributes (in the case of VCF) or columns (in the case of BED or other tab-delimited formats) that should be added to  ... 
doi:10.1186/s13059-016-0973-5 pmid:27250555 pmcid:PMC4888505 fatcat:3dxgarc47zdmhpqrsoksmhleia
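
Both vcfanno records describe the same core step: reduce the values found in overlapping annotation records and append the result to the INFO column. The sketch below illustrates that step. The operation names in OPS and the input structure are hypothetical, vcfanno's configuration file and its own set of operations are not reproduced, and overlap detection is assumed to have already happened.

```python
# Sketch: reduce collected annotation values and append NAME=value pairs to INFO.
# OPS and the annotations structure are hypothetical, not vcfanno's configuration.
from statistics import mean
from typing import Callable, Dict, List, Tuple

OPS: Dict[str, Callable[[List[str]], str]] = {
    "first": lambda vals: vals[0],
    "concat": lambda vals: ",".join(vals),
    "max": lambda vals: str(max(float(v) for v in vals)),
    "mean": lambda vals: str(mean(float(v) for v in vals)),
}

def annotate_info(vcf_line: str, annotations: Dict[str, Tuple[str, List[str]]]) -> str:
    """annotations maps an output NAME to (op, values from overlapping records)."""
    cols = vcf_line.rstrip("\n").split("\t")
    extra = [f"{name}={OPS[op](vals)}" for name, (op, vals) in annotations.items() if vals]
    if extra:  # names with no overlapping values are skipped
        info = [] if cols[7] == "." else [cols[7]]
        cols[7] = ";".join(info + extra)
    return "\t".join(cols)
```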
Showing results 1–15 out of 37