NOBLE – Flexible concept recognition for large-scale biomedical natural language processing

Eugene Tseytlin, Kevin Mitchell, Elizabeth Legowski, Julia Corrigan, Girish Chavan, Rebecca S. Jacobson
2016 BMC Bioinformatics  
Fast Dictionary Lookup Annotator.  ...  We benchmarked speed and accuracy of the system against the CRAFT and ShARe corpora as reference standards and compared it to MMTx, MGrep, Concept Mapper, cTAKES Dictionary Lookup Annotator, and cTAKES  ...  We also thank Guergana Savova, Sean Finan, and the cTAKES team at Boston Children's Hospital for assistance in use of the cTAKES pipeline and dictionary annotators.  ... 
doi:10.1186/s12859-015-0871-y pmid:26763894 pmcid:PMC4712516 fatcat:kyhuk7x4ordsblw7imamiymbsi

PGx: Putting Peptides to BED

Manor Askenazi, Kelly V. Ruggles, David Fenyö
2015 Journal of Proteome Research  
To bring the resulting genomic, transcriptomic, and proteomic data sets into coherence, tools must be developed that do not constrain data acquisition and analytics in any way but rather provide simple  ...  Every molecular player in the cast of biology's central dogma is being sequenced and quantified with increasing ease and coverage.  ...  The dictionary is used to rapidly lookup and to retrieve all proteins that might contain an experimentally observed peptide based on the occurrence of its constituent 4-mers.  ... 
doi:10.1021/acs.jproteome.5b00870 pmid:26638927 pmcid:PMC4782174 fatcat:kv6zfpcdbnfrbkanxucashi75u

SeqRepo: A system for managing local collections of biological sequences

Reece K. Hart, Andreas Prlić, Ruslan Kalendar
2020 PLoS ONE  
SeqRepo provides fast random access to sequence slices.  ...  For example, a digest-based identifier may be used to refer to proprietary reference genomes or segments of a graph genome, for which conventional identifiers will not be available.  ...  A genomic slice can be retrieved in one query, resulting in a theoretical maximum throughput of 10 sequence slices/second. Server-side rate limiting also defeats client-side parallelism.  ... 
doi:10.1371/journal.pone.0239883 pmid:33270643 fatcat:nll22vjngfb2dklabpoj2jkdwq

Text analytics for life science using the Unstructured Information Management Architecture

R. Mack, S. Mukherjea, A. Soffer, N. Uramoto, E. Brown, A. Coden, J. Cooper, A. Inokuchi, B. Iyer, Y. Mass, H. Matsuzawa, L. V. Subramaniam
2004 IBM Systems Journal  
Acknowledgments BioTeKS is in large part a systems integration effort that builds on technologies and expertise developed  ...  Dictionary Lookup Assigns a lexical or semantic category to a string (e.g., "Trk A is a protein") and other lexical information General-purpose dictionary-based pattern-matching lookup.  ...  Dictionary lookup methods identify terms relative to ontology and dictionary resources, but in some cases, like MeSH (Medical Subject Headings), there is additional information for placing terms in a hierarchical  ... 
doi:10.1147/sj.433.0490 fatcat:altfinouzbdy7mrcqx2kzdzecy

How to Store a Random Walk [article]

Emanuele Viola, Omri Weinstein, Huacheng Yu
2019 arXiv   pre-print
the information-theoretic minimum space, while at the same time decoding each X_i in constant time.  ...  ∑_{i=1}^{n-1} lg(deg(v_i)), we present a data structure with O(1) extra bits at the price of O(lg n) decoding time, and show that any improvement on this would lead to an improved solution on the long-standing Dictionary  ...  To see why a vertex between two milestones V_i and V_{i+Θ(lg n)} can be retrieved efficiently, note that the DPT dictionary allows us to retrieve each symbol in the vector of milestones in constant time.  ... 
arXiv:1907.10874v1 fatcat:4r2jonfohnaz5ksrtzjmxremuq

Fast, Quantitative and Variant Enabled Mapping of Peptides to Genomes

Christoph N. Schlaffner, Georg J. Pirklbauer, Andreas Bender, Jyoti S. Choudhary
2017 Cell Systems  
other omics datasets are inadequate for large-scale studies and capture only basic sequence identity information.  ...  In addition, extended functionality enables representation of single-nucleotide variants, post-translational modifications, and quantitative features.  ...  Identifying Proteins of Origin for Peptides To allow fast lookup of proteins containing any given peptide PoGo creates a dictionary of words with length k (k-mer) overlapping by k-1 amino acids from the  ... 
doi:10.1016/j.cels.2017.07.007 pmid:28837811 pmcid:PMC5571441 fatcat:ylacbkl75bgojjhw6tmsigf3zu
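
The k-mer dictionary mentioned in the snippet above can be illustrated with a minimal sketch. This is not PoGo's actual code: the function names, the sample sequences, and the choice of k are assumptions made purely for illustration. It indexes every overlapping k-mer (consecutive k-mers share k-1 residues) and intersects the hits for a peptide's k-mers before confirming each candidate with an exact substring check.

```python
from collections import defaultdict


def build_kmer_index(proteins, k):
    """Map every overlapping k-mer to the set of protein ids containing it.

    `proteins` is a hypothetical {protein_id: sequence} mapping, e.g. parsed
    from a FASTA file.
    """
    index = defaultdict(set)
    for protein_id, sequence in proteins.items():
        # Consecutive windows overlap by k-1 residues.
        for start in range(len(sequence) - k + 1):
            index[sequence[start:start + k]].add(protein_id)
    return index


def proteins_containing(index, proteins, peptide, k):
    """Return the ids of proteins that contain `peptide` as an exact substring."""
    if len(peptide) < k:
        # Too short for the index: fall back to a linear scan.
        return {pid for pid, seq in proteins.items() if peptide in seq}
    kmers = [peptide[i:i + k] for i in range(len(peptide) - k + 1)]
    # A protein must contain every k-mer of the peptide to be a candidate.
    candidates = set.intersection(*(index.get(kmer, set()) for kmer in kmers))
    # Sharing all k-mers does not guarantee contiguity, so verify each hit.
    return {pid for pid in candidates if peptide in proteins[pid]}


# Made-up sequences, for illustration only.
proteins = {"P1": "MKTAYIAKQR", "P2": "MKTAYLLKQR"}
index = build_kmer_index(proteins, k=4)
print(proteins_containing(index, proteins, "TAYIAK", k=4))
```

The final substring check matters because the k-mers of a peptide can all occur in a protein without occurring next to each other.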

PoGo: Jumping from Peptides to Genomic Loci [article]

Christoph N. Schlaffner, Georg Pirklbauer, Andreas Bender, Jyoti S. Choudhary
2016 bioRxiv   pre-print
We developed PoGo for mapping peptides identified through mass spectrometry to a reference genome to overcome these limitations.  ...  Current tools for visualization and integration of proteomics with other omics datasets are inadequate for large-scale studies and capture only basic sequence identity information.  ...  To allow fast lookup of proteins containing any given peptide PoGo creates a dictionary of words with length k (k-mer) overlapping by k-1 amino acids from the protein sequences in the FASTA input  ... 
doi:10.1101/079772 fatcat:u4urrl6wfnd2rjvcgjb33ofqay

EPR-dictionaries: A practical and fast data structure for constant time searches in unidirectional and bidirectional FM-indices [article]

Christopher Pockrandt, Marcel Ehrhardt, Knut Reinert
2016 arXiv   pre-print
This is done by replacing the binary wavelet tree by a new data structure, the Enhanced Prefixsum Rank dictionary (EPR-dictionary).  ...  We introduce a new, practical method for conducting an exact search in a uni- and bidirectional FM index in O(1) time per step while using O(log σ * n) + o(log σ * σ * n) bits of space.  ...  Acknowledgments We would like to acknowledge Enrico Siragusa for his previous implementations of the FM index in SeqAn.  ... 
arXiv:1608.02413v2 fatcat:t32w7z4uxnfzlpfggmgxmdn44i

PSimScan: Algorithm and Utility for Fast Protein Similarity Search

Anna Kaznadzey, Natalia Alexandrova, Vladimir Novichkov, Denis Kaznadzey, Haixu Tang
2013 PLoS ONE  
The optimization starts at the lookup table construction, then the initial lookup table-based hits are passed through a pipeline of filtering and aggregation routines of increasing computational complexity  ...  Citation: Kaznadzey A, Alexandrova N, Novichkov V, Kaznadzey D (2013) PSimScan: Algorithm and Utility for Fast Protein Similarity Search. PLoS ONE 8(3): e58505.  ...  Niels Larsen of Danish Genome Institute for fruitful discussions on PSimScan requirements, use cases, test design and results. Author Contributions Algorithm design: VN DK.  ... 
doi:10.1371/journal.pone.0058505 pmid:23505522 pmcid:PMC3591303 fatcat:pf736teqrrg3jpx7dwsnllmmmy

SEA: The Small RNA Expression Atlas [article]

Raza-Ur Rahman, Abdul Sattar, Maksims Fiosins, Abhivyakti Gautam, Daniel Sumner Magruder, Joern Bethune, Sumit Madan, Juliane Fluck, Stefan Bonn
2017 bioRxiv   pre-print
We believe that SEA's simple interface and fast search in combination with its detailed interactive reports will enable researchers to better understand the potential function and diagnostic value of sRNAs  ...  SEA contains re-analyzed sRNA expression information for over 2000 published samples, including over 700 disease datasets and over 600 novel, high-quality predicted miRNAs.  ...  Genome versions will be updated with every major release of SEA. SEA will be backwards compatible in the future by allowing users to choose previous genome versions and annotations.  ... 
doi:10.1101/133199 fatcat:o2q67o6dcfgarlr4j4gswkkbqm

HIVE-Hexagon: High-Performance, Parallelized Sequence Alignment for Next-Generation Sequencing Data Analysis

Luis Santana-Quintero, Hayley Dingerdissen, Jean Thierry-Mieg, Raja Mazumder, Vahan Simonyan, Tom Gilbert
2014 PLoS ONE  
Inexact alignments can take up to 90% of total CPU time in bioinformatics pipelines.  ...  PhD, Associate Director for Research, Office of Vaccines Research and Review, FDA CBER, for providing a deep insight and great advice regarding the nature and biology of the next-generation sequencing, in  ...  Lookup Step For every short read, HIVE-hexagon retrieves the K-mers sequentially and matches them to a seed-dictionary to obtain the list of occurrences of each particular K-mer on a reference sequence  ... 
doi:10.1371/journal.pone.0099033 pmid:24918764 pmcid:PMC4053384 fatcat:xw3gg4d52vgfhn2u4eq5256g6y
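
The lookup step quoted in the snippet above can be sketched as follows. This is a generic seed-dictionary illustration, not HIVE-hexagon's implementation: the function names, the toy reference, and the value of k are all assumptions. Each k-mer of the reference is mapped to the list of positions where it occurs, and the k-mers of a read are then looked up one after another.

```python
from collections import defaultdict


def build_seed_dictionary(reference, k):
    """Map each k-mer of the reference to the sorted list of its start positions."""
    seeds = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        seeds[reference[pos:pos + k]].append(pos)
    return seeds


def lookup_read(seeds, read, k):
    """Look up the read's k-mers in order.

    Returns a list of (offset_in_read, reference_positions) pairs, skipping
    k-mers that do not occur in the reference.
    """
    hits = []
    for offset in range(len(read) - k + 1):
        positions = seeds.get(read[offset:offset + k], [])
        if positions:
            hits.append((offset, positions))
    return hits


# Toy reference and read, for illustration only.
reference = "ACGTACGTGA"
seeds = build_seed_dictionary(reference, k=4)
print(lookup_read(seeds, "TACGT", k=4))
```

In a real aligner these occurrence lists would only seed candidate locations; extending and scoring the alignment around each hit is a separate step not shown here.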

Identifying peripheral arterial disease cases using natural language processing of clinical notes

Naveed Afzal, Sunghwan Sohn, Sara Abram, Hongfang Liu, Iftikhar J. Kullo, Adelaide M. Arruda-Olson
2016 2016 IEEE-EMBS International Conference on Biomedical and Health Informatics (BHI)  
In this paper, we describe a natural language processing (NLP) algorithm for automated ascertainment of PAD status from clinical notes using predetermined criteria.  ...  Acknowledgments This study was supported by National Heart, Lung, and Blood Institute of the National Institutes of Health award number K01HL124045, the NHGRI's eMERGE (Electronic Records and Genomics)  ...  evidence from clinical notes For text processing, we used our in-house program MedTagger [29], an NLP pipeline with a fast dictionary lookup, to process clinical text and annotate clinical concepts.  ... 
doi:10.1109/bhi.2016.7455851 pmid:28111640 pmcid:PMC5248569 dblp:conf/bhi/AfzalSALKA16 fatcat:l5zhta53mnhpbbnesy6bp7d3xi

Selected abstracts of "Bioinformatics: from Algorithms to Applications 2020" conference

2020 BMC Bioinformatics  
P8 Installing and searching BLAST databases in a data science framework Graham Alvare 1 , Abiel Roche-Lima 2 , Brian Fristensky 3* Acknowledgments This research was supported by the Russian Science  ...  P10 Transcriptomic signatures of seed maturation heterochrony in garden pea (Pisum sativum L) accessions Acknowledgments This work was financially supported by the Russian Science Foundation (Grant No  ...  Second, we represent a partition as a sorted list of super-k-mers to ensure fast retrieval of k-mers.  ... 
doi:10.1186/s12859-020-03838-2 pmid:33327929 fatcat:2t65jee32rgsnohhdwd7vbwj74

Using Nanoinformatics Methods for Automatically Identifying Relevant Nanotoxicology Entities from the Literature

Miguel García-Remesal, Alejandro García-Ruiz, David Pérez-Rey, Diana de la Iglesia, Víctor Maojo
2013 BioMed Research International  
Nanoinformatics is an emerging research field that uses informatics techniques to collect, process, store, and retrieve data, information, and knowledge on nanoparticles, nanomaterials, and nanodevices  ...  This research is a "proof of concept" that can be expanded to stimulate further developments that could assist researchers in managing data, information, and knowledge at the nanolevel, thus accelerating  ...  methods from text mining, information retrieval, and how they perform with different information types.  ... 
doi:10.1155/2013/410294 pmid:23509721 pmcid:PMC3591181 fatcat:2wiewurvpvbqjmwwrr7nplijey

Full-text and Keyword Indexes for String Searching [article]

Aleksander Cisłak
2015 arXiv   pre-print
In our approach, the count table and the occurrence lists store information about selected q-grams in addition to the individual characters.  ...  Query times in the order of 1 microsecond were reported for one mismatch for a few-megabyte natural language dictionary on a medium-end PC.  ...  retrieval.  ... 
arXiv:1508.06610v1 fatcat:5pmce2d72veuxpw3s5u6hbidim