
Human gene name normalization using text matching with automatically extracted synonym dictionaries

Haw-ren Fang, Kevin Murphy, Yang Jin, Jessica S. Kim, Peter S. White
2006 Proceedings of the HLT-NAACL BioNLP Workshop on Linking Natural Language and Biology - LNLBioNLP '06   unpublished
The identification of genes in biomedical text typically consists of two stages: identifying gene mentions and normalization of gene names.  ...  The system identifies human gene synonyms from online databases to generate an extensive synonym lexicon.  ...  The authors acknowledge Shannon Davis and Jeremy Lautman for gene dictionary assessment, Steven Carroll for gene tagger implementation and results, Penn BioIE annotators for annotation of the gold standard  ... 
doi:10.3115/1654415.1654423 fatcat:gpzzssscgnejphbiottilgzlpm

Human gene name normalization using text matching with automatically extracted synonym dictionaries

Haw-ren Fang, Kevin Murphy, Yang Jin, Jessica S. Kim, Peter S. White
2006 Proceedings of the Workshop on Linking Natural Language Processing and Biology Towards Deeper Biological Literature Analysis - BioNLP '06   unpublished
The identification of genes in biomedical text typically consists of two stages: identifying gene mentions and normalization of gene names.  ...  The system identifies human gene synonyms from online databases to generate an extensive synonym lexicon.  ...  The authors acknowledge Shannon Davis and Jeremy Lautman for gene dictionary assessment, Steven Carroll for gene tagger implementation and results, Penn BioIE annotators for annotation of the gold standard  ... 
doi:10.3115/1567619.1567627 fatcat:3hzbaobfcfe2tpzb5736zkrupi

Overview of BioCreative II gene normalization

Alexander A Morgan, Zhiyong Lu, Xinglong Wang, Aaron M Cohen, Juliane Fluck, Patrick Ruch, Anna Divoli, Katrin Fundel, Robert Leaman, Jörg Hakenberg, Chengjie Sun, Heng-hui Liu (+8 others)
2008 Genome Biology  
We selected abstracts associated with articles previously curated for human genes.  ...  It is a challenging task, even for the human expert; genes are often described rather than referred to by gene symbol and, confusingly, one gene name may refer to different genes (often from different  ...  This article has been published as part of Genome Biology Volume 9 Supplement 2, 2008: The BioCreative II - Critical Assessment for Information Extraction in Biology Challenge.  ... 
doi:10.1186/gb-2008-9-s2-s3 pmid:18834494 pmcid:PMC2559987 fatcat:qxd4defmunckvn7pheotofs4aq

ProNormz – An integrated approach for human proteins and protein kinases normalization

Suresh Subramani, Kalpana Raja, Jeyakumar Natarajan
2014 Journal of Biomedical Informatics  
ProNormz incorporates a specialized synonyms dictionary for human proteins and protein kinases, a set of 15 string matching rules and a disambiguation module to achieve the normalization.  ...  The task of recognizing and normalizing protein name mentions in biomedical literature is a challenging task and important for text mining applications such as protein-protein interactions, pathway reconstruction  ...  The two preceding steps to complex biomedical text mining tasks are the automatic recognition of named entities such as genes/proteins names mention (GM) and their subsequent gene normalization (GN) to  ... 
doi:10.1016/j.jbi.2013.10.003 pmid:24144801 fatcat:jed2by6ibfcxbgcgocyuhd3uee

Moara: a Java library for extracting and normalizing gene and protein mentions

Mariana L Neves, José-María Carazo, Alberto Pascual-Montano
2010 BMC Bioinformatics  
Gene/protein recognition and normalization are important preliminary steps for many biological text mining tasks, such as information retrieval, protein-protein interactions, and extraction of semantic  ...  Moara can be used as a stand-alone application or can be incorporated in the workflow of a more general text mining system.  ...  Normalizing mentions by flexible matching Flexible matching is accomplished by exact matching between the mention extracted from the text and the synonyms in the dictionaries.  ... 
doi:10.1186/1471-2105-11-157 pmid:20346105 pmcid:PMC2851609 fatcat:hygmaaiburdnvpqg24npgux4sy

IBM Research and the University of Colorado - TREC 2003 Genomics Track

Eric W. Brown, Andrew Dolbey, Lawrence Hunter
2003 Text Retrieval Conference  
Conclusion Based on our results, we conclude that using a comprehensive gene dictionary with appropriate normalization during matching is an effective way to annotate gene mentions in biomedical text.  ...  Load gene dictionary; normalize for case, punctuation, etc.; annotate all synonyms with the canonical form Annotate Gene Mentions Load gene dictionary; normalize for case, punctuation, etc.; annotate  ... 
dblp:conf/trec/BrownDH03 fatcat:5ijtoxtrmbbkjdedthloeibi6a

What's in a gene name? Automated refinement of gene name dictionaries

Jörg Hakenberg
2007 Workshop on Biomedical Natural Language Processing  
Strategies for matching entries in a dictionary against arbitrary text use either inexact string matching that allows for known deviations, dictionaries enriched according to some observed rules, or a  ...  For instance, knowledge about words that are frequently missing in (or added to) a name ("antigen", "protein", "human") could automatically be extracted from dictionaries.  ...  The dictionary can directly be used for matching entries against text and covers 32,980 genes. The main Java classes are available on request from the authors.  ... 
dblp:conf/bionlp/Hakenberg07 fatcat:dcj5lyu6tnh7fp2uwrykohinlu

Gene and protein nomenclature in public databases

Katrin Fundel, Ralf Zimmer
2006 BMC Bioinformatics  
in terms of size of extracted dictionaries and overlap of synonyms between those.  ...  of all used names referring to a given gene or protein.  ...  A frequent approach consists in the compilation of large dictionaries of gene names that are subsequently used for matching text fragments to database identifiers.  ... 
doi:10.1186/1471-2105-7-372 pmid:16899134 pmcid:PMC1560172 fatcat:n56wp6k6ajdspchk2toalqbsyq

Normalizing biomedical terms by minimizing ambiguity and variability

Yoshimasa Tsuruoka, John McNaught, Sophia Ananiadou
2008 BMC Bioinformatics  
We evaluated our algorithm using two large dictionaries: a human gene/protein name dictionary built from BioThesaurus and a disease name dictionary built from UMLS.  ...  A less computationally demanding approach is to normalize the terms by using heuristic rules, which enables us to look up a dictionary in a constant time regardless of its size.  ...  Our thanks to the Rebholz Text Mining Group at EMBL-EBI, Hinxton, for domain expertise related to bio-resources.  ... 
doi:10.1186/1471-2105-9-s3-s2 pmid:18426547 pmcid:PMC2352870 fatcat:linka5335bhcnjft42qtnoztdm

Quantitative Assessment of Dictionary-based Protein Named Entity Tagging

H. Liu, Z.-Z. Hu, M. Torii, C. Wu, C. Friedman
2006 JAMIA Journal of the American Medical Informatics Association  
Terms obtained from the annotation fields comprised the Raw Dictionary. An automatic curation process was performed using the UMLS.  ...  A critical step for biological literature mining is biological named entity tagging (BNET) that identifies names mentioned in text and normalizes them with entries in biological databases [16, 17]. For example, in EDGAR [2], which extracted relationships between cancer-related drugs and genes from the literature  ... 
doi:10.1197/jamia.m2085 pmid:16799122 pmcid:PMC1561801 fatcat:slluxn6gknhx5hpvexewpnr47a

Soft tagging of overlapping high confidence gene mention variants for cross-species full-text gene normalization

Cheng-Ju Kuo, Maurice HT Ling, Chun-Nan Hsu
2011 BMC Bioinformatics  
and full-text gene normalization.  ...  Previously, gene normalization (GN) systems are mostly focused on disambiguation using contextual information.  ...  normalization task.  ... 
doi:10.1186/1471-2105-12-s8-s6 pmid:22152021 pmcid:PMC3269941 fatcat:5ubvyowfgveudhu74sxadmmft4

DISEASES: Text mining and data integration of disease–gene associations [article]

Sune Pletscher-Frankild, Albert Pallejà, Kalliopi Tsafou, Janos X Binder, Lars Juhl Jensen
2014 bioRxiv   pre-print
The system consists of a highly efficient dictionary-based tagger for named entity recognition of human genes and diseases, which we combine with a scoring scheme that takes into account co-occurrences  ...  Text mining is a flexible technology that can be applied to numerous different tasks in biology and medicine. We present a system for extracting disease–gene associations from biomedical abstracts.  ...  Dictionary-based methods instead rely, as the name suggests, on matching a dictionary of names against text.  ... 
doi:10.1101/008425 fatcat:psmdcpugzjbrxfuwr73cgvw2y4

DISEASES: Text mining and data integration of disease–gene associations

Sune Pletscher-Frankild, Albert Pallejà, Kalliopi Tsafou, Janos X. Binder, Lars Juhl Jensen
2015 Methods  
The system consists of a highly efficient dictionary-based tagger for named entity recognition of human genes and diseases, which we combine with a scoring scheme that takes into account co-occurrences  ...  Text mining is a flexible technology that can be applied to numerous different tasks in biology and medicine. We present a system for extracting disease-gene associations from biomedical abstracts.  ...  Dictionary-based methods instead rely, as the name suggests, on matching a dictionary of names against text.  ... 
doi:10.1016/j.ymeth.2014.11.020 pmid:25484339 fatcat:sb27mqklhzdu3jdp5pu5ug4spa

HPIminer: A text mining system for building and visualizing human protein interaction networks and pathways

Suresh Subramani, Raja Kalpana, Pankaj Moses Monickaraj, Jeyakumar Natarajan
2015 Journal of Biomedical Informatics  
HPIminer extracts human PPI information and PPI pairs from biomedical literature, and visualizes their associated interactions, networks and pathways using two curated databases, HPRD and KEGG.  ...  In this paper, we present HPIminer, a text mining system for visualizing human protein interactions and pathways from biomedical literature.  ...  Additionally, we utilized the curated human gene/protein names and synonyms dictionary constructed for normalization task which contains 33,580 human genes/proteins with their known synonyms [28].  ... 
doi:10.1016/j.jbi.2015.01.006 pmid:25659452 fatcat:crlnnsuoqzhwnfwxrfwikvbsba

LINNAEUS: A species name identification system for biomedical literature

Martin Gerner, Goran Nenadic, Casey M Bergman
2010 BMC Bioinformatics  
LINNAEUS uses a dictionary-based approach (implemented as an efficient deterministic finite-state automaton) to identify species names and a set of heuristics to resolve ambiguous mentions.  ...  The task of recognizing and identifying species names in biomedical literature has recently been regarded as critical for a number of applications in text and data mining, including gene name recognition  ...  Our testing of the system reveals that MapIT will map common names such as "human" to any species with a name or synonym that contains human, e.g.  ... 
doi:10.1186/1471-2105-11-85 pmid:20149233 pmcid:PMC2836304 fatcat:eru576q7u5befdadlzwjhw3mqi