Towards a Mixed Approach to Extract Biomedical Terms from Text Corpus

Juan Antonio Lossio Ventura, Clement Jonquet, Mathieu Roche, Maguelonne Teisseire
2014 International Journal of Knowledge Discovery in Bioinformatics  
The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.  ...  [formula garbled in extraction; recoverable terms: Okapi, TFIDF]  ...  HAL Id:  ... 
doi:10.4018/ijkdb.2014010101 fatcat:2qlcf6cbuba47bxkalf4ehhvn4

Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing [article]

Yu Gu, Robert Tinn, Hao Cheng, Michael Lucas, Naoto Usuyama, Xiaodong Liu, Tristan Naumann, Jianfeng Gao, Hoifung Poon
2021 arXiv   pre-print
To facilitate this investigation, we compile a comprehensive biomedical NLP benchmark from publicly-available datasets.  ...  Our experiments show that domain-specific pretraining serves as a solid foundation for a wide range of biomedical NLP tasks, leading to new state-of-the-art results across the board.  ...  However, from the perspective of biomedical applications, SciBERT still adopts the mixed-domain pretraining approach, as computer science text is clearly out-domain.  ... 
arXiv:2007.15779v5 fatcat:emddce6qdzgmdmsboyrza27564

Protein Named Entity Identification Based on Probabilistic Features Derived from GENIA Corpus and Medical Text on the Web

Sagara Sumathipala, Koichi Yamada, Muneyuki Unehara, Izumi Suzuki
2015 International Journal of Fuzzy Logic and Intelligent Systems  
We present a robust and effective approach to classify biomedical named entities into protein and non-protein classes, based on a rich set of features: orthographic, keyword, morphological and newly introduced  ...  Protein named entity identification is one of the most essential and fundamental predecessor for extracting information about protein-protein interactions from biomedical literature.  ...  Introduction The evolution of biomedical text produces a strong demand for automated text mining techniques that can facilitate biomedical researchers to gather and make use of the knowledge in biomedical  ... 
doi:10.5391/ijfis.2015.15.2.111 fatcat:bjl3pfodv5aptdbrbmf57lpb3y

Biomedical and Clinical Language Models for Spanish: On the Benefits of Domain-Specific Pretraining in a Mid-Resource Scenario [article]

Casimiro Pio Carrino, Jordi Armengol-Estapé, Asier Gutiérrez-Fandiño, Joan Llop-Palao, Marc Pàmies, Aitor Gonzalez-Agirre, Marta Villegas
2021 arXiv   pre-print
Interestingly, in the absence of enough clinical data to train a model from scratch, we applied mixed-domain pretraining and cross-domain transfer approaches to generate a performant bio-clinical model  ...  To the best of our knowledge, we provide the first biomedical and clinical transformer-based pretrained language models for Spanish, intending to boost native Spanish NLP applications in biomedicine.  ...  We show that biomedical models exhibit a remarkable cross-domain transfer ability on the clinical domain.  ... 
arXiv:2109.03570v2 fatcat:kxnedqd43ng7hnedtjfdmrgpmm

A Novel Text-Mining Approach for Retrieving Pharmacogenomics Associations From the Literature

Maria-Theodora Pandi, Peter J. van der Spek, Maria Koromina, George P. Patrinos
2020 Frontiers in Pharmacology  
In this study, we describe a novel text-mining approach for the extraction of pharmacogenomics associations.  ...  Articles (abstracts or full texts) that correspond to a specified query were extracted from PubMed, while concept annotations were derived by PubTator Central.  ...  As previously shown, text mining has become a widely used approach for the identification and extraction of information from unstructured text (Westergaard et al., 2018).  ... 
doi:10.3389/fphar.2020.602030 pmid:33343371 pmcid:PMC7748107 fatcat:xj4weej36zallen7hmqucbje3q

Effectiveness of Lexico-syntactic Pattern Matching for Ontology Enrichment with Clinical Documents

K. Liu, W. W. Chapman, G. Savova, C. G. Chute, N. Sioutos, R. S. Crowley
2010 Methods of Information in Medicine  
In the first step, domain experts annotated Medically Meaningful Terms (MMTs) from each sentence within the LSP.  ...  From this set, we randomly sampled LSP instances which were examined by human judges. We used a two-step method to determine the utility of these patterns for enrichment.  ...  Acknowledgments The authors wish to thank Karma Lisa Edwards and Lucy Cafeo of the University of Pittsburgh for editorial assistance, and Kevin Mitchell for expert technical help. We thank Dr.  ... 
doi:10.3414/me10-01-0020 pmid:21057720 pmcid:PMC3125434 fatcat:bdow3usvu5hvvph73qcnrh54p4

Frontiers of biomedical text mining: current progress

P. Zweigenbaum, D. Demner-Fushman, H. Yu, K. B. Cohen
2007 Briefings in Bioinformatics  
However, a number of problems at the frontiers of biomedical text mining continue to present interesting challenges and opportunities for great improvements and interesting research.  ...  In this article we review the current state of the art in biomedical text mining or 'BioNLP' in general, focusing primarily on papers published within the past year.  ...  KBC was supported by NIH grants 'Construction of a Full Text Corpus for Biomedical Text Mining' (#1G08LM009639-01) and 'Technology Development for a Molecular Biology Knowledge-base' (#5R01 LM008111-03  ... 
doi:10.1093/bib/bbm045 pmid:17977867 pmcid:PMC2516302 fatcat:4nfbokb7lfdjbnkijw2v6754qy

An Examination of the Statistical Laws of Semantic Change in Clinical Notes

Kevin J Peterson, Hongfang Liu
2021 AMIA Annual Symposium Proceedings  
We also find that domain-specific biomedical terms change faster compared to general English words.  ...  We also explore a new facet of change: whether domain-specific clinical terms exhibit different change patterns compared to general-purpose English.  ...  QuickUMLS [38], a concept extraction tool built on the UMLS, was used to detect whether a word was a biomedical term.  ... 
pmid:34457167 pmcid:PMC8378619 fatcat:zhsmwfw4ynfgpio6zgouotmdhi

Extracting information from textual documents in the electronic health record: a review of recent research

S M Meystre, G K Savova, K C Kipper-Schuler, J F Hurdle
2008 IMIA Yearbook of Medical Informatics  
to stimulate advances in this field and to increase the acceptance and usage of these systems in concrete clinical and biomedical research contexts.  ...  Competitive challenges for information extraction from clinical text, along with the availability of annotated clinical text corpora, and further improvements in system performance are important factors  ...  Extracting Codes from Clinical Text A popular approach in the literature over the last several years has been to use NLP to extract codes mapped to controlled sources from text.  ... 
pmid:18660887 fatcat:ckd5m65lefarfcgtvzumblyd5u

Extracting Information from Textual Documents in the Electronic Health Record: A Review of Recent Research

G. K. Savova, K. C. Kipper-Schuler, J. F. Hurdle, S. M. Meystre
2008 IMIA Yearbook of Medical Informatics  
to stimulate advances in this field and to increase the acceptance and usage of these systems in concrete clinical and biomedical research contexts.  ...  Competitive challenges for information extraction from clinical text, along with the availability of annotated clinical text corpora, and further improvements in system performance are important factors  ...  Extracting Codes from Clinical Text A popular approach in the literature over the last several years has been to use NLP to extract codes mapped to controlled sources from text.  ... 
doi:10.1055/s-0038-1638592 fatcat:pwgvfjuubvcedm46ubc36leg7m

Biomedical text mining for research rigor and integrity: tasks, challenges, directions

Halil Kilicoglu
2017 Briefings in Bioinformatics  
With the exponential increase in biomedical research output and the ability of text mining approaches to perform automatic tasks at large scale, we propose that such approaches can support tools that promote  ...  In this article, we pose the question of whether biomedical text mining techniques can assist the stakeholders in the biomedical research enterprise in doing their part toward enhancing research integrity  ...  Furthermore, biomedical abstracts differ from full text in terms of structure and content [145].  ... 
doi:10.1093/bib/bbx057 pmid:28633401 fatcat:va4d3u6zzjbpnfptseb23tnv7y

PMCVec: Distributed Phrase Representation for Biomedical Text Processing

Zelalem Gero, Joyce Ho
2019 Journal of Biomedical Informatics: X  
A new unsupervised method has been proposed to collect over 700,000 common phrases that may be useful for biomedical NLP from PubMed articles [20].  ...  Learning a distributed phrase and word embeddings have been shown to be effective on a general, non-domain specific corpus [26]. Yet, one of the key challenges is to identify useful phrases.  ...  On the other hand, the top 10 phrases using Info_Freq contain a good mix of long and short phrases that are biomedical-relevant terms.  ... 
doi:10.1016/j.yjbinx.2019.100047 fatcat:puejryvlavaivaxrlovkmsculi

Unsupervised biomedical named entity recognition: Experiments with clinical and biological texts

Shaodian Zhang, Noémie Elhadad
2013 Journal of Biomedical Informatics  
In this paper, we propose an unsupervised approach to extracting named entities from biomedical text.  ...  A noun phrase chunker followed by a filter based on inverse document frequency extracts candidate entities from free text.  ...  MedLEE is a general natural language processor for clinical texts, encoding and mapping terms to a controlled vocabulary [1]; GENIES is a system extracting molecular pathways from journal articles, which  ... 
doi:10.1016/j.jbi.2013.08.004 pmid:23954592 pmcid:PMC3865922 fatcat:m2p7zurdanhshgy6u5qtw6hb6u

Auto-CORPus: A Natural Language Processing Tool for Standardizing and Reusing Biomedical Literature

Tim Beck, Tom Shorter, Yan Hu, Zhuoyu Li, Shujian Sun, Casiana M. Popovici, Nicholas A. R. McQuibban, Filip Makraduli, Cheng S. Yeung, Thomas Rowlands, Joram M. Posma
2022 Frontiers in Digital Health  
convenient machine-interpretable outputs to support biomedical text analytics.  ...  The BioC format is a community-driven simple data structure for sharing text and annotations, however there is limited access to biomedical literature in BioC format and a lack of bioinformatics tools  ...  Several algorithms have been developed to extract abbreviations and their definitions from biomedical text (9–11).  ... 
doi:10.3389/fdgth.2022.788124 pmid:35243479 pmcid:PMC8885717 fatcat:fx5juyzi35cgdm3rlrcvhchike

An efficient prototype method to identify and correct misspellings in clinical text

T. Elizabeth Workman, Yijun Shao, Guy Divita, Qing Zeng-Treitler
2019 BMC Research Notes  
and corpus term frequencies.  ...  We used the prototype method to process two different corpora, surgical pathology reports, and emergency department progress and visit notes, extracted from Veterans Health Administration resources.  ...  "Suicidal", a correctly spelled term and frequent word in the EDVP corpus, was extracted as a target input word.  ... 
doi:10.1186/s13104-019-4073-y fatcat:vvrvodad2jbelcciae4kqikp4e