BioSentVec: creating sentence embeddings for biomedical texts [article]

Qingyu Chen, Yifan Peng, Zhiyong Lu
2019 arXiv   pre-print
Although pre-trained sentence encoders are available in the general domain, none exists for biomedical texts to date.  ...  We evaluate BioSentVec embeddings in two sentence pair similarity tasks in different text genres.  ...  Alexis Allot for helpful discussion. We are grateful to the authors of sent2vec, BIOSSES, and MedSTS for making their software and data publicly available.  ... 
arXiv:1810.09302v5 fatcat:mdrxw3634jab3gb3pp7qgajxge

Representing document-level semantics of biomedical literature using pre-trained embedding models: Novel assessments

Jon Stevens, Brandon Punturo, Derek Chen, Mike Kim, Jacob Zimmer
2019 Conference on Natural Language Processing  
ontologies, (2) sequence embedding, which includes the NCBI's BioSentVec model and BioBERT.  ...  For both of our tasks, lexical pooling outperformed sequence embedding, and the best overall method was mean pooling of BioWordVec word embeddings.  ...  and biomedically relevant terms, and then pool only these word embeddings to create the document embedding.  ... 
dblp:conf/konvens/StevensPCKZ19 fatcat:f3uebgvswndgfclv2n5aebbu2a

Benchmarking Effectiveness and Efficiency of Deep Learning Models for Semantic Textual Similarity in the Clinical Domain: Validation Study

Qingyu Chen, Alex Rankine, Yifan Peng, Elaheh Aghaarabi, Zhiyong Lu
2021 JMIR Medical Informatics  
We call for community efforts to create more biomedical and clinical STS data sets from different perspectives to reflect the multifaceted notion of sentence-relatedness.  ...  Methods We benchmarked five DL models, which are the top-ranked systems for STS tasks: Convolutional Neural Network, BioSentVec, BioBERT, BlueBERT, and ClinicalBERT.  ...  Alexis Allot for helpful discussions on the sentence search pipeline in LitSense.  ... 
doi:10.2196/27386 pmid:34967748 pmcid:PMC8759018 fatcat:3zl6c7di2japjnen6yizt7zc6e

Using language models and ontology topology to perform semantic mapping of traits between biomedical datasets [article]

Yi Liu, Benjamin L Elsworth, Tom R Gaunt
2022 bioRxiv   pre-print
Motivation: Human traits are typically represented in both the biomedical literature and large population studies as descriptive text strings.  ...  Recent developments in language modelling have created new methods for semantic representation of words and phrases, and these methods offer new opportunities to map human trait names in the form of words  ...  Text embedding methods BioSentVec is an established model created using sent2vec [44]. The model was trained on Wikipedia and other generalised texts with no focus on biomedical information.  ... 
doi:10.1101/2022.08.02.502449 fatcat:deahnbdzfzb25aagdebz7b4v64


Bavishi Hilloni, Debalina Nandy
2021 Zenodo  
This will also provide the information on using SPECTER-document level relatedness like CORD 19 embeddings for pre-training a Transformer language model.  ...  BERT is a language model that powerfully learns from tokens and sentence-level training.  ...  The aim for developing BioSentVec was to generate pre-trained encoders mainly for biomedical texts and thus it has helped a lot in carrying out research on COVID-19 literature.  ... 
doi:10.5281/zenodo.5201690 fatcat:3vm7q7ubjze73a2z3k5sxbtt34

Evaluating semantic textual similarity in clinical sentences using deep learning and sentence embeddings

Rui Antunes, João Figueira Silva, Sérgio Matos
2020 Proceedings of the 35th Annual ACM Symposium on Applied Computing  
In this paper we present an approach that explores neural networks and different types of text preprocessing pipelines, and that evaluates the impact of using word embeddings or sentence embeddings.  ...  data, it is imperative to develop solutions that can condense information whilst retaining its value, with a possible methodology involving the assessment of the semantic similarity between clinical text  ...  It is posted here for your personal use. Not for redistribution.  ... 
doi:10.1145/3341105.3373987 dblp:conf/sac/AntunesSM20 fatcat:v76jxq3ja5ag7kro4yfkiibmsu

Drug Reaction Discriminator within Encoder-Decoder Neural Network Model: COVID-19 Pandemic Case Study

Hanane Grissette, El Habib Nfaoui
2020 2020 Seventh International Conference on Social Networks Analysis, Management and Security (SNAMS)  
In this study, we propose to develop an encoder-decoder for drug reaction discrimination that involves an enhanced distributed biomedical representation from controlled medical vocabulary such as PubMed  ...  The embedding mechanism primarily leverages contextual information and learn from predefined clinical relationships in term of medical conditions in order to define possible drug reaction of individual  ...  • N-grams biomedical embedding Matrix construction by involving distributed biomedical at sentence-level generation [8].  ... 
doi:10.1109/snams52053.2020.9336561 fatcat:co7uqgtuunaixnd5tunorkqsd4

Unsupervised Pre-training for Biomedical Question Answering [article]

Vaishnavi Kommaraju, Karthick Gunasekaran, Kun Li, Trapit Bansal, Andrew McCallum, Ivana Williams, Ana-Maria Istrate
2020 arXiv   pre-print
We explore the suitability of unsupervised representation learning methods on biomedical text -- BioBERT, SciBERT, and BioSentVec -- for biomedical question answering.  ...  To further improve unsupervised representations for biomedical QA, we introduce a new pre-training task from unlabeled data designed to reason about biomedical entities in the context.  ...  Acknowledgements This work was supported in part by the UMass Amherst Center for Data Science and the Center for Intelligent Information Retrieval, in part by the Chan Zuckerberg Initiative, and in part  ... 
arXiv:2009.12952v1 fatcat:lwjccrybcrcjpon3tnoxavazsa

Automated Confirmation of Protein Annotation Using NLP and the UniProtKB Database

Jin Tao, Kelly A. Brayton, Shira L. Broschat
2020 Applied Sciences  
Our ensemble model achieves 91.25% recall, 71.26% accuracy, 65.19% precision, and an F1 score of 76.05% and outperforms the Bidirectional Encoder Representations from Transformers for Biomedical Text Mining  ...  Natural language processing in the form of word embeddings is used with journal publication titles retrieved from the UniProtKB database.  ...  For the logistic regression and SVM models, we use BioSentVec [21], which converts a sentence into a biomedical sentence embedding more efficiently than using BioWordVec embeddings.  ... 
doi:10.3390/app11010024 fatcat:o7tbtpwbdbbetf72b55dtl4y2u

Protocol for a reproducible experimental survey on biomedical sentence similarity

Alicia Lara-Clares, Juan J. Lastra-Díaz, Ana Garcia-Serrano, Bridget McInnes
2021 PLoS ONE  
Measuring semantic similarity between sentences is a significant task in the fields of Natural Language Processing (NLP), Information Retrieval (IR), and biomedical text mining.  ...  For this reason, the proposal of sentence similarity methods for the biomedical domain has attracted a lot of attention in recent years.  ...  Finally, we are grateful to the anonymous reviewers for their valuable comments to improve the quality of the paper.  ... 
doi:10.1371/journal.pone.0248663 pmid:33760855 fatcat:wij55k3vyfecbol4gfvq7u4uqy

A Comparative Study on Feature Selection in Relation Extraction from Electronic Health Records

Ilseyar Alimova, Elena Tutubalina
2019 International Conference on Data Analytics and Management in Data Intensive Domains  
We propose a machine learning model with a novel set of knowledge and context embedding features.  ...  The context embedding feature gives the highest increase in results among the other explored features.  ...  Acknowledgments This research was supported by the Russian Foundation for Basic Research grant no. 19-07-01115.  ... 
dblp:conf/rcdl/AlimovaT19 fatcat:dgych6bezna3to5mlyddeg2oey

A self-attention based deep learning method for lesion attribute detection from CT reports [article]

Yifan Peng, Ke Yan, Veit Sandfort, Ronald M. Summers, Zhiyong Lu
2019 arXiv   pre-print
This paper outlines a novel deep learning method to automatically extract attributes of lesions of interest from the clinical text.  ...  Different from classical CNN models, we integrated the multi-head self-attention mechanism to handle the long-distance information in the sentence, and to jointly correlate different portions of sentence  ...  Lu, “BioSentVec: creating sentence embeddings for biomedical texts,” arXiv preprint arXiv:1810.09302, 2018. [19] C. P.  ... 
arXiv:1904.13018v1 fatcat:nztvz56jgffjxaoxtgy2q3x5c4

Extraction of chemical-protein interactions from the literature using neural networks and narrow instance representation

Rui Antunes, Sérgio Matos
2019 Database: The Journal of Biological Databases and Curation  
Methods for improved retrieval and automatic relation extraction from biomedical literature are therefore required for collecting structured information from the growing number of published works.  ...  In this paper, we follow a deep learning approach for extracting mentions of chemical-protein interactions from biomedical articles, based on various enhancements over our participation in the BioCreative  ...  Acknowledgments We thank the organizers of the BioCreative VI CHEMPROT task, the authors of BioSentVec embeddings for making their models publicly available and the reviewers for their valuable comments  ... 
doi:10.1093/database/baz095 pmid:31622463 pmcid:PMC6796919 fatcat:3l6gyodd65b7joh6t3vzrny5li

BioConceptVec: creating and evaluating literature-based biomedical concept embeddings on a large scale [article]

Qingyu Chen, Kyubum Lee, Shankai Yan, Sun Kim, Chih-Hsuan Wei, Zhiyong Lu
2019 arXiv   pre-print
Here, we propose to leverage state-of-the-art text mining tools and machine learning models to learn the semantics via vector representations (aka. embeddings) of over 400,000 biological concepts mentioned  ...  concepts, such as genes and mutations, is of significant importance to many research tasks in computational biology such as protein-protein interaction detection, gene-drug association prediction, and biomedical  ...  Robert Leaman for helpful discussions. We also thank Dr W. John Wilbur for proofreading the manuscript.  ... 
arXiv:1912.10846v1 fatcat:6m7wcv35hzabvnbsiqse5uf7sm

Improved Methods to Aid Unsupervised Evidence-Based Fact Checking for Online Health News

Pritam Deka, Anna Jurek-Loughrey, Deepak P.
2022 Journal of Data Intelligence  
The second method involves a filtering approach for extracting the most relevant information for the claims.  ...  We propose a three-step approach for it and illustrate that our method is able to generate effective queries which can be used for retrieval of information from medical knowledge databases.  ...  Domain specific named entities are better suited as candidate keys than noun phrases/nouns for keyword/keyphrase extraction from biomedical text. 3.  ... 
doi:10.26421/jdi3.4-5 fatcat:33e2qsnhezcnjjgvmg44gpvzre