
The Unreasonable Effectiveness of the Baseline: Discussing SVMs in Legal Text Classification [chapter]

Benjamin Clavié, Marc Alphonsus
2021 Frontiers in Artificial Intelligence and Applications  
We also highlight that error reduction obtained by using specialised BERT-based models over baselines is noticeably smaller in the legal domain when compared to general language tasks.  ...  Recently, the focus for most legal text classification tasks has shifted towards large pre-trained deep learning models such as BERT.  ...  classification tasks compared to deep learning classifiers, (B) show that the gains from BERT-based approaches are noticeably smaller on legal-domain tasks than on general tasks and (C) discuss three hypotheses to  ... 
doi:10.3233/faia210317 fatcat:344knwujxzeqtkyuzg2wohagsi

HPViewer: sensitive and specific genotyping of human papillomavirus in metagenomic DNA

Yuhan Hao, Liying Yang, Antonio Galvao Neto, Milan R Amin, Dervla Kelly, Stuart M Brown, Ryan C Branski, Zhiheng Pei, Inanc Birol
2018 Bioinformatics  
based hypotheses generation in biomedical text domain, V. Gopalakrishnan, K. Jha, G. Xun, H. Q. Ngo and A. Zhang  ...  mass spectrometry, I. Laponogov, N. Sadawi, D. Galea, R. Mirnezami and K. A. Veselkov, p. 2096. Data and text mining: Towards self-learning  ... 
doi:10.1093/bioinformatics/bty037 pmid:29377990 fatcat:zw42xvevibcffoam7osp4osexm

Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing [article]

Yu Gu, Robert Tinn, Hao Cheng, Michael Lucas, Naoto Usuyama, Xiaodong Liu, Tristan Naumann, Jianfeng Gao, Hoifung Poon
2021 arXiv   pre-print
In this paper, we challenge this assumption by showing that for domains with abundant unlabeled text, such as biomedicine, pretraining language models from scratch results in substantial gains over continual pretraining of general-domain language models.  ...  Models using biomedical text in pretraining generally perform better. However, mixing out-domain data in pretraining generally leads to worse performance.  ... 
arXiv:2007.15779v5 fatcat:emddce6qdzgmdmsboyrza27564
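
The distinction this entry draws, pretraining a biomedical model from scratch versus continuing to pretrain a general-domain checkpoint, can be sketched with Hugging Face transformers. This is an illustrative outline only: the model name and configuration values are placeholders, and corpus handling and the training loop are omitted.

```python
# Sketch: from-scratch biomedical pretraining vs. continual pretraining of a
# general-domain checkpoint (Hugging Face transformers). Model names and
# configuration values are illustrative; data loading and the MLM training
# loop are omitted.
from transformers import BertConfig, BertForMaskedLM

# (1) From scratch: weights are randomly initialised, so the vocabulary and
# all parameters can be tailored to in-domain (e.g. PubMed) text.
scratch_config = BertConfig(
    vocab_size=30522, hidden_size=768, num_hidden_layers=12, num_attention_heads=12
)
scratch_model = BertForMaskedLM(scratch_config)

# (2) Continual pretraining: start from general-domain weights and keep
# training with the masked-language-modelling objective on biomedical text.
continual_model = BertForMaskedLM.from_pretrained("bert-base-uncased")

print("scratch params:  ", sum(p.numel() for p in scratch_model.parameters()))
print("continual params:", sum(p.numel() for p in continual_model.parameters()))
```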

Unsupervised Resolution of Acronyms and Abbreviations in Nursing Notes Using Document-Level Context Models

Katrin Kirchhoff, Anne M. Turner
2016 Proceedings of the Seventh International Workshop on Health Text Mining and Information Analysis  
In addition we investigate the use of mismatched training data and self-training.  ...  A frequent obstacle to developing more robust NLP systems for the clinical domain is the lack of annotated training data.  ...  Deidentified clinical records used in this research were provided by the i2b2 National Center for Biomedical Computing funded by U54LM008748 and were originally prepared for the Shared Tasks for Challenges  ... 
doi:10.18653/v1/w16-6107 dblp:conf/acl-louhi/KirchhoffT16 fatcat:g3awnklulzfd7nhzooa77allfe

CBAG: Conditional Biomedical Abstract Generation [article]

Justin Sybrandt, Ilya Safro
2020 arXiv   pre-print
Biomedical research papers use significantly different language and jargon when compared to typical English text, which reduces the utility of pre-trained NLP models in this domain.  ...  We sample this distribution in order to generate biomedical abstracts given only a proposed title, an intended publication year, and a set of keywords.  ...  We anticipate that conditioned language generation can be used to build new applications in the biomedical domain, such as a hypothesis generation system that produces textual descriptions of proposed  ... 
arXiv:2002.05637v1 fatcat:p4glip75jzdqhdunrbdz4e4ssu

Deep Learning, Natural Language Processing, and Explainable Artificial Intelligence in the Biomedical Domain [article]

Milad Moradi, Matthias Samwald
2022 arXiv   pre-print
We narrow down the focus of the study on textual data in Section 3, where natural language processing and its applications in the biomedical domain are described.  ...  In Section 4, we give an introduction to explainable artificial intelligence and discuss the importance of explainability of artificial intelligence systems, especially in the biomedical domain.  ...  NLP and text mining methods in the biomedical domain are divided into two main categories, i.e. 1) rule-based or knowledge-based, and 2) statistical or machine learning based methods (Cohen, 2014) .  ... 
arXiv:2202.12678v2 fatcat:4nv42mbpuveb7euxkr4b6ojuxi

OMG U got flu? Analysis of shared health messages for bio-surveillance

Nigel Collier, Nguyen Son, Ngoc Nguyen
2011 Journal of Biomedical Semantics  
Results: We created guidelines for tagging self protective behaviour based on Jones and Salathé (2009)'s behaviour response survey.  ...  We employed supervised learning using unigrams, bigrams and regular expressions as features with two supervised classifiers (SVM and Naive Bayes) to classify tweets into 4 self-reported protective behaviour  ...  Acknowledgements We thank RC UK, BBSRC, SRIF 2,3 for funding the work reported in this paper.  ... 
doi:10.1186/2041-1480-2-s5-s9 pmid:22166368 pmcid:PMC3239309 fatcat:eboubzw4sfdive54tx5fwlmbqi
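
The feature set named in the entry above (unigrams, bigrams and regular expressions, fed to SVM and Naive Bayes classifiers) maps onto a standard scikit-learn pipeline. The sketch below is only illustrative: the example tweets, the regular expression and the four behaviour labels are invented and are not taken from the paper's data.

```python
# Sketch of the kind of tweet classifier described above: unigram/bigram
# counts plus a hand-written regular-expression feature, compared across an
# SVM and a Naive Bayes model. The tweets, labels and regex are invented.
import re

import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import FunctionTransformer
from sklearn.svm import LinearSVC

# Hypothetical labelled tweets (text, self-reported behaviour class).
tweets = [
    "got my flu shot today, feeling fine",
    "staying home until this fever is gone",
    "washing my hands constantly, everyone at work is sick",
    "omg i think i caught the flu from my roommate",
]
labels = ["vaccination", "avoidance", "hygiene", "none"]

def regex_features(texts):
    """Toy regular-expression feature: does the tweet mention a vaccine/shot?"""
    pattern = re.compile(r"\b(vaccin\w*|flu shot)\b", re.IGNORECASE)
    return np.array([[1.0 if pattern.search(t) else 0.0] for t in texts])

features = FeatureUnion([
    ("ngrams", CountVectorizer(ngram_range=(1, 2))),               # unigrams + bigrams
    ("regex", FunctionTransformer(regex_features, validate=False)),
])

for name, clf in [("SVM", LinearSVC()), ("NaiveBayes", MultinomialNB())]:
    model = Pipeline([("features", features), ("clf", clf)])
    model.fit(tweets, labels)
    print(name, model.predict(["just booked my flu vaccination"]))
```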

AGATHA: Automatic Graph-mining And Transformer based Hypothesis generation Approach [article]

Justin Sybrandt, Ilya Tyagin, Michael Shtutman, Ilya Safro
2020 arXiv   pre-print
We present AGATHA, a deep-learning hypothesis generation system that can introduce data-driven insights earlier in the discovery process.  ...  We additionally explore biomedical sub-domains, and demonstrate AGATHA's predictive capacity across the twenty most popular relationship types.  ...  Because our graph spans all of MEDLINE, we are able to generate hypotheses from a large range of biomedical subdomains.  ... 
arXiv:2002.05635v1 fatcat:t6jbr53fqrhytm2avreoufvs4y

A Survey on Event Extraction for Natural Language Understanding: Riding the Biomedical Literature Wave

Giacomo Frisoni, Gianluca Moro, Antonella Carbonaro
2021 IEEE Access  
hypotheses.  ...  INDEX TERMS Biomedical text mining, event extraction, natural language understanding, semantic parsing.  ...  ACKNOWLEDGMENT The authors thank Giulio Carlassare for his contributions during productive discussions and practical experiments on biomedical corpora.  ... 
doi:10.1109/access.2021.3130956 fatcat:wlr7zeikdva77ojuppqx3vmocy

Selected papers from the 15th Annual Bio-Ontologies Special Interest Group Meeting

Larisa N Soldatova, Susanna-Assunta Sansone, Michel Dumontier, Nigam H Shah
2013 Journal of Biomedical Semantics  
generally the organisation, presentation and dissemination of knowledge in biomedicine and the life sciences.  ...  The seven papers and the commentary selected for this supplement span a wide range of topics including: web-based querying over multiple ontologies, integration of data, annotating patent records, NCBO  ...  We are grateful for help from Sarah Headley from BioMed Central in putting this supplement together.  ... 
doi:10.1186/2041-1480-4-s1-i1 pmid:23735191 pmcid:PMC3633002 fatcat:ylbmtrgirvfwzgvioplopby4wi

Biomedical text mining for research rigor and integrity: tasks, challenges, directions

Halil Kilicoglu
2017 Briefings in Bioinformatics  
In this article, we pose the question of whether biomedical text mining techniques can assist the stakeholders in the biomedical research enterprise in doing their part toward enhancing research integrity  ...  With the exponential increase in biomedical research output and the ability of text mining approaches to perform automatic tasks at large scale, we propose that such approaches can support tools that promote  ...  They used features based on cue phrases in the citation sentence, position of the citation and self-citation, which yielded a κ of 0.57. In the biomedical domain, Agarwal et al.  ... 
doi:10.1093/bib/bbx057 pmid:28633401 fatcat:va4d3u6zzjbpnfptseb23tnv7y

Will the future of knowledge work automation transform personalized medicine?

Gauri Naik, Sanika S. Bhide
2014 Applied and Translational Genomics  
Today, we live in a world of 'information overload' which demands a high level of knowledge-based work.  ...  Engineering intelligent software systems that can process large data sets using unstructured commands and subtle judgments and have the ability to learn 'on the fly' is a significant step towards automation  ...  Among solutions to this is Optra Bio-NLP, a web-based automated annotation system for scientific biomedical English language text.  ... 
doi:10.1016/j.atg.2014.05.003 pmid:27284504 pmcid:PMC4886728 fatcat:3megihqrynd5vabqvipx5g4rh4

LinkBERT: Pretraining Language Models with Document Links [article]

Michihiro Yasunaga, Jure Leskovec, Percy Liang
2022 arXiv   pre-print
We show that LinkBERT outperforms BERT on various downstream tasks across two domains: the general domain (pretrained on Wikipedia with hyperlinks) and biomedical domain (pretrained on PubMed with citation  ...  Language model (LM) pretraining can learn various knowledge from text corpora, helping downstream tasks.  ...  We train LinkBERT in two domains: the general domain, using Wikipedia articles with hyperlinks ( §4), and the biomedical domain, using PubMed articles with citation links ( §6).  ... 
arXiv:2203.15827v1 fatcat:xo6alwunz5chvcrdpeahhs4oaa

Optimizing Corpus Creation for Training Word Embedding in Low Resource Domains: A Case Study in Autism Spectrum Disorder (ASD)

Yang Gu, Gondy Leroy, Sydney Pettygrove, Maureen Kelly Galindo, Margaret Kurzius-Spencer
2018 AMIA Annual Symposium Proceedings  
We evaluate the importance of corpus specificity versus size and hypothesize that for specific domains small corpora can generate excellent word embeddings.  ...  Due to diversity in its vocabulary, the abstract-based embeddings generated fewer related terms and saw minimal improvement when the size of the corpus increased.  ...  Acknowledgement The data presented in this paper were collected by the Centers for Disease Control (CDC) and Prevention Autism and Developmental Disabilities Monitoring (ADDM) Network supported by CDC  ... 
pmid:30815091 pmcid:PMC6371367 fatcat:3xzfd3hcnvhupeoac5oomfk3c4
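
Training word embeddings directly on a small, domain-specific corpus, as this study investigates, can be sketched with gensim's Word2Vec; the toy sentences and hyperparameter values below are assumptions for illustration, not the authors' setup.

```python
# Sketch: word embeddings trained on a small, domain-specific corpus.
# The tokenised sentences stand in for ASD-related clinical text and are
# invented for illustration.
from gensim.models import Word2Vec

corpus = [
    ["child", "shows", "delayed", "speech", "and", "repetitive", "behaviors"],
    ["evaluation", "notes", "limited", "eye", "contact", "and", "social", "interaction"],
    ["repetitive", "behaviors", "observed", "during", "structured", "play"],
    ["speech", "therapy", "recommended", "after", "developmental", "screening"],
]

model = Word2Vec(
    sentences=corpus,
    vector_size=50,   # small dimensionality is typical for small corpora
    window=5,
    min_count=1,      # keep every token; the corpus is tiny
    epochs=50,        # more passes compensate for the small corpus
)

# Query related terms for a domain word.
print(model.wv.most_similar("repetitive", topn=3))
```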

Citation Sentiment Analysis in Clinical Trial Papers

Jun Xu, Yaoyun Zhang, Yonghui Wu, Jingqi Wang, Xiao Dong, Hua Xu
2015 AMIA Annual Symposium Proceedings  
methods for citation sentiment analysis in biomedical publications.  ...  A comprehensive comparison between citation sentiment analysis of clinical trial papers and other general domains was conducted, which additionally highlights the unique challenges within this domain.  ...  Acknowledgments This study is supported in part by grants from NLM 2R01LM010681-05, NIGMS 1R01GM103859 and 1R01GM102282. The first author (JX) is partially supported by NSFC 61203378.  ... 
pmid:26958274 pmcid:PMC4765697 fatcat:24nr3efh4fhytk7doj3vrsn3je
Showing results 1 — 15 out of 6,939 results