The Unreasonable Effectiveness of the Baseline: Discussing SVMs in Legal Text Classification
[chapter]
2021
Frontiers in Artificial Intelligence and Applications
We also highlight that error reduction obtained by using specialised BERT-based models over baselines is noticeably smaller in the legal domain when compared to general language tasks. ...
Recently, the focus for most legal text classification tasks has shifted towards large pre-trained deep learning models such as BERT. ...
... classification tasks compared to deep learning ... ers, (B) show that the gains from BERT-based approaches are noticeably smaller ... legal-domain tasks than on general tasks and (C) discuss three hypotheses to ...
doi:10.3233/faia210317
fatcat:344knwujxzeqtkyuzg2wohagsi
HPViewer: sensitive and specific genotyping of human papillomavirus in metagenomic DNA
2018
Bioinformatics
... based hypotheses generation in biomedical text domain (V. Gopalakrishnan, K. Jha, G. Xun, H.Q. Ngo and A. Zhang) ...
... mass spectrometry (I. Laponogov, N. Sadawi, D. Galea, R. Mirnezami and K.A. Veselkov). Data and text mining: Towards self-learning ...
doi:10.1093/bioinformatics/bty037
pmid:29377990
fatcat:zw42xvevibcffoam7osp4osexm
Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing
[article]
2021
arXiv
pre-print
In this paper, we challenge this assumption by showing that for domains with abundant unlabeled text, such as biomedicine, pretraining language models from scratch results in substantial gains over continual ...
pretraining of general-domain language models. ...
Models using biomedical text in pretraining generally perform better. However, mixing out-domain data in pretraining generally leads to worse performance. ...
arXiv:2007.15779v5
fatcat:emddce6qdzgmdmsboyrza27564
Unsupervised Resolution of Acronyms and Abbreviations in Nursing Notes Using Document-Level Context Models
2016
Proceedings of the Seventh International Workshop on Health Text Mining and Information Analysis
In addition we investigate the use of mismatched training data and self-training. ...
A frequent obstacle to developing more robust NLP systems for the clinical domain is the lack of annotated training data. ...
Deidentified clinical records used in this research were provided by the i2b2 National Center for Biomedical Computing funded by U54LM008748 and were originally prepared for the Shared Tasks for Challenges ...
doi:10.18653/v1/w16-6107
dblp:conf/acl-louhi/KirchhoffT16
fatcat:g3awnklulzfd7nhzooa77allfe
CBAG: Conditional Biomedical Abstract Generation
[article]
2020
arXiv
pre-print
Biomedical research papers use significantly different language and jargon when compared to typical English text, which reduces the utility of pre-trained NLP models in this domain. ...
We sample this distribution in order to generate biomedical abstracts given only a proposed title, an intended publication year, and a set of keywords. ...
We anticipate that conditioned language generation can be used to build new applications in the biomedical domain, such as a hypothesis generation system that produces textual descriptions of proposed ...
arXiv:2002.05637v1
fatcat:p4glip75jzdqhdunrbdz4e4ssu
Deep Learning, Natural Language Processing, and Explainable Artificial Intelligence in the Biomedical Domain
[article]
2022
arXiv
pre-print
We narrow down the focus of the study on textual data in Section 3, where natural language processing and its applications in the biomedical domain are described. ...
In Section 4, we give an introduction to explainable artificial intelligence and discuss the importance of explainability of artificial intelligence systems, especially in the biomedical domain. ...
NLP and text mining methods in the biomedical domain are divided into two main categories, i.e. 1) rule-based or knowledge-based, and 2) statistical or machine learning based methods (Cohen, 2014). ...
arXiv:2202.12678v2
fatcat:4nv42mbpuveb7euxkr4b6ojuxi
OMG U got flu? Analysis of shared health messages for bio-surveillance
2011
Journal of Biomedical Semantics
Results: We created guidelines for tagging self protective behaviour based on Jones and Salathé (2009)'s behaviour response survey. ...
We employed supervised learning using unigrams, bigrams and regular expressions as features with two supervised classifiers (SVM and Naive Bayes) to classify tweets into 4 self-reported protective behaviour ...
Acknowledgements We thank RC UK, BBSRC, SRIF 2,3 for funding the work reported in this paper. ...
doi:10.1186/2041-1480-2-s5-s9
pmid:22166368
pmcid:PMC3239309
fatcat:eboubzw4sfdive54tx5fwlmbqi
AGATHA: Automatic Graph-mining And Transformer based Hypothesis generation Approach
[article]
2020
arXiv
pre-print
We present AGATHA, a deep-learning hypothesis generation system that can introduce data-driven insights earlier in the discovery process. ...
We additionally explore biomedical sub-domains, and demonstrate AGATHA's predictive capacity across the twenty most popular relationship types. ...
Because our graph spans all of MEDLINE, we are able to generate hypotheses from a large range of biomedical subdomains. ...
arXiv:2002.05635v1
fatcat:t6jbr53fqrhytm2avreoufvs4y
A Survey on Event Extraction for Natural Language Understanding: Riding the Biomedical Literature Wave
2021
IEEE Access
hypotheses. ...
INDEX TERMS Biomedical text mining, event extraction, natural language understanding, semantic parsing. ...
ACKNOWLEDGMENT The authors thank Giulio Carlassare for his contributions during productive discussions and practical experiments on biomedical corpora. ...
doi:10.1109/access.2021.3130956
fatcat:wlr7zeikdva77ojuppqx3vmocy
Selected papers from the 15th Annual Bio-Ontologies Special Interest Group Meeting
2013
Journal of Biomedical Semantics
generally the organisation, presentation and dissemination of knowledge in biomedicine and the life sciences. ...
The seven papers and the commentary selected for this supplement span a wide range of topics including: web-based querying over multiple ontologies, integration of data, annotating patent records, NCBO ...
We are grateful for help from Sarah Headley from BioMed Central in putting this supplement together. ...
doi:10.1186/2041-1480-4-s1-i1
pmid:23735191
pmcid:PMC3633002
fatcat:ylbmtrgirvfwzgvioplopby4wi
Biomedical text mining for research rigor and integrity: tasks, challenges, directions
2017
Briefings in Bioinformatics
In this article, we pose the question of whether biomedical text mining techniques can assist the stakeholders in the biomedical research enterprise in doing their part toward enhancing research integrity ...
With the exponential increase in biomedical research output and the ability of text mining approaches to perform automatic tasks at large scale, we propose that such approaches can support tools that promote ...
They used features based on cue phrases in the citation sentence, position of the citation and self-citation, which yielded a κ of 0.57. In the biomedical domain, Agarwal et al. ...
doi:10.1093/bib/bbx057
pmid:28633401
fatcat:va4d3u6zzjbpnfptseb23tnv7y
Will the future of knowledge work automation transform personalized medicine?
2014
Applied and Translational Genomics
Today, we live in a world of 'information overload' which demands a high level of knowledge-based work. ...
Engineering intelligent software systems that can process large data sets using unstructured commands and subtle judgments and have the ability to learn 'on the fly' are a significant step towards automation ...
Among solutions to this is Optra Bio-NLP, a web-based automated annotation system for scientific biomedical English language text. ...
doi:10.1016/j.atg.2014.05.003
pmid:27284504
pmcid:PMC4886728
fatcat:3megihqrynd5vabqvipx5g4rh4
LinkBERT: Pretraining Language Models with Document Links
[article]
2022
arXiv
pre-print
We show that LinkBERT outperforms BERT on various downstream tasks across two domains: the general domain (pretrained on Wikipedia with hyperlinks) and biomedical domain (pretrained on PubMed with citation ...
Language model (LM) pretraining can learn various knowledge from text corpora, helping downstream tasks. ...
We train LinkBERT in two domains: the general domain, using Wikipedia articles with hyperlinks (§4), and the biomedical domain, using PubMed articles with citation links (§6). ...
arXiv:2203.15827v1
fatcat:xo6alwunz5chvcrdpeahhs4oaa
Optimizing Corpus Creation for Training Word Embedding in Low Resource Domains: A Case Study in Autism Spectrum Disorder (ASD)
2018
AMIA Annual Symposium Proceedings
We evaluate the importance of corpus specificity versus size and hypothesize that for specific domains small corpora can generate excellent word embeddings. ...
Due to diversity in its vocabulary, the abstract-based embeddings generated fewer related terms and saw minimal improvement when the size of the corpus increased. ...
Acknowledgement The data presented in this paper were collected by the Centers for Disease Control (CDC) and Prevention Autism and Developmental Disabilities Monitoring (ADDM) Network supported by CDC ...
pmid:30815091
pmcid:PMC6371367
fatcat:3xzfd3hcnvhupeoac5oomfk3c4
Citation Sentiment Analysis in Clinical Trial Papers
2015
AMIA Annual Symposium Proceedings
methods for citation sentiment analysis in biomedical publications. ...
A comprehensive comparison between citation sentiment analysis of clinical trial papers and other general domains was conducted, which additionally highlights the unique challenges within this domain. ...
Acknowledgments This study is supported in part by grants from NLM 2R01LM010681-05, NIGMS 1R01GM103859 and 1R01GM102282. The first author (JX) is partially supported by NSFC 61203378. ...
pmid:26958274
pmcid:PMC4765697
fatcat:24nr3efh4fhytk7doj3vrsn3je