Identifying Abbreviation Definitions Machine Learning with Naturally Labeled Data

Lana Yeganova, Donald C. Comeau, W. John Wilbur
2010 2010 Ninth International Conference on Machine Learning and Applications  
In this study, we make use of what we term naturally labeled data. Positive training examples are extracted from text, which provides naturally occurring potential abbreviation-definition pairs.  ...  In this work, we develop a machine learning algorithm for abbreviation definition identification in text.  ...  CONCLUSIONS In this work, we used the idea of naturally labeled data to develop a machine learning approach for identifying abbreviation definitions in text.  ... 
doi:10.1109/icmla.2010.166 dblp:conf/icmla/YeganovaCW10 fatcat:by5z2q3vnvbgnpn5opprkjkvsq

Machine learning with naturally labeled data for identifying abbreviation definitions

Lana Yeganova, Donald C Comeau, W Wilbur
2011 BMC Bioinformatics  
Methods: In this work, we develop a machine learning algorithm for abbreviation definition identification in text which makes use of what we term naturally labeled data.  ...  Supervised learning techniques, which offer more flexibility in detecting abbreviation definitions, have also been applied to the problem. However, they require manually labeled training data.  ...  Conclusions In this work, we used the idea of naturally labeled data to develop a machine learning approach for identifying abbreviation definitions in text.  ... 
doi:10.1186/1471-2105-12-s3-s6 pmid:21658293 pmcid:PMC3111592 fatcat:6of76rvkg5czjosa4fvqyasua4

A hybrid named entity tagger for tagging human proteins/genes

Kalpana Raja, Suresh Subramani, Jeyakumar Natarajan
2014 International Journal of Data Mining and Bioinformatics  
In this paper, we propose a new hybrid approach based on i) machine learning algorithm (conditional random fields) ii) set of (manually constructed) rules, and iii) a novel abbreviation identification  ...  Like all supervised machine learning techniques, a CRF-based system must be trained on labelled data.  ...  NLProt is based on a machine learning technique called support vector machines (SVMs).  ... 
doi:10.1504/ijdmb.2014.064545 pmid:25946866 fatcat:36zzd5gionet3feejsaf7onwqm

Learning Abbreviations from Chinese and English Terms by Modeling Non-Local Information

Xu Sun, Naoaki Okazaki, Jun'ichi Tsujii, Houfeng Wang
2013 ACM Transactions on Asian Language Information Processing  
First, in order to incorporate nonlocal information into abbreviation generation tasks, we present both implicit and explicit solutions: the latent variable model and the label encoding with global information  ...  Learning abbreviations from Chinese and English terms by modeling non-local information.  ...  Yeganova et al. [2010] explored to use "naturally labeled data", in which positive instances are naturally occurring potential abbreviation definition pairs in text, and negative instances are generated  ... 
doi:10.1145/2461316.2461317 fatcat:7eqebhlq7vfmbj3qtdw7uren6m

Automated Identification of Semantic Similarity between Concepts of Textual Business Rules

Abdellatif Haj, Youssef Balouki, Taoufiq Gadi (Hassan I University)
2021 International Journal of Intelligent Engineering and Systems  
Our method is unique in that it also identifies abbreviations/expansions (as a special case of synonym) which is not possible using a dictionary.  ...  Whereas identification of synonyms is manual or totally neglected in most approaches dealing with natural language Business Rules.  ...  Author Contributions All authors analysed the data, discussed the results, provided critical feedback, helped shape the research and contributed to the final manuscript.  ... 
doi:10.22266/ijies2021.0228.15 fatcat:uvw7r74yyvegde2gdibcaj6gli

Identifying the Development and Application of Artificial Intelligence in Scientific Text [article]

James Dunham, Jennifer Melot, Dewey Murdick
2020 arXiv   pre-print
We compose a functional definition of AI relevance by learning these subjects from paper metadata, and then inferring the arXiv-subject labels of papers in larger corpora: Clarivate Web of Science, Digital  ...  This offers a method for identifying AI-relevant publications that updates at the pace of research output, without reliance on subject-matter experts for query development or labeling.  ...  boltzmann machine multi label classification character recognition multi task learning classification algorithm natural language generation classification label* natural language processing clustering  ... 
arXiv:2002.07143v2 fatcat:b5ssr77c5jfxzoytnv6stiv37q

Extracting Diagnoses and Investigation Results from Unstructured Text in Electronic Health Records by Semi-Supervised Machine Learning

Zhuoran Wang, Anoop D. Shah, A. Rosemary Tate, Spiros Denaxas, John Shawe-Taylor, Harry Hemingway, Vladimir Brusic
2012 PLoS ONE  
Aim: To develop an algorithm to identify relevant free texts automatically based on labelled examples.  ...  We compared the performance of S3CM with the Transductive Vector Support Machine (TVSM), the original fully-supervised Set Covering Machine (SCM) and our 'Freetext Matching Algorithm' natural language  ...  We would like to thank Julie Sanders for helpful discussions and Alexander Martin for assistance with manual annotation of the ovarian cancer texts.  ... 
doi:10.1371/journal.pone.0030412 pmid:22276193 pmcid:PMC3261909 fatcat:wt2u6qw5rfcrhcgwsrmdmogxne

Constructive and Toxic Speech Detection for Open-domain Social Media Comments in Vietnamese [article]

Luan Thanh Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen
2021 arXiv   pre-print
Besides, we implement various baseline models as traditional Machine Learning and Deep Neural Network-Based models to evaluate the dataset.  ...  With the results, we can solve several tasks on the online discussions and develop the framework for identifying constructiveness and toxicity of Vietnamese social media comments automatically.  ...  Before labeling data, we built an annotation scheme with detailed and necessary information, which helps annotators label data quickly and precisely. We describe each task with a detailed definition.  ... 
arXiv:2103.10069v4 fatcat:nzlt4sj5qvapjpkhp36arlmkaa

Automated and flexible identification of complex disease: building a model for systemic lupus erythematosus using noisy labeling

Sara G Murray, Anand Avati, Gabriela Schmajuk, Jinoos Yazdany
2018 JAMIA Journal of the American Medical Informatics Association  
Electronic Health Records, Machine Learning, Lupus Erythematosus, Phenotype, Algorithms.  ...  SGM and AA performed the acquisition and analysis of data, and SGM, AA, GS, and JY contributed to interpretation of the results. Conflict of interest statement. None declared.  ...  Analysis We combined a variant of the EasyEnsemble method with the technique of Learning with Noisy Labels. We used the noisy labeled patients as the training data.  ... 
doi:10.1093/jamia/ocy154 pmid:30476175 pmcid:PMC6308013 fatcat:bbpkgw52hvajxcbs3acmhpsdqq

Toward Automated Definition Acquisition From Operations Law

Yi Chang, Jana Diesner, Kathleen M. Carley
2012 IEEE Transactions on Systems Man and Cybernetics Part C (Applications and Reviews)  
We approach the task of identifying definitional sentences from operations law documents by formalizing this task as a sentence-classification task and solving it by using machine-learning methods.  ...  different types or genres of text data.  ...  His current research interests include machine-learning, natural-language processing, information retrieval, web search, and data mining. Dr.  ... 
doi:10.1109/tsmcc.2011.2110643 fatcat:7xjd3nvwc5cn3fu3ear6xl2ux4

WCL-BBCD: A Contrastive Learning and Knowledge Graph Approach to Named Entity Recognition [article]

Renjie Zhou, Qiang Hu, Jian Wan, Jilin Zhang, Qiang Liu, Tianxiang Hu, Jianjun Li
2022 arXiv   pre-print
In this paper, we propose a novel named entity recognition model WCL-BBCD (Word Contrastive Learning with BERT-BiLSTM-CRF-DBpedia) incorporating the idea of contrastive learning.  ...  Finally, the recognition results are corrected in combination with prior knowledge such as knowledge graphs, so as to alleviate the recognition caused by word abbreviations low-rate problem.  ...  of each word belonging to each entity label is predicted by machine learning models, and the entity label with the highest probability value is usually taken as the entity label of the word.  ... 
arXiv:2203.06925v4 fatcat:danjvmcjtzf3xekh3vxljrunqq

Towards automating systematic reviews on immunization using an advanced natural language processing–based extraction system

David Begert, Justin Granek, Brian Irwin, Chris Brogly
2020 Canada Communicable Disease Report  
Based on available exposure information, 11.3% (n=59/520) of cases aged younger than 20 years had no known contact with a case. Canadian findings align with those of other countries.  ...  Among provinces and territories with more than 100 cases, 1.6% to 9.8% of cases were younger than 20 years of age.  ...  Machine learning methods have previously been identified for the automation of systematic reviews (1, 5) .  ... 
doi:10.14745/ccdr.v46i06a04 pmid:32558812 pmcid:PMC7279124 fatcat:mh773igmu5d73kaa43x7gfitmq

Ontology-driven weak supervision for clinical entity classification in electronic health records

Jason A. Fries, Ethan Steinberg, Saelig Khattar, Scott L. Fleming, Jose Posada, Alison Callahan, Nigam H. Shah
2021 Nature Communications  
Our approach, unlike hand-labeled notes, is easy to share and modify, while offering performance comparable to learning from manually labeled training data.  ...  The information needs of the COVID-19 pandemic highlight the need for agile methods of training machine learning models for clinical notes.  ...  Computational resources were provided by Nero, a shared big data computing platform made possible by the Stanford School of Medicine Research Office and Stanford Research Computing Center.  ... 
doi:10.1038/s41467-021-22328-4 pmid:33795682 fatcat:fgsu7egrrjbh7dkvfkryke3rk4

Trove: Ontology-driven weak supervision for medical entity classification [article]

Jason A. Fries, Ethan Steinberg, Saelig Khattar, Scott L. Fleming, Jose Posada, Alison Callahan, Nigam H. Shah
2020 arXiv   pre-print
However, manually labeling data for entity tasks is time consuming and expensive, creating barriers to using machine learning in new medical applications.  ...  We perform within an average of 3.5 F1 points (4.2%) of NER classifiers trained with hand-labeled data.  ...  ., using Schwartz-Hearst abbreviation disambiguation [51] to identify out-of-dictionary abbreviations. Our labeling functions generate word-level labels.  ... 
arXiv:2008.01972v1 fatcat:zsgzvw33pjhdlaxocnx2gbnlsm

An annotated corpus with nanomedicine and pharmacokinetic parameters

Nastassja A Lewinski, Ivan Jimenez, Bridget T McInnes
2017 International Journal of Nanomedicine  
Author contributions NAL and BTM conceived the project idea, led the project, conducted data analysis, and wrote the manuscript. IJ performed annotations and assisted with data analysis.  ...  This work was presented at the 8th International Nanotoxicology Congress as a poster presentation with interim findings.  ...  Analysis of the parameters also showed that not all parameters may need a machine learning component to identify them within the text.  ... 
doi:10.2147/ijn.s137117 pmid:29066897 pmcid:PMC5644562 fatcat:zqjltfpm2fepheat3cx3yehs3m