Improving Rule Induction Precision for Automated Annotation by Balancing Skewed Data Sets [chapter]

Gustavo E. A. P. A. Batista, Maria C. Monard, Ana L. C. Bazzan
2004 Lecture Notes in Computer Science  
Therefore, this is a step towards producing more accurate rules for automating annotation.  ...  A challenging problem for automatic annotation is that traditional ML algorithms assume a balanced training set.  ...  Prati for his helpful comments and valuable discussions on the draft of this paper. This research was partially supported by Brazilian Research Councils CAPES, CNPq and FAPESP.  ... 
doi:10.1007/978-3-540-30478-4_3 fatcat:qwpnb7whwjfqxecsojgo2eh4bm

Rule-Enhanced Active Learning for Semi-Automated Weak Supervision

David Kartchner, Davi Nakajima An, Wendi Ren, Chao Zhang, Cassie S. Mitchell
2022 AI  
REGAL (Rule-Enhanced Generative Active Learning) is an improved framework for weakly supervised text classification that performs active learning over labeling functions rather than individual instances  ...  reduce the annotation burden of writing labeling functions for weak supervision.  ...  We balanced data by calculating the total number of noisy label votes for each class and randomly replacing votes for dominant classes until all label distribution was approximately balanced.  ... 
doi:10.3390/ai3010013 fatcat:cf4765c2mrb2lkxkvfdolydio4

Constructing large scale biomedical knowledge bases from scratch with rapid annotation of interpretable patterns

Julien Fauqueur, Ashok Thillaisundaram, Theodosia Togia
2019 Proceedings of the 18th BioNLP Workshop and Shared Task  
rules.  ...  By marking patterns as compatible with the desired relationship type, experts indirectly batch-annotate candidate pairs whose relationship is expressed with such patterns in the literature.  ...  Acknowledgments We are very grateful to Nathan Patel for his engineering support and to Alex de Giorgio for his thorough feedback and domain expertise.  ... 
doi:10.18653/v1/w19-5016 dblp:conf/bionlp/FauqueurTT19 fatcat:37rfrzbqmbbqfcqsfoe3rvfyrm

ATLAS: Automated Amortised Complexity Analysis of Self-adjusting Data Structures [chapter]

Lorenz Leutgeb, Georg Moser, Florian Zuleger
2021 Lecture Notes in Computer Science  
Possibly for these reasons, and despite the recent progress in automated resource analysis, they have so far eluded automation.  ...  In this paper, we report on the first fully-automated amortised complexity analysis of self-adjusting data structures.  ...  In particular, the potential function for skew heaps, which counts "right heavy" nodes, is interesting, because it is also used as a building block by Iacono in his improved analysis of pairing heaps  ... 
doi:10.1007/978-3-030-81688-9_5 fatcat:edp7b45jyvaxve5kmaz3lci2h4

ICD Code Retrieval: Novel Approach for Assisted Disease Classification [chapter]

Stefano Giovanni Rizzo, Danilo Montesi, Andrea Fabbri, Giulio Marchesini
2015 Lecture Notes in Computer Science  
The problem, which has been only partially tamed for a subset of ICD-9-CM, becomes even harder in real world applications, where the labeled data are scarce and noisy.  ...  dataset and improves the overall accuracy over time, learning from user selection.  ...  Since a lot of labeled data are available in the source domain, the inductive Transfer Learning setting aims at improving the learning task in the source domain by transferring knowledge from the source  ... 
doi:10.1007/978-3-319-21843-4_12 fatcat:4dqqr73txjdr5hwi4ibzn6mtgy

Evaluating the effect of unbalanced data in biomedical document classification

Rosalía Laza, Reyes Pavón, Miguel Reboiro-Jato, Florentino Fdez-Riverola
2011 Journal of Integrative Bioinformatics  
In this context, machine learning techniques are usually applied to text classification by using a general inductive process that automatically builds a text classifier from a set of pre-classified documents  ...  We conclude that BN classifier is sensitive to both balancing strategies and existing techniques can improve its overall performance.  ...  Acknowledgements This work is supported in part by the project MEDICAL-BENCH: Platform for the development and integration of knowledge-based data mining techniques and their application to the clinical  ... 
doi:10.2390/biecoll-jib-2011-177 pmid:21926440 fatcat:b3ikz3rvivdf5onvdfwrybts3i

Evaluating the effect of unbalanced data in biomedical document classification

Rosalía Laza, Reyes Pavón, Miguel Reboiro-Jato, Florentino Fdez-Riverola
2011 Journal of Integrative Bioinformatics  
In this context, machine learning techniques are usually applied to text classification by using a general inductive process that automatically builds a text classifier from a set of pre-classified documents  ...  We conclude that BN classifier is sensitive to both balancing strategies and existing techniques can improve its overall performance.  ...  Acknowledgements This work is supported in part by the project MEDICAL-BENCH: Platform for the development and integration of knowledge-based data mining techniques and their application to the clinical  ... 
doi:10.1515/jib-2011-177 fatcat:jwgqvt3k5naqbb4wb3ytjygequ

Generation of relation-extraction-rules based on Markov logic network for document classification

M.D.S Seneviratne, K.S.D Fernando, D.D Karunaratne
2019 International Journal of Advanced Computer Research  
It is possible to replace overwhelming text classification techniques which involve thousands of words, document features or numerous patterns of word combinations by a set of rules which involves a much  ...  Our experimental results show that the use of relation extraction rules on document classification yields a very high precision in the selected domain.  ...  In proposed method a set of rules is generated by Inductive logic programming (ILP) for each relation [22] from the typed dependencies of the sentences annotated with the two or more entities.  ... 
doi:10.19101/ijacr.2018.838015 fatcat:fsniixweyvhrzeucphsvifhqke

ENHANCE (ENriching Health data by ANnotations of Crowd and Experts): A case study for skin lesion classification [article]

Ralf Raumanns, Gerard Schouten, Max Joosten, Josien P. W. Pluim, Veronika Cheplygina
2021 arXiv   pre-print
We then study multi-task learning (MTL) with the annotations as additional labels, and show that non-expert annotations can improve (ensembles of) state-of-the-art convolutional neural networks via MTL  ...  We hope that our dataset can be used in further research into multiple annotations and/or MTL. All data and models are available on Github: https://github.com/raumannsr/ENHANCE.  ...  Acknowledgments We would like to acknowledge all students and crowd workers who have contributed to this project with image annotation.  ... 
arXiv:2107.12734v2 fatcat:j5zcye4wqzhqzgg5oow6eew6sy

Automated Prompting in a Smart Home Environment

Barnan Das, Chao Chen, Nairanjana Dasgupta, Diane J. Cook, Adriyana M. Seelye
2010 2010 IEEE International Conference on Data Mining Workshops  
In this paper, with the introduction of "The PUCK", we take the very first approach to automate a prompting system without any predefined rule set or user feedback.  ...  We statistically analyze realistic prompting data and devise a classifier from statistical outlier detection methods. Further, we devise a sampling technique to help with skewed and scanty data sets.  ...  After the features are generated, the modified form of the data set contains steps performed by participants as instances. This data set is then re-annotated for the prompts.  ... 
doi:10.1109/icdmw.2010.147 dblp:conf/icdm/DasCDCS10 fatcat:6wy6zwln4rhqjkiven4twznpm4

Using a logical model to predict the growth of yeast

KE Whelan, RD King
2008 BMC Bioinformatics  
for model identification/improvement.  ...  growth data/essential gene listings.  ...  Table 3 shows the skewed accuracy and skewed precision results for the aber model, iND750 and the majority classifier (maj) for the 3 sets of genes described above and Table 4 presents the results of  ... 
doi:10.1186/1471-2105-9-97 pmid:18269749 pmcid:PMC2335308 fatcat:irhgahxrhfhphjsvvld2sw6fsa

Clock distribution networks in synchronous digital integrated circuits

E.G. Friedman
2001 Proceedings of the IEEE  
Minimum and maximum timing constraints are developed from the relative timing between the localized clock skew and the data paths.  ...  A theoretical background of clock skew is provided in order to better understand how clock distribution networks interact with data paths.  ...  The system-wide clock period is minimized by finding a set of clock skew values that satisfies (5) and (6) for each local data path and (18) for each global data path.  ... 
doi:10.1109/5.929649 fatcat:eppzijpvzncvnpjzkgenkug6ni

What's in a Message? [article]

Stergos D. Afantenos, Nicolas Hernandez
2009 arXiv   pre-print
In this paper we present the first step in a larger series of experiments for the induction of predicate/argument structures.  ...  Acknowledgments The authors would like to thank Konstantina Liontou and Maria Salapata for their help on the annotation of the messages, as well as the anonymous reviewers for their insightful and constructive  ...  set of rules.  ... 
arXiv:0902.2345v1 fatcat:h4phytp5pnh3lozo3bfshpyvui

Exploring Relational Features and Learning under Distant Supervision for Information Extraction Tasks

Ajay Nagesh
2015 Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Student Research Workshop  
By assigning more weight to precision, we are able to improve over the precision of Hoffmann by ∼1.6% (Table 5.4).  ...  This way, all CD rules for the person annotator will invoke only dictionaries and regular expressions that have been specifically set aside for induction for the person type, and likewise for each of the  ...  We note that this set of constructs form only a subset of the AQL language necessary for the purpose of this work. The complete specification can be found in the AQL manual (IBM, 2012).  ... 
doi:10.3115/v1/n15-2006 dblp:conf/naacl/Nagesh15 fatcat:3nhbkrm4vnhjvh37usn72wvb7i

Extracting actionable information from microtexts [article]

Ali Hürriyetoğlu
2020 arXiv   pre-print
We mostly first developed an automated approach, then we extended and improved it by integrating human intervention at various steps of the automated approach.  ...  Second, we suggest a method which facilitates the definition of relevance for an analyst's context and the use of this definition to analyze new data.  ...  Nelleke Oostdijk, for their invaluable support, guidance, enthusiasm, and optimism.  ... 
arXiv:2008.00343v1 fatcat:6bzalp37orfeveablkxupep3we