Bootstrapping for hierarchical document classification

Giordano Adami, Paolo Avesani, Diego Sona
2003 Proceedings of the twelfth international conference on Information and knowledge management - CIKM '03  
For this reason we propose a method for the bootstrapping process that makes a first hypothesis of categorization for a set of unlabeled documents, with respect to a given empty hierarchy of concepts  ...  Within this process, bootstrapping a taxonomy with examples represents a critical factor for the effective exploitation of any supervised learning model.  ...  This is very important for a subsequent supervised classification, since it is the premise to obtain a homogeneous assignment of the documents to the nodes, and consequently, a highly accurate hierarchical  ... 
doi:10.1145/956863.956920 dblp:conf/cikm/AdamiAS03 fatcat:okxiwqm5sjbpbod2fntsewvoki

Bootstrapping for hierarchical document classification

Giordano Adami, Paolo Avesani, Diego Sona
2003 Proceedings of the twelfth international conference on Information and knowledge management - CIKM '03  
For this reason we propose a method for the bootstrapping process that makes a first hypothesis of categorization for a set of unlabeled documents, with respect to a given empty hierarchy of concepts  ...  Within this process, bootstrapping a taxonomy with examples represents a critical factor for the effective exploitation of any supervised learning model.  ...  This is very important for a subsequent supervised classification, since it is the premise to obtain a homogeneous assignment of the documents to the nodes, and consequently, a highly accurate hierarchical  ... 
doi:10.1145/956919.956920 fatcat:hfsgutcmiffnppqc5usuuwc4xa

On Dataless Hierarchical Text Classification

Yangqiu Song, Dan Roth
2014 Proceedings of the AAAI Conference on Artificial Intelligence  
In this paper, we systematically study the problem of dataless hierarchical text classification.  ...  Our results show that bootstrapped dataless classification is competitive with supervised classification with thousands of labeled examples.  ...  Dataless + Bootstrapping Inspired by the dataless flat classification paper (Chang et al. 2008), we also propose a bootstrapping procedure for dataless hierarchical classification.  ... 
doi:10.1609/aaai.v28i1.8938 fatcat:sayc3vwx3vdtzkha54egcrb5du

Recurrent Neural Networks with Mixed Hierarchical Structures and EM Algorithm for Natural Language Processing [article]

Zhaoxin Luo, Michael Zhu
2022 arXiv   pre-print
Simulation studies and real data applications demonstrate that the EM-HRNN model with bootstrap training outperforms other RNN-based models in document classification tasks.  ...  Furthermore, we develop two bootstrap strategies to effectively and efficiently train the EM-HRNN model on long text documents.  ...  Document Classification In the paper, we focus on the task of document classification.  ... 
arXiv:2201.08919v1 fatcat:jubyuozwrjcydalzpcz26upo54

Clustering documents into a web directory for bootstrapping a supervised classification

Giordano Adami, Paolo Avesani, Diego Sona
2005 Data & Knowledge Engineering  
The management of hierarchically organized data is starting to play a key role in the knowledge management community due to the proliferation of topic hierarchies for text documents.  ...  This paper proposes some solutions for the bootstrapping problem, that implicitly or explicitly use taxonomy definition: a baseline approach that classifies documents according to the class terms, and  ...  Specifically, for any taxonomy, 90% of the documents were used to bootstrap the taxonomy.  ... 
doi:10.1016/j.datak.2004.11.003 fatcat:76buyyrlhzbpjfmzeyazymhuei

Clustering documents in a web directory

Giordano Adami, Paolo Avesani, Diego Sona
2003 Proceedings of the fifth ACM international workshop on Web information and data management - WIDM '03  
Hierarchical categorization of documents is a task receiving growing interest due to the widespread proliferation of topic hierarchies for text documents.  ...  In this paper, we propose some solutions for the bootstrapping problem, implicitly or explicitly using a taxonomy definition: a baseline approach where documents are classified according to class labels  ...  Nevertheless, a first sample of classified documents is always required. Moreover, these models are devised for non-hierarchical sets of classes.  ... 
doi:10.1145/956714.956715 fatcat:yspvbfs6y5dqjkemql3kpjvq64

Clustering documents in a web directory

Giordano Adami, Paolo Avesani, Diego Sona
2003 Proceedings of the fifth ACM international workshop on Web information and data management - WIDM '03  
Hierarchical categorization of documents is a task receiving growing interest due to the widespread proliferation of topic hierarchies for text documents.  ...  In this paper, we propose some solutions for the bootstrapping problem, implicitly or explicitly using a taxonomy definition: a baseline approach where documents are classified according to class labels  ...  Nevertheless, a first sample of classified documents is always required. Moreover, these models are devised for non-hierarchical sets of classes.  ... 
doi:10.1145/956699.956715 dblp:conf/widm/AdamiAS03 fatcat:4zebjctjpndbpl3n3pb43igla4

Learning to Integrate Web Taxonomies

Dell Zhang, Wee Sun Lee
2004 Social Science Research Network  
The second technique, Co-Bootstrapping, tries to facilitate the exploitation of inter-taxonomy relationships by providing category indicator functions as additional features for the objects.  ...  We investigate machine learning methods for automatically integrating objects from different taxonomies into a master taxonomy.  ...  For each ik i S S ∈ ⊂ x , one reasonable way to achieve hierarchical CS is as follows: first compute (1 ) To extend Co-Bootstrapping, it is useful to consider hierarchies as trees.  ... 
doi:10.2139/ssrn.3199170 fatcat:bvvssera2fdqlntmgwespr62mq

Convex Point Estimation using Undirected Bayesian Transfer Hierarchies [article]

Gal Elidan, Ben Packer, Geremy Heitz, Daphne Koller
2012 arXiv   pre-print
We show that our framework is effective for learning models that are part of transfer hierarchies for two real-life tasks: object shape modeling using Gaussian density estimation and document classification  ...  When related learning tasks are naturally arranged in a hierarchy, an appealing approach for coping with scarcity of instances is that of transfer learning using a hierarchical Bayes framework.  ...  We consider the task of density estimation for multivariate Gaussian shape models as well as a document classification task.  ... 
arXiv:1206.3252v1 fatcat:zym5jswsznc3hdmt7lgarohtpy

Bootstrapping Wikipedia to answer ambiguous person name queries

Toni Gruetze, Gjergji Kasneci, Zhe Zuo, Felix Naumann
2014 2014 IEEE 30th International Conference on Data Engineering Workshops  
While such features yield satisfactory results for a wide range of queries, they aggravate the problem of search for ambiguous entities: Searching for a person yields satisfactory results only if the person  ...  We show that when searching with ambiguous person names the information from Wikipedia can be bootstrapped to group the results according to the individuals occurring in them.  ...  WEB PAGE CLASSIFICATION Our goal is to transform the clustering of search results to ambiguous person name queries into a classification task, by bootstrapping knowledge base entities about people with  ... 
doi:10.1109/icdew.2014.6818303 dblp:conf/icde/GrutzeKZN14 fatcat:wxwuwip2hbfzbcejg7m5jeqxwe

Weakly-Supervised Hierarchical Text Classification

Yu Meng, Jiaming Shen, Chao Zhang, Jiawei Han
2019 Proceedings of the AAAI Conference on Artificial Intelligence  
In this paper, we propose a weakly-supervised neural method for hierarchical text classification.  ...  Hierarchical text classification, which aims to classify text documents into a given hierarchy, is an important task in many real-world applications.  ...  We thank anonymous reviewers for valuable and insightful feedback.  ... 
doi:10.1609/aaai.v33i01.33016826 fatcat:imnwyi4h5zh2zp74gi44dvqky4

Weakly-Supervised Hierarchical Text Classification [article]

Yu Meng, Jiaming Shen, Chao Zhang, Jiawei Han
2018 arXiv   pre-print
In this paper, we propose a weakly-supervised neural method for hierarchical text classification.  ...  Hierarchical text classification, which aims to classify text documents into a given hierarchy, is an important task in many real-world applications.  ...  We thank anonymous reviewers for valuable and insightful feedback.  ... 
arXiv:1812.11270v1 fatcat:ka6tzyjmjzccbp4fts6cppovwe

Coarse2Fine: Fine-grained Text Classification on Coarsely-grained Annotated Data [article]

Dheeraj Mekala, Varun Gangal, Jingbo Shang
2021 arXiv   pre-print
Our framework uses the fine-tuned generative models to sample pseudo-training data for training the classifier, and bootstraps on real unlabeled data for model refinement.  ...  To accommodate such requirements, we introduce a new problem called coarse-to-fine grained classification, which aims to perform fine-grained classification on coarsely annotated data.  ...  Acknowledgements We thank anonymous reviewers and program chairs for their valuable and insightful feedback.  ... 
arXiv:2109.10856v1 fatcat:xcsqk4vzgvhxpdfeprqaiozbxi

Limitations of Transformers on Clinical Text Classification

Shang Gao, Mohammed Alawad, Michael Todd Young, John Gounley, Noah Schaefferkoetter, Hong-Jun Yoon, Xiao-Cheng Wu, Eric B. Durbin, Jennifer Doherty, Antoinette Stroup, Linda Coyle, Georgia D Tourassi
2021 IEEE journal of biomedical and health informatics  
In this work, we introduce four methods to scale BERT, which by default can only handle input sequences up to approximately 400 words long, to perform document classification on clinical texts several  ...  classification on long clinical texts is limited.  ...  For more information, see https://creativecommons.org/licenses/by/4.0/ This article has been accepted for publication in a future issue of this journal, but has not been fully edited.  ... 
doi:10.1109/jbhi.2021.3062322 pmid:33635801 pmcid:PMC8387496 fatcat:yofi4nzvybehdpidxpirrmscm4

Classification of a COVID-19 dataset by using labels created from clustering algorithms

Layth Rafea, Abdulrahman Ahmed, Wisam D. Abdullah
2021 Indonesian Journal of Electrical Engineering and Computer Science  
In this paper, the hierarchical and k-means clustering techniques are used to create a tool for identifying similar articles on COVID-19 and filtering them based on their titles.  ...  By using this tool, specialists can limit the number of articles they need to study and pre-process these articles via data framing, tokenisation, normalisation and term frequency-inverse document frequency  ...  ACKNOWLEDGEMENTS This work was funded by the Allen Institute for AI, which prepared the CORD-19 dataset in partnership with leading research groups, and Kaggle, which hosted the COVID-19 Open Research  ... 
doi:10.11591/ijeecs.v21.i1.pp164-173 fatcat:qrl3526k25b6lfvx22viw6swui