
World Knowledge as Indirect Supervision for Document Clustering [article]

Chenguang Wang, Yangqiu Song, Dan Roth, Ming Zhang, Jiawei Han
2016 arXiv   pre-print
We consider the framework to use the world knowledge as indirect supervision. World knowledge is general-purpose knowledge, which is not designed for any specific domain.  ...  In this paper, we provide an example of using world knowledge for domain dependent document clustering.  ...  (www.bd2k.nih.gov), and MIAS, a DHS-IDS Center for Multimodal Information Access and Synthesis at UIUC.  ... 
arXiv:1608.00104v1 fatcat:bdkeudnywfbsld52xesxlvloxi

Incorporating World Knowledge to Document Clustering via Heterogeneous Information Networks

Chenguang Wang, Yangqiu Song, Ahmed El-Kishky, Dan Roth, Ming Zhang, Jiawei Han
2015 Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '15  
We consider the framework to use the world knowledge as indirect supervision. World knowledge is general-purpose knowledge, which is not designed for any specific domain.  ...  In this paper, we provide an example of using world knowledge for domain dependent document clustering.  ...  (BD2K) initiative (www.bd2k.nih.gov), and MIAS, a DHS-IDS Center for Multimodal Information Access and Synthesis at UIUC.  ... 
doi:10.1145/2783258.2783374 pmid:26705504 pmcid:PMC4688021 dblp:conf/kdd/WangSERZH15 fatcat:5pah4l4su5b7tmqzfwf4nmi5gu

Machine Learning with World Knowledge: The Position and Survey [article]

Yangqiu Song, Dan Roth
2017 arXiv   pre-print
representation, inference for knowledge linking and disambiguation, and learning with direct or indirect supervision.  ...  We start from the comparison of world knowledge with domain-specific knowledge, and then introduce three key problems in using world knowledge in learning processes, i.e., explicit and implicit feature  ...  Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon.  ... 
arXiv:1705.02908v1 fatcat:t4fypa6h3vampcp64eosvppsfe

Latent semantic modeling for slot filling in conversational understanding

Gokhan Tur, Asli Celikyilmaz, Dilek Hakkani-Tur
2013 2013 IEEE International Conference on Acoustics, Speech and Signal Processing  
Our method decomposes the task into two steps: latent n-gram clustering using a semi-supervised latent Dirichlet allocation (LDA) and sequence tagging for learning semantic structures in a CU system.  ...  Then, the topic posteriors obtained from the new LDA model are used as additional constraints to a sequence learning model for the semantic template filling task.  ...  Hence, rather than using utterances as documents, for each word, we compile documents based on their context and inject "direct" and "indirect" supervision.  ... 
doi:10.1109/icassp.2013.6639285 dblp:conf/icassp/TurCH13 fatcat:qqqqwo7rrvdbfmlka7agooovme

Interface agents: A review of the field [article]

Stuart E. Middleton
2002 arXiv   pre-print
Relevance feedback is normally used to provide labels for documents, allowing supervised learning techniques to be employed.  ...  Hierarchical agglomeration clustering - Starts with one document cluster, and agglomerates the most similar clusters until the desired number of clusters exists.  ... 
arXiv:cs/0203012v1 fatcat:7nzkypnpcjbczabewdo4ltj6xe

Avoiding Bias in Text Clustering Using Constrained K-means and May-Not-Links [chapter]

M. Eduardo Ares, Javier Parapar, Álvaro Barreiro
2009 Lecture Notes in Computer Science  
In this paper we present a new clustering algorithm which extends the traditional batch k-means enabling the introduction of domain knowledge in the form of Must, Cannot, May and May-Not rules between  ...  Besides, we have applied the presented method to the task of avoiding bias in clustering.  ...  These methods, called "semi-supervised clustering", use background knowledge to impose some restrictions on the process, trying to influence the grouping that it finds in the data.  ... 
doi:10.1007/978-3-642-04417-5_32 fatcat:ly5wkjqxhva7pfdewo6cawdc4e

Recovering Traceability Links in Requirements Documents

Zeheng Li, Mingrui Chen, LiGuo Huang, Vincent Ng
2015 Proceedings of the Nineteenth Conference on Computational Natural Language Learning  
We propose a knowledge-rich approach to the task, where we extend a supervised baseline system with (1) additional training instances derived from human-provided annotator rationales; and (2) additional  ...  Acknowledgments We thank the three anonymous reviewers for their insightful comments on an earlier draft of the paper. This research was supported in part by the U.S.  ...  knowledge.  ... 
doi:10.18653/v1/k15-1024 dblp:conf/conll/LiCHN15 fatcat:4ctznpaxgfakzagyvvsmkpm3fe

A Review Of Trends In Research On Web Mining

Manoj Pandia, Subhendu Kumar Pani, Sanjay Kumar Padhi, Lingaraj Panigrahy, R. Ramakrishna
2011 International Journal of Instrumentation Control and Automation  
But considering the impressive variety of the web, retrieving interesting content has become a very difficult task. So, the World Wide Web is a fertile area for data mining research. Web mining is a research  ...  Today there are several billions of HTML documents, pictures and other multimedia files available via internet and the number is still rising.  ...  The difference between classification and clustering is that the classes in classification are predefined (supervised), but in clustering are not predefined (unsupervised).  ... 
doi:10.47893/ijica.2011.1007 fatcat:arv3ub66ljdoxix7be4vve37pi

Resolving Event Coreference with Supervised Representation Learning and Clustering-Oriented Regularization [article]

Kian Kenyon-Dean, Jackie Chi Kit Cheung, Doina Precup
2018 arXiv   pre-print
We present an approach to event coreference resolution by developing a general framework for clustering that uses supervised representation learning.  ...  For both within- and cross-document coreference on the ECB+ corpus, our model obtains better results than models that require significantly more pre-annotated information.  ...  We thank the anonymous reviewers for their helpful comments and suggestions.  ... 
arXiv:1805.10985v1 fatcat:t44ar5vg7fbttjndzbt3brrwlq

Resolving Event Coreference with Supervised Representation Learning and Clustering-Oriented Regularization

Kian Kenyon-Dean, Jackie Chi Kit Cheung, Doina Precup
2018 Proceedings of the Seventh Joint Conference on Lexical and Computational Semantics  
We present an approach to event coreference resolution by developing a general framework for clustering that uses supervised representation learning.  ...  For both within- and cross-document coreference on the ECB+ corpus, our model obtains better results than models that require significantly more pre-annotated information.  ...  We thank the anonymous reviewers for their helpful comments and suggestions.  ... 
doi:10.18653/v1/s18-2001 dblp:conf/starsem/Kenyon-DeanCP18 fatcat:qg42kzyvrzhkbmqi7xmgczlsl4

Author Name Disambiguation in Bibliographic Databases: A Survey [article]

Muhammad Shoaib, Ali Daud, Tehmina Amjad
2020 arXiv   pre-print
In this survey, we start with three basic AND problems, followed by need for solution and challenges. A generic, five-step framework is provided for handling AND issues.  ...  These steps are; (1) Preparation of dataset (2) Selection of publication attributes (3) Selection of similarity metrics (4) Selection of models and (5) Clustering Performance evaluation.  ...  Acknowledgement We are grateful to the Higher Education Commission (HEC) of Pakistan for their financial assistance to promote the research trend in the country under Indigenous 5000 Fellowship Program  ... 
arXiv:2004.06391v1 fatcat:g6ohfpzeejbwhlxmt7vlmyjqo4

An Approach to Automated Learning of Conceptual Graphs from Text [chapter]

Fulvio Rotella, Stefano Ferilli, Fabio Leuzzi
2013 Lecture Notes in Computer Science  
Many document collections are private and accessible only by selected people.  ...  In this process, considering relational information allows a broader perspective in the similarity assessment for clustering, and ensures more flexible and understandable descriptions of the generalized  ...  In some cases this leads to the bridging of potentially disjoint portion of the graph, but are exploitable for tasks as retrieval of documents of interest, as well as for the shifting of the representation  ... 
doi:10.1007/978-3-642-38577-3_35 fatcat:rxu5cs76bfe6vatyfwx36wnoim

Quantitive evaluation of Web site content and structure

Christian Bauer, Arno Scharl
2000 Internet Research  
Based on the preprocessed information, a multi-methodological approach is chosen that comprises statistical clustering, textual analysis, supervised and nonsupervised neural networks and manual classification  ...  Describes an approach automatically to classify and evaluate publicly accessible World Wide Web sites.  ...  appropriate for real world Web information system.  ... 
doi:10.1108/10662240010312138 fatcat:zewuf65ggvfxvk4xowuoj55poa

Uncertainty Reduction for Knowledge Discovery and Information Extraction on the World Wide Web

Heng Ji, Hongbo Deng, Jiawei Han
2012 Proceedings of the IEEE  
In this paper, we give an overview of knowledge discovery (KD) and information extraction (IE) techniques on the World Wide Web (WWW).  ...  This overview of knowledge discovery (KD) and information extraction (IE) techniques for web-based applications focuses on new techniques for handling the uncertainty challenge in the web setting.  ...  These could be used for various forms of indirect or distant supervision [22], where instances in a large corpus of such pairs are taken as (positive) training instances.  ... 
doi:10.1109/jproc.2012.2190489 fatcat:4rye7lknyvbe5ggxtpv7fqptgm

A Review on Clustering Technique

Vivek Kumar
2015 International Journal on Recent and Innovation Trends in Computing and Communication  
Competitive learning is used for Clustering in Neural network. Example of Competitive learning, SOM and ART are famous for clustering.  ...  Hidden Knowledge is very important in data mining field.  ...  Typically, the number of clusters for supervised learning is pre-specified. Alternatively, in unsupervised clustering a training dataset is not used.  ... 
doi:10.17762/ijritcc2321-8169.1503136 fatcat:rdt7mpmyvnbm3ph6o7v72wmpdy