
Using Both Latent and Supervised Shared Topics for Multitask Learning [chapter]

Ayan Acharya, Aditya Rawal, Raymond J. Mooney, Eduardo R. Hruschka
2013 Lecture Notes in Computer Science  
Experimental results on both document and image classification show that both types of supervision improve the performance of both DSLDA and NP-DSLDA and that sharing both latent and supervised topics  ...  This approach is particularly useful for multitask learning, in which both latent and supervised topics are shared between multiple categories.  ...  While training DSLDA-NSLT with examples from the y-th class, only a subset of the first k_1 topics (or a subset of the supervised ones based on which of them are present in the training documents) and  ... 
doi:10.1007/978-3-642-40991-2_24 fatcat:aoifgbq5rnbgjnbwqp6g267el4

Aggregated topic models for increasing social media topic coherence

Stuart J. Blair, Yaxin Bi, Maurice D. Mulvenna
2019 Applied intelligence (Boston)  
We also make use of the aggregated topic model on social media data to validate the method in a realistic scenario and find that again it outperforms individual topic models.  ...  In this study we investigate the process of aggregating multiple topic models generated using different parameters with a focus on whether combining the general and specific topics is able to increase  ...  Acknowledgments This work was supported by a University of Ulster Vice-Chancellor's Research Scholarship.  ... 
doi:10.1007/s10489-019-01438-z fatcat:bvptwcm6n5hw3fshco6gyfesby

Evaluating Topic Modeling Interpretability Using Topic Labeled Gold Standard Sets

Biagio Palese (Northern Illinois University), Gabriele Piccoli (Louisiana State University)
2020 Communications of the Association for Information Systems  
Finally, we showcase a methodology for designing and developing gold-standard sets for validating topic models, which researchers interested in developing gold-standard sets in domains and contexts appropriate  ...  Accordingly, we propose a method that researchers can use to select models when they assess topics' human interpretability.  ...  ., perplexity) have gained consensus in the literature. These measures require training the topic model on a corpus subset so different parametrizations can be evaluated on a test set.  ... 
doi:10.17705/1cais.04720 fatcat:2ozfytv23fczrmei6xdr2rmnm4

Exploring topic structure

Jiyin He
2012 SIGIR Forum  
Exploring topic structure: Coherence, diversity and relatedness. Citation for published version (APA): He, J. (2011).  ...  It consists of 50 million English pages and is used as the test collection at the TREC 2009 Web track.  ... 
doi:10.1145/2215676.2215690 fatcat:mh34cmstt5cr3phgj2cltgikie

A New Sentence-Based Interpretative Topic Modeling and Automatic Topic Labeling

Olzhas Kozbagarov, Rustam Mussabayev, Nenad Mladenovic
2021 Symmetry  
The internal knowledge source is represented by the text corpus itself and often it is a single knowledge source in the traditional topic modeling approaches.  ...  The external knowledge source is represented by the BERT, a machine learning model which was preliminarily trained on a huge amount of textual data and is used for generating the context-dependent sentence  ...  Second step: Target Dataset Formation. • input: sentence embeddings of a text collection. • output: a subset of sentence embeddings of a text collection.  ... 
doi:10.3390/sym13050837 fatcat:ujgre6jzrzeozbynjnhclu47i4

Hierarchical Re-estimation of Topic Models for Measuring Topical Diversity [chapter]

Hosein Azarbonyad, Mostafa Dehghani, Tom Kenter, Maarten Marx, Jaap Kamps, Maarten de Rijke
2017 Lecture Notes in Computer Science  
General topics only include common information from a background corpus and are assigned to most of the documents in the collection.  ...  A recent proposal for measuring topical diversity identifies three elements for assessing diversity: words, topics, and documents as collections of words.  ...  In addition, we remove the 100 most frequent words in the collection and words with fewer than 5 occurrences.  ... 
doi:10.1007/978-3-319-56608-5_6 fatcat:xu7xbtl25nedbk55bdcc6kvroa

Hierarchical Re-estimation of Topic Models for Measuring Topical Diversity [article]

Hosein Azarbonyad and Mostafa Dehghani and Tom Kenter and Maarten Marx and Jaap Kamps and Maarten de Rijke
2017 arXiv   pre-print
General topics only include common information from a background corpus and are assigned to most of the documents in the collection.  ...  A recent proposal for measuring topical diversity identifies three elements for assessing diversity: words, topics, and documents as collections of words.  ...  In addition, we remove the 100 most frequent words in the collection and words with fewer than 5 occurrences.  ... 
arXiv:1701.04273v1 fatcat:h2smns34qnc7bglwpm6fmyryly

Scalable Probabilistic Entity-Topic Modeling [article]

Neil Houlsby, Massimiliano Ciaramita
2013 arXiv   pre-print
Training such models is challenging because of the topic and vocabulary size, both in the millions.  ...  We report state-of-the-art performance on a public dataset.  ...  Given an input text where entity mentions have been identified by a pre-processor, e.g., a named entity tagger, the goal of a system is to disambiguate (link) the entity mentions with respect to a Wikipedia  ... 
arXiv:1309.0337v1 fatcat:kgf3zd2nvnel7bpmftqfiqr3iy

Topic-driven reader comments summarization

Zongyang Ma, Aixin Sun, Quan Yuan, Gao Cong
2012 Proceedings of the 21st ACM international conference on Information and knowledge management - CIKM '12  
To evaluate the two models, we conducted experiments on 1005 Yahoo! News articles with more than one million comments.  ...  Both models treat a news article as a master document and each of its comments as a slave document.  ...  Both models are based on the observation that topics of news articles have significant effect [footnote 1: More than 60% of comments in our data collection has fewer than 10 words each.] on topics of comments, and  ... 
doi:10.1145/2396761.2396798 dblp:conf/cikm/MaSYC12 fatcat:epqxoll4cbgzfa2ph43qhk2f6u

Twitter Topic Fuzzy Fingerprints

Hugo Rosa, Fernando Batista, Joao Paulo Carvalho
2014 2014 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE)  
In this paper we propose to approach the subject of Twitter Topic Detection using a new technique called Topic Fuzzy Fingerprints.  ...  A comparison is made with two popular text classification techniques, Support Vector Machines (SVM) and k-Nearest Neighbours (kNN).  ...  ACKNOWLEDGMENT This work was supported by national funds through FCT Fundacao para a Ciencia e a Tecnologia, under project PTDC/IVC-ESCT/4919/2012 and project PEst-OE/EEI/LA0021/2013.  ... 
doi:10.1109/fuzz-ieee.2014.6891781 dblp:conf/fuzzIEEE/RosaBC14 fatcat:g545gv4mxzacdpebmub2qaniem

Topic-oriented collaborative crawling

Chiasen Chung, Charles L. A. Clarke
2002 Proceedings of the eleventh international conference on Information and knowledge management - CIKM '02  
We propose a topic-oriented approach, in which the Web is partitioned into general subject areas with a crawler assigned to each.  ...  We examine design alternatives for a topic-oriented distributed crawler, including the creation of a Web page classifier for use in this context.  ...  In total, 8.9 million pages were downloaded and classified.  ... 
doi:10.1145/584800.584802 fatcat:rgcoz7oxuzdivj6tsivbzvnkeu

Topic-oriented collaborative crawling

Chiasen Chung, Charles L. A. Clarke
2002 Proceedings of the eleventh international conference on Information and knowledge management - CIKM '02  
We propose a topic-oriented approach, in which the Web is partitioned into general subject areas with a crawler assigned to each.  ...  We examine design alternatives for a topic-oriented distributed crawler, including the creation of a Web page classifier for use in this context.  ...  In total, 8.9 million pages were downloaded and classified.  ... 
doi:10.1145/584792.584802 dblp:conf/cikm/ChungC02 fatcat:a5aatkrugbdonkjed3aty6exvu

Special Topics

2007 Journal of the American College of Cardiology  
Methods: GWTG uses a collaborative model with web-based data collection, decision support, and on-demand reporting.  ...  Methods: GWTG-HF uses a collaborative model, and a web-based tool for data collection, decision support, and on-demand reporting.  ...  Fewer Group A patients, compared with Group B patients, were discharged on at least 3 of 4 EBMs (65% vs 87%, respectively; p<0.0001).  ... 
doi:10.1016/j.jacc.2007.01.041 fatcat:4sbuu5mngfdn3moqqpqy3iwzpi

Topic Models for Image Localization

Zheng Wang, Faisal Z. Qureshi
2013 2013 International Conference on Computer and Robot Vision  
Each scene group consists of "visually similar" images as determined by the topic model. Next raw SIFT features are collected from every image in a scene group and a FLANN index is constructed.  ...  Our method learns a topic model over the reference database, which in turn is used to divide the reference database into scene groups.  ...  Acknowledgements I would like to express my thanks to my lab colleagues, and in particular W. Starzyk, N. Parvin, L. Zarrabeitia and M. Helala, for their support and friendship.  ... 
doi:10.1109/crv.2013.36 dblp:conf/crv/WangQ13 fatcat:43daigumjbddfmaciic3kqno2q

HiTR: Hierarchical Topic Model Re-estimation for Measuring Topical Diversity of Documents [article]

Hosein Azarbonyad, Mostafa Dehghani, Tom Kenter, Maarten Marx, Jaap Kamps, Maarten de Rijke
2018 arXiv   pre-print
Topic models play a central role in this approach and, hence, their quality is crucial to the efficacy of measuring topical diversity.  ...  For measuring topical diversity of text documents, our HiTR approach improves over the state-of-the-art measured on PubMed dataset.  ...  In addition, we remove the 100 most frequent words in the collection and words with fewer than five occurrences.  ... 
arXiv:1810.05436v1 fatcat:y7qhgfr62zfhxl4webmf7nugna