9,657 Hits in 3.1 sec

Probabilistic text modeling with orthogonalized topics

Enpeng Yao, Guoqing Zheng, Ou Jin, Shenghua Bao, Kailong Chen, Zhong Su, Yong Yu
2014 Proceedings of the 37th international ACM SIGIR conference on Research & development in information retrieval - SIGIR '14  
Topic models have been widely used for text analysis. Previous topic models have enjoyed great success in mining the latent topic structure of text documents.  ...  In this paper, we propose the Orthogonalized Topic Model (OTM) which imposes an orthogonality constraint on the topic-term distributions.  ...  A model with better topic-word distributions can mine more diversified and reasonable topics, and therefore it may achieve better performance on many text mining tasks, like text classification and clustering  ... 
doi:10.1145/2600428.2609471 dblp:conf/sigir/YaoZJBCSY14 fatcat:h7lmhoqjrbd65llskjfih7zrw4

Persian Text Classification Enhancement by Latent Semantic Space

Mohammad Bagher Dastgheib, Sara Koleini
2019 International journal of information science and management  
Latent semantic indexing is used to transform the VSM into an orthogonal semantic space that takes term relations into account.  ...  A traditional model in IR and text data representation is the vector space model. In this representation, the cost of computation depends on the dimension of the vector.  ...  After the keywords are specified, all texts are processed, and a vector is provided for each text and the input matrix is provided with a probabilistic latent semantic analysis method.  ... 
doaj:89c8ded7bf0f4550a805fd711525e656 fatcat:ctvoxhnh6beo5kmlp5ozffvdke

Topic Modeling: A Comprehensive Review

Pooja Kherwa, Poonam Bansal
2018 EAI Endorsed Transactions on Scalable Information Systems  
Topic modelling is the new revolution in text mining. It is a statistical technique for revealing the underlying semantic structure in large collections of documents.  ...  At the end, the paper is concluded with a detailed discussion on challenges of topic modelling, which will definitely give researchers an insight for good research.  ...  of topic modeling for exploring the topic modeling from multiple perspectives are given in detail.  Quantitative evaluation with standard metrics on two data sets is also done on both probabilistic and  ... 
doi:10.4108/eai.13-7-2018.159623 fatcat:lu6al57vp5aahbytyejhqrlzry

Enhancing Text Categorization with Semantic-enriched Representation and Training Data Augmentation

X. Lu, B. Zheng, A. Velivelli, C. Zhai
2006 JAMIA Journal of the American Medical Informatics Association  
A probabilistic topic model was applied to extract major semantic topics from a corpus of text of interest.  ...  The representation of documents was projected from the high-dimensional vocabulary space onto a semantic topic space with reduced dimensionality.  ...  LDA Model: The latent Dirichlet allocation (LDA) model [16] is a member of the generative probabilistic models that use a small number of topics to describe a collection of documents [16-19]. The LDA model  ... 
doi:10.1197/jamia.m2051 pmid:16799127 pmcid:PMC1561790 fatcat:iix6pfcutnhvdg3q4sd53xihyy

A mixture model for contextual text mining

Qiaozhu Mei, ChengXiang Zhai
2006 Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '06  
Contextual text mining is concerned with extracting topical themes from a text collection with context information (e.g., time and location) and comparing/analyzing the variations of themes over different  ...  In this paper, we propose a new general probabilistic model for contextual text mining that can cover several existing models as special cases.  ...  As stressed in previous work [14], a theme in a contextualized text collection D is a probabilistic distribution of words that characterizes a semantically coherent topic or subtopic.  ... 
doi:10.1145/1150402.1150482 dblp:conf/kdd/MeiZ06 fatcat:ecmv6lzi5ffrxlsgj3ezacohyy

Identifying biological concepts from a protein-related corpus with a probabilistic topic model

Bin Zheng, David C McLean, Xinghua Lu
2006 BMC Bioinformatics  
The identified topics/concepts provide parsimonious and semantically-enriched representation of the texts in a semantic space with reduced dimensionality and can be used to index text.  ...  The goal of this research is to identify the major biological topics/concepts from a corpus of protein-related MEDLINE titles and abstracts by applying a probabilistic topic model.  ...  Figure 1 shows how a topic can be represented as word-usage pattern in a probabilistic topic model.  ... 
doi:10.1186/1471-2105-7-58 pmid:16466569 pmcid:PMC1420333 fatcat:zzqmnkednrgtblyl3qqty6rdsq

Supplemental material for Empowering open science with reflexive and spatialised indicators

Juste Raimbault, Pierre-Olivier Chasset, Clémentine Cottineau, Hadrien Commenges, Denise Pumain, Christine Kosmopoulos, Arnaud Banos
2019 Figshare  
Supplemental Material for Empowering open science with reflexive and spatialised indicators by Juste Raimbault, Pierre-Olivier Chasset, Clémentine Cottineau, Christine Kosmopoulos  ...  Topic allocation: To choose the number of topics, we estimated the LDA model with the following number of topics: 2, 5, 10, 20, 25, 27, 29, 31, 33, 35, 50, 100 and 200.  ...  Figure 4. (Left) Entropy of the LDA model per number of topics. (Right) Perplexity of the LDA model per number of topics.  ... 
doi:10.25384/sage.9733217 fatcat:sajv3qddg5dirll74b4zs4o6ny

EXPLORING INFORMATION RETRIEVAL BY LATENT SEMANTIC INDEXING AND LATENT DIRICHLET ALLOCATION TECHNIQUES

Radha Guha
2020 International Research Journal of Computer Science  
This paper explores information retrieval models and experiments with Latent Semantic Indexing (LSI) first and then with the more efficient topic modeling algorithm of Latent Dirichlet Allocation (LDA).  ...  Comparisons between the two models are described clearly and concisely in their ef topic modeling. Various applications of topic modeling are also reviewed in this paper from the literature.  ...  U is an orthogonal matrix with left singular vectors of AA^T. Similarly is a topic by document matrix and an element in indicates how strongly a document relates to a topic.  ... 
doi:10.26562/irjcs.2020.v0705.001 fatcat:3mmmcy5kuve5hetxfh456bxwoy

Unsupervised Aspect Extraction Algorithm for Opinion Mining using Topic Modeling

Azizkhan F Pathan, Chetana Prakash
2021 Global Transitions Proceedings  
With the massive use of electronic gadgets and the developing fame of web-based media, a great deal of text information is being produced at the rate never observed.  ...  To determine topics in large textual documents Topic modelling is used.  ...  Probabilistic generative models, such as the topic model, are widely used in information retrieval and text mining.  ... 
doi:10.1016/j.gltp.2021.08.005 fatcat:qrbwvo34cjg47cpneagcy7pbdu

Representative Sampling for Text Classification Using Support Vector Machines [chapter]

Zhao Xu, Kai Yu, Volker Tresp, Xiaowei Xu, Jizhi Wang
2003 Lecture Notes in Computer Science  
The results demonstrated that representative sampling offers excellent learning performance with fewer labeled documents and thus can reduce human efforts in text classification tasks.  ...  In an empirical study we compared representative sampling both with random sampling and with SVM active learning.  ...  It has been used in probabilistic models and specifically in context with the naïve Bayes model for text classification in a Bayes learning setting [McCallum & Nigam, 1998].  ... 
doi:10.1007/3-540-36618-0_28 fatcat:qrlfooa2nfefnfenwemffijww4

Numerical Methods, Software and Analysis

J. R. Rice, H. Saunders
1986 Journal of Vibration and Acoustics  
The chapter concludes with approximation of mathematical functions by means of orthogonal polynomials. An excellent chapter! Chapter 12 is a most informative chapter.  ...  Chapter 4 contains models and formulas for numerical computation. Beginning with polynomials, this chapter extends to piecewise polynomials.  ... 
doi:10.1115/1.3269330 fatcat:524dgr36pzcdbod3dfzcgpcbsm

Insights into explicit semantic analysis

Thomas Gottron, Maik Anderka, Benno Stein
2011 Proceedings of the 20th ACM international conference on Information and knowledge management - CIKM '11  
Based on this model we explain some of the phenomena that have been observed in previous work and support our findings with new experiments.  ...  In this paper we look at the foundations of ESA from a theoretical point of view and employ a general probabilistic model for term weights which reveals how ESA actually works.  ...  This topical focus is needed for the orthogonality property of the concepts and is referred to as concept hypothesis.  ... 
doi:10.1145/2063576.2063865 dblp:conf/cikm/GottronAS11 fatcat:yq2ladxexfecrdn65qex6km7ku

Representations for multi-document event clustering

Wim De Smet, Marie-Francine Moens
2012 Data mining and knowledge discovery  
As underlying models we consider the vector space model (both in a term setting and in a latent semantic analysis setting) and probabilistic topic models based on latent Dirichlet allocation.  ...  Content terms can be classified as topical terms or named entities, yielding several models for content fusion and comparison. All used methods are completely unsupervised.  ...  Recently probabilistic topic models have been proposed as an alternative to LSA. Proponents claim topic models deal with polysemy in a more natural way.  ... 
doi:10.1007/s10618-012-0270-1 fatcat:bnvptyrmwbhpxera5iq6se2exe

Editorial

2019 Intelligent Data Analysis  
Mirzaei et al. in the next article of this group discuss the topic of resource assignment in cooperative energy heterogeneous systems with non-orthogonal multiple access.  ...  Liu et al. in the first article of this issue argue that since volume of short text data increases rapidly it is essential to organize and summarize these data automatically where topic model is one of  ... 
doi:10.3233/ida-190001 fatcat:ref7hqkvtne6hb7wzfolakjmri

Empirical prior latent Dirichlet allocation model

M.A. Adegoke, J.O.A. Ayeni, P.A. Adewole
2019 Nigerian Journal of Technology  
The model was implemented and tested with benchmarked data and it achieves a prediction accuracy of 92.15%.  ...  In this study, empirical prior Dirichlet allocation (epLDA) model that uses latent semantic indexing framework to derive the priors required for topics computation from data is presented.  ...  Hierarchical Kernelised Probabilistic Matrix Factorisation (BH-KPMF) models respectively to model the text bills on the same data set for a predictive task.  ... 
doi:10.4314/njt.v38i1.27 fatcat:jkbwy6ydl5ci3kp7pupx5pd34q