2,600 Hits in 6.9 sec

Large scale link based latent Dirichlet allocation for web document classification [article]

István Bíró, Jácint Szabó
2010 arXiv pre-print
In this paper we demonstrate the applicability of latent Dirichlet allocation (LDA) for classifying large Web document collections.  ...  The inferred LDA model can be applied to classification as a dimensionality reduction step, similarly to latent semantic indexing.  ...  Acknowledgment To Zoltán Gyöngyi for fruitful discussions and for providing us with the DMOZ labels over the UK2007-WEBSPAM corpus, and to Ramesh Nallapati for providing us with their link-PLSA-LDA and pairwise-link-LDA  ... 
arXiv:1006.4953v1 fatcat:qxs2pop55jcwdcxq6qvbrulavi
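The snippet above uses LDA's inferred topic proportions as a reduced-dimension document representation for classification. As a minimal sketch, assuming plain (not link-based) LDA fit by collapsed Gibbs sampling, the hypothetical function `lda_gibbs` below maps each document from a vocabulary-sized word-count space to a `num_topics`-dimensional vector θ that a downstream classifier could consume. It is not the authors' large-scale link-based model, and the hyperparameter defaults are illustrative assumptions.

```python
import random


def lda_gibbs(docs, vocab_size, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Plain LDA fit by collapsed Gibbs sampling (illustrative sketch).

    docs: list of documents, each a list of word ids in range(vocab_size).
    Returns one topic-proportion vector (length num_topics) per document.
    """
    rng = random.Random(seed)
    K, V = num_topics, vocab_size
    ndk = [[0] * K for _ in docs]      # tokens of document d assigned to topic k
    nkw = [[0] * V for _ in range(K)]  # occurrences of word w assigned to topic k
    nk = [0] * K                       # total tokens assigned to topic k
    z = []                             # current topic of every token

    # Random initial topic assignments.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(K)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove this token from the counts before resampling it.
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Unnormalized full conditional p(z_di = t | all other assignments).
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    # Smoothed estimate of each document's topic proportions theta_d.
    return [
        [(ndk[d][t] + alpha) / (len(doc) + K * alpha) for t in range(K)]
        for d, doc in enumerate(docs)
    ]


if __name__ == "__main__":
    # Toy corpus: words 0-2 and words 3-5 tend to co-occur.
    docs = [[0, 1, 2, 0, 1], [1, 2, 0, 2], [3, 4, 5, 3], [4, 5, 3, 5, 4]]
    theta = lda_gibbs(docs, vocab_size=6, num_topics=2)
    for row in theta:
        print([round(p, 2) for p in row])  # 2 features per document instead of 6
```

Each returned row sums to 1, so documents of very different lengths end up on a common, low-dimensional scale.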

Topic Modeling: A Comprehensive Review

Pooja Kherwa, Poonam Bansal
2018 EAI Endorsed Transactions on Scalable Information Systems  
It includes a classification hierarchy, topic modelling methods, posterior inference techniques, different evolution models of latent Dirichlet allocation (LDA) and its applications in different areas of  ...  It is a statistical technique for revealing the underlying semantic structure in large collections of documents.  ...  In addition, it is faster than sampling methods and is particularly well suited for large-scale problems. Posterior Inference for Latent Dirichlet Allocation: Mean Field Variational Method. Mean field  ... 
doi:10.4108/eai.13-7-2018.159623 fatcat:lu6al57vp5aahbytyejhqrlzry
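The mean field variational method mentioned above can be illustrated with the per-document E-step of Blei, Ng and Jordan (2003): with topic-word probabilities β held fixed, alternate φ_ni ∝ β_{i,w_n} exp(Ψ(γ_i)) and γ_i = α + Σ_n φ_ni. The sketch below assumes a symmetric α, uses a hand-rolled digamma Ψ (the Python standard library has none), and runs a fixed number of iterations instead of a convergence test; `variational_e_step` is a hypothetical name.

```python
import math


def digamma(x):
    """Digamma function for x > 0: shift x upward, then use the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x  # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def variational_e_step(doc, beta, alpha, iters=50):
    """Mean field updates for one document with fixed topic-word probabilities.

    doc: list of word ids; beta[k][w]: p(word w | topic k); alpha: symmetric prior.
    Returns (gamma, phi): Dirichlet parameters over topics and per-token responsibilities.
    """
    K, N = len(beta), len(doc)
    gamma = [alpha + N / K] * K
    phi = [[1.0 / K] * K for _ in doc]
    for _ in range(iters):
        # exp(psi(gamma_k)); the psi(sum(gamma)) term cancels when phi is normalized.
        weights = [math.exp(digamma(g)) for g in gamma]
        for n, w in enumerate(doc):
            unnorm = [beta[k][w] * weights[k] for k in range(K)]
            total = sum(unnorm)
            phi[n] = [u / total for u in unnorm]
        gamma = [alpha + sum(phi[n][k] for n in range(N)) for k in range(K)]
    return gamma, phi


if __name__ == "__main__":
    beta = [[0.45, 0.45, 0.05, 0.05], [0.05, 0.05, 0.45, 0.45]]
    gamma, phi = variational_e_step([0, 1, 0, 2], beta, alpha=0.1)
    print([round(g, 3) for g in gamma])
```

Because each φ row sums to 1, the γ values always sum to Kα + N, which is a cheap sanity check on any implementation of these updates.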

Openaire2020 D10.2 - Clustering Algorithms

Omiros Metaxas, Theodoros Giannakopoulos
2016 Zenodo  
This deliverable describes clustering algorithms for scholarly content, their implementation and performance, and reports on the outcomes.  ...  semantic indexing; Latent Dirichlet allocation; Indexing by latent semantic analysis; On an equivalence between PLSI and LDA; Fast Algorithms for Mining Association Rules in Large DBs; Mining association  ...  analysis and Latent Dirichlet Allocation (LDA) (Blei, Ng, & Jordan, Latent Dirichlet allocation, 2003) (Blei D., 2012) (Steyvers & Griffiths, 2006) (Teh, Jordan, M., Beal, & Blei, 2006), that  ... 
doi:10.5281/zenodo.1257349 fatcat:urryukf52barnfmymb6noquzq4

Latent Dirichlet Allocation (LDA) and Topic modeling: models, applications, a survey [article]

Hamed Jelodar, Yongli Wang, Chi Yuan, Xia Feng, Xiahui Jiang, Yanchao Li, Liang Zhao
2018 arXiv pre-print
There are various methods for topic modeling, of which Latent Dirichlet allocation (LDA) is one of the most popular.  ...  Topic modeling is one of the most powerful techniques in text mining for data mining, latent data discovery, and finding relationships among data and text documents.  ...  Acknowledgements This article has been supported by the National Natural Science Foundation of China (61170035, 61272420, 81674099, 61502233), the Fundamental Research Fund for the Central Universities (  ... 
arXiv:1711.04305v2 fatcat:jzsx6owjyjfo3gkbohrc2ggkzq

A Survey of Topic Modeling in Text Mining

Rubayyi Alghamdi, Khalid Alfalqi
2015 International Journal of Advanced Computer Science and Applications  
These methods are Latent Semantic Analysis (LSA), Probabilistic Latent Semantic Analysis (PLSA), Latent Dirichlet Allocation (LDA), and Correlated Topic Model (CTM).  ...  for topics in a document θ_j and words in a topic φ_k have Dirichlet priors [12]. Indeed, there are several applications and models based on the Latent Dirichlet Allocation (LDA) method, such as: Role  ...  This model, such as LDA [10]. Latent Dirichlet Allocation (LDA) is an algorithm for text mining that is based on statistical (Bayesian) topic models and is very widely used.  ... 
doi:10.14569/ijacsa.2015.060121 fatcat:fb3calb3yrdcnimaeswz3yffuq
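The Dirichlet priors on θ_j and φ_k described above define LDA's generative process. A minimal sketch, assuming symmetric priors and a fixed document length (both simplifications chosen here for illustration): draw each topic's word distribution φ_k ~ Dir(β), each document's topic proportions θ_j ~ Dir(α), then for every token pick a topic from θ_j and a word from that topic's φ_k. Dirichlet draws are made by normalizing independent Gamma variates; `generate_corpus` and its defaults are hypothetical.

```python
import random


def sample_dirichlet(rng, concentration):
    """Draw from Dirichlet(concentration) by normalizing independent Gamma(a, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in concentration]
    total = sum(draws)
    return [x / total for x in draws]


def generate_corpus(num_docs, doc_len, num_topics, vocab_size, alpha=0.5, beta=0.5, seed=0):
    """Sample a synthetic corpus from LDA's generative process with symmetric priors."""
    rng = random.Random(seed)
    topics = range(num_topics)
    words = range(vocab_size)
    # phi_k ~ Dir(beta): a word distribution for each topic k.
    phi = [sample_dirichlet(rng, [beta] * vocab_size) for _ in topics]
    corpus, thetas = [], []
    for _ in range(num_docs):
        # theta_j ~ Dir(alpha): topic proportions for document j.
        theta = sample_dirichlet(rng, [alpha] * num_topics)
        doc = []
        for _ in range(doc_len):
            z = rng.choices(topics, weights=theta)[0]  # topic of this token
            w = rng.choices(words, weights=phi[z])[0]  # word drawn from that topic
            doc.append(w)
        corpus.append(doc)
        thetas.append(theta)
    return corpus, thetas, phi


if __name__ == "__main__":
    corpus, thetas, phi = generate_corpus(num_docs=3, doc_len=10, num_topics=2, vocab_size=8)
    for doc, theta in zip(corpus, thetas):
        print([round(p, 2) for p in theta], doc)
```

Inference methods such as Gibbs sampling or variational Bayes run this process in reverse, recovering θ and φ from the observed words alone.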

A Retrieval Sorting Approach for Online Forums Based on Domain Topics

Yu Yan Zhang
2013 Advanced Materials Research  
This work is significant for research on improving the performance of retrieval results from web forums.  ...  Focusing on the development of Web 2.0 applications, a result ranking approach based on the LDA model is proposed to rank search results from Web forums.  ...  ACKNOWLEDGMENT The author is most grateful to the anonymous referees for their constructive and helpful comments on the earlier version of the manuscript that helped to improve the presentation of the  ... 
doi:10.4028/www.scientific.net/amr.756-759.2152 fatcat:uangrd6yafe4tjkeyxjyz4okaa

Review of Latent Dirichlet Allocation Methods Usable in Voice of Customer Analysis

Lucie Šperková
2018 Acta Informatica Pragensia  
The aim of the article is to identify and review existing Latent Dirichlet Allocation topic modelling methods, and their modifications, usable in Voice of Customer analysis.  ...  The author completed a systematic literature review of peer-reviewed journal articles indexed in the leading databases Scopus and Web of Science and concerning the current use of Latent Dirichlet  ...  Latent Dirichlet Allocation: Latent Dirichlet Allocation (LDA), developed by Blei et al. (2003), is an unsupervised learning model based on Bayesian networks that searches for the semantic structure of the  ... 
doi:10.18267/j.aip.120 fatcat:z7o3lbsybbfxpa4qiev2zfi6pq

Feature LDA: A Supervised Topic Model for Automatic Detection of Web API Documentations from the Web [chapter]

Chenghua Lin, Yulan He, Carlos Pedrinaci, John Domingue
2012 Lecture Notes in Computer Science  
We propose a supervised generative topic model called feature latent Dirichlet allocation (feaLDA) which offers a generic probabilistic framework for automatic detection of Web APIs. feaLDA not only captures  ...  In this paper we cast the problem of detecting Web API documentation as a text classification problem: classifying a given Web page as associated with a Web API or not.  ...  In this paper, we propose a novel supervised topic model called feature latent Dirichlet allocation (feaLDA) for text classification by formulating the generative process such that topics are drawn dependent  ... 
doi:10.1007/978-3-642-35176-1_21 fatcat:kgpdsys245dazefay7q4zaw7yu

Automatic Text Summarization Using Latent Drichlet Allocation (LDA) for Document Clustering

Erwin Yudi Hidayat, Fahri Firdausillah, Khafiizh Hastuti, Ika Novita Dewi, Azhari Azhari
2015 IJAIN (International Journal of Advances in Intelligent Informatics)  
In this paper, we present Latent Dirichlet Allocation in automatic text summarization to improve accuracy in document clustering.  ...  The clustering steps in this research are preprocessing, automatic document compression using the feature method, automatic document compression using LDA, word weighting, and the clustering algorithm. The  ...  In the model of automatic document summarization, Feature Based and Latent Dirichlet Allocation algorithms can be used for the sentence reduction process [5].  ... 
doi:10.26555/ijain.v1i3.43 fatcat:vxk34l23nnbrld4lctqayhdqiq

Detecting Content Spam on the Web through Text Diversity Analysis

Anton Pavlov, Boris V. Dobrov
2011 Spring Young Researchers Colloquium on Databases and Information Systems  
In this paper we propose a set of content diversity features based on frequency rank distributions for terms and topics.  ...  Web spam is considered to be one of the greatest threats to modern search engines.  ...  A large number of linguistic features were explored in a work by Piskorski et al. [19]. Latent Dirichlet Allocation [5] is known to perform well in text classification tasks.  ... 
dblp:conf/syrcodis/PavlovD11 fatcat:fnd3c3pv7vcw3bepm4rtuluuqm
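The snippet does not define the diversity features themselves, so the following is only an illustrative stand-in, not the features from Pavlov and Dobrov: it computes a token sequence's rank-frequency distribution and derives two hypothetical measures, the share of tokens covered by the `top_k` most frequent items and the normalized entropy of the distribution. Because it accepts any hashable tokens, the same function could be applied to words or to per-token topic labels, in the spirit of using both terms and topics.

```python
import math
from collections import Counter


def rank_frequency(tokens):
    """Item frequencies ordered by rank (most frequent first)."""
    return sorted(Counter(tokens).values(), reverse=True)


def diversity_features(tokens, top_k=3):
    """Two hypothetical diversity measures computed from the rank-frequency distribution."""
    freqs = rank_frequency(tokens)
    total = sum(freqs)
    # Share of all tokens covered by the top_k most frequent items (high = repetitive).
    top_share = sum(freqs[:top_k]) / total
    # Shannon entropy of the frequency distribution, scaled to [0, 1].
    probs = [f / total for f in freqs]
    entropy = -sum(p * math.log(p) for p in probs)
    norm_entropy = entropy / math.log(len(freqs)) if len(freqs) > 1 else 0.0
    return {"top_share": top_share, "norm_entropy": norm_entropy}


if __name__ == "__main__":
    spammy = "buy cheap pills buy cheap pills buy cheap pills now".split()
    normal = "the crawler indexes pages and ranks them by links and text".split()
    print(diversity_features(spammy))
    print(diversity_features(normal))
```

A keyword-stuffed page concentrates its tokens in a few top ranks, so it scores a higher `top_share` and a lower `norm_entropy` than ordinary prose.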

Using Probabilistic Topic Models in Enterprise Social Software [chapter]

Konstantinos Christidis, Gregoris Mentzas
2010 Lecture Notes in Business Information Processing  
We employ Latent Dirichlet Allocation in order to elicit latent topics and use the latter to assess similarities in resource and tag recommendation as well as for the expansion of query results.  ...  Enterprise social software (ESS) systems are open and flexible corporate environments which utilize Web 2.0 technologies to stimulate participation through informal interactions and aggregate these interactions  ...  Research reported in this paper has been partially financed by the European Commission in the OrganiK project (FP7: Research for the Benefit of SMEs, 222225).  ... 
doi:10.1007/978-3-642-12814-1_3 fatcat:xp64upkfqjdj5m475454afdu3y
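Using latent topics to assess similarity between resources, as described above, needs a distance between topic distributions. The snippet does not say which measure the authors use; as an assumption, the sketch below uses the Hellinger distance, one common choice for comparing probability vectors, and ranks hypothetical resources by their distance to a query's topic proportions. The resource names and distributions are made up for illustration.

```python
import math


def hellinger(p, q):
    """Hellinger distance between discrete distributions: 0 if identical, 1 if disjoint."""
    return math.sqrt(0.5 * sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))


def recommend(query_theta, resources, top_n=2):
    """Return the names of the top_n resources whose topic mix is closest to the query."""
    ranked = sorted(resources, key=lambda name: hellinger(query_theta, resources[name]))
    return ranked[:top_n]


if __name__ == "__main__":
    # Hypothetical resources with made-up 3-topic distributions.
    resources = {
        "wiki-page": [0.8, 0.1, 0.1],
        "blog-post": [0.1, 0.8, 0.1],
        "forum-thread": [0.7, 0.2, 0.1],
    }
    print(recommend([0.75, 0.15, 0.1], resources))
```

Unlike cosine similarity on raw term vectors, this comparison happens in the low-dimensional topic space, so two resources can match even when they share few exact words.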

Detecting Content Spam on the Web through Text Diversity Analysis

Anton S. Pavlov, Boris V. Dobrov
2018 Proceedings of the Institute for System Programming of RAS  
In this paper we propose a set of content diversity features based on frequency rank distributions for terms and topics.  ...  Web spam is considered to be one of the greatest threats to modern search engines.  ...  A large number of linguistic features were explored in a work by Piskorski et al. [6]. Latent Dirichlet Allocation [7] is known to perform well in text classification tasks.  ... 
doaj:b08c7ee047704df0b4574bcec267b951 fatcat:66tg4rydgnfqrk3xldlgsybroq

A Probe on Document Clustering Methodologies and its Performance Metrics

2019 International Journal of Recent Technology and Engineering  
This assessment gives an indication of the different methods (Vector Space Model, Latent Semantic Indexing, Latent Dirichlet Allocation, Singular Value Decomposition, Doc2Vec Model, Graph model), distance  ...  Due to the huge growth of internet usage, the volume of information flow has also increased, which leads to the problem of information congestion.  ...  Latent Dirichlet Allocation: LDA reveals latent topics by modeling documents as random mixtures over topics in a probabilistic model.  ... 
doi:10.35940/ijrte.b2624.078219 fatcat:ymdvldvednhyzla7jpttza5hma
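One simple way LDA's per-document topic mixtures are often turned into document clusters is to assign each document to its most probable topic. The sketch below shows only that hard-assignment step, with a hypothetical function name and made-up topic proportions; it is not the evaluation procedure of the surveyed paper.

```python
from collections import defaultdict


def cluster_by_dominant_topic(thetas):
    """Group document indices by the topic with the largest proportion (ties go to the lower id)."""
    clusters = defaultdict(list)
    for doc_id, theta in enumerate(thetas):
        dominant = max(range(len(theta)), key=lambda k: theta[k])
        clusters[dominant].append(doc_id)
    return dict(clusters)


if __name__ == "__main__":
    thetas = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
    print(cluster_by_dominant_topic(thetas))  # {0: [0, 2], 1: [1]}
```

Hard assignment discards the mixture information; when documents genuinely span several topics, clustering the θ vectors directly (for example with k-means) keeps more of it.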

Ontology Construction Based on Latent Topic Extraction in a Digital Library [chapter]

Jian-hua Yeh, Naomi Yang
2008 Lecture Notes in Computer Science  
Human-provided knowledge networks present strong semantic features, but their generation process is both labor-intensive and inconsistent in large-scale scenarios.  ...  The method proposed in this paper combines the statistical correction and latent topic extraction of textual data in a digital library, which produces a semantic-oriented and OWL-based ontology.  ...  But with the growing size of real world concepts and their relationships, it is more difficult for humans to generate and maintain large-scale ontologies.  ... 
doi:10.1007/978-3-540-89533-6_10 fatcat:7v3zpzhsqnestkczoz7obdfrpe

A Survey on Journey of Topic Modeling Techniques from SVD to Deep Learning

Deepak Sharma, Bijendra Kumar, Satish Chand
2017 International Journal of Modern Education and Computer Science  
Here we present a survey on the journey of topic modeling techniques, comprising Latent Dirichlet Allocation (LDA) and non-LDA based techniques, and the reason for classifying the techniques into LDA and non-LDA  ...  We have used three hierarchical classification criteria for classifying topic models: LDA-based or non-LDA-based, bag-of-words or sequence-of-words approach, and unsupervised or supervised  ...  A lightweight Laplace approximation was used efficiently as a bridge on a conditionally independent mixture of latent Dirichlet allocation and Gaussian process regression.  ... 
doi:10.5815/ijmecs.2017.07.06 fatcat:nadnmsoj4zdi7onlxivrne6gqm
Showing results 1 to 15 out of 2,600 results