
Non-Parametric Document Clustering by Ensemble Methods

Edgar González, Jordi Turmo
2008 Revista de Procesamiento de Lenguaje Natural (SEPLN)  
This article presents a comparison of strategies for non-parametric document clustering by consensus.  ...  Abstract: The biases of individual algorithms for non-parametric document clustering can lead to non-optimal solutions.  ...  Acknowledgments: This work has been partially funded by the European CHIL Project (IP-506909); the Commissionate for Universities and Research of the Department of Innovation, Universities and Enterprises  ... 
dblp:journals/pdln/GonzalezT08 fatcat:s5aot64axjeclfusuaqrlspjci

A New Approach for Topic Detection using Adaptive Neural Networks [article]

Meriem Manai
2019 arXiv   pre-print
In the first step we used the FuzzyART algorithm for the training phase. In the second step we used a classifier using Paragraph Vector for the test phase.  ...  The comparative study of our approach on the 20 Newsgroups dataset showed that our approach is able to detect almost relevant topics.  ...  Topic Identification Method For Textual Document: Jamil 2017. Topic identification is a crucial task for discovering knowledge from a textual document.  ... 
arXiv:1903.03775v1 fatcat:c34harbnrjahtn6l6uv6q5bnie

Domain Adaptation for Coreference Resolution: An Adaptive Ensemble Approach

Jian-Bo Yang, Qi Mao, Qiaoliang Xiang, Ivor Wai-Hung Tsang, Kian Ming Adam Chai, Hai Leong Chieu
2012 Conference on Empirical Methods in Natural Language Processing  
This method has three features: (1) it can optimize for any user-specified objective measure; (2) it can make document-specific prediction rather than rely on a fixed base model or a fixed set of base  ...  To the best of our knowledge, this work is the first to both (i) develop a domain adaptation algorithm for the coreference resolution problem and (ii) have the above features as an ensemble method.  ...  Then, we choose the base model that performs best on the documents in $N(D_j^{(t)})$ as the method $f_j^{(t)*}$ for document $D_j^{(t)}$. Firstly, we employ the parametric-distance (4) to measure the similarity  ... 
dblp:conf/emnlp/YangMXTCC12 fatcat:zqah4fwkzfgvrjogpuij7ccbxe

Using Text Mining and Clustering to Group Research Proposals for Research Project Selection

Yibo Wang, Wei Xu, Hongxun Jiang
2015 2015 48th Hawaii International Conference on System Sciences  
This paper applies an ensemble method to cluster research proposals to support research project selection.  ...  Several cluster algorithms are applied to group the proposals and the merits of each algorithm in text clustering are made full use of in the form of ensemble method.  ...  Gonzalez and Turmo [32] proposed non-parametric ensemble methods for document clustering. Xu et al.  ... 
doi:10.1109/hicss.2015.153 dblp:conf/hicss/WangXJ15 fatcat:k4tkqar3jrfadn7ptzs77u2rjq

Distributed Non-Parametric Representations for Vital Filtering: UW at TREC KBA 2014

Ignacio Cano, Sameer Singh, Carlos Guestrin
2014 Text Retrieval Conference  
This approach of using word embeddings, non-parametric clustering, and staleness provides an efficient yet appropriate representation of entity contexts for the streaming setting, enabling accurate vital  ...  The word embeddings provide accurate and compact summaries of observed entity contexts, further described by topic clusters that are estimated in a non-parametric manner.  ...  We proposed a word embeddings based non-parametric representation of documents that groups entity references into topic clusters, and is suitable for streaming data.  ... 
dblp:conf/trec/CanoSG14 fatcat:nqob74sv55dyzgknx3izdtxcci

Enhancing web search result clustering model based on multiview multirepresentation consensus cluster ensemble (mmcc) approach

Ali Sabah, Sabrina Tiun, Nor Samsiah Sani, Masri Ayob, Adil Yaseen Taha, Gulistan Raja
2021 PLoS ONE  
ensemble method.  ...  Existing text clustering methods utilize only one representation at a time (single view), whereas multiple views can represent documents.  ...  [14] introduced the multiview clustering algorithmic method, where different ensemble methods are combined for a better effect.  ... 
doi:10.1371/journal.pone.0245264 pmid:33449949 fatcat:wkbwt3yxrzbxthhqc2ddhzmpoy

Cooking Is Creating Emotion: A Study on Hinglish Sentiments of Youtube Cookery Channels Using Semi-Supervised Approach

Gagandeep Kaur, Abhishek Kaushik, Shubham Sharma
2019 Big Data and Cognitive Computing  
We have modelled and evaluated both parametric and non-parametric learning algorithms.  ...  A majority of the population in India speak and write a mixture of two languages known as Hinglish for casual communication on social media.  ...  Different methods, like graph-based, wrapper-based, and topic-based methods, for labelling the data were compared in their work.  ... 
doi:10.3390/bdcc3030037 fatcat:zzx5snx3qze4rczaejyft5cngq

Apprentissage d'un espace de concepts de mots pour une nouvelle représentation des données textuelles

Young-Min Kim, Jean-François Pessiot, Massih-Reza Amini, Patrick Gallinari
2010 Document Numérique  
We then generalize this approach by extending the PLSA model for a simultaneous clustering of documents and terms.  ...  In a final step, we show the validity of our approach by comparing the result of this clustering with those obtained in the initial bag-of-words space and the induced word-group space  ...  Most document clustering methods rely on the bag-of-words vector representation (Van Rijsbergen, 1979).  ... 
doi:10.3166/dn.13.1.63-82 fatcat:ggtw33sj7fdvhfbbcwtpfa36qy

Stability of Topic Modeling via Matrix Factorization [article]

Mark Belford and Brian Mac Namee and Derek Greene
2017 arXiv   pre-print
To address this issue in the context of matrix factorization for topic modeling, we propose the use of ensemble learning strategies.  ...  Topic models can provide us with an insight into the underlying latent structure of a large corpus of documents.  ...  We carried out a non-parametric Friedman's Aligned Rank test (García et al., 2010) for each of the five measures previously reported (ADSD, ATS, PNMI, NPMI, and NMI) to test for the presence of statistically  ... 
arXiv:1702.07186v2 fatcat:r5s3iwgb4nhctkzqjd545kcpom

Stability of topic modeling via matrix factorization

Mark Belford, Brian Mac Namee, Derek Greene
2018 Expert systems with applications  
To address this issue in the context of matrix factorization for topic modeling, we propose the use of ensemble learning strategies.  ...  This corresponds to the concept of "instability" which has previously been studied in the context of k-means clustering.  ...  We carried out a non-parametric Friedman's Aligned Rank test (García et al., 2010) for each of the five measures previously reported (ADSD, ATS, PNMI, NPMI, and NMI) to test for the presence of statistically  ... 
doi:10.1016/j.eswa.2017.08.047 fatcat:7l3qufvyzvbijogbmrioxhq63y

Outlier Detection using AI: A Survey [article]

Md Nazmul Kabir Sikder, Feras A. Batarseh
2021 arXiv   pre-print
A broad range of OD methods is categorized into six major categories: Statistical-based, Distance-based, Density-based, Clustering-based, Learning-based, and Ensemble methods.  ...  This survey aims to guide the reader to better understand recent progress of OD methods for the assurance of AI.  ...  Ensemble methods are mostly used in ML for their superior solutions compared to other traditional methods.  ... 
arXiv:2112.00588v1 fatcat:yonfnhohpnbxxgiwrxon74mcny

Modélisation de HMM en contexte avec des arbres de décision pour la reconnaissance de mots manuscrits

Anne-Laure Bianne-Bernard, Christopher Kermorvant, Laurence Likforman-Sulem, Chafic Mokbel
2011 Document Numérique  
We perform clustering at each state position, based on decision trees, which have the advantage, at test time, of being able to associate a known model with an unseen trigraph.  ...  Such modeling considerably increases the number of parameters to compute, which leads us to consider parameter sharing.  ...  This allows us to directly compare the two approaches, contextual and non-contextual.  ... 
doi:10.3166/dn.14.2.29-52 fatcat:nerbukgbybeohj6qjvrvr3ticm

Big Data analytics. Three use cases with R, Python and Spark [article]

Philippe Besse, Jean-Michel Loubes
2016 arXiv   pre-print
This article offers an introduction for statisticians to these technologies by comparing the performance obtained by the direct use of three reference environments: R, Python Scikit-learn, Spark MLlib  ...  As the main result, it appears that, if Spark is very efficient for data munging and recommendation by collaborative filtering (non-negative factorization), current implementations of conventional learning  ... 
arXiv:1609.09619v1 fatcat:qwvrxxkjung7palqtephlu7xpm

A Novel Ensemble based Cluster Analysis using Similarity Matrices and Clustering Algorithm (SMCA)

Mayank Gupta, Dhanraj Varma
2014 International Journal of Computer Applications  
Ensemble uses the mechanism for criteria selection from newly formed clusters with a defined portioning and joining methods to generate a single result instead of multiple solutions.  ...  This paper proposes a novel SMCA-based ensemble clustering algorithm for improvements over the existing issues defined in the paper.  ...  association matrices for non-parametric datasets.  ... 
doi:10.5120/17558-8171 fatcat:bxvkfob56jdjziu5kplllv64qy

Détection de signaux faibles dans des masses de données faiblement structurées

Julien Maitre, Michel Menard, Guillaume Chiron, Alain Bouju
2019 Recherche d'information document et web sémantique  
for document representations in a multi-dimensional space.  ...  We proposed 2 implementations of this idea, respectively able to: (1) find the best k for LDA in terms of topic consistency; (2) gather the optimal clusters from different levels of clustering.  ...  LDA is an (unsupervised) clustering method and does not assign a label to the clusters it finds.  ... 
doi:10.21494/iste.op.2020.0463 fatcat:2aabrvdlhjc5fhlsjsep7cbndm