Non-Parametric Document Clustering by Ensemble Methods
2008
Revista de Procesamiento de Lenguaje Natural (SEPLN)
This article presents a comparison of strategies for non-parametric document clustering by consensus. ...
Abstract: The biases of individual algorithms for non-parametric document clustering can lead to non-optimal solutions. ...
Acknowledgments This work has been partially funded by the European CHIL Project (IP-506909); the Commissionate for Universities and Research of the Department of Innovation, Universities and Enterprises ...
dblp:journals/pdln/GonzalezT08
fatcat:s5aot64axjeclfusuaqrlspjci
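The snippet above concerns combining several clusterings into a consensus without fixing the number of clusters in advance. The sketch below illustrates one common, generic consensus scheme: a co-association matrix thresholded into connected components, in the spirit of evidence accumulation. It is not claimed to be any of the strategies compared in that article. The function names, the 0.5 threshold and the toy label vectors are hypothetical.

```python
from itertools import combinations

def co_association(partitions):
    """Fraction of base partitions that place each pair of documents in the same cluster."""
    n, m = len(partitions[0]), len(partitions)
    counts = [[0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / m for c in row] for row in counts]

def consensus_clusters(partitions, threshold=0.5):
    """Link documents whose co-association exceeds `threshold` (a hypothetical default)
    and return connected components as labels; the number of clusters is not fixed beforehand."""
    C = co_association(partitions)
    n = len(C)
    parent = list(range(n))  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in combinations(range(n), 2):
        if C[i][j] > threshold:
            parent[find(i)] = find(j)

    # Relabel component roots as 0..k-1 in order of first appearance.
    relabel = {}
    return [relabel.setdefault(find(i), len(relabel)) for i in range(n)]

# Three toy base clusterings of five documents (illustrative data only).
partitions = [
    [0, 0, 1, 1, 2],
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 2],
]
print(consensus_clusters(partitions))  # [0, 0, 1, 1, 2]
```

Lowering the threshold merges more documents. With `threshold=0.3`, documents 2, 3 and 4 fall into a single cluster, because documents 2 and 4 co-occur in one of the three base partitions.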
A New Approach for Topic Detection using Adaptive Neural Networks
[article]
2019
arXiv
pre-print
In the first step we used the FuzzyART algorithm for the training phase. In the second step we used a classifier using Paragraph Vector for the test phase. ...
The comparative study of our approach on the 20 Newsgroups dataset showed that our approach is able to detect almost relevant topics. ...
Topic Identification Method For Textual Document: Jamil 2017. Topic identification is a crucial task for discovering knowledge from a textual document. ...
arXiv:1903.03775v1
fatcat:c34harbnrjahtn6l6uv6q5bnie
Domain Adaptation for Coreference Resolution: An Adaptive Ensemble Approach
2012
Conference on Empirical Methods in Natural Language Processing
This method has three features: (1) it can optimize for any user-specified objective measure; (2) it can make document-specific prediction rather than rely on a fixed base model or a fixed set of base ...
To the best of our knowledge, this work is the first to both (i) develop a domain adaptation algorithm for the coreference resolution problem and (ii) have the above features as an ensemble method. ...
Then, we choose the base model that performs best on the documents in $N(D_j^{(t)})$ as the method $f_j^{(t)*}$ for document $D_j^{(t)}$. Firstly, we employ the parametric distance (4) to measure the similarity ...
dblp:conf/emnlp/YangMXTCC12
fatcat:zqah4fwkzfgvrjogpuij7ccbxe
Using Text Mining and Clustering to Group Research Proposals for Research Project Selection
2015
2015 48th Hawaii International Conference on System Sciences
This paper applies an ensemble method to cluster research proposals to support research project selection. ...
Several clustering algorithms are applied to group the proposals, and the merits of each algorithm in text clustering are fully exploited through an ensemble method. ...
Gonzalez and Turno [32] proposed non-parametric ensemble methods for document clustering. Xu et al. ...
doi:10.1109/hicss.2015.153
dblp:conf/hicss/WangXJ15
fatcat:k4tkqar3jrfadn7ptzs77u2rjq
Distributed Non-Parametric Representations for Vital Filtering: UW at TREC KBA 2014
2014
Text Retrieval Conference
This approach of using word embeddings, non-parametric clustering, and staleness provides an efficient yet appropriate representation of entity contexts for the streaming setting, enabling accurate vital ...
The word embeddings provide accurate and compact summaries of observed entity contexts, further described by topic clusters that are estimated in a non-parametric manner. ...
We proposed a word embeddings based non-parametric representation of documents that groups entity references into topic clusters, and is suitable for streaming data. ...
dblp:conf/trec/CanoSG14
fatcat:nqob74sv55dyzgknx3izdtxcci
Enhancing web search result clustering model based on multiview multirepresentation consensus cluster ensemble (mmcc) approach
2021
PLoS ONE
ensemble method. ...
Existing text clustering methods utilize only one representation at a time (single view), whereas multiple views can represent documents. ...
[14] introduced the multiview clustering algorithmic method, where different ensemble methods are combined for a better effect. ...
doi:10.1371/journal.pone.0245264
pmid:33449949
fatcat:wkbwt3yxrzbxthhqc2ddhzmpoy
Cooking Is Creating Emotion: A Study on Hinglish Sentiments of Youtube Cookery Channels Using Semi-Supervised Approach
2019
Big Data and Cognitive Computing
We have modelled and evaluated both parametric and non-parametric learning algorithms. ...
A majority of the population in India speak and write a mixture of two languages known as Hinglish for casual communication on social media. ...
Different methods, like graph-based, wrapper-based, and topic-based methods, for labelling the data were compared in their work. ...
doi:10.3390/bdcc3030037
fatcat:zzx5snx3qze4rczaejyft5cngq
Apprentissage d'un espace de concepts de mots pour une nouvelle représentation des données textuelles [Learning a word-concept space for a new representation of textual data]
2010
Document Numérique
We then generalize this approach by extending the ÈÄË model for simultaneous clustering of documents and terms. ...
In a final step, we show the validity of our approach by comparing the result of this clustering with those obtained in the initial bag-of-words space and in the induced space of word groups ...
Most document clustering methods rely on the bag-of-words vector representation (Van Rijsbergen, 1979). ...
doi:10.3166/dn.13.1.63-82
fatcat:ggtw33sj7fdvhfbbcwtpfa36qy
Stability of Topic Modeling via Matrix Factorization
[article]
2017
arXiv
pre-print
To address this issue in the context of matrix factorization for topic modeling, we propose the use of ensemble learning strategies. ...
Topic models can provide us with an insight into the underlying latent structure of a large corpus of documents. ...
We carried out a non-parametric Friedman's Aligned Rank test (García et al., 2010) for each of the five measures previously reported (ADSD, ATS, PNMI, NPMI, and NMI) to test for the presence of statistically ...
arXiv:1702.07186v2
fatcat:r5s3iwgb4nhctkzqjd545kcpom
Stability of topic modeling via matrix factorization
2018
Expert systems with applications
To address this issue in the context of matrix factorization for topic modeling, we propose the use of ensemble learning strategies. ...
This corresponds to the concept of "instability" which has previously been studied in the context of k-means clustering. ...
We carried out a non-parametric Friedman's Aligned Rank test (García et al., 2010) for each of the five measures previously reported (ADSD, ATS, PNMI, NPMI, and NMI) to test for the presence of statistically ...
doi:10.1016/j.eswa.2017.08.047
fatcat:7l3qufvyzvbijogbmrioxhq63y
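The two entries above discuss "instability": repeated runs of a randomly initialised method produce different results, as previously studied for k-means. One simple, generic way to quantify this is the average pairwise normalized mutual information (NMI) between the label vectors from repeated runs. Values near 1 indicate stable runs. This is an assumption for illustration, not necessarily one of the measures reported in the paper. The sketch uses the geometric-mean normalization. The function names and toy labels are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def entropy(labels):
    """Shannon entropy (nats) of a label vector."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

def mutual_information(a, b):
    """Mutual information (nats) between two label vectors of equal length."""
    n = len(a)
    ca, cb = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    # p(x,y) * log(p(x,y) / (p(x) p(y))) rewritten with raw counts.
    return sum((c / n) * math.log(c * n / (ca[x] * cb[y])) for (x, y), c in joint.items())

def nmi(a, b):
    """NMI normalized by sqrt(H(a) * H(b)); single-cluster edge cases handled explicitly."""
    ha, hb = entropy(a), entropy(b)
    if ha == 0 or hb == 0:
        return 1.0 if ha == hb else 0.0
    return mutual_information(a, b) / math.sqrt(ha * hb)

def average_pairwise_nmi(runs):
    """Mean NMI over all pairs of runs: a simple (hypothetical) stability score."""
    pairs = list(combinations(runs, 2))
    return sum(nmi(a, b) for a, b in pairs) / len(pairs)

# Document assignments from three hypothetical runs of the same method.
runs = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 1, 0, 1]]
print(average_pairwise_nmi(runs))  # about 0.333
```

The first two runs are identical up to relabeling, which gives an NMI of 1. The third run is independent of both, which gives an NMI of 0. NMI is invariant to label permutations, so it suits comparing runs whose cluster or topic indices are arbitrary.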
Outlier Detection using AI: A Survey
[article]
2021
arXiv
pre-print
A broad range of OD methods is categorized into six major categories: Statistical-based, Distance-based, Density-based, Clustering-based, Learning-based, and Ensemble methods. ...
This survey aims to guide the reader to better understand recent progress of OD methods for the assurance of AI. ...
Ensemble methods are mostly used in ML for their superior solutions compared to other traditional methods. ...
arXiv:2112.00588v1
fatcat:yonfnhohpnbxxgiwrxon74mcny
Modélisation de HMM en contexte avec des arbres de décision pour la reconnaissance de mots manuscrits [Context-dependent HMM modeling with decision trees for handwritten word recognition]
2011
Document Numérique
We perform clustering on each state position using decision trees. In the test phase, these trees have the advantage of being able to associate a known model with a trigraph not seen during training. ...
Such modeling considerably increases the number of parameters to estimate, which leads us to consider parameter sharing. ...
This allows us to directly compare the two approaches, context-dependent and context-independent. ...
doi:10.3166/dn.14.2.29-52
fatcat:nerbukgbybeohj6qjvrvr3ticm
Big Data analytics. Three use cases with R, Python and Spark
[article]
2016
arXiv
pre-print
This article offers an introduction for statisticians to these technologies by comparing the performance obtained by the direct use of three reference environments: R, Python Scikit-learn, Spark MLlib ...
The main result is that, while Spark is very efficient for data munging and for recommendation by collaborative filtering (non-negative factorization), current implementations of conventional learning ...
arXiv:1609.09619v1
fatcat:qwvrxxkjung7palqtephlu7xpm
A Novel Ensemble based Cluster Analysis using Similarity Matrices and Clustering Algorithm (SMCA)
2014
International Journal of Computer Applications
The ensemble applies a criteria-selection mechanism to the newly formed clusters, with defined partitioning and joining methods, to produce a single result instead of multiple solutions. ...
This paper proposes a novel SMCA-based ensemble clustering algorithm to address the existing issues identified in the paper. ...
association matrices for non-parametric datasets. ...
doi:10.5120/17558-8171
fatcat:bxvkfob56jdjziu5kplllv64qy
Détection de signaux faibles dans des masses de données faiblement structurées [Detection of weak signals in large volumes of weakly structured data]
2019
Recherche d'information document et web sémantique
for document representations in a multi-dimensional space. ...
We proposed two implementations of this idea, able respectively to: (1) find the best k for LDA in terms of topic consistency; (2) gather the optimal clusters from different levels of clustering. ...
LDA is an (unsupervised) clustering method and does not assign a label to the clusters it finds. ...
doi:10.21494/iste.op.2020.0463
fatcat:2aabrvdlhjc5fhlsjsep7cbndm