Non-Parametric Document Clustering by Ensemble Methods

Edgar González, Jordi Turmo
2008 Revista de Procesamiento de Lenguaje Natural (SEPLN)  
This article presents a comparison of strategies for non-parametric document clustering by consensus.  ...  Abstract: The biases of individual algorithms for non-parametric document clustering can lead to non-optimal solutions.  ...  Acknowledgments This work has been partially funded by the European CHIL Project (IP-506909); the Commissionate for Universities and Research of the Department of Innovation, Universities and Enterprises  ... 
dblp:journals/pdln/GonzalezT08 fatcat:s5aot64axjeclfusuaqrlspjci

Domain Adaptation for Coreference Resolution: An Adaptive Ensemble Approach

Jian-Bo Yang, Qi Mao, Qiaoliang Xiang, Ivor Wai-Hung Tsang, Kian Ming Adam Chai, Hai Leong Chieu
2012 Conference on Empirical Methods in Natural Language Processing  
We propose an adaptive ensemble method to adapt coreference resolution across domains.  ...  This method has three features: (1) it can optimize for any user-specified objective measure; (2) it can make document-specific prediction rather than rely on a fixed base model or a fixed set of base  ...  Acknowledgments This work is supported by DSO grant DSOCL10021.  ... 
dblp:conf/emnlp/YangMXTCC12 fatcat:zqah4fwkzfgvrjogpuij7ccbxe

Approche hybride de segmentation de pages à base d'un descripteur de traits

Medhi Fehli, Salvatore Tabbone, Maria V Ortiz Segovia
2014 Document Numérique  
We evaluate the performance of our approach by comparing it to the existing methods that participated in the ICDAR page segmentation competition.  ...  This method is applied to the segmentation of real images of scanned documents (newspapers and magazines) that contain text, lines and photo regions.  ...  This method requires the number of classes (clusters) to be supplied a priori as a parameter, which is its main drawback.  ... 
doi:10.3166/dn.17.2.9-30 fatcat:rq3s2vprsnf7lktcgtwr4zdeya

A Performance Evaluation of SMCA Using Similarity Association & Proximity Coefficient Relation For Hierarchical Clustering

Mayank Gupta, Ritesh Jain
2014 International Journal of Engineering Trends and Technology  
Cluster analysis is applied to the data set and the resulting clusters are characterized by the features of the patterns that belong to these clusters.  ...  Clustering applications are used extensively in various fields such as artificial intelligence, pattern recognition, economics, ecology, psychiatry and marketing. There are several algorithms and methods  ...  Some of the approaches also lead towards non-parametric data arrangements, as in [16]. This paper presents a comparison of strategies for non-parametric document ensemble clustering [12].  ... 

Distributed Non-Parametric Representations for Vital Filtering: UW at TREC KBA 2014

Ignacio Cano, Sameer Singh, Carlos Guestrin
2014 Text Retrieval Conference  
The word embeddings provide accurate and compact summaries of observed entity contexts, further described by topic clusters that are estimated in a non-parametric manner.  ...  This approach of using word embeddings, non-parametric clustering, and staleness provides an efficient yet appropriate representation of entity contexts for the streaming setting, enabling accurate vital  ...  , a Semiconductor Research Corporation program sponsored by MARCO and DARPA.  ... 
dblp:conf/trec/CanoSG14 fatcat:nqob74sv55dyzgknx3izdtxcci

Using Text Mining and Clustering to Group Research Proposals for Research Project Selection

Yibo Wang, Wei Xu, Hongxun Jiang
2015 2015 48th Hawaii International Conference on System Sciences  
This paper applies an ensemble method to cluster research proposals to support research project selection.  ...  Several cluster algorithms are applied to group the proposals and the merits of each algorithm in text clustering are made full use of in the form of ensemble method.  ...  Acknowledgments This work was supported in part by 973 Project  ... 
doi:10.1109/hicss.2015.153 dblp:conf/hicss/WangXJ15 fatcat:k4tkqar3jrfadn7ptzs77u2rjq

Apprentissage d'un espace de concepts de mots pour une nouvelle représentation des données textuelles

Young-Min Kim, Jean-François Pessiot, Massih-Reza Amini, Patrick Gallinari
2010 Document Numérique  
We then generalize this approach by extending the PLSA model for a simultaneous clustering of documents and terms.  ...  We hence find term clusters using a classification version of the EM algorithm (CEM) and documents are then represented in the space of these term clusters.  ...  Most document clustering methods rely on the bag-of-words vector representation (Van Rijsbergen, 1979).  ... 
doi:10.3166/dn.13.1.63-82 fatcat:ggtw33sj7fdvhfbbcwtpfa36qy

Cooking Is Creating Emotion: A Study on Hinglish Sentiments of Youtube Cookery Channels Using Semi-Supervised Approach

Gagandeep Kaur, Abhishek Kaushik, Shubham Sharma
2019 Big Data and Cognitive Computing  
We have modelled and evaluated both parametric and non-parametric learning algorithms.  ...  By analyzing those comments we could provide insight to the Youtubers that would help them to deliver better quality. Youtube is very popular in India.  ...  Density-based spatial clustering of applications with noise (DBSCAN) is a well-known density-based non-parametric clustering algorithm used in machine learning and data mining.  ... 
doi:10.3390/bdcc3030037 fatcat:zzx5snx3qze4rczaejyft5cngq

Détection de signaux faibles dans des masses de données faiblement structurées

Julien Maitre, Michel Menard, Guillaume Chiron, Alain Bouju
2019 Recherche d'information document et web sémantique  
We also proposed a non-traditional visualization approach based on a multi-agent system which combines both dimension reduction and interactivity.  ...  This paper is related to a project aiming at discovering weak signals from different streams of information, possibly sent by whistleblowers on a platform such as GlobalLeaks.  ...  LDA is an (unsupervised) clustering method and does not assign a label to the clusters it finds.  ... 
doi:10.21494/iste.op.2020.0463 fatcat:2aabrvdlhjc5fhlsjsep7cbndm

Enhancing web search result clustering model based on multiview multirepresentation consensus cluster ensemble (mmcc) approach

Ali Sabah, Sabrina Tiun, Nor Samsiah Sani, Masri Ayob, Adil Yaseen Taha, Gulistan Raja
2021 PLoS ONE  
ensemble method.  ...  The overlapping clusters are obtained from the candidate solutions created by different clustering methods.  ...  In addition, cluster ensemble results are remarkably affected by the level of document representation [36] .  ... 
doi:10.1371/journal.pone.0245264 pmid:33449949 fatcat:wkbwt3yxrzbxthhqc2ddhzmpoy

Stability of Topic Modeling via Matrix Factorization [article]

Mark Belford and Brian Mac Namee and Derek Greene
2017 arXiv   pre-print
Topic models can provide us with an insight into the underlying latent structure of a large corpus of documents.  ...  This corresponds to the concept of "instability" which has previously been studied in the context of k-means clustering.  ...  This research was supported by Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289.  ... 
arXiv:1702.07186v2 fatcat:r5s3iwgb4nhctkzqjd545kcpom

Text Classification Techniques: A Literature Review

2018 Interdisciplinary Journal of Information, Knowledge, and Management  
Future Research: In the future, better methods for parameter optimization will be identified by selecting better parameters that reflect effective knowledge discovery.  ...  approaches for knowledge distillation, multilingual text refining, domain knowledge integration, subjectivity detection, and contrastive viewpoint summarization are some of the areas that could be explored by  ...  Non-parametric models: A model that cannot summarize data based on underlying parameters is called a non-parametric model.  ... 
doi:10.28945/4066 fatcat:6dio5bpajjf77lkrs7xdtciveu

Stability of topic modeling via matrix factorization

Mark Belford, Brian Mac Namee, Derek Greene
2018 Expert systems with applications  
This corresponds to the concept of "instability" which has previously been studied in the context of k-means clustering.  ...  A range of methods have been proposed in the literature, including probabilistic topic models and techniques based on matrix factorization.  ...  This research was supported by Science Foundation Ireland (SFI) under Grant Number SFI/12/RC/2289.  ... 
doi:10.1016/j.eswa.2017.08.047 fatcat:7l3qufvyzvbijogbmrioxhq63y

A Novel Ensemble based Cluster Analysis using Similarity Matrices and Clustering Algorithm (SMCA)

Mayank Gupta, Dhanraj Varma
2014 International Journal of Computer Applications  
Ensemble uses the mechanism for criteria selection from newly formed clusters with a defined portioning and joining methods to generate a single result instead of multiple solutions.  ...  Over the last few years various schemes have been suggested by different authors for improving the performance of traditional clustering algorithms. Among them, one is ensemble-based clustering.  ...  association matrices for non-parametric datasets.  ... 
doi:10.5120/17558-8171 fatcat:bxvkfob56jdjziu5kplllv64qy

Proposition pour l'intégration des réseaux petits mondes en recherche d'information

Mohamed Khazri, Mohamed Tmar, Mohamed Abid, Mohand Boughanem
2009 ARIMA  
We propose in this paper an approach for document clustering. It consists of representing the corpus as a document graph, where the links are defined by some criteria.  ...  These links are quantified by similarity measures. We aim to bring this context into the classification approach to build small-world networks of homogeneous documents.  ...  It holds that, to retrieve more relevant documents and discard more non-relevant ones, it suffices to find the function that separates the two sets.  ... 
doi:10.46298/arima.1925 fatcat:z5ctzznsuzembmh6gt3vcybzjq