
Avoiding Bias in Text Clustering Using Constrained K-means and May-Not-Links [chapter]

M. Eduardo Ares, Javier Parapar, Álvaro Barreiro
2009 Lecture Notes in Computer Science  
In this paper we present a new clustering algorithm which extends the traditional batch k-means enabling the introduction of domain knowledge in the form of Must, Cannot, May and May-Not rules between  ...  Besides, we have applied the presented method to the task of avoiding bias in clustering.  ...  Acknowledgements: This work was co-funded by FEDER, SEUI and Xunta de Galicia under projects TIN2008-06566-C04-04 and 07SIN005206PR and FPU grant AP2007-02476.  ... 
doi:10.1007/978-3-642-04417-5_32 fatcat:ly5wkjqxhva7pfdewo6cawdc4e

Improving Alternative Text Clustering Quality in the Avoiding Bias Task with Spectral and Flat Partition Algorithms [chapter]

M. Eduardo Ares, Javier Parapar, Álvaro Barreiro
2010 Lecture Notes in Computer Science  
The first approach tries to introduce these constraints in the core of the constrained normalised cut clustering, while the second one combines spectral clustering and soft constrained k-means.  ...  The problems of finding alternative clusterings and avoiding bias have gained popularity over the last years.  ...  This work was co-funded by FEDER, Ministerio de Ciencia e Innovación, Xunta de Galicia and Ministerio de Educación under projects TIN2008-06566-C04-04 and 07SIN005206PR and FPU grant AP2007-02476.  ... 
doi:10.1007/978-3-642-15251-1_32 fatcat:nxm7mnrcz5bl3inmvz443umdta

Clustering Genes Using Heterogeneous Data Sources

Erliang Zeng, Chengyong Yang, Tao Li, Giri Narasimhan
2010 International Journal of Knowledge Discovery in Bioinformatics  
Data sources may be complete or incomplete depending on whether or not they provide information about every gene in the genome.  ...  of constraints sets and robustness of the constrained clustering algorithm using multiple sources of incomplete biological data, and incorporating such incomplete data into constrained clustering algorithm  ...  In this paper, the CMI was used to compare the performance of the four clustering methods: K-means clustering of expression data, K-means clustering of text data, K-means clustering of the feature-level  ... 
doi:10.4018/jkdb.2010040102 fatcat:i65e5huzurcord6yaojw44jknu

Language Modelling of Constraints for Text Clustering [chapter]

Javier Parapar, Álvaro Barreiro
2012 Lecture Notes in Computer Science  
Constrained clustering is a recently presented family of semi-supervised learning algorithms. These methods use domain information to impose constraints over the clustering output.  ...  by means of their language modelling.  ...  honour at the end of the clustering process (May-Link and May-Not-Link for positive and negative constraints respectively).  ... 
doi:10.1007/978-3-642-28997-2_30 fatcat:nh3xkas2gfe6dlgdpgql4jzgfa

New Survey Questions and Estimators for Network Clustering with Respondent-driven Sampling Data

Ashton M. Verdery, Jacob C. Fisher, Nalyn Siripong, Kahina Abdesselam, Shawn Bauldry
2017 Sociological methodology  
Drawing on recent advances in computer science, we introduce a set of data collection instruments and RDS estimators for network clustering, an important topological property that has been linked to a  ...  We find that clustering coefficient estimators retain desirable properties in RDS samples.  ...  Giovanna Merli, Ann Jolly, and Anne DeLessio-Parson for providing information about aspects of the empirical cases we examine.  ... 
doi:10.1177/0081175017716489 pmid:30337767 pmcid:PMC6191199 fatcat:stdi6ggaineh3noykkqb72hyd4

Multimodal ranking for image search on community databases

Fabian Richter, Stefan Romberg, Eva Hörster, Rainer Lienhart
2010 Proceedings of the international conference on Multimedia information retrieval - MIR '10  
The image ranking approach presented in this work represents an image collection as a graph that is built using a multimodal similarity measure based on visual features and user tags.  ...  Further we discuss several scalability issues of the proposed approach and show how in this framework queries can be answered fast.  ...  As can be seen from Figure 6 the best results have been obtained by using k = 100 and k = 250 neighbors. Using more neighbors may introduce noise that degrades the quality of the link structure.  ... 
doi:10.1145/1743384.1743402 dblp:conf/mir/RichterRHL10 fatcat:ylplqntbq5go7nm25wrcfvi35i

Clustering with Soft and Group Constraints [chapter]

Martin H. C. Law, Alexander Topchy, Anil K. Jain
2004 Lecture Notes in Computer Science  
We develop a new clustering algorithm that extends mixture clustering in the presence of (i) soft constraints, and (ii) group-level constraints.  ...  Empirical study demonstrates that the use of soft constraints results in superior data partitions normally unattainable without constraints.  ...  A constrained k-means algorithm is proposed in [4]: must-link data points are replaced by their centroid, and a data point is assigned to the closest cluster center that does not violate any constraints  ... 
doi:10.1007/978-3-540-27868-9_72 fatcat:g6acmcfsbnaspdtjtmkzglzwpu
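
The constrained assignment rule quoted in the snippet above (each point goes to the closest cluster center that does not violate any constraint) can be illustrated with a minimal sketch. This is not the code of the cited work: the function name `assign_with_cannot_links`, the representation of constraints as index pairs, and the in-order processing of points are all assumptions made for illustration, and must-link groups are assumed to have been collapsed to their centroids beforehand.

```python
import math


def assign_with_cannot_links(points, centers, cannot_link):
    """Assign each point to the nearest center that breaks no cannot-link pair.

    points, centers: lists of coordinate tuples of equal dimension.
    cannot_link: set of frozenset({i, j}) point-index pairs that must not
        share a cluster (hypothetical representation, chosen for this sketch).
    Returns a list of cluster indices; raises ValueError when some point has
    no feasible cluster.
    """
    labels = [None] * len(points)
    for i, p in enumerate(points):
        # Try the centers from nearest to farthest.
        order = sorted(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
        for c in order:
            # Center c is infeasible if an already-assigned point in cluster c
            # is cannot-linked to point i.
            conflict = any(
                labels[j] == c and frozenset((i, j)) in cannot_link
                for j in range(i)
            )
            if not conflict:
                labels[i] = c
                break
        if labels[i] is None:
            raise ValueError(f"no feasible cluster for point {i}")
    return labels


points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
centers = [(0.0, 0.0), (5.0, 5.0)]
unconstrained = assign_with_cannot_links(points, centers, set())
constrained = assign_with_cannot_links(points, centers, {frozenset((0, 1))})
```

In this toy input, points 0 and 1 are close together, so the cannot-link pair forces point 1 into the farther cluster. A full algorithm would alternate this assignment step with recomputing the centers, as in ordinary k-means.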

NeSyChair: Automatic Conference Scheduling Combining Neuro-Symbolic Representations and Constrained Clustering

Tadej Skvorc, Nada Lavrac, Marko Robnik-Sikonja
2022 IEEE Access  
in European News Media).  ...  ACKNOWLEDGMENT We are grateful to general and program chairs of the ECML PKDD 2017 conference for giving us access to the accepted papers and metadata of the conference.  ...  We assigned papers to the conference schedule using the modified, constrained k-means clustering algorithm that takes into account the size and the number of clusters.  ... 
doi:10.1109/access.2022.3144932 fatcat:rkn7nut5brbz5e6vsgtlftjhx4

Clustering with Balancing Constraints [chapter]

Joydeep Ghosh, Ayhan Demiriz
2008 Constrained Clustering  
Next, we discuss how frequency sensitive competitive learning can be used for balanced clustering in both batch and on-line scenarios, and illustrate the mechanism with a case study of clustering directional  ...  data such as text documents.  ...  Constrained Clustering: Advances in Algorithms, Theory and Applications  ... 
doi:10.1201/9781584889977.ch8 fatcat:kj5gtm37ebbmtcvk3zw2dw2bde

Efficient community detection in large networks using content and links

Yiye Ruan, David Fuhry, Srinivasan Parthasarathy
2013 Proceedings of the 22nd international conference on World Wide Web - WWW '13  
The resulting backbone graph can be clustered using standard community discovery algorithms such as Metis and Markov clustering.  ...  In this paper we discuss a very simple approach of combining content and link information in graph structures for the purpose of community discovery, a fundamental task in network analysis.  ...  Finally, we also adapt LDA and K-means algorithm to cluster graph nodes using content information only.  ... 
doi:10.1145/2488388.2488483 dblp:conf/www/RuanFP13 fatcat:xjlszioaxfai5h4h6a7hkopc6e

Efficient Community Detection in Large Networks using Content and Links [article]

Yiye Ruan and David Fuhry and Srinivasan Parthasarathy
2012 arXiv   pre-print
The resulting backbone graph can be clustered using standard community discovery algorithms such as Metis and Markov clustering.  ...  In this paper we discuss a very simple approach of combining content and link information in graph structures for the purpose of community discovery, a fundamental task in network analysis.  ...  Acknowledgements This work is sponsored by NSF SoCS Award #IIS-1111118, "Social Media Enhanced Organizational Sensemaking in Emergency Response".  ... 
arXiv:1212.0146v1 fatcat:53e7o3s7lzd4vanjikvrvlfryu

Non-redundant data clustering

David Gondek, Thomas Hofmann
2006 Knowledge and Information Systems  
We present experimental results for applications in text mining and computer vision.  ...  Algorithmically, one can apply an alternating optimization scheme that can be used in conjunction with different types of numeric and non-numeric attributes.  ...  Using the above news story example, one may want to condition on the occurrence of certain geographic terms such as country and city names to introduce a bias that favors document clusters that are not  ... 
doi:10.1007/s10115-006-0009-7 fatcat:xn56x5qvtrfklefvvq4pmetdt4

Graph-Based Clustering with Constraints [chapter]

Rajul Anand, Chandan K. Reddy
2011 Lecture Notes in Computer Science  
The proposed approach and its variants are evaluated on UCI datasets and compared with the other constrained-clustering algorithms which embed constraints in a similar fashion.  ...  In this paper, we propose a constrained graph-based clustering method and argue that adding constraints in distance function before graph partitioning will lead to better results.  ...  In this work, the primary emphasis is to demonstrate that adding constraints to graph-based clustering can potentially avoid this problem at least sometimes, if not always.  ... 
doi:10.1007/978-3-642-20847-8_5 fatcat:jsigv3iuzfcbzcnmqmzqkp2cca

Semi-supervised Collaborative Clustering with Partial Background Knowledge

Germain Forestier, Cédric Wemmert, Pierre Gançarski
2008 2008 IEEE International Conference on Data Mining Workshops  
In this paper we present a new algorithm for semi-supervised clustering. We assume to have a small set of labeled samples and we use it in a clustering algorithm to discover relevant patterns.  ...  Indeed, in complex problems, the user is not always able to produce samples for each class present in the dataset.  ...  In [10], Wagstaff et al. present a constrained version of the k-means algorithm which uses such constraints to bias the affectation of the objects to the clusters.  ... 
doi:10.1109/icdmw.2008.116 dblp:conf/icdm/ForestierWG08 fatcat:tjq5rxvvkvbp7lu4app4rn6kxe

The clustering of the first galaxy haloes

Darren S. Reed, Richard Bower, Carlos S. Frenk, Adrian Jenkins, Tom Theuns
2009 Monthly notices of the Royal Astronomical Society  
This implies "non-universality" in the scale-dependence of halo clustering, at least for the commonly used parameterizations of the scale-dependence of bias that we consider.  ...  We provide a fit for the scale-dependence of bias in our results.  ...  ACKNOWLEDGMENTS DR is a post-doc LANL, and is supported by the DOE through the IGPP, the LDRD-DR and the LDRD-ER programs at LANL.  ... 
doi:10.1111/j.1365-2966.2008.14333.x fatcat:42rriaal4feifliyyii46b4i7y