Data mining in metric space

Rich Caruana, Alexandru Niculescu-Mizil
2004 Proceedings of the 2004 ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '04  
The three metrics that are appropriate when predictions are interpreted as probabilities: squared error, cross entropy, and calibration, lay in one part of metric space far away from metrics that depend  ...  In between them fall two metrics that depend on comparing predictions to a threshold: accuracy and F-score.  ...  The three ordering metrics, AUC, APR, and BEP, cluster close in metric space and exhibit strong pairwise correlations. These metrics clearly are similar to each other and somewhat interchangeable.  ... 
doi:10.1145/1014052.1014063 dblp:conf/kdd/CaruanaN04 fatcat:yugijt4mmbfadhk23rnx4uryyu

Mining distance-based outliers from large databases in any metric space

Yufei Tao, Xiaokui Xiao, Shuigeng Zhou
2006 Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '06  
An object o ∈ R is an outlier, if there exist less than k objects in R whose distances to o are at most r. The values of k, r, and the distance metric are provided by a user at the run time.  ...  The upper bound turns out to be extremely low in practice, e.g., less than 1% of R.  ...  We would like to thank the anonymous reviewers for their insightful comments.  ... 
doi:10.1145/1150402.1150447 dblp:conf/kdd/TaoXZ06 fatcat:cp4ihcwxwvcttllqirp4ggncu4
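
The outlier definition quoted in the snippet above can be illustrated with a brute-force check. This is only a minimal sketch of the definition, not the algorithm from the paper: the function name `distance_based_outliers` is hypothetical, and it assumes that an object is not counted as its own neighbor, which the snippet does not state.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def distance_based_outliers(
    objects: Sequence[T],
    dist: Callable[[T, T], float],
    k: int,
    r: float,
) -> List[T]:
    """Return every object with fewer than k other objects within distance r.

    Brute-force O(n^2) illustration of the definition only.
    Assumption: an object does not count as its own neighbor.
    """
    outliers = []
    for i, o in enumerate(objects):
        neighbors = 0
        for j, p in enumerate(objects):
            if i != j and dist(o, p) <= r:
                neighbors += 1
                if neighbors >= k:
                    break  # enough neighbors found, o cannot be an outlier
        if neighbors < k:
            outliers.append(o)
    return outliers


# Toy data on the real line with the absolute difference as the metric.
points = [0.0, 0.1, 0.2, 5.0]
result = distance_based_outliers(points, lambda a, b: abs(a - b), k=2, r=0.5)
print(result)
```

As in the snippet, `k`, `r`, and the distance function are supplied by the caller at run time; any function satisfying the metric axioms can be passed as `dist`.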

Scalable all-pairs similarity search in metric spaces

Ye Wang, Ahmed Metwally, Srinivasan Parthasarathy
2013 Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '13  
In this article, we propose a parallel framework for solving this problem in metric spaces.  ...  Novel elements of our solution include: i) flexible support for multiple metrics of interest; ii) an autonomic approach to partition the input dataset with minimal redundancy to achieve good load-balance  ...  In metric space, one way to form worksets is to let Ii, the inner set of workset Wi, be Pi. Oi, the outer set of Wi, is given by {pj | dist(pj, ci) ≤ (ri + t) ∧ pj ∉ Pi}.  ... 
doi:10.1145/2487575.2487625 dblp:conf/kdd/WangMP13 fatcat:5vtxulyq2rhf7joiossq46beu4
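
The outer-set formula quoted in the snippet above, Oi = {pj | dist(pj, ci) ≤ (ri + t) ∧ pj ∉ Pi}, can be sketched directly as a filter. This is a hedged illustration of that one formula, not the paper's parallel framework: the function name `outer_set`, the use of integer indices to represent membership in Pi, and the toy data are all assumptions made for the example.

```python
from typing import Callable, List, Sequence, Set, TypeVar

T = TypeVar("T")


def outer_set(
    points: Sequence[T],
    inner_idx: Set[int],
    center: T,
    radius: float,
    t: float,
    dist: Callable[[T, T], float],
) -> List[int]:
    """Indices j with dist(p_j, c_i) <= r_i + t and p_j not in P_i.

    inner_idx holds the indices of the partition P_i (the inner set),
    center is c_i, radius is r_i, and t is the similarity threshold.
    """
    return [
        j
        for j, p in enumerate(points)
        if j not in inner_idx and dist(p, center) <= radius + t
    ]


# Toy one-dimensional example with the absolute difference as the metric.
points = [0.0, 1.0, 2.5, 3.0, 10.0]
outer = outer_set(
    points,
    inner_idx={0, 1},
    center=0.5,
    radius=0.5,
    t=2.0,
    dist=lambda a, b: abs(a - b),
)
print(outer)
```

Points outside the partition but within `radius + t` of its center are the only ones that could be within distance `t` of some inner point, by the triangle inequality, which is why they are gathered into the workset.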

Fast Best-Match Shape Searching in Rotation Invariant Metric Spaces [chapter]

Dragomir Yankov, Eamonn Keogh, Li Wei, Xiaopeng Xi, Wendy Hodges
2007 Proceedings of the 2007 SIAM International Conference on Data Mining  
The algorithm can be utilized in a number of important data mining tasks such as shape clustering and classification, or for discovering of motifs and discords in image collections.  ...  In this work we explore the metric properties of the rotation invariant distance measures and propose an algorithm for fast similarity search in the shape space.  ...  Conclusions In this work we demonstrated that, under certain conditions, rotation invariant distance measures define a metric over the shape space, which implies that searching in this space could be highly  ... 
doi:10.1137/1.9781611972771.70 dblp:conf/sdm/YankovKWXH07 fatcat:ozhsroeorngy5meskmiqc2cewq

Cluster Ranking with an Application to Mining Mailbox Networks

Ziv Bar-Yossef, Ido Guy, Ronny Lempel, Yoelle Maarek, Vladimir Soroka
2006 IEEE International Conference on Data Mining. Proceedings  
To this end, we introduce a novel strength measure for clusters, the integrated cohesion, which is applicable to arbitrary weighted networks. We then present C-Rank: a new cluster ranking algorithm.  ...  C-Rank is well suited to mine such networks, since they are abundant with overlapping communities of highly variable strengths.  ...  Nevertheless, most of the classical work in the area (e.g., Fuzzy c-means) assumes the data points lie in a metric space, which respects the triangle inequality.  ... 
doi:10.1109/icdm.2006.35 dblp:conf/icdm/Bar-YossefGLMS06 fatcat:7kddbqdej5f2fhumprdoi6ojze

Cluster ranking with an application to mining mailbox networks

Ziv Bar-Yossef, Ido Guy, Ronny Lempel, Yoëlle S. Maarek, Vladimir Soroka
2007 Knowledge and Information Systems  
To this end, we introduce a novel strength measure for clusters, the integrated cohesion, which is applicable to arbitrary weighted networks. We then present C-Rank: a new cluster ranking algorithm.  ...  C-Rank is well suited to mine such networks, since they are abundant with overlapping communities of highly variable strengths.  ...  Nevertheless, most of the classical work in the area (e.g., Fuzzy c-means) assumes the data points lie in a metric space, which respects the triangle inequality.  ... 
doi:10.1007/s10115-007-0096-0 fatcat:jmpazzk2qven5nnuapkxnzf2fe

Data Integration via Constrained Clustering: An Application to Enzyme Clustering [chapter]

Elisa Boari de Lima, Raquel Cardoso de Melo Minardi, Wagner Meira, Mohammed Javeed Zaki
2011 Proceedings of the 2011 SIAM International Conference on Data Mining  
In this paper we propose constrained clustering as a strategy for integrating data sources without losing any information.  ...  We use constrained clustering as a means of integrating information from diverse sources as constraints, and analyze how this additional information impacts clustering quality in an enzyme clustering application  ...  Related Work Clustering is a data mining technique that groups similar objects without any supervised information.  ... 
doi:10.1137/1.9781611972818.8 dblp:conf/sdm/LimaMMZ11 fatcat:2edz6gbpwzdvpie7wdwu7jwje4

Text clustering as a mining task [chapter]

F. Mandreoli, R. Martoglia, P. Tiberio
2005 Text Mining and its Applications to Intelligence, CRM and Knowledge Management  
In this chapter we introduce readers to the various aspects of cluster analysis performed on textual data in a mining framework.  ...  Then, we focus on the importance and on the goals of clustering in a text mining scenario, analyzing and describing the issues which are specific to this particular field.  ...  data mining [4].  ... 
doi:10.2495/978-1-85312-995-7/03 fatcat:xi7aia5zhjdrxlwsihxoxi3evm

Robust Distance-Based Clustering with Applications to Spatial Data Mining

V. Estivill-Castro, M. E. Houle
2001 Algorithmica  
In this paper, we present a method for clustering geo-referenced data suitable for applications in spatial data mining, based on the medoid method.  ...  The medoid method is related to k-Means, with the restriction that cluster representatives be chosen from among the data elements.  ...  Combinatorial reclassification has been favored in Data Mining applications [46, 47].  ... 
doi:10.1007/s00453-001-0010-1 fatcat:zjjbzmlu2berth5tts5u6kc2oq

An efficient approach to external cluster assessment with an application to martian topography

R. Vilalta, T. Stepinski, M. Achari
2007 Data mining and knowledge discovery  
In the context of unsupervised learning or clustering, such tools delve inside large databases looking for alternative classification schemes that are meaningful and novel.  ...  The inherently large computational cost of this step is alleviated by first projecting all data over the single dimension that best separates both distributions (using Fisher's Linear Discriminant).  ...  Acknowledgements Thanks to the Lunar and Planetary Institute, which is operated by USRA under contract CAN-NCC5-679 with NASA, for facilitating data on Martian landscapes.  ... 
doi:10.1007/s10618-006-0045-7 fatcat:adyelvuienhzheftu4aqsgmv24

Attribute Level Clustering Approach to Quantitative Association Rule Mining

M. PhaniKrishnaKishore, Ashok Kumar Madamsetti
2014 International Journal of Computer Applications  
In this paper a new data driven partitioning algorithm has been proposed to discretize the ranges of the attributes.  ...  Discretization of the ranges of the attributes has been one of the challenging tasks in quantitative association rule mining that guides the rules generated.  ...  Let D be a data For each attribute with numerical values by using the clustering method as specified in section I, clustering is performed on single dimensional data with the distance metric as absolute  ... 
doi:10.5120/16598-6404 fatcat:ftsbtd3annc7tf75ghtkjtlnt4

List of twin clusters: a data structure for similarity joins in metric spaces

Rodrigo Paredes, Nora Reyes
2008 2008 IEEE 24th International Conference on Data Engineering Workshop  
Introduction. Some applications: data mining, data cleaning, and data integration. This version of similarity join translates into solving several range queries.  ...  Similarity Joins: range queries with threshold r for all elements in A. List of Clusters (LC): the LC splits the space into zones.  ...  LTC index stands out as a practical and efficient data structure to solve a particular case of similarity join. Work in Progress: the similarity self join.  ... 
doi:10.1109/icdew.2008.4498353 dblp:conf/icde/ParedesR08 fatcat:bfg5pdm7fffgfanhahj5xjoqtq

Near Neighbor Search in Large Metric Spaces

Sergey Brin
1995 Very Large Data Bases Conference  
In experiments, we find that GNAT's outperform previous data structures in a number of applications.  ...  Given user data, one often wants to find approximate matches in a large database. A good example of such a task is finding images similar to a given image in a large collection of images.  ...  Frank Olken, Luis Gravano, Edouard Bugnion, and Sandeep Singhal for helpful discussions and for listening to my endless ramblings.  ... 
dblp:conf/vldb/Brin95 fatcat:7ponswfxcja3joojvmqwe6gkjy

An Index Data Structure for Searching in Metric Space Databases [chapter]

Roberto Uribe, Gonzalo Navarro, Ricardo J. Barrientos, Mauricio Marín
2006 Lecture Notes in Computer Science  
This paper presents the Evolutionary Geometric Near-neighbor Access Tree (EGNAT) which is a new data structure devised for searching in metric space databases.  ...  All this indicates that the EGNAT is suitable for conducting similarity searches on very large metric space databases.  ...  Applications can be found in voice and image recognition, and data mining problems. Similarity can be modeled as a metric space as stated by the following definitions. Metric Space.  ... 
doi:10.1007/11758501_82 fatcat:3csp7gzw7bdrvc6nbz2gv4f7qm

A clustering comparison measure using density profiles and its application to the discovery of alternate clusterings

Eric Bae, James Bailey, Guozhu Dong
2010 Data mining and knowledge discovery  
In particular, it adopts a 'data mining style' philosophy to clustering comparison, whereby two clusterings are considered to be more similar, if they are likely to give rise to similar types of prediction  ...  Data clustering is a fundamental and very popular method of data analysis.  ...  This has important applications in situations where the data is evolving, such as stream clustering.  ... 
doi:10.1007/s10618-009-0164-z fatcat:5jxpwap4lnfrtafqmcl7k6wr3e