Data mining in metric space
2004
Proceedings of the 2004 ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '04
The three metrics that are appropriate when predictions are interpreted as probabilities (squared error, cross entropy, and calibration) lie in one part of metric space, far away from metrics that depend ...
In between them fall two metrics that depend on comparing predictions to a threshold: accuracy and F-score. ...
The three ordering metrics, AUC, APR, and BEP, cluster close in metric space and exhibit strong pairwise correlations. These metrics clearly are similar to each other and somewhat interchangeable. ...
doi:10.1145/1014052.1014063
dblp:conf/kdd/CaruanaN04
fatcat:yugijt4mmbfadhk23rnx4uryyu
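To make the snippet's three groups concrete, here is a minimal Python sketch that computes one or two metrics from each group on a toy set of 0/1 labels and predicted probabilities: squared error and cross entropy (probability metrics), accuracy (a threshold metric), and AUC (an ordering metric). The function names, the 0.5 threshold, and the data are illustrative assumptions, not code from the paper.

```python
import math

def squared_error(y, p):
    # Mean squared difference between 0/1 labels and predicted probabilities.
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p)) / len(y)

def cross_entropy(y, p, eps=1e-12):
    # Mean negative log-likelihood; eps keeps log() away from 0.
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1 - eps)
        total -= yi * math.log(pi) + (1 - yi) * math.log(1 - pi)
    return total / len(y)

def accuracy(y, p, threshold=0.5):
    # Threshold metric: each prediction is compared to a fixed cutoff.
    return sum((pi >= threshold) == bool(yi) for yi, pi in zip(y, p)) / len(y)

def auc(y, p):
    # Ordering metric: fraction of (positive, negative) pairs ranked
    # correctly, with ties counted as one half.
    pos = [pi for yi, pi in zip(y, p) if yi == 1]
    neg = [pi for yi, pi in zip(y, p) if yi == 0]
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data.
y = [1, 0, 1, 1, 0]
p = [0.9, 0.4, 0.6, 0.3, 0.2]
print(squared_error(y, p), cross_entropy(y, p), accuracy(y, p), auc(y, p))
```

Because AUC depends only on how the predictions are ranked, a monotone transform of `p` (for example squaring every prediction) leaves it unchanged while squared error changes; this rank-only dependence is what sets the ordering metrics apart from the probability metrics.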
Mining distance-based outliers from large databases in any metric space
2006
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '06
An object o ∈ R is an outlier if there exist fewer than k objects in R whose distances to o are at most r. The values of k, r, and the distance metric are provided by a user at run time. ...
The upper bound turns out to be extremely low in practice, e.g., less than 1% of R. ...
We would like to thank the anonymous reviewers for their insightful comments. ...
doi:10.1145/1150402.1150447
dblp:conf/kdd/TaoXZ06
fatcat:cp4ihcwxwvcttllqirp4ggncu4
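The outlier definition in the snippet can be checked directly with a brute-force sketch that accepts any user-supplied distance function. This is a naive O(|R|^2) baseline, not the paper's algorithm; the function names are hypothetical, and the sketch follows the definition literally, so o itself (at distance 0 from itself) counts toward the k objects.

```python
import math

def is_outlier(i, R, k, r, dist):
    """True if fewer than k objects of R lie within distance r of R[i]."""
    count = 0
    for x in R:
        if dist(R[i], x) <= r:
            count += 1
            if count >= k:  # enough neighbours found: stop scanning early
                return False
    return True

def outliers(R, k, r, dist):
    # Indices of all distance-based outliers; k, r and dist are run-time inputs.
    return [i for i in range(len(R)) if is_outlier(i, R, k, r, dist)]

# Euclidean points, then the same idea with absolute difference on numbers.
print(outliers([(0, 0), (1, 0), (0, 1), (5, 5)], k=2, r=1.5, dist=math.dist))
print(outliers([0, 1, 2, 10], k=2, r=1.5, dist=lambda a, b: abs(a - b)))
```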
Scalable all-pairs similarity search in metric spaces
2013
Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '13
In this article, we propose a parallel framework for solving this problem in metric spaces. ...
Novel elements of our solution include: i) flexible support for multiple metrics of interest; ii) an autonomic approach to partition the input dataset with minimal redundancy to achieve good load-balance ...
In metric space, one way to form worksets is to let $I_i$, the inner set of workset $W_i$, be $P_i$. $O_i$, the outer set of $W_i$, is given by $\{p_j \mid \mathrm{dist}(p_j, c_i) \le (r_i + t) \wedge p_j \notin P_i\}$. ...
doi:10.1145/2487575.2487625
dblp:conf/kdd/WangMP13
fatcat:5vtxulyq2rhf7joiossq46beu4
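The workset construction in the snippet can be sketched sequentially (the paper's framework is parallel). The snippet does not say how $P_i$ or $r_i$ are chosen, so this sketch assumes each point goes to its nearest center $c_i$ and that $r_i$ is the covering radius of $P_i$. Under that assumption the outer set is sufficient: if $a \in P_i$ and $\mathrm{dist}(a, b) \le t$, the triangle inequality gives $\mathrm{dist}(b, c_i) \le t + r_i$, so $b \in I_i \cup O_i$. Function names are hypothetical.

```python
import math

def build_worksets(points, centers, t, dist=math.dist):
    """Return one (inner, outer) pair of index sets per center.

    Assumptions (not from the paper): points go to their nearest center,
    and r_i is the covering radius of P_i (largest distance from c_i).
    """
    parts = [set() for _ in centers]
    for j, p in enumerate(points):
        nearest = min(range(len(centers)), key=lambda k: dist(p, centers[k]))
        parts[nearest].add(j)
    worksets = []
    for c_i, inner in zip(centers, parts):
        r_i = max((dist(points[j], c_i) for j in inner), default=0.0)
        # O_i = {p_j | dist(p_j, c_i) <= r_i + t and p_j not in P_i}
        outer = {j for j, p in enumerate(points)
                 if dist(p, c_i) <= r_i + t and j not in inner}
        worksets.append((inner, outer))
    return worksets

def all_pairs(points, centers, t, dist=math.dist):
    """All index pairs (a, b), a < b, with dist <= t, found workset by workset."""
    result = set()
    for inner, outer in build_worksets(points, centers, t, dist):
        for a in inner:
            for b in inner | outer:
                if a != b and dist(points[a], points[b]) <= t:
                    result.add((min(a, b), max(a, b)))  # dedupe across worksets
    return sorted(result)

pts = [(0, 0), (1, 0), (4, 0), (5, 0)]
print(all_pairs(pts, centers=[(0, 0), (5, 0)], t=3))
```

Each workset can be processed independently, which is what makes the scheme amenable to parallel execution; the set in `all_pairs` removes pairs reported by two worksets.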
Fast Best-Match Shape Searching in Rotation Invariant Metric Spaces
[chapter]
2007
Proceedings of the 2007 SIAM International Conference on Data Mining
The algorithm can be utilized in a number of important data mining tasks such as shape clustering and classification, or for discovering motifs and discords in image collections. ...
In this work we explore the metric properties of the rotation invariant distance measures and propose an algorithm for fast similarity search in the shape space. ...
Conclusions: In this work we demonstrated that, under certain conditions, rotation invariant distance measures define a metric over the shape space, which implies that searching in this space could be highly ...
doi:10.1137/1.9781611972771.70
dblp:conf/sdm/YankovKWXH07
fatcat:ozhsroeorngy5meskmiqc2cewq
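One common way to build a rotation invariant distance, assumed here rather than taken from the paper, is to encode each shape as an equal-length 1-D signature so that rotating the shape cyclically shifts the signature, and then take the minimum Euclidean distance over all shifts. Because cyclic shifts preserve Euclidean distance, this minimum is symmetric and satisfies the triangle inequality, while two rotations of the same shape are at distance 0. The linear-scan `best_match` below is a hypothetical baseline, not the paper's fast search.

```python
import math

def euclidean(a, b):
    # Plain Euclidean distance between equal-length sequences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rotation_invariant_distance(a, b):
    """Minimum Euclidean distance between `a` and every cyclic shift of `b`."""
    if len(a) != len(b):
        raise ValueError("signatures must have equal length")
    a, b = list(a), list(b)
    return min(euclidean(a, b[s:] + b[:s]) for s in range(len(b)))

def best_match(query, database):
    # Index of the database signature closest to `query` under rotation.
    return min(range(len(database)),
               key=lambda i: rotation_invariant_distance(query, database[i]))

print(rotation_invariant_distance([1, 2, 3, 4], [3, 4, 1, 2]))
print(best_match([1, 2, 3, 4], [[9, 9, 9, 9], [4, 1, 2, 3]]))
```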
Cluster Ranking with an Application to Mining Mailbox Networks
2006
IEEE International Conference on Data Mining. Proceedings
To this end, we introduce a novel strength measure for clusters, the integrated cohesion, which is applicable to arbitrary weighted networks. We then present C-Rank: a new cluster ranking algorithm. ...
C-Rank is well suited to mine such networks, since they are abundant with overlapping communities of highly variable strengths. ...
Nevertheless, most of the classical work in the area (e.g., Fuzzy c-means) assumes the data points lie in a metric space, which respects the triangle inequality. ...
doi:10.1109/icdm.2006.35
dblp:conf/icdm/Bar-YossefGLMS06
fatcat:7kddbqdej5f2fhumprdoi6ojze
Cluster ranking with an application to mining mailbox networks
2007
Knowledge and Information Systems
To this end, we introduce a novel strength measure for clusters, the integrated cohesion, which is applicable to arbitrary weighted networks. We then present C-Rank: a new cluster ranking algorithm. ...
C-Rank is well suited to mine such networks, since they are abundant with overlapping communities of highly variable strengths. ...
Nevertheless, most of the classical work in the area (e.g., Fuzzy c-means) assumes the data points lie in a metric space, which respects the triangle inequality. ...
doi:10.1007/s10115-007-0096-0
fatcat:jmpazzk2qven5nnuapkxnzf2fe
Data Integration via Constrained Clustering: An Application to Enzyme Clustering
[chapter]
2011
Proceedings of the 2011 SIAM International Conference on Data Mining
In this paper we propose constrained clustering as a strategy for integrating data sources without losing any information. ...
We use constrained clustering as a means of integrating information from diverse sources as constraints, and analyze how this additional information impacts clustering quality in an enzyme clustering application ...
Related Work: Clustering is a data mining technique that groups similar objects without any supervised information. ...
doi:10.1137/1.9781611972818.8
dblp:conf/sdm/LimaMMZ11
fatcat:2edz6gbpwzdvpie7wdwu7jwje4
Text clustering as a mining task
[chapter]
2005
Text Mining and its Applications to Intelligence, CRM and Knowledge Management
In this chapter we introduce readers to the various aspects of cluster analysis performed on textual data in a mining framework. ...
Then, we focus on the importance and on the goals of clustering in a text mining scenario, analyzing and describing the issues which are specific to this particular field. ...
data mining [4]. ...
doi:10.2495/978-1-85312-995-7/03
fatcat:xi7aia5zhjdrxlwsihxoxi3evm
Robust Distance-Based Clustering with Applications to Spatial Data Mining
2001
Algorithmica
In this paper, we present a method for clustering geo-referenced data suitable for applications in spatial data mining, based on the medoid method. ...
The medoid method is related to k-Means, with the restriction that cluster representatives be chosen from among the data elements. ...
Combinatorial reclassification has been favored in Data Mining applications [46, 47]. ...
doi:10.1007/s00453-001-0010-1
fatcat:zjjbzmlu2berth5tts5u6kc2oq
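The medoid restriction (cluster representatives must be data elements) can be illustrated with a simple alternating k-medoids sketch: assign each point to its nearest medoid, then replace each medoid with the cluster member that minimizes the total distance to the other members. This generic variant is chosen for brevity and is not the combinatorial reclassification procedure the paper favors; the initial medoid indices are supplied by the caller, and Euclidean `math.dist` is an assumed default metric.

```python
import math

def k_medoids(points, init_medoids, dist=math.dist, max_iter=100):
    """Alternating k-medoids; medoids are indices into `points`."""
    medoids = list(init_medoids)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest medoid.
        clusters = {m: [] for m in medoids}
        for j, p in enumerate(points):
            nearest = min(medoids, key=lambda m: dist(p, points[m]))
            clusters[nearest].append(j)
        # Update step: the new representative must be a member of the cluster.
        new_medoids = []
        for m, members in clusters.items():
            if not members:  # only possible when medoids share coordinates
                new_medoids.append(m)
                continue
            best = min(members,
                       key=lambda c: sum(dist(points[c], points[q]) for q in members))
            new_medoids.append(best)
        if sorted(new_medoids) == sorted(medoids):
            break  # converged: no representative changed
        medoids = new_medoids
    return sorted(medoids)

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(k_medoids(pts, init_medoids=[0, 1]))
```

Because every representative is an actual data element, the sketch only ever evaluates distances between data points, so it works with any distance function passed as `dist`.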
An efficient approach to external cluster assessment with an application to martian topography
2007
Data mining and knowledge discovery
In the context of unsupervised learning or clustering, such tools delve inside large databases looking for alternative classification schemes that are meaningful and novel. ...
The inherently large computational cost of this step is alleviated by first projecting all data over the single dimension that best separates both distributions (using Fisher's Linear Discriminant). ...
Acknowledgements: Thanks to the Lunar and Planetary Institute, which is operated by USRA under contract CAN-NCC5-679 with NASA, for facilitating data on Martian landscapes. ...
doi:10.1007/s10618-006-0045-7
fatcat:adyelvuienhzheftu4aqsgmv24
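The projection step in the snippet can be sketched with NumPy: Fisher's Linear Discriminant for two samples uses the direction $w \propto S_W^{-1}(m_1 - m_2)$, where $m_1, m_2$ are the sample means and $S_W$ is the within-class scatter matrix, and each point is then reduced to the scalar $x \cdot w$. The two samples `X1` and `X2` are hypothetical stand-ins for the two distributions, and the sketch assumes $S_W$ is invertible.

```python
import numpy as np

def fisher_direction(X1, X2):
    """Unit vector w proportional to S_W^{-1} (m1 - m2)."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    # Within-class scatter: sum of the two centred scatter matrices.
    S_w = (X1 - m1).T @ (X1 - m1) + (X2 - m2).T @ (X2 - m2)
    w = np.linalg.solve(S_w, m1 - m2)  # assumes S_w is non-singular
    return w / np.linalg.norm(w)

def project(X, w):
    # One scalar per row: the data reduced to a single dimension.
    return X @ w

X1 = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
X2 = X1 + np.array([5.0, 0.0])
w = fisher_direction(X1, X2)
print(w, project(X1, w), project(X2, w))
```

Once both samples are one-dimensional, any subsequent comparison only has to deal with scalars, which is where the computational savings described in the snippet come from.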
Attribute Level Clustering Approach to Quantitative Association Rule Mining
2014
International Journal of Computer Applications
In this paper a new data driven partitioning algorithm has been proposed to discretize the ranges of the attributes. ...
Discretization of the ranges of the attributes has been one of the challenging tasks in quantitative association rule mining that guides the rules generated. ...
Let D be a data ... For each attribute with numerical values, clustering is performed on the single-dimensional data using the clustering method specified in Section I, with the distance metric as absolute ...
doi:10.5120/16598-6404
fatcat:ftsbtd3annc7tf75ghtkjtlnt4
List of twin clusters: a data structure for similarity joins in metric spaces
2008
2008 IEEE 24th International Conference on Data Engineering Workshop
Introduction: Some applications include data mining, data cleaning, and data integration. This version of similarity join translates into solving several range queries. ...
Similarity Joins: range queries with threshold r for all elements in A
List of Clusters (LC): The LC splits the space into zones. ...
LTC index stands out as a practical and efficient data structure to solve a particular case of similarity join.
Work in Progress: The similarity self join. ...
doi:10.1109/icdew.2008.4498353
dblp:conf/icde/ParedesR08
fatcat:bfg5pdm7fffgfanhahj5xjoqtq
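Since this form of similarity join reduces to one range query per element of A, a sketch can index B with a plain List of Clusters (not the twin-cluster variant the paper introduces) and run a range query for each element of A. Assumptions: each cluster takes the first remaining element as its center and its m nearest remaining elements as its bucket, and the function names are hypothetical.

```python
import math

def build_lc(points, m, dist=math.dist):
    """List of clusters: (center index, covering radius, bucket indices)."""
    remaining = list(range(len(points)))
    clusters = []
    while remaining:
        c = remaining.pop(0)  # assumption: first remaining element is the center
        remaining.sort(key=lambda j: dist(points[c], points[j]))
        bucket, remaining = remaining[:m], remaining[m:]
        radius = max((dist(points[c], points[j]) for j in bucket), default=0.0)
        clusters.append((c, radius, bucket))
    return clusters

def range_query(clusters, points, q, r, dist=math.dist):
    """Indices of all points within distance r of q."""
    out = []
    for c, rc, bucket in clusters:
        d = dist(q, points[c])
        if d <= r:
            out.append(c)
        if d <= rc + r:  # query ball may intersect this cluster's zone
            out.extend(j for j in bucket if dist(q, points[j]) <= r)
        if d + r < rc:  # query ball lies inside this zone: later zones cannot match
            break
    return out

def similarity_join(A, B, r, m=2, dist=math.dist):
    # One range query over the index of B for every element of A.
    lc = build_lc(B, m, dist)
    return sorted((i, j) for i, a in enumerate(A)
                  for j in range_query(lc, B, a, r, dist))

A = [(0, 0), (10, 10)]
B = [(0, 1), (5, 5), (10, 11), (20, 20)]
print(similarity_join(A, B, r=1.5))
```

The early `break` relies on the triangle inequality: elements left for later clusters are at least `rc` from the current center, so when `d + r < rc` none of them can lie within `r` of the query.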
Near Neighbor Search in Large Metric Spaces
1995
Very Large Data Bases Conference
In experiments, we find that GNATs outperform previous data structures in a number of applications. ...
Given user data, one often wants to find approximate matches in a large database. A good example of such a task is finding images similar to a given image in a large collection of images. ...
Frank Olken, Luis Gravano, Edouard Bugnion, and Sandeep Singhal for helpful discussions and for listening to my endless ramblings. ...
dblp:conf/vldb/Brin95
fatcat:7ponswfxcja3joojvmqwe6gkjy
An Index Data Structure for Searching in Metric Space Databases
[chapter]
2006
Lecture Notes in Computer Science
This paper presents the Evolutionary Geometric Near-neighbor Access Tree (EGNAT) which is a new data structure devised for searching in metric space databases. ...
All this indicates that the EGNAT is suitable for conducting similarity searches on very large metric space databases. ...
Applications can be found in voice and image recognition, and data mining problems. Similarity can be modeled as a metric space as stated by the following definitions.
Metric Space. ...
doi:10.1007/11758501_82
fatcat:3csp7gzw7bdrvc6nbz2gv4f7qm
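The snippet breaks off before the definitions, but the standard metric axioms are non-negativity, identity of indiscernibles, symmetry, and the triangle inequality (the last is the property several of the entries above rely on). A small checker over a finite sample, sketched below with a hypothetical name and tolerance, can refute that a distance function is a metric; passing on a sample is evidence, not a proof.

```python
import itertools
import math

def is_metric_on(sample, d, tol=1e-9):
    """Check the metric axioms for d on every pair/triple drawn from `sample`."""
    for x, y in itertools.product(sample, repeat=2):
        dxy = d(x, y)
        if dxy < -tol:                       # non-negativity
            return False
        if (x == y) != (abs(dxy) <= tol):    # identity of indiscernibles
            return False
        if abs(dxy - d(y, x)) > tol:         # symmetry
            return False
    for x, y, z in itertools.product(sample, repeat=3):
        if d(x, z) > d(x, y) + d(y, z) + tol:  # triangle inequality
            return False
    return True

print(is_metric_on([(0, 0), (1, 0), (0, 2)], math.dist))
# Squared Euclidean distance violates the triangle inequality: 4 > 1 + 1.
print(is_metric_on([(0,), (1,), (2,)], lambda a, b: math.dist(a, b) ** 2))
```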
A clustering comparison measure using density profiles and its application to the discovery of alternate clusterings
2010
Data mining and knowledge discovery
In particular, it adopts a 'data mining style' philosophy to clustering comparison, whereby two clusterings are considered more similar if they are likely to give rise to similar types of prediction ...
Data clustering is a fundamental and very popular method of data analysis. ...
This has important applications in situations where the data is evolving, such as stream clustering. ...
doi:10.1007/s10618-009-0164-z
fatcat:5jxpwap4lnfrtafqmcl7k6wr3e
Showing results 1-15 out of 62,509 results