International Journal of Intellectual Advancements and Research in Engineering Computations
ATTRIBUTE DEPENDANT DATA LINKAGE SCHEME WITH CLUSTERING TREES
Data linkage is the task of identifying different entries or data items that refer to the same entity across different data sources. It joins data sets that do not share a common identifier (foreign key). Data linkage is divided into two types: one-to-one and one-to-many. A one-to-one data linkage model associates an entity from one data set with a single matching entity in another data set. One-to-many data linkage associates an entity from the first data set with a group of matching entities from the other
data set. A clustering tree is constructed in which each leaf contains a cluster. Each cluster is generalized by a set of rules (conditional probabilities) stored in the corresponding leaf. Clustering trees are used for data leakage prevention, recommender systems, and fraud detection. The one-to-many data linkage method builds links between entities of different natures. The one-class clustering tree (OCCT) characterizes the entities that should be linked together, and it is built so that it can be transformed into association rules. Splitting and pruning operations are used to induce the OCCT. Inducing a clustering tree linkage model involves structure identification and split-based attribute selection. Probabilistic models are built to represent the leaves of the tree, and data items are linked using the induced models. The linkage model is cross-validated on test sets to produce a score with matching probability values. The OCCT model is extended to handle data items with many-to-many relationships. Positive (matching) and negative (non-matching) pairs are integrated into the training process. The system is enhanced to handle binary, categorical, and continuous attributes. An accuracy-level-based selection process is used for splitting and pruning.
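To make the idea of leaves that store conditional probabilities more concrete, the following is a minimal Python sketch, not the scheme described here. It assumes a tree of depth one (a single split on one attribute of the first data set), categorical attributes only, and Laplace smoothing. The function names (`build_leaf_models`, `match_score`, `link`), the toy records, and the threshold are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter, defaultdict

def build_leaf_models(matching_pairs, split_attr, target_attrs):
    """Depth-one sketch: group matching (a, b) pairs by a[split_attr].
    Each leaf stores value counts for the attributes of the second data set."""
    leaves = defaultdict(lambda: {t: Counter() for t in target_attrs})
    for a, b in matching_pairs:
        leaf = leaves[a[split_attr]]
        for t in target_attrs:
            leaf[t][b[t]] += 1
    return dict(leaves)

def match_score(leaves, a, b, split_attr, target_attrs, domain_sizes, alpha=1.0):
    """Log-likelihood of record b under the leaf reached by record a.
    Probabilities are Laplace-smoothed (an assumption of this sketch)."""
    leaf = leaves.get(a[split_attr])
    if leaf is None:
        return float("-inf")  # no training data for this branch
    score = 0.0
    for t in target_attrs:
        counts = leaf[t]
        total = sum(counts.values())
        p = (counts[b[t]] + alpha) / (total + alpha * domain_sizes[t])
        score += math.log(p)
    return score

def link(leaves, a, candidates, threshold, split_attr, target_attrs, domain_sizes):
    """One-to-many linkage: keep every candidate whose score reaches the threshold."""
    return [b for b in candidates
            if match_score(leaves, a, b, split_attr, target_attrs, domain_sizes) >= threshold]

# Toy training set of matching pairs only (hypothetical data).
pairs = [
    ({"city": "Paris"}, {"category": "museum"}),
    ({"city": "Paris"}, {"category": "museum"}),
    ({"city": "Paris"}, {"category": "cafe"}),
    ({"city": "Oslo"}, {"category": "ski"}),
]
domain_sizes = {"category": 3}  # number of distinct category values
leaves = build_leaf_models(pairs, "city", ["category"])

candidates = [{"category": "museum"}, {"category": "cafe"}, {"category": "ski"}]
linked = link(leaves, {"city": "Paris"}, candidates, math.log(0.3),
              "city", ["category"], domain_sizes)
print(linked)
```

In this sketch a record from the first data set is routed to a leaf by its split attribute, and every candidate from the second data set is scored against that leaf's probability model, so one record can be linked to several matches. Split selection, pruning, negative pairs, and continuous attributes are left out.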