EFFICIENT DECISION TREE CONSTRUCTION IN UNREALIZED DATASET USING C4.5 ALGORITHM

Mrs Subapriya, M Kalimuthu, P Sengottuvelan
2016 International Journal of Advanced Engineering and Recent Technology   unpublished
Privacy preservation is important for machine learning and data mining, but measures designed to protect private information sometimes result in a trade-off: reduced utility of the training samples. This paper introduces a privacy-preserving approach that can be applied to decision-tree learning without a concomitant loss of accuracy. It describes an approach to preserving the privacy of collected data samples in cases where information from the sample database has been partially lost. The approach converts the original sample datasets into a group of unreal datasets, from which an original sample cannot be reconstructed without the entire group of unreal datasets. It does not perform well for sample datasets with low frequency, or when there is low variance in the distribution of all samples. However, this problem can be addressed with the C4.5 algorithm. C4.5 is a suite of algorithms for classification problems in machine learning and data mining. A percentage split is used to separate the training set from the test set. C4.5 then computes the information gain and gain ratio for all attributes and creates a decision node that splits on the attribute with the highest information gain. C4.5 is a software extension of the basic ID3 algorithm and addresses issues that ID3 leaves unsolved: avoiding overfitting of the data, reduced-error pruning, handling continuous attributes, handling training data with missing attribute values, and generating a decision tree from the dataset.
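The attribute-selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`entropy`, `gain_and_ratio`, `best_attribute`) and the toy `rows`/`labels` data are hypothetical, and it covers only categorical attributes with no missing values. Following the abstract, the split is chosen by highest information gain; the gain ratio is computed alongside it for comparison.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def gain_and_ratio(rows, labels, attr):
    """Return (information gain, gain ratio) for splitting on `attr`."""
    total = len(rows)
    # Group the class labels by the value each row takes for `attr`.
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    # Weighted entropy remaining after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information: entropy of the partition sizes themselves.
    split_info = -sum((len(g) / total) * math.log2(len(g) / total)
                      for g in groups.values())
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio


def best_attribute(rows, labels, attrs):
    """Pick the attribute with the highest information gain."""
    return max(attrs, key=lambda a: gain_and_ratio(rows, labels, a)[0])


# Hypothetical toy data, for illustration only.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rain", "windy": "no"},
    {"outlook": "rain", "windy": "yes"},
]
labels = ["no", "no", "yes", "yes"]

print(best_attribute(rows, labels, ["outlook", "windy"]))  # prints "outlook"
```

In this toy data `outlook` separates the classes perfectly, so both its information gain and gain ratio are 1 bit, while `windy` yields no gain.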
fatcat:keai6rge4vbahecyrplppdfvxu