Bucketization based Flow Classification Algorithm for Data Stream Privacy Mining

G. Kesavaraj, S. Sukumaran
2013 International Journal of Computer Applications  
In recent years, data mining has played a major role in managing the huge volumes of data from which useful information can be derived. Because data is generated in ever larger quantities, it must be processed at a rate that keeps pace with its growth. However, extracting meaningful information from a continuous flow of data is difficult. Data-stream mining is a technique for discovering important information from a large amount of historical data. To identify useful information, continuous data streams are classified. Current approaches classify data streams with supervised learning algorithms, which must be trained on labeled data. Manual labeling, however, is usually both expensive and time-consuming. As a result, when massive amounts of data arrive at high speed, labeled data may be very sparse, so only a limited amount of training data may be available for constructing classification models, which tends to produce poorly trained classifiers. To overcome this issue, this work presents a novel technique that builds a classification set from both unlabeled instances and a small number of labeled instances. The model is built using the Flow Classification Algorithm (FCA), which makes its internal decisions based on the set of labeled data. Before classification, correlated attributes in each record set are grouped using a bucketization technique. The algorithm then determines whether the models updated from the unlabeled records are of sufficient quality to make use of those records, or whether more labeled records are needed for classification. The proposed FC technique is experimentally evaluated against its counterparts in terms of execution time, classification accuracy, and security, and the performance metrics show that its security level is 10-15% higher than that of existing work.
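The abstract combines two ideas: grouping correlated attributes into buckets before classification, and deciding whether pseudo-labels drawn from unlabeled records are trustworthy enough or whether more labeled records are needed. The abstract does not give the FCA's actual steps, so the sketch below is only an illustration of those two general ideas under stated assumptions, not the authors' algorithm. It assumes numeric attributes, groups them greedily by absolute Pearson correlation, and substitutes a simple nearest-centroid self-training loop for the classifier. All names (`bucketize_attributes`, `self_train`, `threshold`, `margin`, `min_accept`) are hypothetical.

```python
import math
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences (0.0 if either is constant)."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0
    return cov / (sx * sy)


def bucketize_attributes(records, threshold=0.8):
    """Hypothetical bucketization: greedily group attribute indices whose
    |correlation| with the bucket's seed attribute is >= threshold."""
    n_attrs = len(records[0])
    columns = [[r[j] for r in records] for j in range(n_attrs)]
    unassigned = list(range(n_attrs))
    buckets = []
    while unassigned:
        seed = unassigned.pop(0)
        bucket = [seed]
        for j in unassigned[:]:  # iterate over a copy so removal is safe
            if abs(pearson(columns[seed], columns[j])) >= threshold:
                bucket.append(j)
                unassigned.remove(j)
        buckets.append(bucket)
    return buckets


def centroids(points, labels):
    """Mean vector of the labeled points for each class label."""
    by_label = {}
    for p, y in zip(points, labels):
        by_label.setdefault(y, []).append(p)
    return {y: [mean(col) for col in zip(*ps)] for y, ps in by_label.items()}


def distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def self_train(labeled, labels, unlabeled, margin=1.0, min_accept=0.5):
    """Hypothetical self-training step (not the FCA): pseudo-label an unlabeled
    point only if its nearest class centroid beats the runner-up by `margin`.
    Returns (accepted (point, label) pairs, needs_more_labels flag), where the
    flag is True when fewer than `min_accept` of the unlabeled points were accepted."""
    cents = centroids(labeled, labels)
    accepted = []
    for p in unlabeled:
        ranked = sorted(((distance(p, c), y) for y, c in cents.items()),
                        key=lambda t: t[0])
        if len(ranked) == 1 or ranked[1][0] - ranked[0][0] >= margin:
            accepted.append((p, ranked[0][1]))
    needs_more_labels = len(accepted) < min_accept * len(unlabeled)
    return accepted, needs_more_labels


if __name__ == "__main__":
    records = [(1, 2, 10), (2, 4, 7), (3, 6, 12), (4, 8, 1)]
    print(bucketize_attributes(records))  # attributes 0 and 1 are perfectly correlated
    labeled = [(0, 0), (0, 1), (10, 10), (10, 11)]
    labels = ["a", "a", "b", "b"]
    unlabeled = [(1, 1), (9, 9), (5, 5.5)]
    print(self_train(labeled, labels, unlabeled))
```

In this sketch the point `(5, 5.5)` is equidistant from both centroids, so it is left unlabeled; raising `min_accept` makes the loop more likely to report that additional labeled records are needed, which mirrors the trade-off the abstract describes.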
doi:10.5120/14063-2245 fatcat:gxazbioatfdl7cjyzryi373d6u