Performance Evaluation of Different Data Mining Classification Algorithm and Predictive Analysis

Syeda Farha Shazmeen
2013 IOSR Journal of Computer Engineering  
Data mining is the process of discovering knowledge by analyzing large volumes of data from various perspectives and summarizing them into useful information; it has become an essential component in many fields of human life. It is used to identify hidden patterns in large data sets. Classification techniques are supervised learning techniques that assign each data item to a predefined class label. Classification is one of the most useful techniques in data mining for building models from an input data set; these models are commonly used to predict future data trends. In this paper we work with different data mining applications and various classification algorithms. The algorithms are applied to different datasets to measure their efficiency, to improve performance through data preprocessing and feature selection, and to predict new class labels.

II. Data Sets used in the application:

IRIS Dataset: We use Fisher's Iris dataset, which contains 5 attributes and 150 instances. It consists of 50 samples from each of three species of Iris flowers (Iris setosa, Iris virginica and Iris versicolor). Four features were measured from each sample: the length and the width of the sepal and the petal, in centimeters; the fifth attribute is the species label. Based on the combination of the four features, Fisher developed a linear discriminant model to distinguish the species from each other. Here, data mining classification algorithms are used to identify the class of an Iris flower as Iris-setosa, Iris-versicolor or Iris-virginica.

Liver Disorder: The dataset consists of 7 variables and 345 observed instances. The first 5 variables are measurements taken from blood tests that are thought to be sensitive to liver disorders that might arise from excessive alcohol consumption. The sixth variable is a sort of selector variable. Each instance describes a single male individual. The seventh variable is a selector on the dataset, being used to split it into two sets,
doi:10.9790/0661-1060106 fatcat:ffw6zxhlajaljmhstlrz6tbqhe