Proximity Measurement for Hierarchical Categorical Attributes in Big Data

Zakariae El Ouazzani, An Braeken, Hanan El Bakkali, Fulvio Valenza
2021 Security and Communication Networks  
In order to prevent the similarity attack while preserving data utility, a hybrid technique dealing with categorical attributes is proposed in this paper.  ...  The L-diversity technique is one of those techniques dealing with sensitive numerical and categorical attributes.  ...  For instance, in a hospital data set, "Disease" is a sensitive attribute; in a financial data set, "CCV" number is a sensitive attribute, and the "annual income" is a sensitive attribute in a census data  ... 
doi:10.1155/2021/6612923 fatcat:vcjbyxy3efcktm6wt3ricijx6m

S-CPM: Semantic-Similarity Cluster based Privacy Preservation Model with Cell Generalization Principle

Satish B Basapur, B S Shylaja, Venkatesh
2022 Journal of Computer Science  
Two-phase clustering operations are carried out in parallel and are scalable for Big Data sets.  ...  The proposed model considers both numerical and categorical attribute values for data anonymization. Two-phase clustering contains two phases.  ...  The sensitive attributes are used extensively for data analytics or data mining but not for anonymization. Non-sensitive attribute values can be disclosed, with no need to protect data privacy. In this study  ... 
doi:10.3844/jcssp.2022.138.150 fatcat:oyfn5t57yffjlhc3fvmji4zahy

Privacy Preservation on Big Data using Efficient Privacy Preserving Algorithm

Johnny Antony P
2019 International Journal for Research in Applied Science and Engineering Technology  
To improve search efficiency and to provide privacy preservation for the big data environment, the Efficient Privacy Preserving (EPP) Algorithm is used in this article.  ...  In this article, we analyze a method of hiding sensitive information on big data by reconstructing a dataset according to the anonymization technique applied to clustered data.  ...  Shweta Shukla [5] proposed a method of hiding sensitive classification rules from data mining algorithms for categorical datasets.  ... 
doi:10.22214/ijraset.2019.6048 fatcat:3iywdjtajbfj7jcxsgpml7lzre

A dynamic data classification techniques and tools for big data

T Usha Rani, CH Sindhu Priyanka, B S S Monica
2019 Journal of Physics, Conference Series  
Big Data is an immense term for working with the large volume and complex data sets.  ...  To find the important and accurate data from large unstructured data is going to be a difficult task for any user. This is the reason why the classification technique came into picture for big data.  ...  Disadvantages: • Slow to process the query. • Needs large storage space for remembering all instances. • Noise sensitive. • Slow testing.  ... 
doi:10.1088/1742-6596/1228/1/012043 fatcat:muhersxqwrgyjfp4xwlc7ccrpa

Fuzzy-Based Approach for Clustering Data with Multivalued Features

L. N. C. Prakash K, M. Vimaladevi, V. Deeban Chakravarthy, G. Surya Narayana, Asadi Srinivasulu
2022 Wireless Communications and Mobile Computing  
In analysis of data, objects have mostly been characterized by a set of characteristics known as attributes, which together contained only one value for each object.  ...  , in addition to shipping addresses, of that kind of attributes are referred to as multivalued attributes and are typically regarded as null attributes when data is processed employing machine learning  ...  Furthermore, because these k-means-type methods can indeed work data with numerical information, they can be used very less in fields like data where big categorical or multivalued data sets are widespread  ... 
doi:10.1155/2022/3818107 doaj:799121d0ea5c41eeba0fdd59034cdc57 fatcat:2skqncfnzbfm5pl7jdqfqtqkbe

Improving binary classification using filtering based on k-NN proximity graphs

Maher Ala'raj, Munir Majdalawieh, Maysam F. Abbod
2020 Journal of Big Data  
As a result of filtering, the accuracy of the DES-LA combiner shows a big increase for low-accuracy datasets.  ...  in training data.  ...  In this dataset, bank credit attributes for 1000 credits are provided. • Data banknote authentication (4 features, 1372 entries, 56% of positive entries). Will be denoted as dataset B.  ... 
doi:10.1186/s40537-020-00297-7 fatcat:2jwd4cvfkbglpmqmsyfmx6xjly

Survey on Technique and User Profiling in Unsupervised Machine Learning Method

Andri M Kristijansson, Tyr Aegisson
2022 Journal of Machine and Computing  
In terms of data size and dimensions, it offers two-stage clustering algorithms for category, quantitative, and mixed types of datasets.  ...  The goal of this research is to provide a framework that outlines the Unsupervised Machine Learning (UML) methods for User-Profiling (UP) based on essential data attributes.  ...  This research intends to participate in an answer by establishing a method and paradigm of UML techniques with regard to essential data attributes, based on the second classification technique.  ... 
doi:10.53759/7669/jmc202202002 fatcat:kznjwlyygbgw7dhzrwyzbp5ahm

Data Mining, Machine Learning and Big Data Analytics

Lidong Wang
2017 International Transaction of Electrical and Computer Engineers System  
data, IT challenges, and Big Data in an extended service infrastructure.  ...  The feasibility and challenges of the applications of deep learning and traditional data mining and machine learning methods in Big Data analytics are also analyzed and presented.  ...  The k-means method is not applicable for categorical data while k-modes is a method for categorical data that uses modes. k-modes use new dissimilarity measures to deal with categorical objects and use  ... 
doi:10.12691/iteces-4-2-2 fatcat:bk3lvlmikjdqhfejqrrxjdq5eq
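
The snippet above notes that k-modes replaces the means of k-means with modes and uses a dissimilarity measure suited to categorical objects. A minimal sketch of those two building blocks follows, assuming the common simple-matching dissimilarity (a count of mismatched attributes); the function names and the toy records are hypothetical and are not taken from the listed paper.

```python
from collections import Counter


def matching_dissimilarity(a, b):
    """Count the attributes on which two categorical records differ.

    Assumption: simple matching is used as the dissimilarity measure.
    """
    if len(a) != len(b):
        raise ValueError("records must have the same number of attributes")
    return sum(x != y for x, y in zip(a, b))


def cluster_mode(records):
    """Return the most frequent value of each attribute in a non-empty cluster."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))


# Hypothetical toy cluster with three categorical attributes per record.
records = [
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("blue", "small", "round"),
]

# The mode plays the role that the mean plays in k-means.
mode = cluster_mode(records)

# Dissimilarity of every record to the cluster mode.
distances = [matching_dissimilarity(r, mode) for r in records]
```

A full clustering loop would alternate between assigning each record to its nearest mode and recomputing the modes; that loop is omitted here.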

Possibilistic Fuzzy Clustering for Categorical Data Arrays Based on Frequency Prototypes and Dissimilarity Measures

Zhengbing Hu, Yevgeniy V. Bodyanskiy, Oleksii K. Tyshchenko, Viktoriia O. Samitova
2017 International Journal of Intelligent Systems and Applications  
Fuzzy clustering procedures for categorical data are proposed in the paper.  ...  A detailed description of a possibilistic fuzzy clustering method based on frequency-based cluster prototypes and dissimilarity measures for categorical data is given.  ...  This approach is not sensitive to outliers and doesn't require partition of objects into clusters. It's designated for clustering data with a huge amount of number and nominal attributes.  ... 
doi:10.5815/ijisa.2017.05.07 fatcat:wqnf674aofef7dn7lt6aafdix4

Local Neighborhood-based Outlier Detection of High Dimensional Data using different Proximity Functions

Mujeeb Ur Rehman, Dost Muhammad
2020 International Journal of Advanced Computer Science and Applications  
This analytic research is also very appropriate and applicable in the domain of big data and data science as well.  ...  In recent times, dimension size has posed more challenges as compared to data size.  ...  EXPERIMENTAL WORK The proposed research is evaluated and tested in RapidMiner and ELKI tools which are specialized ones for data mining and outlier detection tasks.  ... 
doi:10.14569/ijacsa.2020.0110418 fatcat:v6ebok6v7va3pk3j4porcj6rkm

A Weighted Similarity Measure for k-Nearest Neighbors Algorithm

Bergen Karabulut, Güvenç Arslan, Halil Murat ÜNVER
2019 Celal Bayar Universitesi Fen Bilimleri Dergisi  
Firstly, it calculates the weight of each attribute and similarity between the instances in the dataset.  ...  And then, it weights similarities by attribute weights and creates a weighted similarity matrix to use as proximity measure.  ...  Each data point is defined by attributes in the attributes set; = { 1 , 2 , ⋯ , }. The value of a data point in the dataset for an attribute is represented by ( , ).  ... 
doi:10.18466/cbayarfbe.618964 fatcat:p6cz3eiqxjfwjerbhsidkqsdl4

An Extended Mondrian Algorithm – XMondrian to Protect Identity Disclosure [chapter]

R. Padmaja, V. Santhi
2021 Advances in Parallel Computing  
Many organizations often need to publish their data on the internet for research and analysis purposes, but there is no guarantee that those data would be used only for ethical purposes.  ...  The proposed algorithm can handle both numerical and categorical attributes without encoding or decoding the categorical values. The effectiveness of the proposed algorithm has been analysed through experimental  ...  This reflects the fact that there is no need for encoding and decoding for categorical attributes. This time is reduced in the XMondrian algorithm.  ... 
doi:10.3233/apc210088 fatcat:2kf6ipjhrjh5jfuvpdh3ip5hkq

Anomaly Detection in Big Data [article]

Chandresh Kumar Maurya
2022 arXiv pre-print
Therefore, we take an alternative approach to tackle anomaly detection in big data. Essentially, there are two ways to scale anomaly detection in big data.  ...  Due to data explosion in data laden domains, traditional anomaly detection techniques developed for small data sets scale poorly on large-scale data sets.  ...  Examples of big data can be patient monitoring sensor data that consists of hundreds of attributes each of various types.  ... 
arXiv:2203.01684v1 fatcat:3w5yogrqwnasdn3niubj2pgghi

Cluster Based Outlier Detection Algorithm for Healthcare Data

A. Christy, G. Meera Gandhi, S. Vaithyasubramanian
2015 Procedia Computer Science  
Outliers have been studied in a variety of domains including Big Data, High dimensional data, Uncertain data, Time Series data, Biological data, etc.  ...  In this paper, we utilize the concept of data preprocessing for outlier reduction.  ...  Outlier detection can be done using univariate as well as multivariate data in terms of categorical as well as continuous attributes.  ... 
doi:10.1016/j.procs.2015.04.058 fatcat:fokovmxv7vfhfpfqmghqnxvp7y

Big data privacy: a technological perspective and review

Priyank Jain, Manasi Gyanchandani, Nilay Khare
2016 Journal of Big Data  
On the premise of this definition, the properties of big  ...  Big data is a term used for very large data sets that have a more varied and complex structure.  ...  All the enormous measure of data produced from various sources in multiple formats with very high speed [3] is referred to as big data.  ...  Big data privacy preserving in data processing: Big data processing paradigm categorizes systems into batch, stream, graph, and machine learning processing [27, 28].  ... 
doi:10.1186/s40537-016-0059-y fatcat:hol5lhemzvawzc6yyxe4paa7zu