Frequent Itemset Mining for Big Data Using Greatest Common Divisor Technique

Mohamed A. Gawwad, Mona F. Ahmed, Magda B. Fayek
2017 Data Science Journal  
The discovery of frequent itemsets is one of the very important topics in data mining.  ...  Our approach is based on the original Buddy Prima algorithm and the Greatest Common Divisor (GCD) calculation between itemsets which exist in the transaction database.  ...  Introduction Frequent itemsets discovery "is one of the most important techniques in data mining" (Zhengui Li, 2012).  ... 
doi:10.5334/dsj-2017-025 fatcat:mbyvkchusfhnln7ffncqsxmvmm

A Prime Number Based Approach for Closed Frequent Itemset Mining in Big Data [chapter]

Mehdi Zitouni, Reza Akbarinia, Sadok Ben Yahia, Florent Masseglia
2015 Lecture Notes in Computer Science  
One of the noteworthy features of CloPN is that it uses a prime number based approach to transform the data into numerical form, and then to mine closed frequent itemsets by using only multiplication  ...  In this paper, we address the issue of mining closed frequent itemsets (CFI) from big datasets in such environments. We introduce a new parallel algorithm, called CloPN, for CFI mining.  ...  Unfortunately, mining only frequent itemsets in big data generates an overwhelmingly large number of itemsets.  ... 
doi:10.1007/978-3-319-22849-5_35 fatcat:tufauhxzinfrjfh7ke4jtkkm4i

Frequent Itemset Generation for Analyzing Customer Buying Nature using Bit Vector Mining

G. Harini
2019 International Journal for Research in Applied Science and Engineering Technology  
The paper introduces an efficient algorithm using bit vectors to find frequent itemsets from a huge set of itemsets.  ...  As the amount of data keeps increasing, the efficiency of the existing frequent itemset mining algorithms has decreased.  ...  The approach is based on the original Buddy Prima algorithm and the Greatest Common Divisor (GCD) calculation between itemsets which exist in the transaction database.  ... 
doi:10.22214/ijraset.2019.3352 fatcat:jiz7wz3tkzbcvofzdjgepcwnvq

Massively Distributed Environments and Closed Itemset Mining: The DCIM Approach [chapter]

Mehdi Zitouni, Reza Akbarinia, Sadok Ben Yahia, Florent Masseglia
2017 Lecture Notes in Computer Science  
Mining closed frequent itemsets (CFI) is one of these data mining techniques, associated with great challenges. It allows discovering itemsets with better efficiency and result compactness.  ...  We address the problem of distributed CFI mining by introducing a new parallel algorithm, called DCIM, which uses a prime number based approach.  ...  To do so, we adopted the notion of greatest common divisor (Gcd).  ... 
doi:10.1007/978-3-319-59536-8_15 fatcat:gb6t3l6jp5fzbgqfhsahmyxyai
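
The prime-number and GCD idea mentioned in the snippets above can be illustrated with a minimal sketch. It is not the CloPN, DCIM, or Buddy Prima algorithm, whose details the snippets do not give; it only shows the underlying arithmetic, assuming each item is mapped to a distinct prime and each transaction is encoded as the product of its items' primes. All function names here (assign_primes, encode, decode, common_itemset, support) are hypothetical.

```python
from functools import reduce
from math import gcd


def assign_primes(items):
    """Map each distinct item to a distinct prime (hypothetical helper)."""
    distinct = sorted(set(items))
    primes = []
    candidate = 2
    while len(primes) < len(distinct):
        # candidate is prime if no smaller prime divides it
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return dict(zip(distinct, primes))


def encode(transaction, prime_of):
    """Encode a transaction as the product of its items' primes."""
    return reduce(lambda acc, item: acc * prime_of[item], set(transaction), 1)


def decode(code, prime_of):
    """Recover the itemset whose primes divide the code."""
    return {item for item, p in prime_of.items() if code % p == 0}


def common_itemset(code_a, code_b, prime_of):
    """The GCD of two codes is the product of the primes of their shared items."""
    return decode(gcd(code_a, code_b), prime_of)


def support(itemset_code, transaction_codes):
    """Count transactions that contain the itemset (code divides transaction)."""
    return sum(1 for t in transaction_codes if t % itemset_code == 0)


# Toy data, invented for illustration only.
transactions = [{"a", "b", "c"}, {"a", "c"}, {"b", "d"}]
prime_of = assign_primes(item for t in transactions for item in t)
codes = [encode(t, prime_of) for t in transactions]
shared = common_itemset(codes[0], codes[1], prime_of)
print(sorted(shared), support(encode(shared, prime_of), codes))
```

Because the codes are products of distinct primes, set intersection becomes a GCD and set containment becomes a divisibility test. The codes grow quickly with transaction size, so this sketch relies on Python's arbitrary-precision integers.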

Privacy Management of Multi User Environment in Online Social Networks (OSNs)

P. Amrutha
2013 IOSR Journal of Computer Engineering  
To this end, we propose an approach to enable the protection of shared data associated with multiple users in OSNs.  ...  While OSNs allow users to restrict access to shared data, they currently do not provide any mechanism to fully enforce privacy controls over data associated with multiple users.  ...  Vanita Mane, PG coordinator, Department of Computer Engineering, for her constant motivation, knowledge sharing and support behind this paper. Acknowledgement  ... 
doi:10.9790/0661-1320107 fatcat:rdpsdcqelbbzvkg5rm4og4tzba

Aggregating privatized medical data for secure querying applications

Kalpana Singh, Lynn Batten
2017 Future generations computer systems  
The first category of methods attempts to prevent sensitive patterns, such as frequent itemsets [OZ03] or sequences [GDL11], or association rules [GDV09], from being mined from the data, while the  ...  Cryptography Based Data Aggregation Methods: Common Approaches. Several researchers have investigated secure data aggregation schemes using the most common approach, MPC techniques, to perform the collaborative  ...  This thesis delineates the applications of aggregation and querying of sensitive medical data in several application scenarios, and also tests data privatization techniques to assist in improving the strength  ... 
doi:10.1016/j.future.2016.11.028 fatcat:fd6kzkwccvhuhltszlq7ysj4yq

Data Structures, Algorithms and Applications for Big Data Analytics: Single, Multiple and All Repeated Patterns Detection in Discrete Sequences [article]

Konstantinos Xylogiannopoulos, University Of Calgary, Reda Elhajj
A unique, innovative algorithm (ARPaD), which takes advantage of the exceptional characteristics of the introduced data structure and allows big data mining with space and time optimization, has also been  ...  The main problem of detecting all repeated patterns is that all data structures used in computer science are incapable of scaling well for such purposes due to their space and time complexity.  ...  Irrational Numbers and Big Data The last experiment presented in my thesis concerns big data mining.  ... 
doi:10.11575/prism/25522 fatcat:e4fviyr3dvacdetbzmzohrblqi

Privacy-Preserving Data Mining

Saeed Samet, Université D'Ottawa / University Of Ottawa
Related data is normally distributed among two or more parties in different configurations, and mining can be done in an accurate and useful way for all parties involved, on all data collections.  ...  Data mining is a collection of techniques which find patterns and associations in raw data, and classify or cluster the items according to their attributes.  ...  for data mining and machine learning techniques are non-incremental.  ... 
doi:10.20381/ruor-13248 fatcat:vnvercqxjjaitkmbthk4j4e3wy

DBKDA 2012 Committee DBKDA Advisory Chairs DBKDA 2012 Technical Program Committee

Friedrich Laux, Aris Ouksel, Lena Strömbäck, Sweden Smhi, Miranda, Nipun Agarwal, Suad Alagic (+38 others)
Advances in different technologies and domains related to databases triggered substantial improvements for content processing, information indexing, and data, process and knowledge mining.  ...  and data applications.  ...  , which are frequently used in data mining community.  ... 

Document clustering in large German corpora using Natural Language Processing

Richard Forster
Special thanks go to the Neue Zürcher Zeitung (nzz) and the Schweizerische Depeschenagentur (sda), which have kindly provided two excellent corpora for use in this study.  ...  For technical support, guidance and patience I would like to thank in particular Beat Rageth, Manfred Klenner and Simon Clematide of the Department of Informatics and the Institute of Computational Linguistics  ...  Frequent Itemsets. Association rule mining (Agrawal et al., 1993) can be used to establish sets of frequently co-occurring terms.  ... 
doi:10.5167/uzh-163398 fatcat:qtmlupzihnfkre2d2dgreowbj4