
Scalable Multidimensional Hierarchical Bayesian Modeling on Spark

Róbert Ormándi, Hongxia Yang, Quan Lu
2015 Knowledge Discovery and Data Mining  
To overcome these difficulties, we decompose the joint prior of the three-dimensional Click-Through-Rate (CTR) using tensor decomposition and propose a Multidimensional Hierarchical Bayesian framework  ...  Besides the centralized implementation, we propose a distributed algorithm through Spark for inference, which makes the model highly scalable and suited for large-scale data mining applications.  ...  This is also the first deployment of such a multidimensional hierarchical Bayesian framework on Spark through GraphX.  ... 
dblp:conf/kdd/OrmandiYL15 fatcat:gp7kttyr6zbhtalnxsyxfvhjyu

Rotation Forest for Big Data

Mario Juez-Gil, Álvar Arnaiz-González, Juan J. Rodríguez, Carlos López-Nozal, César García-Osorio
2021 Information Fusion  
In this paper, a MapReduce Rotation Forest and its implementation under the Spark framework are presented.  ...  Bayesian tests are used to validate the method against two ensembles for Big Data: Random Forest and PCARDE classifiers.  ...  Fig. 8 shows the hierarchical Bayesian test heatmap for 𝐿 = 5 and 𝐿 = 10 on the top and bottom row, respectively.  ... 
doi:10.1016/j.inffus.2021.03.007 fatcat:4jagsnzwlvgxpmkda47kbqopay

Intrusion Detection System on Big data using Deep Learning Techniques

It is used to develop a hybrid, secure, scalable NIDS based on deep learning and big data techniques.  ...  In this paper, a detailed review of intrusion detection in various fields using deep learning is presented, along with an overview of applications of deep learning.  ...  Their distributed algorithm performs with better scalability and is capable of discord discovery in multidimensional time series [17].  ... 
doi:10.35940/ijitee.d2011.029420 fatcat:t4woonejwzd3njpqix5o42uf5q

A Robust Distributed Big Data Clustering-based on Adaptive Density Partitioning using Apache Spark

Behrooz Hosseini, Kourosh Kiani
2018 Symmetry  
The proposed method is developed on the Apache Spark framework and tested on some of the prevalent datasets.  ...  In the first step of this algorithm, the input data is divided into partitions using a Bayesian type of Locality Sensitive Hashing (LSH).  ...  Hierarchical models are like a one-way road that assigns each point to its first nearest cluster.  ... 
doi:10.3390/sym10080342 fatcat:wcm32bgxobcrdndmytaixtl2bm

Data stream mining techniques: a review

Eiman Alothali, Hany Alashwal, Saad Harous
2019 TELKOMNIKA (Telecommunication Computing Electronics and Control)  
Recently, many studies have addressed the concerns on massive data mining problems and proposed several techniques that produce impressive results.  ...  Statistical SAL [35] uses a Bayesian model that allows multi-class classification without a predefined number of classes.  ...  SAL [35] 2018 Bayesian model Active learning algorithm to cope with both concept drift and concept evolution by adapting the classification model to the changes of stream.  ... 
doi:10.12928/telkomnika.v17i2.11752 fatcat:rls2qzcl3vhobmkpycsdwhzplu

A Collective Anomaly Detection Approach for Multidimensional Streams in Mobile Service Security

Yu Weng, Lei Liu
2019 IEEE Access  
Furthermore, we implement a distributed iForestFS using the Spark framework in order to improve time performance and scalability.  ...  In this paper, we consider the statistical features of the subsequence of streams, proposing a novel collective anomaly detection algorithm for multidimensional streams based on iForest in a cloud environment  ...  Table (6) shows the scalability of the distributed iForest model. The testing dataset is still Http. The memory is set to 1G per executor on Spark. The number of cores ranges from 1 to 4.  ... 
doi:10.1109/access.2019.2909750 fatcat:jztgb3zuwvhyrgrnuru2n5ak5m

Anomaly detection techniques for streaming data–An overview

Saranya Kunasekaran, Chellammal Suriyanarayanan
2020 Malaya Journal of Matematik  
Bayesian Networks: Bayesian networks [22] have been implemented for anomaly detection in a multi-class setting. It is a model that encodes the probabilistic relationships among variables.  ...  [4] Internet of Things (IoT) DBSCAN NRDD-DBSCAN Tools used: Apache Spark RDD DBSCAN is not suited for scalability.  ... 
doi:10.26637/mjm0s20/0133 fatcat:blyjw2z4q5datacu7y4lavwchq

Predictive Analytics On Big Data - An Overview

Gayathri Nagarajan, Dhinesh Babu L.D
2019 Informatica (Ljubljana, Tiskana izd.)  
The overview throws light on the core predictive models, challenges of these models on big data, research gaps in several domain sectors and using different techniques.  ...  While research on handling big data is carried out continuously at one end, processing it to develop business insights is a hot topic at the other end.  ...  MapReduce is used on Spark for KNN to yield better results in terms of time and accuracy. Resilient distributed datasets are used on the Spark platform.  ... 
doi:10.31449/inf.v43i4.2577 fatcat:hqi45o6t7jb63dr3aaesink6l4

A Review: Predictive Analytics with Big Data

Mr. Rizwanahmed B. Mujawar, Dr. Dinesh B. Kulkarni
2017 IJARCCE  
On the contrary, distributed computing frameworks like Hadoop, Apache Spark, etc. are scalable for complex operations and tasks on large datasets (petabyte range).  ...  We can utilize the power of big data more precisely by using modern, scalable distributed frameworks like Hadoop and Spark.  ... 
doi:10.17148/ijarcce.2017.63124 fatcat:e2oyzsp2r5g6rj6r7k7trdgc5u

A Distributed Learning Architecture for Scientific Imaging Problems [article]

A. Panousopoulou, S. Farrens, K. Fotiadou, A. Woiselle, G. Tsagkatakis, J-L. Starck, P. Tsakalides
2018 arXiv   pre-print
Ultimately, the offered discussion provides useful practical insights on the impact of key Spark tuning parameters on the speedup achieved, and the memory/disk footprint.  ...  We apply the resulting, Spark-compliant, architecture on two emerging use cases from the scientific imaging domain, namely: (a) the space variant deconvolution of galaxy imaging surveys (astrophysics),  ...  on exact k-nearest neighbors classification and Bayesian Network Classifiers [40] , respectively.  ... 
arXiv:1809.05956v2 fatcat:iqzxocxplvd6xpxzut4wosxwgi

Machine Learning and Deep Learning frameworks and libraries for large-scale data mining: a survey

Giang Nguyen, Stefan Dlugolinsky, Martin Bobák, Viet Tran, Álvaro López García, Ignacio Heredia, Peter Malík, Ladislav Hluchý
2019 Artificial Intelligence Review  
Therefore, Apache introduced Spark MLlib and Spark ML built on top of Spark ecosystem (Spark 2018b) thus being much faster than Mahout.  ...  Model selection and model performance optimization contains a large number of approaches such as hyper-parameter tuning, grid search, Bayesian optimization (Snoek et al. 2012), local minimum search,  ... 
doi:10.1007/s10462-018-09679-z fatcat:ueffoypwlva4ndo35g5gzfrpcy

2021 Index IEEE Transactions on Visualization and Computer Graphics Vol. 27

2022 IEEE Transactions on Visualization and Computer Graphics  
Valdivia, P., +, TVCG Jan. 2021 1-13 ArchiText: Interactive Hierarchical Topic Modeling.  ...  ., +, TVCG Feb. 2021 1417-1426 Implicit Multidimensional Projection of Local Subspaces.  ...  ., +, TVCG Feb. 2021 989-999 Revealing Perceptual Proxies with Adversarial Examples. 1073-1083 Towards Modeling Visualization Processes as Dynamic Bayesian Networks.  ... 
doi:10.1109/tvcg.2022.3163599 fatcat:2mtpsecojbc33pqht3n7oyqmoq

Machine Learning in Python: Main Developments and Technology Trends in Data Science, Machine Learning, and Artificial Intelligence

Sebastian Raschka, Joshua Patterson, Corey Nolet
2020 Information  
Smarter applications are making better use of the insights gleaned from data, having an impact on every industry and research discipline.  ...  Deep neural networks, along with advancements in classical machine learning and scalable general-purpose graphics processing unit (GPU) computing, have become critical components of artificial intelligence  ...  Dask and Apache Spark [27] provide abstractions for both data frames and multidimensional arrays that can scale to multiple nodes.  ... 
doi:10.3390/info11040193 fatcat:hetp7ngcpbbcpkhdcyowuiiwxe

Distributed Flexible Nonlinear Tensor Factorization [article]

Shandian Zhe, Kai Zhang, Pengyuan Wang, Kuang-chih Lee, Zenglin Xu, Yuan Qi, Zoubin Ghahramani
2016 arXiv   pre-print
Based on the new bound, we develop a distributed inference algorithm in the MapReduce framework, which is key-value-free and can fully exploit the memory cache mechanism in fast MapReduce systems such as SPARK.  ...  Both GigaTensor and DinTucker are developed on HADOOP, while InfTuckerEx uses online inference. Our model was implemented on SPARK.  ... 
arXiv:1604.07928v2 fatcat:hd5dlvgeizeubf5iqqzzrbrogq

The Best of the Machine Learning Algorithms Used in Artificial Intelligence

Indrasen Poola
2017 Figshare  
Clustering methods are typically organized by the modelling approaches such as centroid-based and hierarchical.  ...  The best suited algorithms are Naive Bayes classifier, Averaged One-Dependence Estimators (AODE) and Bayesian Belief Network (BBN).  ... 
doi:10.6084/m9.figshare.5615581.v1 fatcat:wpzi3n6bsvajhh5k27axn23gpm