An Intelligent Distributed K-means Algorithm over Cloudera/Hadoop

Tawseef Ayoub Shaikh, Umar Badr Shafeeque, Maksud Ahamad
2018 International Journal of Education and Management Engineering  
The 21st century evolved with tsunami of data generation by the human civilization that has delivered new words like Big Data to the world of vocabulary.  ...  The given algorithm is evaluated using Speedup, Scale up and Size up parameters and it neatly performed better as the size of the input data gets increased.  ...  (a). Parallel K-Means has a very good speedup performance depicted from the results. Alternatively, as dataset size rises, performance of speedup gets better.  ... 
doi:10.5815/ijeme.2018.04.06 fatcat:x5mn5rskeff23pjn46xu5a5fb4

Big Data Based Dynamic Flow Aggregation over 5G Network Slicing

2017 KSII Transactions on Internet and Information Systems  
We conducted experiments, using a dataset of up to 100,000 flows, and studied the performance of our algorithm analytically.  ...  The number of devices connected to the IoT and hence the number of traffic flow increases continuously, as well as the emergence of new applications.  ...  Sizeup Evaluation Sizeup metric validates how much longer the parallel algorithm takes to perform aggregation on a given fixed number of nodes, when the size of the data flow set is larger than the original  ... 
doi:10.3837/tiis.2017.10.003 fatcat:kl3migtz7rbfvfav7mr7icznjy

Implementing a Parallel Image Edge Detection Algorithm Based on the Otsu-Canny Operator on the Hadoop Platform

Jianfang Cao, Lichao Chen, Min Wang, Yun Tian
2018 Computational Intelligence and Neuroscience  
To improve the runtime and edge detection performance of the Canny operator, in this paper, we propose a parallel design and implementation for an Otsu-optimized Canny operator using a MapReduce parallel  ...  The parallel approach reduced the running time by approximately 67.2% on a Hadoop cluster architecture consisting of 5 nodes with a dataset of 60,000 images.  ...  Therefore, the approach proposed in this study obtains a good sizeup performance.  ... 
doi:10.1155/2018/3598284 pmid:29861711 pmcid:PMC5971336 fatcat:4imkhl4ksnbkhbrlgfiqod43ia

A Parallel Adaboost-Backpropagation Neural Network for Massive Image Dataset Classification

Jianfang Cao, Lichao Chen, Min Wang, Hao Shi, Yun Tian
2016 Scientific Reports  
This study proposes a faster image classification approach that parallelizes the traditional Adaboost-Backpropagation (BP) neural network using the MapReduce parallel programming model.  ...  Furthermore, the proposed approach requires less computation time and scales very well as evaluated by speedup, sizeup and scaleup.  ...  To verify the classification performance, experiments are performed to compare the following three aspects: classification accuracy, running time, speedup, sizeup and scaleup.  ... 
doi:10.1038/srep38201 pmid:27905520 pmcid:PMC5131302 fatcat:xfziaow5czcxhn4rc7bpsjivmu

A MapReduce-Based Parallel Frequent Pattern Growth Algorithm for Spatiotemporal Association Analysis of Mobile Trajectory Big Data

Dawen Xia, Xiaonan Lu, Huaqing Li, Wendong Wang, Yantao Li, Zili Zhang
2018 Complexity  
To conquer these challenges, this paper presents a MapReduce-based Parallel Frequent Pattern growth (MR-PFP) algorithm to analyze the spatiotemporal characteristics of taxi operating using large-scale  ...  taxi trajectories with massive small file processing strategies on a Hadoop platform.  ...  Sizeup. The sizeup metric measures how much longer the parallel algorithm takes on a given node, when the size of datasets is m-times larger than the original dataset.  ... 
doi:10.1155/2018/2818251 fatcat:hzh45ggo6vek7jkbw2h2upb6hu

Efficient Approaches for Solving the Large-Scale k-medoids Problem

Alessio Martino, Antonello Rizzi, Fabio Massimo Frattale Mascioli
2017 Proceedings of the 9th International Joint Conference on Computational Intelligence  
update problem; the latter based on a scan and replacement procedure.  ...  We implemented and tested our approach using the Apache Spark framework for parallel and distributed processing on several datasets of increasing dimensions, both in terms of patterns and dimensionality  ...  Figure 3 (a) depicts the Approximate Medoid Tracking case which has a very good sizeup performances, especially for m ≥ 3 (e.g. a 4-times larger dataset needs from 1.9 to 5.5 times more time).  ... 
doi:10.5220/0006515003380347 dblp:conf/ijcci/MartinoRM17 fatcat:i4s6thpz7jevzevdmrcfrpfoyq

SPRINT: A Scalable Parallel Classifier for Data Mining

John C. Shafer, Rakesh Agrawal, Manish Mehta
1996 Very Large Data Bases Conference  
We present a new decision-tree-based classification algorithm, called SPRINT that removes all of the memory restrictions, and is fast and scalable.  ...  The algorithm has also been designed to be easily parallelized, allowing many processors to work together to build a single consistent model.  ...  The result is superior sizeup performance.  ... 
dblp:conf/vldb/ShaferAM96 fatcat:mzef57ppdjalrmaxsb7fxdu5sq

A New Approach for Large-Scale Scene Image Retrieval Based on Improved Parallel k-Means Algorithm in MapReduce Environment

Jianfang Cao, Min Wang, Hao Shi, Guohua Hu, Yun Tian
2016 Mathematical Problems in Engineering  
(speedup and efficiency, sizeup, and scaleup), which is a significant improvement from applying parallel processing to intelligent algorithms with large-scale datasets.  ...  Second, we presented a parallel design and realization method for improved k-Means algorithm applied it to feature clustering of scene images.  ...  Therefore, the proposed method in this paper has a good sizeup performance.  ... 
doi:10.1155/2016/3593975 fatcat:6snpevuyxjclxgvtbmc7djbh4q

A scalable and effective rough set theory-based approach for big data pre-processing

Zaineb Chelly Dagdia, Christine Zarges, Gaël Beck, Mustapha Lebbah
2020 Knowledge and Information Systems  
A big challenge in the knowledge discovery process is to perform data pre-processing, specifically feature selection, on a large amount of data and high dimensional attribute set.  ...  sacrificing performance.  ...  A parallel algorithm with a linear sizeup has a very good sizeup performance: Considering a problem that is m times larger than a baseline problem, the algorithm requires in the order of m times more runtime  ... 
doi:10.1007/s10115-020-01467-y fatcat:4tsm4qlsdrddpnm3k5dzg2f7gi

FDR2-BD: A Fast Data Reduction Recommendation Tool for Tabular Big Data Classification Problems

María José Basgall, Marcelo Naiouf, Alberto Fernández
2021 Electronics  
An extensive experimental study is performed over 25 big datasets with different characteristics.  ...  Another significant capability is being fast and scalable by using fully optimized parallel operations provided by Apache Spark.  ...  The latter builds a whole set of new features from combining the original ones.  ... 
doi:10.3390/electronics10151757 fatcat:pzkwj5heq5fabna5eiaaqspcnu

Clustering of Association Rules for Big Datasets using Hadoop MapReduce

Salahadin A. Moahmmed, Mohamed A., El-Sayed M.
2021 International Journal of Advanced Computer Science and Applications  
In this paper, we are proposing a novel parallel association rule clustering approach which is based on Hadoop MapReduce.  ...  We ran many experiments to study the performance of the proposed approach, and promising results have been demonstrated, e.g. the lowest scaleup was 77%.  ...  Evaluation Metrics The proposed algorithms were evaluated using four performance measures, namely elapsed time, speedup, scaleup, and sizeup.  ... 
doi:10.14569/ijacsa.2021.0120364 fatcat:id2qnadaf5ef7lolgldw64ewiy

CFM-BD: a distributed rule induction algorithm for building Compact Fuzzy Models in Big Data classification problems

Mikel Elkano, Jose Antonio Sanz, Edurne Barrenechea, Humberto Bustince, Mikel Galar
2019 IEEE Transactions on Fuzzy Systems  
We conducted a complete empirical study to test the performance of our approach in terms of accuracy, complexity, and runtime.  ...  In this paper, we propose a new distributed learning algorithm named CFM-BD to construct accurate and compact fuzzy rule-based classification systems for Big Data.  ...  metrics used to evaluate distributed systems, i.e., speedup, sizeup, and scaleup [29], [30] (Section V-B).  ... 
doi:10.1109/tfuzz.2019.2900856 fatcat:gzzogkxgcvfbbjq3mre6lhr43e

DistLODStats: Distributed Computation of RDF Dataset Statistics

Gezim Sejdiu, Ivan Ermilov, Jens Lehmann, Mohamed Nadjib Mami
2018 Zenodo  
However, those usually show severe deficiencies in terms of performance once the dataset size grows beyond the capabilities of a single machine.  ...  In fact, there are already a number of tools, which offer such statistics, providing basic information about RDF datasets and vocabularies.  ...  It allows performing coarse-grained operations over voluminous datasets in a distributed manner in parallel. It extends earlier efforts in the area such as Hadoop MapReduce.  ... 
doi:10.5281/zenodo.3567965 fatcat:24tntp6einggrjawhjwo5c5aj4

A One-Pass Algorithm for Accurately Estimating Quantiles for Disk-Resident Data

Khaled Alsabti, Sanjay Ranka, Vineet Singh
1997 Very Large Data Bases Conference  
In this paper, we present a new algorithm for estimating the quantile values for disk-resident data.  ...  priori knowledge of the distribution of the data set; (5) It has a scalable parallel formulation; (6) Extra time and memory for computing additional quantiles (beyond the first one) are constant per quantile  ...  In this paper, we present a new algorithm OPAQ for estimating the quantiles. The OPAQ algorithm has the following characteristics: It has a scalable parallel formulation.  ... 
dblp:conf/vldb/AlsabtiRS97 fatcat:mi2pjmh5andvxmxbe4hkroxcqe

A Scalable Framework for Quality Assessment of RDF Datasets [article]

Gezim Sejdiu, Anisa Rula, Jens Lehmann, Hajira Jabeen
2020 arXiv   pre-print
We also provide a quality assessment pattern that can be used to generate new scalable metrics that can be applied to big data.  ...  There exist a few approaches for the quality assessment of Linked Data, but their performance degrades with the increase in data size and quickly grows beyond the capabilities of a single machine.  ...  Speedup S is an important metric to evaluate a parallel algorithm.  ... 
arXiv:2001.11100v1 fatcat:azwjqvmwu5bzvlgcqrjaoaik54