SparkBench – A Spark Performance Testing Suite [chapter]

Dakshi Agrawal, Ali Butt, Kshitij Doshi, Josep-L. Larriba-Pey, Min Li, Frederick R Reiss, Francois Raab, Berni Schiefer, Toyotaro Suzumura, Yinglong Xia
2016 Lecture Notes in Computer Science  
Recognizing the need for such comprehensive and agile testing, this paper proposes going beyond existing performance tests for Spark and creating an expanded Spark performance testing suite.  ...  Its rapid adoption therefore calls for a performance assessment suite that supports agile development, measurement, validation, optimization, configuration, and deployment decisions across a broad range  ...
doi:10.1007/978-3-319-31409-9_3 fatcat:koanet7fdfdfnegkhujk3pj27q

SMConf: One-Size-Fit-Bunch, Automated Memory Capacity Configuration for In-memory Data Analytic Platform

Yi Liang, Shaokang Zeng, Xiaoxian Xu, Shilu Chang, Xing Su
2021 Computers Materials & Continua  
Spark is the most popular in-memory processing framework for big data analytics. Memory is the crucial resource for workloads to achieve performance acceleration on Spark.  ...  Experimental results demonstrate that, compared to the conservative default configuration, SMConf can reduce the memory resource provision to Spark workloads by up to 69% with the slight performance degradation  ...  We evaluate SMConf on Spark with representative workloads from SparkBench suite.  ... 
doi:10.32604/cmc.2020.012513 fatcat:hspzysshtzel3djkq4jygt23wa

Big Data Benchmark Compendium [chapter]

Todor Ivanov, Tilmann Rabl, Meikel Poess, Anna Queralt, John Poelman, Nicolas Poggi, Jeffrey Buell
2016 Lecture Notes in Computer Science  
This document provides a summary of existing benchmarks and those that are in development, gives a side-by-side comparison of their characteristics and discusses their pros and cons.  ...  Also with the combinations of large volumes of data, heterogeneous data formats and the changing processing velocity, it becomes complex to specify an architecture which best suits all application requirements  ...  Acknowledgment This research has been supported by the Research Group of the Standard Performance Evaluation Corporation (SPEC).  ... 
doi:10.1007/978-3-319-31409-9_9 fatcat:n7lwtxainnblpf2xp4c5o2eynq

A Survey of Benchmarks to Evaluate Data Analytics for Smart-* Applications [article]

Athanasios Kiatipis, Alvaro Brandon, Rizkallah Touma, Pierre Matri, Michal Zasadzinski, Linh Thuy Nhuyen, Adrien Lebre, Alexandru Costan
2019 arXiv   pre-print
How to assess the performance of such a complex stack, when faced with the specifics of Applications, remains an open research question.  ...  Afterwards, for each of these requirements, there is a description of the benchmarks one can use to precisely evaluate the performance of the underlying systems and technologies.  ...  SparkBench [67] SparkBench is a benchmarking suite specifically for Apache Spark.  ... 
arXiv:1910.02004v1 fatcat:l2bghlqczffspfzbfnx222gdvy

Feasibility analysis of AsterixDB and Spark streaming with Cassandra for stream-based processing

Pekka Pääkkönen
2016 Journal of Big Data  
Particularly, feasibility of a Big Data management system for semi-structured data (AsterixDB) will be compared to Spark streaming, which has been integrated with Cassandra NoSQL database for persistence  ...  Twitter has created a big data streaming architecture, which is able to serve and process thousands of tweets in a second [3] [4] [5] .  ...  Additionally, the author acknowledges feedback received from subscribers of Apache AsterixDB users mailing list, and DataStax's Spark connector for Cassandra mailing list.  ... 
doi:10.1186/s40537-016-0041-8 fatcat:r6gosoffnngtdkphlodzxunuem

Artificial neural networks based techniques for anomaly detection in Apache Spark

Ahmad Alnafessah, Giuliano Casale
2019 Cluster Computing  
Apache Spark is widely adopted by industry because of its speed and generality, however there is still a shortage of comprehensive performance anomaly detection methods applicable to this platform.  ...  Late detection and manual resolutions of performance anomalies in Cloud Computing and Big Data systems may lead to performance violations and financial penalties.  ...  The specifications for these servers are as follows: Workload generation SparkBench provides workload suites that include a collection of workloads that can be run either serially or in parallel [  ... 
doi:10.1007/s10586-019-02998-y fatcat:fgfp27k4xfbrpf2vv4qlmuovme

Big data analytics on Apache Spark

Salman Salloum, Ruslan Dautov, Xiaojun Chen, Patrick Xiaogang Peng, Joshua Zhexue Huang
2016 International Journal of Data Science and Analytics  
In this paper, we present a technical review on big data analytics using Apache Spark. This review focuses on the key components, abstractions and features of Apache Spark.  ...  It is a general-purpose cluster computing framework with language-integrated APIs in Scala, Java, Python and R.  ...  - Spark SQL Performance Tests 54 : a performance testing framework from Databricks for Spark SQL in Apache Spark 1.6+.  ... 
doi:10.1007/s41060-016-0027-9 dblp:journals/ijdsa/SalloumD0PH16 fatcat:gtzw3aqupnhxvcjbefovrnfhne

Big Data Methodologies, Tools And Infrastructures

Kim Hee, Todor Ivanov, Roberto V. Zicari, Rut Waldenfels, Hevin Özmen, Naveed Mushtaq, Minsung Hong, Tharsis Teoh, Rajendra Akerkar
2018 Zenodo  
The transportation industry is a leader in creating the so-called Internet of Everything.  ...  It also looks at how these technologies are applied to build a Big Data Platf [...]  ...  SparkBench SparkBench 1 (version 2.0) is another Spark specific benchmark suite developed by IBM, which provides representative workloads in four categories as listed in Table 9 .  ... 
doi:10.5281/zenodo.1465539 fatcat:mkad5yu2tnfw7fdi3xqcermac4

Benchmarking Graph Data Management and Processing Systems: A Survey [article]

Miyuru Dayarathna, Toyotaro Suzumura
2021 arXiv   pre-print
The development of scalable, representative, and widely adopted benchmarks for graph data systems has been a question for which answers have been sought for decades.  ...  They tested the benchmark suite on VMs on cloud in addition to testing them locally.  ...  Similar to SPARKBENCH [67], BigDataBench used Google Web Graph dataset. BigOP is a benchmarking framework which allows for running comprehensive performance testing [126].  ...
arXiv:2005.12873v4 fatcat:jh3367b4vjaqbgyvaccjnxqjfi

Benchmarking Distributed Stream Data Processing Systems

Jeyhun Karimov, Tilmann Rabl, Asterios Katsifodimos, Roman Samarev, Henri Heiskanen, Volker Markl
2018 2018 IEEE 34th International Conference on Data Engineering (ICDE)  
We use our suite to evaluate the performance of three widely used SDPSs in detail, namely Apache Storm, Apache Spark, and Apache Flink.  ...  Third, we build the first benchmarking framework to define and test the sustainable performance of streaming systems.  ...
doi:10.1109/icde.2018.00169 dblp:conf/icde/KarimovRKSHM18 fatcat:yfvlfvsgvzaj7opgqin6cudxzu

Designing and implementing a Big Data benchmark in a financial context: application to a cash management use case

Lilia Sfaxi, Mohamed Mehdi Ben Aissa
2021 Computing  
The performance results collected with BABEL for the cash management use case enables to define the right tradeoffs in terms of consistency and availability, in a way that respects the service level agreements  ...  This paper details the steps followed to benchmark a cash management platform of an investment bank using a generic benchmarking solution called BABEL.  ...  PigMix [13] tracks the performance of the Pig query processor and SparkBench [14] targets all layers of the Spark framework.  ... 
doi:10.1007/s00607-021-00933-x fatcat:myqri224vfbsnd4omulyhxr5zy

Mission possible: Unify HPC and Big Data stacks towards application-defined blobs at the storage layer

Pierre Matri, Yevhen Alforov, Álvaro Brandon, María S. Pérez, Alexandru Costan, Gabriel Antoniu, Michael Kuhn, Philip Carns, Thomas Ludwig
2018 Future generations computer systems  
This motivates a global move towards dropping file-based, POSIX-IO compliance systems.  ...  The storage layer offers opportunities for convergence, as the challenges associated with HPC and Big Data storage are similar: trading versatility for performance.  ...  Experiments presented in this paper were carried out using the Grid'5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER, and several universities and organizations  ... 
doi:10.1016/j.future.2018.07.035 fatcat:2lr4a4t34nau3e3pkgrpmfcfu4

Benchmarking Distributed Stream Processing Engines [article]

Jeyhun Karimov, Tilmann Rabl, Asterios Katsifodimos, Roman Samarev, Henri Heiskanen, Volker Markl
2018 arXiv   pre-print
Third, we build the first driver to test the actual sustainable performance of a system under test.  ...  In this paper, we propose a framework to evaluate the performance of three SDPSs, namely Apache Storm, Apache Spark, and Apache Flink.  ...  HiBench [17] was the first benchmark suite to evaluate and characterize the performance of Hadoop and it was later extended with a streaming component [30] .  ... 
arXiv:1802.08496v1 fatcat:zfdxcy3eajcv7eqafymethtvmq

Sequence-to-sequence models for workload interference prediction on batch processing datacenters

David Buchaca, Joan Marcual, Josep LLuis Berral, David Carrera
2020 Future generations computer systems  
The methods here presented are validated using High Performance Computing benchmarks based on different frameworks (like Hadoop and Spark) and applications (CPU bound, IO bound, machine learning, SQL queries  ...  Co-scheduling of jobs in data-centers is a challenging scenario, where jobs can compete for resources yielding to severe slowdowns or failed executions.  ...  The workloads used have been extracted from different suites: HiBench [6] , Spark-perf [8] and SparkBench [7] .  ... 
doi:10.1016/j.future.2020.03.058 fatcat:vw33tgjwdjfahfxfq7crf5fpqe

TextBenDS: a generic Textual data Benchmark for Distributed Systems [article]

Ciprian-Octavian Truica, Ira Assent
2021 arXiv   pre-print
Our benchmark offers a generic data model designed with a multidimensional approach for storing text documents.  ...  The weights are usually computed in the data preprocessing step, as they are costly to update and keep track of all the modifications performed on the dataset.  ...  SparkBench [1, 25] is a micro-benchmark suite developed specifically to stress test the capabilities of Spark on Machine Learning and Graph Computation tasks, rather than text preprocessing and computing  ...
arXiv:2108.05689v1 fatcat:ozt5dleqnnattgwp3ic5w4uxga