
ALOJA: A Benchmarking and Predictive Platform for Big Data Performance Analysis [chapter]

Nicolas Poggi, Josep Ll. Berral, David Carrera
2016 Lecture Notes in Computer Science  
The main goals of the ALOJA research project from BSC-MSR, are to explore and automate the characterization of cost-effectiveness of Big Data deployments.  ...  The development of the project over its first year, has resulted in an open source benchmarking platform, an online public repository of results with over 42,000 Hadoop job runs, and web-based analytic tools  ...  Acknowledgements This work is partially supported by the BSC-Microsoft Research Centre, the Spanish Ministry of Education (TIN2012-34557), the MINECO Severo Ochoa Research program (SEV-2011-0067) and the  ... 
doi:10.1007/978-3-319-49748-8_4 fatcat:lgzpi3vmabfbbb7vw7r6otrogq


Josep Lluís Berral, Nicolas Poggi, David Carrera, Aaron Call, Rob Reinauer, Daron Green
2015 Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '15  
This article presents ALOJA-Machine Learning (ALOJA-ML) an extension to the ALOJA project that uses machine learning techniques to interpret Hadoop benchmark performance data and performance tuning; here  ...  The resulting performance models can be used to forecast execution behavior of various workloads; they allow 'a-priori' prediction of the execution times for new configurations and HW choices and they  ...  All the data-sets collected for ALOJA and ALOJA-ML are public, and can be explored through our framework or used as data-sets in other data analysis platforms.  ... 
doi:10.1145/2783258.2788600 dblp:conf/kdd/BerralPCCRG15 fatcat:3y7pnkbwxvbzjodjfwhm4ckjla

ALOJA: A Framework for Benchmarking and Predictive Analytics in Hadoop Deployments

Josep Lluis Berral, Nicolas Poggi, David Carrera, Aaron Call, Rob Reinauer, Daron Green
2017 IEEE Transactions on Emerging Topics in Computing  
This article presents the ALOJA project and its analytics tools, which leverages machine learning to interpret Big Data benchmark performance data and tuning.  ...  In addition, the community can benefit from ALOJA data-sets and framework to improve the design and deployment of Big Data applications.  ...  This work is partially supported by the Ministry of Economy of Spain under contracts TIN2012-34557 and 2014SGR1051.  ... 
doi:10.1109/tetc.2015.2496504 fatcat:7kpa5wvwfzfs3jtd6aqjfbq5du

ALOJA: A systematic study of Hadoop deployment variables to enable automated characterization of cost-effectiveness

Nicolas Poggi, David Carrera, Aaron Call, Sergio Mendoza, Yolanda Becerra, Jordi Torres, Eduard Ayguade, Fabrizio Gagliardi, Jesus Labarta, Rob Reinauer, Nikola Vujic, Daron Green (+1 others)
2014 2014 IEEE International Conference on Big Data (Big Data)  
While during the last 5 years, Hadoop has become the de-facto platform for Big Data deployments, still little is understood of how the different layers of the software and hardware deployment options affects  ...  As few organizations have the time or performance profiling expertise, we expect our growing repository will benefit Hadoop customers to meet their Big Data application needs.  ...  ACKNOWLEDGEMENTS This work is partially supported by the Ministry of Science and Technology of Spain under contracts TIN2012-34557 and 2014SGR1051.  ... 
doi:10.1109/bigdata.2014.7004322 dblp:conf/bigdataconf/PoggiCCMBTAGLRVGB14 fatcat:mwznaa3urrcplbsmssebw2ppii

Designing and implementing a Big Data benchmark in a financial context: application to a cash management use case

Lilia Sfaxi, Mohamed Mehdi Ben Aissa
2021 Computing  
This paper details the steps followed to benchmark a cash management platform of an investment bank using a generic benchmarking solution called BABEL.  ...  The performance results collected with BABEL for the cash management use case enables to define the right tradeoffs in terms of consistency and availability, in a way that respects the service level agreements  ...  STEP 3.1-design of BABEL: BABEL, the Big dAta BEnchmarking pLatform, is a generic, scalable, distributed and end-to-end benchmarking platform for Big Data architectures.  ... 
doi:10.1007/s00607-021-00933-x fatcat:myqri224vfbsnd4omulyhxr5zy

Automatic Generation of Workload Profiles Using Unsupervised Learning Pipelines

David Buchaca Prats, Josep Lluis Berral, David Carrera
2018 IEEE Transactions on Network and Service Management  
CRBMs can be used to map a given historic window of trace behaviour into a single vector.  ...  Furthermore, given the different amount of scenarios and applications, automation is required. Here we examine and model application behavior by finding behavior phases.  ...  The datasets comprise a mix of Big Data workloads involving Hadoop and Spark applications extracted from two well-established benchmarks: HiBench and TPCx-BB (BigBench).  ... 
doi:10.1109/tnsm.2017.2786047 fatcat:idlfldnoxnbufjnu2si4dljj4q

ATCS: Auto-Tuning Configurations of Big Data Frameworks Based on Generative Adversarial Nets

Mingyu Li, Zhiqiang Liu, Xuanhua Shi, Hai Jin
2020 IEEE Access  
Building performance-predicting models for big data frameworks is challenging for several reasons: (1) the significant time required to collect training data and (2) the poor accuracy of the prediction  ...  ATCS can build a performance prediction model with less training data and without sacrificing model accuracy.  ...  The PA-based method captures performance characteristics using a fine-grained analysis of the run-time state of the program, and creates a simulator to simulate the job-execution process and predict performance  ... 
doi:10.1109/access.2020.2979812 fatcat:aownx2kmxvcjlp5gx5otahigz4

Learning-based Automatic Parameter Tuning for Big Data Analytics Frameworks [article]

Liang Bao, Xin Liu, Weizhao Chen
2018 arXiv   pre-print
Big data analytics frameworks (BDAFs) have been widely used for data processing applications.  ...  AutoTune is implemented and evaluated using the Spark framework and HiBench benchmark deployed on a public cloud.  ...  For instance, RFHOC uses random forests for performance prediction and a genetic algorithm to search for the Hadoop configuration space [17]; ALOJA-ML [18] identifies key performance properties of  ... 
arXiv:1808.06008v1 fatcat:z75qce5esfcvxooborqvcggwr4

Unsupervised learning for vascular heterogeneity assessment of glioblastoma based on magnetic resonance imaging: The Hemodynamic Tissue Signature [article]

Javier Juan-Albarracín
2020 arXiv   pre-print
& Nuclear Medicine, Machine Learning and Data Mining and Biomedical Engineering.  ...  means of perfusion MRI analysis.  ...  CR4 ONCOhabitats ( platform encapsulates all the original methods and algorithms developed in this thesis, and several state-of-the-art algorithms for medical image analysis  ... 
arXiv:2009.06288v1 fatcat:dum2y7fuuve73lxbb2any6iak4

Benchmarking dataflow systems for scalable machine learning

Christoph Boden, Technische Universität Berlin, Volker Markl
However, it remains an open question how efficient they perform at this task and how to adequately evaluate and benchmark these systems for scalable machine learning workloads in general.  ...  In this thesis, we present work on all crucial building blocks for a benchmark of distributed data processing systems for scalable machine learning including extensive experimental evaluations of distributed  ...  Another somewhat related initiative is ALOJA [96], a Big Data Benchmark Repository and platform for performance analysis.  ... 
doi:10.14279/depositonce-7532 fatcat:mwjd6bnzvjaknbjnfohjokkj4y

Efficient and high-performance data orchestration for large scale cloud workloads

Shouwei Chen
The data orchestration based on memory and high-performance storage devices has become a key concern to optimize these cloud computing frameworks' performance.  ...  However, providing an efficient and high-performant storage layer for large-scale computing frameworks, such as intermediate data storage and shuffle data storage, is still challenging.  ...  A.4 Discussion This chapter provided a detailed evaluation of performance, power and resource utilization behaviors trends of Hadoop and Spark using a relevant set of Big Data benchmarks and different  ... 
doi:10.7282/t3-ckq3-tw41 fatcat:3m3fpfhbnnat5ar7ri5y5l3uvm

Sáenz 557-Jesús María Teléf

Claudio Luis, Liñán Cervantes, Vicerrector, Lazo Jorge, Facultad Manrique, Ingeniería De, De Sistemas, José Ugaz Burga, Msc Santiago, Raúl Gonzales Sánchez, Lic Cipriano, Torres Guerra
This paper presents a quantitative comparison, time evaluation performance, between two equivalent versions of a sale system, one developed according to the principles of OOP in C++ and another developed  ...  In this context, Notification-Oriented Paradigm (NOP) presents an alternative for those issues.  ...  media and modeling of a device to mitigate or utilize the dispersion effect for the polarization mode.  ...