ALOJA: A Framework for Benchmarking and Predictive Analytics in Hadoop Deployments

Josep Lluis Berral, Nicolas Poggi, David Carrera, Aaron Call, Rob Reinauer, Daron Green
2017 IEEE Transactions on Emerging Topics in Computing  
In addition, the community can benefit from ALOJA data-sets and framework to improve the design and deployment of Big Data applications.  ...  This article presents the ALOJA project and its analytics tools, which leverage machine learning to interpret Big Data benchmark performance data and tuning.  ...  This work is partially supported by the Ministry of Economy of Spain under contracts TIN2012-34557 and 2014SGR1051.  ... 
doi:10.1109/tetc.2015.2496504 fatcat:7kpa5wvwfzfs3jtd6aqjfbq5du


Josep Lluís Berral, Nicolas Poggi, David Carrera, Aaron Call, Rob Reinauer, Daron Green
2015 Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '15  
In addition to learning from the methodology presented in this work, the community can benefit in general from ALOJA data-sets, framework, and derived insights to improve the design and deployment of Big  ...  The resulting performance models can be used to forecast execution behavior of various workloads; they allow 'a-priori' prediction of the execution times for new configurations and HW choices and they  ...  INTRODUCTION Hadoop has emerged as the de-facto framework for Big Data processing deployment [2] [19] and its adoption continues at a compound annual growth rate of 58% [12].  ... 
doi:10.1145/2783258.2788600 dblp:conf/kdd/BerralPCCRG15 fatcat:3y7pnkbwxvbzjodjfwhm4ckjla

ALOJA: A Benchmarking and Predictive Platform for Big Data Performance Analysis [chapter]

Nicolas Poggi, Josep Ll. Berral, David Carrera
2016 Lecture Notes in Computer Science  
The main goals of the ALOJA research project from BSC-MSR are to explore and automate the characterization of cost-effectiveness of Big Data deployments.  ...  The development of the project over its first year has resulted in an open source benchmarking platform, an online public repository of results with over 42,000 Hadoop job runs, and web-based analytic tools  ...  Acknowledgements This work is partially supported by the BSC-Microsoft Research Centre, the Spanish Ministry of Education (TIN2012-34557), the MINECO Severo Ochoa Research program (SEV-2011-0067) and the  ... 
doi:10.1007/978-3-319-49748-8_4 fatcat:lgzpi3vmabfbbb7vw7r6otrogq

ALOJA: A systematic study of Hadoop deployment variables to enable automated characterization of cost-effectiveness

Nicolas Poggi, David Carrera, Aaron Call, Sergio Mendoza, Yolanda Becerra, Jordi Torres, Eduard Ayguade, Fabrizio Gagliardi, Jesus Labarta, Rob Reinauer, Nikola Vujic, Daron Green (+1 others)
2014 2014 IEEE International Conference on Big Data (Big Data)  
While during the last 5 years Hadoop has become the de-facto platform for Big Data deployments, still little is understood of how the different layers of the software and hardware deployment options affect  ...  This article presents the ALOJA project, an initiative to produce mechanisms for an automated characterization of cost-effectiveness of Hadoop deployments, and reports its initial results.  ...  ACKNOWLEDGEMENTS This work is partially supported by the Ministry of Science and Technology of Spain under contracts TIN2012-34557 and 2014SGR1051.  ... 
doi:10.1109/bigdata.2014.7004322 dblp:conf/bigdataconf/PoggiCCMBTAGLRVGB14 fatcat:mwznaa3urrcplbsmssebw2ppii

Database Integrated Analytics Using R: Initial Experiences with SQL-Server + R

Josep Ll. Berral, Nicolas Poggi
2016 2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW)  
In this work we discuss some data-flow schemes for porting a local DB + analytics engine architecture towards Big Data, focusing especially on the new DB Integrated Analytics approach, and commenting the  ...  Here we show a first taste of such technology by testing the portability of our ALOJA-ML analytics framework, coded in R, to Microsoft SQL-Server 2016, one of the SQL+R solutions released recently.  ...  The ALOJA-ML framework is a collection of predictive analytics functions (machine learning and data mining), written in R, originally purposed for modeling and prediction High Performance Computing (HPC  ... 
doi:10.1109/icdmw.2016.0009 dblp:conf/icdm/BerralP16 fatcat:izibngwkhnf2xanzxcrcs3kbwi

Designing and implementing a Big Data benchmark in a financial context: application to a cash management use case

Lilia Sfaxi, Mohamed Mehdi Ben Aissa
2021 Computing  
The performance results collected with BABEL for the cash management use case make it possible to define the right tradeoffs in terms of consistency and availability, in a way that respects the service level agreements  ...  This paper details the steps followed to benchmark a cash management platform of an investment bank using a generic benchmarking solution called BABEL.  ...  Getting started with BABEL In order to use this framework to benchmark a Big Data architecture, we recommend the following approach: 1.  ... 
doi:10.1007/s00607-021-00933-x fatcat:myqri224vfbsnd4omulyhxr5zy

Automatic Generation of Workload Profiles Using Unsupervised Learning Pipelines

David Buchaca Prats, Josep Lluis Berral, David Carrera
2018 IEEE Transactions on Network and Service Management  
CRBMs can be used to map a given historic window of trace behaviour into a single vector.  ...  We use these methods to find phases of similar behaviour in the workloads.  ...  The datasets comprise a mix of Big Data workloads involving Hadoop and Spark applications extracted from two well-established benchmarks: HiBench and TPCx-BB (BigBench).  ... 
doi:10.1109/tnsm.2017.2786047 fatcat:idlfldnoxnbufjnu2si4dljj4q

Efficient and high-performance data orchestration for large scale cloud workloads

Shouwei Chen
Data analytics generates a large amount of intermediate data at the back of cloud computing frameworks while processing large amounts of data from different data sources.  ...  Consequently, the revolution of hardware devices requires a new paradigm for data orchestration for cloud computing frameworks. This thesis address [...]  ...  frameworks for big data analytics.  ... 
doi:10.7282/t3-ckq3-tw41 fatcat:3m3fpfhbnnat5ar7ri5y5l3uvm

Benchmarking dataflow systems for scalable machine learning

Christoph Boden, Technische Universität Berlin, Volker Markl
In this thesis, we present work on all crucial building blocks for a benchmark of distributed data processing systems for scalable machine learning including extensive experimental evaluations of distributed  ...  However, it remains an open question how efficiently they perform at this task and how to adequately evaluate and benchmark these systems for scalable machine learning workloads in general.  ...  Another somewhat related initiative is ALOJA [96], a Big Data Benchmark Repository and platform for performance analysis.  ... 
doi:10.14279/depositonce-7532 fatcat:mwjd6bnzvjaknbjnfohjokkj4y