ALOJA: A Framework for Benchmarking and Predictive Analytics in Hadoop Deployments

Josep Lluis Berral, Nicolas Poggi, David Carrera, Aaron Call, Rob Reinauer, Daron Green
2017 IEEE Transactions on Emerging Topics in Computing  
In addition, the community can benefit from ALOJA data-sets and framework to improve the design and deployment of Big Data applications.  ...  This article presents the ALOJA project and its analytics tools, which leverages machine learning to interpret Big Data benchmark performance data and tuning.  ...  This work is partially supported by the Ministry of Economy of Spain under contracts TIN2012-34557 and 2014SGR1051.  ... 
doi:10.1109/tetc.2015.2496504 fatcat:7kpa5wvwfzfs3jtd6aqjfbq5du

ALOJA-ML

Josep Lluís Berral, Nicolas Poggi, David Carrera, Aaron Call, Rob Reinauer, Daron Green
2015 Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD '15  
In addition to learning from the methodology presented in this work, the community can benefit in general from ALOJA data-sets, framework, and derived insights to improve the design and deployment of Big  ...  Hadoop presents a complex execution environment, where costs and performance depend on a large number of software (SW) configurations and on multiple hardware (HW) deployment choices.  ...  Contribution In ALOJA-ML we aim to provide 1) a useful framework for Hadoop users and researchers to characterize and address configuration and performance issues; 2) data-sets of Hadoop experimentation and  ... 
doi:10.1145/2783258.2788600 dblp:conf/kdd/BerralPCCRG15 fatcat:3y7pnkbwxvbzjodjfwhm4ckjla

ALOJA: A Benchmarking and Predictive Platform for Big Data Performance Analysis [chapter]

Nicolas Poggi, Josep Ll. Berral, David Carrera
2016 Lecture Notes in Computer Science  
The development of the project over its first year has resulted in an open source benchmarking platform, an online public repository of results with over 42,000 Hadoop job runs, and web-based analytic tools  ...  This article describes the evolution of the project's focus and research lines from over a year of continuously benchmarking Hadoop under different configuration and deployment options, presents results  ...  Acknowledgements This work is partially supported by the BSC-Microsoft Research Centre, the Spanish Ministry of Education (TIN2012-34557), the MINECO Severo Ochoa Research program (SEV-2011-0067) and the  ... 
doi:10.1007/978-3-319-49748-8_4 fatcat:lgzpi3vmabfbbb7vw7r6otrogq

ALOJA: A systematic study of Hadoop deployment variables to enable automated characterization of cost-effectiveness

Nicolas Poggi, David Carrera, Aaron Call, Sergio Mendoza, Yolanda Becerra, Jordi Torres, Eduard Ayguade, Fabrizio Gagliardi, Jesus Labarta, Rob Reinauer, Nikola Vujic, Daron Green (+1 others)
2014 2014 IEEE International Conference on Big Data (Big Data)  
This article presents the ALOJA project, an initiative to produce mechanisms for an automated characterization of cost-effectiveness of Hadoop deployments and reports its initial results.  ...  for Hadoop.  ...  ACKNOWLEDGEMENTS This work is partially supported by the Ministry of Science and Technology of Spain under contracts TIN2012-34557 and 2014SGR1051.  ... 
doi:10.1109/bigdata.2014.7004322 dblp:conf/bigdataconf/PoggiCCMBTAGLRVGB14 fatcat:mwznaa3urrcplbsmssebw2ppii

Database Integrated Analytics Using R: Initial Experiences with SQL-Server + R

Josep Ll. Berral, Nicolas Poggi
2016 2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW)  
Here we show a first taste of such technology by testing the portability of our ALOJA-ML analytics framework, coded in R, to Microsoft SQL-Server 2016, one of the SQL+R solutions released recently.  ...  In this work we discuss some data-flow schemes for porting a local DB + analytics engine architecture towards Big Data, focusing specially on the new DB Integrated Analytics approach, and commenting the  ...  All of this using the ALOJA-ML framework as reference, a framework written in R dedicated to model, predict and classify data from Hadoop executions, stored as the ALOJA data-set.  ... 
doi:10.1109/icdmw.2016.0009 dblp:conf/icdm/BerralP16 fatcat:izibngwkhnf2xanzxcrcs3kbwi

Designing and implementing a Big Data benchmark in a financial context: application to a cash management use case

Lilia Sfaxi, Mohamed Mehdi Ben Aissa
2021 Computing  
The performance results collected with BABEL for the cash management use case make it possible to define the right tradeoffs in terms of consistency and availability, in a way that respects the service level agreements  ...  This paper details the steps followed to benchmark a cash management platform of an investment bank using a generic benchmarking solution called BABEL.  ...  The ALOJA benchmarking platform [8], for instance, is used to benchmark the Hadoop environment when varying the system's architecture.  ... 
doi:10.1007/s00607-021-00933-x fatcat:myqri224vfbsnd4omulyhxr5zy

Automatic Generation of Workload Profiles Using Unsupervised Learning Pipelines

David Buchaca Prats, Josep Lluis Berral, David Carrera
2018 IEEE Transactions on Network and Service Management  
CRBMs can be used to map a given historic window of trace behaviour into a single vector.  ...  We use these methods to find phases of similar behaviour in the workloads.  ...  For benchmark purposes, no early stopping is applied and the presented times use a single thread of CPU.  ... 
doi:10.1109/tnsm.2017.2786047 fatcat:idlfldnoxnbufjnu2si4dljj4q

Efficient and high-performance data orchestration for large scale cloud workloads

Shouwei Chen
2021
Data analytics generates a large amount of intermediate data at the back of cloud computing frameworks while processing large amounts of data from different data sources.  ...  Consequently, the revolution of hardware devices requires a new paradigm for data orchestration for cloud computing frameworks. This thesis address [...]  ...  For example, Hadoop [1] and Spark [2] are widely used for data warehouse analytics and machine learning analytics.  ... 
doi:10.7282/t3-ckq3-tw41 fatcat:3m3fpfhbnnat5ar7ri5y5l3uvm

Benchmarking dataflow systems for scalable machine learning

Christoph Boden, Technische Universität Berlin, Volker Markl
2018
However, it remains an open question how efficient they perform at this task and how to adequately evaluate and benchmark these systems for scalable machine learning workloads in general.  ...  In this thesis, we present work on all crucial building blocks for a benchmark of distributed data processing systems for scalable machine learning including extensive experimental evaluations of distributed  ...  ALOJA features a database of thousands of benchmark experiment runs of Apache Hadoop.  ... 
doi:10.14279/depositonce-7532 fatcat:mwjd6bnzvjaknbjnfohjokkj4y