
Evaluation of Data Locality Strategies for Hybrid Cloud Bursting of Iterative MapReduce

Francisco J. Clemente-Castello, Bogdan Nicolae, M. Mustafa Rafique, Rafael Mayo, Juan Carlos Fernandez
2017 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID)  
In this paper we study how to combine various MapReduce data locality techniques designed for hybrid cloud bursting in order to achieve scalability for iterative MapReduce applications in a cost-effective  ...  It is particularly promising for iterative MapReduce applications that reuse massive amounts of input data at each iteration, which compensates for the high overhead and cost of concurrent data transfers  ...  CLOUD BURSTING DATA LOCALITY CHALLENGES AND STRATEGIES This section briefly introduces the challenges of running iterative MapReduce applications in hybrid cloud bursting scenarios and revisits two complementary  ... 
doi:10.1109/ccgrid.2017.96 dblp:conf/ccgrid/Clemente-Castello17 fatcat:julzxysp75hwvmpizgs5ezjaqe

On exploiting data locality for iterative mapreduce applications in hybrid clouds

Francisco J. Clemente-Castelló, Bogdan Nicolae, Rafael Mayo, Juan Carlos Fernández, M. Mustafa Rafique
2016 Proceedings of the 3rd IEEE/ACM International Conference on Big Data Computing, Applications and Technologies - BDCAT '16  
Specifically, we focus our study on iterative MapReduce applications, which are a class of large-scale data intensive applications particularly popular on hybrid clouds.  ...  Hybrid cloud bursting (i.e., leasing temporary off-premise cloud resources to boost the capacity during peak utilization), has made significant impact especially for big data analytics, where the explosion  ...  This paper has contributed with an analysis of the challenges that arise when running iterative MapReduce applications in hybrid cloud bursting.  ... 
doi:10.1145/3006299.3006329 dblp:conf/bdc/Clemente-Castello16 fatcat:3xlkkjny6zajtk54p5j2rqfsbu

Enabling Big Data Analytics in the Hybrid Cloud Using Iterative MapReduce

Francisco J. Clemente-Castello, Bogdan Nicolae, Kostas Katrinis, M. Mustafa Rafique, Rafael Mayo, Juan Carlos Fernandez, Daniela Loreti
2015 2015 IEEE/ACM 8th International Conference on Utility and Cloud Computing (UCC)  
This paper addresses this gap by taking on the challenge of bursting over hybrid clouds for the benefit of accelerating iterative MapReduce applications.  ...  We show through experimentation in a dual-OpenStack hybrid cloud setup that our solutions manage to bring substantial improvement at predictable cost-control for two real-life iterative MapReduce applications  ...  As such, it does not capture the initial phase of cross-cloud data distribution and data-balancing, a vital phase in high value hybrid cloud use cases, such as cloud-bursting.  ... 
doi:10.1109/ucc.2015.47 dblp:conf/ucc/Clemente-Castello15 fatcat:wmo6c3utunbmbdajulzo5cbtom

Floe: A Continuous Dataflow Framework for Dynamic Cloud Applications [article]

Yogesh Simmhan, Alok Kumbhare
2014 arXiv   pre-print
Adaptive resource allocation strategies allow our framework to effectively use elastic Cloud resources to meet varying data rates.  ...  It offers advanced dataflow patterns like BSP and MapReduce for flexible and holistic composition of streams and files, and supports dynamic recomposition at runtime with minimal impact on the execution  ...  This approach generalizes the pattern beyond just MapReduce, and allows even iterative MapReduce composition (but does away with the, often, unnecessary Map stage for the second and subsequent iterations  ... 
arXiv:1406.5977v1 fatcat:tfxojvrp7jesjlbau6xeq62bfa

Data Processing Model to Perform Big Data Analytics in Hybrid Infrastructures

Julio C. S. Anjos, Kassiano J. Matteussi, Paulo R. R. De Souza, Gabriel J. A. Grabher, Guilherme A. Borges, Jorge L. V. Barbosa, Gabriel V. Gonzalez, Valderi R. Q. Leithardt, Claudio F. R. Geyer
2020 IEEE Access  
[36] address iterative MapReduce issues in Hybrid IaaS CCC environments. The authors argue that it is essential to improve the ability to take advantage of the data locality in a HC environment.  ...  The data locality and data movement remain a challenge for accelerating iterative MR in HC once iterative applications reuse invariant input data. Clement et al.  ... 
doi:10.1109/access.2020.3023344 fatcat:dmqifexpivhnld75k3uwbmyaui

Data-Intensive Cloud Computing: Requirements, Expectations, Challenges, and Solutions

Jawwad Shamsi, Muhammad Ali Khojaye, Mohammad Ali Qasmi
2013 Journal of Grid Computing  
A data-intensive cloud provides an abstraction of high availability, usability, and efficiency to users.  ...  Data-intensive systems encompass terabytes to petabytes of data.  ...  Similarly, Lang and Patel [73] have evaluated power-saving strategies for MapReduce-based cloud systems. The focus is on two categories of techniques for power conservation.  ... 
doi:10.1007/s10723-013-9255-6 fatcat:l27ga4kh7nhnjd6nb6n57autgq

A Survey of Big Data Machine Learning Applications Optimization in Cloud Data Centers and Networks [article]

Sanaa Hamid Mohamed, Taisir E.H. El-Gorashi, Jaafar M.H. Elmirghani
2019 arXiv   pre-print
The MapReduce programming model and its widely-used open-source platform; Hadoop, are enabling the development of a large number of cloud-based services and big data applications.  ...  Moreover, we provide a brief review of data centers topologies, routing protocols, and traffic characteristics, and emphasize the implications of big data on such cloud data centers and their supporting  ...  All data are provided in full in the results section of this paper.  ... 
arXiv:1910.00731v1 fatcat:kvi3br4iwzg3bi7fifpgyly7m4

Building and scaling virtual clusters with residual resources from interactive clouds

R. Benjamin Clay, Zhiming Shen, Xiaosong Ma
2013 Proceedings of the 22nd international symposium on High-performance parallel and distributed computing - HPDC '13  
RHIC builds ad-hoc clusters for running throughput-oriented "background" workloads using a hybrid of residual and dedicated resources.  ...  We demonstrate the effectiveness and adaptivity of our RHIC prototype with two parallel data analytics frameworks, Hadoop and HBase.  ...  In the hybrid cluster design, the dedicated nodes have node-local storage capacity, while the volunteer VMs only use their local storage for temporary data, as shown in Fig. 3.  ... 
doi:10.1145/2493123.2462927 fatcat:h6kgbrs6c5cwhnjjfkxys4pcxu

Time and Cost Sensitive Data-Intensive Computing on Hybrid Clouds

Tekin Bicer, David Chiu, Gagan Agrawal
2012 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)  
This scenario gives rise to a hybrid cloud, where data stored across local and cloud resources may be processed over both environments.  ...  In this paper, we describe a modeling-driven resource allocation framework to support both time and cost sensitive execution for data-intensive applications executed in a hybrid cloud setting.  ...  DATA-INTENSIVE COMPUTING ON HYBRID CLOUD: MOTIVATION AND ENABLING MIDDLEWARE We now describe the situations where processing of data in a hybrid cloud may be desired.  ... 
doi:10.1109/ccgrid.2012.95 dblp:conf/ccgrid/BicerCA12 fatcat:mmq5kxt4uzef5dqlmexnascv3i

Accelerating Batch Analytics with Residual Resources from Interactive Clouds

R. Benjamin Clay, Zhiming Shen, Xiaosong Ma
2013 2013 IEEE 21st International Symposium on Modelling, Analysis and Simulation of Computer and Telecommunication Systems  
RHIC builds ad-hoc clusters for running throughput-oriented "background" workloads using a hybrid of residual and dedicated resources.  ...  We demonstrate the effectiveness and adaptivity of our RHIC prototype with two parallel data analytics frameworks, Hadoop and HBase.  ...  Any opinions expressed in this paper are those of the authors and do not necessarily reflect the views of the NSF or U.S. Government.  ... 
doi:10.1109/mascots.2013.63 dblp:conf/mascots/ClaySM13 fatcat:ksz5rdsbnna5rjlkc2ubd2n6fi

Big Data computing and clouds: Trends and future directions

Marcos D. Assunção, Rodrigo N. Calheiros, Silvia Bianchi, Marco A.S. Netto, Rajkumar Buyya
2015 Journal of Parallel and Distributed Computing  
This paper discusses approaches and environments for carrying out analytics on Clouds for Big Data applications.  ...  Through a detailed survey, we identify possible gaps in technology and provide recommendations for the research community on future directions on Cloud-supported Big Data computing and analytics solutions  ...  The same approach of exploring data locality was explored previously in scientific workflows [39] and in Data Grids [40]. In the context of Big Data analytics, MapReduce presents an interesting model  ... 
doi:10.1016/j.jpdc.2014.08.003 fatcat:l4d5t2y4hrhg5irbyk7bd6zfo4

Big Data and cloud computing: innovation opportunities and challenges

Chaowei Yang, Qunying Huang, Zhenlong Li, Kai Liu, Fei Hu
2016 International Journal of Digital Earth  
This review introduces future innovations and a research agenda for cloud computing supporting the transformation of the volume, velocity, variety and veracity into values of Big Data for local to global  ...  and application developments; (ii) cloud computing provides major solutions for Big Data; (iii) Big Data, spatiotemporal thinking and various application domains drive the advancement of cloud computing  ...  Acknowledgements We thank the anonymous reviewers for their insightful comments and reviews. Dr George Taylor reviewed a previous version of this manuscript.  ... 
doi:10.1080/17538947.2016.1239771 fatcat:qbcgqj2pcvbgja6dnnakoj2saa

2014 Index IEEE Transactions on Parallel and Distributed Systems Vol. 25

2015 IEEE Transactions on Parallel and Distributed Systems  
., +, TPDS Dec. 2014 3167-3176 Data privacy A Scalable Two-Phase Top-Down Specialization Approach for Data Anonymization Using MapReduce on Cloud.  ...  Samaan, Nancy, TPDS Jan. 2014 12-21 A Scalable Two-Phase Top-Down Specialization Approach for Data Anonymization Using MapReduce on Cloud.  ...  ., +, TPDS Aug. 2014 2840-2850 Energy and Network Aware Workload Management for Sustainable Data Centers with Thermal Storage. 2030-2042 Hyperbolic Utilization Bounds for Rate Monotonic Scheduling  ... 
doi:10.1109/tpds.2014.2371591 fatcat:qxyljogalrbfficryqjowgv3je

Coming Together of Big Data and Cloud Computing : A Review

Muneeba Afzal Mukhdoomi, Ashish Oberoi, Ankur Gupta
2020 International Journal of Scientific Research in Computer Science Engineering and Information Technology  
Big data stands for sheer amount of data that is growing unceasingly at a rapid pace.  ...  This paper will mainly review processing of big data cloud using Hadoop and spark in cloud, advantages of driving Big Data using cloud computing and applications of Big data in Cloud.  ...  The most common benefit of hybrid cloud is its "cloud bursting strategy", due to which data and applications are portable to public cloud as per the requirements of the organizations.  ... 
doi:10.32628/cseit206613 fatcat:4xk42hnfffekriax53bd7ihbwy

An optimization approach to capacity evaluation and investment decision of hybrid cloud: a corporate customer's perspective

In Lee
2019 Journal of Cloud Computing: Advances, Systems and Applications  
While the rapid growth of cloud computing is driven by the surge of big data, the Internet of Things, and social media applications, an evaluation and investment decision for cloud computing has been challenging  ...  This paper attempts to identify critical variables for making a cloud capacity decision from a corporate customer's perspective and develops a base mathematical model to aid in a hybrid cloud investment  ...  An evaluation and investment in interoperability for cloud bursting The hybrid cloud requires interoperability of the private and the public cloud to support cloud bursting.  ... 
doi:10.1186/s13677-019-0140-0 fatcat:ktdaryqaljajhcxve4ve76k4d4