
To Overlap or Not to Overlap: Optimizing Incremental MapReduce Computations for On-Demand Data Upload

Stefan Ene, Bogdan Nicolae, Alexandru Costan, Gabriel Antoniu
2014 2014 5th International Workshop on Data-Intensive Computing in the Clouds  
Our key finding shows that overlapping the transfer time with as many incremental computations as possible is not always efficient: a better solution is to wait for enough to fill the computational capacity  ...  Research on cloud-based Big Data analytics has focused so far on optimizing the performance and cost-effectiveness of the computations, while largely neglecting an important aspect: users need to upload  ...  CONCLUSIONS In this paper, we explore how to optimize incremental MapReduce computations specifically for the case when the input data is a large remote dataset that needs to be uploaded to the cloud first  ... 
doi:10.1109/datacloud.2014.7 dblp:conf/sc/EneNCA14 fatcat:bqqhfzmz6fhljf5tk7rmgisyya

Efficient multi-way theta-join processing using MapReduce

Xiaofei Zhang, Lei Chen, Min Wang
2012 Proceedings of the VLDB Endowment  
with only one MapReduce job.  ...  is proven to be able to support OLAP applications over immense data volumes.  ...  for the data uploading process, as shown in Fig. 11.  ... 
doi:10.14778/2350229.2350238 fatcat:o5qgvtbgwfeafnpwkh2nbt3hju

Efficient Multi-way Theta-Join Processing Using MapReduce [article]

Xiaofei Zhang, Lei Chen, Min Wang
2012 arXiv   pre-print
with only one MapReduce job.  ...  is proven to be able to support OLAP applications over immense data volumes.  ...  for the data uploading process, as shown in Fig. 11.  ... 
arXiv:1208.0081v1 fatcat:r77n5x4csre2jpmtvv5asqj3au

A comprehensive view of Hadoop research—A systematic literature review

Ivanilton Polato, Reginaldo Ré, Alfredo Goldman, Fabio Kon
2014 Journal of Network and Computer Applications  
computing.  ...  Lately, Apache Hadoop has attracted strong attention due to its applicability to Big Data processing.  ...  Tables A1 and A2. Table A1: Studies with implementation and/or experiments (MapReduce and data storage & manipulation categories). Appendix A.  ... 
doi:10.1016/j.jnca.2014.07.022 fatcat:4xjveqy6mrctzjc4ou7llyy4u4

Dynamic Creation of BSP/CGM Clusters on Cloud Computing Platforms

Alessandro Kraemer, Junior Cesar de Oliveira, Fabio Andre Garaluz dos Santos, Ana Claudia Maciel, Alfredo Goldman, Daniel Cordeiro
2013 2013 Fourth International Conference on Emerging Intelligent Data and Web Technologies  
We show how to build and instantiate virtual machines templates for BSP/CGM applications that can be used on private cloud computing platforms.  ...  Cloud computing platforms have the potential to benefit scientific projects on all fields of knowledge.  ...  INTRODUCTION The demand for computational power is increasing.  ... 
doi:10.1109/eidwt.2013.140 dblp:conf/eidwt/KraemerOSMGC13 fatcat:de44tnadcrgapluf74kz4ibggy

Forensicloud: An Architecture for Digital Forensic Analysis in the Cloud

Cody Miller, Dae Glendowne, David Dampier, Kendall Blaylock
2014 Journal of Cyber Security and Mobility  
These environments allow investigators the ability to use licensed and unlicensed tools that they may not have had access to before and allows some of these tools to be run on computing clusters.  ...  The amount of data that must be processed in current digital forensic examinations continues to rise.  ...  Increment the number of cases by one until the processing performance is no longer optimal. Record the number of nodes a single case needs to be effective.  ... 
doi:10.13052/jcsm2245-1439.331 fatcat:kuqnzpg4znf3libxxete5j2boy

Algorithmic Skeleton Framework for the Orchestration of GPU Computations [chapter]

Ricardo Marques, Hervé Paulino, Fernando Alexandre, Pedro D. Medeiros
2013 Lecture Notes in Computer Science  
The Loop must be able to create environments on demand, for it to enable overlapped and independent loop executions.  ...  However, it does not optimize how the communication is overlapped with the computation.  ... 
doi:10.1007/978-3-642-40047-6_86 fatcat:rsrjtgynnzfedjmtejym3khija

Scalable parallel computing on clouds using Twister4Azure iterative MapReduce

Thilina Gunarathne, Bingjing Zhang, Tak-Lon Wu, Judy Qiu
2013 Future generations computer systems  
The challenges to large-scale distributed computations on cloud environments demand innovative computational frameworks that are specifically tailored for cloud characteristics to easily and effectively  ...  Windows Azure claims to allow users to "focus on your applications, not the infrastructure."  ...  The MapReduce model reduces the data transfer overheads by overlapping data communication with computations when reduce steps are involved.  ... 
doi:10.1016/j.future.2012.05.027 fatcat:7555zbco7rggvhvpvswuvsnteu

A Survey of Big Data Machine Learning Applications Optimization in Cloud Data Centers and Networks [article]

Sanaa Hamid Mohamed, Taisir E.H. El-Gorashi, Jaafar M.H. Elmirghani
2019 arXiv   pre-print
Wide ranging efforts were devoted to optimize systems that handle big data in terms of various applications performance metrics and/or infrastructure energy efficiency.  ...  MapReduce and Hadoop thus introduce innovative, efficient, and accelerated intensive computations and analytics.  ...  All data are provided in full in the results section of this paper.  ... 
arXiv:1910.00731v1 fatcat:kvi3br4iwzg3bi7fifpgyly7m4

Uncovering Large Groups of Active Malicious Accounts in Online Social Networks

Qiang Cao, Xiaowei Yang, Jieqi Yu, Christopher Palow
2014 Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security - CCS '14  
We implement SynchroTrap as an incremental processing system on Hadoop and Giraph so that it can process the massive user activity data in a large online social network efficiently.  ...  SynchroTrap was able to unveil more than two million malicious accounts and 1156 large attack campaigns within one month.  ...  We are particularly thankful to Michael Sirivianos for his extensive feedback. We also thank the Facebook Digraph team for providing the graph processing infrastructure.  ... 
doi:10.1145/2660267.2660269 dblp:conf/ccs/CaoYYP14 fatcat:qihgyx724bbp5l25hlt2tqtk3q

A Survey on Spark Ecosystem for Big Data Processing [article]

Shanjiang Tang, Bingsheng He, Ce Yu, Yusen Li, Kun Li
2018 arXiv   pre-print
Finally, we make a discussion on the open issues and challenges for large-scale in-memory data processing with Spark.  ...  In this survey, we aim to have a thorough review of various kinds of optimization techniques on the generality and performance improvement of Spark.  ...  It charges users for on-demand storage, requests and data transfers.  ... 
arXiv:1811.08834v1 fatcat:6fxvg6me7rayzm4suoabyg7fii

Grex: An efficient MapReduce framework for graphics processing units

Can Basaran, Kyoung-Don Kang
2013 Journal of Parallel and Distributed Computing  
In this paper, we present a new MapReduce framework, called Grex, designed to leverage general purpose graphics processing units (GPUs) for parallel data processing.  ...  Second, Grex evenly distributes data to map/reduce tasks to avoid data partitioning skews.  ...  In addition, data upload to and download from the GPU in Grex is overlapped with the computation in the GPU to reduce the I/O overhead.  ... 
doi:10.1016/j.jpdc.2013.01.004 fatcat:ow36bddjybf7lppr7sce52t264


Lakshmi N
2020 Zenodo  
The Information Technology (IT) industry is one of the most information and demanding industries.  ...  In the IT field, the finance department is the one where the knowledge and data keep on developing on a daily basis.  ...  It allows data scientists to upload data in any format, and provides a simple platform to organize, sort, and manipulate that data.  ... 
doi:10.5281/zenodo.3992839 fatcat:dsl73qyi65cxtimgzombg7c2ze

Parallel programming for multimedia applications

Hari Kalva, Aleksandar Colic, Adriana Garcia, Borko Furht
2010 Multimedia tools and applications  
This increase in the amount of data has put heavier burden on the computing infrastructure necessary to process multimedia data.  ...  For example, users are allowed to upload HD resolution video to YouTube and YouTube processes the video to allow other users to access it at various bitrates and resolutions.  ...  At the highest level of hierarchy is cloud or cluster computing where multiple systems connected over networks are used to solve large scale problems or process large amounts of data.  ... 
doi:10.1007/s11042-010-0656-2 fatcat:2a2eifgpkfg2hejcx3kwyjjjte

Graph analytics using vertica relational database

Alekh Jindal, Samuel Madden, Malu Castellanos, Meichun Hsu
2015 2015 IEEE International Conference on Big Data (Big Data)  
These systems often start from the assumption that a new storage or query processing system is needed, in spite of graph data being often collected and stored in a relational database in the first place  ...  In this paper, we study Vertica relational database as a platform for graph analytics.  ...  Given the popular demand for graph analytics, a natural question is whether or not traditional database systems really are a bad fit for these graph analytics workloads?  ... 
doi:10.1109/bigdata.2015.7363873 dblp:conf/bigdataconf/JindalMCH15 fatcat:ydciojiv5vgatnlmhxtyu4j6cq