Online aggregation and continuous query support in MapReduce

Tyson Condie, Neil Conway, Peter Alvaro, Joseph M. Hellerstein, John Gerth, Justin Talbot, Khaled Elmeleegy, Russell Sears
2010 Proceedings of the 2010 international conference on Management of data - SIGMOD '10  
Our Hadoop Online Prototype (HOP) also supports continuous queries, which enable MapReduce programs to be written for applications such as event monitoring and stream processing.  ...  We demonstrate a modified version of the Hadoop MapReduce framework that supports online aggregation, which allows users to see "early returns" from a job as it is being computed.  ...  However, HOP supports early returns of reducer output, which enables online aggregation and continuous query pipelines. This new functionality will be the focus of our demonstration.  ... 
doi:10.1145/1807167.1807295 dblp:conf/sigmod/CondieCAHGTES10 fatcat:lmurylumujelfdjbiewkaowqx4

COLA: A cloud-based system for online aggregation

Yantao Gan, Xiaofeng Meng, Yingjie Shi
2013 2013 IEEE 29th International Conference on Data Engineering (ICDE)  
COLA provides an online aggregation execution engine with novel sampling techniques to support incremental and continuous computing of aggregation, and minimize the waiting time before an acceptably precise  ...  In addition, user-friendly SQL queries are supported in COLA.  ...  that supports SQL queries and also non-OLA MapReduce programs.  ...
doi:10.1109/icde.2013.6544946 dblp:conf/icde/GanMS13 fatcat:a32qfakyafgm3l5bgoszewg23e

You can stop early with COLA

Yingjie Shi, Xiaofeng Meng, Fusheng Wang, Yantao Gan
2012 Proceedings of the 21st ACM international conference on Information and knowledge management - CIKM '12  
We develop an online query processing algorithm for MapReduce to support incremental and continuous computing of aggregations on joins which minimizes the waiting time before an acceptable estimate is  ...  As an attractive solution to provide a quick sketch of massive data before a long wait of the final accurate query result, online processing of aggregate queries in the cloud is of paramount importance  ...  We develop an online query processing algorithm for MapReduce to support incremental and continuous computing of aggregations on joins, and minimize the waiting time before an acceptable estimate is achieved  ... 
doi:10.1145/2396761.2398423 dblp:conf/cikm/ShiMWG12 fatcat:guycck6q6vhblptkpr3otweysu

Towards Scalable One-Pass Analytics Using MapReduce

Edward Mazur, Boduo Li, Yanlei Diao, Prashant Shenoy
2011 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum  
stream processing and online aggregation?  ...  Current MapReduce systems, however, require the data set to be loaded into the cluster before running analytical queries, and thereby incur high delays to start query processing.  ...  MapReduce Online can perform sort-merge and global aggregation periodically when a snapshot is generated.  ... 
doi:10.1109/ipdps.2011.251 dblp:conf/ipps/MazurLDS11 fatcat:53r7l2yzdjaj7m33d4ympyuiem

Challenges for MapReduce in Big Data

Katarina Grolinger, Michael Hayes, Wilson A. Higashino, Alexandra L'Heureux, David S. Allison, Miriam A.M. Capretz
2014 2014 IEEE World Congress on Services  
analytics), online processing, and security and privacy.  ...  In the Big Data community, MapReduce has been seen as one of the key enabling approaches for meeting continuously increasing demands on computing resources imposed by massive data sets.  ...  In this work, the authors proposed an online MapReduce implementation with the goal of supporting online aggregation and continuous queries.  ... 
doi:10.1109/services.2014.41 dblp:conf/services/GrolingerHHLAC14 fatcat:p2rviqzm5ngtlhhhbd2qzsjlxa

An Efficient Block Sampling Strategy for Online Aggregation in the Cloud [chapter]

Xiang Ci, Xiaofeng Meng
2015 Lecture Notes in Computer Science  
Online aggregation responds to aggregation queries against the random samples and refines the result as more samples are received.  ...  In the era of big data, more and more data analysis applications are migrated to the cloud, so online aggregation in the cloud has also attracted more attention.  ...  Online Aggregation in the Cloud Cloud is different from RDBMS, and the major problem of online aggregation in the cloud is that naive MapReduce does not support pipeline operations.  ...
doi:10.1007/978-3-319-21042-1_29 fatcat:k7niv3vpqrf4npk7vfxnhrdjiy

Tiled-MapReduce

Rong Chen, Haibo Chen
2013 ACM Transactions on Architecture and Code Optimization (TACO)  
Further, we demonstrate that Tiled-MapReduce supports fine-grained fault tolerance and enables several usage scenarios such as online and incremental computing on multicore machines.  ...  Ostrich also efficiently supports fine-grained fault tolerance, online, and incremental computing with small performance penalty.  ...  Online MapReduce [Condie et al. 2010] extends the MapReduce runtime using a pipelining scheme to support two features in the database domain, the online aggregation and continuous query processing.  ...
doi:10.1145/2445572.2445575 fatcat:fbfbnro6rzegfb4y5vro3i4zva

M3: Stream Processing on Main-Memory MapReduce

Ahmed M. Aly, Asmaa Sallam, Bala M. Gnanasekaran, Long-Van Nguyen-Dinh, Walid G. Aref, Mourad Ouzzani, Arif Ghafoor
2012 2012 IEEE 28th International Conference on Data Engineering  
Such restriction makes these implementations inapplicable for most streaming applications, in which queries are continuous in nature, and input data streams are continuously received at high arrival rates  ...  In this demonstration, we showcase M3, a prototype implementation of the MapReduce framework in which continuous queries over streams of data can be efficiently answered.  ...  ACKNOWLEDGMENT This research is supported in part by QCRI, and the National Science Foundation under Grants III-1117766, IIS-0964639, and IIS-0811954.  ...
doi:10.1109/icde.2012.120 dblp:conf/icde/AlySGNAOG12 fatcat:sbnjzc73c5afjoi4t74eg72b4m

Big Data Processing in Cloud Computing Environments

Changqing Ji, Yu Li, Wenming Qiu, Uchechukwu Awada, Keqiu Li
2012 2012 12th International Symposium on Pervasive Systems, Algorithms and Networks  
Following the MapReduce parallel processing framework, we then introduce MapReduce optimization strategies and applications reported in the literature.  ...  Finally, we discuss the open issues and challenges, and deeply explore the research directions in the future on big data processing in cloud computing environments.  ...  MapReduce Online [36] is designed to support online aggregation and continuous queries in MapReduce.  ...
doi:10.1109/i-span.2012.9 fatcat:5nk7w7xdlzbe7dlxb2mqq7fne4

VisReduce: Fast and responsive incremental information visualization of large datasets

Jean-Francois Im, Felix Giguere Villegas, Michael J. McGuffin
2013 2013 IEEE International Conference on Big Data  
We show that our end-to-end approach allows for greater speed and guaranteed end-user responsiveness, even in the face of large, long-running queries.  ...  We compare our method with one that queries three other readily available database and data warehouse systems - PostgreSQL, Cloudera Impala and the MapReduce-based Apache Hive - in order to build visualizations  ...  ACKNOWLEDGMENTS The authors wish to thank Hisham Mardam-Bey and Mate1.com for their support.  ...
doi:10.1109/bigdata.2013.6691710 dblp:conf/bigdataconf/ImVM13 fatcat:45h4i237rjd6vh4ypf4zdpr5mm

Beyond Batch Processing: Towards Real-Time and Streaming Big Data [article]

Saeed Shahrivari, Saeed Jalili
2014 arXiv   pre-print
Today, big data is generated from many sources and there is a huge demand for storing, managing, processing, and querying on big data.  ...  The MapReduce model and its counterpart open source implementation Hadoop, has proven itself as the de facto solution to big data processing.  ...  There are numerous use cases for stream processing like: online machine learning, and continuous computation.  ... 
arXiv:1403.3375v2 fatcat:oulgi324y5ez3ldnrwbfbtgvfq

MapReduce Performance in MongoDB Sharded Collections

Jaumin Ajdari, Brilant Kasami
2018 International Journal of Advanced Computer Science and Applications  
In the modern era of computing and countless online services that gather and serve huge data around the world, processing and analyzing Big Data has rapidly developed into an area of its own.  ...  ., in their paper [4], look back to the MapReduce and try to find out the strengths and  ...  Combining those two areas (MapReduce and Online Aggregation) they introduced a new methodology that uses MapReduce paradigm along with online aggregation.  ...
doi:10.14569/ijacsa.2018.090617 fatcat:ba7sewncafcwxgxiahlh42hi7u

Parallel data processing with MapReduce

Kyong-Ha Lee, Yoon-Joon Lee, Hyunsik Choi, Yon Dohn Chung, Bongki Moon
2012 SIGMOD record  
In this survey, we characterize the MapReduce framework and discuss its inherent pros and cons. We then introduce its optimization strategies reported in the recent literature.  ...  This survey intends to assist the database and open source communities in understanding various technical aspects of the MapReduce framework.  ...  MapReduce Online is devised to support online aggregation and continuous queries in MapReduce [63].  ...
doi:10.1145/2094114.2094118 fatcat:kuvfuwss3fcmbf2d7oqqfibmoq

Beyond Batch Processing: Towards Real-Time and Streaming Big Data

Saeed Shahrivari
2014 Computers  
In this article, we discussed two categories of these solutions: real-time processing, and stream processing of big data.  ...  Today, big data are generated from many sources, and there is a huge demand for storing, managing, processing, and querying on big data.  ...  Acknowledgements The author would like to thank Saeed Jalili for reviewing and commenting on this paper. Conflicts of Interest The authors declare no conflict of interest. References  ... 
doi:10.3390/computers3040117 fatcat:nt5bxrpymvga5efzvyne67hejy

Distributed data management using MapReduce

Feng Li, Beng Chin Ooi, M. Tamer Özsu, Sai Wu
2014 ACM Computing Surveys  
In this paper we aim to provide a comprehensive review of a wide range of proposals and systems that focus fundamentally on the support of distributed data management and processing using the MapReduce  ...  MapReduce is a framework for processing and managing large scale data sets in a distributed cluster, which has been used for applications such as generating search indexes, document clustering, access  ...  Streams and Continuous Query Processing Another extension to MapReduce has been to address continuous processing such as stream processing [Stephens 1997; Golab and Özsu 2010] or online aggregation  ...
doi:10.1145/2503009 fatcat:nxfuh67rnrhwvh3c5zxmdkyvae