Clydesdale

Tim Kaldewey, Eugene J. Shekita, Sandeep Tata
2012 Proceedings of the 15th International Conference on Extending Database Technology - EDBT '12  
In this paper we introduce Clydesdale, a novel system for structured data processing on Hadoop, a popular implementation of MapReduce.  ...  This demonstrates that MapReduce in general, and Hadoop in particular, is a far more compelling platform for structured data processing than previous results suggest.  ...  Llama is a recent system that combines columnar storage and tailored join algorithms. It demonstrates a speedup of at most 5x compared to Hive, while Clydesdale's speedup ranges from 5.2x to 82.7x.  ... 
doi:10.1145/2247596.2247600 dblp:conf/edbt/KaldeweyST12 fatcat:lovj3vh7t5ftdg3ksnncfup2gm

A survey of large-scale analytical query processing in MapReduce

Christos Doulkeridis, Kjetil Nørvåg
2013 The VLDB journal  
A set of the most significant weaknesses and limitations of MapReduce is discussed at a high level, along with solving techniques.  ...  This survey aims to review the state of the art in improving the performance of parallel query processing using MapReduce.  ...  Doulkeridis was supported under the Marie-Curie IEF grant number 274063 with partial support from the Norwegian Research Council.  ... 
doi:10.1007/s00778-013-0319-9 fatcat:3gkpguiwnre2jduhjssuqgydfq

Integration of large-scale data processing systems and traditional parallel database technology

Azza Abouzied, Daniel J. Abadi, Kamil Bajda-Pawlikowski, Avi Silberschatz
2019 Proceedings of the VLDB Endowment  
In 2009 we explored the feasibility of building a hybrid SQL data analysis system that takes the best features from two competing technologies: large-scale data processing systems (such as Google MapReduce  ...  We describe how the project innovated both in the research lab, and as a commercial product at Hadapt and Teradata.  ...  take advantage of columnar storage by keeping data in columnar form during certain query operators.  ... 
doi:10.14778/3352063.3352145 fatcat:qnwfplmf3jgodaw7tsu3kwjsnq

An Efficient Distributed Data Processing Method for Smart Environment

C. Hemanth Kumar, A. Siva Sangari
2016 Indian Journal of Science and Technology  
less latency, that can be run on any Large scale Machine Learning Algorithms for recognizing any interest pattern in the streaming data set was employed.  ...  Applications/Improvements: From this study, we conclude that, building a smart environment by using the big data setup platform improves and enhances the results for the smart environment.  ...  High memory efficiency can be achieved by Spark SQL providing a columnar store for many aggregates than naive Spark code in computations expressible in SQL 7.  ... 
doi:10.17485/ijst/2016/v9i31/95172 fatcat:jbni722vabhqliq2edcv37mi7m

Analysis and Evaluation of Techniques for Managing Unstructured and Semi-Structured Data in a MapReduce Platform

Dina Darwish
2017 International Journal Of Engineering And Computer Science  
MapReduce is one of the most popular platforms in which the dataflow is in the form of a directed acyclic graph of operators.  ...  In this paper, we develop the engineering principles and practices to manage unstructured and semi-structured data in a MapReduce platform.  ...  Figure 3: Storage technique for management of UNSED in a MapReduce environment.  ... 
doi:10.18535/ijecs/v6i2.03 fatcat:qonsrnvrtng4fkzvqqluwcqft4

A Survey on Spark Ecosystem for Big Data Processing [article]

Shanjiang Tang, Bingsheng He, Ce Yu, Yusen Li, Kun Li
2018 arXiv   pre-print
Finally, we make a discussion on the open issues and challenges for large-scale in-memory data processing with Spark.  ...  In this survey, we aim to have a thorough review of various kinds of optimization techniques on the generality and performance improvement of Spark.  ...  Like Hive, the default backend execution engine for Pig is MapReduce.  ... 
arXiv:1811.08834v1 fatcat:6fxvg6me7rayzm4suoabyg7fii

Efficient processing of data warehousing queries in a split execution environment

Kamil Bajda-Pawlikowski, Daniel J. Abadi, Avi Silberschatz, Erik Paulson
2011 Proceedings of the 2011 international conference on Management of data - SIGMOD '11  
In these experiments we compare our results with a standard commercial parallel database and an open-source MapReduce implementation featuring a SQL interface (Hive).  ...  The company focuses on building a platform for Big Data analytics in the cloud by introducing a storage layer optimized for structured data and by providing a framework for executing SQL queries efficiently  ...  This algorithm is implemented in Hive, Pig, and a recent research paper [12].  ... 
doi:10.1145/1989323.1989447 dblp:conf/sigmod/Bajda-PawlikowskiASP11 fatcat:5mu2kjm3yjd6vnkhosuulhrmqe

Human Behavior Analysis Using Intelligent Big Data Analytics

Muhammad Usman Tariq, Muhammad Babar, Marc Poulin, Akmal Saeed Khattak, Mohammad Dahman Alshehri, Sarah Kaleem
2021 Frontiers in Psychology  
The API key is generated to fetch information of public channel data in the form of text files. Hive storage machinist is utilized with Apache Spark for efficient data processing.  ...  Social media data is created in a significant amount and at a tremendous pace. There is a very high volume to store, sort, process, and carefully study the data for making possible decisions.  ...  The MLLib library is utilized for applying the Machine Learning (ML) algorithm in the spark context. The graphX library is utilized for graph implementation.  ... 
doi:10.3389/fpsyg.2021.686610 fatcat:axqb4f7pefbkvohtbmolu6clnu

Collaborative Cloud Computing Framework for Health Data with Open Source Technologies [article]

Fatemeh Rouzbeh, Ananth Grama, Paul Griffin, Mohammad Adibuzzaman
2020 arXiv   pre-print
We propose a novel architecture for software-hardware-data ecosystem using open source technologies such as Apache Hadoop, Kubernetes and JupyterHub in a distributed environment.  ...  In this paper, we review the existing literature for such big data systems for scientific research in health sciences and identify the gaps of the current system landscape.  ...  ACKNOWLEDGMENTS We would like to thank the Information Technology at Purdue (ITaP) department for their support in managing the security, networking and operating system.  ... 
arXiv:2007.10498v1 fatcat:gucnqk6gorfsni6qsm5cx3bjja

Big Data Knowledge System in Healthcare [chapter]

Gunasekaran Manogaran, Chandu Thota, Daphne Lopez, V. Vijayakumar, Kaja M. Abbas, Revathi Sundarsekar
2017 Studies in Big Data  
The chapter proposes a big data based knowledge management system to develop the clinical decisions.  ...  Hence, effective big data based knowledge management system is needed for monitoring of patients and identify the clinical decisions to the doctor.  ...  For example, scalable MapReduce based algorithmic technologies are used to compare one genome to many others in an efficient way.  ... 
doi:10.1007/978-3-319-49736-5_7 fatcat:d7zfowy2prc7hbvitso7zcasga

A comprehensive view of Hadoop research—A systematic literature review

Ivanilton Polato, Reginaldo Ré, Alfredo Goldman, Fabio Kon
2014 Journal of Network and Computer Applications  
Context: In recent years, the valuable knowledge that can be retrieved from petabyte scale datasets, known as Big Data, led to the development of solutions to process information based on parallel and distributed  ...  of the experiments conducted by authors, hindering their reproducibility; finally, the systematic review presented in this paper demonstrates that Hadoop has evolved into a solid platform to process large  ...  Appendix A, Table A1: Studies with implementation and/or experiments (MapReduce and data storage & manipulation categories).  ... 
doi:10.1016/j.jnca.2014.07.022 fatcat:4xjveqy6mrctzjc4ou7llyy4u4

An Architecture for Data Warehousing in Big Data Environments [chapter]

Bruno Martinho, Maribel Yasmina Santos
2016 Lecture Notes in Business Information Processing  
a querying mechanism, and not as a data storage repository with tables that enhance data analytics over different perspectives.  ...  According to [20], Impala is faster in querying the data when compared to Hive, as it uses a query engine that does not need MapReduce [20, 21] and, as Hive uses MapReduce jobs, its performance is  ... 
doi:10.1007/978-3-319-49944-4_18 fatcat:mbmzyorcsrg7jdifddc6rwzyua

A Taxonomy on Big Data: Survey [article]

Ripon Patgiri
2019 arXiv   pre-print
Therefore, the Big Data is spawning everywhere to enhance the organizations' revenue. Thus, many new technologies emerging based on Big Data. In this paper, we present the taxonomy of Big Data.  ...  For instance, science, engineering, economics, business, social science, and government. The Big Data are used to boost up the organization performance using massive amount of dataset.  ...  Object Storage for Big Data Object storage is a basic storage unit for applications which stores data as objects and as a logical collection of bytes on a storage device along with the methods for accessing  ... 
arXiv:1808.08474v3 fatcat:mxnvemtv75akvhq4643b4q4lne

A Scalability Comparison Study of Data Management Approaches for Smart Metering Systems

Houssem Chihoub, Christine Collet
2016 2016 45th International Conference on Parallel Processing (ICPP)  
To this end, we conduct a thorough experimental study of various systems including a parallel relational database system, MapReduce based systems including Hadoop and Spark, and a NoSQL datastore system  ...  In this work, we focus on investigating the scalability and performance of different data management approaches for meter data processing.  ...  Next-generation MapReduce in-memory processing In recent years, many efforts have been dedicated to enhance the performance of MapReduce systems.  ... 
doi:10.1109/icpp.2016.61 dblp:conf/icpp/ChihoubC16 fatcat:anypufbhcbgpjlfv7zebbgxfkm

Efficient storage, retrieval and analysis of poker hands: An adaptive data framework

Marcin Gorawski, Michal Lorek
2017 International Journal of Applied Mathematics and Computer Science  
Both index types operate independently of the Hive execution context and allow other big data computational frameworks such as MapReduce or Spark to benefit from the optimized data access path to the hand  ...  with other approaches.  ...  The framework, in conjunction with the capabilities provided by Hive, allows users to take advantage of the parallel processing capabilities provided by Hadoop and MapReduce using a simple SQL-based  ... 
doi:10.1515/amcs-2017-0049 fatcat:4fvajh46drf3lj5j6mnep5trta