
Apache Hadoop goes realtime at Facebook

Dhruba Borthakur, Samuel Rash, Rodrigo Schmidt, Amitanand Aiyer, Jonathan Gray, Joydeep Sen Sarma, Kannan Muthukkaruppan, Nicolas Spiegelberg, Hairong Kuang, Karthik Ranganathan, Dmytro Molkov, Aravind Menon
2011 Proceedings of the 2011 international conference on Management of data - SIGMOD '11  
Facebook recently deployed Facebook Messages, its first ever user-facing application built on the Apache Hadoop platform.  ...  MySQL database scheme used in other applications at Facebook and many other web-scale companies.  ...  We have Acknowledgements are also due to Patrick Kling for implementing a test suite for HDFS HA as part of his internship at Facebook.  ... 
doi:10.1145/1989323.1989438 dblp:conf/sigmod/BorthakurGSMSKRMMRSA11 fatcat:gnicex2fwzbotcpmlaqhz7k2mm

Identifying Requirements for Big Data Analytics and Mapping to Hadoop Tools

2019 International journal of recent technology and engineering  
Apache Hadoop is a popular open-source platform that supports storage and processing of extremely large datasets. For the purposes of big data analytics, Hadoop ecosystem provides a variety of tools.  ...  Big data is being generated in a wide variety of formats at an exponential rate.  ...  Hadoop is currently used by Google, Facebook, LinkedIn, Yahoo!, Twitter and many more. Hadoop ecosystem includes various components of Apache Hadoop software library.  ... 
doi:10.35940/ijrte.c5524.098319 fatcat:zgw5y6nucve3jo36sqeio3wukq

New and Existing Approaches Reviewing of Big Data Analysis with Hadoop Tools

2022 Baghdad Science Journal  
Everybody is connected with social media like (Facebook, Twitter, LinkedIn, Instagram...etc.) that generate a large quantity of data and which traditional applications are inadequate to process.  ...  Comparison between Hadoop and spark has been also illustrated.  ...  Every dataset entered the shuffling of the general process of the reducer in guarantees that partitioning goes on partition divides completed at suitable reducers where for http is used by the reducer  ... 
doi:10.21123/bsj.2022.19.4.0887 fatcat:syywdq6xgret5gfezyjuam33qy

Acerca de la aplicación de MapReduce + Hadoop en el tratamiento de Big Data [On the application of MapReduce + Hadoop to Big Data processing]

Antonio Hernández Dominguez, Adrian Hernández Yeja
2015 Revista Cubana de Ciencias Informáticas  
MapReduce + Hadoop is a programming model used by a variety of software development companies around the world, among them Google and Yahoo.  ...  The beauty of Hadoop MapReduce is that users generally only have to − BORTHAKUR, D., et al. 2011. Apache Hadoop goes realtime at Facebook. [ed.]  ...  Apache Hadoop goes realtime at Facebook. s.l. : Proceedings of the 2011 ACM SIGMOD International Conference on Management of data, 2011. − CASSANDRA, A. 2013. The Apache Software Foundation.  ... 
doaj:dc662fe03ad1456b8ef3dc2acbe898fa fatcat:ne4wdl7hwbgrrj42fma6nnhkmy

Perform wordcount Map-Reduce Job in Single Node Apache Hadoop cluster and compress data using Lempel-Ziv-Oberhumer (LZO) algorithm [article]

Nandan Mirajkar, Sandeep Bhujbal, Aaradhana Deshmukh
2013 arXiv   pre-print
The paper describes performing a wordcount Map-Reduce Job in Single Node Apache Hadoop cluster and compress data using Lempel-Ziv-Oberhumer (LZO) algorithm.  ...  This storage capacity can be reduced and distributed processing of huge data can be done using Apache Hadoop which uses Map-reduce algorithm and combines the repeating data so that entire data is stored  ...  Apache Hadoop software library itself detects and handles any failures at application layer [2]. 2.2 Hadoop Distributed File System -HDFS A distributed user-level filesystem HDFS Hadoop Distributed File  ... 
arXiv:1307.1517v1 fatcat:deecdzioxvdsdkq52nfyawrgiq

Study of the Big Data Collection Scheme Based Apache Flume for Log Collection

Sooyong Jung (Dept. of Computer Science Graduate School, Soongsil University, 369 Sangdo-Ro, Dongjak-Gu, Seoul, Korea), Yongtae Shin
2018 Journal of clean energy technologies  
In this paper, we have studied the big data collection technology based on Apache Flume for bulk log collection.  ...  log processing is designed to be matched with one web server and one Flume agent, and the Flume agents connected to the web server are connected to the Flume agent that plays the role of storing in the Hadoop  ...  HDFS was originally built as infrastructure for the Apache Nutch web search engine project. HDFS is now an Apache Hadoop subproject. Fig. 7 shows the HDFS Architecture [3]. B.  ... 
doi:10.7763/ijcte.2018.v10.1206 fatcat:75t6ft5ghvh4fog3tuyyavdt3e

Big Data Analytics [article]

Ahmed Masmoudi
2017 Zenodo  
In December 2012 Apache releases Hadoop 1.0.0, more information and installation guide can be found at Apache Hadoop Documentation.  ...  Apache Spark AMPLab at UC Berkeley. Spark fits into the Hadoop open-source 2.  ... 
doi:10.5281/zenodo.573349 fatcat:qg7licyavbgbtph6jadfm6bncu

A Maturity Analysis of Big Data Technologies

Radu BONCEA, Ionut PETRE, Dragos-Marian SMADA, Alin ZAMFIROIU
2017 Informatică economică  
In recent years Big Data technologies have been developed at faster pace due to increase in demand from applications that generate and process vast amount of data.  ...  and services that complement the open-source Apache Hadoop platform; • DataStax provides a product which fully integrates Apache Hadoop with Apache Cassandra and Apache Solr in its DataStax Enterprise  ...  processing: • Big Data data visualization and advanced analytics; • Real time stream processing; • Machine learning at scale; • Enterprise integration. • DataTorrent is certified on Apache Hadoop, and  ... 
doi:10.12948/issn14531305/21.1.2017.05 fatcat:ibzrpjxlznedphvsft3uxaesky

A Comparative Analysis of Hadoop and Spark Frameworks using Word Count Algorithm

2021 International Journal of Advanced Computer Science and Applications  
This paper provides a comprehensive comparison between Apache Hadoop & Apache Spark in terms of efficiency, scalability, security, cost-effectiveness, and other parameters.  ...  The most efficient solutions to Big Data analysis in a distributed environment are Hadoop and Spark administered by Apache, both these solutions are open-source data management frameworks and they allow  ...  The slope of the reducer function goes upward slightly as the size of the file increases, but the time slope of the mapper function goes upward at 75 degrees as the file size increases from 137MB to 202MB  ... 
doi:10.14569/ijacsa.2021.0120495 fatcat:t46gjqqcn5ak7diblq6tsuhqmy

Emerging trends and technologies in big data processing

Rubén Casado, Muhammad Younas
2014 Concurrency and Computation  
Lambdoop is still an ongoing project and has not been open sourced at the time of writing this paper.  ...  This paper is first of its kind that reviews and analyses current trends and technologies in relation to the characteristics, evolution, and processing of Big Data. b) Data storage Kafka [18] : Apache  ...  released a stable version of Hadoop. This was followed by Facebook and Yahoo started working on abstract layers over MapReduce. Yahoo!  ... 
doi:10.1002/cpe.3398 fatcat:qhyxvbwzereapnf3ac3i5eflgm

Developing a Real-Time Data Analytics Framework for Twitter Streaming Data

Babak Yadranjiaghdam, Seyedfaraz Yasrobi, Nasseh Tabrizi
2017 2017 IEEE International Congress on Big Data (BigData Congress)  
Currently there are different workflows offering realtime data analysis for Twitter, presenting general processing over streaming data.  ...  The proposed framework includes data ingestion and stream processing and data visualization components with the Apache Kafka messaging system that is used to perform data ingestion task.  ...  Hadoop, Spark, and Storm are implemented in JVM based • Spark processes in-memory data whereas Hadoop MapReduce goes back to the disk after a map action or a reduce action; thereby, Hadoop MapReduce  ... 
doi:10.1109/bigdatacongress.2017.49 dblp:conf/bigdata/YadranjiaghdamY17 fatcat:5koxm6vjn5hajf6yhujwypwbjm

Chapter 3 Big Data Outlook, Tools, and Architectures [chapter]

Hajira Jabeen
2020 Lecture Notes in Computer Science  
Furthermore, the chapter covers prominent technologies, tools, and architectures developed to handle this large data at scale.  ...  At the end, the chapter reviews knowledge graphs that address the challenges (e.g. heterogeneity, interoperability, variety) of big data through their specialised representation.  ...  The distributed coordination manages the sharing of the locks, shared-variables, realtime-configurations at runtime among the nodes.  ... 
doi:10.1007/978-3-030-53199-7_3 fatcat:vy7ac2ccszcenmxl7kxdiapfvq

Business Process Analytics and Big Data Systems: A Roadmap to Bridge the Gap

Sherif Sakr, Zakaria Maamar, Ahmed Awad, Boualem Benatallah, Wil M. P. Van Der Aalst
2018 IEEE Access  
Apache Flink is another distributed in-memory data processing engine, which represents a flexible alternative to Hadoop that supports both batch and realtime processing [24].  ...  A Process Footprint goes beyond process-enactment generated data.  ... 
doi:10.1109/access.2018.2881759 fatcat:2fcc4au7bfgklf3zemq7xfxcii

Fault Tolerance in MapReduce: A Survey [chapter]

Bunjamin Memishi, Shadi Ibrahim, María S. Pérez, Gabriel Antoniu
2016 Computer Communications and Networks  
Given that failures are common at large scale, these frameworks exhibit some fault tolerance and dependability techniques as built-in features.  ...  Data-intensive computing systems, such as Hadoop MapReduce, have as main goal the processing of an enormous amount of data in a short time, by transmitting the computation where the data resides.  ...  Apache Hadoop Reliability Since its appearance in 2006, Apache Hadoop has undergone many releases [27] .  ... 
doi:10.1007/978-3-319-44881-7_11 dblp:series/ccn/MemishiIPA16 fatcat:m5x33gpzunhzzgrdslagndiwzy

MapReduce: Simplified Data Analysis of Big Data

Seema Maitrey, C.K. Jha
2015 Procedia Computer Science  
The other one having such features is Hadoop which is the most popular open source MapReduce software adopted by many huge IT companies, such as Yahoo, Facebook, eBay and so on.  ...  In this paper, we focus specifically on Hadoop and its implementation of MapReduce for analytical processing.  ...  Fig. 1: Steps in MapReduce to process the database. Fig. 2: MapReduce with combiners, partitioners. Fig. 3: Primary contribution of Hadoop. Apache Hadoop consists of several components.  ... 
doi:10.1016/j.procs.2015.07.392 fatcat:whtpro3grzbpphvfzbptlun744