Apache Hadoop goes realtime at Facebook
2011
Proceedings of the 2011 international conference on Management of data - SIGMOD '11
Facebook recently deployed Facebook Messages, its first ever user-facing application built on the Apache Hadoop platform. ...
MySQL database scheme used in other applications at Facebook and many other web-scale companies. ...
We have ... Acknowledgements are also due to Patrick Kling for implementing a test suite for HDFS HA as part of his internship at Facebook. ...
doi:10.1145/1989323.1989438
dblp:conf/sigmod/BorthakurGSMSKRMMRSA11
fatcat:gnicex2fwzbotcpmlaqhz7k2mm
Identifying Requirements for Big Data Analytics and Mapping to Hadoop Tools
2019
International journal of recent technology and engineering
Apache Hadoop is a popular open-source platform that supports storage and processing of extremely large datasets. For the purposes of big data analytics, the Hadoop ecosystem provides a variety of tools. ...
Big data is being generated in a wide variety of formats at an exponential rate. ...
Hadoop is currently used by Google, Facebook, LinkedIn, Yahoo!, Twitter and many more. The Hadoop ecosystem includes various components of the Apache Hadoop software library. ...
doi:10.35940/ijrte.c5524.098319
fatcat:zgw5y6nucve3jo36sqeio3wukq
New and Existing Approaches Reviewing of Big Data Analysis with Hadoop Tools
2022
Baghdad Science Journal
Everybody is connected to social media (Facebook, Twitter, LinkedIn, Instagram, etc.), which generate large quantities of data that traditional applications are inadequate to process. ...
A comparison between Hadoop and Spark is also illustrated. ...
Every dataset enters the shuffle stage of the reduce process, which guarantees that the partitions end up at the appropriate reducers, where HTTP is used by the reducer ...
doi:10.21123/bsj.2022.19.4.0887
fatcat:syywdq6xgret5gfezyjuam33qy
On the application of MapReduce + Hadoop to Big Data processing [Acerca de la aplicación de MapReduce + Hadoop en el tratamiento de Big Data]
2015
Revista Cubana de Ciencias Informáticas
MapReduce + Hadoop is a programming model used by a wide range of software development companies around the world, among them Google and Yahoo. ...
The beauty of Hadoop MapReduce is that users generally only have to
− BORTHAKUR, D., et al. 2011. Apache Hadoop goes realtime at Facebook. [ed.] ...
Apache Hadoop goes realtime at Facebook. s.l. : Proceedings of the 2011 ACM SIGMOD International Conference on Management of data, 2011. − CASSANDRA, A. 2013. The Apache Software Foundation. ...
doaj:dc662fe03ad1456b8ef3dc2acbe898fa
fatcat:ne4wdl7hwbgrrj42fma6nnhkmy
Perform wordcount Map-Reduce Job in Single Node Apache Hadoop cluster and compress data using Lempel-Ziv-Oberhumer (LZO) algorithm
[article]
2013
arXiv
pre-print
The paper describes performing a wordcount MapReduce job in a single-node Apache Hadoop cluster and compressing data using the Lempel-Ziv-Oberhumer (LZO) algorithm. ...
This storage capacity can be reduced and distributed processing of huge data can be done using Apache Hadoop, which uses the MapReduce algorithm and combines the repeating data so that entire data is stored ...
The Apache Hadoop software library itself detects and handles any failures at the application layer [2]. 2.2 Hadoop Distributed File System (HDFS): A distributed user-level filesystem, HDFS (Hadoop Distributed File ...
arXiv:1307.1517v1
fatcat:deecdzioxvdsdkq52nfyawrgiq
Study of the Big Data Collection Scheme Based Apache Flume for Log Collection
2018
Journal of clean energy technologies
In this paper, we have studied the big data collection technology based on Apache Flume for bulk log collection. ...
log processing is designed to be matched with one web server and one Flume agent, and the Flume agents connected to the web server are connected to the Flume agent that plays the role of storing in the Hadoop ...
HDFS was originally built as infrastructure for the Apache Nutch web search engine project. HDFS is now an Apache Hadoop subproject. Fig. 7 shows the HDFS Architecture [3]. B. ...
doi:10.7763/ijcte.2018.v10.1206
fatcat:75t6ft5ghvh4fog3tuyyavdt3e
Big Data Analytics
[article]
2017
Zenodo
In December 2012 Apache released Hadoop 1.0.0; more information and an installation guide can be found in the Apache Hadoop documentation. ...
Apache Spark AMPLab at UC Berkeley. Spark fits into the Hadoop open-source 2. ...
doi:10.5281/zenodo.573349
fatcat:qg7licyavbgbtph6jadfm6bncu
A Maturity Analysis of Big Data Technologies
2017
Informatică economică
In recent years Big Data technologies have been developed at a faster pace due to increased demand from applications that generate and process vast amounts of data. ...
and services that complement the open-source Apache Hadoop platform; DataStax provides a product which fully integrates Apache Hadoop with Apache Cassandra and Apache Solr in its DataStax Enterprise ...
processing: Big Data data visualization and advanced analytics; Real time stream processing; Machine learning at scale; Enterprise integration. DataTorrent is certified on Apache Hadoop, and ...
doi:10.12948/issn14531305/21.1.2017.05
fatcat:ibzrpjxlznedphvsft3uxaesky
A Comparative Analysis of Hadoop and Spark Frameworks using Word Count Algorithm
2021
International Journal of Advanced Computer Science and Applications
This paper provides a comprehensive comparison between Apache Hadoop & Apache Spark in terms of efficiency, scalability, security, cost-effectiveness, and other parameters. ...
The most efficient solutions to Big Data analysis in a distributed environment are Hadoop and Spark, both administered by Apache; both are open-source data management frameworks and they allow ...
The slope of the reducer function goes upward slightly as the size of the file increases, but the time slope of the mapper function goes upward at 75 degrees as the file size increases from 137MB to 202MB ...
doi:10.14569/ijacsa.2021.0120495
fatcat:t46gjqqcn5ak7diblq6tsuhqmy
Emerging trends and technologies in big data processing
2014
Concurrency and Computation
Lambdoop is still an ongoing project and has not been open sourced at the time of writing this paper. ...
This paper is the first of its kind to review and analyse current trends and technologies in relation to the characteristics, evolution, and processing of Big Data. b) Data storage. Kafka [18]: Apache ...
released a stable version of Hadoop. Following this, Facebook and Yahoo started working on abstraction layers over MapReduce. Yahoo! ...
doi:10.1002/cpe.3398
fatcat:qhyxvbwzereapnf3ac3i5eflgm
Developing a Real-Time Data Analytics Framework for Twitter Streaming Data
2017
2017 IEEE International Congress on Big Data (BigData Congress)
Currently there are different workflows offering realtime data analysis for Twitter, presenting general processing over streaming data. ...
The proposed framework includes data ingestion, stream processing, and data visualization components, with the Apache Kafka messaging system used to perform the data ingestion task. ...
Hadoop, Spark, and Storm are implemented in JVM based
• Spark processes in-memory data whereas Hadoop MapReduce goes back to the disk after a map action or a reduce action; thereby, Hadoop MapReduce ...
doi:10.1109/bigdatacongress.2017.49
dblp:conf/bigdata/YadranjiaghdamY17
fatcat:5koxm6vjn5hajf6yhujwypwbjm
Chapter 3 Big Data Outlook, Tools, and Architectures
[chapter]
2020
Lecture Notes in Computer Science
Furthermore, the chapter covers prominent technologies, tools, and architectures developed to handle this large data at scale. ...
At the end, the chapter reviews knowledge graphs that address the challenges (e.g. heterogeneity, interoperability, variety) of big data through their specialised representation. ...
The distributed coordination manages the sharing of locks, shared variables, and realtime configurations at runtime among the nodes. ...
doi:10.1007/978-3-030-53199-7_3
fatcat:vy7ac2ccszcenmxl7kxdiapfvq
Business Process Analytics and Big Data Systems: A Roadmap to Bridge the Gap
2018
IEEE Access
Apache Flink is another distributed in-memory data processing engine, which represents a flexible alternative to Hadoop that supports both batch and realtime processing [24]. ...
A Process Footprint goes beyond process-enactment generated data. ...
doi:10.1109/access.2018.2881759
fatcat:2fcc4au7bfgklf3zemq7xfxcii
Fault Tolerance in MapReduce: A Survey
[chapter]
2016
Computer Communications and Networks
Given that failures are common at large scale, these frameworks exhibit some fault tolerance and dependability techniques as built-in features. ...
Data-intensive computing systems, such as Hadoop MapReduce, have as their main goal the processing of an enormous amount of data in a short time, by moving the computation to where the data resides. ...
Apache Hadoop Reliability: Since its appearance in 2006, Apache Hadoop has undergone many releases [27]. ...
doi:10.1007/978-3-319-44881-7_11
dblp:series/ccn/MemishiIPA16
fatcat:m5x33gpzunhzzgrdslagndiwzy
MapReduce: Simplified Data Analysis of Big Data
2015
Procedia Computer Science
The other one having such features is Hadoop which is the most popular open source MapReduce software adopted by many huge IT companies, such as Yahoo, Facebook, eBay and so on. ...
In this paper, we focus specifically on Hadoop and its implementation of MapReduce for analytical processing. ...
Fig. 1: Steps in MapReduce to process the database
Fig. 2: MapReduce with combiners, partitioners
Fig. 3: Primary contribution of Hadoop. Apache Hadoop consists of several components. ...
doi:10.1016/j.procs.2015.07.392
fatcat:whtpro3grzbpphvfzbptlun744