

Shivaram Venkataraman, Ion Stoica, Matei Zaharia, Zongheng Yang, Davies Liu, Eric Liang, Hossein Falaki, Xiangrui Meng, Reynold Xin, Ali Ghodsi, Michael Franklin
2016 Proceedings of the 2016 International Conference on Management of Data - SIGMOD '16  
We present SparkR, an R package that provides a frontend to Apache Spark and uses Spark's distributed computation engine to enable large scale data analysis from the R shell.  ...  R is a popular statistical programming language with a number of extensions that support data processing and machine learning tasks.  ...  In this paper, we look at how we can scale R programs while making it easy to use and deploy across a number of workloads.  ... 
doi:10.1145/2882903.2903740 dblp:conf/sigmod/VenkataramanYLL16 fatcat:inpgt6bmmne43bgtkzzhjv7vxu

Optimizing R with SparkR on a commodity cluster for biomedical research

Martin Sedlmayr, Tobias Würfl, Christian Maier, Lothar Häberle, Peter Fasching, Hans-Ulrich Prokosch, Jan Christoph
2016 Computer Methods and Programs in Biomedicine  
SparkR also scales better with the number of nodes in the cluster than MPI due to optimized data communication. Conclusion: R is a popular environment for clinical data analysis.  ...  The new SparkR solution offers elastic resources and allows supporting big data analysis using R even on nondedicated resources with minimal change to the original code.  ...  Acknowledgments The research has been supported by the Smart Data Program of the German Federal Ministry for Economic Affairs and Energy (1MT14001B).  ... 
doi:10.1016/j.cmpb.2016.10.006 pmid:28110735 fatcat:wc5fxrrczjaand642h346auwvy

Development of Multiple Big Data Analytics Platforms with Rapid Response

Bao Rong Chang, Yun-Da Lee, Po-Hao Liao
2017 Scientific Programming  
(BI) to carry out rapid data retrieval and analytics with R programming.  ...  In addition, users would simply give R commands rather than run Java or Scala program to perform the data retrieval and analytics in the proposed platforms.  ...  SparkR Based on Spark. SparkR is an R suite developed by AMPLab that provides Spark with a Resilient Distributed Dataset (RDD) [27] API that allows R to carry out distributed computing using Spark.  ... 
doi:10.1155/2017/6972461 fatcat:5zu5cwpbqfce3dggkr54rqemfu

Application of Big Data Analysis with Decision Tree for Road Accident

Addi Ait-Mlouk, Fatima Gharnati, Tarik Agouti
2017 Indian Journal of Science and Technology  
To deal with this challenge, Apache Spark stands as a powerful large-scale distributed computing platform that can be used successfully for machine learning against very large databases.  ...  This work employed large-scale machine learning techniques, especially Decision Tree with the Apache Spark framework for big data analysis, to build a model that can predict the factors that lead to road accidents  ...  Decision rules extraction: In this step, SparkR [33] is used as an R package that provides a light-weight front end to use Apache Spark from R [34].  ... 
doi:10.17485/ijst/2017/v10i29/117325 fatcat:ygddvftm2fgv5oe65dmnhwrxke

A Scalable Data Integration and Analysis Architecture for Sensor Data of Pediatric Asthma

Dimitris Stripelis, Jose Luis Ambite, Yao-Yi Chiang, Sandrah P. Eckel, Rima Habre
2017 2017 IEEE 33rd International Conference on Data Engineering (ICDE)  
A main contribution of this work is extending the Spark framework with a mediation layer, based on logical schema mappings and query rewriting, to facilitate data analysis over a consistent harmonized  ...  Our architecture is based on the Apache Kafka, Spark and Hadoop frameworks and PostgreSQL DBMS.  ...  Our system builds upon Apache Kafka and Apache Spark, which are used to integrate both sensor and traditional data sources, and to provide analytics at scale.  ... 
doi:10.1109/icde.2017.198 pmid:29731601 pmcid:PMC5935488 dblp:conf/icde/StripelisACEH17 fatcat:3i5xtfrrlrdfrbzz6ndpdoct7i

Processing large-scale data with Apache Spark
Apache Spark를 활용한 대용량 데이터의 처리

Seyoon Ko, Joong-Ho Won
2016 Korean Journal of Applied Statistics  
We also review the machine learning package MLlib, and the R language interface SparkR.  ...  In this work, we introduce the concept and programming model of Spark as well as show some implementations of simple statistical computing applications.  ...  Figure 3.1. A diagram of the distributed execution model of Spark. Figure 3.2. Diagram for the word count program with a two-line sample text, "to be or"//"not to be".  ... 
doi:10.5351/kjas.2016.29.6.1077 fatcat:ljrmw53inje5vjife5l5mmsr44
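
The word count program mentioned in the Figure 3.2 caption can be sketched in plain Python. This is only an illustration of the split, pair, and sum-by-key stages on the two-line sample text; it is not the paper's code and does not use Spark, and the function name `word_count` is a hypothetical choice made here.

```python
from collections import defaultdict


def word_count(lines):
    """Count words across lines using explicit split / pair / sum-by-key stages."""
    # Stage 1 (flatMap-like): split every line into words and flatten.
    words = [word for line in lines for word in line.split()]
    # Stage 2 (map-like): emit a (word, 1) pair for each occurrence.
    pairs = [(word, 1) for word in words]
    # Stage 3 (reduce-by-key-like): sum the counts for each distinct word.
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)


# Two-line sample text taken from the figure caption.
result = word_count(["to be or", "not to be"])
print(result)  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

In a distributed engine the three stages would run over partitions of the input on different workers; here they run sequentially on one machine to keep the data flow visible.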

Visualization and statistical modeling of financial big data: double-log modeling with skew-symmetric error distributions

Masayuki Jimichi, Daisuke Miyamoto, Chika Saka, Shuichi Nagata
2018 Japanese Journal of Statistics and Data Science  
We present exploratory data analysis carried out in the R programming language.  ...  This result is obtained by comparing the Akaike information criteria of several double-log models with independent and identically distributed random error terms with skew-symmetric distributions and by  ...  This work is partially supported by a Grant-in-Aid for Scientific Research (KAKENHI: No. 16K04022) and the Joint Usage/Research Center for Interdisciplinary Large-scale Information Infrastructures (JHPCN  ... 
doi:10.1007/s42081-018-0019-1 fatcat:xin5pg7nmvgdfddmsfcbk6mgnq

Enabling Signal Processing over Data Streams

Milos Nikolic, Badrish Chandramouli, Jonathan Goldstein
2017 Proceedings of the 2017 ACM International Conference on Management of Data - SIGMOD '17  
To support such increasingly important scenarios, many data management systems integrate with numerical frameworks like R.  ...  Such solutions, however, incur significant performance penalties as relational data processing engines and numerical tools operate on fundamentally different data models with expensive intercommunication  ...  Spark from R; SciDB with R integration 14.12 (SCIDB-R).  ... 
doi:10.1145/3035918.3035935 dblp:conf/sigmod/NikolicCG17 fatcat:m7wwhctohfb2fi5pze6ubibnn4


Edward Ma, Vishrut Gupta, Meichun Hsu, Indrajit Roy
2016 Proceedings of the VLDB Endowment  
We have integrated ddR with many backends, such as R's single-node parallel framework, multi-node SNOW framework, Spark, and HPE Distributed R, with few or no modifications to any of these systems.  ...  As a result, data scientists have to learn to use different interfaces such as RHadoop, SparkR, Revolution R's ScaleR, and HPE's Distributed R.  ...  For example, SparkR, which is an R interface for the Apache Spark engine, exposes dozens of Spark's functions that can be non-intuitive to data scientists, and a program written in SparkR's API will not  ... 
doi:10.14778/3007263.3007268 fatcat:yueaeac55jfq7geg2n3vinnssi

SparkBench – A Spark Performance Testing Suite [chapter]

Dakshi Agrawal, Ali Butt, Kshitij Doshi, Josep-L. Larriba-Pey, Min Li, Frederick R Reiss, Francois Raab, Berni Schiefer, Toyotaro Suzumura, Yinglong Xia
2016 Lecture Notes in Computer Science  
Spark has emerged as an easy to use, scalable, robust and fast system for analytics with a rapidly growing and vibrant community of users and contributors.  ...  This proposal describes several desirable properties flowing from the larger scale, greater and evolving variety, and nuanced requirements of different applications of Spark.  ...  Acknowledgements The authors would like to acknowledge all those who contributed with suggestions, ideas and provided valuable feedback during earlier drafts of this document.  ... 
doi:10.1007/978-3-319-31409-9_3 fatcat:koanet7fdfdfnegkhujk3pj27q

Supporting distributed, interactive Jupyter and RStudio in a scheduled HPC environment with Spark using Open OnDemand

OH-TECH Consortium
2018 Figshare  
RStudio with Spark: There are two competing R packages for connecting to a running Spark cluster: sparkR [18] and sparklyr [16].  ...  Introduction: Researchers want to apply large-scale computation to new disciplines and with new tools.  ... 
doi:10.6084/m9.figshare.6887693 fatcat:hep2a4ryu5dh3csdbx5upqp6te

A Survey on Spark Ecosystem for Big Data Processing [article]

Shanjiang Tang, Bingsheng He, Ce Yu, Yusen Li, Kun Li
2018 arXiv   pre-print
Finally, we discuss the open issues and challenges for large-scale in-memory data processing with Spark.  ...  Spark adopts a flexible Resilient Distributed Dataset (RDD) programming model with a set of provided transformation and action operators whose operating functions can be customized by users according to  ...  SparkR [142], [53] is a light-weight frontend system that incorporates R into Spark and enables R programmers to perform large-scale data analysis from the R shell.  ... 
arXiv:1811.08834v1 fatcat:6fxvg6me7rayzm4suoabyg7fii
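
The distinction between transformation and action operators with user-supplied functions can be illustrated with a toy, single-machine sketch. The class `MiniRDD` and its methods are hypothetical names invented for this example; they imitate the idea only and are not Spark's API or the survey's code. Transformations record the user's function without running it, and actions trigger the evaluation.

```python
from functools import reduce
from operator import add


class MiniRDD:
    """Toy stand-in for an RDD: lazy transformations, eager actions (not Spark)."""

    def __init__(self, source, ops=()):
        self._source = list(source)
        self._ops = tuple(ops)

    # Transformations: return a new MiniRDD with the operation recorded.
    def map(self, fn):
        return MiniRDD(self._source, self._ops + (("map", fn),))

    def filter(self, pred):
        return MiniRDD(self._source, self._ops + (("filter", pred),))

    def _run(self):
        # Chain the recorded operations as lazy iterators over the source data.
        data = iter(self._source)
        for kind, fn in self._ops:
            data = map(fn, data) if kind == "map" else filter(fn, data)
        return data

    # Actions: evaluate the recorded pipeline and return a concrete result.
    def collect(self):
        return list(self._run())

    def reduce(self, fn):
        return reduce(fn, self._run())


# User-defined functions customize what each operator does.
squares_of_evens = MiniRDD(range(1, 6)).filter(lambda x: x % 2 == 0).map(lambda x: x * x)
print(squares_of_evens.collect())    # [4, 16]
print(squares_of_evens.reduce(add))  # 20
```

Nothing is computed when `filter` and `map` are called; the work happens only inside `collect` or `reduce`, which mirrors the lazy-transformation, eager-action split described in the snippet.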

A Review: Predictive Analytics with Big Data

Mr. Rizwanahmed B. Mujawar, Dr. Dinesh B. Kulkarni
2017 IJARCCE  
[9] proposed an R front-end to Apache Spark, referred to as SparkR, that allows users to run large-scale data analysis using Spark's distributed computation engine.  ...  The authors present SparkR, an R package that provides a front-end to Apache Spark and uses Spark's distributed computation engine to enable large-scale data analysis from the R shell. Ping Sun et al.  ... 
doi:10.17148/ijarcce.2017.63124 fatcat:e2oyzsp2r5g6rj6r7k7trdgc5u

Database Integrated Analytics Using R: Initial Experiences with SQL-Server + R

Josep Ll. Berral, Nicolas Poggi
2016 2016 IEEE 16th International Conference on Data Mining Workshops (ICDMW)  
Here we show a first taste of such technology by testing the portability of our ALOJA-ML analytics framework, coded in R, to Microsoft SQL-Server 2016, one of the SQL+R solutions released recently.  ...  Recently, database service providers have decided to integrate "R-as-a-Service" in their DB solutions.  ...  set-up, using the traditional resources available on R it is possible to scale R procedures.  ... 
doi:10.1109/icdmw.2016.0009 dblp:conf/icdm/BerralP16 fatcat:izibngwkhnf2xanzxcrcs3kbwi

Majority Rule Approach to Deep Learning for Large Benchmark Data and Real Credit Card Transaction Data

Ayahiko Niimi
2018 Journal of Internet Technology and Secured Transaction  
Additionally, we validated the proposed methods using a large-scale transaction dataset.  ...  Herein, we validated our proposed methods by comparing benchmark experiments with other machine learning approaches.  ...  SparkR is an R package that provides a lightweight R-based frontend for Apache Spark [19] .  ... 
doi:10.20533/jitst.2046.3723.2018.0067 fatcat:7yst7kp5xjfm3o3sui4psrefui