A Survey of Big Data Machine Learning Applications Optimization in Cloud Data Centers and Networks [article]

Sanaa Hamid Mohamed, Taisir E.H. El-Gorashi, Jaafar M.H. Elmirghani
2019 arXiv   pre-print
The MapReduce programming model and its widely used open-source platform, Hadoop, are enabling the development of a large number of cloud-based services and big data applications.  ...  MapReduce and Hadoop thus introduce innovative, efficient, and accelerated intensive computations and analytics.  ...  All data are provided in full in the results section of this paper.  ... 
arXiv:1910.00731v1 fatcat:kvi3br4iwzg3bi7fifpgyly7m4

Privacy-Preserving Scanning of Big Content for Sensitive Data Exposure with MapReduce

Fang Liu, Xiaokui Shu, Danfeng Yao, Ali R. Butt
2015 Proceedings of the 5th ACM Conference on Data and Application Security and Privacy - CODASPY '15  
We design new MapReduce algorithms for computing collection intersection for data leak detection. Our prototype implemented with the Hadoop system achieves 225 Mbps analysis throughput with 24 nodes.  ...  This transformation supports the secure outsourcing of the data leak detection to untrusted MapReduce and cloud providers.  ...  IMPLEMENTATION AND EVALUATION We implement our algorithms with Java in Hadoop, which is an open-source software system implementing MapReduce.  ... 
doi:10.1145/2699026.2699106 dblp:conf/codaspy/LiuSYB15 fatcat:ntcepdzfirdttlpbmm6tqtkbpa

A survey of open source tools for machine learning with big data in the Hadoop ecosystem

Sara Landset, Taghi M. Khoshgoftaar, Aaron N. Richter, Tawfiq Hasanin
2015 Journal of Big Data  
We discuss the advantages and disadvantages of three different processing paradigms along with a comparison of engines that implement them, including MapReduce, Spark, Flink, Storm, and H2O.  ...  Abstract With an ever-increasing amount of options, the task of selecting machine learning tools for big data can be difficult.  ...  While no single entity is working with data at this magnitude, many industries are still generating data too large to be processed efficiently using traditional techniques.  ... 
doi:10.1186/s40537-015-0032-1 fatcat:zgcsiokrynfhzbmaudqf7rcll4

Parallax - A New Operating System Prototype Demonstrating Service Scaling and Service Self-Repair in Multi-core Servers

Rao Mikkilineni, Ian Seyler
2011 2011 IEEE 20th International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises  
The operating system is implemented in assembler language for efficiency and supports C/C++ programming interfaces for high-level programming.  ...  A parallel signaling network overlay over a network of von Neumann stored program control (SPC) computing nodes is utilized to implement dynamic fault, configuration, accounting, performance, and security  ...  Eventually, PCIExpress, and TCP/IP will be added along with Shared Memory. Under Parallax, each DIME is addressable as a separate entity via the signaling and data channels.  ... 
doi:10.1109/wetice.2011.19 dblp:conf/wetice/MikkilineniS11 fatcat:swo2fpsrkfa3xna2vxll4av7x4

The MADlib Analytics Library or MAD Skills, the SQL [article]

Joe Hellerstein, Christopher Ré, Florian Schoppmann, Daisy Zhe Wang, Eugene Fratkin, Aleksander Gorajek, Kee Siong Ng, Caleb Welton, Xixuan Feng, Kun Li, Arun Kumar
2012 arXiv   pre-print
It provides an evolving suite of SQL-based algorithms for machine learning, data mining and statistics that run at scale within a database engine, with no need for data import/export to other tools.  ...  MADlib is a free, open source library of in-database analytic methods.  ...  a single pass over the data.  ... 
arXiv:1208.4165v1 fatcat:uouyhvyo3va2veamidxbeacou4

The MADlib analytics library

Joseph M. Hellerstein, Kun Li, Arun Kumar, Christopher Ré, Florian Schoppmann, Daisy Zhe Wang, Eugene Fratkin, Aleksander Gorajek, Kee Siong Ng, Caleb Welton, Xixuan Feng
2012 Proceedings of the VLDB Endowment  
It provides an evolving suite of SQL-based algorithms for machine learning, data mining and statistics that run at scale within a database engine, with no need for data import/export to other tools.  ...  MADlib is a free, open source library of in-database analytic methods.  ...  a single pass over the data.  ... 
doi:10.14778/2367502.2367510 fatcat:bqz6ufkkpvf2jngbjbmw7jhnwu

A Survey on Big Data for Trajectory Analytics

Damião Ribeiro de Almeida, Cláudio de Souza Baptista, Fabio Gomes de Andrade, Amilcar Soares
2020 ISPRS International Journal of Geo-Information  
the efficiency and development of decision-making systems that deal with trajectory data.  ...  With the considerable growth in the volume of trajectory data, storing such data into Spatial Database Management Systems (SDBMS) has become challenging.  ...  The most recent data are stored in a Redis database and the Azure for historical data. ST-Hadoop [47] was the first open-source MapReduce framework with native spatial-temporal data support.  ... 
doi:10.3390/ijgi9020088 fatcat:bgpfxcx5jngd7cjcj6u4fl4jpi

Point-of-Interest Recommendation [chapter]

2017 Encyclopedia of GIS  
Epidemiologists are primarily concerned with public health data, which includes the design of studies, evaluation and interpretation of public health data, and the maintenance of data collection systems  ...  Efficient use of these interventions requires targeting sub-  ...  Homogeneous coordinates e of an entity are invariant with respect to multiplication by a scalar λ ≠ 0, thus that e and λe represent the same entity.  ... 
doi:10.1007/978-3-319-17885-1_100975 fatcat:myyebmb3hrhgnpqmobyyvm2xum

Moving Objects Analytics: Survey on Future Location & Trajectory Prediction Methods [article]

Harris Georgiou, Sophia Karagiorgou, Yannis Kontoulis, Nikos Pelekis, Petros Petrou, David Scarlatti, Yannis Theodoridis
2018 arXiv   pre-print
We provide an extensive review of over 50 works, also proposing a novel taxonomy of predictive algorithms over moving objects.  ...  This source of information constitutes a rich input for data analytics processes, either offline (e.g. cluster analysis, hot motion discovery) or online (e.g. short-term forecasting of forthcoming positions  ...  the fact that there are multiple data sources that monitor the same moving objects.  ... 
arXiv:1807.04639v1 fatcat:lvje57kod5eldaplkl53wbwgti

Parallel MCNN (pMCNN) with Application to Prototype Selection on Large and Streaming Data

V. Susheela Devi, Lakhpat Meena
2017 Journal of Artificial Intelligence and Soft Computing Research  
The results of these algorithms using MCNN and pMCNN have been compared with an existing algorithm for streaming data.  ...  We have proposed two incremental algorithms using MCNN to carry out prototype selection on large and streaming data.  ...  We present efficient algorithms using MapReduce for data preparation and indexing.  ... 
doi:10.1515/jaiscr-2017-0011 fatcat:yrzysokhrnfqrhw4oayevpjyn4

WorkStream-- A Design Pattern for Multicore-Enabled Finite Element Computations

Bruno Turcksin, Martin Kronbichler, Wolfgang Bangerth
2016 ACM Transactions on Mathematical Software  
We also describe in detail how this design pattern can be efficiently implemented, and provide numerical scalability results from its use in the DEAL.II software library.  ...  that need to be performed in modern finite element codes can be described as an operation that needs to be done independently on every cell, followed by a reduction of these local results into a global data  ...  Computations were performed for a two-dimensional domain and are averaged over multiple time steps.  ... 
doi:10.1145/2851488 fatcat:tdyed2as6rdoxdueyqfooqaicy

Applications and Techniques for Fast Machine Learning in Science [article]

Allison McCarn Deiana, Joshua Agar, Michaela Blott, Giuseppe Di Guglielmo, Javier Duarte, Philip Harris, Scott Hauck, Mia Liu, Mark S. Neubauer, Jennifer Ngadiuba, Seda Ogrenci-Memik, Maurizio Pierini (+74 others)
2021 arXiv   pre-print
We also present overlapping challenges across the multiple scientific domains where common solutions can be found.  ...  This is followed by a high-level overview and organization of technical advances, including an abundance of pointers to source material, which can enable these breakthroughs.  ...  SYCL programs perform best when paired with SYCL-aware C++ compilers such as the open-source data-parallel C++ (DPC++) compiler [475].  ... 
arXiv:2110.13041v1 fatcat:cvbo2hmfgfcuxi7abezypw2qrm

Image Steganography Using HBC and RDH Technique

Hemalatha M, Prasanna A, Dinesh Kumar R, Vinothkumar D
2014 International Journal of Computer Applications Technology and Research  
Reverse Data Hiding (RDH) is used to get the original image and it proceeds once when all the corners are unlocked with proper secret keys.  ...  With these methods the performance of the steganographic technique is improved in terms of PSNR value.  ...  The primitive process that uses our MapReduce concurrently or a parallelized data crawler, which acts as between with the Internet through multiple independent crawling techniques.  ... 
doi:10.7753/ijcatr0303.1001 fatcat:4i6tujs4oje2tnxf5c25eh26x4

Frame-Semantic Parsing

Dipanjan Das, Desai Chen, André F. T. Martins, Nathan Schneider, Noah A. Smith
2014 Computational Linguistics  
Given the limited size of available resources, accurately producing richly structured frame-semantic structures with high coverage will require data-driven techniques beyond simple supervised classification  ...  We use a probabilistic framework that cleanly integrates the FrameNet lexicon and limited available training data.  ...  Entities roles.  ... 
doi:10.1162/coli_a_00163 fatcat:f5dtxrjnsnfpddy56smoniuc5e

Masthead - Full issue pdf

2017 Chemistry International  
The Rise of Primary Research Data 4 Chemistry International July-September 2017 society.  ...  ] We hope you enjoy the reading, and look forward to meeting you at the Congress in São Paulo, Brazil, 9-14 July and the Special Symposium on 13 July 2017. [17] Leah , and took over  ...  of MapReduce and Hadoop to distribute a search among multiple servers and then analyze the huge amounts of information that result  ... 
doi:10.1515/ci-2017-0300 fatcat:ibp2y332gje47hx727nnwegtgi