Machine Learning and Cloud Computing: Survey of Distributed and SaaS Solutions [article]

Daniel Pop
2016 arXiv pre-print
Next on the list are libraries of distributed implementations for ML algorithms, and on-premise deployments of complex systems for data analytics and data mining.  ...  Applying popular machine learning algorithms to large amounts of data raised new challenges for the ML practitioners.  ...  of data mining and machine learning algorithms on multiple processor environments or on multiple threaded machines.  ... 
arXiv:1603.08767v1 fatcat:vuzeggijyfbb7bmlcqdnt3xdjy

Big Data with Cloud Computing: an insight on the computing environment, MapReduce, and programming frameworks

Alberto Fernández, Sara del Río, Victoria López, Abdullah Bawakid, María J. del Jesus, José M. Benítez, Francisco Herrera
2014 Wiley Interdisciplinary Reviews Data Mining and Knowledge Discovery  
In particular, we focus on those systems for large-scale analytics based on the MapReduce scheme and Hadoop, its open-source implementation.  ...  The term 'Big Data' has spread rapidly in the framework of Data Mining and Business Intelligence.  ...  Wegener et al. [129] achieved the integration of Weka [6] (an open-source Machine Learning and Data Mining software tool) and MapReduce.  ... 
doi:10.1002/widm.1134 fatcat:fictc36jlnhblb45n2ic4fmfb4
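To make the MapReduce scheme described in this entry concrete, here is a minimal single-process sketch of the map, shuffle, and reduce phases, applied to word counting. It does not use Hadoop's API; the function names (map_phase, shuffle_phase, reduce_phase, map_reduce) are hypothetical, and a real framework would run each phase in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple


def map_phase(line: str) -> List[Tuple[str, int]]:
    # Map: emit an intermediate (key, value) pair for every word in one input record.
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group all intermediate values by key, as the framework does
    # between the map and reduce phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: combine all values that share a key into one output record.
    return key, sum(values)


def map_reduce(records: Iterable[str]) -> Dict[str, int]:
    # Hypothetical driver that chains the three phases sequentially.
    intermediate = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle_phase(intermediate)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    docs = ["Big Data with Cloud Computing", "MapReduce and Hadoop", "Big Data and MapReduce"]
    print(map_reduce(docs))
```

Because each map call sees only one record and each reduce call sees only one key, both phases can be partitioned freely; that independence is what lets frameworks such as Hadoop scale the pattern out.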

Data Science and Distributed Intelligence: Recent Developments and Future Insights [chapter]

Alfredo Cuzzocrea, Mohamed Medhat Gaber
2013 Studies in Computational Intelligence  
Following this major trend, in this paper we provide a background of these new terms, followed by a discussion of recent developments in the data mining and data warehousing areas in the light of the aforementioned  ...  Big Data, Data Science and MapReduce are three keywords that have flooded our research papers and technical articles during the last two years.  ...  A more generic contribution has been developed by Ghoting et al. [17], who propose a generic toolkit for the development of Data Mining algorithms using MapReduce, termed NIMBLE.  ... 
doi:10.1007/978-3-642-32524-3_18 fatcat:sm3yofpcqbhs5lk2wa3k4xcmgy
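As an illustration of how a data mining algorithm can be expressed in MapReduce terms, the sketch below runs one k-means iteration as a map step (assign each point to its nearest centroid) and a reduce step (average the points per cluster). This is a generic, hypothetical example and not NIMBLE's API; all names here are assumptions.

```python
from collections import defaultdict
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest(point: Point, centroids: Sequence[Point]) -> int:
    # Index of the centroid with the smallest squared Euclidean distance.
    return min(
        range(len(centroids)),
        key=lambda i: sum((p - c) ** 2 for p, c in zip(point, centroids[i])),
    )


def kmeans_map(point: Point, centroids: Sequence[Point]) -> Tuple[int, Tuple[Point, int]]:
    # Map: key = cluster id, value = (point, 1), i.e. a partial sum and a count.
    return nearest(point, centroids), (point, 1)


def kmeans_reduce(values: List[Tuple[Point, int]]) -> Point:
    # Reduce: add the partial sums and counts, then divide to get the new centroid.
    dim = len(values[0][0])
    total = [0.0] * dim
    count = 0
    for vec, n in values:
        for d in range(dim):
            total[d] += vec[d]
        count += n
    return tuple(t / count for t in total)


def kmeans_iteration(points: Sequence[Point], centroids: Sequence[Point]) -> List[Point]:
    # Group map outputs by cluster id (the shuffle), then reduce each group.
    groups = defaultdict(list)
    for p in points:
        key, value = kmeans_map(p, centroids)
        groups[key].append(value)
    # A cluster that received no points keeps its previous centroid.
    return [
        kmeans_reduce(groups[i]) if groups[i] else centroids[i]
        for i in range(len(centroids))
    ]
```

Emitting (point, 1) rather than the bare point is a deliberate choice: sums and counts are associative, so a combiner could pre-aggregate them on each node before the shuffle. A full run would call kmeans_iteration repeatedly until the centroids stop changing.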

29th International Conference on Data Engineering [book of abstracts]

2013 2013 IEEE 29th International Conference on Data Engineering Workshops (ICDEW)  
two sets of open research questions: Better systems support for the already established use cases of Machine Learning and support for recent advances in Machine Learning research. Seminar 2: Big Data Integration  ...  Using a data-parallel, NUMA-aware many-core implementation with block summaries, inverted index data structures, and efficient aggregation algorithms, we achieve one to two orders of magnitude better performance  ...  These volunteers welcome participants, give directions, help in the sessions and on the registration desk, and generally make sure the conference is running smoothly.  ... 
doi:10.1109/icdew.2013.6547409 fatcat:wadzpuh3b5htli4mgb4jreoika
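The abstract snippet above mentions block summaries, inverted indexes, and efficient aggregation. The following is a generic, single-threaded sketch of two of these ideas: per-block min/max summaries that let a range-filtered sum skip blocks, and a simple inverted index from values to row ids. It is not the cited system's implementation; the data-parallel and NUMA-aware aspects are not modeled, and all names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Hashable, List, Sequence, Tuple


@dataclass
class Block:
    values: List[int]
    lo: int  # block summary: minimum value stored in the block
    hi: int  # block summary: maximum value stored in the block


def build_blocks(column: Sequence[int], block_size: int) -> List[Block]:
    # Split a column into fixed-size blocks and record min/max for each one.
    blocks = []
    for start in range(0, len(column), block_size):
        chunk = list(column[start:start + block_size])
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def range_sum(blocks: List[Block], low: int, high: int) -> Tuple[int, int]:
    """Sum the values in [low, high] and return (sum, blocks actually scanned)."""
    total = 0
    scanned = 0
    for b in blocks:
        if b.hi < low or b.lo > high:
            continue  # summary proves no value in this block can qualify
        scanned += 1
        if low <= b.lo and b.hi <= high:
            total += sum(b.values)  # whole block qualifies: no per-value test
        else:
            total += sum(v for v in b.values if low <= v <= high)
    return total, scanned


def build_inverted_index(column: Sequence[Hashable]) -> Dict[Hashable, List[int]]:
    # Map each distinct value to the list of row ids where it occurs.
    index: Dict[Hashable, List[int]] = defaultdict(list)
    for row_id, value in enumerate(column):
        index[value].append(row_id)
    return dict(index)
```

Block summaries pay off when data is roughly clustered by value, so that many blocks fall entirely outside (or entirely inside) the query range; on randomly ordered data most blocks overlap every range and little is skipped.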

Big data in healthcare: management, analysis and future prospects

Sabyasachi Dash, Sushil Kumar Shakyawar, Mohit Sharma, Sandeep Kaushik
2019 Journal of Big Data  
That is why data collection is an important part of every organization. We can also use this data for the prediction of current trends of certain parameters and future events.  ...  In the healthcare industry, various sources for big data include hospital records, medical records of patients, results of medical examinations, and devices that are a part of the Internet of Things.  ...  One of the most popular open-source distributed applications for this purpose is Hadoop [16]. Hadoop implements the MapReduce algorithm for processing and generating large datasets.  ... 
doi:10.1186/s40537-019-0217-0 fatcat:6yb7kk5ervaqhjt6lhe3utkivq

Strategic Plan for a Scientific Software Innovation Institute (S2I2) for High Energy Physics [article]

Peter Elmer, Mark Neubauer, Michael D. Sokoloff
2018 arXiv pre-print
A commensurate investment in R&D for the software for acquiring, managing, processing and analyzing HL-LHC data will be critical to maximize the return-on-investment in the upgraded accelerator and detectors  ...  The quest to understand the fundamental building blocks of nature and their interactions is one of the oldest and most ambitious of human scientific endeavors.  ...  TMVA: The Toolkit for Multivariate Data Analysis with ROOT is a standalone project that provides a ROOT-integrated machine learning environment for the processing and parallel evaluation of sophisticated  ... 
arXiv:1712.06592v2 fatcat:gm6v2suqj5dkphccrp4bcsymau

Research on improved K-nearest neighbor algorithm based on spark platform

Yushui Geng, Xianzhao Yan
2017 Proceedings of the 2017 2nd Joint International Information Technology, Mechanical and Electronic Engineering Conference (JIMEC 2017)   unpublished
The birth of Hadoop made people concerned with the study of MapReduce, and Spark, through the introduction of the RDD data model and a memory-based computing model, can be well adapted to the data  ...  The Traditional K-Nearest Neighbor Algorithm: Unlike other model-based and rule-based classification algorithms, the KNN is an instance-based supervised machine learning method.  ...  NIMBLE: a toolkit for the implementation of parallel data mining and machine learning algorithms on MapReduce [C]//Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and  ... 
doi:10.2991/jimec-17.2017.120 fatcat:jhflmmfl2be4hdctt7ujxigwii
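Since KNN is instance-based, classifying a query means finding its k closest training points. The sketch below shows one common way to distribute that search, in the spirit of partitioned data such as Spark RDD partitions: each partition returns its local k nearest candidates, and the candidates are merged into a global top k before a majority vote. It uses plain Python lists rather than Spark, is not the authors' improved algorithm, and its names are hypothetical.

```python
import heapq
import math
from collections import Counter
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]
Labeled = Tuple[Vector, str]


def local_top_k(partition: Sequence[Labeled], query: Vector, k: int) -> List[Tuple[float, str]]:
    # Each partition returns only its k closest (distance, label) candidates.
    return heapq.nsmallest(k, ((math.dist(x, query), label) for x, label in partition))


def knn_classify(partitions: Sequence[Sequence[Labeled]], query: Vector, k: int) -> str:
    # Merge the per-partition candidates and keep the global k nearest.
    candidates = [c for part in partitions for c in local_top_k(part, query, k)]
    nearest = heapq.nsmallest(k, candidates)
    # Majority vote among the labels of the k nearest neighbors.
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

The merge is exact: any point in the global top k must also be in its own partition's local top k, so only k candidates per partition need to leave each worker. Choosing an odd k for two-class problems avoids tied votes.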

Exploiting comparable corpora for domain-specific statistical machine translation

Magdalena Plamadă
The availability of parallel texts in various language combinations is the main bottleneck for Statistical Machine Translation (SMT) systems, as their performance is strongly influenced by the amount and  ...  BLEU and METEOR) and on the percentage of aligned content words. The features have different weights and they are determined automatically on a training set.  ...  Shuruq is a smart, friendly little girl, full of energy. She learned to walk on her hands and moves around nimbly.  ... 
doi:10.5167/uzh-153218 fatcat:a4ksizaearhg7dc2omctwczo3i