Data science with R [article]

Roberta Turra, Giorgio Pedrazzi, Salvatore Cataudella, Alessandro Marani
2021 Zenodo  
The purpose of this course is to present researchers and scientists with R implementations of Machine Learning methods. The first part of the course will consist of introductory lectures on popular Machine Learning algorithms, including unsupervised methods (Clustering, Association Rules) and supervised ones (Decision Trees, Naive Bayes, Random Forests and Deep Neural Networks). Basic Machine Learning concepts such as training set, test set, validation set, overfitting, bagging and boosting will be introduced, as well as performance evaluation for supervised and unsupervised methods. The second part will consist of practical exercises such as reading data, using packages and building machine learning applications. Different options for parallel programming will be shown using specific R packages (parallel, h2o, ...). For Deep Learning applications, the Keras package will be presented. The examples will cover the analysis of large datasets and image datasets. Participants will use R on Cineca HPC facilities for practical assignments.

Skills: At the end of the course, the student will be expected to have acquired:
• the ability to perform basic operations on matrices and data frames
• the ability to manage packages
• the ability to navigate the RStudio interface
• a general knowledge of Machine and Deep Learning methods
• a general knowledge of the most popular packages for Machine and Deep Learning
• a basic knowledge of different parallel programming techniques
• the ability to build machine learning applications with large datasets and image datasets

Target audience: Students and researchers from different backgrounds who are looking for technologies and methods to analyze large amounts of data.

Pre-requisites: Participants must have a basic knowledge of statistics and must be familiar with basic Linux and the R language.
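The outline names training, validation and test sets and overfitting without showing how they fit together. The following is a minimal base-R sketch, not taken from the course material: the synthetic sine data, the 60/20/20 split and the use of polynomial degree as the measure of model complexity are all assumptions made for illustration. Models are fitted on the training set only, the validation set chooses the degree, and the test set is used once for the final error estimate.

```r
# Illustrative sketch only: synthetic data and split proportions are assumptions.
set.seed(42)                                   # reproducible data and split
n <- 300
x <- runif(n, -3, 3)
y <- sin(x) + rnorm(n, sd = 0.3)               # assumed synthetic regression data
df <- data.frame(x = x, y = y)

# Randomly assign roughly 60% / 20% / 20% of rows to train / validation / test
idx <- sample(c("train", "valid", "test"), n, replace = TRUE,
              prob = c(0.6, 0.2, 0.2))
train <- df[idx == "train", ]
valid <- df[idx == "valid", ]
test  <- df[idx == "test", ]

# Root mean squared error of a fitted model on a given data set
rmse <- function(model, data) {
  sqrt(mean((data$y - predict(model, newdata = data))^2))
}

# Fit polynomials of increasing degree using the training set only
degrees <- 1:12
fits <- lapply(degrees, function(d) lm(y ~ poly(x, d), data = train))
train_err <- sapply(fits, rmse, data = train)
valid_err <- sapply(fits, rmse, data = valid)

# The validation set selects the model complexity
best <- degrees[which.min(valid_err)]

# The test set is evaluated once, on the selected model only
test_err <- rmse(fits[[best]], test)
cat("chosen degree:", best, " test RMSE:", round(test_err, 3), "\n")
```

Training error can only decrease as the degree grows, because each polynomial model contains the previous one; validation error typically stops improving or rises again at higher degrees, which is the overfitting that a separate validation set is meant to detect.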
doi:10.5281/zenodo.7565805 fatcat:x5yrikhmbrdqzlxwp4hizcgoui