
Large scale distributed neural network training through online distillation [article]

Rohan Anil, Gabriel Pereyra, Alexandre Passos, Robert Ormandi, George E. Dahl, Geoffrey E. Hinton
2020 arXiv   pre-print
Two neural networks trained on disjoint subsets of the data can share knowledge by encouraging each model to agree with the predictions the other model would have made.  ...  Our first claim is that online distillation enables us to use extra parallelism to fit very large datasets about twice as fast.  ...  Another important comparison is to an ensemble of two neural networks, each trained with 128 GPUs and synchronous SGD.  ... 
arXiv:1804.03235v2 fatcat:ftkf2wofpjhqxaeilzksiwpfsq
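
The snippet describes codistillation: each network is additionally trained to agree with the predictions its peer would have made. A minimal sketch of such a loss in plain NumPy, assuming a simple mixing weight `alpha` between the label loss and the agreement term (the function name and weighting scheme are illustrative, not the paper's implementation):

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def codistillation_loss(logits, labels, peer_probs, alpha=0.5):
    """Cross-entropy to the true labels plus a distillation term that
    encourages agreement with a peer model's predicted distribution."""
    probs = softmax(logits)
    n = len(labels)
    ce = -np.log(probs[np.arange(n), labels] + 1e-12).mean()
    # KL(peer || model): penalizes disagreement with the peer's predictions
    kl = (peer_probs * (np.log(peer_probs + 1e-12)
                        - np.log(probs + 1e-12))).sum(axis=1).mean()
    return (1 - alpha) * ce + alpha * kl
```

When the peer agrees exactly, the KL term vanishes and only the label loss remains; disagreement raises the loss, pushing the two models toward consensus.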

Fourier-Based Parametrization of Convolutional Neural Networks for Robust Time Series Forecasting [chapter]

Sascha Krstanovic, Heiko Paulheim
2019 Lecture Notes in Computer Science  
To that end, we use Convolutional Neural Networks (CNNs) for time series forecasting and determine a part of the network layout based on the time series' Fourier coefficients.  ...  Instead of optimizing hyperparameters by training multiple models, we propose a method to estimate optimal hyperparameters directly from the characteristics of the time series at hand.  ...  The neural networks were trained on an NVIDIA Tesla K80 GPU, and an Intel i7-6820HQ CPU was used for all other models, as these do not benefit from GPU usage.  ... 
doi:10.1007/978-3-030-33778-0_39 fatcat:mepeyuimnbdbtdec2vyjzjlbpy
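
As a rough illustration of deriving a layout choice from Fourier coefficients, the sketch below picks a convolution kernel width from the dominant period of the series; the specific mapping from period to kernel size is an assumption for illustration, not the paper's exact rule:

```python
import numpy as np

def dominant_period(series, sampling_rate=1.0):
    """Return the period (in samples) of the strongest non-DC Fourier
    component -- a plausible basis for sizing a CNN's convolution kernel."""
    coeffs = np.fft.rfft(series - np.mean(series))
    freqs = np.fft.rfftfreq(len(series), d=1.0 / sampling_rate)
    k = np.argmax(np.abs(coeffs[1:])) + 1   # skip the DC bin
    return 1.0 / freqs[k]

# e.g. choose the kernel width to cover one dominant cycle
t = np.arange(256)
series = np.sin(2 * np.pi * t / 32)                # period of 32 samples
kernel_size = int(round(dominant_period(series)))  # -> 32
```

The appeal of this style of heuristic is that the hyperparameter comes from one FFT over the data rather than from training multiple candidate models.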

Semi-Supervised Learning for Multi-Task Scene Understanding by Neural Graph Consensus [article]

Marius Leordeanu, Mihai Pirvu, Dragos Costea, Alina Marcu, Emil Slusanschi, Rahul Sukthankar
2020 arXiv   pre-print
We address the challenging problem of semi-supervised learning in the context of multiple visual interpretations of the world by finding consensus in a graph of neural networks.  ...  We give theoretical justifications of the proposed idea and validate it on a large dataset.  ...  We want to express our sincere gratitude towards Aurelian Marcu and The Center for Advanced Laser Technologies (CETAL) for their generosity and providing us access to GPU computational resources.  ... 
arXiv:2010.01086v2 fatcat:ng27p5utdnabplogh5qhokdlh4

Learning Neural Network Subspaces [article]

Mitchell Wortsman, Maxwell Horton, Carlos Guestrin, Ali Farhadi, Mohammad Rastegari
2021 arXiv   pre-print
With a similar computational cost as training one model, we learn lines, curves, and simplexes of high-accuracy neural networks.  ...  These neural network subspaces contain diverse solutions that can be ensembled, approaching the ensemble performance of independently trained networks without the training cost.  ...  We acknowledge Ludwig Schmidt for correcting Definitions 1 and 2 which previously measured average accuracy instead of worst case. MW acknowledges Apple for providing internship support.  ... 
arXiv:2102.10472v3 fatcat:bl3y3xhzrzgx5a7s5n4eqmwsfa
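
The idea of a line of high-accuracy networks can be sketched as follows; `predict_fn` and the simple linear interpolation of weight vectors are illustrative stand-ins for the paper's learned subspaces, in which the endpoints are trained jointly so that every interpolated point is a good network:

```python
import numpy as np

def sample_on_line(w1, w2, alpha):
    """A point on the line segment between two weight vectors; the subspace
    idea is to *train* the endpoints so every such point performs well."""
    return (1 - alpha) * w1 + alpha * w2

def ensemble_predict(predict_fn, w1, w2, x, n_samples=5):
    """Average predictions from several points along the learned line,
    approximating an ensemble at roughly the cost of one model."""
    alphas = np.linspace(0.0, 1.0, n_samples)
    preds = [predict_fn(sample_on_line(w1, w2, a), x) for a in alphas]
    return np.mean(preds, axis=0)
```

For a nonlinear network the sampled points give genuinely diverse predictions, which is what makes ensembling them attractive.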

Randomized Prior Functions for Deep Reinforcement Learning [article]

Ian Osband, John Aslanides, Albin Cassirer
2018 arXiv   pre-print
There is a growing literature on uncertainty estimation for deep learning from fixed datasets, but many of the most popular approaches are poorly-suited to sequential decision problems.  ...  We highlight why this can be a crucial shortcoming and propose a simple remedy through addition of a randomized untrainable 'prior' network to each ensemble member.  ...  This paper can be thought of as a specific type of 'deep exploration via randomized value functions', whose line of research has been crucially driven by the contributions of (and conversations with) Benjamin  ... 
arXiv:1806.03335v2 fatcat:zkly3q224zad5cpqk7esoazr3e
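
A minimal sketch of the randomized-prior construction on a linear model: each ensemble member predicts trainable(x) + β·prior(x), where the randomly initialized prior is held fixed and only the trainable part receives gradient updates. The class and parameter names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

class PriorNetwork:
    """Ensemble member whose output is trainable(x) + beta * prior(x),
    with the randomly initialized prior held fixed during training."""
    def __init__(self, dim, beta=1.0):
        self.w_train = np.zeros(dim)          # trainable part, learned from data
        self.w_prior = rng.normal(size=dim)   # fixed random prior, never updated
        self.beta = beta

    def predict(self, x):
        return x @ self.w_train + self.beta * (x @ self.w_prior)

    def sgd_step(self, x, y, lr=0.1):
        # Gradient flows only through the trainable weights.
        err = self.predict(x) - y
        self.w_train -= lr * err * x
```

Far from the data the fixed priors dominate, so members of the ensemble disagree there, which is the source of the uncertainty signal used for exploration.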

Comparisons among different stochastic selection of activation layers for convolutional neural networks for healthcare [article]

Loris Nanni, Alessandra Lumini, Stefano Ghidoni, Gianluca Maguolo
2020 arXiv   pre-print
As a baseline, we used an ensemble of neural networks that only use ReLU activations. We tested our networks on several small and medium-sized biomedical image datasets.  ...  In this paper we classify biomedical images using ensembles of neural networks.  ...  Conflicts of Interest: The authors declare no conflict of interest.  ... 
arXiv:2011.11834v1 fatcat:zslgtqwllzgzbeuqpurmnrhe4e

An Approach to Performance Prediction for Parallel Applications [chapter]

Engin Ipek, Bronis R. de Supinski, Martin Schulz, Sally A. McKee
2005 Lecture Notes in Computer Science  
In contrast, we employ multilayer neural networks trained on input data from executions on the target platform.  ...  Our model predicts performance on two large-scale parallel platforms within 5%-7% error across a large, multidimensional parameter space.  ...  We apply bagging to train an ensemble of models from the dataset, averaging predictions from the ensemble to reduce model variance.  ... 
doi:10.1007/11549468_24 fatcat:c7idlttjhvdjbkx6mo2ezw5mpq
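
Bagging with prediction averaging, as described in the snippet, can be sketched generically; `fit_linear` below is a stand-in for the paper's multilayer neural networks:

```python
import numpy as np

rng = np.random.default_rng(42)

def fit_linear(X, y):
    """Least-squares base learner (stand-in for a neural network)."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return lambda Z: Z @ w

def bagged_models(X, y, fit_fn, n_models=10):
    """Train each model on a bootstrap resample of the data (bagging)."""
    models = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))   # sample with replacement
        models.append(fit_fn(X[idx], y[idx]))
    return models

def bagged_predict(models, X):
    """Average the ensemble's predictions to reduce model variance."""
    return np.mean([m(X) for m in models], axis=0)
```

Averaging over bootstrap-trained models leaves bias roughly unchanged but shrinks the variance term of the prediction error, which is exactly the property the snippet invokes.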

Behavioural Intrusion Detection in Water Distribution Systems Using Neural Networks

Tsotsope Daniel Ramotsoela, Gerhard Petrus Hancke, Adnan M. Abu-Mahfouz
2020 IEEE Access  
Figure 10 shows the results of the implemented neural network architectures compared to the machine learning algorithms proposed in BATADAL implemented on the test dataset.  ...  In this paper, a number of neural network architectures were trained on the normal BATADAL dataset.  ... 
doi:10.1109/access.2020.3032251 fatcat:kv6rzbcge5fn7i3f7fnqzzsc7q

A stacked deep learning approach to cyber-attacks detection in industrial systems: application to power system and gas pipeline systems

Wu Wang, Fouzi Harrou, Benamar Bouyeddou, Sidi-Mohammed Senouci, Ying Sun
2021 Cluster Computing  
The results of this investigation show the satisfactory detection performance of the proposed stacked deep learning approach.  ...  Specifically, we investigate the feasibility of a deep learning approach for intrusion detection in SCADA systems.  ...  The proposed stacked deep learning-driven method: To discriminate cyber attacks from normal operations, we propose a stacked deep learning model that ensembles the results of five feed-forward neural networks  ... 
doi:10.1007/s10586-021-03426-w pmid:34629940 pmcid:PMC8490144 fatcat:wgvodkfndbc2xbb5dr2y3wtgam
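
Stacking an ensemble of base learners behind a meta-learner can be sketched as below; the least-squares meta-learner and the toy base models are stand-ins for the paper's five feed-forward networks:

```python
import numpy as np

def stack_features(base_models, X):
    """Column-stack each base model's output; this stacked vector is the
    meta-learner's input."""
    return np.column_stack([m(X) for m in base_models])

def fit_meta(base_models, X, y):
    """Fit a least-squares meta-learner on the base models' predictions
    and return the combined stacked predictor."""
    Z = stack_features(base_models, X)
    w, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return lambda Xnew: stack_features(base_models, Xnew) @ w
```

The meta-learner learns how much to trust each base model, which is what lets a stacked ensemble outperform a simple average of its members.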

Neural ensemble decoding for topological quantum error-correcting codes [article]

Milap Sheth, Sara Zafar Jafarzadeh, Vlad Gheorghiu
2019 arXiv   pre-print
We apply our framework to an ensemble of Minimum-Weight Perfect Matching (MWPM) and Hard-Decision Re-normalization Group (HDRG) decoders for the surface code in the depolarizing noise model.  ...  We use machine learning techniques to assign a given error syndrome to the decoder which is likely to decode it correctly.  ...  ACKNOWLEDGMENTS We thank Pooya Ronagh for useful discussions regarding methods for improving the training of the machine learning model.  ... 
arXiv:1905.02345v1 fatcat:zrsfz4k4wbhydodcfc4aslt3iy

The Curious Case of Convex Neural Networks [article]

Sarath Sivaprasad, Ankur Singh, Naresh Manwani, Vineet Gandhi
2021 arXiv   pre-print
We demonstrate the efficacy of the proposed idea using thorough experiments and ablation studies on standard image classification datasets with three different neural network architectures.  ...  In this paper, we investigate a constrained formulation of neural networks where the output is a convex function of the input.  ...  Even while training, IOC-NNs show no signs of fitting the noisy data and efficiently learn patterns from the non-noisy data.  ... 
arXiv:2006.05103v3 fatcat:4wei3xyytzcjpeu4yvuthpebhu
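
A scalar output that is convex in the input can be obtained by constraining the hidden-to-output weights to be non-negative and using a convex, non-decreasing activation such as ReLU: each term is then a non-negative scaling of a convex function, and their sum is convex. This is a generic input-convex construction for illustration, not the paper's IOC-NN code:

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(z):
    return np.maximum(z, 0.0)

class ConvexNet:
    """Two-layer input-convex network: the hidden-to-output weights are
    constrained non-negative, and ReLU is convex and non-decreasing, so
    the scalar output is a convex function of the input."""
    def __init__(self, d_in, d_hidden):
        self.W1 = rng.normal(size=(d_hidden, d_in))    # unconstrained
        self.b1 = rng.normal(size=d_hidden)
        self.w2 = np.abs(rng.normal(size=d_hidden))    # non-negative

    def __call__(self, x):
        return self.w2 @ relu(self.W1 @ x + self.b1)
```

Convexity can be checked numerically: for any inputs a and b, f((a+b)/2) ≤ (f(a)+f(b))/2 holds by construction, regardless of the random weights.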

A Survey on Distributed Machine Learning

Joost Verbraeken, Matthijs Wolting, Jonathan Katzy, Jeroen Kloppenburg, Tim Verbelen, Jan S. Rellermeyer
2020 ACM Computing Surveys  
Chen et al. [32] developed DianNao, a hardware accelerator for large-scale neural networks with a small area footprint.  ...  The experimental evaluation using the different layers of several large neural network structures [48, 70, 90, 132, 133] shows a performance speedup of three orders of magnitude and an energy reduction  ...  Because neural networks consist of a large number of nodes, the interpretability of a neural network's decision process is lower than that of, e.g., decision trees.  ... 
doi:10.1145/3377454 fatcat:apwpdtza4zc2tcn37hnxxrb74u

Learning for Robust Combinatorial Optimization: Algorithm and Application [article]

Zhihui Shao and Jianyi Yang and Cong Shen and Shaolei Ren
2021 arXiv   pre-print
Learning to optimize (L2O) has recently emerged as a promising approach to solving optimization problems by exploiting the strong prediction power of neural networks and offering lower runtime complexity  ...  While L2O has been applied to various problems, a crucial yet challenging class of problems -- robust combinatorial optimization in the form of minimax optimization -- has largely remained under-explored  ...  Then, with training samples and the loss function in Eqn (2), which includes an ensemble of neural networks, any standard learning approach can be used to efficiently produce a solution to the inner maximization  ... 
arXiv:2112.10377v1 fatcat:yvzecurgqzd4nc7gfuastvc6fm

Selectivity estimation for range predicates using lightweight models

Anshuman Dutt, Chi Wang, Azade Nazi, Srikanth Kandula, Vivek Narasayya, Surajit Chaudhuri
2019 Proceedings of the VLDB Endowment  
We explore application of neural networks and tree-based ensembles to the important problem of selectivity estimation of multi-dimensional range predicates.  ...  While such techniques have the benefit of fast estimation and small memory footprint, they often incur large selectivity estimation errors.  ...  Our study includes neural networks and tree-based ensembles.  ... 
doi:10.14778/3329772.3329780 fatcat:tfd3rj5zcfavpnxq2wxqu4akmq

Mapping neutron star data to the equation of state using the deep neural network [article]

Yuki Fujimoto, Kenji Fukushima, Koichi Murase
2019 arXiv   pre-print
Here we show results from a novel theoretical technique that utilizes a deep neural network with supervised learning.  ...  We input up-to-date observational data from neutron star X-ray radiation into the trained neural network and estimate a relation between the pressure and the mass density.  ...  For the neural network to learn the correlation between the variances (σ_Ri, σ_Mi) and how far the actual data is off from the genuine M-R curve, we prepare 100 ensembles of different variances for  ... 
arXiv:1903.03400v2 fatcat:ad7y7x7mozd2nkqwdth3sy4egy