Memory Optimization for Deep Networks
[article]
2021
arXiv
pre-print
In this paper, we present MONeT, an automatic framework that minimizes both the memory footprint and computational overhead of deep networks. ...
This prevents researchers from exploring larger architectures, as training large networks requires more memory for storing intermediate outputs. ...
We present MONeT, an automatic framework to minimize the memory footprint of deep networks. ...
arXiv:2010.14501v3
fatcat:4ohehe2qsjaehonjjzvwvowwdm
Profile-guided memory optimization for deep neural networks
[article]
2018
arXiv
pre-print
Recent years have seen deep neural networks (DNNs) becoming wider and deeper to achieve better performance in many applications of AI. ...
We address this challenge by developing a novel profile-guided memory optimization to efficiently and quickly allocate memory blocks during the propagation in DNNs. ...
Conclusion: We propose a profile-guided memory optimization for DNNs. ...
arXiv:1804.10001v1
fatcat:uv75yc75crgh5p7dvtqfibzo5u
Memory Optimized Deep Dense Network for Image Super-resolution
2018
2018 Digital Image Computing: Techniques and Applications (DICTA)
To reduce the memory consumption during training, we propose a memory optimized deep dense network for image super-resolution. ...
Then we adopt shared memory allocations to store concatenated features and Batch Normalization intermediate feature maps. The memory optimized network consumes less memory than a normal dense network. ...
In order to use GPU memory efficiently when training the network, we implement a memory optimized deep dense network for image super-resolution. ...
doi:10.1109/dicta.2018.8615829
dblp:conf/dicta/ShenWZ18
fatcat:lqc5gfwrvrat7laodkrhh4ftcm
Optimizing Memory Efficiency for Deep Convolutional Neural Networks on GPUs
[article]
2016
arXiv
pre-print
Experiments show the universal effect of our proposed optimizations on both single layers and various networks, with improvements of up to 27.9x for a single layer and up to 5.6x for whole networks. ...
Leveraging large data sets, deep Convolutional Neural Networks (CNNs) achieve state-of-the-art recognition accuracy. ...
There are two main reasons for the success of deep CNNs. The first is large-scale training data sets and the second is large and deep neural network structures. ...
arXiv:1610.03618v1
fatcat:r2estvyawfaute3citbu42odm4
Optimized Deep Stacked Long Short-Term Memory Network for Long-Term Load Forecasting
2021
IEEE Access
In the proposed model, the "Adam" optimizer is used for network training. ...
This feedback aims to add the memory concept to the neural network. ...
... of each model and enhance it by using several regularization and optimization techniques to avoid the overfitting problem. ...
doi:10.1109/access.2021.3077275
fatcat:lal6tmf2c5gy3ic4omi55bhtvq
Optimal checkpointing for heterogeneous chains: how to train deep neural networks with limited memory
[article]
2019
arXiv
pre-print
This paper introduces a new activation checkpointing method which makes it possible to significantly decrease memory usage when training Deep Neural Networks with the back-propagation algorithm. ...
..., but requires fewer recomputations in the backward phase), and we provide an algorithm to compute the optimal computation sequence for this model. ...
Introduction: Training a Deep Neural Network (DNN) is a memory-intensive operation. ...
arXiv:1911.13214v1
fatcat:ku7spslh45gnxbhskufmqpw6ny
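The checkpointing idea referenced in this entry trades recomputation for memory: only a subset of activations is kept during the forward pass, and the rest are recomputed during back-propagation. Below is a minimal sketch using PyTorch's torch.utils.checkpoint API, not the heterogeneous-chain algorithm from the paper itself; the layer sizes and segment count are illustrative assumptions.

```python
# Activation checkpointing sketch (PyTorch): only segment boundaries keep
# their activations; everything inside a segment is recomputed during the
# backward pass, trading extra compute for lower peak memory.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# Illustrative homogeneous chain of layers; the paper's algorithm targets
# heterogeneous chains whose stages have unequal compute/memory costs.
model = nn.Sequential(*[
    nn.Sequential(nn.Linear(256, 256), nn.ReLU()) for _ in range(8)
])

x = torch.randn(32, 256, requires_grad=True)

# Split the 8-stage chain into 4 checkpointed segments
# (requires a recent PyTorch for the use_reentrant flag).
out = checkpoint_sequential(model, 4, x, use_reentrant=False)
out.sum().backward()
```

Splitting an n-layer chain into roughly sqrt(n) segments gives the classic sublinear memory bound; the paper's contribution is choosing checkpoints optimally when per-stage costs differ.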
Memory access optimized routing scheme for deep networks on a mobile coprocessor
2014
2014 IEEE High Performance Extreme Computing Conference (HPEC)
In this paper, we present a memory access optimized routing scheme for a hardware accelerated real-time implementation of deep convolutional neural networks (DCNNs) on a mobile platform. ...
We propose a new routing scheme for 3D convolutions by taking advantage of the characteristic of DCNNs to fully utilize all the resources in the hardware accelerator. ...
However, the problem with GPUs is the limited cache memory for storing the filter coefficients of large networks [23]. ...
doi:10.1109/hpec.2014.7040963
dblp:conf/hpec/DundarJGMC14
fatcat:pkx4lfp6vzcslcsgghhivswuhu
Training Deep Nets with Sublinear Memory Cost
[article]
2016
arXiv
pre-print
Computation graph analysis is used for automatic in-place operation and memory sharing optimizations. ...
We propose a systematic approach to reduce the memory consumption of deep neural network training. ...
We thank Ian Goodfellow and Yu Zhang for helpful discussions on computation-memory tradeoffs. We would like to thank David Warde-Farley for pointing out the relation to gradient checkpointing. ...
arXiv:1604.06174v2
fatcat:e27mozwtnvfnperyfndbkzuuu4
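Two of the optimizations named in this entry, automatic in-place operation and memory sharing, amount to reusing a buffer once analysis proves that no later consumer still needs it. The small illustration below applies PyTorch's inplace ReLU flag by hand to show the effect; the layer sizes are arbitrary, and the paper's actual contribution is a graph-analysis pass that applies such rewrites automatically.

```python
# In-place ReLU sketch: the activation buffer produced by Linear is
# overwritten instead of duplicated, so each Linear -> ReLU pair keeps
# one tensor alive rather than two.
import torch
import torch.nn as nn

def block(inplace: bool) -> nn.Sequential:
    # Arbitrary sizes, chosen only for illustration.
    return nn.Sequential(
        nn.Linear(1024, 1024),
        nn.ReLU(inplace=inplace),  # inplace=True reuses the Linear output buffer
        nn.Linear(1024, 1024),
        nn.ReLU(inplace=inplace),
    )

x = torch.randn(512, 1024)
with torch.no_grad():              # inference only; no autograd buffers involved
    y_regular = block(False)(x)
    y_inplace = block(True)(x)

# An automatic pass (as in the paper) must first prove that no other consumer
# still needs the overwritten tensor before it may apply such a rewrite.
```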
A DPDK-Based Acceleration Method for Experience Sampling of Distributed Reinforcement Learning
[article]
2022
arXiv
pre-print
As another network optimization technique, an in-network experience replay memory server between Actor and Learner nodes reduces access latencies to the experience replay memory by 11.7% to 28.1% and communication ...
Evaluation results show that, as a network optimization technique, kernel bypassing by DPDK reduces network access latencies to a shared memory server by 32.7% to 58.9%. ...
Low-Latency Shared Memory by DPDK: As the first network optimization, the network access latency of a shared memory is reduced by applying ...
arXiv:2110.13506v2
fatcat:s34wxpnohjdypmcokxfroi22mi
Deep Neural Networks on Mobile Healthcare Applications: Practical Recommendations
2018
Proceedings (MDPI)
Deep neural networks hosted completely on mobile platforms are extremely valuable for providing healthcare services to remote areas without network connectivity. ...
Deep learning has for a long time been recognized as a powerful tool in the field of medicine for making predictions or detecting abnormalities in a patient's data. ...
In [18], the authors shift their focus to energy optimizations for migrating deep neural networks into generic embedded systems. ...
doi:10.3390/proceedings2190550
fatcat:y24tk7di4rfotfui4iqqjtha3y
POSTER: Space and Time Optimal DNN Primitive Selection with Integer Linear Programming
2019
2019 28th International Conference on Parallel Architectures and Compilation Techniques (PACT)
Under a tight memory budget, our ILP solver selects the optimal primitive for each layer such that the entire network is optimized for execution time subject to a memory budget, or vice versa. ...
But they can be too resource-hungry for mobile and embedded devices with tightly constrained memory and energy budgets. ...
ILP Model: This optimization strategy is suitable for devices with enough physical memory to hold the entire network in main memory, optimizing for performance across the whole network. ...
doi:10.1109/pact.2019.00059
dblp:conf/IEEEpact/Wen0ROG19
fatcat:qmkiufmd5fbn5msgvr3m55luse
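The ILP described in this entry can be written down compactly: one binary variable per (layer, primitive) pair, a constraint that each layer selects exactly one primitive, a memory-budget constraint, and an execution-time objective (or the two swapped). Below is a hedged toy formulation using the PuLP solver; the layer names, primitive names, cost numbers, and budget are invented for illustration and are not taken from the paper.

```python
# Toy ILP for per-layer primitive selection: minimize total execution time
# subject to a memory budget. Every number below is invented for illustration.
import pulp

layers = ["conv1", "conv2", "fc1"]
primitives = ["direct", "im2col", "winograd"]

# Hypothetical per-(layer, primitive) execution time (ms) and workspace (MB).
time_ms = {
    "conv1": {"direct": 5.0, "im2col": 3.0, "winograd": 2.0},
    "conv2": {"direct": 8.0, "im2col": 4.5, "winograd": 3.5},
    "fc1":   {"direct": 1.0, "im2col": 1.0, "winograd": 1.0},
}
mem_mb = {
    "conv1": {"direct": 1.0, "im2col": 6.0, "winograd": 4.0},
    "conv2": {"direct": 2.0, "im2col": 9.0, "winograd": 6.0},
    "fc1":   {"direct": 1.0, "im2col": 1.0, "winograd": 1.0},
}
budget_mb = 10.0

prob = pulp.LpProblem("primitive_selection", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", (layers, primitives), cat="Binary")

# Objective: total execution time of the chosen primitives.
prob += pulp.lpSum(time_ms[l][p] * x[l][p] for l in layers for p in primitives)
# Each layer runs with exactly one primitive.
for l in layers:
    prob += pulp.lpSum(x[l][p] for p in primitives) == 1
# Simplified memory model: total workspace of chosen primitives under budget.
prob += pulp.lpSum(mem_mb[l][p] * x[l][p] for l in layers for p in primitives) <= budget_mb

prob.solve(pulp.PULP_CBC_CMD(msg=False))
for l in layers:
    chosen = [p for p in primitives if pulp.value(x[l][p]) > 0.5]
    print(l, "->", chosen[0])
```

A real deployment would replace the summed-memory constraint with a peak-memory model over the execution schedule, since workspaces of layers that never coexist should not be charged simultaneously.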
NullaNet: Training Deep Neural Networks for Reduced-Memory-Access Inference
[article]
2018
arXiv
pre-print
To cope with computational and storage complexity of these models, this paper presents a training method that enables a radically different approach for realization of deep neural networks through Boolean ...
Deep neural networks have been successfully deployed in a wide variety of applications including computer vision and speech recognition. ...
Ghasem Pasandi proposed the idea of performing offline simulation for extracting the truth table of a neuron and subsequently optimizing its Boolean function. ...
arXiv:1807.08716v2
fatcat:n2wmi2fugnbsbi7nmeejrpfvoa
Accelerating Deep Learning Inference with Cross-Layer Data Reuse on GPUs
[article]
2020
arXiv
pre-print
Accelerating the deep learning inference is very important for real-time applications. ...
Then, an approach for generating efficient fused code is designed, which makes deeper use of the multi-level memory hierarchy for cross-layer data reuse. ...
Large and deep neural networks require substantial compute and memory throughput, and existing methods do not make good use of the multi-level memory hierarchy of complex GPU architectures. ...
arXiv:2007.06000v2
fatcat:nwv6glfp4ndabm5xuuhmnv5zqu
OpTorch: Optimized deep learning architectures for resource limited environments
[article]
2021
arXiv
pre-print
In this paper, we propose optimized deep learning pipelines in multiple aspects of training including time and memory. ...
We also explore the effect of weights on total memory usage in deep learning pipelines. ...
As the model size grows, the memory and time needed to train such networks also increase. One of the greatest achievements of deep learning in 2020 was OpenAI's introduction of GPT-3. ...
arXiv:2105.00619v2
fatcat:ijjhfuw5bzctlldjy6csp5fidi
Embedded Deep Neural Network Processing: Algorithmic and Processor Techniques Bring Deep Learning to IoT and Edge Devices
2017
IEEE Solid-State Circuits Magazine
This accommodates the observation that the optimal word length for a deep network varies strongly from application to application and is even shown to differ across various layers of a single deep ...
The advent of deep learning allowed the network to learn and extract the optimal feature sets. ...
doi:10.1109/mssc.2017.2745818
fatcat:fhm3cbpzb5dyjbncbgozertsfe
Showing results 1 — 15 out of 196,796 results