
Memory Optimization for Deep Networks [article]

Aashaka Shah, Chao-Yuan Wu, Jayashree Mohan, Vijay Chidambaram, Philipp Krähenbühl
2021 arXiv   pre-print
In this paper, we present MONeT, an automatic framework that minimizes both the memory footprint and computational overhead of deep networks.  ...  This prevents researchers from exploring larger architectures, as training large networks requires more memory for storing intermediate outputs.  ...  We present MONeT, an automatic framework to minimize the memory footprint of deep networks.  ... 
arXiv:2010.14501v3 fatcat:4ohehe2qsjaehonjjzvwvowwdm
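
A minimal sketch of the checkpointing trade that frameworks like MONeT automate, using PyTorch's stock checkpoint_sequential. MONeT's actual solver jointly chooses checkpoints and operator implementations; this generic example shows only the memory-for-recompute half, and the model and segment count are illustrative.

# Generic activation-checkpointing sketch (not MONeT's joint solver):
# trade compute for memory by discarding intermediate activations in the
# forward pass and recomputing them during backward.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

model = nn.Sequential(*[nn.Sequential(nn.Linear(1024, 1024), nn.ReLU())
                        for _ in range(16)])
x = torch.randn(32, 1024, requires_grad=True)

# Split the chain into 4 segments; only segment-boundary activations are
# kept through the forward pass, interior ones are recomputed on backward.
y = checkpoint_sequential(model, 4, x, use_reentrant=False)
y.sum().backward()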

Profile-guided memory optimization for deep neural networks [article]

Taro Sekiyama, Takashi Imamichi, Haruki Imai, Rudy Raymond
2018 arXiv   pre-print
Recent years have seen deep neural networks (DNNs) becoming wider and deeper to achieve better performance in many applications of AI.  ...  We address this challenge by developing a novel profile-guided memory optimization to efficiently and quickly allocate memory blocks during propagation in DNNs.  ...  We propose a profile-guided memory optimization for DNNs.  ... 
arXiv:1804.10001v1 fatcat:uv75yc75crgh5p7dvtqfibzo5u
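
As a hedged illustration of what profile-guided allocation can look like (the paper's exact heuristic may differ), the sketch below assigns byte offsets in a single arena from profiled tensor sizes and lifetimes, so tensors whose lifetimes do not overlap can share the same memory. The tensor names and numbers are made up.

# Greedy first-fit offset assignment from a profiling run: tensors with
# overlapping lifetimes must occupy disjoint offset ranges; everything
# else may alias.
def assign_offsets(tensors):
    # tensors: list of (name, size_bytes, first_use_step, last_use_step)
    placed = []  # (offset, size, first_use, last_use)
    plan = {}
    for name, size, s, e in sorted(tensors, key=lambda t: -t[1]):
        offset = 0
        for off, sz, ps, pe in sorted(placed):
            lifetimes_overlap = not (e < ps or pe < s)
            if lifetimes_overlap and offset + size > off:
                offset = max(offset, off + sz)  # bump past the conflict
        plan[name] = offset
        placed.append((offset, size, s, e))
    return plan

plan = assign_offsets([("conv1_out", 4096, 0, 2),
                       ("conv2_out", 4096, 1, 3),
                       ("grad_buf", 2048, 4, 6)])
print(plan)  # conv1_out/conv2_out overlap in time -> disjoint offsets;
             # grad_buf is dead by then and reuses offset 0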

Memory Optimized Deep Dense Network for Image Super-resolution

Jialiang Shen, Yucheng Wang, Jian Zhang
2018 2018 Digital Image Computing: Techniques and Applications (DICTA)  
To reduce the memory consumption during training, we propose a memory optimized deep dense network for image super-resolution.  ...  Then we adopt shared memory allocations to store concatenated features and Batch Normalization intermediate feature maps. The memory optimized network consumes less memory than a normal dense network.  ...  In order to efficiently use GPU memory to train the network, we implement a memory optimized deep dense network for image super-resolution.  ... 
doi:10.1109/dicta.2018.8615829 dblp:conf/dicta/ShenWZ18 fatcat:lqc5gfwrvrat7laodkrhh4ftcm
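
A hedged sketch of the shared-allocation idea in PyTorch terms: the concatenation and BatchNorm outputs are cheap to recompute, so a dense layer can checkpoint its bottleneck instead of storing those intermediates for every layer. This follows the widely used memory-efficient DenseNet pattern; the paper's exact implementation may differ.

# Concat + BN intermediates live only inside bottleneck(); they are freed
# after the forward pass and recomputed during backward, so their storage
# is effectively shared across all dense layers.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class DenseLayer(nn.Module):
    def __init__(self, in_ch, growth):
        super().__init__()
        self.norm = nn.BatchNorm2d(in_ch)
        self.relu = nn.ReLU(inplace=True)
        self.conv = nn.Conv2d(in_ch, growth, 3, padding=1, bias=False)

    def bottleneck(self, *features):
        return self.conv(self.relu(self.norm(torch.cat(features, dim=1))))

    def forward(self, features):          # features: list of tensors
        new = checkpoint(self.bottleneck, *features, use_reentrant=False)
        return features + [new]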

Optimizing Memory Efficiency for Deep Convolutional Neural Networks on GPUs [article]

Chao Li, Yi Yang, Min Feng, Srimat Chakradhar, Huiyang Zhou
2016 arXiv   pre-print
Experiments show the universal effect of our proposed optimizations on both single layers and complete networks, with speedups of up to 27.9x for a single layer and up to 5.6x for whole networks.  ...  Leveraging large data sets, deep Convolutional Neural Networks (CNNs) achieve state-of-the-art recognition accuracy.  ...  There are two main reasons for the success of deep CNNs: the first is large-scale training data sets, and the second is large and deep neural network structures.  ... 
arXiv:1610.03618v1 fatcat:r2estvyawfaute3citbu42odm4

Optimized Deep Stacked Long Short-Term Memory Network for Long-Term Load Forecasting

Tamer Ahmed Farrag, Ehab E. Elattar
2021 IEEE Access  
In the proposed model, the "Adam" optimizer is used for network training.  ...  This feedback aims to add the memory concept to the neural network.  ...  of each model and enhance it by using several regularization and optimization techniques to avoid the overfitting problem.  ... 
doi:10.1109/access.2021.3077275 fatcat:lal6tmf2c5gy3ic4omi55bhtvq
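
A hedged sketch of the architecture this entry describes: a stacked LSTM forecaster trained with the Adam optimizer, where the recurrent state supplies the "memory" feedback. Layer sizes, dropout, the 24-step window, and the learning rate are illustrative placeholders, not the paper's tuned values.

import torch
import torch.nn as nn

class StackedLSTM(nn.Module):
    def __init__(self, n_features=1, hidden=64, layers=3):
        super().__init__()
        # num_layers > 1 stacks LSTM cells; the recurrent state carries
        # the memory of past load values.
        self.lstm = nn.LSTM(n_features, hidden, num_layers=layers,
                            batch_first=True, dropout=0.2)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):                 # x: (batch, time, features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])      # predict the next-step load

model = StackedLSTM()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

x = torch.randn(8, 24, 1)   # 24 past load readings (dummy data)
y = torch.randn(8, 1)
opt.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
opt.step()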

Optimal checkpointing for heterogeneous chains: how to train deep neural networks with limited memory [article]

Julien Herrmann, Olivier Beaumont (HiePACS, UB, LaBRI), Lionel Eyraud-Dubois, Alexis Joly
2019 arXiv   pre-print
This paper introduces a new activation checkpointing method which makes it possible to significantly decrease memory usage when training Deep Neural Networks with the back-propagation algorithm.  ...  storing the full history of operations uses more memory but requires fewer recomputations in the backward phase, and we provide an algorithm to compute the optimal computation sequence for this model.  ...  Training a Deep Neural Network (DNN) is a memory-intensive operation.  ... 
arXiv:1911.13214v1 fatcat:ku7spslh45gnxbhskufmqpw6ny
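
The homogeneous-chain version of this problem has a classic dynamic program (Griewank-style checkpointing); the paper generalizes it to heterogeneous chains where stages differ in compute and memory cost. For intuition only, a sketch of the homogeneous recurrence:

# Minimal forward re-executions to backprop a chain of length l with at
# most c checkpoints in memory: advance j steps (cost j), store a
# checkpoint, reverse the tail with c-1 slots, then the head with c.
from functools import lru_cache

@lru_cache(maxsize=None)
def extra_forwards(l, c):
    if l <= 1:
        return 0                 # input (or a single step) is already stored
    if c == 0:
        return float("inf")      # cannot progress without any checkpoint
    return min(j + extra_forwards(l - j, c - 1) + extra_forwards(j, c)
               for j in range(1, l))

print(extra_forwards(10, 2))     # recompute cost for a 10-stage chain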

Memory access optimized routing scheme for deep networks on a mobile coprocessor

Aysegul Dundar, Jonghoon Jin, Vinayak Gokhale, Berin Martini, Eugenio Culurciello
2014 2014 IEEE High Performance Extreme Computing Conference (HPEC)  
In this paper, we present a memory access optimized routing scheme for a hardware accelerated real-time implementation of deep convolutional neural networks (DCNNs) on a mobile platform.  ...  We propose a new routing scheme for 3D convolutions that takes advantage of the characteristics of DCNNs to fully utilize all the resources in the hardware accelerator.  ...  However, the problem with GPUs is the limited cache memory for storing the filter coefficients of large networks [23].  ... 
doi:10.1109/hpec.2014.7040963 dblp:conf/hpec/DundarJGMC14 fatcat:pkx4lfp6vzcslcsgghhivswuhu

Training Deep Nets with Sublinear Memory Cost [article]

Tianqi Chen and Bing Xu and Chiyuan Zhang and Carlos Guestrin
2016 arXiv   pre-print
Computation graph analysis is used for automatic in-place operation and memory sharing optimizations.  ...  We propose a systematic approach to reduce the memory consumption of deep neural network training.  ...  We thank Ian Goodfellow and Yu Zhang for helpful discussions on computation-memory tradeoffs. We would like to thank David Warde-Farley for pointing out the relation to gradient checkpointing.  ... 
arXiv:1604.06174v2 fatcat:e27mozwtnvfnperyfndbkzuuu4
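
The core arithmetic behind the sublinear result: keeping one activation every k layers stores O(n/k) checkpoints plus O(k) live tensors while a segment is recomputed, which is minimized at k = sqrt(n). A small sketch of that trade:

# Peak activation count for an n-layer chain when checkpointing every
# k-th layer: stored checkpoints (n/k) plus one recomputed segment (k).
def peak_activations(n_layers, k):
    return n_layers / k + k

n = 100
best_k = min(range(1, n + 1), key=lambda k: peak_activations(n, k))
print(best_k, peak_activations(n, best_k))   # -> 10 20.0, vs 100 stored
                                             #    without checkpointing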

A DPDK-Based Acceleration Method for Experience Sampling of Distributed Reinforcement Learning [article]

Masaki Furukawa, Hiroki Matsutani
2022 arXiv   pre-print
As another network optimization technique, an in-network experience replay memory server between Actor and Learner nodes reduces access latencies to the experience replay memory by 11.7% to 28.1% and communication  ...  Evaluation results show that, as a network optimization technique, kernel bypassing by DPDK reduces network access latencies to a shared memory server by 32.7% to 58.9%.  ...  As the first network optimization, the network access latency of a shared memory is reduced by applying  ... 
arXiv:2110.13506v2 fatcat:s34wxpnohjdypmcokxfroi22mi
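
A hedged sketch of the architecture only: an Actor pushing experiences to a remote replay-memory server over plain TCP. The paper's contribution is replacing exactly this kernel socket path with DPDK kernel bypass to cut the latency of these small round trips; the host name and port below are placeholders.

# Length-prefixed experience messages to a remote replay server.
import pickle
import socket
import struct

def send_experience(sock, state, action, reward, next_state, done):
    payload = pickle.dumps((state, action, reward, next_state, done))
    # 4-byte big-endian length prefix, then the pickled transition
    sock.sendall(struct.pack("!I", len(payload)) + payload)

# Usage (assumes a replay server listening on a placeholder address):
# sock = socket.create_connection(("replay-server", 6379))
# send_experience(sock, s, a, r, s_next, False)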

Deep Neural Networks on Mobile Healthcare Applications: Practical Recommendations

Jose I. Benedetto, Pablo Sanabria, Andres Neyem, Jaime Navon, Christian Poellabauer, Bryan (Ning) Xia
2018 Proceedings (MDPI)  
Deep neural networks hosted completely on mobile platforms are extremely valuable for providing healthcare services to remote areas without network connectivity.  ...  Deep learning has long been recognized as a powerful tool in the field of medicine for making predictions or detecting abnormalities in a patient's data.  ...  In [18], the authors shift their focus to energy optimizations for migrating deep neural networks into generic embedded systems.  ... 
doi:10.3390/proceedings2190550 fatcat:y24tk7di4rfotfui4iqqjtha3y

POSTER: Space and Time Optimal DNN Primitive Selection with Integer Linear Programming

Yuan Wen, Andrew Anderson, Valentin Radu, Michael F.P. OBoyle, David Gregg
2019 2019 28th International Conference on Parallel Architectures and Compilation Techniques (PACT)  
Under a tight memory budget, our ILP solver selects the optimal primitive for each layer such that the entire network is optimized for execution time subject to a memory budget, or vice versa.  ...  But they can be too resource-hungry for mobile and embedded devices with tightly constrained memory and energy budgets.  ...  This optimization strategy is suitable for devices with enough physical memory to accommodate the entire network in main memory, optimizing for performance across the whole network.  ... 
doi:10.1109/pact.2019.00059 dblp:conf/IEEEpact/Wen0ROG19 fatcat:qmkiufmd5fbn5msgvr3m55luse
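
A hedged sketch of the per-layer primitive-selection ILP using the PuLP solver. The time and memory tables are invented; in the paper they come from benchmarking real primitives (im2col, Winograd, direct, ...) on each layer, and the full model is richer than this two-layer toy.

import pulp

layers = ["conv1", "conv2"]
prims = ["im2col", "winograd", "direct"]
time_ms = {("conv1", "im2col"): 5, ("conv1", "winograd"): 3, ("conv1", "direct"): 8,
           ("conv2", "im2col"): 7, ("conv2", "winograd"): 6, ("conv2", "direct"): 9}
mem_kb = {("conv1", "im2col"): 900, ("conv1", "winograd"): 1400, ("conv1", "direct"): 100,
          ("conv2", "im2col"): 1100, ("conv2", "winograd"): 1600, ("conv2", "direct"): 120}
BUDGET_KB = 2000

prob = pulp.LpProblem("primitive_selection", pulp.LpMinimize)
x = pulp.LpVariable.dicts("x", (layers, prims), cat="Binary")

# minimize total execution time
prob += pulp.lpSum(time_ms[l, p] * x[l][p] for l in layers for p in prims)
# pick exactly one primitive per layer
for l in layers:
    prob += pulp.lpSum(x[l][p] for p in prims) == 1
# stay under the memory budget (simplified: workspace costs just add up)
prob += pulp.lpSum(mem_kb[l, p] * x[l][p] for l in layers for p in prims) <= BUDGET_KB

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print({l: p for l in layers for p in prims if x[l][p].value() == 1})

Swapping the objective and the budget constraint gives the "or vice versa" direction the abstract mentions: optimal memory subject to a time bound.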

NullaNet: Training Deep Neural Networks for Reduced-Memory-Access Inference [article]

Mahdi Nazemi, Ghasem Pasandi, Massoud Pedram
2018 arXiv   pre-print
To cope with the computational and storage complexity of these models, this paper presents a training method that enables a radically different approach to the realization of deep neural networks through Boolean  ...  Deep neural networks have been successfully deployed in a wide variety of applications, including computer vision and speech recognition.  ...  Ghasem Pasandi proposed the idea of performing offline simulation for extracting the truth table of a neuron and subsequently optimizing its Boolean function.  ... 
arXiv:1807.08716v2 fatcat:n2wmi2fugnbsbi7nmeejrpfvoa
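
A hedged sketch of the truth-table extraction step the snippet mentions: for a neuron with a handful of binary inputs, enumerate every input pattern and record the thresholded output; the resulting Boolean function can then be minimized and mapped to logic gates instead of multiply-accumulate hardware. The weights below are made up.

from itertools import product

weights = [0.7, -1.2, 0.5, 0.9]   # hypothetical trained weights
bias = -0.4

def neuron(bits):
    # binarized activation: 1 if the weighted sum clears the threshold
    return int(sum(w * b for w, b in zip(weights, bits)) + bias > 0)

truth_table = {bits: neuron(bits) for bits in product((0, 1), repeat=4)}
minterms = [bits for bits, out in truth_table.items() if out]
print(f"{len(minterms)} minterms of 16:", minterms[:3], "...")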

Accelerating Deep Learning Inference with Cross-Layer Data Reuse on GPUs [article]

Xueying Wang, Guangli Li, Xiao Dong, Jiansong Li, Lei Liu, Xiaobing Feng
2020 arXiv   pre-print
Accelerating deep learning inference is very important for real-time applications.  ...  Then, an approach for generating efficient fused code is designed, which goes deeper into multi-level memory usage for cross-layer data reuse.  ...  Large and deep neural networks require substantial computing and memory throughput, and existing methods do not make good use of the multi-level memory hierarchy of the complex GPU architecture.  ... 
arXiv:2007.06000v2 fatcat:nwv6glfp4ndabm5xuuhmnv5zqu
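
A hedged NumPy sketch of the fusion idea at CPU scale: evaluating two pointwise layers tile by tile so the intermediate never materializes in full. On GPUs the same reuse keeps the intermediate tile in shared memory or registers across fused kernels, which is the level this paper targets.

import numpy as np

def layer1(t):  # e.g. bias + ReLU
    return np.maximum(t + 1.0, 0.0)

def layer2(t):  # e.g. scaling
    return 0.5 * t

def fused(x, tile=1024):
    out = np.empty_like(x)
    for i in range(0, x.size, tile):
        t = x.flat[i:i + tile]            # only a tile-sized intermediate
        out.flat[i:i + tile] = layer2(layer1(t))
    return out

x = np.random.randn(1 << 20).astype(np.float32)
assert np.allclose(fused(x), layer2(layer1(x)))   # same result, less traffic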

OpTorch: Optimized deep learning architectures for resource limited environments [article]

Salman Ahmed, Hammad Naveed
2021 arXiv   pre-print
In this paper, we propose deep learning pipelines optimized in multiple aspects of training, including time and memory.  ...  We also explore the effect of weights on total memory usage in deep learning pipelines.  ...  As the model size grows, the memory and time needed to train such networks also increase. One of the most prominent deep learning achievements of 2020 was OpenAI's introduction of GPT-3.  ... 
arXiv:2105.00619v2 fatcat:ijjhfuw5bzctlldjy6csp5fidi
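
A hedged sketch of the weight/memory relationship the snippet mentions: counting parameter memory including gradients and Adam's two state tensors, which often dominate the raw weight footprint. The model and byte sizes are illustrative.

# Estimate the weight-side contribution to training memory. Activations
# are a separate (often larger) term not counted here.
import torch.nn as nn

def param_memory_mb(model, bytes_per_elem=4, adam_state=True):
    n = sum(p.numel() for p in model.parameters())
    copies = 1 + 1 + (2 if adam_state else 0)   # weights + grads (+ m, v)
    return n * bytes_per_elem * copies / 2**20

model = nn.Sequential(nn.Linear(4096, 4096), nn.ReLU(), nn.Linear(4096, 10))
print(f"{param_memory_mb(model):.1f} MB")   # weights + grads + Adam state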

Embedded Deep Neural Network Processing: Algorithmic and Processor Techniques Bring Deep Learning to IoT and Edge Devices

Marian Verhelst, Bert Moons
2017 IEEE Solid-State Circuits Magazine  
This accommodates the observation that the optimal word length for a deep network varies strongly from application to application and is even shown to differ across the layers of a single deep network.  ...  (b) The advent of deep learning allowed the network to learn and extract the optimal feature sets.  ... 
doi:10.1109/mssc.2017.2745818 fatcat:fhm3cbpzb5dyjbncbgozertsfe
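
A hedged sketch of what a per-layer word length means in practice: uniform symmetric quantization of each layer's weights to its own bit width. The per-layer bit choices below are illustrative; picking them layer by layer is exactly the degree of freedom the article highlights.

import numpy as np

def quantize(w, bits):
    qmax = 2 ** (bits - 1) - 1                 # signed integer range
    scale = np.max(np.abs(w)) / qmax
    return np.round(w / scale).astype(np.int32), scale

rng = np.random.default_rng(0)
layer_bits = {"conv1": 8, "conv2": 6, "fc": 4}  # hypothetical word lengths
for name, bits in layer_bits.items():
    w = rng.standard_normal((64, 64)).astype(np.float32)
    q, scale = quantize(w, bits)
    err = np.abs(q * scale - w).mean()
    print(f"{name}: {bits}-bit, mean abs error {err:.4f}")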
Showing results 1 — 15 of 196,796