149 Hits in 4.7 sec

Structurally Sparsified Backward Propagation for Faster Long Short-Term Memory Training [article]

Maohua Zhu, Jason Clemons, Jeff Pool, Minsoo Rhu, Stephen W. Keckler, Yuan Xie
2018 arXiv   pre-print
Further, we can enforce structured sparsity in the gate gradients to make the LSTM backward pass up to 45% faster than the state-of-the-art dense approach and 168% faster than the state-of-the-art sparsifying  ...  Though the structured sparsifying method can impact the accuracy of a model, this performance gap can be eliminated by combining our sparse training method with the standard dense training method.  ...  Long Short-Term Memories (LSTMs) [1] are an important type of Recurrent Neural Network (RNN) widely used to process sequential data.  ... 
arXiv:1806.00512v1 fatcat:wjblgkyzgfg7nfc34ztyeubhvi
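
A minimal sketch of the idea in this entry: sparsify the LSTM gate gradients with a structured (row-wise) rule so the backward GEMMs can skip whole rows. The (batch, 4*hidden) layout, the L2-norm row criterion, and the function names below are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: structured sparsification of LSTM gate gradients.
# Assumption: gradients arrive as a (batch, 4*hidden) matrix and whole
# rows are dropped by L2 norm so later GEMMs can skip entire rows.
import numpy as np

def sparsify_gate_grads(d_gates: np.ndarray, keep_ratio: float = 0.5) -> np.ndarray:
    """Zero entire rows of the gate-gradient matrix, keeping the rows
    with the largest L2 norm."""
    row_norms = np.linalg.norm(d_gates, axis=1)
    k = max(1, int(keep_ratio * d_gates.shape[0]))
    keep = np.argsort(row_norms)[-k:]            # indices of the k largest rows
    mask = np.zeros(d_gates.shape[0], dtype=bool)
    mask[keep] = True
    return d_gates * mask[:, None]

d_gates = np.random.randn(8, 4 * 16)             # batch=8, hidden=16, 4 gates
sparse = sparsify_gate_grads(d_gates, keep_ratio=0.5)
print((np.abs(sparse).sum(axis=1) > 0).sum(), "of 8 rows kept")
```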

Memorized Sparse Backpropagation [article]

Zhiyuan Zhang, Pengcheng Yang, Xuancheng Ren, Xu Sun
2019 arXiv   pre-print
Furthermore, a simple yet effective algorithm named memorized sparse backpropagation (MSBP) is proposed to remedy the problem of information loss by storing unpropagated gradients in memory for the next  ...  Despite the success of existing work in accelerating propagation through sparseness, the relevant theoretical characteristics remain unexplored and we empirically find that they suffer from the loss of  ...  for faster long short-term memory training.  ... 
arXiv:1905.10194v2 fatcat:imeg4wejyvctbid2ooaxck3f4m
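
The abstract above describes keeping only the largest gradient entries and banking the rest for later steps. Below is a small sketch of that mechanism, assuming a plain top-k magnitude rule and an in-memory accumulator; the class and method names are placeholders, not the paper's API.

```python
import numpy as np

class MemorizedSparseBackprop:
    """Keep only the top-k entries of a gradient; accumulate the rest in a
    memory buffer and fold it back into the next step's gradient."""
    def __init__(self, shape, k):
        self.memory = np.zeros(shape)
        self.k = k

    def step(self, grad):
        total = grad + self.memory                 # unpropagated gradient re-enters
        flat = np.abs(total).ravel()
        thresh = np.partition(flat, -self.k)[-self.k]
        mask = np.abs(total) >= thresh             # top-k by magnitude
        self.memory = total * ~mask                # remember what was dropped
        return total * mask                        # sparse gradient actually propagated

msbp = MemorizedSparseBackprop(shape=(4, 8), k=5)
sparse_g = msbp.step(np.random.randn(4, 8))        # 5 largest entries kept, rest stored
```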

ZeroFL: Efficient On-Device Training for Federated Learning with Local Sparsity [article]

Xinchi Qiu, Javier Fernandez-Marques, Pedro PB Gusmao, Yan Gao, Titouan Parcollet, Nicholas Donald Lane
2022 arXiv   pre-print
When the available hardware cannot meet the memory and compute requirements to efficiently train high-performing machine learning models, a compromise in either the training quality or the model complexity  ...  Such a stage, which repeats hundreds of times (i.e. every round) and can involve thousands of devices, accounts for the majority of the time required to train federated models and the totality of the energy  ...  Closer to our work is SWAT (Raihan & Aamodt, 2020), a framework that relies on sparsified weights during inference and sparsified weights and activations for backward propagation.  ... 
arXiv:2208.02507v1 fatcat:jn37qtssvza2dpr2cvsgwpqeja
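
The snippet mentions SWAT-style training: sparse weights in the forward pass, and sparse weights plus sparse activations in the backward pass. Below is a hedged numpy sketch of that split for a single dense layer; the top-k masking rule and function signatures are assumptions for illustration.

```python
import numpy as np

def topk_mask(x, density):
    """Boolean-style mask keeping the largest-magnitude fraction `density` of entries."""
    k = max(1, int(density * x.size))
    thresh = np.partition(np.abs(x).ravel(), -k)[-k]
    return (np.abs(x) >= thresh).astype(x.dtype)

def forward(w, x, density):
    w_sparse = w * topk_mask(w, density)            # sparse weights in the forward pass
    return x @ w_sparse.T

def backward(w, x, d_out, density):
    # SWAT-style: both weights and activations are sparsified for backprop
    w_sparse = w * topk_mask(w, density)
    x_sparse = x * topk_mask(x, density)
    d_x = d_out @ w_sparse                          # gradient w.r.t. the layer input
    d_w = d_out.T @ x_sparse                        # gradient w.r.t. the weights
    return d_x, d_w

w, x = np.random.randn(16, 32), np.random.randn(8, 32)
out = forward(w, x, density=0.1)
d_x, d_w = backward(w, x, np.ones_like(out), density=0.1)
```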

Deep Generative Dual Memory Network for Continual Learning [article]

Nitin Kamra, Umang Gupta, Yan Liu
2018 arXiv   pre-print
replay of past experiences, (iii) demonstrating advantages of generative replay and dual memories via experiments, and (iv) improved performance retention on challenging tasks even for low-capacity models  ...  Despite advances in deep learning, neural networks can only learn multiple tasks when trained on them jointly. When tasks arrive sequentially, they lose performance on previously learnt tasks.  ...  Our model comprises two generative models: a short-term memory (STM) to emulate the human hippocampal system and a long-term memory (LTM) to emulate the neocortical learning system.  ... 
arXiv:1710.10368v2 fatcat:haq7fpa3lndq7hujyf4immmcii
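
A toy, self-contained sketch of the dual-memory generative-replay loop suggested by the abstract: a short-term generator absorbs the newest task, and during consolidation the long-term memory is refit on a mix of its own generated replay and samples from the STM. The ToyGenerator below (noisy resampling of stored points) is only a stand-in for the paper's generative models; all names are illustrative.

```python
import numpy as np

class ToyGenerator:
    """Stand-in generator: memorizes (x, y) and resamples them with noise."""
    def __init__(self):
        self.data, self.labels = None, None
    def fit(self, x, y):
        self.data, self.labels = x.copy(), y.copy()
    def sample(self, n, rng):
        idx = rng.integers(0, len(self.data), size=n)
        noise = 0.05 * rng.standard_normal((n, self.data.shape[1]))
        return self.data[idx] + noise, self.labels[idx]

def consolidate(stm, ltm, n_replay, rng):
    """Sleep phase: refit the LTM on STM samples plus the LTM's own replay."""
    xs, ys = stm.sample(n_replay, rng)
    if ltm.data is not None:
        xl, yl = ltm.sample(n_replay, rng)          # generative replay of old tasks
        xs, ys = np.vstack([xs, xl]), np.concatenate([ys, yl])
    ltm.fit(xs, ys)

rng = np.random.default_rng(0)
stm, ltm = ToyGenerator(), ToyGenerator()
stm.fit(rng.standard_normal((50, 2)) + 3, np.zeros(50))   # task 1 enters the STM
consolidate(stm, ltm, 100, rng)
stm.fit(rng.standard_normal((50, 2)) - 3, np.ones(50))    # task 2 arrives
consolidate(stm, ltm, 100, rng)                           # LTM now covers both tasks
```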

Training Recommender Systems at Scale: Communication-Efficient Model and Data Parallelism [article]

Vipul Gupta, Dhruv Choudhary, Ping Tak Peter Tang, Xiaohan Wei, Xing Wang, Yuzhen Huang, Arun Kejariwal, Kannan Ramchandran, Michael W. Mahoney
2021 arXiv   pre-print
For communication-efficient MP, DCT incorporates a novel technique to compress the activations and gradients sent across the network during the forward and backward propagation, respectively.  ...  We propose a compression framework called Dynamic Communication Thresholding (DCT) for communication-efficient hybrid training.  ...  First, for MP, activation values and gradient information need to be communicated from one sub-network to the next during forward and backward propagation.  ... 
arXiv:2010.08899v2 fatcat:egzt667a7rg45ozyuhbyynej6a
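
A hedged sketch of threshold-based compression of the activations/gradients exchanged between model-parallel workers, in the spirit of the DCT framework described above. The running-threshold adaptation rule and the packing format below are assumptions, not the authors' exact algorithm.

```python
import numpy as np

def dct_compress(tensor, target_density, prev_threshold, adapt=0.1):
    """Keep entries above a running threshold; nudge the threshold so the
    kept fraction tracks target_density (a stand-in for a dynamic rule)."""
    mask = np.abs(tensor) >= prev_threshold
    density = mask.mean()
    new_threshold = prev_threshold * (1 + adapt) if density > target_density \
        else prev_threshold * (1 - adapt)
    packed = (np.flatnonzero(mask), tensor[mask], tensor.shape)
    return packed, new_threshold

def dct_decompress(packed):
    indices, values, shape = packed
    out = np.zeros(np.prod(shape))
    out[indices] = values
    return out.reshape(shape)

grad = np.random.randn(256, 64)
threshold = 0.1
packed, threshold = dct_compress(grad, target_density=0.05, prev_threshold=threshold)
approx = dct_decompress(packed)          # sparse reconstruction at the receiver
```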

Pruning by Explaining: A Novel Criterion for Deep Neural Network Pruning

Seul-Ki Yeom, Philipp Seegerer, Sebastian Lapuschkin, Alexander Binder, Simon Wiedemann, Klaus-Robert Müller, Wojciech Samek
2021 Pattern Recognition  
At the same time, it has a computational cost in the order of gradient computation and is comparatively simple to apply without the need for tuning hyperparameters for pruning.  ...  We show that our proposed method can efficiently prune CNN models in transfer-learning setups in which networks pre-trained on large corpora are adapted to specialized tasks.  ...  Long Short-Term Memory Network (LSTM) models, which can be used for pruning.  ... 
doi:10.1016/j.patcog.2021.107899 fatcat:tgdzfm37p5b2tkpcqfpgb5u2ra
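
A simplified sketch of an explanation-based pruning criterion in the spirit of this paper: propagate relevance through a dense layer with an epsilon-style LRP rule, then drop the units with the least accumulated relevance. The single-layer setup, the plain +eps stabilizer, and the function names are simplifying assumptions rather than the authors' implementation.

```python
import numpy as np

def lrp_epsilon(weights, activations, relevance_out, eps=1e-6):
    """Epsilon-rule LRP for one dense layer: redistribute output relevance
    to inputs in proportion to their contributions z_ij = a_i * w_ji."""
    z = activations[:, :, None] * weights.T[None, :, :]        # (batch, in, out)
    denom = z.sum(axis=1, keepdims=True) + eps                  # simplified stabilizer
    return (z / denom * relevance_out[:, None, :]).sum(axis=2)  # (batch, in)

def prune_by_relevance(relevance_in, prune_frac=0.3):
    """Score each input unit by relevance summed over the batch and return
    a boolean keep-mask that drops the least relevant fraction."""
    scores = relevance_in.sum(axis=0)
    drop = np.argsort(scores)[:int(prune_frac * scores.size)]
    keep = np.ones(scores.size, dtype=bool)
    keep[drop] = False
    return keep

w = np.random.randn(10, 20)                       # (out, in) layer weights
a = np.abs(np.random.randn(32, 20))               # reference activations
r_out = np.abs(np.random.randn(32, 10))           # relevance at the layer output
keep = prune_by_relevance(lrp_epsilon(w, a, r_out), prune_frac=0.3)
print(keep.sum(), "of 20 input units kept")
```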

Pruning by Explaining: A Novel Criterion for Deep Neural Network Pruning [article]

Seul-Ki Yeom, Philipp Seegerer, Sebastian Lapuschkin, Alexander Binder, Simon Wiedemann, Klaus-Robert Müller, Wojciech Samek
2020 arXiv   pre-print
At the same time, it has a computational cost in the order of gradient computation and is comparatively simple to apply without the need for tuning hyperparameters for pruning.  ...  We show that our proposed method can efficiently prune CNN models in transfer-learning setups in which networks pre-trained on large corpora are adapted to specialized tasks.  ...  Long Short-Term Memory Network (LSTM) models, which can be used for pruning.  ... 
arXiv:1912.08881v2 fatcat:igbedbkz5vb7ldcfluk7qkbxle

Summarize, Outline, and Elaborate: Long-Text Generation via Hierarchical Supervision from Extractive Summaries [article]

Xiaofei Sun, Zijun Sun, Yuxian Meng, Jiwei Li, Chun Fan
2022 arXiv   pre-print
Extensive experiments demonstrate that SOE produces long texts with significantly better quality, along with faster convergence.  ...  long text generation: the model first outlines the summaries for different segments of long texts, and then elaborates on each bullet point to generate the corresponding segment.  ...  Acknowledgement We would like to thank anonymous reviewers for their comments and suggestions.  ... 
arXiv:2010.07074v2 fatcat:ag2rpbmgrreblnsh3ixo72wczu

Compete to Compute

Rupesh Kumar Srivastava, Jonathan Masci, Sohrob Kazerounian, Faustino J. Gomez, Jürgen Schmidhuber
2013 Neural Information Processing Systems  
In this paper, we apply the concept to gradient-based, backprop-trained artificial multilayer NNs.  ...  NNs with competing linear units tend to outperform those with non-competing nonlinear units, and avoid catastrophic forgetting when training sets change over time.  ...  To test for this implicit long-term memory, the MNIST training and test sets were each divided into two parts, P1 containing only digits {0, 1, 2, 3, 4}, and P2 consisting of the remaining digits {5, 6  ... 
dblp:conf/nips/SrivastavaMKGS13 fatcat:pczs4rl76ndazcf7egn5seidmu
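
This entry refers to competing linear units; a minimal local winner-take-all (LWTA) sketch follows, where only the largest unit in each block passes its value unchanged and no nonlinearity is applied to the winner. Block size and layout are illustrative.

```python
import numpy as np

def lwta(pre_activations, block_size=2):
    """Local winner-take-all: within each block of competing linear units,
    only the unit with the largest activation passes its value; the rest
    output zero."""
    b, n = pre_activations.shape
    assert n % block_size == 0
    blocks = pre_activations.reshape(b, n // block_size, block_size)
    winners = blocks.argmax(axis=2)[..., None]       # index of each block's winner
    mask = np.zeros_like(blocks)
    np.put_along_axis(mask, winners, 1.0, axis=2)
    return (blocks * mask).reshape(b, n)

x = np.array([[0.3, -1.2, 2.0, 0.5]])
print(lwta(x, block_size=2))                         # -> [[0.3, 0.0, 2.0, 0.0]]
```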

Rethinking the Value of Network Pruning [article]

Zhuang Liu, Mingjie Sun, Tinghui Zhou, Gao Huang, Trevor Darrell
2019 arXiv   pre-print
For all state-of-the-art structured pruning algorithms we examined, fine-tuning a pruned model only gives comparable or worse performance than training that model with randomly initialized weights.  ...  Our results suggest the need for more careful baseline evaluations in future research on structured pruning methods.  ...  One may argue that we should instead train the small target model for fewer epochs since it may converge faster.  ... 
arXiv:1810.05270v2 fatcat:berzw3fmvbfnhcgdvornkwionu

One-Shot Pruning of Recurrent Neural Networks by Jacobian Spectrum Evaluation [article]

Matthew Shunshi Zhang, Bradly Stadie
2019 arXiv   pre-print
Wen et al. (2017) alter the structure of LSTMs to decrease their memory requirements.  ...  To arrive at a one-shot pruning criterion for recurrent neural networks, we consider the impact of the temporal Jacobian on both forward- and backward-propagation. • (Backpropagation) The formula for backpropagation  ... 
arXiv:1912.00120v1 fatcat:piqgwggryjhgnmwkgn33tdwgfi
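
A hedged sketch of a temporal-Jacobian-based saliency for one-shot pruning of a vanilla tanh RNN: collect dh_t/dh_{t-1} over a sequence, score each recurrent weight by the mean magnitude of the corresponding Jacobian entry, and prune the lowest. This scoring rule is a stand-in for the paper's spectrum-based criterion, not a reproduction of it.

```python
import numpy as np

def temporal_jacobians(w_hh, w_xh, xs, h0):
    """Run a tanh RNN and collect the temporal Jacobians
    dh_t/dh_{t-1} = diag(1 - h_t^2) @ W_hh."""
    h, jacs = h0, []
    for x in xs:
        h = np.tanh(w_hh @ h + w_xh @ x)
        jacs.append(np.diag(1.0 - h**2) @ w_hh)
    return jacs

def jacobian_saliency(jacs):
    """Score each recurrent weight by the mean magnitude of its
    temporal-Jacobian entry (a stand-in for a spectrum-based score)."""
    return np.mean([np.abs(j) for j in jacs], axis=0)

def one_shot_prune(w_hh, saliency, sparsity=0.8):
    k = int(sparsity * w_hh.size)
    thresh = np.partition(saliency.ravel(), k - 1)[k - 1]
    return w_hh * (saliency > thresh)

rng = np.random.default_rng(1)
w_hh = rng.standard_normal((16, 16)) * 0.3
w_xh = rng.standard_normal((16, 8)) * 0.3
xs = rng.standard_normal((20, 8))                     # a length-20 input sequence
sal = jacobian_saliency(temporal_jacobians(w_hh, w_xh, xs, np.zeros(16)))
w_pruned = one_shot_prune(w_hh, sal, sparsity=0.8)    # 80% of recurrent weights zeroed
```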

Communication-Efficient and Distributed Learning Over Wireless Networks: Principles and Applications [article]

Jihong Park, Sumudu Samarakoon, Anis Elgabli, Joongheon Kim, Mehdi Bennis, Seong-Lyun Kim, Mérouane Debbah
2020 arXiv   pre-print
Machine learning (ML) is a promising enabler for the fifth generation (5G) communication systems and beyond.  ...  To achieve this goal, it is essential to cater for high ML inference accuracy at scale under time-varying channel and network dynamics, by continuously exchanging fresh data and ML model updates in a distributed  ...  [Figure excerpt: a flowchart of tripartite split learning (SL), in which each sub-network transmits its cut-layer activations forward across L1 ... Lk and the corresponding gradients are transmitted backward from Lk down to L1.]  ... 
arXiv:2008.02608v1 fatcat:luuo5pja5zfihhpybger6tuqrq
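
A minimal numpy sketch of the activation/gradient exchange in split learning referenced by this entry's flowchart: each party holds one segment of the model, sends its cut-layer activation forward, receives that activation's gradient on the way back, and updates only its own weights. The three linear segments and MSE loss below are illustrative stand-ins.

```python
# Hedged sketch of tripartite split learning: only cut-layer activations
# travel forward and their gradients travel backward between parties.
import numpy as np

rng = np.random.default_rng(0)
W1, W2, W3 = (rng.standard_normal((4, 4)) * 0.1 for _ in range(3))
x, y = rng.standard_normal((8, 4)), rng.standard_normal((8, 4))
lr = 0.01

for _ in range(3):
    # forward: each party sends its cut-layer activation to the next
    a1 = x @ W1            # party A -> transmit a1
    a2 = a1 @ W2           # party B -> transmit a2
    out = a2 @ W3          # party C computes the loss
    # backward: gradients of the cut-layer activations travel the other way
    d_out = 2 * (out - y) / len(x)
    d_a2 = d_out @ W3.T    # party C -> transmit d_a2
    d_a1 = d_a2 @ W2.T     # party B -> transmit d_a1
    # each party updates only its own segment
    W3 -= lr * a2.T @ d_out
    W2 -= lr * a1.T @ d_a2
    W1 -= lr * x.T @ d_a1
```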

Dual-view Snapshot Compressive Imaging via Optical Flow Aided Recurrent Neural Network [article]

Ruiying Lu, Bo Chen, Guanliang Liu, Ziheng Cheng, Mu Qiao, Xin Yuan
2021 arXiv   pre-print
However, it is challenging for existing model-based decoding algorithms to reconstruct each individual scene, as they usually require exhaustive parameter tuning with extremely long running time for large  ...  Extensive results on both simulation and real data demonstrate the superior performance of our proposed model in a short inference time.  ...  Following this, we integrate a recurrent mechanism and optical flow into our reconstruction network to achieve competitive results in a short time.  ... 
arXiv:2109.05287v1 fatcat:ku25b37l25dsvajoxalpbox5g4

Spartus: A 9.4 TOp/s FPGA-based LSTM Accelerator Exploiting Spatio-Temporal Sparsity [article]

Chang Gao, Tobi Delbruck, Shih-Chii Liu
2022 arXiv   pre-print
Long Short-Term Memory (LSTM) recurrent networks are frequently used for tasks involving time-sequential data such as speech recognition.  ...  Spatial sparsity is induced using a new Column-Balanced Targeted Dropout (CBTD) structured pruning method, which produces structured sparse weight matrices for balanced workloads.  ...  Linares-Barranco from the University of Seville for creating the baseboard for our FPGAs.  ... 
arXiv:2108.02297v4 fatcat:7lyeb4z2cvbhtnczrnbwgudosa
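
A small sketch of the column-balanced selection behind CBTD as described above: keep the same number of largest-magnitude weights in every column so the sparse workload stays balanced across the accelerator's processing elements. In the paper CBTD is applied gradually as a targeted dropout during training; the one-shot top-k below only illustrates the balancing rule.

```python
import numpy as np

def column_balanced_prune(w, keep_per_column):
    """Keep the `keep_per_column` largest-magnitude weights in every column,
    so each column ends up with the same nonzero count (balanced workload)."""
    mask = np.zeros_like(w, dtype=bool)
    top = np.argsort(np.abs(w), axis=0)[-keep_per_column:, :]   # top-k rows per column
    np.put_along_axis(mask, top, True, axis=0)
    return w * mask

w = np.random.randn(8, 4)
w_sparse = column_balanced_prune(w, keep_per_column=2)
print((w_sparse != 0).sum(axis=0))    # -> [2 2 2 2], every column equally loaded
```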

Convolutional Recurrent Neural Networks for Dynamic MR Image Reconstruction [article]

Chen Qin, Jo Schlemper, Jose Caballero, Anthony Price, Joseph V. Hajnal, Daniel Rueckert
2018 arXiv   pre-print
In particular, the proposed architecture embeds the structure of the traditional iterative algorithms, efficiently modelling the recurrence of the iterative reconstruction stages by using recurrent hidden  ...  is able to learn both the temporal dependency and the iterative reconstruction process effectively with only a very small number of parameters, while outperforming current MR reconstruction methods in terms  ...  Note this can be naturally generalised to other RNN units, such as long short-term memory (LSTM) and gated recurrent unit (GRU), which are considered to have better memory properties, although using these  ... 
arXiv:1712.01751v3 fatcat:lmye5hzeebbidgmfmcpakz6n4a
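
A toy sketch of the general idea in this entry: recur over reconstruction iterations with a hidden state and interleave a data-consistency step in k-space. The fixed tanh update and the hard data-consistency operator below are simplistic stand-ins for the paper's learned CRNN cells, purely for illustration.

```python
import numpy as np

def data_consistency(x, y_meas, mask):
    """Replace the sampled k-space entries of the estimate with the
    measured values (hard data consistency)."""
    k = np.fft.fft2(x)
    k[mask] = y_meas[mask]
    return np.fft.ifft2(k).real

def crnn_like_reconstruction(y_meas, mask, n_iters=5, alpha=0.5):
    x = np.fft.ifft2(np.where(mask, y_meas, 0)).real   # zero-filled initialization
    h = np.zeros_like(x)                                # recurrent hidden state over iterations
    for _ in range(n_iters):
        h = np.tanh(alpha * x + alpha * h)              # toy recurrent update (stand-in)
        x = data_consistency(x + 0.1 * h, y_meas, mask)
    return x

img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0                                   # simple square phantom
mask = np.random.default_rng(0).random((32, 32)) < 0.4  # 40% of k-space sampled
y = np.where(mask, np.fft.fft2(img), 0)
print(np.abs(crnn_like_reconstruction(y, mask) - img).mean())
```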
Showing results 1 — 15 out of 149 results