
Latent Weights Do Not Exist: Rethinking Binarized Neural Network Optimization [article]

Koen Helwegen, James Widdicombe, Lukas Geiger, Zechun Liu, Kwang-Ting Cheng, Roeland Nusselder
2019 arXiv   pre-print
Optimization of Binarized Neural Networks (BNNs) currently relies on real-valued latent weights to accumulate small update steps.  ...  In this paper, we argue that these latent weights cannot be treated analogously to weights in real-valued networks. Instead their main role is to provide inertia during training.  ...  One way people currently translate knowledge from the real-valued network to the BNN is through initialization of the latent weights, which is becoming increasingly sophisticated [4, 8, 34].  ... 
arXiv:1906.02107v2 fatcat:7fptrjarhnagna3ey6gy35kjc4
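The "inertia" view above leads to the paper's Binary Optimizer (Bop), which drops latent weights entirely and flips a binary weight only when an exponential moving average of the gradient is both strong and pushing against the weight's current sign. The sketch below is a minimal NumPy illustration of that flip rule; the function name and hyperparameter defaults are ours, not the paper's.

```python
import numpy as np

def bop_step(w_bin, grad, momentum, gamma=1e-3, tau=1e-6):
    """One Bop-style update on a binary weight vector (+1/-1 entries).

    A weight is flipped when the gradient momentum exceeds the
    threshold tau AND has the same sign as the weight (i.e. plain
    gradient descent would push the weight toward the opposite sign).
    gamma and tau values here are illustrative defaults.
    """
    # Exponential moving average of the gradient ("inertia").
    momentum = (1 - gamma) * momentum + gamma * grad
    # Flip condition: strong momentum that agrees in sign with w.
    flip = (np.abs(momentum) > tau) & (np.sign(momentum) == w_bin)
    w_bin = np.where(flip, -w_bin, w_bin)
    return w_bin, momentum
```

Note that no real-valued copy of the weights is kept between steps; only the gradient momentum carries state across iterations.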

[Re] A comprehensive study on binary optimizer and its applicability

Nancy Nayak, Vishnu Raj, Sheetal Kalyani
2020 Zenodo  
In this report, we present a detailed study on the paper titled "Latent Weights Do Not Exist: Rethinking Binarized Neural Network Optimization" by [1], which proposes a new optimization method for training  ...  Binarized Neural Networks are paving the way towards the deployment of deep neural networks with less memory and computation.  ...  From our observations, it can be concluded that in most cases latent weights on BNN do not perform better than binary weights.  ... 
doi:10.5281/zenodo.3818607 fatcat:qiknrszywfelndd7nlzexgwzhq

Training Binary Neural Networks using the Bayesian Learning Rule [article]

Xiangming Meng and Roman Bachmann and Mohammad Emtiyaz Khan
2020 arXiv   pre-print
Neural networks with binary weights are computation-efficient and hardware-friendly, but their training is challenging because it involves a discrete optimization problem.  ...  Our work provides a principled approach for training binary neural networks which justifies and extends existing approaches.  ...  They argue that "latent" weights used in STE based methods do not exist.  ... 
arXiv:2002.10778v4 fatcat:pqud77mus5fbharzmhernjbdgi

Hardware-Aware Design for Edge Intelligence

Warren J. Gross, Brett H. Meyer, Arash Ardakani
2020 IEEE Open Journal of Circuits and Systems  
INDEX TERMS Artificial intelligence, deep neural networks, hardware and systems, neural architecture search, quantization and pruning, stochastic computing, surveys and reviews.  ...  network edge.  ...  Binary neural networks take a heuristic approach to binarize their weights and activations.  ... 
doi:10.1109/ojcas.2020.3047418 fatcat:d5u57awixzgl3au7fk5hh2gezu

Structural Causal 3D Reconstruction [article]

Weiyang Liu, Zhen Liu, Liam Paull, Adrian Weller, Bernhard Schölkopf
2022 arXiv   pre-print
Unlike existing works that introduce explicit regularizations into objective functions, we look into a different space for implicit regularization -- the structure of latent space.  ...  Specifically, we restrict the structure of latent space to capture a topological causal ordering of latent factors (i.e., representing causal dependency as a directed acyclic graph).  ...  Since these results do not affect the conclusion drawn in the main paper, we omit them here.  ... 
arXiv:2207.10156v1 fatcat:if3weswp5fekdfl6qy7emglhuu

Reintroducing Straight-Through Estimators as Principled Methods for Stochastic Binary Networks [article]

Alexander Shekhovtsov, Viktor Yanush
2021 arXiv   pre-print
Training neural networks with binary weights and activations is a challenging problem due to the lack of gradients and difficulty of optimization over discrete weights.  ...  We analyze properties, estimation accuracy, obtain different forms of correct ST estimators for activations and weights, explain existing empirical approaches and their shortcomings, explain how latent  ...  Responding to the work "Latent weights do not exist: Rethinking binarized neural network optimization" [24] and the lack of formal basis to introduce latent weights in the literature (e.g., [27]),  ... 
arXiv:2006.06880v4 fatcat:46pb4oeodfcengr5fzl5ijpnje
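The straight-through (ST) estimators analyzed in this entry all share the same skeleton: a non-differentiable sign() in the forward pass, with its gradient replaced by a surrogate in the backward pass. A minimal NumPy sketch of the common "clipped" variant follows; the clipping window and function names are illustrative, not taken from the paper.

```python
import numpy as np

def binarize_forward(w_latent):
    """Forward pass: deterministic sign binarization to +1/-1."""
    return np.where(w_latent >= 0, 1.0, -1.0)

def ste_backward(w_latent, grad_out, clip=1.0):
    """Backward pass of the clipped straight-through estimator:
    the zero-almost-everywhere derivative of sign() is replaced by
    the identity, gated to zero where |w| > clip (the 'saturating'
    STE often paired with latent-weight training)."""
    return grad_out * (np.abs(w_latent) <= clip)
```

In latent-weight training, `binarize_forward` produces the weights used in the forward computation while `ste_backward` routes the loss gradient back onto the real-valued latents.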

Shade: Information-Based Regularization for Deep Learning

Michael Blot, Thomas Robert, Nicolas Thome, Matthieu Cord
2018 2018 25th IEEE International Conference on Image Processing (ICIP)  
Regularization is a big issue for training deep neural networks. In this paper, we propose a new information-theory-based regularization scheme named SHADE for SHAnnon DEcay.  ...  We use momentum SGD for optimization (same protocol as [34]). Table 2: Classification accuracy (%) on CIFAR-10 test set with binarized activation.  ...  3: A simple convolutional network has been trained with different numbers of samples of MNIST-M and the optimal regularization weight for SHADE has been determined on the validation set (see training  ... 
doi:10.1109/icip.2018.8451092 dblp:conf/icip/BlotRTC18 fatcat:5tiaeawhjba2dbgoxmwxq6uj6i

Edge-Cloud Polarization and Collaboration: A Comprehensive Survey for AI [article]

Jiangchao Yao, Shengyu Zhang, Yang Yao, Feng Wang, Jianxin Ma, Jianwei Zhang, Yunfei Chu, Luo Ji, Kunyang Jia, Tao Shen, Anpeng Wu, Fengda Zhang (+6 others)
2022 arXiv   pre-print
We also discuss potentials and practical experiences of some on-going advanced edge AI topics including pretraining models, graph neural networks and reinforcement learning.  ...  There are also other FRL studies which do not belong to either HFRL or VFRL.  ...  Towards this end, efficient neural architecture search introduces the weight sharing technique and greatly reduces the search time [132] .  ... 
arXiv:2111.06061v3 fatcat:5rq6s5s4cvcidblidgahwynp34

Learning Compact Representations of Neural Networks using DiscriminAtive Masking (DAM) [article]

Jie Bu, Arka Daw, M. Maruf, Anuj Karpatne
2021 arXiv   pre-print
A central goal in deep learning is to learn compact representations of features at every layer of a neural network, which is useful for both unsupervised representation learning and structured network  ...  At the core of these limitations is the lack of a systematic approach that jointly prunes and refines weights during training in a single stage, and does not require any fine-tuning upon convergence to  ...  while the other neurons do not have sufficient time to recover the dropped features.  ... 
arXiv:2110.00684v1 fatcat:kobqndv2tbhqrb2oiksn2qpz7u

On the Spectral Bias of Neural Networks [article]

Nasim Rahaman, Aristide Baratin, Devansh Arpit, Felix Draxler, Min Lin, Fred A. Hamprecht, Yoshua Bengio, Aaron Courville
2019 arXiv   pre-print
In this work, we present properties of neural networks that complement this aspect of expressivity.  ...  Neural networks are known to be a class of highly expressive functions able to fit even random input-output mappings with 100% accuracy.  ...  do not exploit the geometry of the manifold like DNNs do.  ... 
arXiv:1806.08734v3 fatcat:jyjs552b35gqvk4kkvxkf7ikte

Syntactic Inductive Biases for Deep Learning Methods [article]

Yikang Shen
2022 arXiv   pre-print
On the other hand, the dependency inductive bias encourages models to find the latent relations between entities in the input sequence.  ...  For natural language, the latent relations are usually modeled as a directed dependency graph, where a word has exactly one parent node and zero or several children nodes.  ...  But they are optimized together with the other components of neural network models.  ... 
arXiv:2206.04806v1 fatcat:fjtogsbwfnfwbggaujsggy4yxq

Amortized Inference Regularization [article]

Rui Shu, Hung H. Bui, Shengjia Zhao, Mykel J. Kochenderfer, Stefano Ermon
2019 arXiv   pre-print
Toyota Research Institute provided funds to assist the authors with their research but this article solely reflects the opinions and conclusions of its authors and not TRI or any other Toyota entity.  ...  To do so, we consider the case where the inference model is a neural network encoder parameterized by weight matrices {W i } and leverage [18] 's weight normalization technique, which proposes to reparameterize  ...  By rethinking the role of the amortized inference model, amortized inference regularization provides a new direction for studying and improving the generalization performance of latent variable models.  ... 
arXiv:1805.08913v2 fatcat:2iudkgcjnbbr7e23wwnavdaasy
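The weight normalization technique of [18] that this entry leverages reparameterizes each weight vector to decouple its direction from its magnitude. A minimal NumPy sketch of that reparameterization (for a single weight vector; handling per-row scales of a full encoder matrix is omitted):

```python
import numpy as np

def weight_norm(v, g):
    """Weight normalization: express a weight vector as
    w = g * v / ||v||, so the scalar g controls the magnitude
    and v only contributes its direction."""
    return g * v / np.linalg.norm(v)
```

Under this parameterization, regularizing or constraining g alone bounds the norm of the effective weights, which is the lever the inference-regularization argument relies on.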

Finding MNEMON: Reviving Memories of Node Embeddings [article]

Yun Shen and Yufei Han and Zhikun Zhang and Min Chen and Ting Yu and Michael Backes and Yang Zhang and Gianluca Stringhini
2022 arXiv   pre-print
Previous security research efforts orbiting around graphs have been exclusively focusing on either (de-)anonymizing the graphs or understanding the security and privacy issues of graph neural networks.  ...  Deepwalk and Node2Vec are two well known shallow neural network-based (i.e., a neural network with one hidden layer) node embedding techniques.  ...  They can take node features into consideration and do not need random walk paths.  ... 
arXiv:2204.06963v2 fatcat:wrz73p5g5rcq5ko7ab2ytabpky

Shallow Feature Matters for Weakly Supervised Object Localization [article]

Jun Wei, Qin Wang, Zhen Li, Sheng Wang, S.Kevin Zhou, Shuguang Cui
2021 arXiv   pre-print
However, previous CAM-based methods did not take full advantage of the shallow features, despite their importance for WSOL.  ...  'Cls Backbone' is the network used for classification and 'Loc Backbone' represents the network used for localization. '-' means that the authors do not provide corresponding results.  ...  However, restrained by background noise, shallow features do not attract enough attention.  ... 
arXiv:2108.00873v1 fatcat:oxiz7g4fyjeurcurur7dfii77a

From Fully Trained to Fully Random Embeddings: Improving Neural Machine Translation with Compact Word Embedding Tables [article]

Krtin Kumar, Peyman Passban, Mehdi Rezagholizadeh, Yiu Sing Lau, Qun Liu
2022 arXiv   pre-print
We show that detracting syntactic and semantic information from word embeddings and running NMT systems with random embeddings is not as damaging as it initially sounds.  ...  In this paper, we analyze the impact and utility of such matrices in the context of neural machine translation (NMT).  ...  Neural networks have shown impressive performance with random weights for image classification tasks (Ramanujan et al., 2020); our experiments show similar results for embedding matrices of NMT models  ... 
arXiv:2104.08677v2 fatcat:jy4t6fdp3bfsph7y3c6bvrqc4i
Showing results 1 — 15 out of 56 results