320 Hits in 3.7 sec

How Do Adam and Training Strategies Help BNNs Optimization? [article]

Zechun Liu, Zhiqiang Shen, Shichao Li, Koen Helwegen, Dong Huang, Kwang-Ting Cheng
2021 arXiv   pre-print
The best performing Binary Neural Networks (BNNs) are usually attained using Adam optimization and its multi-step training variants.  ...  specific training strategies.  ...  Besides comparing Adam to SGD, we further explore how training strategies affect BNN optimization. Previous works proposed different training strategies: Yang et al.  ... 
arXiv:2106.11309v1 fatcat:5m3ezp4gxvdsbdg2ptv2udpzmq
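The Adam-vs-SGD comparison for BNN training described above can be illustrated with a minimal sketch. This is not the authors' code: the toy objective, the `binarize`/`adam_step` helpers, and all hyperparameters are illustrative assumptions, showing only the standard pattern of updating latent real-valued weights with Adam while the forward pass uses their signs (straight-through estimator).

```python
import numpy as np

def binarize(w):
    # Forward pass uses the sign of the latent real-valued weights; the
    # straight-through estimator passes gradients through unchanged.
    return np.sign(w)

def adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    # One standard Adam update on the latent (not binarized) weights.
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g**2
    m_hat = m / (1 - b1**t)
    v_hat = v / (1 - b2**t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Toy objective: push the binarized weights toward an all-ones target.
rng = np.random.default_rng(0)
w = rng.normal(scale=0.01, size=8)          # latent weights near zero
target = np.ones(8)
m, v = np.zeros_like(w), np.zeros_like(w)
for t in range(1, 201):
    g = binarize(w) - target                # STE gradient of 0.5*||sign(w)-target||^2
    w, m, v = adam_step(w, g, m, v, t)
print(binarize(w))                          # all entries flip to +1
```

Because Adam normalizes each coordinate's update by its gradient history, even latent weights sitting close to zero receive steps of roughly `lr` per iteration, which is one intuition for why adaptive methods flip binary weights more reliably than plain SGD.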

BAMSProd: A Step towards Generalizing the Adaptive Optimization Methods to Deep Binary Model [article]

Junjie Liu, Dongchao Wen, Deyu Wang, Wei Tao, Tse-Wei Chen, Kinya Osa, Masami Kato
2020 arXiv   pre-print
In this paper, we provide an explicit convex optimization example where training BNNs with traditional adaptive optimization methods still faces the risk of non-convergence, and identify that  ...  Recent methods have significantly reduced the performance degradation of Binary Neural Networks (BNNs), but guaranteeing the effective and efficient training of BNNs is an unsolved problem.  ...  These models are all trained with default strategies and data augmentation in [43] .  ... 
arXiv:2009.13799v1 fatcat:oia2pd4pznellcg62rcyy3wkra

Deep Learning for Bayesian Optimization of Scientific Problems with High-Dimensional Structure [article]

Samuel Kim, Peter Y. Lu, Charlotte Loh, Jamie Smith, Jasper Snoek, Marin Soljačić
2022 arXiv   pre-print
However, surrogate models traditionally employed in BO, such as Gaussian Processes (GPs), scale poorly with dataset size and do not easily accommodate known structure.  ...  We demonstrate BO on a number of realistic problems in physics and chemistry, including topology optimization of photonic crystal materials using convolutional neural networks, and chemical property optimization  ...  The library optimizes over the acquisition function in the inner loop using the L-BFGS algorithm. LIPO (Malherbe & Vayatis, 2017) is implemented in the dlib library (King, 2009) .  ... 
arXiv:2104.11667v3 fatcat:2dlov4dkrralziqfwcshck63ji
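The snippet above mentions optimizing the acquisition function in the inner loop with L-BFGS. A minimal sketch of that pattern, with an assumed toy surrogate (closed-form mean/std standing in for a trained GP or neural surrogate) and a UCB acquisition:

```python
import numpy as np
from scipy.optimize import minimize

def surrogate(x):
    # Placeholder posterior: in real BO these would come from the fitted
    # surrogate model, not closed forms.
    mean = np.sin(3.0 * x[0]) + 0.5 * x[0]
    std = 0.3 + 0.2 * np.abs(np.cos(x[0]))
    return mean, std

def neg_ucb(x, kappa=2.0):
    # Upper-confidence-bound acquisition, negated because scipy minimizes.
    mean, std = surrogate(x)
    return -(mean + kappa * std)

# Inner loop: multi-start L-BFGS-B over the acquisition surface.
best = None
for x0 in np.linspace(-2.0, 2.0, 8):
    res = minimize(neg_ucb, x0=np.array([x0]), method="L-BFGS-B",
                   bounds=[(-2.0, 2.0)])
    if best is None or res.fun < best.fun:
        best = res
print(best.x)   # candidate point to evaluate next
```

Multi-start is the usual guard against the acquisition's local optima; libraries such as the one referenced in the snippet typically wrap exactly this restart-and-pick-best loop.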

Nonlocal optimization of binary neural networks [article]

Amir Khoshaman, Giuseppe Castiglione, Christopher Srinivasa
2022 arXiv   pre-print
We explore training Binary Neural Networks (BNNs) as a discrete variable inference problem over a factor graph.  ...  Compared to traditional gradient methods for BNNs, our results indicate that both stochastic BP and SP find better configurations of the parameters in the BNN.  ...  We compare this, again, with a full-precision Adam optimizer, which converges to 90% after 120 epochs.  ... 
arXiv:2204.01935v1 fatcat:eqkpm35vofcyph5qhkypugvb7u

Policy Optimization as Wasserstein Gradient Flows [article]

Ruiyi Zhang, Changyou Chen, Chunyuan Li, Lawrence Carin
2018 arXiv   pre-print
We place policy optimization into the space of probability measures, and interpret it as Wasserstein gradient flows.  ...  Our technique is applicable to several RL settings, and is related to many state-of-the-art policy-optimization algorithms.  ...  Acknowledgements We acknowledge Tuomas Haarnoja et al. for making their code public and thank Ronald Parr for insightful advice. This research was supported in part by DARPA, DOE, NIH, ONR and NSF.  ... 
arXiv:1808.03030v1 fatcat:i3swiw5wrvdnnk7nry6ijir4rm

BANANAS: Bayesian Optimization with Neural Architectures for Neural Architecture Search [article]

Colin White, Willie Neiswanger, Yash Savani
2020 arXiv   pre-print
acquisition optimization strategy.  ...  Bayesian optimization (BO), which has long had success in hyperparameter optimization, has recently emerged as a very promising strategy for NAS when it is coupled with a neural predictor.  ...  Acknowledgments We thank Jeff Schneider, Naveen Sundar Govindarajulu, and Liam Li for their help with this project.  ... 
arXiv:1910.11858v3 fatcat:pgrwhrstw5fjtoce2xayohydvq

Augmenting Neural Networks with Priors on Function Values [article]

Hunter Nisonoff, Yixin Wang, Jennifer Listgarten
2022 arXiv   pre-print
How can we coherently leverage such prior knowledge to help improve a neural network model that is quite accurate in some regions of input space -- typically near the training data -- but wildly wrong  ...  Herein, we tackle this problem by developing an approach to augment BNNs with prior information on the function values themselves.  ...  The GB1 data set used a fully-connected neural network with 1 hidden layer containing 300 dimensions, ReLU non-linearities, and was optimized using Adam with a weight-decay of 0.0001.  ... 
arXiv:2202.04798v3 fatcat:tb25rlg65vdvzj4r3pdgeb3ujy

Bimodal Distributed Binarized Neural Networks [article]

Tal Rozen, Moshe Kimhi, Brian Chmiel, Avi Mendelson, Chaim Baskin
2022 arXiv   pre-print
Preserving this distribution during binarization-aware training creates robust and informative binary feature maps and significantly reduces the generalization error of the BNN.  ...  Our source code, experimental settings, training logs, and binary models are available at .  ...  It contains over 1.2M training images from 1,000 different categories. For ImageNet, we use an Adam optimizer with a momentum of 0.9 and a learning rate of 1e-3.  ... 
arXiv:2204.02004v1 fatcat:hbck33udlbfvrolw4nf76d24cu

Binary Ensemble Neural Network: More Bits per Network or More Networks per Bit?

Shilin Zhu, Xin Dong, Hao Su
2019 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)  
We conclude that the errors of BNNs are predominantly caused by the intrinsic instability (training time) and non-robustness (train & test time).  ...  While ensemble techniques have been broadly believed to be only marginally helpful for strong classifiers such as deep neural networks, our analysis and experiments show that they are naturally a perfect  ...  and how it interacts with different optimizers such as SGD or Adam [37] .  ... 
doi:10.1109/cvpr.2019.00506 dblp:conf/cvpr/ZhuDS19 fatcat:fed4idlbqrcnzg2kicg5uoazpu
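The "more networks per bit" intuition above can be sketched numerically. The members here are simulated coin flips, not real BNNs, and the accuracy `p`, ensemble size `K`, and sample count are illustrative assumptions; the point is only that majority voting over independent weak binary classifiers sharply reduces error.

```python
import numpy as np

rng = np.random.default_rng(0)
p, K, n = 0.7, 15, 20000                     # member accuracy, ensemble size, samples
labels = rng.integers(0, 2, size=n)          # ground-truth binary labels
correct = rng.random((K, n)) < p             # which member is right on which sample
preds = np.where(correct, labels, 1 - labels)
vote = (preds.sum(axis=0) > K / 2).astype(int)   # majority vote over K members

single_acc = correct[0].mean()
ensemble_acc = (vote == labels).mean()
print(single_acc, ensemble_acc)              # voting accuracy well above a single member
```

With independent members at 70% accuracy, 15-way voting lands near 95%, which is the Condorcet-style effect the paper exploits: cheap binary members are individually unstable, but their errors partially cancel in aggregate.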

Understanding Learning Dynamics of Binary Neural Networks via Information Bottleneck [article]

Vishnu Raj, Nancy Nayak, Sheetal Kalyani
2020 arXiv   pre-print
However, training BNNs is not easy due to the discontinuity in activation functions, and the training dynamics of BNNs is not well understood.  ...  We analyze BNNs through the Information Bottleneck principle and observe that the training dynamics of BNNs is considerably different from that of Deep Neural Networks (DNNs).  ...  Hence, an insight into the learning dynamics can help in the development of efficient optimizers targeted towards training BNNs.  ... 
arXiv:2006.07522v1 fatcat:3tz44z7ia5hw5ftohiqg67z2b4

A comprehensive review of Binary Neural Network [article]

Chunyu Yuan, Sos S. Agaian
2022 arXiv   pre-print
Along the way, it examines BNN (a) purpose: their early successes and challenges; (b) BNN optimization: selected representative works that contain essential optimization techniques; (c) deployment: open-source  ...  frameworks for BNN modeling and development; (d) terminal: efficient computing architectures and devices for BNN and (e) applications: diverse applications with BNN.  ...  Extending Real-to-Bin's training strategy, BNN-Adam investigates and designs a new training scheme based on the Adam optimizer and successfully improves the trained performance of Real-to-Bin and ReActNet  ... 
arXiv:2110.06804v3 fatcat:b2w6atz27fbgdacq5aiov32bpi

Gryffin: An algorithm for Bayesian optimization of categorical variables informed by expert knowledge [article]

Florian Häse, Matteo Aldeghi, Riley J. Hickman, Loïc M. Roch, Alán Aspuru-Guzik
2021 arXiv   pre-print
To date, the development of data-driven experiment planning strategies for autonomous experimentation has largely focused on continuous process parameters despite the urge to devise efficient strategies  ...  Gryffin augments Bayesian optimization based on kernel density estimation with smooth approximations to categorical distributions.  ...  to accelerate the search, and how this bias can be refined during the optimization to gain scientific insight.  ... 
arXiv:2003.12127v2 fatcat:stdbhzymg5fljkckkjd3gtqw4a

Optimization Models for Machine Learning: A Survey [article]

Claudio Gambella, Bissan Ghaddar, Joe Naoum-Sawaya
2020 arXiv   pre-print
This paper surveys the machine learning literature and presents in an optimization framework several commonly used machine learning approaches.  ...  Particularly, mathematical optimization models are presented for regression, classification, clustering, deep learning, and adversarial learning, as well as new emerging applications in machine teaching  ...  Acknowledgement We are very grateful to four anonymous referees for their valuable feedback and comments that helped improve the content and presentation of the paper.  ... 
arXiv:1901.05331v4 fatcat:3bwfbl34rrf2tkpqeidl5hfoxu

VIME: Variational Information Maximizing Exploration [article]

Rein Houthooft, Xi Chen, Yan Duan, John Schulman, Filip De Turck, Pieter Abbeel
2017 arXiv   pre-print
While there are methods with optimality guarantees in the setting of discrete state and action spaces, these methods cannot be applied in high-dimensional deep RL scenarios.  ...  This paper introduces Variational Information Maximizing Exploration (VIME), an exploration strategy based on maximization of information gain about the agent's belief of environment dynamics.  ...  Acknowledgments This work was done in collaboration between UC Berkeley, Ghent University and OpenAI.  ... 
arXiv:1605.09674v4 fatcat:lrwm2ssr7nb3dhrektnzymohuu

BARS: Joint Search of Cell Topology and Layout for Accurate and Efficient Binary ARchitectures [article]

Tianchen Zhao, Xuefei Ning, Xiangsheng Shi, Songyi Yang, Shuang Liang, Peng Lei, Jianfei Chen, Huazhong Yang, Yu Wang
2021 arXiv   pre-print
And we propose to automatically search for the optimal information flow.  ...  A notable challenge of BNN architecture search lies in that binary operations exacerbate the "collapse" problem of differentiable NAS, for which we incorporate various search and derive strategies to  ...  Many recent works [18, 3, 28] follow its binarization scheme, and so do we.  ... 
arXiv:2011.10804v3 fatcat:67m4p5vdofev5ldb7cf56g3kqu
Showing results 1 — 15 out of 320 results