
Over-parameterized Adversarial Training: An Analysis Overcoming the Curse of Dimensionality [article]

Yi Zhang, Orestis Plevrakis, Simon S. Du, Xingguo Li, Zhao Song, Sanjeev Arora
2020 arXiv   pre-print
It is unclear how to extend these results to adversarial training because of the min-max objective.  ...  Recently, a first step towards this direction was made by Gao et al. using tools from online learning, but they require the width of the net to be exponential in input dimension d, and with an unnatural  ...  This is very extreme over-parametrization, and this curse of dimensionality is inherent to their argument.  ... 
arXiv:2002.06668v2 fatcat:hvaezu4jdbaovizm3zdc5zftwi

Tensor-Train Density Estimation [article]

Georgii S. Novikov, Maxim E. Panov, Ivan V. Oseledets
2022 arXiv   pre-print
We develop an efficient non-adversarial training procedure for TTDE based on the Riemannian optimization.  ...  Experimental results demonstrate the competitive performance of the proposed method in density estimation and sampling tasks, while TTDE significantly outperforms competitors in training speed.  ...  Acknowledgements The research was supported by the Russian Science Foundation grant 20-71-10135.  ... 
arXiv:2108.00089v2 fatcat:iso33ew6vvfsnepcsmoqe5c57i

On the Generalization Properties of Adversarial Training [article]

Yue Xing, Qifan Song, Guang Cheng
2021 arXiv   pre-print
In the former regime, after overcoming the non-smoothness of adversarial training, the adversarial risk of the trained models can converge to the minimal adversarial risk.  ...  In contrast, this paper studies the generalization performance of a generic adversarial training algorithm.  ...  ., and Arora, S. (2020), "Over-parameterized Adversarial Training: An Analysis Overcoming the Curse of Dimensionality," arXiv preprint arXiv:2002.06668. The structure of appendix is as follows.  ... 
arXiv:2008.06631v2 fatcat:arubu7mwvfaxrjhymaobufzih4

Does Preprocessing Help Training Over-parameterized Neural Networks? [article]

Zhao Song, Shuo Yang, Ruizhe Zhang
2021 arXiv   pre-print
The classical training method requires paying Ω(mnd) cost for both forward computation and backward computation, where m is the width of the neural network, and we are given n training points in d-dimensional  ...  From the technical perspective, our result is a sophisticated combination of tools in different fields, greedy-type convergence analysis in optimization, sparsity observation in practical work, high-dimensional  ...  Over-parameterized adversarial training: An analysis overcoming the curse of dimensionality. In NeurIPS. arXiv preprint arXiv:2002.06668, 2020. Roadmap. In Section A, we present our main algorithms.  ... 
arXiv:2110.04622v1 fatcat:mcw5eewvunacthfwhyvbgfpxki

Rethinking embedding coupling in pre-trained language models [article]

Hyung Won Chung, Thibault Févry, Henry Tsai, Melvin Johnson, Sebastian Ruder
2020 arXiv   pre-print
We re-evaluate the standard practice of sharing weights between input and output embeddings in state-of-the-art pre-trained language models.  ...  Our analysis shows that larger output embeddings prevent the model's last layers from overspecializing to the pre-training task and encourage Transformer representations to be more general and more transferable  ...  We perform an extensive analysis on this behavior in §6. We show results with an English BERT Base model in Appendix A.6, which show the same trend.  ... 
arXiv:2010.12821v1 fatcat:x2qbyadgljbllcs4yu5gna4h64

Adaptive Simulation-based Training of AI Decision-makers using Bayesian Optimization [article]

Brett W. Israelsen, Nisar Ahmed, Kenneth Center, Roderick Green, Winston Bennett Jr
2017 arXiv   pre-print
This work studies how an AI-controlled dog-fighting agent with tunable decision-making parameters can learn to optimize performance against an intelligent adversary, as measured by a stochastic objective  ...  Simulation studies show that HRMS improves the accuracy of GP surrogate models, allowing AI decision-makers to more accurately predict performance and efficiently tune parameters.  ...  As in the case of Thompson sampling for classical SS, this approach is asymptotically efficient but still suffers from the curse of dimensionality. B.  ... 
arXiv:1703.09310v2 fatcat:ppmame2e2vhdjnzxzqaxnkdqpu

Bridging a Gap in SAR-ATR: Training on Fully Synthetic and Testing on Measured Data

Nathan Inkawhich, Matthew Joseph Inkawhich, Eric Davis, Uttam Majumder, Erin Tripp, Christopher Thomas Capraro, Yiran Chen
2021 IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing  
ultimately achieved over 95% accuracy on the SAMPLE dataset.  ...  However, this approach relies on the availability of some amount of measured data. In this work, we focus on the case of having 100% synthetic training data, while testing on only measured data.  ...  requirements of training DNNs and also implicates the "curse of dimensionality" because our training samples are sparsely  ... 
doi:10.1109/jstars.2021.3059991 fatcat:2e6whpnw2ba7vhhjnn3mwyi47y

Overview frequency principle/spectral bias in deep learning [article]

Zhi-Qin John Xu, Yaoyu Zhang, Tao Luo
2022 arXiv   pre-print
In recent years, a research line from Fourier analysis sheds light on this magical "black box" by showing a Frequency Principle (F-Principle or spectral bias) of the training behavior of deep neural  ...  The F-Principle is first demonstrated by one-dimensional synthetic data followed by the verification in high-dimensional real datasets.  ...  This theorem shows that if we consider an over-parameterized FEM, it solves the green function of the PDE.  ... 
arXiv:2201.07395v1 fatcat:23damdzck5ekxnulslddq2pgbq

Error Estimates for the Deep Ritz Method with Boundary Penalty [article]

Johannes Müller, Marius Zeinhofer
2021 arXiv   pre-print
For essential boundary conditions, given an approximation rate of r in H^1(Ω) and an approximation rate of s in L^2(∂Ω) of the ansatz classes, the optimal decay rate of the estimated error is min(s/2,  ...  For non-essential boundary conditions the error of the Ritz method decays with the same rate as the approximation rate of the ansatz classes.  ...  Jentzen, Analysis of the generalization error: Empirical risk minimization over deep artificial neural networks overcomes the curse of dimensionality in the numerical approximation of  ... 
arXiv:2103.01007v3 fatcat:vpz2n2thdzhfpgrujnb5eyvb7u

Adversarial Robustness with Semi-Infinite Constrained Learning [article]

Alexander Robey, Luiz F. O. Chamon, George J. Pappas, Hamed Hassani, Alejandro Ribeiro
2021 arXiv   pre-print
Thus, there is a gap between the theory and practice of adversarial training, particularly with respect to when and why adversarial training works.  ...  While adversarial training can mitigate this issue in practice, state-of-the-art methods are increasingly application-dependent, heuristic in nature, and suffer from fundamental trade-offs between nominal  ...  The authors would like to thank Juan Cervino and Samuel Sokota for their helpful feedback.  ... 
arXiv:2110.15767v1 fatcat:4co75xxx4vf43ofmuq5qdyweca

Variational Adversarial Active Learning [article]

Samarth Sinha, Sayna Ebrahimi, Trevor Darrell
2019 arXiv   pre-print
Our method learns a latent space using a variational autoencoder (VAE) and an adversarial network trained to discriminate between unlabeled and labeled data.  ...  Our results demonstrate that our adversarial approach learns an effective low dimensional latent space in large-scale settings and provides for a computationally efficient sampling method.  ...  In this analysis we investigate the performance of VAAL in the presence of noisy data caused by an inaccurate oracle.  ... 
arXiv:1904.00370v3 fatcat:4l7wyohuhzfe7pohdy5m74xk4a

Coulomb Autoencoders [article]

Emanuele Sansone, Hafiz Tiomoko Ali, Sun Jiacheng
2019 arXiv   pre-print
Learning the true density in high-dimensional feature spaces is a well-known problem in machine learning.  ...  In particular, (i) we prove that MMD coupled with Coulomb kernels has optimal convergence properties, which are similar to convex functionals, thus improving the training of autoencoders, and (ii) we provide  ...  Therefore, the problem of density estimation in a high-dimensional feature space is converted into a problem of estimation in a lower dimensional vector space, thus overcoming the curse of dimensionality  ... 
arXiv:1802.03505v6 fatcat:jkm6axspubbqng3nr4izuog32a

Uncertainty-Aware Deep Classifiers Using Generative Models

Murat Sensoy, Lance Kaplan, Federico Cerutti, Maryam Saleki
Through extensive analysis, we demonstrate that the proposed approach provides better estimates of uncertainty for in- and out-of-distribution samples, and adversarial examples on well-known data sets  ...  However, selection or creation of such an auxiliary data set is non-trivial, especially for high dimensional data such as images.  ...  The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the U.S.  ... 
doi:10.1609/aaai.v34i04.6015 fatcat:v5lveat4e5fxte3w4pvymkpmzi

Deep Learning and Its Application to LHC Physics

Dan Guest, Kyle Cranmer, Daniel Whiteson
2018 Annual Review of Nuclear and Particle Science  
Machine learning has played an important role in the analysis of high-energy physics data for decades.  ...  The connections between machine learning and high energy physics data analysis are explored, followed by an introduction to the core concepts of neural networks, examples of the key results demonstrating  ...  ACKNOWLEDGMENTS The authors thank Ben Nachman for helpful comments. D.W. and D.G. are supported by the Office of Science at the US Department of Energy.  ... 
doi:10.1146/annurev-nucl-101917-021019 fatcat:4ll2ex624jcutgimi5w7wya2bq

Adversarial Deep Embedded Clustering: on a better trade-off between Feature Randomness and Feature Drift [article]

Nairouz Mrabah, Mohamed Bouguessa, Riadh Ksantini
2019 arXiv   pre-print
training.  ...  In the absence of concrete supervisory signals, the embedded clustering objective function may distort the latent space by learning from unreliable pseudo-labels.  ...  To address the curse of dimensionality, the original high-dimensional data should be projected in a low-dimensional feature space.  ... 
arXiv:1909.11832v1 fatcat:e7ewts6ocfeuvhc5bnz5utmtxu