
Explaining generalization in deep learning: progress and fundamental limits [article]

Vaishnavh Nagarajan
2021 arXiv   pre-print
The results in this chapter have been previously published in Nagarajan and Kolter [2017].  ...  The results in this chapter have previously been published in Nagarajan and Kolter [2019].  ... 
arXiv:2110.08922v1 fatcat:blzj6rhlffgrjpeaj2ia57g6gq

Lifelong Learning in Costly Feature Spaces [article]

Maria-Florina Balcan, Avrim Blum, Vaishnavh Nagarajan
2017 arXiv   pre-print
An important long-term goal in machine learning systems is to build learning agents that, like humans, can learn many tasks over their lifetime, and moreover use information from these tasks to improve their ability to do so efficiently. In this work, our goal is to provide new theoretical insights into the potential of this paradigm. In particular, we propose a lifelong learning framework that adheres to a novel notion of resource efficiency that is critical in many real-world domains where
feature evaluations are costly. That is, our learner aims to reuse information from previously learned related tasks to learn future tasks in a feature-efficient manner. Furthermore, we consider novel combinatorial ways in which learning tasks can relate. Specifically, we design lifelong learning algorithms for two structurally different and widely used families of target functions: decision trees/lists and monomials/polynomials. We also provide strong feature-efficiency guarantees for these algorithms; in fact, we show that in order to learn future targets, we need only slightly more feature evaluations per training example than what is needed to predict on an arbitrary example using those targets. We also provide algorithms with guarantees in an agnostic model where not all the targets are related to each other. Finally, we also provide lower bounds on the performance of a lifelong learner in these models, which are in fact tight under some conditions.
arXiv:1706.10271v1 fatcat:tgegoo7b5rggxnnzhiyykmzagm

A Learning Theoretic Perspective on Local Explainability [article]

Jeffrey Li, Vaishnavh Nagarajan, Gregory Plumb, Ameet Talwalkar
2020 arXiv   pre-print
Vaishnavh Nagarajan was supported by a grant from the Bosch Center for AI.  ...  Vaishnavh Nagarajan and J. Zico Kolter. Uniform convergence may be unable to explain generalization in deep learning. In Advances in Neural Information Processing Systems 32, pp. 11615-11626.  ... 
arXiv:2011.01205v1 fatcat:j7gr6ijdb5asdmqkk7nenf4qxa

A Reinforcement Learning Approach to Online Learning of Decision Trees [article]

Abhinav Garlapati, Aditi Raghunathan, Vaishnavh Nagarajan and Balaraman Ravindran
2015 arXiv   pre-print
Online decision tree learning algorithms typically examine all features of a new data point to update model parameters. We propose a novel alternative, Reinforcement Learning-based Decision Trees (RLDT), that uses Reinforcement Learning (RL) to actively examine a minimal number of features of a data point to classify it with high accuracy. Furthermore, RLDT optimizes a long term return, providing a better alternative to the traditional myopic greedy approach to growing decision trees. We
demonstrate that this approach performs as well as batch learning algorithms and other online decision tree learning algorithms, while making significantly fewer queries about the features of the data points. We also show that RLDT can effectively handle concept drift.
arXiv:1507.06923v1 fatcat:sytaki5gjzazfaecjlm4nn6fzu

Assessing Generalization of SGD via Disagreement [article]

Yiding Jiang, Vaishnavh Nagarajan, Christina Baek, J. Zico Kolter
2022 arXiv   pre-print
Yiding Jiang and Vaishnavh Nagarajan were supported by funding from the Bosch Center for Artificial Intelligence.  ...  ., 2018; Nagarajan & Kolter, 2019a; Jiang et al., 2020b; .  ...  Representative works in this large area of research include Neyshabur et al. (2014; 2017; 2018); Dziugaite & Roy (2017); Bartlett et al. (2017); Nagarajan & Kolter (2019b;c); Krishnan Notations.  ... 
arXiv:2106.13799v2 fatcat:f4ytduv6nvh2lcoltxuvgwzbbq

Provably Safe PAC-MDP Exploration Using Analogies [article]

Melrose Roderick, Vaishnavh Nagarajan, J. Zico Kolter
2021 arXiv   pre-print
Acknowledgments and Disclosure of Funding Melrose Roderick and Vaishnavh Nagarajan were supported by a grant from the Bosch Center for AI.  ... 
arXiv:2007.03574v2 fatcat:dabgchcckfeatpn77u42tre5da

Understanding the Failure Modes of Out-of-Distribution Generalization [article]

Vaishnavh Nagarajan, Anders Andreassen, Behnam Neyshabur
2021 arXiv   pre-print
Vaishnavh Nagarajan and J. Zico Kolter. Generalization in deep networks: The role of distance from initialization. 2017. Vaishnavh Nagarajan and J. Zico Kolter.  ...  (2017) ; Nagarajan & Kolter (2017; 2019) for norms of overparameterized neural networks.  ... 
arXiv:2010.15775v2 fatcat:3rc2s7gutrf3tg7i3dcrgqdoaa

Revisiting Adversarial Risk [article]

Arun Sai Suggala, Adarsh Prasad, Vaishnavh Nagarajan, Pradeep Ravikumar
2019 arXiv   pre-print
Recent works on adversarial perturbations show that there is an inherent trade-off between standard test accuracy and adversarial accuracy. Specifically, they show that no classifier can simultaneously be robust to adversarial perturbations and achieve high standard test accuracy. However, this is contrary to the standard notion that on tasks such as image classification, humans are robust classifiers with low error rate. In this work, we show that the main reason behind this confusion is the
inexact definition of adversarial perturbation that is used in the literature. To fix this issue, we propose a slight, yet important modification to the existing definition of adversarial perturbation. Based on the modified definition, we show that there is no trade-off between adversarial and standard accuracies; there exist classifiers that are robust and achieve high standard accuracy. We further study several properties of this new definition of adversarial risk and its relation to the existing definition.
arXiv:1806.02924v5 fatcat:3wx5nklztnhebkkvsvxeu6sokq

Generalization in Deep Networks: The Role of Distance from Initialization [article]

Vaishnavh Nagarajan, J. Zico Kolter
2019 arXiv   pre-print
Why does training deep neural networks using stochastic gradient descent (SGD) result in a generalization error that does not worsen with the number of parameters in the network? To answer this question, we advocate a notion of effective model capacity that is dependent on a given random initialization of the network and not just the training algorithm and the data distribution. We provide empirical evidences that demonstrate that the model capacity of SGD-trained deep networks is in fact
restricted through implicit regularization of the ℓ_2 distance from the initialization. We also provide theoretical arguments that further highlight the need for initialization-dependent notions of model capacity. We leave as open questions how and why distance from initialization is regularized, and whether it is sufficient to explain generalization.
arXiv:1901.01672v2 fatcat:e46fgius35a5vae66hs6ut5lyy

Incorporating Side Information in Tensor Completion

Hemank Lamba, Vaishnavh Nagarajan, Kijung Shin, Naji Shajarisales
2016 Proceedings of the 25th International Conference Companion on World Wide Web - WWW '16 Companion  
Matrix and tensor completion techniques have proven useful in many applications such as recommender systems, image/video restoration, and web search. We explore the idea of using external information in completing missing values in tensors. In this work, we present a framework that employs side information as kernel matrices for tensor factorization. We apply our framework to problems of recommender systems and video restoration and show that our framework effectively deals with the cold-start problem.
doi:10.1145/2872518.2889371 dblp:conf/www/LambaNSS16 fatcat:mgrqomtcsve3jgco36mtnannlu

Gradient descent GAN optimization is locally stable [article]

Vaishnavh Nagarajan, J. Zico Kolter
2018 arXiv   pre-print
Despite the growing prominence of generative adversarial networks (GANs), optimization in GANs is still a poorly understood topic. In this paper, we analyze the "gradient descent" form of GAN optimization i.e., the natural setting where we simultaneously take small gradient steps in both generator and discriminator parameters. We show that even though GAN optimization does not correspond to a convex-concave game (even for simple parameterizations), under proper conditions, equilibrium points of
this optimization procedure are still locally asymptotically stable for the traditional GAN formulation. On the other hand, we show that the recently proposed Wasserstein GAN can have non-convergent limit cycles near equilibrium. Motivated by this stability analysis, we propose an additional regularization term for gradient descent GAN updates, which is able to guarantee local stability for both the WGAN and the traditional GAN, and also shows practical promise in speeding up convergence and addressing mode collapse.
arXiv:1706.04156v3 fatcat:25onu33mk5apbo2uqegfkkmh5u

Every team deserves a second chance: an extended study on predicting team performance

Leandro Soriano Marcolino, Aravind S. Lakshminarayanan, Vaishnavh Nagarajan, Milind Tambe
2016 Autonomous Agents and Multi-Agent Systems  
Voting among different agents is a powerful tool in problem solving, and it has been widely applied to improve the performance in finding the correct answer to complex problems. We present a novel benefit of voting, that has not been observed before: we can use the voting patterns to assess the performance of a team and predict their final outcome. This prediction can be executed at any moment during problem-solving and it is completely domain independent. Hence, it can be used to identify when
a team is failing, allowing an operator to take remedial procedures (such as changing team members, the voting rule, or increasing the allocation of resources). We present three main theoretical results: (1) we show a theoretical explanation of why our prediction method works; (2) contrary to what would be expected based on a simpler explanation using classical voting models, we show that we can make accurate predictions irrespective of the strength (i.e., performance) of the teams, and that in fact, the prediction can work better for diverse teams composed of different agents than uniform teams made of copies of the best agent; (3) we show that the quality of our prediction increases with the size of the action space.  ...  quality predictions about the final outcome of games. We analyze the prediction accuracy for three different teams with different levels of diversity and strength, and show that the prediction works significantly better for a diverse team. Additionally, we show that our method still works well when trained with games against one adversary, but tested with games against another, showing the generality of the learned functions. Moreover, we evaluate four different board sizes, and experimentally confirm better predictions in larger board sizes. We analyze in detail the learned prediction functions, and how they change according to each team and action space size. In order to show that our method is domain independent, we also present results in Ensemble Learning, where we make online predictions about the performance of a team of classifiers, while they are voting to classify sets of items. We study a set of classical classification algorithms from machine learning, in a data-set of hand-written digits, and we are able to make high-quality predictions about the final performance of two different teams. Since our approach is domain independent, it can be easily applied to a variety of other domains.
doi:10.1007/s10458-016-9348-2 fatcat:crwesyelobgefe5j3kzo33eeqy

Geriatrix: Aging what you see and what you don't see. A file system aging approach for modern storage systems

Saurabh Kadekodi, Vaishnavh Nagarajan, Gregory R. Ganger
2018 USENIX Annual Technical Conference  
File system performance on modern primary storage devices (Flash-based SSDs) is greatly affected by aging of the free space, much more so than were mechanical disk drives. We introduce Geriatrix, a simple-to-use profile driven file system aging tool that induces target levels of fragmentation in both allocated files (what you see) and remaining free space (what you don't see), unlike previous approaches that focus on just the former. This paper describes and evaluates the effectiveness of
Geriatrix, showing that it recreates both fragmentation effects better than previous approaches. Using Geriatrix, we show that measurements presented in many recent file systems papers are higher than should be expected, by up to 30% on mechanical (HDD) and up to 80% on Flash (SSD) disks. Worse, in some cases, the performance rank ordering of file system designs being compared are different from the published results. Geriatrix will be released as open source software with eight built-in aging profiles, in the hopes that it can address the need created by the increased performance impact of file system aging in modern SSD-based storage.
dblp:conf/usenix/KadekodiNG18 fatcat:cqtl6co4rfhgzo5ycymx3wz5ti

Uniform convergence may be unable to explain generalization in deep learning [article]

Vaishnavh Nagarajan, J. Zico Kolter
2021 arXiv   pre-print
Vaishnavh Nagarajan is supported by a grant from the Bosch Center for AI.  ...  MIT Press, 2012. [27] Vaishnavh Nagarajan and J. Zico Kolter. Generalization in deep networks: The role of distance from initialization.  ...  Deep Learning: Bridging Theory and Practice Workshop in Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, 2017. [28] Vaishnavh Nagarajan  ... 
arXiv:1902.04742v4 fatcat:r2t5fjuv7jg77m64ex4gacgwl4

Learning-Theoretic Foundations of Algorithm Configuration for Combinatorial Partitioning Problems [article]

Maria-Florina Balcan, Vaishnavh Nagarajan, Ellen Vitercik, Colin White
2018 arXiv   pre-print
Max-cut, clustering, and many other partitioning problems that are of significant importance to machine learning and other scientific fields are NP-hard, a reality that has motivated researchers to develop a wealth of approximation algorithms and heuristics. Although the best algorithm to use typically depends on the specific application domain, a worst-case analysis is often used to compare algorithms. This may be misleading if worst-case instances occur infrequently, and thus there is a
 ...  for optimization methods which return the algorithm configuration best suited for the given application's typical inputs. We address this problem for clustering, max-cut, and other partitioning problems, such as integer quadratic programming, by designing computationally efficient and sample efficient learning algorithms which receive samples from an application-specific distribution over problem instances and learn a partitioning algorithm with high expected performance. Our algorithms learn over common integer quadratic programming and clustering algorithm families: SDP rounding algorithms and agglomerative clustering algorithms with dynamic programming. For our sample complexity analysis, we provide tight bounds on the pseudodimension of these algorithm classes, and show that surprisingly, even for classes of algorithms parameterized by a single parameter, the pseudo-dimension is superconstant. In this way, our work both contributes to the foundations of algorithm configuration and pushes the boundaries of learning theory, since the algorithm classes we analyze consist of multi-stage optimization procedures and are significantly more complex than classes typically studied in learning theory.
arXiv:1611.04535v4 fatcat:tj5j3iyinvd6ffkbwyy3e6didm