
Robust Regression via Hard Thresholding [article]

Kush Bhatia and Prateek Jain and Purushottam Kar
2015 arXiv   pre-print
We study the problem of Robust Least Squares Regression (RLSR) where several response variables can be adversarially corrupted. More specifically, for a data matrix X ∈ R^{p×n} and an underlying model w*, the response vector is generated as y = X^T w* + b, where b ∈ R^n is the corruption vector supported over at most C·n coordinates. Existing exact recovery results for RLSR focus solely on L1-penalty based convex formulations and impose relatively strict model assumptions, such as requiring the corruptions b to be selected independently of X. In this work, we study a simple hard-thresholding algorithm called TORRENT which, under mild conditions on X, can recover w* exactly even if b corrupts the response variables in an adversarial manner, i.e. both the support and entries of b are selected adversarially after observing X and w*. Our results hold under deterministic assumptions which are satisfied if X is sampled from any sub-Gaussian distribution. Finally, unlike existing results that apply only to a fixed w* generated independently of X, our results are universal and hold for any w* ∈ R^p. Next, we propose gradient descent-based extensions of TORRENT that can scale efficiently to large-scale problems, such as high-dimensional sparse recovery, and prove similar recovery guarantees for these extensions. Empirically, we find TORRENT, and more so its extensions, offering significantly faster recovery than the state-of-the-art L1 solvers. For instance, even on moderate-sized datasets (with p = 50K) with around 40% corrupted responses, the proposed method called TORRENT-HYB is more than 20x faster than the best L1 solver.
arXiv:1506.02428v1 fatcat:gn3zyiro5nbqtodko6xn2yjkaa
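
A minimal sketch of the alternating hard-thresholding loop the abstract describes, in the spirit of the fully-corrective TORRENT variant; the function name, stopping rule, and the (n, p) data layout are illustrative, not the paper's exact pseudocode:

```python
import numpy as np

def torrent_fc(X, y, beta, n_iters=50):
    """Alternating hard-thresholding for robust least squares.

    X is (n, p), y is (n,), and beta upper-bounds the fraction of
    corrupted responses.  Each round refits on the points currently
    believed clean, then keeps the (1 - beta) * n smallest residuals."""
    n, p = X.shape
    k = int((1 - beta) * n)                  # points treated as clean
    active = np.arange(n)
    w = np.zeros(p)
    for _ in range(n_iters):
        w, *_ = np.linalg.lstsq(X[active], y[active], rcond=None)
        residuals = np.abs(y - X @ w)
        active = np.argsort(residuals)[:k]   # hard-thresholding step
    return w
```

Each iteration costs one least-squares solve plus a sort, which is what makes this style of method competitive with L1 solvers.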

Online learning with dynamics: A minimax perspective [article]

Kush Bhatia, Karthik Sridharan
2020 arXiv   pre-print
We study the problem of online learning with dynamics, where a learner interacts with a stateful environment over multiple rounds. In each round of the interaction, the learner selects a policy to deploy and incurs a cost that depends on both the chosen policy and the current state of the world. The state-evolution dynamics and the costs are allowed to be time-varying, in a possibly adversarial way. In this setting, we study the problem of minimizing policy regret and provide non-constructive upper bounds on the minimax rate for the problem. Our main results provide sufficient conditions for online learnability for this setup with corresponding rates. The rates are characterized by 1) a complexity term capturing the expressiveness of the underlying policy class under the dynamics of state change, and 2) a dynamics stability term measuring the deviation of the instantaneous loss from a certain counterfactual loss. Further, we provide matching lower bounds which show that both complexity terms are indeed necessary. Our approach provides a unifying analysis that recovers regret bounds for several well-studied problems including online learning with memory, online control of linear quadratic regulators, online Markov decision processes, and tracking adversarial targets. In addition, we show how our tools help obtain tight regret bounds for new problems (with non-linear dynamics and non-convex losses) for which such bounds were not known prior to our work.
arXiv:2012.01705v1 fatcat:bhhidmzdazftnnl53u2jssxmgq
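
For concreteness, a sketch of the policy-regret objective the abstract suggests, in my own notation (c_t for the round-t cost, f_t for the state dynamics): the comparator policy is charged along its own counterfactual state trajectory, which is what distinguishes policy regret from standard regret.

```latex
% Policy regret with state dynamics (notation illustrative).
% The learner's states evolve as x_{t+1} = f_t(x_t, \pi_t); the fixed
% comparator \pi is evaluated on the trajectory it would itself induce.
\[
  \mathrm{Reg}_T \;=\; \sum_{t=1}^{T} c_t(\pi_t, x_t)
  \;-\; \min_{\pi \in \Pi} \sum_{t=1}^{T} c_t\bigl(\pi, x_t^{\pi}\bigr),
  \qquad x_{t+1}^{\pi} = f_t\bigl(x_t^{\pi}, \pi\bigr).
\]
```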

Agnostic learning with unknown utilities [article]

Kush Bhatia, Peter L. Bartlett, Anca D. Dragan, Jacob Steinhardt
2021 arXiv   pre-print
Traditional learning approaches for classification implicitly assume that each mistake has the same cost. In many real-world problems, though, the utility of a decision depends on the underlying context x and decision y. However, directly incorporating these utilities into the learning objective is often infeasible since they can be quite complex and difficult for humans to specify. We formally study this as agnostic learning with unknown utilities: given a dataset S = {x_1, ..., x_n} where each data point x_i ∼ 𝒟, the objective of the learner is to output a function f in some class of decision functions ℱ with small excess risk. This risk measures the performance of the output predictor f with respect to the best predictor in the class ℱ on the unknown underlying utility u^*. This utility u^* is not assumed to have any specific structure. This raises an interesting question of whether learning is even possible in our setup, given that obtaining a generalizable estimate of the utility u^* might not be possible from finitely many samples. Surprisingly, we show that estimating the utilities of only the sampled points S suffices to learn a decision function which generalizes well. We study mechanisms for eliciting information which allow a learner to estimate the utilities u^* on the set S. We introduce a family of elicitation mechanisms by generalizing comparisons, called the k-comparison oracle, which enables the learner to ask for comparisons across k different inputs x at once. We show that the excess risk in our agnostic learning framework decreases at a rate of O(1/k). This result brings out an interesting accuracy-elicitation trade-off: as the order k of the oracle increases, the comparative queries become harder to elicit from humans but allow for more accurate learning.
arXiv:2104.08482v1 fatcat:hxzvbvvjrbdufmlofu6cjdpkku
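
A toy simulation of the k-comparison idea; the oracle interface and the Borda-style aggregation below are my own illustration, not the paper's elicitation mechanism. The oracle ranks k queried points by the hidden utility, and repeated queries recover an ordering of utilities on the sample:

```python
import numpy as np

def k_comparison_oracle(u_star, idx):
    """Rank the queried indices by the hidden utility u* (best first)."""
    return sorted(idx, key=lambda i: -u_star[i])

def estimate_order(u_star, n, k, n_queries, rng):
    """Borda-style aggregation of random k-wise comparisons into an
    estimated ordering of utilities over the n sampled points."""
    scores = np.zeros(n)
    for _ in range(n_queries):
        idx = rng.choice(n, size=k, replace=False)
        for pos, i in enumerate(k_comparison_oracle(u_star, idx)):
            scores[i] += k - pos             # earlier rank, more credit
    return np.argsort(-scores)               # best-first estimated order
```

Larger k packs more information into each query, mirroring the O(1/k) excess-risk rate in the abstract, at the price of queries that are harder for humans to answer.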

Establishing Appropriate Trust via Critical States [article]

Sandy H. Huang, Kush Bhatia, Pieter Abbeel, Anca D. Dragan
2018 arXiv   pre-print
In order to effectively interact with or supervise a robot, humans need to have an accurate mental model of its capabilities and how it acts. Learned neural network policies make that particularly challenging. We propose an approach for helping end-users build a mental model of such policies. Our key observation is that for most tasks, the essence of the policy is captured in a few critical states: states in which it is very important to take a certain action. Our user studies show that if the
more » ... obot shows a human what its understanding of the task's critical states is, then the human can make a more informed decision about whether to deploy the policy, and if she does deploy it, when she needs to take control from it at execution time.
arXiv:1810.08174v1 fatcat:64okgolh3be53gej55r7d3x7zi
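
One natural way to operationalize "states in which it is very important to take a certain action" is a large gap between the best action's value and the average action value; a hedged sketch along those lines, where the threshold and the exact criterion are illustrative assumptions:

```python
import numpy as np

def critical_states(states, q_fn, threshold=1.0):
    """Return states where it matters a lot which action is taken,
    using an assumed criterion: the best action's Q-value exceeds the
    average Q-value by more than `threshold`."""
    flagged = []
    for s in states:
        q = np.asarray(q_fn(s))              # Q-values over actions
        if q.max() - q.mean() > threshold:
            flagged.append(s)
    return flagged
```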

Efficient and Consistent Robust Time Series Analysis [article]

Kush Bhatia, Prateek Jain, Parameswaran Kamalaruban, Purushottam Kar
2016 arXiv   pre-print
We study the problem of robust time series analysis under the standard auto-regressive (AR) time series model in the presence of arbitrary outliers. We devise an efficient hard-thresholding based algorithm which can obtain a consistent estimate of the optimal AR model despite a large fraction of the time series points being corrupted. Our algorithm alternately estimates the corrupted set of points and the model parameters, and is inspired by recent advances in robust regression and hard thresholding methods. However, a direct application of existing techniques is hindered by a critical difference in the time-series domain: each point is correlated with all previous points, rendering existing tools inapplicable directly. We show how to overcome this hurdle using novel proof techniques. Using our techniques, we are also able to provide the first efficient and provably consistent estimator for the robust regression problem where a standard linear observation model with white additive noise is corrupted arbitrarily. We illustrate our methods on synthetic datasets and show that our methods indeed are able to consistently recover the optimal parameters despite a large fraction of points being corrupted.
arXiv:1607.00146v1 fatcat:p2dicyf365d7jdrpomcc5ogfgi
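
Casting AR estimation as regression is the natural first step here: build a lagged design matrix and then run an alternating hard-thresholding loop such as the torrent_fc sketch above. The helper below is illustrative; the correlation between its rows is exactly the difficulty the paper's analysis must handle.

```python
import numpy as np

def lagged_design(series, order):
    """Rewrite an AR(order) series as a regression problem: row t
    holds the previous `order` values, the target is series[t]."""
    n = len(series)
    X = np.column_stack([series[order - 1 - j : n - 1 - j]
                         for j in range(order)])
    y = series[order:]
    return X, y
```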

Bayesian Robustness: A Nonasymptotic Viewpoint [article]

Kush Bhatia, Yi-An Ma, Anca D. Dragan, Peter L. Bartlett, Michael I. Jordan
2019 arXiv   pre-print
We study the problem of robustly estimating the posterior distribution for the setting where observed data can be contaminated with potentially adversarial outliers. We propose Rob-ULA, a robust variant of the Unadjusted Langevin Algorithm (ULA), and provide a finite-sample analysis of its sampling distribution. In particular, we show that after T = Õ(d/ε_acc) iterations, we can sample from p_T such that dist(p_T, p^*) ≤ ε_acc + Õ(ε), where ε is the fraction of corruptions. We corroborate our theoretical analysis with experiments on both synthetic and real-world data sets for mean estimation, regression and binary classification.
arXiv:1907.11826v1 fatcat:rmyv2rue2jettjd4nskq2nkb7u
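
A sketch of what a robustified Langevin step can look like; the coordinate-wise trimmed mean below is an assumed stand-in for the paper's robust gradient estimator, and all names are illustrative:

```python
import numpy as np

def rob_ula(grad_per_sample, theta0, n_steps, step, trim, rng):
    """Langevin iterates with a robustified gradient: each step swaps
    the usual mean gradient for a coordinate-wise trimmed mean (an
    assumed stand-in for the paper's robust estimator), then adds the
    standard sqrt(2 * step) Gaussian noise of ULA."""
    theta = np.array(theta0, dtype=float)
    for _ in range(n_steps):
        g = grad_per_sample(theta)           # (n, d) per-sample grads
        lo, hi = np.quantile(g, [trim, 1.0 - trim], axis=0)
        g_rob = np.clip(g, lo, hi).mean(axis=0)
        theta = (theta - step * g_rob
                 + np.sqrt(2.0 * step) * rng.standard_normal(theta.shape))
    return theta
```

Trimming caps the influence any single corrupted sample can exert on a step, which is the intuition behind the ε-dependent term in the bound above.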

Adaptive Hard Thresholding for Near-optimal Consistent Robust Regression [article]

Arun Sai Suggala, Kush Bhatia, Pradeep Ravikumar, Prateek Jain
2019 arXiv   pre-print
… (Bhatia et al., 2015, 2017). … We prove this for TORRENT (Bhatia et al., 2015); the proof for CRR (Bhatia et al., 2017) can be worked out similarly. … Then, for any δ > 0, with probability at least 1 − δ, the following statements are true: Lemma 33 (Bhatia et al., 2015). Let X ∈ R^{n×p} be the matrix of covariates with columns sampled from N(0, I). …
arXiv:1903.08192v1 fatcat:pyqsvm467rg5vcilbjbt5kdw6q

Lazy Generic Cuts

Dinesh Khandelwal, Kush Bhatia, Chetan Arora, Parag Singla
2016 Computer Vision and Image Understanding  
LP relaxation based message passing and flow-based algorithms are two of the popular techniques for performing MAP inference in graphical models. Generic Cuts (GC) (Arora et al., 2015) combines the two approaches to generalize the traditional max-flow min-cut based algorithms for binary models with higher order clique potentials. The algorithm has been shown to be significantly faster than the state-of-the-art algorithms. The time and memory complexities of Generic Cuts are linear in the number of constraints, which in turn is exponential in the clique size. This limits the applicability of the approach to small cliques only. In this paper, we propose a lazy version of Generic Cuts exploiting the property that in most such inference problems a large fraction of the constraints are never used during the course of minimization. We start with a small set of constraints (called the active constraints) which are expected to play a role during the minimization process. GC is then run with this reduced set, allowing it to be efficient in time and memory. The set of active constraints is adaptively learnt over multiple iterations while guaranteeing convergence to the optimum for submodular clique potentials. Our experiments show that the number of constraints required by the algorithm is typically less than 3% of the total number of constraints. Experiments on computer vision datasets show that our approach can significantly outperform the state of the art both in terms of time and memory, and is scalable to clique sizes that could not be handled by existing approaches.
doi:10.1016/j.cviu.2015.10.016 fatcat:5lqpqvgfgjhjvbt26srl6ug7la
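
The lazy-constraint pattern the abstract describes is the classical delayed constraint generation loop; a skeletal sketch with placeholder callables (solve_with and find_violated stand in for a concrete Generic Cuts backend, which the paper provides):

```python
def lazy_minimize(solve_with, find_violated, initial_constraints):
    """Delayed constraint generation: solve with a small active set,
    add only the constraints the current solution violates, repeat.
    `solve_with` and `find_violated` are placeholders for a concrete
    Generic Cuts backend."""
    active = set(initial_constraints)
    while True:
        solution = solve_with(active)        # inference on reduced set
        violated = find_violated(solution)   # constraints it breaks
        if not violated:
            return solution                  # none violated -> optimal
        active |= set(violated)
```

If the final active set stays small (under 3% of all constraints in the paper's experiments), the exponential-in-clique-size constraint count never materializes in memory.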

Locally Non-linear Embeddings for Extreme Multi-label Learning [article]

Kush Bhatia and Himanshu Jain and Purushottam Kar and Prateek Jain and Manik Varma
2015 arXiv   pre-print
The objective in extreme multi-label learning is to train a classifier that can automatically tag a novel data point with the most relevant subset of labels from an extremely large label set. Embedding based approaches make training and prediction tractable by assuming that the training label matrix is low-rank, so the effective number of labels can be reduced by projecting the high dimensional label vectors onto a low dimensional linear subspace. Still, leading embedding approaches have been unable to deliver high prediction accuracies or scale to large problems, as the low-rank assumption is violated in most real world applications. This paper develops the X-One classifier to address both limitations. The main technical contribution in X-One is a formulation for learning a small ensemble of local distance preserving embeddings which can accurately predict infrequently occurring (tail) labels. This allows X-One to break free of the traditional low-rank assumption and boost classification accuracy by learning embeddings which preserve pairwise distances between only the nearest label vectors. We conducted extensive experiments on several real-world as well as benchmark data sets and compared our method against state-of-the-art methods for extreme multi-label classification. Experiments reveal that X-One can make significantly more accurate predictions than the state-of-the-art methods, including both embedding-based (by as much as 35%) and tree-based (by as much as 6%) methods. X-One can also scale efficiently to data sets with a million labels, which are beyond the pale of leading embedding methods.
arXiv:1507.02743v1 fatcat:akyvda6lqzhn3kufwyosfdrn5u
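
A toy rendition of the local-embedding idea: partition the data, give each cluster its own low-dimensional label embedding, and predict via nearest neighbours. The truncated SVD below is a simplistic stand-in for the distance-preserving embeddings the paper actually learns, and all names are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import NearestNeighbors

def train_local_embeddings(X, Y, n_clusters, dim, n_nbrs=5):
    """Partition the data and fit one low-dimensional label embedding
    per cluster (truncated SVD here; the paper learns local
    distance-preserving embeddings instead)."""
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(X)
    models = []
    for c in range(n_clusters):
        idx = np.where(km.labels_ == c)[0]
        U, s, Vt = np.linalg.svd(Y[idx], full_matrices=False)
        Z = U[:, :dim] * s[:dim]             # per-point label embeddings
        knn = NearestNeighbors(n_neighbors=n_nbrs).fit(X[idx])
        models.append((Z, Vt[:dim], knn))
    return km, models

def predict(x, km, models, top_k=5):
    """Route x to its cluster, embed it as the mean of its neighbours'
    label embeddings, decode back to label space, return top labels."""
    Z, V, knn = models[km.predict(x.reshape(1, -1))[0]]
    _, nbrs = knn.kneighbors(x.reshape(1, -1))
    scores = Z[nbrs[0]].mean(axis=0) @ V     # approximate label vector
    return np.argsort(-scores)[:top_k]
```

Because each embedding is only required to be faithful locally, the global low-rank assumption the abstract criticizes never has to hold.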

FastGRNN: A Fast, Accurate, Stable and Tiny Kilobyte Sized Gated Recurrent Neural Network [article]

Aditya Kusupati, Manish Singh, Kush Bhatia, Ashish Kumar, Prateek Jain, Manik Varma
2019 arXiv   pre-print
This paper develops the FastRNN and FastGRNN algorithms to address the twin RNN limitations of inaccurate training and inefficient prediction. Previous approaches have improved accuracy at the expense of prediction costs, making them infeasible for resource-constrained and real-time applications. Unitary RNNs have increased accuracy somewhat by restricting the range of the state transition matrix's singular values, but have also increased the model size as they require a larger number of hidden units to make up for the loss in expressive power. Gated RNNs have obtained state-of-the-art accuracies by adding extra parameters, thereby resulting in even larger models. FastRNN addresses these limitations by adding a residual connection that does not constrain the range of the singular values explicitly and has only two extra scalar parameters. FastGRNN then extends the residual connection to a gate by reusing the RNN matrices to match state-of-the-art gated RNN accuracies but with a 2-4x smaller model. Enforcing FastGRNN's matrices to be low-rank, sparse and quantized resulted in accurate models that could be up to 35x smaller than leading gated and unitary RNNs. This allowed FastGRNN to accurately recognize the "Hey Cortana" wakeword with a 1 KB model and to be deployed on severely resource-constrained IoT microcontrollers too tiny to store other RNN models. FastGRNN's code is available at https://github.com/Microsoft/EdgeML/.
arXiv:1901.02358v1 fatcat:qpuncb62jbhyfdwvivqrg46tpq
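
A NumPy sketch of the two cell updates the abstract describes: FastRNN's residual connection with two scalar weights, and FastGRNN's gate that reuses the same W and U as the candidate state. Bias placement and the (ζ, ν) parameterization follow my reading of the paper; the EdgeML repository linked above has the reference implementation.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def fastrnn_cell(x_t, h_prev, W, U, b, alpha, beta):
    """FastRNN: a plain RNN update plus a residual connection weighted
    by just two extra scalar parameters, alpha and beta."""
    return alpha * np.tanh(W @ x_t + U @ h_prev + b) + beta * h_prev

def fastgrnn_cell(x_t, h_prev, W, U, b_z, b_h, zeta, nu):
    """FastGRNN: the gate z reuses the same W and U as the candidate
    state, so gating adds only biases and the scalars (zeta, nu)."""
    pre = W @ x_t + U @ h_prev               # shared pre-activation
    z = sigmoid(pre + b_z)                   # gate
    h_tilde = np.tanh(pre + b_h)             # candidate state
    return (zeta * (1.0 - z) + nu) * h_tilde + z * h_prev
```

Sharing the pre-activation between gate and candidate is what keeps the gated model only marginally larger than the ungated one.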

The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models [article]

Alexander Pan, Kush Bhatia, Jacob Steinhardt
2022 arXiv   pre-print
Reward hacking -- where RL agents exploit gaps in misspecified reward functions -- has been widely observed, but not yet systematically studied. To understand how reward hacking arises, we construct four RL environments with misspecified rewards. We investigate reward hacking as a function of agent capabilities: model capacity, action space resolution, observation space noise, and training time. More capable agents often exploit reward misspecifications, achieving higher proxy reward and lower true reward than less capable agents. Moreover, we find instances of phase transitions: capability thresholds at which the agent's behavior qualitatively shifts, leading to a sharp decrease in the true reward. Such phase transitions pose challenges to monitoring the safety of ML systems. To address this, we propose an anomaly detection task for aberrant policies and offer several baseline detectors.
arXiv:2201.03544v2 fatcat:sc3uzh5hqvg63ov5z56hw2dage
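
In the spirit of the anomaly-detection task the abstract proposes (this is my own illustrative baseline, not one of the paper's detectors): flag a policy whose action distributions drift far from a trusted policy's on a fixed set of probe states.

```python
import numpy as np

def is_aberrant(policy, trusted, probe_states, threshold=0.3):
    """Flag `policy` if its action distributions drift far from a
    trusted policy's on probe states (mean total-variation distance).
    Both policies map a state to a probability vector over actions."""
    tv = np.mean([0.5 * np.abs(np.asarray(policy(s)) -
                               np.asarray(trusted(s))).sum()
                  for s in probe_states])
    return tv > threshold
```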

Congested Bandits: Optimal Routing via Short-term Resets

Pranjal Awasthi, Kush Bhatia, Sreenivas Gollapudi, Kostas Kollias
2022 International Conference on Machine Learning  
Correspondence to: Kush Bhatia <kushbhatia@berkeley.edu>. We usually suppress the dependence of this history on the algorithm, but make it explicit whenever it is not clear from context.  ... 
dblp:conf/icml/AwasthiBGK22 fatcat:6lwb7y4f6zetzigdfledissu7m

Explaining robot policies

Olivia G Watkins, Sandy H Huang, Julius Frost, Kush Bhatia, Eric Weiner, Pieter Abbeel, Trevor Darrell, Bryan Plummer, Kate Saenko, Anca Dragan
2021 Applied AI Letters  
Conflict of interest: the authors declare no potential conflict of interests. Author contributions: Sandy Huang and Kush Bhatia, advised by Pieter Abbeel and Anca Dragan, conceived of and experimented …
doi:10.1002/ail2.52 fatcat:k27gs4a4krdqjatsa2rm5nitsa

Sparse Local Embeddings for Extreme Multi-label Classification

Kush Bhatia, Himanshu Jain, Purushottam Kar, Manik Varma, Prateek Jain
2015 Neural Information Processing Systems  
The objective in extreme multi-label learning is to train a classifier that can automatically tag a novel data point with the most relevant subset of labels from an extremely large label set. Embedding based approaches attempt to make training and prediction tractable by assuming that the training label matrix is low-rank and reducing the effective number of labels by projecting the high dimensional label vectors onto a low dimensional linear subspace. Still, leading embedding approaches have been unable to deliver high prediction accuracies, or scale to large problems, as the low-rank assumption is violated in most real world applications. In this paper we develop the SLEEC classifier to address both limitations. The main technical contribution in SLEEC is a formulation for learning a small ensemble of local distance preserving embeddings which can accurately predict infrequently occurring (tail) labels. This allows SLEEC to break free of the traditional low-rank assumption and boost classification accuracy by learning embeddings which preserve pairwise distances between only the nearest label vectors. We conducted extensive experiments on several real-world, as well as benchmark data sets and compared our method against state-of-the-art methods for extreme multi-label classification. Experiments reveal that SLEEC can make significantly more accurate predictions than the state-of-the-art methods, including both embedding-based (by as much as 35%) and tree-based (by as much as 6%) methods. SLEEC can also scale efficiently to data sets with a million labels, which are beyond the pale of leading embedding methods.
dblp:conf/nips/BhatiaJKVJ15 fatcat:y42jjydpzzhcjkkrmqysys7pf4

Statistical and Computational Trade-offs in Variational Inference: A Case Study in Inferential Model Selection [article]

Kush Bhatia, Nikki Lijing Kuang, Yi-An Ma, Yixin Wang
2022 arXiv   pre-print
Variational inference has recently emerged as a popular alternative to classical Markov chain Monte Carlo (MCMC) in large-scale Bayesian inference. The core idea of variational inference is to trade statistical accuracy for computational efficiency: it aims to approximate the posterior, reducing computation costs but potentially compromising statistical accuracy. In this work, we study this statistical and computational trade-off in variational inference via a case study in inferential model selection. Focusing on Gaussian inferential models (a.k.a. variational approximating families) with diagonal plus low-rank precision matrices, we initiate a theoretical study of the trade-offs in two aspects, Bayesian posterior inference error and frequentist uncertainty quantification error. From the Bayesian posterior inference perspective, we characterize the error of the variational posterior relative to the exact posterior. We prove that, given a fixed computation budget, a lower-rank inferential model produces variational posteriors with a higher statistical approximation error, but a lower computational error; it reduces variances in stochastic optimization and, in turn, accelerates convergence. From the frequentist uncertainty quantification perspective, we consider the precision matrix of the variational posterior as an uncertainty estimate. We find that, relative to the true asymptotic precision, the variational approximation suffers from an additional statistical error originating from the sampling uncertainty of the data. Moreover, this statistical error becomes the dominant factor as the computation budget increases. As a consequence, for small datasets, the inferential model need not be full-rank to achieve optimal estimation error. We finally demonstrate these statistical and computational trade-offs in inference across empirical studies, corroborating the theoretical findings.
arXiv:2207.11208v1 fatcat:o42bmco6gnc63mxgjbsbrqwrem
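
The inferential family in question is a Gaussian whose precision is a diagonal plus a rank-k term; a small sketch of sampling from it (the direct Cholesky factorization below ignores the rank-k structure a real implementation would exploit, and all names are illustrative):

```python
import numpy as np

def sample_dpl_gaussian(mu, d, B, n_samples, rng):
    """Draw samples from N(mu, (diag(d) + B B^T)^{-1}), the diagonal
    plus rank-k precision family.  We factor the full p x p precision
    for simplicity; a practical sampler would use the Woodbury
    identity to stay cheap in the rank k rather than the dimension p."""
    Lam = np.diag(d) + B @ B.T               # precision matrix
    L = np.linalg.cholesky(Lam)              # Lam = L L^T
    eps = rng.standard_normal((n_samples, mu.shape[0]))
    # solving L^T x = eps gives Cov[x] = (L L^T)^{-1} = Lam^{-1}
    return mu + np.linalg.solve(L.T, eps.T).T
```

The rank k of B is the knob the paper studies: larger k buys a better posterior approximation at a higher per-iteration cost.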
Showing results 1 — 15 out of 102 results