67 Hits in 0.3 sec

Some Architectures for Chebyshev Interpolation [article]

Theja Tulabandhula
2010 arXiv   pre-print
Digital architectures for Chebyshev interpolation are explored and a variation which is word-serial in nature is proposed. These architectures are contrasted with equispaced system structures. Further, the Chebyshev interpolation scheme is compared to conventional equispaced interpolation with respect to reconstruction error and relative number of samples. It is also shown that the use of a hybrid (or dual) Analog to Digital converter unit can reduce system power consumption by as much as 1/3rd of the original.
arXiv:1001.1185v1 fatcat:6xey6fuon5aophvgwdlejkf3ya

Learning Personalized Optimal Control for Repeatedly Operated Systems [article]

Theja Tulabandhula
2016 arXiv   pre-print
We consider the problem of online learning of optimal control for repeatedly operated systems in the presence of parametric uncertainty. During each round of operation, the environment selects system parameters according to a fixed but unknown probability distribution. These parameters govern the dynamics of a plant. An agent chooses a control input to the plant and is then revealed the cost of that choice. In this setting, we design an agent that personalizes the control input to this plant, taking into account the stochasticity involved. We demonstrate the effectiveness of our approach on a simulated system.
arXiv:1609.05536v1 fatcat:uqas4n4l3jggngcejkx4327qau

Efficient Reinforcement Learning via Initial Pure Exploration [article]

Sudeep Raja Putta, Theja Tulabandhula
2017 arXiv   pre-print
In several realistic situations, an interactive learning agent can practice and refine its strategy before going on to be evaluated. For instance, consider a student preparing for a series of tests. She would typically take a few practice tests to know which areas she needs to improve upon. Based on the scores she obtains in these practice tests, she would formulate a strategy for maximizing her scores in the actual tests. We treat this scenario in the context of an agent exploring a fixed-horizon episodic Markov Decision Process (MDP), where the agent can practice on the MDP for some number of episodes (not necessarily known in advance) before starting to incur regret for its actions. During practice, the agent's goal is to maximize the probability of following an optimal policy. This is akin to the problem of Pure Exploration (PE). We extend the PE problem of Multi-Armed Bandits (MAB) to MDPs and propose a Bayesian algorithm called Posterior Sampling for Pure Exploration (PSPE), which is similar to its bandit counterpart. We show that the Bayesian simple regret converges at an optimal exponential rate when using PSPE. When the agent starts being evaluated, its goal is to minimize the cumulative regret incurred. This is akin to the problem of Reinforcement Learning (RL). The agent uses the Posterior Sampling for Reinforcement Learning (PSRL) algorithm initialized with the posteriors from the practice phase. We hypothesize that this PSPE + PSRL combination is an optimal strategy for minimizing regret in RL problems with an initial practice phase. We show empirical results indicating that a lower simple regret at the end of the practice phase results in lower cumulative regret during evaluation.
arXiv:1706.02237v1 fatcat:cebfjmlyofbz7dtfw2t2wj52iu
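The practice-then-evaluate split described in the abstract above can be illustrated on a toy Bernoulli bandit (a hedged sketch only: the paper's PSPE/PSRL operate on episodic MDPs, and the function and parameter names below are hypothetical). During practice the learner explores via posterior sampling without being charged regret; during evaluation it plays greedily under the practiced posterior:

```python
import random

def practice_then_evaluate(true_means, practice_rounds=2000, eval_rounds=1000, seed=0):
    """Toy Bernoulli-bandit analogue of a practice/evaluation split:
    posterior-sampling exploration during practice, then greedy play."""
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1] * k  # Beta posterior parameters per arm
    beta = [1] * k

    def pull(arm):
        return 1 if rng.random() < true_means[arm] else 0

    # Practice phase: posterior sampling; no regret is charged here.
    for _ in range(practice_rounds):
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        r = pull(arm)
        alpha[arm] += r
        beta[arm] += 1 - r

    # Evaluation phase: play greedily under the practiced posterior.
    best_guess = max(range(k), key=lambda i: alpha[i] / (alpha[i] + beta[i]))
    reward = sum(pull(best_guess) for _ in range(eval_rounds))
    return best_guess, reward

arm, reward = practice_then_evaluate([0.2, 0.5, 0.8])
```

With well-separated means and a long practice phase, the posterior mode almost surely identifies the best arm, mirroring the claim that lower simple regret after practice yields lower cumulative regret during evaluation.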

Robust Optimization using Machine Learning for Uncertainty Sets [article]

Theja Tulabandhula, Cynthia Rudin
2014 arXiv   pre-print
Our goal is to build robust optimization problems for making decisions based on complex data from the past. In robust optimization (RO) generally, the goal is to create a policy for decision-making that is robust to our uncertainty about the future. In particular, we want our policy to best handle the worst possible situation that could arise, out of an uncertainty set of possible situations. Classically, the uncertainty set is simply chosen by the user, or it might be estimated in overly simplistic ways with strong assumptions; whereas in this work, we learn the uncertainty set from data collected in the past. The past data are drawn randomly from an (unknown) possibly complicated high-dimensional distribution. We propose a new uncertainty set design and show how tools from statistical learning theory can be employed to provide probabilistic guarantees on the robustness of the policy.
arXiv:1407.1097v1 fatcat:uqyj5yqs3veqxh4y65siqicmtq
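The idea of learning an uncertainty set from past data, sketched in the abstract above, can be made concrete in one dimension (a simplification for illustration only; the paper's construction and its learning-theoretic guarantees are more general, and all names below are hypothetical):

```python
import statistics

def learn_uncertainty_ball(samples, coverage=0.9):
    """Learn a 1-D interval uncertainty set from past data: the empirical
    mean plus a radius covering the given fraction of observed deviations."""
    center = statistics.fmean(samples)
    deviations = sorted(abs(x - center) for x in samples)
    idx = min(len(deviations) - 1, int(coverage * len(deviations)))
    return center, deviations[idx]

def robust_decision(center, radius, decisions, cost):
    """Pick the decision minimizing worst-case cost over the interval.
    For a cost convex in the uncertain value, the worst case sits at an
    endpoint of the interval, so checking the two endpoints suffices."""
    def worst(d):
        return max(cost(d, center - radius), cost(d, center + radius))
    return min(decisions, key=worst)

data = [9.5, 10.1, 10.4, 9.9, 10.0, 10.6, 9.7, 10.2]  # hypothetical past demand
c, r = learn_uncertainty_ball(data)
best = robust_decision(c, r, decisions=[8, 10, 12],
                       cost=lambda d, u: (d - u) ** 2)
```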

Multi-Purchase Behavior: Modeling and Optimization [article]

Theja Tulabandhula, Deeksha Sinha, Prasoon Patidar
2020 arXiv   pre-print
(Rusmevichientong et al. 2010a, Sinha and Tulabandhula 2020). We refer the readers to Kök et al. (2008) for an overview of these optimization methods. ... Binary Search with Efficient Comparisons: a binary search based efficient algorithm for a single-choice model was first described in Sinha and Tulabandhula (2020), who evaluate the efficacy of using nearest ...
arXiv:2006.08055v1 fatcat:cq2el5bgrreqnkrohesvjcl36q

Privacy-preserving Targeted Advertising [article]

Theja Tulabandhula, Shailesh Vaya, Aritra Dhar
2018 arXiv   pre-print
Recommendation systems form the centerpiece of a rapidly growing trillion-dollar online advertisement industry. Even with numerous optimizations and approximations, collaborative filtering (CF) based approaches require real-time computations involving very large vectors. Curating and storing such profile information vectors on web portals seriously breaches the user's privacy. Modifying such systems to achieve private recommendations further requires communication of long encrypted vectors, making the whole process inefficient. We present a more efficient recommendation system alternative, in which user profiles are maintained entirely on their device, and appropriate recommendations are fetched from web portals in an efficient privacy-preserving manner. We base this approach on association rules.
arXiv:1710.03275v2 fatcat:3ncwkclgrffx5na7ihrkbqblxm
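The on-device, association-rule-based scheme described in the abstract above can be sketched as follows (an illustrative toy, not the paper's protocol; the rule format and all item names are hypothetical). The profile stays on the device and only generic rules are fetched from the portal:

```python
def recommend(user_profile, rules, top_k=3):
    """On-device recommendation from association rules. `user_profile` is a
    set of items the user engaged with and never leaves the device; `rules`
    are (antecedent_set, consequent_item, confidence) triples fetched from
    the portal. A rule fires when its antecedent is contained in the profile."""
    scores = {}
    for antecedent, consequent, confidence in rules:
        if antecedent <= user_profile and consequent not in user_profile:
            scores[consequent] = max(scores.get(consequent, 0.0), confidence)
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

rules = [
    ({"bread", "butter"}, "jam", 0.8),
    ({"bread"}, "milk", 0.6),
    ({"laptop"}, "mouse", 0.9),   # antecedent not in profile: rule ignored
]
recs = recommend({"bread", "butter"}, rules)
```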

Symmetry Learning for Function Approximation in Reinforcement Learning [article]

Anuj Mahajan, Theja Tulabandhula
2017 arXiv   pre-print
In this paper we explore methods to exploit symmetries for ensuring sample efficiency in reinforcement learning (RL); this problem deserves increasing attention given recent advances in the use of deep networks for complex RL tasks, which require large amounts of training data. We introduce a novel method to detect symmetries using reward trails observed during episodic experience and prove its completeness. We also provide a framework to incorporate the discovered symmetries for function approximation. Finally, we show that the use of potential-based reward shaping is especially effective for our symmetry exploitation mechanism. Experiments on various classical problems show that our method improves learning performance significantly by utilizing symmetry information.
arXiv:1706.02999v1 fatcat:f7vofrfqlrdubfasmdi67zo27q

Learning to Partition using Score Based Compatibilities [article]

Arun Rajkumar and Koyel Mukherjee and Theja Tulabandhula
2017 arXiv   pre-print
We study the problem of learning to partition users into groups, where one must learn the compatibilities between the users to achieve optimal groupings. We define four natural objectives that optimize for average and worst case compatibilities and propose new algorithms for adaptively learning optimal groupings. When we do not impose any structure on the compatibilities, we show that the group formation objectives considered are $NP$-hard to solve, and we either give approximation guarantees or prove inapproximability results. We then introduce an elegant structure, namely that of \textit{intrinsic scores}, that makes many of these problems polynomial time solvable. We explicitly characterize the optimal groupings under this structure and show that the optimal solutions are related to \emph{homophilous} and \emph{heterophilous} partitions, well-studied in the psychology literature. For one of the four objectives, we show $NP$-hardness under the score structure and give a $\frac{1}{2}$ approximation algorithm for which no constant approximation was known thus far. Finally, under the score structure, we propose an online low sample complexity PAC algorithm for learning the optimal partition. We demonstrate the efficacy of the proposed algorithm on synthetic and real world datasets.
arXiv:1703.07807v1 fatcat:55e5mr7ddraonmbx32loavlm3q
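The homophilous/heterophilous distinction under intrinsic scores, mentioned in the abstract above, can be illustrated with two simple constructions (hedged sketch: these are the standard constructions of such partitions, not necessarily the paper's optimal ones, and the scores are hypothetical):

```python
def homophilous_partition(scores, k):
    """Group similar-score users together: sort users by score and cut the
    order into k contiguous blocks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    size = -(-len(scores) // k)  # ceiling division
    return [order[i:i + size] for i in range(0, len(order), size)]

def heterophilous_partition(scores, k):
    """Spread scores across groups: deal the sorted users round-robin
    into k groups, so each group mixes low and high scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    groups = [[] for _ in range(k)]
    for rank, user in enumerate(order):
        groups[rank % k].append(user)
    return groups

scores = [0.9, 0.1, 0.5, 0.8, 0.2, 0.6]  # hypothetical intrinsic scores
homo = homophilous_partition(scores, 3)
hetero = heterophilous_partition(scores, 3)
```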

Bandits with Temporal Stochastic Constraints [article]

Priyank Agrawal, Theja Tulabandhula
2020 arXiv   pre-print
We study the effect of impairment on stochastic multi-armed bandits and develop new ways to mitigate it. The impairment effect is the phenomenon where an agent only accrues reward for an action if they have played it at least a few times in the recent past. It is practically motivated by repetition and recency effects in domains such as advertising (where consumer behavior may require repeat actions by advertisers) and vocational training (where actions are complex skills that can only be mastered with repetition to get a payoff). Impairment can be naturally modelled as a temporal constraint on the strategy space, and we provide two novel algorithms that achieve sublinear regret, each working under different assumptions on the impairment effect. We introduce a new notion called bucketing in our algorithm design, and show how it can effectively address impairment as well as a broader class of temporal constraints. Our regret bounds explicitly capture the cost of impairment and show that it scales (sub-)linearly with the degree of impairment. Our work complements recent work on modeling delays and corruptions, and we provide experimental evidence supporting our claims.
arXiv:1811.09026v2 fatcat:nn75yzk5jzbhlh3tpmxxcjifve
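The bucketing idea from the abstract above can be sketched on a toy impaired bandit (a rough illustration only, not the paper's algorithms or its regret analysis; all parameter names are hypothetical). The learner commits to one arm for a bucket of consecutive rounds so that the repetition threshold is met before reward starts accruing:

```python
import math
import random

def bucketed_ucb(means, horizon=3000, bucket=5, warmup=2, seed=1):
    """Toy bucketing sketch: a UCB learner commits to an arm for `bucket`
    consecutive rounds; under impairment, reward accrues only after `warmup`
    repeat plays inside the bucket."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k    # rewarded plays per arm
    sums = [0.0] * k    # rewarded payoff per arm
    total = 0.0
    t = 0
    while t < horizon:
        # UCB index over rewarded plays; unplayed arms are tried first.
        arm = next((i for i in range(k) if counts[i] == 0), None)
        if arm is None:
            n = sum(counts)
            arm = max(range(k), key=lambda i: sums[i] / counts[i]
                      + math.sqrt(2 * math.log(n) / counts[i]))
        for j in range(min(bucket, horizon - t)):
            r = 1.0 if rng.random() < means[arm] else 0.0
            if j >= warmup:          # impairment: early repeats pay nothing
                total += r
                sums[arm] += r
                counts[arm] += 1
            t += 1
    return total

reward = bucketed_ucb([0.3, 0.7])
```

With bucket length 5 and a warmup of 2, only 3 of every 5 rounds can pay out, which is the explicit "cost of impairment" the regret bounds account for.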

Incentivising Exploration and Recommendations for Contextual Bandits with Payments [article]

Priyank Agrawal, Theja Tulabandhula
2020 arXiv   pre-print
We propose a contextual bandit based model to capture the learning and social welfare goals of a web platform in the presence of myopic users. By using payments to incentivize these agents to explore different items/recommendations, we show how the platform can learn the inherent attributes of items and achieve sublinear regret while maximizing cumulative social welfare. We also calculate theoretical bounds on the cumulative costs of incentivization to the platform. Unlike previous works in this domain, we consider contexts to be completely adversarial, and the behavior of the adversary is unknown to the platform. Our approach can improve various engagement metrics of users on e-commerce stores, recommendation engines and matching platforms.
arXiv:2001.07853v1 fatcat:7ed2qdnxrze2bmna7ezzijo25e

Learning by Repetition: Stochastic Multi-armed Bandits under Priming Effect [article]

Priyank Agrawal, Theja Tulabandhula
2020 arXiv   pre-print
Our model and solution is one among a growing literature Kveton et al. [2015], den Boer and Keskin [2017], Shah et al. [2018], Tulabandhula and Wang [2020] that focuses on combining empirically ... Theja Tulabandhula and Yunjuan Wang. Thompson sampling for a fatigue-aware online recommendation system. In International Symposium on AI and Mathematics, 2020. Kuang Xu and Se-Young Yun. ...
arXiv:2006.10356v1 fatcat:6dxlhk2h65hfxdos4eqglj4b5e

On Combining Machine Learning with Decision Making [article]

Theja Tulabandhula, Cynthia Rudin
2014 arXiv   pre-print
The conference paper (Tulabandhula et al., 2011) contains a summary of work on the ML&TRP, and the paper Tulabandhula and Rudin (2013) provides a more complete explanation of the MLOC framework, ... The new framework of Machine Learning with Operational Costs (MLOC) (Tulabandhula and Rudin, 2013) provides a mechanism to do this, and is a type of exploratory decision theory. ...
arXiv:1104.5061v2 fatcat:gzagwuq25rbpxj3sqsh7dnmzgm

Faster Reinforcement Learning Using Active Simulators [article]

Vikas Jain, Theja Tulabandhula
2017 arXiv   pre-print
In this work, we propose several online methods to build a \emph{learning curriculum} from a given set of target-task-specific training tasks in order to speed up reinforcement learning (RL). These methods can decrease the total training time needed by an RL agent compared to training on the target task from scratch. Unlike traditional transfer learning, we consider creating a sequence from several training tasks in order to provide the most benefit in terms of reducing the total time to train. Our methods utilize the learning trajectory of the agent on the curriculum tasks seen so far to decide which tasks to train on next. An attractive feature of our methods is that they are weakly coupled to the choice of the RL algorithm as well as the transfer learning method. Further, when there is domain information available, our methods can incorporate such knowledge to further speed up the learning. We experimentally show that these methods can be used to obtain suitable learning curricula that speed up the overall training time on two different domains.
arXiv:1703.07853v2 fatcat:qg5mqb2sejgr3mj7s6s64hicay
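The curriculum construction in the abstract above picks the next training task from the agent's learning trajectory. One simple instantiation of that idea (a hedged sketch using a common progress-based heuristic, not the paper's methods; all names and numbers below are hypothetical) selects the task showing the highest recent learning progress:

```python
import random

def next_task(progress_history, epsilon=0.1, rng=None):
    """Curriculum step: pick the training task whose recent learning progress
    (change in evaluation return between the last two checkpoints) is largest,
    with epsilon-greedy exploration over tasks. Tasks with fewer than two
    measurements are prioritized so every task gets tried."""
    rng = rng or random.Random(0)
    tasks = list(progress_history)
    if rng.random() < epsilon:
        return rng.choice(tasks)

    def progress(task):
        h = progress_history[task]
        return abs(h[-1] - h[-2]) if len(h) >= 2 else float("inf")

    return max(tasks, key=progress)

# Hypothetical returns logged on three curriculum tasks over three checkpoints.
history = {"easy": [0.2, 0.8, 0.81], "medium": [0.1, 0.3, 0.6], "hard": [0.0, 0.0, 0.05]}
task = next_task(history)
```

Here the "easy" task has plateaued and the "hard" task is barely moving, so the learner spends its next episodes on "medium", where progress is fastest.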

Optimizing Revenue over Data-driven Assortments [article]

Deeksha Sinha, Theja Tulabandhula
2018 arXiv   pre-print
We revisit the problem of large-scale assortment optimization under the multinomial logit choice model without any assumptions on the structure of the feasible assortments. Scalable real-time assortment optimization has become essential in e-commerce operations due to the need for personalization and the availability of a large variety of items. While this can be done when there are simplistic assortment choices to be made, not imposing any constraints on the collection of feasible assortments gives more flexibility to incorporate insights of store-managers and historically well-performing assortments. We design fast and flexible algorithms based on variations of binary search that find the revenue of the (approximately) optimal assortment. We speed up the comparison steps using novel vector space embeddings, based on advances in the information retrieval literature. For an arbitrary collection of assortments, our algorithms can find a solution in time that is sub-linear in the number of assortments and for the simpler case of cardinality constraints - linear in the number of items (existing methods are quadratic or worse). Empirical validations using the Billion Prices dataset and several retail transaction datasets show that our algorithms are competitive even when the number of items is $\sim 10^5$ ($100$x larger instances than previously studied).
arXiv:1708.05510v2 fatcat:wegsx7tcqve65o2huqtyvpyf4i
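The binary-search flavor of the abstract above can be seen in the classical unconstrained special case (a textbook sketch, not the paper's constrained algorithms or its embedding-based comparison speedups; the no-purchase weight `v0` and all numbers are hypothetical). A revenue level lam is achievable under MNL iff sum_i v_i * max(p_i - lam, 0) >= lam * v0, and this condition is monotone in lam, so binary search finds the optimum:

```python
def optimal_mnl_revenue(prices, weights, v0=1.0, tol=1e-9):
    """Binary search for the optimal expected MNL revenue over unconstrained
    assortments: the optimum is the largest achievable revenue level lam."""
    def achievable(lam):
        return sum(v * max(p - lam, 0.0)
                   for p, v in zip(prices, weights)) >= lam * v0

    lo, hi = 0.0, max(prices)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if achievable(mid):
            lo = mid
        else:
            hi = mid
    # The optimal assortment is revenue-ordered: all items priced above lam.
    assortment = [i for i, p in enumerate(prices) if p > lo]
    return lo, assortment

rev, items = optimal_mnl_revenue(prices=[10.0, 8.0, 5.0], weights=[1.0, 2.0, 4.0])
```

Each `achievable` check is one pass over the items; the paper's contribution is making the analogous comparison steps fast when the assortments come from an arbitrary collection.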

An Online Algorithm for Learning Buyer Behavior under Realistic Pricing Restrictions [article]

Debjyoti Saharoy, Theja Tulabandhula
2018 arXiv   pre-print
We propose a new efficient online algorithm to learn the parameters governing the purchasing behavior of a utility-maximizing buyer, who responds to prices, in a repeated interaction setting. The key feature of our algorithm is that it can learn even non-linear buyer utility while working with arbitrary price constraints that the seller may impose. This overcomes a major shortcoming of previous approaches, which use unrealistic prices to learn these parameters, making them unsuitable in practice.
arXiv:1803.01968v1 fatcat:g3mgyshkm5elrml5e5fl2o34i4
Showing results 1 — 15 out of 67 results