
Optimal Sequential Exploration: Bandits, Clairvoyants, and Wildcats

David B. Brown, James E. Smith
2013 Operations Research  
The analysis relies heavily on results for bandit superprocesses, a generalization of the multiarmed bandit problem.  ...  This paper was motivated by the problem of developing an optimal policy for exploring an oil and gas field in the North Sea. Where should we drill first? Where do we drill next?  ...  Acknowledgments The authors are grateful to Jo Eidsvik and Gabriele Martinelli for sparking interest in this problem, for sharing their model for the North Sea example, and for many helpful conversations  ... 
doi:10.1287/opre.2013.1164 fatcat:d2yhzb2vkfeeba5u7uhhbleuda

Machine Learning Techniques for Stackelberg Security Games: a Survey [article]

Giuseppe De Nittis, Francesco Trovò
2016 arXiv   pre-print
, then describes how to face the problem of having attacker's payoffs not defined and how to estimate them and, finally, presents how online learning techniques have been exploited to learn a model of  ...  After a brief introduction on Stackelberg Security Games (SSGs) and the poaching setting, the rest of the work presents how to model a boundedly rational attacker taking into account her human behavior  ...  Whittle proposed a heuristic index policy for RMABs by considering the Lagrangian relaxation of the problem [24].  ... 
arXiv:1609.09341v1 fatcat:mglaovlwvvevxcbk7waal72aoe

Long range dependency and forecasting of housing price index and mortgage market rate: evidence of subprime crisis

Nadhem Selmi, Nejib Hachicha
2015 Management Science Letters  
Table 3 presents the results of the Local Whittle and Local Polynomial Whittle of GARMA (p,d,q) estimators.  ...  In this work, since we proposed a new non stationary process jointly with a reliable and robust wavelet-based estimation technique, it is necessary to give out a procedure of the forecast for this novel  ... 
doi:10.5267/j.msl.2015.3.012 fatcat:j4gqnmjbnzhfviwlotudcbaro4

Spectral Subsampling MCMC for Stationary Time Series [article]

Robert Salomone, Matias Quiroz, Robert Kohn, Mattias Villani, Minh-Ngoc Tran
2020 arXiv   pre-print
We propose a novel technique for speeding up MCMC for time series data by efficient data subsampling in the frequency domain.  ...  For several challenging time series models, we demonstrate a speedup of up to two orders of magnitude while incurring negligible bias compared to MCMC on the full dataset.  ...  Quiroz et al. (2019a) propose speeding up MCMC for large n by replacing L_n(θ) with an estimate L(θ, u) based on a small random subsample of m ≪ n observations, where u = (u_1, ..., u_m) indexes the  ... 
arXiv:1910.13627v2 fatcat:k7npozn55jcmzg3grado7qm6ju

NEO: NEuro-Inspired Optimization—A Fractional Time Series Approach

Sarthak Chatterjee, Subhro Das, Sérgio Pequito
2021 Frontiers in Physiology  
We provide evidence of the efficacy of the proposed method on a wide variety of settings implicitly found in practice.  ...  Solving optimization problems is a recurrent theme across different fields, including large-scale machine learning systems and deep learning.  ...  AUTHOR CONTRIBUTIONS SC, SD, and SP performed the research. SC was responsible for the execution of the numerical experiments and wrote the manuscript with revisions by SD and SP.  ... 
doi:10.3389/fphys.2021.724044 pmid:34621183 pmcid:PMC8491743 fatcat:nsvpjljvhrezbkj6ngkqopxybe

Sequential Decision Making with Limited Observation Capability: Application to Wireless Networks [article]

Kesav Kaza, Rahul Meshram, Varun Mehta, S.N.Merchant
2019 arXiv   pre-print
The Whittle-index policy for solving the LRB problem is analyzed; indexability of LRBs is shown.  ...  Further, closed-form index expressions are provided for two sets of special cases; for more general cases, an algorithm for index computation is provided.  ...  The index formula for each interval is given as follows. 1) For π ∈ A_1, the Whittle index is W(π) = ρ(π). 2) For π ∈ A_2, we consider the following cases. a) If γ_0(p_{1,0}) ≥ π, then the Whittle index is  ... 
arXiv:1801.01301v2 fatcat:f3dl4tl2gfh4bbmrp54eprvbru
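
Once an index W(π) is available for each arm, the Whittle-index policy itself is simple to state: in every slot, activate the arms whose current index is largest. The sketch below shows only that selection step. The function name `whittle_index_policy`, the `budget` parameter, and the example index values are hypothetical and are not taken from the paper; computing the indices themselves depends on the model and is not attempted here.

```python
def whittle_index_policy(indices, budget):
    """Return the arms to activate under an index policy.

    indices: current index value of each arm (assumed already computed).
    budget:  number of arms that may be activated in this slot.
    Ties are broken in favour of the lower arm number (an arbitrary choice).
    """
    if not 0 <= budget <= len(indices):
        raise ValueError("budget must be between 0 and the number of arms")
    # Sort arm numbers by decreasing index, then by arm number for ties.
    order = sorted(range(len(indices)), key=lambda i: (-indices[i], i))
    return sorted(order[:budget])


if __name__ == "__main__":
    # Hypothetical index values for four arms, with two activations allowed.
    print(whittle_index_policy([0.2, 0.9, 0.5, 0.7], budget=2))
```

The policy is a heuristic in general; the selection step is the same whether the indices come from a closed-form expression or from a numerical routine.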

Customer Acquisition via Display Advertising Using Multi-Armed Bandit Experiments

Eric M. Schwartz, Eric T. Bradlow, Peter S. Fader
2017 Marketing science (Providence, R.I.)  
Finally, we show that customer acquisition would decrease about 10% if the firm were to optimize click through rates instead of conversion directly, a finding that has implications for understanding the  ...  Within a campaign, firms try to adapt to intermediate results of their tests, optimizing what they earn while learning about their ads.  ...  After a model update at time t, we utilize the uncertainty around parameters β j to obtain the key distribution for our implementation of TS, the joint predictive distribution of ad conversion rates for  ... 
doi:10.1287/mksc.2016.1023 fatcat:oxkaeege6zeo3oopaxf7bvb5oy
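
The snippet refers to Thompson sampling (TS) driven by a predictive distribution of ad conversion rates. As a rough illustration of the idea only, the sketch below uses the textbook Beta-Bernoulli version of TS with independent arms, which is a simplification and not the model described in the paper. The function name, the conversion rates, and the uniform Beta(1, 1) priors are all assumptions made for the example.

```python
import numpy as np


def thompson_sampling(true_rates, n_rounds, seed=0):
    """Beta-Bernoulli Thompson sampling; returns how often each arm was played.

    true_rates: hypothetical conversion probability of each ad (unknown to the policy).
    """
    rng = np.random.default_rng(seed)
    k = len(true_rates)
    successes = np.zeros(k)
    failures = np.zeros(k)
    pulls = np.zeros(k, dtype=int)
    for _ in range(n_rounds):
        # Draw one plausible rate per arm from its Beta posterior (uniform prior).
        draws = rng.beta(successes + 1.0, failures + 1.0)
        arm = int(np.argmax(draws))
        # Simulate a conversion (1) or no conversion (0) for the chosen arm.
        reward = int(rng.random() < true_rates[arm])
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls


if __name__ == "__main__":
    print(thompson_sampling([0.1, 0.3, 0.6], n_rounds=3000))
```

Sampling from the posterior instead of using its mean is what makes the policy explore: an arm with few observations has a wide posterior and is occasionally drawn high enough to be played.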

Deep Reinforcement Learning for Simultaneous Sensing and Channel Access in Cognitive Networks [article]

Yoel Bokobza, Ron Dabora, Kobi Cohen
2021 arXiv   pre-print
To achieve this goal, we develop a novel algorithm that learns both access and sensing policies via deep Q-learning, dubbed Double Deep Q-network for Sensing and Access (DDQSA).  ...  To the best of our knowledge, this is the first paper that solves both sensing and access policies for DSA via deep Q-learning.  ...  In this work, we develop a novel algorithm for a single agent that learns both access and sensing policies via deep Q-learning, dubbed Double Deep Q-network for Sensing and Access (DDQSA).  ... 
arXiv:2110.14541v1 fatcat:3p5ntbzatrdexoyut73lorljxe

Scheduling in Time-correlated Wireless Networks with Imperfect CSI and Stringent Constraint [article]

Wenzhuo Ouyang, Atilla Eryilmaz, Ness B. Shroff
2014 arXiv   pre-print
In this work, we incorporate a stringent constraint on the simultaneously scheduled users and propose a low-complexity scheduling algorithm that dynamically implements user scheduling and dummy packet  ...  In recent work, a low-complexity optimal solution was developed for this problem under a long-term time-average resource constraint.  ...  The RMBP is Whittle indexable if every project is Whittle indexable.  ... 
arXiv:1403.7773v1 fatcat:3khj2fvirjbazfk2d6s5acd3f4

Bayesian Structure Learning for Stationary Time Series [article]

Alex Tank, Nicholas Foti, Emily Fox
2015 arXiv   pre-print
We leverage a Whittle likelihood approximation and define a conjugate prior---the hyper complex inverse Wishart---on the complex-valued and graph-constrained spectral matrices.  ...  We take a Bayesian approach to structure learning, placing priors on (i) the graph structure and (ii) spectral matrices given the graph.  ...  Motivated by the connection between GGMs and our TGMs, and the analogous structure of our TGM-based Whittle likelihood of Eq. (10) to that of a GGM with N i.i.d. observations, we propose a novel hyper  ... 
arXiv:1505.03131v2 fatcat:654al3dctrdohjqjzaqzaued4m
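
For readers unfamiliar with the Whittle likelihood mentioned in the snippet, the sketch below shows the standard univariate form: the log-likelihood is approximated by minus the sum over Fourier frequencies of log f(ω) + I(ω)/f(ω), where I is the periodogram and f is the model spectral density. The paper works with multivariate series and spectral matrices; the univariate AR(1) model, the function names, and the simulated data here are assumptions chosen to keep the example small.

```python
import numpy as np


def periodogram(x):
    """Periodogram at the Fourier frequencies 2*pi*k/n, k = 1..floor((n-1)/2)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    x = x - x.mean()
    k = np.arange(1, (n - 1) // 2 + 1)
    freqs = 2.0 * np.pi * k / n
    dft = np.fft.fft(x)[k]
    return freqs, np.abs(dft) ** 2 / (2.0 * np.pi * n)


def ar1_spectral_density(freqs, phi, sigma2):
    """Spectral density of x_t = phi * x_{t-1} + e_t with Var(e_t) = sigma2."""
    return sigma2 / (2.0 * np.pi * (1.0 - 2.0 * phi * np.cos(freqs) + phi ** 2))


def whittle_loglik(x, phi, sigma2):
    """Whittle approximation to the Gaussian log-likelihood (up to a constant)."""
    freqs, pgram = periodogram(x)
    f = ar1_spectral_density(freqs, phi, sigma2)
    return -np.sum(np.log(f) + pgram / f)


def simulate_ar1(n, phi, sigma2, seed=0):
    """Simulate a stationary AR(1) series for the demonstration."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(scale=np.sqrt(sigma2), size=n)
    x = np.empty(n)
    x[0] = eps[0] / np.sqrt(1.0 - phi ** 2)  # start from the stationary variance
    for t in range(1, n):
        x[t] = phi * x[t - 1] + eps[t]
    return x


if __name__ == "__main__":
    x = simulate_ar1(4000, phi=0.6, sigma2=1.0)
    grid = np.linspace(-0.9, 0.9, 181)
    phi_hat = grid[np.argmax([whittle_loglik(x, p, 1.0) for p in grid])]
    print("grid estimate of phi:", phi_hat)
```

Because the approximation treats periodogram ordinates at different frequencies as independent, the likelihood factorizes over frequencies, which is what makes frequency-domain priors and subsampling convenient.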

Nonparametric collective spectral density estimation with an application to clustering the brain signals [article]

Mehdi Maadooliat, Ying Sun, Tianbo Chen
2017 arXiv   pre-print
In this paper, we develop a method for the simultaneous estimation of spectral density functions (SDFs) for a collection of stationary time series that share some common features.  ...  A web-based shiny App found at "" is developed for visualization, training and learning the SDFs collectively using the proposed technique.  ...  Conclusion A novel approach for collectively estimating multiple SDFs was developed in this paper.  ... 
arXiv:1704.03907v3 fatcat:s67vg2o2ofgzhetqlfzecz4noq

Cooperative Multi-Agent Reinforcement Learning Based Distributed Dynamic Spectrum Access in Cognitive Radio Networks [article]

Xiang Tan, Li Zhou, Haijun Wang, Yuli Sun, Haitao Zhao, Boon-Chong Seet, Jibo Wei, Victor C.M. Leung
2021 arXiv   pre-print
We employ the deep recurrent Q-network (DRQN) to address the partial observability of the state for each cognitive user.  ...  The ultimate goal is to learn a cooperative strategy which maximizes the sum throughput of cognitive radio network in distributed fashion without coordination information exchange between cognitive users  ...  The algorithm is model-based and in fact it is a single-agent Q-Learning framework implemented independently on SBSs.  ... 
arXiv:2106.09274v1 fatcat:cb5767uktrespkeoqmsvh2bxpq

Robust Restless Bandits: Tackling Interval Uncertainty with Deep Reinforcement Learning [article]

Jackson A. Killian, Lily Xu, Arpita Biswas, Milind Tambe
2021 arXiv   pre-print
novel deep reinforcement learning algorithm for solving RMABs.  ...  To address this, we formulate the adversary oracle as a multi-agent reinforcement learning problem and solve it with a multi-agent extension of RMABPPO, which may be of independent interest as the first  ...  Acknowledgments and Disclosure of Funding  ... 
arXiv:2107.01689v1 fatcat:zkenanwvcvdj3a2xiljx6s7gy4

Dynamic Assortment with Demand Learning for Seasonal Consumer Goods

Felipe Caro, Jérémie Gallien
2007 Management science  
Focusing on a stylized version of this problem, we study a finite horizon multiarmed bandit model with several plays per stage and Bayesian learning.  ...  It yields a closed-form dynamic index policy capturing the key exploration versus exploitation trade-off and associated suboptimality bounds.  ...  Finally, the second author is indebted to Martha Nieto for a conversation about Zara that sparked his interest in fast-fashion companies and was key to the genesis of this project.  ... 
doi:10.1287/mnsc.1060.0613 fatcat:s2zxaxjm6zgmzhhind4snnyyci

Dynamic spectrum access with deep Q-learning in densely occupied and partially observable environments

Slavica Tomović, Igor Radusinović
2021 Telfor Journal  
We have developed a novel Deep Reinforcement Learning (DRL) based DSA method which combines a double deep Q-learning architecture with a recurrent neural network and takes advantage of a prioritized experience  ...  Compared with other DRL methods for DSA, the proposed solution can find a near-optimal policy in a smaller number of iterations and suits a wider range of communication environments, including dynamic  ...  However, when the channels are correlated, the Myopic and Whittle Index approaches cannot be applied.  ... 
doi:10.5937/telfor2101001t fatcat:alp7qmzn65frfoc4w6hhgrtin4