$\mathbf{Q}$- and $\mathbf{A}$-Learning Methods for Estimating Optimal Dynamic Treatment Regimes

Phillip J. Schulte, Anastasios A. Tsiatis, Eric B. Laber, Marie Davidian
2014 Statistical Science  
In clinical practice, physicians make a series of treatment decisions over the course of a patient's disease based on his/her baseline and evolving characteristics. A dynamic treatment regime is a set of sequential decision rules that operationalizes this process. Each rule corresponds to a decision point and dictates the next treatment action based on the accrued information. Using existing data, a key goal is estimating the optimal regime, that, if followed by the patient population, would yield the most favorable outcome on average. Q- and A-learning are two main approaches for this purpose. We provide a detailed account of these methods, study their performance, and illustrate them using data from a depression study.
doi:10.1214/13-sts450 pmid:25620840 pmcid:PMC4300556 fatcat:wbofrw46qrb7jeavcv742l4lze