 Dynamic treatment regime

In medical research, a dynamic treatment regime (DTR) or adaptive treatment strategy is a set of rules for choosing effective treatments for individual patients. The treatment choices made for a particular patient are based on that individual's characteristics and history, with the goal of optimizing his or her long-term clinical outcome. A dynamic treatment regime is analogous to a policy in the field of reinforcement learning, and analogous to a controller in control theory. While most work on dynamic treatment regimes has been done in the context of medicine, the same ideas apply to time-varying policies in other fields, such as education, marketing, and economics.
History
Historically, medical research and the practice of medicine tended to rely on an acute care model for the treatment of all medical problems, including chronic illness (Wagner et al. 2001). More recently, the medical field has begun to look at long-term care plans to treat patients with a chronic illness. This shift in ideology, coupled with increased demand for evidence-based medicine and individualized care, has led to the application of sequential decision making research to medical problems and the formulation of dynamic treatment regimes.
Example
The figure below illustrates a hypothetical dynamic treatment regime for Attention Deficit Hyperactivity Disorder (ADHD). There are two decision points in this DTR. The initial treatment decision depends on the patient's baseline disease severity. The second treatment decision is a "responder/non-responder" decision: At some time after receiving the first treatment, the patient is assessed for response, i.e. whether or not the initial treatment has been effective. If so, that treatment is continued. If not, the patient receives a different treatment. In this example, for those who did not respond to initial medication, the second "treatment" is a package of treatments: the initial treatment plus behavior modification therapy. "Treatments" can be defined as whatever interventions are appropriate, whether they take the form of medications or other therapies.
Optimal Dynamic Treatment Regimes
The decisions of a dynamic treatment regime are made in the service of producing favorable clinical outcomes in patients who follow it. To make this more precise, the following mathematical framework is used:
Mathematical Formulation
For a series of decision time points t = 1,...,T, define A_{t} to be the treatment ("action") chosen at time point t, and define O_{t} to be all clinical observations made at time t, immediately prior to treatment A_{t}. A dynamic treatment regime, π = (π_{1},...,π_{T}), consists of a set of rules, one for each time point t, for choosing treatment A_{t} based on the clinical observations up to that point. Thus π_{t}(o_{1},a_{1},...,o_{t − 1},a_{t − 1},o_{t}) is a function of the past and current observations, (o_{1},...,o_{t}), and past treatments, (a_{1},...,a_{t − 1}), which returns a choice of the current treatment, a_{t}.
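The rules π_{t} can be written down directly as functions of the history. The sketch below encodes a two-stage regime mirroring the ADHD example; the treatment names and the shape of the rules are illustrative assumptions, not a prescribed implementation.

```python
# A two-stage dynamic treatment regime pi = (pi_1, pi_2) as explicit
# decision rules: each rule maps the history so far to a treatment.
# Treatment and observation labels follow the ADHD example (assumptions).

def pi_1(o1):
    """First-stage rule: choose initial treatment from baseline severity."""
    return "low-dose medication" if o1 == "more severe" else "low-dose bmod"

def pi_2(o1, a1, o2):
    """Second-stage rule: continue responders; for non-responders, the
    choice depends on the first-stage treatment a1."""
    if o2 == "response":
        return "continue treatment"
    return "augment w/bmod" if a1 == "low-dose medication" else "increase bmod dose"

regime = (pi_1, pi_2)
```

Applying the regime to one patient history: `pi_1("more severe")` selects the initial treatment, and `pi_2` is then evaluated on the observed response at the second decision point.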
Also observed at each time point is a measure of success called a reward, R_{t}. The goal of a dynamic treatment regime is to make decisions that result in the largest possible expected sum of rewards, R = ∑ _{t} R _{t}. A dynamic treatment regime π ^{*} is optimal if it satisfies

π ^{*} ∈ arg max_{π} E[R | treatments are chosen according to π],

where E is an expectation over possible observations and rewards. The quantity E[R | treatments are chosen according to π] is often referred to as the value of π.
In the example above, the possible first treatments for A_{1} are "Low-Dose Bmod" and "Low-Dose Medication". The possible second treatments for A_{2} are "Increase Bmod Dose", "Continue Treatment", and "Augment w/Bmod". The observations O_{1} and O_{2} are the labels on the arrows: The possible O_{1} are "Less Severe" and "More Severe", and the possible O_{2} are "Non-Response" and "Response". The rewards R_{1},R_{2} are not shown; one reasonable possibility for reward would be to set R_{1} = 0 and set R_{2} to a measure of classroom performance after a fixed amount of time.
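When the distribution of observations and responses is known (or simulated), the value E[R | π] can be approximated by Monte Carlo: simulate many patient trajectories under π and average the total reward. The response model and reward values below are made-up toys, chosen only so the sketch runs end to end.

```python
import random

# Monte Carlo estimate of the value of a regime: average total reward
# over simulated trajectories.  All probabilities/rewards are assumptions.

def pi_1(o1):
    return "low-dose medication" if o1 == "more severe" else "low-dose bmod"

def pi_2(o1, a1, o2):
    return "continue" if o2 == "response" else "augment"

def trajectory_reward(rng):
    o1 = rng.choice(["less severe", "more severe"])   # baseline severity
    a1 = pi_1(o1)
    p_resp = 0.6 if o1 == "less severe" else 0.4      # toy response model
    o2 = "response" if rng.random() < p_resp else "non-response"
    a2 = pi_2(o1, a1, o2)
    # R_1 = 0 and R_2 = terminal outcome, as in the reward choice above.
    return 1.0 if o2 == "response" else 0.3

def estimate_value(n=10000, seed=0):
    rng = random.Random(seed)
    return sum(trajectory_reward(rng) for _ in range(n)) / n
```

With these toy numbers the true value is 0.5·(0.6·1.0 + 0.4·0.3) + 0.5·(0.4·1.0 + 0.6·0.3) = 0.65, and the Monte Carlo average converges to it as n grows.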
Delayed Effects
To find an optimal dynamic treatment regime, it might seem reasonable to find the optimal treatment that maximizes the immediate reward at each time point and then patch these treatment steps together to create a dynamic treatment regime. However, this approach is shortsighted and can result in an inferior dynamic treatment regime, because it ignores the potential for the current treatment action to influence the reward obtained at more distant time points.
For example, a treatment may be desirable as a first treatment even if it does not achieve a high immediate reward. When treating some kinds of cancer, a particular medication may not result in the best immediate reward (best acute effect) among initial treatments. However, this medication may impose sufficiently low side effects that some non-responders are able to become responders with further treatment. Similarly, a treatment that is less effective acutely may lead to better overall rewards if it encourages or enables non-responders to adhere more closely to subsequent treatments.
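The reversal described above can be shown with two-stage arithmetic. The numbers below are invented for illustration: one option wins on immediate reward, the other wins on expected total reward once the delayed effect is counted.

```python
# Toy two-stage numbers (assumptions) showing why a greedy stage-1 choice
# can fail.  "acute": immediate reward 0.7, but only 30% of patients can
# benefit from stage-2 treatment.  "gentle": immediate reward 0.5, but 80%
# can benefit from stage 2 (stage-2 reward 1.0 if they do, else 0.0).

IMMEDIATE = {"acute": 0.7, "gentle": 0.5}
P_STAGE2_SUCCESS = {"acute": 0.3, "gentle": 0.8}

def expected_total(a, stage2_reward=1.0):
    return IMMEDIATE[a] + P_STAGE2_SUCCESS[a] * stage2_reward

greedy_choice = max(IMMEDIATE, key=IMMEDIATE.get)          # picks "acute"
best_choice = max(IMMEDIATE, key=expected_total)           # picks "gentle"
# Greedy maximization of the immediate reward selects "acute" (0.7 > 0.5),
# but the expected totals are 0.7 + 0.3 = 1.0 versus 0.5 + 0.8 = 1.3,
# so the delayed effect reverses the ranking.
```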
Estimating Optimal Dynamic Treatment Regimes
Dynamic treatment regimes can be developed in the framework of evidence-based medicine, where clinical decision making is informed by data on how patients respond to different treatments. The data used to find optimal dynamic treatment regimes consist of the sequence of observations and treatments (o_{1},a_{1},o_{2},a_{2},...,o_{T},a_{T})_{i} for multiple patients i = 1,...,n along with those patients' rewards R_{t}. A central difficulty is that intermediate outcomes both depend on previous treatments and determine subsequent treatment. However, if treatment assignment is independent of potential outcomes conditional on past observations—i.e., treatment is sequentially unconfounded—a number of algorithms exist to estimate the causal effect of time-varying treatments or dynamic treatment regimes.
While this type of data can be obtained through careful observation, it is often preferable to collect data through experimentation if possible. The use of experimental data, where treatments have been randomly assigned, is preferred because it helps eliminate bias caused by unobserved confounding variables that influence both the choice of the treatment and the clinical outcome. This is especially important when dealing with sequential treatments, since these biases can compound over time. Given an experimental data set, an optimal dynamic treatment regime can be estimated from the data using a number of different algorithms. Inference can also be done to determine whether the estimated optimal dynamic treatment regime results in significant improvements in expected reward over an alternative dynamic treatment regime.
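One standard way to evaluate a candidate regime from sequentially randomized data is inverse probability weighting (cf. Hernán et al. 2006, Murphy et al. 2001): trajectories whose observed treatments agree with the regime are up-weighted by the inverse of their randomization probabilities. The data layout below (dictionaries with known stage-wise treatment probabilities p1, p2) is a hypothetical format chosen for the sketch.

```python
# Minimal inverse-probability-weighting (IPW) sketch for the value of a
# two-stage regime.  Assumed data format: each record holds the observed
# trajectory (o1, a1, o2, a2), the total reward, and the known
# randomization probabilities p1, p2 of the treatments actually given.

def ipw_value(data, regime):
    pi_1, pi_2 = regime
    total = 0.0
    for rec in data:
        # Indicator that the observed treatments agree with the regime.
        follows = (rec["a1"] == pi_1(rec["o1"])
                   and rec["a2"] == pi_2(rec["o1"], rec["a1"], rec["o2"]))
        if follows:
            total += rec["reward"] / (rec["p1"] * rec["p2"])
    return total / len(data)
```

Patients whose treatments deviate from the regime contribute zero; the weights correct for the fact that, under randomization, only a fraction of patients happen to follow any given regime.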
Experimental design
Experimental designs of clinical trials that generate data for estimating optimal dynamic treatment regimes involve an initial randomization of patients to treatments, followed by re-randomizations at each subsequent time point to another treatment. The re-randomizations at each subsequent time point may depend on information collected after previous treatments, but prior to assigning the new treatment, such as how successful the previous treatment was. These types of trials were introduced and developed in Lavori & Dawson (2000), Lavori (2003) and Murphy (2005) and are often referred to as SMART trials (Sequential Multiple Assignment Randomized Trial). Some examples of SMART trials are the CATIE trial for treatment of Alzheimer's disease (Schneider et al. 2001) and the STAR*D trial for treatment of major depressive disorder (Lavori et al. 2001, Rush, Trivedi & Fava 2003).
SMART trials attempt to mimic the decision-making that occurs in clinical practice, but still retain the advantages of experimentation over observation. They can be more involved than single-stage randomized trials; however, they produce the data trajectories necessary for estimating optimal policies that take delayed effects into account. Several suggestions have been made to attempt to reduce complexity and resources needed. One can combine data over same treatment sequences within different treatment regimes. One may also wish to split up a large trial into screening, refining, and confirmatory trials (Collins et al. 2005). One can also use fractional factorial designs rather than a full factorial design (Nair et al. 2008), or target primary analyses to simple regime comparisons (Murphy 2005).
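The randomization structure of a SMART can be sketched as a small simulation: an initial randomization, an interim response assessment, and a re-randomization that depends on it. The arm names and the choice to re-randomize only non-responders are illustrative assumptions about one common SMART layout, not a description of any specific trial.

```python
import random

# Structural sketch of SMART-style sequential randomization.
# Arm names and the response model are assumptions for illustration.

def run_smart_patient(rng):
    a1 = rng.choice(["med", "bmod"])        # initial randomization
    responded = rng.random() < 0.5          # interim response assessment (toy)
    if responded:
        a2 = "continue"                     # responders stay on treatment
    else:
        # Non-responders are re-randomized among second-stage options;
        # the option set could also depend on a1.
        a2 = rng.choice(["augment", "switch"])
    return a1, responded, a2

rng = random.Random(1)
trial = [run_smart_patient(rng) for _ in range(8)]
```

Every realized trajectory (a1, response, a2) is consistent with some embedded regime, which is what makes the resulting data usable for comparing regimes that take delayed effects into account.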
Reward construction
A critical part of finding the best dynamic treatment regime is the construction of a meaningful and comprehensive reward variable, R_{t}. To construct a useful reward, the goals of the treatment need to be well defined and quantifiable. The goals of the treatment can include multiple aspects of a patient's health and welfare, such as degree of symptoms, severity of side effects, time until treatment response, quality of life and cost. However, quantifying the various aspects of a successful treatment with a single function can be difficult, and work on providing useful decision making support that analyzes multiple outcomes is ongoing (Lizotte 2010). Ideally, the outcome variable should reflect how successful the treatment regime was in achieving the overall goals for each patient.
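One simple device for collapsing several treatment goals into a single R_{t} is a weighted sum of outcome components. The components and weights below are illustrative assumptions; choosing them well is exactly the difficulty discussed above, and multi-outcome methods (Lizotte 2010) exist precisely because a single weighting is often hard to justify.

```python
# A scalar reward as a weighted combination of treatment goals.
# Components and weights are assumptions for illustration only.

def composite_reward(symptom_reduction, side_effect_burden, cost,
                     w_symptom=1.0, w_side=0.5, w_cost=0.01):
    """Higher is better: reward symptom relief, penalize side effects and cost."""
    return (w_symptom * symptom_reduction
            - w_side * side_effect_burden
            - w_cost * cost)

r = composite_reward(symptom_reduction=0.8, side_effect_burden=0.2, cost=10.0)
# 1.0*0.8 - 0.5*0.2 - 0.01*10.0 = 0.6
```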
Variable selection and feature construction
Analysis is often improved by the collection of any variables that might be related to the illness or the treatment. This is especially important when data are collected by observation, to avoid bias in the analysis due to unmeasured confounders. Consequently, more observational variables are collected than are actually needed to estimate optimal dynamic treatment regimes. Thus, variable selection is often required as a preprocessing step on the data before algorithms used to find the best dynamic treatment regime are employed.
Algorithms and Inference
Several algorithms exist for estimating optimal dynamic treatment regimes from data. Many of these algorithms were developed in the field of computer science to help robots and computers make optimal decisions in an interactive environment. These types of algorithms are often referred to as reinforcement learning methods (Sutton & Barto 1998). The most popular of these methods used to estimate dynamic treatment regimes is called Q-learning (Watkins 1989). In Q-learning, models are fit sequentially to estimate the value of the treatment regime used to collect the data, and then the models are optimized with respect to the treatments to find the best dynamic treatment regime. Many variations of this algorithm exist, including modeling only portions of the value of the treatment regime (Murphy 2003, Robins 2004). Using model-based Bayesian methods, the optimal treatment regime can also be calculated directly from posterior predictive inferences on the effect of dynamic policies (Zajonc 2010).
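For a two-stage regime, Q-learning reduces to backward induction: fit the stage-2 value Q_{2}(o_{2},a_{2}) from observed rewards, then fit the stage-1 value Q_{1}(o_{1},a_{1}) against the pseudo-outcome max_{a} Q_{2}(o_{2},a). The tabular sketch below uses sample means in place of regression models; the discrete states and the data records are hypothetical.

```python
from collections import defaultdict

# Tabular two-stage Q-learning by backward induction (assumed discrete
# states/actions).  Each hypothetical record is (o1, a1, o2, a2, r2),
# with a single terminal reward r2.

def fit_q(data):
    # Stage 2: Q2(o2, a2) = mean observed reward for that state-action pair.
    sums2 = defaultdict(lambda: [0.0, 0])
    for o1, a1, o2, a2, r2 in data:
        s = sums2[(o2, a2)]
        s[0] += r2
        s[1] += 1
    q2 = {k: s[0] / s[1] for k, s in sums2.items()}

    # Pseudo-outcome for stage 1: the best achievable stage-2 value.
    def v2(o2):
        return max(q for (s, a), q in q2.items() if s == o2)

    # Stage 1: Q1(o1, a1) = mean pseudo-outcome for that state-action pair.
    sums1 = defaultdict(lambda: [0.0, 0])
    for o1, a1, o2, a2, r2 in data:
        s = sums1[(o1, a1)]
        s[0] += v2(o2)
        s[1] += 1
    q1 = {k: s[0] / s[1] for k, s in sums1.items()}
    return q1, q2
```

The estimated regime then chooses arg max_{a} Q_{t}(o_{t},a) at each stage; in practice the sample means are replaced by regression models over the (typically continuous) history.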
References
 Banerjee, A.; Tsiatis, A. A. (2006), "Adaptive two-stage designs in phase II clinical trials", Statistics in Medicine 25 (19): 3382–3395
 Collins, L. M.; Murphy, S. A.; Nair, V.; Strecher, V. (2005), "A strategy for optimizing and evaluating behavioral interventions", Annals of Behavioral Medicine 30: 65–73
 Guo, X.; Tsiatis, A. A. (2005), "Estimation of survival distributions in two-stage randomization designs with censored data", International Journal of Biostatistics 1 (1)
 Hernán, Miguel A.; Lanoy, Emilie; Costagliola, Dominique; Robins, James M. (2006), "Comparison of Dynamic Treatment Regimes via Inverse Probability Weighting", Basic & Clinical Pharmacology & Toxicology 98: 237–242
 Lavori, P. W.; Dawson, R. (2000), "A design for testing clinical strategies: biased adaptive within-subject randomization", Journal of the Royal Statistical Society, Series A 163: 29–38
 Lavori, P.W.; Rush, A.J.; Wisniewski, S.R.; Alpert, J.; Fava, M.; Kupfer, D.J.; Nierenberg, A.; Quitkin, F.M. et al. (2001), "Strengthening clinical effectiveness trials: Equipoise-stratified randomization", Biological Psychiatry 50: 792–801
 Lavori, P. W. (2003), "Dynamic treatment regimes: practical design considerations", Clinical Trials 1: 9–20
 Lizotte, D. L.; Bowling, M.; Murphy, S. A. (2010), "Efficient Reinforcement Learning with Multiple Reward Functions for Randomized Clinical Trial Analysis", Twenty-Seventh Annual International Conference on Machine Learning
 Lokhnygina, Y; Tsiatis, A. A. (2008), "Optimal twostage group sequential designs", Journal of Statistical Planning and Inference 138: 489–499
 Lunceford, J. K.; Davidian, M.; Tsiatis, A. A. (2002), "Estimation of survival distributions of treatment policies in two-stage randomization designs in clinical trials", Biometrics 58: 48–57
 Moodie, E. E. M.; Richardson, T. S.; Stephens, D. A. (2007), "Demystifying optimal dynamic treatment regimes", Biometrics 63: 447–455
 Murphy, Susan A.; van der Laan, M. J.; Robins, James M.; CPPRG (2001), "Marginal Mean Models for Dynamic Regimes", Journal of the American Statistical Association 96: 1410–1423
 Murphy, Susan A. (2003), "Optimal Dynamic Treatment Regimes", Journal of the Royal Statistical Society, Series B 65 (2): 331–366
 Murphy, Susan A. (2005), "An Experimental Design for the Development of Adaptive Treatment Strategies", Statistics in Medicine 24: 1455–1481
 Murphy, Susan A.; Almirall, Daniel (2008), "Dynamic Treatment Regimes", Encyclopedia of Medical Decision Making
 Nair, V.; Strecher, V.; Fagerlin, A.; Ubel, P.; Resnicow, K.; Murphy, S.; Little, R.; Chakraborty, B. et al. (2008), "Screening Experiments and Fractional Factorial Designs in Behavioral Intervention Research", The American Journal of Public Health 98 (8): 1534–1539
 Orellana, Liliana; Rotnitzky, Andrea; Robins, James M. (2010), "Dynamic Regime Marginal Structural Mean Models for Estimation of Optimal Dynamic Treatment Regimes, Part I: Main Content", The International Journal of Biostatistics 6 (2), http://www.bepress.com/ijb/vol6/iss2/8
 Orellana, Liliana; Rotnitzky, Andrea; Robins, James M. (2010), "Dynamic Regime Marginal Structural Mean Models for Estimation of Optimal Dynamic Treatment Regimes, Part II: Proofs of Results", The International Journal of Biostatistics 6 (2), http://www.bepress.com/ijb/vol6/iss2/9
 Robins, James M. (2004), "Optimal structural nested models for optimal sequential decisions", in Lin, D. Y.; Heagerty, P. J., Proceedings of the Second Seattle Symposium on Biostatistics, Springer, New York, pp. 189–326
 Robins, James M. (1986), "A new approach to causal inference in mortality studies with sustained exposure periods: application to control of the healthy worker survivor effect", Mathematical Modelling 7: 1393–1512
 Robins, James M. (1987), "Addendum to 'A new approach to causal inference in mortality studies with sustained exposure periods: application to control of the healthy worker survivor effect'", Computers and Mathematics with Applications 14: 923–945
 Rush, A.J.; Trivedi, M.; Fava, M. (2003), "Depression IV: STAR*D treatment trial for depression", American Journal of Psychiatry 160 (2): 237
 Schneider, L.S.; Tariot, P.N.; Lyketsos, C.G.; Dagerman, K.S.; Davis, K.L.; Davis, S.; Hsiao, J.K.; Jeste, D.V. et al. (2001), "National Institute of Mental Health clinical antipsychotic trials of intervention effectiveness (CATIE) Alzheimer disease trial methodology", American Journal of Geriatric Psychiatry 9 (4): 346–360
 Sutton, R. S.; Barto, A. G. (1998), Reinforcement Learning: An Introduction, MIT Press, ISBN 0262193981, http://www.cs.ualberta.ca/~sutton/book/ebook/thebook.html
 van der Laan, M. J.; Robins, James M. (2003), Unified Methods for Censored Longitudinal Data and Causality, Springer-Verlag, ISBN 0387955569
 van der Laan, M. J.; Petersen, M. L. (2004), History-Adjusted Marginal Structural Models and Statically-Optimal Dynamic Treatment Regimes, http://works.bepress.com/mark_van_der_laan/83
 Wagner, E. H.; Austin, B. T.; Davis, C.; Hindmarsh, M.; Schaefer, J.; Bonomi, A. (2001), "Improving Chronic Illness Care: Translating Evidence Into Action", Health Affairs 20 (6): 64–78
 Wahed, A. S.; Tsiatis, A. A. (2004), "Optimal estimator for the survival distribution and related quantities for treatment policies in two-stage randomization designs in clinical trials", Biometrics 60: 124–133
 Watkins, C. J. C. H. (1989), "Learning from Delayed Rewards", PhD thesis, Cambridge University, Cambridge, England
 Zhao, Y.; Kosorok, M. R.; Zeng, D. (2009), "Reinforcement learning design for cancer clinical trials", Statistics in Medicine 28: 3294–3315