Safe Reinforcement Learning in mHealth

Eric Laber (Duke University)



An optimal mHealth strategy for type 1 diabetes (T1D) maximizes long-term patient health by tailoring recommendations for diet, exercise, and insulin to the unique biology and evolving health status of each patient. We develop a response-adaptive randomization method that learns an optimal intervention strategy while controlling the risk of adverse events. The method, NP-TS, uses a variant of Thompson Sampling (TS) to facilitate learning and maximizes efficiency while providing strict control of the probability of an adverse event. We illustrate the application of NP-TS using data from a pilot mHealth study on T1D.

