Statistical Inference for Contextual Bandits with Delayed Action Effects
In many real-world settings, the consequences of an action are not immediately apparent and become observable only after some time has elapsed. In healthcare, for instance, a medication administered to a patient does not take effect instantly, so the current reward or outcome is tied to past actions. While some existing literature addresses this temporal aspect through time-discounted or delayed effects, statistical inference becomes crucial in sensitive applications such as medication administration: extra caution is required before making any recommendation about drug intake, which calls for rigorous statistical evaluation.

The application-driven motivation for this line of research is rooted in the study of medication effects for diabetes, where past treatments have prolonged effects on a patient's current well-being. A key challenge in this setting is that the extended impact of treatment actions violates the Markov assumption. Within the scope of this study, my objective is to develop a methodology that transforms conventional problem settings, predominantly centered on off-policy evaluation, into Markov Decision Processes. This transformation will facilitate the application of established reinforcement learning (RL) algorithms to this domain.
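One standard way to carry out such a transformation is state augmentation: if the delayed effects of an action vanish after a fixed number of steps, appending a window of recent actions to the observed context yields a state that satisfies the Markov property. The sketch below illustrates this idea; the function name `augment_state` and the `window` parameter are hypothetical choices for illustration, not part of the proposed methodology.

```python
import numpy as np

def augment_state(context, action_history, window=3):
    """Append the last `window` actions to the context vector.

    With delayed action effects, the reward depends on past actions,
    so the raw context alone is not a Markov state. Stacking a fixed
    window of recent actions restores the Markov property, under the
    assumption that an action's effect vanishes after `window` steps.
    """
    # Left-pad with a null action (0) so early steps have a fixed-length history.
    padded = ([0] * window + list(action_history))[-window:]
    return np.concatenate([np.asarray(context, dtype=float),
                           np.asarray(padded, dtype=float)])

# Example: a 2-dimensional patient context with one past treatment recorded.
state = augment_state([0.5, 1.0], action_history=[1], window=3)
# state now holds the context followed by the padded action window.
```

Once states are augmented this way, the delayed-effect bandit can be treated as an ordinary MDP, and standard off-policy evaluation or RL methods can operate on the augmented state.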