<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>3 | Academic</title><link>https://qiyuhan.netlify.app/publication-type/3/</link><atom:link href="https://qiyuhan.netlify.app/publication-type/3/index.xml" rel="self" type="application/rss+xml"/><description>3</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><lastBuildDate>Wed, 21 Dec 2022 00:00:00 +0000</lastBuildDate><image><url>https://qiyuhan.netlify.app/media/icon_hu0b7a4cb9992c9ac0e91bd28ffd38dd00_9727_512x512_fill_lanczos_center_3.png</url><title>3</title><link>https://qiyuhan.netlify.app/publication-type/3/</link></image><item><title>Online Statistical Inference for Matrix Contextual Bandit.</title><link>https://qiyuhan.netlify.app/publication/conference-paper-1/</link><pubDate>Wed, 21 Dec 2022 00:00:00 +0000</pubDate><guid>https://qiyuhan.netlify.app/publication/conference-paper-1/</guid><description>&lt;div class="alert alert-note">
&lt;div>
Click the &lt;em>link&lt;/em> below to view our work, published on arXiv.
&lt;/div>
&lt;/div>
&lt;p>&lt;a href="https://arxiv.org/abs/2212.11385" target="_blank" rel="noopener">View our work&lt;/a>&lt;/p></description></item><item><title>Active Learning in RL with Human Feedback</title><link>https://qiyuhan.netlify.app/project/project-2/</link><pubDate>Tue, 21 Dec 2021 00:00:00 +0000</pubDate><guid>https://qiyuhan.netlify.app/project/project-2/</guid><description>&lt;p>In the realm of RL, the incorporation of human feedback serves as a framework for enhancing the agent&amp;rsquo;s learning process through domain-specific insights. While conventional RL algorithms strive to maximize reward by interacting with their environment, defining what precisely constitutes a &amp;ldquo;reward&amp;rdquo; is often a complex task. Such definitions demand relevant domain knowledge to appropriately specify &amp;ldquo;good&amp;rdquo; and &amp;ldquo;bad&amp;rdquo; agent behaviors. Moreover, human feedback often plays a critical role in establishing action constraints to ensure safety and fairness. Research in this domain shows significant promise for enhancing the adaptability of RL algorithms across a wide range of real-world applications. However, there is still a lack of comprehensive scientific investigations in this area. 
As such, one of my forthcoming research objectives is to contribute substantively to the development of this field.&lt;/p></description></item><item><title>Off-Policy Evaluation For Low-Rank Tensor Markov Decision Processes.</title><link>https://qiyuhan.netlify.app/publication/conference-paper-2/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://qiyuhan.netlify.app/publication/conference-paper-2/</guid><description/></item><item><title>Statistical Inference for Contextual Bandit with Delayed Action Effects</title><link>https://qiyuhan.netlify.app/project/project-1/</link><pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate><guid>https://qiyuhan.netlify.app/project/project-1/</guid><description>&lt;p>In many real-world situations, the consequences of actions are not immediately apparent and may only become observable after some time has elapsed. For instance, in healthcare settings where medications are administered to patients, a drug&amp;rsquo;s effects are not instantly evident; often, the current reward or outcome depends on past actions. Although some existing literature has addressed this temporal aspect through time-discounted effects or delayed consequences, statistical inference becomes crucial in sensitive applications like medication administration: extra caution is required before making any recommendation regarding drug intake, highlighting the need for rigorous statistical evaluation. The application-driven motivation for this line of research is rooted in the study of medication effects for diabetes, where historical medical treatments have prolonged effects on patients&amp;rsquo; current well-being. A significant challenge in this context is the violation of the Markov assumption due to the extended impact of treatment actions. 
Within the scope of this study, my objective is to develop a methodology that transforms conventional problem settings, predominantly centered on off-policy evaluation, into Markov Decision Processes. This transformation will facilitate the application of established RL algorithms to this domain.&lt;/p></description></item></channel></rss>