Collaboration:
- This is joint work with Will Wei Sun and Yichen Zhang, currently under first-round revision at the Annals of Statistics (AOS).
Key Contributions:
- Introduced a novel methodology enabling online estimation and debiasing for low-rank estimators under a contextual bandit framework with a wide range of decision-making policies.
- Developed a new error bound for sequential low-rank SGD estimation with adaptively collected data.
- Constructed valid confidence intervals for both the low-rank parameters and the optimal value, using data collected via a wide range of bandit policies.

Qiyu Han, Will Wei Sun, Yichen Zhang

Develop frameworks to incorporate human feedback into reinforcement learning to refine reward structures, ensuring agent behaviors align with domain-specific quality and safety standards.
Advance research on the critical role of human feedback in establishing action constraints for safety and fairness, aiming to improve RL algorithm adaptability in real-world applications.

Collaboration:
- This is joint work with Pratik Ramprasad, Zhengling Qi, and Will Wei Sun. To be submitted to the Journal of the American Statistical Association (JASA); manuscript available upon request.
Key Contributions:
- Developed a batch off-policy evaluation method that captures the low-rank structure in the tensor parameter of the state-action value (Q) function.
- Developed novel concentration inequalities for the suprema of certain matrix-valued empirical processes under Markovian noise.

Pratik Ramprasad, Qiyu Han, Zhengling Qi, Will Wei Sun

Investigate the delayed effects of actions in critical settings, such as healthcare, where the impact of medications emerges over time and is linked to past decisions.
Conduct statistical inference on delayed feedback in contextual bandit frameworks, aiming to rigorously evaluate drug effects and optimize decision-making in healthcare applications.