Off-Policy Evaluation for Low-Rank Tensor Markov Decision Processes

Abstract

Within the line of research on tensor learning, we study off-policy evaluation by introducing a tensor MDP framework. This framework captures the dynamics of sequential decision-making processes in which the state-action features are represented as tensors. Using tensor features in their original form as inputs (for instance, in neuro-imaging) preserves critical spatial information that could be diminished or lost when the data are flattened into vector covariates for use in traditional modeling approaches. When the Q function can be approximated by a tensor parameter with a low-rank structure, we develop a method for estimating this low-rank tensor within the dynamics of sequential decision-making processes. We establish theoretical guarantees for the proposed estimation algorithm, laying a foundation for integrating tensor methods into the reinforcement learning (RL) setting.
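
To make the modeling assumption concrete, the following is a minimal sketch, not the estimation algorithm from this work. It assumes a linear-in-tensor form in which the Q value is the inner product between a low-rank parameter tensor and a state-action feature tensor. The CP factorization, the dimensions, the rank, and the names `cp_tensor` and `q_value` are all illustrative assumptions rather than details taken from the abstract.

```python
import numpy as np

def cp_tensor(factors):
    """Build a 3-way tensor of CP rank at most R from factors of shape (d_k, R)."""
    A, B, C = factors
    return np.einsum("ir,jr,kr->ijk", A, B, C)

def q_value(theta, x):
    """Assumed linear-in-tensor Q value: the inner product <theta, x>."""
    return float(np.tensordot(theta, x, axes=theta.ndim))

rng = np.random.default_rng(0)
dims, rank = (4, 5, 6), 2          # hypothetical feature dimensions and rank

# Low-rank parameter tensor and one state-action feature tensor.
factors = [rng.standard_normal((d, rank)) for d in dims]
theta = cp_tensor(factors)
x = rng.standard_normal(dims)
q = q_value(theta, x)

# Parameter counts: full tensor versus its CP factors.
n_full = int(np.prod(dims))        # 4 * 5 * 6
n_low_rank = rank * sum(dims)      # 2 * (4 + 5 + 6)
```

Under these assumptions the factors hold 30 numbers while the full tensor has 120 entries, which illustrates why a low-rank structure can reduce the amount of data needed for estimation while keeping the spatial layout of the features intact.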

Qiyu Han
Ph.D. Candidate in Quantitative Methods

My research interests include distributed robotics, mobile computing and programmable matter.