Within tensor learning, I focus on off-policy evaluation through a tensor MDP framework. This framework captures the dynamics of sequential decision-making processes in which the state-action features are represented as tensors. Using tensor features in their original form as inputs, as in neuroimaging, preserves spatial information that can be diminished or lost when the data are flattened into vector covariates for traditional modeling approaches. When the Q function can be approximated using a tensor parameter with low-rank structure, we develop a method for estimating this low-rank tensor as the sequential decision-making process evolves. Theoretical guarantees are established for the proposed estimation algorithm, laying the foundation for a pioneering integration of tensor methods into the RL setting.
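
To make the low-rank assumption concrete, one illustrative formulation is given here. The symbols are notation assumed for exposition, and the CP decomposition is one common choice of low-rank structure, not necessarily the one adopted in this work. With a tensor feature $\mathcal{X}(s,a) \in \mathbb{R}^{d_1 \times \cdots \times d_K}$, the Q function is taken to be approximately linear in a tensor parameter $\Theta$ of the same dimensions:
\[
Q(s,a) \approx \langle \Theta, \mathcal{X}(s,a) \rangle,
\qquad
\Theta = \sum_{r=1}^{R} w_r \, u_r^{(1)} \circ \cdots \circ u_r^{(K)},
\]
where $\langle \cdot, \cdot \rangle$ is the entrywise inner product, $\circ$ is the outer product, $w_r \in \mathbb{R}$, and $u_r^{(k)} \in \mathbb{R}^{d_k}$. Under this assumed form, $\Theta$ is described by $R\,(1 + d_1 + \cdots + d_K)$ numbers, compared with $d_1 d_2 \cdots d_K$ for an unstructured tensor, which is what makes estimation from limited trajectory data plausible.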