Active Learning in RL with Human Feedback
In reinforcement learning (RL), incorporating human feedback provides a way to inject domain-specific insight into the agent's learning process. Conventional RL algorithms maximize a reward signal through interaction with the environment, but specifying that reward precisely is often difficult: it requires domain knowledge to characterize which agent behaviors are "good" and which are "bad." Human feedback also plays a critical role in imposing action constraints that ensure safety and fairness. Research in this area shows significant promise for improving the adaptability of RL algorithms across a wide range of real-world applications, yet comprehensive scientific investigation remains scarce. One of my forthcoming research objectives is therefore to contribute substantively to the development of this field.
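As a concrete illustration of how human feedback can stand in for a hand-specified reward, the sketch below fits a reward model from pairwise preferences under the standard Bradley-Terry assumption, where the probability that a human prefers trajectory A over B is a logistic function of their reward difference. All specifics here (linear reward features, the synthetic "human" labeler, learning rate, iteration count) are illustrative assumptions, not part of the original text.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: each trajectory is summarized by a 3-dim feature
# vector, and the (unknown) reward the human implicitly uses is linear
# in those features: r(traj) = true_w . phi(traj).
true_w = np.array([1.0, -2.0, 0.5])

n_pairs = 500
A = rng.normal(size=(n_pairs, 3))  # features of first trajectory in each pair
B = rng.normal(size=(n_pairs, 3))  # features of second trajectory

# Simulate human preference labels via the Bradley-Terry model:
# P(A preferred over B) = sigmoid(r(A) - r(B)).
p_prefer_A = 1.0 / (1.0 + np.exp(-(A @ true_w - B @ true_w)))
labels = (rng.random(n_pairs) < p_prefer_A).astype(float)  # 1.0 if A preferred

# Fit reward weights w by gradient ascent on the preference log-likelihood.
w = np.zeros(3)
lr = 0.1
for _ in range(2000):
    diff = (A - B) @ w
    pred = 1.0 / (1.0 + np.exp(-diff))
    grad = (A - B).T @ (labels - pred) / n_pairs
    w += lr * grad

# The learned reward direction should align with the human's implicit one.
cosine = w @ true_w / (np.linalg.norm(w) * np.linalg.norm(true_w))
print(f"cosine similarity between learned and true reward: {cosine:.3f}")
```

The learned weights recover the direction of the implicit reward without anyone ever writing down a reward function; in a full RLHF pipeline this fitted model would then supply the reward signal for a standard RL algorithm.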