In Reinforcement Learning there is often a need for greater sample efficiency when learning an optimal policy, whether due to the complexity of the problem or the difficulty of obtaining data. One family of approaches to this problem is Assisted Reinforcement Learning, in which external information is transferred to the agent, for example in the form of advice offered by a domain expert. However, these approaches often break down when advice comes from multiple experts, who may contradict one another; in general, experts (especially humans) can give incorrect advice. Our work investigates how an RL agent can benefit from the good advice it receives while remaining robust to bad advice, and how it can exploit consensus and contradiction among a panel of experts to maximise the information gained.
Tamlin Love is a lecturer at the University of the Witwatersrand, where he received a master's degree in Computer Science. He is an active member of the RAIL lab, and his research interests include Reinforcement Learning and Robotics. His particular focus is Assisted Reinforcement Learning, looking at the problem of human-agent interaction: how agents can better understand humans, and how humans can adapt to better teach and interact with agents.
19 October 2022