The PAILA project, undertaken during our InstaDeep internship, aims to strengthen single-environment Reinforcement Learning (RL) algorithms through cross-environment knowledge sharing. We began with Symmetric Learning Agents (SymLA), a meta-reinforcement learning algorithm that builds the symmetries of backpropagation-based learning into black-box meta-learners to improve their generalisation. To improve SymLA’s efficacy, we added architectural layers and state bootstrapping. However, after observing that our agent overfitted to its training environments, we shifted our focus to discovering RL algorithms within the mirror learning framework, which provides theoretical guarantees for RL algorithms such as Proximal Policy Optimization (PPO).
Batsi Ziki and Jaren Cohen recently completed their honours degrees at the University of Cape Town and concluded a six-month internship at InstaDeep, where their primary focus was Meta-Reinforcement Learning. Both are now pursuing postgraduate studies.
23 August 2023