Abstract

Offline Reinforcement Learning (RL) has emerged as a critical paradigm for real-world AI deployment, particularly in domains where active data collection is prohibitively expensive or physically dangerous. By enabling agents to learn optimal policies from fixed, pre-collected datasets, Offline RL bypasses the need for risky online exploration. However, this shift introduces a significant challenge: extrapolation error that accumulates when the agent evaluates out-of-distribution (OOD) actions. This talk will explore the fundamental principles of Offline RL and the algorithmic strategies used to mitigate distributional shift, then extend these concepts to the multi-agent setting through the lens of my PhD research on Offline Multi-Agent RL (MARL).

Bio

Claude Formanek completed his undergraduate studies in Mathematics and Computer Science at the University of Cape Town (UCT), where he continued on to an MSc and, eventually, a PhD. Under the supervision of Jonathan Shock, his doctoral research focused on Multi-Agent Reinforcement Learning (MARL), specifically the challenges of learning from static datasets. During his PhD, Claude published two papers on Offline MARL at NeurIPS, and he submitted his thesis for examination in January 2026. In addition to his academic roots at UCT, he brings years of industry experience as a Machine Learning Research Engineer at InstaDeep.