
Abstract
What does it take to learn purely from the consequences of your actions — no labels, no corrections, just a signal telling you how well you did? Reinforcement learning (RL) answers that question, and it turns out the solutions are elegant, surprising, and occasionally alarming. This talk builds the core ideas from the ground up: what makes RL distinct from supervised learning, how to formalise the problem, and how an agent can learn good behaviour through trial and error. We use Q-learning as our worked example, building intuition before formalism. We close by examining where RL has genuinely succeeded — game-playing, robotics, training large language models — where it fails in instructive ways, and what the remaining talks in this series do about it. No prior RL knowledge assumed.
Bio
St John Grimbly is a PhD candidate in Applied Mathematics at the University of Cape Town, where he works on active inference — a Bayesian framework for understanding how agents learn, decide, and act under uncertainty. His earlier research focused on model-based and causal reinforcement learning (Honours) and causally reasoning agents (MSc).