Abstract

Many real-world problems are extremely difficult and combinatorial in nature, and can require complex coordination. This level of complexity can cause even well-trained RL systems to hit a performance ceiling that they cannot break through with zero-shot inference. Fortunately, many practical settings offer a compute budget at inference time, and this budget can be used to close the gap between zero-shot performance and the optimum. In this talk, I’ll introduce inference strategies for RL through the lens of combinatorial optimisation (CO), using the Travelling Salesman Problem as a running example. We’ll cover how CO problems are framed for RL and why the gap between zero-shot and optimal performance opens up, then walk through a progression of inference strategies, from simple sampling, to tree search, to methods that adapt the policy at inference time. The goal is to leave you with a clear picture of how the performance of RL agents can be enhanced at inference time.

Bio

Noah De Nicola recently completed his MSc in Applied Mathematics at UCT, under the supervision of Jonathan Shock, with a thesis on Reinforcement Learning for Combinatorial Optimisation. He now works as a Research Engineer at InstaDeep, focusing on inference strategies for multi-agent RL systems. He co-authored MEMENTO (NeurIPS 2025 spotlight) on memory-enhanced neural solvers for routing problems, and contributed to Breaking the Performance Ceiling (NeurIPS 2025 oral) on inference strategies for RL.