ShockLab

AUTHORS

Ruan de Kock, Arnu Pretorius, Jonathan Shock

DATE PUBLISHED

28 May 2025

Abstract

One of the core challenges frequently cited in the multi-agent reinforcement learning (MARL) literature as motivation for framing a sequential decision-making problem as a multi-agent problem, rather than as a centralised single-agent problem, is the exponential growth of the action space with the number of agents. The assumption that this is always a challenge suggests that the exponentially larger action space poses two specific problems for centralised approaches: (1) overwhelming memory requirements and (2) low sample efficiency due to the large optimisation space. Although this assumption is a core tenet within the MARL community, few works have concretely tested it empirically in a controlled setting to give some indication of its severity in practice. In this work, we compare fully centralised learning with fully decentralised learning. Using a novel N-agent array game akin to the canonical Climbing matrix game, we re-establish a well-known result: fully centralised learning is able to find the globally optimal solution while decentralised learning fails. We further demonstrate that these trends hold for more modern MARL benchmarks that run on hardware accelerators and leverage the computational efficiency gains of the JAX framework.
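
To make the scale of the problem concrete, here is a minimal sketch (not taken from the paper) of how the centralised joint action space grows with the number of agents. It assumes every agent shares the same discrete action set of size `n_actions`; the function name is illustrative, not from the authors' codebase.

```python
def joint_action_space_size(n_agents: int, n_actions: int) -> int:
    """Size of the centralised (joint) action space.

    A centralised controller must pick one action per agent, so the
    joint space has n_actions ** n_agents elements, while each
    decentralised learner only ever sees n_actions choices.
    """
    return n_actions ** n_agents


# With 3 actions per agent (as in the canonical Climbing matrix game),
# the joint space grows exponentially in the number of agents:
for n_agents in (2, 5, 10):
    print(n_agents, joint_action_space_size(n_agents, 3))
```

With 10 agents and only 3 actions each, the centralised learner already faces a joint space of 3^10 = 59,049 actions, which illustrates the memory and sample-efficiency concerns the abstract describes.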