Juan Claude Formanek, Marcel Hedman, Kale-ab Tessera, Christopher C. Holmes, Jonathan P. Shock
26 May 2026
Many of the settings where reinforcement learning could matter most are both social and data-limited: agents must act in the presence of other decision-makers, yet cannot rely on online interaction to learn how to do so. Current evaluations do not target data-constrained social reasoning. Standard offline RL benchmarks treat the environment as non-social, while multi-agent benchmarks focus exclusively on fully cooperative settings with fixed background agents. Thus, the challenge of reasoning about partner identity and motivations under mixed incentives, and of generalizing across social structures from offline data alone, remains untested. We introduce Molten Pot, an evaluation protocol, set of datasets, and benchmark for offline mixed-motive social reinforcement learning built on Melting Pot substrates. The protocol spans five substrates, 47 social scenarios, and approximately one terabyte of trajectory data, and it defines three complementary evaluation settings that each probe a different aspect of social robustness. Setting 1 selects single scenarios and tests the performance of standard offline RL in mixed-motive settings where the background population is kept fixed. Setting 2 pools datasets across every scenario of a substrate, requiring the learnt policy to handle varying partner-agent behavior without explicit context. Setting 3 evaluates zero-shot social generalization through disjoint train/test splits that isolate specific social shifts. Finally, we benchmark four representative offline RL algorithms on our evaluation protocol and datasets, finding clear limitations in current methods’ ability to learn robust social strategies from offline data alone. Molten Pot establishes offline social evaluation as a distinct and necessary target for further reinforcement learning research.
Venue: ICML 2026 Workshop on Decision-Making from Offline Datasets to Online Adaptation (poster)
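The three evaluation settings described in the abstract differ mainly in how scenario datasets are selected and split. The sketch below illustrates that selection logic only. It is not the Molten Pot API: the `ScenarioDataset` record, the function names, the placeholder substrate and scenario labels, and the paths are all hypothetical, and it assumes that Setting 3 splits scenarios within a single substrate, which the abstract does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioDataset:
    """Hypothetical record pointing at one scenario's offline trajectories."""
    substrate: str  # placeholder substrate label
    scenario: str   # placeholder scenario label
    path: str       # placeholder location of the trajectory data


def setting1_single_scenario(index, substrate, scenario):
    """Setting 1: one scenario, so the background population is fixed."""
    return [d for d in index
            if d.substrate == substrate and d.scenario == scenario]


def setting2_pooled(index, substrate):
    """Setting 2: pool every scenario of a substrate.

    The learner would receive the pooled trajectories without a scenario
    label, so partner behavior varies without explicit context.
    """
    return [d for d in index if d.substrate == substrate]


def setting3_disjoint_split(index, substrate, held_out):
    """Setting 3: train and test on disjoint sets of scenarios.

    Assumption: the split is taken within one substrate.
    """
    pool = setting2_pooled(index, substrate)
    train = [d for d in pool if d.scenario not in held_out]
    test = [d for d in pool if d.scenario in held_out]
    # Guard against leakage: no scenario may appear on both sides.
    assert not {d.scenario for d in train} & {d.scenario for d in test}
    return train, test


# Placeholder index; real substrate and scenario names are not given here.
index = [
    ScenarioDataset("substrate_a", "scenario_0", "data/a/0"),
    ScenarioDataset("substrate_a", "scenario_1", "data/a/1"),
    ScenarioDataset("substrate_a", "scenario_2", "data/a/2"),
    ScenarioDataset("substrate_b", "scenario_0", "data/b/0"),
]

single = setting1_single_scenario(index, "substrate_a", "scenario_1")
pooled = setting2_pooled(index, "substrate_a")
train, test = setting3_disjoint_split(index, "substrate_a", {"scenario_2"})
```

Keeping the split at the scenario level, rather than splitting individual trajectories, is what would make the Setting 3 evaluation zero-shot with respect to the held-out social configurations.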