reinforcement learning

The Synergy of LLMs & RL Unlocks Offline Learning of Generalizable Language-Conditioned Policies with Low-fidelity Data

We train LLM agents as language-conditioned policies without requiring expensive labeled data or online experimentation. The framework uses LLMs to make unlabeled datasets usable for training and to improve generalization to unseen goals and states.

Thomas Pouplin, Kasia Kobalczyk, Hao Sun, Mihaela van der Schaar

May 1, 2025