Although Large Language Models (LLMs) have become the status quo in the Natural Language Processing community, most existing state-of-the-art models must be trained directly on language tasks to acquire language skills. Humans, by contrast, can learn language indirectly, as a byproduct of pursuing non-language objectives.
Motivated by this observation, a question arises: Can embodied reinforcement learning agents acquire language skills in a similarly indirect manner? In the new paper Simple Embodied Language Learning as a Byproduct of Meta-Reinforcement Learning, a Stanford University research team investigates this question in a customized multi-task environment and confirms that simple language skills can emerge in meta-RL agents without direct language supervision.
The main focus of this work is to investigate whether RL agents can learn language indirectly, without any language supervision. To this end, the team first designs an office navigation environment where the goal is to find a target office as quickly as possible.
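To make the setup concrete, here is a minimal sketch of such an environment, not the paper's actual implementation: all class, method, and observation names below are hypothetical. The key property it illustrates is that language (the textual floor plan) appears only as part of the observation, while the reward depends solely on reaching the target office.

```python
import random

class OfficeGridEnv:
    """Hypothetical minimal sketch of a 2D office-navigation task.

    Language enters only as part of the observation (a textual floor plan);
    the reward depends solely on reaching the target office, so any reading
    skill the agent acquires is an indirect byproduct of solving the task.
    """

    SIZE = 5
    ACTIONS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}  # up/down/left/right

    def reset(self):
        # A new target office location defines a new task.
        self.target = (random.randrange(self.SIZE), random.randrange(self.SIZE))
        self.pos = (self.SIZE // 2, self.SIZE // 2)  # agent starts in the center
        return self._obs()

    def _obs(self):
        # The "floor plan" is plain text; nothing forces the agent to parse it.
        plan = f"target office at row {self.target[0]} column {self.target[1]}"
        return {"position": self.pos, "floor_plan": plan}

    def step(self, action):
        dr, dc = self.ACTIONS[action]
        self.pos = (
            min(max(self.pos[0] + dr, 0), self.SIZE - 1),
            min(max(self.pos[1] + dc, 0), self.SIZE - 1),
        )
        done = self.pos == self.target
        reward = 1.0 if done else -0.01  # success bonus; small step penalty
        return self._obs(), reward, done
```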
In exploring this customized office environment, the team aims to answer the following four questions:
- Our main question: Can agents learn language without explicit language supervision?
- Can agents learn to read other modalities beyond language, such as a pictorial map?
- What factors impact language emergence?
- Do these results scale to 3D environments with high-dimensional pixel observations?
To find out whether language can emerge, the team first trains DREAM on the 2D office environment with language floor plans. They…
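As a rough schematic of what reward-only training looks like in this setting, the loop below pairs with the environment sketch above; the agent interface (`act`, `update`) is a placeholder, not DREAM's actual API. The point it illustrates is that the floor-plan text is never a training target, only an input the agent may learn to exploit.

```python
def meta_train(agent, env, num_tasks=10_000):
    """Schematic reward-only training loop (hypothetical agent interface)."""
    for _ in range(num_tasks):
        obs, done, trajectory = env.reset(), False, []
        while not done:
            # The policy conditions on the position and the floor-plan text.
            action = agent.act(obs)
            next_obs, reward, done = env.step(action)
            trajectory.append((obs, action, reward, next_obs))
            obs = next_obs
        # The update is driven by task reward alone; no language labels exist.
        agent.update(trajectory)
```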