A detailed, 100+ page analysis of 18 LLMs for embodied decision making.
arXiv: https://arxiv.org/abs/2410.07166
Website: https://embodied-agent-interface.github.io
The research focuses on evaluating how well Large Language Models (LLMs) can make decisions in environments where physical actions are…
Problem: We aim to evaluate Large Language Models (LLMs) for embodied decision making. Although a significant body of work leverages LLMs for decision making in embodied environments, we still lack a systematic understanding of their performance, because they are usually applied in different domains, for different purposes, and with different inputs and outputs. Furthermore, existing evaluations tend to rely solely on a final success rate, which makes it difficult to pinpoint which abilities LLMs lack and where the problem lies. This in turn prevents embodied agents from leveraging LLMs effectively and selectively.