{"id":214986,"date":"2025-05-29T05:20:56","date_gmt":"2025-05-29T10:20:56","guid":{"rendered":"https:\/\/lifeboat.com\/blog\/2025\/05\/3dllm-mem-long-term-spatial-temporal-memory-for-embodied-3d-large-language-model"},"modified":"2025-05-29T05:20:56","modified_gmt":"2025-05-29T10:20:56","slug":"3dllm-mem-long-term-spatial-temporal-memory-for-embodied-3d-large-language-model","status":"publish","type":"post","link":"https:\/\/lifeboat.com\/blog\/2025\/05\/3dllm-mem-long-term-spatial-temporal-memory-for-embodied-3d-large-language-model","title":{"rendered":"3DLLM-Mem: Long-Term Spatial-Temporal Memory for Embodied 3D Large Language Model"},"content":{"rendered":"<p><\/p>\n<p><iframe style=\"display: block; margin: 0 auto; width: 100%; aspect-ratio: 4\/3; object-fit: contain;\" src=\"https:\/\/www.youtube.com\/embed\/PbnWizEJL8w?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; encrypted-media; gyroscope;\n   picture-in-picture\" allowfullscreen><\/iframe><\/p>\n<p>Humans excel at performing complex tasks by leveraging long-term memory across temporal and spatial experiences. In contrast, current Large Language Models (LLMs) struggle to effectively plan and act in dynamic, multi-room 3D environments. We posit that part of this limitation is due to the lack of proper 3D spatial-temporal memory modeling in LLMs. To address this, we first introduce 3DMem-Bench, a comprehensive benchmark comprising over 26,000 trajectories and 2,892 embodied tasks, question-answering and captioning, designed to evaluate an agent\u2019s ability to reason over long-term memory in 3D environments. Second, we propose 3DLLM-Mem, a novel dynamic memory management and fusion model for embodied spatial-temporal reasoning and actions in LLMs.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Humans excel at performing complex tasks by leveraging long-term memory across temporal and spatial experiences. In contrast, current Large Language Models (LLMs) struggle to effectively plan and act in dynamic, multi-room 3D environments. We posit that part of this limitation is due to the lack of proper 3D spatial-temporal memory modeling in LLMs. To address [\u2026]<\/p>\n","protected":false},"author":709,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6],"tags":[],"class_list":["post-214986","post","type-post","status-publish","format-standard","hentry","category-robotics-ai"],"_links":{"self":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/214986","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/users\/709"}],"replies":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/comments?post=214986"}],"version-history":[{"count":0,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/214986\/revisions"}],"wp:attachment":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/media?parent=214986"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/categories?post=214986"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/tags?post=214986"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}