{"id":202633,"date":"2024-12-28T08:41:46","date_gmt":"2024-12-28T14:41:46","guid":{"rendered":"https:\/\/lifeboat.com\/blog\/2024\/12\/new-llm-optimization-technique-slashes-memory-costs-up-to-75"},"modified":"2024-12-28T08:41:46","modified_gmt":"2024-12-28T14:41:46","slug":"new-llm-optimization-technique-slashes-memory-costs-up-to-75","status":"publish","type":"post","link":"https:\/\/lifeboat.com\/blog\/2024\/12\/new-llm-optimization-technique-slashes-memory-costs-up-to-75","title":{"rendered":"New LLM optimization technique slashes memory costs up to 75%"},"content":{"rendered":"<p><a class=\"aligncenter blog-photo\" href=\"https:\/\/lifeboat.com\/blog.images\/new-llm-optimization-technique-slashes-memory-costs-up-to-752.jpg\"><\/a><\/p>\n<p>Universal transformer memory optimizes prompts using neural attention memory models (NAMMs), simple neural networks that decide whether to \u201cremember\u201d or \u201cforget\u201d each given token stored in the LLM\u2019s memory.<\/p>\n<p>\u201cThis new capability allows Transformers to discard unhelpful or redundant details, and focus on the most critical information, something we find to be crucial for tasks requiring long-context reasoning,\u201d the researchers write.<\/p>\n<p>NAMMs are trained separately from the LLM and are combined with the pre-trained model at inference time, which makes them flexible and easy to deploy. However, they need access to the inner activations of the model, which means they can only be applied to open-source models.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Universal transformer memory optimizes prompts using neural attention memory models (NAMMs), simple neural networks that decide whether to \u201cremember\u201d or \u201cforget\u201d each given token stored in the LLM\u2019s memory. \u201cThis new capability allows Transformers to discard unhelpful or redundant details, and focus on the most critical information, something we find to be crucial for tasks [\u2026]<\/p>\n","protected":false},"author":662,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6],"tags":[],"class_list":["post-202633","post","type-post","status-publish","format-standard","hentry","category-robotics-ai"],"_links":{"self":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/202633","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/users\/662"}],"replies":[{"embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/comments?post=202633"}],"version-history":[{"count":0,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/posts\/202633\/revisions"}],"wp:attachment":[{"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/media?parent=202633"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/categories?post=202633"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/lifeboat.com\/blog\/wp-json\/wp\/v2\/tags?post=202633"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}