{"id":149548,"date":"2022-11-06T00:25:23","date_gmt":"2022-11-06T05:25:23","guid":{"rendered":"https:\/\/lifeboat.com\/blog\/2022\/11\/rome-locating-and-editing-factual-associations-in-gpt-paper-explained-author-interview"},"modified":"2022-11-06T00:25:23","modified_gmt":"2022-11-06T05:25:23","slug":"rome-locating-and-editing-factual-associations-in-gpt-paper-explained-author-interview","status":"publish","type":"post","link":"https:\/\/lifeboat.com\/blog\/2022\/11\/rome-locating-and-editing-factual-associations-in-gpt-paper-explained-author-interview","title":{"rendered":"ROME: Locating and Editing Factual Associations in GPT (Paper Explained &amp; Author Interview)"},"content":{"rendered":"<p><\/p>\n<p><iframe style=\"display: block; margin: 0 auto; width: 100%; aspect-ratio: 4\/3; object-fit: contain;\" src=\"https:\/\/www.youtube.com\/embed\/_NMQyOu2HTo?feature=oembed\" frameborder=\"0\" allow=\"accelerometer; autoplay; encrypted-media; gyroscope;\n   picture-in-picture\" allowfullscreen><\/iframe><\/p>\n<p>Large Language Models have the ability to store vast amounts of facts about the world. But little is known, how these models actually do this. This paper aims at discovering the mechanism and location of storage and recall of factual associations in GPT models, and then proposes a mechanism for the targeted editing of such facts, in form of a simple rank-one update to a single MLP layer. This has wide implications both for how we understand such models\u2019 inner workings, and for our ability to gain greater control over such models in the future.<\/p>\n<p>OUTLINE:<br \/> 0:00 \u2014 Introduction.<br \/> 1:40 \u2014 What are the main questions in this subfield?<br \/> 6:55 \u2014 How causal tracing reveals where facts are stored.<br \/> 18:40 \u2014 Clever experiments show the importance of MLPs.<br \/> 24:30 \u2014 How do MLPs store information?<br \/> 29:10 \u2014 How to edit language model knowledge with precision?<br \/> 36:45 \u2014 What does it mean to know something?<br \/> 39:00 \u2014 Experimental Evaluation &amp; the CounterFact benchmark.<br \/> 45:40 \u2014 How to obtain the required latent representations?<br \/> 51:15 \u2014 Where is the best location in the model to perform edits?<br \/> 58:00 \u2014 What do these models understand about language?<br \/> 1:02:00 \u2014 Questions for the community.<\/p>\n<p>Paper: <a href=\"https:\/\/arxiv.org\/abs\/2202.05262\">https:\/\/arxiv.org\/abs\/2202.05262<\/a><br \/> Follow-up paper on Mass-Editing Memory in a Transformer: <a href=\"https:\/\/arxiv.org\/abs\/2210\">https:\/\/arxiv.org\/abs\/2210<\/a>.<\/p>\n<p>Abstract:<br \/> We analyze the storage and recall of factual associations in autoregressive transformer language models, finding evidence that these associations correspond to localized, directly-editable computations. We first develop a causal intervention for identifying neuron activations that are decisive in a model\u2019s factual predictions. This reveals a distinct set of steps in middle-layer feed-forward modules that mediate factual predictions while processing subject tokens. To test our hypothesis that these computations correspond to factual association recall, we modify feed-forward weights to update specific factual associations using Rank-One Model Editing (ROME). We find that ROME is effective on a standard zero-shot relation extraction (zsRE) model-editing task, comparable to existing methods. 
OUTLINE:
0:00 — Introduction
1:40 — What are the main questions in this subfield?
6:55 — How causal tracing reveals where facts are stored
18:40 — Clever experiments show the importance of MLPs
24:30 — How do MLPs store information?
29:10 — How to edit language model knowledge with precision?
36:45 — What does it mean to know something?
39:00 — Experimental Evaluation & the CounterFact benchmark
45:40 — How to obtain the required latent representations?
51:15 — Where is the best location in the model to perform edits?
58:00 — What do these models understand about language?
1:02:00 — Questions for the community

Paper: https://arxiv.org/abs/2202.05262
Follow-up paper on Mass-Editing Memory in a Transformer: https://arxiv.org/abs/2210.

Abstract:
We analyze the storage and recall of factual associations in autoregressive transformer language models, finding evidence that these associations correspond to localized, directly-editable computations. We first develop a causal intervention for identifying neuron activations that are decisive in a model's factual predictions. This reveals a distinct set of steps in middle-layer feed-forward modules that mediate factual predictions while processing subject tokens. To test our hypothesis that these computations correspond to factual association recall, we modify feed-forward weights to update specific factual associations using Rank-One Model Editing (ROME). We find that ROME is effective on a standard zero-shot relation extraction (zsRE) model-editing task, comparable to existing methods. To perform a more sensitive evaluation, we also evaluate ROME on a new dataset of counterfactual assertions, on which it simultaneously maintains both specificity and generalization, whereas other methods sacrifice one or the other. Our results confirm an important role for mid-layer feed-forward modules in storing factual associations and suggest that direct manipulation of computational mechanisms may be a feasible approach for model editing. The code, dataset, visualizations, and an interactive demo notebook are available at this https URL.

Authors: Kevin Meng, David Bau, Alex Andonian, Yonatan Belinkov.
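The "causal intervention" in the abstract is causal tracing: run the model on a corrupted prompt (the subject token embeddings are noised), then restore individual clean hidden states one at a time and measure how much of the correct prediction returns. Below is a toy, framework-free illustration of that activation-patching logic; the random layer stack, the noise scale, and the score function are hypothetical stand-ins, not the authors' implementation, which patches states per token and per layer inside GPT.

```python
import numpy as np

rng = np.random.default_rng(0)
n_layers, d = 6, 16
# A toy "network": a stack of random tanh layers standing in for transformer blocks.
Ws = [rng.normal(scale=d ** -0.5, size=(d, d)) for _ in range(n_layers)]

def run(x, patch=None):
    """Run the stack; optionally overwrite the hidden state at one layer
    with a stored clean activation (activation patching)."""
    h, states = x, []
    for layer, W in enumerate(Ws):
        h = np.tanh(W @ h)
        if patch is not None and patch[0] == layer:
            h = patch[1]          # restore the clean state at this layer
        states.append(h)
    return h, states

def score(h):
    return h[0]                   # stand-in for P(correct answer token)

x_clean = rng.normal(size=d)                         # clean prompt
x_corrupt = x_clean + rng.normal(scale=3.0, size=d)  # "noised subject"

out_clean, states_clean = run(x_clean)
out_corrupt, _ = run(x_corrupt)

# Patch each clean hidden state into the corrupted run; a large recovery
# of the clean score marks that state as causally decisive for the output.
for layer in range(n_layers):
    out_patched, _ = run(x_corrupt, patch=(layer, states_clean[layer]))
    print(f"layer {layer}: recovered {score(out_patched) - score(out_corrupt):+.3f}")
```

In the paper, averaging this indirect effect over many factual prompts is what singles out the mid-layer MLP modules at the subject's last token as the decisive sites.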