Large Language Models are able to store vast amounts of facts about the world, but little is known about how these models actually do this. This paper aims to discover the mechanism and location of storage and recall of factual associations in GPT models, and then proposes a method for the targeted editing of such facts, in the form of a simple rank-one update to a single MLP layer. This has wide implications both for how we understand such models’ inner workings and for our ability to gain greater control over such models in the future.
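As a rough illustration of the editing idea (not the paper's exact ROME procedure, which additionally constrains the update so that other stored associations are preserved), here is a minimal sketch of what a rank-one update to a single MLP weight matrix looks like. The key vector k_star and the desired value v_star are hypothetical placeholders; the paper derives them via causal tracing and a small optimization.

```python
import torch

torch.manual_seed(0)

# Toy dimensions in the spirit of GPT-2 XL; float64 just for numerical stability of the check.
d_hidden, d_mlp = 1600, 6400
W = torch.randn(d_hidden, d_mlp, dtype=torch.float64)   # stand-in for an MLP down-projection weight

k_star = torch.randn(d_mlp, dtype=torch.float64)         # key representing the edited subject (assumed given)
v_star = torch.randn(d_hidden, dtype=torch.float64)      # desired output value for that key (assumed given)

# Choose u so that the edited matrix maps k_star exactly to v_star:
# (W + u k^T) k = W k + u (k^T k)  =>  u = (v_star - W @ k_star) / (k_star @ k_star)
u = (v_star - W @ k_star) / (k_star @ k_star)

# Rank-one update: only the outer product u k^T is added to the original weight.
W_edited = W + torch.outer(u, k_star)

# Check: the edited weight now maps k_star to v_star.
assert torch.allclose(W_edited @ k_star, v_star, atol=1e-6)
```

In the paper, the update is applied to one mid-layer MLP projection identified by causal tracing, rather than to a random matrix as in this sketch.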
OUTLINE:
0:00 — Introduction.
1:40 — What are the main questions in this subfield?
6:55 — How causal tracing reveals where facts are stored.
18:40 — Clever experiments show the importance of MLPs.
24:30 — How do MLPs store information?
29:10 — How to edit language model knowledge with precision?
36:45 — What does it mean to know something?
39:00 — Experimental Evaluation & the CounterFact benchmark.
45:40 — How to obtain the required latent representations?
51:15 — Where is the best location in the model to perform edits?
58:00 — What do these models understand about language?
1:02:00 — Questions for the community.
Paper: https://arxiv.org/abs/2202.05262
Follow-up paper on Mass-Editing Memory in a Transformer: https://arxiv.org/abs/2210.
Abstract: