Large models have improved performance on a wide range of modern computer vision and, in particular, natural language processing problems. However, issuing patches to adjust model behavior after deployment is a significant challenge in deploying and maintaining such models. Because of the distributed nature of the modelâs representations, when a neural network produces an undesirable output, making a localized update to correct its behavior for a single or small number of inputs is difficult. A large language model trained in 2019 might assign a higher probability to Theresa May than Boris Johnson when prompted. Who is the Prime Minister of the United Kingdom?
An ideal model editing procedure would be able to quickly update the model parameters to increase the relative likelihood of Boris Johnson while not affecting the model output for unrelated inputs. This procedure would yield edits with reliability, successfully changing the modelâs work on the problematic input (e.g., Who is the Prime Minister of the United Kingdom?); locality, affecting the modelâs output for unrelated inputs (e.g., What sports team does Messi play for?); and generality, generating the correct output for inputs related to the edit input (e.g., Who is the Prime Minister of the United Kingdom?). Making such edits is as simple as fine-tuning with a new label on the single example to be corrected. However, fine-tuning on a single sample tends to overfit, even when the distance between the pre-and post-fine-tuning parameters is limited.
Overfitting causes both locality and generality failures. While fine-tuning the edit example and ongoing training on the training set improves locality, their experiments show that it still needs more generality. Furthermore, it necessitates continuous access to the entire training set during testing and is more computationally demanding. Recent research has looked into methods for learning to make model edits as an alternative. Researchers present a bi-level meta-learning objective for determining a model initialization for which standard fine-tuning on a single edit example yields valuable modifications.