KnowEdit
A Comprehensive Study of Knowledge Editing for Large Language Models
🔔News
[2024-01-03]: We release a new paper: "A Comprehensive Study of Knowledge Editing for Large Language Models" with a new benchmark KnowEdit! We look forward to any comments or discussions on this topic :)
Abstract
Large Language Models (LLMs) have shown extraordinary capabilities in understanding and generating text that closely mirrors human communication. However, a primary limitation lies in the significant computational demands during training, arising from their extensive parameterization. This challenge is further intensified by the dynamic nature of the world, necessitating frequent updates to LLMs to correct outdated information or integrate new knowledge, thereby ensuring their continued relevance. Note that many applications demand continual model adjustments post-training to address deficiencies or undesirable behaviors. There is an increasing interest in efficient, lightweight methods for on-the-fly model modifications. To this end, recent years have seen a burgeoning of techniques for knowledge editing of LLMs, which aim to efficiently modify LLMs' behaviors within specific domains while preserving overall performance across various inputs. In this paper, we first define the knowledge editing problem and then provide a comprehensive review of cutting-edge approaches. Drawing inspiration from educational and cognitive research theories, we propose a unified categorization criterion that classifies knowledge editing methods into three groups: resorting to external knowledge, merging knowledge into the model, and editing intrinsic knowledge. Furthermore, we introduce a new benchmark, KnowEdit, for a comprehensive empirical evaluation of representative knowledge editing approaches. Additionally, we provide an in-depth analysis of knowledge location, which can give a deeper understanding of the knowledge structures inherent within LLMs. Finally, we discuss several potential applications of knowledge editing, outlining its broad and impactful implications.
Knowledge Editing for LLMs
Task Definition
The initial goal of knowledge editing is to modify the specific knowledge $k$ in the LLM and improve the consistency and performance of the LLM without fine-tuning the whole model. Knowledge editing is challenging due to the distributed and entangled nature of knowledge in LLMs.
Suppose the original model is $\theta$ and given the knowledge $k$ to be changed, by knowledge editing process $F$, we would get the post-edited model $\theta^{'}$:
$\theta' = F(\theta, k)$
The post-edited model $\theta^{'}$ is supposed to override undesired model beliefs on the knowledge $k$ and keep other knowledge intact:
$\begin{cases} \theta^{'}(k) \neq \theta(k) \\ \forall k^{'} \neq k, \theta^{'}(k^{'}) = \theta(k^{'}) \end{cases}$
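These two conditions can be read as a concrete test. Below is a minimal Python sketch of this contract; `answer` and the model objects are hypothetical placeholders, not part of any editing library:

```python
# A minimal sketch of the knowledge-editing contract above.
# `answer(model, k)` is a hypothetical probe that returns the model's
# belief about knowledge k; `model` / `edited_model` stand for theta
# and theta' = F(theta, k).

def satisfies_edit_contract(model, edited_model, k, other_knowledge, answer):
    # Condition 1: the targeted belief must change, theta'(k) != theta(k).
    changed = answer(edited_model, k) != answer(model, k)
    # Condition 2: every unrelated k' != k must stay intact,
    # theta'(k') == theta(k').
    preserved = all(
        answer(edited_model, kp) == answer(model, kp)
        for kp in other_knowledge
    )
    return changed and preserved
```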
Treating the LLM as a knowledge base, knowledge editing should cater to three fundamental settings: knowledge insertion, knowledge modification, and knowledge erasure.
Knowledge Insertion
As fields and entities progress, it becomes imperative for LLMs to assimilate emergent information. Knowledge insertion fulfills this by bestowing upon LLMs new knowledge previously outside their purview:
$\theta' = F(\theta, \{\emptyset\} \rightarrow \{k\})$
Knowledge Modification
Knowledge modification refers to altering knowledge already stored in LLMs:
$\theta' = F(\theta, \{k\} \rightarrow \{k'\})$
This can be classified into two categories:
- Knowledge amendment - This aims at rectifying the inaccuracies embedded in LLMs to ensure the delivery of accurate information. As vast repositories of knowledge, LLMs are prone to housing outdated or erroneous information. Knowledge amendment serves to correct these fallacies, ensuring that models always generate accurate, up-to-date information.
- Knowledge disruption - Modifying LLMs to answer counterfactual or erroneous prompts. This is more challenging, as counterfactual notions initially receive lower scores than factual knowledge, as shown by Meng et al., and thus necessitate more targeted modification efforts.
Knowledge Erasure
Knowledge erasure targets the excision or obliteration of pre-existing knowledge in a model, primarily to reset distinct facts, relationships, or attributes. Formally, we have:
$\theta' = F(\theta, \{k\} \rightarrow \{\emptyset\})$
Implementing knowledge erasure is pivotal to expunge biases and noxious knowledge and to curtail the recollection of confidential or private data, thereby fostering responsible and trustworthy AI.
In conclusion, the interplay between knowledge insertion, modification, and erasure forms essential aspects of model editing techniques. When combined, these techniques empower LLMs to transform, self-correct, and ethically adapt as needed.
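The three settings differ only in which side of the mapping is empty. The sketch below expresses them in one request format; the class and field names are illustrative, not from any specific library:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class EditRequest:
    old: Optional[str]  # existing knowledge k; None encodes the empty set
    new: Optional[str]  # target knowledge k'; None encodes the empty set

# Insertion: {} -> {k}
insertion = EditRequest(old=None, new="<newly emerged fact>")
# Modification: {k} -> {k'}
modification = EditRequest(old="The US president is Donald Trump",
                           new="The US president is Joe Biden")
# Erasure: {k} -> {}
erasure = EditRequest(old="<private or harmful fact>", new=None)
```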
Method
The development of LLMs has reached a point where their capabilities closely resemble human cognitive processes, especially in learning and acquiring knowledge. Drawing inspiration from how humans learn, we can analogously apply these concepts to the process of editing LLMs, as the Figure below shows.
Applying Human Learning Phases to Knowledge Editing in LLMs: We see an analogy of Human Learning Phases and Knowledge Editing in LLMs and categorize current knowledge editing methods based on the learning phases of humans: recognition, association, and mastery.
Educational and cognitive research delineates human knowledge acquisition into three distinct phases: recognition, association, and mastery. These phases offer a framework for conceptualizing the methods of knowledge editing in LLMs and we list them in Table below.
Comparison between representative approaches of knowledge editing for LLMs. No Training refers to methods that do not require additional training; Batch Edit means whether the method can edit multiple cases simultaneously in one process. Edit Area refers to which of the model's components are modified; Editor #Params indicates the parameters that need to be updated for editing. $L$ refers to the number of layers to update. $d_h$ denotes the dimensionality of the hidden layers in the Transformer. $d_m$ refers to the intermediate dimension between the up projection and the down projection. $N$ symbolizes the total number of neurons updated within each individual layer.
New Benchmark: KnowEdit
To evaluate the effectiveness of knowledge editing methods, several datasets have been proposed. In this Section, we present an overview of the current datasets used for knowledge editing and introduce a new benchmark, KnowEdit, which serves as a comprehensive evaluation framework for various knowledge editing techniques.
For this study, we have curated a set of six datasets that are well-suited for assessing knowledge editing methods. A detailed statistical overview of these datasets is presented in Table below, and they encompass a range of editing types, including fact manipulation, sentiment modification, and hallucination generation.
Statistics on the benchmark KnowEdit, with six selected datasets for the evaluation of knowledge editing methods. We select different knowledge types for the insertion, modification, and erasure settings.
Evaluation for Knowledge Editing
Knowledge editing aims to alter model behavior based on modified facts. However, knowledge is interconnected; changing one fact may ripple outwards and affect other facts in complex ways. This interdependence makes assessing the effects of editing difficult. We summarize key evaluation criteria from prior work into four categories: edit success, portability, locality, and fluency.
Edit Success
The purpose of editing is to change the model's output for the given knowledge. Previous work adopts two metrics, reliability and generalization. Reliability tests whether the post-edited model gives the target answer for the edited case itself, while generalization tests whether the edit also holds for paraphrases of the given text. We follow previous work~\cite{mitchell2022fast,Li2023PMETPM} and collectively refer to reliability and generalization as edit success. Hence, edit success means the post-edit model should not only answer the question itself correctly but also give the right answer for inputs with similar expressions.
Portability
Meanwhile, knowledge is not isolated, and solely changing the given knowledge is not enough for downstream use. When knowledge is corrected, the model is supposed to reason about the downstream effects of the correction. Here, we follow previous work and evaluate whether the edited model can address the implications of an edit for realistic applications, naming this portability: it evaluates what would ensue after the knowledge editing. Portability contains three different parts:
- Alias: The editing of one subject should not depend on its surface expression. Wikidata maintains a set of aliases for every entity. Hence, we follow Cohen et al. and replace the question's subject with an alias or synonym to evaluate the post-edited model's performance on other descriptions of the subject.
- Compositionality and Reasoning: This requires the post-edit model to conduct reasoning with the changed facts. For example, when we change the current president of the U.S. from Donald Trump to Joe Biden, the answer to the question ``Who is the First Lady of the United States?'' should also be changed.
- Logical Generalization: These are changes that are semantically related to the modified fact and are expected to change with the edit. For example, as mentioned by Yao et al., when the fact $(s,r,o)$ is changed, the reversed relation of the knowledge $(o,\hat{r},s)$ should also be changed.
Locality
When editing knowledge, we may inadvertently change knowledge that we do not want to modify. A good edit is supposed to modify the target knowledge locally, without influencing unrelated knowledge. The evaluation of locality includes two levels:
- In-Distribution: this includes knowledge from the same distribution. As shown in previous work, overediting is a common phenomenon. Here, we follow Meng et al., Cohen et al., and Yao et al. and construct related in-distribution knowledge, including \emph{forgetfulness} and \emph{relation specificity}. Forgetfulness evaluates whether the post-edit model retains the original objects in one-to-many relationships. Relation specificity posits that any other attributes of the subject, which have been previously updated, should remain unaltered following the editing process.
- Out-of-Distribution: knowledge that is not associated with the target should not be influenced. That is, we also do not want the edited model to lose its general ability on other tasks. Hence, we test the edited model on popular NLP benchmarks in the Experiments section.
Generative Capacity
Previous work finds that, after editing, some models tend to generate repetitive text and often produce the edited target whenever they encounter the subject words. The metric fluency is therefore employed to evaluate the generative capacity of the post-edited model. Here, we follow ROME and employ fluency to measure the model's generation ability after editing. In particular, we calculate the weighted average of bi-gram and tri-gram entropies to assess the diversity of generated text. A decrease in this value indicates increased repetitiveness in the generated text.
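As a concrete reference, here is a small sketch of the fluency metric as described: a weighted average of bi-gram and tri-gram entropies over generated tokens. The equal weights are an illustrative assumption; ROME's implementation may weight the two terms differently:

```python
import math
from collections import Counter

def ngram_entropy(tokens, n):
    """Shannon entropy (bits) of the n-gram distribution of a token list."""
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    total = sum(grams.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log2(c / total) for c in grams.values())

def fluency(tokens, weights=(1.0, 1.0)):
    """Weighted average of bi-gram and tri-gram entropies; lower values
    indicate more repetitive generations."""
    score = sum(w * ngram_entropy(tokens, n) for w, n in zip(weights, (2, 3)))
    return score / sum(weights)

# A degenerate, repetitive generation scores lower than a varied one:
print(fluency("limestone limestone limestone limestone limestone".split()))
print(fluency("the statue is now made of limestone instead of bronze".split()))
```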
Experiments
Experiment Settings
All the experiments are conducted with EasyEdit. As to the evaluation of the post-edited model, some previous works computed the probability difference of the output between the pre-edit and post-edit models: $P[y^{*}|\theta^{'}] - P[y|\theta]$, where $y^{*}$ is the edit target and $y$ is the original model's prediction. However, a higher probability for $y^{*}$ does not guarantee an ideal outcome; for realistic usage, when we edit the model, we hope it generates the desired output. Hence, for the evaluation of fact datasets such as WikiData$_{recent}$, ZsRE, and WikiData$_{counterfact}$, we follow Yao et al. and compute the accuracy of the outputs. Suppose $x_k$ is the expression for the updated knowledge $k$ and $y_k^{*}$ is the corresponding target output for editing.
\begin{equation} \text{Edit Succ.} = \mathbb{E}_{(x_{k}, y_{k}^{*})} \, \mathbb{1}\left\{ \operatorname{argmax}_y f_{\theta^{'}}\left(y \mid x_{k}\right)=y_{k}^{*} \right\} \end{equation}
Also, for portability, we compute the post-edited model's performance on the given sets. As to the calculation of locality, some work computes the post-edited model's performance on the locality set $O(x_{k})$. Here, for a better comparison, we test whether the model keeps its original answer.
\begin{equation} \text{Locality} = \mathbb{E}_{x \sim O(x_{k})} \, \mathbb{1} \left\{f_{\theta^{'}}\left(y \mid x\right)=f_{\theta}\left(y \mid x\right) \right\} \end{equation}
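A minimal sketch of these two accuracy-style metrics follows; `answer(model, x)` is a hypothetical greedy-decoding helper (the argmax over outputs), not an EasyEdit function:

```python
def edit_success(edited_model, edit_set, answer):
    """Fraction of (x_k, y_k*) pairs where the post-edit model's argmax
    output matches the edit target y_k*."""
    hits = sum(answer(edited_model, x) == y for x, y in edit_set)
    return hits / len(edit_set)

def locality(model, edited_model, locality_prompts, answer):
    """Fraction of locality prompts O(x_k) on which the post-edit model
    keeps the original model's answer."""
    kept = sum(answer(edited_model, x) == answer(model, x)
               for x in locality_prompts)
    return kept / len(locality_prompts)
```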
Meanwhile, for the sentiment edit task Convsent, we compute Edit Succ. and Locality following the original dataset:
\begin{equation} \text{Edit Succ.}_\text{Convsent} \triangleq \mathbf{z}_{\text{sentiment}} \cdot \mathbf{z}_{\text{topic}} \end{equation}
where $\mathbf{z}_{\text{sentiment}}$ is one if the edited model generates a response with the correct sentiment and $\mathbf{z}_{\text{topic}}$ is one if the edited model's answer is related to the target topic. The locality of Convsent is computed as the KL-divergence, so the lower the number, the better the performance:
\begin{equation} \text{Locality}_\text{Convsent} \triangleq \mathbb{KL}\left(f_{\theta}\left(\cdot \mid x_{k}\right) \| f_{\theta^{'}}\left(\cdot \mid x_{k}\right)\right) \end{equation}
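The KL term can be computed from the two models' next-token distributions on the same prompt. A sketch under the assumption of a hypothetical `logits_fn(model, x)` helper that returns the output logits:

```python
import torch.nn.functional as F

def convsent_locality(model, edited_model, prompts, logits_fn):
    """Mean KL( f_theta(.|x_k) || f_theta'(.|x_k) ); lower is better."""
    kls = []
    for x in prompts:
        p = F.log_softmax(logits_fn(model, x), dim=-1)         # pre-edit
        q = F.log_softmax(logits_fn(edited_model, x), dim=-1)  # post-edit
        # F.kl_div(input, target) computes KL(target || input) when both
        # are log-probabilities and log_target=True.
        kls.append(F.kl_div(q, p, reduction="sum", log_target=True).item())
    return sum(kls) / len(kls)
```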
For the knowledge erasure task Sanitation, we calculate edit success as whether the model answers ``I don't know.'' for the given knowledge. As for locality, we compute performance on the retain sets as whether the model keeps its original answer.
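Since all experiments run on EasyEdit, a minimal usage sketch is shown below for a single ROME edit on the Queen Amina Statue case discussed later. The argument names follow EasyEdit's public examples, and the YAML path is illustrative; the exact API may vary across versions:

```python
from easyeditor import BaseEditor, ROMEHyperParams

# Load ROME hyperparameters for the base model (path is illustrative).
hparams = ROMEHyperParams.from_hparams('./hparams/ROME/llama-7b.yaml')
editor = BaseEditor.from_hparams(hparams)

# One factual edit; `metrics` reports edit success, portability, and
# locality, computed as defined above.
metrics, edited_model, _ = editor.edit(
    prompts=['What material is the Queen Amina Statue made of?'],
    ground_truth=['bronze'],
    target_new=['limestone'],
    subject=['Queen Amina Statue'],
)
```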
Main Results
We list the results of current knowledge editing methods on Llama2-7b-chat in Table below.
Results of existing knowledge edit methods on the constructed benchmark. The symbol $\uparrow$ indicates that higher numbers correspond to better performance, while $\downarrow$ denotes the opposite, with lower numbers indicating better performance. The locality of Convsent is computed as the KL-divergence so the lower the number, the better the performance is. For WikiBio and Convsent, we do not test the portability as they are about specific topics.
Impact of Knowledge Editing on General Tasks
In this Section, we explore the impact of applying knowledge editing methods on the performance of a language model across various domains. Our main goal is to determine if incorporating edits related to specific factual knowledge can unintentionally hinder the model's proficiency in unrelated areas.
Average sub-metrics performance of results on several fact edit datasets in Portability and Locality.
When directly fine-tuning the entire model on the provided edited cases, a significant decline in the post-edited model's performance on general tasks is observed. We find that the post-edited model becomes inclined to generate outputs resembling the edited cases, which highlights the presence of overfitting. An intriguing observation from Table below is that, on a holistic level, the edited models managed to sustain a performance level that is close to their unedited counterparts.
The zero-shot performance on general LLM benchmarks with Llama2-Chat-7B as the base model. Here, we conduct 5 consecutive edits for each method using the Wiki$_{recent}$ dataset to evaluate the post-edited model's general ability. We adopt OpenCompass to evaluate the model and use the HuggingFace setting. The MMLU and AGIEval scores are each averaged over their sub-tasks.
This suggests that the negative impact of the editing was limited to directly altered topics. However, one exception to this trend is the FT-L model's performance on TriviaQA, which shows a noticeable decline from an initial score of 45.39 to 34.60 after the edit. Nevertheless, taking a broader perspective, we can observe commendable consistency. This implies that contemporary knowledge editing methods are effective in executing five targeted factual updates with minimal disruptions to the model's cognitive capabilities and adaptability across diverse knowledge domains.
Multi-Task Knowledge Editing
Previous work considered sequential editing for lifelong knowledge editing. However, such work always conducts sequential editing on a single dataset from the same distribution, which differs from continual learning. Knowledge editing is not a task focusing on single-domain knowledge or facts; in reality, we may want to modify our model from different perspectives, with edits drawn from different distributions.
Cross-domain Editing
Both MEND and SERAC methods rely on a training dataset to help the model learn how to edit parameters. We evaluate their performance in a cross-domain setting and present the results in Table below.
Cross-Domain Editing Results. Performance (accuracy) of the compared methods, which are firstly trained on a source dataset and then directly conduct prediction on a target dataset (denoted as source $\Rightarrow$ target).
For the MEND method, the hyper-network trained on the ZsRE dataset exhibits better cross-domain performance than the one trained on the recent dataset. This can be attributed to the larger size of the ZsRE dataset, which allows MEND's hyper-network to strengthen its parameter-editing capabilities. Meanwhile, the SERAC approach, by leveraging its cache, exhibits significant cross-domain editing prowess.
Continual Editing
Methods like LoRA and ROME do not require a training set and can be applied directly to different domains. Hence, we consider a more challenging setting for continual editing. We mix knowledge editing cases from ZsRE, Wiki$_{recent}$, and Wiki$_{counterfact}$, consider edit sequences of different lengths (10, 100, 500, and 1,000), and edit knowledge sampled randomly from these sets. Here, we mainly consider three methods: FT-L, ROME, and AdaLoRA. We report the empirical findings in the Figure below.
Sequential editing results on randomly selected data from WikiData$_{counterfact}$, ZsRE, and WikiData$_{recent}$ with different numbers of edits.
When dealing with sequential editing, we observe that all three methods suffer a dramatic drop in all evaluation metrics after 1,000 edits, and the trend is similar across the three tasks. Relatively, AdaLoRA shows stable performance for about 100 edits. Current editing methods tend to edit the same area for different pieces of knowledge (e.g., ROME the fifth layer, MEND the last three layers), even though the knowledge is not all stored in that area.
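For reference, the continual-editing protocol above can be sketched as follows; `apply_edit` and `evaluate` are hypothetical placeholders for an editing method and the metric suite:

```python
import random

def continual_edit(model, pool, n_edits, apply_edit, evaluate):
    """Apply a mixed-distribution sequence of edits, then re-evaluate."""
    # `pool` mixes cases from ZsRE, Wiki_recent, and Wiki_counterfact.
    cases = random.sample(pool, n_edits)
    for case in cases:
        model = apply_edit(model, case)  # each edit stacks on the previous ones
    # Metrics are computed only after the whole sequence, so early edits
    # must survive all later ones.
    return evaluate(model, cases)

# e.g., sweep the sequence lengths used here: 10, 100, 500, and 1,000.
```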
Error and Case Analysis
As shown in the results, different methods demonstrate different performance on different tasks. Here, we conduct a study to comprehensively understand their limitations and advantages. In analyzing the failure modes of knowledge editing methods, we categorize the deficiencies into four primary types:
- Meaningless Token Generation: The edited model produces meaningless tokens such as `\n' or repetitive letter combinations that lack semantic meaning or grounding.
- Missing Token Generation: The model generates only a subset of the target answer, omitting critical tokens.
- Knowledge-Irrelevant Generation: The model produces text unrelated to the expected factual knowledge.
- Partial Token Replacement: The generated answer contains substitutions or replacements of key tokens from the target, often retaining fragments from the original incorrect output.
The occurrence of these error types helps identify the limitations of the editing methods. Meaningless and missing token cases highlight difficulties in fully encoding the target fact, while knowledge-irrelevant and partial replacement generations suggest that the edits fail to supplant previously learned information. We conduct an error analysis on the ZsRE tasks and count the error cases for each editing method. The results are presented in Figure below.
Bad cases statistics for different knowledge editing methods.
The analysis reveals that the main error type is partial token replacement, indicating a conflict between the knowledge in the original model and the target knowledge. Specifically, the SERAC method tends to generate meaningless tokens due to the limited generation ability of the small model it uses. The AdaLoRA method may miss some tokens related to the target knowledge. For the fine-tuning methods, the percentage of fact-irrelevant words is higher than for other editing methods, and it is the most common error type (47.3\%) for FT-L. This suggests that the fine-tuning objective might not be suitable for editing specific knowledge. Additionally, in the following section, we find that FT-L tends to modify more areas of the parameters, leading to more irrelevant generations.
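The four categories can be approximated automatically with simple token-overlap heuristics. The rules below are illustrative assumptions, not the exact criteria used in our analysis:

```python
def classify_error(generated: str, target: str, original: str) -> str:
    """Crude heuristic mapping a generation to one of the four error types."""
    gen, tgt, orig = set(generated.split()), set(target.split()), set(original.split())
    if not gen:
        return "meaningless"            # empty/whitespace output, e.g. '\n'
    if gen < tgt:
        return "missing"                # proper subset: target tokens omitted
    if gen & orig and gen != tgt:
        return "partial_replacement"    # fragments of the old answer survive
    if not gen & tgt:
        return "irrelevant"             # no overlap with the expected fact
    return "correct"
```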
We also show the generated texts for different editing methods for the cases in Table below.
Results for one case of different editing methods. Prompts are presented in italicized text. Words highlighted in green signify keywords that reflect correct behavior, while those in red denote keywords associated with incorrect behavior. Texts in cyan are repeated or meaningless sentences.
Here, we can find that current editing methods such as IKE, MEND, and ROME can successfully modify the material of the Queen Amina Statue from bronze to limestone and generate fluent texts. SERAC and FT-L, despite changing the fact successfully, tend to generate repeated sentences or meaningless entities. Additionally, AdaLoRA fails to change the fact and keeps the original answer, "bronze".
Analysis
Current research has explored the effectiveness of knowledge editing methods in LLMs, but the underlying reasons for their performance remain unexplored. Additionally, the comparison between model editing and fine-tuning approaches, as well as the efficacy of knowledge-locating methods, requires further investigation. This study makes a simple attempt to bridge these gaps by examining the differences between model editing and fine-tuning, exploring the effectiveness of knowledge-locating techniques, and understanding the knowledge structure within LLMs. We hope further investigation will unveil the mechanisms of knowledge in LLMs.
Comparison of Different Knowledge Editing Methods
The effectiveness of current knowledge editing methods is commendable, but the reasons behind their superior performance compared to other approaches remain elusive. In this section, we focus on methods that involve parameter adjustments within the model, specifically MEND, ROME, MEMIT, and FT-L. As these methods modify the model's parameters, a fundamental question arises: what makes some knowledge editing methods, like MEND, superior in terms of locality and overall performance? We formally represent the change as $\boldsymbol{W}' = \boldsymbol{W} + \Delta \boldsymbol{W}_{\text{edit}}$, where $\boldsymbol{W}$ is the original weight matrix, and $\Delta \boldsymbol{W}_{\text{edit}}$ represents the modifications made during editing. Therefore, our primary focus in this section is to discern the differences between the matrices $\Delta \boldsymbol{W}_{\text{edit}}$ for different editing methods.
Sparsity
An important characteristic of knowledge editing is its intention to modify a specific piece of knowledge within the model. This suggests an intuitive hypothesis that the $\Delta \boldsymbol{W}$ matrix is likely to be sparse. Following the approach of De Cao et al., we present visualizations that capture the weight updates resulting from knowledge edits, as depicted in the Figure below.
The heatmap shows how different model editing methods affect the weights of the model. Darker colors indicate more changes in the weights. The heatmap reveals which parts of the model are most sensitive to changes for each method.
ROME, MEND, and MEMIT exhibit a distinct pattern of sparse updates, while fine-tuning spreads its modifications more uniformly across weights. Particularly, for knowledge editing methods like ROME and MEMIT, it is intriguing to observe a concentrated focus on one or several columns of the value layer. This finding aligns with earlier research that emphasizes the value layer's pivotal role in encapsulating correlated knowledge. Regarding the MEND methods, we propose that the learned hypernetwork can be viewed as a tool or a "probe" that helps us explore and understand the internal mechanisms used by the model to encode knowledge, providing insights into how the model represents and processes information.
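The visualization amounts to plotting $|\Delta \boldsymbol{W}| = |\boldsymbol{W}' - \boldsymbol{W}|$ for the same parameter tensor extracted from the pre- and post-edit checkpoints. A sketch (variable names are illustrative):

```python
import torch
import matplotlib.pyplot as plt

def plot_weight_delta(w_before: torch.Tensor, w_after: torch.Tensor):
    """Heatmap of |Delta W| for one edited weight matrix."""
    delta = (w_after - w_before).abs()
    plt.imshow(delta.float().cpu().numpy(), aspect="auto", cmap="viridis")
    plt.colorbar(label="|ΔW|")
    plt.xlabel("columns (e.g., value-layer neurons)")
    plt.ylabel("rows")
    plt.show()

def top_column_mass(w_before, w_after, top_k=5):
    """Share of the total update carried by the top-k columns; values near
    1.0 indicate the column-sparse pattern seen for ROME and MEMIT."""
    col_mass = (w_after - w_before).abs().sum(dim=0)
    return (col_mass.topk(top_k).values.sum() / col_mass.sum()).item()
```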
Mapping to Embedding Space
To further investigate the differences between different editing methods, we conduct an embedding space analysis following the approach of Dar et al. They analyze the Transformer's parameters by mapping the weights of the LLMs to the vocabulary space and find that the embedding space can interpret these weights. Here, we map the two matrices, $\boldsymbol{W}'$ and $\boldsymbol{W}$, to observe the differences between these methods. From the sparsity analysis, we select the top five columns of the updated value matrix $\Delta \boldsymbol{W}$ and map the corresponding columns of $\boldsymbol{W}'$ and $\boldsymbol{W}$ through the embedding matrix $\boldsymbol{E}$ to obtain the logits in the vocabulary space. We then compute the Hit@10 and Hit@50 of the new knowledge in the output logits. We select cases from ZsRE where all four methods successfully edit the knowledge and present the average performance in the Figure below.
The Hit@10 and Hit@50 performance for the target knowledge in the model’s parameters before and after editing.
From the figure, we observe that MEND and MEMIT significantly inject the target knowledge into the parameters. Notably, MEND demonstrates a remarkable capacity for editing, with the Hit@50 rate already exceeding 90\% before the edit. This means that MEND might be able to find and change the right neurons that hold the target knowledge without having to do a full knowledge-locating analysis. After the editing process, we observe a substantial increase in the Hit@10 score. In fact, in our experiments, the Hit@1 for MEND is also above 90\% after editing, demonstrating its strong editing capacity. For MEMIT, we also observe an increase in Hit@50 (59.7\% $\rightarrow$ 70.2\%), and the original neurons already have a high Hit score before editing. However, for ROME and FT-L, we do not observe an increase in performance, indicating that their editing mechanisms require further investigation to understand their specific characteristics and limitations.
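The embedding-space probe reduces to multiplying the selected columns by the output embedding matrix and checking the rank of the target token. A sketch, where `E` (vocab × hidden) and the column indices `cols` are assumed to come from the sparsity analysis:

```python
import torch

def hit_at_k(W: torch.Tensor, E: torch.Tensor, cols, target_id: int, k: int = 10):
    """Fraction of selected columns whose vocabulary-space logits rank the
    target token in the top-k. Each column of W is a hidden-size vector
    that E maps to vocabulary logits."""
    hits = 0
    for c in cols:
        logits = E @ W[:, c]              # map one column to the vocab space
        hits += int(target_id in logits.topk(k).indices)
    return hits / len(cols)

# Compare hit_at_k(W, E, cols, t) with hit_at_k(W_prime, E, cols, t) to see
# how much target knowledge the edit injected into the parameters.
```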
The Effectiveness of Knowledge Locating in LLMs
We compute the \textbf{Relative Similarity} (RSim) as: $\max \left(\frac{\text{Sim}_{\text{cand}} - \text{Sim}_{\text{all}}}{1 - \text{Sim}_{\text{all}}}, 0\right)$. We adopt the KLoB-r (designed for measuring consistency) and KLoB-c (designed for measuring relevance) datasets of Ju and Zhang and apply them to the causal analysis method proposed by ROME~\cite{meng2022locating}. Since the causal analysis is a layer-wise intervention, we compute the similarity using the overlap between the identified layers. We show the RSim score in the Figure below.
RSim for the different number of layers.
Here, we can find that the RSim score is less than 0.6 when we consider more than five layers for both consistency and relevance, which means the locating results for unrelated knowledge and for related knowledge chains do not show much difference. To make this more tangible, we conduct a case study below.
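For reference, a minimal sketch of the RSim computation, modeling each locating result as a set of identified layer indices (the overlap measure and example values are illustrative):

```python
def overlap_sim(layers_a: set, layers_b: set) -> float:
    """Overlap ratio between two sets of identified layers."""
    return len(layers_a & layers_b) / max(len(layers_a | layers_b), 1)

def rsim(sim_cand: float, sim_all: float) -> float:
    """Relative similarity: how much more similar the related pair is than
    the average over unrelated pairs, clipped at zero."""
    return max((sim_cand - sim_all) / (1 - sim_all), 0.0)

# e.g., a related knowledge chain vs. the average over unrelated facts:
print(rsim(overlap_sim({3, 4, 5}, {4, 5, 6}), sim_all=0.4))  # ~0.17
```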
Case Study
We consider three settings for a given fact associated with the entity SMAP and show them in the Figure below.
We first conduct a causal analysis of the fact [$\text{SMAP} \xrightarrow{\text{created in}}\text{Japan}$]. Then, we consider a related question that depends on this fact, [$\text{SMAP} \xrightarrow{\text{created in}} \text{Japan} \xrightarrow{\text{language}} \text{Japanese}$], where the model should answer based on the fact. Finally, we adopt an unrelated fact [$\text{SMAP} \xrightarrow{\text{type of}}\text{seminal group}$] with its question. The results show that these facts are possibly located around the same five layers. However, as Ju and Zhang mentioned, the locating results for specific knowledge and its related knowledge chain should exhibit greater similarity compared to those for unrelated knowledge. Currently, causal analysis methods seem to locate only the area related to the entity itself, not the whole fact. Whether the model produces these answers by recalling text memorized from the pretraining corpus or via a multi-step reasoning mechanism is still unclear. This is strongly related to knowledge editing tasks. More broadly, better insight into models' knowledge processes could unlock capabilities like explainability and fact verification. However, fully understanding how exactly knowledge is organized and interconnected within such large models presents an ongoing challenge. Key open questions include developing methods to trace factual usage during reasoning, designing locating techniques that identify the knowledge most salient for model outputs, and learning how architectural properties relate to knowledge utilization. Unpacking these knowledge architectures will be integral to enabling more precise and robust model interventions through approaches like knowledge editing; currently, manipulating only the MLP weights is not enough.
The Implicit Knowledge Structure in LLMs
Understanding the knowledge structure in LLM is crucial for effective knowledge editing. Previous research often conceptualized knowledge within LLMs as resembling triples in Knowledge Graphs (KG), comprising subjects, relations, and objects. This analogy, while useful, simplifies the intricate nature of knowledge representation in LLMs.
Editing knowledge in a KG, where the task usually involves modifying a single relationship between two nodes, is comparatively straightforward. KGs inherently support easy reasoning tasks and allow for the preservation of the rest of the knowledge structure. This resilience is illustrated in Figure below, where edits and subsequent recovery processes result in the complete restoration of the original KG structure.
Comparison of editing effects on Knowledge Graphs vs. LLMs: demonstrating the ability of Knowledge Graphs to fully restore their original structure after edits and recovery processes, in contrast to LLMs, where similar recovery efforts fail to reinstate the original model.
On the other hand, knowledge editing in LLMs presents unique challenges due to the entangled nature of knowledge within these models. Unlike KGs, where knowledge is neatly compartmentalized, knowledge in LLMs is distributed across various parameters and layers, making it difficult to isolate and edit specific information without affecting other knowledge areas. The current perspective of viewing knowledge in LLMs as triples is somewhat limited and fails to capture the full complexity and interconnected nature of these models. This complexity is further highlighted by previous work, which discusses the challenges of modifying intrinsic knowledge within parameters.
Furthermore, previous research has revealed that knowledge editing in LLMs can lead to unintended propagation effects. Li et al. illustrates that current knowledge editing methods can result in knowledge conflict and knowledge distortion within LLMs. Unlike structured knowledge bases, neural networks lack strict constraints on knowledge structure and interrelationships. This makes it difficult to confine edits to a localized scope within the model, and the free-form nature of LLMs further complicates the editing process. Consequently, a more comprehensive understanding of the LM's mechanisms is required.
Currently, methods like T-Patcher or IKE offer plug-and-play functionality and easy reversibility. They provide flexibility and user-friendliness and can be easily integrated into or detached from the LLM as needed. These methods aim to mitigate some of the challenges associated with knowledge editing in LLMs, allowing for convenient and reversible modifications. As the field evolves, it is imperative to continue developing methods that not only address the challenges of knowledge editing but also harness the full potential of these complex systems, turning vanilla LLMs into WikiModels, i.e., neural knowledge bases that are feasible to edit.
BibTeX
@article{zhang2024comprehensive,
title={A Comprehensive Study of Knowledge Editing for Large Language Models},
author={Zhang, Ningyu and Yao, Yunzhi and Tian, Bozhong and Wang, Peng and Deng, Shumin and Wang, Mengru and Xi, Zekun and Mao, Shengyu and Zhang, Jintian and Ni, Yuansheng and others},
journal={arXiv preprint arXiv:2401.01286},
year={2024}
}
@article{wang2023easyedit,
title={EasyEdit: An Easy-to-use Knowledge Editing Framework for Large Language Models},
author={Wang, Peng and Zhang, Ningyu and Xie, Xin and Yao, Yunzhi and Tian, Bozhong and Wang, Mengru and Xi, Zekun and Cheng, Siyuan and Liu, Kangwei and Zheng, Guozhou and others},
journal={arXiv preprint arXiv:2308.07269},
year={2023}
}
@article{yao2023editing,
title={Editing Large Language Models: Problems, Methods, and Opportunities},
author={Yao, Yunzhi and Wang, Peng and Tian, Bozhong and Cheng, Siyuan and Li, Zhoubo and Deng, Shumin and Chen, Huajun and Zhang, Ningyu},
journal={arXiv preprint arXiv:2305.13172},
year={2023}
}