Reducing Memorization in LLMs: A technique that masks tokens in large language models, protecting data privacy

Studies have established that large language models can memorize text passages they saw repeatedly during training and regurgitate them when prompted in adversarial and, more rarely, benign ways. Researchers proposed a way to reduce this tendency and the attendant risks to intellectual property and privacy.

What’s new: Abhimanyu Hans and colleagues at the University of Maryland introduced the goldfish loss, a modification of the next-token-prediction loss function typically used to train large language models. The goldfish loss discourages memorization of long passages by masking some tokens during the loss computation.

Key insight: Certain passages may appear many times during training, either because the model takes multiple passes over the data or because they’re duplicated in the training corpus. Randomly masking individual tokens from the loss computation doesn’t prevent a model from memorizing repeated passages because, over many repetitions, the model still sees every token and its position in the sequence. But masking a long passage in the same way on every repetition ensures that the model never computes a loss on the masked tokens, so it can’t memorize the passage verbatim, regardless of the number of repetitions.

How it works: The goldfish loss masks the current token from the loss computation based on the preceding tokens. A deterministic hash of the previous 13 tokens decides whether the current token is masked, so the decision is effectively random the first time a particular 13-token sequence appears but identical every time the same sequence appears again. Overall, it masks a fixed fraction of tokens, typically one in three or four (a minimal sketch of the masking rule follows the list below). The authors compared the goldfish loss to the next-token-prediction loss function in two settings: one that mimicked a typical training process and one that made memorization more likely.

  • For the typical training process, the authors trained TinyLlama-1.1B for one epoch on a subset of RedPajama, a de-duplicated dataset of text scraped from the web. To provide duplicate text, they added 2,000 sequences from Wikipedia, each repeated 50 times.
  • To promote memorization, they fine-tuned a pretrained Llama 2 7B for 100 epochs on 100 Wikipedia articles.
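
To make the masking rule concrete, here is a minimal sketch in PyTorch of how a hash-based goldfish mask and the resulting loss could be computed. The 13-token context width and the one-in-k drop rate follow the description above, but the function names, the choice of SHA-256, and other details are illustrative assumptions rather than the authors’ exact implementation.

```python
import hashlib

import torch
import torch.nn.functional as F


def goldfish_mask(token_ids, k=4, context_width=13):
    """Return a boolean mask: True = keep the token in the loss, False = drop it.

    Whether a position is dropped depends only on a hash of the preceding
    `context_width` token IDs, so roughly 1 in k tokens is dropped -- and a
    repeated passage is always masked in exactly the same way.
    """
    mask = torch.ones(len(token_ids), dtype=torch.bool)
    for i in range(context_width, len(token_ids)):
        context = token_ids[i - context_width:i].tolist()
        digest = hashlib.sha256(str(context).encode()).digest()
        if int.from_bytes(digest[:8], "big") % k == 0:
            mask[i] = False  # this token never contributes to the loss
    return mask


def goldfish_loss(logits, token_ids, k=4):
    """Next-token cross-entropy that skips the goldfish-masked positions."""
    mask = goldfish_mask(token_ids, k=k)
    # logits at position i predict the token at position i + 1
    losses = F.cross_entropy(logits[:-1], token_ids[1:], reduction="none")
    keep = mask[1:].float()
    return (losses * keep).sum() / keep.sum()
```

Everything else about training is unchanged; the mask is simply recomputed for each training sequence inside the usual loop.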

Results: The authors assessed the results using two metrics (sketched in code after the list below): (i) ROUGE-L, which ranges from 0 to 100 percent and reflects the length of the longest subsequence shared between the ground-truth and generated text, and (ii) the percentage of tokens that exactly matched the original text in order. Both measure memorization, so lower scores are better.

  • In the typical setting, the model trained using the next-token-prediction loss memorized heavily, while the model trained with the goldfish loss memorized just a little bit.
  • In the setting that promoted memorization, the model trained using the next-token-prediction loss exactly matched 85 percent of the tokens in the Wikipedia articles and achieved 96 percent ROUGE-L. The model using the goldfish loss exactly matched 0 percent of the Wikipedia tokens and achieved 51 percent ROUGE-L.
  • Both models achieved similar performance on six common-sense reasoning and question-answering tasks, indicating that the goldfish loss didn’t hinder accuracy on those tasks.
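
For reference, a simple way to compute the two metrics on a pair of token sequences might look like the sketch below. The exact ROUGE-L variant and normalization the authors used may differ, so treat this as illustrative.

```python
def exact_match_rate(generated, reference):
    """Percentage of reference tokens that the generated text reproduces in the same position."""
    matches = sum(g == r for g, r in zip(generated, reference))
    return 100.0 * matches / len(reference)


def rouge_l(generated, reference):
    """ROUGE-L recall: longest common subsequence length over reference length, in percent."""
    m, n = len(generated), len(reference)
    lcs = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if generated[i] == reference[j]:
                lcs[i + 1][j + 1] = lcs[i][j] + 1
            else:
                lcs[i + 1][j + 1] = max(lcs[i][j + 1], lcs[i + 1][j])
    return 100.0 * lcs[m][n] / n
```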

Why it matters: Businesses are worried about whether using LLMs poses risks to intellectual property rights and privacy. Techniques that address this concern without significantly impacting performance are welcome.

We’re thinking: Memorization also happens in models generating images. We look forward to research into using similar techniques in that domain.
