The practice of fine-tuning models on synthetic data is becoming well established. But synthetic training data, even if it represents the training task well, may include characteristics like toxicity that impart unwelcome properties to the trained model’s output, and it may represent desired traits, such as a target output length, inconsistently. Researchers developed a method that suppresses undesired characteristics of generated data while retaining desired ones.
What’s new: Luísa Shimabucoro and colleagues at Cohere introduced active inheritance, a fine-tuning method that automatically selects synthetic training examples that have desirable characteristics.
Key insight: A naive way to generate synthetic fine-tuning data is to feed prompts to a model, collect its output, and use that as the fine-tuning set. But synthetic data is cheap, so we can afford to be more choosy. By generating several responses to each prompt, we can select the one that best suits our purposes.
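A minimal sketch of this idea in Python. Here `generate` and `score` are hypothetical stand-ins, not the authors’ code: `generate` wraps a teacher model’s sampling call, and `score` rates one target characteristic (say, low toxicity), higher being better.

```python
def select_best_responses(prompts, generate, score, n=10):
    """For each prompt, sample n responses and keep the highest-scoring one."""
    dataset = []
    for prompt in prompts:
        # Sample n candidate responses from the teacher model.
        responses = [generate(prompt) for _ in range(n)]
        # Keep the response that scores best on the target characteristic.
        best = max(responses, key=score)
        dataset.append({"prompt": prompt, "response": best})
    return dataset
```

The selected pairs then serve as the fine-tuning set, so the selection pressure, not the raw sampling distribution, shapes what the student learns.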
How it works: The authors used Llama 2 7B and Mixtral 8x7B as both teachers and students in all combinations. They prompted the models with 52,000 prompts from the Alpaca dataset and used automated methods to evaluate their outputs in terms of characteristics including social bias, toxicity, word count, lexical diversity, and calibration (how well a model’s estimated probabilities match its accuracy).
- The authors generated 10 responses to each prompt.
- For each response, they measured social bias using StereoSet, CrowS-Pairs, and Bias Benchmark for Question-Answering; toxicity using Perspective API and their own code; and calibration following HELM. They used TextDescriptives to calculate text statistics such as word count and lexical diversity.
- They fine-tuned separate models on (i) the first response generated to each prompt, (ii) one response to each prompt selected at random, and (iii) for each desired characteristic, the response to each prompt that maximized it (see the sketch after this list).
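To make the three training conditions concrete, here is a hedged sketch in Python. The scorers below (word count, and a type-token ratio as a rough proxy for lexical diversity) are illustrative substitutes for the tools named above, and `build_finetuning_sets` is a hypothetical helper, not the authors’ code.

```python
import random

def word_count(text):
    # One of the measured characteristics: length in words.
    return len(text.split())

def type_token_ratio(text):
    # Rough lexical-diversity proxy: unique words divided by total words.
    tokens = text.lower().split()
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def build_finetuning_sets(responses_per_prompt, metric, seed=0):
    """Assemble the three conditions from several responses per prompt.

    responses_per_prompt maps each prompt to its list of generated responses.
    """
    rng = random.Random(seed)
    first, random_pick, best = [], [], []
    for prompt, responses in responses_per_prompt.items():
        first.append({"prompt": prompt, "response": responses[0]})
        random_pick.append({"prompt": prompt, "response": rng.choice(responses)})
        best.append({"prompt": prompt, "response": max(responses, key=metric)})
    return first, random_pick, best
```

To minimize a characteristic such as toxicity rather than maximize one, the metric can simply return a negated score, so that `max` picks the least-toxic response.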
Results: Fine-tuning on the best response for each characteristic improved performance on that characteristic more than fine-tuning on the initial outputs or on randomly selected outputs did.
- The authors’ method helped Mixtral 8x7B to generate less-toxic responses. For example, before fine-tuning, the model’s expected maximum toxicity measured 65.2 (lower is better). After fine-tuning on the lowest-toxicity responses generated by Llama 2 7B, Mixtral 8x7B’s expected maximum toxicity fell to 43.2. Conversely, after fine-tuning on random responses generated by Llama 2 7B, its expected maximum toxicity rose to 70.3.
- It also helped Llama 2 7B to cut its toxicity. Before fine-tuning, the model’s expected maximum toxicity was 71.7. After fine-tuning on its own least-toxic responses, expected maximum toxicity dropped to 50.7. Fine-tuning on random responses cut its expected maximum toxicity less sharply, to 68.1.
- On more typical measures of performance, fine-tuning on the least-toxic responses and fine-tuning on random responses had about the same effect across seven benchmarks. Fine-tuning Llama 2 7B on its own least-toxic responses raised its average accuracy from 59.97 percent to 60.22 percent, while fine-tuning it on random responses raised its average accuracy from 59.97 percent to 61.05 percent.
- However, the process degraded performance in some cases. Fine-tuning Mixtral 8x7B on the least-toxic Llama 2 7B responses decreased its average accuracy across seven question-answering and common-sense reasoning benchmarks from 70.24 percent to 67.48 percent. Fine-tuning it on random Llama 2 7B responses cut its average accuracy from 70.24 percent to 65.64 percent.
Why it matters: Training on synthetic data is becoming increasingly common. While it shows great promise, best practices for data generation are still being formulated. The authors’ method helps by automatically steering models toward generating more desirable responses, reducing negative traits and reinforcing positive traits.
We’re thinking: Lately, knowledge distillation has yielded models that are both more capable and more compact. This approach adds levers of fine control to that technique.