Transformers
Recipe for Smaller, Capable Models: Mistral uses cascade distillation on Mistral Small 3.1 to build Ministral family
Mistral compressed Mistral Small 3.1 into much smaller models, yielding a family of relatively small, open-weights, vision-language models that, by some measures, outperform competing models of similar size. The method combines pruning with distillation.
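The article doesn't detail Mistral's exact recipe, but the two ingredients it names are standard: pruning removes low-importance weights from the large model, and distillation trains the smaller model to match the larger one's output distribution. The sketch below illustrates both in miniature, using magnitude-based pruning and a temperature-softened KL distillation loss; all function names and parameters here are illustrative, not Mistral's implementation.

```python
import numpy as np

def magnitude_prune(weights, keep_ratio):
    """Zero out the smallest-magnitude weights, keeping `keep_ratio` of them."""
    flat = np.abs(weights).ravel()
    k = int(len(flat) * keep_ratio)
    threshold = np.sort(flat)[-k] if k > 0 else np.inf
    mask = np.abs(weights) >= threshold
    return weights * mask

def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))

# Toy usage: prune a weight matrix, then score student logits against a teacher's.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))
W_pruned = magnitude_prune(W, keep_ratio=0.5)   # 8 of 16 weights survive

loss = distillation_loss(rng.normal(size=(1, 10)), rng.normal(size=(1, 10)))
```

In a real pipeline the pruned network would then be fine-tuned with this loss (often mixed with the ordinary cross-entropy loss) so the student recovers the teacher's behavior at a fraction of the parameter count.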