Open Models for Math and Audio
Alibaba advances open-weight LLMs with Qwen2 Math and Audio variants

[Chart: How Qwen2-Audio performs against the competition.]

Alibaba followed up its open-weights Qwen2 large language models with specialized variations.

What’s new: Qwen2-Math and Qwen2-Audio are model families devoted to, respectively, solving math problems and generating text directly from audio. Both set new states of the art in a variety of English and Chinese benchmarks, and some versions offer open weights. Notably, Qwen2-Math-Instruct-72B, whose 72 billion parameters are fine-tuned according to human preferences, outperformed top models including Claude 3.5 Sonnet, Gemini 1.5 Pro, GPT-4o, and Llama 3.1 405B on some math benchmarks.

Math mavens: Qwen2-Math models include pretrained and instruction-tuned variations that comprise 1.5 billion, 7 billion, and 72 billion parameters. The license for the largest version is free for noncommercial development and for commercial developers with fewer than 100 million monthly active users.

  • How it works: Qwen2-Math base models were initialized to Qwen2 weights and further pretrained on a corpus of math articles, books, exams, and data generated by Qwen2. The instruction-tuned versions were fine-tuned on more model-generated data using supervised learning followed by a reinforcement learning algorithm called group relative policy optimization. The team removed examples that significantly overlapped benchmark test sets and prominent math competitions.
  • Results: Using few-shot, chain-of-thought prompting, Qwen2-Math-Instruct-72B achieved state-of-the-art performance on English math benchmarks including MATH and Chinese math benchmarks including CMATH, GaoKao Math Cloze, and GaoKao Math QA. (The 72 billion-parameter Qwen2-Math achieved state-of-the-art scores on GSM8K and MMLU-STEM.) Qwen2-Math-Instruct-72B also outperformed Claude 3 Opus, GPT-4 Turbo, Gemini 1.5 Pro, and Gemini Math-Specialized 1.5 Pro on the AIME 2024 math competition in some settings. The smaller, instruction-tuned versions outperformed other models of the same size by some measures.
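The group relative policy optimization (GRPO) step mentioned above scores each sampled solution relative to other solutions sampled for the same problem, so no learned value network is needed. Here is a minimal sketch of the group-relative advantage computation; the function name and the 0/1 grading scheme are illustrative assumptions, not details from the Qwen2-Math report.

```python
from statistics import mean, pstdev

def grpo_advantages(rewards):
    """Group-relative advantages for one group of sampled answers.

    Each sampled solution's reward is compared with the mean reward
    of its own group and normalized by the group's standard deviation,
    replacing the value network used by PPO-style methods.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards) or 1.0  # guard against a zero-variance group
    return [(r - mu) / sigma for r in rewards]

# Illustrative example: four sampled solutions to one math problem,
# graded 1.0 (correct) or 0.0 (incorrect) by an automatic checker.
advantages = grpo_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct answers receive positive advantages and incorrect ones negative, so the policy gradient pushes probability mass toward the group's better samples.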

Audio/text to text: A revision of the earlier Qwen-Audio, Qwen2-Audio takes text and audio inputs and generates text outputs. It’s designed to (i) provide text chat in response to voice input including voice transcription and translation between eight languages and (ii) discuss audio input including voice, music, and natural sounds. Weights (8.2 billion parameters) are available for base and instruction-tuned versions. You can try it here.

  • How it works: Given a text prompt and audio, a Whisper-large-v3 audio encoder embeds the audio, and a pretrained Qwen-7B language model uses the text prompt and audio embedding to generate text. The team further pretrained the system to predict the next text token based on a text-audio dataset that included 370,000 hours of recorded speech, 140,000 hours of music, and 10,000 hours of other sounds. They fine-tuned the system for chat in a supervised fashion and for factuality and prompt adherence using direct preference optimization (DPO). You can read the technical report here.
  • Results: Qwen2-Audio outperformed previous state-of-the-art models on benchmarks that evaluate speech recognition (Librispeech, AISHELL-2, FLEURS-ZH), speech-to-text translation (CoVoST2), and audio classification (VocalSound), as well as AIR-Bench tests that evaluate interpretation of speech, music, sound, and mixed-audio soundscapes.
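The DPO step described above trains the policy directly on preference pairs, without a separate reward model. A minimal sketch of the per-pair loss follows; the function signature and beta value are illustrative assumptions, not settings disclosed for Qwen2-Audio.

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss for a single preference pair.

    Inputs are log-probabilities of the preferred (chosen) and
    dispreferred (rejected) responses under the policy being trained
    (pi_*) and under a frozen reference model (ref_*). The loss is
    -log sigmoid of the beta-scaled difference in log-ratio margins.
    """
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))
```

The loss shrinks as the policy assigns relatively more probability to the chosen response than the reference model does, which is what nudges the model toward factual, prompt-adherent replies.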

Why it matters: Qwen2 delivered extraordinary performance with open weights, putting Alibaba on the map of large language models (LLMs). These specialized additions to the family push forward math performance and audio integration in AI while delivering state-of-the-art models into the hands of more developers. 

We’re thinking: It’s thrilling to see models with open weights that outperform proprietary models. The white-hot competition between open and closed technology is good for everyone!
