Qwen2 tops leaderboards for open LLMs
Plus, Kling, a new Chinese rival to Sora

Published Jun 14, 2024

This week’s top AI stories included:

• Nvidia’s AI toolkit for Windows developers
• AMD’s new competitor to Nvidia’s H100
• TowerLLM, a translation model its developer says beats GPT-4o
• Google’s updated smart notebook app

But first:

Qwen2’s multilingual models show improved coding and math capabilities
Alibaba’s Qwen2, a family of AI models in five sizes ranging from half a billion to 72 billion parameters, was trained on data in 29 languages. According to Alibaba, Qwen2 achieves state-of-the-art results on a range of benchmarks, with significant improvements in coding and mathematics over both its predecessor, Qwen1.5, and comparable open models like Llama 3. Qwen2 models also support context lengths of up to 128K tokens. The models are open source, with all but the largest released under an Apache 2.0 license. (GitHub)

Kuaishou’s new video generation model Kling draws comparisons to OpenAI’s Sora
Kling generates highly realistic, detailed videos up to 2 minutes long at 1080p resolution from text prompts, rivaling the quality of OpenAI’s invitation-only Sora. Kling reportedly uses a 3D variational autoencoder (VAE) to reconstruct detailed faces and bodies from a single image, along with a 3D spatiotemporal joint attention mechanism that handles complex scenes and movements while producing physically plausible motion. Although it is currently accessible only to users with Chinese phone numbers, Kling is generating excitement among AI enthusiasts and filmmakers and may pressure U.S.-based providers of AI video models to release their offerings sooner. (Kuaishou)

Nvidia launches RTX AI Toolkit for Windows developers
Nvidia’s RTX AI Toolkit is a free set of tools and software development kits (SDKs) that lets Windows developers integrate customized AI models into their applications. The toolkit simplifies fine-tuning pretrained models, optimizing them for performance on various Nvidia GPUs, and deploying them locally or in the cloud using the Nvidia AI Inference Manager (AIM) SDK. Particularly noteworthy are Nvidia’s TensorRT tools: the TensorRT Model Optimizer can quantize models to be up to three times smaller without significantly reducing accuracy. (Nvidia)

AMD unveils MI325X AI accelerator, outlines future MI350 and MI400 series
At Computex, AMD announced its new Instinct MI325X accelerators, set for release in late 2024. The MI325X features up to 288GB of memory and, according to AMD, delivers 1.3x the inference performance of Nvidia’s H100. AMD also revealed plans for next year’s MI350 series, based on the CDNA 4 architecture, which it says will offer up to a 35-fold increase in AI inference performance over the current MI300 series, as well as the MI400 series, set to launch in 2026. AMD is about a year behind Nvidia in its current generation of AI accelerators, but if it maintains this annual release schedule and can meet demand, it could remain a competitive option. (AMD)

Unbabel claims its TowerLLM AI model outperforms GPT-4o in language translation
Unbabel tested its model against various AI systems, including those from OpenAI, Google, and DeepL, and found that TowerLLM performed better in most cases, especially on domain-specific translations. Unbabel attributes TowerLLM’s success to training on multilingual text and to fine-tuning on high-quality translations between language pairs, which were curated by another AI model, CometKiwi. If these results hold up, they show a smaller language model, trained and fine-tuned for a single task, outperforming the largest and most powerful general-purpose models. (Fortune)

Google’s NotebookLM expands capabilities with new data sources and Gemini 1.5 Pro
Google updated its note-taking app, NotebookLM, with new features that allow users to upload a wider variety of sources, including Google Slides and web URLs. The app now also includes a Notebook Guide that can create study guides, FAQs, or briefing documents based on the uploaded sources, and it can answer questions about charts, images, and diagrams using Google’s latest large language model, Gemini 1.5 Pro. NotebookLM is designed to help researchers, students, and writers organize and analyze information, but it has found additional use cases in grant writing, software development, and even in preparing Dungeons & Dragons campaigns. (The Verge)

Still want to know more about what matters in AI right now?

If you missed it, read last week’s issue of The Batch for in-depth analysis of news and research.

This week, Andrew Ng discussed degrees of agentic design and how to keep work on agentic systems inclusive:

“More and more people are building systems that prompt a large language model multiple times using agent-like design patterns. But there’s a gray zone between what clearly is not an agent (prompting a model once) and what clearly is (say, an autonomous agent that, given high-level instructions, plans, uses tools, and carries out multiple, iterative steps of processing). Rather than arguing over which work to include or exclude as being a true agent, we can acknowledge that there are different degrees to which systems can be agentic. Then we can more easily include everyone who wants to work on agentic systems.”

Read Andrew's full letter.

Other top AI news and research stories we covered in depth included Apple’s generative AI strategy, Stability AI's enhanced text-to-audio generator, the results of the AI Seoul Summit and the AI Global Forum, and Google's AMIE, a chatbot that outperformed doctors in diagnostic conversations.
