DeepSeek-V3 is the new best open model, plus all about OpenAI’s upcoming o-series models

Published Dec 27, 2024 · 3 min read
A diverse team of AI researchers gathers around a workstation to work on a robotics model

Twice a week, Data Points brings you the latest AI news, tools, models, and research in brief. In today’s edition, you’ll find:

  • Genesis uses generative AI in a physics-based robotics/world platform
  • Qwen presents QVQ, an open language/vision model
  • ModernBERT updates BERT as a classic classifier/retrieval workhorse
  • CodeLLM switches between models depending on your language and query

But first:

DeepSeek matches Sonnet 3.5/GPT-4o performance at lower costs

DeepSeek released DeepSeek-V3, a large language model with 671 billion total parameters, 37 billion of which are activated for each token. The model uses a low-cost Mixture-of-Experts architecture and novel techniques like multi-token prediction. DeepSeek-V3 outperforms other open-source models and rivals leading closed models on various benchmarks, while requiring only 2.8 million GPU hours for training. The model is available for commercial use under an MIT license and can be run locally using several open-source frameworks, as in the sketch below. (Hugging Face)
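
For readers who want to try the model locally, here is a minimal sketch using vLLM, one of the open-source frameworks that can serve the release. The model ID comes from the Hugging Face repository; the GPU count and sampling settings are illustrative assumptions, since the full 671B-parameter weights need a large multi-GPU node.

```python
# Minimal sketch: serving DeepSeek-V3 with vLLM (one of several open-source options).
# Assumes a node with enough GPU memory for the full weights; adjust
# tensor_parallel_size to the number of GPUs you actually have.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-V3",  # Hugging Face model ID
    trust_remote_code=True,           # the MoE architecture ships custom model code
    tensor_parallel_size=8,           # assumption: an 8-GPU node
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(
    ["Explain mixture-of-experts routing in two sentences."], params
)
print(outputs[0].outputs[0].text)
```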

OpenAI’s new reasoning models shatter benchmarks

OpenAI announced its latest AI reasoning models, o3 and o3-mini, which use a “private chain of thought” approach to simulate reasoning beyond what basic large language models can do. The o3 model achieved record-breaking scores on several benchmarks, including the ARC-AGI visual reasoning test and graduate-level academic exams. OpenAI plans to make these models available for public safety testing and research access, with o3-mini expected to launch in late January and o3 shortly after. (Ars Technica and Arc Prize)

Genesis combines physics simulation with generative AI for robotics

Genesis is a new physics simulation platform designed for robotics and embodied AI applications. The platform integrates a universal physics engine with generative AI capabilities to create realistic simulations across multiple modalities, including video, 3D scenes, and robotic motions. Genesis claims to deliver extremely fast simulation speeds, running up to 430,000 times faster than real time in certain scenarios. While the physics engine is now open source, the full generative framework will be released gradually. (GitHub)
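
The sketch below follows the pattern of the project’s quick-start: build a scene, add entities, then step the simulation. The robot asset path and step count are assumptions for illustration, and exact API details may differ between releases, so check the GitHub README before running.

```python
# Hedged sketch of a basic Genesis simulation loop, modeled on the project's
# quick-start example; verify names against the current README.
import genesis as gs

gs.init(backend=gs.cpu)              # use gs.gpu for GPU-accelerated simulation
scene = gs.Scene(show_viewer=False)  # headless; set True to open the viewer

plane = scene.add_entity(gs.morphs.Plane())
robot = scene.add_entity(
    gs.morphs.MJCF(file="xml/franka_emika_panda/panda.xml")  # assumed asset path
)

scene.build()
for _ in range(1000):  # advance the physics simulation 1,000 steps
    scene.step()
```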

Qwen team introduces multimodal/visual reasoning QVQ model

Qwen researchers developed QVQ, an open-weight model built on Qwen2-VL-72B that aims to enhance the base model’s visual understanding and problem-solving abilities. QVQ achieves a score of 70.3 on the MMMU benchmark and shows improvements on math-related tasks compared to its predecessor. The model excels at visual reasoning through step-by-step analysis, though it has limitations such as mixing up languages and hallucinating during multi-step reasoning. Qwen hopes the model will lead to more sophisticated problem-solving in fields that require complex visual and analytical thinking. (GitHub)
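
Since QVQ is built on Qwen2-VL-72B, loading it through Hugging Face transformers follows the same recipe as its base model; a hedged sketch is below. The model ID matches the published checkpoint, while the image path and prompt are placeholders.

```python
# Hedged sketch: step-by-step visual reasoning with QVQ via the Qwen2-VL recipe.
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info  # helper from the Qwen2-VL toolkit

model_id = "Qwen/QVQ-72B-Preview"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "geometry_problem.png"},  # placeholder image
        {"type": "text", "text": "Solve this problem step by step."},
    ],
}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
images, videos = process_vision_info(messages)
inputs = processor(text=[text], images=images, videos=videos,
                   padding=True, return_tensors="pt").to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=512)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])
```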

ModernBERT updates legendary BERT encoder models

Answer.AI and LightOn released ModernBERT, a new family of encoder-only models that outperform older BERT-style models on both speed and accuracy benchmarks. ModernBERT incorporates recent advances from large language models, including an 8,192-token context length, an improved architecture, and training on diverse data that includes code. The models aim to be drop-in replacements for BERT in applications like retrieval, classification, and entity extraction, offering better performance while maintaining the efficiency advantages of encoder-only models over larger generative models. (Hugging Face)
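
Because ModernBERT keeps the familiar encoder-only interface, it can be loaded through Hugging Face transformers like any BERT checkpoint; the sketch below uses the released base model for masked-token filling (a recent transformers version is assumed).

```python
# Minimal sketch: ModernBERT as a drop-in encoder for masked-token filling.
# Fine-tuning for classification or retrieval follows the usual BERT recipes.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="answerdotai/ModernBERT-base")
for pred in fill_mask("ModernBERT handles contexts up to 8,192 [MASK] long."):
    print(f"{pred['token_str']!r}: {pred['score']:.3f}")
```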

CodeLLM editor integrates multiple language models for coding

Abacus.AI released CodeLLM, an AI-powered code editor that helps developers write, review, and refactor code. CodeLLM provides access to multiple language models optimized for different coding tasks and automatically switches between them based on the language and query. Integrated models include Anthropic’s Claude 3.5 Sonnet, OpenAI’s o1, Qwen 72B, and others. The Visual Studio Code-based editor offers features like code completion, code chat, and integration with ChatLLM Teams for Git functionality and pull requests. CodeLLM is available as part of a $10 monthly subscription that includes access to ChatLLM’s broader AI capabilities. (Abacus.AI)


Still want to know more about what matters in AI right now?

Read this week’s special issue of The Batch for in-depth analysis of news and research looking back at 2024.

In this week’s letter to readers and learners, Andrew Ng highlighted the year’s rapid progress in AI technology and applications, emphasized the importance of staying at the cutting edge, and encouraged learning with DeepLearning.AI courses to remain relevant in the field.

“Consider this: GPT-4 was released March 2023. Since then, models have become much faster, cheaper, sometimes smaller, more multimodal, and better at reasoning, and many more open weight versions are available — so progress has been fantastic! (Claims that AI is ‘hitting a wall’ seem extremely ill-informed.) But more significantly, many applications that already were theoretically possible using the March 2023 version of GPT-4 — in areas such as customer service, question answering, and process automation — now have significant early momentum.”

Read Andrew’s full letter here.

Our special end-of-the-year review issue features five stories we covered in depth: LLMs evolved with agentic workflows, enabling autonomous reasoning and collaboration; AI price wars drove costs down as competition intensified; generative video models revolutionized content creation with stunning realism; compact AI models redefined efficiency, bringing advanced capabilities to everyday devices; and tech giants forged strategic partnerships as an alternative to acquisitions, securing essential talent and technology.


Subscribe to Data Points

Your accelerated guide to AI news and research