Twice a week, Data Points brings you the latest AI news, tools, models, and research in brief. In today’s edition, you’ll find:
- DeepSeek updates its V3 model with new skills and MIT license
- Reve Image 1.0 excels at text and typography design
- Qwen2.5-Omni tackles text, images, audio, and video
- Software developers have tried AI, but some like it better than others
But first:
Google’s latest AI model emphasizes reasoning over raw computation
Google launched Gemini 2.5 Pro, a new AI model that claims the top spot on the LMArena leaderboard. The model achieved state-of-the-art scores of 86 percent on MMLU-Pro and 83 percent on GPQA Diamond, plus 17.7 percent on Humanity’s Last Exam and 88 percent on AIME 2024. It features a 1 million token context window, native multimodality, and what Google calls “thinking capabilities” that help it analyze information and draw logical conclusions before responding. Gemini 2.5 Pro is Google’s largest and most capable reasoning model, now available as a free experimental model in Google AI Studio and in the Gemini app for Gemini Advanced users. (Google)
OpenAI unveils new image generation capabilities in GPT-4o
OpenAI integrated advanced image generation directly into GPT-4o, enabling precise text rendering, detailed multi-object scenes, and the ability to take cues from uploaded images and refine results through conversation with the chatbot. The model can create practical visuals like diagrams, logos, and infographics while maintaining photorealistic quality and following complex instructions for images with up to 20 distinct objects. The new capabilities proved so popular that within a few days of launch, OpenAI had to withdraw image generation from the free tier of ChatGPT. This native integration of image and language capabilities makes AI image generation more useful for real-world applications, though the system still has limitations with tasks like dense text rendering and precise image editing. (OpenAI)
DeepSeek’s latest model achieves double-digit gains across key benchmarks
DeepSeek released DeepSeek-V3-0324, a new version of its large language model that achieved significant improvements across multiple benchmarks, including a 19.8-point gain on the AIME mathematics test and better scores in reasoning and coding tasks. The model shows enhanced capabilities in Chinese language processing, web development, and function calling accuracy, making it more competitive, especially among models with open weights. (DeepSeek V3 now has an MIT license rather than a custom one.) These improvements demonstrate how rapidly AI models continue to advance in both specialized technical tasks and general language abilities. (Hugging Face)
AI startup Reve launches new image generation model
Reve unveiled Reve Image 1.0, a new image model designed to improve prompt understanding and visual output quality. The company claims its approach moves beyond simple pattern matching to create a “semantic intermediate representation” that both humans and machines can understand and manipulate. This launch signals growing competition in the AI image generation space, where companies increasingly focus on precise creative control and natural interaction rather than just technical capabilities. (Reve)
Qwen releases versatile multimodal AI model with streaming capabilities
Qwen launched Qwen2.5-Omni, a new AI model that processes text, images, audio, and video while generating real-time text and speech responses through its novel Thinker-Talker architecture. The 7-billion-parameter model outperforms similarly sized competitors across multiple benchmarks, including speech recognition, translation, and video understanding tasks. This release is a step toward comprehensive AI systems that can handle virtually any type of input and output, enabling more natural human-AI interactions. (GitHub)
Software developers split on AI’s impact in industry survey
A Wired survey of 730 software developers found that while most use AI coding tools, they disagree sharply about AI’s long-term impact on programming jobs. The majority of respondents use AI at least once a week and largely view AI as a productivity tool for automating repetitive tasks; only a small group predict that AI will fully replace human programmers. Mid-level engineers were more likely to be pessimistic about AI, while junior engineers were more likely to be optimistic. The survey suggests AI tools serve most professional developers as assistants for basic coding and analysis while leaving complex architecture and debugging decisions to humans. (Wired)
Still want to know more about what matters in AI right now?
Read this week’s issue of The Batch for in-depth analysis of news and research.
This week, Andrew Ng shared his thoughts on when fine-tuning small language models is truly necessary — and when simpler approaches like prompting or agentic workflows may be more effective and easier to maintain.
“While fine-tuning is an important and valuable technique, many teams that are currently using it probably could get good results with simpler approaches, such as prompting, few-shot prompting, or simple agentic workflows.”
Read Andrew’s full letter here.
Other top AI news and research stories we covered in depth: Google released Gemma 3, a family of compact vision-language models with open weights, enabling multimodal capabilities on a single GPU; researchers introduced shortcut models that generate high-quality diffusion images in fewer steps, improving speed without sacrificing performance; a study showed that GPT-4 can significantly enhance remote tutors’ effectiveness by providing real-time pedagogical support; and a new technique using pretrained embeddings like DINOv2 helped diffusion transformers learn faster, reducing training time while improving image quality.