Twice a week, Data Points brings you the latest AI news, tools, models, and research in brief. In today’s edition, you’ll find:
- Google’s two new Gemini vision-language-action robotics models
- Cohere’s Command A, another lightweight LMM
- New China regulations require mandatory labels for AI content
- Monitoring reasoning models for reward hacking or unwanted behavior
But first:
Baidu releases ERNIE 4.5 and ERNIE X1 models
Baidu launched its latest foundation models, ERNIE 4.5 and ERNIE X1, with free access for individual users through ERNIE Bot’s website. ERNIE 4.5 is a multimodal model integrating text, images, audio, and video, while ERNIE X1 is a deep-thinking reasoning model with enhanced planning and problem-solving capabilities. Enterprise users and developers can try both models free on Baidu’s ERNIE bot website, or access ERNIE 4.5 on Baidu AI Cloud’s Qianfan with pricing starting at RMB 0.004 per thousand tokens. ERNIE 4.5 reportedly outperforms GPT-4.5 on multiple benchmarks at 1 percent of the price, while ERNIE X1 (available via API soon) offers performance comparable to DeepSeek-R1, with input prices of RMB 0.002 per thousand tokens and output prices of RMB 0.008 per thousand tokens, about half the price. (PR Newswire)
OLMo 2 32B launches as a high-performing, fully open model
AI2 released OLMo 2 32B, the largest model in their OLMo 2 lineup. This model, trained on trillions of tokens and post-trained with Tulu 3.1, competes with leading open weight models (Qwen 2.5 72B, Llama 3.1 and 3.3 70B) and outperforms GPT-3.5 Turbo and GPT4o-mini on various academic benchmarks. Developers and researchers could gain from OLMo 2 32B’s open code and open data availability, allowing them to study and customize advanced model pipelines and experiment with multimodal input. (Allen AI)
Google DeepMind introduces Gemini models for robotics
Google DeepMind unveiled Gemini Robotics and Gemini Robotics-ER, two AI models based on Gemini 2.0, designed to enhance robotic capabilities in the physical world. Gemini Robotics incorporates vision-language-action abilities for direct robot control, and it adapts easily to various robot types, while Gemini Robotics-ER focuses on spatial reasoning and can connect with existing low-level robot controllers. These new AI models aim to improve generalization, interactivity, and dexterity in robots, making them better suited for real-world tasks and applications. (Google)
Cohere’s new model offers long contexts and language support for enterprise
Cohere released Command A, a 111 billion parameter generative model that aims to deliver strong performance for enterprise applications using minimal hardware. Command Al matches or exceeds the performance of GPT-4o and DeepSeek-V3 on tasks such as instruction following and multilingual capabilities while requiring only two A100 or H100 GPUs for deployment. Command A supports the 23 languages spoken by most of the world’s population, and can be deployed locally or via API for $2.50/million input tokens and $10/million output tokens. (Cohere)
China introduces extensive AI content labeling regulations
China will require explicit labeling of AI-generated content under new regulations taking effect on September 1. The new regulations were announced by the Cyberspace Administration of China and other agencies. The rules mandate visible and metadata-based labels for AI-generated text, images, audio, video, and virtual content, and require service providers to ensure compliance with cybersecurity regulations. These measures aim to curb misinformation and enhance online transparency, similar to regulatory efforts in the EU and U.S. (South China Morning Post)
How to use LLMs to detect misbehavior in reasoning models
Open AI researchers showed that they can use large language models to monitor the chain-of-thought (CoT) reasoning of frontier models to detect misbehavior. They found that when these reasoning models exploit or circumvent tasks using “bad thoughts” (unwanted steps in the reasoning chain or attempts at reward hacking), penalizing these thoughts led models to hide their intentions, rather than ceasing misbehavior. Findings suggest that CoT monitoring can provide valuable insights for overseeing advanced AI models in the future, but the researchers recommend against strong optimization pressure on CoTs as it risks making models’ intentions less transparent. (OpenAI)
Still want to know more about what matters in AI right now?
Read last week’s issue of The Batch for in-depth analysis of news and research.
Last week, Andrew Ng defended the importance of learning to code, arguing that as AI-assisted coding makes programming easier, more people should code—not fewer. He pushed back against claims that programming will become obsolete, arguing that understanding the “language of software” empowers individuals to work effectively with AI tools and maximize their impact.
“One question I’m asked most often is what someone should do who is worried about job displacement by AI. My answer is: Learn about AI and take control of it, because one of the most important skills in the future will be the ability to tell a computer exactly what you want, so it can do that for you. Coding (or getting AI to code for you) is the best way to do that.”
Read Andrew’s full letter here.
Other top AI news and research stories we covered in depth: QwQ-32B emerged as a strong contender against DeepSeek-R1 and other larger reasoning models, challenging the dominance of high-parameter architectures with compact reasoning; Microsoft’s Phi-4 Multimodal model offered simultaneous processing of text, images, and speech; a U.S. court ruling rejected the fair use defense in the Thomson Reuters AI lawsuit, citing Ross's attempt to use copyrighted material to build a competing product; and Perplexity launched an uncensored version of DeepSeek-R1, raising discussions about AI safety and adapting open language models.