Twice a week, Data Points brings you the latest AI news, tools, models, and research in brief. In today’s edition, you’ll find:
- Researchers develop a highly capable language diffusion model
- Google’s hypothesis-making agent is your new research partner
- A new family of vision-language models for low-resource devices
- HP will brick Humane’s Ai Pin and repurpose its tech for new devices
But first:
Microsoft unveils Muse, a generative AI model for video games
Microsoft Research introduced Muse, a World and Human Action Model (WHAM) that can generate both game visuals and controller actions. Trained on over one billion images and controller actions from the Xbox game Bleeding Edge, Muse shows strong consistency, diversity, and persistence when generating gameplay sequences. Microsoft is making Muse’s weights, sample data, and a demonstrator tool open source to help researchers explore and build upon this technology for creative applications in game development. Microsoft claims Muse is the first generative joint world-action model able to generate complete game dynamics: video and controller actions that respond to one another. (Microsoft Research and Nature)
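To make the “joint world-action” idea concrete, here’s a minimal sketch of how such a model might extend a gameplay sequence, assuming a single sequence model over interleaved frames and actions. All names here are hypothetical; Microsoft has not published this interface.

```python
# Hypothetical rollout loop for a joint world-action model like Muse.
# `model.predict_next` is an assumed placeholder that returns the next
# (frame, action) pair given the interleaved history; it is not part of
# Microsoft's released tooling.

def rollout(model, context_frames, context_actions, steps=10):
    frames, actions = list(context_frames), list(context_actions)
    for _ in range(steps):
        frame, action = model.predict_next(frames, actions)
        frames.append(frame)    # next visual, conditioned on actions so far
        actions.append(action)  # next controller input, conditioned on visuals so far
    return frames, actions
```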
Perplexity remakes DeepSeek-R1 reasoning model
Perplexity released R1 1776, an open-weight, MIT-licensed version of the DeepSeek-R1 model fine-tuned to provide accurate information on topics censored by the Chinese government. The company retrained the model on a dataset of 40,000 prompts and detailed answers about sensitive topics, aiming to preserve its chain-of-thought reasoning while removing built-in censorship. The release (on Hugging Face and Perplexity’s Sonar API) gives developers access to a powerful open model that can engage with a broader range of topics without political restrictions. (Perplexity)
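Since the weights are on Hugging Face, loading them should follow the standard transformers pattern. A minimal sketch, assuming the repo id below (check Perplexity’s Hugging Face page for the exact name) and noting that a model of DeepSeek-R1’s size needs substantial multi-GPU hardware:

```python
# Sketch of loading R1 1776 with Hugging Face transformers.
# The repo id is an assumption based on the announcement, not verified.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "perplexity-ai/r1-1776"  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "What happened in Tiananmen Square in 1989?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```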
LLaDA challenges autoregressive models as foundation for large language models
Researchers at Renmin University of China introduced LLaDA, a diffusion model trained to generate tokens in a nonlinear order that achieves performance similar to top autoregressive language models. LLaDA uses a masked diffusion approach and demonstrates strong scalability, in-context learning, and instruction following (after supervised fine-tuning). The 8-billion-parameter model outperformed GPT-4o on a reversal reasoning task and showed promise in areas like multi-turn dialogue generation. This work establishes diffusion models as a viable alternative to autoregressive ones, offering advantages like bidirectional modeling and consistent performance on both forward and reverse tasks without sacrificing general language understanding. (GitHub and arXiv)
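The sampling idea differs from left-to-right decoding: start from a fully masked sequence, predict every position in parallel, keep the most confident predictions, and remask the rest until no masks remain. A toy sketch of that loop (illustrative only, not LLaDA’s released code; `predict` stands in for the model’s per-position token distributions):

```python
import numpy as np

def masked_diffusion_decode(predict, length, steps, mask_id=-1):
    # Start fully masked, then fill in tokens over `steps` rounds.
    seq = np.full(length, mask_id)
    for step in range(steps, 0, -1):
        masked = seq == mask_id
        if not masked.any():
            break
        probs = predict(seq)               # assumed shape: (length, vocab_size)
        tokens = probs.argmax(axis=-1)     # most likely token per position
        conf = probs.max(axis=-1)
        conf[~masked] = -np.inf            # only fill still-masked positions
        # Unmask roughly 1/step of the remaining positions each round,
        # keeping the highest-confidence predictions.
        n_fill = max(1, int(np.ceil(masked.sum() / step)))
        fill = np.argsort(-conf)[:n_fill]
        seq[fill] = tokens[fill]
    return seq
```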
Google’s co-scientist hopes to accelerate scientific discoveries
Google introduced an AI co-scientist system built on Gemini 2.0, designed to generate novel research hypotheses from natural language prompts across multiple scientific domains. The system uses specialized AI agents to iteratively refine ideas through processes modeled on the scientific method, including hypothesis generation, ranking, and evolution. Google’s co-scientist outperformed other models on complex research goals as rated by domain experts, and preliminary laboratory experiments validated some of its novel predictions in areas like drug repurposing and antimicrobial resistance. Future versions may add improved literature review, factuality checking, cross-checks with external tools, and other capabilities. (Google Research)
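Google describes the multi-agent pipeline only at a high level, but the control flow resembles a generate-rank-evolve loop. An illustrative outline with placeholder functions (none of these names come from Google’s system):

```python
# Placeholder sketch of an iterative hypothesis-refinement loop.
# `llm` is assumed to be a callable mapping a prompt string to a reply string.

def co_scientist_loop(research_goal, llm, rounds=3, pool_size=8):
    hypotheses = [llm(f"Propose a testable hypothesis for: {research_goal}")
                  for _ in range(pool_size)]
    for _ in range(rounds):
        # Ranking step: Google describes tournament-style comparisons by a
        # ranking agent; a single scalar score is used here for brevity.
        scored = sorted(
            hypotheses,
            key=lambda h: float(llm(f"Rate 0-10 for '{research_goal}': {h}")),
            reverse=True,
        )
        top = scored[: pool_size // 2]
        # Evolution step: refine the strongest candidates and refill the pool.
        hypotheses = top + [llm(f"Improve this hypothesis: {h}") for h in top]
    return hypotheses[0]
```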
SmolVLM2 updates small, efficient video understanding models
Researchers at Hugging Face, Stanford, and elsewhere released SmolVLM2, an updated family of compact but powerful video-language models in 2.2-billion, 500-million, and 256-million-parameter sizes. The models can run on devices from phones to servers and perform well on benchmarks like Video-MME while using less memory than larger models. The team demonstrated SmolVLM2’s capabilities through applications like an iPhone app for local video analysis, VLC media player integration for semantic video navigation, and a video highlight generator. These models could enable new vision applications on a wide range of low-resource devices, potentially transforming how local models are used to interact with and analyze video content. (Hugging Face)
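Running one of these checkpoints locally should follow the usual transformers vision-language pattern. A minimal sketch, assuming the repo id below and chat-template video support (details may differ; check the model card on Hugging Face for exact usage):

```python
# Sketch of querying a SmolVLM2 checkpoint about a video file.
# Repo id and message format are assumptions based on Hugging Face conventions.
import torch
from transformers import AutoProcessor, AutoModelForImageTextToText

model_id = "HuggingFaceTB/SmolVLM2-2.2B-Instruct"  # assumed repo id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{
    "role": "user",
    "content": [
        {"type": "video", "path": "clip.mp4"},
        {"type": "text", "text": "Summarize what happens in this clip."},
    ],
}]
inputs = processor.apply_chat_template(
    messages, add_generation_prompt=True, tokenize=True,
    return_dict=True, return_tensors="pt",
).to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(output, skip_special_tokens=True)[0])
```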
HP buys Humane’s AI tech as ambitious wearable device flops
Humane, a startup that created the Ai Pin wearable device, agreed to sell its AI capabilities, software platform, and intellectual property to HP for $116 million, substantially less than the total funding it had raised. The Ai Pin, which aimed to replace smartphones with a clip-on device controlled by voice commands and laser projections, failed to meet sales expectations and drew criticism for performance issues. HP plans to integrate Humane’s technology into its products, focusing on building an “intelligent ecosystem” and enhancing AI-powered capabilities across its lineup of computers and services. (Axios)
Still want to know more about what matters in AI right now?
Read this week’s issue of The Batch for in-depth analysis of news and research.
This week, Andrew Ng shared a powerful story about how AI saved a police officer’s life, highlighting the impact of Skyfire AI’s drone technology in emergency response.
“Fortunately, because the drone had pinpointed the location of the officer and his assailant, dispatch was able to direct additional units to assist. The first arrived not in 5-7 minutes but in 45 seconds.”
Read Andrew’s full letter here.
Other top AI news and research stories we covered in depth: xAI unveiled Grok 3, a new model family trained at scales beyond its predecessors; Replit updated its mobile app to enable full app development using its AI agent; Elon Musk’s $97.4 billion bid for OpenAI was rejected, intensifying the power struggle between Musk and the company’s leadership; and global leaders at the latest AI summit showed deep divisions over regulation and governance.