Google adds Thinking Mode to Flash 2.0; OpenAI’s o1 now available in the API

Published: Dec 20, 2024
Reading time: 3 min read

Twice a week, Data Points brings you the latest AI news, tools, models, and research in brief. In today’s edition, you’ll find:

  • Speech-to-speech models fall short on benchmarks
  • Backlash against misleading news summaries
  • Google updates its video and image models
  • Nvidia’s $250 palm-sized computer for AI developers

But first:

OpenAI’s top o1 model priced the same as o1-preview

OpenAI rolled out o1, a new reasoning model designed for complex multi-step tasks, to developers on usage tier 5. In the API, o1 costs $15/$60 per million input/output tokens, with a 50 percent discount for cached input tokens. OpenAI reports that o1-2024-12-17, the latest version, sets new state-of-the-art results on several benchmarks, improving cost-efficiency and performance over its predecessor, o1-preview. (OpenAI)
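For developers on tier 5, calling o1 looks like any other chat completion. Here is a minimal sketch, assuming the official openai Python client and the model identifier cited above; the prompt and the max_completion_tokens budget are illustrative values, not OpenAI recommendations:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Ask the reasoning model for a multi-step plan.
# Model name taken from the announcement; the token budget is illustrative.
response = client.chat.completions.create(
    model="o1-2024-12-17",
    messages=[
        {
            "role": "user",
            "content": "Plan a three-step strategy to debug a flaky integration test.",
        }
    ],
    max_completion_tokens=4096,  # reasoning tokens are billed as output tokens
)

print(response.choices[0].message.content)
```

At the listed rates, a call that consumes 2,000 input tokens and 10,000 output tokens (including hidden reasoning tokens) would cost roughly $0.03 + $0.60, or about $0.63.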

Google’s AI gets introspective with new Thinking Mode

Google introduced an even more experimental version of Gemini 2.0 Flash called Thinking Mode, designed to generate the model’s “thinking process” as part of its response. The new feature is available through Google AI Studio and the Gemini API, with developers able to access the model’s thoughts via specific API calls or through a dedicated panel in the Studio interface. While Thinking Mode offers enhanced reasoning capabilities, it comes with limitations such as a 32k token input limit and text-only output. (Google)
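For developers who want to try it through the Gemini API, here is a minimal sketch using the google-generativeai Python package. The model identifier gemini-2.0-flash-thinking-exp and the way the thinking text is read back are assumptions based on the announcement, so check the current Gemini API docs before relying on them:

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Model name is an assumption based on the announcement; verify it in AI Studio.
model = genai.GenerativeModel("gemini-2.0-flash-thinking-exp")

response = model.generate_content(
    "A bat and a ball cost $1.10 together; the bat costs $1.00 more than the ball. "
    "What does the ball cost?"
)

# The response may interleave the model's "thinking" with its final answer;
# this simply prints every text part that comes back.
for part in response.candidates[0].content.parts:
    print(part.text)
```

Keep the announced limits in mind: input is capped at 32k tokens and the model responds with text only.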

New audio benchmark shows performance gap in speech reasoning

Artificial Analysis released Big Bench Audio, a dataset and benchmark test for evaluating audio language models’ reasoning capabilities. The dataset adapts questions from Big Bench Hard into the audio domain, covering topics like formal fallacies and object counting. Initial results show a significant “speech reasoning gap,” with GPT-4o’s accuracy dropping from 92 percent in text-only format to 66 percent in Speech to Speech mode. Traditional speech-to-text pipeline approaches currently outperform native audio models for reasoning tasks, suggesting that developers may need to consider trade-offs between audio capabilities and reasoning accuracy in speech-enabled applications. (Hugging Face)
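To experiment with the benchmark yourself, the dataset can be pulled from Hugging Face. Here is a minimal sketch using the datasets library; the repository id ArtificialAnalysis/big_bench_audio, the split name, and the field names are assumptions, so check the dataset card:

```python
from datasets import load_dataset

# Repo id and split are assumptions based on the announcement; see the dataset card.
ds = load_dataset("ArtificialAnalysis/big_bench_audio", split="train")

example = ds[0]
print(example.keys())       # inspect available fields (e.g., audio, question, category)
print(len(ds), "examples")  # dataset size
```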

Generated news summaries spark accuracy concerns

Reporters Without Borders called on Apple to remove its new AI-powered notification summary feature after it created a false headline about murder suspect Luigi Mangione. The feature, part of Apple Intelligence, misrepresented a BBC News article by claiming the suspect had shot himself, which was untrue. This incident and a similar misrepresentation of a New York Times article show that time-saving summarization features still have to be balanced against the need for accuracy in news dissemination. (BBC)

Google’s updated models create more vibrant videos and images

Google introduced Veo 2 and an updated Imagen 3, two AI models for video and image generation that improve on their predecessors. Veo 2 creates high-quality videos with improved understanding of physics and human movement, while Imagen 3 generates images with better composition and in diverse art styles. These models are now available in Google’s VideoFX and ImageFX interfaces, with plans to expand to YouTube Shorts and other products next year. Google also introduced Whisk, an image-to-image generator that uses Imagen 3 and Gemini 2.0 Flash to read and remix original or generated images. (Google)

Nvidia unveils tiny computer with AI accelerator chips

Nvidia updated and cut the price of the Jetson Orin Nano Super Developer Kit, a palm-sized generative AI computer now priced at $249. The device provides up to a 1.7x increase in generative AI inference performance and consists of a system-on-module with an Ampere-architecture GPU and a 6-core Arm CPU. This compact computer delivers up to 157 TOPS (depending on the configuration) and runs Nvidia software including Isaac for robotics and Metropolis for vision AI. The update enables a wide range of users, from commercial AI developers to students, to more easily build applications such as LLM chatbots, visual AI agents, and AI-based robots. (Nvidia)


Still want to know more about what matters in AI right now?

Read this week’s issue of The Batch for in-depth analysis of news and research.

This week, Andrew Ng celebrated the achievements of his former students and postdocs, who won both of this year’s NeurIPS Test of Time Paper Awards, and shared reflections on the importance of following one’s convictions and scaling innovations in AI, while looking ahead to explore new ideas for the future.

“But taking a brief look at the past can help us reflect on lessons for the future. One takeaway from looking at what worked 10 to 15 years ago is that many of the teams I led bet heavily on scaling to drive AI progress — a bet that laid a foundation to build larger and larger AI systems.”

Read Andrew’s full letter here.

Other top AI news and research stories we covered in depth: Microsoft’s Phi-4, blending synthetic and organic data, surpassed models five times its size in math and reasoning benchmarks; Tencent released HunyuanVideo, an open-source model rivaling commercial video generators; Google launched Gemini 2.0 Flash, a faster and more capable multimodal model; and a Stanford study revealed that AI matches human experts in writing research proposals, but struggles to evaluate proposals: a mixed result for hopes of AI-assisted innovation.



Subscribe to Data Points

Your accelerated guide to AI news and research