Harvard University releases giant dataset of public-domain books, Google’s Gemini 2.0 Flash beats top models on benchmarks

Published Dec 13, 2024 · 3 min read

Twice a week, Data Points brings you the latest AI news, tools, models, and research in brief. In today’s edition, you’ll find:

  • ChatGPT’s Advanced Voice Mode can see as well as hear
  • GM drops robotaxi division in favor of AI-assisted driving
  • Ruliad’s new model offers transparent reasoning in a small package
  • Large Concept Models move on from token-based AI

But first:

Harvard’s copyright-free archive aims to democratize AI training data

Harvard University released a dataset of nearly 1 million public-domain books for AI training, funded by Microsoft and OpenAI. The collection spans multiple genres, decades, and languages, including literary classics as well as obscure Czech math textbooks and Welsh pocket dictionaries. Created from Google Books scans of copyright-expired works, the dataset aims to provide high-quality, diverse training data to a wider range of AI developers. This initiative comes amid ongoing legal battles over the use of copyrighted material in AI training, potentially offering an alternative path for model development. (Wired and Harvard)

Google’s Gemini 2.0 ushers in a new era of advanced AI agents

Google launched Gemini 2.0, a new AI model with enhanced multimodal capabilities and native tool use. The model supports multimodal input and output, including text, images, video, and audio, and can natively call tools like Google Search and code execution. Gemini 2.0 Flash, an experimental version available now, outperforms the 1.5 Pro model on key benchmarks at twice the speed. Google is exploring various AI agents powered by Gemini 2.0, including Project Astra for real-world video and audio assistance; Project Mariner, an agent that can read and perform tasks in the browser; and Jules for automated developer and coding support. (Google)

ChatGPT adds real-time video analysis to Advanced Voice Mode

OpenAI released visual capabilities for ChatGPT’s Advanced Voice Mode, allowing Plus, Team, and Pro subscribers to interact with their surroundings using their phone’s camera or screen sharing. The feature can analyze objects, explain device settings, and offer suggestions on various tasks, though it’s not yet available for Enterprise, Edu, or European users. This upgrade significantly expands ChatGPT’s multimodal capabilities, potentially opening new use cases for AI in real-time visual analysis and interaction. (TechCrunch and YouTube)

GM abandons Cruise robotaxi venture, pivots to driver-assist tech

General Motors announced it will stop funding its Cruise autonomous vehicle unit and instead focus on developing partially automated driver-assist systems for personal vehicles. GM cited the considerable resources needed to scale the robotaxi business and increasing competition as reasons for the retreat. The move represents a significant shift for GM, which has invested billions in Cruise since acquiring a controlling stake in 2016, resulting in over $10 billion in operating losses. (Associated Press)

Ruliad unveils step-by-step AI reasoning model DeepThought-8B

AI startup Ruliad launched DeepThought-8B, an AI reasoning model built on LLaMA-3.1 8B, designed to make its inference process more transparent and controllable. The model breaks down its thinking into clear steps, documenting each one in JSON format, and can take as many reasoning steps as needed to solve complex problems. DeepThought-8B is available through Ruliad’s chat application, with plans to open a developer API and release open model weights in the coming weeks. (Ruliad)

Generating language in complete ideas, not word by word

Large Concept Models (LCMs), a new AI model architecture from Meta Research, represent a novel approach to generative AI that operates on sentence-level embeddings rather than individual tokens. This shift allows for modeling language at a more abstract, semantic level across multiple languages and modalities. The researchers developed several LCM architectures, including diffusion-based and quantized models, using the SONAR multilingual embedding space. Key advantages of LCMs include strong zero-shot cross-lingual performance, efficient handling of long contexts, and potential improvements in long-form text coherence. While current LCMs don’t yet match the performance of top token-based language models, they show promise as an alternative approach that could lead to more flexible, globally applicable generative AI technologies. (Meta)


Still want to know more about what matters in AI right now?

Read this week’s issue of The Batch for in-depth analysis of news and research.

This week, Andrew Ng shared emerging best practices for AI Product Management, including starting with concrete examples, assessing technical feasibility through prompting, and having product managers rapidly build prototypes without engineers.

“Just as a machine learning algorithm needs training examples to learn from, an AI product development team needs concrete examples of what we want an AI system to do. In other words, the data is your PRD (product requirements document)!”

Read Andrew’s full letter here.

Other top AI news and research stories we covered in depth: Amazon unveiled Nova models for text, image, and video, offering competitive performance at competitive prices; OpenAI introduced o1 and o1 pro mode for advanced reasoning, available in a new plan called ChatGPT Pro and priced at $200/month; Google launched Genie 2, bringing interactive 3D worlds to life; and researchers at Lamini proposed a memory method designed to reduce hallucinations in large language models, enhancing factual accuracy.



Subscribe to Data Points

Your accelerated guide to AI news and research