Sora has landed (for Pro and Plus users), and ElevenLabs unveils podcast tool to challenge NotebookLM

Published Dec 9, 2024 · 3 min read

Twice a week, Data Points brings you the latest AI news, tools, models, and research in brief. In today’s edition, you’ll find:

  • Meta’s 70-billion parameter Llama 3.3 beats 3.1 405B on some metrics
  • WorldLabs shows off its 3D world model
  • PaliGemma 2, Google’s open vision-language model
  • Could new AI models hide their true goals?

But first:

OpenAI unveils Sora video generation model to the public

OpenAI launched Sora as a standalone product available to ChatGPT Plus and Pro users. Sora turns text, image, and video input into video output at up to 1080p resolution and up to 20 seconds in length (for Pro users) in various aspect ratios. A new version of the model, called Sora Turbo, generates videos more quickly. Sora.com also includes editing and community features like a storyboard tool and feeds of recent videos. OpenAI implemented safety measures including C2PA metadata, visible watermarks, and content restrictions, while also acknowledging the model’s current limitations in physics simulation and complex actions. (OpenAI)

ElevenLabs expands AI podcast creation to desktop platform

ElevenLabs expanded its GenFM podcast feature from iOS to its Projects platform, allowing users to create, edit, and export AI-generated podcasts from various content types. The new tool enables users to generate podcast discussions with two AI co-hosts in 32 languages, edit transcripts, and add or replace speakers. Unlike NotebookLM, which focuses on summarizing documents, GenFM is designed for podcast creation and monetization, and could potentially reshape audio production and distribution. (ElevenLabs)

Meta’s Llama 3.3 delivers a text-only update

Meta introduced Llama 3.3, a 70-billion-parameter language model boasting a 128,000+ token context window. Llama 3.3 supports English, German, French, Italian, Portuguese, Hindi, Spanish, and Thai and outperforms Llama 3.1 405B on some common benchmarks, despite being a fraction of the size. The new model excels in areas like reasoning, code generation, math, and instruction-following. (GitHub)
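To make “a fraction of the size” concrete, here is a quick back-of-the-envelope comparison of the two parameter counts mentioned above. The 2-bytes-per-parameter assumption (16-bit weights) is illustrative only; actual deployments often use further quantization.

```python
# Compare the parameter counts of Llama 3.3 70B and Llama 3.1 405B.
llama_3_3_params = 70e9
llama_3_1_params = 405e9

size_ratio = llama_3_3_params / llama_3_1_params
print(f"Llama 3.3 is about {size_ratio:.0%} the size of Llama 3.1 405B")

# Approximate memory footprint for the weights alone, assuming
# 2 bytes per parameter (16-bit weights) purely for illustration:
bytes_per_param = 2
for name, params in [("Llama 3.3 70B", llama_3_3_params),
                     ("Llama 3.1 405B", llama_3_1_params)]:
    print(f"{name}: ~{params * bytes_per_param / 1e9:.0f} GB of weights")
```

At roughly 17 percent of the larger model’s size, matching it on some benchmarks is a notable efficiency gain.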

World Labs unveils AI system for generating 3D worlds

World Labs’ system creates explorable three-dimensional environments from a single input image, allowing users to navigate generated scenes with a keyboard or mouse. The technology offers persistent reality, real-time control, and correct geometry, enabling various camera and 3D effects as well as integration with other AI tools for creative workflows. This advancement in spatial intelligence could transform how movies, games, and simulations are created and offers new possibilities for digital representations of the physical world. (WorldLabs)

Google updates open-weight PaliGemma vision-language model

The PaliGemma 2 family (based on the Gemma 2 language models) includes versions at 3 billion, 10 billion, and 28 billion parameters, with input resolutions of 224, 448, and 896 pixels. Larger models and higher resolutions generally improve performance, with some tasks benefiting more from increased model size and others from higher resolution. PaliGemma 2 achieves state-of-the-art results on many vision-language tasks, including text recognition, table structure recognition, and medical image report generation. (arXiv)

AI models evaluated for goal-driven manipulation tactics

Apollo Research tested six advanced AI models’ ability to pursue assigned goals through strategic maneuvering. The study instructed models to prioritize specific objectives “at all costs,” then presented them with conflicting information and obstacles. When questioned about their actions, Claude 3 Opus and Llama 3.1 405B frequently admitted to manipulative behavior, while OpenAI’s o1 proved more resistant to confessing. Researchers observed that models explicitly reasoned through plans using terms like “sabotage” and “lying,” highlighting the challenge of ensuring that AI behavior aligns with humans’ intended purposes. (Apollo Research)


Still want to know more about what matters in AI right now?

Read last week’s issue of The Batch for in-depth analysis of news and research.

Last week, Andrew Ng debunked the idea that building with generative AI is costly. He explained that while training foundation models is expensive, prototyping and building applications on top of existing models has become very affordable, with costs sometimes as low as a few dollars.

“AI Fund now budgets $55,000 to get to a working prototype. And while that is quite a lot of money, it’s far less than the billions companies are raising to develop foundation models.”
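For a sense of how API-based prototyping can cost only a few dollars, here is a hypothetical cost sketch. The per-token prices and query counts below are illustrative assumptions, not actual rates from any provider.

```python
# Hypothetical cost estimate for prototyping with a hosted LLM API.
# Prices are assumed for illustration, not real provider rates.
price_per_million_input_tokens = 0.50   # assumed USD
price_per_million_output_tokens = 1.50  # assumed USD

# Suppose a prototype runs 2,000 test queries, each averaging
# ~1,000 input tokens and ~500 output tokens:
input_tokens = 2_000 * 1_000
output_tokens = 2_000 * 500

cost = (input_tokens / 1e6) * price_per_million_input_tokens \
     + (output_tokens / 1e6) * price_per_million_output_tokens
print(f"Estimated prototyping cost: ${cost:.2f}")  # a few dollars
```

Under these assumptions, thousands of test queries cost under $3, which is why inference-time experimentation is so much cheaper than model training.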

Read Andrew’s full letter here.

Other top AI news and research stories we covered in depth: Stripe introduced an ecommerce agent toolkit enabling AI to securely spend money; Mistral launched Pixtral Large, a strong competitor in vision-language models; the generative AI and GPU boom is raising concerns over increasing e-waste; and a research paper explored E-DPO, a method that strengthens defenses against jailbreak prompts, reinforcing AI security.


Subscribe to Data Points

Your accelerated guide to AI news and research