Twice a week, Data Points brings you the latest AI news, tools, models, and research in brief. In today’s edition, you’ll find:
- Spirit LM, an open-weights speech model from Meta
- IBM’s open-source, 8-billion-parameter Granite 3.0 8B Instruct
- Google expands its music sandbox for pros and amateurs
- A new leaderboard for smaller, quantized language models
But first:
Claude models gain improved coding skills and computer interaction abilities
Anthropic released an upgraded Claude 3.5 Sonnet and a new Claude 3.5 Haiku model, both offering significant performance improvements, especially in coding. The company also introduced a new “computer use” capability in public beta, allowing Claude to interact with computer interfaces like a human user. The API enables developers to create AI applications that automate repetitive processes, build and test software, conduct open-ended research tasks, and navigate complex user interfaces across multiple programs. (Anthropic)
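For a sense of how the beta works, here is a minimal sketch of a computer-use request via Anthropic’s Python SDK. The model name, tool type, and beta flag follow the public beta announcement and may change; the actions Claude returns still have to be executed by your own agent loop.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

# Ask Claude to plan actions against a virtual 1024x768 display.
# The response contains tool_use blocks (clicks, keystrokes, screenshot requests)
# that your own code must carry out and report back in follow-up messages.
response = client.beta.messages.create(
    model="claude-3-5-sonnet-20241022",
    max_tokens=1024,
    betas=["computer-use-2024-10-22"],
    tools=[{
        "type": "computer_20241022",
        "name": "computer",
        "display_width_px": 1024,
        "display_height_px": 768,
    }],
    messages=[{"role": "user", "content": "Open a browser and search for the weather in Paris."}],
)
print(response.content)
```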
Stability AI unveils new family of image creation models
Stability AI’s new Stable Diffusion 3.5 models, including Large and Large Turbo, run on regular computers and are free for most users under the company’s license. These models excel at creating diverse outputs, adapting to various visual styles, and adhering closely to text prompts without extensive user input. A Medium version, designed to balance quality and ease of use on consumer hardware, will launch on October 29. (Stability AI)
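For readers who want to try the new weights, here is a minimal sketch using Hugging Face’s diffusers library; it assumes the gated repo id stabilityai/stable-diffusion-3.5-large and that you have accepted the model license and authenticated with Hugging Face.

```python
import torch
from diffusers import StableDiffusion3Pipeline

# Assumes the gated repo "stabilityai/stable-diffusion-3.5-large";
# accept the license on Hugging Face and log in before downloading.
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large",
    torch_dtype=torch.bfloat16,
)
pipe = pipe.to("cuda")

image = pipe(
    "a watercolor painting of a lighthouse at dawn, soft morning light",
    num_inference_steps=28,
    guidance_scale=3.5,
).images[0]
image.save("lighthouse.png")
```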
Spirit LM offers speech-to-speech and text-to-speech processing
Meta’s FAIR lab introduced Spirit LM, an open-weights language model that integrates text and speech processing using a word-level interleaving method. The model comes in two versions: Spirit LM Base, which uses phonetic tokens, and Spirit LM Expressive, which incorporates pitch and style tokens to capture and generate expressive speech. Spirit LM aims to improve natural-sounding speech generation and cross-modal learning, potentially advancing research in speech recognition, text-to-speech, and speech classification. (Meta and arXiv)
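The core idea is that a single training sequence switches, word by word, between text tokens and speech tokens so one model learns both streams. The toy sketch below illustrates that kind of interleaving with made-up tokens and modality tags; it is not Meta’s tokenizer or training code.

```python
# Toy illustration of word-level interleaving (hypothetical tokens, not Meta's code).
# Each word appears either as its text token(s) or as phonetic unit ids,
# and modality tags tell the language model which stream it is reading.
words_text = [["the"], ["cat"], ["sat"]]              # text tokens per word
words_speech = [["p12", "p7"], ["p3"], ["p9", "p1"]]  # phonetic units per word

interleaved = []
for i, (text, speech) in enumerate(zip(words_text, words_speech)):
    if i % 2 == 0:                  # alternate modality word by word
        interleaved.append("[TEXT]")
        interleaved.extend(text)
    else:
        interleaved.append("[SPEECH]")
        interleaved.extend(speech)

print(interleaved)
# ['[TEXT]', 'the', '[SPEECH]', 'p3', '[TEXT]', 'sat']
```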
IBM open-sources Granite 3.0 language models for enterprise use
IBM’s release includes the Granite 3.0 8B Instruct model, as well as base models, guardrail models, mixture-of-experts models for low latency, and a speculative decoder for faster inference. Granite 3.0 8B Instruct performs well relative to other models of similar size. The company released all Granite models under the Apache 2.0 license and provided detailed disclosures of training data and methods, emphasizing Granite’s transparency relative to less permissive models. Planned updates include expanding all context windows to 128,000+ tokens, improving multilingual support, and adding image-input, text-output capabilities. (IBM)
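Because the weights ship under Apache 2.0 on Hugging Face, running the instruct model with transformers looks roughly like the sketch below; the repo id ibm-granite/granite-3.0-8b-instruct is assumed.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumes the Hugging Face repo id "ibm-granite/granite-3.0-8b-instruct".
model_id = "ibm-granite/granite-3.0-8b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": "List three uses of a small enterprise LLM."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```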
Google enhances AI music software with fast generation and pro audio tools
Google released updates to its AI-powered music creation tools, including a reimagined MusicFX DJ and an expanded Music AI Sandbox. MusicFX DJ now offers improved controls, real-time streaming, and what Google calls production-quality audio output, allowing users to generate and manipulate music live. Google collaborated with industry professionals to develop these tools, aiming to balance the needs of music professionals with accessibility for novice creators. (Google DeepMind)
Tiny titans clash in AI arena for budget-conscious developers
A new project pits smaller language models against each other in a battle of wits, with a maximum size of 9 billion parameters. The arena, built on Ollama and hosted on Hugging Face, allows users to compare model outputs, vote on performances, and track results on a leaderboard. This platform enables AI enthusiasts to experiment with compact models without requiring expensive hardware. As of this writing, Rombos Qwen (7B, 4-bit) tops the leaderboard with a score of 0.7941 out of 1. (Hugging Face)
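You can stage a similar head-to-head locally with Ollama’s Python client; the model tags below are illustrative sub-9B models, not the arena’s roster, and each would need to be pulled first (for example, ollama pull qwen2.5:7b).

```python
# A quick local comparison in the spirit of the arena, using the ollama Python client.
# Model tags are illustrative small models; pull each one first with `ollama pull <tag>`.
import ollama

prompt = "Explain weight quantization of language models in two sentences."

for model in ["qwen2.5:7b", "gemma2:2b"]:
    reply = ollama.chat(model=model, messages=[{"role": "user", "content": prompt}])
    print(f"--- {model} ---")
    print(reply["message"]["content"])
```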
Still want to know more about what matters in AI right now?
Read this week’s issue of The Batch for in-depth analysis of news and research.
This week, Andrew Ng emphasized the importance of speedy execution with Generative AI and the need to quickly gather user feedback to iterate on products responsibly.
“Generative AI makes it possible to quickly prototype AI capabilities. AI capabilities that used to take months can sometimes be built in days or hours by simply prompting a large language model.”
Read Andrew’s full letter here.
Other top AI news and research stories we covered in depth: Major AI companies plan to meet growing demand with nuclear energy; the once-strong partnership between Microsoft and OpenAI faces challenges as both companies seek greater independence; Mistral AI launches two models that set new standards for small language models, making them suitable for edge devices; and researchers cut training costs for video generators, resulting in a competitive open-source text-to-video model with training code to be released.