Twice a week, Data Points brings you the latest AI news, tools, models, and research in brief. In today’s edition, you’ll find:
- NVIDIA’s new weather prediction model
- The best models for performing function calls
- Another copyright lawsuit against Anthropic
- Fine-tuning OpenAI’s GPT-4o
But first:
AI21 releases new open hybrid-architecture language models with long context windows
AI21 Labs released the Jamba 1.5 family of language models, including Mini (12 billion active parameters) and Large (94 billion active parameters) versions, both under the same open model license. The two models use a hybrid state space model (SSM)-transformer architecture, feature an effective 256,000-token context window, and outperform similarly sized competitors on the Arena Hard benchmark and in speed and throughput tests. According to the RULER benchmark, Jamba 1.5’s performance on long-context tasks surpasses that of models claiming longer context windows, including Claude 3.5, Gemini 1.5, and others. (AI21 Labs)
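For readers who want to experiment with the open weights, here’s a minimal sketch of loading the Mini model with Hugging Face transformers. The repo id and prompt are assumptions for illustration; check AI21’s Hugging Face organization for the exact model name and hardware requirements.

```python
# Minimal sketch: load Jamba 1.5 Mini and generate text.
# Assumes a recent transformers release with Jamba support and the
# accelerate package installed; the repo id below is our assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai21labs/AI21-Jamba-1.5-Mini"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "A hybrid SSM-transformer architecture is useful because"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```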
Ideogram releases new AI image generation model with search and developer API
Ideogram launched its 2.0 model, offering improved capabilities for realistic image generation, graphic design, and typography, and claiming better performance than DALL-E 3 and Flux Pro at lower cost. The company released an iOS app, a beta API for developers, and a search feature for its library of over 1 billion user-generated images. Ideogram 2.0 introduces new features like style controls, color palette selection, and advanced prompting tools, aiming to enhance creative workflows for designers and businesses. (Ideogram)
NVIDIA’s StormCast model advances kilometer-scale weather prediction
NVIDIA Research announced StormCast, a generative AI model that can emulate high-fidelity atmospheric dynamics at smaller scales than previously possible, enabling reliable weather prediction critical for disaster planning. The model predicts over 100 atmospheric variables and offers forecasts with lead times of up to six hours that are up to 10% more accurate than NOAA’s state-of-the-art operational model. StormCast represents a significant advance in using AI for climate research and extreme weather prediction, potentially saving lives and reducing damage from natural disasters. (NVIDIA)
An updated leaderboard measures models’ ability to handle function calls
Researchers updating the Berkeley Function-Calling Leaderboard (BFCL) released BFCL V2 • Live, a new dataset featuring 2,251 user-contributed function-calling scenarios. This dataset aims to evaluate large language models’ ability to interface with external tools and APIs in real-world applications. BFCL V2 • Live addresses issues of data contamination and bias by using live, user-contributed function documentation and queries, providing a more accurate measure of LLMs’ function-calling performance in diverse environments. Currently, OpenAI models hold the top spots on the leaderboard, followed by a Llama 3.1-based model and various versions of Anthropic’s Claude. (UC Berkeley/Gorilla)
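To make concrete what the leaderboard measures, here’s a hedged sketch of a typical function-calling scenario using OpenAI’s chat completions API: the model receives a tool schema and must translate a natural-language query into a structured call. The get_weather tool is hypothetical, defined here only for illustration.

```python
# Sketch of a function-calling scenario: the model maps a user query
# onto a structured call against a supplied tool schema.
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for illustration
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
)

# If the model decided to call the tool, the structured call (name plus
# JSON arguments) appears in tool_calls rather than in the text reply.
print(response.choices[0].message.tool_calls)
```

Benchmarks like BFCL score whether the model picks the right function and fills in its arguments correctly across many such scenarios.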
Authors sue Anthropic over alleged copyright infringement in training
Three authors filed a class-action lawsuit against Anthropic, alleging the company used pirated versions of their books to train its chatbot Claude. The complaint accuses Anthropic of “stealing hundreds of thousands of copyrighted books” to build its business. This lawsuit adds to a growing number of legal challenges against AI companies over the use of copyrighted material in training large language models, particularly related to the once-popular Books3 AI dataset. (The Guardian)
OpenAI brings fine-tuning to GPT-4o
OpenAI launched fine-tuning for GPT-4o, allowing developers to customize the model by training it on their own datasets. The company offers 1 million free training tokens daily per organization until September 23, with fine-tuning available to all developers on paid usage tiers. This significantly expands what developers can build, enabling more specialized and efficient models tailored to specific use cases and potentially accelerating innovation across industries and applications. (OpenAI)
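As a rough sketch of the workflow, fine-tuning runs through OpenAI’s files and fine-tuning endpoints: upload a chat-format JSONL dataset, then create a job against a GPT-4o snapshot. The file path and snapshot name below are assumptions; consult OpenAI’s fine-tuning docs for current model identifiers.

```python
# Sketch: upload training data and start a GPT-4o fine-tuning job.
from openai import OpenAI

client = OpenAI()

# Each line of train.jsonl holds one example: {"messages": [...]}.
# "train.jsonl" is a placeholder path.
training_file = client.files.create(
    file=open("train.jsonl", "rb"),
    purpose="fine-tune",
)

# The snapshot name is our assumption; check OpenAI's docs.
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-2024-08-06",
)
print(job.id, job.status)
```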
Still want to know more about what matters in AI right now?
Read this week’s issue of The Batch for in-depth analysis of news and research.
This week, Andrew Ng discussed why the DEFIANCE Act and FTC ban on fake product reviews take the right approach to regulating AI:
“The DEFIANCE Act, which passed unanimously in the Senate (and still requires passage in the House of Representatives before the President can sign it into law), imposes civil penalties for creating and distributing non-consensual deepfake porn. This disgusting application is harming many people, including underage girls. While many image generation models do have guardrails against generating porn, these guardrails often can be circumvented via jailbreak prompts or fine-tuning (for models with open weights).”
Read Andrew’s full letter here.
Other top AI news and research stories we covered in depth: An agentic workflow that generates novel scientific research papers, all about Google’s Imagen 3 and Alibaba’s Qwen2-Math and Qwen2-Audio, and scaling laws for data quality.