Redefining what counts as open source AI

A small hybrid model for fast on-device inference

Published

Sep 2, 2024

Twice a week, Data Points brings you the latest AI news, tools, models, and research in brief. In today’s edition, you’ll find:

NVIDIA’s new CUDA libraries
Simulating DOOM using a system of AI models
Removing the human-in-the-loop from AI planning algorithms
An open source text-to-video family from QingYing

But first:

Open source AI definition updated with community input

The Open Source Initiative updated its Open Source AI Definition draft, clarifying that both AI models and weights must meet open source standards. The revision addresses the complex issue of training data, recognizing that while it’s valuable for studying AI biases, it’s often difficult to share due to copyright laws, privacy concerns, and protection of Indigenous knowledge. This update attempts to balance the need for openness with practical and ethical constraints in AI development. (Open Source Initiative)

Zyphra releases small but powerful AI model for on-device use

Zyphra announced Zamba2-mini, a 1.2 billion parameter language model that uses a hybrid architecture of Mamba (SSM) layers and shared attention layers. The model achieves strong performance on benchmark evaluations in its class, outperforming similar-sized models from Google, Hugging Face, Apple, StabilityAI, and Microsoft. Zamba2-mini requires less than 700MB of memory at 4-bit quantization and boasts 2x faster time-to-first-token, 27% lower memory overhead, and 1.29x lower generation latency compared to Microsoft’s Phi3-3.8B model, making it particularly well-suited for resource-constrained environments. (Zyphra)
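A quick back-of-the-envelope check shows why the sub-700MB figure is plausible. The sketch below counts weight storage only, assuming every parameter is stored at the stated bit width; it ignores quantization scales, activations, and runtime overhead, so real usage will be somewhat higher. The helper name `weight_memory_mb` is ours, not Zyphra's.

```python
def weight_memory_mb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in megabytes (1 MB = 10**6 bytes).

    Assumption: every parameter uses exactly `bits_per_param` bits.
    Quantization metadata and runtime buffers are not counted.
    """
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 1e6


ZAMBA2_MINI_PARAMS = 1.2e9  # parameter count reported in the brief

mem_4bit = weight_memory_mb(ZAMBA2_MINI_PARAMS, 4)    # 4-bit quantized weights
mem_16bit = weight_memory_mb(ZAMBA2_MINI_PARAMS, 16)  # 16-bit weights, for comparison

print(f"4-bit weights:  {mem_4bit:.0f} MB")
print(f"16-bit weights: {mem_16bit:.0f} MB")
```

At 4 bits the weights alone come to about 600MB, which leaves roughly 100MB of headroom under the reported 700MB figure.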

NVIDIA releases new CUDA libraries for AI and data tasks

NVIDIA unveiled new libraries for accelerated computing, including NeMo Curator for dataset creation, cuVS for vector search, and updates to Warp for physics simulations. The company claims these tools can provide substantial performance improvements over CPU-only solutions in tasks like data processing and AI model training. NVIDIA reports that some customers have achieved speedups ranging from 10x to 180x across various workloads when using its GPU-accelerated platform compared to CPU-only setups. (NVIDIA)

AI-powered game engine simulates DOOM in real time

Google researchers developed GameNGen, an AI-powered game engine that can simulate the classic game DOOM interactively at over 20 frames per second using a single TPU. The system uses a two-phase training approach: a reinforcement learning agent learns to play the game, and a diffusion model generates the next frame based on past frames and actions. GameNGen’s ability to generate high-quality, interactive game environments in real time marks a notable step forward in AI-driven game simulation and could influence future game development and testing methods. (GitHub)

Automated feedback boosts accuracy of AI-generated planning components

Researchers at Cornell and IBM developed AutoToS, an automated version of the Thought of Search method that prompts language models to write successor and goal test functions for AI planning problems, then refines that code with automated feedback. The system achieves perfect accuracy on domains like BlocksWorld and Sokoban with minimal iterations, eliminating the need for human refinement. Experiments show that feedback from soundness and completeness tests significantly improves the quality of planning components across various large language models. (arXiv)
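For readers unfamiliar with the terms: a successor function lists the states reachable from a given state, and a goal test says whether a state solves the problem. Once both exist, an ordinary search algorithm can do the planning. The sketch below is a toy illustration of that division of labor, not AutoToS code; the puzzle (reach 10 from 1 by adding 1 or doubling) and all function names are our own assumptions.

```python
from collections import deque


def successors(state: int) -> list[int]:
    """Toy successor function: from n, you may move to n + 1 or n * 2."""
    return [state + 1, state * 2]


def is_goal(state: int, target: int = 10) -> bool:
    """Toy goal test: the problem is solved when we reach the target."""
    return state == target


def bfs(start, successors, is_goal, max_states=10_000):
    """Breadth-first search driven entirely by the two functions above.

    Returns the shortest path from start to a goal state as a list,
    or None if no goal is found within the state budget.
    """
    frontier = deque([(start, [start])])
    seen = {start}
    while frontier and len(seen) <= max_states:
        state, path = frontier.popleft()
        if is_goal(state):
            return path
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [nxt]))
    return None


plan = bfs(1, successors, is_goal)
print(plan)
```

In this framing, a bug in either function (a successor that produces an illegal move, or one that omits a legal move) silently breaks the search, which is why automated soundness and completeness checks on those two functions matter.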

Open source video generation model released in two versions

QingYing released CogVideoX, an open source (Apache 2.0) video generation model, in two versions: a 2B-parameter entry-level model and a 5B-parameter larger model with higher quality output. Both models support various inference precisions and offer different VRAM consumption levels and inference speeds on A100 and H100 GPUs, generating 6-second, 720x480 resolution videos at 8 frames per second. This open release gives AI developers more access to video generation capabilities, which could lead to new applications and improvements in AI-generated video technology. (GitHub)

Still want to know more about what matters in AI right now?

Read last week’s issue of The Batch for in-depth analysis of news and research.

Last week, Andrew Ng discussed how token prices for top language models have been falling rapidly, leading to new opportunities for developers:

“I continue to hear from teams that are surprised to find out how cheap LLM usage is when they actually work through cost calculations. For many applications, it isn’t worth too much effort to optimize the cost. So first and foremost, I advise teams to focus on building a useful application rather than on optimizing LLM costs.”

Read Andrew’s full letter here.

Other top AI news and research stories we covered in depth: expansion of the AI lobby, Genie’s new coding agent, how a language model and brain implants helped an ALS patient regain his speech, and a new paper on 4M-21, a multimodal model developed by researchers at Apple and EPFL.
