Google Translate uses an AI assist to add over 100 new languages: Plus, Meta’s LLM Compiler brings language models to assembly code

Published: Jun 28, 2024
Reading time: 3 min read

Twice a week, Data Points brings you the latest AI news in brief. Today's edition includes:

  • Florence-2, a small but capable family of vision models
  • Qualcomm’s AI Hub gives developers access to on-device tools and models
  • How Google’s new system uses video to generate synchronized sound
  • ElevenLabs’ new text-to-sound effects API

But first:

Google Translate adds 110 new languages using PaLM 2
Google Translate’s largest expansion to date represents more than 614 million speakers, spanning major world languages, indigenous languages, and languages with active revitalization efforts. The PaLM 2 model helped Google Translate learn closely related languages more efficiently, including languages similar to Hindi, like Awadhi and Marwadi, as well as French creoles such as Seychellois Creole and Mauritian Creole. The expansion covers languages from many regions, with a quarter of the new languages coming from Africa, and accounts for regional varieties and distinct spelling conventions. (Google)

Meta releases LLM Compiler models for code optimization and compiler tasks
The models, built on Code Llama and available in 7B and 13B parameter versions, can emulate compiler behavior, predict optimal passes for code size reduction, and disassemble code. Fine-tuned versions of LLM Compiler achieve 77% of the optimizing potential of an autotuning search and a 45% disassembly round-trip rate (14% exact match). LLM Compiler fills a gap among code completion and optimization models, few of which are trained on assembly code or compiler intermediate representations. Released under a permissive license for research and commercial use, LLM Compiler aims to provide a foundation for further development in AI-aided compiler optimization. (Meta)
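For a quick experiment, a minimal sketch of loading one of the models with Hugging Face transformers might look like the following. The repo name facebook/llm-compiler-7b and the free-form prompt are assumptions here; the exact identifiers and prompt conventions are documented in Meta’s model card.

```python
# Minimal sketch, not Meta's official recipe: the model id and prompt format below
# are assumptions; see the LLM Compiler model card for the exact conventions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "facebook/llm-compiler-7b"  # assumed Hugging Face repo name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

llvm_ir = """define i32 @add(i32 %a, i32 %b) {
entry:
  %sum = add i32 %a, %b
  ret i32 %sum
}"""

# Ask the model to act on LLVM IR, one of the representations it was trained on.
prompt = f"Optimize the following LLVM IR for code size:\n{llvm_ir}\n"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```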

Microsoft releases Florence-2 family of vision models
Florence-2 is available in 770 million and 230 million parameter sizes, including fine-tuned versions of each model, all under an MIT license. The base model was trained on FLD-5B, a dataset of 5.4 billion annotations of 126 million images, created through an iterative process of automated annotation and model refinement. Florence-2’s sequence-to-sequence architecture demonstrates strong zero-shot and fine-tuned capabilities across various tasks, including captioning, object detection, and visual grounding. (Microsoft and Hugging Face)
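Because the weights are openly available, zero-shot captioning takes only a few lines with Hugging Face transformers. A minimal sketch, assuming the microsoft/Florence-2-base checkpoint and the task-prompt convention from the model card:

```python
# Minimal sketch of zero-shot captioning with Florence-2; the checkpoint name and
# task prompts follow the model card and may differ in your setup.
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Florence-2-base"
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("photo.jpg")   # any local RGB image
task = "<CAPTION>"                # other tasks include "<OD>" and "<DENSE_REGION_CAPTION>"
inputs = processor(text=task, images=image, return_tensors="pt")
generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=256,
)
raw = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
print(processor.post_process_generation(raw, task=task,
                                         image_size=(image.width, image.height)))
```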

Qualcomm launches AI Hub for Snapdragon X Elite developers
Qualcomm’s hub offers pre-trained models for tasks like image classification and generative AI, along with tools and documentation to simplify application development for Snapdragon X Elite devices. Developers can filter searches using tags like ‘backbone,’ ‘foundation,’ ‘quantized,’ and ‘real-time’ to find models for specific applications. These resources help developers build AI-enabled applications that leverage Qualcomm’s 45 TOPS Hexagon NPU for Windows PCs. (Qualcomm)
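As a rough illustration of the workflow, a sketch of compiling a traced PyTorch model with the qai-hub Python client might look like the following; the device name and input shape are assumptions, so list available targets and check Qualcomm’s documentation before using it.

```python
# Rough sketch of the AI Hub compile workflow; the device identifier and input shape
# are assumptions, and the exact API is documented in Qualcomm's AI Hub docs.
import torch
import torchvision
import qai_hub as hub

# Trace a small image classifier so AI Hub can compile it for an on-device runtime.
model = torchvision.models.mobilenet_v2(weights="DEFAULT").eval()
example_input = torch.rand(1, 3, 224, 224)
traced = torch.jit.trace(model, example_input)

compile_job = hub.submit_compile_job(
    model=traced,
    device=hub.Device("Snapdragon X Elite CRD"),   # assumed device identifier
    input_specs={"image": (1, 3, 224, 224)},
)
target_model = compile_job.get_target_model()       # compiled model, ready to profile or download
```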

Google develops AI system to generate audio for silent videos
Google researchers created a video-to-audio (V2A) technology that produces soundtracks for silent videos from video pixels and text prompts, allowing (among other uses) video generators to add synchronized sound. The system can generate multiple audio options for any video input, so users can experiment with different soundtracks. Google’s V2A uses a diffusion model to iteratively refine audio from random noise, guided by visual input and natural language prompts. While the technology shows promise, Google’s team is still working to address limitations such as improving lip synchronization and audio quality for videos with artifacts. (Google)

ElevenLabs opens up developer API for its text-to-sound-effects model
ElevenLabs’ text-to-sound-effects tool enables developers to generate high-quality audio from short descriptions, useful for game development and music production. The API offers a Python SDK for easy integration, with options to control sound duration and prompt influence. Pricing is based on character count: 100 characters per generation when the duration is chosen automatically, or 25 characters per second of audio when a duration is specified. (ElevenLabs)
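A minimal sketch with the elevenlabs Python SDK might look like the following; the convert call’s duration_seconds and prompt_influence parameters reflect the options described above, but check the SDK reference for the current names.

```python
# Minimal sketch using the elevenlabs Python SDK; parameter names are assumptions
# based on the options described above, so verify them against the SDK reference.
from elevenlabs.client import ElevenLabs

client = ElevenLabs(api_key="YOUR_API_KEY")
audio = client.text_to_sound_effects.convert(
    text="Heavy wooden door creaking open in a stone hallway",
    duration_seconds=4.0,   # omit to let the model choose a duration automatically
    prompt_influence=0.5,   # higher values follow the text prompt more literally
)
with open("door_creak.mp3", "wb") as f:
    for chunk in audio:     # the SDK returns audio as an iterator of byte chunks
        f.write(chunk)
```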


Still want to know more about what matters in AI right now? 

Read this week’s issue of The Batch for in-depth analysis of news and research.

This week, Andrew Ng discussed the contrasting views of AI as a tool versus a separate entity:

“If I’m allowed to build a house, I want to be allowed to use a hammer, saw, drill, or any other tool that might get the job done efficiently. If I’m allowed to read a webpage, I’d like to be allowed to read it with any web browser, and perhaps even have the browser modify the page’s formatting for accessibility. More generally, if we agree that humans are allowed to do certain things — such as read and synthesize information on the web — then my inclination is to let humans direct AI to automate this task.”

Read Andrew's full letter here.

Other top AI news and research stories we covered in depth included the U.S. antitrust investigation into three AI giants, the new multilingual competitor to GPT-4, a growing market for lifelike avatars of deceased loved ones, and new benchmarks for agentic behaviors.


Subscribe to Data Points, your accelerated guide to AI news and research.