Transformers
Recipe for Smaller, Capable Models: Mistral uses cascade distillation on Mistral Small 3.1 to build Ministral family
Mistral compressed Mistral Small 3.1 into much smaller models, yielding a family of relatively small, open-weights, vision-language models that, by some measures, outperform competing models of similar size. The method combines pruning with distillation.
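The article doesn't detail Mistral's exact recipe, but the two ingredients it names are standard: pruning removes low-importance weights from the large model, and distillation trains the smaller model to match the larger one's output distribution. The sketch below illustrates both in miniature, using magnitude-based pruning and a temperature-softened KL distillation loss; all function names and parameters here are illustrative, not Mistral's implementation.

```python
import numpy as np

def magnitude_prune(weights, keep_ratio):
    """Zero out the smallest-magnitude weights, keeping `keep_ratio` of them."""
    flat = np.abs(weights).ravel()
    k = int(len(flat) * keep_ratio)
    threshold = np.sort(flat)[-k] if k > 0 else np.inf
    mask = np.abs(weights) >= threshold
    return weights * mask

def softmax(logits, temperature=1.0):
    z = logits / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return float(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12))))

# Toy usage: prune a weight matrix, then score student logits against a teacher's.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4))
W_pruned = magnitude_prune(W, keep_ratio=0.5)   # 8 of 16 weights survive

loss = distillation_loss(rng.normal(size=(1, 10)), rng.normal(size=(1, 10)))
```

In a real pipeline the pruned network would then be fine-tuned with this loss (often mixed with the ordinary cross-entropy loss) so the student recovers the teacher's behavior at a fraction of the parameter count.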