Computer Vision

19 Posts

Alibaba’s Answer to DeepSeek: Alibaba debuts Qwen2.5-VL, a powerful family of open vision-language models

While Hangzhou’s DeepSeek flexed its muscles, Chinese tech giant Alibaba vied for the spotlight with new open vision-language models.

Humanoid Robot Price Break: Unitree and EngineAI showcase affordable humanoid robots

Chinese robot makers Unitree and EngineAI showed off relatively low-priced humanoid robots that could bring advanced robotics closer to everyday applications.

Calibrating Contrast: X-CLR, an approach to contrastive learning for better vision models

Contrastive loss functions make it possible to produce good embeddings without labeled data. A twist on this idea makes even more useful embeddings.
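For readers new to the underlying idea, here is a minimal sketch of a standard InfoNCE-style contrastive loss. It shows the generic technique, not X-CLR's formulation. Each anchor embedding should be most similar to its own paired embedding and less similar to every other item in the batch, which acts as a negative. No labels are needed, only the pairing. The function names, the toy 2-D vectors, and the temperature value of 0.1 are illustrative assumptions.

```python
import math
from typing import List

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (assumes a non-zero vector)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def info_nce(anchors: List[Vector], positives: List[Vector],
             temperature: float = 0.1) -> float:
    """Mean InfoNCE loss: anchors[i] should match positives[i];
    every positives[j] with j != i serves as a negative."""
    losses = []
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / temperature for p in positives]
        m = max(logits)  # subtract the max for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        losses.append(log_sum_exp - logits[i])  # -log softmax at the matching index
    return sum(losses) / len(losses)


anchors = [[1.0, 0.0], [0.0, 1.0]]
print(info_nce(anchors, [[0.9, 0.1], [0.1, 0.9]]))  # correctly paired: low loss
print(info_nce(anchors, [[0.1, 0.9], [0.9, 0.1]]))  # pairs swapped: high loss
```

Minimizing this loss pulls paired embeddings together and pushes the others apart. Any refinement of how similarity is scored, such as the one X-CLR proposes, builds on this baseline.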

Mistral’s Vision-Language Contender: Mistral unveils Pixtral Large, a rival to top vision-language models

Mistral AI unveiled Pixtral Large, which rivals top models at processing combinations of text and images.

Object Detection for Small Devices: Grounding DINO 1.5, an edge device model built for faster, smarter object detection

An open-source model is designed to perform sophisticated object detection on edge devices such as phones, cars, medical equipment, and smart doorbells.

Landmine Recognition: AI supports specialists on battlefields by detecting landmines and other unexploded ordnance

An AI system is scouring battlefields for landmines and other unexploded ordnance, enabling specialists to defuse them.

Amazon Rethinks Cashier-Free Stores: Amazon scales back its AI-powered "Just Walk Out" checkout service

Amazon withdrew Just Walk Out, an AI-driven checkout service, from most of its Amazon Fresh grocery stores...

High Yields for Small Farms: AI helps chili farmers in India boost output at lower cost

Indian farmers used chatbots and computer vision to produce higher yields at lower costs. The state government of Telangana in South India partnered with agricultural aid organization Digital Green to provide AI tools to chili farmers.

The Big Picture and the Details: I-JEPA, or how vision models understand the relationship between parts and the whole

A novel twist on self-supervised learning aims to improve on earlier methods by helping vision models learn how parts of an image relate to the whole.

Generative AI Calling: Google brings advanced computer vision and audio tech to Pixel 8 and 8 Pro phones

Google’s new mobile phones put advanced computer vision and audio research into consumers’ hands. The Alphabet division introduced its flagship Pixel 8 and Pixel 8 Pro smartphones at its annual hardware-launch event. Both units feature AI-powered tools for editing photos and videos.

Vision Transformers Made Manageable: FlexiViT, the vision transformer that allows users to specify the patch size

Vision transformers typically process images in patches of fixed size. Smaller patches yield higher accuracy but require more computation. A new training method lets AI engineers adjust the tradeoff.
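The tradeoff follows from simple arithmetic. This is a rough sketch that assumes a square 224×224 input and non-overlapping square patches of side p. Under those assumptions the image becomes (224/p)² tokens, and each self-attention layer compares every token with every other token, so its cost grows roughly with the square of the token count. The function names are illustrative, and the proxy ignores MLP layers and constant factors.

```python
def num_patches(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches (tokens) in a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    return (image_size // patch_size) ** 2


def attention_pairs(image_size: int, patch_size: int) -> int:
    """Token-to-token comparisons in one self-attention layer (n squared),
    used here only as a rough proxy for compute cost."""
    n = num_patches(image_size, patch_size)
    return n * n


for p in (32, 16, 8):
    print(p, num_patches(224, p), attention_pairs(224, p))
# 32 49 2401
# 16 196 38416
# 8 784 614656
```

Halving the patch size from 16 to 8 quadruples the token count and multiplies the attention comparisons by 16. That is why a model whose patch size can be chosen at inference time, rather than fixed during training, gives engineers a useful knob.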

From Pandemic to Panopticon: How Russia is using face recognition to punish dissidents

Governments are repurposing Covid-focused face recognition systems as tools of repression. Russia’s internal security forces are using Moscow’s visual surveillance system, initially meant to help enforce pandemic-era restrictions, to crack down on anti-government...

Learning From Metadata: Descriptive Text Improves Performance for AI Image Classification Systems

Images in the wild may not come with labels, but they often include metadata. A new training method takes advantage of this information to improve contrastive learning.

On the Ball: An AI-Powered App Lets Amateur Footballers Try Out for the Pros

AiSCOUT uses computer vision to grade amateur footballers and recommends those who score highest to representatives of professional teams.

Auto Diagnosis: AI-Powered Inspections Arrive at Dealers for GM and Volvo

A drive-through system from UVeye automatically inspects vehicles for dents, leaks, and low tire pressure.
