Benchmark Tests Are Meaningless: The problem with training data contamination in machine learning
The universe of web pages includes correct answers to common questions that are used to test large language models. How can we evaluate new models if they’ve studied the answers before we give them the test?
Models Ranked for Hallucinations: Measuring language model hallucinations during information retrieval
How often do large language models make up information when they generate text based on a retrieved document? A study evaluated the tendency of popular models to hallucinate while performing retrieval-augmented generation (RAG).
Benchmarks for Agentic Behaviors: New LLM benchmarks for Tool Use and Planning in workplace tasks
Tool use and planning are key behaviors in agentic workflows that enable large language models (LLMs) to execute complex sequences of steps. New benchmarks measure these capabilities in common workplace tasks.
Sample-Efficient Training for Robots: Reinforcement learning from human feedback to train robots
Training an agent that controls a robot arm to perform a task involving a sequence of motions (say, opening a door: reach, grasp, turn, pull, release) can take tens of thousands to millions of examples...
When Trees Outdo Neural Networks: Decision Trees Perform Best on Most Tabular Data
While neural networks perform well on image, text, and audio datasets, they fall behind decision trees and their variations for tabular datasets. New research looked into why.
Humanized Training for Robot Arms: New Research Improves Robot Performance and Adaptability
Robots trained via reinforcement learning usually study videos of robots performing the task at hand. A new approach used videos of humans to pre-train robotic arms.
Toward Next-Gen Language Models: New Benchmarks Test the Limits of Large Language Models
A new benchmark aims to raise the bar for large language models. Researchers at 132 institutions worldwide introduced the Beyond the Imitation Game benchmark (BIG-bench), which includes tasks that humans perform well but current state-of-the-art models don’t.
AI Progress Report: Stanford University's fifth annual AI Index Report for 2022
A new study showcases AI’s growing importance worldwide. The fifth annual AI Index from Stanford University’s Institute for Human-Centered AI documents rises in funding, regulation, and performance.