May 29, 2024
We Need Better Evals for LLM Applications: It’s hard to evaluate AI applications built on large language models. Better evals would accelerate progress.
One barrier to faster progress in generative AI is the difficulty of evaluations (evals), particularly for custom AI applications that generate free-form text.