AdaptGauge
Detect when few-shot examples make your LLM worse
Summary: AdaptGauge identifies when adding few-shot examples worsens large language model (LLM) accuracy by tracking learning curves and detecting failure patterns. It analyzes multiple models and tasks to classify performance collapses and compare example selection methods.
What it does
AdaptGauge tests LLMs across tasks to detect few-shot collapse: it monitors performance as examples are added, classifies the failure type when accuracy drops, and automatically evaluates example selection strategies.
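The core idea can be sketched as a simple check over a learning curve (accuracy versus number of few-shot examples). This is an illustrative sketch, not AdaptGauge's actual API; the function name, curve format, and margin threshold are assumptions.

```python
# Hypothetical sketch: flag "few-shot collapse" when accuracy at k shots
# falls below the zero-shot baseline by more than a tolerance margin.
# Names and the default margin are illustrative, not AdaptGauge's real API.

def detect_collapse(learning_curve: dict, margin: float = 0.02) -> list:
    """Return shot counts where accuracy drops below (zero-shot accuracy - margin).

    learning_curve maps number of few-shot examples -> accuracy,
    e.g. {0: 0.71, 1: 0.73, 4: 0.66, 8: 0.60}.
    """
    baseline = learning_curve[0]  # zero-shot accuracy as the reference point
    return [k for k, acc in sorted(learning_curve.items())
            if k > 0 and acc < baseline - margin]

curve = {0: 0.71, 1: 0.73, 4: 0.66, 8: 0.60}
print(detect_collapse(curve))  # → [4, 8]: adding 4+ examples hurts this model
```

In practice a tool like this would run the same check per model and per task, so a collapse on one benchmark does not get averaged away by gains on another.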
Who it's for
It is designed for researchers and developers working with LLMs who need to identify when few-shot prompting degrades model results.
Why it matters
It solves the problem of unnoticed performance degradation caused by adding few-shot examples, catching these failures before they reach production.