Library

StatsTest Blog

Experimental design, data analysis, and statistical tooling for modern teams. No hype, just the math.

StatsTest
Model EvaluationJan 26New

Multiple Prompts and Metrics: Controlling False Discoveries in Evals

When evaluating models across many prompts or metrics, false positives multiply. Learn how to control false discovery rate and make defensible claims about model improvements.