
StatsTest Blog

Experimental design, data analysis, and statistical tooling for modern teams. No hype, just the math.

Model Evaluation | Jan 26

Statistically Significant but Meaningless: Practical Thresholds for Evals

A 0.5% accuracy improvement with p<0.001 is real but worthless. Learn how to distinguish statistically significant from practically meaningful in model evaluation.
