
StatsTest Blog

Experimental design, data analysis, and statistical tooling for modern teams. No hype, just the math.

Model Evaluation · Jan 26 · New

Bootstrap for Metric Deltas: AUC, F1, and Other ML Metrics

How to compute confidence intervals and p-values for differences in ML metrics like AUC, F1, and precision. Learn paired bootstrap for defensible model comparisons.

statstest_flow Model Evaluation Supporting
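
As a rough illustration of the paired bootstrap mentioned above, here is a minimal sketch in Python with NumPy. It is not taken from the article itself: the function names (f1_score_binary, paired_bootstrap_delta), the default of 2000 resamples, the percentile interval, and the percentile-based two-sided p-value are all assumptions chosen for brevity. The key idea is that each resample draws the same rows for both models, so the pairing between their predictions is preserved.

```python
import numpy as np


def f1_score_binary(y_true, y_pred):
    """F1 for binary 0/1 labels; returns 0.0 if there are no positives at all."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom > 0 else 0.0


def paired_bootstrap_delta(y_true, pred_a, pred_b, metric,
                           n_boot=2000, alpha=0.05, seed=0):
    """Bootstrap the difference metric(B) - metric(A) on shared resamples.

    Hypothetical helper: returns (observed_delta, (ci_low, ci_high), p_value).
    """
    y_true, pred_a, pred_b = (np.asarray(x) for x in (y_true, pred_a, pred_b))
    rng = np.random.default_rng(seed)
    n = len(y_true)

    observed = metric(y_true, pred_b) - metric(y_true, pred_a)

    deltas = np.empty(n_boot)
    for i in range(n_boot):
        # One index draw per replicate, applied to both models (the "paired" part).
        idx = rng.integers(0, n, size=n)
        deltas[i] = metric(y_true[idx], pred_b[idx]) - metric(y_true[idx], pred_a[idx])

    # Percentile confidence interval for the delta.
    ci_low, ci_high = np.quantile(deltas, [alpha / 2, 1 - alpha / 2])

    # Approximate two-sided p-value: how often the bootstrap delta falls on
    # either side of zero, with +1 smoothing so it is never exactly 0.
    p_left = (np.sum(deltas <= 0) + 1) / (n_boot + 1)
    p_right = (np.sum(deltas >= 0) + 1) / (n_boot + 1)
    p_value = min(1.0, 2 * min(p_left, p_right))

    return observed, (float(ci_low), float(ci_high)), float(p_value)


# Usage on synthetic data (illustrative only, not real results).
rng = np.random.default_rng(42)
y = rng.integers(0, 2, size=500)
model_a = np.where(rng.random(500) < 0.30, 1 - y, y)  # flips about 30% of labels
model_b = np.where(rng.random(500) < 0.20, 1 - y, y)  # flips about 20% of labels
delta, ci, p = paired_bootstrap_delta(y, model_a, model_b, f1_score_binary)
print(f"delta F1 = {delta:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f}), p = {p:.4f}")
```

Any metric with the signature metric(y_true, y_pred) can be passed in place of f1_score_binary, for example an AUC function that takes scores instead of hard labels. The percentile interval is the simplest choice; other bootstrap intervals may behave better on small or skewed samples.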