Library

StatsTest Blog

Experimental design, data analysis, and statistical tooling for modern teams. No hype, just the math.

Model Evaluation · Jan 26 · New

Comparing Two Models: Win Rate, Binomial CI, and Proper Tests

How to rigorously compare two ML models using win rate analysis. Learn about binomial confidence intervals, significance tests, and how many examples you actually need.

statstest_flow Model Evaluation Supporting

Two-Group Comparisons · Jan 26 · New

Comparing Variances: Levene's Test, Bartlett's Test, and the F-Test

How to test whether two or more groups have equal variances. Covers Levene's test, Bartlett's test, Brown-Forsythe, and when each is appropriate.

statstest_flow Two-Group Comparisons Supporting

Effect Sizes · Jan 26 · New

Confidence Intervals for Non-Normal Metrics: Bootstrap Methods

How to construct confidence intervals when your data isn't normal. Covers percentile, BCa, and studentized bootstrap methods with practical guidance on when each works best.

statstest_flow Effect Sizes Supporting

Multi-Group Comparisons · Jan 26 · New

Controlling for Covariates: ANCOVA vs. Regression

When and how to control for covariates in group comparisons. Covers ANCOVA, regression adjustment, and the key assumptions that make covariate adjustment valid.

statstest_flow Multi-Group Comparisons Supporting

Survival Analysis · Jan 26 · New

Cox Proportional Hazards: What 'Proportional' Actually Means

A practical guide to Cox regression for product analysts. Learn what the proportional hazards assumption means, how to check it, what to do when it fails, and how to interpret hazard ratios correctly.

statstest_flow Survival Analysis Supporting

A/B Testing · Jan 26 · New

CUPED and Variance Reduction: When It Helps and When It Backfires

Learn how CUPED (Controlled-experiment Using Pre-Experiment Data) can dramatically reduce variance in A/B tests, when to use it, and the pitfalls that can make it backfire.

statstest_flow A/B Testing Supporting

Distributions · Jan 26 · New

Dealing with Zeros: Zero-Inflated and Two-Part Models

How to handle metrics with many zeros: revenue from non-purchasers, engagement from inactive users, events that didn't happen. Learn when to use zero-inflated models, two-part models, and simpler alternatives.

statstest_flow Distributions Supporting

Distributions · Jan 26 · New

Delta Method vs. Bootstrap: When Each Is Appropriate

A practical guide to choosing between the delta method and the bootstrap for variance estimation. Learn when each approach excels, what each assumes, and how to implement both.

statstest_flow Distributions Supporting

Model Evaluation · Jan 26 · New

Drift Detection: KS Test, PSI, and Interpreting Signals

How to detect when your model's inputs or outputs have shifted. Learn about the KS test, the Population Stability Index (PSI), and when drift actually matters.

statstest_flow Model Evaluation Supporting

Effect Sizes · Jan 26 · New

Effect Sizes for Mean Differences: Cohen's d, Hedges' g, and Raw Differences

A practical guide to effect sizes for comparing means. Learn when to use standardized vs. raw effect sizes, how to calculate and interpret them, and how to report them properly.

statstest_flow Effect Sizes Supporting

Effect Sizes · Jan 26 · New

Effect Sizes for Proportions: Risk Difference, Risk Ratio, and Odds Ratio

A practical guide to effect sizes when comparing rates and proportions. Learn when to use risk difference vs. risk ratio vs. odds ratio, and how to interpret each correctly.

statstest_flow Effect Sizes Supporting

Assumptions · Jan 26 · New

Equal Variance and Welch's T-Test: When It Actually Matters

A deep dive into the equal variance assumption for t-tests and ANOVA. Learn when violations are problematic, how to detect them, and why Welch's correction should be your default.

statstest_flow Assumptions Supporting