StatsTest Blog
Experimental design, data analysis, and statistical tooling for modern teams. No hype, just the math.
Drift Detection: KS Test, PSI, and Interpreting Signals
How to detect when your model's inputs or outputs have shifted. Learn about the KS test, the Population Stability Index, and when drift actually matters.
Effect Sizes for Mean Differences: Cohen's d, Hedges' g, and Raw Differences
A practical guide to effect sizes for comparing means. Learn when to use standardized vs. raw effect sizes, how to calculate and interpret them, and how to report them properly.
Effect Sizes for Proportions: Risk Difference, Risk Ratio, and Odds Ratio
A practical guide to effect sizes when comparing rates and proportions. Learn when to use risk difference vs. risk ratio vs. odds ratio, and how to interpret each correctly.
Equal Variance and Welch's T-Test: When It Actually Matters
A deep dive into the equal variance assumption for t-tests and ANOVA. Learn when violations are problematic, how to detect them, and why Welch's correction should be your default.
Experiment Guardrails: Stopping Rules, Ramp Criteria, and Managing Risk
Protect your experiments and users with proper guardrails. Learn when to stop an experiment, how to safely ramp exposure, and what metrics should trigger automatic rollback.
Feature Scaling and Transforms: When Preprocessing Changes the Story
A practical guide to standardization, centering, and transformations in regression. Learn when scaling affects interpretation, when it's required, and how to interpret coefficients on transformed variables.
Handling Outliers: Trimmed Means, Winsorization, and Robust Methods
How to analyze data with outliers without throwing away information or letting extreme values dominate. Covers trimming, winsorization, robust estimators, and when each is appropriate.
Hazard Ratio Interpretation for Product Teams: When NOT to Use It
A practical guide to interpreting hazard ratios for non-statisticians. Learn what hazard ratios actually mean, common misinterpretations, when they're misleading, and better alternatives for communicating survival results.
Heteroskedastic Groups: When Variances Differ and What to Do About It
How to handle multi-group comparisons when variances are unequal. Covers Welch's ANOVA, Games-Howell post-hoc, and why this matters more than non-normality.
Independence: The Silent Killer of Statistical Validity
Independence is the most critical statistical assumption and the one most commonly violated. Learn to detect non-independence from repeated measures, clustering, and time series, and what to do about it.
Inter-Rater Reliability: Cohen's Kappa and Krippendorff's Alpha
How to measure agreement between human raters for AI evaluation. Learn when to use Cohen's Kappa vs. Krippendorff's Alpha, how to interpret values, and what to do when agreement is low.