StatsTest Blog
Experimental design, data analysis, and statistical tooling for modern teams. No hype, just the math.
Normality Tests Are Overrated: Better Diagnostics and Thresholds
Why formal normality tests like Shapiro-Wilk are problematic and what to use instead. Learn practical thresholds for when non-normality actually matters.
The One-Slide Experiment Readout: Five Numbers That Matter
A template for presenting experiment results in one slide. Focus on the five numbers executives actually need to make a decision.
One-Way ANOVA: Assumptions, Effect Sizes, and Proper Reporting
A practical guide to one-way ANOVA covering assumptions, diagnostics, effect size measures (eta-squared, omega-squared), and how to report results properly.
P-Values vs. Confidence Intervals: How to Interpret Both for Decisions
Understand how p-values and confidence intervals relate: when they agree, when they seem to disagree, and how to use them together for better decisions.
Paired Evaluation: McNemar's Test for Before/After Classification
When the same examples are evaluated by two models, use McNemar's test for proper inference. Learn why paired analysis is more powerful and how to implement it correctly.
Paired vs. Independent Data: A Diagnostic Checklist
How to determine whether your data is paired or independent, and why getting this wrong can dramatically affect your statistical power and validity.
Percentiles and Latency: Comparing P50, P95, P99 Correctly
How to properly compare percentile metrics like latency P50, P95, and P99 across groups. Learn about bootstrap inference, quantile regression, and the pitfalls of naive percentile comparisons.
Poisson vs. Negative Binomial: Modeling Counts and Rates
A practical guide to choosing between Poisson and negative binomial regression for count data. Learn to detect overdispersion, handle excess zeros, and interpret rate ratios correctly.
Post-Hoc Tests: Tukey, Dunnett, and Games-Howell Decision Tree
How to choose the right post-hoc test after ANOVA. Covers Tukey's HSD, Dunnett's test, Games-Howell, and Scheffé, with a clear decision tree for selection.
Power Analysis Without Cargo Culting: Traps and Practical Heuristics
A practical guide to statistical power analysis that avoids common pitfalls. Learn when standard power calculations mislead, how to think about sample size decisions, and practical heuristics for real-world experimentation.
Practical Significance Thresholds: Defining Business Impact Before You Analyze
Learn how to set meaningful thresholds for practical significance before running experiments. Covers minimum detectable effect (MDE), business context, ROI-based thresholds, and the difference between statistical and practical significance.