StatsTest Blog

Experimental design, data analysis, and statistical tooling for modern teams. No hype, just the math.

Paired Evaluation: McNemar's Test for Before/After Classification
Model Evaluation | Jan 26 | New

When the same examples are evaluated by two models, use McNemar's test for proper inference. Learn why paired analysis is more powerful and how to implement it correctly.

Paired vs. Independent Data: A Diagnostic Checklist
Two-Group Comparisons | Jan 26 | New

How to determine whether your data is paired or independent, and why getting this wrong can dramatically affect your statistical power and validity.

Percentiles and Latency: Comparing P50, P95, P99 Correctly
Distributions | Jan 26 | New

How to properly compare percentile metrics like latency P50, P95, and P99 across groups. Learn about bootstrap inference, quantile regression, and the pitfalls of naive percentile comparisons.

Poisson vs. Negative Binomial: Modeling Counts and Rates
Regression | Jan 26 | New

A practical guide to choosing between Poisson and negative binomial regression for count data. Learn to detect overdispersion, handle excess zeros, and interpret rate ratios correctly.

Post-Hoc Tests: Tukey, Dunnett, and Games-Howell Decision Tree
Multi-Group Comparisons | Jan 26 | New

How to choose the right post-hoc test after ANOVA. Covers Tukey's HSD, Dunnett's test, Games-Howell, Scheffé, and provides a clear decision tree for selection.

Power Analysis Without Cargo Culting: Traps and Practical Heuristics
Effect Sizes | Jan 26 | New

A practical guide to statistical power analysis that avoids common pitfalls. Learn when standard power calculations mislead, how to think about sample size decisions, and practical heuristics for real-world experimentation.

Practical Significance Thresholds: Defining Business Impact Before You Analyze
Effect Sizes | Jan 26 | New

Learn how to set meaningful thresholds for practical significance before running experiments. Covers MDE, business context, ROI-based thresholds, and the difference between statistical and practical significance.

Pre-Analysis Checklist: Green, Yellow, and Red Flags for Analysts
Assumptions | Jan 26 | New

A practical pre-flight checklist before running statistical analyses. Covers data quality, assumption checks, and common pitfalls that can derail your analysis.

Pre-Registration Lite for Product Experiments: A Pragmatic Workflow
Reporting | Jan 26 | New

A lightweight pre-registration process that works in fast-moving product teams. Document your analysis plan in 15 minutes and build credibility through transparency.

Ratio Metrics (CTR, Conversion): Why They're Tricky and Stable Alternatives
Distributions | Jan 26 | New

Why ratio metrics like CTR and conversion rates require special statistical treatment. Learn about variance estimation, the delta method, and when to use alternative approaches.

Regression vs. t-Test vs. ANOVA: The Unifying View (and When the Simpler Tool Suffices)
Regression | Jan 26 | New

Understand how t-tests, ANOVA, and regression are all special cases of the same underlying linear model. Learn when to use the simpler approach and when regression's flexibility is worth it.

Reporting Templates: Stakeholder Language Without Overclaiming
Effect Sizes | Jan 26 | New

Ready-to-use templates for presenting statistical results to non-technical stakeholders. Learn to communicate effect sizes, uncertainty, and practical significance without oversimplifying or overclaiming.