StatsTest Blog

Experimental design, data analysis, and statistical tooling for modern teams. No hype, just the math.

Model Evaluation · Jan 26 · New

Paired Evaluation: McNemar's Test for Before/After Classification

When the same examples are evaluated by two models, use McNemar's test for proper inference. Learn why paired analysis is more powerful and how to implement it correctly.

statstest_flow · Model Evaluation · Supporting

Two-Group Comparisons · Jan 26 · New

Paired vs. Independent Data: A Diagnostic Checklist

How to determine whether your data is paired or independent, and why getting this wrong can cost you statistical power or invalidate your results.

statstest_flow · Two-Group Comparisons · Supporting

Distributions · Jan 26 · New

Percentiles and Latency: Comparing P50, P95, P99 Correctly

How to properly compare percentile metrics like latency P50, P95, and P99 across groups. Learn about bootstrap inference, quantile regression, and the pitfalls of naive percentile comparisons.

statstest_flow · Distributions · Supporting

Regression · Jan 26 · New

Poisson vs. Negative Binomial: Modeling Counts and Rates

A practical guide to choosing between Poisson and negative binomial regression for count data. Learn to detect overdispersion, handle excess zeros, and interpret rate ratios correctly.

statstest_flow · Regression · Supporting

Multi-Group Comparisons · Jan 26 · New

Post-Hoc Tests: Tukey, Dunnett, and Games-Howell Decision Tree

How to choose the right post-hoc test after ANOVA. Covers Tukey's HSD, Dunnett's test, Games-Howell, and Scheffé, and provides a clear decision tree for selection.

statstest_flow · Multi-Group Comparisons · Supporting

Effect Sizes · Jan 26 · New

Power Analysis Without Cargo Culting: Traps and Practical Heuristics

A practical guide to statistical power analysis that avoids common pitfalls. Learn when standard power calculations mislead, how to think about sample size decisions, and practical heuristics for real-world experimentation.

statstest_flow · Effect Sizes · Supporting

Effect Sizes · Jan 26 · New

Practical Significance Thresholds: Defining Business Impact Before You Analyze

Learn how to set meaningful thresholds for practical significance before running experiments. Covers the minimum detectable effect (MDE), business context, ROI-based thresholds, and the difference between statistical and practical significance.

statstest_flow · Effect Sizes · Supporting

Assumptions · Jan 26 · New

Pre-Analysis Checklist: Green, Yellow, and Red Flags for Analysts

A practical pre-flight checklist before running statistical analyses. Covers data quality, assumption checks, and common pitfalls that can derail your analysis.

statstest_flow · Assumptions · Supporting

Reporting · Jan 26 · New

Pre-Registration Lite for Product Experiments: A Pragmatic Workflow

A lightweight pre-registration process that works in fast-moving product teams. Document your analysis plan in 15 minutes and build credibility through transparency.

statstest_flow · Reporting · Supporting

Distributions · Jan 26 · New

Ratio Metrics (CTR, Conversion): Why They're Tricky and Stable Alternatives

Why ratio metrics like CTR and conversion rates require special statistical treatment. Learn about variance estimation, the delta method, and when to use alternative approaches.

statstest_flow · Distributions · Supporting

Regression · Jan 26 · New

Regression vs. t-Test vs. ANOVA: The Unifying View (and When the Simpler Tool Suffices)

Understand how t-tests, ANOVA, and regression are all special cases of the same underlying linear model. Learn when the simpler approach is enough and when regression's flexibility is worth it.

statstest_flow · Regression · Supporting

Effect Sizes · Jan 26 · New

Reporting Templates: Stakeholder Language Without Overclaiming

Ready-to-use templates for presenting statistical results to non-technical stakeholders. Learn to communicate effect sizes, uncertainty, and practical significance without oversimplifying or overclaiming.

statstest_flow · Effect Sizes · Supporting