Practical Significance Thresholds: Defining Business Impact Before You Analyze
Learn how to set meaningful thresholds for practical significance before running experiments. Covers MDE, business context, ROI-based thresholds, and the difference between statistical and practical significance.
Quick Hits
- Statistical significance (p < 0.05) doesn't mean the effect matters
- Define 'what effect would change our decision' BEFORE analyzing
- Consider implementation costs, opportunity costs, and expected ROI
- Document thresholds in advance to prevent post-hoc rationalization
TL;DR
Statistical significance (p < 0.05) tells you an effect probably exists. Practical significance tells you it's big enough to matter. With enough data, even trivial effects become statistically significant. Define your practical significance threshold BEFORE analyzing: What effect size would change your decision? This prevents post-hoc rationalization and keeps focus on what actually matters for your business.
The Problem
When Significance Doesn't Mean Important
import numpy as np
from scipy import stats

def demonstrate_meaningless_significance():
    """
    Show how large samples make trivial effects 'significant'.
    """
    np.random.seed(42)
    # Very large sample, very tiny effect
    n = 100000
    true_effect = 0.3  # 0.3 units on a 100-point scale
    control = np.random.normal(50, 15, n)
    treatment = np.random.normal(50 + true_effect, 15, n)
    _, p = stats.ttest_ind(control, treatment)
    d = true_effect / 15  # Cohen's d
    print("THE PROBLEM: Meaningless Significance")
    print("=" * 50)
    print()
    print(f"Sample size: {n:,} per group")
    print(f"True effect: {true_effect} points (on 100-point scale)")
    print(f"Cohen's d: {d:.3f} (negligible)")
    print()
    print(f"P-value: {p:.10f}")
    print("Result: p < 0.05 ✓ (statistically significant!)")
    print()
    print("BUT...")
    print("  • A 0.3 point difference is meaningless")
    print("  • No one would notice or care")
    print("  • Yet we would 'reject the null hypothesis'")

demonstrate_meaningless_significance()
The Two Questions
def two_questions():
    """
    Distinguish statistical from practical significance.
    """
    print("TWO SEPARATE QUESTIONS")
    print("=" * 50)
    print()
    print("1. STATISTICAL SIGNIFICANCE")
    print("   Question: Is there probably a real effect?")
    print("   Tool: P-value, hypothesis test")
    print("   Answer: Yes/No")
    print()
    print("2. PRACTICAL SIGNIFICANCE")
    print("   Question: Is the effect big enough to matter?")
    print("   Tool: Effect size vs. threshold")
    print("   Answer: Yes/No/Uncertain")
    print()
    print("THE COMBINATIONS:")
    print("-" * 50)
    print()
    print("Stat Sig + Prac Sig = ACTION")
    print("  Real effect that matters")
    print()
    print("Stat Sig + NOT Prac Sig = NO ACTION")
    print("  Real effect but too small to matter")
    print()
    print("NOT Stat Sig + Prac Sig = NEED MORE DATA")
    print("  Can't tell if effect is real, but might be important")
    print()
    print("NOT Stat Sig + NOT Prac Sig = NO ACTION")
    print("  No evidence of meaningful effect")

two_questions()
Setting Thresholds
Business-Based Approach
def business_threshold_framework():
    """
    Framework for setting practical significance thresholds.
    """
    print("SETTING PRACTICAL SIGNIFICANCE THRESHOLDS")
    print("=" * 60)
    print()
    questions = {
        "Implementation Cost": [
            "What does it cost to implement this change?",
            "Engineering time, design resources, QA",
            "Ongoing maintenance costs"
        ],
        "Opportunity Cost": [
            "What else could we do with these resources?",
            "How long does the experiment delay other work?",
            "What's the cost of being wrong?"
        ],
        "Expected Impact": [
            "What metric are we moving?",
            "How does that metric translate to business value?",
            "What's the lifetime value of the improvement?"
        ],
        "Risk Tolerance": [
            "What if we implement and the effect disappears?",
            "What if the effect is real but smaller than measured?",
            "Can we easily revert?"
        ]
    }
    for category, qs in questions.items():
        print(f"\n{category}:")
        for q in qs:
            print(f"  • {q}")
    print("\n" + "=" * 60)
    print("\nTHRESHOLD CALCULATION:")
    print("  Minimum effect = Cost / (Value per unit × Scale)")
    print()
    print("Example:")
    print("  Implementation cost: $50,000")
    print("  Value per conversion: $10")
    print("  Users affected per year: 1,000,000")
    print("  Current conversion: 5%")
    print()
    print("  Minimum lift for 1-year ROI:")
    print("  $50,000 / ($10 × 1,000,000) = 0.5% absolute")
    print("  Relative: 0.5% / 5% = 10% relative lift needed")

business_threshold_framework()
def calculate_required_lift(implementation_cost, value_per_conversion,
                            annual_users, current_rate, payback_period=1):
    """
    Calculate minimum lift needed to justify implementation.
    """
    # Annual value of one absolute percentage point of lift
    annual_value_per_pct = value_per_conversion * annual_users / 100
    min_absolute_lift = implementation_cost / (annual_value_per_pct * payback_period)
    min_relative_lift = min_absolute_lift / current_rate * 100
    return {
        'min_absolute_lift': min_absolute_lift,
        'min_relative_lift': min_relative_lift,
        'breakeven_value': implementation_cost,
        'annual_value_per_pct': annual_value_per_pct
    }

# Example
result = calculate_required_lift(
    implementation_cost=50000,
    value_per_conversion=10,
    annual_users=1000000,
    current_rate=5.0,
    payback_period=1
)
print("\nTHRESHOLD CALCULATION RESULT:")
print("-" * 40)
print(f"Minimum absolute lift: {result['min_absolute_lift']:.2f} percentage points")
print(f"Minimum relative lift: {result['min_relative_lift']:.1f}%")
print("(Baseline rate: 5.0%)")
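The required lift also scales inversely with the payback period you are willing to accept. As a quick sensitivity check, here is a minimal sketch (the function name and loop are illustrative, reusing the same example numbers as above):

```python
def min_lift_for_payback(implementation_cost, value_per_conversion,
                         annual_users, current_rate, payback_years):
    """Minimum lift to break even within `payback_years`.

    Returns (absolute lift in percentage points, relative lift in %).
    """
    annual_value_per_pct = value_per_conversion * annual_users / 100
    absolute = implementation_cost / (annual_value_per_pct * payback_years)
    relative = absolute / current_rate * 100
    return absolute, relative

# Same illustrative numbers: $50k cost, $10/conversion, 1M users, 5% baseline
for years in (1, 2, 3):
    abs_lift, rel_lift = min_lift_for_payback(50_000, 10, 1_000_000, 5.0, years)
    print(f"{years}-year payback: {abs_lift:.2f} pp absolute ({rel_lift:.1f}% relative)")
```

A longer acceptable payback period lowers the bar proportionally, which is why the payback period belongs in the pre-registered documentation, not in the post-hoc discussion.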
Domain-Specific Guidelines
E-commerce / Conversion
def ecommerce_thresholds():
    """
    Typical thresholds for e-commerce metrics.
    """
    print("E-COMMERCE THRESHOLD GUIDELINES")
    print("=" * 50)
    thresholds = {
        "Conversion rate": {
            "typical_baseline": "2-5%",
            "meaningful_lift": "5-10% relative",
            "example": "3% → 3.15% (5% lift) might be worthwhile",
            "note": "Even 0.1% absolute can be huge at scale"
        },
        "Average order value": {
            "typical_baseline": "$50-150",
            "meaningful_lift": "3-5% relative",
            "example": "$80 → $84 (5% lift)",
            "note": "Consider distribution - median may matter more"
        },
        "Revenue per visitor": {
            "typical_baseline": "$1-5",
            "meaningful_lift": "5-15% relative",
            "example": "$2.50 → $2.75 (10% lift)",
            "note": "Compounds conversion and AOV"
        },
        "Cart abandonment": {
            "typical_baseline": "60-80%",
            "meaningful_lift": "2-5% absolute reduction",
            "example": "70% → 67% abandonment",
            "note": "Absolute change matters here"
        }
    }
    for metric, info in thresholds.items():
        print(f"\n{metric}:")
        for k, v in info.items():
            print(f"  {k}: {v}")

ecommerce_thresholds()
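To make the "huge at scale" note concrete, here is a back-of-the-envelope sketch; the 10M-visitor and $80 order-value figures are illustrative assumptions, not benchmarks:

```python
def annual_revenue_from_lift(annual_visitors, absolute_lift_pct, avg_order_value):
    """Extra annual revenue from an absolute conversion-rate lift.

    absolute_lift_pct is in percentage points (0.1 means 0.1%).
    """
    extra_orders = annual_visitors * absolute_lift_pct / 100
    return extra_orders * avg_order_value

# Illustrative assumptions: 10M visitors/year, $80 average order value
extra = annual_revenue_from_lift(10_000_000, 0.1, 80)
print(f"A 0.1 pp conversion lift is worth ~${extra:,.0f}/year")
```

At this scale a lift far below any generic effect-size benchmark clears a $50k implementation cost many times over, which is exactly why thresholds must come from your numbers, not from a lookup table.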
SaaS / Retention
def saas_thresholds():
    """
    Typical thresholds for SaaS metrics.
    """
    print("SAAS THRESHOLD GUIDELINES")
    print("=" * 50)
    thresholds = {
        "Monthly churn rate": {
            "typical_baseline": "2-8%",
            "meaningful_lift": "0.5-1% absolute reduction",
            "example": "5% → 4.5% monthly churn",
            "note": "Small changes compound significantly"
        },
        "Trial-to-paid conversion": {
            "typical_baseline": "5-15%",
            "meaningful_lift": "10-20% relative",
            "example": "10% → 11% trial conversion",
            "note": "Very high LTV justifies small lifts"
        },
        "Feature adoption": {
            "typical_baseline": "20-60%",
            "meaningful_lift": "5-10% absolute",
            "example": "30% → 35% feature usage",
            "note": "Depends on feature importance"
        },
        "NPS / Satisfaction": {
            "typical_baseline": "30-50",
            "meaningful_lift": "5-10 points",
            "example": "40 → 45 NPS",
            "note": "Leading indicator of retention"
        }
    }
    for metric, info in thresholds.items():
        print(f"\n{metric}:")
        for k, v in info.items():
            print(f"  {k}: {v}")

saas_thresholds()
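The "small changes compound" note for churn is easy to verify directly: a half-point reduction in monthly churn visibly changes how much of a cohort survives a full year. A minimal sketch:

```python
def retained_after(months, monthly_churn_pct):
    """Fraction of a cohort still active after `months` months,
    assuming a constant monthly churn rate."""
    return (1 - monthly_churn_pct / 100) ** months

# Compare 5% vs 4.5% monthly churn over 12 months
for churn in (5.0, 4.5):
    print(f"{churn}% monthly churn -> {retained_after(12, churn):.1%} retained at 12 months")
```

Roughly three and a half percentage points more of the cohort survive the year, before accounting for expansion revenue or referrals, which is why a 0.5 pp absolute churn reduction is treated as meaningful.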
Pre-Specifying Thresholds
Why Pre-Specification Matters
def why_prespecify():
    """
    Explain importance of pre-specifying thresholds.
    """
    print("WHY PRE-SPECIFY THRESHOLDS?")
    print("=" * 50)
    print()
    problems_without = [
        "Post-hoc rationalization: 'This effect is big enough'",
        "Goal posts move based on results",
        "Confirmation bias in interpretation",
        "Harder to defend decisions to stakeholders"
    ]
    benefits = [
        "Clear decision criteria before seeing results",
        "Separates analysis from decision-making",
        "Easier to communicate and get buy-in",
        "Documents your thinking for future reference"
    ]
    print("Problems WITHOUT pre-specification:")
    for p in problems_without:
        print(f"  ✗ {p}")
    print("\nBenefits OF pre-specification:")
    for b in benefits:
        print(f"  ✓ {b}")

why_prespecify()
Documentation Template
def threshold_documentation_template():
    """
    Template for documenting practical significance threshold.
    """
    template = """
PRACTICAL SIGNIFICANCE THRESHOLD DOCUMENTATION
==============================================

Experiment: [Name]
Date: [Date]
Author: [Name]

1. METRIC DEFINITION
   Primary metric: [e.g., Conversion rate]
   Current baseline: [e.g., 5.0%]
   Measurement period: [e.g., 2 weeks]

2. THRESHOLD CALCULATION
   Implementation cost: $[X]
     - Engineering: $[Y]
     - Design: $[Z]
     - Other: $[W]
   Expected annual impact per 1% lift: $[A]
     - Users affected: [N]
     - Value per conversion: $[V]
   Payback period: [e.g., 1 year]

   MINIMUM THRESHOLD:
     - Absolute: [X]% lift (e.g., 5.0% → 5.5%)
     - Relative: [Y]% improvement

3. DECISION RULES
   If CI entirely above threshold → SHIP
   If CI entirely below threshold → DO NOT SHIP
   If CI spans threshold → [Decision/Next steps]

4. RATIONALE
   [Why this threshold makes sense for this experiment]
   [Any context-specific considerations]

5. SIGN-OFF
   Approved by: [Name]
   Date: [Date]
"""
    print(template)

threshold_documentation_template()
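If you want the same pre-registration to live next to your analysis code rather than in a document, one lightweight option (a sketch, not a standard; the class and field names are hypothetical) is a frozen dataclass created before any results arrive:

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: fields can't be quietly edited after results arrive
class ThresholdSpec:
    experiment: str
    metric: str
    baseline: float            # e.g. 5.0 (%)
    min_absolute_lift: float   # in percentage points
    rationale: str

    @property
    def min_relative_lift(self) -> float:
        return self.min_absolute_lift / self.baseline * 100

# Hypothetical experiment, using the worked numbers from earlier
spec = ThresholdSpec(
    experiment="checkout-redesign",
    metric="conversion_rate",
    baseline=5.0,
    min_absolute_lift=0.5,
    rationale="1-year payback on $50k implementation cost",
)
print(f"{spec.metric}: need >= {spec.min_absolute_lift} pp "
      f"({spec.min_relative_lift:.0f}% relative)")
```

Committing such a record to version control before launch gives you a timestamped, tamper-evident equivalent of the sign-off section in the template.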
Making Decisions
Decision Framework
def decision_framework(effect, ci, threshold):
    """
    Framework for making decisions with practical significance.
    """
    ci_low, ci_high = ci
    # Statistical significance: CI excludes zero
    stat_sig = ci_low > 0 or ci_high < 0
    # Practical significance scenarios
    definitely_meaningful = ci_low > threshold
    definitely_trivial = ci_high < threshold
    uncertain = not definitely_meaningful and not definitely_trivial

    print("DECISION FRAMEWORK")
    print("=" * 60)
    print()
    print(f"Effect estimate: {effect:.3f}")
    print(f"95% CI: [{ci_low:.3f}, {ci_high:.3f}]")
    print(f"Practical threshold: {threshold:.3f}")
    print()
    # Visual: null (0), threshold (T), and the CI on a number line
    lo = min(ci_low, 0.0)
    hi = max(ci_high, threshold)
    width = 50
    pos = lambda x: round((x - lo) / (hi - lo) * (width - 1))
    line = ["-"] * width
    line[pos(0.0)] = "0"
    line[pos(threshold)] = "T"
    line[pos(ci_low)] = "["
    line[pos(ci_high)] = "]"
    print("Visual (0 = null, T = threshold, [...] = 95% CI):")
    print("  " + "".join(line))

    print("\nAssessment:")
    print("-" * 40)
    if stat_sig:
        print("✓ Statistically significant (CI excludes 0)")
    else:
        print("✗ NOT statistically significant (CI includes 0)")
    if definitely_meaningful:
        print("✓ Definitely practically meaningful (entire CI > threshold)")
    elif definitely_trivial:
        print("✗ Definitely NOT practically meaningful (entire CI < threshold)")
    else:
        print("? Uncertain practical significance (CI spans threshold)")

    print("\nDECISION:")
    print("-" * 40)
    if stat_sig and definitely_meaningful:
        print("→ SHIP: Strong evidence of meaningful effect")
    elif stat_sig and definitely_trivial:
        print("→ DO NOT SHIP: Effect is real but too small to matter")
    elif stat_sig and uncertain:
        print("→ CONDITIONAL: Effect exists but might not be meaningful")
        print("  Consider: risk tolerance, implementation cost, reversibility")
    elif definitely_trivial:
        print("→ DO NOT SHIP: No evidence of meaningful effect")
    else:
        print("→ INCONCLUSIVE: Can't rule out meaningful effect")
        print("  Options: gather more data, accept uncertainty, or go/no-go based on priors")
# Examples
print("\n" + "="*70)
print("SCENARIO 1: Clear win")
print("="*70)
decision_framework(effect=0.08, ci=(0.05, 0.11), threshold=0.03)
print("\n" + "="*70)
print("SCENARIO 2: Significant but trivial")
print("="*70)
decision_framework(effect=0.02, ci=(0.01, 0.03), threshold=0.05)
print("\n" + "="*70)
print("SCENARIO 3: Promising but uncertain")
print("="*70)
decision_framework(effect=0.04, ci=(-0.01, 0.09), threshold=0.03)
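The three scenarios above each resolve to an action; a fourth, fully inconclusive case (a CI spanning both zero and the threshold) is worth exercising too. Here is a self-contained sketch of just the classification logic, mirroring the framework above:

```python
def classify(ci_low, ci_high, threshold):
    """Classify a CI against the null (0) and a practical threshold."""
    stat_sig = ci_low > 0 or ci_high < 0
    if ci_low > threshold:
        practical = "meaningful"
    elif ci_high < threshold:
        practical = "trivial"
    else:
        practical = "uncertain"
    return stat_sig, practical

# Scenario 4: not significant AND spans the threshold -> inconclusive
print(classify(-0.02, 0.06, 0.03))  # → (False, 'uncertain')
```

This is the case where pre-specified decision rules earn their keep: with no rule written down in advance, a (False, 'uncertain') result is the easiest one to rationalize in either direction.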
Common Mistakes
def common_mistakes():
    """
    Common mistakes in practical significance assessment.
    """
    print("COMMON MISTAKES TO AVOID")
    print("=" * 50)
    mistakes = {
        "Using Cohen's benchmarks blindly": {
            "problem": "d = 0.2 is 'small' but could be huge in your context",
            "solution": "Derive thresholds from business impact"
        },
        "Setting threshold post-hoc": {
            "problem": "Easy to rationalize any result as 'meaningful'",
            "solution": "Document threshold before seeing results"
        },
        "Ignoring confidence interval width": {
            "problem": "Point estimate might be above threshold but CI isn't",
            "solution": "Make decisions based on CI bounds, not just point estimate"
        },
        "Conflating statistical and practical significance": {
            "problem": "p < 0.05 doesn't mean the effect matters",
            "solution": "Evaluate both separately"
        },
        "Using same threshold for all metrics": {
            "problem": "Different metrics have different scales and business value",
            "solution": "Calculate threshold for each primary metric"
        }
    }
    for mistake, details in mistakes.items():
        print(f"\n{mistake}:")
        print(f"  Problem: {details['problem']}")
        print(f"  Solution: {details['solution']}")

common_mistakes()
R Implementation
# Practical significance framework in R
decision_framework <- function(effect, ci_low, ci_high, threshold) {
  cat("PRACTICAL SIGNIFICANCE ASSESSMENT\n")
  cat(strrep("=", 50), "\n\n")
  cat(sprintf("Effect: %.3f\n", effect))
  cat(sprintf("95%% CI: [%.3f, %.3f]\n", ci_low, ci_high))
  cat(sprintf("Threshold: %.3f\n\n", threshold))

  # Assessments
  stat_sig <- (ci_low > 0) | (ci_high < 0)
  definitely_meaningful <- ci_low > threshold
  definitely_trivial <- ci_high < threshold

  cat("Statistical significance: ")
  cat(ifelse(stat_sig, "Yes\n", "No\n"))
  cat("Practical significance: ")
  if (definitely_meaningful) {
    cat("Definitely meaningful\n")
  } else if (definitely_trivial) {
    cat("Definitely trivial\n")
  } else {
    cat("Uncertain\n")
  }

  cat("\nDECISION: ")
  if (stat_sig && definitely_meaningful) {
    cat("SHIP - Clear meaningful effect\n")
  } else if (stat_sig && definitely_trivial) {
    cat("DO NOT SHIP - Effect too small\n")
  } else {
    cat("NEEDS JUDGMENT - Consider context\n")
  }
}

# Usage:
# decision_framework(effect = 0.05, ci_low = 0.02, ci_high = 0.08, threshold = 0.03)
Related Methods
- Effect Sizes Master Guide — The pillar article
- MDE and Sample Size — Planning for detection
- Power Analysis — Powering for meaningful effects
Key Takeaway
Practical significance means the effect is large enough to matter for your decision. Define this threshold BEFORE analyzing based on business context: implementation cost, expected value, and stakeholder expectations. Statistical significance without practical significance is a false positive for decision-making. Use confidence intervals to assess whether effects are definitively meaningful, definitively trivial, or uncertain—and plan your decision rule for each scenario in advance.
References
- https://doi.org/10.1037/a0024338
- https://doi.org/10.1002/9781118445112.stat06538
- Kirk, R. E. (1996). Practical significance: A concept whose time has come. *Educational and Psychological Measurement*, 56(5), 746-759.
- Ferguson, C. J. (2009). An effect size primer: A guide for clinicians and researchers. *Professional Psychology: Research and Practice*, 40(5), 532-538.
- Lakens, D., Scheel, A. M., & Isager, P. M. (2018). Equivalence testing for psychological research: A tutorial. *Advances in Methods and Practices in Psychological Science*, 1(2), 259-269.
Frequently Asked Questions
How do I determine what effect size is 'meaningful'?
Work backward from business impact: estimate the cost of implementing the change and the value of each unit of improvement, then solve for the smallest effect that pays back within your chosen period, as in the threshold calculation above.
Should I use Cohen's benchmarks (0.2, 0.5, 0.8)?
Only as a last resort. The benchmarks ignore your context: d = 0.2 can be enormously valuable at scale and irrelevant elsewhere. Derive thresholds from your own costs and value per unit instead.
What if my CI spans both meaningful and trivial effects?
Treat the result as uncertain, not negative. Pre-specify what you'll do in this case: gather more data, ship only if the change is cheap and reversible, or decide based on prior evidence.