Effect Sizes

Practical Significance Thresholds: Defining Business Impact Before You Analyze

Learn how to set meaningful thresholds for practical significance before running experiments. Covers the minimum detectable effect (MDE), business context, ROI-based thresholds, and the difference between statistical and practical significance.

Quick Hits

  • Statistical significance (p < 0.05) doesn't mean the effect matters
  • Define 'what effect would change our decision' BEFORE analyzing
  • Consider implementation costs, opportunity costs, and expected ROI
  • Document thresholds in advance to prevent post-hoc rationalization

TL;DR

Statistical significance (p < 0.05) tells you an effect probably exists. Practical significance tells you it's big enough to matter. With enough data, even trivial effects become statistically significant. Define your practical significance threshold BEFORE analyzing: What effect size would change your decision? This prevents post-hoc rationalization and keeps focus on what actually matters for your business.


The Problem

When Significance Doesn't Mean Important

import numpy as np
from scipy import stats

def demonstrate_meaningless_significance():
    """
    Show how large samples make trivial effects 'significant'.
    """
    np.random.seed(42)

    # Very large sample, very tiny effect
    n = 100000
    true_effect = 0.3  # 0.3 units on a 100-point scale

    control = np.random.normal(50, 15, n)
    treatment = np.random.normal(50 + true_effect, 15, n)

    _, p = stats.ttest_ind(control, treatment)
    d = true_effect / 15  # Cohen's d

    print("THE PROBLEM: Meaningless Significance")
    print("=" * 50)
    print()
    print(f"Sample size: {n:,} per group")
    print(f"True effect: {true_effect} points (on 100-point scale)")
    print(f"Cohen's d: {d:.3f} (negligible)")
    print()
    print(f"P-value: {p:.10f}")
    print(f"Result: p < 0.05 ✓ (statistically significant!)")
    print()
    print("BUT...")
    print(f"  • A 0.3 point difference is meaningless")
    print(f"  • No one would notice or care")
    print(f"  • Yet we would 'reject the null hypothesis'")


demonstrate_meaningless_significance()

The Two Questions

def two_questions():
    """
    Distinguish statistical from practical significance.
    """
    print("TWO SEPARATE QUESTIONS")
    print("=" * 50)
    print()
    print("1. STATISTICAL SIGNIFICANCE")
    print("   Question: Is there probably a real effect?")
    print("   Tool: P-value, hypothesis test")
    print("   Answer: Yes/No")
    print()
    print("2. PRACTICAL SIGNIFICANCE")
    print("   Question: Is the effect big enough to matter?")
    print("   Tool: Effect size vs. threshold")
    print("   Answer: Yes/No/Uncertain")
    print()
    print("THE COMBINATIONS:")
    print("-" * 50)
    print()
    print("Stat Sig + Prac Sig = ACTION")
    print("  Real effect that matters")
    print()
    print("Stat Sig + NOT Prac Sig = NO ACTION")
    print("  Real effect but too small to matter")
    print()
    print("NOT Stat Sig + Prac Sig = NEED MORE DATA")
    print("  Can't tell if effect is real, but might be important")
    print()
    print("NOT Stat Sig + NOT Prac Sig = NO ACTION")
    print("  No evidence of meaningful effect")


two_questions()
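
The four combinations above can also be captured as a small decision helper; a minimal sketch (the function name `recommend_action` is illustrative, not part of any library):

```python
def recommend_action(stat_sig: bool, prac_sig: bool) -> str:
    """Map the statistical/practical significance combinations to an action."""
    if stat_sig and prac_sig:
        return "ACTION: real effect that matters"
    if stat_sig and not prac_sig:
        return "NO ACTION: real effect but too small to matter"
    if not stat_sig and prac_sig:
        return "NEED MORE DATA: possibly important, but not yet established"
    return "NO ACTION: no evidence of a meaningful effect"


print(recommend_action(True, False))
```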

Setting Thresholds

Business-Based Approach

def business_threshold_framework():
    """
    Framework for setting practical significance thresholds.
    """
    print("SETTING PRACTICAL SIGNIFICANCE THRESHOLDS")
    print("=" * 60)
    print()

    questions = {
        "Implementation Cost": [
            "What does it cost to implement this change?",
            "Engineering time, design resources, QA",
            "Ongoing maintenance costs"
        ],
        "Opportunity Cost": [
            "What else could we do with these resources?",
            "How long does the experiment delay other work?",
            "What's the cost of being wrong?"
        ],
        "Expected Impact": [
            "What metric are we moving?",
            "How does that metric translate to business value?",
            "What's the lifetime value of the improvement?"
        ],
        "Risk Tolerance": [
            "What if we implement and the effect disappears?",
            "What if the effect is real but smaller than measured?",
            "Can we easily revert?"
        ]
    }

    for category, qs in questions.items():
        print(f"\n{category}:")
        for q in qs:
            print(f"  • {q}")

    print("\n" + "=" * 60)
    print("\nTHRESHOLD CALCULATION:")
    print("  Minimum effect = Cost / (Value per unit × Scale)")
    print()
    print("Example:")
    print("  Implementation cost: $50,000")
    print("  Value per conversion: $10")
    print("  Users affected per year: 1,000,000")
    print("  Current conversion: 5%")
    print()
    print("  Minimum lift for 1-year ROI:")
    print("  $50,000 / ($10 × 1,000,000) = 0.5% absolute")
    print("  Relative: 0.5% / 5% = 10% relative lift needed")


business_threshold_framework()


def calculate_required_lift(implementation_cost, value_per_conversion,
                           annual_users, current_rate, payback_period=1):
    """
    Calculate minimum lift needed to justify implementation.
    """
    annual_value_per_pct = value_per_conversion * annual_users / 100

    min_absolute_lift = implementation_cost / (annual_value_per_pct * payback_period)
    min_relative_lift = min_absolute_lift / current_rate * 100

    return {
        'min_absolute_lift': min_absolute_lift,
        'min_relative_lift': min_relative_lift,
        'breakeven_value': implementation_cost,
        'annual_value_per_pct': annual_value_per_pct
    }


# Example
result = calculate_required_lift(
    implementation_cost=50000,
    value_per_conversion=10,
    annual_users=1000000,
    current_rate=5.0,
    payback_period=1
)

print("\nTHRESHOLD CALCULATION RESULT:")
print("-" * 40)
print(f"Minimum absolute lift: {result['min_absolute_lift']:.2f} percentage points")
print(f"Minimum relative lift: {result['min_relative_lift']:.1f}%")
print(f"(Baseline rate: 5.0%)")
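
Once a minimum meaningful lift is fixed, it doubles as the minimum detectable effect (MDE) for a power calculation. A rough sketch using the normal approximation for two proportions (the 80% power target and the numbers are illustrative, not prescriptive):

```python
import numpy as np
from scipy import stats

def n_per_group(p_base, absolute_lift, alpha=0.05, power=0.8):
    """Approximate sample size per group to detect absolute_lift (two-sided test)."""
    p2 = p_base + absolute_lift
    z = stats.norm.ppf(1 - alpha / 2) + stats.norm.ppf(power)
    var = p_base * (1 - p_base) + p2 * (1 - p2)
    return int(np.ceil(z**2 * var / absolute_lift**2))


# Detecting the 0.5-point lift from the example above (5.0% -> 5.5%)
print(n_per_group(0.05, 0.005))  # roughly 31,000 per group
```

If that sample size is out of reach, the threshold itself is telling you the experiment may not be worth running.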

Domain-Specific Guidelines

E-commerce / Conversion

def ecommerce_thresholds():
    """
    Typical thresholds for e-commerce metrics.
    """
    print("E-COMMERCE THRESHOLD GUIDELINES")
    print("=" * 50)

    thresholds = {
        "Conversion rate": {
            "typical_baseline": "2-5%",
            "meaningful_lift": "5-10% relative",
            "example": "3% → 3.15% (5% lift) might be worthwhile",
            "note": "Even 0.1% absolute can be huge at scale"
        },
        "Average order value": {
            "typical_baseline": "$50-150",
            "meaningful_lift": "3-5% relative",
            "example": "$80 → $84 (5% lift)",
            "note": "Consider distribution - median may matter more"
        },
        "Revenue per visitor": {
            "typical_baseline": "$1-5",
            "meaningful_lift": "5-15% relative",
            "example": "$2.50 → $2.75 (10% lift)",
            "note": "Compounds conversion and AOV"
        },
        "Cart abandonment": {
            "typical_baseline": "60-80%",
            "meaningful_lift": "2-5% absolute reduction",
            "example": "70% → 67% abandonment",
            "note": "Absolute change matters here"
        }
    }

    for metric, info in thresholds.items():
        print(f"\n{metric}:")
        for k, v in info.items():
            print(f"  {k}: {v}")


ecommerce_thresholds()
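
The note that "even 0.1% absolute can be huge at scale" is easy to check directly; a quick sketch (the traffic and order-value figures are made up for illustration):

```python
def annual_lift_value(annual_visitors, value_per_conversion, absolute_lift_pct):
    """Annual dollar value of an absolute conversion-rate lift (in percentage points)."""
    return annual_visitors * (absolute_lift_pct / 100) * value_per_conversion


# A 0.1-point lift on 10M annual visitors at $40 per conversion
print(f"${annual_lift_value(10_000_000, 40, 0.1):,.0f} per year")  # $400,000 per year
```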

SaaS / Retention

def saas_thresholds():
    """
    Typical thresholds for SaaS metrics.
    """
    print("SAAS THRESHOLD GUIDELINES")
    print("=" * 50)

    thresholds = {
        "Monthly churn rate": {
            "typical_baseline": "2-8%",
            "meaningful_lift": "0.5-1% absolute reduction",
            "example": "5% → 4.5% monthly churn",
            "note": "Small changes compound significantly"
        },
        "Trial-to-paid conversion": {
            "typical_baseline": "5-15%",
            "meaningful_lift": "10-20% relative",
            "example": "10% → 11% trial conversion",
            "note": "Very high LTV justifies small lifts"
        },
        "Feature adoption": {
            "typical_baseline": "20-60%",
            "meaningful_lift": "5-10% absolute",
            "example": "30% → 35% feature usage",
            "note": "Depends on feature importance"
        },
        "NPS / Satisfaction": {
            "typical_baseline": "30-50",
            "meaningful_lift": "5-10 points",
            "example": "40 → 45 NPS",
            "note": "Leading indicator of retention"
        }
    }

    for metric, info in thresholds.items():
        print(f"\n{metric}:")
        for k, v in info.items():
            print(f"  {k}: {v}")


saas_thresholds()
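
The claim that small churn changes "compound significantly" can be quantified by projecting cohort retention; a minimal sketch:

```python
def retention_after(months, monthly_churn):
    """Fraction of a cohort still retained after `months` at a constant monthly churn rate."""
    return (1 - monthly_churn) ** months


base = retention_after(12, 0.05)     # ~54% retained
better = retention_after(12, 0.045)  # ~57.5% retained
print(f"12-month retention: {base:.1%} -> {better:.1%} "
      f"(+{better / base - 1:.1%} more retained users)")
```

A half-point drop in monthly churn keeps about 6.5% more of each cohort after a year, before counting the revenue those users generate.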

Pre-Specifying Thresholds

Why Pre-Specification Matters

def why_prespecify():
    """
    Explain importance of pre-specifying thresholds.
    """
    print("WHY PRE-SPECIFY THRESHOLDS?")
    print("=" * 50)
    print()

    problems_without = [
        "Post-hoc rationalization: 'This effect is big enough'",
        "Goal posts move based on results",
        "Confirmation bias in interpretation",
        "Harder to defend decisions to stakeholders"
    ]

    benefits = [
        "Clear decision criteria before seeing results",
        "Separates analysis from decision-making",
        "Easier to communicate and get buy-in",
        "Documents your thinking for future reference"
    ]

    print("Problems WITHOUT pre-specification:")
    for p in problems_without:
        print(f"  ✗ {p}")

    print("\nBenefits OF pre-specification:")
    for b in benefits:
        print(f"  ✓ {b}")


why_prespecify()

Documentation Template

def threshold_documentation_template():
    """
    Template for documenting practical significance threshold.
    """
    template = """
PRACTICAL SIGNIFICANCE THRESHOLD DOCUMENTATION
==============================================

Experiment: [Name]
Date: [Date]
Author: [Name]

1. METRIC DEFINITION
   Primary metric: [e.g., Conversion rate]
   Current baseline: [e.g., 5.0%]
   Measurement period: [e.g., 2 weeks]

2. THRESHOLD CALCULATION

   Implementation cost: $[X]
   - Engineering: $[Y]
   - Design: $[Z]
   - Other: $[W]

   Expected annual impact per 1% lift: $[A]
   - Users affected: [N]
   - Value per conversion: $[V]

   Payback period: [e.g., 1 year]

   MINIMUM THRESHOLD:
   - Absolute: [X]% lift (e.g., 5.0% → 5.5%)
   - Relative: [Y]% improvement

3. DECISION RULES

   If CI entirely above threshold → SHIP
   If CI entirely below threshold → DO NOT SHIP
   If CI spans threshold → [Decision/Next steps]

4. RATIONALE

   [Why this threshold makes sense for this experiment]
   [Any context-specific considerations]

5. SIGN-OFF

   Approved by: [Name]
   Date: [Date]
"""
    print(template)


threshold_documentation_template()

Making Decisions

Decision Framework

def decision_framework(effect, ci, threshold):
    """
    Framework for making decisions with practical significance.
    """
    ci_low, ci_high = ci

    # Statistical significance
    stat_sig = ci_low > 0 or ci_high < 0

    # Practical significance scenarios
    definitely_meaningful = ci_low > threshold
    definitely_trivial = ci_high < threshold
    uncertain = not definitely_meaningful and not definitely_trivial

    print("DECISION FRAMEWORK")
    print("=" * 60)
    print()
    print(f"Effect estimate: {effect:.3f}")
    print(f"95% CI: [{ci_low:.3f}, {ci_high:.3f}]")
    print(f"Practical threshold: {threshold:.3f}")
    print()

    # Number-line visual: | marks the null (0), T the threshold, --- the 95% CI
    lo = min(ci_low, 0, threshold) - 0.01
    hi = max(ci_high, 0, threshold) + 0.01
    width = 40
    pos = lambda x: int((x - lo) / (hi - lo) * (width - 1))
    line = [" "] * width
    for i in range(pos(ci_low), pos(ci_high) + 1):
        line[i] = "-"
    line[pos(0)] = "|"
    line[pos(threshold)] = "T"
    print("Visual (| = null, T = threshold, --- = 95% CI):")
    print("  " + "".join(line))

    print("\nAssessment:")
    print("-" * 40)

    if stat_sig:
        print("✓ Statistically significant (CI excludes 0)")
    else:
        print("✗ NOT statistically significant (CI includes 0)")

    if definitely_meaningful:
        print("✓ Definitely practically meaningful (entire CI > threshold)")
    elif definitely_trivial:
        print("✗ Definitely NOT practically meaningful (entire CI < threshold)")
    else:
        print("? Uncertain practical significance (CI spans threshold)")

    print("\nDECISION:")
    print("-" * 40)

    if stat_sig and definitely_meaningful:
        print("→ SHIP: Strong evidence of meaningful effect")
    elif stat_sig and definitely_trivial:
        print("→ DO NOT SHIP: Effect is real but too small to matter")
    elif stat_sig and uncertain:
        print("→ CONDITIONAL: Effect exists but might not be meaningful")
        print("   Consider: risk tolerance, implementation cost, reversibility")
    elif not stat_sig and definitely_trivial:
        print("→ DO NOT SHIP: No evidence of meaningful effect")
    elif not stat_sig and (uncertain or definitely_meaningful):
        print("→ INCONCLUSIVE: Can't rule out meaningful effect")
        print("   Options: gather more data, accept uncertainty, or go/no-go based on priors")
    else:
        print("→ INCONCLUSIVE: Need more information")


# Examples
print("\n" + "="*70)
print("SCENARIO 1: Clear win")
print("="*70)
decision_framework(effect=0.08, ci=(0.05, 0.11), threshold=0.03)

print("\n" + "="*70)
print("SCENARIO 2: Significant but trivial")
print("="*70)
decision_framework(effect=0.02, ci=(0.01, 0.03), threshold=0.05)

print("\n" + "="*70)
print("SCENARIO 3: Promising but uncertain")
print("="*70)
decision_framework(effect=0.04, ci=(-0.01, 0.09), threshold=0.03)

Common Mistakes

def common_mistakes():
    """
    Common mistakes in practical significance assessment.
    """
    print("COMMON MISTAKES TO AVOID")
    print("=" * 50)

    mistakes = {
        "Using Cohen's benchmarks blindly": {
            "problem": "d = 0.2 is 'small' but could be huge in your context",
            "solution": "Derive thresholds from business impact"
        },
        "Setting threshold post-hoc": {
            "problem": "Easy to rationalize any result as 'meaningful'",
            "solution": "Document threshold before seeing results"
        },
        "Ignoring confidence interval width": {
            "problem": "Point estimate might be above threshold but CI isn't",
            "solution": "Make decisions based on CI bounds, not just point estimate"
        },
        "Conflating statistical and practical significance": {
            "problem": "p < 0.05 doesn't mean the effect matters",
            "solution": "Evaluate both separately"
        },
        "Using same threshold for all metrics": {
            "problem": "Different metrics have different scales and business value",
            "solution": "Calculate threshold for each primary metric"
        }
    }

    for mistake, details in mistakes.items():
        print(f"\n{mistake}:")
        print(f"  Problem: {details['problem']}")
        print(f"  Solution: {details['solution']}")


common_mistakes()

R Implementation

# Practical significance framework in R

decision_framework <- function(effect, ci_low, ci_high, threshold) {
  cat("PRACTICAL SIGNIFICANCE ASSESSMENT\n")
  cat(strrep("=", 50), "\n\n")

  cat(sprintf("Effect: %.3f\n", effect))
  cat(sprintf("95%% CI: [%.3f, %.3f]\n", ci_low, ci_high))
  cat(sprintf("Threshold: %.3f\n\n", threshold))

  # Assessments
  stat_sig <- (ci_low > 0) | (ci_high < 0)
  definitely_meaningful <- ci_low > threshold
  definitely_trivial <- ci_high < threshold

  cat("Statistical significance: ")
  cat(ifelse(stat_sig, "Yes\n", "No\n"))

  cat("Practical significance: ")
  if (definitely_meaningful) {
    cat("Definitely meaningful\n")
  } else if (definitely_trivial) {
    cat("Definitely trivial\n")
  } else {
    cat("Uncertain\n")
  }

  cat("\nDECISION: ")
  if (stat_sig && definitely_meaningful) {
    cat("SHIP - Clear meaningful effect\n")
  } else if (stat_sig && definitely_trivial) {
    cat("DO NOT SHIP - Effect too small\n")
  } else {
    cat("NEEDS JUDGMENT - Consider context\n")
  }
}

# Usage:
# decision_framework(effect = 0.05, ci_low = 0.02, ci_high = 0.08, threshold = 0.03)


Key Takeaway

Practical significance means the effect is large enough to matter for your decision. Define this threshold BEFORE analyzing based on business context: implementation cost, expected value, and stakeholder expectations. Statistical significance without practical significance is a false positive for decision-making. Use confidence intervals to assess whether effects are definitively meaningful, definitively trivial, or uncertain—and plan your decision rule for each scenario in advance.



Frequently Asked Questions

How do I determine what effect size is 'meaningful'?
Consider: What's the implementation cost? What's the expected revenue/impact? At what effect size would benefits exceed costs? Also consider domain norms and stakeholder expectations.

Should I use Cohen's benchmarks (0.2, 0.5, 0.8)?
No, they're arbitrary guidelines from psychology. A d = 0.1 might be hugely important in medicine or at scale in tech. Always derive thresholds from your specific business context.

What if my CI spans both meaningful and trivial effects?
This is an inconclusive result. You can't confidently say the effect is meaningful OR trivial. Options: gather more data, make a risk-based decision, or accept uncertainty and document it.
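
A related tool for the "definitely trivial" side is equivalence testing (TOST, as popularized by Lakens): test whether the effect lies inside (-bound, +bound). A minimal two-sample sketch, assuming equal-variance groups and pooled degrees of freedom (a Welch correction would be more careful); the data here are simulated:

```python
import numpy as np
from scipy import stats

def tost_ind(x, y, low, upp):
    """Two one-sided tests: p-value for H0 'true difference lies outside [low, upp]'."""
    diff = np.mean(x) - np.mean(y)
    se = np.sqrt(np.var(x, ddof=1) / len(x) + np.var(y, ddof=1) / len(y))
    df = len(x) + len(y) - 2
    p_lower = 1 - stats.t.cdf((diff - low) / se, df)  # H0: diff <= low
    p_upper = stats.t.cdf((diff - upp) / se, df)      # H0: diff >= upp
    return max(p_lower, p_upper)  # rejecting both places the effect inside the bounds


rng = np.random.default_rng(0)
x = rng.normal(50, 15, 20000)
y = rng.normal(50, 15, 20000)
p = tost_ind(x, y, low=-1.0, upp=1.0)  # equivalence bounds: +/- 1 point
print(f"TOST p-value: {p:.4f}")  # small p -> effect is within +/- 1 point
```

A small TOST p-value lets you positively conclude "trivial" rather than merely failing to find an effect.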
