
P-Values vs. Confidence Intervals: How to Interpret Both for Decisions

Understand the relationship between p-values and confidence intervals, when they agree, when they seem to disagree, and how to use them together for better decisions.


Quick Hits

  • P-values tell you the probability of data this extreme under the null; CIs tell you the plausible parameter values
  • 95% CI excludes 0 ⟺ p < 0.05 (for testing against null of 0)
  • CIs give more information: direction, magnitude, and precision
  • For decisions, CI bounds matter more than p-values

TL;DR

P-values tell you the probability of seeing your data (or more extreme) if the null hypothesis is true. Confidence intervals give you a range of plausible values for the true parameter. They're mathematically linked: a 95% CI that excludes zero corresponds to p < 0.05. But CIs are more informative for decisions because they show effect magnitude and precision, not just whether to reject the null.


What Each Tells You

P-Values

import numpy as np
from scipy import stats

def explain_p_value():
    """
    Clarify what a p-value actually means.
    """
    print("WHAT A P-VALUE TELLS YOU")
    print("=" * 50)
    print()
    print("P-value = P(data this extreme or more | H₀ is true)")
    print()
    print("In plain English:")
    print("  'If there were truly no effect, what's the probability")
    print("   of seeing results as extreme as what we observed?'")
    print()
    print("P < 0.05 means:")
    print("  'This would be surprising if H₀ were true'")
    print("  'We reject H₀ at the 0.05 level'")
    print()
    print("P-VALUE DOES NOT MEAN:")
    print("  ✗ P(H₀ is true)")
    print("  ✗ P(H₁ is true)")
    print("  ✗ Probability the effect is real")
    print("  ✗ Size of the effect")
    print()
    print("WHAT P-VALUES DON'T TELL YOU:")
    print("  • How big the effect is")
    print("  • Whether the effect matters practically")
    print("  • The probability of replication")


explain_p_value()
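The conditional definition above can be checked by simulation: generate many datasets with the null actually true, and the fraction of simulated test statistics at least as extreme as the observed one approximates the analytic p-value. A minimal sketch (sample sizes and the seed are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 50

# "Observed" data: two groups drawn from the SAME distribution, so H0 is true
a = rng.normal(100, 15, n)
b = rng.normal(100, 15, n)
t_obs, p_analytic = stats.ttest_ind(a, b)

# Simulate the null many times and record the t statistic each time
n_sims = 10_000
x = rng.normal(100, 15, (n_sims, n))
y = rng.normal(100, 15, (n_sims, n))
se = np.sqrt(x.var(axis=1, ddof=1) / n + y.var(axis=1, ddof=1) / n)
t_null = (x.mean(axis=1) - y.mean(axis=1)) / se

# Empirical p-value: fraction of null runs at least as extreme as observed
p_empirical = np.mean(np.abs(t_null) >= abs(t_obs))
print(f"analytic p = {p_analytic:.3f}, simulated p = {p_empirical:.3f}")
```

With the null true by construction, the two numbers agree closely, which is exactly what "P(data this extreme or more | H₀ is true)" promises.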

Confidence Intervals

def explain_confidence_interval():
    """
    Clarify what a confidence interval means.
    """
    print("WHAT A CONFIDENCE INTERVAL TELLS YOU")
    print("=" * 50)
    print()
    print("95% CI: A range constructed such that if we repeated")
    print("the study many times, 95% of such intervals would")
    print("contain the true parameter value.")
    print()
    print("In practice:")
    print("  'We're 95% confident the true effect is in this range'")
    print()
    print("WHAT A CI TELLS YOU:")
    print("  ✓ Plausible values for the true effect")
    print("  ✓ Precision of the estimate (narrow = precise)")
    print("  ✓ Whether the effect is significant (if 0 excluded)")
    print("  ✓ Whether the effect might be practically important")
    print()
    print("CI DOES NOT MEAN:")
    print("  ✗ 95% of the data falls in this range")
    print("  ✗ 95% probability the true value is in THIS interval")
    print("     (The true value is fixed; it's either in or out)")


explain_confidence_interval()
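The "repeated studies" interpretation is also easy to verify by simulation: build a 95% interval from each of many fresh samples and count how often the interval covers the true mean. A quick sketch (the true mean, SD, and sample size are illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_mean, sigma, n, n_sims = 100.0, 15.0, 50, 10_000

# Draw many independent samples and build a 95% t-interval from each
samples = rng.normal(true_mean, sigma, (n_sims, n))
means = samples.mean(axis=1)
ses = samples.std(axis=1, ddof=1) / np.sqrt(n)
t_crit = stats.t.ppf(0.975, df=n - 1)

lo = means - t_crit * ses
hi = means + t_crit * ses
coverage = np.mean((lo <= true_mean) & (true_mean <= hi))
print(f"Fraction of intervals covering the true mean: {coverage:.3f}")
```

The coverage lands very close to 0.95, and note what varies across repetitions: the intervals, not the parameter.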

The Mathematical Relationship

They're Two Sides of the Same Coin

def demonstrate_relationship():
    """
    Show the mathematical link between p-values and CIs.
    """
    np.random.seed(42)

    # Generate data
    control = np.random.normal(100, 15, 50)
    treatment = np.random.normal(108, 15, 50)

    diff = np.mean(treatment) - np.mean(control)
    se = np.sqrt(np.var(control, ddof=1)/50 + np.var(treatment, ddof=1)/50)

    # P-value (two-sided test against H₀: diff = 0)
    t_stat = diff / se
    df = 98  # Approximately
    p_value = 2 * (1 - stats.t.cdf(abs(t_stat), df))

    # 95% CI
    t_crit = stats.t.ppf(0.975, df)
    ci_low = diff - t_crit * se
    ci_high = diff + t_crit * se

    print("P-VALUE AND CI RELATIONSHIP")
    print("=" * 50)
    print()
    print(f"Observed difference: {diff:.2f}")
    print(f"Standard error: {se:.2f}")
    print()
    print(f"P-value: {p_value:.4f}")
    print(f"  → p {'<' if p_value < 0.05 else '≥'} 0.05")
    print()
    print(f"95% CI: [{ci_low:.2f}, {ci_high:.2f}]")
    print(f"  → 0 is {'NOT ' if ci_low > 0 or ci_high < 0 else ''}in CI")
    print()
    print("THE KEY RELATIONSHIP:")
    print("  95% CI excludes 0 ⟺ p < 0.05")
    print("  99% CI excludes 0 ⟺ p < 0.01")
    print("  100(1-α)% CI excludes the null value ⟺ p < α")


demonstrate_relationship()
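The link is tighter than the 0.05 cutoff suggests: the two-sided p-value is exactly the confidence level at which the interval's bound lands on the null value. A numeric check using the same normal approximation (the estimate and SE are illustrative):

```python
import numpy as np
from scipy import stats

diff, se = 6.0, 2.5  # illustrative point estimate and standard error

# Two-sided p-value against H0: diff = 0 (normal approximation)
p_value = 2 * (1 - stats.norm.cdf(abs(diff / se)))

# Build the (1 - p_value) CI: its lower bound lands exactly on 0
z_crit = stats.norm.ppf(1 - p_value / 2)
ci_low = diff - z_crit * se
print(f"p = {p_value:.4f}; lower bound of the (1 - p) CI = {ci_low:.10f}")
```

In other words, the p-value answers "how far would I have to widen the interval before it touches the null?" — which is why the two tools can never disagree for the same test.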

Visual Demonstration

def visualize_ci_pvalue_link():
    """
    Show how CI relates to p-value visually.
    """
    import matplotlib.pyplot as plt

    np.random.seed(42)

    # Three scenarios
    scenarios = [
        {'diff': 10, 'se': 3, 'label': 'Significant'},
        {'diff': 6, 'se': 3, 'label': 'Borderline'},
        {'diff': 1, 'se': 3, 'label': 'Not significant'}
    ]

    fig, axes = plt.subplots(1, 3, figsize=(15, 4))

    for ax, s in zip(axes, scenarios):
        ci_low = s['diff'] - 1.96 * s['se']
        ci_high = s['diff'] + 1.96 * s['se']
        p_val = 2 * (1 - stats.norm.cdf(abs(s['diff'] / s['se'])))

        # Plot CI
        ax.errorbar(s['diff'], 0, xerr=[[s['diff'] - ci_low], [ci_high - s['diff']]],
                   fmt='o', capsize=10, markersize=10, capthick=2)
        ax.axvline(0, color='red', linestyle='--', label='Null (H₀: diff=0)')
        ax.set_xlim(-10, 20)
        ax.set_ylim(-0.5, 0.5)
        ax.set_title(f"{s['label']}\nDiff={s['diff']}, p={p_val:.3f}")
        ax.set_xlabel('Effect Size')
        ax.legend()

    plt.tight_layout()
    return fig

Why CIs Are Often More Useful

CI Tells You What P-Value Can't

def ci_advantages():
    """
    Demonstrate advantages of CIs over p-values.
    """
    np.random.seed(42)

    # Two scenarios with the same p-value but very different implications:
    # the z-statistic (diff / se) is identical, so p matches, but the
    # larger SE makes the second CI twice as wide
    scenarios = [
        {
            'name': 'Precise estimate',
            'diff': 5,
            'se': 2.5,
            'n': 500
        },
        {
            'name': 'Imprecise estimate',
            'diff': 10,
            'se': 5,  # same diff/se ratio, but double the SE from a smaller sample
            'n': 125
        }
    ]

    print("TWO SCENARIOS WITH SAME P-VALUE")
    print("=" * 60)

    for s in scenarios:
        ci_low = s['diff'] - 1.96 * s['se']
        ci_high = s['diff'] + 1.96 * s['se']
        p_val = 2 * (1 - stats.norm.cdf(abs(s['diff'] / s['se'])))

        print(f"\n{s['name']} (n = {s['n']}):")
        print(f"  Observed difference: {s['diff']}")
        print(f"  P-value: {p_val:.4f}")
        print(f"  95% CI: [{ci_low:.2f}, {ci_high:.2f}]")

    print()
    print("SAME P-VALUE, BUT CI TELLS YOU:")
    print("  • CI width shows precision")
    print("  • CI bounds show plausible effect range")
    print("  • You can assess practical significance from CI bounds")


ci_advantages()


def ci_for_decision_making():
    """
    How to use CIs for decisions.
    """
    print("\nUSING CIs FOR DECISIONS")
    print("=" * 60)
    print()

    scenarios = {
        'CI fully above threshold': {
            'ci': (3, 7),
            'threshold': 2,
            'decision': 'Implement - effect definitely exceeds threshold'
        },
        'CI overlaps threshold': {
            'ci': (1, 5),
            'threshold': 2,
            'decision': 'Uncertain - need more data or consider risk tolerance'
        },
        'CI fully below threshold': {
            'ci': (0.5, 2.5),
            'threshold': 3,
            'decision': 'Don\'t implement - effect likely below threshold'
        },
        'CI contains zero but above threshold possible': {
            'ci': (-1, 4),
            'threshold': 2,
            'decision': 'Not significant, but practical effect possible - more data needed'
        }
    }

    for name, info in scenarios.items():
        print(f"\n{name}:")
        print(f"  CI: [{info['ci'][0]}, {info['ci'][1]}]")
        print(f"  Practical threshold: {info['threshold']}")
        print(f"  Decision: {info['decision']}")


ci_for_decision_making()

Interpreting Together

The Complete Picture

def interpret_together(diff, se, threshold=None, alpha=0.05):
    """
    Interpret p-value and CI together for decisions.
    """
    # Calculate statistics
    z = diff / se
    p_value = 2 * (1 - stats.norm.cdf(abs(z)))
    z_crit = stats.norm.ppf(1 - alpha/2)
    ci_low = diff - z_crit * se
    ci_high = diff + z_crit * se

    print("INTEGRATED INTERPRETATION")
    print("=" * 60)
    print()
    print(f"Point estimate: {diff:.3f}")
    print(f"95% CI: [{ci_low:.3f}, {ci_high:.3f}]")
    print(f"P-value: {p_value:.4f}")

    print()
    print("STATISTICAL SIGNIFICANCE:")
    if p_value < alpha:
        print(f"  ✓ Significant at α = {alpha}")
        print(f"    (CI does not include 0)")
    else:
        print(f"  ✗ Not significant at α = {alpha}")
        print(f"    (CI includes 0)")

    if threshold:
        print()
        print("PRACTICAL SIGNIFICANCE:")
        if ci_low > threshold:
            print(f"  ✓ Definitely exceeds threshold ({threshold})")
            print(f"    (Entire CI above threshold)")
        elif ci_high < threshold:
            print(f"  ✗ Definitely below threshold ({threshold})")
            print(f"    (Entire CI below threshold)")
        else:
            print(f"  ? Uncertain relative to threshold ({threshold})")
            print(f"    (CI overlaps threshold)")

    print()
    print("RECOMMENDATION:")
    if p_value < alpha and threshold and ci_low > threshold:
        print("  Strong evidence for meaningful effect - implement")
    elif p_value < alpha and threshold and ci_high < threshold:
        print("  Significant but below threshold - may not be worth implementing")
    elif p_value < alpha:
        print("  Significant - examine CI to assess practical importance")
    elif threshold and ci_high > threshold:
        print("  Not significant, but practical effect still possible - gather more data")
    else:
        print("  No significant effect and unlikely to be practically important")


# Examples
print("\n" + "="*70)
print("SCENARIO 1: Clearly beneficial")
print("="*70)
interpret_together(diff=10, se=3, threshold=5)

print("\n" + "="*70)
print("SCENARIO 2: Significant but possibly trivial")
print("="*70)
interpret_together(diff=2, se=0.5, threshold=5)

print("\n" + "="*70)
print("SCENARIO 3: Not significant but potentially meaningful")
print("="*70)
interpret_together(diff=5, se=4, threshold=5)

Common Misinterpretations

def common_misinterpretations():
    """
    Address common misunderstandings.
    """
    print("COMMON MISINTERPRETATIONS TO AVOID")
    print("=" * 60)

    misinterpretations = {
        'P-value myths': [
            ('p = 0.03 means 3% chance null is true',
             'P-value is P(data|H₀), not P(H₀|data)'),
            ('p = 0.03 is "more significant" than p = 0.04',
             'Both represent similar evidence against H₀; don\'t over-interpret small differences'),
            ('p > 0.05 means no effect exists',
             'It means we can\'t rule out chance, not that effect is zero'),
        ],
        'CI myths': [
            ('95% probability true value is in this interval',
             'True value is fixed; either it\'s in there or not'),
            ('95% of data falls in this interval',
             'CI is about parameter estimate, not data spread'),
            ('Overlapping CIs mean no significant difference',
             'CIs can overlap but groups still differ significantly'),
        ]
    }

    for category, myths in misinterpretations.items():
        print(f"\n{category}:")
        print("-" * 50)
        for myth, reality in myths:
            print(f"\n  ✗ WRONG: {myth}")
            print(f"  ✓ RIGHT: {reality}")


common_misinterpretations()
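The last CI myth deserves a numeric check: two group means whose 95% CIs overlap can still differ significantly, because the standard error of the difference is sqrt(se₁² + se₂²), which is smaller than the sum of the two individual margins. A sketch with illustrative numbers:

```python
import numpy as np
from scipy import stats

# Two group means with equal standard errors (illustrative numbers)
m1, m2, se = 100.0, 106.0, 2.0

# Individual 95% CIs for each group mean
margin = 1.96 * se
ci1 = (m1 - margin, m1 + margin)
ci2 = (m2 - margin, m2 + margin)
overlap = ci1[1] > ci2[0]

# Test the DIFFERENCE: its SE is sqrt(se1^2 + se2^2), not se1 + se2
se_diff = np.sqrt(se**2 + se**2)
z = (m2 - m1) / se_diff
p = 2 * (1 - stats.norm.cdf(abs(z)))

print(f"Group CIs: [{ci1[0]:.1f}, {ci1[1]:.1f}] and [{ci2[0]:.1f}, {ci2[1]:.1f}]")
print(f"CIs overlap: {overlap}, yet p = {p:.3f}")
```

The moral: to compare two groups, look at the CI for the difference, not at whether the two separate CIs touch.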

Practical Decision Framework

def decision_framework():
    """
    Framework for using p-values and CIs in decisions.
    """
    print("""
DECISION FRAMEWORK
==================

STEP 1: Define what matters BEFORE analysis
  • What's the minimum effect that would change your decision?
  • What's the acceptable risk of false positive/negative?

STEP 2: Look at CI first
  • What range of effects is plausible?
  • Does the CI include practically important effects?
  • Does the CI include trivial effects?

STEP 3: Consider p-value
  • Is the result statistically significant?
  • If significant but CI includes trivial effects: beware over-interpretation
  • If not significant but CI includes important effects: consider getting more data

STEP 4: Make decision

  Scenario A: CI entirely in "actionable" range, p < α
  → Strong evidence to act

  Scenario B: CI entirely in "trivial" range, p < α
  → Significant but not worth acting on

  Scenario C: CI leans toward the "actionable" range but includes 0, p > α
  → Promising but uncertain; consider more data

  Scenario D: CI entirely in "trivial" range, p > α
  → No evidence of meaningful effect

  Scenario E: CI spans trivial and actionable
  → Inconclusive; more data needed for confident decision
    """)


decision_framework()
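The five scenarios can be collapsed into a small helper that maps a CI, a practical threshold, and a significance flag onto a recommendation (the function name and return strings are illustrative, not from any library):

```python
def classify_decision(ci_low, ci_high, threshold, significant):
    """Map a CI, a practical-significance threshold, and a significance
    flag (e.g. whether the 95% CI excludes 0) onto scenarios A-E."""
    if significant and ci_low > threshold:
        return "A: strong evidence to act"
    if significant and ci_high < threshold:
        return "B: significant but not worth acting on"
    if not significant and ci_low > threshold:
        return "C: promising but uncertain; consider more data"
    if not significant and ci_high < threshold:
        return "D: no evidence of meaningful effect"
    return "E: inconclusive; more data needed"


# Usage with illustrative numbers:
print(classify_decision(3, 7, threshold=2, significant=True))
print(classify_decision(1.0, 2.9, threshold=5, significant=True))
print(classify_decision(-1, 4, threshold=2, significant=False))
```

Encoding the framework this way forces the threshold to be stated up front, which is the whole point of Step 1.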

R Implementation

# P-value and CI interpretation in R

interpret_result <- function(diff, se, threshold = NULL, alpha = 0.05) {
  z <- diff / se
  p_value <- 2 * (1 - pnorm(abs(z)))
  z_crit <- qnorm(1 - alpha/2)
  ci_low <- diff - z_crit * se
  ci_high <- diff + z_crit * se

  cat("INTEGRATED INTERPRETATION\n")
  cat(strrep("=", 50), "\n\n")

  cat(sprintf("Point estimate: %.3f\n", diff))
  cat(sprintf("95%% CI: [%.3f, %.3f]\n", ci_low, ci_high))
  cat(sprintf("P-value: %.4f\n", p_value))

  cat("\nStatistical significance:\n")
  if (p_value < alpha) {
    cat(sprintf("  Significant at alpha = %.2f\n", alpha))
  } else {
    cat(sprintf("  Not significant at alpha = %.2f\n", alpha))
  }

  if (!is.null(threshold)) {
    cat("\nPractical significance:\n")
    if (ci_low > threshold) {
      cat(sprintf("  Definitely exceeds threshold (%.1f)\n", threshold))
    } else if (ci_high < threshold) {
      cat(sprintf("  Definitely below threshold (%.1f)\n", threshold))
    } else {
      cat(sprintf("  Uncertain relative to threshold (%.1f)\n", threshold))
    }
  }

  invisible(list(p_value = p_value, ci = c(ci_low, ci_high)))
}

# Usage:
# interpret_result(diff = 5, se = 2, threshold = 3)


Key Takeaway

P-values and confidence intervals are mathematically linked but serve different purposes. P-values address whether an effect exists (statistical significance). CIs show how big it might be and with what precision. For decisions, focus on the CI: Does it contain only trivial effects? Only meaningful effects? Both? This determines your action more reliably than whether p crosses 0.05. Report both, but let the CI guide your practical interpretation.



Frequently Asked Questions

Which is better, p-values or confidence intervals?
Both have value. P-values directly address 'is there an effect?' CIs address 'how big and how certain?' For most decisions, CIs are more informative because they show the range of plausible effect sizes.
Why do some statisticians want to abandon p-values?
P-values are often misinterpreted (they don't give P(H₀|data)), encourage binary thinking, and don't communicate effect size or precision. CIs address these issues while still allowing significance assessment.
Can a 95% CI exclude 0 but p > 0.05?
No, they're mathematically linked for two-sided tests. If the 95% CI excludes the null value, p < 0.05, and vice versa. Apparent disagreements usually involve comparing different CIs (e.g., separate group CIs vs. CI for difference).

