Bayesian Methods

Credible Intervals vs. Confidence Intervals: What Changes

Understand the real difference between credible and confidence intervals. Learn what each actually means, when it matters, and how to interpret both correctly.

Quick Hits

  • A 95% credible interval means: there is a 95% probability the parameter is in this range
  • A 95% confidence interval means: if we repeated the experiment many times, 95% of intervals would contain the true value
  • Credible intervals answer the question people actually ask; confidence intervals answer a hypothetical about repetitions
  • With large samples and uninformative priors, the two intervals are nearly identical numerically
  • Highest Density Intervals (HDI) are the shortest credible intervals and are most useful for skewed posteriors

TL;DR

A 95% credible interval says: "There is a 95% probability the true value is in this range." A 95% confidence interval says: "If we repeated this procedure, 95% of intervals would contain the true value." Most people interpret confidence intervals as if they were credible intervals -- but they are not. This guide explains the real difference, when it matters, and how to use each correctly.


The Core Difference

What a Confidence Interval Actually Means

A 95% confidence interval is a property of the procedure, not the specific interval you calculated.

After computing a confidence interval of [2.1%, 5.3%]:

  • CORRECT: "If we repeated this experiment and recomputed the interval many times, 95% of those intervals would contain the true parameter."
  • INCORRECT: "There is a 95% probability that the true parameter is between 2.1% and 5.3%."

The true parameter is fixed. It is either in the interval or it is not. The 95% refers to the long-run frequency of the procedure, not to this particular interval.
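
To see what "property of the procedure" means in practice, here is a minimal simulation sketch: the true rate (12%) and sample size (2,000) are illustrative assumptions; we repeat the experiment many times and count how often the Wald interval covers the true rate.

import numpy as np

np.random.seed(0)

# Illustrative assumptions: the true conversion rate and sample size are
# chosen for the simulation; in a real experiment the rate is unknown.
true_p, n, n_repeats = 0.12, 2000, 10_000

covered = 0
for _ in range(n_repeats):
    successes = np.random.binomial(n, true_p)
    p_hat = successes / n
    se = np.sqrt(p_hat * (1 - p_hat) / n)
    covered += (p_hat - 1.96 * se) <= true_p <= (p_hat + 1.96 * se)

print(f"Coverage over {n_repeats} repetitions: {covered / n_repeats:.1%}")
# Close to 95% -- that long-run frequency is the guarantee.
# Any single computed interval either contains true_p or it does not.

The 95% lives in the repetition loop, not in any single computed interval.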

What a Credible Interval Actually Means

A 95% credible interval is a property of the posterior distribution for this specific dataset.

After computing a credible interval of [2.0%, 5.4%]:

  • CORRECT: "Given the data and our prior, there is a 95% probability the parameter is between 2.0% and 5.4%."
  • This is a direct probability statement about the parameter.

Side-by-Side Example

import numpy as np
from scipy import stats

np.random.seed(42)

# Scenario: Estimating conversion rate lift in an A/B test
# Observed: 240 of 2000 control visitors converted (12.0%)
#           vs 300 of 2000 treatment visitors (15.0%),
#           an observed lift of 3 percentage points

successes_c, n_c = 240, 2000   # Control: 12.0%
successes_t, n_t = 300, 2000   # Treatment: 15.0%

# --- FREQUENTIST: Confidence Interval ---
p_c = successes_c / n_c
p_t = successes_t / n_t
diff = p_t - p_c
se = np.sqrt(p_c*(1-p_c)/n_c + p_t*(1-p_t)/n_t)
ci_freq = (diff - 1.96*se, diff + 1.96*se)

print("FREQUENTIST")
print(f"Point estimate: {diff:.1%}")
print(f"95% Confidence Interval: [{ci_freq[0]:.1%}, {ci_freq[1]:.1%}]")
print("Interpretation: If we repeated this experiment many times,")
print("95% of such intervals would contain the true difference.\n")

# --- BAYESIAN: Credible Interval ---
# Posteriors under a uniform Beta(1, 1) prior
post_c = stats.beta(1 + successes_c, 1 + n_c - successes_c)
post_t = stats.beta(1 + successes_t, 1 + n_t - successes_t)

samples_c = post_c.rvs(100000)
samples_t = post_t.rvs(100000)
diff_samples = samples_t - samples_c

ci_bayes = np.percentile(diff_samples, [2.5, 97.5])

print("BAYESIAN")
print(f"Posterior mean: {np.mean(diff_samples):.1%}")
print(f"95% Credible Interval: [{ci_bayes[0]:.1%}, {ci_bayes[1]:.1%}]")
print("Interpretation: There is a 95% probability the true")
print("difference is in this range, given our data and prior.")

Notice the numbers are similar. The difference is what you can say about them.
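
Because the Bayesian output is a full posterior distribution, the interval is not the only statement available. A short sketch, reusing diff_samples from the example above (the 2% threshold is an illustrative assumption):

# Direct probability statements from the posterior samples
prob_positive = np.mean(diff_samples > 0)
prob_above_2pct = np.mean(diff_samples > 0.02)

print(f"P(lift > 0):  {prob_positive:.1%}")
print(f"P(lift > 2%): {prob_above_2pct:.1%}")
# No analogous single-experiment probability statement can be read off
# a confidence interval.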


Types of Credible Intervals

Equal-Tailed Interval (ETI)

The most common type. For a 95% interval, it excludes 2.5% of the posterior mass on each side.

def equal_tailed_interval(samples, level=0.95):
    """Standard equal-tailed credible interval."""
    alpha = 1 - level
    lower = np.percentile(samples, 100 * alpha / 2)
    upper = np.percentile(samples, 100 * (1 - alpha / 2))
    return lower, upper

Best for: Symmetric or near-symmetric posteriors. Simple to compute and explain.

Highest Density Interval (HDI)

The shortest interval that contains 95% of the posterior mass. It includes the most probable values.

def highest_density_interval(samples, level=0.95):
    """
    Shortest interval containing the specified probability mass.
    More informative than ETI for skewed distributions.
    """
    sorted_samples = np.sort(samples)
    n = len(sorted_samples)
    interval_size = int(np.ceil(level * n))

    # Find the shortest interval
    widths = sorted_samples[interval_size:] - sorted_samples[:n - interval_size]
    best_idx = np.argmin(widths)

    return sorted_samples[best_idx], sorted_samples[best_idx + interval_size]

Best for: Skewed posteriors (e.g., variance parameters, rate parameters). The HDI excludes the least probable values, which makes more intuitive sense.

When They Differ

# Skewed distribution example: estimating a rate parameter
np.random.seed(42)
samples = np.random.gamma(3, 2, 100000)  # Right-skewed

eti = equal_tailed_interval(samples)
hdi = highest_density_interval(samples)

print(f"ETI: [{eti[0]:.2f}, {eti[1]:.2f}] width = {eti[1]-eti[0]:.2f}")
print(f"HDI: [{hdi[0]:.2f}, {hdi[1]:.2f}] width = {hdi[1]-hdi[0]:.2f}")
print(f"\nHDI is shorter because it captures the mode region")
print(f"ETI wastes probability mass in the thin right tail")

For symmetric, unimodal posteriors (e.g., Normal), the ETI and HDI coincide.
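
A quick check with standard Normal draws (an illustrative symmetric case) confirms this, reusing the two functions defined above:

np.random.seed(7)
symmetric_samples = np.random.normal(loc=0, scale=1, size=100_000)

eti_sym = equal_tailed_interval(symmetric_samples)
hdi_sym = highest_density_interval(symmetric_samples)

print(f"ETI: [{eti_sym[0]:.2f}, {eti_sym[1]:.2f}]")
print(f"HDI: [{hdi_sym[0]:.2f}, {hdi_sym[1]:.2f}]")
# Both land near [-1.96, 1.96]; any gap is Monte Carlo noise.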


When the Difference Matters

Case 1: Small Samples

With small samples, confidence intervals can include impossible values (e.g., negative conversion rates). Credible intervals with proper priors stay within sensible bounds.

# Small sample: 1 out of 10 converted
successes, n = 1, 10

# Frequentist CI (Wald)
p_hat = successes / n
se = np.sqrt(p_hat * (1 - p_hat) / n)
ci_freq = (p_hat - 1.96*se, p_hat + 1.96*se)

# Bayesian credible interval (Beta prior)
post = stats.beta(1 + successes, 1 + n - successes)
ci_bayes = post.ppf([0.025, 0.975])

print(f"Frequentist CI: [{ci_freq[0]:.1%}, {ci_freq[1]:.1%}]")
print(f"Bayesian CI:    [{ci_bayes[0]:.1%}, {ci_bayes[1]:.1%}]")
print(f"\nFrequentist CI lower bound is near zero or negative with tiny samples")
print(f"Bayesian CI stays within [0, 1] naturally due to the Beta distribution")

Case 2: Informative Priors

When you have strong prior information, credible intervals are narrower and more accurate. Confidence intervals ignore prior knowledge entirely.
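
A minimal sketch of the effect (the Beta(30, 170) prior below is an illustrative assumption, roughly equivalent to 200 earlier observations at a 15% conversion rate):

# New, small experiment: 30 conversions out of 200 visitors
successes, n = 30, 200

# Flat Beta(1, 1) prior
flat_post = stats.beta(1 + successes, 1 + n - successes)
flat_lo, flat_hi = flat_post.ppf([0.025, 0.975])

# Informative Beta(30, 170) prior, centered near 15%
info_post = stats.beta(30 + successes, 170 + n - successes)
info_lo, info_hi = info_post.ppf([0.025, 0.975])

print(f"Flat prior:        [{flat_lo:.1%}, {flat_hi:.1%}]")
print(f"Informative prior: [{info_lo:.1%}, {info_hi:.1%}]")
# The informative prior contributes the weight of ~200 extra observations,
# so the interval narrows -- a benefit only if the prior is honest.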

Case 3: Communication to Stakeholders

"There is a 95% probability the lift is between 1% and 5%" is immediately useful for decisions.

"If we repeated this experiment, 95% of intervals would contain the true lift" is technically correct but practically unhelpful for this specific decision.


Common Pitfalls

Pitfall 1: Interpreting Confidence Intervals as Credible Intervals

This is the most common statistical misinterpretation in applied work. If you catch yourself saying "there is a 95% chance the parameter is in this interval" about a confidence interval, you are making a Bayesian claim without doing Bayesian inference.

Pitfall 2: Assuming Credible Intervals Are Always Narrower

With uninformative priors and a reasonable amount of data, credible intervals are approximately the same width as confidence intervals. Priors that agree with the data make them narrower; a prior that conflicts with the data can shift the interval or, in some models, widen it.

Pitfall 3: Ignoring the Prior's Influence

With small samples, the credible interval is heavily influenced by the prior. Always do a sensitivity analysis: does your conclusion change with a different reasonable prior?
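
A minimal sensitivity check might look like this, using a small illustrative dataset (1 conversion out of 10) and a few candidate priors (the specific choices are assumptions for illustration, not recommendations):

# Small illustrative dataset: 1 conversion out of 10
successes, n = 1, 10

priors = {
    "Flat Beta(1, 1)":         (1.0, 1.0),
    "Jeffreys Beta(0.5, 0.5)": (0.5, 0.5),
    "Skeptical Beta(2, 18)":   (2.0, 18.0),   # prior belief near 10%
}

for name, (a, b) in priors.items():
    post = stats.beta(a + successes, b + n - successes)
    lo, hi = post.ppf([0.025, 0.975])
    print(f"{name:<26} 95% credible interval: [{lo:.1%}, {hi:.1%}]")

# If the decision flips across reasonable priors, the data alone
# are not yet decisive -- report that, rather than one interval.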


Quick Reference

Feature                                | Confidence Interval | Credible Interval
Probability statement about parameter  | No                  | Yes
Requires prior                         | No                  | Yes
Long-run frequency guarantee           | Yes                 | No (not its purpose)
Works with small samples               | Can be unreliable   | Stabilized by prior
Handles skewed parameters              | Same formula        | HDI adapts to shape
Computational cost                     | Low                 | Low to moderate


Key Takeaway

Credible intervals give you what most people want: a direct probability statement that the parameter lies in a given range. Confidence intervals describe the long-run performance of a procedure, not a probability about the parameter for your specific experiment. In practice, the numbers are often similar, but the interpretation is fundamentally different. Use credible intervals when you want to make direct probability statements about your parameter.


Frequently Asked Questions

Do credible intervals and confidence intervals give different numbers?
With large samples and flat priors, they are nearly identical. The numbers diverge with small samples, informative priors, or non-symmetric distributions. The key difference is interpretation, not necessarily the numbers themselves.
Which interval should I report?
Report the interval that matches your framework. If you ran a frequentist test, report a confidence interval. If you ran a Bayesian analysis, report a credible interval. Be explicit about which one it is and what it means. Never describe a confidence interval as if it were a credible interval.
What is a Highest Density Interval (HDI)?
An HDI is the shortest interval that contains the specified posterior probability (95% for a 95% HDI). For symmetric posteriors, it matches the equal-tailed interval. For skewed posteriors, the HDI is more informative because it includes the most probable values, and many practitioners prefer it as the default way to report Bayesian results.
