
Delta Method vs. Bootstrap: When Each Is Appropriate

A practical guide to choosing between delta method and bootstrap for variance estimation. Learn when each approach excels, their assumptions, and how to implement both.


Quick Hits

  • Delta method: fast, analytical, works well for large n and smooth functions
  • Bootstrap: flexible, no closed-form needed, handles complex statistics
  • Both should give similar results for well-behaved data—compare as sanity check
  • Heavy tails favor bootstrap; ratios with large n favor delta method
  • When in doubt, use both and report if they differ substantially

TL;DR

The delta method and bootstrap are two approaches to variance estimation for complex statistics. Delta method is analytical—fast and efficient for large samples with smooth functions of means. Bootstrap is computational—flexible and robust for complex statistics without closed-form variances. Both should give similar results for well-behaved data; when they disagree, investigate why. This guide helps you choose the right tool for your situation.


The Two Approaches

Delta Method

Core idea: Approximate a function of random variables using Taylor expansion.

For statistic $g(\bar{X})$ where $\bar{X}$ is a vector of sample means:

$$\text{Var}(g(\bar{X})) \approx \nabla g(\mu)^T \cdot \Sigma \cdot \nabla g(\mu) / n$$

Where:

  • $\nabla g(\mu)$ = gradient of g evaluated at true means
  • $\Sigma$ = covariance matrix of underlying variables
  • $n$ = sample size

Example: For ratio $R = \bar{Y}/\bar{X}$: $$\text{Var}(R) \approx \frac{1}{\mu_X^2} \left[ \sigma_Y^2 - 2R\sigma_{XY} + R^2\sigma_X^2 \right] / n$$
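
To see where the ratio formula comes from, apply the general expression with $g(\bar{X}, \bar{Y}) = \bar{Y}/\bar{X}$. The gradient at the true means is

$$\nabla g(\mu_X, \mu_Y) = \left( -\frac{\mu_Y}{\mu_X^2},\ \frac{1}{\mu_X} \right)$$

and plugging this into $\nabla g(\mu)^T \cdot \Sigma \cdot \nabla g(\mu) / n$ gives

$$\text{Var}(R) \approx \frac{1}{n}\left[ \frac{\mu_Y^2}{\mu_X^4}\sigma_X^2 - \frac{2\mu_Y}{\mu_X^3}\sigma_{XY} + \frac{1}{\mu_X^2}\sigma_Y^2 \right] = \frac{1}{\mu_X^2} \left[ \sigma_Y^2 - 2R\sigma_{XY} + R^2\sigma_X^2 \right] / n$$

which is exactly the ratio formula above.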

Bootstrap

Core idea: Treat sample as population, resample to estimate sampling variability.

  1. Draw B resamples with replacement
  2. Compute statistic on each resample
  3. Use distribution of bootstrap statistics for inference

Example: For any statistic $T$: $$\widehat{\text{SE}}(T) = \text{SD of } \{T_1^*, T_2^*, \ldots, T_B^*\}$$
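
These three steps do not depend on the statistic at all. Below is a minimal generic sketch; the `bootstrap_se` helper and its arguments are illustrative, not a library API, and the ratio-specific version used throughout this post appears in the implementation section below.

import numpy as np


def bootstrap_se(data, statistic, n_bootstrap=5000, seed=None):
    """
    Generic bootstrap SE: resample with replacement, recompute the
    statistic on each resample, take the SD of the replicates.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n = len(data)

    replicates = [
        statistic(data[rng.integers(0, n, size=n)])  # steps 1-2: resample, recompute
        for _ in range(n_bootstrap)
    ]
    return np.std(replicates, ddof=1)  # step 3: spread of the bootstrap statistics


# Example: SE of the median, which has no simple plug-in variance formula
# bootstrap_se(revenue, np.median, seed=42)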


Quick Comparison

| Aspect         | Delta Method             | Bootstrap                    |
|----------------|--------------------------|------------------------------|
| Computation    | Fast (analytical)        | Slow (many resamples)        |
| Assumptions    | Large n, smooth function | Sample represents population |
| Flexibility    | Needs variance formula   | Works for any statistic      |
| Heavy tails    | Can fail                 | More robust                  |
| Small samples  | Can fail                 | Better (with caveats)        |
| Implementation | Needs calculus           | Straightforward              |

Implementation Side-by-Side

import numpy as np
from scipy import stats


def delta_method_ratio(numerator, denominator):
    """
    Delta method SE for ratio of means.
    """
    n = len(numerator)

    mean_y = np.mean(numerator)
    mean_x = np.mean(denominator)
    ratio = mean_y / mean_x

    var_y = np.var(numerator, ddof=1)
    var_x = np.var(denominator, ddof=1)
    cov_xy = np.cov(numerator, denominator, ddof=1)[0, 1]

    var_ratio = (1 / mean_x**2) * (
        var_y - 2 * ratio * cov_xy + ratio**2 * var_x
    ) / n

    return {'estimate': ratio, 'se': np.sqrt(var_ratio)}


def bootstrap_ratio(numerator, denominator, n_bootstrap=5000):
    """
    Bootstrap SE for ratio of means.
    """
    n = len(numerator)
    ratio = np.sum(numerator) / np.sum(denominator)

    boot_ratios = []
    for _ in range(n_bootstrap):
        idx = np.random.choice(n, n, replace=True)
        boot_ratio = np.sum(numerator[idx]) / np.sum(denominator[idx])
        boot_ratios.append(boot_ratio)

    return {
        'estimate': ratio,
        'se': np.std(boot_ratios),
        'ci': np.percentile(boot_ratios, [2.5, 97.5])
    }


def compare_methods(numerator, denominator, n_bootstrap=5000):
    """
    Compare delta method and bootstrap for the same statistic.
    """
    delta_result = delta_method_ratio(numerator, denominator)
    boot_result = bootstrap_ratio(numerator, denominator, n_bootstrap)

    return {
        'delta': delta_result,
        'bootstrap': boot_result,
        'se_ratio': delta_result['se'] / boot_result['se'],
        'agreement': abs(delta_result['se'] - boot_result['se']) / boot_result['se'] < 0.1
    }


# Example: Well-behaved data
np.random.seed(42)
n = 1000
clicks = np.random.poisson(5, n)
impressions = np.random.poisson(50, n)

result = compare_methods(clicks, impressions)

print("Method Comparison: CTR with Well-Behaved Data")
print("=" * 50)
print(f"Sample size: {n}")
print(f"\nDelta Method:")
print(f"  CTR estimate: {result['delta']['estimate']:.4f}")
print(f"  SE: {result['delta']['se']:.6f}")
print(f"\nBootstrap:")
print(f"  CTR estimate: {result['bootstrap']['estimate']:.4f}")
print(f"  SE: {result['bootstrap']['se']:.6f}")
print(f"  95% CI: ({result['bootstrap']['ci'][0]:.4f}, {result['bootstrap']['ci'][1]:.4f})")
print(f"\nDelta/Bootstrap SE ratio: {result['se_ratio']:.3f}")
print(f"Methods agree (<10% difference): {result['agreement']}")

When Delta Method Excels

Large Samples

Delta method is asymptotically correct—as n → ∞, it converges to the true variance.

def show_delta_convergence():
    """
    Show delta method improves with sample size.
    """
    np.random.seed(42)

    true_ctr = 0.10
    n_simulations = 500

    sample_sizes = [100, 500, 1000, 5000]
    results = []

    for n in sample_sizes:
        delta_ses = []
        boot_ses = []

        for _ in range(n_simulations):
            impressions = np.random.poisson(50, n)
            clicks = np.random.binomial(impressions, true_ctr)

            delta_ses.append(delta_method_ratio(clicks, impressions)['se'])
            boot_ses.append(bootstrap_ratio(clicks, impressions, n_bootstrap=1000)['se'])

        results.append({
            'n': n,
            'delta_mean_se': np.mean(delta_ses),
            'boot_mean_se': np.mean(boot_ses),
            'delta_se_of_se': np.std(delta_ses),
            'boot_se_of_se': np.std(boot_ses)
        })

    print("Delta Method Performance by Sample Size")
    print("=" * 60)
    print(f"{'n':>8} {'Delta SE':>12} {'Boot SE':>12} {'Ratio':>10}")
    print("-" * 60)
    for r in results:
        ratio = r['delta_mean_se'] / r['boot_mean_se']
        print(f"{r['n']:>8} {r['delta_mean_se']:>12.6f} {r['boot_mean_se']:>12.6f} {ratio:>10.3f}")


show_delta_convergence()

Fast Computation Needs

When you need to compute thousands of SEs (e.g., for many metrics):

import time


def benchmark_speed(n=1000, n_metrics=100):
    """
    Compare computation time.
    """
    np.random.seed(42)

    # Generate data
    numerators = [np.random.poisson(5, n) for _ in range(n_metrics)]
    denominators = [np.random.poisson(50, n) for _ in range(n_metrics)]

    # Delta method
    start = time.time()
    for num, denom in zip(numerators, denominators):
        delta_method_ratio(num, denom)
    delta_time = time.time() - start

    # Bootstrap (just 1000 resamples to keep it reasonable)
    start = time.time()
    for num, denom in zip(numerators, denominators):
        bootstrap_ratio(num, denom, n_bootstrap=1000)
    boot_time = time.time() - start

    print(f"Speed Comparison ({n_metrics} metrics)")
    print("=" * 40)
    print(f"Delta method: {delta_time:.2f}s")
    print(f"Bootstrap (1000 resamples): {boot_time:.2f}s")
    print(f"Speedup: {boot_time/delta_time:.0f}x")


benchmark_speed()

When Bootstrap Excels

Heavy-Tailed Data

Bootstrap handles non-normal distributions better:

def compare_on_heavy_tails():
    """
    Show bootstrap advantage with heavy-tailed data.
    """
    np.random.seed(42)
    n = 200

    # Heavy-tailed: log-normal with whales
    def generate_heavy_tailed(n):
        revenue = np.random.lognormal(3, 1.5, n)
        # Add some whales
        whale_idx = np.random.choice(n, max(1, n // 100), replace=False)
        revenue[whale_idx] *= np.random.uniform(10, 50, len(whale_idx))
        return revenue, np.ones(n)  # Denominator is 1 (per-user metric)

    # With heavy tails the clearest contrast is CI shape: the normal-approximation
    # (delta) interval is forced to be symmetric around the sample mean, while the
    # bootstrap percentile interval can follow the skew of the data.
    print("Heavy-Tailed Data (n=200)")
    print("=" * 50)
    print("Comparing CI width and symmetry on a single sample:")

    revenue, ones = generate_heavy_tailed(200)
    mean_est = np.mean(revenue)
    se_delta = np.std(revenue, ddof=1) / np.sqrt(200)

    boot_means = [np.mean(np.random.choice(revenue, 200, replace=True))
                  for _ in range(5000)]

    ci_delta = (mean_est - 1.96 * se_delta, mean_est + 1.96 * se_delta)
    ci_boot = np.percentile(boot_means, [2.5, 97.5])

    print(f"Sample mean: ${mean_est:.2f}")
    print(f"\nDelta method 95% CI:")
    print(f"  ({ci_delta[0]:.2f}, {ci_delta[1]:.2f})")
    print(f"  Width: {ci_delta[1] - ci_delta[0]:.2f}")
    print(f"  Symmetric around mean")

    print(f"\nBootstrap 95% CI:")
    print(f"  ({ci_boot[0]:.2f}, {ci_boot[1]:.2f})")
    print(f"  Width: {ci_boot[1] - ci_boot[0]:.2f}")
    print(f"  Lower asymmetry: {mean_est - ci_boot[0]:.2f}")
    print(f"  Upper asymmetry: {ci_boot[1] - mean_est:.2f}")
    print("\nBootstrap CI is asymmetric, reflecting the skewed distribution")


compare_on_heavy_tails()

Complex Statistics

When no analytical formula exists:

def bootstrap_complex_statistic():
    """
    Bootstrap for statistics without closed-form variance.
    """
    np.random.seed(42)
    n = 500

    # Complex statistic: ratio of medians
    group_a = np.random.lognormal(3, 1, n)
    group_b = np.random.lognormal(3.1, 1, n)  # 10% higher

    # No delta method formula for median ratio
    point_est = np.median(group_b) / np.median(group_a)

    # Bootstrap
    boot_ratios = []
    for _ in range(5000):
        boot_a = np.random.choice(group_a, n, replace=True)
        boot_b = np.random.choice(group_b, n, replace=True)
        boot_ratios.append(np.median(boot_b) / np.median(boot_a))

    ci = np.percentile(boot_ratios, [2.5, 97.5])
    se = np.std(boot_ratios)

    print("Complex Statistic: Ratio of Medians")
    print("=" * 50)
    print(f"Point estimate: {point_est:.3f}")
    print(f"Bootstrap SE: {se:.3f}")
    print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
    print("\nNo simple closed-form variance exists for this statistic; the bootstrap is the practical choice")


bootstrap_complex_statistic()

Small Samples

Bootstrap can capture non-normality that delta method misses:

def small_sample_comparison():
    """
    Compare methods with small n.
    """
    np.random.seed(42)

    n_simulations = 1000
    true_ratio = 0.10

    for n in [30, 50, 100]:
        delta_ci_covers = 0
        boot_ci_covers = 0

        for _ in range(n_simulations):
            impressions = np.random.poisson(20, n)
            clicks = np.random.binomial(impressions, true_ratio)

            # Delta method
            delta = delta_method_ratio(clicks, impressions)
            ci_delta = (delta['estimate'] - 1.96 * delta['se'],
                        delta['estimate'] + 1.96 * delta['se'])

            # Bootstrap
            boot = bootstrap_ratio(clicks, impressions, n_bootstrap=2000)
            ci_boot = boot['ci']

            # Coverage
            if ci_delta[0] <= true_ratio <= ci_delta[1]:
                delta_ci_covers += 1
            if ci_boot[0] <= true_ratio <= ci_boot[1]:
                boot_ci_covers += 1

        print(f"n={n}: Delta coverage={delta_ci_covers/n_simulations:.1%}, "
              f"Bootstrap coverage={boot_ci_covers/n_simulations:.1%}")


print("Small Sample Coverage (target: 95%)")
print("=" * 50)
small_sample_comparison()

Decision Framework

START: Need variance/CI for a statistic
       ↓
QUESTION: Is there an analytical formula?
├── No → Bootstrap
└── Yes → Continue
       ↓
QUESTION: Sample size?
├── Small (n < 100) → Bootstrap (more robust)
├── Large (n > 500) → Delta method (faster, accurate)
└── Medium → Either, or both
       ↓
QUESTION: Distribution shape?
├── Heavy-tailed or very skewed → Bootstrap
├── Roughly symmetric → Delta method
└── Unknown → Bootstrap or both
       ↓
QUESTION: Computation constraints?
├── Need speed (many metrics) → Delta method
├── One-off analysis → Bootstrap (safer)
└── No constraints → Both as cross-check
       ↓
VALIDATE: If both computed, do they agree?
├── Yes (within 10%) → Use either, report one
├── No → Investigate; bootstrap often more reliable
└── Very different → Report both, explain discrepancy
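
For teams that want this logic in code, here is a rough, hypothetical encoding of the flowchart. The thresholds are the same heuristic cut-offs used above, not hard rules, and the function name is purely illustrative.

def choose_variance_method(n, has_closed_form, heavy_tailed=False, need_speed=False):
    """
    Heuristic picker following the decision framework above.
    Returns 'bootstrap', 'delta', or 'both'.
    """
    if not has_closed_form:
        return 'bootstrap'   # no analytical formula available
    if n < 100 or heavy_tailed:
        return 'bootstrap'   # small samples or heavy tails
    if need_speed and n > 500:
        return 'delta'       # large n, many metrics, speed matters
    return 'both'            # otherwise compute both as a cross-check


# Example: a large-sample CTR metric in a production pipeline
# choose_variance_method(n=10_000, has_closed_form=True, need_speed=True)  # -> 'delta'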

Practical Recommendations

When to Use Delta Method

  1. Large samples (n > 500) where asymptotic approximations hold
  2. Ratio metrics (CTR, conversion) with moderate variation
  3. Production systems requiring fast computation
  4. Standard statistics with known variance formulas

When to Use Bootstrap

  1. Heavy-tailed data (revenue, engagement)
  2. Complex statistics (quantiles, trimmed means, custom metrics)
  3. Small-to-moderate samples (n < 200)
  4. When in doubt (it's the safer default)

Best Practice: Use Both

def robust_inference(numerator, denominator, n_bootstrap=5000):
    """
    Compute both delta and bootstrap, flag if they disagree.
    """
    delta = delta_method_ratio(numerator, denominator)
    boot = bootstrap_ratio(numerator, denominator, n_bootstrap)

    se_ratio = delta['se'] / boot['se']
    agreement = 0.8 < se_ratio < 1.2

    result = {
        'estimate': delta['estimate'],
        'delta_se': delta['se'],
        'delta_ci': (delta['estimate'] - 1.96 * delta['se'],
                     delta['estimate'] + 1.96 * delta['se']),
        'bootstrap_se': boot['se'],
        'bootstrap_ci': boot['ci'],
        'methods_agree': agreement
    }

    if not agreement:
        result['warning'] = f"Methods disagree: delta/boot SE ratio = {se_ratio:.2f}"

    return result


# Example
np.random.seed(42)
clicks = np.random.poisson(5, 500)
impressions = np.random.poisson(50, 500)

result = robust_inference(clicks, impressions)
print("Robust Inference Report")
print("=" * 50)
print(f"Estimate: {result['estimate']:.4f}")
print(f"Delta method SE: {result['delta_se']:.6f}")
print(f"Bootstrap SE: {result['bootstrap_se']:.6f}")
print(f"Methods agree: {result['methods_agree']}")
if 'warning' in result:
    print(f"⚠️ {result['warning']}")

R Implementation

# Delta method (using msm package)
library(msm)

# For ratio
delta_ratio <- function(y, x) {
    means <- c(mean(y), mean(x))
    vcov_mat <- cov(cbind(y, x)) / length(y)

    # g(mu_y, mu_x) = mu_y / mu_x; gradient: (1/mu_x, -mu_y/mu_x^2)
    # deltamethod() differentiates the formula symbolically, so the gradient
    # does not need to be coded by hand; it returns the SE of the ratio
    deltamethod(~ x1 / x2, means, vcov_mat)
}

# Bootstrap (using boot package)
library(boot)

boot_ratio <- function(data, indices) {
    d <- data[indices, ]
    sum(d$y) / sum(d$x)
}

data <- data.frame(y = clicks, x = impressions)
boot_result <- boot(data, boot_ratio, R = 5000)
boot.ci(boot_result, type = "bca")


Key Takeaway

Delta method and bootstrap are complementary approaches to variance estimation. Delta method is fast and efficient for large samples with smooth statistics—perfect for production systems computing many ratio metrics. Bootstrap is more flexible and robust—handles complex statistics, heavy tails, and small samples where asymptotic approximations fail. For important analyses, compute both: agreement gives confidence; disagreement signals investigation needed. When they differ substantially, bootstrap is usually more reliable for the difficult cases that matter most.



Frequently Asked Questions

When should I use delta method vs. bootstrap?
Use delta method when you have large samples, the statistic is a smooth function of means, and you want fast computation. Use bootstrap when samples are small-to-moderate, distributions are heavy-tailed or complex, or no analytical variance formula exists for your statistic.

Can delta method and bootstrap give different answers?
Yes, especially for small samples, heavy tails, or non-smooth statistics. If they differ substantially, investigate why: often bootstrap is more reliable for difficult distributions, but check bootstrap stability (enough resamples; see the sketch below) and sample representativeness.

Is one method more correct than the other?
Neither is universally better. Delta method relies on asymptotic approximations (large sample theory); bootstrap relies on the sample representing the population. Both fail in different ways. Delta method can fail with small n or heavy tails; bootstrap can fail if the sample misses important population features.
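
One way to check the bootstrap stability mentioned above is to recompute the bootstrap SE at increasing numbers of resamples and confirm the estimate has settled. A quick sketch using the bootstrap_ratio function defined earlier:

# Bootstrap stability check: the SE estimate should change little as B grows
np.random.seed(42)
clicks = np.random.poisson(5, 500)
impressions = np.random.poisson(50, 500)

for b in [500, 1000, 2000, 5000, 10000]:
    se = bootstrap_ratio(clicks, impressions, n_bootstrap=b)['se']
    print(f"B={b:>6}: bootstrap SE = {se:.6f}")
# If the SE is still drifting between the last two rows, increase B further.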

