Time-to-Event Sample Size: Practical Approximations

A practical guide to sample size calculations for survival studies. Learn how to power time-to-event analyses, what drives the sample size, and practical approximations for retention experiments.

Jan 26 · 9 min read

Quick Hits

•Power depends on NUMBER OF EVENTS, not just sample size
•Low event rates require larger samples or longer follow-up
•Hazard ratio detection: total events needed ≈ $4(z_{\alpha/2} + z_\beta)^2 / (\log HR)^2$ for 1:1 allocation
•Longer follow-up = more events = more power (often cheaper than more users)
•Plan for 10-20% extra sample for dropouts and censoring

TL;DR

Survival analysis power depends on the number of observed events, not just sample size. If few users churn, you can't estimate churn risk precisely. This guide provides practical formulas for calculating required events, converting to sample size given expected event rates, and planning retention experiments. Key insight: longer follow-up often substitutes for more users.

The Key Insight: Events Drive Power

Why Events, Not Sample Size?

Statistical information comes from observed events:

1000 users, 500 events → lots of information
1000 users, 10 events → very little information

The log-rank test and Cox regression precision depend on: $\text{Variance} \propto \frac{1}{\text{Number of Events}}$

More events → smaller variance → better power.
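
To make the proportionality concrete: under 1:1 allocation, a common large-sample approximation (the one behind the Schoenfeld formula) is $\text{Var}(\log \widehat{HR}) \approx 4/d$. The sketch below uses a hypothetical helper, `approx_hr_ci`, to show how the interval around an estimated hazard ratio narrows as events accumulate. It is an approximation under those assumptions, not an exact result.

```python
import math
from statistics import NormalDist


def approx_hr_ci(hr_estimate, events, conf_level=0.95):
    """Approximate confidence interval for a hazard ratio.

    Assumes 1:1 allocation and Var(log HR) ~= 4 / events,
    which is a large-sample approximation.
    """
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    se_log_hr = 2 / math.sqrt(events)
    log_hr = math.log(hr_estimate)
    return math.exp(log_hr - z * se_log_hr), math.exp(log_hr + z * se_log_hr)


for events in (10, 100, 500):
    low, high = approx_hr_ci(0.75, events)
    print(f"{events:>4} events: HR 0.75, approx 95% CI ({low:.2f}, {high:.2f})")
```

With 10 events the interval around HR = 0.75 runs from roughly 0.22 to 2.59, which is compatible with both a large benefit and a large harm. With 500 events it tightens to roughly 0.63 to 0.89.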

Implications

Low event rates require larger samples: If only 5% of users churn, you need 20 users for every event you want to observe
Longer follow-up helps: More time → more events accumulate
Plan based on events: Calculate events needed, then derive sample size

Basic Sample Size Formula

For Log-Rank Test (Schoenfeld)

Required number of events:

$d = \frac{4(z_{\alpha/2} + z_\beta)^2}{(\log HR)^2}$

Where:

d = total events needed (both groups combined, assuming 1:1 allocation)
$z_{\alpha/2}$ = critical value for significance (1.96 for $\alpha=0.05$ , two-sided)
$z_\beta$ = critical value for power (0.84 for 80%, 1.28 for 90%)
HR = hazard ratio to detect
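
The factor of 4 reflects 1:1 allocation. As a sketch of where it comes from, the general form of the formula with allocation proportions $p_1$ and $p_2 = 1 - p_1$ (symbols introduced here for illustration) reduces to the expression above when both groups are the same size:

```latex
d = \frac{(z_{\alpha/2} + z_\beta)^2}{p_1 \, p_2 \, (\log HR)^2},
\qquad
p_1 = p_2 = \tfrac{1}{2}
\;\Rightarrow\;
d = \frac{4\,(z_{\alpha/2} + z_\beta)^2}{(\log HR)^2}
```

For example, with HR = 0.75, two-sided $\alpha = 0.05$ and 80% power, $(\log 0.75)^2 \approx 0.0828$ and $4(1.96 + 0.84)^2 \approx 31.4$, so $d \approx 379.4$, which rounds up to 380 events.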

Converting Events to Sample Size

$n = \frac{d}{p_e}$

Where $p_e$ = expected probability that a user has the event during the follow-up period, averaged across both groups.

Quick Reference Table

HR to Detect	Total Events (80% power)	Total Events (90% power)
0.50	66	88
0.60	121	162
0.70	247	331
0.75	380	508
0.80	631	845
0.85	1189	1592
0.90	2829	3787

Code: Sample Size Calculations

Python

import numpy as np
from scipy import stats


def events_for_logrank(hr, alpha=0.05, power=0.8, two_sided=True):
    """
    Calculate events needed to detect a hazard ratio.

    Parameters:
    -----------
    hr : float
        Hazard ratio to detect
    alpha : float
        Significance level
    power : float
        Desired power
    two_sided : bool
        Two-sided test

    Returns:
    --------
    int : Number of events needed
    """
    if hr <= 0 or hr == 1:
        raise ValueError("hr must be positive and not equal to 1")
    if two_sided:
        z_alpha = stats.norm.ppf(1 - alpha/2)
    else:
        z_alpha = stats.norm.ppf(1 - alpha)

    z_beta = stats.norm.ppf(power)

    # Schoenfeld formula
    d = 4 * (z_alpha + z_beta)**2 / (np.log(hr))**2

    return int(np.ceil(d))


def sample_size_survival(hr, event_prob, alpha=0.05, power=0.8,
                          allocation_ratio=1, dropout_rate=0.1):
    """
    Calculate sample size for survival study.

    Parameters:
    -----------
    hr : float
        Hazard ratio to detect
    event_prob : float
        Expected probability of event during follow-up
    alpha : float
        Significance level
    power : float
        Desired power
    allocation_ratio : float
        n_treatment / n_control
    dropout_rate : float
        Expected dropout/loss rate

    Returns:
    --------
    dict with sample size calculations
    """
    # Events needed
    d = events_for_logrank(hr, alpha, power)

    # Adjust for allocation ratio
    # For unequal allocation, need to account for efficiency
    if allocation_ratio != 1:
        k = allocation_ratio
        efficiency = (1 + k)**2 / (4 * k)
        d_adjusted = d * efficiency
    else:
        d_adjusted = d

    # Convert to sample size
    n_total = d_adjusted / event_prob

    # Adjust for dropouts
    n_total_adj = n_total / (1 - dropout_rate)

    # Split by group
    r = allocation_ratio
    n_control = n_total_adj / (1 + r)
    n_treatment = n_total_adj * r / (1 + r)

    return {
        'events_needed': int(np.ceil(d)),
        'events_adjusted': int(np.ceil(d_adjusted)),
        'total_sample': int(np.ceil(n_total_adj)),
        'control_group': int(np.ceil(n_control)),
        'treatment_group': int(np.ceil(n_treatment)),
        'assumptions': {
            'hazard_ratio': hr,
            'event_prob': event_prob,
            'alpha': alpha,
            'power': power,
            'dropout_rate': dropout_rate
        }
    }


def event_prob_from_retention(retention_at_t, t=30):
    """
    Convert retention rate to event probability.

    If D30 retention is 70%, event probability = 30%.
    """
    return 1 - retention_at_t


def study_duration_for_events(baseline_hazard, target_events, sample_size,
                               accrual_period=0):
    """
    Estimate study duration to achieve target events.

    Parameters:
    -----------
    baseline_hazard : float
        Hazard rate (events per unit time)
    target_events : int
        Events needed
    sample_size : int
        Total sample size
    accrual_period : float
        Time over which subjects are enrolled

    Returns:
    --------
    float : Estimated study duration
    """
    # Simplified: assume uniform accrual and constant hazard
    # More complex formulas account for varying accrual

    # Approximate: d ≈ n × (1 - S(T)) for average follow-up T
    # 1 - S(T) = 1 - exp(-h × T) ≈ h × T for small hT

    # Solve for T given d = n × (1 - exp(-h × T))
    from scipy.optimize import brentq

    def events_at_time(T):
        avg_followup = T - accrual_period/2 if T > accrual_period else T/2
        event_prob = 1 - np.exp(-baseline_hazard * avg_followup)
        return sample_size * event_prob - target_events

    # Find T; brentq raises ValueError when the target is unreachable in [0.1, 1000]
    try:
        return brentq(events_at_time, 0.1, 1000)
    except ValueError:
        return None


# Example usage
if __name__ == "__main__":
    print("Sample Size Calculation Examples")
    print("=" * 60)

    # Example 1: Basic calculation
    print("\n1. Basic Log-Rank Power")
    print("-" * 40)
    for hr in [0.5, 0.6, 0.7, 0.8, 0.9]:
        events = events_for_logrank(hr, power=0.8)
        print(f"  HR = {hr}: {events} events needed (80% power)")

    # Example 2: Full sample size
    print("\n2. Full Sample Size Calculation")
    print("-" * 40)

    # Scenario: Retention experiment
    # Expect 70% D30 retention (30% event rate)
    # Want to detect HR = 0.75 (treatment reduces the churn hazard by 25%)

    result = sample_size_survival(
        hr=0.75,
        event_prob=0.30,
        power=0.8,
        dropout_rate=0.1
    )

    print(f"  Hazard ratio to detect: {result['assumptions']['hazard_ratio']}")
    print(f"  Expected event rate: {result['assumptions']['event_prob']:.0%}")
    print(f"  Events needed: {result['events_needed']}")
    print(f"  Total sample size: {result['total_sample']}")
    print(f"  Per group: {result['control_group']} control, {result['treatment_group']} treatment")

    # Example 3: Varying follow-up
    print("\n3. Effect of Follow-Up Duration")
    print("-" * 40)

    hr = 0.75
    baseline_hazard = 0.01  # 1% daily hazard ≈ 26% event probability over 30 days

    for followup_days in [30, 60, 90, 180]:
        event_prob = 1 - np.exp(-baseline_hazard * followup_days)
        result = sample_size_survival(hr=hr, event_prob=event_prob)
        print(f"  {followup_days} days follow-up: event prob = {event_prob:.1%}, "
              f"n = {result['total_sample']}")

R

# Base R only: no packages are needed for these calculations


events_for_logrank <- function(hr, alpha = 0.05, power = 0.8) {
    #' Calculate events needed for log-rank test

    z_alpha <- qnorm(1 - alpha/2)
    z_beta <- qnorm(power)

    d <- 4 * (z_alpha + z_beta)^2 / log(hr)^2
    ceiling(d)
}


sample_size_survival <- function(hr, event_prob, alpha = 0.05, power = 0.8,
                                  dropout_rate = 0.1) {
    #' Calculate sample size for survival study

    events <- events_for_logrank(hr, alpha, power)
    n_total <- events / event_prob
    n_adjusted <- n_total / (1 - dropout_rate)

    list(
        events_needed = events,
        total_sample = ceiling(n_adjusted),
        per_group = ceiling(n_adjusted / 2)
    )
}


# Example
cat("Sample Size Calculations\n")
cat(strrep("=", 50), "\n")

# Events by HR
cat("\nEvents needed by HR (80% power):\n")
for (hr in c(0.5, 0.6, 0.7, 0.8)) {
    events <- events_for_logrank(hr)
    cat(sprintf("  HR = %.1f: %d events\n", hr, events))
}

# Full calculation
cat("\nFull Calculation:\n")
result <- sample_size_survival(hr = 0.75, event_prob = 0.30)
cat(sprintf("  HR = 0.75, 30%% event rate\n"))
cat(sprintf("  Events: %d\n", result$events_needed))
cat(sprintf("  Total sample: %d\n", result$total_sample))

Practical Planning Guide

Step 1: Define the Effect Size

What hazard ratio do you want to detect?

Business Impact	Typical HR
Large effect	0.50 - 0.60
Medium effect	0.70 - 0.80
Small effect	0.85 - 0.95

Be realistic: detecting HR = 0.95 at 80% power requires roughly 12,000 events.

Step 2: Estimate Event Rate

From historical data:

What's your baseline retention at your planned follow-up time?
Event rate = 1 - retention

Example: 70% D30 retention → 30% event rate.

Step 3: Calculate Events Needed

Use the Schoenfeld formula or table above.

$\text{HR} = 0.75$ , 80% power → 380 events needed in total (both groups combined)

Step 4: Convert to Sample Size

$n = \frac{\text{events needed}}{\text{event rate}} \times \frac{1}{1 - \text{dropout rate}}$

$380 / 0.30 \times \frac{1}{0.9} \approx 1408$ users in total, or about 704 per group (10% dropout)

Step 5: Consider Trade-offs

Can you:

Extend follow-up? (More events per user)
Enrich population? (Higher event rate)
Accept less power? (Fewer events needed)
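
To weigh the "accept less power" option, it helps to see how power changes with the number of events. The sketch below inverts the Schoenfeld formula under 1:1 allocation and a two-sided test. The helper name `approx_power` is hypothetical, and the result is an approximation rather than an exact power calculation.

```python
import math
from statistics import NormalDist


def approx_power(events, hr, alpha=0.05):
    """Approximate two-sided log-rank power for a total event count.

    Inverts the Schoenfeld formula under 1:1 allocation:
    z_beta = sqrt(events) * |log(hr)| / 2 - z_{alpha/2}.
    """
    norm = NormalDist()
    z_alpha = norm.inv_cdf(1 - alpha / 2)
    z_beta = math.sqrt(events) * abs(math.log(hr)) / 2 - z_alpha
    return norm.cdf(z_beta)


for events in (150, 250, 380, 500):
    print(f"{events} events, HR 0.75: power ~ {approx_power(events, 0.75):.0%}")
```

Running this across a few event counts shows how much power you give up for each event you cannot afford to wait for.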

Trade-off: More Users vs. Longer Follow-Up

The Problem

You need 200 events. Options:

2000 users with 10% event rate
1000 users with 20% event rate (longer follow-up)
500 users with 40% event rate (much longer follow-up)
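
How much longer is "longer follow-up"? A rough answer is possible if you are willing to assume a constant daily hazard, everyone enrolled on day 0, and no dropout. The sketch below makes exactly those assumptions. The function name and the 1% daily hazard are illustrative, not taken from real data.

```python
import math


def followup_days_needed(target_events, n_users, daily_hazard):
    """Days of follow-up for n_users to yield target_events on average.

    Assumes constant hazard (exponential survival), simultaneous
    enrollment, and no dropout. Returns None if the target would
    require every user to have the event.
    """
    required_prob = target_events / n_users
    if required_prob >= 1:
        return None
    return -math.log(1 - required_prob) / daily_hazard


DAILY_HAZARD = 0.01  # assumed for illustration only

for n_users in (2000, 1000, 500):
    days = followup_days_needed(200, n_users, DAILY_HAZARD)
    print(f"{n_users} users: ~{days:.0f} days of follow-up for 200 events")
```

Under these assumptions the three options need about 11, 22, and 51 days. Each halving of the user count more than doubles the required follow-up, because events accumulate more slowly as the at-risk pool shrinks.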

Considerations

Option	Pros	Cons
More users	Faster answer	Higher acquisition cost
Longer follow-up	Lower cost	Delayed decision
Higher-risk population	More events	May not generalize

Rule of Thumb

If user acquisition is expensive and time is flexible, extend follow-up.

If you need quick answers and can get users easily, enroll more.

Sensitivity Analysis

Vary Your Assumptions

Don't trust a single calculation. Calculate sample size across:

Range of HRs: What if true effect is smaller?
Range of event rates: What if baseline retention is better/worse?
Range of dropout rates: What if more/fewer users drop out?

# Sensitivity analysis (reuses sample_size_survival defined in the Python code above)
import pandas as pd

results = []
for hr in [0.70, 0.75, 0.80]:
    for event_prob in [0.20, 0.30, 0.40]:
        n = sample_size_survival(hr, event_prob)['total_sample']
        results.append({
            'HR': hr,
            'Event Rate': f"{event_prob:.0%}",
            'Sample Size': n
        })

print(pd.DataFrame(results).pivot(index='Event Rate', columns='HR', values='Sample Size'))

Plan for the Worst Case

If your sample size calculation says 400 per group but sensitivity shows it could be 600, plan for 600.

Common Mistakes

Mistake 1: Using Generic Power Calculators

Generic power calculators (for t-tests, etc.) don't account for censoring and event-driven power.

Fix: Use survival-specific formulas.
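
One way to sanity-check a survival-specific calculation is simulation. The sketch below is not part of the original guide. It draws exponential churn times, censors everyone at a fixed follow-up, computes the log-rank statistic by hand, and estimates power for a design sized to yield about 380 expected events at HR = 0.75. The function names, the constant-hazard model, and the 711-per-group design are all illustrative assumptions.

```python
import math
import random


def logrank_z(times, events, groups):
    """Two-group log-rank Z statistic (assumes no tied event times)."""
    order = sorted(range(len(times)), key=times.__getitem__)
    at_risk = len(times)
    at_risk_treated = sum(groups)
    observed_minus_expected = 0.0
    variance = 0.0
    for i in order:
        if events[i]:
            share_treated = at_risk_treated / at_risk
            observed_minus_expected += groups[i] - share_treated
            variance += share_treated * (1 - share_treated)
        # The subject leaves the risk set after its event or censoring time.
        at_risk -= 1
        at_risk_treated -= groups[i]
    return observed_minus_expected / math.sqrt(variance)


def simulate_power(n_per_group, hr, control_hazard, followup_days,
                   n_sims=300, seed=0):
    """Share of simulated experiments where |Z| > 1.96 (two-sided 5% test)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        times, events, groups = [], [], []
        for group, hazard in ((0, control_hazard), (1, control_hazard * hr)):
            for _user in range(n_per_group):
                churn_day = rng.expovariate(hazard)
                events.append(churn_day < followup_days)
                times.append(min(churn_day, followup_days))
                groups.append(group)
        if abs(logrank_z(times, events, groups)) > 1.96:
            rejections += 1
    return rejections / n_sims


# Assumed scenario: 70% D30 retention in control, true HR = 0.75.
# 711 per group gives roughly 380 expected events over 30 days.
control_hazard = -math.log(0.70) / 30
power = simulate_power(711, 0.75, control_hazard, followup_days=30)
print(f"Simulated power: {power:.0%} (the Schoenfeld formula targets 80%)")
```

If the simulated power lands far from the target, that usually points to an assumption in the formula (proportional hazards, event rate, allocation) that does not match the simulated design.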

Mistake 2: Ignoring Event Rate

"We have 10,000 users, that should be enough."

Problem: If only 2% churn, you have just 200 events. By the Schoenfeld formula, that gives 80% power only if the true HR is about 0.67 or stronger, so smaller effects will be missed.

Fix: Calculate events expected, not just sample size.
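
A quick way to apply this fix is to turn the question around: given the users you have and the event rate you expect, what is the weakest effect you could detect? The sketch below inverts the Schoenfeld formula, assuming 1:1 allocation and a two-sided test. The helper name `min_detectable_hr` is hypothetical.

```python
import math
from statistics import NormalDist


def min_detectable_hr(n_users, event_rate, alpha=0.05, power=0.8):
    """Weakest protective effect (HR closest to 1) detectable.

    Inverts the Schoenfeld formula, assuming 1:1 allocation and a
    two-sided test: |log HR| = 2 * (z_{alpha/2} + z_beta) / sqrt(events).
    """
    norm = NormalDist()
    z_sum = norm.inv_cdf(1 - alpha / 2) + norm.inv_cdf(power)
    expected_events = n_users * event_rate
    return math.exp(-2 * z_sum / math.sqrt(expected_events))


hr = min_detectable_hr(n_users=10_000, event_rate=0.02)
print(f"10,000 users at 2% churn: detectable HR ~ {hr:.2f} or stronger")
```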

Mistake 3: Not Planning for Dropouts

Real studies lose users.

Fix: Enroll 10-20% more users than the event calculation alone requires.
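
As a small illustration of the buffer arithmetic: dividing by (1 − dropout rate) gives slightly more users than multiplying by (1 + dropout rate), and the gap widens as dropout grows. The helper `inflate_for_dropout` below is a hypothetical name, and the sketch assumes dropouts contribute no events at all, which is conservative.

```python
import math


def inflate_for_dropout(n_required, dropout_rate):
    """Users to enroll so that about n_required remain after dropout."""
    if not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be in [0, 1)")
    return math.ceil(n_required / (1 - dropout_rate))


for rate in (0.10, 0.20):
    enrolled = inflate_for_dropout(1000, rate)
    print(f"{rate:.0%} dropout: enroll {enrolled} to keep about 1000")
```

At 10% dropout this gives 1112 users, compared with 1100 from the simpler "add 10%" shortcut.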

Mistake 4: Detecting Too Small an Effect

Trying to detect $\text{HR} = 0.95$ requires roughly 12,000 events at 80% power, more than most retention experiments can collect.

Fix: Focus on practically meaningful effects ( $\text{HR} \le 0.80$ ).

Quick Reference

Events Needed (Two-Sided $\alpha=0.05$ )

HR	80% Power	90% Power
0.50	66	88
0.60	121	162
0.70	247	331
0.75	380	508
0.80	631	845

Total Sample Size = $\frac{\text{Events}}{\text{Event Rate}} \times \frac{1}{1 - \text{dropout rate}}$

Example: Need 380 events, expect 30% event rate and 10% dropout → $380 / 0.30 \times \frac{1}{0.9} \approx 1408$ users in total (about 704 per group)

Time-to-Event and Retention Analysis (Pillar) - Full survival framework
Power Analysis Without Cargo Culting - General power principles
Log-Rank Test - What you're powering for
MDE and Sample Size - A/B test perspective

Key Takeaway

Sample size for survival studies depends on expected events, not just total users. Low event rates (high retention) require larger samples or longer follow-up. Use the Schoenfeld formula: total events needed $\approx \frac{4(z_{\alpha/2} + z_{\beta})^2}{(\log \text{HR})^2}$ . Then convert: sample size = $\frac{\text{events}}{\text{event rate}}$ . Always add a 10-20% buffer and run a sensitivity analysis. When possible, extend follow-up rather than recruiting more users: it is usually cheaper and achieves the same statistical goal.

References

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3275994/
https://onlinelibrary.wiley.com/doi/10.1002/sim.5498
https://doi.org/10.1002/pst.313
Schoenfeld, D. A. (1983). Sample-size formula for the proportional-hazards regression model. *Biometrics*, 39(2), 499-503.
Freedman, L. S. (1982). Tables of the number of patients required in clinical trials using the logrank test. *Statistics in Medicine*, 1(2), 121-129.
Machin, D., Cheung, Y. B., & Parmar, M. K. (2006). Survival analysis: A practical approach (2nd ed.). Wiley.

Frequently Asked Questions

Why does survival sample size depend on events, not just sample size?

Statistical power in survival analysis comes from observed events. If 1000 users never churn, you have no information about churn. More events = more precise estimates. This is why low-event outcomes need longer follow-up or larger samples.

How do I plan sample size when I don't know the baseline hazard?

Use historical data or pilot studies to estimate baseline survival. If D30 retention is historically 70%, a constant-hazard model implies a daily hazard of about $-\ln(0.70)/30 \approx 0.012$ . Run power calculations across a range of plausible baseline rates and plan for the most conservative one.
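
As a sketch of that conversion, the snippet below backs out a constant daily hazard from one retention figure and projects event probabilities at other horizons. It assumes exponential survival, $S(t) = e^{-ht}$, and the function names are hypothetical. Real retention curves often flatten over time, in which case this model would overstate events at longer horizons.

```python
import math


def daily_hazard_from_retention(retention, day):
    """Constant daily hazard implied by retention at a given day.

    Assumes exponential survival: S(t) = exp(-h * t).
    """
    return -math.log(retention) / day


def event_prob_at(daily_hazard, day):
    """Probability of churning by `day` under the same constant hazard."""
    return 1 - math.exp(-daily_hazard * day)


h = daily_hazard_from_retention(retention=0.70, day=30)
print(f"Implied daily hazard: {h:.4f}")
for day in (30, 60, 90):
    print(f"  Day {day}: event probability ~ {event_prob_at(h, day):.0%}")
```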

Should I run my retention experiment longer or enroll more users?

Often, longer follow-up is more efficient. Adding 30 more days of observation costs nothing but time and may generate the events you need. Adding more users requires acquisition effort. Balance both based on your constraints.

Key Takeaway

Sample size for survival studies depends on the expected number of events, not just total sample size. Target enough events to detect your effect size with the desired power. In retention experiments, longer follow-up increases events and is often more efficient than adding users. Always add a buffer for dropouts and err on the side of over-estimating the required events.
