Time-to-Event Sample Size: Practical Approximations
A practical guide to sample size calculations for survival studies. Learn how to power time-to-event analyses, what drives the sample size, and practical approximations for retention experiments.
Quick Hits
- Power depends on NUMBER OF EVENTS, not just sample size
- Low event rates require larger samples or longer follow-up
- Hazard ratio detection: events needed ≈ 4 × (z_{α/2} + z_β)² / (ln HR)²
- Longer follow-up = more events = more power (often cheaper than more users)
- Plan for 10-20% extra sample for dropouts and censoring
TL;DR
Survival analysis power depends on the number of observed events, not just sample size. If few users churn, you can't estimate churn risk precisely. This guide provides practical formulas for calculating required events, converting to sample size given expected event rates, and planning retention experiments. Key insight: longer follow-up often substitutes for more users.
The Key Insight: Events Drive Power
Why Events, Not Sample Size?
Statistical information comes from observed events:
- 1000 users, 500 events → lots of information
- 1000 users, 10 events → very little information
For both the log-rank test and Cox regression, the variance of the estimated log hazard ratio scales inversely with the number of events: $$\text{Variance} \propto \frac{1}{\text{Number of Events}}$$
More events → smaller variance → better power.
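To make the variance claim concrete, here is a minimal sketch. It assumes the common large-sample approximation Var(log HR) ≈ 4/d for 1:1 allocation, which is the same approximation behind the Schoenfeld formula used in this guide. The function name `approx_se_log_hr` is illustrative and not part of any library.

```python
import math


def approx_se_log_hr(events):
    """Approximate standard error of the log hazard ratio.

    Assumes 1:1 allocation and a hazard ratio not too far from 1,
    where Var(log HR) is roughly 4 / (number of events).
    """
    if events <= 0:
        raise ValueError("events must be positive")
    return math.sqrt(4.0 / events)


# The two scenarios from the text: same 1000 users, very different event counts
for events in (500, 10):
    se = approx_se_log_hr(events)
    # Multiplicative margin of a 95% confidence interval around the estimated HR
    margin = math.exp(1.96 * se)
    print(f"{events:>3} events: SE(log HR) = {se:.3f}, 95% CI = HR x or / {margin:.2f}")
```

Under this approximation, 500 events pin the hazard ratio down to within roughly 19% in either direction, while 10 events leave an interval spanning more than a factor of three.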
Implications
- Low event rates require larger samples: If only 5% of users churn, you need about 20 users for every event you want to observe
- Longer follow-up helps: More time → more events accumulate
- Plan based on events: Calculate events needed, then derive sample size
Basic Sample Size Formula
For Log-Rank Test (Schoenfeld)
Required number of events:
$$d = \frac{4(z_{\alpha/2} + z_\beta)^2}{(\log HR)^2}$$
Where:
- d = total events needed (both groups combined, assuming 1:1 allocation)
- $z_{\alpha/2}$ = critical value for significance (1.96 for α=0.05, two-sided)
- $z_\beta$ = critical value for power (0.84 for 80%, 1.28 for 90%)
- HR = hazard ratio to detect (log is the natural logarithm)
Converting Events to Sample Size
$$n = \frac{d}{p_e}$$
Where n = total sample size (both groups combined) and $p_e$ = expected event probability over the follow-up period.
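As a worked example under assumed inputs (HR = 0.75, two-sided α = 0.05, 80% power, and a 30% event probability), the two formulas give approximately:

$$d = \frac{4(1.9600 + 0.8416)^2}{(\ln 0.75)^2} \approx \frac{4 \times 7.849}{0.08276} \approx 379.4 \;\Rightarrow\; 380 \text{ events}$$

$$n = \frac{380}{0.30} \approx 1267 \text{ users in total, before any dropout adjustment}$$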
Quick Reference Table
| HR to Detect | Total Events (80% power) | Total Events (90% power) |
|---|---|---|
| 0.50 | 66 | 88 |
| 0.60 | 121 | 162 |
| 0.70 | 247 | 331 |
| 0.75 | 380 | 508 |
| 0.80 | 631 | 845 |
| 0.85 | 1189 | 1592 |
| 0.90 | 2829 | 3787 |
Code: Sample Size Calculations
Python
import numpy as np
from scipy import stats


def events_for_logrank(hr, alpha=0.05, power=0.8, two_sided=True):
    """
    Calculate the total events needed to detect a hazard ratio
    (Schoenfeld formula, 1:1 allocation).

    Parameters:
    -----------
    hr : float
        Hazard ratio to detect (positive, not equal to 1)
    alpha : float
        Significance level
    power : float
        Desired power
    two_sided : bool
        Whether the test is two-sided

    Returns:
    --------
    int : Number of events needed (both groups combined)
    """
    if hr <= 0 or hr == 1:
        raise ValueError("hr must be positive and different from 1")

    if two_sided:
        z_alpha = stats.norm.ppf(1 - alpha / 2)
    else:
        z_alpha = stats.norm.ppf(1 - alpha)
    z_beta = stats.norm.ppf(power)

    # Schoenfeld formula
    d = 4 * (z_alpha + z_beta) ** 2 / (np.log(hr)) ** 2
    return int(np.ceil(d))
def sample_size_survival(hr, event_prob, alpha=0.05, power=0.8,
                         allocation_ratio=1, dropout_rate=0.1):
    """
    Calculate sample size for survival study.

    Parameters:
    -----------
    hr : float
        Hazard ratio to detect
    event_prob : float
        Expected probability of event during follow-up
    alpha : float
        Significance level
    power : float
        Desired power
    allocation_ratio : float
        n_treatment / n_control
    dropout_rate : float
        Expected dropout/loss rate

    Returns:
    --------
    dict with sample size calculations
    """
    # Events needed (two-sided test, 1:1 allocation)
    d = events_for_logrank(hr, alpha, power)

    # Adjust for allocation ratio:
    # unequal allocation is less efficient, so more events are needed
    if allocation_ratio != 1:
        k = allocation_ratio
        efficiency = (1 + k) ** 2 / (4 * k)
        d_adjusted = d * efficiency
    else:
        d_adjusted = d

    # Convert to sample size
    n_total = d_adjusted / event_prob

    # Adjust for dropouts
    n_total_adj = n_total / (1 - dropout_rate)

    # Split by group
    r = allocation_ratio
    n_control = n_total_adj / (1 + r)
    n_treatment = n_total_adj * r / (1 + r)

    return {
        'events_needed': int(np.ceil(d)),
        'events_adjusted': int(np.ceil(d_adjusted)),
        'total_sample': int(np.ceil(n_total_adj)),
        'control_group': int(np.ceil(n_control)),
        'treatment_group': int(np.ceil(n_treatment)),
        'assumptions': {
            'hazard_ratio': hr,
            'event_prob': event_prob,
            'alpha': alpha,
            'power': power,
            'dropout_rate': dropout_rate
        }
    }
def event_prob_from_retention(retention_at_t, t=30):
    """
    Convert retention rate to event probability.

    If D30 retention is 70%, event probability = 30%.
    The argument t only documents the follow-up horizon; it is not used
    in the calculation.
    """
    return 1 - retention_at_t
def study_duration_for_events(baseline_hazard, target_events, sample_size,
                              accrual_period=0):
    """
    Estimate study duration to achieve target events.

    Parameters:
    -----------
    baseline_hazard : float
        Hazard rate (events per unit time)
    target_events : int
        Events needed
    sample_size : int
        Total sample size
    accrual_period : float
        Time over which subjects are enrolled

    Returns:
    --------
    float : Estimated study duration, or None if the target is unreachable
    """
    # Simplified: assume uniform accrual and constant hazard
    # More complex formulas account for varying accrual
    # Approximate: d ≈ n × (1 - S(T)) for average follow-up T
    # 1 - S(T) = 1 - exp(-h × T) ≈ h × T for small hT
    # Solve for T given d = n × (1 - exp(-h × T))
    from scipy.optimize import brentq

    def events_at_time(T):
        # Average follow-up is T minus half the accrual period once accrual ends.
        # During accrual this treats the full sample as enrolled, so it
        # overestimates events for T <= accrual_period.
        avg_followup = T - accrual_period / 2 if T > accrual_period else T / 2
        event_prob = 1 - np.exp(-baseline_hazard * avg_followup)
        return sample_size * event_prob - target_events

    # Find T; brentq raises ValueError when there is no sign change in the
    # bracket (for example, target_events >= sample_size)
    try:
        return brentq(events_at_time, 0.1, 1000)
    except ValueError:
        return None
# Example usage
if __name__ == "__main__":
    print("Sample Size Calculation Examples")
    print("=" * 60)

    # Example 1: Basic calculation
    print("\n1. Basic Log-Rank Power")
    print("-" * 40)
    for hr in [0.5, 0.6, 0.7, 0.8, 0.9]:
        events = events_for_logrank(hr, power=0.8)
        print(f"  HR = {hr}: {events} events needed (80% power)")

    # Example 2: Full sample size
    print("\n2. Full Sample Size Calculation")
    print("-" * 40)
    # Scenario: Retention experiment
    # Expect 70% D30 retention (30% event rate)
    # Want to detect HR = 0.75 (treatment reduces the churn hazard by 25%)
    result = sample_size_survival(
        hr=0.75,
        event_prob=0.30,
        power=0.8,
        dropout_rate=0.1
    )
    print(f"  Hazard ratio to detect: {result['assumptions']['hazard_ratio']}")
    print(f"  Expected event rate: {result['assumptions']['event_prob']:.0%}")
    print(f"  Events needed: {result['events_needed']}")
    print(f"  Total sample size: {result['total_sample']}")
    print(f"  Per group: {result['control_group']} control, {result['treatment_group']} treatment")

    # Example 3: Varying follow-up
    print("\n3. Effect of Follow-Up Duration")
    print("-" * 40)
    hr = 0.75
    baseline_hazard = 0.01  # 1% daily hazard ≈ 26% event probability over 30 days
    for followup_days in [30, 60, 90, 180]:
        event_prob = 1 - np.exp(-baseline_hazard * followup_days)
        result = sample_size_survival(hr=hr, event_prob=event_prob)
        print(f"  {followup_days} days follow-up: event prob = {event_prob:.1%}, "
              f"n = {result['total_sample']}")
R
# Base R only; no packages are required for these calculations

events_for_logrank <- function(hr, alpha = 0.05, power = 0.8) {
  #' Calculate total events needed for log-rank test (Schoenfeld, 1:1 allocation)
  z_alpha <- qnorm(1 - alpha / 2)
  z_beta <- qnorm(power)
  d <- 4 * (z_alpha + z_beta)^2 / log(hr)^2
  ceiling(d)
}

sample_size_survival <- function(hr, event_prob, alpha = 0.05, power = 0.8,
                                 dropout_rate = 0.1) {
  #' Calculate sample size for survival study
  events <- events_for_logrank(hr, alpha, power)
  n_total <- events / event_prob
  n_adjusted <- n_total / (1 - dropout_rate)
  list(
    events_needed = events,
    total_sample = ceiling(n_adjusted),
    per_group = ceiling(n_adjusted / 2)
  )
}

# Example
cat("Sample Size Calculations\n")
cat(strrep("=", 50), "\n")

# Events by HR
cat("\nEvents needed by HR (80% power):\n")
for (hr in c(0.5, 0.6, 0.7, 0.8)) {
  events <- events_for_logrank(hr)
  cat(sprintf("  HR = %.1f: %d events\n", hr, events))
}

# Full calculation
cat("\nFull Calculation:\n")
result <- sample_size_survival(hr = 0.75, event_prob = 0.30)
cat(sprintf("  HR = 0.75, 30%% event rate\n"))
cat(sprintf("  Events: %d\n", result$events_needed))
cat(sprintf("  Total sample: %d\n", result$total_sample))
Practical Planning Guide
Step 1: Define the Effect Size
What hazard ratio do you want to detect?
| Business Impact | Typical HR |
|---|---|
| Large effect | 0.50 - 0.60 |
| Medium effect | 0.70 - 0.80 |
| Small effect | 0.85 - 0.95 |
Be realistic: at 80% power and two-sided α=0.05, detecting HR=0.95 requires roughly 12,000 events.
Step 2: Estimate Event Rate
From historical data:
- What's your baseline retention at your planned follow-up time?
- Event rate = 1 - retention
Example: 70% D30 retention → 30% event rate.
Step 3: Calculate Events Needed
Use the Schoenfeld formula or table above.
HR = 0.75, 80% power → about 380 events needed (total across both groups)
Step 4: Convert to Sample Size
$$n = \frac{\text{events needed}}{\text{event rate}} \times \frac{1}{1 - \text{dropout rate}}$$
380 events / 0.30 event rate / 0.90 (10% dropout) ≈ 1,408 users in total, or about 704 per group
Step 5: Consider Trade-offs
Can you:
- Extend follow-up? (More events per user)
- Enrich population? (Higher event rate)
- Accept less power? (Fewer events needed)
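To judge the "accept less power" option, the Schoenfeld formula can be inverted to give the approximate power for a fixed number of events. The sketch below assumes 1:1 allocation and a two-sided test; `power_for_events` is a hypothetical helper name, and only the Python standard library is used.

```python
from math import log, sqrt
from statistics import NormalDist


def power_for_events(events, hr, alpha=0.05):
    """Approximate power of a two-sided log-rank test (1:1 allocation).

    Inverts the Schoenfeld formula:
    z_beta = sqrt(d) * |ln HR| / 2 - z_{alpha/2}.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = sqrt(events) * abs(log(hr)) / 2 - z_alpha
    return std_normal.cdf(z_beta)


# How much power is given up by stopping at fewer events (HR = 0.75)?
for events in (200, 300, 380):
    print(f"{events} events -> power ~ {power_for_events(events, 0.75):.2f}")
```

For HR = 0.75 this gives roughly 53%, 70%, and 80% power at 200, 300, and 380 events, so cutting the event target trades away power quickly.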
Trade-off: More Users vs. Longer Follow-Up
The Problem
You need 200 events. Options:
- 2000 users with 10% event rate
- 1000 users with 20% event rate (longer follow-up)
- 500 users with 40% event rate (much longer follow-up)
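How much longer is "longer follow-up"? A rough sketch, assuming a constant daily churn hazard so that the event probability after T days is 1 − exp(−hT). The hazard value and the helper name `followup_days_for_event_prob` are hypothetical; substitute an estimate from your own historical data.

```python
import math


def followup_days_for_event_prob(event_prob, daily_hazard):
    """Days of follow-up T so that 1 - exp(-hazard * T) equals event_prob."""
    if not 0 < event_prob < 1:
        raise ValueError("event_prob must be strictly between 0 and 1")
    return -math.log(1 - event_prob) / daily_hazard


TARGET_EVENTS = 200
DAILY_HAZARD = 0.0035  # hypothetical value; replace with your own estimate

for users in (2000, 1000, 500):
    event_prob = TARGET_EVENTS / users
    days = followup_days_for_event_prob(event_prob, DAILY_HAZARD)
    print(f"{users} users: need {event_prob:.0%} event rate -> ~{days:.0f} days of follow-up")
```

With this assumed hazard, the three options correspond to about 30, 64, and 146 days. Each halving of the user count more than doubles the required follow-up, because events accumulate more slowly as the at-risk population shrinks.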
Considerations
| Option | Pros | Cons |
|---|---|---|
| More users | Faster answer | Higher acquisition cost |
| Longer follow-up | Lower cost | Delayed decision |
| Higher-risk population | More events | May not generalize |
Rule of Thumb
If user acquisition is expensive and time is flexible, extend follow-up.
If you need quick answers and can get users easily, enroll more.
Sensitivity Analysis
Vary Your Assumptions
Don't trust a single calculation. Calculate sample size across:
- Range of HRs: What if true effect is smaller?
- Range of event rates: What if baseline retention is better/worse?
- Range of dropout rates: What if more/fewer users drop out?
# Sensitivity analysis (reuses sample_size_survival from the Python code above)
import pandas as pd

results = []
for hr in [0.70, 0.75, 0.80]:
    for event_prob in [0.20, 0.30, 0.40]:
        n = sample_size_survival(hr, event_prob)['total_sample']
        results.append({
            'HR': hr,
            'Event Rate': f"{event_prob:.0%}",
            'Sample Size': n
        })

print(pd.DataFrame(results).pivot(index='Event Rate', columns='HR', values='Sample Size'))
Plan for the Worst Case
If your sample size calculation says 400 per group but sensitivity shows it could be 600, plan for 600.
Common Mistakes
Mistake 1: Using Generic Power Calculators
Generic power calculators (for t-tests, etc.) don't account for censoring and event-driven power.
Fix: Use survival-specific formulas.
Mistake 2: Ignoring Event Rate
"We have 10,000 users, that should be enough."
Problem: If only 2% churn, you observe only about 200 events, which is too few for small effects (detecting HR = 0.80 at 80% power takes roughly 630 events).
Fix: Calculate events expected, not just sample size.
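One way to run this check is to solve the Schoenfeld formula for the hazard ratio, which gives the effect closest to HR = 1 that the expected events can detect. This is a sketch assuming 1:1 allocation and a two-sided test; `min_detectable_hr` is an illustrative name, not a library function.

```python
from math import exp, sqrt
from statistics import NormalDist


def min_detectable_hr(events, alpha=0.05, power=0.8):
    """Hazard ratio below 1, closest to 1, detectable with the given events.

    Solves d = 4 (z_{alpha/2} + z_beta)^2 / (ln HR)^2 for HR,
    assuming 1:1 allocation and a two-sided test.
    """
    std_normal = NormalDist()
    z_sum = std_normal.inv_cdf(1 - alpha / 2) + std_normal.inv_cdf(power)
    return exp(-2 * z_sum / sqrt(events))


users, churn_rate = 10_000, 0.02
expected_events = users * churn_rate  # 200 events
print(f"Detectable HR with {expected_events:.0f} events: "
      f"{min_detectable_hr(expected_events):.2f}")
```

With 200 events the detectable hazard ratio is about 0.67, so 10,000 users at a 2% churn rate can only reliably detect a reduction of roughly a third in the churn hazard.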
Mistake 3: Not Planning for Dropouts
Real studies lose users. Plan for 10-20% extra.
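A small sketch of sizing the buffer with the 1 / (1 − dropout rate) factor from Step 4. The starting sample size below is hypothetical, and `inflate_for_dropout` is an illustrative helper name.

```python
import math


def inflate_for_dropout(n_required, dropout_rate):
    """Enroll enough users that n_required are expected to remain after dropout."""
    if not 0 <= dropout_rate < 1:
        raise ValueError("dropout_rate must be in [0, 1)")
    return math.ceil(n_required / (1 - dropout_rate))


n_required = 1000  # hypothetical sample size before the dropout adjustment
for rate in (0.10, 0.20):
    enrolled = inflate_for_dropout(n_required, rate)
    flat = round(n_required * (1 + rate))
    print(f"{rate:.0%} dropout: enroll {enrolled} (a flat +{rate:.0%} would give {flat})")
```

Dividing by (1 − dropout rate) asks for slightly more users than simply adding the dropout percentage: the factor is about 1.11 rather than 1.10 at 10% dropout, and 1.25 rather than 1.20 at 20%.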
Mistake 4: Detecting Too Small an Effect
Trying to detect HR = 0.95 requires enormous samples.
Fix: Focus on practically meaningful effects (HR ≤ 0.80).
Quick Reference
Events Needed (Two-Sided α=0.05)
| HR | 80% Power | 90% Power |
|---|---|---|
| 0.50 | 66 | 88 |
| 0.60 | 121 | 162 |
| 0.70 | 247 | 331 |
| 0.75 | 380 | 508 |
| 0.80 | 631 | 845 |
Sample Size = Events / Event Rate / (1 − dropout rate)
Example: Need 380 events, expect 30% event rate and 10% dropout → 380 / 0.30 / 0.90 ≈ 1,408 users in total (about 704 per group)
Related Methods
- Time-to-Event and Retention Analysis (Pillar) - Full survival framework
- Power Analysis Without Cargo Culting - General power principles
- Log-Rank Test - What you're powering for
- MDE and Sample Size - A/B test perspective
Key Takeaway
Sample size for survival studies depends on expected events, not just total users. Low event rates (high retention) require larger samples or longer follow-up. Use the Schoenfeld formula: events needed ≈ 4(z_α + z_β)² / (log HR)². Then convert: sample size = events / event rate. Always add 10-20% buffer and run sensitivity analysis. When possible, extend follow-up rather than recruiting more users—it's usually cheaper and achieves the same statistical goal.
References
- https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3275994/
- https://onlinelibrary.wiley.com/doi/10.1002/sim.5498
- https://doi.org/10.1002/pst.313
- Schoenfeld, D. A. (1983). Sample-size formula for the proportional-hazards regression model. *Biometrics*, 39(2), 499-503.
- Freedman, L. S. (1982). Tables of the number of patients required in clinical trials using the logrank test. *Statistics in Medicine*, 1(2), 121-129.
- Machin, D., Cheung, Y. B., & Parmar, M. K. (2006). Survival analysis: A practical approach (2nd ed.). Wiley.
Key Takeaway
Sample size for survival studies depends on expected number of events, not just total sample size. Target enough events to detect your effect size with desired power. In retention experiments, longer follow-up increases events and is often more efficient than adding users. Always add buffer for dropouts and over-estimate required events.