Survival Analysis

Log-Rank Test

The Log-Rank Test compares survival curves between two or more groups. Use it when you want to know whether groups differ in their time to an event such as churn, conversion, or failure.

Quick Hits

  • Compares survival (time-to-event) curves between two or more groups
  • Tests whether groups have different hazard rates over the full follow-up period
  • Non-parametric: no assumption about the shape of the survival curve
  • Most powerful when the hazard ratio between groups is roughly constant over time
  • Null hypothesis: the survival curves are identical across groups


What is a Log-Rank Test?

The Log-Rank Test is a non-parametric statistical test used to compare the survival distributions of two or more groups. It evaluates whether the groups differ in their time-to-event outcomes, where the event can be anything that happens at a measurable point in time: customer churn, conversion, device failure, disease progression, or any other transition.

The test compares, at every time point where an event occurs, the observed number of events in each group with the number that would be expected if the groups had identical survival curves. Under the null hypothesis, the resulting test statistic approximately follows a chi-square distribution with k - 1 degrees of freedom, where k is the number of groups.
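For two groups, the observed-versus-expected comparison can be written out as below. This is the commonly used two-group form of the statistic; the notation is an assumption introduced here for illustration: at the j-th distinct event time, n_{1j} and n_{2j} are the numbers at risk in each group, n_j = n_{1j} + n_{2j}, d_{1j} is the number of events observed in group 1, and d_j is the total number of events.

```latex
E_{1j} = d_j \,\frac{n_{1j}}{n_j}, \qquad
V_j = \frac{d_j \, n_{1j} \, n_{2j} \, (n_j - d_j)}{n_j^{2}\,(n_j - 1)}, \qquad
\chi^2 = \frac{\Bigl(\sum_j \bigl(d_{1j} - E_{1j}\bigr)\Bigr)^{2}}{\sum_j V_j}
```

Here E_{1j} is the expected number of events in group 1 if both groups shared one survival curve, and V_j is the matching hypergeometric variance. With two groups, the statistic is compared against a chi-square distribution with one degree of freedom.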

The Log-Rank Test is also called the Mantel-Cox Test, the Mantel-Haenszel Test (for survival data), or the Cox-Mantel Log-Rank Test.


Assumptions for a Log-Rank Test

Every statistical method has assumptions: properties your data must satisfy for the method's results to be accurate.

The assumptions for the Log-Rank Test include:

  1. Time-to-Event Outcome
  2. Independent Censoring
  3. Non-Informative Censoring
  4. Proportional Hazards (for maximum power)
  5. Independent Observations

Let's look at each of these in turn.

Time-to-Event Outcome

Your outcome must be the time from a defined starting point until an event occurs. Examples include days from signup until churn, hours from deployment until crash, or months from diagnosis until recovery.

If your outcome is a continuous measurement at a single time point (like a test score), use an Independent Samples T-Test or Mann-Whitney U Test instead.

Independent Censoring

Some subjects will not experience the event during the observation period. These are censored observations. The reason a subject is censored must be unrelated to their likelihood of experiencing the event. For example, a user leaving the study because the observation window ended is acceptable. A user leaving because they were about to churn is not.
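As a minimal sketch of how censored observations are usually encoded, each subject becomes a pair of a duration and an event flag. The function name `to_survival_record` and the dates are hypothetical, and the sketch assumes the only source of censoring is the end of the observation window (the acceptable case described above).

```python
from datetime import date

def to_survival_record(start, event_date, window_end):
    """Return (duration_days, event_observed) for one subject.

    event_date is None when no event has been recorded for the subject.
    event_observed is 1 for an observed event and 0 for a censored subject.
    """
    if event_date is not None and event_date <= window_end:
        # The event happened inside the window: the duration runs to the event.
        return (event_date - start).days, 1
    # No event seen by the end of the window: censor at the window end.
    return (window_end - start).days, 0

# Hypothetical users, all starting on the same day with a 90-day window.
start = date(2024, 1, 1)
window_end = date(2024, 3, 31)
churned = to_survival_record(start, date(2024, 1, 31), window_end)
still_active = to_survival_record(start, None, window_end)
print(churned, still_active)
```

The censored user still contributes 90 days of event-free follow-up, which is the information the log-rank test uses.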

Non-Informative Censoring

Censored subjects must have the same future prospects as those still being observed at the same time point. If sicker patients drop out more often, censoring is informative and the log-rank test results will be biased.

Proportional Hazards (for maximum power)

The log-rank test is most powerful when the ratio of hazards between groups is approximately constant over time. This means one group is consistently at higher risk than the other throughout the follow-up period.

If the survival curves cross (one group does better early but worse later), the standard log-rank test loses power. Consider using a weighted variant (Fleming-Harrington) or the restricted mean survival time (RMST) approach.
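To make the RMST idea concrete, here is a rough sketch that computes it as the area under a Kaplan-Meier curve up to a chosen horizon. The function names, the toy data, and the horizon `tau = 90` are assumptions made for illustration. A real analysis would also need a standard error for the difference, which this sketch does not compute.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time, in increasing order."""
    surv = 1.0
    curve = []
    for t in sorted({u for u, e in zip(times, events) if e == 1}):
        at_risk = sum(1 for u in times if u >= t)
        n_events = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        surv *= 1 - n_events / at_risk
        curve.append((t, surv))
    return curve

def rmst(times, events, tau):
    """Restricted mean survival time: area under the KM curve from 0 to tau."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in kaplan_meier(times, events):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)   # flat step between event times
        prev_t, prev_s = t, s
    return area + prev_s * (tau - prev_t)

# Hypothetical follow-up times in days (event flag 0 means censored).
group_a = ([10, 20, 35, 60, 90, 90], [1, 1, 1, 1, 0, 0])
group_b = ([40, 45, 50, 55, 90, 90], [1, 1, 1, 1, 0, 0])
difference = rmst(*group_a, tau=90) - rmst(*group_b, tau=90)
print(f"RMST difference over 90 days: {difference:.1f} days")
```

The difference reads as "average event-free days gained or lost within the first 90 days", which stays interpretable even when the curves cross.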

Independent Observations

Each subject should be an independent observation. If the same user can appear multiple times (e.g., multiple subscriptions), you need to account for this clustering.


When to use a Log-Rank Test?

Use a Log-Rank Test when the following conditions hold:

  1. You want to compare time-to-event outcomes between groups
  2. Your outcome is time until an event (churn, failure, conversion)
  3. You have two or more groups to compare
  4. Your data may include censored observations (some subjects have not yet experienced the event)
  5. The hazard ratio between groups is roughly constant over time

Time-to-Event Comparison

You are looking for a statistical test to determine whether two or more groups differ in how quickly they experience an event. This is a survival analysis question. If you just want to compare a continuous outcome at a single time point, use a t-test or ANOVA instead.

Censored Data

Your data includes subjects who have not yet experienced the event. The log-rank test handles this correctly by using each subject's data up until the point they were censored. If you have no censoring, you could use a standard comparison test, but the log-rank test still works.

Two or More Groups

The log-rank test works with any number of groups. For two groups it produces a single chi-square statistic with one degree of freedom. For three or more groups it produces an omnibus test; if that test is significant, follow up with pairwise log-rank tests and correct for multiple comparisons.
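The pairwise follow-up can be sketched with a Bonferroni correction, one common choice. The group names and raw p-values below are made up for illustration; in practice each raw p-value would come from a separate two-group log-rank test.

```python
from itertools import combinations

def bonferroni(p_values, alpha=0.05):
    """Return [(adjusted_p, reject)] for each raw p-value."""
    m = len(p_values)
    return [(min(1.0, p * m), p * m <= alpha) for p in p_values]

# Hypothetical groups and raw pairwise log-rank p-values (not real results).
groups = ["A", "B", "C"]
pairs = list(combinations(groups, 2))   # (A, B), (A, C), (B, C)
raw_p = [0.004, 0.030, 0.410]

for (g1, g2), (adj_p, reject) in zip(pairs, bonferroni(raw_p)):
    print(f"{g1} vs {g2}: adjusted p = {adj_p:.3f}, significant = {reject}")
```

With k groups there are k(k - 1)/2 pairs, so the correction becomes stricter as the number of groups grows.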

If you want to model the effect of multiple variables on survival simultaneously, use Cox Proportional Hazards regression instead. If you only want to estimate a survival curve without comparing groups, use the Kaplan-Meier Estimator.


Log-Rank Test Example

Group 1: Free-tier users who received a promotional email campaign. Group 2: Free-tier users who received no email campaign (control). Event of interest: Upgrading to a paid subscription.

In this example, we want to know whether the email campaign accelerates conversions from free to paid. We track the number of days from the start of the campaign until each user upgrades. Users who have not upgraded by the end of the 90-day observation window are censored.

We construct Kaplan-Meier survival curves for each group and then run a log-rank test. The null hypothesis is that both groups convert at the same rate over time. If the p-value is below our significance threshold (typically 0.05), we conclude that the email campaign significantly changed the conversion rate over time.

The log-rank test gives us a chi-square statistic and a p-value. A p-value ≤ 0.05 means the survival curves are significantly different and we can conclude the campaign had a real effect on time-to-conversion.
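The example can be sketched in plain Python as follows. This is a minimal two-group implementation under stated assumptions, not a replacement for a tested statistics library. The function name `logrank_two_groups` and all of the data are hypothetical. "Survival" here means remaining on the free tier, and users with a flag of 0 were censored at day 90.

```python
from math import erfc, sqrt

def logrank_two_groups(times_a, events_a, times_b, events_b):
    """Return (chi_square, p_value) for a two-group log-rank test.

    times_*  : follow-up time for each subject
    events_* : 1 if the event was observed, 0 if the subject was censored
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})

    obs_minus_exp = 0.0   # running sum of observed minus expected events, group A
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)   # at risk in A
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)   # at risk in B
        n = n_a + n_b
        d_a = sum(1 for u, e, g in data if u == t and e == 1 and g == 0)
        d = sum(1 for u, e, _ in data if u == t and e == 1)
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0:
        raise ValueError("no event time has subjects at risk in both groups")
    chi_square = obs_minus_exp ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi_square / 2))
    return chi_square, p_value

# Hypothetical days-to-upgrade; a flag of 0 means censored at day 90.
email_days = [12, 20, 25, 31, 40, 55, 90, 90]
email_upgraded = [1, 1, 1, 1, 1, 1, 0, 0]
control_days = [30, 48, 70, 85, 90, 90, 90, 90]
control_upgraded = [1, 1, 1, 1, 0, 0, 0, 0]

chi_square, p_value = logrank_two_groups(
    email_days, email_upgraded, control_days, control_upgraded
)
print(f"chi-square = {chi_square:.3f}, p = {p_value:.4f}")
```

With samples this small the chi-square approximation is rough, so treat the output as an illustration of the mechanics rather than a realistic result.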



Frequently Asked Questions

What is the difference between a log-rank test and a t-test?
A t-test compares group means on a continuous variable measured at a single point in time. A log-rank test compares entire survival curves over time and correctly handles censoring, which is when you know a subject has not yet experienced the event but you do not know their final outcome.

Can I use a log-rank test with more than two groups?
Yes. The log-rank test generalizes to any number of groups. With three or more groups it functions like an omnibus test (similar to ANOVA). If significant, follow up with pairwise log-rank tests with a multiple-comparisons correction such as Bonferroni.

What if the survival curves cross?
When survival curves cross, the proportional hazards assumption is violated and the log-rank test loses power. Consider alternatives such as a restricted mean survival time (RMST) comparison, a weighted log-rank test (e.g., Fleming-Harrington), or a two-stage test.

Key Takeaway

The log-rank test is the standard method for comparing survival curves between groups. It answers the question: do these groups experience the event at different rates over time? It handles censored observations correctly and requires no distributional assumptions, but it works best when the hazard ratio between groups is roughly constant.
