Assumptions

Kolmogorov-Smirnov Test: Comparing Distributions and Detecting Drift

A practical guide to the KS test for comparing distributions and detecting drift. Learn the one-sample and two-sample variants, common pitfalls, and when to use alternatives.

Share
Kolmogorov-Smirnov Test: Comparing Distributions and Detecting Drift

Quick Hits

  • Two variants: one-sample (vs. theoretical) and two-sample (vs. another dataset)
  • D statistic = maximum vertical distance between CDFs
  • More sensitive to differences near the center of the distribution than the tails
  • Two-sample KS is the workhorse for drift detection in ML pipelines
  • For normality testing specifically, Shapiro-Wilk is more powerful

The Kolmogorov-Smirnov Test is one of the most versatile distribution tests available. This post covers practical applications, especially for drift detection.

One-Sample vs. Two-Sample

One-sample KS test: Tests whether your data follows a specific theoretical distribution (normal, exponential, uniform, etc.). The reference distribution must be fully specified in advance — you cannot estimate parameters from the test data.

Two-sample KS test: Tests whether two empirical samples come from the same distribution. No theoretical distribution is needed. This is the more useful variant in practice.

Drift Detection in Production

The two-sample KS test is widely used to detect when the distribution of model inputs or outputs has shifted:

  1. Save a reference distribution from training or a baseline period
  2. Periodically compare new production data against the reference
  3. Alert when D exceeds a threshold or p-value drops below significance level

Practical tips:

  • Use a sliding window (e.g., last 7 days vs. reference period)
  • Set alerts on the D statistic rather than p-value (p-value is too sensitive with large production datasets)
  • Combine with PSI (Population Stability Index) for a more complete picture

Limitations and Alternatives

The KS test is most sensitive near the center of the distribution and less sensitive in the tails. If tail behavior matters (e.g., for revenue metrics), consider:

  • Anderson-Darling test: Gives more weight to the tails
  • Cramér-von Mises test: Integrates squared differences across the entire CDF
  • Chi-square test: For categorical or binned data

See also: Drift Detection: KS Test, PSI, and Interpreting Signals for a deeper treatment of drift monitoring.


References

  1. https://www.itl.nist.gov/div898/handbook/eda/section3/eda35g.htm
  2. https://arxiv.org/abs/2004.03045

Frequently Asked Questions

Can I use the KS test for discrete data?
The standard KS test assumes continuous data. For discrete distributions, the test is conservative (p-values are too large). Use the chi-square goodness-of-fit test or an exact discrete KS variant instead.
Why shouldn't I estimate parameters from the same data I test?
If you estimate the mean and standard deviation from your sample and then test normality against that fitted distribution, you have used the data twice. The p-values will be too large (too liberal). Use the Lilliefors correction for this scenario, or use Shapiro-Wilk.
How does the KS test compare to PSI for drift detection?
The KS test gives a formal p-value but is sensitive to sample size. Population Stability Index (PSI) gives a unitless score that is easier to set thresholds on. Many teams use both: KS for formal testing and PSI for monitoring dashboards.

Key Takeaway

The KS test is the most versatile distribution comparison tool. The two-sample variant is especially valuable for detecting distribution drift in production ML systems and monitoring metric health. Pair it with visual CDF comparisons for the most insight.

Send to a friend

Share this with someone who loves clean statistical work.