StatsTest Blog

Experimental design, data analysis, and statistical tooling for modern teams. No hype, just the math.

Model Evaluation · Jan 26

Inter-Rater Reliability: Cohen's Kappa and Krippendorff's Alpha

How to measure agreement between human raters for AI evaluation. Learn when to use Cohen's Kappa vs. Krippendorff's Alpha, how to interpret values, and what to do when agreement is low.
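As a taste of the topic, here is a minimal sketch of Cohen's Kappa for two raters in pure Python. The formula compares observed agreement with the agreement expected by chance from each rater's label frequencies; the rater labels and data below are illustrative, not from the article.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's Kappa for two raters labelling the same items."""
    n = len(rater_a)
    # Observed agreement: fraction of items where both raters gave the same label.
    p_o = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_e = sum(freq_a[lab] * freq_b[lab] for lab in freq_a) / n**2
    return (p_o - p_e) / (1 - p_e)

# Two hypothetical raters judging five model outputs as pass/fail.
a = ["pass", "pass", "fail", "pass", "fail"]
b = ["pass", "pass", "fail", "fail", "fail"]
print(round(cohen_kappa(a, b), 3))  # → 0.615
```

Here the raters agree on 4 of 5 items (80%), but because some of that agreement is expected by chance, Kappa comes out lower, around 0.62.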