Negative Binomial Regression

Negative Binomial Regression models overdispersed count data where the variance exceeds the mean. Use it when Poisson regression is too restrictive for your event counts.

Jan 29 · 3 min read · Prediction, Count Data, Regression

Quick Hits

•Models count outcomes when the variance is larger than the mean (overdispersion)
•Adds a dispersion parameter to the Poisson model to handle extra variability
•Coefficients exponentiate to incidence rate ratios, just like Poisson regression
•Produces more conservative (wider) confidence intervals than Poisson when data is overdispersed
•A sensible default when the Poisson residual deviance is much larger than the residual degrees of freedom

The StatsTest Flow: Relationship or Prediction >> Prediction >> Count data outcome >> Overdispersion present

What is Negative Binomial Regression?

Negative Binomial Regression is a generalized linear model for count data that relaxes the Poisson assumption that the mean equals the variance. It adds a dispersion parameter that allows the variance to exceed the mean, making it appropriate for overdispersed count data.
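As a sketch of what the dispersion parameter does, the mean and variance can be written as follows under the NB2 parameterization (assumed here). The symbols are introduced for illustration and are not notation from this article: \mu_i is the expected count for observation i, x_i its predictors, \beta the coefficients, and \alpha \ge 0 the dispersion parameter.

```latex
\log \mu_i = x_i^\top \beta, \qquad
\operatorname{Var}(Y_i) = \mu_i + \alpha \mu_i^2
```

Setting \alpha = 0 gives back the Poisson variance \mu_i. Some software reports the inverse, \theta = 1/\alpha, instead (for example, glm.nb in R's MASS package), so check which convention your output uses.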

Like Poisson regression, it uses a log link function and produces coefficients that can be exponentiated to incidence rate ratios (IRR). The key difference is that it properly accounts for extra-Poisson variability, producing more accurate standard errors, p-values, and confidence intervals.
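To make the exponentiation step concrete, here is a minimal sketch that turns a log-scale coefficient and its standard error into an IRR with a Wald-type 95% confidence interval. The function name irr_with_ci and the numeric inputs are hypothetical and chosen only for illustration; real values would come from your fitted model.

```python
import math

def irr_with_ci(beta, se, z=1.96):
    """Return (IRR, lower, upper) for a log-scale coefficient and its standard error."""
    irr = math.exp(beta)
    lower = math.exp(beta - z * se)   # interval is built on the log scale,
    upper = math.exp(beta + z * se)   # then exponentiated
    return irr, lower, upper

# Hypothetical coefficient and standard error, for illustration only.
beta_hat = math.log(2.8)   # chosen so that the IRR equals 2.8
se_hat = 0.20

irr, lower, upper = irr_with_ci(beta_hat, se_hat)
print(f"IRR = {irr:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Because the interval is computed on the log scale, it is asymmetric around the IRR after exponentiation. A larger standard error, as the Negative Binomial model typically gives for overdispersed data, widens it.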

Negative Binomial Regression is also called the NB2 Model, Negative Binomial GLM, or Overdispersed Count Model.

Assumptions for Negative Binomial Regression

The assumptions for Negative Binomial Regression include:

Count Outcome
Overdispersion (or at minimum, no underdispersion)
Independence
Log-Linear Relationship
Negative Binomial Distribution

Count Outcome

The dependent variable must be a non-negative integer count. This is the same requirement as Poisson regression.

Overdispersion

The model is designed for data where the variance exceeds the mean. If the variance approximately equals the mean, Poisson regression is more efficient. If the variance is less than the mean (underdispersion), neither model is ideal and you may need a generalized Poisson model.
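A quick first look at this assumption is to compare the sample variance with the sample mean. The sketch below uses only the Python standard library. The helper name dispersion_index and the counts are hypothetical. Note that it uses the marginal variance and ignores predictors, so treat it as a screening step rather than a formal test.

```python
from statistics import mean, variance

def dispersion_index(counts):
    """Sample variance divided by sample mean; values well above 1 hint at overdispersion."""
    m = mean(counts)
    if m == 0:
        raise ValueError("mean is zero; dispersion index is undefined")
    return variance(counts) / m

# Hypothetical weekly event counts, for illustration only.
counts = [0, 0, 1, 0, 2, 0, 1, 9, 0, 3, 0, 14, 1, 0, 2]

print(f"mean = {mean(counts):.2f}, variance = {variance(counts):.2f}, "
      f"index = {dispersion_index(counts):.2f}")
```

An index near 1 is consistent with Poisson, well above 1 points toward Negative Binomial, and well below 1 suggests underdispersion.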

Independence

Observations must be independent. Clustered or repeated-measures count data needs mixed-effects or GEE extensions.

Log-Linear Relationship

The log of the expected count should be approximately linear in the predictors.
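The following sketch shows what log-linearity implies on the count scale: adding one unit to a predictor multiplies the expected count by a constant factor. The function name expected_count and all coefficient values are hypothetical.

```python
import math

def expected_count(intercept, coefs, x):
    """Expected count under a log link: exp(intercept + sum(coef * value))."""
    linear_predictor = intercept + sum(b * v for b, v in zip(coefs, x))
    return math.exp(linear_predictor)

# Hypothetical coefficients for two predictors, for illustration only.
intercept = 0.3
coefs = [0.25, -0.10]

base = expected_count(intercept, coefs, [2.0, 1.0])
bumped = expected_count(intercept, coefs, [3.0, 1.0])  # first predictor increased by 1

# On the count scale the one-unit increase is a multiplicative change.
print(f"ratio = {bumped / base:.4f}, exp(coef) = {math.exp(coefs[0]):.4f}")
```

If a predictor's effect is not roughly multiplicative like this, a transformation or additional terms may be needed.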

Negative Binomial Distribution

The model assumes the counts follow a Negative Binomial distribution, which is a Poisson-Gamma mixture. This is reasonable when overdispersion arises from unobserved heterogeneity across subjects.
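The mixture idea can be illustrated with a small simulation: each subject gets its own Poisson rate drawn from a Gamma distribution, and the resulting counts show variance close to mu + alpha * mu^2. This is a sketch under assumed values (mu = 2, alpha = 1) using only the standard library. The Poisson sampler uses Knuth's simple method, which is adequate for small rates but not meant for production use.

```python
import math
import random
from statistics import mean, variance

def poisson_draw(rng, lam):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def negbin_draw(rng, mu, alpha):
    """Draw one count from a Poisson-Gamma mixture with mean mu and dispersion alpha."""
    # Gamma with shape 1/alpha and scale alpha*mu has mean mu and variance alpha*mu**2.
    lam = rng.gammavariate(1.0 / alpha, alpha * mu)
    return poisson_draw(rng, lam)

rng = random.Random(42)   # fixed seed so the sketch is repeatable
mu, alpha = 2.0, 1.0      # hypothetical values
draws = [negbin_draw(rng, mu, alpha) for _ in range(20000)]

print(f"sample mean     = {mean(draws):.2f} (target {mu:.2f})")
print(f"sample variance = {variance(draws):.2f} (target {mu + alpha * mu**2:.2f})")
```

The Gamma-distributed rate stands in for unobserved heterogeneity: subjects differ in their underlying event rate in ways the predictors do not capture.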

When to use Negative Binomial Regression?

You should use Negative Binomial Regression when all of the following hold:

Your outcome is a count of events
The variance is larger than the mean (overdispersion)
You want to model which factors affect the event rate
Observations are independent

If the variance approximately equals the mean, use Poisson Regression for more efficient estimates. If your outcome is continuous, use Linear Regression. If binary, use Logistic Regression.

Negative Binomial Regression Example

Outcome: Number of app crashes per user per week. Predictors: Device type, OS version, number of installed plugins.

Crash counts are highly overdispersed: most users experience zero or one crash, but some experience many. The variance (25.3) far exceeds the mean (2.1).
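As a rough check on these numbers, a method-of-moments calculation gives the variance-to-mean ratio and the dispersion implied by an NB2-style variance function, Var(Y) = \mu + \alpha \mu^2 (an assumed parameterization). This is only a sketch: it uses the marginal mean and variance and ignores the predictors, so the dispersion estimated by a fitted model will generally differ.

```latex
\frac{s^2}{\bar{y}} = \frac{25.3}{2.1} \approx 12.0, \qquad
\hat{\alpha} = \frac{s^2 - \bar{y}}{\bar{y}^2}
             = \frac{25.3 - 2.1}{2.1^2}
             = \frac{23.2}{4.41} \approx 5.26
```

A ratio of about 12 is far from the value of 1 that Poisson assumes.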

A Poisson model would underestimate the standard errors, so effects can appear significant when they are not. The Negative Binomial model accounts for the extra variability. After fitting, with plugin count coded as a binary indicator (more than 5 versus 0-5), we find an IRR of 2.8 (p < 0.001): users with more than 5 plugins experience crashes at 2.8 times the rate of users with 0-5 plugins, holding device type and OS version constant.

Frequently Asked Questions

How do I know if I should use Negative Binomial instead of Poisson?

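Fit a Poisson model first and compare its residual deviance to its residual degrees of freedom. If the ratio is substantially greater than 1 (a common rule of thumb is > 1.5), your data is overdispersed and Negative Binomial is the better choice. You can also run a likelihood ratio test comparing the two models. Because the Poisson model corresponds to a dispersion of zero, which lies on the boundary of the parameter space, the usual chi-square p-value from this test is conservative and is commonly halved.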
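Both checks can be sketched with the standard library once you have fit statistics from your software. The function names and all numeric inputs below are hypothetical. The test halves the chi-square (1 df) tail probability because a dispersion of zero lies on the boundary of the parameter space, which is a common convention rather than something this article prescribes.

```python
import math

def dispersion_ratio(residual_deviance, df_residual):
    """Residual deviance per residual degree of freedom from a fitted Poisson model."""
    return residual_deviance / df_residual

def boundary_lrt_pvalue(loglik_poisson, loglik_negbin):
    """Likelihood ratio test of Poisson against Negative Binomial.

    The chi-square(1) tail probability is halved because the Poisson model
    (dispersion = 0) sits on the boundary of the parameter space.
    """
    lr = 2.0 * (loglik_negbin - loglik_poisson)
    if lr <= 0:
        return 1.0  # no improvement in fit, so no evidence of overdispersion
    chi2_tail = math.erfc(math.sqrt(lr / 2.0))  # P(chi-square with 1 df > lr)
    return 0.5 * chi2_tail

# Hypothetical fit statistics, for illustration only.
ratio = dispersion_ratio(residual_deviance=1840.0, df_residual=496)
p_value = boundary_lrt_pvalue(loglik_poisson=-1510.2, loglik_negbin=-1102.7)

print(f"deviance / df = {ratio:.2f}, LRT p-value = {p_value:.3g}")
```

A ratio well above 1.5 together with a small p-value would both point toward the Negative Binomial model.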

What causes overdispersion?

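Common causes include unobserved heterogeneity (important predictors are missing), clustering (observations are not truly independent), and excess zeros (more zero counts than expected). Negative Binomial regression handles unobserved heterogeneity well. It can absorb some of the extra variance that clustering produces, but clustered data still violates the independence assumption and is better served by mixed-effects or GEE extensions. Excess zeros may need a zero-inflated model.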

Is Negative Binomial regression always better than Poisson?

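Not necessarily. If the data is truly equidispersed (variance equals mean), Poisson regression is more efficient (tighter confidence intervals). The Negative Binomial model reduces to Poisson as the dispersion parameter approaches zero, so it is a safe default in practice, but still check whether the estimated dispersion is meaningfully different from zero. Some software reports the inverse of this parameter (for example, theta in R's glm.nb), in which case Poisson corresponds to a very large value.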
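The limiting behaviour can be checked numerically. This sketch evaluates the Negative Binomial probability mass function under an assumed NB2 parameterization (mean mu, dispersion alpha) and compares it with the Poisson one as alpha shrinks. The function names and the value of mu are hypothetical.

```python
import math

def poisson_pmf(y, mu):
    """Poisson probability of observing count y with mean mu."""
    return math.exp(y * math.log(mu) - mu - math.lgamma(y + 1))

def negbin_pmf(y, mu, alpha):
    """NB2 probability of count y with mean mu and dispersion alpha > 0."""
    r = 1.0 / alpha  # "size" parameter, the inverse of the dispersion
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu))
             + y * math.log(mu / (r + mu)))
    return math.exp(log_p)

mu = 2.0  # hypothetical mean
for alpha in (1.0, 0.1, 1e-6):
    gap = max(abs(negbin_pmf(y, mu, alpha) - poisson_pmf(y, mu)) for y in range(20))
    print(f"alpha = {alpha:g}: largest pmf gap vs. Poisson = {gap:.2e}")
```

The gap shrinks toward zero as alpha decreases, which is why fitting Negative Binomial to equidispersed data costs some efficiency but does not bias the rate estimates.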

Key Takeaway

Negative Binomial regression extends Poisson regression by adding a dispersion parameter that accommodates variance larger than the mean. Use it as your default for count data when there is any suspicion of overdispersion. It produces the same interpretable incidence rate ratios as Poisson but with properly calibrated uncertainty estimates.
