Poisson Regression

Poisson Regression models count data and event rates. Use it when your outcome is a count of events (e.g., clicks, errors, purchases) and you want to understand which factors affect the rate.

Jan 29 | 3 min read | Prediction, Count Data, Regression

Quick Hits

• Models count outcomes: the number of events per unit of time, exposure, or observation
• Assumes the conditional mean equals the conditional variance (equidispersion)
• Uses a log link: each coefficient is the log of the rate ratio for a one-unit increase in its predictor
• Exponentiated coefficients give incidence rate ratios (IRR)
• If the variance is much larger than the mean (overdispersion), use Negative Binomial Regression instead

The StatsTest Flow: Relationship or Prediction >> Prediction >> Count data outcome >> No overdispersion

What is Poisson Regression?

Poisson Regression is a generalized linear model (GLM) used to model count data and event rates. The outcome variable is a non-negative integer representing the number of times an event occurred (e.g., number of support tickets, number of purchases, number of errors).

The model uses a log link function, meaning it models the natural logarithm of the expected count as a linear function of the predictors. This ensures that predicted counts are always non-negative.
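
Written out, the log link looks like the sketch below. The symbols are assumptions introduced here for illustration and do not appear in the original text: $Y_i$ is the observed count for observation $i$, $\mu_i$ is its expected count, $x_{i1}, \dots, x_{ik}$ are the predictors, and $\beta_0, \dots, \beta_k$ are the coefficients.

$$
Y_i \sim \text{Poisson}(\mu_i), \qquad \log(\mu_i) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}
$$

$$
\mu_i = \exp(\beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}) > 0
$$

Because the exponential function is always positive, the expected count cannot be negative, whatever values the coefficients take. Increasing $x_{ij}$ by one unit multiplies $\mu_i$ by $\exp(\beta_j)$.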

Poisson Regression is also called the Poisson Log-Linear Model, Log-Linear Regression (for count data), or the Poisson GLM.

Assumptions for Poisson Regression

The assumptions for Poisson Regression include:

• Count Outcome
• Equidispersion
• Independence
• Log-Linear Relationship
• No Excess Zeros (or use zero-inflated variant)

Count Outcome

The dependent variable must be a non-negative integer: 0, 1, 2, 3, and so on. Continuous outcomes, proportions, and binary outcomes require different models.

If your outcome is continuous, use Simple Linear Regression or Multiple Linear Regression. If your outcome is binary, use Logistic Regression.

Equidispersion

The Poisson distribution has a variance equal to its mean, so the model assumes that, conditional on the predictors, the variance of the outcome equals its expected value. In practice, many count datasets are overdispersed (variance much greater than the mean). Check this by dividing the residual deviance (or the Pearson chi-square statistic) by the residual degrees of freedom. A ratio substantially greater than 1 indicates overdispersion.

If the data are overdispersed, use Negative Binomial Regression instead.
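
The dispersion check can be sketched in a few lines of plain Python. This is a minimal illustration, not a replacement for the diagnostics a statistics package reports: the function name `poisson_dispersion` and the toy counts are hypothetical, and the fitted means come from an intercept-only model (every fitted value is the sample mean) to keep the example self-contained.

```python
import math

def poisson_dispersion(observed, fitted, n_params):
    """Return (deviance / df, Pearson chi-square / df) for a Poisson fit.

    observed: list of counts; fitted: list of fitted means (all > 0);
    n_params: number of estimated coefficients, including the intercept.
    """
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    df_resid = len(observed) - n_params
    if df_resid <= 0:
        raise ValueError("need more observations than parameters")
    deviance = 0.0
    pearson = 0.0
    for y, mu in zip(observed, fitted):
        # y * log(y / mu) is taken to be 0 when y == 0
        log_term = y * math.log(y / mu) if y > 0 else 0.0
        deviance += 2.0 * (log_term - (y - mu))
        pearson += (y - mu) ** 2 / mu
    return deviance / df_resid, pearson / df_resid

# Hypothetical counts with one unusually large value
counts = [0, 1, 1, 2, 2, 2, 3, 3, 4, 12]
sample_mean = sum(counts) / len(counts)
fitted = [sample_mean] * len(counts)  # intercept-only model

deviance_ratio, pearson_ratio = poisson_dispersion(counts, fitted, n_params=1)
print(f"deviance / df = {deviance_ratio:.2f}")
print(f"Pearson / df  = {pearson_ratio:.2f}")
```

Both ratios come out well above 1 for these toy counts, which is the pattern that would point toward Negative Binomial Regression.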

Independence

Observations must be independent. If counts come from the same subjects measured repeatedly, use a mixed-effects Poisson model or generalized estimating equations (GEE) instead.

Log-Linear Relationship

The relationship between predictors and the log of the expected count should be approximately linear. Check residual plots for systematic patterns.

No Excess Zeros

If your data has far more zeros than a Poisson distribution would predict (many subjects with zero events), consider a zero-inflated Poisson model, which separately models the probability of being a "structural zero" versus having a count from the Poisson process.
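
One rough way to look for excess zeros is to compare the observed share of zeros with the share a Poisson model would expect, using the fact that a Poisson variable with mean mu equals zero with probability exp(-mu). The sketch below assumes hypothetical toy counts and intercept-only fitted means; the function name `zero_check` is made up for illustration.

```python
import math

def zero_check(observed, fitted):
    """Return (observed share of zeros, Poisson-expected share of zeros)."""
    n = len(observed)
    observed_share = sum(1 for y in observed if y == 0) / n
    # Average of P(Y = 0 | mu) = exp(-mu) over the fitted means
    expected_share = sum(math.exp(-mu) for mu in fitted) / n
    return observed_share, expected_share

# Hypothetical counts: six of ten subjects had no events
counts = [0, 0, 0, 0, 0, 0, 1, 2, 3, 4]
sample_mean = sum(counts) / len(counts)
fitted = [sample_mean] * len(counts)  # intercept-only model

observed_share, expected_share = zero_check(counts, fitted)
print(f"observed zeros: {observed_share:.2f}")
print(f"expected zeros: {expected_share:.2f}")
```

A large gap between the two shares is a hint, not a formal test, that a zero-inflated model may be worth considering.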

When to use Poisson Regression?

You should use Poisson Regression when all of the following hold:

• Your outcome is a count of events (0, 1, 2, 3...)
• You want to know which factors affect the event rate
• The variance is approximately equal to the mean
• Observations are independent

Count Data

Typical examples include: number of support tickets per user per week, number of crashes per deployment, number of purchases per customer per month, number of defects per manufacturing batch.

Rate Modeling

If subjects have different exposure times (e.g., users active for different durations), include the log of the exposure time as an offset so that the model describes rates rather than raw counts.

If variance is much greater than the mean, use Negative Binomial Regression. If your outcome is binary, use Logistic Regression.

Poisson Regression Example

Outcome: Number of customer support tickets filed per user per month. Predictors: Account age, plan tier, number of active integrations.

We model the number of support tickets as a function of user characteristics. The Poisson model estimates coefficients on the log scale. Exponentiating a coefficient gives the incidence rate ratio (IRR).
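
To make the fitting step concrete, here is a minimal sketch that estimates a Poisson model with an intercept and a single predictor by Newton's method, using only the standard library. Everything in it is an assumption for illustration: the function name `fit_poisson_one_predictor`, the toy ticket counts, and the restriction to one binary predictor (premium plan or not). A real analysis with several predictors would normally use a statistics package.

```python
import math

def fit_poisson_one_predictor(x, y, max_iter=25, tol=1e-10):
    """Maximum-likelihood fit of log(mu) = b0 + b1 * x by Newton's method."""
    b0 = math.log(sum(y) / len(y))  # start at the log of the overall mean
    b1 = 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Gradient of the log-likelihood
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Negative Hessian (information matrix), a symmetric 2x2 matrix
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system for the Newton step
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) < tol and abs(d1) < tol:
            break
    return b0, b1

# Hypothetical data: 1 = premium plan, 0 = free tier
premium = [0, 0, 0, 0, 1, 1, 1, 1]
tickets = [4, 5, 3, 4, 2, 3, 3, 2]

b0, b1 = fit_poisson_one_predictor(premium, tickets)
print(f"baseline rate (free tier): {math.exp(b0):.3f}")
print(f"IRR for premium plan:      {math.exp(b1):.3f}")
```

With a single binary predictor the fitted rates match the group means, so the exponentiated slope is simply the ratio of the premium mean to the free-tier mean in the toy data.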

For example, if the IRR for "premium plan" is 0.65, premium users file 35% fewer support tickets per month than free-tier users, controlling for account age and integrations. A $p$-value $\le 0.05$ for this coefficient means the effect is statistically significant at the conventional 5% level.
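
The arithmetic behind that interpretation can be sketched as follows. The coefficient is chosen so that its exponential equals the IRR of 0.65 from the example; the standard error of 0.10 and the function name `irr_summary` are hypothetical, included only to show how a Wald-style 95% interval is carried from the log scale to the IRR scale.

```python
import math

def irr_summary(coef, std_err, z=1.96):
    """Convert a log-scale coefficient into an IRR, an interval, and a percent change."""
    irr = math.exp(coef)
    lower = math.exp(coef - z * std_err)
    upper = math.exp(coef + z * std_err)
    percent_change = (irr - 1.0) * 100.0
    return irr, lower, upper, percent_change

coef = math.log(0.65)  # log-scale coefficient matching an IRR of 0.65
std_err = 0.10         # hypothetical standard error, for illustration only

irr, lower, upper, percent_change = irr_summary(coef, std_err)
print(f"IRR = {irr:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
print(f"percent change in rate: {percent_change:.0f}%")
```

An interval that lies entirely below 1, as it does under these assumed numbers, corresponds to a rate reduction that is distinguishable from no effect.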

Frequently Asked Questions

When should I use Poisson regression instead of linear regression?

Use Poisson regression when your outcome is a non-negative integer count (0, 1, 2, 3...). Linear regression can predict negative counts, which are impossible, and it assumes constant variance, whereas the variance of count data typically grows with the mean. Poisson regression keeps predicted means non-negative and accounts for the discrete, skewed nature of count data.

What is overdispersion and why does it matter?

Overdispersion occurs when the variance of your count data is much larger than the mean. Poisson regression assumes they are equal. If the data are overdispersed, the Poisson standard errors will be too small, p-values too optimistic, and confidence intervals too narrow. Use Negative Binomial Regression or a quasi-Poisson model instead.

Can I include an offset or exposure variable?

Yes. If observation periods differ across subjects (e.g., one user was active for 7 days, another for 30 days), include the log of the exposure time as an offset. This converts the model from predicting counts to predicting rates.
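
A short derivation shows why the offset turns a count model into a rate model. The symbols are assumptions introduced here for illustration: $\mu_i$ is the expected count for subject $i$, $t_i$ is that subject's exposure time, $x_{ij}$ are the predictors, and $\beta_j$ are the coefficients. The offset term $\log(t_i)$ enters with its coefficient fixed at 1 rather than estimated.

$$
\log(\mu_i) = \log(t_i) + \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}
$$

$$
\log\!\left(\frac{\mu_i}{t_i}\right) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}
$$

Moving $\log(t_i)$ to the left-hand side shows that the linear predictor now describes the log of the expected count per unit of exposure, so each $\exp(\beta_j)$ is a ratio of rates rather than of raw counts.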

Key Takeaway

Poisson regression is the standard model for count data where you want to understand which factors drive event frequency. It works best when counts are not overdispersed (variance roughly equals mean). For overdispersed data, switch to Negative Binomial Regression.