Regression vs. t-Test vs. ANOVA: The Unifying View (and When the Simpler Tool Suffices)

Understand how t-tests, ANOVA, and regression are all the same underlying model. Learn when to use the simpler approach and when regression's flexibility is worth it.

Quick Hits

  • T-tests, ANOVA, and regression are all special cases of the general linear model
  • A two-sample t-test is regression with one binary predictor
  • One-way ANOVA is regression with one categorical predictor (dummy coded)
  • Use the simpler tool when it suffices - it's more interpretable
  • Use regression when you need: continuous predictors, multiple covariates, or interactions

TL;DR

T-tests, ANOVA, and linear regression are all special cases of the general linear model. A two-sample t-test is regression with one binary predictor. One-way ANOVA is regression with one categorical predictor. Understanding this unification helps you see that the "choice" between them is about presentation, not statistics. Use the simpler tool when it fits your problem; switch to regression when you need continuous predictors, covariates, or complex comparisons.


The Big Picture

All these tests fit the same underlying model:

$$Y = X\beta + \epsilon$$

Where:

  • Y is your outcome
  • X is your design matrix (encodes groups/predictors)
  • β is the vector of coefficients (group means or effects)
  • ε is the error term (assumed independent, normally distributed, with constant variance)

The "choice" between tests is really about:

  1. How you construct X (dummy coding, effect coding, etc.)
  2. How you report results (means vs. coefficients, F vs. t)
  3. How interpretable the output is for your audience
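To make the role of the design matrix concrete, here is a minimal sketch that builds a dummy-coded X by hand for three groups and solves the least-squares problem directly with NumPy. The toy data, group labels, and variable names are assumptions made up for illustration; the point is only that the fitted β reproduces the reference-group mean and the differences from it.

```python
import numpy as np

# Hypothetical toy data: three groups of four observations each
rng = np.random.default_rng(0)
groups = np.repeat(["A", "B", "C"], 4)
y = np.concatenate([
    rng.normal(10, 1, 4),
    rng.normal(12, 1, 4),
    rng.normal(11, 1, 4),
])

# Dummy-coded design matrix: intercept column plus indicators for B and C
X = np.column_stack([
    np.ones(len(y)),
    (groups == "B").astype(float),
    (groups == "C").astype(float),
])

# Least-squares solution of Y = X beta + eps
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

mean_a = y[groups == "A"].mean()
mean_b = y[groups == "B"].mean()
mean_c = y[groups == "C"].mean()

print("beta:", beta)
print("A mean, B - A, C - A:", mean_a, mean_b - mean_a, mean_c - mean_a)
```

Switching to a different coding scheme (for example effect coding) would change the columns of X and the meaning of each coefficient, but not the fitted values.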

One-Sample t-Test = Regression with Intercept Only

The t-Test

Test whether the mean differs from a hypothesized value (usually 0): $$H_0: \mu = 0$$

The Regression

$$Y = \beta_0 + \epsilon$$

β₀ is the mean of Y. Testing β₀ = 0 is identical to the one-sample t-test.
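When the null value is not zero, the same equivalence should hold after subtracting that value from Y before fitting the intercept-only model. The sketch below assumes a made-up null value `mu0` and simulated data, purely for illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
y = rng.normal(5, 2, 100)
mu0 = 4.5  # hypothetical null value, chosen only for illustration

# One-sample t-test against mu0
t_stat, p_value = stats.ttest_1samp(y, mu0)

# Intercept-only regression on the shifted outcome (y - mu0)
shifted = pd.DataFrame({"y_shifted": y - mu0})
model = smf.ols("y_shifted ~ 1", data=shifted).fit()
t_reg = model.tvalues["Intercept"]
p_reg = model.pvalues["Intercept"]

print(f"T-test:     t = {t_stat:.4f}, p = {p_value:.4f}")
print(f"Regression: t = {t_reg:.4f}, p = {p_reg:.4f}")
```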

Demonstration

import numpy as np
from scipy import stats
import statsmodels.formula.api as smf
import pandas as pd

np.random.seed(42)
y = np.random.normal(5, 2, 100)

# One-sample t-test
t_stat, p_value = stats.ttest_1samp(y, 0)
print(f"T-test: t = {t_stat:.4f}, p = {p_value:.4f}")

# Regression (intercept only)
data = pd.DataFrame({'y': y})
model = smf.ols('y ~ 1', data=data).fit()
print(f"Regression: t = {model.tvalues['Intercept']:.4f}, p = {model.pvalues['Intercept']:.4f}")
print(f"Coefficient (mean) = {model.params['Intercept']:.4f}, Sample mean = {y.mean():.4f}")

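Output (exact figures depend on the random draw; what matters is that the two t-statistics and p-values agree, and that the intercept equals the sample mean):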

T-test: t = 24.8503, p = 0.0000
Regression: t = 24.8503, p = 0.0000
Coefficient (mean) = 4.9397, Sample mean = 4.9397

Two-Sample t-Test = Regression with Binary Predictor

The t-Test

Compare means of two groups: $$H_0: \mu_1 = \mu_2$$

The Regression

$$Y = \beta_0 + \beta_1 \cdot \text{Group} + \epsilon$$

Where Group = 0 for control, 1 for treatment.

Interpretation:

  • β₀ = mean of control group (when Group = 0)
  • β₁ = difference in means (treatment - control)
  • Testing β₁ = 0 is identical to the two-sample t-test

Demonstration

# Generate data
np.random.seed(42)
control = np.random.normal(10, 3, 50)
treatment = np.random.normal(12, 3, 50)

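# Two-sample t-test (equal variance); treatment is listed first so the
# sign of t matches the regression coefficient (treatment - control)
t_stat, p_value = stats.ttest_ind(treatment, control)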
print(f"T-test: t = {t_stat:.4f}, p = {p_value:.4f}")

# Regression
data = pd.DataFrame({
    'y': np.concatenate([control, treatment]),
    'group': [0]*50 + [1]*50
})
model = smf.ols('y ~ group', data=data).fit()
print(f"Regression: t = {model.tvalues['group']:.4f}, p = {model.pvalues['group']:.4f}")
print(f"\nControl mean: {control.mean():.4f}")
print(f"Intercept (β₀): {model.params['Intercept']:.4f}")
print(f"Difference: {treatment.mean() - control.mean():.4f}")
print(f"Group coefficient (β₁): {model.params['group']:.4f}")

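Note: The equal-variance (Student's) t-test matches OLS regression exactly. Welch's t-test (unequal variance) does not. Heteroscedasticity-robust (HC2) standard errors in regression reproduce Welch's standard error, but the reference distribution and degrees of freedom differ, so the p-values agree only approximately.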
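The link between Welch's test and robust standard errors can be checked numerically. The sketch below assumes the HC2 variant of the robust covariance (requested through the `cov_type` argument of `fit`) and uses simulated groups with unequal variances and sizes. Treat it as an illustration rather than a general recipe: the standard error and t-statistic line up, while the p-values can differ because Welch's test uses Satterthwaite degrees of freedom.

```python
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
control = rng.normal(10, 2, 40)    # smaller variance
treatment = rng.normal(12, 5, 60)  # larger variance, different n

# Welch's t-test and its standard error, computed by hand
welch = stats.ttest_ind(treatment, control, equal_var=False)
welch_se = np.sqrt(treatment.var(ddof=1) / len(treatment)
                   + control.var(ddof=1) / len(control))

# Same comparison as a regression with HC2 robust standard errors
data = pd.DataFrame({
    "y": np.concatenate([control, treatment]),
    "group": [0] * len(control) + [1] * len(treatment),
})
robust = smf.ols("y ~ group", data=data).fit(cov_type="HC2")

print(f"Welch SE = {welch_se:.4f}, HC2 SE = {robust.bse['group']:.4f}")
print(f"Welch t  = {welch.statistic:.4f}, HC2 t  = {robust.tvalues['group']:.4f}")
print(f"Welch p  = {welch.pvalue:.6f}, HC2 p  = {robust.pvalues['group']:.6f}")
```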


One-Way ANOVA = Regression with Categorical Predictor

ANOVA

Compare means across k groups: $$H_0: \mu_1 = \mu_2 = ... = \mu_k$$

Regression with Dummy Variables

For k groups, create k-1 dummy variables:

$$Y = \beta_0 + \beta_1 D_1 + \beta_2 D_2 + ... + \beta_{k-1} D_{k-1} + \epsilon$$

Interpretation:

  • β₀ = mean of reference group
  • βⱼ = difference between group j and reference group

The F-Test Connection

ANOVA reports an F-statistic testing whether all group means are equal. In regression:

  • The overall F-test for the model tests the same hypothesis
  • The individual t-tests for the dummy coefficients test each group's difference from the reference group
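To see why the two F-statistics coincide, the sketch below computes the classic between-group and within-group sums of squares by hand and compares the resulting F with the overall F reported by the dummy-coded regression. The simulated groups and names are assumptions for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
samples = {
    "A": rng.normal(10, 2, 40),
    "B": rng.normal(12, 2, 40),
    "C": rng.normal(11, 2, 40),
}

y = np.concatenate(list(samples.values()))
k, n = len(samples), len(y)
grand_mean = y.mean()

# ANOVA decomposition: between-group and within-group sums of squares
ss_between = sum(len(v) * (v.mean() - grand_mean) ** 2 for v in samples.values())
ss_within = sum(((v - v.mean()) ** 2).sum() for v in samples.values())
f_manual = (ss_between / (k - 1)) / (ss_within / (n - k))

# The same F from the dummy-coded regression
data = pd.DataFrame({
    "y": y,
    "group": np.repeat(list(samples.keys()), [len(v) for v in samples.values()]),
})
model = smf.ols("y ~ C(group)", data=data).fit()

print(f"Manual ANOVA F = {f_manual:.4f}")
print(f"Regression F   = {model.fvalue:.4f}")
```

In regression terms, the between-group sum of squares is the model (explained) sum of squares and the within-group sum of squares is the residual sum of squares.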

Demonstration

# Three groups
np.random.seed(42)
group_a = np.random.normal(10, 2, 40)
group_b = np.random.normal(12, 2, 40)
group_c = np.random.normal(11, 2, 40)

# One-way ANOVA
f_stat, p_value = stats.f_oneway(group_a, group_b, group_c)
print(f"ANOVA: F = {f_stat:.4f}, p = {p_value:.4f}")

# Regression with dummies
data = pd.DataFrame({
    'y': np.concatenate([group_a, group_b, group_c]),
    'group': ['A']*40 + ['B']*40 + ['C']*40
})
model = smf.ols('y ~ C(group)', data=data).fit()
print(f"Regression F-test: F = {model.fvalue:.4f}, p = {model.f_pvalue:.4f}")

print("\nGroup means:")
print(f"  A: {group_a.mean():.4f}")
print(f"  B: {group_b.mean():.4f}")
print(f"  C: {group_c.mean():.4f}")

print("\nRegression coefficients:")
print(f"  Intercept (Group A mean): {model.params['Intercept']:.4f}")
print(f"  B vs A: {model.params['C(group)[T.B]']:.4f}")
print(f"  C vs A: {model.params['C(group)[T.C]']:.4f}")

Two-Way ANOVA = Regression with Two Categorical Predictors

Two-Way ANOVA

Tests:

  • Main effect of Factor A
  • Main effect of Factor B
  • A × B Interaction

Regression Equivalent

$$Y = \beta_0 + \beta_1 A + \beta_2 B + \beta_3 (A \times B) + \epsilon$$

With appropriate dummy coding for categorical variables.

# Regression for two-way ANOVA
# Assumes `data` has columns y, factor_a, and factor_b
import statsmodels.api as sm
import statsmodels.formula.api as smf

model = smf.ols('y ~ C(factor_a) * C(factor_b)', data=data).fit()

# ANOVA table from regression
anova_table = sm.stats.anova_lm(model, typ=2)  # Type II SS
print(anova_table)

Paired t-Test = Regression on Differences

Paired t-Test

Compare paired observations (before/after, matched pairs): $$H_0: \mu_{diff} = 0$$

Regression Equivalent

Create difference variable, then one-sample test:

$$D = Y_{after} - Y_{before}$$ $$D = \beta_0 + \epsilon$$

Testing β₀ = 0 is the paired t-test.

Alternative: Repeated Measures Regression

# Mixed effects model approach (random intercept per subject)
import statsmodels.formula.api as smf

# Assumes long-format data: one row per subject per time point,
# with columns y, time, and subject_id
model = smf.mixedlm('y ~ time', data=data, groups=data['subject_id']).fit()
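The long-format layout is easier to see with data in hand. The sketch below simulates before/after measurements, reshapes them to one row per subject per time point, and fits a random-intercept model. The column names and simulated values are assumptions. With balanced two-timepoint data, the estimated time effect should equal the mean paired difference. Note that the mixed-model output reports a z-statistic rather than a t-statistic, so its p-value is close to, but not identical to, the paired t-test's.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 30
before = rng.normal(100, 15, n)
after = before + rng.normal(5, 10, n)

# Long format: one row per subject per time point (time: 0 = before, 1 = after)
long_data = pd.DataFrame({
    "subject_id": np.tile(np.arange(n), 2),
    "time": np.repeat([0, 1], n),
    "y": np.concatenate([before, after]),
})

# Random intercept for each subject, fixed effect for time
mixed = smf.mixedlm("y ~ time", data=long_data,
                    groups=long_data["subject_id"]).fit()

mean_diff = (after - before).mean()
print(f"Mixed-model time effect: {mixed.params['time']:.4f}")
print(f"Mean paired difference:  {mean_diff:.4f}")
```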

The Equivalence Table

| Simple Test | Regression Equivalent |
| --- | --- |
| One-sample t-test | Y ~ 1 (intercept only) |
| Two-sample t-test | Y ~ group (binary) |
| Paired t-test | Y_diff ~ 1 or mixed model |
| One-way ANOVA | Y ~ factor (dummy coded) |
| Two-way ANOVA | Y ~ A * B |
| ANCOVA | Y ~ factor + covariate |
| Correlation test | Y ~ X (standardized) |
| Simple regression | Y ~ X |
| Multiple regression | Y ~ X1 + X2 + ... |
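The correlation row is the one not demonstrated elsewhere in this article, so here is a minimal sketch of it. It assumes simulated data and z-score standardization of both variables. Under those assumptions the regression slope should equal Pearson's r, and the slope's p-value should match the correlation test's.

```python
import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
x = rng.normal(0, 1, 80)
y = 0.4 * x + rng.normal(0, 1, 80)

# Pearson correlation test
r, p_corr = stats.pearsonr(x, y)

# Standardize both variables (z-scores), then regress one on the other
data = pd.DataFrame({
    "x_std": (x - x.mean()) / x.std(ddof=1),
    "y_std": (y - y.mean()) / y.std(ddof=1),
})
model = smf.ols("y_std ~ x_std", data=data).fit()

print(f"Pearson r = {r:.4f}, p = {p_corr:.6f}")
print(f"Slope     = {model.params['x_std']:.4f}, p = {model.pvalues['x_std']:.6f}")
```

Standardization only affects the slope's scale. The p-value for the slope is the same with or without it.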

Code: Demonstrating Equivalence

Python

import numpy as np
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf
import statsmodels.api as sm


def demonstrate_equivalence():
    """Show that t-tests, ANOVA, and regression give same results."""

    np.random.seed(42)

    # Two-sample case
    print("=" * 60)
    print("TWO-SAMPLE T-TEST vs REGRESSION")
    print("=" * 60)

    control = np.random.normal(100, 15, 50)
    treatment = np.random.normal(110, 15, 50)

    # T-test
    t_result = stats.ttest_ind(control, treatment)
    print(f"\nTwo-sample t-test:")
    print(f"  t = {t_result.statistic:.4f}")
    print(f"  p = {t_result.pvalue:.4f}")

    # Regression
    data = pd.DataFrame({
        'y': np.concatenate([control, treatment]),
        'group': [0]*50 + [1]*50
    })
    model = smf.ols('y ~ group', data=data).fit()
    print(f"\nRegression:")
    print(f"  t = {-model.tvalues['group']:.4f}")  # Sign depends on coding
    print(f"  p = {model.pvalues['group']:.4f}")
    print(f"  Effect (coefficient) = {model.params['group']:.4f}")
    print(f"  Effect (mean diff) = {treatment.mean() - control.mean():.4f}")

    # Three-group case
    print("\n" + "=" * 60)
    print("ONE-WAY ANOVA vs REGRESSION")
    print("=" * 60)

    group_a = np.random.normal(100, 15, 40)
    group_b = np.random.normal(110, 15, 40)
    group_c = np.random.normal(105, 15, 40)

    # ANOVA
    f_result = stats.f_oneway(group_a, group_b, group_c)
    print(f"\nOne-way ANOVA:")
    print(f"  F = {f_result.statistic:.4f}")
    print(f"  p = {f_result.pvalue:.4f}")

    # Regression
    data = pd.DataFrame({
        'y': np.concatenate([group_a, group_b, group_c]),
        'group': ['A']*40 + ['B']*40 + ['C']*40
    })
    model = smf.ols('y ~ C(group)', data=data).fit()
    print(f"\nRegression:")
    print(f"  F = {model.fvalue:.4f}")
    print(f"  p = {model.f_pvalue:.4f}")

    # Paired case
    print("\n" + "=" * 60)
    print("PAIRED T-TEST vs REGRESSION ON DIFFERENCES")
    print("=" * 60)

    before = np.random.normal(100, 15, 30)
    after = before + np.random.normal(5, 10, 30)  # Correlated increase

    # Paired t-test (after listed first so the sign matches the
    # regression on diff = after - before)
    t_result = stats.ttest_rel(after, before)
    print(f"\nPaired t-test:")
    print(f"  t = {t_result.statistic:.4f}")
    print(f"  p = {t_result.pvalue:.4f}")

    # Regression on differences
    diff = after - before
    data = pd.DataFrame({'diff': diff})
    model = smf.ols('diff ~ 1', data=data).fit()
    print(f"\nRegression (on differences):")
    print(f"  t = {model.tvalues['Intercept']:.4f}")
    print(f"  p = {model.pvalues['Intercept']:.4f}")


if __name__ == "__main__":
    demonstrate_equivalence()

R

library(tidyverse)


demonstrate_equivalence <- function() {
    set.seed(42)

    # Two-sample case
    cat(strrep("=", 60), "\n")
    cat("TWO-SAMPLE T-TEST vs REGRESSION\n")
    cat(strrep("=", 60), "\n")

    control <- rnorm(50, 100, 15)
    treatment <- rnorm(50, 110, 15)

    # T-test (treatment first so the sign matches the group1 coefficient)
    t_result <- t.test(treatment, control, var.equal = TRUE)
    cat(sprintf("\nTwo-sample t-test:\n"))
    cat(sprintf("  t = %.4f\n", t_result$statistic))
    cat(sprintf("  p = %.4f\n", t_result$p.value))

    # Regression
    data <- tibble(
        y = c(control, treatment),
        group = factor(c(rep(0, 50), rep(1, 50)))
    )
    model <- lm(y ~ group, data = data)
    cat(sprintf("\nRegression:\n"))
    cat(sprintf("  t = %.4f\n", summary(model)$coefficients["group1", "t value"]))
    cat(sprintf("  p = %.4f\n", summary(model)$coefficients["group1", "Pr(>|t|)"]))

    # One-way ANOVA case
    cat("\n", strrep("=", 60), "\n")
    cat("ONE-WAY ANOVA vs REGRESSION\n")
    cat(strrep("=", 60), "\n")

    group_a <- rnorm(40, 100, 15)
    group_b <- rnorm(40, 110, 15)
    group_c <- rnorm(40, 105, 15)

    data <- tibble(
        y = c(group_a, group_b, group_c),
        group = factor(c(rep("A", 40), rep("B", 40), rep("C", 40)))
    )

    # ANOVA
    aov_result <- aov(y ~ group, data = data)
    cat(sprintf("\nOne-way ANOVA:\n"))
    cat(sprintf("  F = %.4f\n", summary(aov_result)[[1]]["group", "F value"]))
    cat(sprintf("  p = %.4f\n", summary(aov_result)[[1]]["group", "Pr(>F)"]))

    # Regression
    model <- lm(y ~ group, data = data)
    cat(sprintf("\nRegression:\n"))
    cat(sprintf("  F = %.4f\n", summary(model)$fstatistic[1]))
    cat(sprintf("  p = %.4f\n", pf(summary(model)$fstatistic[1],
                                   summary(model)$fstatistic[2],
                                   summary(model)$fstatistic[3],
                                   lower.tail = FALSE)))
}


demonstrate_equivalence()

When to Use Which

Use t-Test When

  • You have two groups
  • No covariates to control for
  • You want simple, recognizable output
  • Your audience thinks in terms of "comparing two means"

Use ANOVA When

  • You have multiple (3+) categorical groups
  • No continuous predictors
  • You want to decompose variance (between vs. within)
  • Your audience is familiar with ANOVA tables

Use Regression When

  • You have continuous predictors
  • You need to control for covariates
  • You want custom contrasts (not just vs. reference)
  • You have complex interactions
  • You need to include both categorical and continuous predictors
  • You want coefficient-based interpretation

Advantages of the Regression Framing

1. Flexibility

Regression handles any combination of:

  • Continuous predictors
  • Categorical predictors (with dummy coding)
  • Interactions
  • Covariates
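As an illustration of mixing a categorical predictor with a continuous covariate, the sketch below compares an unadjusted group comparison with an ANCOVA-style model. The covariate name `baseline` and all simulated effect sizes are assumptions. In this setup the covariate explains much of the outcome variance, so the standard error of the group effect shrinks once it is included.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
n = 200
group = np.repeat([0, 1], n // 2)
baseline = rng.normal(0, 1, n)  # hypothetical continuous covariate
y = 2.0 * group + 3.0 * baseline + rng.normal(0, 1, n)
data = pd.DataFrame({"y": y, "group": group, "baseline": baseline})

# Equivalent to a two-sample t-test
unadjusted = smf.ols("y ~ group", data=data).fit()

# ANCOVA: same group comparison, controlling for the covariate
adjusted = smf.ols("y ~ group + baseline", data=data).fit()

print(f"Unadjusted: effect = {unadjusted.params['group']:.3f}, "
      f"SE = {unadjusted.bse['group']:.3f}")
print(f"Adjusted:   effect = {adjusted.params['group']:.3f}, "
      f"SE = {adjusted.bse['group']:.3f}")
```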

2. Custom Contrasts

With regression, you can easily test specific comparisons:

# Test: Is the average of B and C different from A?
model = smf.ols('y ~ C(group, Treatment("A"))', data=data).fit()

# Custom contrast matrix
from patsy.contrasts import ContrastMatrix
# ... define specific contrasts
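One way to answer the question in the snippet above (is the average of B and C different from A?) is to test a linear combination of the fitted coefficients with the results object's `t_test` method. This is a sketch on simulated data, not the only approach. It assumes default treatment coding with A as the reference level, so the parameters are ordered as intercept, B vs A, C vs A.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(8)
group_a = rng.normal(10, 2, 40)
group_b = rng.normal(12, 2, 40)
group_c = rng.normal(11, 2, 40)

data = pd.DataFrame({
    "y": np.concatenate([group_a, group_b, group_c]),
    "group": ["A"] * 40 + ["B"] * 40 + ["C"] * 40,
})
model = smf.ols("y ~ C(group)", data=data).fit()  # A is the reference level

# Parameters are [Intercept (A mean), B - A, C - A], so
# (mean_B + mean_C) / 2 - mean_A = 0.5 * (B - A) + 0.5 * (C - A)
contrast = np.array([[0.0, 0.5, 0.5]])
result = model.t_test(contrast)

estimate = float(np.squeeze(result.effect))
p_contrast = float(np.squeeze(result.pvalue))
print(f"Contrast estimate = {estimate:.4f}, p = {p_contrast:.4f}")
```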

3. Extends to GLMs

The same framework extends to:

  • Logistic regression (binary outcomes)
  • Poisson regression (count outcomes)
  • Any generalized linear model
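The same "test as a model" logic carries over to binary outcomes. The sketch below assumes simulated data with a binary outcome and a binary group indicator. It fits a logistic regression and compares the group coefficient with the log odds ratio computed directly from the 2 x 2 table. The variable names and event probabilities are made up for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(9)
n = 200
group = np.repeat([0, 1], n)
prob = np.where(group == 1, 0.5, 0.3)  # hypothetical event probabilities
outcome = rng.binomial(1, prob)
data = pd.DataFrame({"outcome": outcome, "group": group})

# Logistic regression with one binary predictor
logit_model = smf.logit("outcome ~ group", data=data).fit(disp=0)

# Log odds ratio computed directly from the 2 x 2 table
table = pd.crosstab(data["group"], data["outcome"])
odds_control = table.loc[0, 1] / table.loc[0, 0]
odds_treated = table.loc[1, 1] / table.loc[1, 0]
log_odds_ratio = np.log(odds_treated / odds_control)

print(f"Logit coefficient = {logit_model.params['group']:.4f}")
print(f"Log odds ratio    = {log_odds_ratio:.4f}")
```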

4. Clearer About What You're Testing

Regression output shows exactly what comparison each coefficient represents, rather than hiding it in sum-of-squares decomposition.


Common Misconceptions

"Regression requires normality of X"

False. Regression assumes normality of residuals (errors), not predictors. You can use regression with any distribution of X.

"ANOVA is for experiments, regression is for observational data"

False. Both can be used for either. The model is the same; the interpretation differs based on study design.

"I need to check normality of my groups for ANOVA"

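Partially false. What matters is approximate normality of the residuals (that is, of the observations within each group around their group mean), not of the pooled outcome. With large samples this matters less because of the central limit theorem.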

"T-tests are less powerful than regression"

False. They're identical (same test). Regression might have more power if you include relevant covariates that reduce error variance.



Key Takeaway

T-tests, ANOVA, and regression are all the general linear model with different interfaces. Understanding this unification demystifies statistical testing: there's one underlying framework, and the "different tests" are just different ways of specifying and presenting it. Use the simplest tool that fits your problem (t-test for two groups, ANOVA for multiple groups with categorical predictors), but know that regression is there when you need its flexibility (continuous predictors, covariates, custom contrasts). For equivalent models, such as the equal-variance t-test and regression on a binary predictor, the p-values and statistical conclusions will be identical.


Frequently Asked Questions

Will I get different results from a t-test vs. regression?

No. A two-sample t-test (equal variance assumed) gives identical p-values and equivalent test statistics to regression with a binary predictor. The regression coefficient equals the difference in means. They're mathematically the same.

Why would I use ANOVA if it's just regression?

ANOVA notation and output are more intuitive when your predictors are purely categorical. You think in terms of group means and between/within variance rather than dummy variable coefficients. Use whichever framing is clearer for your audience.

When should I switch from ANOVA to regression?

When you have continuous predictors, want to control for covariates (ANCOVA → regression), have multiple factors with complex interactions, or want to extract specific contrasts easily. Regression is more flexible but requires more interpretation.

Key Takeaway

T-tests, ANOVA, and regression are all the general linear model with different packaging. Understanding this unification helps you see when the simpler tool suffices (categorical predictors, no covariates) and when regression's flexibility is needed (continuous predictors, multiple controls, custom contrasts).
