Log-Linear Analysis

Not sure this is the right statistical method? Use the Choose Your StatsTest workflow to select the right method.


What is Log-Linear Analysis?

Log-Linear Analysis is a statistical test used to determine whether the proportions of categories in two or more group variables differ significantly from each other. To use this test, you should have two or more group variables with two or more options in each group variable. It is most often used when there are three or more group variables. See more below.

Log-Linear Analysis is also called Multi-Way Frequency Tables, Log-Linear Analysis of Frequency Tables, or Log Linear Models.


Assumptions for Log-Linear Analysis

Every statistical method has assumptions. Assumptions are properties that your data must satisfy in order for the results of the statistical method to be accurate.

The assumptions for Log-Linear Analysis include:

  1. Random Sample
  2. Independence
  3. Mutually exclusive groups

Let’s dive into what that means.

Random Sample

The data points for each group in your analysis must have come from a simple random sample. This is important because if your data were not randomly sampled, your results may not reflect the population you want to draw conclusions about. In statistical terms this is called bias, or a tendency to have incorrect results because of bad data.

Independence

Each of your observations (data points) should be independent. This means that each value of your variables doesn’t “depend” on any of the others. For example, this assumption is usually violated when there are multiple data points over time from the same unit of observation (e.g. subject/customer/store), because the data points from the same unit of observation are likely to be related or affect one another.

Mutually Exclusive Groups

The groups within each of your categorical variables should be mutually exclusive. For example, if one of your categorical variables is hungry (yes/no), then its groups are mutually exclusive, because one person cannot belong to both groups at once.


When to use Log-Linear Analysis?

You should use Log-Linear Analysis in the following scenario:

  1. You want to test the difference between two or more variables
  2. Your variable of interest is proportional or categorical
  3. You have two or more options

Let’s clarify these to help you know when to use Log-Linear Analysis.

Difference

You are looking for a statistical test to look at how a variable differs across groups. Other types of analyses include testing for a relationship between two variables (relationship) or predicting one variable using another variable (prediction).

Proportional or Categorical

For this test, your variable of interest must be proportional or categorical. A categorical variable is a variable that contains categories without a natural order. Examples of categorical variables are eye color, city of residence, type of dog, etc. Proportional variables are derived from categorical variables, for instance: the number of people that converted on two different versions of your website (10% vs 15%), percentages, the number of people who voted vs people who did not vote, the proportion of plants that died vs survived an experimental treatment, etc.
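As a small illustration of how a proportional variable is derived from a categorical one, the sketch below turns raw yes/no outcomes into proportions. The visitor data and the helper name `proportion` are hypothetical, chosen only to mirror the 10% vs 15% website example above.

```python
from collections import Counter

# Hypothetical raw outcomes for two versions of a website (invented data).
version_a = ["converted"] * 10 + ["not converted"] * 90
version_b = ["converted"] * 15 + ["not converted"] * 85


def proportion(values, category):
    """Return the share of observations that fall in the given category."""
    counts = Counter(values)  # category -> number of observations
    return counts[category] / len(values)


rate_a = proportion(version_a, "converted")
rate_b = proportion(version_b, "converted")
print(f"Version A: {rate_a:.0%}, Version B: {rate_b:.0%}")
```

The categorical variable here is the outcome (converted / not converted); the proportions are simply summaries of its counts within each group.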

If you want to compare two or more continuous variables, you may want to use a One-Way ANOVA.

Two or more Options

Your categorical variables should have two or more possible options. Some examples of variables like this are made a purchase (yes/no), color (black/white/red/etc), recovered from disease (yes/no).


Log-Linear Analysis Example

Group Variable 1: Bird Size (large/small)
Group Variable 2: Bird Color (black/white/gray)
Group Variable 3: Bird Habitat (island/mainland)

In this example, we are interested in investigating whether there are significant relationships among our variables of bird size, color, and habitat. The null hypothesis is that there is no relationship among the variables.
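Before any test is run, the three group variables are usually tabulated into a multi-way frequency table, with one count per combination of size, color, and habitat. The following is a minimal sketch of that step; the bird records are invented purely for illustration.

```python
from collections import Counter

# Hypothetical observations: one (size, color, habitat) record per bird.
birds = [
    ("large", "black", "island"),
    ("large", "black", "island"),
    ("small", "white", "mainland"),
    ("small", "gray", "island"),
    ("large", "gray", "mainland"),
    ("small", "white", "mainland"),
    ("small", "black", "island"),
]

# Each bird falls into exactly one cell, so the cells are mutually exclusive.
table = Counter(birds)

for (size, color, habitat), n in sorted(table.items()):
    print(f"{size:<6} {color:<6} {habitat:<9} {n}")
```

Each row of the printed table is one cell of the three-way frequency table that a Log-Linear Analysis takes as input.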

Because each of our variables is categorical with two or more possible values (for example, large/small), and our data meet all other assumptions, we know that Log-Linear Analysis is appropriate to use.

The analysis will result in a probability or p-value for each interaction between variables. The p-value represents the chance of seeing results at least as extreme as ours if there were actually no relationship among the variables in question. A p-value less than or equal to 0.05 is conventionally treated as statistically significant, meaning the observed relationship is unlikely to be due to chance alone.
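To make the idea of a p-value for "no relationship among the variables" concrete, here is a rough sketch of the simplest log-linear model, the one that assumes size, color, and habitat are all mutually independent. It computes the expected count for each cell from the one-way totals, then the likelihood-ratio statistic G² and its p-value. The counts are invented for illustration, and the function names `chi2_sf` and `independence_test` are hypothetical. A full Log-Linear Analysis compares several nested models (one per interaction) and is normally run in statistical software rather than by hand.

```python
import math
from itertools import product

# Hypothetical counts, invented purely for illustration:
# keys are (size, color, habitat), values are numbers of birds observed.
counts = {
    ("large", "black", "island"): 18, ("large", "black", "mainland"): 12,
    ("large", "white", "island"): 9,  ("large", "white", "mainland"): 14,
    ("large", "gray", "island"): 11,  ("large", "gray", "mainland"): 16,
    ("small", "black", "island"): 7,  ("small", "black", "mainland"): 20,
    ("small", "white", "island"): 15, ("small", "white", "mainland"): 10,
    ("small", "gray", "island"): 13,  ("small", "gray", "mainland"): 15,
}


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution (integer df >= 1)."""
    z = x / 2
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)              # exact tail for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))   # exact tail for df = 1
    while a < df / 2:                         # step up two df at a time
        q += z ** a * math.exp(-z) / math.gamma(a + 1)
        a += 1
    return q


def independence_test(counts):
    """G-squared test of the mutual-independence log-linear model."""
    total = sum(counts.values())
    n_vars = len(next(iter(counts)))
    # One-way marginal totals for each variable.
    margins = [{} for _ in range(n_vars)]
    for cell, n in counts.items():
        for axis, level in enumerate(cell):
            margins[axis][level] = margins[axis].get(level, 0) + n
    g2, n_cells = 0.0, 0
    for cell in product(*(m.keys() for m in margins)):
        # Expected count if all variables were independent of each other.
        expected = total * math.prod(
            margins[axis][level] / total for axis, level in enumerate(cell)
        )
        observed = counts.get(cell, 0)
        if observed > 0:
            g2 += 2 * observed * math.log(observed / expected)
        n_cells += 1
    g2 = max(g2, 0.0)  # guard against tiny negative rounding error
    df = (n_cells - 1) - sum(len(m) - 1 for m in margins)
    return g2, df, chi2_sf(g2, df)


g2, df, p_value = independence_test(counts)
print(f"G^2 = {g2:.2f}, df = {df}, p = {p_value:.4f}")
```

A small p-value here would suggest that at least some relationship exists among the three variables; it does not say which interaction is responsible, which is what the model comparisons in a full analysis are for.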

Frequently Asked Questions

Q: How do I run a Log-Linear Analysis in R or SPSS?
A: StatsTest is focused on helping you pick the right statistical method every time. There are many resources available to help you figure out how to run this method with your data:
R article: https://data.library.virginia.edu/an-introduction-to-loglinear-models/
R video: https://www.youtube.com/watch?v=fwMOpntkCDQ
SPSS video: https://www.youtube.com/watch?v=-jOlF8lIUGg

Help!

If you still can’t figure something out, feel free to reach out.
