Not sure this is the right statistical method? Use the Choose Your StatsTest workflow to select the right method.
What is the Chi-Square Goodness Of Fit Test?
The Chi-Square Goodness Of Fit Test is a statistical test used to determine whether the proportions of categories in a single qualitative variable differ significantly from an expected or known population proportion. To use it, you should have one group variable with two or more options and more than 10 values per cell. See more below.
The Chi-Square Goodness Of Fit Test is also called the Goodness Of Fit Test, the Chi-Squared Test (not to be confused with the Chi-Square Test of Independence), and the Chi-Square Test of Goodness of Fit.
Assumptions for the Chi-Square Goodness Of Fit Test
Every statistical method has assumptions. Assumptions are properties that your data must satisfy in order for the results of the statistical method to be accurate.
The assumptions for the Chi-Square Goodness Of Fit Test include:
- Categorical variable
- Independent observations
- Mutually exclusive groups
Let’s dive into what that means.
Categorical Variable
For this test, your variable must be categorical with two or more categories. A categorical variable is a variable made up of categories without a natural order. Examples of categorical variables are eye color, city of residence, type of dog, etc.
Independent Observations
Each of your observations (data points) should be independent. This means that no value of your variable “depends” on any of the others. For example, this assumption is usually violated when there are multiple data points over time from the same unit of observation (e.g. subject/customer/store), because data points from the same unit of observation are likely to be related or to affect one another.
Mutually Exclusive Groups
The groups of your categorical variable should be mutually exclusive. For example, if your categorical variable is city of residence, then your groups are mutually exclusive, because one person cannot live in multiple cities at once.
When to use the Chi-Square Goodness Of Fit Test?
You should use the Chi-Square Goodness Of Fit Test in the following scenario:
- You want to know whether your sample differs from a known or expected population
- Your variable of interest is proportional or categorical
- Your variable has two or more options
- You have more than 10 in a cell
Let’s clarify these to help you know when to use the Chi-Square Goodness Of Fit Test.
Difference
You are looking for a statistical test that shows whether the proportions in your sample differ from a known or expected set of proportions. Other types of analyses include testing for a relationship between two variables or predicting one variable using another variable (prediction).
Proportional or Categorical
For this test, your variable of interest must be proportional or categorical. A categorical variable is a variable that contains categories without a natural order. Examples of categorical variables are eye color, city of residence, type of dog, etc. Proportional variables are derived from categorical variables, for instance: the share of people who converted on two different versions of your website (10% vs 15%), percentages, the proportion of people who voted vs those who did not vote, the proportion of plants that died vs survived an experimental treatment, etc.
If you have a continuous variable that you want to compare to an expected population, you may want to use a Single Sample Z-Test.
More than Two Options
Your categorical variable should have two or more options. Some examples of variables like this are eye color, city of residence, and type of dog.
If you have only two options and fewer than 10 in a cell, you should consider using the Binomial Exact Test of Goodness of Fit.
More than 10 in a Cell
The rule of thumb we recommend is to use this test when you have more than 10 observations in each cell. “Cell” in this case refers simply to the count of values in each group. For example, if I have a list of survey responses with 5 “yes” and 1 “no”, there are 5 and 1 value(s) per cell, respectively, which is too few for this test.
If you have fewer than 10 in a cell, we recommend using the Multinomial Exact Goodness of Fit Test. And if you have more than 10 in every cell and more than 1000 total observations, we recommend using the G-Test of Goodness of Fit.
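The cell-count rules of thumb above can be sketched in a few lines of Python. This is only an illustration, not part of StatsTest: the function name `suggest_goodness_of_fit_test` is hypothetical, and treating a cell of exactly 10 as acceptable for the chi-square test is an assumption, since the text only speaks of "more than 10" and "fewer than 10".

```python
from collections import Counter


def suggest_goodness_of_fit_test(observations):
    """Tally the cells and apply the rules of thumb from this article.

    Hypothetical helper. Categories with zero observations never appear
    in the tally, so add them by hand if your variable has any.
    """
    cells = Counter(observations)      # one cell per distinct category
    smallest = min(cells.values())     # size of the sparsest cell
    total = sum(cells.values())        # total number of observations

    if smallest < 10:
        # Too few observations in at least one cell: use an exact test.
        if len(cells) == 2:
            return cells, "Binomial Exact Test of Goodness of Fit"
        return cells, "Multinomial Exact Goodness of Fit Test"
    if smallest > 10 and total > 1000:
        return cells, "G-Test of Goodness of Fit"
    # Assumption: a cell of exactly 10 is still treated as large enough.
    return cells, "Chi-Square Goodness Of Fit Test"


# The survey example from the text: 5 "yes" and 1 "no".
cells, test = suggest_goodness_of_fit_test(["yes"] * 5 + ["no"])
print(dict(cells), "->", test)
```

With the survey responses from the text, the tally is 5 and 1, so the sketch points to the Binomial Exact Test of Goodness of Fit rather than the chi-square test.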
Chi-Square Goodness Of Fit Test Example
Variable: Political party
In this example, we have a group of subjects and are interested in investigating whether their political party alignment differs from the typical proportions of the population from which the sample was drawn. The null hypothesis is that the proportion of subjects in each political party is the same in the sample as in the population.
Because our variable is categorical with two or more values (one value for each political party), and our data meet the other assumptions, we know that the Chi-Square Goodness Of Fit Test is a suitable test.
The analysis will result in a chi-square statistic and a p-value. The p-value is the probability of seeing sample proportions at least as far from the population proportions as ours if the sample had been randomly selected from that population. The lower the p-value, the stronger the evidence that our sample proportions differ from the population. A p-value less than or equal to 0.05 is conventionally treated as statistically significant, in which case we conclude that our sample is different from the population on our variable of interest.
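To make the political party example concrete, here is a minimal Python sketch of the calculation using only the standard library. The party names, counts, and population proportions are invented for illustration, and the helper names `chi_square_sf` and `chi_square_goodness_of_fit` are hypothetical. The statistic is the sum over categories of (observed − expected)² / expected, with degrees of freedom equal to the number of categories minus one. If SciPy is installed, `scipy.stats.chisquare` should give the same statistic and p-value when passed the observed and expected counts.

```python
import math


def chi_square_sf(statistic, df):
    """Upper-tail probability of a chi-square distribution with df >= 1."""
    y = statistic / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)               # closed form for df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))    # closed form for df = 1
    while a < df / 2.0:
        # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
        if y > 0:
            q += math.exp(a * math.log(y) - y - math.lgamma(a + 1.0))
        a += 1.0
    return q


def chi_square_goodness_of_fit(observed, expected_proportions):
    """Return (statistic, degrees of freedom, p-value)."""
    total = sum(observed.values())
    statistic = 0.0
    for category, proportion in expected_proportions.items():
        expected_count = total * proportion
        diff = observed.get(category, 0) - expected_count
        statistic += diff ** 2 / expected_count
    df = len(expected_proportions) - 1
    return statistic, df, chi_square_sf(statistic, df)


# Hypothetical data: party counts in our sample and the proportions
# we assume hold in the population the sample was drawn from.
observed = {"Party A": 60, "Party B": 50, "Party C": 40}
population = {"Party A": 0.5, "Party B": 0.3, "Party C": 0.2}

statistic, df, p_value = chi_square_goodness_of_fit(observed, population)
print(f"chi-square = {statistic:.3f}, df = {df}, p = {p_value:.4f}")
```

For these made-up numbers, the statistic is roughly 6.89 with 2 degrees of freedom and the p-value is roughly 0.032, so at the 0.05 threshold we would conclude that the sample's party alignment differs from the population.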
Frequently Asked Questions
Q: How do I run the Chi-Square Goodness Of Fit Test in R?
A: StatsTest is focused on helping you pick the right statistical method every time. There are many resources available to help you figure out how to run this method with your data:
R article: http://www.sthda.com/english/wiki/chi-square-goodness-of-fit-test-in-r
R video: https://www.youtube.com/watch?v=VNG7MtXidrg
If you still can’t figure something out, feel free to reach out.