What is Point-Biserial Correlation?
Point-biserial correlation is used to understand the strength of the relationship between two variables. Your variables of interest should include one continuous and one binary variable. See more below.
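Point-Biserial correlation is also called the point-biserial correlation coefficient, often written r_pb. It is numerically identical to the Pearson correlation computed with the binary variable coded as 0 and 1.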
Assumptions for Point-Biserial correlation
Every statistical method has assumptions: properties that your data must satisfy for the results of the method to be accurate.
The assumptions for Point-Biserial correlation include:
- Continuous and Binary
- Normally Distributed
- No Outliers
- Equal Variances
Let’s dive into each one of these separately.
Continuous and Binary
For this test, you should have one continuous and one binary variable. Continuous means that the variable can take on any reasonable value. Some good examples of continuous variables include age, weight, height, test scores, survey scores, yearly salary, etc.
Binary means that your variable is a category with only two possible values. Some good examples of binary variables include smoker (yes/no), sex (male/female), or any True/False or 0/1 variable.
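As a minimal sketch of what such data can look like in practice, the snippet below pairs a continuous variable with a two-category variable coded as 0/1. The records and the `encode_binary` helper are hypothetical, made up purely for illustration.

```python
# Hypothetical survey records: (weight in kg, smoker answer).
records = [(70.5, "yes"), (82.1, "no"), (64.3, "no"), (90.0, "yes")]


def encode_binary(labels, positive):
    """Code a two-category variable as 0/1, with `positive` mapped to 1."""
    categories = set(labels)
    if len(categories) != 2 or positive not in categories:
        raise ValueError(f"need exactly two categories including {positive!r}")
    return [1 if label == positive else 0 for label in labels]


weights = [weight for weight, _ in records]  # continuous variable
smoker = encode_binary([answer for _, answer in records], positive="yes")  # binary variable
print(smoker)
```

Which category gets the value 1 is an arbitrary choice; it affects only the sign of the resulting correlation.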
Normally Distributed
The continuous variable must be approximately normally distributed, ideally within each of the two groups defined by the binary variable. In other words, it should look roughly like a bell curve when you graph the data. Only use Point-Biserial Correlation if this holds for your continuous variable.
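Besides graphing the data, one rough numerical screen is to compute skewness and excess kurtosis for the continuous variable in each group; both are 0 for a perfectly normal distribution. The sketch below uses only the Python standard library, and the heights and the helper name are hypothetical. It is a quick screen, not a formal normality test (a formal option, if SciPy is available, is `scipy.stats.shapiro`).

```python
from statistics import mean, pstdev


def skewness_and_excess_kurtosis(values):
    """Return (skewness, excess kurtosis); both are 0 for a normal distribution."""
    m, s, n = mean(values), pstdev(values), len(values)
    z = [(v - m) / s for v in values]  # standardized values
    skew = sum(t ** 3 for t in z) / n
    excess_kurtosis = sum(t ** 4 for t in z) / n - 3.0
    return skew, excess_kurtosis


# Hypothetical heights (cm) for each level of the binary variable.
heights_by_group = {
    0: [158, 161, 163, 165, 166, 168, 171],
    1: [170, 173, 175, 177, 178, 181, 184],
}
for group, heights in heights_by_group.items():
    skew, kurt = skewness_and_excess_kurtosis(heights)
    print(f"group {group}: skewness={skew:.2f}, excess kurtosis={kurt:.2f}")
```

Values far from 0 suggest the distribution is lopsided or has unusually heavy or light tails, which is a cue to look at a histogram more closely.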
No Outliers
The continuous variable must not contain outliers. Point-Biserial correlation is sensitive to outliers, that is, data points with unusually large or small values. You can check for outliers by plotting the continuous variable and looking for points that lie far from all the others.
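A plot is the check described above; as a complementary sketch, the common 1.5 × IQR rule (Tukey's fences) can flag candidate outliers numerically. This rule is a swapped-in convention rather than something this page prescribes, and the heights and the `iqr_outliers` helper are hypothetical.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles of the data
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical heights in cm; 250 is an implausibly large value.
heights = [160, 162, 165, 167, 170, 171, 174, 250]
print(iqr_outliers(heights))  # → [250]
```

A flagged point is only a candidate: whether it is a data-entry error or a genuine extreme value is a judgment call about your data.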
Equal Variances
One of the assumptions of Point-Biserial correlation is that there is similar spread between the two groups of the binary variable. You can check for this assumption by plotting your continuous variable in each of your two groups and visually identifying if the spread of the data is similar.
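To put a number on the visual comparison, one simple sketch is to compute the sample variance of the continuous variable in each group and take the ratio of the larger to the smaller. The data and the `variance_ratio` helper below are hypothetical; for a formal test, SciPy offers `scipy.stats.levene`.

```python
from statistics import variance


def variance_ratio(group_a, group_b):
    """Ratio of the larger sample variance to the smaller (1.0 means equal spread)."""
    var_a, var_b = variance(group_a), variance(group_b)
    return max(var_a, var_b) / min(var_a, var_b)


# Hypothetical heights (cm) for the two groups of the binary variable.
heights_group0 = [158, 161, 163, 165, 166, 168, 171]
heights_group1 = [170, 173, 175, 177, 178, 181, 184]
print(f"variance ratio: {variance_ratio(heights_group0, heights_group1):.2f}")
```

A ratio close to 1 is consistent with similar spread in the two groups; the further it is from 1, the more the assumption is in doubt.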
When to use Point-Biserial Correlation?
You should use Point-Biserial Correlation in the following scenario:
- You want to know the relationship between two variables
- Your variables of interest include one continuous and one binary variable
- You have only two variables
Let’s clarify these to help you know when to use Point-Biserial Correlation.
Relationship
You are looking for a statistical test to look at how two variables are related. Other types of analyses include testing for a difference between two variables or predicting one variable using another variable (prediction).
One Continuous and One Binary
As described in the assumptions above, you should have one continuous variable (such as age, weight, height, or test scores) and one binary variable (such as smoker yes/no or any True/False or 0/1 variable).
If you have two continuous variables, you should use Pearson Correlation. If you have at least one ordinal variable, you should use Spearman’s Rho or Kendall’s Tau instead.
Two Variables
Point-Biserial Correlation can only be used to compare two variables.
Point-Biserial Correlation Example
Variable 1: Height.
Variable 2: Gender.
In this example, we are interested in the relationship between height and gender. To begin, we collect these data from a group of people.
Before running Point-Biserial Correlation, we check that our variables meet the assumptions of the method. After confirming that our continuous variable is normally distributed, has no outliers, and has equal variances in each gender, we move forward with the analysis.
The analysis will result in a correlation coefficient (called “r”) and a p-value. Values of r range from -1 to 1. A negative value of r indicates that the variables are inversely related: when one variable increases, the other decreases. A positive value indicates that when one variable increases, so does the other. Because gender is binary, the sign simply reflects the coding: a positive r means the group coded 1 has the greater average height, and swapping the 0 and 1 codes flips the sign without changing the magnitude. The p-value tells you how likely a correlation at least this strong would be if there were no true relationship between the variables.
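The calculation can be sketched with the standard formula for the point-biserial coefficient: the difference between the two group means, divided by the overall (population) standard deviation, scaled by the group proportions. The heights, the coding, and the `point_biserial` helper below are hypothetical. The sketch stops at the t statistic; turning it into a p-value needs a t distribution with n - 2 degrees of freedom, which SciPy's `scipy.stats.pointbiserialr` handles while returning both r and the p-value.

```python
from math import sqrt
from statistics import mean, pstdev


def point_biserial(values, codes):
    """Point-biserial r for a continuous list and a matching list of 0/1 codes."""
    group1 = [v for v, c in zip(values, codes) if c == 1]
    group0 = [v for v, c in zip(values, codes) if c == 0]
    n1, n0, n = len(group1), len(group0), len(values)
    # (difference in group means / overall SD) scaled by the group proportions
    return (mean(group1) - mean(group0)) / pstdev(values) * sqrt(n1 * n0 / n ** 2)


heights = [150, 160, 170, 180]     # hypothetical heights in cm
gender = [0, 0, 1, 1]              # arbitrary coding: which gender is 1 is a choice
flipped = [1 - g for g in gender]  # same data with the 0/1 codes swapped

r = point_biserial(heights, gender)
r_flipped = point_biserial(heights, flipped)
# t statistic for testing r = 0, compared against a t distribution with n - 2 df.
t = r * sqrt((len(heights) - 2) / (1 - r ** 2))
print(f"r = {r:.3f}, flipped r = {r_flipped:.3f}, t = {t:.3f}")
# → r = 0.894, flipped r = -0.894, t = 2.828
```

Swapping the codes changes only the sign of r, which is why the direction of the result has to be read together with the coding you chose.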
Frequently Asked Questions
How do I run Point-Biserial Correlation in SPSS or R?
A: StatsTest is focused on helping you pick the right statistical method every time. There are many resources available to help you figure out how to run this method with your data:
SPSS article: https://statistics.laerd.com/spss-tutorials/point-biserial-correlation-using-spss-statistics.php
SPSS video: https://www.youtube.com/watch?v=76ipx-ta8FY
R article: https://www.rdocumentation.org/packages/ltm/versions/1.1-1/topics/biserial.cor