Hello! In this post we will go through what the least squares regression line is and the different ways to calculate it.
The least squares regression line is simply a straight line drawn through the data on a scatter plot (an xy graph).
It also goes by some alternative names, such as the line of best fit, the linear regression line, or the trend line.
Here is the technical definition:
The regression line is the line that minimizes the sum of the squared residuals.
But what does that mean? Let's dive in.
A residual is the vertical difference between a data point and the line: the data point's actual y-value minus the y-value the line predicts for that x.
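The line of best fit should attempt to minimise these differences. However, we cannot just add them all up. Points above the line have positive residuals and points below it have negative ones, so they cancel each other out. In fact, for any line that passes through the average point of the data, the residuals add up to exactly zero. So instead we square the residuals, which makes every term positive, and add up the squares.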
This is why it is called the least squares regression line: if you draw each squared residual as an actual square on the graph, this is the line whose squares have the smallest total area.
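Here is a minimal Python sketch of that idea (standard library only), using a small made-up data set. It compares two lines that both pass through the average point of the data, (3, 4): one with slope 0.6 and one with slope 1.5. The helper name `residual_sums` is hypothetical, chosen just for this illustration. Exact fractions are used so that floating-point rounding does not hide a zero sum.

```python
from fractions import Fraction

# Small made-up data set, used only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# Fractions keep the arithmetic exact, so a zero sum really is zero.
x_bar = Fraction(sum(xs), len(xs))  # mean of x = 3
y_bar = Fraction(sum(ys), len(ys))  # mean of y = 4


def residual_sums(m):
    """For the line y = m*x + c through (x_bar, y_bar), return
    (sum of residuals, sum of squared residuals)."""
    c = y_bar - m * x_bar
    # residual = actual y minus the y the line predicts
    res = [y - (m * x + c) for x, y in zip(xs, ys)]
    return sum(res), sum(r * r for r in res)


for m in (Fraction(3, 5), Fraction(3, 2)):
    total, squares = residual_sums(m)
    print(f"slope {float(m)}: sum of residuals = {float(total)}, "
          f"sum of squared residuals = {float(squares)}")
```

Both lines have residuals that add up to 0.0, so the plain sum cannot tell them apart, but the sums of squared residuals (2.4 versus 10.5) can. For this data, slope 0.6 is the least squares slope.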
You could derive the equation of this line directly from the definition. It takes a fair bit of maths (calculus), and you could definitely look into that: if you are an HL student trying to extend the maths in your IA, this might be a good extension. However, it is not necessary, because there are simpler formulas you can use to calculate the line, so let's look at those.
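For anyone curious about that extension, here is a brief outline of one standard derivation. It uses partial derivatives and skips the check that the stationary point is a minimum. Write the line as $y = mx + c$ and the data points as $(x_i, y_i)$ for $i = 1, \dots, n$. Use the shorthand $S_{xx} = \sum (x_i - \bar{x})^2$, $S_{yy} = \sum (y_i - \bar{y})^2$ and $S_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})$. The goal is to choose $m$ and $c$ to minimise the sum of squared residuals $S(m, c)$:

```latex
\begin{align*}
S(m, c) &= \sum_{i=1}^{n} (y_i - m x_i - c)^2 \\
\frac{\partial S}{\partial c} &= -2\sum_{i=1}^{n} (y_i - m x_i - c) = 0
  \;\implies\; c = \bar{y} - m\bar{x} \\
\frac{\partial S}{\partial m} &= -2\sum_{i=1}^{n} x_i (y_i - m x_i - c) = 0
  \;\implies\; m = \frac{\sum_{i}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i}(x_i-\bar{x})^2}
  = \frac{S_{xy}}{S_{xx}} \\
r &= \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}}, \qquad
  \frac{s_y}{s_x} = \sqrt{\frac{S_{yy}}{S_{xx}}}
  \;\implies\; r\,\frac{s_y}{s_x} = \frac{S_{xy}}{S_{xx}} = m
\end{align*}
```

To get the third line, substitute $c = \bar{y} - m\bar{x}$ from the second line. The last line holds whether the standard deviations divide by $n$ or by $n - 1$, because that factor cancels in the ratio $s_y / s_x$.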
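The equation for the slope is:
$$m = r\,\frac{s_y}{s_x}$$
where $r$ is Pearson's correlation coefficient, $s_y$ is the standard deviation of the y-values and $s_x$ is the standard deviation of the x-values.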
Let's unpack this equation by first asking: what does the standard gradient equation look like, and how is it similar to the linear regression slope equation?
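The gradient describes how far you rise up for every unit you run across. You can calculate it on a graph using rise over run, $m = \frac{y_2 - y_1}{x_2 - x_1}$ (middle school maths). As you can see, the regression slope equation resembles this. The standard deviation of y, $s_y$, measures how spread out the y-values are, and the standard deviation of x, $s_x$, measures how spread out the x-values are, so $s_y / s_x$ works like rise over run. The r value then scales this ratio according to how strongly the two variables are linearly related: if $r = 1$ the slope is exactly $s_y / s_x$, a weaker correlation gives a flatter line, and a negative $r$ makes the line slope downwards.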
Remember, the regression line runs through the middle of the data points. What value represents the middle of the data? The mean (average). One feature of the regression line $y = mx + c$ is that the point $(\bar{x}, \bar{y})$, made up of the mean of x and the mean of y, always lies on it:
$$\bar{y} = m\bar{x} + c$$
Thus, using the mean point and the slope $m$, you can solve for the y-intercept $c$ of the line $y = mx + c$: $c = \bar{y} - m\bar{x}$.
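As a quick worked example with made-up numbers, suppose the mean point is $(4, 10)$ and the slope is $m = 2$. Substituting into $\bar{y} = m\bar{x} + c$:

```latex
\begin{align*}
\bar{y} &= m\bar{x} + c \\
10 &= 2(4) + c \\
c &= 10 - 8 = 2
\end{align*}
```

So the regression line would be $y = 2x + 2$.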
You can either calculate the line using technology, by plotting a scatter plot with a trend line, or do it by hand using the equations and then write about the process in your IA. If you decide to do it by hand, I would make sure to check your answers against the trend line on the scatter plot.
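Calculating the regression line using a scatter plot on Google Sheets:
1. Select your x and y data.
2. Go to Insert > Chart and change the chart type to Scatter chart.
3. Double-click the data points, tick the Trendline box, and choose the option to show the equation.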
Calculating the regression line using the equations on Google Sheets:
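1) Calculate the standard deviations of the x-values and of the y-values, $s_x$ and $s_y$ (STDEV function; use the same function for both)
2) Calculate Pearson's correlation coefficient, $r$ (CORREL function)
3) Calculate the mean (average) of x and of y, $\bar{x}$ and $\bar{y}$ (AVERAGE function)
4) Calculate the slope, $m = r\,s_y / s_x$
5) Calculate the y-intercept, $c = \bar{y} - m\bar{x}$
6) Put it all together as $y = mx + c$, and check it against the built-in SLOPE and INTERCEPT functions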