
How to Implement Gradient Descent with a Linear Regression Model

by Flo · September 15th, 2023

Too Long; Didn't Read

This tutorial demonstrates how to create a simple linear regression model with gradient descent in Python. It explains key concepts like linear relationships, gradient descent, learning rate, and coefficients. The step-by-step guide includes code for essential functions and visualizing cost evolution with matplotlib, allowing you to predict salaries based on years of experience.


This article illustrates how to build, in less than 5 minutes, a simple linear regression model with gradient descent. The goal is to predict a dependent variable (y) from an independent variable (X).


We want to predict salaries given years of experience. For that, we will explain a few concepts (Gradient descent, Linear model) and code 4 functions:


  • Predict function: predicts a salary from years of experience, using the best coefficients B0 and B1 found by gradient descent.
  • Cost function: lets us track the cost error at each iteration. It uses the mean squared error (the difference between the predictions and the real values).
  • Gradient descent: finds the best coefficients B0 and B1.
  • Print graph: displays a scatter plot of the real values and the values predicted by the model, using matplotlib.


After that, we will train our model using a learning rate. Finally, we find the best coefficients and predict values never seen by the model.


Linear model

In machine learning, a linear model is a regression model that captures the relationship between the independent variable (X) and the dependent variable (y).


In this article, we dive into simple linear regression (with only one independent variable).


The formula for simple linear regression is:


y = B0 + B1x


y is the variable we want to predict

x is the independent variable (input variable)

B0 is the intercept: the value of y when x = 0

B1 is the coefficient (weight) linked to x.


When you build a simple linear regression model, the goal is to find the parameters B0 and B1. To find the best parameters, we use gradient descent.


Imagine your model finds that the best parameters are B0 = 10 and B1 = 12.

If you want to predict y (salary) based on new data (10 years of experience), you just need to calculate it.


y (salary) = 10 + (12 * 10) = 130k


With a simple calculation, your model succeeds in predicting a salary (130k) based on unknown data (10 years of experience).
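A quick sanity check of that arithmetic in Python, using the hypothetical coefficients from the example above:

B0 = 10  # hypothetical intercept found by gradient descent
B1 = 12  # hypothetical slope found by gradient descent

years_of_experience = 10
salary = B0 + B1 * years_of_experience
print(salary)  # 130, i.e. 130k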

Gradient descent

Gradient descent is one of the methods to train the model and find the best parameters/coefficient (B0 and B1).


For that, it calculates the errors, computes the gradients (the partial derivatives of the cost), and adjusts the coefficients accordingly.


Below, I detail and explain the B0 and B1 calculations.

exp = np.array([1, 2, 3, 4, 5])
salaries = np.array([30, 40, 50, 60, 70])

learning_rate = 0.1
B0 = 2
B1 = 2

pred = B0 + B1 * exp
     = 2 + (2 * [1, 2, 3, 4, 5])
     = 2 + [2, 4, 6, 8, 10]
     = [4, 6, 8, 10, 12]

errors = pred - salaries
       = [4, 6, 8, 10, 12] - [30, 40, 50, 60, 70]
       = [-26, -34, -42, -50, -58]

gradient_B0 = sum(errors) / len(exp)
            = sum([-26, -34, -42, -50, -58]) / 5
            = -42

gradient_B1 = sum(errors * exp) / len(exp)
            = sum([-26 * 1, -34 * 2, -42 * 3, -50 * 4, -58 * 5]) / 5
            = sum([-26, -68, -126, -200, -290]) / 5
            = -142

B0 = B0 - (gradient_B0 * learning_rate)
   = 2 - (-42 * 0.1)
   = 2 - (-4.2)
   = 6.2

B1 = B1 - (gradient_B1 * learning_rate)
   = 2 - (-142 * 0.1)
   = 2 - (-14.2)
   = 16.2


First, we define an arbitrary or random value for B0 and B1. Based on the formula B0 + B1 * exp, we calculate the predictions. Afterward, we calculate the errors: the predictions minus the real values (salaries). We use those errors to compute gradient_B0 and gradient_B1, which are the partial derivatives of the cost J = (1/2n) * Σ(ŷi − yi)² with respect to B0 and B1: ∂J/∂B0 = (1/n) * Σ(errors) and ∂J/∂B1 = (1/n) * Σ(errors * x). That is exactly what the calculation above computes.
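As a sanity check, here is the same single update step written with NumPy; the printed values match the manual calculation above.

import numpy as np

exp = np.array([1, 2, 3, 4, 5])
salaries = np.array([30, 40, 50, 60, 70])
learning_rate = 0.1
B0, B1 = 2.0, 2.0

pred = B0 + B1 * exp                           # [ 4.  6.  8. 10. 12.]
errors = pred - salaries                       # [-26. -34. -42. -50. -58.]
gradient_B0 = errors.sum() / len(exp)          # -42.0
gradient_B1 = (errors * exp).sum() / len(exp)  # -142.0
B0 -= learning_rate * gradient_B0
B1 -= learning_rate * gradient_B1
print(B0, B1)  # 6.2 16.2 (up to floating-point rounding)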



Why do we need gradient_B0 and gradient_B1?


In simple words, gradient descent tries to find the line that minimizes the errors.

For that, it updates B0 (Intercept) and B1 (Slope).

B0 represents the value of y when x is 0.

B1 represents the change in y for a unit change in x. For example, if y increases by 10 when x increases by 1, B1 would be 10.


In each iteration, gradient descent will reduce the cost by adjusting the intercept and slope with new values.


[Figure: Linear function]




Creating functions

Predict function

First of all, we start with the predict function.


def predict(exp, B0, B1):
    return B0 + B1 * exp


To understand it, I will share the formula of simple linear regression and briefly explain the role of coefficients B0 and B1.


Linear regression formula: y = β0 + β1 · X + ϵ

y is the variable we want to predict (salary)

X is the independent variable (years of experience)

B0 and B1 are the coefficients we adjust to find the lowest cost. The lower the cost, the better the predictions will be.
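For instance, a minimal check of predict with the arbitrary starting values B0 = B1 = 2 used in the worked example earlier:

import numpy as np

exp = np.array([1, 2, 3, 4, 5])
print(predict(exp, 2, 2))  # [ 4  6  8 10 12]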


Cost function

The role of the cost function is to calculate the difference between the predictions and the real values. For this article’s purpose, I only print the cost at each iteration. Depending on your needs, you can code your own cost function and use it to adjust the parameters of your model.


I use the mean squared error: MSE = (1/n) * Σ(yi − ŷi)²

n is the number of samples

yi is the real value

ŷi is the predicted value


I wrote an article explaining the MSE in detail.


def mse_cost_function(error, predictions):
    # Half MSE: dividing by 2n instead of n simplifies the gradient
    return np.sum(error ** 2) / (2 * len(predictions))
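Note that this implementation divides by 2n rather than n: the extra factor of 2 is a common convention that cancels when taking the derivative, which is why the gradient formulas above carry no stray factor. A quick check with the errors from the first iteration of the worked example:

import numpy as np

predictions = np.array([4, 6, 8, 10, 12])
salaries = np.array([30, 40, 50, 60, 70])
error = predictions - salaries
print(mse_cost_function(error, predictions))  # 946.0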


Gradient descent

We apply gradient descent using the learning rate. Its purpose is to adjust the model parameters during each iteration. It controls how quickly or slowly the algorithm converges to a minimum of the cost function.


I set its value to 0.01. Be careful: if the learning rate is too high, gradient descent may never converge towards the minimum.


def gradient_descent(exp, salaries, B0, B1, learning_rate, num_iterations):
    num_samples = len(exp)
    cost_history = []
    for _ in range(num_iterations):
        predictions = predict(exp, B0, B1)
        error = predictions - salaries
        # Gradients: partial derivatives of the cost with respect to B0 and B1
        gradient_B0 = np.sum(error) / num_samples
        gradient_B1 = np.sum(error * exp) / num_samples
        # Step against the gradient, scaled by the learning rate
        B0 -= learning_rate * gradient_B0
        B1 -= learning_rate * gradient_B1
        # Record the cost so we can plot its evolution later
        cost = mse_cost_function(error, predictions)
        cost_history.append(cost)
    return B0, B1, cost_history
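To illustrate the earlier warning about the learning rate, here is a small experiment (the rate 0.5 is a hypothetical value chosen to misbehave on this data): the cost grows at every step instead of shrinking.

import numpy as np

exp = np.array([1, 2, 3, 4, 5])
salaries = np.array([30, 40, 50, 60, 70])
_, _, diverging_costs = gradient_descent(exp, salaries, 2, 2, 0.5, 10)
print(diverging_costs[:3])  # [946.0, 22401.0, 540868.5], diverging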



Scatter plot function

This function displays a scatter plot of the real values together with the regression line predicted by the model.

You can try several learning rates and numbers of iterations to see the impact on the fitted line and on the cost curve.

Below, I tried with num_iterations = 200

def print_graph(exp, salary):
    # Note: B0 and B1 are read from the enclosing (global) scope after training
    plt.scatter(exp, salary, label="Real values")
    plt.plot(exp, predict(exp, B0, B1), color='red', label="Linear Regression")
    plt.xlabel("Years of experience")
    plt.ylabel("Salary")
    plt.legend()
    plt.show()


[Figure: Cost Evolution]


Training the model

Importing the libraries and initializing variables.


  • exp is an independent variable representing years of experience

  • salaries is a dependent variable representing salary


I set up arbitrary values for B0, B1, learning_rate, and num_iterations.

num_iterations represents the number of iterations/steps the algorithm performs.


import numpy as np
import matplotlib.pyplot as plt
exp = np.array([1, 2, 3, 4, 5])
salaries = np.array([30, 40, 50, 60, 70])
B0 = 2
B1 = 2
learning_rate = 0.01
num_iterations = 1000
B0, B1, cost_history = gradient_descent(exp, salaries, B0, B1, learning_rate, num_iterations)
print_graph(exp, salaries)
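Since the toy data follows salary = 20 + 10 * exp exactly, the coefficients should head toward B0 = 20 and B1 = 10; printing them is a quick way to check the training (they are not yet fully converged after 1,000 iterations):

print(B0, B1)  # roughly 17.3 and 10.7, moving toward 20 and 10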


Visualize Cost Evolution

We simply use matplotlib to plot cost_history, with one marker per iteration.

It lets us visualize the cost at each iteration: in the first iterations the cost is high, while by the end it tends toward 0.


plt.plot(range(num_iterations), cost_history, marker='o')
plt.xlabel('Iteration')
plt.ylabel('Cost')
plt.title('Cost Evolution during Gradient Descent')
plt.axis([0, num_iterations, 0, max(cost_history)])
plt.show()


[Figure: Cost evolution during gradient descent]




Predict new values

After finding the best coefficients, B0 and B1, we are now able to predict new values. For 30 years of experience, our model predicts a salary of 339.73k.


new_prediction = predict(30, B0, B1)
print(new_prediction)
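For reference, the training data follows salary = 20 + 10 * exp exactly, which would give 320k for 30 years of experience; the gap between 339.73k and 320k comes from the coefficients not being fully converged. A sketch of the check, simply rerunning the training with many more iterations:

B0_conv, B1_conv, _ = gradient_descent(exp, salaries, 2, 2, 0.01, 100000)
print(predict(30, B0_conv, B1_conv))  # approaches 320.0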


Conclusion

We saw the different steps to code a simple linear regression model, explaining concepts such as the linear relationship, gradient descent, the learning rate, and the coefficients representing the intercept and slope.

We implemented gradient descent in Python by calculating B0 and B1, and finally plotted the cost evolution with matplotlib.

You can find the full source code on Kaggle and Github.

