Linear Regression with one Variable

This is the Python version of linear regression. Most of it is converted from the Octave/MATLAB code of the Stanford Machine Learning course on Coursera.

This will be purely mathematical and algorithmic.

Importing required libraries...

import numpy as np
import matplotlib.pyplot as plt

Functions to be used...

Let's first define the functions that will be called in actual code...

Hypothesis function

Outputs Y, given X, parameterized by θ

  1. In Simple Definition:

     $$h_\theta(x) = \theta_0 + \theta_1 x$$

  2. In Vector Form:

     $$h(\theta, X) = X\theta$$

  • m -> number of elements in the dataset
  • X -> mx2 matrix
  • θ -> 2x1 matrix
  • Outputs mx1 matrix

The vector form is implemented here because it runs faster.

def hypothesis(x, theta):
    '''calculates the hypothesis function'''
    return np.matmul(x, theta)
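As a quick sanity check, the sketch below compares the vector form against the simple definition on a tiny input. The values of `x_values` and `theta` are made up purely for illustration and are not taken from the course dataset.

```python
import numpy as np

def hypothesis(x, theta):
    """Vector form: X (m x 2) times theta (2 x 1) gives an m x 1 matrix."""
    return np.matmul(x, theta)

# Made-up values, chosen only to illustrate the shapes involved
x_values = np.array([1.0, 2.0, 3.0])
X = np.stack((np.ones(3), x_values), axis=1)   # shape (3, 2), first column is all ones
theta = np.array([[0.5], [2.0]])               # shape (2, 1)

vector_form = hypothesis(X, theta)                    # shape (3, 1)
simple_form = theta[0, 0] + theta[1, 0] * x_values    # theta0 + theta1 * x, shape (3,)

print(vector_form.shape)                              # (3, 1)
print(np.allclose(vector_form[:, 0], simple_form))    # True
```

Both forms give the same numbers; the vector form just computes all m predictions in a single matrix product.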

Cost Function

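This essentially measures how far the current line is from the data it is supposed to fit:

  1. Definition:

     $$J(\theta_0, \theta_1) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$

    • m -> number of elements in the dataset
    • x(i) -> value of x in ith data point
    • y(i) -> value of y in ith data point

  2. Vector Form for faster implementation:

     $$J(\theta) = \frac{1}{2m} \left( h(\theta, X) - Y \right)^T \left( h(\theta, X) - Y \right)$$

    • m -> number of elements in the dataset
    • h(θ, X) and Y -> mx1 matrix
    • Outputs 1x1 matrix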

def compute_cost(x, y, theta):
    '''outputs the cost function'''
    m = len(y)
    error = hypothesis(x, theta) - y
    error_square_summed = np.matmul(np.transpose(error), error)
    return 1/(2*m)*error_square_summed
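To see that the vector form matches the summation in the definition, here is a small sketch that computes the cost both ways. The helper `compute_cost_loop` and the three data points are assumptions added for illustration; they are not part of the original code.

```python
import numpy as np

def compute_cost(x, y, theta):
    """Vectorized cost: (1 / 2m) * error^T * error, returned as a 1 x 1 matrix."""
    m = len(y)
    error = np.matmul(x, theta) - y
    return 1 / (2 * m) * np.matmul(error.T, error)

def compute_cost_loop(x, y, theta):
    """Hypothetical helper: the same cost written as an explicit sum over data points."""
    m = len(y)
    total = 0.0
    for i in range(m):
        prediction = theta[0, 0] * x[i, 0] + theta[1, 0] * x[i, 1]
        total += (prediction - y[i, 0]) ** 2
    return total / (2 * m)

# Made-up data: three points on the line y = x
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([[1.0], [2.0], [3.0]])
theta = np.zeros((2, 1))

vectorized = compute_cost(X, y, theta)    # errors are -1, -2, -3, so cost = 14 / 6
looped = compute_cost_loop(X, y, theta)

print(vectorized.shape)                       # (1, 1)
print(np.isclose(vectorized[0, 0], looped))   # True
```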

Gradient Descent

Gradient descent is an iterative method for minimizing the cost function J(θ), which depends on the values of θ0 and θ1.

Gradient descent is implemented as follows:

  Repeat, updating θ0 and θ1 simultaneously:

  $$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \theta_1) \qquad j = 0, 1$$

  1. α -> a constant, also called the learning rate of the algorithm
  2. θj -> jth value of θ
  3. J(θ0, θ1) -> the cost function

This algorithm iteratively lowers J(θ) toward its minimum possible value.

  • Vector implementation to speed up the algorithm:

    $$\theta := \theta - \frac{\alpha}{m} X^T \left( h(\theta, X) - Y \right)$$

    • m -> dataset size
    • X -> mx2 matrix
    • h(θ, X) and Y -> mx1 matrix

def gradient_descent(x, y, theta, alpha, num_iter):
    '''Performs gradient descent and returns the minimized theta and the history of cost values'''
    m = len(y)
    J_history = np.zeros((num_iter, 1))

    for i in range(num_iter):  # "i" avoids shadowing the built-in iter()
        h = hypothesis(x, theta)
        error = h - y
        # gradient of J with respect to theta, a 2x1 matrix
        partial_derivative = 1/m * np.matmul(np.transpose(x), error)
        theta = theta - alpha*partial_derivative

        # record the cost after this update
        J_history[i] = compute_cost(x, y, theta)

    return theta, J_history
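One way to convince yourself that the update rule works is to run it on data where the answer is known. The sketch below restates the three functions in compact form and fits made-up, noise-free points on the line y = 1 + 2x; the data, the learning rate of 0.1 and the 2000 iterations are assumptions chosen for this toy example, not values from the course.

```python
import numpy as np

def hypothesis(x, theta):
    return np.matmul(x, theta)

def compute_cost(x, y, theta):
    m = len(y)
    error = hypothesis(x, theta) - y
    return 1 / (2 * m) * np.matmul(error.T, error)

def gradient_descent(x, y, theta, alpha, num_iter):
    m = len(y)
    J_history = np.zeros((num_iter, 1))
    for i in range(num_iter):
        error = hypothesis(x, theta) - y
        theta = theta - alpha / m * np.matmul(x.T, error)
        J_history[i] = compute_cost(x, y, theta).item()
    return theta, J_history

# Made-up, noise-free data on the line y = 1 + 2x (not the course dataset)
x_values = np.arange(5.0)                            # 0, 1, 2, 3, 4
X = np.stack((np.ones(5), x_values), axis=1)
y = (1.0 + 2.0 * x_values).reshape(5, 1)

fitted_theta, J_history = gradient_descent(X, y, np.zeros((2, 1)), alpha=0.1, num_iter=2000)

print(fitted_theta.ravel())                  # should be very close to [1, 2]
print(J_history[0, 0] > J_history[-1, 0])    # the cost went down
```

Because the data has no noise, the fitted θ should land essentially on the true intercept and slope, and the final cost should be close to zero.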


Prediction function: uses hypothesis() to predict the value for a new input

def predict(value, theta):
    x_array = [1, value]
    return hypothesis(x_array,theta)
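Note that predict() returns a one-element NumPy array rather than a plain number, because the list [1, value] is treated as a 1D vector in the matrix product. The sketch below shows this with a made-up θ (an assumption for illustration) and uses `.item()` to pull out the scalar.

```python
import numpy as np

def hypothesis(x, theta):
    return np.matmul(x, theta)

def predict(value, theta):
    x_array = [1, value]
    return hypothesis(x_array, theta)

# Made-up parameters: theta0 = -1, theta1 = 2
theta = np.array([[-1.0], [2.0]])

result = predict(3, theta)     # -1 + 2 * 3
print(result.shape)            # (1,)
print(result.item())           # 5.0
```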

Now Let's start the actual processing


Loading Data from ex1data1.txt

In each line, first value is 'Population of City in 10,000s' and second value is 'Profit in $10,000s'

data_path = "./ex1data1.txt"
data = np.loadtxt(data_path, delimiter=',')

Extracting Population and Profits from data

# first value is the independent variable x, second is the dependent variable y
independent_x = data[:, 0]
dependant_y = data[:, 1]

Plotting the Scatter-Plot of data

# showing data
print("Plotting Data ...\n")
plt.figure("Scatter Plot Visualization of Data")
plt.title("Scatter Plot Visualization of Data")
plt.scatter(independent_x, dependant_y, marker="x", c="r")
plt.ylabel('Profit in $10,000s')
plt.xlabel('Population of City in 10,000s')

These lines of code produce this graph

Converting x and y in matrix form

As x and y are 1D vectors, they have to be converted to their respective matrix forms.

We are also about to do a matrix multiplication. To reproduce hθ(x) = θ0 + θ1 * x, we prepend a 1 to every row of x, so that multiplying by θ adds θ0.

# as we are going to use matrix multiplication, we need x as first column 1, second column values
dataset_size = independent_x.shape[0]
ones = np.ones(dataset_size)
x = np.stack((ones, independent_x), axis=1)

# also converting y in vector form to matrix form
y = dependant_y.reshape(len(dependant_y), 1)
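The reshape of y matters more than it may seem. If y stays a 1D vector while the hypothesis output is an mx1 matrix, NumPy broadcasting turns their difference into an mxm matrix instead of raising an error. A minimal sketch with made-up numbers:

```python
import numpy as np

h = np.zeros((3, 1))                            # predictions as an m x 1 matrix
y_vector = np.array([1.0, 2.0, 3.0])            # 1D vector, shape (3,)
y_matrix = y_vector.reshape(len(y_vector), 1)   # m x 1 matrix, shape (3, 1)

wrong = h - y_vector    # broadcasts (3, 1) against (3,) and silently yields (3, 3)
right = h - y_matrix    # element-wise difference, shape (3, 1)

print(wrong.shape)      # (3, 3)
print(right.shape)      # (3, 1)
```

With the 1D version, the cost and gradient would be computed from the wrong error matrix without any warning, which is why y is converted up front.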

Testing hypothesis and cost function with sample inputs

# initializing theta
theta = np.zeros((2, 1))
alpha = 0.01
num_iter = 1500

print("Testing the cost function ...")
print(f"with theta = [[0],[0]] \nCost computed = {compute_cost(x,y,theta)}")
print("Expected cost value (approx) 32.07\n")

print(f"with theta = [[-1],[2]] \nCost computed = {compute_cost(x,y,[[-1],[2]])}")
print("Expected cost value (approx) 54.24\n")

Running Gradient Descent

print("Running Gradient Descent ...\n")
minimized_theta, J_history = gradient_descent(x, y, theta, alpha, num_iter)

Plotting Value of J during Gradient Descent

If the value of α is not too high, the cost should decrease with each epoch

plt.figure("Value of J during Gradient Descent")
plt.title('Value of J during Gradient Descent')
x_axis = range(len(J_history))
# draw the cost recorded after each iteration
plt.plot(x_axis, J_history)
plt.xlabel('No. of iterations or Epochs')
plt.ylabel("Cost function J")
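To illustrate how the learning rate affects this curve, the sketch below runs the same update rule with a small and a large α on made-up, noise-free data. The helper `cost_history`, the data and both α values are assumptions chosen for this toy example; the threshold at which gradient descent starts to diverge depends on the dataset.

```python
import numpy as np

def cost_history(x, y, alpha, num_iter):
    """Hypothetical helper: runs vectorized gradient descent from theta = 0
    and returns the cost after each update."""
    m = len(y)
    theta = np.zeros((2, 1))
    history = np.zeros(num_iter)
    for i in range(num_iter):
        error = np.matmul(x, theta) - y
        theta = theta - alpha / m * np.matmul(x.T, error)
        new_error = np.matmul(x, theta) - y
        history[i] = (new_error ** 2).sum() / (2 * m)
    return history

# Made-up, noise-free data on the line y = 1 + 2x (not the course dataset)
x_values = np.arange(5.0)
X = np.stack((np.ones(5), x_values), axis=1)
y = (1.0 + 2.0 * x_values).reshape(5, 1)

small_alpha = cost_history(X, y, alpha=0.01, num_iter=50)
large_alpha = cost_history(X, y, alpha=0.5, num_iter=50)

print(small_alpha[0] > small_alpha[-1])   # True: the cost keeps falling
print(large_alpha[0] < large_alpha[-1])   # True: the cost blows up
```

If the plotted cost rises instead of falling, lowering α is usually the first thing to try.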

A sample graph generated from this...

Testing Minimized Theta

print("Theta found by gradient descent:")
print(minimized_theta)
print("Expected theta values (approx)")
print(" -3.6303\n  1.1664\n")

Predicting Value for new Inputs

print(f"For population = 35,000, we predict a profit of {predict(3.5, minimized_theta)*10000}")
print(f"For population = 70,000, we predict a profit of {predict(7, minimized_theta)*10000}")

Plotting Scatterplot and the Line of Hypothesis

plt.figure("Linear Regression Result")
plt.title("Linear Regression Result")
plt.scatter(independent_x, dependant_y, marker="x", c="r")
plt.plot(x[:, 1], hypothesis(x, minimized_theta))
plt.ylabel('Profit in $10,000s')
plt.xlabel('Population of City in 10,000s')

A Sample graph from the above code...


I have linked a .ipynb file Here if you want to see the code in action. Note that the dataset will not be accessible if you open the file in Colab; you have to download it from Here (this dataset is taken from the course assignment).

Here is a Link to the Machine Learning Course I was talking about...

