401 Project 10 - Regression: Perceptrons
Project Objectives
In this project, we will learn about perceptrons and how they can be used for regression. We will use the Boston Housing dataset, as it offers many different potential features and target variables.
Questions
Question 1 (2 points)
A perceptron is a simple model that can be used for regression, and perceptrons can be combined to create neural networks. In this project, we will create a perceptron from scratch.
To start, let’s load in the Boston Housing dataset with the below code:
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
df = pd.read_csv('/anvil/projects/tdm/data/boston_housing/boston.csv')
X = df.drop(columns=['MEDV'])
y = df[['MEDV']]
scaler = StandardScaler()
X = scaler.fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.15, random_state=7)
y_train = y_train.to_numpy()
y_test = y_test.to_numpy()
Now, we can begin discussing what a perceptron is. A perceptron is a simple model that takes in a set of inputs and produces an output. The perceptron is defined by a set of weights and a bias term (similar to our linear regression model having coefficients and a y-intercept term). The perceptron takes the dot product of the input features and the weights and adds the bias term.
Then, the perceptron applies some activation function before outputting the final value. This activation function is a non-linear function that allows the perceptron to learn complex patterns in the data, instead of behaving as a purely linear model.
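To make this concrete, below is a small sketch of the forward computation for a single datapoint, using made-up input, weight, and bias values and ReLU as the activation. The numbers are purely illustrative:

import numpy as np

# made-up example values: one datapoint with three features
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, -0.2])   # weights
b = 0.3                          # bias term

linear_part = np.dot(x, w) + b         # dot product of inputs and weights, plus the bias
output = np.maximum(0, linear_part)    # apply an activation function (ReLU here)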
There are many different activation functions, some of the most common are listed below:
| Activation Function | Formula | Derivative | Usage |
|---|---|---|---|
| Linear | x | 1 | Final layer of regression to output continuous values |
| ReLU | max(0, x) | 1 if x > 0, 0 otherwise | Hidden layers of neural networks |
| Sigmoid | 1 / (1 + exp(-x)) | sigmoid(x) * (1 - sigmoid(x)) | Final layer of binary classification, or hidden layers of neural networks |
| Tanh | (exp(x) - exp(-x)) / (exp(x) + exp(-x)) | 1 - tanh(x)^2 | Hidden layers of neural networks |
For this project, we will be creating a perceptron class that can be used for regression. There are many different parameters that can be set when creating a perceptron, such as the learning rate, number of epochs, and activation function.
For this question, please implement functions for the Linear, ReLU, Sigmoid, and Tanh activation functions. Additionally, implement the derivative of each of these functions. These functions should be able to take in a numpy array and return the transformed array.
import numpy as np

def linear(x):
    pass

def linear_d(x):
    pass

def relu(x):
    pass

def relu_d(x):
    pass

def sigmoid(x):
    pass

def sigmoid_d(x):
    pass

def tanh(x):
    pass

def tanh_d(x):
    pass
To test your functions, you can use the below code:
x = np.array([-1, 0, 1])
print(linear(x)) # should return [-1, 0, 1]
print(linear_d(x)) # should return [1, 1, 1]
print(relu(x)) # should return [0, 0, 1]
print(relu_d(x)) # should return [0, 0, 1]
print(sigmoid(x)) # should return [0.26894142, 0.5, 0.73105858]
print(sigmoid_d(x)) # should return [0.19661193, 0.25, 0.19661193]
print(tanh(x)) # should return [-0.76159416, 0, 0.76159416]
print(tanh_d(x)) # should return [0.41997434, 1, 0.41997434]
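If you are unsure where to start, here is one possible (unofficial) sketch of the ReLU pair; the remaining functions follow directly from the formulas and derivatives in the table above:

def relu(x):
    # element-wise max(0, x)
    return np.maximum(0, x)

def relu_d(x):
    # 1 where x > 0, 0 otherwise
    return np.where(x > 0, 1, 0)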
- Completed activation and derivative functions
- Test the functions with the provided code
Question 2 (2 points)
Now that we have our activation functions, let’s start working on our Perceptron class. This class will create a perceptron that can be used for regression problems. Below is a skeleton of our Perceptron class:
class Perceptron:
    def __init__(self, learning_rate=0.01, n_epochs=1000, activation='relu'):
        # this will initialize the perceptron with the given parameters
        pass

    def activate(self, x):
        # this will apply the activation function to the input
        pass

    def activate_derivative(self, x):
        # this will apply the derivative of the activation function to the input
        pass

    def compute_linear(self, X):
        # this will calculate the linear combination of the input and weights
        pass

    def error(self, y_pred, y_true):
        # this will calculate the error between the predicted and true values
        pass

    def backward_gradient(self, error, linear):
        # this will calculate the gradient used to update the weights and bias
        pass

    def predict(self, X):
        # this will predict the output of the perceptron given the input
        pass

    def train(self, X, y, reset_weights=True):
        # this will train the perceptron on the given input and target values
        pass

    def test(self, X, y):
        # this will test the perceptron on the given input and target values
        pass
Now, it may seem daunting to implement all of these functions. However, most of these functions are as simple as one mathematical operation.
For this question, please implement the `__init__`, `activate`, and `activate_derivative` functions.
The `__init__` function should initialize the perceptron with the given parameters, and should set the weights and bias terms to None.
The `activate` function should apply the activation function to the input, and the `activate_derivative` function should apply the derivative of the activation function to the input. It is important that these functions choose the proper function based on the value of `self.activation`. Additionally, if the activation is not set to one of the four functions we implemented earlier, the default should be the ReLU function.
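One possible way (not the only way) to organize this is a small lookup from activation names to the functions from Question 1, falling back to ReLU for any unrecognized name. A rough sketch of the two methods inside the class:

    def activate(self, x):
        # look up the activation function by name; default to ReLU for unrecognized names
        functions = {'linear': linear, 'relu': relu, 'sigmoid': sigmoid, 'tanh': tanh}
        return functions.get(self.activation, relu)(x)

    def activate_derivative(self, x):
        # same lookup, but for the derivative functions
        derivatives = {'linear': linear_d, 'relu': relu_d, 'sigmoid': sigmoid_d, 'tanh': tanh_d}
        return derivatives.get(self.activation, relu_d)(x)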
To test your functions, you can use the below code:
test_x = np.array([-2, 0, 1.5])
p = Perceptron(learning_rate=0.01, n_epochs=1000, activation='linear')
print(p.activate(test_x)) # should return [-2, 0, 1.5]
print(p.activate_derivative(test_x)) # should return [1, 1, 1]
p.activation = 'relu'
print(p.activate(test_x)) # should return [0, 0, 1.5]
print(p.activate_derivative(test_x)) # should return [0, 0, 1]
p.activation = 'sigmoid'
print(p.activate(test_x)) # should return [0.11920292, 0.5, 0.81757448]
print(p.activate_derivative(test_x)) # should return [0.10499359, 0.25, 0.14914645]
p.activation = 'tanh'
print(p.activate(test_x)) # should return [-0.96402758, 0, 0.90514825]
print(p.activate_derivative(test_x)) # should return [0.07065082, 1, 0.18070664]
p.activation = 'invalid'
print(p.activate(test_x)) # should return [0, 0, 1.5]
print(p.activate_derivative(test_x)) # should return [0, 0, 1]
- Implement the `__init__`, `activate`, and `activate_derivative` functions
- Test the functions with the provided code
Question 3 (2 points)
Now, let’s move on to the harder topics. The basic concept behind how this perceptron works is that it takes in an input, calculates the predicted value, finds the error between the predicted and true values, and then updates the weights and bias based on this error and its learning rate. This process is then repeated for the set number of epochs.
In this sense, there are two main portions of the perceptron that need to be implemented: the forward and backward passes. The forward pass is the process of calculating the predicted value, and the backward pass is the process of updating the weights and bias based on the calculated error.
For this question, we will implement the `compute_linear`, `predict`, `error`, and `backward_gradient` functions.
The `compute_linear` function should calculate the linear combination of the input, weights, and bias, by computing the dot product of the input and weights and adding the bias term.
The `predict` function should compute the linear combination of the input and then apply the activation function to the result.
The `error` function should calculate the error between the predicted (y_pred) and true (y_true) values, i.e., true minus predicted.
The `backward_gradient` function should calculate the gradient of the error, which is simply the negative of the error multiplied by the activation derivative of the linear combination.
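Each of these is only a line or two. As a rough sketch of what the four methods could look like inside the class (your implementation may differ in the details):

    def compute_linear(self, X):
        # dot product of the input and weights, plus the bias term
        return np.dot(X, self.weights) + self.bias

    def predict(self, X):
        # forward pass: linear combination followed by the activation function
        return self.activate(self.compute_linear(X))

    def error(self, y_pred, y_true):
        # error is defined as true minus predicted
        return y_true - y_pred

    def backward_gradient(self, error, linear):
        # negative error times the activation derivative of the linear combination
        return -error * self.activate_derivative(linear)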
To test your functions, you can use the below code:
p = Perceptron(learning_rate=0.01, n_epochs=1000, activation='sigmoid')
p.weights = np.array([1, 2, 3])
p.bias = 4
test_X = np.array([1,2,3])
test_y = np.array([20])
l = p.compute_linear(test_X)
print(l) # should return 18
error = p.error(l, test_y)
print(error) # should return 2
gradient = p.backward_gradient(error, l)
print(gradient) # should return -3.04599585e-08
pred = p.predict(test_X) # should return 0.9999999847700205
print(pred)
- Implement the `compute_linear`, `predict`, `error`, and `backward_gradient` functions
- Test the functions with the provided code
Question 4 (2 points)
Now that we have implemented all of our helper functions, we can implement our `train` function.
First, if the argument `reset_weights` is true, or if `reset_weights` is false but the weights and bias have not yet been set, we will initialize our weights to a NumPy array of zeros with the same length as the number of features in our input data, and initialize our bias to 0. In any other case, we will not modify the weights and bias.
Then, this function will train the perceptron on the given training data. For each datapoint in the training data, we will compute the linear combination of the input and then obtain the predicted value through our activation function. Next, we will compute the error and get the backward gradient. Then, we will calculate the gradient for our weights (simply the input times the backward gradient) and the gradient for our bias (simply the backward gradient). Finally, we will update the weights and bias by multiplying the gradients by the learning rate and subtracting them from the current weights and bias. This process is repeated for the set number of epochs.
In this case, we are updating the weights and bias after every datapoint. This is commonly known as Stochastic Gradient Descent (SGD). Another common method is to calculate our error for every datapoint in the epoch, and then update the weights and bias based on the average at the end of each epoch. This method is known as Batch Gradient Descent (BGD). A more sophisticated method, called Mini-Batch Gradient Descent (MBGD), is a combination of the two philosophies: we group our data into small batches and update our weights and bias after each batch. This results in more weight/bias updates than BGD, but fewer than SGD.
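Putting the helper methods together, the SGD training loop described above might look roughly like the sketch below. This is one possible shape, assuming the weight-initialization behavior described earlier; your version may differ in the details:

    def train(self, X, y, reset_weights=True):
        # initialize weights/bias when requested, or when they have never been set
        if reset_weights or self.weights is None or self.bias is None:
            self.weights = np.zeros(X.shape[1])
            self.bias = 0
        for _ in range(self.n_epochs):
            for xi, yi in zip(X, y):
                linear = self.compute_linear(xi)
                prediction = self.activate(linear)
                error = self.error(prediction, yi)
                gradient = self.backward_gradient(error, linear)
                # weight gradient is the input times the backward gradient;
                # bias gradient is the backward gradient itself
                self.weights -= self.learning_rate * gradient * xi
                self.bias -= self.learning_rate * gradient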
In order to test your function, we will create a perceptron and train it on the Boston Housing dataset. We will then print the weights and bias of the perceptron.
np.random.seed(3)
p = Perceptron(learning_rate=0.01, n_epochs=1000, activation='linear')
p.train(X_train, y_train)
print(p.weights)
print(p.bias)
If you implemented the functions correctly, you should see the following output:
[-1.08035188 0.47131981 0.09222406 0.46998928 -1.90914324 3.14497775 -0.01770744 -3.04430895 2.62947786 -1.84244828 -2.03654589 0.79672007 -2.79553875] [22.44124231]
- Implement the `train` function
- Test the function with the provided code
Question 5 (2 points)
Finally, let’s implement the `test` function. This function will test the perceptron on the given test data, and should return our summary statistics from the previous project (mean squared error, root mean squared error, mean absolute error, and r squared) in a dictionary.
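A sketch of what this could look like, assuming your `predict` method can handle the full 2D test array (the formulas are the same ones used in the previous project):

    def test(self, X, y):
        # forward pass on the whole test set, then the regression summary statistics
        y_pred = np.asarray(self.predict(X)).reshape(-1)
        y_true = np.asarray(y).reshape(-1)
        residuals = y_true - y_pred
        mse = np.mean(residuals ** 2)
        return {
            'mse': mse,
            'rmse': np.sqrt(mse),
            'mae': np.mean(np.abs(residuals)),
            'r_squared': 1 - np.sum(residuals ** 2) / np.sum((y_true - np.mean(y_true)) ** 2),
        }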
To test your function, you can use the below code:
p.test(X_test, y_test)
If you implemented the function correctly, you should see the following output:
{'mse': 19.28836923644003, 'rmse': 4.391852597303333, 'mae': 3.2841026903192283, 'r_squared': 0.6875969898568428}
- Implement the `test` function
- Test the function with the provided code
Question 6 (2 points)
As mentioned in Question 4, there are multiple methods for updating the weights and bias of our class. In this question, please make the following changes to your class:
- Rename the `train` function to `train_sgd`
- Add the following function signatures:

def train_bgd(self, X, y):
    pass

def train_mbgd(self, X, y, n_batches=16):
    pass

def train(self, X, y, method='sgd', n_batches=16):
    pass
After you have added these signatures to your class, please implement the `train_bgd` function, which will train the perceptron using Batch Gradient Descent as described in Question 4. This function should calculate the weight/bias gradients for every point in the dataset, and then update the weights and bias based on the average gradients at the end of each epoch.
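A rough sketch of that epoch loop is shown below. Here the weights are only initialized when they have not been set; your implementation may handle initialization differently:

    def train_bgd(self, X, y):
        if self.weights is None or self.bias is None:
            self.weights = np.zeros(X.shape[1])
            self.bias = 0
        for _ in range(self.n_epochs):
            weight_grads = np.zeros_like(self.weights)
            bias_grad = 0.0
            for xi, yi in zip(X, y):
                linear = self.compute_linear(xi)
                error = self.error(self.activate(linear), yi)
                gradient = self.backward_gradient(error, linear)
                # accumulate the per-point gradients over the whole dataset
                weight_grads += gradient * xi
                bias_grad += gradient
            # update once per epoch using the average gradients
            self.weights -= self.learning_rate * weight_grads / len(X)
            self.bias -= self.learning_rate * bias_grad / len(X)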
Additionally, please implement the `train` function to act as a selector for the different training methods. If `method` is set to 'sgd', the function should call the `train_sgd` function. If `method` is set to 'bgd', the function should call the `train_bgd` function. If `method` is set to 'mbgd', the function should call the `train_mbgd` function. If `method` is set to anything else, the function should raise a ValueError.
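The selector itself is just a few branches; for example:

    def train(self, X, y, method='sgd', n_batches=16):
        # dispatch to the requested training method
        if method == 'sgd':
            self.train_sgd(X, y)
        elif method == 'bgd':
            self.train_bgd(X, y)
        elif method == 'mbgd':
            self.train_mbgd(X, y, n_batches=n_batches)
        else:
            raise ValueError(f"Unknown training method: {method}")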
To test your functions, you can use the below code:
np.random.seed(3)
p = Perceptron(learning_rate=0.1, n_epochs=1000, activation='linear')
p.train(X_train, y_train, method='bgd')
print(p.weights)
print(p.bias)
p.test(X_test, y_test)
If you implemented the function correctly, you should see the following output:
[-1.01203489 0.86314322 0.12818681 0.80290412 -2.02780693 3.08686583 0.04321048 -3.00595432 2.64831884 -1.92232099 -2.03927489 0.8549853 -3.67072291] [22.68586412] {'mse': 19.14677015365128, 'rmse': 4.375702246914349, 'mae': 3.336506171166659, 'r_squared': 0.6898903916034843}
Question 7 (2 points)
Finally, please implement the `train_mbgd` function. This function will train the perceptron using Mini-Batch Gradient Descent as described in Question 4. This function should split our data into `n_batches` batches, and then update the weights and bias based on the average gradients at the end of each batch.
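One way to form the batches is with np.array_split (a choice made for this sketch, not necessarily the intended approach). A rough sketch, again initializing the weights only when they have not been set:

    def train_mbgd(self, X, y, n_batches=16):
        if self.weights is None or self.bias is None:
            self.weights = np.zeros(X.shape[1])
            self.bias = 0
        for _ in range(self.n_epochs):
            # split the data into n_batches roughly equal batches
            X_batches = np.array_split(X, n_batches)
            y_batches = np.array_split(y, n_batches)
            for X_batch, y_batch in zip(X_batches, y_batches):
                weight_grads = np.zeros_like(self.weights)
                bias_grad = 0.0
                for xi, yi in zip(X_batch, y_batch):
                    linear = self.compute_linear(xi)
                    error = self.error(self.activate(linear), yi)
                    gradient = self.backward_gradient(error, linear)
                    weight_grads += gradient * xi
                    bias_grad += gradient
                # update after every batch using that batch's average gradients
                self.weights -= self.learning_rate * weight_grads / len(X_batch)
                self.bias -= self.learning_rate * bias_grad / len(X_batch)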
To test your functions, you can use the below code:
np.random.seed(3)
p = Perceptron(learning_rate=0.1, n_epochs=1000, activation='linear')
p.train(X_train, y_train, method='mbgd')
print(p.weights)
print(p.bias)
p.test(X_test, y_test)
If you implemented the function correctly, you should see the following output:
[-0.97274486 0.67793429 0.08464404 0.72503617 -1.91926787 3.18789867 0.01581749 -2.97858639 2.61498091 -1.97518827 -2.00677852 0.89807989 -3.26179108] [22.52676272] {'mse': 19.022979683470613, 'rmse': 4.361534097478846, 'mae': 3.2954565543935757, 'r_squared': 0.6918953571367247}
Now that we have implemented SGD, BGD, and MBGD, let’s compare the mean squared error of each method at each epoch. To do this, we will create a new function called `train_mse` that tests the perceptron on the test data at the end of each epoch and stores the mean squared error in a list. We will then plot these lists to compare the performance of each method.
Here is the outline of the function:
def train_mse(X_train, y_train, X_test, y_test, learning_rate=0.01, n_epochs=1000, method='sgd'):
    pass
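One way to structure this (a sketch, not the only valid approach) is to build the perceptron with n_epochs=1 so that each call to train runs a single epoch, make sure the weights carry over between those calls, and record the test MSE after each one:

def train_mse(X_train, y_train, X_test, y_test, learning_rate=0.01, n_epochs=1000, method='sgd'):
    # n_epochs=1 so each train() call below performs exactly one epoch
    p = Perceptron(learning_rate=learning_rate, n_epochs=1, activation='linear')
    # initialize once up front so the weights carry over between the per-epoch calls
    p.weights = np.zeros(X_train.shape[1])
    p.bias = 0
    mse_per_epoch = []
    for _ in range(n_epochs):
        if method == 'sgd':
            # keep the weights from the previous epoch instead of resetting them
            p.train_sgd(X_train, y_train, reset_weights=False)
        else:
            p.train(X_train, y_train, method=method)
        mse_per_epoch.append(p.test(X_test, y_test)['mse'])
    return mse_per_epoch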
To test your functions, you can use the below code:
np.random.seed(3)
sgd_data = train_mse(X_train, y_train, X_test, y_test, learning_rate=0.06, n_epochs=50, method='sgd')
bgd_data = train_mse(X_train, y_train, X_test, y_test, learning_rate=0.06, n_epochs=50, method='bgd')
mbgd_data = train_mse(X_train, y_train, X_test, y_test, learning_rate=0.06, n_epochs=50, method='mbgd')
import matplotlib.pyplot as plt
plt.plot(sgd_data, label='SGD')
plt.plot(bgd_data, label='BGD')
plt.plot(mbgd_data, label='MBGD')
plt.xlabel('Epoch')
plt.ylabel('Mean Squared Error')
plt.legend()
plt.show()
- Implement the `train_mbgd` function
- Implement the `train_mse` function
- Test the functions with the provided code
- How do the graphs of the mean squared error compare between the three methods? Which method do you think is the best?
Submitting your Work
- firstname_lastname_project10.ipynb
You must double check your submission. You will not receive full credit if your submitted notebook does not show all of your code and output.