Basic Neural Net from "Scratch"

Apr 30, 2024 · 18 min read

This article is also available as a Kaggle notebook.

You can use this article as a reference alongside the Kaggle notebook; the same notes appear in both.

Overview

This notebook is an example of building a Neural Network from “scratch”.

Specifically, it goes through Chapter 4 of the excellent book Deep Learning for Coders with fastai & PyTorch.

Notes or descriptions above or below cells may differ from the book to fit my own narrative of understanding.

!pip install fastbook
!pip install fastai

fastai and fastbook are libraries created by the authors of Deep Learning for Coders with fastai & PyTorch.

These libraries are not only appropriate for the book, but are designed to be used in personal and production level projects.

As another aside, yes import * is usually bad practice, but for notebooks, this tends to be the norm.

from fastai.vision.all import *
from fastbook import *

Learning Objectives

The objective of this notebook is to demonstrate the process of building a Neural Network from “scratch”.

In practice, this means identifying the process of training a Neural Network as well as implementing the basic coding requirements.

The diagram below shows the training process of a Neural Network. This will act as our guide for building our Neural Network.

#id gradient_descent
#caption The gradient descent process
#alt Graph showing the steps for Gradient Descent
gv('''
init->predict->loss->gradient->step->stop
step->predict[label=repeat]
''')


The next sections will implement the above diagram. This diagram is the basis of all neural networks. There will be a summary at the end with this diagram, detailing the actions taken with the steps in the diagram.
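As a rough sketch of the loop in the diagram, the plain-Python example below (no PyTorch, so it runs anywhere) minimises a made-up one-parameter loss, (w - 3)^2. The function names, the starting value and the learning rate are illustrative assumptions and are not part of the notebook.

```python
# Toy version of: init -> predict -> loss -> gradient -> step -> (repeat) -> stop
# The "model" has a single parameter w and the loss is (w - 3)^2,
# so the best possible value of w is 3.

def loss_fn(w):
    return (w - 3.0) ** 2

def gradient_fn(w):
    # Derivative of (w - 3)^2 with respect to w
    return 2.0 * (w - 3.0)

w = 0.0    # init: an arbitrary starting value (assumption)
lr = 0.1   # learning rate (assumption)

for step in range(50):          # repeat
    loss = loss_fn(w)           # predict + loss
    grad = gradient_fn(w)       # gradient
    w -= lr * grad              # step

print(round(w, 4), round(loss_fn(w), 6))
```

Each pass moves w a little closer to 3, which is the same idea the rest of the notebook applies to 784 weights and a bias.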

Download MNIST

For our Neural Network, we are going to train an Image Recognition Model.

We are going to use the MNIST data set, which is a dataset that contains images of handwritten numbers.

Our model is going to identify whether an image is a 3 or a 7.

path = untar_data(URLs.MNIST_SAMPLE)
100.14% [3219456/3214948 00:00<00:00]
(path/'train').ls()
(#2) [Path('/root/.fastai/data/mnist_sample/train/7'),Path('/root/.fastai/data/mnist_sample/train/3')]
(path).ls()
(#3) [Path('/root/.fastai/data/mnist_sample/valid'),Path('/root/.fastai/data/mnist_sample/labels.csv'),Path('/root/.fastai/data/mnist_sample/train')]

Setting up the training data

In this section, we are going to focus on setting up our training and validation data.

We will train the model on the training data, and use a separate validation set during training to check how the model’s accuracy is improving.

The validation set is a set of inputs and labels that the model hasn’t “seen”: it is not part of the training data, and it will be used to measure how well the model performs on data it was not trained on.

What’s happening?

We are going to need to make sure all images have the same size. This is essential for training the model correctly.

We are going to use a pixel size of 28 * 28, which is 784 pixels per image.

PIXEL_SIZE = 28*28

What’s happening?

We are going to open each image and store it as a rank2 tensor.

Each row in the rank2 tensor holds one horizontal line of the image, and each value in a row is the value of a single pixel.

We are creating two sets of rank2 tensors: training and validation.

training_data = (path/'train').ls().sorted()
validation_data = (path/'valid').ls().sorted()

# Open each image and insert into a rank2 tensor
def extract_images(image_folder_path):
    return {path.name: [tensor(Image.open(x)) for x in path.ls()] for path in image_folder_path}

training_images = extract_images(training_data)
validation_images = extract_images(validation_data)
type(validation_images)
dict

Explanations

What is a tensor?

A tensor is a multi-dimensional array.

In our context it’s a PyTorch tensor, which behaves much like a NumPy ndarray: a multi-dimensional array that makes matrix multiplication and addition easy, with the added ability to run on a GPU.

What is a rank2 tensor?

A rank2 tensor is a tensor that represents a matrix.

The matrix can be represented as a list of lists, where each inner list is one row of the matrix.

[[row 0], [row 1], ...]
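To make the idea concrete, here is a minimal plain-Python sketch of a rank2 structure as a list of rows. The tiny 2 x 3 "image" and the shape_of helper are made up for illustration; PyTorch gives you the same information through .shape and .ndim.

```python
# A tiny 2x3 "image": two rows, three pixels per row
image = [
    [0, 128, 255],   # row 0
    [64, 32, 16],    # row 1
]

def shape_of(nested):
    """Return the shape of a rectangular nested list as a tuple."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return tuple(shape)

rank = len(shape_of(image))
print(shape_of(image), rank)   # (2, 3) 2
print(image[1][0])             # pixel at row 1, column 0 -> 64
```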

What’s happening?

Using the .shape attribute, we can view the “shape” of the tensor.

This indicates a tensor with 28 rows and 28 columns, matching our pixel size of 28*28.

torch.Size([28, 28])

We also have a helper function, show_image, which will display one of our images.

validation_images['7'][0].shape
torch.Size([28, 28])
show_image(validation_images['7'][-1])
<Axes: >


What’s happening?

How we use our training data is a key question.

The goal is to train a Neural Network to classify an unseen image as a 3 or a 7. So how will the model know what a 3 or a 7 looks like?

The following approach “stacks” all the images of a 3 and “stacks” all the images of a 7.

Averaging a stack creates an “ideal” image, which can be thought of as the average 3 or the average 7. It gives an intuition for what the Neural Network has to learn: which pixel values make an input look more like a 3 or more like a 7, and therefore the probability of either number.

We stack the images so we can work with all the 3s and all the 7s at once. We’ll divide the tensor by 255 to scale the pixel values from 0-255 down to 0-1. By stacking the rank2 tensors, we are creating a rank3 tensor: a tensor of matrices.

def stack_tensors(image_tensors):
    return {t: torch.stack(image_tensors[t]).float()/255 for t in image_tensors.keys()}
stacked_tensors = stack_tensors(training_images)
show_image(stacked_tensors['7'].mean(0))
<Axes: >


stacked_tensors['7'].ndim
3

Explanations

What is a rank3 tensor?

A rank3 tensor is a list of matrices, aka a tensor of rank2 tensors, e.g.

[
[[a, b], [c,d]],
[[e, f], [g, h]],
]
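The stacking, scaling and averaging steps can be sketched in plain Python with two made-up 2 x 2 "images". This only illustrates what torch.stack(...).float()/255 followed by .mean(0) does; it is not the notebook's code.

```python
# Two tiny 2x2 "images" with pixel values in the range 0-255
img_a = [[0, 255], [255, 0]]
img_b = [[0, 255], [0, 255]]

# Stacking: a list of matrices is a rank-3 structure (image, row, column)
stack = [img_a, img_b]

# Scale every pixel from 0-255 down to 0-1
scaled = [[[px / 255 for px in row] for row in img] for img in stack]

# Average across the first axis (the images) to get one "mean image"
n_rows, n_cols = len(scaled[0]), len(scaled[0][0])
mean_image = [
    [sum(img[r][c] for img in scaled) / len(scaled) for c in range(n_cols)]
    for r in range(n_rows)
]

print(mean_image)   # [[0.0, 1.0], [0.5, 0.5]]
```

Pixels that agree across the images stay at 0 or 1, while pixels that disagree end up in between, which is why the averaged 7 looks blurry.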

What’s happening?

We are concatenating the two rank3 tensors (the 3s and the 7s) and reshaping the result into one rank2 tensor.

torch.cat joins the tensors along the existing first dimension, so it doesn’t create a new dimension. view(-1, PIXEL_SIZE) then flattens each 28 * 28 image into a single row of 784 values; the -1 tells PyTorch to infer the number of rows.

Each rank3 tensor is essentially a list of rank2 tensors (images). Once every image is flattened into a row, all the training data for both 3 and 7 fits in one big rank2 tensor with one row per image.

When we print the shape we see [12396, 784]. This means we have 12396 items (images) of size 784 each, which is the pixel size 28 * 28.

This will be the x (the inputs) for our training data.

train_x = torch.cat([stacked_tensors['3'], stacked_tensors['7']]).view(-1, PIXEL_SIZE)
train_x.shape
torch.Size([12396, 784])
train_x.ndim
2
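The reshape can be mimicked in plain Python to see what view(-1, PIXEL_SIZE) is doing. The 2 x 2 images and variable names below are illustrative assumptions; the point is that every image becomes one row and the number of rows is worked out from the total number of pixels.

```python
# Three 2x2 "images" stacked together: shape (3, 2, 2)
stack = [
    [[1, 2], [3, 4]],
    [[5, 6], [7, 8]],
    [[9, 10], [11, 12]],
]

pixels_per_image = 2 * 2   # plays the role of PIXEL_SIZE = 28 * 28

# Flatten each image row by row into a single list of pixels
flat = [[px for row in img for px in row] for img in stack]

# The number of rows is "inferred": total pixels / pixels per image
total_pixels = sum(len(row) for img in stack for row in img)
n_images = total_pixels // pixels_per_image

print(n_images, len(flat[0]))   # 3 4
print(flat[1])                  # [5, 6, 7, 8]
```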

What’s happening?

We are creating a rank2 tensor which will be our y (the labels).

This represents whether each image is a 3 or a 7.

1 will be the label for an image of a 3 and 0 will be the label for an image of a 7.

When we print the shape we see [12396, 1]. This means we have 12396 labels (one for each image) that have the value 1 or 0, in other words a 3 or a 7.

We then combine our training inputs and labels into a dataset of (x, y) pairs.

train_y = tensor([1]*len(stacked_tensors['3']) + [0]*len(stacked_tensors['7'])).unsqueeze(1)
train_y.shape
torch.Size([12396, 1])
dset = list(zip(train_x,train_y))
x,y = dset[0]

We repeat the same process for our validation set.

stacked_valid_tensors = stack_tensors(validation_images)
valid_x = torch.cat([stacked_valid_tensors['3'], stacked_valid_tensors['7']]).view(-1, PIXEL_SIZE)
valid_x.shape
torch.Size([2038, 784])

We create a validation DataLoader object with a batch size of 256.

The DataLoader class is a fastai class that is a helpful wrapper around loading and batching training/validation data.

valid_y = tensor([1]*len(stacked_valid_tensors['3']) + [0]*len(stacked_valid_tensors['7'])).unsqueeze(1)
valid_dset = list(zip(valid_x,valid_y))
valid_dl = DataLoader(valid_dset, batch_size=256)

Understanding the loss function

This simple example shows how we can create predictions using a linear function, y = mx + b, and a loss function to measure how far off our predictions are.

The PARAMETERS of the linear function are m and b: the weights and the bias. We can start with random values for these parameters, since we will adjust them according to the calculated gradients.

def init_params(size, std=1.0): 
    return (torch.randn(size)*std).requires_grad_()

weights = init_params((PIXEL_SIZE,1))
bias = init_params(1)
weights.mean(), bias.mean()
(tensor(0.0235, grad_fn=<MeanBackward0>),
 tensor(0.3472, grad_fn=<MeanBackward0>))

Basic Loss Function Explained

This is an example of the loss function calculation. Predictions that are closer to 0 mean the input is a 7, and predictions closer to 1 mean it’s a 3.

In the example, the targets are [1, 0, 1] == [3, 7, 3] and the predictions are [0.9, 0.4, 0.2] == [3, 7, 7].

The last prediction is incorrect.

trgts = tensor([1,0,1])
prds = tensor([0.9, 0.4, 0.2])

torch.where(trgts==1, 1-prds, prds) will use C/CUDA to perform what is essentially a list comprehension over the tensor. The logic goes as follows:

  1. If the value at an index of trgts equals 1 (trgts==1 is true), the image is a 3, so the first argument (1-prds) is returned for that index

  2. 1-prds is the distance between the target, 1 (meaning the image of a 3), and the prediction. This is the loss for the prediction of a 3. In our first case, the target is 1 (an image of a 3) and the prediction is 0.9 (so very certain it’s a 3). Subtracting 1 - 0.9 gives 0.1. Remember this is a loss function, so the lower the value the more accurate the prediction, and in this case it’s very accurate

  3. If the value at an index of trgts doesn’t equal 1, it is 0 (meaning the image is a 7), so the second argument (prds) is returned unmodified. This is the distance from 0 (the distance from an image of a 7). In this example, the prediction for the 7 in the middle index was 0.4, so the distance from 0 is 0.4 and the prediction is fairly accurate

  4. The last item in trgts is 1 (meaning it’s an image of a 3) and the prediction is 0.2 (meaning it’s closer to a 7). Since the target is 1, 1-prds is returned and the difference is 0.8. Remember this is a loss function, so this is a very inaccurate prediction

torch.where(trgts==1, 1-prds, prds)
tensor([0.1000, 0.4000, 0.8000])

We can see the mean loss is 0.4333

torch.where(trgts==1, 1-prds, prds).mean()
tensor(0.4333)
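For comparison, the same calculation can be written as an ordinary list comprehension. This is a plain-Python sketch of what torch.where is doing element by element; the function name where_loss is made up for this example.

```python
def where_loss(targets, predictions):
    """Per-item loss: distance of each prediction from its target (0 or 1)."""
    return [1 - p if t == 1 else p for t, p in zip(targets, predictions)]

targets = [1, 0, 1]
predictions = [0.9, 0.4, 0.2]

losses = where_loss(targets, predictions)
mean_loss = sum(losses) / len(losses)

print([round(l, 4) for l in losses])   # [0.1, 0.4, 0.8]
print(round(mean_loss, 4))             # 0.4333
```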

If we change the last prediction to be more accurate (from 0.2 to 0.8), we can see the loss decrease since this is now more accurate.

trgts = tensor([1,0,1])
prds = tensor([0.9, 0.4, 0.8])
torch.where(trgts==1, 1-prds, prds)
tensor([0.1000, 0.4000, 0.2000])
torch.where(trgts==1, 1-prds, prds).mean()
tensor(0.2333)

The problem here, is that our loss function assumes all values of the predictions will always be between 0 and 1.

Let’s change the value of the last index of predictions to 3.

trgts = tensor([1,0,1])
prds = tensor([0.9, 0.4, 3])

We can see here that the loss for the last index is -2.0000, which is meaningless, and the mean loss becomes -0.5000.

torch.where(trgts==1, 1-prds, prds)
tensor([ 0.1000,  0.4000, -2.0000])
torch.where(trgts==1, 1-prds, prds).mean()
tensor(-0.5000)

Sigmoid Function

To ensure that the values of the loss function are usable, the predictions need to fit between 0 and 1.

We can use a sigmoid function to map our prediction values into the range 0 to 1.

The sigmoid formula:

  • e is Euler’s number, the base of the natural logarithm, approximately 2.71828
$$f(x) = \frac{1}{1 + e^{-x}}$$
plot_function(torch.sigmoid, title='Sigmoid', min=-5, max=10)


In the prds tensor, we add values below 0 and above 1.

trgts = tensor([1,0,1])
prds = tensor([-0.9, 0.4, 3])
prds
tensor([-0.9000,  0.4000,  3.0000])

We can observe that applying the sigmoid function to the prds tensor maps every value, including those outside the range 0-1, into the range 0-1.

Importantly, the higher the prediction, the closer the result is to 1, and inversely, the more negative the prediction, the closer the result is to 0.

prds.sigmoid()
tensor([0.2891, 0.5987, 0.9526])
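The sigmoid formula is short enough to write by hand. The sketch below uses only Python's math module and reproduces the values PyTorch printed above, which is a handy sanity check on the formula.

```python
import math

def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1 / (1 + math.exp(-x))

raw_predictions = [-0.9, 0.4, 3.0]
squashed = [round(sigmoid(x), 4) for x in raw_predictions]

print(squashed)       # [0.2891, 0.5987, 0.9526]
print(sigmoid(0.0))   # 0.5
```

Note that an input of 0 maps to exactly 0.5, the midpoint between our two labels.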

Creating the loss function

Now we can create our loss function. We know we need to apply the sigmoid function to the predictions to map them between 0-1.

We can calculate the loss as the distance from our targets (0 for 7 and 1 for 3).

Remember, the lower the value of the loss function, the more accurate the predictions. The sigmoid function is essentially our activation function.

def mnist_loss(predictions, targets):
    predictions = predictions.sigmoid()
    return torch.where(targets==1, 1-predictions, predictions).mean()

Explanations

What is an Activation Function?

An activation function is a function that introduces non-linearity between the different layers in a Neural Network.

Think of a Neural Network without activation functions as a straight line on a graph. The activation functions introduce bends in the line so that it can fit or model a complex data set.
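A small plain-Python sketch can show the "bend". It uses ReLU as the activation and made-up slopes and intercepts: two linear functions applied one after another are still a single straight line, but putting a ReLU between them changes the shape.

```python
def linear(x, m, b):
    return m * x + b

def relu(x):
    # Replace negative values with zero
    return max(0.0, x)

def two_linears(x):
    # Two linear layers with nothing in between: still a straight line
    return linear(linear(x, 2.0, 1.0), 3.0, -1.0)   # equals 6x + 2

def with_relu(x):
    # Same two layers with a ReLU in between: the line now has a bend
    return linear(relu(linear(x, 2.0, 1.0)), 3.0, -1.0)

xs = [-2.0, -1.0, 0.0, 1.0]
print([two_linears(x) for x in xs])   # [-10.0, -4.0, 2.0, 8.0]
print([with_relu(x) for x in xs])     # [-1.0, -1.0, 2.0, 8.0]
```

The first output rises by the same amount for every step in x, while the second is flat for the negative inputs and then rises, which no single straight line can do.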

Batching, Training Epochs and Optimization (Stochastic Gradient Descent)

Now we need to be able to iteratively train the model and lower the loss by training in epochs.

First we’ll need to split our data into batches.

The reason for batching is that if we calculated the gradients over the whole dataset in one go, each step would be slow and the GPU might even run out of memory. If we only trained on one item at a time, each step would be based on a very imprecise estimate of the gradients.

On each epoch, the model works through the training data one batch at a time. To experiment, we’ll take a small batch of the first four training images.

batch = train_x[:4]
batch.shape
torch.Size([4, 784])
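To show what a data loader does with a batch size, here is a minimal plain-Python sketch. The make_batches helper, its shuffle flag and its seed are assumptions made for this example; it is not how the fastai DataLoader is implemented.

```python
import random

def make_batches(items, batch_size, shuffle=True, seed=None):
    """Yield consecutive batches; the last one may be smaller."""
    items = list(items)
    if shuffle:
        random.Random(seed).shuffle(items)
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

dataset = list(range(10))   # stand-in for 10 (image, label) pairs
batches = list(make_batches(dataset, batch_size=4, seed=0))

print([len(b) for b in batches])   # [4, 4, 2]
```

One pass over all of the batches is one epoch: every item is seen exactly once, just in smaller groups.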

Prediction Function

We are going to create our prediction function y = mx + b.

An important note: @ is the matrix multiplication operator, which PyTorch runs in C/CUDA.

def linear_func(input_batch):
    """
    Compute the linear transformation of the input batch.

    Args:
        input_batch (tensor): Input batch of data, where each row represents a single data point.

    Returns:
        tensor: The result of the linear transformation applied to the input batch,
                with the bias vector added.
    """
    return input_batch@weights + bias

We will initialize our randomized weights and bias.

def init_params(size, std=1.0): 
    return (torch.randn(size)*std).requires_grad_()

weights = init_params((PIXEL_SIZE,1))
bias = init_params(1)

weights.mean(), bias.mean()
(tensor(-0.0128, grad_fn=<MeanBackward0>),
 tensor(-0.2762, grad_fn=<MeanBackward0>))
preds = linear_func(batch)
preds
tensor([[23.6921],
        [ 2.4338],
        [19.6201],
        [ 3.7629]], grad_fn=<AddBackward0>)
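Written out by hand for a single-output model, batch@weights + bias is just a weighted sum per image. The following plain-Python sketch uses made-up numbers and a hypothetical helper named linear_batch to show the arithmetic.

```python
def linear_batch(batch, weights, bias):
    """Compute batch @ weights + bias for a single-output linear model.

    batch:   list of rows, one row of pixel values per image
    weights: one weight per pixel
    bias:    a single number added to every prediction
    """
    return [sum(x * w for x, w in zip(row, weights)) + bias for row in batch]

# Two "images" with three pixels each (made-up numbers)
batch = [
    [1.0, 0.0, 2.0],
    [0.5, 1.0, 0.0],
]
weights = [0.2, -0.4, 0.1]
bias = 0.5

preds = linear_batch(batch, weights, bias)
print([round(p, 2) for p in preds])   # [0.9, 0.2]
```

Each image produces one number, which is why a batch of shape [4, 784] gives predictions of shape [4, 1] in the notebook.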

Let’s calculate the loss on the predictions:

labels_batch = train_y[:4]
loss = mnist_loss(preds, labels_batch)
loss, labels_batch
(tensor(0.0258, grad_fn=<MeanBackward0>),
 tensor([[1],
         [1],
         [1],
         [1]]))

We can see our loss. Let’s call backward() (backpropagation) to calculate the derivatives/gradients so we can determine how much we need to change our parameters.

loss.backward()
weights.grad.shape, weights.grad.mean(), bias.grad
(torch.Size([784, 1]), tensor(-0.0024), tensor([-0.0241]))
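To get a feel for what backward() computes, the sketch below works out the gradients by hand for a single input with one weight and target 1, then checks them against a finite-difference estimate. The values of w, b and x are made up, and this only illustrates the chain rule; it is not how PyTorch's autograd is implemented.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def loss(w, b, x):
    # Loss for one input whose target is 1 (a "3"): 1 - sigmoid(prediction)
    return 1 - sigmoid(w * x + b)

def analytic_grads(w, b, x):
    # Chain rule: d(1 - s)/dz = -s * (1 - s), then dz/dw = x and dz/db = 1
    s = sigmoid(w * x + b)
    dz = -s * (1 - s)
    return dz * x, dz

def numeric_grad(f, value, eps=1e-6):
    # Central finite difference approximation of df/dvalue
    return (f(value + eps) - f(value - eps)) / (2 * eps)

w, b, x = 0.5, -0.2, 2.0   # made-up values
dw, db = analytic_grads(w, b, x)
dw_num = numeric_grad(lambda v: loss(v, b, x), w)
db_num = numeric_grad(lambda v: loss(w, v, x), b)

print(round(dw, 4), round(dw_num, 4))
print(round(db, 4), round(db_num, 4))
```

Both gradients come out negative, meaning that increasing w or b would lower the loss for this input, which matches the sign of the gradients in the notebook output above where every label in the batch is 1.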

Let’s encapsulate the gradient calculation in a function:

def calc_grads(xb, yb, model):
    preds = model(xb)
    loss = mnist_loss(preds, yb)
    loss.backward()
calc_grads(batch, labels_batch, linear_func)
weights.grad.mean(), bias.grad
(tensor(-0.0049), tensor([-0.0482]))

We need to reset our calculated gradients on each iteration, otherwise the gradients will just be added to each other: calling calc_grads a second time above doubled them. We’ll use the zero_ method on the gradient tensors:

weights.grad.zero_()
bias.grad.zero_()
tensor([0.])

Let’s load our training data into a DataLoader for easier access.

training_dl = DataLoader(dset, batch_size=256)

Let’s create a training epoch function.

For the training data, it goes through each batch of x and y (x is the data and y is the correct label) and calculates the gradients/derivatives of the loss on the predictions. We then step the weights and bias so that they become more accurate.

def train_epoch(dl, lr, model):
    for xb, yb in dl:
        calc_grads(xb, yb, model)
        weights.data -= weights.grad * lr
        bias.data -= bias.grad *lr
        weights.grad.zero_()
        bias.grad.zero_()

We also want a metric to view the accuracy of the predictions. We can check whether a raw prediction is greater than 0 (meaning it’s a 3); otherwise it’s a 7. We then compare the result with the label in train_y (either a 0 or a 1).

(preds>0.0).float() == train_y[:4]
tensor([[True],
        [True],
        [True],
        [True]])

We can create a batch accuracy function to tell us the accuracy of the predictions:

def batch_accuracy(xb, yb):
    """
    Calculate the accuracy of predictions for a batch of data.

    Args:
        xb (Tensor): Raw model predictions (before the sigmoid) for each sample in the batch.
        yb (Tensor): Ground truth labels for each sample in the batch.

    Returns:
        Tensor: Mean accuracy of predictions for the batch.

    Explanation:
        This batch accuracy function measures the accuracy of the predictions.
        If the prediction is less than 0.5 it's closer to a 0 (meaning a 7).
        (preds>0.5) resolves to a bool and yb is the ground truth label.
        e.g.
        If prediction is < 0.5 it's False (meaning a 7) and if yb is a 0 (meaning a 7), then this prediction is correct.
        If prediction is > 0.5 it's True (meaning a 3) and if yb is a 1 (meaning a 3), then this prediction is correct.
    """
    preds = xb.sigmoid()
    correct = (preds>0.5) == yb
    return correct.float().mean()
batch_accuracy(linear_func(batch), train_y[:4])
tensor(1.)
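The earlier check used a threshold of 0 on the raw predictions, while batch_accuracy uses 0.5 after the sigmoid. These are the same test, because sigmoid(0) is 0.5 and the sigmoid always increases. A plain-Python sketch with made-up predictions and labels:

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def batch_accuracy(raw_preds, labels):
    """Fraction of raw predictions whose thresholded class matches the label."""
    correct = [(sigmoid(p) > 0.5) == bool(y) for p, y in zip(raw_preds, labels)]
    return sum(correct) / len(correct)

raw_preds = [23.7, 2.4, -1.3, 0.8]   # made-up raw model outputs
labels = [1, 1, 0, 0]                # 1 means "3", 0 means "7"

# sigmoid(p) > 0.5 is the same test as p > 0
same_test = all((sigmoid(p) > 0.5) == (p > 0) for p in raw_preds)

print(same_test)                          # True
print(batch_accuracy(raw_preds, labels))  # 0.75
```

Here the last prediction is positive but its label is 0, so three out of four are correct.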

We can create a validate epoch function that will make predictions on the validation set and measure the accuracy of those predictions. It’s important to use the validation set and NOT the training set. The model hasn’t “seen” this data, so it gives a much more honest measure of how accurate the model is.

def validate_epoch(model):
    accs = [batch_accuracy(model(x), y) for x,y in valid_dl]
    return round(torch.stack(accs).mean().item(), 4)
validate_epoch(linear_func)
0.5748

Now we can train the model for a number of epochs, and after each epoch measure the accuracy of the predictions on the validation set using validate_epoch.

lr = 0.1
for i in range(20):
    train_epoch(training_dl, lr, linear_func)
    print(validate_epoch(linear_func), end=' ')
0.7253 0.8163 0.8473 0.8683 0.88 0.8907 0.902 0.9103 0.9162 0.9206 0.9226 0.925 0.929 0.9319 0.9334 0.9344 0.9378 0.9397 0.9417 0.9431 

This part covered calculating the loss on the predictions, calculating the gradients of the loss with respect to the weights and bias, and stepping the weights and bias according to the gradients and learning rate so that the model makes more accurate predictions.

Creating an optimizer

We are going to use some PyTorch and fastai functions/objects to make things easier.

The below optimizer simply contains our model weights and learning rate. It will encapsulate the step to update params according to the gradients and also to zero out the gradients.

class BasicOptim:
    def __init__(self, params, lr):
        self.params, self.lr = list(params), lr
        
    def step(self, *args, **kwargs):
        for p in self.params:
            p.data -= p.grad.data * self.lr
            
    def zero_grad(self, *args, **kwargs):
        for p in self.params:
            p.grad = None

nn.Linear (neural net.Linear) contains the linear function we have been using as well as its parameters. It does the same thing as our init_params and linear_func together: it randomly initializes the weights and bias and applies them to its input.

linear_model = nn.Linear(PIXEL_SIZE, 1)
w,b = linear_model.parameters()
w.shape,b.shape
(torch.Size([1, 784]), torch.Size([1]))
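Notice that w has shape [1, 784] rather than the [784, 1] we used for our hand-written weights. nn.Linear stores its weight as (out_features, in_features) and multiplies the input by the transpose, so the result is the same. A plain-Python sketch with made-up numbers, where the two helper functions are hypothetical and only illustrate the two layouts:

```python
def matvec_in_out(row, weights_in_out, bias):
    # Hand-written layout: weights has shape (in_features, out_features)
    n_out = len(weights_in_out[0])
    return [
        sum(row[i] * weights_in_out[i][o] for i in range(len(row))) + bias[o]
        for o in range(n_out)
    ]

def matvec_out_in(row, weights_out_in, bias):
    # nn.Linear layout: weights has shape (out_features, in_features)
    return [
        sum(x * w for x, w in zip(row, w_row)) + b
        for w_row, b in zip(weights_out_in, bias)
    ]

row = [1.0, 2.0, 3.0]                      # one input with 3 features
weights_in_out = [[0.1], [0.2], [0.3]]     # shape (3, 1)
weights_out_in = [[0.1, 0.2, 0.3]]         # shape (1, 3), the transpose
bias = [0.5]

a = matvec_in_out(row, weights_in_out, bias)
b = matvec_out_in(row, weights_out_in, bias)
print(a, b)
```

Both calls perform the same multiplications in the same order, so the two results are identical.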

We can make a basic Optimizer. The optimizer contains our steps to update our model params and the learning rate.

opt = BasicOptim(linear_model.parameters(), lr)

We can redefine train_epoch to use the optimizer.

def train_epoch(model, opt):
    for xb, yb in training_dl:
        calc_grads(xb, yb, model)
        opt.step()
        opt.zero_grad()
validate_epoch(linear_model)
0.215
def train_model(model, opt, epochs):
    for i in range(epochs):
        train_epoch(model, opt)
        print(validate_epoch(model), end=' ')
train_model(linear_model, opt, 20)
0.5352 0.8623 0.936 0.9536 0.9633 0.9633 0.9652 0.9667 0.9682 0.9687 0.9687 0.9696 0.9701 0.9711 0.9716 0.9716 0.9716 0.9726 0.9726 0.9726 

Now we can use the fastai Learner object which encapsulates everything we’ve built. The DataLoaders class will contain both our training and validation sets. We’ll also redefine the linear model so we can start with fresh randomized params. We’ll also use SGD (Stochastic Gradient Descent) optimizer from fastai, since we’ve been basically implementing this from scratch this whole time.

linear_model = nn.Linear(PIXEL_SIZE, 1)
dls = DataLoaders(training_dl, valid_dl)
learn = Learner(dls, linear_model, opt_func=SGD,
                loss_func=mnist_loss, metrics=batch_accuracy)

Explanations

Learner is a fastai class that encapsulates everything required to train a Neural Network:

  • dls: Dataloaders for the training and validation sets
  • model: The model used for predictions
  • opt_func: The optimization function/method, in our case, Stochastic Gradient Descent
  • loss_func: Our function to calculate the loss
  • metrics: Our human readable function to present the accuracy of our model to the reader

To run the equivalent of train_model, we call learn.fit with the number of epochs and the learning rate.

lr = 0.1
learn.fit(10, lr=lr)
epoch  train_loss  valid_loss  batch_accuracy  time
0      0.216353    0.373136    0.535329        00:00
1      0.120997    0.182699    0.873405        00:00
2      0.083601    0.106391    0.937684        00:00
3      0.065680    0.079394    0.955839        00:00
4      0.055948    0.066364    0.963690        00:00
5      0.050042    0.058729    0.965162        00:00
6      0.046086    0.053686    0.966634        00:00
7      0.043211    0.050082    0.966634        00:00
8      0.040988    0.047358    0.969087        00:00
9      0.039190    0.045215    0.969087        00:00
#id gradient_descent
#caption The gradient descent process
#alt Graph showing the steps for Gradient Descent
gv('''
init->predict->loss->gradient->step->stop
step->predict[label=repeat]
''')


To summarize the fundamentals of Deep Learning:

  1. Initialize random parameters (weights and bias) for the Neural Network model
  2. Create predictions using a model (linear function) and weights and bias
  3. A Neural Network is made up of linear functions with activation functions, such as ReLU (rectified linear units), between them
  4. Calculate the loss on the predictions; this gives us a measurement of the performance of the predictions
  5. Calculate the gradients/derivatives of the loss; this gives us the magnitude and direction in which to move the weights and bias towards the correct prediction
  6. Step the weights: reduce the weights and bias by the gradients * the learning rate; this should make the weights and bias more accurate in predictions
  7. Repeat for x number of epochs
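The steps above can be sketched end to end in plain Python on a toy problem. Everything here is an assumption made for illustration: the eight one-dimensional data points, the seed, the learning rate and the hand-derived gradients. It uses a single weight and bias with the same sigmoid-based loss as mnist_loss, and the comments refer to the step numbers in the list.

```python
import math
import random

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Toy data (assumption): inputs above 0 are labelled 1, the rest 0
xs = [-2.0, -1.5, -1.0, -0.5, 0.5, 1.0, 1.5, 2.0]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

rng = random.Random(42)
w, b = rng.gauss(0, 1), rng.gauss(0, 1)   # 1. initialise random parameters
lr = 1.0                                  # learning rate (assumption)

def mean_loss(w, b):
    total = 0.0
    for x, y in zip(xs, ys):
        s = sigmoid(w * x + b)            # 2. predict
        total += (1 - s) if y == 1 else s # 4. loss: distance from the target
    return total / len(xs)

start_loss = mean_loss(w, b)

for epoch in range(200):                  # 7. repeat for a number of epochs
    grad_w = grad_b = 0.0
    for x, y in zip(xs, ys):
        s = sigmoid(w * x + b)
        sign = -1.0 if y == 1 else 1.0    # d(1 - s) = -ds, d(s) = +ds
        dz = sign * s * (1 - s)           # 5. gradient via the chain rule
        grad_w += dz * x / len(xs)
        grad_b += dz / len(xs)
    w -= lr * grad_w                      # 6. step
    b -= lr * grad_b

accuracy = sum((w * x + b > 0) == bool(y) for x, y in zip(xs, ys)) / len(xs)
print(round(start_loss, 4), round(mean_loss(w, b), 4), accuracy)
```

The printed values are the loss before training, the loss after training and the final accuracy on the toy data. PyTorch and fastai do the same thing, with autograd replacing the hand-written gradient and an optimizer replacing the two update lines.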

Building our own Learner

This section is an optional section where I’m just playing around with creating a custom Learner class.

Things we need:

  • dls
  • neural net
  • optimizer
  • loss function
  • metrics

Functions we need:

  • fit
class CustomLearner:
    def __init__(self, training, validation, neural_net, optimizer, loss, metrics):
        self.training = training
        self.validation = validation
        
        self.neural_net = neural_net
        self.optimizer = optimizer
        
        # TODO: 4. Try a different loss function
        self.loss = loss
        
        # TODO: 5. Try a different metric function?
        self.metrics = metrics
        
    def fit(self, epochs):
        for i in range(epochs):
            self._train_epoch()
            # TODO 6. Make this pretty print
            print(self._validate_epoch(), end=' ')
        
    def _train_epoch(self):
        for xb, yb in self.training:
            self._calc_grads(xb, yb)
            self.optimizer.step()
            self.optimizer.zero_grad()
            
    def _calc_grads(self, xb, yb):
        preds = self.neural_net(xb)
        loss = self.loss(preds, yb)
        loss.backward()

    def _validate_epoch(self):
        accs = [self.metrics(self.neural_net(x), y) for x,y in self.validation]
        return round(torch.stack(accs).mean().item(), 4)
linear_model = nn.Linear(PIXEL_SIZE, 1)
opt = BasicOptim(linear_model.parameters(), lr)
dls = DataLoaders(training_dl, valid_dl)
learn = CustomLearner(training=dls[0], validation=dls[1], neural_net=linear_model, optimizer=opt, loss=mnist_loss, metrics=batch_accuracy)
learn.fit(10)
0.5479 0.8769 0.9365 0.9565 0.9638 0.9638 0.9662 0.9682 0.9696 0.9691 

We can also use the fastai SGD optimizer in our custom learner.

linear_model = nn.Linear(PIXEL_SIZE, 1)
sgd = SGD(linear_model.parameters(), lr)
learn = CustomLearner(training=dls[0], validation=dls[1], neural_net=linear_model, optimizer=sgd, loss=mnist_loss, metrics=batch_accuracy)
learn.fit(10)
0.5635 0.8784 0.9374 0.9555 0.9638 0.9638 0.9657 0.9667 0.9682 0.9691 

Let’s create a custom model. This example creates a neural net of 19 layers (10 linear layers with 9 ReLU activations between them), just to see what would happen.

simple_net = nn.Sequential(
    nn.Linear(PIXEL_SIZE, 30),
    nn.ReLU(),
    nn.Linear(30, 30),
    nn.ReLU(),
    nn.Linear(30, 30),
    nn.ReLU(),
    nn.Linear(30, 30),
    nn.ReLU(),
    nn.Linear(30, 30),
    nn.ReLU(),
    nn.Linear(30, 30),
    nn.ReLU(),
    nn.Linear(30, 30),
    nn.ReLU(),
    nn.Linear(30, 30),
    nn.ReLU(),
    nn.Linear(30, 30),
    nn.ReLU(),
    nn.Linear(30, 1),
)
sgd = SGD(simple_net.parameters(), lr)
learn = CustomLearner(training=dls[0], validation=dls[1], neural_net=simple_net, optimizer=sgd, loss=mnist_loss, metrics=batch_accuracy)
learn.fit(10)
0.5068 0.5068 0.5068 0.5068 0.5068 0.5068 0.5068 0.5068 0.5068 0.5068 

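From our results, after 10 epochs it’s not accurate at all: the accuracy stays at 0.5068 on every epoch, so the model isn’t improving, and this is roughly what you would expect from always predicting the same digit.

If we redefine our neural net using just 3 layers, it’s much more accurate in 10 epochs.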

simple_net = nn.Sequential(
    nn.Linear(PIXEL_SIZE, 30),
    nn.ReLU(),
    nn.Linear(30, 1)
)
sgd = SGD(simple_net.parameters(), lr)
learn = CustomLearner(training=dls[0], validation=dls[1], neural_net=simple_net, optimizer=sgd, loss=mnist_loss, metrics=batch_accuracy)
learn.fit(10)
0.5068 0.8066 0.9189 0.9438 0.9574 0.9638 0.9657 0.9672 0.9687 0.9701 

Using a 3 layer net is much more accurate in 10 epochs.

Christopher Coverdale
Authors
Software Engineer and Consultant
Interested in all things Sci-Fi and Comp-Sci