My Notes on Learning PyTorch

I am trying to learn PyTorch. Below are my personal notes, designed to help me (and hopefully others at my skill level). I don’t promise these notes are accurate; they only represent my understanding at the time of making this post.

This is a living document, and is likely to change over time.

Learning in General

Learning a new subject mostly consists of two things: learning jargon and chunking processes.


Jargon

Like a lot of highly technical topics, PyTorch is filled with jargon. This makes communicating with other domain experts more efficient, but it also makes the topic more difficult to learn. Complicating this further, this topic is layered on top of jargon from computer science, linear algebra, and deep learning.

I won’t give a list of all the words you need to learn. I’m still learning myself. But efficient memorization and retention of the jargon will make learning the overall concepts much easier.

To memorize terms quickly, I highly recommend spaced repetition software like Anki. If you’ve never heard of spaced repetition, check out this blog post: Spaced Repetition for Efficient Learning. I’ve used it to memorize all sorts of information, from math symbols and the functions required for certain PyTorch modules to personal information like birthdays, wifi passwords, and phone numbers.

When I find a term I don’t know, I Google its definition and put it on a flashcard in Anki. Every morning, I review my flashcards. If I remember the information on a card, I see that card less and less often over time; if I don’t remember it, I see it more frequently.
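The "see it less often if remembered, more often if forgotten" idea can be sketched in a few lines of plain Python. This is a toy box-based scheduler with made-up intervals (`INTERVALS`, `Card`, and `review` are hypothetical names); it is not how Anki actually schedules cards.

```python
from dataclasses import dataclass

# Hypothetical review intervals in days, one per "box".
INTERVALS = [1, 2, 4, 8, 16]


@dataclass
class Card:
    front: str
    back: str
    box: int = 0  # index into INTERVALS


def review(card: Card, remembered: bool) -> int:
    """Update the card's box and return the days until its next review."""
    if remembered:
        # Remembered cards move up a box, so they come back later.
        card.box = min(card.box + 1, len(INTERVALS) - 1)
    else:
        # Forgotten cards go back to the first box and come back soon.
        card.box = 0
    return INTERVALS[card.box]


card = Card("tensor", "an n-dimensional array")
print(review(card, remembered=True))
print(review(card, remembered=True))
print(review(card, remembered=False))
```

Each successful review pushes the next one further out, and a single miss resets the card to the shortest interval.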


Chunking

I learned of chunking in Barbara Oakley’s course titled Learning How to Learn. Chunking is the encapsulation of a complex set of processes into a broader and “chunked together” process.

Think of the processes required to put on a shirt. You must have a sense of cloth dynamics, understand how to orient the shirt without bunching it, hold your arms in a certain way to keep the cloth tight so that you can find the holes easily, hold the shirt in a way so that it opens up and you can put your arms in, and so on and so forth.

However, we have put a shirt on every day for years. We have done it so often that it just comes naturally. The process of putting on a shirt becomes so second-nature that we can do it without any real conscious thought beyond “put on shirt.” This process has become chunked.

Chunking is important for math. Much of math is simply chunking fundamental processes so that you can learn higher-level processes. If we had to build a mental model of all the objects involved when we add numbers, we would never be able to do more complex tasks like finding the solutions to a quadratic equation.

Chunking requires three things: focus, understanding, and practice. Most people who complete online courses do okay at focusing and understanding, but they often completely ignore practice. Practice over time is required to chunk a process.

Chunking is not binary. Processes are not either chunked or not-chunked. Learning is a process, and it takes time. The neurons and axons used to execute a process need time to grow, and the forgetting curve mentioned in Gwern’s Spaced Repetition post shows that the process must be practiced repeatedly over time, especially in the first days and weeks of learning.

Deep Learning

Fundamentally, deep learning is all about the computer making functions on its own. (A function is something that takes one or more inputs, modifies them somehow, and provides an output.) You provide the input and desired output, and the computer makes a function that modifies the input in a way that brings its output closer to the desired output. Do this hundreds or thousands of times with a lot of related data, and you end up with a function that gets close to what you want.

To make the function, deep learning libraries use neural networks. A neural network consists of neurons connected by parameters. A neuron is just a container for data, and its structure doesn’t have to match the structure of the input data. Input data flows from neuron to neuron toward the output, being modified by the parameters along the way.

In deep learning frameworks, the function that is modified is called the model. When we train the model, we feed it input and the desired output, and the deep learning framework modifies the parameters so that the actual output of the function gets closer to the desired output. The difference between the actual output and the desired output is measured by a loss function. The calculated loss is then used with back-propagation to modify the parameters, modifying the function and making it (hopefully) more accurate.

Figure (from Wikipedia’s page on Artificial Neural Networks): the white dots are neurons, and the colored lines are parameters.
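To make the "feed input and desired output, measure the loss, adjust the parameters" cycle concrete, here is a minimal sketch in plain Python with no neural network at all. It assumes a function with just two parameters, `w` and `b`, a mean squared error loss, and hand-derived gradients; the data and the learning rate are made up for illustration.

```python
# Toy data generated from the "desired" function y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]

w, b = 0.0, 0.0        # parameters, starting from an arbitrary guess
learning_rate = 0.05   # how big each adjustment is

for step in range(2000):
    # Forward pass: what the current function outputs for each input.
    preds = [w * x + b for x in xs]

    # Loss: mean squared difference between actual and desired output.
    errors = [p - y for p, y in zip(preds, ys)]
    loss = sum(e * e for e in errors) / len(xs)

    # Gradients of the loss with respect to each parameter.
    grad_w = sum(2 * e * x for e, x in zip(errors, xs)) / len(xs)
    grad_b = sum(2 * e for e in errors) / len(xs)

    # Nudge each parameter in the direction that lowers the loss.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"w={w:.3f}, b={b:.3f}, loss={loss:.6f}")
```

The parameters should end up close to 2 and 1. A deep learning framework does the same thing with many more parameters and computes the gradients automatically through back-propagation.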

To start practicing deep learning, I recommend playing with and reading through their docs a lot. is a framework that sits on top of PyTorch and abstracts a lot of the complicated stuff, making common deep learning tasks like image categorization, image segmentation, and some other tasks much easier. Once you kind of understand it and you start hitting its limitations, it makes sense to start learning PyTorch. This is what I did, at least.


PyTorch

I have only started learning PyTorch, so this section is the most likely to be inaccurate. I will write with confidence, but know that this confidence isn’t warranted. Most of what I write about is a rewrite of information in their tutorials.

At the bottom of the Learn The Basics page on PyTorch’s website, there are links numbered 0-7. Going through them in order will help you grasp how PyTorch works overall.

Copying and pasting all my code will not work. I’ve tried to focus on explaining concepts, not making something usable. If you copy and paste all the code from this page in the PyTorch tutorials, it should work.

PyTorch has three overarching tasks: load data, create a model, and train the model. The model can then be saved and loaded as needed.
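As a rough sketch of the save-and-load step, the snippet below stores a model's parameters with `torch.save` and restores them with `load_state_dict`. The `nn.Linear(4, 2)` model is only a stand-in for a trained model, and the file name is made up.

```python
import os
import tempfile

import torch
from torch import nn

model = nn.Linear(4, 2)  # stand-in for a trained model

# Hypothetical file name inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "model_weights.pth")

# Save only the parameters (the "state dict"), not the whole object.
torch.save(model.state_dict(), path)

# To load, build the same architecture first, then fill in the parameters.
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(path))
restored.eval()  # switch to inference mode

same = all(
    torch.equal(a, b)
    for a, b in zip(model.state_dict().values(), restored.state_dict().values())
)
print(same)
```

Saving the state dict means the class definition still has to be available when loading, which is why `restored` is constructed before the weights are read back.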

Data Loading (Tutorials > Datasets & DataLoaders)

A Dataset is a PyTorch object that does the following:

  • Points to an entire dataset
  • Tells how to transform the data upon loading, if needed
  • Tells how to load a single datum (including its label)

A Dataset object requires three functions: __init__, __len__, and __getitem__. The data is pointed to in __init__, and it is transformed if needed within __getitem__.

Notice below that __getitem__ simply returns a dict containing a tensor and a label. (A tensor is an n-dimensional array, a generalization of a matrix. The image file is converted to a tensor using the torchvision function read_image.)

# Copied from

import os
import pandas as pd
from torch.utils.data import Dataset
from torchvision.io import read_image

class CustomImageDataset(Dataset):
    def __init__(self, annotations_file, img_dir, transform=None, target_transform=None):
        self.img_labels = pd.read_csv(annotations_file)
        self.img_dir = img_dir
        self.transform = transform
        self.target_transform = target_transform

    def __len__(self):
        return len(self.img_labels)

    def __getitem__(self, idx):
        img_path = os.path.join(self.img_dir, self.img_labels.iloc[idx, 0])
        image = read_image(img_path)
        label = self.img_labels.iloc[idx, 1]
        if self.transform:
            image = self.transform(image)
        if self.target_transform:
            label = self.target_transform(label)
        sample = {"image": image, "label": label}
        return sample
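To try the three required functions without any image files, here is a minimal sketch of a Dataset that keeps its data in memory. `SquaresDataset` is a made-up toy class: each input is a number and its label is that number squared.

```python
import torch
from torch.utils.data import Dataset


class SquaresDataset(Dataset):
    """Hypothetical toy dataset: the input is n, the label is n squared."""

    def __init__(self, n_items):
        # "Point to" the data. Here it is simply created in memory.
        self.inputs = torch.arange(n_items, dtype=torch.float32)

    def __len__(self):
        # Number of items in the whole dataset.
        return len(self.inputs)

    def __getitem__(self, idx):
        # Load a single datum and its label.
        x = self.inputs[idx]
        return x, x ** 2


dataset = SquaresDataset(5)
print(len(dataset))
x, y = dataset[3]
print(x.item(), y.item())
```

This version returns an (input, label) tuple rather than a dict. As far as I can tell, the tuple form is what lets a loop like `for X, y in dataloader` unpack each batch directly.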

A DataLoader is an object that retrieves data from a Dataset object in batches. It can reshuffle the data for each pass through the dataset (called an epoch), which helps reduce overfitting. It’s relatively simple, and is just an iterable.

# Copied from
from torch.utils.data import DataLoader

train_dataloader = DataLoader(training_data, batch_size=64, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=64, shuffle=True)
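Here is a small sketch of what iterating over a DataLoader looks like. It assumes ten fake samples wrapped in a `TensorDataset` (a built-in Dataset for tensors already in memory) and a batch size of 4, both chosen only for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 10 fake samples with 3 features each, plus an integer label per sample.
features = torch.randn(10, 3)
labels = torch.arange(10)
dataset = TensorDataset(features, labels)

loader = DataLoader(dataset, batch_size=4, shuffle=True)

batch_shapes = []
for X, y in loader:
    # X holds a batch of inputs, y the matching batch of labels.
    batch_shapes.append((tuple(X.shape), tuple(y.shape)))

print(len(loader))    # number of batches per epoch
print(batch_shapes)
```

With 10 samples and a batch size of 4, there are three batches per epoch, and the last one only holds the 2 leftover samples.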

Model Creation (Tutorials > Build Model)

Just a reminder: Deep learning is simply providing software with input and the desired output. The software then makes a function that gets as close to the desired output as possible. A model is the function that the deep learning library modifies based on the input and desired output.

A model in PyTorch is a subclass of nn.Module, and must contain an __init__ function and a forward function. __init__ defines the structure of the neural network. The forward function defines how data propagates through the neural network from input to output.

# Copied from
# refer to this for the statements below
from torch import nn

class NeuralNetwork(nn.Module):
    def __init__(self):
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        # Linear layers with a ReLU activation between them
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

The specifics of everything they do in the object above are best found on this website. The overview is that upon initialization, the object creates a nn.Sequential object, defined as “A sequential container. Modules will be added to it in the order they are passed in the constructor.” Contained within the Sequential object is a series of layers. Each nn.Linear layer holds the parameters that connect one group of neurons to the next. The sizes of these layers determine how the parameters interconnect, how data flows and is modified from the input to the output, and the structure of the output.
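To see how the layer sizes shape the data, the sketch below builds a model like the one above and pushes a batch of random "images" through it. The ReLU layers are my assumption based on the name linear_relu_stack, and the random input is only a stand-in for real data.

```python
import torch
from torch import nn


class NeuralNetwork(nn.Module):
    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28 * 28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10),
        )

    def forward(self, x):
        # Flatten each 28x28 image into 784 values, then run the layers.
        return self.linear_relu_stack(self.flatten(x))


model = NeuralNetwork()
fake_images = torch.rand(3, 28, 28)  # batch of 3 random 28x28 "images"
logits = model(fake_images)          # calling the model runs forward()

# Count every weight and bias in the network.
n_params = sum(p.numel() for p in model.parameters())
print(logits.shape)
print(n_params)
```

Three images go in and three rows of 10 scores come out, one score per class. The parameter count comes entirely from the three nn.Linear layers: each one has (inputs × outputs) weights plus one bias per output.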

Model Training (Tutorials > Optimization)

You now have a dataloader that provides the input (image) and desired output (label). You have a model containing interconnected neurons. What’s left now is to train the model.

Training means that input data will be passed into the neural network. The output data from that network will be compared to the desired output using a loss function. Based on the difference between the desired output and the actual output, the model’s parameters will be modified using back propagation. How that works is discussed here, but you don’t have to fully understand it to be able to start playing with it.

Training requires defining a loss function and an optimization algorithm. The loss function is a measure of the difference between the desired output and the actual output. The optimization algorithm used by the optimizer is the method used to modify the parameters of the model.
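Both pieces can be tried in isolation. The sketch below first computes a cross-entropy loss for a single made-up prediction and checks it against the formula by hand, then takes one SGD step on a single parameter `w` with a toy loss of w squared. All the numbers are invented for illustration.

```python
import math

import torch
from torch import nn

# Loss function: cross-entropy for one sample with three class scores.
logits = torch.tensor([[2.0, 0.5, 0.1]])
target = torch.tensor([0])  # the correct class is index 0
loss = nn.CrossEntropyLoss()(logits, target)

# The same value by hand: -log(softmax(logits)[target]).
exps = [math.exp(v) for v in (2.0, 0.5, 0.1)]
manual = -math.log(exps[0] / sum(exps))

# Optimizer: one SGD step on a single parameter, with loss = w^2.
w = torch.tensor(1.0, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)
(w ** 2).backward()   # gradient of w^2 at w=1 is 2
optimizer.step()      # w becomes 1.0 - 0.1 * 2.0

print(loss.item(), manual, w.item())
```

Plain SGD just subtracts the learning rate times the gradient from each parameter. Other optimizers in torch.optim use the same `step()` interface with different update rules.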

Below is an example of loops for training and testing. A training loop is used to modify the parameters. A test loop is used to determine if the model is improving or not.

The training loop iterates through the entire dataset one batch at a time. For each batch, it gets predictions from the model, then calculates the loss from those predictions. The gradients left over from the previous batch are zeroed out using optimizer.zero_grad(). The loss is backpropagated using loss.backward(), then the parameters are modified using optimizer.step().

# Copied from

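import torch
from torch import nn

learning_rate = 1e-3  # example value; how much the parameters change per step

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
loss_fn = nn.CrossEntropyLoss()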

def train_loop(dataloader, model, loss_fn, optimizer):
    size = len(dataloader.dataset)
    for batch, (X, y) in enumerate(dataloader):
        # Compute prediction and loss
        pred = model(X)
        loss = loss_fn(pred, y)

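        # Backpropagation
        optimizer.zero_grad()  # clear gradients left over from the last batch
        loss.backward()        # compute gradients of the loss
        optimizer.step()       # update the parameters using those gradients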

        if batch % 100 == 0:
            loss, current = loss.item(), batch * len(X)
            print(f"loss: {loss:>7f}  [{current:>5d}/{size:>5d}]")
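The reason gradients are zeroed for every batch is that backward() adds to whatever gradient is already stored instead of replacing it. Here is a minimal sketch with a single made-up parameter `w` that shows the difference:

```python
import torch

w = torch.tensor(1.0, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

(3 * w).backward()
first = w.grad.item()        # gradient of 3*w with respect to w

(3 * w).backward()           # no zeroing in between: gradients add up
accumulated = w.grad.item()

optimizer.zero_grad()        # clear the stored gradient
(3 * w).backward()
after_reset = w.grad.item()

print(first, accumulated, after_reset)
```

The second backward() call doubles the stored gradient, while the call after zero_grad() gives the same value as the first one. Without zeroing, each optimizer step would be based on a mix of gradients from the current and earlier batches.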

The test loop just checks how inaccurate the model is.

def test_loop(dataloader, model, loss_fn):
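    size = len(dataloader.dataset)   # number of samples
    num_batches = len(dataloader)    # number of batches
    test_loss, correct = 0, 0

    # No gradients are needed when only evaluating the model
    with torch.no_grad():
        for X, y in dataloader:
            pred = model(X)
            test_loss += loss_fn(pred, y).item()
            correct += (pred.argmax(1) == y).type(torch.float).sum().item()

    # loss_fn returns a per-batch average, so average over batches;
    # correct counts samples, so divide by the number of samples
    test_loss /= num_batches
    correct /= size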
    print(f"Test Error: \n Accuracy: {(100*correct):>0.1f}%, Avg loss: {test_loss:>8f} \n")

Finally, the training and test loops are cycled through several times. Each cycle is called an epoch.

loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

epochs = 10
for t in range(epochs):
    print(f"Epoch {t+1}\n-------------------------------")
    train_loop(train_dataloader, model, loss_fn, optimizer)
    test_loop(test_dataloader, model, loss_fn)
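The accuracy line inside test_loop is dense, so here it is pulled apart on a tiny example. The scores and labels below are made up: four samples, three classes.

```python
import torch

# Model outputs (one row of class scores per sample) and the true labels.
pred = torch.tensor([
    [2.0, 0.1, 0.3],
    [0.2, 1.5, 0.1],
    [0.3, 0.2, 0.9],
    [1.2, 0.4, 0.8],
])
y = torch.tensor([0, 1, 1, 0])

# argmax(1) picks the index of the highest score in each row.
predicted_classes = pred.argmax(1)

# Compare with the labels, turn True/False into 1.0/0.0, and add them up.
correct = (predicted_classes == y).type(torch.float).sum().item()
accuracy = correct / len(y)

print(predicted_classes.tolist(), correct, accuracy)
```

The third sample is predicted as class 2 but labeled 1, so three of the four predictions count as correct.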
