This article is based on the official PyTorch tutorial "Build the Neural Network". It shows how to build a simple neural network with torch.nn, the PyTorch submodule that provides the building blocks for neural networks.

Run the complete tutorial in Codelab

torch.nn documentation

Build a neural network

Neural networks are composed of layers/modules that perform operations on data. torch.nn provides all the building blocks needed to construct a neural network, and every module in PyTorch is a subclass of nn.Module.
In the following sections, we will build a neural network that classifies inputs into 10 categories.

import os
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

Get a device for training

We want to train our model on a hardware accelerator such as a GPU. We can check whether a GPU is available through torch.cuda.

device = 'cuda' if torch.cuda.is_available() else 'cpu' # use the GPU if available, otherwise fall back to the CPU
print('Using {} device'.format(device)) # print which device is being used

Define the class

We define the neural network by subclassing nn.Module and initialize the network layers in __init__. Every nn.Module subclass implements the operations on input data in the forward method.

class NeuralNetwork(nn.Module):
    def __init__(self): # define the network structure
        super(NeuralNetwork, self).__init__()
        self.flatten = nn.Flatten()
        self.linear_relu_stack = nn.Sequential(
            nn.Linear(28*28, 512),
            nn.ReLU(),
            nn.Linear(512, 512),
            nn.ReLU(),
            nn.Linear(512, 10) # the last layer returns raw logits, so no activation here
        )

    def forward(self, x): # forward pass
        x = self.flatten(x)
        logits = self.linear_relu_stack(x)
        return logits

Before using the model, we need to instantiate it and move it to the device (the GPU, if available).

model = NeuralNetwork().to(device) # instantiate the model and move it to the device
print(model)

To use the model, we pass it input data. This executes the model's forward method; calling the model on the input returns the raw predicted values (logits) for each class, which we then convert to probabilities with nn.Softmax.

X = torch.rand(1, 28, 28, device=device)  # generate a random (1, 28, 28) input on the device
logits = model(X) # pass the input through the model
pred_probab = nn.Softmax(dim=1)(logits) # softmax maps the logits to probabilities in (0, 1)
y_pred = pred_probab.argmax(1) # the class with the highest probability
print(f"Predicted class: {y_pred}")

Description of the layers of the neural network

Next, we break the network down and describe the function of each layer in detail.

To illustrate this, we will pass a minibatch of 3 image samples of size 28x28 through the network.

input_image = torch.rand(3,28,28) # generate a random batch of shape (3, 28, 28)
print(input_image.size())

nn.Flatten layer

The Flatten layer is used to make the multi-dimensional input one-dimensional, and is commonly used in the transition from the convolutional layer to the fully connected layer.

The nn.Flatten layer converts each 28x28 image into a contiguous array of 784 ($28\times 28=784$) pixel values (the batch dimension at index 0 remains 3).

flatten = nn.Flatten() 
flat_image = flatten(input_image) # (3, 28, 28) is flattened to (3, 784)
print(flat_image.size())
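
As a small aside (not part of the original tutorial), nn.Flatten keeps dimension 0 by default because its defaults are start_dim=1 and end_dim=-1, so it is equivalent to calling torch.flatten with start_dim=1:

manual_flat = torch.flatten(input_image, start_dim=1) # also (3, 784)
print(torch.equal(manual_flat, flat_image))           # True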

nn.Linear layer

The nn.Linear layer (the linear layer) is a module that applies a linear transformation to the input using its stored weights and biases.

layer1 = nn.Linear(in_features=28*28, out_features=20) # input (3, 28*28) -> output (3, 20)
hidden1 = layer1(flat_image)
print(hidden1.size())
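
To make the weights and biases concrete, here is a quick check (not in the original tutorial): nn.Linear stores a weight of shape (out_features, in_features) and a bias of shape (out_features,), and computes $y = xW^T + b$:

print(layer1.weight.shape)  # torch.Size([20, 784])
print(layer1.bias.shape)    # torch.Size([20])
manual = flat_image @ layer1.weight.T + layer1.bias
print(torch.allclose(manual, hidden1))  # True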

nn.ReLU layer

In order to create a complex non-linear mapping between the inputs and outputs of the model, non-linear activation functions are needed. They are applied after linear transformations to introduce non-linearity and help the neural network learn a wide variety of mappings.

In this model, we use nn.ReLU between the linear layers, and other activation functions can also be used to introduce nonlinearity.

print(f"Before ReLU: {hidden1}\n\n")
hidden1 = nn.ReLU()(hidden1)
print(f"After ReLU: {hidden1}")

nn.Sequential layer

nn.Sequential is an ordered container of modules: data is passed through all the modules in the same order as they are defined. The model above uses it to chain its linear and ReLU layers, and it can also be used to put together a quick network, as in the sketch below.
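
The following sketch (the name seq_modules is illustrative, not from the article) chains the layers defined above into a small network that produces logits for the 3-image batch; note that it overwrites the earlier logits variable:

seq_modules = nn.Sequential(
    flatten,              # (3, 28, 28) -> (3, 784)
    layer1,               # (3, 784)    -> (3, 20)
    nn.ReLU(),
    nn.Linear(20, 10)     # (3, 20)     -> (3, 10) logits
)
input_image = torch.rand(3, 28, 28)
logits = seq_modules(input_image)
print(logits.shape)       # torch.Size([3, 10])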

nn.Softmax layer

The last linear layer of the neural network returns logits, raw values in the range $[-\infty,\infty]$. When these values are passed to the nn.Softmax module, the logits are scaled to the $[0,1]$ interval and represent the model's predicted probability for each class.

The dim parameter indicates the dimension along which the values must sum to 1.

softmax = nn.Softmax(dim=1)
pred_probab = softmax(logits)
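
A quick sanity check (not part of the original tutorial) confirms that softmax along dim=1 turns each row of logits into a probability distribution:

print(pred_probab.shape)       # same shape as logits
print(pred_probab.sum(dim=1))  # each entry is approximately 1.0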

Output the model structure and parameters

Many layers in the neural network are parameterized, that is, have associated weights and biases, and these parameters are iteratively optimized during training.

Subclassing nn.Module automatically tracks all the fields defined inside the model object, and all parameters can be accessed with the model's parameters() or named_parameters() methods.

We can iterate over the model's parameters and print the size and a preview of the values of each.

print("Model structure: ", model, "\n\n")

for name, param in model.named_parameters():
    print(f"Layer: {name} | Size: {param.size()} | Values : {param[:2]} \n")

For the full output, see the complete tutorial.

