Welcome to my GitHub
https://github.com/zq2599/blog_demos
Content: categorized summaries of all my original articles, with companion source code, covering Java, Docker, Kubernetes, DevOps, and more;
Overview of this article
- As the third chapter of the "DL4J in Practice" series, this article's goal is to build the classic LeNet-5 convolutional neural network under the DL4J framework, then train and test it on the MNIST dataset. It consists of the following parts:
- Introduction to LeNet-5
- Introduction to MNIST
- Introduction to the dataset
- About version and environment
- Coding
- Verification
Introduction to LeNet-5
- LeNet-5 is a convolutional neural network designed by Yann LeCun in 1998 for handwritten digit recognition; for example, many banks in the United States used it to read the handwritten digits on checks. It is one of the most representative early experimental convolutional neural network systems
- The LeNet-5 network structure is shown in the figure below, with seven layers in total: C1 -> S2 -> C3 -> S4 -> C5 -> F6 -> OUTPUT
- The following figure (original source: https://cuijiahua.com/blog/2018/01/dl_3.html) is clearer and can guide our coding on DL4J:
- A brief analysis of the figure above, which will guide the development that follows (the size arithmetic is also worked through in the sketch after this list):
- Each image is 28*28 with a single channel, so the input matrix is [1, 28, 28]
- C1 is a convolutional layer with a 5*5 kernel, stride 1, and 20 kernels, so the size becomes 28-5+1=24 (imagine sliding a window of width 5 across a width of 28 and counting how many positions it can take); the output matrix is [20, 24, 24]
- S2 is a pooling layer with a 2*2 kernel, stride 2, and MAX pooling; pooling halves the size to [20, 12, 12]
- C3 is a convolutional layer with a 5*5 kernel, stride 1, and 50 kernels, so the size becomes 12-5+1=8; the output matrix is [50, 8, 8]
- S4 is a pooling layer with a 2*2 kernel, stride 2, and MAX pooling; pooling halves the size to [50, 4, 4]
- C5 is a fully connected (FC) layer with 500 neurons, followed by the ReLU activation function
- The last layer is the fully connected output layer with 10 nodes representing the digits 0 to 9, using the softmax activation function
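- To make the size arithmetic above explicit: for a valid (no-padding) convolution or pooling, the output edge length is (in - kernel) / stride + 1. Below is a minimal, self-contained Java sketch (illustration only, not part of this project's source; the class and method names are made up) that reproduces the numbers in the list:
```java
public class LeNetShapeCheck {

    // Output edge length of a valid (no-padding) convolution or pooling window
    static int outSize(int in, int kernel, int stride) {
        return (in - kernel) / stride + 1;
    }

    public static void main(String[] args) {
        System.out.println("C1: " + outSize(28, 5, 1)); // 24, output [20, 24, 24]
        System.out.println("S2: " + outSize(24, 2, 2)); // 12, output [20, 12, 12]
        System.out.println("C3: " + outSize(12, 5, 1)); //  8, output [50, 8, 8]
        System.out.println("S4: " + outSize(8, 2, 2));  //  4, output [50, 4, 4]
    }
}
```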
Introduction to MNIST
- MNIST is a classic computer vision dataset originating from the National Institute of Standards and Technology (NIST). It contains handwritten digit images: 60,000 in the training set and 10,000 in the test set
- The MNIST training images come from the handwriting of 250 different people, 50% of them high school students and 50% Census Bureau staff; the test set has the same proportions
- MNIST official website: http://yann.lecun.com/exdb/mnist/
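- As an aside: besides the image-folder approach used in this article, DL4J also ships a built-in MnistDataSetIterator that downloads and vectorizes MNIST automatically. A minimal sketch of that alternative, for quick experiments only (not the approach taken below):
```java
import org.deeplearning4j.datasets.iterator.impl.MnistDataSetIterator;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;

import java.util.Arrays;

public class MnistBuiltInDemo {
    public static void main(String[] args) throws Exception {
        int batchSize = 54;
        int seed = 1234;
        // downloads MNIST on first use; true = training set, false = test set
        DataSetIterator trainIter = new MnistDataSetIterator(batchSize, true, seed);
        DataSetIterator testIter = new MnistDataSetIterator(batchSize, false, seed);
        // each batch holds flattened 28*28 images, i.e. shape [batchSize, 784]
        System.out.println(Arrays.toString(trainIter.next().getFeatures().shape()));
    }
}
```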
Introduction to the dataset
- The raw data downloaded from the MNIST official site is not image files; it must be parsed according to the official format specification before it can be converted into images. That is clearly not the topic of this article, so we can directly use a dataset that has already been prepared for DL4J (download addresses are given below). The dataset consists of individual image files, and the name of the directory containing each image is the digit shown in that image; for example, the directory <font color="blue">0</font> holds all images of the digit 0:
- There are two download addresses for this dataset:
- CSDN (0 points required): https://download.csdn.net/download/boling_cavalry/19846603
- GitHub: https://raw.githubusercontent.com/zq2599/blog_download_files/master/files/mnist_png.tar.gz
- After downloading and unzipping, you get a folder named <font color="blue">mnist_png</font>, which we will use later in the hands-on part; its expected layout is sketched below
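- Based on the description above, the unpacked directory should look roughly like this (a sketch, with most digit folders omitted):
```
mnist_png/
├── training/
│   ├── 0/    <- all training images of the digit 0
│   ├── 1/
│   ├── ...
│   └── 9/
└── testing/
    ├── 0/    <- all test images of the digit 0
    ├── ...
    └── 9/
```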
About the DL4J version
- The source code of the "DL4J in Practice" series uses Maven's parent-child project structure; the DL4J version is defined as <font color="red">1.0.0-beta7</font> in the parent project <font color="blue">dlfj-tutorials</font>
- Although the code in this article is still a subproject of <font color="blue">dlfj-tutorials</font>, it uses the lower DL4J version <font color="red">1.0.0-beta6</font>. The reason is that in the next article we will hand this article's training and testing over to the GPU, and the corresponding CUDA library is only available for <font color="red">1.0.0-beta6</font>
- That's enough background; let's start coding
Source download
- The complete source code for this article can be downloaded from GitHub; the addresses and link information are in the table below (https://github.com/zq2599/blog_demos):
| Name | Link | Remark |
| --- | --- | --- |
| Project homepage | https://github.com/zq2599/blog_demos | The project's homepage on GitHub |
| Git repository address (https) | https://github.com/zq2599/blog_demos.git | Repository address of the project source, https protocol |
| Git repository address (ssh) | git@github.com:zq2599/blog_demos.git | Repository address of the project source, ssh protocol |
- This Git project contains multiple folders; the source code of the "DL4J in Practice" series is under the <font color="blue">dl4j-tutorials</font> folder
- Under the <font color="blue">dl4j-tutorials</font> folder there are multiple subprojects; the code for this article is in the <font color="blue">simple-convolution</font> directory
Coding
- Create a new subproject named <font color="red">simple-convolution</font> under the parent project <font color="blue">dlfj-tutorials</font>. Its pom.xml is shown below; note that the DL4J version here is pinned to <font color="red">1.0.0-beta6</font>:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<parent>
<artifactId>dlfj-tutorials</artifactId>
<groupId>com.bolingcavalry</groupId>
<version>1.0-SNAPSHOT</version>
</parent>
<modelVersion>4.0.0</modelVersion>
<artifactId>simple-convolution</artifactId>
<properties>
<dl4j-master.version>1.0.0-beta6</dl4j-master.version>
</properties>
<dependencies>
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
</dependency>
<dependency>
<groupId>ch.qos.logback</groupId>
<artifactId>logback-classic</artifactId>
</dependency>
<dependency>
<groupId>org.deeplearning4j</groupId>
<artifactId>deeplearning4j-core</artifactId>
<version>${dl4j-master.version}</version>
</dependency>
<dependency>
<groupId>org.nd4j</groupId>
<artifactId>${nd4j.backend}</artifactId>
<version>${dl4j-master.version}</version>
</dependency>
</dependencies>
</project>
```
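- Note that the nd4j backend dependency is referenced through the <font color="blue">${nd4j.backend}</font> property, which is defined in the parent project rather than in this pom. For CPU training it would typically resolve to the native backend; a hedged sketch of what the parent pom presumably declares (the actual value is not shown in this article):
```xml
<!-- Presumed property in the parent pom (dlfj-tutorials); not shown in this article -->
<properties>
    <!-- CPU backend; a GPU build would use an nd4j-cuda-10.x artifact instead -->
    <nd4j.backend>nd4j-native</nd4j.backend>
</properties>
```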
- Next, implement the code according to the earlier analysis. Detailed comments are already included, so the explanation is not repeated here:
```java
package com.bolingcavalry.convolution;
import lombok.extern.slf4j.Slf4j;
import org.datavec.api.io.labels.ParentPathLabelGenerator;
import org.datavec.api.split.FileSplit;
import org.datavec.image.loader.NativeImageLoader;
import org.datavec.image.recordreader.ImageRecordReader;
import org.deeplearning4j.datasets.datavec.RecordReaderDataSetIterator;
import org.deeplearning4j.nn.conf.MultiLayerConfiguration;
import org.deeplearning4j.nn.conf.NeuralNetConfiguration;
import org.deeplearning4j.nn.conf.inputs.InputType;
import org.deeplearning4j.nn.conf.layers.ConvolutionLayer;
import org.deeplearning4j.nn.conf.layers.DenseLayer;
import org.deeplearning4j.nn.conf.layers.OutputLayer;
import org.deeplearning4j.nn.conf.layers.SubsamplingLayer;
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.nn.weights.WeightInit;
import org.deeplearning4j.optimize.listeners.ScoreIterationListener;
import org.deeplearning4j.util.ModelSerializer;
import org.nd4j.evaluation.classification.Evaluation;
import org.nd4j.linalg.activations.Activation;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;
import org.nd4j.linalg.dataset.api.preprocessor.DataNormalization;
import org.nd4j.linalg.dataset.api.preprocessor.ImagePreProcessingScaler;
import org.nd4j.linalg.learning.config.Nesterovs;
import org.nd4j.linalg.lossfunctions.LossFunctions;
import org.nd4j.linalg.schedule.MapSchedule;
import org.nd4j.linalg.schedule.ScheduleType;
import java.io.File;
import java.util.HashMap;
import java.util.Map;
import java.util.Random;
@Slf4j
public class LeNetMNISTReLu {
    // Base directory for the dataset files; adjust to your environment
    // private static final String BASE_PATH = System.getProperty("java.io.tmpdir") + "/mnist";
    private static final String BASE_PATH = "E:\\temp\\202106\\26";

    public static void main(String[] args) throws Exception {
        // image height in pixels
        int height = 28;
        // image width in pixels
        int width = 28;
        // the images are black and white, so there is only one color channel
        int channels = 1;
        // number of classes: the ten digits 0-9
        int outputNum = 10;
        // batch size
        int batchSize = 54;
        // number of epochs
        int nEpochs = 1;
        // seed for the pseudo random number generator
        int seed = 1234;
        // random number generator
        Random randNumGen = new Random(seed);

        log.info("Checking whether the dataset folder exists: {}", BASE_PATH + "/mnist_png");
        if (!new File(BASE_PATH + "/mnist_png").exists()) {
            log.info("Dataset not found; please download the archive and unzip it to: {}", BASE_PATH);
            return;
        }

        // label generator: uses each file's parent directory name as its label
        ParentPathLabelGenerator labelMaker = new ParentPathLabelGenerator();
        // normalization config (scales pixel values from 0-255 to 0-1)
        DataNormalization imageScaler = new ImagePreProcessingScaler();

        // The initialization steps are the same for the training and the test set:
        // 1. read the images, in NCHW format
        // 2. create an iterator with the given batch size
        // 3. attach the normalizer as a pre-processor
        log.info("Vectorizing the training set...");
        // initialize the training set
        File trainData = new File(BASE_PATH + "/mnist_png/training");
        FileSplit trainSplit = new FileSplit(trainData, NativeImageLoader.ALLOWED_FORMATS, randNumGen);
        ImageRecordReader trainRR = new ImageRecordReader(height, width, channels, labelMaker);
        trainRR.initialize(trainSplit);
        DataSetIterator trainIter = new RecordReaderDataSetIterator(trainRR, batchSize, 1, outputNum);
        // fit the data (the implementation actually does nothing here)
        imageScaler.fit(trainIter);
        trainIter.setPreProcessor(imageScaler);

        log.info("Vectorizing the test set...");
        // initialize the test set, same steps as for the training set
        File testData = new File(BASE_PATH + "/mnist_png/testing");
        FileSplit testSplit = new FileSplit(testData, NativeImageLoader.ALLOWED_FORMATS, randNumGen);
        ImageRecordReader testRR = new ImageRecordReader(height, width, channels, labelMaker);
        testRR.initialize(testSplit);
        DataSetIterator testIter = new RecordReaderDataSetIterator(testRR, batchSize, 1, outputNum);
        testIter.setPreProcessor(imageScaler); // same normalization for better results

        log.info("Configuring the neural network");
        // during training, the learning rate drops stepwise as the iterations progress
        Map<Integer, Double> learningRateSchedule = new HashMap<>();
        learningRateSchedule.put(0, 0.06);
        learningRateSchedule.put(200, 0.05);
        learningRateSchedule.put(600, 0.028);
        learningRateSchedule.put(800, 0.0060);
        learningRateSchedule.put(1000, 0.001);

        // hyperparameters
        MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
            .seed(seed)
            // L2 regularization coefficient
            .l2(0.0005)
            // learning rate schedule for gradient descent
            .updater(new Nesterovs(new MapSchedule(ScheduleType.ITERATION, learningRateSchedule)))
            // weight initialization
            .weightInit(WeightInit.XAVIER)
            // start the layer list
            .list()
            // convolutional layer (C1)
            .layer(new ConvolutionLayer.Builder(5, 5)
                .nIn(channels)
                .stride(1, 1)
                .nOut(20)
                .activation(Activation.IDENTITY)
                .build())
            // subsampling, i.e. pooling (S2)
            .layer(new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
                .kernelSize(2, 2)
                .stride(2, 2)
                .build())
            // convolutional layer (C3)
            .layer(new ConvolutionLayer.Builder(5, 5)
                .stride(1, 1) // nIn need not be specified in later layers
                .nOut(50)
                .activation(Activation.IDENTITY)
                .build())
            // subsampling, i.e. pooling (S4)
            .layer(new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX)
                .kernelSize(2, 2)
                .stride(2, 2)
                .build())
            // dense, i.e. fully connected, layer (C5)
            .layer(new DenseLayer.Builder().activation(Activation.RELU)
                .nOut(500)
                .build())
            // output layer
            .layer(new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
                .nOut(outputNum)
                .activation(Activation.SOFTMAX)
                .build())
            .setInputType(InputType.convolutionalFlat(height, width, channels)) // InputType.convolutional for normal images
            .build();

        MultiLayerNetwork net = new MultiLayerNetwork(conf);
        net.init();
        // print the loss once every ten iterations
        net.setListeners(new ScoreIterationListener(10));
        log.info("The network has [{}] parameters in total", net.numParams());

        long startTime = System.currentTimeMillis();
        // training loop
        for (int i = 0; i < nEpochs; i++) {
            log.info("Epoch [{}]", i);
            net.fit(trainIter);
            Evaluation eval = net.evaluate(testIter);
            log.info(eval.stats());
            trainIter.reset();
            testIter.reset();
        }
        log.info("Training and testing finished, took [{}] milliseconds", System.currentTimeMillis() - startTime);

        // save the model
        File ministModelPath = new File(BASE_PATH + "/minist-model.zip");
        ModelSerializer.writeModel(net, ministModelPath, true);
        log.info("The latest MNIST model is saved at [{}]", ministModelPath.getPath());
    }
}
```
- Running the above code produces the log output below; training and testing complete successfully, and the accuracy reaches 0.9886:
```
21:19:15.355 [main] INFO org.deeplearning4j.optimize.listeners.ScoreIterationListener - Score at iteration 1110 is 0.18300625613640034
21:19:15.365 [main] DEBUG org.nd4j.linalg.dataset.AsyncDataSetIterator - Manually destroying ADSI workspace
21:19:16.632 [main] DEBUG org.nd4j.linalg.dataset.AsyncDataSetIterator - Manually destroying ADSI workspace
21:19:16.642 [main] INFO com.bolingcavalry.convolution.LeNetMNISTReLu -
========================Evaluation Metrics========================
# of classes: 10
Accuracy: 0.9886
Precision: 0.9885
Recall: 0.9886
F1 Score: 0.9885
Precision, recall & F1: macro-averaged (equally weighted avg. of 10 classes)
=========================Confusion Matrix=========================
0 1 2 3 4 5 6 7 8 9
---------------------------------------------------
972 0 0 0 0 0 2 2 2 2 | 0 = 0
0 1126 0 3 0 2 1 1 2 0 | 1 = 1
1 1 1019 2 0 0 0 6 3 0 | 2 = 2
0 0 1 1002 0 5 0 1 1 0 | 3 = 3
0 0 2 0 971 0 3 2 1 3 | 4 = 4
0 0 0 3 0 886 2 1 0 0 | 5 = 5
6 2 0 1 1 5 942 0 1 0 | 6 = 6
0 1 6 0 0 0 0 1015 1 5 | 7 = 7
1 0 1 1 0 2 0 2 962 5 | 8 = 8
1 2 1 3 5 3 0 2 1 991 | 9 = 9
Confusion matrix format: Actual (rowClass) predicted as (columnClass) N times
==================================================================
21:19:16.643 [main] INFO com.bolingcavalry.convolution.LeNetMNISTReLu - Training and testing finished, took [27467] milliseconds
21:19:17.019 [main] INFO com.bolingcavalry.convolution.LeNetMNISTReLu - The latest MNIST model is saved at [E:\temp\202106\26\minist-model.zip]
Process finished with exit code 0
```
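- Since the trained model has been serialized to disk, it can later be reloaded for inference without retraining. This is not part of this article's source code; a minimal sketch using DL4J's ModelSerializer (the path is the one used above):
```java
import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.deeplearning4j.util.ModelSerializer;

import java.io.File;

public class LoadSavedModel {
    public static void main(String[] args) throws Exception {
        File modelFile = new File("E:\\temp\\202106\\26\\minist-model.zip");
        // true: also restore the updater state, so training could be resumed
        MultiLayerNetwork net = ModelSerializer.restoreMultiLayerNetwork(modelFile, true);
        System.out.println("Loaded model with " + net.numParams() + " parameters");
        // net.output(features) would then return the softmax probabilities
        // for a batch of flattened 28*28 inputs, i.e. shape [n, 784]
    }
}
```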
About accuracy
- The test above reports an accuracy of <font color="blue">0.9886</font>, which is the result of training with DL4J version <font color="red">1.0.0-beta6</font>. If you switch to <font color="red">1.0.0-beta7</font>, the accuracy can reach <font color="blue">0.99</font> or higher; feel free to try it;
- This completes our hands-on session with a classic convolutional network under the DL4J framework. So far, all training and testing has been done by the CPU, whose usage rises noticeably during the run. In the next article we will hand today's work over to the GPU and see whether CUDA can accelerate training and testing;
You are not alone; Xinchen is with you all the way
- Java series
- Spring series
- Docker series
- Kubernetes series
- Database + middleware series
- DevOps series
Welcome to follow my WeChat official account: Programmer Xinchen
Search for "Programmer Xinchen" on WeChat. I am Xinchen, and I look forward to exploring the Java world with you...
https://github.com/zq2599/blog_demos