Welcome to my GitHub
https://github.com/zq2599/blog_demos
Content: a categorized index of all my original articles and their companion source code, covering Java, Docker, Kubernetes, DevOps, and more.
Overview of this article
- As the fourth installment of the "DL4J Actual Combat" series, this article does not involve writing code; instead, it prepares for the upcoming hands-on work: using a GPU to accelerate the deep learning training process under the DL4J framework.
- If your computer has an NVIDIA graphics card and CUDA is installed successfully, follow along with the steps in this article. The full text consists of the following sections:
- Software and hardware environment reference information
- DL4J's dependent libraries and versions
- Specific steps to use GPU
- GPU training and CPU training comparison
Software and hardware environment reference information
- As you may know, Xinchen is not a wealthy man, so the only computer with an NVIDIA graphics card is a worn-out Lenovo laptop. Its specifications are as follows:
- Operating system: Ubuntu 16 desktop edition
- Graphics card: GTX 950M
- CUDA: 9.2
- CPU: i5-6300HQ
- Memory: 32GB DDR4
- Hard disk: 1TB NVMe
- It turns out that this configuration can smoothly run the example from "DL4J Actual Combat 3: Classic Convolution Example (LeNet-5)", and training can be accelerated by the GPU (comparison data for GPU vs. CPU is given later)
- For installing the NVIDIA driver and CUDA 9.2 on Ubuntu 16, please refer to the article "Installing CUDA (9.1) and cuDNN on Pure Ubuntu16"; the CUDA version installed there is 9.1, so change it to 9.2 yourself
DL4J's dependent libraries and versions
- The first thing to emphasize: do not use CUDA 11.2 (the version reported when nvidia-smi is executed). As of this writing, using CUDA 11.2 and its corresponding dependency libraries causes ClassNotFound exceptions at startup
- I have not tried the CUDA 10.x versions, so I cannot comment on them
- Both CUDA 9.1 and 9.2 have been tried and work normally
- So why not use 9.1? Let's first check the Maven Central repository for the version of the DL4J core library; as shown in the figure below, the latest version is <font color="blue">1.0.0-M1</font>:
- Next, look at the nd4j library corresponding to CUDA 9.1, shown in the red box below: the latest is <font color="blue">1.0.0-beta</font>, released in 2018, which lags far behind the core library:
- Now look at the nd4j library corresponding to CUDA 9.2, shown in the red box below: the latest is <font color="blue">1.0.0-beta6</font>, only two versions behind the core library. Therefore, CUDA 9.2 is recommended:
Specific steps to use GPU
- Whether you use the CPU or the GPU, the steps are very simple: just switch between the corresponding dependency libraries, as described below
- If you use the CPU for training, the dependencies and versions are as follows:
<!-- Core library, required for both CPU and GPU -->
<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>deeplearning4j-core</artifactId>
    <version>1.0.0-beta6</version>
</dependency>
<!-- Required for CPU -->
<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-native</artifactId>
    <version>1.0.0-beta6</version>
</dependency>
- If you use the GPU for training and the CUDA version is 9.2, the dependencies and versions are as follows:
<!-- Core library, required for both CPU and GPU -->
<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>deeplearning4j-core</artifactId>
    <version>1.0.0-beta6</version>
</dependency>
<!-- Required for GPU -->
<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>deeplearning4j-cuda-9.2</artifactId>
    <version>1.0.0-beta6</version>
</dependency>
<!-- Required for GPU -->
<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-cuda-9.2-platform</artifactId>
    <version>1.0.0-beta6</version>
</dependency>
- The Java code is not posted here; the code from "DL4J Actual Combat 3: Classic Convolution Example (LeNet-5)" is used without any changes.
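- If you want to double-check at runtime which backend was actually picked up after switching dependencies, a minimal sketch like the one below can help; the class name BackendCheck is just an illustrative example, and ND4J also prints its backend information in the startup log (as shown in the GPU section later):

import org.nd4j.linalg.factory.Nd4j;

public class BackendCheck {
    public static void main(String[] args) {
        // Creating any INDArray forces ND4J to initialize its backend
        // (nd4j-native for CPU, nd4j-cuda-9.2-platform for GPU)
        System.out.println(Nd4j.ones(2, 2));
        // Print the backend class that was actually loaded,
        // e.g. a native (CPU) backend vs. a JCublas (CUDA) backend
        System.out.println("ND4J backend: " + Nd4j.getBackend().getClass().getName());
    }
}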
Memory settings
- When running the code from IDEA, you can increase the JVM memory as appropriate for your hardware; the steps are shown below (a sample setting is given after this list)
- Please adjust the value to suit your machine; I set it to 8GB here
- With that configured, run training and testing with the CPU and then the GPU on the same computer, and compare the results to see the GPU acceleration effect
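For reference, this is a minimal sketch of what the "VM options" field of an IDEA Run/Debug Configuration might contain; the exact values are illustrative assumptions (the 8GB maximum heap matches the setting mentioned above), so adjust them to your own hardware:

-Xms4g -Xmx8g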
CPU version
- On this shabby laptop, training with the CPU is a struggle; as shown in the figure below, it is almost maxed out:
- The console output is as follows; the run takes <font color="blue">158</font> seconds, which is a really long time:
=========================Confusion Matrix=========================
0 1 2 3 4 5 6 7 8 9
---------------------------------------------------
973 1 0 0 0 0 2 2 1 1 | 0 = 0
0 1132 0 2 0 0 1 0 0 0 | 1 = 1
1 5 1018 1 1 0 0 4 2 0 | 2 = 2
0 0 2 1003 0 3 0 1 1 0 | 3 = 3
0 0 1 0 975 0 2 0 0 4 | 4 = 4
2 0 0 6 0 880 2 1 1 0 | 5 = 5
6 1 0 0 3 4 944 0 0 0 | 6 = 6
0 3 6 1 0 0 0 1012 2 4 | 7 = 7
3 0 1 1 0 1 1 2 964 1 | 8 = 8
0 0 0 2 6 2 0 2 0 997 | 9 = 9
Confusion matrix format: Actual (rowClass) predicted as (columnClass) N times
==================================================================
13:24:31.616 [main] INFO com.bolingcavalry.convolution.LeNetMNISTReLu - Training and testing completed, took [158739] milliseconds
13:24:32.116 [main] INFO com.bolingcavalry.convolution.LeNetMNISTReLu - The latest MNIST model was saved to [/home/will/temp/202106/26/minist-model.zip]
GPU version
- Next, modify pom.xml according to the dependencies given above to enable the GPU. When the application starts, the following console output indicates that the GPU is in use:
13:27:08.277 [main] INFO org.nd4j.linalg.api.ops.executioner.DefaultOpExecutioner - Backend used: [CUDA]; OS: [Linux]
13:27:08.277 [main] INFO org.nd4j.linalg.api.ops.executioner.DefaultOpExecutioner - Cores: [4]; Memory: [7.7GB];
13:27:08.277 [main] INFO org.nd4j.linalg.api.ops.executioner.DefaultOpExecutioner - Blas vendor: [CUBLAS]
13:27:08.300 [main] INFO org.nd4j.linalg.jcublas.JCublasBackend - ND4J CUDA build version: 9.2.148
13:27:08.301 [main] INFO org.nd4j.linalg.jcublas.JCublasBackend - CUDA device 0: [GeForce GTX 950M]; cc: [5.0]; Total memory: [4242604032]
- This time the run is obviously much smoother, and CPU usage drops considerably:
- The console output is as follows; the run takes only <font color="blue">21</font> seconds, so the GPU acceleration effect is quite obvious:
=========================Confusion Matrix=========================
0 1 2 3 4 5 6 7 8 9
---------------------------------------------------
973 1 0 0 0 0 2 2 1 1 | 0 = 0
0 1129 0 2 0 0 2 2 0 0 | 1 = 1
1 3 1021 0 1 0 0 4 2 0 | 2 = 2
0 0 1 1003 0 3 0 1 2 0 | 3 = 3
0 0 1 0 973 0 3 0 0 5 | 4 = 4
1 0 0 6 0 882 2 1 0 0 | 5 = 5
6 1 0 0 2 5 944 0 0 0 | 6 = 6
0 2 4 1 0 0 0 1016 2 3 | 7 = 7
1 0 2 1 0 1 0 2 964 3 | 8 = 8
0 0 0 2 6 3 0 2 1 995 | 9 = 9
Confusion matrix format: Actual (rowClass) predicted as (columnClass) N times
==================================================================
13:27:30.722 [main] INFO com.bolingcavalry.convolution.LeNetMNISTReLu - Training and testing completed, took [21441] milliseconds
13:27:31.323 [main] INFO com.bolingcavalry.convolution.LeNetMNISTReLu - The latest MNIST model was saved to [/home/will/temp/202106/26/minist-model.zip]
Process finished with exit code 0
- At this point, the hands-on work of GPU acceleration under the DL4J framework is complete: training time dropped from about 158 seconds on the CPU to about 21 seconds on the GPU, roughly a 7x speedup on this hardware. If you have an NVIDIA graphics card at hand, give it a try; I hope this article gives you a useful reference.
You are not alone: Xinchen's original articles accompany you all the way
- Java series
- Spring series
- Docker series
- Kubernetes series
- Database + middleware series
- DevOps series
Welcome to follow my WeChat public account: Programmer Xinchen
Search for "Programmer Xinchen" on WeChat. I am Xinchen, and I look forward to exploring the Java world with you...
https://github.com/zq2599/blog_demos