Welcome to my GitHub

https://github.com/zq2599/blog_demos

Content: a categorized index of all my original articles, with companion source code, covering Java, Docker, Kubernetes, DevOps, and more;

Overview of this article

  • As the fourth installment of the "DL4J in Practice" series, this article writes no new code; instead it prepares for the hands-on work to come: using a GPU to accelerate the training process under the DL4J framework;
  • If your computer has an NVIDIA graphics card and CUDA is installed successfully, follow along with this article. The full text consists of the following sections:
  • Software and hardware environment reference information
  • DL4J's dependent libraries and versions
  • Specific steps to use GPU
  • GPU training and CPU training comparison

Software and hardware environment reference information

  • As you may know, Xinchen is not a man of means, so the computer with the NVIDIA card is a worn-out Lenovo laptop. Its specs are as follows:
  • Operating system: Ubuntu 16.04 desktop
  • Graphics card: GTX 950M
  • CUDA: 9.2
  • CPU: i5-6300HQ
  • Memory: 32 GB DDR4
  • Hard disk: 1 TB NVMe
  • It turns out the above configuration can smoothly run the example from "DL4J in Practice 3: Classic Convolution Example (LeNet-5)", and training can be accelerated with the GPU (comparison data for GPU vs. CPU training is given later);
  • For installing the NVIDIA driver and CUDA 9.2 on Ubuntu 16, refer to the article "Installing CUDA (9.1) and cuDNN on Pure Ubuntu16"; the CUDA version installed there is 9.1, so substitute version 9.2 yourself;

DL4J's dependency libraries and versions

  • The first thing to emphasize: do not use CUDA 11.2 (the version reported by nvidia-smi on my machine). As of this writing, using CUDA 11.2 and its dependency libraries causes a ClassNotFoundException at startup;
  • I have not tried the CUDA 10.x versions, so I cannot comment on them;
  • CUDA 9.1 and 9.2 have both been tried and both work normally;
  • So why not use 9.1? First, check the DL4J core library in the Maven Central repository; as shown in the figure below, the latest version is 1.0.0-M1:

[Figure: Maven Central search results showing deeplearning4j-core at version 1.0.0-M1]

  • Now look at the nd4j library for CUDA 9.1; as shown in the red box below, its latest version is 1.0.0-beta, from 2018, which lags far behind the core library:

[Figure: Maven Central search results showing nd4j-cuda-9.1 stuck at 1.0.0-beta (2018)]

  • Finally, look at the nd4j library for CUDA 9.2; as shown in the red box below, its latest version is 1.0.0-beta6, only two versions behind the core library, which is why CUDA 9.2 is the recommended choice:

[Figure: Maven Central search results showing nd4j-cuda-9.2 at version 1.0.0-beta6]

Specific steps to use the GPU

  • Whether you train on the CPU or the GPU, the steps are very simple: just switch between the corresponding dependencies, introduced below;
  • For CPU training, the dependencies and versions are as follows:
<!-- Core library, required for both CPU and GPU -->
<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>deeplearning4j-core</artifactId>
    <version>1.0.0-beta6</version>
</dependency>
<!-- Required for the CPU backend -->
<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-native</artifactId>
    <version>1.0.0-beta6</version>
</dependency>

If you train on the GPU and your CUDA version is 9.2, the dependencies and versions are as follows:

<!-- Core library, required for both CPU and GPU -->
<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>deeplearning4j-core</artifactId>
    <version>1.0.0-beta6</version>
</dependency>
<!-- Required for the GPU backend -->
<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>deeplearning4j-cuda-9.2</artifactId>
    <version>1.0.0-beta6</version>
</dependency>
<!-- Required for the GPU backend -->
<dependency>
    <groupId>org.nd4j</groupId>
    <artifactId>nd4j-cuda-9.2-platform</artifactId>
    <version>1.0.0-beta6</version>
</dependency>
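
  • With the GPU dependencies in place, a quick runtime check confirms which backend ND4J actually loaded. Here is a minimal sketch (the class name BackendCheck is mine, not from the project's source):

import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.factory.Nd4j;

public class BackendCheck {
    public static void main(String[] args) {
        // The first ND4J operation initializes the backend and prints the
        // "Backend used: [CPU]" or "Backend used: [CUDA]" banner seen later in this article
        INDArray x = Nd4j.rand(2, 3);
        System.out.println("Loaded backend: " + Nd4j.getBackend().getClass().getName());
        System.out.println(x);
    }
}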

Memory settings

  • When running the code from IntelliJ IDEA, you can increase the memory appropriately for your hardware. The steps are as follows:

[Figure: IntelliJ IDEA memory settings, step 1]

  • Adjust this to suit your machine; I set it to 8G here (a concrete example of the corresponding VM options follows the figure below):

[Figure: IntelliJ IDEA memory set to 8G]
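
  • For reference, that 8G setting corresponds to a JVM heap flag like the one below. The two JavaCPP properties are optional additions of mine, not part of the original setup; they matter because ND4J keeps most tensor data off-heap:

-Xmx8g
-Dorg.bytedeco.javacpp.maxbytes=8G
-Dorg.bytedeco.javacpp.maxphysicalbytes=12G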

  • Once that is set, run training and testing with the CPU and then the GPU on the same computer, and compare the two to see the GPU acceleration effect; the elapsed-time log lines in both runs come from timing logic like the sketch below
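
  • A minimal sketch of such a timing harness, assuming the model and data iterators are already built (method and variable names are mine; the real code lives in the LeNetMNISTReLu class from the previous article):

import org.deeplearning4j.nn.multilayer.MultiLayerNetwork;
import org.nd4j.evaluation.classification.Evaluation;
import org.nd4j.linalg.dataset.api.iterator.DataSetIterator;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class TimedTraining {
    private static final Logger log = LoggerFactory.getLogger(TimedTraining.class);

    // Train, then evaluate, logging the total wall-clock time in milliseconds
    static void trainAndTest(MultiLayerNetwork model, DataSetIterator train,
                             DataSetIterator test, int epochs) {
        long start = System.currentTimeMillis();
        model.fit(train, epochs);                // training
        Evaluation eval = model.evaluate(test);  // testing
        log.info(eval.stats());                  // accuracy stats + confusion matrix
        log.info("Training and testing complete, took [{}] ms",
                System.currentTimeMillis() - start);
    }
}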

CPU version

  • On this shabby laptop, training on the CPU is a real struggle; as the figure below shows, the CPU is nearly maxed out:

[Figure: system monitor showing CPU usage close to 100% during CPU training]

  • The console output is as follows; the run took 158 seconds, a genuinely long wait:
=========================Confusion Matrix=========================
    0    1    2    3    4    5    6    7    8    9
---------------------------------------------------
  973    1    0    0    0    0    2    2    1    1 | 0 = 0
    0 1132    0    2    0    0    1    0    0    0 | 1 = 1
    1    5 1018    1    1    0    0    4    2    0 | 2 = 2
    0    0    2 1003    0    3    0    1    1    0 | 3 = 3
    0    0    1    0  975    0    2    0    0    4 | 4 = 4
    2    0    0    6    0  880    2    1    1    0 | 5 = 5
    6    1    0    0    3    4  944    0    0    0 | 6 = 6
    0    3    6    1    0    0    0 1012    2    4 | 7 = 7
    3    0    1    1    0    1    1    2  964    1 | 8 = 8
    0    0    0    2    6    2    0    2    0  997 | 9 = 9

Confusion matrix format: Actual (rowClass) predicted as (columnClass) N times
==================================================================
13:24:31.616 [main] INFO com.bolingcavalry.convolution.LeNetMNISTReLu - Training and testing complete, took [158739] ms
13:24:32.116 [main] INFO com.bolingcavalry.convolution.LeNetMNISTReLu - Latest MNIST model saved at [/home/will/temp/202106/26/minist-model.zip]

GPU version

  • Next, modify pom.xml according to the GPU dependencies given earlier to enable the GPU. While the program runs, console output like the following indicates that the GPU is in use:
13:27:08.277 [main] INFO org.nd4j.linalg.api.ops.executioner.DefaultOpExecutioner - Backend used: [CUDA]; OS: [Linux]
13:27:08.277 [main] INFO org.nd4j.linalg.api.ops.executioner.DefaultOpExecutioner - Cores: [4]; Memory: [7.7GB];
13:27:08.277 [main] INFO org.nd4j.linalg.api.ops.executioner.DefaultOpExecutioner - Blas vendor: [CUBLAS]
13:27:08.300 [main] INFO org.nd4j.linalg.jcublas.JCublasBackend - ND4J CUDA build version: 9.2.148
13:27:08.301 [main] INFO org.nd4j.linalg.jcublas.JCublasBackend - CUDA device 0: [GeForce GTX 950M]; cc: [5.0]; Total memory: [4242604032]
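  • Incidentally, once the CUDA backend is active it can also be tuned programmatically before training starts. The snippet below is a sketch based on ND4J's CudaEnvironment configuration API, not something this article's example uses; the cache size is illustrative:

import org.nd4j.jita.conf.CudaEnvironment;

public class CudaTuning {
    public static void main(String[] args) {
        // Must run before the first ND4J operation; only available on the nd4j-cuda backend
        CudaEnvironment.getInstance().getConfiguration()
                .allowMultiGPU(false)                            // a single GTX 950M here
                .setMaximumDeviceCache(2L * 1024 * 1024 * 1024); // ~2 GB device-side cache
    }
}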
  • This time the run is noticeably smoother, and CPU usage drops considerably:

[Figure: system monitor showing much lower CPU usage during GPU training]

  • The console output is as follows; the run took 21 seconds, so the GPU acceleration effect is clearly significant:
=========================Confusion Matrix=========================
    0    1    2    3    4    5    6    7    8    9
---------------------------------------------------
  973    1    0    0    0    0    2    2    1    1 | 0 = 0
    0 1129    0    2    0    0    2    2    0    0 | 1 = 1
    1    3 1021    0    1    0    0    4    2    0 | 2 = 2
    0    0    1 1003    0    3    0    1    2    0 | 3 = 3
    0    0    1    0  973    0    3    0    0    5 | 4 = 4
    1    0    0    6    0  882    2    1    0    0 | 5 = 5
    6    1    0    0    2    5  944    0    0    0 | 6 = 6
    0    2    4    1    0    0    0 1016    2    3 | 7 = 7
    1    0    2    1    0    1    0    2  964    3 | 8 = 8
    0    0    0    2    6    3    0    2    1  995 | 9 = 9

Confusion matrix format: Actual (rowClass) predicted as (columnClass) N times
==================================================================
13:27:30.722 [main] INFO com.bolingcavalry.convolution.LeNetMNISTReLu - Training and testing complete, took [21441] ms
13:27:31.323 [main] INFO com.bolingcavalry.convolution.LeNetMNISTReLu - Latest MNIST model saved at [/home/will/temp/202106/26/minist-model.zip]

Process finished with exit code 0
  • At this point, our hands-on look at GPU acceleration under the DL4J framework is complete. If you have an NVIDIA graphics card, give it a try; I hope this article can serve as a useful reference.

You are not alone: Xinchen's original articles accompany you all the way

  1. Java series
  2. Spring series
  3. Docker series
  4. Kubernetes series
  5. Database + middleware series
  6. DevOps series

Welcome to follow my WeChat official account: Programmer Xinchen

Search "Programmer Xin Chen" on WeChat, I am Xin Chen, and I look forward to traveling the Java world with you...
https://github.com/zq2599/blog_demos
