Welcome to my GitHub
https://github.com/zq2599/blog_demos
Content: all original articles, categorized and summarized, with supporting source code, covering Java, Docker, Kubernetes, DevOps, etc.
Overview of this article
- In the "Three Minutes: Extremely Fast Experience JAVA Version Target Detection (YOLO4)" , we experienced the powerful object recognition ability of YOLO4, as shown below, the dogs, people and horses in the original picture have been identified and marked :
- If you have some prior knowledge of deep learning, YOLO, darknet, and the like, you may well be wondering: can Java do this?
- Yes it can. Today we will develop, from scratch, a SpringBoot application that achieves exactly that. The application is called <font color="blue">yolo-demo</font>
- The key to letting a SpringBoot application recognize objects in pictures is using the trained neural network model. Fortunately, OpenCV's integrated DNN module can load and run the YOLO4 model, so all we need is a way to use OpenCV from Java.
- My approach here is to use the JavaCV library: JavaCV encapsulates OpenCV, which in turn lets us run inference with the YOLO4 model. The dependencies are shown in the following figure:
Key technologies
- This article involves JavaCV, OpenCV, YOLO4, and so on. As the figure above shows, JavaCV already encapsulates all of them; even the model used for inference is the one officially pre-trained by the YOLO4 project. All we need to learn is how to use JavaCV's API.
- The YOLO4 paper is here: https://arxiv.org/pdf/2004.10934v1.pdf
Version Information
- Here is my development environment for your reference:
- OS: Ubuntu 16 (also verified on a MacBook Pro running macOS Big Sur 11.2.3)
- docker:20.10.2 Community
- java:1.8.0_211
- springboot:2.4.8
- javacv:1.5.6
- opencv:4.5.3
Practical steps
- Before we start, let's sort out the steps of this exercise clearly, then execute them one by one;
- To reduce the impact of environment and software differences and make the program easier to run and debug, the SpringBoot application will be packaged as a docker image and run in a docker environment. The whole exercise therefore breaks down into three simple steps: build a base image, develop the SpringBoot application, and package the application into an image, as shown below:
- The first step of the above process, <font color="blue">building the base image</font>, was covered in "Make a Basic Docker Image (CentOS7+JDK8+OpenCV4) that JavaCV Applications Depend on"; we can directly use the image <font color="red">bolingcavalry/opencv4.5.3:0.0.1</font>. The rest of this article focuses on developing the SpringBoot application;
- The function of this SpringBoot application is very simple, as shown in the following figure:
- The development process involves these steps: submitting a photo from a web page, neural network initialization, file handling, image detection, processing detection results, annotating recognition results on the picture, and displaying the picture on the front end. The complete steps are organized as follows:
- The content is rich and the payoff substantial, and everything that follows has been verified to run successfully, so don't hesitate, let's start!
Source code download
- The complete source code of this exercise can be downloaded from GitHub. The addresses and links are shown in the following table ( https://github.com/zq2599/blog_demos ):
Name | Link | Remark |
---|---|---|
Project homepage | https://github.com/zq2599/blog_demos | The project's homepage on GitHub |
git repository address (https) | https://github.com/zq2599/blog_demos.git | Repository address of the project's source code, https protocol |
git repository address (ssh) | git@github.com:zq2599/blog_demos.git | Repository address of the project's source code, ssh protocol |
- There are multiple folders in this git project. The source code of this article is in the <font color="blue">javacv-tutorials</font> folder, as shown in the red box below:
- There are multiple sub-projects in <font color="blue">javacv-tutorials</font>. Today's code is under the <font color="red"> yolo-demo </font> project:
Create a new SpringBoot application
- Create a new maven project named <font color="blue">yolo-demo</font>. It is a standard SpringBoot project with the JavaCV dependency libraries added. The content of pom.xml is as follows; note in particular the javacv and opencv library dependencies and their exact version matching:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.bolingcavalry</groupId>
    <version>1.0-SNAPSHOT</version>
    <artifactId>yolo-demo</artifactId>
    <packaging>jar</packaging>

    <properties>
        <java.version>1.8</java.version>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <maven-compiler-plugin.version>3.6.1</maven-compiler-plugin.version>
        <springboot.version>2.4.8</springboot.version>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
        <maven.compiler.encoding>UTF-8</maven.compiler.encoding>
    </properties>

    <dependencyManagement>
        <dependencies>
            <dependency>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-dependencies</artifactId>
                <version>${springboot.version}</version>
                <type>pom</type>
                <scope>import</scope>
            </dependency>
        </dependencies>
    </dependencyManagement>

    <dependencies>
        <!-- FreeMarker template view dependency -->
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-freemarker</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
        </dependency>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-test</artifactId>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.bytedeco</groupId>
            <artifactId>javacv-platform</artifactId>
            <version>1.5.6</version>
        </dependency>
        <dependency>
            <groupId>org.bytedeco</groupId>
            <artifactId>opencv-platform-gpu</artifactId>
            <version>4.5.3-1.5.6</version>
        </dependency>
    </dependencies>

    <build>
        <plugins>
            <!-- If the parent project is not springboot, the plugin must be used this way to generate a normal jar -->
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
                <configuration>
                    <mainClass>com.bolingcavalry.yolodemo.YoloDemoApplication</mainClass>
                </configuration>
                <executions>
                    <execution>
                        <goals>
                            <goal>repackage</goal>
                        </goals>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
- The next focus is the configuration file <font color="blue">application.properties</font>. As can be seen below, besides common spring configuration there are several file-path entries; at runtime, the corresponding files must actually exist at these paths for the program to use. How to obtain these files is covered later:
### FreeMarker configuration
spring.freemarker.allow-request-override=false
# Enable template caching
spring.freemarker.cache=false
spring.freemarker.check-template-location=true
spring.freemarker.charset=UTF-8
spring.freemarker.content-type=text/html
spring.freemarker.expose-request-attributes=false
spring.freemarker.expose-session-attributes=false
spring.freemarker.expose-spring-macro-helpers=false
# Template suffix
spring.freemarker.suffix=.ftl
# Maximum size of a single uploaded file
spring.servlet.multipart.max-file-size=100MB
# Maximum size of a whole upload request
spring.servlet.multipart.max-request-size=1000MB
# Custom file upload path
web.upload-path=/app/images
# Model paths
# Location of the yolo configuration file
opencv.yolo-cfg-path=/app/model/yolov4.cfg
# Location of the yolo model (weights) file
opencv.yolo-weights-path=/app/model/yolov4.weights
# Location of the yolo class-name file
opencv.yolo-coconames-path=/app/model/coco.names
# Image width for yolo model inference
opencv.yolo-width=608
# Image height for yolo model inference
opencv.yolo-height=608
- Startup class <font color="blue">YoloDemoApplication.java</font>:
package com.bolingcavalry.yolodemo;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class YoloDemoApplication {
    public static void main(String[] args) {
        SpringApplication.run(YoloDemoApplication.class, args);
    }
}
- The project skeleton is ready; now let's start coding, beginning with the front-end page
Front-end page
- Whenever the front end is involved, I (Xinchen) usually issue a self-protective disclaimer: please forgive my limited front-end skills. The page is hard to look at, but for the sake of functional completeness please bear with it; after all, we need somewhere to submit photos and display the recognition results, right?
- Add a new front-end template file named <font color="blue">index.ftl</font>, located in the red box as shown below:
- The content of <font color="blue">index.ftl</font> is as follows. As you can see it is very simple: a form for selecting and submitting a file, a block for displaying the result, and the prompt message returned by the back end. That is enough:
<!DOCTYPE html>
<html>
<head>
    <meta charset="UTF-8" />
    <title>Image Upload Demo</title>
</head>
<body>
<h1>Image Upload Demo</h1>
<form action="fileUpload" method="post" enctype="multipart/form-data">
    <p>Select a file to detect: <input type="file" name="fileName"/></p>
    <p><input type="submit" value="Submit"/></p>
</form>
<#-- check whether a file has been uploaded -->
<#if msg??>
    <span>${msg}</span><br><br>
<#else>
    <span>${msg!("File not uploaded")}</span><br>
</#if>
<#-- display the image; the img src must send a request to the controller, otherwise a direct jump produces garbled output -->
<#if fileName??>
    <#--<img src="/show?fileName=${fileName}" style="width: 100px"/>-->
    <img src="/show?fileName=${fileName}"/>
<#else>
    <#--<img src="/show" style="width: 200px"/>-->
</#if>
</body>
</html>
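- One detail the template implies but this article does not list: the form posts to fileUpload, and the annotated image is pulled from a /show endpoint, so the controller also needs methods that serve the page and stream the image back. The following is only a minimal sketch under my own assumptions (the method bodies are mine; the URLs index and show come from the template):

// Hedged sketch, not this article's actual code.
// Assumed imports: org.springframework.core.io.Resource, org.springframework.http.ResponseEntity
@RequestMapping(value = {"/", "/index"})
public String index() {
    // render index.ftl
    return "index";
}

@RequestMapping("show")
public ResponseEntity<Resource> show(@RequestParam("fileName") String fileName) {
    // stream the saved image back so the <img src="/show?fileName=..."> tag can render it
    Resource resource = resourceLoader.getResource("file:" + uploadPath + "/" + fileName);
    if (!resource.exists()) {
        return ResponseEntity.notFound().build();
    }
    return ResponseEntity.ok(resource);
}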
- The effect of the page is as follows:
Backend logic: initialization
- To keep things simple, all back-end logic is placed in a single Java file, YoloServiceController.java. Following the process above, let's look at the initialization part first.
- First come the member variables and dependencies:
private final ResourceLoader resourceLoader;

@Autowired
public YoloServiceController(ResourceLoader resourceLoader) {
    this.resourceLoader = resourceLoader;
}

@Value("${web.upload-path}")
private String uploadPath;

@Value("${opencv.yolo-cfg-path}")
private String cfgPath;

@Value("${opencv.yolo-weights-path}")
private String weightsPath;

@Value("${opencv.yolo-coconames-path}")
private String namesPath;

@Value("${opencv.yolo-width}")
private int width;

@Value("${opencv.yolo-height}")
private int height;

/**
 * Confidence threshold (only results above this value are considered reliable)
 */
private float confidenceThreshold = 0.5f;

private float nmsThreshold = 0.4f;

// the neural network
private Net net;

// output layers
private StringVector outNames;

// class names
private List<String> names;
- Next is the initialization method init. As can be seen, the configuration file, trained model, and other files required by the neural network are loaded from the paths configured earlier. The key call is readNetFromDarknet; we also check whether a CUDA-capable device is available and, if so, enable it on the network:
@PostConstruct
private void init() throws Exception {
    // print once at initialization to make sure the encoding is correct, otherwise log output will be garbled
    log.error("file.encoding is " + System.getProperty("file.encoding"));
    // initialize the neural network
    net = readNetFromDarknet(cfgPath, weightsPath);
    // check whether the network is empty
    if (net.empty()) {
        log.error("neural network initialization failed");
        throw new Exception("neural network initialization failed");
    }
    // output layers
    outNames = net.getUnconnectedOutLayersNames();
    // check for a GPU
    if (getCudaEnabledDeviceCount() > 0) {
        net.setPreferableBackend(opencv_dnn.DNN_BACKEND_CUDA);
        net.setPreferableTarget(opencv_dnn.DNN_TARGET_CUDA);
    }
    // class names
    try {
        names = Files.readAllLines(Paths.get(namesPath));
    } catch (IOException e) {
        log.error("failed to load class names, file path [{}]", namesPath, e);
    }
}
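- A side note: if no CUDA device is found, OpenCV's DNN module simply falls back to its default backend, so the model still runs on the CPU. If you want to make that explicit, a hedged sketch (this else branch is my own addition, not part of the article's code) would be:

// Assumed addition: explicitly pin inference to the CPU when no CUDA device exists.
// DNN_BACKEND_OPENCV and DNN_TARGET_CPU are constants in org.bytedeco.opencv.global.opencv_dnn.
if (getCudaEnabledDeviceCount() > 0) {
    net.setPreferableBackend(opencv_dnn.DNN_BACKEND_CUDA);
    net.setPreferableTarget(opencv_dnn.DNN_TARGET_CUDA);
} else {
    net.setPreferableBackend(opencv_dnn.DNN_BACKEND_OPENCV);
    net.setPreferableTarget(opencv_dnn.DNN_TARGET_CPU);
}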
Process uploaded files
- How do we handle the binary image file once the front end submits it? Here is a simple file-handling method, upload, which saves the file to the specified location on the server and will be called later:
/**
 * Upload a file to the specified directory
 * @param file the file
 * @param path the directory to store the file in
 * @param fileName the original file name
 * @return true if the file was saved successfully
 */
private static boolean upload(MultipartFile file, String path, String fileName){
    // keep the original file name
    String realPath = path + "/" + fileName;
    File dest = new File(realPath);
    // create the parent directory if it does not exist
    if(!dest.getParentFile().exists()){
        dest.getParentFile().mkdirs();
    }
    try {
        // save the file
        file.transferTo(dest);
        return true;
    } catch (IllegalStateException | IOException e) {
        e.printStackTrace();
        return false;
    }
}
Object detection
- With the preparations complete, let's write the core object-detection code. It lives in the method that handles web requests in the yolo-demo application, shown below. As you can see, this method is just an outline: it strings together inference, result processing, image annotation, and so on into a complete flow, without going into the details of each step:
@RequestMapping("fileUpload")
public String upload(@RequestParam("fileName") MultipartFile file, Map<String, Object> map){
log.info("文件 [{}], 大小 [{}]", file.getOriginalFilename(), file.getSize());
// 文件名称
String originalFileName = file.getOriginalFilename();
if (!upload(file, uploadPath, originalFileName)){
map.put("msg", "上传失败!");
return "forward:/index";
}
// 读取文件到Mat
Mat src = imread(uploadPath + "/" + originalFileName);
// 执行推理
MatVector outs = doPredict(src);
// 处理原始的推理结果,
// 对检测到的每个目标,找出置信度最高的类别作为改目标的类别,
// 还要找出每个目标的位置,这些信息都保存在ObjectDetectionResult对象中
List<ObjectDetectionResult> results = postprocess(src, outs);
// 释放资源
outs.releaseReference();
// 检测到的目标总数
int detectNum = results.size();
log.info("一共检测到{}个目标", detectNum);
// 没检测到
if (detectNum<1) {
// 显示图片
map.put("msg", "未检测到目标");
// 文件名
map.put("fileName", originalFileName);
return "forward:/index";
} else {
// 检测结果页面的提示信息
map.put("msg", "检测到" + results.size() + "个目标");
}
// 计算出总耗时,并输出在图片的左上角
printTimeUsed(src);
// 将每一个被识别的对象在图片框出来,并在框的左上角标注该对象的类别
markEveryDetectObject(src, results);
// 将添加了标注的图片保持在磁盘上,并将图片信息写入map(给跳转页面使用)
saveMarkedImage(map, src);
return "forward:/index";
}
- The whole flow should now be clear; next, each detail is expanded.
Detecting Objects with Neural Networks
- As the code above shows, after the image is converted into a Mat object (an important OpenCV data structure; think of it as a matrix holding the information of every pixel of the image), it is passed to the <font color="blue">doPredict</font> method; when the method returns, the object-recognition result is available
- A closer look at the doPredict method shows that its core is using the blobFromImage method to obtain a four-dimensional blob, which is then fed to the neural network for detection (net.setInput, net.forward)
/**
 * Run inference with the neural network
 * @param src the input image
 * @return the raw inference output
 */
private MatVector doPredict(Mat src) {
    // convert the image into a four-dimensional blob and resize it
    Mat inputBlob = blobFromImage(src,
            1 / 255.0,
            new Size(width, height),
            new Scalar(0.0),
            true,
            false,
            CV_32F);
    // set the network input
    net.setInput(inputBlob);
    // container holding the output results
    MatVector outs = new MatVector(outNames.size());
    // run inference; results are stored in outs
    net.forward(outs, outNames);
    // release resources
    inputBlob.release();
    return outs;
}
- It should be noted that blobFromImage, net.setInput, net.forward are all native methods, provided by the dnn module of OpenCV
- The doPredict method returns a MatVector object, which is the detection result
Process raw detection results
- The detection result, a MatVector, is a collection containing several Mat objects, and each Mat can be viewed as a table holding rich data. For the YOLO4 COCO model, each row describes one candidate object: the first four columns are its normalized box (center x, center y, width, height), the fifth is the box confidence, and each of the remaining 80 columns is the score for one class:
- With that layout in mind, processing the raw detection results becomes clear: take the Mats out of the MatVector one by one, treat each Mat as a table, and in every row find the class column with the highest score; that column gives the object's class (as for what each class column stands for, e.g. why one column means person, another bicycle, and the last toothbrush, that is explained later):
/**
 * Post-processing after inference completes
 * @param frame the original image
 * @param outs the raw inference output
 * @return the list of detected objects
 */
private List<ObjectDetectionResult> postprocess(Mat frame, MatVector outs) {
    final IntVector classIds = new IntVector();
    final FloatVector confidences = new FloatVector();
    final RectVector boxes = new RectVector();
    // process the network's output
    for (int i = 0; i < outs.size(); ++i) {
        // extract the bounding boxes that have a high enough score
        // and assign their highest confidence class prediction.
        // every detected object has a confidence for each class; take the highest,
        // e.g. if cat scores 90% and dog 80%, the object is considered a cat
        Mat result = outs.get(i);
        FloatIndexer data = result.createIndexer();
        // treat the detection result as a table:
        // each row represents one object;
        // the first four columns are the object's coordinates, and each following column
        // is the object's confidence for one class;
        // in each row, iterate from the fifth column on, finding the maximum value and its column index
        for (int j = 0; j < result.rows(); j++) {
            // minMaxLoc implemented in java because it is 1D
            int maxIndex = -1;
            float maxScore = Float.MIN_VALUE;
            for (int k = 5; k < result.cols(); k++) {
                float score = data.get(j, k);
                if (score > maxScore) {
                    maxScore = score;
                    maxIndex = k - 5;
                }
            }
            // if the maximum exceeds the confidence threshold set earlier, we accept the object as that class
            // and save its recognition info: class, confidence, and coordinates
            if (maxScore > confidenceThreshold) {
                int centerX = (int) (data.get(j, 0) * frame.cols());
                int centerY = (int) (data.get(j, 1) * frame.rows());
                int width = (int) (data.get(j, 2) * frame.cols());
                int height = (int) (data.get(j, 3) * frame.rows());
                int left = centerX - width / 2;
                int top = centerY - height / 2;
                // save the class
                classIds.push_back(maxIndex);
                // save the confidence
                confidences.push_back(maxScore);
                // save the coordinates
                boxes.push_back(new Rect(left, top, width, height));
            }
        }
        // release resources
        data.release();
        result.release();
    }
    // remove overlapping bounding boxes with NMS
    IntPointer indices = new IntPointer(confidences.size());
    FloatPointer confidencesPointer = new FloatPointer(confidences.size());
    confidencesPointer.put(confidences.get());
    // non-maximum suppression
    NMSBoxes(boxes, confidencesPointer, confidenceThreshold, nmsThreshold, indices, 1.f, 0);
    // put the detection results into BO objects for easier business processing
    List<ObjectDetectionResult> detections = new ArrayList<>();
    for (int i = 0; i < indices.limit(); ++i) {
        final int idx = indices.get(i);
        final Rect box = boxes.get(idx);
        final int clsId = classIds.get(idx);
        detections.add(new ObjectDetectionResult(
                clsId,
                names.get(clsId),
                confidences.get(idx),
                box.x(),
                box.y(),
                box.width(),
                box.height()
        ));
        // release resources
        box.releaseReference();
    }
    // release resources
    indices.releaseReference();
    confidencesPointer.releaseReference();
    classIds.releaseReference();
    confidences.releaseReference();
    boxes.releaseReference();
    return detections;
}
- As you can see, the code is simple: treat each Mat as a table. Two places deserve special attention:
- The confidenceThreshold variable, the confidence threshold, is 0.5 here. If the highest score in a row does not even reach 0.5, the network is not confident about any class, so the object is treated as unrecognized and is not stored in the detections collection (and thus not marked on the result image)
- NMSBoxes: when a classifier evolves into a detector, candidate windows are generated at multiple scales on the original image, which leads to the effect on the left side of the figure below: the same face is detected multiple times. NMSBoxes (non-maximum suppression) is used to keep only the best of these overlapping results, as sketched below
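- To make the idea concrete, here is a tiny self-contained sketch of the IoU (intersection-over-union) test at the heart of NMS. It is an illustration of the principle only, not OpenCV's NMSBoxes implementation:

// Illustration only: the overlap test that NMS relies on.
// Boxes whose IoU with a higher-confidence box exceeds the threshold
// (0.4 here, matching nmsThreshold above) are suppressed as duplicates.
static float iou(int x1, int y1, int w1, int h1, int x2, int y2, int w2, int h2) {
    int ix = Math.max(0, Math.min(x1 + w1, x2 + w2) - Math.max(x1, x2)); // intersection width
    int iy = Math.max(0, Math.min(y1 + h1, y2 + h2) - Math.max(y1, y2)); // intersection height
    float inter = ix * iy;
    float union = w1 * h1 + w2 * h2 - inter;
    return union <= 0 ? 0 : inter / union;
}
// e.g. two boxes covering almost the same face give IoU around 0.9 > 0.4,
// so the lower-confidence one is discarded and only the best detection remains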
- Now let's explain which class each column of the Mat table stands for. The table is YOLO4's detection result, so YOLO4 defines the columns, officially via the <font color="blue">coco.names</font> file. Its content is shown below: 80 lines in total, each line being one class:
- By now it should be clear what class each column of the Mat table represents: each class column corresponds to one line of <font color="blue">coco.names</font>, as shown below:
- After the postprocess method finishes, the recognition results of a photo are stored in the collection named detections, where each element represents one recognized object. Let's look at the data structure of these elements, shown below; it carries everything we need to annotate the recognition results on the photo:
@Data
@AllArgsConstructor
public class ObjectDetectionResult {
    // class index
    int classId;
    // class name
    String className;
    // confidence
    float confidence;
    // x coordinate of the object in the photo
    int x;
    // y coordinate of the object in the photo
    int y;
    // object width
    int width;
    // object height
    int height;
}
Draw the detection results on the picture
- With the detection results in hand, the next thing to do is draw them on the original image, producing the object-recognition effect. The drawing has two parts: first the total time used, in the upper-left corner, and second the recognition result of each object
- First, draw the total detection time in the upper-left corner of the picture; the effect is shown in the following figure:
- The printTimeUsed method is responsible for drawing the total time used, as shown below. The total time is the network's total tick count divided by the tick frequency. Note that this is not the end-to-end time of the web request, but the time the neural network spent recognizing objects. Also, putText, used for the drawing, is a native method and one of OpenCV's commonly used ones:
/**
 * Compute the total time used and print it in the top-left corner of the picture
 * @param src the image to draw on
 */
private void printTimeUsed(Mat src) {
    // total tick count
    long totalNums = net.getPerfProfile(new DoublePointer());
    // frequency (ticks per millisecond)
    double freq = getTickFrequency() / 1000;
    // total ticks divided by frequency gives the total time used
    double t = totalNums / freq;
    // print the total detection time in the top-left corner of the displayed image
    putText(src,
            String.format("Inference time : %.2f ms", t),
            new Point(10, 20),
            FONT_HERSHEY_SIMPLEX,
            0.6,
            new Scalar(255, 0, 0, 0),
            1,
            LINE_AA,
            false);
}
- The next step is drawing each object's recognition result. With the collection of ObjectDetectionResult objects, drawing is simple: just call the native methods for drawing rectangles and text:
/**
 * Draw a frame around every recognized object, with its class labeled at the frame's top-left corner
 * @param src the image to draw on
 * @param results the detection results
 */
private void markEveryDetectObject(Mat src, List<ObjectDetectionResult> results) {
    // mark every object on the picture with its class and confidence
    for(ObjectDetectionResult result : results) {
        log.info("class [{}], confidence [{}%]", result.getClassName(), result.getConfidence() * 100f);
        // annotate on image
        rectangle(src,
                new Point(result.getX(), result.getY()),
                new Point(result.getX() + result.getWidth(), result.getY() + result.getHeight()),
                Scalar.MAGENTA,
                1,
                LINE_8,
                0);
        // label at the object's top-left corner: class + confidence
        String label = result.getClassName() + ":" + String.format("%.2f%%", result.getConfidence() * 100f);
        // compute the height needed to display the label
        IntPointer baseLine = new IntPointer();
        Size labelSize = getTextSize(label, FONT_HERSHEY_SIMPLEX, 0.5, 1, baseLine);
        int top = Math.max(result.getY(), labelSize.height());
        // add the label to the picture
        putText(src, label, new Point(result.getX(), top - 4), FONT_HERSHEY_SIMPLEX, 0.5, new Scalar(0, 255, 0, 0), 1, LINE_4, false);
    }
}
Show results
- The core work is done; the next step is to save the picture and then jump to the display page, as sketched below:
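- The saveMarkedImage method referenced in the upload handler is not listed in this article, so the following is only a minimal sketch under my own assumptions (the "result-" file-name prefix and PNG format are mine): it writes the annotated Mat into the upload directory and hands the file name to the page:

// Hedged sketch of saveMarkedImage; assumptions noted above.
// imwrite is a native method from org.bytedeco.opencv.global.opencv_imgcodecs; java.util.UUID assumed imported.
private void saveMarkedImage(Map<String, Object> map, Mat src) {
    // write the annotated image next to the uploads so the /show endpoint can find it
    String resultFileName = "result-" + UUID.randomUUID() + ".png";
    imwrite(uploadPath + "/" + resultFileName, src);
    // the page reads this key to build the <img src="/show?fileName=..."> request
    map.put("fileName", resultFileName);
}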
- With that, the SpringBoot project coding is complete; the next task is to package the whole project into a docker image
Make the SpringBoot project into a docker image
- The previous "Make the Basic Docker Image That JavaCV Application Depends on (CentOS7+JDK8+OpenCV4)" prepared the basic image and helped us prepare the JDK and OpenCV libraries, which makes the next operation extremely simple, let's go step by step
- First write the Dockerfile. Please put it in the same directory as <font color="blue">pom.xml</font>. The content is as follows:
# The base image bundles openjdk8 and opencv4.5.3
FROM bolingcavalry/opencv4.5.3:0.0.1
# Create directories
RUN mkdir -p /app/images && mkdir -p /app/model
# Source location of the image's content
ARG DEPENDENCY=target/dependency
# Copy content into the image
COPY ${DEPENDENCY}/BOOT-INF/lib /app/lib
COPY ${DEPENDENCY}/META-INF /app/META-INF
COPY ${DEPENDENCY}/BOOT-INF/classes /app
ENV LANG C.UTF-8
ENV LANGUAGE zh_CN.UTF-8
ENV LC_ALL C.UTF-8
ENV TZ Asia/Shanghai
# Startup command (note the encoding setting, otherwise logs will be garbled)
ENTRYPOINT ["java","-Dfile.encoding=utf-8","-cp","app:app/lib/*","com.bolingcavalry.yolodemo.YoloDemoApplication"]
- Open a console in the directory containing pom.xml and execute the command <font color="blue">mvn clean package -U</font>. This common maven command compiles the source code and generates the file <font color="red">yolo-demo-1.0-SNAPSHOT.jar</font> in the target directory
- Execute the following command to extract the content needed to make a docker image from the jar file:
mkdir -p target/dependency && (cd target/dependency; jar -xf ../*.jar)
- Execute the following command to build the image:
docker build -t bolingcavalry/yolodemo:0.0.1 .
- Build succeeded:
will@willMini yolo-demo % docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
bolingcavalry/yolodemo 0.0.1 d0ef6e734b53 About a minute ago 2.99GB
bolingcavalry/opencv4.5.3 0.0.1 d1518ffa4699 6 days ago 2.01GB
- At this point, the SpringBoot application with full object-recognition capability has been built. Remember the file path configurations in application.properties? Now we need to download those files; there are two download methods, and you can choose either one
- The first is to download from the official project sources, at the following three addresses:
- YOLOv4 configuration file: https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov4.cfg
- YOLOv4 weights: https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v3_optimal/yolov4.weights
- Category name: https://raw.githubusercontent.com/AlexeyAB/darknet/master/data/coco.names
- The second is to download from csdn (no points required); I have packaged the above three files here: https://download.csdn.net/download/boling_cavalry/33229838
- Either method ultimately yields three files: yolov4.cfg, yolov4.weights, and coco.names. Please put them in the same directory; mine are here: /home/will/temp/202110/19/model
- Create a new directory to store photos. The directory I created is <font color="blue">/home/will/temp/202110/19/images</font>; be careful to ensure this directory is readable and writable
- The final directory structure looks like this:
/home/will/temp/202110/19/
├── images
└── model
├── coco.names
├── yolov4.cfg
└── yolov4.weights
- With everything in place, execute the following command to run the service:
sudo docker run \
--rm \
--name yolodemo \
-p 8080:8080 \
-v /home/will/temp/202110/19/images:/app/images \
-v /home/will/temp/202110/19/model:/app/model \
bolingcavalry/yolodemo:0.0.1
- Once the service is running, the operation process and effects are the same as in "Three Minutes: Extremely Fast Experience JAVA Version Target Detection (YOLO4)", so I won't repeat them here.
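- By the way, you don't have to go through the browser: since the form field is named fileName and the endpoint is fileUpload (both visible in the code above), a quick command-line smoke test with curl is also possible (the sample image path here is hypothetical); the response is the HTML of the result page:

curl -F "fileName=@/home/will/temp/202110/19/test.jpg" http://localhost:8080/fileUpload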
- With that, the entire object-recognition exercise, from development to deployment, is complete. Java's convenience in engineering, combined with excellent models from the deep-learning field, gives us one more option for solving visual image problems. If you are a Java programmer interested in vision and images, I hope this article offers you some useful reference
You are not alone: Xinchen's originals accompany you all the way
- Java series
- Spring series
- Docker series
- Kubernetes series
- Database + middleware series
- DevOps series
Welcome to follow the WeChat public account: Programmer Xinchen
Search "Programmer Xinchen" on WeChat. I am Xinchen, looking forward to traveling the Java world with you...
https://github.com/zq2599/blog_demos