
The deep learning inference framework OpenPPL is now open source. Using an image classification example, this article walks through deploying a deep learning model from zero to one and building a complete AI inference application.

The end result: upload a photo of a cat (a dog works too) and the program identifies the animal in the picture.

Background

OpenPPL is an inference engine built on a self-developed library of high-performance operators. It provides multi-backend deployment capabilities for AI models in cloud-native environments and supports efficient deployment of deep learning models such as those from OpenMMLab.

OpenPPL source link: https://github.com/openppl-public/ppl.nn

Install

1. Download PPLNN source code

git clone https://github.com/openppl-public/ppl.nn.git

2. Install dependencies

The compilation dependencies of PPLNN are as follows:

  • GCC >= 4.9 or LLVM/Clang >= 6.0
  • CMake >= 3.14
  • Git >= 2.7.0

The image classification example (classification) explained in this article additionally requires OpenCV:

  • For apt-based systems (e.g. Ubuntu/Debian):
    sudo apt install libopencv-dev
  • For yum-based systems (e.g. CentOS):
    sudo yum install opencv opencv-devel
  • Or install OpenCV from source
Note: the build automatically detects whether OpenCV is installed. If it is not installed, the example in this article will not be generated.

3. Compile

  • X86
    cd ppl.nn
    ./build.sh -DHPCC_USE_OPENMP=ON   # if you don't need multithreading, the -DHPCC_USE_OPENMP option can be omitted
  • CUDA
    cd ppl.nn
    ./build.sh -DHPCC_USE_CUDA=ON

After compilation completes, the image classification example classification is generated in the pplnn-build/samples/cpp/run_model/ directory. It reads an image and a model file and outputs the classification result.

For more compilation related descriptions, please refer to: building-from-source.md


Image classification example walkthrough

The source code of the image classification example is in samples/cpp/run_model/classification.cpp. This section explains its main parts.

1. Image preprocessing

OpenCV reads images as BGR HWC uint8, while the ONNX model expects RGB NCHW fp32 input, so the image data needs to be converted:

int32_t ImagePreprocess(const Mat& src_img, float* in_data) {
    const int32_t height = src_img.rows;
    const int32_t width = src_img.cols;
    const int32_t channels = src_img.channels();

    // convert the color space from BGR/GRAY to RGB
    Mat rgb_img;
    if (channels == 3) {
        cvtColor(src_img, rgb_img, COLOR_BGR2RGB);
    } else if (channels == 1) {
        cvtColor(src_img, rgb_img, COLOR_GRAY2RGB);
    } else {
        fprintf(stderr, "unsupported channel num: %d\n", channels);
        return -1;
    }

    // split the three channels of the HWC image into separate planes
    vector<Mat> rgb_channels(3);
    split(rgb_img, rgb_channels);

    // construct these cv::Mat objects directly on top of in_data, so writes to the Mats go straight into in_data
    Mat r_channel_fp32(height, width, CV_32FC1, in_data + 0 * height * width);
    Mat g_channel_fp32(height, width, CV_32FC1, in_data + 1 * height * width);
    Mat b_channel_fp32(height, width, CV_32FC1, in_data + 2 * height * width);
    vector<Mat> rgb_channels_fp32{r_channel_fp32, g_channel_fp32, b_channel_fp32};

    // convert uint8 data to fp32 and normalize: y = (x - mean) / std
    const float mean[3] = {0, 0, 0}; // adjust the mean and std according to the dataset and training parameters
    const float std[3] = {255.0f, 255.0f, 255.0f};
    for (uint32_t i = 0; i < rgb_channels.size(); ++i) {
        rgb_channels[i].convertTo(rgb_channels_fp32[i], CV_32FC1, 1.0f / std[i], -mean[i] / std[i]);
    }

    return 0;
}

2. Generate runtime builder from ONNX model

First, create and register the engines you want to use. Each engine corresponds to one inference backend; x86 and CUDA are currently supported.

Create an x86 engine:

    auto x86_engine = X86EngineFactory::Create();

Or a CUDA engine:

    auto cuda_engine = CudaEngineFactory::Create(CudaEngineOptions());

The following example uses only the x86 engine:

    // register all engines you want to use
    vector<unique_ptr<Engine>> engines;
    engines.emplace_back(unique_ptr<Engine>(x86_engine));

Then use the ONNXRuntimeBuilderFactory::Create() function to read the ONNX model and create a runtime builder on top of the registered engines:

    vector<Engine*> engine_ptrs;
    engine_ptrs.emplace_back(engines[0].get());
    auto builder = unique_ptr<ONNXRuntimeBuilder>(
        ONNXRuntimeBuilderFactory::Create(ONNX_model_path, engine_ptrs.data(), engine_ptrs.size()));

Note: at the framework level, PPLNN supports mixed inference across multiple heterogeneous devices. You can register several different engines, and the framework will automatically split the computation graph into subgraphs and schedule them onto the different engines.
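
For example, here is a minimal sketch of registering two engines at once (it assumes the project was built with -DHPCC_USE_CUDA=ON so that both backends are available); the framework then partitions the graph across them:

    // register both an x86 engine and a CUDA engine; the framework splits the
    // graph into subgraphs and schedules them onto the two backends
    vector<unique_ptr<Engine>> engines;
    engines.emplace_back(unique_ptr<Engine>(X86EngineFactory::Create()));
    engines.emplace_back(unique_ptr<Engine>(CudaEngineFactory::Create(CudaEngineOptions())));

    vector<Engine*> engine_ptrs;
    for (auto& e : engines) {
        engine_ptrs.emplace_back(e.get());
    }
    auto builder = unique_ptr<ONNXRuntimeBuilder>(
        ONNXRuntimeBuilderFactory::Create(ONNX_model_path, engine_ptrs.data(), engine_ptrs.size()));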

3. Create runtime

Use runtime_options to configure runtime options; for example, set the mm_policy field to MM_LESS_MEMORY (memory-saving mode):

    RuntimeOptions runtime_options;
    runtime_options.mm_policy = MM_LESS_MEMORY; // use the memory-saving mode

Use the runtime builder generated in the previous step to create a runtime instance:

    unique_ptr<Runtime> runtime;
    runtime.reset(builder->CreateRuntime(runtime_options));

A runtime builder can create multiple runtime instances. These instances share constant data (weights, etc.) and the network topology, which saves memory.
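
As a minimal sketch of that usage (the variable names here are illustrative), several runtimes can be created from the same builder, e.g. one per worker thread:

    // each runtime holds its own intermediate buffers, while weights and the
    // network topology are shared through the builder
    vector<unique_ptr<Runtime>> runtimes;
    for (int i = 0; i < 2; ++i) {
        runtimes.emplace_back(unique_ptr<Runtime>(builder->CreateRuntime(runtime_options)));
    }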

4. Set the network input data

First get the runtime input tensor through the GetInputTensor() interface:

    auto input_tensor = runtime->GetInputTensor(0); // the classification network has only one input

Reshape the input tensor and reallocate its memory:

    const std::vector<int64_t> input_shape{1, channels, height, width};
    input_tensor->GetShape().Reshape(input_shape); // even if the input size is fixed in the ONNX model, PPLNN can still adjust it dynamically
    auto status = input_tensor->ReallocBuffer();   // after calling Reshape, this interface must be called to reallocate the memory

Unlike ONNX Runtime, PPLNN can dynamically adjust the network's input size even if the size is fixed in the ONNX model (you just need to make sure the requested size is reasonable).

The in_data buffer produced by the preprocessing above holds fp32 data in NDARRAY format (for 4-dimensional data, NDARRAY is equivalent to NCHW). Define a descriptor for this user-provided input data:

    TensorShape src_desc = input_tensor->GetShape();
    src_desc.SetDataType(DATATYPE_FLOAT32);
    src_desc.SetDataFormat(DATAFORMAT_NDARRAY); // for 4-dimensional data, NDARRAY is equivalent to NCHW

Finally, call the ConvertFromHost() interface to convert in_data into the format required by input_tensor, completing the data filling:

    status = input_tensor->ConvertFromHost(in_data, src_desc);

5. Model inference

    status = runtime->Run(); // run the network inference

6. Get network output data

Get the runtime output tensor through the GetOutputTensor() interface:

    auto output_tensor = runtime->GetOutputTensor(0); // the classification network has only one output

Allocate data space to store network output:

    uint64_t output_size = output_tensor->GetShape().GetElementsExcludingPadding();
    std::vector<float> output_data_(output_size);
    float* output_data = output_data_.data();

As with the input data, define the desired output format description:

    TensorShape dst_desc = output_tensor->GetShape();
    dst_desc.SetDataType(DATATYPE_FLOAT32);
    dst_desc.SetDataFormat(DATAFORMAT_NDARRAY); // for 1-dimensional data, NDARRAY is equivalent to a plain vector

Call the ConvertToHost() interface to convert the data in output_tensor into the format described by dst_desc and obtain the output data:

    status = output_tensor->ConvertToHost(output_data, dst_desc);

7. Parse the output

Parse the scores output by the network to obtain the classification result:

int32_t GetClassificationResult(const float* scores, const int32_t size) {
    vector<pair<float, int>> pairs(size);
    for (int32_t i = 0; i < size; i++) {
        pairs[i] = make_pair(scores[i], i);
    }

    auto cmp_func = [](const pair<float, int>& p0, const pair<float, int>& p1) -> bool {
        return p0.first > p1.first;
    };

    const int32_t top_k = 5;
    nth_element(pairs.begin(), pairs.begin() + top_k, pairs.end(), cmp_func); // get top K results & sort
    sort(pairs.begin(), pairs.begin() + top_k, cmp_func);

    printf("top %d results:\n", top_k);
    for (int32_t i = 0; i < top_k; ++i) {
        printf("%dth: %-10f %-10d %s\n", i + 1, pairs[i].first, pairs[i].second, imagenet_labels_tab[pairs[i].second]);
    }

    return 0;
}

Run

1. Prepare ONNX model

We have prepared a classification model, mnasnet0_5.onnx, under tests/testdata that can be used for testing.

More ONNX models can be obtained through the following methods:

Models from the ONNX Model Zoo use a lower opset version; you can use convert_onnx_opset_version.py under tools to convert the opset to 11:

    python convert_onnx_opset_version.py --input_model input_model.onnx --output_model output_model.onnx --output_opset 11

For details on converting opset, please refer to: onnx-model-opset-convert-guide.md

2. Prepare test pictures

The test picture can be in any format. We have prepared cat0.png (a headshot of my own cat) and cat1.jpg (an ImageNet validation set image) under tests/testdata:

Images of any size run fine. If you want to resize the input to 224 x 224, modify the following variable in the program (a sketch of the resize call itself follows the snippet):

    const bool resize_input = false; // change to true to enable resizing
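
For reference, such a resize is typically a single OpenCV call applied to the decoded image before ImagePreprocess(); a minimal sketch (the variable name resized_img is illustrative, not necessarily the one used in classification.cpp):

    // resize the decoded BGR image to 224 x 224 before preprocessing
    cv::Mat resized_img;
    cv::resize(src_img, resized_img, cv::Size(224, 224));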

3. Run

    pplnn-build/samples/cpp/run_model/classification <image_file> <onnx_model_file>

After inference finishes, you will get output similar to the following:

image preprocess succeed!
[INFO][2021-07-23 17:29:31.341][simple_graph_partitioner.cc:107] total partition(s) of graph[torch-jit-export]: 1.
successfully create runtime builder!
successfully build runtime!
successfully set input data to tensor [input]!
successfully run network!
successfully get outputs!
top 5 results:
1th: 3.416199   284        n02123597 Siamese cat, Siamese
2th: 3.049764   285        n02124075 Egyptian cat
3th: 2.989676   606        n03584829 iron, smoothing iron
4th: 2.812310   283        n02123394 Persian cat
5th: 2.796991   749        n04033901 quill, quill pen

It is not difficult to see that the program correctly identified my cat.

At this point, the installation of OpenPPL and inference with an image classification model are both complete.

In addition, an executable named pplnn is generated in the pplnn-build/tools directory; it can run inference on arbitrary models, dump output data, run benchmarks, and more.

Its detailed usage can be viewed with the --help option. You can also modify this example to become more familiar with how OpenPPL is used.
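
For example (path as produced by the build steps above):

    pplnn-build/tools/pplnn --help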

Discussion QQ group: 627853444; the passphrase to join is OpenPPL.
