
The deep learning inference framework OpenPPL is now open source. Using an image classification example, this article walks through deploying a deep learning model from zero to one and building a complete AI inference application.

The end result: upload a photo of a cat (a dog works too) and the program identifies the animal in the picture.

Background

OpenPPL is an inference engine built on a self-developed library of high-performance operators. It provides multi-backend deployment capabilities for AI models in cloud-native environments and supports efficient deployment of deep learning models such as those from OpenMMLab.

OpenPPL source link: https://github.com/openppl-public/ppl.nn

Install

1. Download PPLNN source code

git clone https://github.com/openppl-public/ppl.nn.git

2. Install dependencies

The compilation dependencies of PPLNN are as follows:

  • GCC >= 4.9 or LLVM/Clang >= 6.0
  • CMake >= 3.14
  • Git >= 2.7.0

The image classification example (classification) explained in this article additionally requires OpenCV:

  • For apt-based systems (e.g. Ubuntu/Debian):
    sudo apt install libopencv-dev
  • For yum-based systems (e.g. CentOS):
    sudo yum install opencv opencv-devel
  • Or install OpenCV from source
Note: the build automatically detects whether OpenCV is installed. If it is not installed, the example in this article will not be generated.

3. Compile

  • X86
    cd ppl.nn
    ./build.sh -DHPCC_USE_OPENMP=ON   # if you don't need multithreading, the -DHPCC_USE_OPENMP option can be omitted
  • CUDA
    cd ppl.nn
    ./build.sh -DHPCC_USE_CUDA=ON

After compilation completes, the image classification example classification is generated in the pplnn-build/samples/cpp/run_model/ directory. It reads an image and a model file and outputs the classification result.

For more compilation related descriptions, please refer to: building-from-source.md


Image classification example walkthrough

The source code of the image classification example is in samples/cpp/run_model/classification.cpp. This section explains its main parts.

1. Image preprocessing

OpenCV reads images as BGR HWC uint8, while the ONNX model expects RGB NCHW fp32 input, so the image data needs to be converted:

int32_t ImagePreprocess(const Mat& src_img, float* in_data) {
    const int32_t height = src_img.rows;
    const int32_t width = src_img.cols;
    const int32_t channels = src_img.channels();

    // convert the color space from BGR/GRAY to RGB
    Mat rgb_img;
    if (channels == 3) {
        cvtColor(src_img, rgb_img, COLOR_BGR2RGB);
    } else if (channels == 1) {
        cvtColor(src_img, rgb_img, COLOR_GRAY2RGB);
    } else {
        fprintf(stderr, "unsupported channel num: %d\n", channels);
        return -1;
    }

    // split the three channels of the HWC image into separate planes
    vector<Mat> rgb_channels(3);
    split(rgb_img, rgb_channels);

    // construct these cv::Mat objects directly on top of in_data, so writes to the Mats go straight into in_data
    Mat r_channel_fp32(height, width, CV_32FC1, in_data + 0 * height * width);
    Mat g_channel_fp32(height, width, CV_32FC1, in_data + 1 * height * width);
    Mat b_channel_fp32(height, width, CV_32FC1, in_data + 2 * height * width);
    vector<Mat> rgb_channels_fp32{r_channel_fp32, g_channel_fp32, b_channel_fp32};

    // convert uint8 data to fp32 and normalize: y = (x - mean) / std
    const float mean[3] = {0, 0, 0}; // adjust the mean and std according to the dataset and training parameters
    const float std[3] = {255.0f, 255.0f, 255.0f};
    for (uint32_t i = 0; i < rgb_channels.size(); ++i) {
        rgb_channels[i].convertTo(rgb_channels_fp32[i], CV_32FC1, 1.0f / std[i], -mean[i] / std[i]);
    }

    return 0;
}

2. Generate runtime builder from ONNX model

First, create and register the engines you want to use. Each engine corresponds to one inference backend; x86 and CUDA are currently supported.

Create an x86 engine:

    auto x86_engine = X86EngineFactory::Create();

Or a CUDA engine:

    auto cuda_engine = CudaEngineFactory::Create(CudaEngineOptions());

The following example uses only the x86 engine:

    // register all engines you want to use
    vector<unique_ptr<Engine>> engines;
    engines.emplace_back(unique_ptr<Engine>(x86_engine));

Then use the ONNXRuntimeBuilderFactory::Create() function to read the ONNX model and create a runtime builder on top of the registered engines:

    vector<Engine*> engine_ptrs;
    engine_ptrs.emplace_back(engines[0].get());
    auto builder = unique_ptr<ONNXRuntimeBuilder>(
        ONNXRuntimeBuilderFactory::Create(ONNX_model_path, engine_ptrs.data(), engine_ptrs.size()));

Note: at the framework level, PPLNN supports mixed inference across multiple heterogeneous devices. You can register several different engines, and the framework will automatically split the computation graph into subgraphs and schedule them onto the different engines.
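
For example, here is a minimal sketch of registering two engines at once (it assumes the project was built with -DHPCC_USE_CUDA=ON so that both backends are available); the framework then partitions the graph across them:

    // register both an x86 engine and a CUDA engine; the framework splits the
    // graph into subgraphs and schedules them onto the two backends
    vector<unique_ptr<Engine>> engines;
    engines.emplace_back(unique_ptr<Engine>(X86EngineFactory::Create()));
    engines.emplace_back(unique_ptr<Engine>(CudaEngineFactory::Create(CudaEngineOptions())));

    vector<Engine*> engine_ptrs;
    for (auto& e : engines) {
        engine_ptrs.emplace_back(e.get());
    }
    auto builder = unique_ptr<ONNXRuntimeBuilder>(
        ONNXRuntimeBuilderFactory::Create(ONNX_model_path, engine_ptrs.data(), engine_ptrs.size()));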

3. Create runtime

Use runtime_options to configure runtime options; for example, set the mm_policy field to MM_LESS_MEMORY (memory-saving mode):

    RuntimeOptions runtime_options;
    runtime_options.mm_policy = MM_LESS_MEMORY; // use the memory-saving mode

Use the runtime builder generated in the previous step to create a runtime instance:

    unique_ptr<Runtime> runtime;
    runtime.reset(builder->CreateRuntime(runtime_options));

A runtime builder can create multiple runtime instances. These instances share constant data (weights, etc.) and the network topology, which saves memory.
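
As a minimal sketch of that usage (the variable names here are illustrative), several runtimes can be created from the same builder, e.g. one per worker thread:

    // each runtime holds its own intermediate buffers, while weights and the
    // network topology are shared through the builder
    vector<unique_ptr<Runtime>> runtimes;
    for (int i = 0; i < 2; ++i) {
        runtimes.emplace_back(unique_ptr<Runtime>(builder->CreateRuntime(runtime_options)));
    }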

4. Set the network input data

First get the runtime input tensor through the GetInputTensor() interface:

    auto input_tensor = runtime->GetInputTensor(0); // the classification network has only one input

Reshape the input tensor and reallocate its memory:

    const std::vector<int64_t> input_shape{1, channels, height, width};
    input_tensor->GetShape().Reshape(input_shape); // even if the input size is fixed in the ONNX model, PPLNN can still adjust it dynamically
    auto status = input_tensor->ReallocBuffer();   // after calling Reshape, this interface must be called to reallocate the memory

Unlike ONNX Runtime, PPLNN can dynamically adjust the network's input size even if the size is fixed in the ONNX model (you just need to make sure the requested size is reasonable).

The in_data buffer produced by the preprocessing above holds fp32 data in NDARRAY format (for 4-dimensional data, NDARRAY is equivalent to NCHW). Define a descriptor for this user-provided input data:

    TensorShape src_desc = input_tensor->GetShape();
    src_desc.SetDataType(DATATYPE_FLOAT32);
    src_desc.SetDataFormat(DATAFORMAT_NDARRAY); // for 4-dimensional data, NDARRAY is equivalent to NCHW

Finally, call the ConvertFromHost() interface to convert in_data into the format required by input_tensor, completing the data filling:

    status = input_tensor->ConvertFromHost(in_data, src_desc);

5. Model inference

    status = runtime->Run(); // run the network inference

6. Get network output data

Get the runtime output tensor through the GetOutputTensor() interface:

    auto output_tensor = runtime->GetOutputTensor(0); // the classification network has only one output

Allocate data space to store network output:

    uint64_t output_size = output_tensor->GetShape().GetElementsExcludingPadding();
    std::vector<float> output_data_(output_size);
    float* output_data = output_data_.data();

As with the input data, define the desired output format description:

    TensorShape dst_desc = output_tensor->GetShape();
    dst_desc.SetDataType(DATATYPE_FLOAT32);
    dst_desc.SetDataFormat(DATAFORMAT_NDARRAY); // for 1-dimensional data, NDARRAY is equivalent to a plain vector

Call the ConvertToHost() interface to convert the data in output_tensor into the format described by dst_desc and obtain the output data:

    status = output_tensor->ConvertToHost(output_data, dst_desc);

7. Parse the output

Parse the scores output by the network to obtain the classification result:

int32_t GetClassificationResult(const float* scores, const int32_t size) {
    vector<pair<float, int>> pairs(size);
    for (int32_t i = 0; i < size; i++) {
        pairs[i] = make_pair(scores[i], i);
    }

    auto cmp_func = [](const pair<float, int>& p0, const pair<float, int>& p1) -> bool {
        return p0.first > p1.first;
    };

    const int32_t top_k = 5;
    nth_element(pairs.begin(), pairs.begin() + top_k, pairs.end(), cmp_func); // get top K results & sort
    sort(pairs.begin(), pairs.begin() + top_k, cmp_func);

    printf("top %d results:\n", top_k);
    for (int32_t i = 0; i < top_k; ++i) {
        printf("%dth: %-10f %-10d %s\n", i + 1, pairs[i].first, pairs[i].second, imagenet_labels_tab[pairs[i].second]);
    }

    return 0;
}

Run

1. Prepare ONNX model

We have prepared a classification model, mnasnet0_5.onnx, under tests/testdata that can be used for testing.

More ONNX models can be obtained through the following methods:

Models from the ONNX Model Zoo use a lower opset version; you can use convert_onnx_opset_version.py under tools to convert the opset to 11:

    python convert_onnx_opset_version.py --input_model input_model.onnx --output_model output_model.onnx --output_opset 11

For details on converting opset, please refer to: onnx-model-opset-convert-guide.md

2. Prepare test pictures

The test picture can be in any format. We have prepared cat0.png (a headshot of my own cat) and cat1.jpg (an ImageNet validation set image) under tests/testdata:

Images of any size run fine. If you want to resize the input to 224 x 224, modify the following variable in the program (a sketch of the resize call itself follows the snippet):

    const bool resize_input = false; // change to true to enable resizing
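
For reference, such a resize is typically a single OpenCV call applied to the decoded image before ImagePreprocess(); a minimal sketch (the variable name resized_img is illustrative, not necessarily the one used in classification.cpp):

    // resize the decoded BGR image to 224 x 224 before preprocessing
    cv::Mat resized_img;
    cv::resize(src_img, resized_img, cv::Size(224, 224));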

3. Run

    pplnn-build/samples/cpp/run_model/classification <image_file> <onnx_model_file>

After inference finishes, you will get output similar to the following:

image preprocess succeed!
[INFO][2021-07-23 17:29:31.341][simple_graph_partitioner.cc:107] total partition(s) of graph[torch-jit-export]: 1.
successfully create runtime builder!
successfully build runtime!
successfully set input data to tensor [input]!
successfully run network!
successfully get outputs!
top 5 results:
1th: 3.416199   284        n02123597 Siamese cat, Siamese
2th: 3.049764   285        n02124075 Egyptian cat
3th: 2.989676   606        n03584829 iron, smoothing iron
4th: 2.812310   283        n02123394 Persian cat
5th: 2.796991   749        n04033901 quill, quill pen

It is not difficult to see that the program correctly identified my cat.

At this point, the installation of OpenPPL and inference with an image classification model are both complete.

In addition, an executable named pplnn is generated in the pplnn-build/tools directory; it can run inference on arbitrary models, dump output data, run benchmarks, and more.

Its detailed usage can be viewed with the --help option. You can also modify this example to become more familiar with how OpenPPL is used.
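
For example (path as produced by the build steps above):

    pplnn-build/tools/pplnn --help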

Discussion QQ group: 627853444; the passphrase to join is OpenPPL.
