Abstract: This article walks through the paper "Gaussian Bounding Boxes and Probabilistic Intersection-over-Union for Object Detection". The paper proposes a new bounding-box representation for object detection, the Gaussian bounding box (GBB), and a new similarity measure between boxes, ProbIoU.

This article is shared from the Huawei Cloud Community post "Paper Interpretation Series 19: Gaussian Detection Frame and ProbIoU for Target Detection", author: BigDragon.

Paper address: https://arxiv.org/abs/2106.06072

Github address: https://github.com/ProbIOU

Current work on improving object detection focuses mainly on training with larger datasets (such as LVIS), handling class imbalance, designing better backbones, modeling long-range interactions (Transformers, LambdaNetworks), and analyzing the balance between classification and box localization. The representation of the bounding box itself has received much less attention. Existing detectors mostly use horizontal bounding boxes (HBB) or rotated, oriented bounding boxes (OBB), so the representation is still rectangular or near-rectangular. Existing distance and similarity measures between boxes include IoU (Intersection over Union), GIoU (Generalized IoU), DIoU (Distance IoU), PIoU (Pixel IoU), and the Gaussian Wasserstein Distance (GWD).

Compared with HBB-based methods, existing OBB-based methods handle elongated and rotated objects better, but an OBB still fits the object's semantic segmentation mask poorly. The paper therefore proposes a box representation that follows the segmentation mask more closely, together with a matching similarity measure.

The contributions of the paper are as follows:

  • A new elliptical bounding-box representation (Gaussian Bounding Boxes, GBB)

The shape of a GBB is closer to the object's semantic segmentation mask, so it suits non-rectangular objects better, and for such objects it gives better results than HBB and OBB.

  • A new similarity measure between boxes (Probabilistic IoU, ProbIoU)

ProbIoU is based on the Hellinger Distance between 2D Gaussian distributions. The Hellinger Distance satisfies all the requirements of a distance metric, so it represents a true distance between distributions, and it is differentiable. Used as a loss, it can also improve the detection results for OBB and HBB.

1. Gaussian Bounding Boxes (GBB)

A two-dimensional Gaussian distribution is determined by its mean μ and covariance matrix Σ. Here μ = (x0, y0)^T, and Σ is given by the following formula. In a detector, the regression targets can be set directly to (x0, y0, a, b, c), or they can be expressed as (x0, y0, a', b', θ). The latter form is closer to the output of existing rotated-box detectors.
image.png
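
Written out, the relation between the two parameterizations can be sketched as follows. This assumes the standard construction in which θ rotates a diagonal covariance with variances a' and b'; the rotation matrix R_θ is notation introduced here for illustration.

```latex
\Sigma =
\begin{bmatrix} a & c \\ c & b \end{bmatrix}
= R_\theta
\begin{bmatrix} a' & 0 \\ 0 & b' \end{bmatrix}
R_\theta^{T},
\qquad
R_\theta =
\begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}

a = a'\cos^2\theta + b'\sin^2\theta, \qquad
b = a'\sin^2\theta + b'\cos^2\theta, \qquad
c = (a' - b')\sin\theta\cos\theta
```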

Assumptions

The conversion from a horizontal or rotated box to a Gaussian box rests on the following assumption: the object region is a two-dimensional binary region Ω, and points are uniformly distributed over Ω. The mean μ and covariance matrix Σ of this distribution can then be calculated by the following formula.
image.png

Here, N is the area of the region Ω.
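
For a uniform distribution over Ω, the mean and covariance are the first moment and second central moment of the region. A sketch in LaTeX, assuming the usual definitions and writing points of the plane as a boldface x (notation introduced here):

```latex
\mu = \frac{1}{N} \int_{\mathbf{x} \in \Omega} \mathbf{x} \, d\mathbf{x},
\qquad
\Sigma = \frac{1}{N} \int_{\mathbf{x} \in \Omega}
(\mathbf{x} - \mu)(\mathbf{x} - \mu)^{T} \, d\mathbf{x}
```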

1.1 Convert HBB to GBB

For an HBB, the binary region Ω is a rectangle centered at (x0, y0) with height H and width W, so μ = (x0, y0), and the covariance matrix Σ can be calculated by the following formula:
image.png

Therefore a = W²/12, b = H²/12, and c = 0. The relation can be inverted (W = √(12a), H = √(12b)), so a Gaussian box obtained this way can be converted back into a horizontal box.
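
The factor 1/12 comes from the variance of a uniform distribution on an interval: the integral of x² over [-W/2, W/2] is W³/12, and dividing by the length W gives W²/12. A minimal Python sketch of the conversion and its inverse follows; the function names and the tuple layout (x0, y0, a, b, c) are illustrative assumptions, not names from the paper's code.

```python
import math

def hbb_to_gbb(x0, y0, w, h):
    """Return (x0, y0, a, b, c) for an axis-aligned box of width w and height h."""
    # Variance of a uniform distribution over an interval of length L is L^2 / 12.
    return (x0, y0, w * w / 12.0, h * h / 12.0, 0.0)

def gbb_to_hbb(x0, y0, a, b, c=0.0):
    """Inverse conversion; only meaningful when c == 0 (axis-aligned Gaussian)."""
    return (x0, y0, math.sqrt(12.0 * a), math.sqrt(12.0 * b))

# Round trip: a 6 x 3 box centered at (10, 20).
gbb = hbb_to_gbb(10.0, 20.0, 6.0, 3.0)
print(gbb)
print(gbb_to_hbb(*gbb))
```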

1.2 Convert OBB to GBB

Converting an OBB to a GBB requires (a', b', θ). As shown in the figure below, the variances a' and b' can be obtained by treating the rotated box as a horizontal box in its own frame, and the covariance matrix can then be calculated by the following formula.
image.png
image.png
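
The two steps can be sketched in Python as below. This is a minimal illustration under stated assumptions: θ is in radians and measured counter-clockwise, w is the side length along the box's own rotated x-axis, and the covariance is obtained by rotating the diagonal matrix diag(a', b'). The function name is hypothetical.

```python
import math

def obb_to_gbb(x0, y0, w, h, theta):
    """Convert a rotated box to Gaussian parameters (x0, y0, a, b, c)."""
    # Step 1: variances in the box's own axis-aligned frame (same as the HBB case).
    a_p = w * w / 12.0
    b_p = h * h / 12.0
    # Step 2: rotate, Sigma = R diag(a', b') R^T, written out entry by entry.
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    a = a_p * cos_t ** 2 + b_p * sin_t ** 2
    b = a_p * sin_t ** 2 + b_p * cos_t ** 2
    c = (a_p - b_p) * sin_t * cos_t
    return (x0, y0, a, b, c)

# A 6 x 3 box rotated by 30 degrees.
print(obb_to_gbb(0.0, 0.0, 6.0, 3.0, math.radians(30.0)))
```

With θ = 0 this reduces to the horizontal-box case, and with θ = π/2 the roles of a and b are swapped.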

1.3 Convert Polygon Box (PBB) to GBB

A polygon box can be converted into a Gaussian box according to the following formula:
image.png
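
One way to compute the mean and covariance of a uniform distribution over a polygon is through the standard polygon area-moment formulas, which follow from Green's theorem. The Python sketch below uses those formulas; it is not necessarily the exact expression used in the paper, the function name is hypothetical, and the polygon is assumed to be simple (non-self-intersecting) with non-zero area.

```python
def polygon_to_gbb(vertices):
    """Return (x0, y0, a, b, c) for a simple polygon given as [(x, y), ...]."""
    area2 = 0.0            # twice the signed area
    sx = sy = 0.0          # first-moment sums (6 * area * centroid)
    sxx = syy = sxy = 0.0  # second-moment sums about the origin
    n = len(vertices)
    for i in range(n):
        xa, ya = vertices[i]
        xb, yb = vertices[(i + 1) % n]
        cross = xa * yb - xb * ya
        area2 += cross
        sx += (xa + xb) * cross
        sy += (ya + yb) * cross
        sxx += (xa * xa + xa * xb + xb * xb) * cross
        syy += (ya * ya + ya * yb + yb * yb) * cross
        sxy += (xa * yb + 2 * xa * ya + 2 * xb * yb + xb * ya) * cross
    area = area2 / 2.0     # the sign cancels in every ratio below
    mx = sx / (6.0 * area)
    my = sy / (6.0 * area)
    # Central second moments: E[x^2] - E[x]^2, and so on.
    a = sxx / (12.0 * area) - mx * mx
    b = syy / (12.0 * area) - my * my
    c = sxy / (24.0 * area) - mx * my
    return (mx, my, a, b, c)

# A 4 x 2 rectangle reproduces the HBB result a = W^2/12, b = H^2/12, c = 0.
print(polygon_to_gbb([(0, 0), (4, 0), (4, 2), (0, 2)]))
```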

2. ProbIoU and localization loss function

2.1 ProbIoU

Bhattacharyya Distance (BD)

To measure the similarity between two GBBs, the paper starts from the Bhattacharyya Coefficient (BC). The BC between two probability density functions p(x) and q(x) is calculated according to the following formula:
image.png

Here BC(p, q) ∈ [0, 1], and BC(p, q) = 1 if and only if the two distributions are identical.

From BC(p, q), the Bhattacharyya Distance (BD) between the two distributions can be obtained. The BD between two probability density functions p(x) and q(x) is calculated according to the following formula:
image.png
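
For reference, the standard definitions of the two quantities can be written as below. The symbols B_C and B_D are shorthand assumed here for BC and BD.

```latex
B_C(p, q) = \int_{\mathbb{R}^2} \sqrt{p(\mathbf{x}) \, q(\mathbf{x})} \; d\mathbf{x},
\qquad
B_D(p, q) = -\ln B_C(p, q)
```

Since B_C lies in [0, 1], B_D lies in [0, ∞) and equals 0 exactly when B_C = 1.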

When p ~ N(μ1, Σ1) and q ~ N(μ2, Σ2), and the means and covariance matrices are two-dimensional as they are in object detection, the Bhattacharyya Distance has a closed form and can be calculated by the following formulas:
image.png
image.png
image.png
image.png
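
For two Gaussians, the general Bhattacharyya Distance is BD = (1/8)(μ1 − μ2)^T Σ^{-1} (μ1 − μ2) + (1/2) ln(det Σ / sqrt(det Σ1 · det Σ2)), with Σ = (Σ1 + Σ2)/2. In two dimensions this expands directly in terms of the GBB parameters. A minimal Python sketch under that standard formula follows; the function name and the tuple layout (x, y, a, b, c) are illustrative assumptions, and no guard against degenerate covariances is included.

```python
import math

def bhattacharyya_distance(g1, g2):
    """BD between two 2D Gaussians given as (x, y, a, b, c) tuples."""
    x1, y1, a1, b1, c1 = g1
    x2, y2, a2, b2, c2 = g2
    # Entries and determinant of Sigma1 + Sigma2.
    A, B, C = a1 + a2, b1 + b2, c1 + c2
    det_sum = A * B - C * C
    dx, dy = x1 - x2, y1 - y2
    # Center term: (1/8) d^T Sigma^-1 d with Sigma = (Sigma1 + Sigma2) / 2.
    term_center = 0.25 * (A * dy * dy + B * dx * dx - 2.0 * C * dx * dy) / det_sum
    # Shape term: (1/2) ln(det Sigma / sqrt(det Sigma1 * det Sigma2)).
    det1 = a1 * b1 - c1 * c1
    det2 = a2 * b2 - c2 * c2
    term_shape = 0.5 * math.log(det_sum / (4.0 * math.sqrt(det1 * det2)))
    return term_center + term_shape

# Two unit-variance Gaussians one unit apart.
print(bhattacharyya_distance((0, 0, 1, 1, 0), (1, 0, 1, 1, 0)))
```

The first term depends on the offset between the centers and the second only on the shapes, so two boxes with the same center can still have a non-zero distance.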

Hellinger Distance (HD)

Because the Bhattacharyya Distance does not satisfy the triangle inequality, it is not a true distance metric. The Hellinger Distance (HD) is therefore used instead. The formula is as follows:
image.png

Here HD(p, q) ∈ [0, 1], and HD(p, q) = 0 if and only if the two distributions are identical.

Probabilistic IoU (ProbIoU)

Based on the Hellinger Distance above, the paper defines ProbIoU, a similarity measure between Gaussian distributions. The formula is as follows:
image.png
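
The chain from BD to ProbIoU can be sketched in a few lines of Python. The sketch assumes the relations BC = exp(−BD), HD = sqrt(1 − BC), and ProbIoU = 1 − HD, which are consistent with the ranges stated in the text; the function names are hypothetical.

```python
import math

def hellinger_from_bd(bd):
    """Hellinger distance from a Bhattacharyya distance, using BC = exp(-BD)."""
    # max() guards against a tiny negative argument caused by rounding.
    return math.sqrt(max(0.0, 1.0 - math.exp(-bd)))

def probiou_from_bd(bd):
    """ProbIoU = 1 - HD: 1 for identical boxes, approaching 0 for distant ones."""
    return 1.0 - hellinger_from_bd(bd)

for bd in (0.0, 0.125, 1.0, 10.0):
    print(bd, probiou_from_bd(bd))
```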

2.2 Localization loss function

Assuming that the predicted GBB is p = (x1, y1, a1, b1, c1) and the ground-truth GBB is q = (x2, y2, a2, b2, c2), the loss functions are as follows:
image.png
image.png

Here L1 and L2 are the names of the two ProbIoU-based losses defined above, not the L1 and L2 norms. When the predicted GBB is far from the ground-truth GBB, the value of the L1 loss is close to 1, so training produces small gradients and converges slowly. The L2 loss avoids this problem, but its geometric relationship to IoU is weaker. It is therefore recommended to train with the L2 loss first and then switch to the L1 loss.
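
The saturation of the L1 loss can be seen numerically. The sketch below assumes that L1 = 1 − ProbIoU = sqrt(1 − exp(−BD)) and L2 = BD, so that L2 = −ln(1 − L1²); these definitions and the function names are assumptions made for illustration. It compares the slope of each loss with respect to BD.

```python
import math

def l1_loss(bd):
    """Assumed L1 = 1 - ProbIoU = sqrt(1 - exp(-BD)), bounded in [0, 1]."""
    return math.sqrt(max(0.0, 1.0 - math.exp(-bd)))

def l2_loss(bd):
    """Assumed L2 = BD itself, unbounded above."""
    return bd

def l1_slope(bd):
    """Analytic derivative dL1/dBD = exp(-BD) / (2 * L1), valid for bd > 0."""
    return math.exp(-bd) / (2.0 * l1_loss(bd))

# dL2/dBD is always 1, while dL1/dBD shrinks as the boxes move apart.
for bd in (0.1, 1.0, 5.0, 20.0):
    print(f"BD={bd:5.1f}  L1={l1_loss(bd):.6f}  dL1/dBD={l1_slope(bd):.2e}  dL2/dBD=1")
```

As BD grows, L1 flattens out near 1 and its slope decays roughly like exp(−BD)/2, which matches the slow convergence described above.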

2.3 Features of ProbIoU

ProbIoU based on Hellinger Distance has the following features:

  • The three functions are differentiable with respect to all of their parameters;
  • Hellinger Distance satisfies all the requirements of a distance metric;
  • The loss function is invariant to the scale of the object.
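
The scale-invariance property can be checked numerically. The Python sketch below assumes the standard closed-form Bhattacharyya distance between 2D Gaussians and the tuple layout (x, y, a, b, c); scaling the image by s multiplies the centers by s and the covariance entries by s². The helper names and the sample values are hypothetical.

```python
import math

def bd(g1, g2):
    """Closed-form Bhattacharyya distance between two 2D Gaussians."""
    x1, y1, a1, b1, c1 = g1
    x2, y2, a2, b2, c2 = g2
    A, B, C = a1 + a2, b1 + b2, c1 + c2
    det_sum = A * B - C * C
    dx, dy = x1 - x2, y1 - y2
    center = 0.25 * (A * dy * dy + B * dx * dx - 2.0 * C * dx * dy) / det_sum
    dets = (a1 * b1 - c1 * c1) * (a2 * b2 - c2 * c2)
    shape = 0.5 * math.log(det_sum / (4.0 * math.sqrt(dets)))
    return center + shape

def scale(g, s):
    """Scale the image by s: centers scale by s, covariance entries by s^2."""
    x, y, a, b, c = g
    return (s * x, s * y, s * s * a, s * s * b, s * s * c)

p = (10.0, 5.0, 4.0, 1.0, 0.5)
q = (12.0, 6.0, 3.0, 2.0, -0.2)
# The two printed values should agree up to floating-point rounding.
print(bd(p, q), bd(scale(p, 3.0), scale(q, 3.0)))
```

Because HD and ProbIoU are functions of BD alone, they inherit the same invariance.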

3. Experimental results

3.1 Experimental results of different detection frames

After training on COCO 2017, comparing the IoU obtained with GBB, OBB, and HBB leads to the following conclusions:

  • In 77 of the 80 COCO 2017 categories, the average IoU of GBB is higher than that of HBB and OBB
  • GBB is worse than HBB and OBB in the remaining three categories: traffic light, microwave, and tv

3.2 The improvement of ProbIoU loss for HBB and OBB detection

The ProbIoU-based loss function was used in the HBB detection task, with EfficientDet D0 and SSD 300 each trained on the PASCAL-VOC 2007 dataset. As shown in the following table, compared with the IoU loss, the ProbIoU loss improves AP and AP75, and models trained with it can reach a higher AP.
image.png

The ProbIoU-based loss function was also used in OBB detection tasks, with R-50 RetinaNet and R-50 R3Det trained on the DOTA v1 and HRSC2016 datasets. As shown in the following tables, on DOTA v1 with the RetinaNet model, the AP of the ProbIoU-based loss is 2% higher than that of GWD-ret; with the R3Det model, the results are close to GWD-rep and GWD-ret. On HRSC2016, the ProbIoU-based loss is on par with GWD-rep and better than GWD-ret.
image.png
image.png

4. Summary

The method presented in the paper has the following three main parts:

  • A bounding box in the form of a Gaussian distribution (GBB)
  • ProbIoU, based on the Hellinger Distance, together with the corresponding loss functions L1 and L2
  • During training, combining the L1 and L2 loss functions works better

The limitations of the method include the following two points:

  • For an isotropic Gaussian distribution, the rotation angle cannot be determined
  • For elongated objects, the gradient can become too large during training, which makes training unstable.
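
The first limitation can be illustrated with a short Python sketch. It assumes the covariance is built by rotating diag(a', b') by θ; when a' = b', every angle produces the same (a, b, c), so θ cannot be recovered from the Gaussian. The function name is hypothetical.

```python
import math

def rotate_cov(a_p, b_p, theta):
    """Entries (a, b, c) of R diag(a', b') R^T for a rotation by theta (radians)."""
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    a = a_p * cos_t ** 2 + b_p * sin_t ** 2
    b = a_p * sin_t ** 2 + b_p * cos_t ** 2
    c = (a_p - b_p) * sin_t * cos_t
    return (a, b, c)

# Isotropic case: a' == b', so the result does not depend on theta.
for theta in (0.0, 0.3, 1.2):
    print(theta, rotate_cov(2.0, 2.0, theta))
```

In practice this corresponds to square boxes, whose orientation is lost once they are converted to a GBB.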
