The original text was published by hakaboom in the TesterHome community. Click the link to the original text to communicate directly with the author.
1) Introduction
Since 2018 I had been using Chacha Assistant (the platform has since been "invited to tea", i.e. shut down by the authorities) to write idle-farming scripts for the games I played, mostly based on color recognition. I did that for two or three years, which was more or less my way into testing.
In November last year this skill got me an outsourcing job. The pay is decent and the work is varied; the position is on the publishing side, so I have been exposed to many games. Because I cannot integrate Poco, all I have is the APK.
Over time there are more and more games, but I am still the only one on the project. To reduce my own workload, I started down the one-way road of UI automation.
2) Game UI Automation
Because game engines draw their own UI, frameworks such as Appium cannot read the control tree; unless the game integrates a dedicated SDK, image recognition is the only way to locate elements. Common open-source frameworks:
- NetEase's Airtest: automation through traditional image recognition; AirtestIDE makes it easy to write Airtest code quickly
- Tencent's GameAISDK: automation through deep learning (I haven't used it, and it hasn't been maintained for a long time)
- Alibaba's SoloPi: mainly recording and group control, with image matching as an aid
Common image-recognition approaches:
Traditional methods: feature points, template matching, contours
Feature points: SIFT, ORB
- covered in detail below
Template matching: OpenCV's matchTemplate
- The simplest approach: slide the template across the target image and pick the best-matching position (a minimal sketch follows after this list)
Contours: HALCON shape-based matching, Canny
- I haven't used these and can't comment; HALCON costs money
Deep-learning-based methods:
Text recognition: PaddleOCR, Tesseract
- PaddleOCR mostly works out of the box, but in-game WordArt fonts need additional training
Image classification: PaddleClas
- I haven't actually used it; I think it could classify scenes first and then drive more detailed recognition, for example identifying pop-up windows
Object detection: YOLO
- The FPS cheats that were popular a while ago basically rely on this to detect human figures
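To make the template-matching idea above concrete, here is a minimal sketch using OpenCV's matchTemplate; the file names are placeholders, and this is not how the project wraps it:

```python
import cv2

# Slide the template over the source image and take the best-scoring position.
source = cv2.imread('screen.png')      # placeholder: full screenshot
template = cv2.imread('button.png')    # placeholder: cropped control image

res = cv2.matchTemplate(source, template, cv2.TM_CCOEFF_NORMED)
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(res)

h, w = template.shape[:2]
print('best match at', max_loc, 'score', max_val)
cv2.rectangle(source, max_loc, (max_loc[0] + w, max_loc[1] + h), (0, 0, 255), 3)
```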
The core of UI automation is finding elements and knowing where they are, so the focus from here on is image recognition.
Deep-learning schemes need large numbers of positive and negative samples plus labeling work, so I had to abandon them and use traditional recognition instead.
In the community and in QQ testing groups, you can see that most people's impression of traditional image recognition is: slow and inaccurate.
Before Chinese New Year this year I interviewed at a game company in Zhangjiang, also a publisher. After chatting for more than an hour, it turned out their plan was to take a screenshot per device model and adapt Airtest to each one. I was quite shocked.
Summarizing the difficulties of image-recognition-based UI automation:
- Recognition is slow
- Results are inaccurate
- Poor compatibility across resolutions
- When the game UI updates, maintaining the image library is costly
3) How I solved it
So here is what I did. The project is here: https://github.com/hakaboom/py_image_registration
It is currently being refactored; after the refactoring it may get a better name: https://github.com/hakaboom/image_registration
At the beginning I referred to the aircv part of Airtest. I didn't want that many dependencies at the time, so I took it apart.
After the refactoring, some OpenCV APIs were wrapped and the framework and algorithms were reorganised. The results feel good so far; a PR has been submitted to Airtest, and I will push to get it merged.
Install opencv-python
The suggested version is 4.5.5.
Pre-built packages are available on PyPI, but they only support the CPU path:
- `pip install opencv-python`
- `pip install opencv-contrib-python`
Compiling from source lets you customize more, such as adding CUDA support:
- First clone the code from the opencv repository
- See the rest here https://github.com/hakaboom/py_image_registration/blob/master/doc/cuda_opencv.md
What is a feature point
Simple understanding: key points used to describe image features
Common feature point extraction algorithms:
- SIFT: scale-invariant feature transform. OpenCV only has a CPU implementation
- SURF: speeded-up robust features, an accelerated variant of SIFT. OpenCV has CPU and CUDA implementations
- ORB: uses FAST feature detection and the BRIEF feature descriptor. OpenCV has CPU and CUDA implementations
What are their benefits? Scale and rotation invariance; put bluntly, they can cope with different resolutions, rotations, and scales.
Speed ranking: ORB(cuda) > SURF(cuda) > ORB > SURF > SIFT
Quality ranking (quality means not just the number of feature points, but also how good they are): SIFT > ORB > SURF
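To see what feature points look like in practice, here is a minimal sketch using plain OpenCV (not the project's code; the file path is a placeholder) that detects and draws SIFT keypoints:

```python
import cv2

# Detect SIFT keypoints and draw them with size/orientation markers.
img = cv2.imread('tests/image/6.png')                 # placeholder path
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(gray, None)
print(len(keypoints), 'keypoints, descriptor shape:', descriptors.shape)

vis = cv2.drawKeypoints(img, keypoints, None,
                        flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
cv2.imshow('keypoints', vis)
cv2.waitKey(0)
```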
Example
- 6.png (2532x1170): screenshot from an iPhone 12 Pro
- 4.png (1922x1118; the game actually renders at 1920x1080, the extra pixels are the Windows border): desktop screenshot of Honkai Impact 3rd; the blue plus-sign area in the upper right corner is cropped as the template
```python
import cv2
import time
from baseImage import Image, Rect
from image_registration.matching import SIFT

match = SIFT()
im_source = Image('tests/image/6.png')
im_search = Image('tests/image/4.png').crop(Rect(1498, 68, 50, 56))

start = time.time()
result = match.find_all_results(im_source, im_search)
print(time.time() - start)
print(result)

img = im_source.clone()
for _ in result:
    img.rectangle(rect=_['rect'], color=(0, 0, 255), thickness=3)
img.imshow('ret')
cv2.waitKey(0)
```
As a result, you get the positions of the three plus signs:

```
[
    {'rect': <Rect [Point(1972.0, 33.0), Size[56.0, 58.0]], 'confidence': 0.9045119285583496},
    {'rect': <Rect [Point(2331.0, 29.0), Size[52.0, 66.0]], 'confidence': 0.9046278297901154},
    {'rect': <Rect [Point(1617.0, 30.0), Size[51.0, 64.0]], 'confidence': 0.9304171204566956}
]
```
How matching works
What does Airtest's aircv do?
https://github.com/AirtestProject/Airtest/blob/d41737944738e651dd29564c29b88cc4c2e71e2e/airtest/aircv/keypoint_base.py#L133
1. Get feature points
2. Match feature points
```python
def match_keypoints(self, des_sch, des_src):
    """Match descriptors (feature matching)."""
    # Match the feature point sets of the two images; k=2 means each feature
    # point returns its 2 best candidate matches:
    return self.matcher.knnMatch(des_sch, des_src, k=2)
```
Note that k=2 here: each feature point on the template is matched against only its two closest feature points in the target image.
3. Filter feature points
```python
good = []
for m, n in matches:
    if m.distance < self.FILTER_RATIO * n.distance:
        good.append(m)
```
Matches are filtered by comparing the distances of the two candidate matches for each descriptor (a ratio test).
4. Using a perspective transform or coordinate calculation, obtain the target rectangle, then compute the confidence.
So what is the problem with the above steps?
- In step 2, suppose there are n targets in the source image; with k=2 each template feature point still keeps only its two nearest matches, so the feature points available per target become too few.
- In step 3, the filtering method is not very sound. In actual debugging you will find that some feature points with large `distance` values still belong to the correct target, which means filtering feature points by distance alone is unreliable.
- In step 4, after obtaining the feature points, Airtest uses a perspective transform to get the four vertices of the target and then computes the minimum bounding rectangle. If the target image is rotated or deformed, the final crop contains extra content beyond the target, which lowers the confidence.
Given these problems in Airtest, what did I change? Let me go through the steps one by one.
My feature point matching
1. Read the picture
```python
from baseImage import Image

im_source = Image('tests/image/6.png')
```
This uses another library of mine: https://github.com/hakaboom/base_image
It is mainly used to convert the format and type of OpenCV image data, plus some interface wrapping.
Use the place parameter to modify the data format
- Ndarray: the numpy.ndarray format
- Mat: basically the same as numpy
- UMat: has relatively few Python bindings and is not as flexible as ndarray, but enables OpenCL acceleration
- GpuMat: OpenCV's CUDA format; keep an eye on GPU memory consumption
```python
from baseImage import Image
from baseImage.constant import Place

Image(data='tests/image/0.png', place=Place.Ndarray)  # use numpy
Image(data='tests/image/0.png', place=Place.Mat)      # use Mat
Image(data='tests/image/0.png', place=Place.UMat)     # use UMat
Image(data='tests/image/0.png', place=Place.GpuMat)   # use cuda
```
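For reference, here is a rough sketch of what these formats correspond to in plain OpenCV (it assumes an OpenCV build with CUDA for the GpuMat part and is not part of baseImage itself):

```python
import cv2

img = cv2.imread('tests/image/0.png')   # numpy.ndarray, i.e. Place.Ndarray

umat = cv2.UMat(img)                    # cv2.UMat, i.e. Place.UMat (OpenCL-backed)

gpu = cv2.cuda_GpuMat()                 # cv2.cuda.GpuMat, i.e. Place.GpuMat
gpu.upload(img)                         # host -> GPU memory copy
back = gpu.download()                   # GPU -> host copy when the result is needed
```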
2. Create the feature point detection class. Besides threshold (the filtering threshold) and rgb (whether to match on the RGB channels), you can also pass extra configuration to the feature extractor. The defaults are generally fine; see the OpenCV documentation for details.
```python
from image_registration.matching import SIFT

match = SIFT(threshold=0.8, rgb=True, nfeatures=50000)
```
3. Recognize
```python
from image_registration.matching import SIFT
from baseImage import Image, Rect

im_source = Image('tests/image/6.png')
im_search = Image('tests/image/4.png').crop(Rect(1498, 68, 50, 56))

match = SIFT(threshold=0.8, rgb=True, nfeatures=50000)
result = match.find_all_results(im_source, im_search)
```
4. What does find_all_results actually do? The base class can be found in `image_registration.matching.keypoint.base`.
- Step 1: Create the feature point detector
`BaseKeypoint.create_detector`
Example from `image_registration.matching.keypoint.sift`:

```python
def create_detector(self, **kwargs) -> cv2.SIFT:
    nfeatures = kwargs.get('nfeatures', 0)
    nOctaveLayers = kwargs.get('nOctaveLayers', 3)
    contrastThreshold = kwargs.get('contrastThreshold', 0.04)
    edgeThreshold = kwargs.get('edgeThreshold', 10)
    sigma = kwargs.get('sigma', 1.6)

    detector = cv2.SIFT_create(nfeatures=nfeatures, nOctaveLayers=nOctaveLayers,
                               contrastThreshold=contrastThreshold,
                               edgeThreshold=edgeThreshold, sigma=sigma)
    return detector
```
- Step 2: Create the feature point matcher
`BaseKeypoint.create_matcher`
There are two kinds of matchers for matching the template's feature points against the target image's:
  - `BFMatcher`: brute force matching, always tries all possible matches
  - `FlannBasedMatcher`: a faster algorithm, but it finds approximate nearest-neighbour matches
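A minimal sketch of creating both matcher types with plain OpenCV (the FLANN parameters are typical values for float descriptors such as SIFT, not necessarily what the project uses):

```python
import cv2

# Brute-force matcher: compares every descriptor pair; NORM_L2 suits SIFT/SURF,
# NORM_HAMMING would be used for binary descriptors such as ORB.
bf_matcher = cv2.BFMatcher(cv2.NORM_L2)

# FLANN matcher: approximate nearest-neighbour search, faster on large descriptor sets.
index_params = dict(algorithm=1, trees=5)   # algorithm=1 -> KD-tree index
search_params = dict(checks=50)
flann_matcher = cv2.FlannBasedMatcher(index_params, search_params)
```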
- Step 3: Extract feature points
`BaseKeypoint.get_keypoint_and_descriptor`
Use the detector created in step 1 to obtain keypoints and descriptors. ORB needs an extra step to compute the descriptors; see the code for details.
- Step 4: Match the feature points. Use the matcher created in step 2 to obtain the set of matched point pairs.
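A minimal sketch of what steps 3 and 4 typically look like with plain OpenCV (not the project's exact implementation; file paths are placeholders):

```python
import cv2

# Placeholders: grayscale template and source screenshot
im_search = cv2.imread('template.png', cv2.IMREAD_GRAYSCALE)
im_source = cv2.imread('screen.png', cv2.IMREAD_GRAYSCALE)

detector = cv2.SIFT_create()
matcher = cv2.BFMatcher(cv2.NORM_L2)

# Step 3: keypoints and descriptors for the template and the target image
kp_sch, des_sch = detector.detectAndCompute(im_search, None)
kp_src, des_src = detector.detectAndCompute(im_source, None)

# Step 4: k-nearest-neighbour matching of the descriptors
matches = matcher.knnMatch(des_sch, des_src, k=2)
print(len(matches), 'raw matches')
```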
- Step 5: Filter the feature points
`BaseKeypoint.filter_good_point`

`cv2.DMatch`: OpenCV's match class for keypoint descriptors
- `distance`: distance between the two descriptors (Euclidean, etc.); the smaller the distance, the better the match
- `imgIdx`: training image index
- `queryIdx`: query descriptor index (corresponds to the template image)
- `trainIdx`: training descriptor index (corresponds to the target image)

`cv2.KeyPoint`: OpenCV's feature point class
- `angle`: orientation of the feature point (0~360)
- `class_id`: cluster id of the feature point
- `octave`: pyramid level at which the feature point was detected
- `pt`: coordinates (x, y) of the feature point
- `response`: response strength of the feature point
- `size`: diameter of the feature point's neighbourhood

Knowing these, we can filter the feature point set obtained in step 4:
- Step 1: Re-group the match list by `queryIdx`, mainly so that each template feature point corresponds to the feature points of only one target
- Step 2: Sort the point set by `distance` in ascending order and take the first point, i.e. the point with the smallest `distance` in the current set; call it candidate point A
- Step 3: From candidate point A's `queryIdx` and `trainIdx`, get the corresponding keypoints (`query_keypoint`, `train_keypoint`); from the `angle` of these two feature points we can compute the rotation of the image
- Step 4: Compute the angle between `train_keypoint` and every other feature point. By rotation invariance, the corresponding angles measured from `query_keypoint` on the template can be used to filter the `train_keypoint` candidates
- Step 5: Taking `query_keypoint` as the origin, compute the rotation angle of every other feature point; again by rotation invariance, use the rotation angles measured with `train_keypoint` as the origin to filter the points
- In the end we obtain all matching points, the image rotation angle, and the reference point (candidate point A)
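The geometric idea behind these filter steps can be sketched as follows; this is a simplified illustration of the rotation-invariance check, not the project's exact implementation, and the tolerance value is arbitrary:

```python
import math

def vector_angle(origin, pt):
    # Angle of the vector origin -> pt in degrees, normalized to 0~360
    return math.degrees(math.atan2(pt[1] - origin[1], pt[0] - origin[0])) % 360

def filter_by_rotation(matches, kp_sch, kp_src, anchor, tolerance=5.0):
    """Keep matches whose geometry agrees with the rotation implied by the anchor match."""
    q_a = kp_sch[anchor.queryIdx]               # candidate point A on the template
    t_a = kp_src[anchor.trainIdx]               # candidate point A on the target
    rotation = (t_a.angle - q_a.angle) % 360    # estimated image rotation (step 3)

    good = [anchor]
    for m in matches:
        if m is anchor:
            continue
        q, t = kp_sch[m.queryIdx], kp_src[m.trainIdx]
        expected = (vector_angle(q_a.pt, q.pt) + rotation) % 360   # steps 4/5
        actual = vector_angle(t_a.pt, t.pt)
        diff = min(abs(expected - actual), 360 - abs(expected - actual))
        if diff <= tolerance:
            good.append(m)
    return good, rotation
```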
5. Extract the match. After filtering, the point set can be used to locate the target in `BaseKeypoint.extract_good_points`. There are several cases:

No feature points
- In practice there is always at least one feature point

1 pair of feature points
`BaseKeypoint._handle_one_good_points`
- Estimate the scale change from the `size` of the two keypoints
- Using the rotation angle returned in step 4, compute the transformed rectangle vertices
- Use a perspective transform to crop the target region, template-match it against the template image, and compute the confidence

2 pairs of feature points
`BaseKeypoint._handle_two_good_points`
- Estimate the scale change from the distance between the two points in each image
- Using the rotation angle returned in step 4, compute the transformed rectangle vertices
- Use a perspective transform to crop the target region, template-match it, and compute the confidence

3 pairs of feature points
`BaseKeypoint._handle_three_good_points`
- Estimate the scale change from the area of the triangle formed by the three points
- Using the rotation angle returned in step 4, compute the transformed rectangle vertices
- Use a perspective transform to crop the target region, template-match it, and compute the confidence

4 or more pairs of feature points
`BaseKeypoint._handle_many_good_points`
- Use homography mapping (`BaseKeypoint._find_homography`) to compute the transformed rectangle vertices (see the sketch after step 6 below)
- Use a perspective transform to crop the target region, template-match it, and compute the confidence
6. Delete feature points. After matching completes, if recognition succeeded, delete the feature points inside the matched target area and move on to the next iteration, so that the remaining targets can still be found.
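For the four-or-more-points case, the homography idea can be sketched with plain OpenCV like this (not the project's exact code; `good` is assumed to be a list of filtered cv2.DMatch objects):

```python
import cv2
import numpy as np

def locate_by_homography(good, kp_sch, kp_src, template_shape):
    """Project the template's corners into the source image via a homography."""
    src_pts = np.float32([kp_sch[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst_pts = np.float32([kp_src[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)

    # RANSAC rejects outlier matches while estimating the 3x3 homography matrix
    H, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)

    h, w = template_shape[:2]
    corners = np.float32([[0, 0], [0, h], [w, h], [w, 0]]).reshape(-1, 1, 2)
    return cv2.perspectiveTransform(corners, H)   # 4 corners of the target region
```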
4) Benchmarking
Test environment:
- i7-9700k 3.6GHz
- Nvidia RTX 3080Ti
- cuda version 11.3
- opencv version: 4.5.5-dev (compiled from source)
Test content: loop 50 times, extracting the feature points of the target image and the template image each time.
Note: no feature point filtering is done here, and the feature point methods do not run template matching to compute confidence, so real-world speed will be slower than the benchmark.
As the figure shows, the CUDA methods are the fastest, and their CPU usage is also low, because that part of the computation is handed off to CUDA.
Since I have no code to read CUDA utilisation, the numbers below are rough readings from Task Manager:
- cuda_orb: cuda occupies around 35%~40%
- cuda_tpl: cuda occupies around 15%~20%
- opencl_surf: cuda occupies about 13%
- opencl_akaze: cuda occupies around 10%~15%
The remaining algorithms have no CUDA or OpenCL implementation in OpenCV and can only run on the CPU.
5) How to optimize the speed
- One reason Airtest is slow is that it only computes on the CPU. If the computation can be offloaded to the GPU, the speed improves several-fold. OpenCV already provides plenty of interfaces for this: we can call CUDA and OpenCL algorithms through `cv2.cuda.GpuMat` and `cv2.UMat`. With `baseImage` you can quickly create images in the corresponding format:
```python
from baseImage import Image
from baseImage.constant import Place

Image('tests/images/1.png', place=Place.GpuMat)
Image('tests/images/1.png', place=Place.UMat)
```
The recognition methods that can be accelerated with CUDA need to go through dedicated classes, and the image format must be `cv2.cuda.GpuMat`:
- surf: not written yet, to be added later
- orb: corresponding class `image_registration.matching.keypoint.orb.CUDA_ORB`
- matchTemplate: `image_registration.matching.template.matchTemplate.CudaMatchTemplate`
For the methods that can be accelerated with OpenCL, you only need to pass the image as a `UMat` and OpenCV will automatically use the `opencl` path:
- surf
- orb
- matchTemplate
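As a small illustration of the OpenCL transparent API mentioned above (a sketch, not the project's code; paths are placeholders), passing `UMat` instead of a numpy array is all it takes:

```python
import cv2

source = cv2.imread('screen.png')        # placeholder screenshot
template = cv2.imread('button.png')      # placeholder template

# CPU path: plain numpy arrays
res_cpu = cv2.matchTemplate(source, template, cv2.TM_CCOEFF_NORMED)

# OpenCL path: wrap the inputs in UMat and OpenCV dispatches to OpenCL when available
res_ocl = cv2.matchTemplate(cv2.UMat(source), cv2.UMat(template), cv2.TM_CCOEFF_NORMED)
res_ocl = res_ocl.get()                  # download the result back to a numpy array
```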
Only feature point extraction and template matching are mentioned here. Other image-processing functions can also gain some speedup from `cuda` and `opencl`, but the effect is not as obvious as for the methods above.
- Accelerate through the framework design. (This may be limited to games; regular apps cannot use it.)
In games, we know in advance the coordinates of some controls on the screen. When the resolution changes, we can compute where a control ends up, crop the image at that location, and quickly recognise it through template matching.
  - For example, take the two screenshots below: one at 1280x720 and one at 2532x1170
  - At 1280x720, the coordinate range of the mail control is `Rect(372,69,537,583)`
  - With the calculation below we get a range of `Rect(828,110,874,949)` at 2532x1170, while the range measured with a cropping tool is `Rect(830,112,874,948)`
  - The underlying principle is to use the engine's scaling and anchor-point rules to work out the coordinate range in reverse, which also adapts to black borders and notches
  - Once the range is known, crop the image to that range and template-match it; fixed-position controls can be recognised very quickly this way
```python
from baseImage import Rect
from baseImage.coordinate import Anchor, screen_display_type, scale_mode_type

anchor = Anchor(
    dev=screen_display_type(width=1280, height=720),
    cur=screen_display_type(width=2532, height=1170, top=0, bottom=0, left=84, right=84),
    orientation=1, mainPoint_scale_mode=scale_mode_type(), appurtenant_scale_mode=scale_mode_type()
)

rect = Rect(371, 68, 538, 584)
point = anchor.point(rect.x, rect.y, anchor_mode='Middle')
size = anchor.size(rect.width, rect.height)
print(Rect.create_by_point_size(point, size))
# <Rect [Point(828.9, 110.5), Size[874.2, 949.0]]
```
- Build a template library and pre-load the templates. Grab the screen image, classify the scene through some similarity calculation (`baseImage.utils.ssim`), and then only run feature point recognition for that scene's templates. This reduces the amount of computation.
- This idea can also be extended to deep learning, for example the image classification mentioned earlier. First build a large template library and split it into interface 1, interface 2, interface 3, and some common controls.
- Classification then tells us which interface we are currently on, and we only recognise the controls of that interface, again reducing the amount of computation.
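A rough sketch of the "classify the scene first, then only match that scene's controls" idea; this sketch uses matchTemplate scores as the similarity measure instead of the `baseImage.utils.ssim` helper mentioned above, and all paths and names are placeholders:

```python
import cv2

# Placeholder scene templates, e.g. reference screenshots for each interface
scene_templates = {
    'main_menu': cv2.imread('scenes/main_menu.png'),
    'battle':    cv2.imread('scenes/battle.png'),
    'mail':      cv2.imread('scenes/mail.png'),
}

def classify_scene(screen):
    """Pick the scene whose reference screenshot scores highest against the current screen."""
    best_name, best_score = None, -1.0
    for name, tmpl in scene_templates.items():
        resized = cv2.resize(tmpl, (screen.shape[1], screen.shape[0]))
        score = cv2.matchTemplate(screen, resized, cv2.TM_CCOEFF_NORMED).max()
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score

screen = cv2.imread('current_screen.png')      # placeholder screenshot
scene, score = classify_scene(screen)
print(scene, score)
# Only the controls registered for `scene` would then be matched.
```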
6) Remarks
If you have any other questions, you can find me (QQ 581529846) in the TesterHome game testing QQ group.