Overview
Computer vision (CV) has been applied successfully in many real-world settings: face recognition in daily life, license plate recognition, fingerprint comparison, electronic image stabilization, pedestrian and vehicle tracking, and so on. What, then, can CV do in other fields, such as the mobile games everyone plays? Game-scene images still differ from real-scene images. Some game scenes are relatively complex: special effects cause interference, game characters do not look like real people, and artistic fonts do not follow fixed rules or sit on uniform background colors the way license plates do. Other elements are relatively simple, such as fixed icons at fixed positions in the game, and traditional image detection methods can already achieve good results on them. This article walks through the recognition of common game scenes.
1. Processing flow
The recognition of game scenes can be divided into two modules, GameClient and CVServer. GameClient captures real-time images from a mobile phone or PC and forwards them to CVServer. CVServer processes the received game images and returns the results to GameClient, which post-processes them as required and feeds them back to the game terminal. The process is shown in Figure 1.
Figure 1 The main process of game scene recognition
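As a rough illustration of this loop, here is a minimal GameClient sketch in Python, assuming CVServer exposes an HTTP endpoint; the URL, capture_frame(), and act_on_result() are hypothetical stand-ins for the real capture and feedback paths:

```python
import requests

CV_SERVER = "http://localhost:8000/recognize"  # hypothetical CVServer endpoint

def game_client_step():
    frame_png = capture_frame()   # hypothetical: PNG bytes from the phone/PC
    resp = requests.post(CV_SERVER, data=frame_png,
                         headers={"Content-Type": "image/png"})
    result = resp.json()          # recognition result returned by CVServer
    act_on_result(result)         # hypothetical: feed back to the game terminal
```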
2. Application example
The previous section introduced the main processing flow of game scene recognition. This section analyzes several applications of image recognition in games.
2.1 Determination of the game state
Each game UI is called a game state, so a game can be thought of as a collection of many different UIs. First, build a sample library of these UIs. When a game frame is captured in real time, compare the current image with the sample images to determine the current game state. There are many ways to measure whether two images are similar; here we take feature point matching as an example. The main steps are as follows:
Step1: Extract feature points from the sample image and the test image
Figure 2 Feature point extraction
Step2: Feature point matching
Figure 3 Feature point matching
Step3: Match screening
Figure 4 Match screening based on Ratio-Test
ORB feature point matching is a relatively mature technique. In the collected test data set, image sizes and UI rendering positions can differ considerably across devices because of differences in phone resolution, screen notches, or rendering. Ordinary template matching struggles to adapt to this, but a matching scheme based on feature points does not have this problem. Feature points generally refer to corners or salient points in an image; they have no strong dependence on the position and size of an element in the image, so they are more broadly applicable. The ORB feature combines the FAST feature point detector with the BRIEF feature descriptor and improves and optimizes both, giving it rotation invariance and scale invariance. The following subsections introduce feature point extraction, feature description, feature point matching, and match screening.
2.1.1 Feature point extraction: FAST
The basic idea of FAST: if enough pixels on the circle around a pixel p (positions 1 to 16) differ from p by a large enough amount, then p may be a corner. The original FAST feature has no scale invariance; the ORB implementation in OpenCV builds a Gaussian pyramid and detects corners on each pyramid level to achieve it. The original FAST also has no orientation, and the ORB paper proposes a grayscale centroid method to solve this. For any feature point p, the moments of the neighboring pixels of p are defined as

$$m_{pq} = \sum_{x,y} x^{p} y^{q} I(x,y)$$

where $I(x,y)$ is the gray value at point $(x,y)$. The centroid of the patch is

$$C = \left( \frac{m_{10}}{m_{00}}, \; \frac{m_{01}}{m_{00}} \right)$$

and the angle of the vector from the feature point to the centroid gives the direction of the FAST feature point:

$$\theta = \arctan\left( \frac{m_{01}}{m_{10}} \right)$$
Figure 5 FAST feature point diagram (image from the paper: Faster and better: A machine learning approach to corner detection)
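To make the centroid formulas concrete, here is a small sketch that detects FAST corners with OpenCV and computes the grayscale-centroid orientation of the surrounding patch; the patch radius and the input file are assumptions, and keypoints near the image border are simply skipped:

```python
import cv2
import numpy as np

RADIUS = 15  # assumed patch radius around each keypoint

def centroid_orientation(gray, kp, radius=RADIUS):
    """Grayscale-centroid orientation: theta = atan2(m01, m10)
    computed over the patch surrounding the keypoint."""
    x0, y0 = int(kp.pt[0]), int(kp.pt[1])
    patch = gray[y0 - radius:y0 + radius + 1,
                 x0 - radius:x0 + radius + 1].astype(np.float64)
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    m10, m01 = np.sum(xs * patch), np.sum(ys * patch)
    return np.degrees(np.arctan2(m01, m10))

gray = cv2.imread("sample.png", cv2.IMREAD_GRAYSCALE)  # placeholder image
keypoints = cv2.FastFeatureDetector_create(threshold=20).detect(gray, None)
h, w = gray.shape
angles = [centroid_orientation(gray, kp) for kp in keypoints
          if RADIUS <= kp.pt[0] < w - RADIUS and RADIUS <= kp.pt[1] < h - RADIUS]
```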
2.1.2 Feature description: BRIEF
The core idea of the BRIEF algorithm is to select N point pairs around a key point P according to some pattern, and to concatenate the comparison results of these N point pairs into a binary string of length N that serves as the key point's descriptor. When ORB computes the BRIEF descriptor, it builds a two-dimensional coordinate system centered on the key point, with the line from the feature point P to the centroid Q of the sampling patch as the x axis and the perpendicular direction as the y axis. Because this frame rotates together with the patch, the point pairs sampled for the same feature point are consistent under different rotation angles, which solves the rotation invariance problem.
2.1.3 Feature point matching: Hamming Distance
The Hamming distance of two equal-length binary strings is the number of positions at which the corresponding characters differ. ORB uses the Hamming distance to measure the distance between two descriptors.
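For example, ORB's 256-bit descriptors come packed as 32 uint8 values each, so the Hamming distance between two of them can be computed directly (a sketch with random descriptors standing in for real ones):

```python
import numpy as np

# Two 256-bit BRIEF descriptors packed into 32 uint8 values each,
# as OpenCV's ORB returns them (random values as placeholders).
d1 = np.random.randint(0, 256, 32, dtype=np.uint8)
d2 = np.random.randint(0, 256, 32, dtype=np.uint8)

# Hamming distance: the number of differing bits between the descriptors.
hamming = np.count_nonzero(np.unpackbits(d1 ^ d2))
print(hamming)
```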
2.1.4 Feature point screening: Ratio-Test
Ratio-Test eliminates ambiguous match pairs by the distance ratio (nearest-neighbor distance / second-nearest-neighbor distance): a parameter ratio rejects matches whose distance ratio falls outside a given range. As shown in the figure below, correct and incorrect matches are best separated when the ratio is around 0.75.
Figure 6 The ratio of the nearest-neighbor distance to the second-nearest-neighbor distance. The solid line is the PDF of the ratio for correct matches, and the dashed line is the PDF for incorrect matches. Image from the paper: D. G. Lowe, Distinctive Image Features from Scale-Invariant Keypoints, 2004.
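Putting sections 2.1.1 to 2.1.4 together, here is a minimal sketch of the whole matching pipeline in OpenCV; the file names, nfeatures, the 0.75 ratio, and the match-count threshold are assumptions:

```python
import cv2

sample = cv2.imread("sample_ui.png", cv2.IMREAD_GRAYSCALE)  # UI template
test = cv2.imread("screenshot.png", cv2.IMREAD_GRAYSCALE)   # live frame

orb = cv2.ORB_create(nfeatures=500)
kp1, des1 = orb.detectAndCompute(sample, None)  # FAST detection + BRIEF
kp2, des2 = orb.detectAndCompute(test, None)

# BRIEF descriptors are binary strings, so Hamming distance is the metric.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
pairs = matcher.knnMatch(des1, des2, k=2)  # nearest and second nearest

# Ratio-Test: keep a match only if it is clearly better than the runner-up.
good = [p[0] for p in pairs
        if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]

# Enough surviving matches -> the frame is in the sampled game state.
is_same_state = len(good) > 20
```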
2.2 Scene coverage
The method based on feature point matching can also be applied to scene coverage. First, load the template images of the core scenes. While the AI runs, it collects a large number of game screenshots, which form a test data set. Each image in the test set is traversed and matched against the core scene images with the feature point algorithm described above, and the matching results are filtered to keep the core scenes that were actually matched. From the matched core scene images and their count, the scene coverage of the AI run can be inferred, as in the sketch below.
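A minimal sketch of this coverage computation, reusing the feature-point matching from section 2.1 (core_scenes and screenshots are hypothetical inputs, and the thresholds are assumptions):

```python
import cv2

def matches_scene(scene, shot, ratio=0.75, min_good=20):
    """True if a core-scene template matches a screenshot by ORB features."""
    orb = cv2.ORB_create()
    _, des1 = orb.detectAndCompute(scene, None)
    _, des2 = orb.detectAndCompute(shot, None)
    if des1 is None or des2 is None:
        return False
    pairs = cv2.BFMatcher(cv2.NORM_HAMMING).knnMatch(des1, des2, k=2)
    good = [p for p in pairs
            if len(p) == 2 and p[0].distance < ratio * p[1].distance]
    return len(good) >= min_good

# core_scenes: {name: template image}; screenshots: frames the AI collected.
covered = {name for name, scene in core_scenes.items()
           if any(matches_scene(scene, shot) for shot in screenshots)}
coverage = len(covered) / len(core_scenes)
```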
2.3 Recognition of numbers in the game
There are many digital images in games, such as level numbers, scores, and countdowns, and we can recognize these numbers with a CNN-based method. CNN-based classification was proposed very early; the early classic is the LeNet network from 1998, which consists of two convolutional layers, two pooling layers, two fully connected layers, and a final softmax layer. The input is a digit image, and the output is the digit image's category.
Figure 7 LeNet network
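A minimal LeNet-style network sketched in Keras, following the classic layer layout described above (the 32x32 input size, activations, and training settings are assumptions, not the exact production network):

```python
import tensorflow as tf
from tensorflow.keras import layers

# LeNet-style digit classifier: conv -> pool -> conv -> pool -> FC -> softmax.
model = tf.keras.Sequential([
    layers.Conv2D(6, 5, activation="tanh", input_shape=(32, 32, 1)),
    layers.MaxPooling2D(2),
    layers.Conv2D(16, 5, activation="tanh"),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(120, activation="tanh"),
    layers.Dense(84, activation="tanh"),
    layers.Dense(10, activation="softmax"),  # one class per digit 0-9
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```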
We can first segment the multi-digit image into independent digits, predict each digit image with the LeNet network (the output category is the recognized digit), and finally concatenate the results to obtain the recognition result for the whole number.
Figure 8 Number recognition process
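One way to sketch this segment-classify-reassemble pipeline with OpenCV and the model above (the Otsu thresholding choice and the 32x32 input size are assumptions):

```python
import cv2
import numpy as np

def recognize_number(img, model):
    """Split a multi-digit image into digits, classify each one with the
    LeNet-style model, and reassemble the string."""
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    # Sort digit bounding boxes left to right before classifying.
    boxes = sorted(cv2.boundingRect(c) for c in contours)
    digits = []
    for x, y, w, h in boxes:
        crop = cv2.resize(binary[y:y + h, x:x + w], (32, 32))
        crop = crop.astype(np.float32)[None, :, :, None] / 255.0
        digits.append(str(int(np.argmax(model.predict(crop)))))
    return "".join(digits)
```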
With deeper network structures, stronger convolution designs, and the historical opportunity brought by GPUs and big data, CNNs have exploded in recent years. CNNs are used not only for classification but also for object detection: the final layer outputs not just the object's category but also its position in the image. We can use YOLOv3, an algorithm that balances speed and accuracy, and, based on the characteristics of game images, optimize the network in two directions, reducing the number of layers and reducing the number of feature maps, to further improve its speed.
Figure 9 Number recognition and reorganization process
2.4 Recognition of fixed icons at fixed positions
Template matching has many applications; we illustrate three: identification of fixed buttons, identification of prompt information, and detection of a stuck state. In the main game interface, buttons such as hero skills, equipment, and operation keys are generally at fixed positions. Extract the button's icon in its available state as a template; if that template is detected in the game interface captured in real time, the button is currently available. Once the game AI obtains this button information, it can adopt corresponding strategies, such as releasing skills or purchasing equipment. Game prompt information is similar: some prompts appear at fixed positions in the interface, such as the route indication shown in Figure 11, the game end state (success/failure), and the game running state (start). We first record the positions where these prompts appear, together with templates of the prompt icons. While the game runs, we match in real time at those positions against the collected icon templates; a match means the prompt is currently displayed. If the game success icon is matched, the AI strategy for that round should be rewarded; otherwise it should be punished.
Figure 10 Identification of fixed buttons
Figure 11 Recognition of game prompt icon
The idea of template matching is to find the part of an image that best matches a given template image. The process is shown in Figure 12.
Figure 12 The process of template matching
The processing steps for template matching are as follows:
Step1: Starting from the upper left corner of the source image, slide a window from left to right and top to bottom with a step size of 1, computing the similarity between the template image and each window sub-image in turn.
Step2: Store the similarity results in a result matrix.
Step3: Find the best match in the result matrix. If higher values mean greater similarity, the brightest location in the result matrix is the best match.
OpenCV provides the interface function cv2.matchTemplate(src, tmpl, method) for template matching, where method represents the choice of matching method.
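A minimal sketch of Steps 1 to 3 using that interface (the file names and the 0.8 threshold are assumptions; cv2.matchTemplate performs the sliding-window similarity computation, and cv2.minMaxLoc reads the best value out of the result matrix):

```python
import cv2

frame = cv2.imread("frame.png")    # placeholder: live game interface
tmpl = cv2.imread("button.png")    # placeholder: available-button template

# TM_CCOEFF_NORMED: larger values mean greater similarity.
result = cv2.matchTemplate(frame, tmpl, cv2.TM_CCOEFF_NORMED)
min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)

if max_val > 0.8:        # assumed similarity threshold
    x, y = max_loc       # top-left corner of the best match
    print("button available at", (x, y))
```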
2.5 Object filtering based on pixel features
By filtering the pixels in a detection area according to the per-channel color value ranges, we can obtain the positions of target objects that match a given color feature.
The color characteristics of health bars in games are quite distinctive: the R channel of a red health bar is relatively large, the G channel of a green one, and the B channel of a blue one. We extract the color characteristics of the health bar and filter its pixels by that color feature; the surviving pixels make up the bar. By computing the connected region of the bar we obtain its current length, and from that the health percentage. By filtering health bar pixels on the main game interface, we obtain the positions of friendly units (green or blue health bars) and enemy units (red health bars), together with their health percentages. Based on these attributes, the game AI can adopt different strategies such as fleeing, attacking, or forming a team.
Figure 13 The process of calculating the percentage of health bars
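A sketch of the red-health-bar case (the channel thresholds and full_width, the bar's length at full health, are assumptions):

```python
import cv2
import numpy as np

def health_percentage(roi, full_width):
    """Health percentage of a red bar inside a detection region (BGR image)."""
    b, g, r = cv2.split(roi)
    # Keep pixels where the R channel clearly dominates (a red bar).
    mask = ((r > 150) & (g < 100) & (b < 100)).astype(np.uint8) * 255
    points = cv2.findNonZero(mask)
    if points is None:
        return 0.0
    x, y, w, h = cv2.boundingRect(points)  # extent of the bar's pixels
    return w / full_width                  # current length / full length
```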
In MOBA games, our towers and enemy towers often appear on the minimap. The color range extracted for towers is R(0, 90), G(90, 190), B(110, 200). Within the minimap region, filtering the pixels whose channel values fall in this range tells us where our (or the enemy's) towers are, and how much health a tower has (how many pixels remain). When our hero appears on the minimap, a green circle surrounds the hero's portrait; we can likewise extract its pixel value range, R(80, 140), G(170, 210), B(70, 110), filter by the per-channel values to locate our hero, and then perform pathfinding or strategy selection.
Figure 14 Application of small map pixel screening in MOBA games
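The same per-channel filtering, sketched for the tower color range quoted above; note that OpenCV loads images as BGR, so the bounds are reordered (the file name is a placeholder):

```python
import cv2
import numpy as np

minimap = cv2.imread("minimap.png")
# Tower range R(0, 90), G(90, 190), B(110, 200), expressed in BGR order.
lower = np.array([110, 90, 0])
upper = np.array([200, 190, 90])
mask = cv2.inRange(minimap, lower, upper)

points = cv2.findNonZero(mask)
if points is not None:
    x, y, w, h = cv2.boundingRect(points)  # tower position on the minimap
    hp_pixels = cv2.countNonZero(mask)     # pixel count ~ tower health
```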
2.6 Other
There are many other applications of image recognition in games, such as pedestrian detection in game scenes, hero detection, glitched-screen detection, invisible-wall and model-clipping detection, image de-duplication, and so on.
3. Summary
This article introduced applications of image recognition in games, such as determining the game state, calculating scene coverage, recognizing in-game numbers, and recognizing fixed icons at fixed positions. I hope readers come away with a better understanding of how image recognition is applied in games.
"UQM User Quality Management" professional game client performance (stuck, fever, memory/CPU, network, etc.) and abnormal (crash, ANR, Error) monitoring and analysis platform. With the help of in-depth quality big data analysis, it provides a full range of quality monitoring, data analysis, and business insight services for the game business.
Click the link to learn more: UQM User Quality Management | WeTest, an industry-leading quality cloud service provider.