[Practical sharing] R&D efficiency in practice: AI algorithms help dig out deep-seated bugs

Introduction

As products stay in operation, they grow larger and their functionality grows more complex. Growing scale raises the quality bar, and meeting it means testing more intensively, but the traditional approach of manually writing use cases and then automating regression is too expensive. In recent years, AI technology has played an increasingly important role in a growing number of fields. At Tencent, we stay curious about new technologies and actively learn and apply them in our daily work. The author of this article, Lin Junke, is a senior system test engineer in Tencent's Security Department. He has 16 years of software testing experience and has done considerable research on applying AI technology to testing.

This article uses security protection products as the example, but the methodology applies to any in-depth hunt for bugs caused by a combination of multiple factors. The figure below shows a typical traffic attack protection process: hackers attack business servers over the Internet. Detection devices monitor the traffic for attacks. Once an attack is detected, protection is activated automatically and the traffic of the attacked IP is diverted to the defense device. After cleaning the traffic, the defense device forwards the normal traffic back to the business server.

01 Analysis of the pain points of security product testing

Features of security protection products:
1. The attack methods of the underground industry are diverse and evolve quickly. Products must respond quickly to new attack methods on the live network, yet must never block normal users by mistake. As a result, the products carry a large number of protection strategies. The following table shows the number of configuration items in the main policy files, which add up to between two and three hundred and are still growing rapidly. Each version iteration adds a large number of new configuration items, and the processing logic is very complicated.
Main policy configuration file    Number of policy items
anti_*.conf                       50
anti_*.conf                       147
.conf                             11
.conf                             11
.conf                             10
Even with very careful development, it is impossible to guarantee that every function has high cohesion and low coupling, and configuration items that should be unrelated sometimes still affect each other. When switches that should be independent influence each other, protection can become unpredictable after a policy switch. We once had a failure caused by exactly this kind of unexpected interaction: a configuration item for protecting UDP traffic affected the protection of HTTPS traffic, although the two configurations had nothing to do with each other. We therefore need to test that product functions remain stable and reliable under all kinds of policy combinations.

2. Any specific kind of traffic is, in the end, mostly handled by one specific protection module. This property lets us simplify the model: we can model the main features and temporarily ignore the other protection details.

To solve the problems caused by this kind of parameter combination, the industry mainly uses the pairwise (all-pairs) algorithm, which combines the parameters two at a time. The generated test set covers every value combination of any two variables with a minimal number of combinations. In theory, this set of use cases can expose all defects caused by the interaction of two variables. Although the algorithm produces very few combinations, adding a new parameter and regenerating yields combinations that are completely unrelated to the previous ones. So when there are few parameters, we often use it to reduce the number of use cases while keeping good test coverage. But with many parameters, every run produces a brand-new set of combinations and the expected result has to be recalculated for each of them, which makes the whole process very complicated. It is hard to solve the problem of "how specific traffic is protected under different configurations of hundreds of switches" this way.
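To make the pairwise idea concrete, here is a minimal sketch of a greedy all-pairs generator. It is not the tool used in this project, and the configuration item names and values are hypothetical placeholders. A greedy approach like this covers every value pair but does not guarantee the smallest possible suite.

```python
from itertools import combinations

def pairwise_suite(params):
    """Greedy all-pairs generation: every value pair of any two parameters
    appears in at least one returned case. Needs at least two parameters."""
    names = list(params)
    # Every pair of (name, value) assignments that still has to be covered.
    uncovered = {
        frozenset({(a, va), (b, vb)})
        for a, b in combinations(names, 2)
        for va in params[a]
        for vb in params[b]
    }
    suite = []
    while uncovered:
        # Seed the case with one uncovered pair so each case makes progress.
        case = dict(next(iter(uncovered)))
        for name in names:
            if name in case:
                continue

            def gain(value):
                # Number of uncovered pairs this value would cover.
                return sum(
                    frozenset({(name, value), item}) in uncovered
                    for item in case.items()
                )

            case[name] = max(params[name], key=gain)
        uncovered -= {frozenset(p) for p in combinations(case.items(), 2)}
        suite.append(case)
    return suite

# Hypothetical configuration items (names and values are illustrative only).
params = {
    "udp_guard@anti_a.conf": [0, 1],
    "https_guard@anti_a.conf": [0, 1],
    "drop_os@anti_b.conf": ["0", "linux", "android"],
    "rate_limit@anti_b.conf": [0, 1],
}
suite = pairwise_suite(params)
print(len(suite), "cases instead of", 2 * 2 * 3 * 2, "exhaustive combinations")
```

Because each run starts from scratch, two runs with slightly different parameter lists can produce entirely different cases, which is the drawback described above.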

At present, our project team adds use cases manually and automates their execution. With this approach it is hard to stay pairwise-complete each time a configuration item is added. For example, suppose the existing use cases already cover all pairs and a new configuration item is added that can take only two values, 0 and 1. To keep every combination with the existing parameters covered, all the original use cases must be run once with the new item set to 0 and once more with it set to 1. Each added configuration item therefore doubles the number of use cases, which quickly becomes enormous. By comparison, regenerating a pairwise combination from scratch for 150 configuration switches yields only about 130 combinations, whereas the incremental method can reach 2^150.
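The doubling can be checked with a few lines of arithmetic. This is only an illustration of the growth rate. The function name is made up for this sketch, and the figure of about 130 combinations is the one quoted above, not something the code computes.

```python
def incremental_case_count(base_cases, new_binary_items):
    """Each new 0/1 item re-runs every existing case twice, so the total doubles."""
    return base_cases * 2 ** new_binary_items

PAIRWISE_ESTIMATE = 130  # figure quoted in the article for 150 switches

for added in (1, 10, 150):
    print(f"{added:3d} new items: {incremental_case_count(1, added)} cases")

ratio = incremental_case_count(1, 150) // PAIRWISE_ESTIMATE
print("incremental / pairwise is roughly", ratio)
```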

02 How does the industry automatically generate use cases?

Is there a solution in the industry that generates a small number of combinations without requiring the expected results to be recalculated? The answer is yes. With UML modeling, the model is maintained and updated along with the version under test, and new use cases are regenerated from it for each round of testing. The core value of this technique is that it generates use cases automatically, maximizes functional coverage with the fewest use cases, and ultimately tests a version faster and more thoroughly. Its disadvantages are that the model is complicated to maintain, design flaws are hard to find (the use cases are only traversed mechanically), and the use cases are not designed from the user's perspective.

03 AI's application in the field of front-end page testing

AI technology has developed very quickly in recent years, and it shares a characteristic with UML: it also builds models. So can AI bypass the complex manual modeling, plan the use cases as a whole to achieve maximum coverage with the fewest use cases, and at the same time avoid calculating the expected results by hand?

To explore applying this new technology to testing, I first gave myself a quick crash course in AI basics. When I dug deeper, I found that the future of AI in testing has already arrived. Many tools in the industry already use AI for automated testing, and some even design the use cases automatically. For front-end pages, some tools claim that the tester only needs to supply a URL and wait for the test result. Such software includes Eggplant, Appvance IQ, and Sauce Labs.

Analysis shows that these tools mainly use computer vision to identify all the buttons on a page, build a traversal tree from the buttons on each page, and then automatically walk through the possible user journeys along that tree, thereby automating both use case design and test execution.
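The traversal step can be sketched as a depth-first walk over a page graph. This is a simplified illustration, not how any of the tools named above is actually implemented. The page names, button labels, and the `max_depth` cut-off are assumptions, and the button map is written by hand here rather than produced by computer vision.

```python
def user_journeys(buttons_by_page, start, max_depth=3):
    """Enumerate click paths depth-first, never revisiting a page within one path."""
    journeys = []

    def walk(page, path, visited):
        buttons = buttons_by_page.get(page, {})
        next_steps = [(b, t) for b, t in buttons.items() if t not in visited]
        # A journey ends at a dead end or when the depth limit is reached.
        if not next_steps or len(path) == max_depth:
            journeys.append(path)
            return
        for button, target in next_steps:
            walk(target, path + [(page, button)], visited | {target})

    walk(start, [], {start})
    return journeys

# Hypothetical site: page name mapped to {button label: target page}.
site = {
    "home": {"Login": "login", "Search": "results"},
    "login": {"Submit": "account"},
    "results": {"Open item": "detail"},
    "account": {},
    "detail": {"Back to home": "home"},
}

journeys = user_journeys(site, "home")
for journey in journeys:
    print(" / ".join(f"{page}:{button}" for page, button in journey))
```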

A Tencent colleague previously published the book "AI Automated Testing", which describes in detail how AI is used to test image-based games and data-based games.

The existing technologies in the industry are excellent, but they are mainly used for front-end page testing, and there is no corresponding technology for back-end testing. So we began to study how to apply AI to back-end testing. After many attempts, and taking the characteristics of AI into account, we came up with a bold idea. Without human involvement, a machine cannot understand human-designed business logic, and building a model as in UML is too heavyweight. AI, however, is very good at classifying data. Since the expected result cannot be worked out automatically, why not skip calculating it? The test suite only records how the traffic is handled. The AI then classifies the records by traffic and protection result, and for each class we analyze its typical configuration. Finally, a person reviews whether the way traffic is handled under that typical configuration is reasonable.

04 Explore the application of AI in back-end testing

Based on these ideas, we quickly drew up an implementation plan. Our goal was to improve the coverage of multi-factor combinations at minimal cost and to dig out deep-seated bugs. The plan rests on two foundations: testing theory, which lets us cover the most scenarios with the fewest use cases, and AI, which classifies the responses in the various scenarios and gives insight into them. Chaining these two pieces together is feasible.
The implementation steps of the plan are as follows:
Step 1: Every time a new configuration item is added, regenerate the configurations with the pairwise algorithm.
Step 2: Run typical attack methods against each configuration and record how the system under test protects against them.
Step 3: Use AI to analyze the association between the protection methods and the configurations, and find the configuration items that matter most for each protection method.
Step 4: Check whether the N configuration items most relevant to each protection method match the intended design.

The first part, generating the pairwise combinations based on testing theory, is very simple and took me half a day to implement. To combine configuration items from multiple configuration files, I named each item in the form "configuration item name@file name". The combinations are generated with a pairwise tool and then converted into configuration files by a script.
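A conversion script of this kind might look like the sketch below. The item names, file names, and the `name = value` output syntax are all assumptions made for illustration. The real configuration format is not shown in this article.

```python
from collections import defaultdict

def split_by_file(combination):
    """Group the 'item@file' keys of one combination by target configuration file."""
    files = defaultdict(dict)
    for key, value in combination.items():
        item, _, filename = key.partition("@")
        files[filename][item] = value
    return dict(files)

def render_conf(items):
    """Render items as 'name = value' lines (assumed syntax, for illustration)."""
    return "".join(f"{name} = {value}\n" for name, value in items.items())

# One hypothetical combination as a pairwise tool might emit it.
combination = {
    "udp_guard@anti_a.conf": 1,
    "drop_os@anti_b.conf": "linux",
    "https_guard@anti_a.conf": 0,
}

per_file = split_by_file(combination)
for filename, items in per_file.items():
    print(f"--- {filename} ---")
    print(render_conf(items), end="")
```

Keeping the file name inside the key means the pairwise tool only ever sees a flat list of parameters, while the script can still route each value to the right file afterwards.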

The pairwise algorithm generated 250 combinations in total. We selected 27 types of traffic with typical characteristics and sent each of them as 'GET', 'POST', 'PUT', 'DELETE', 'HEAD', 'OPTIONS', 'TRACE', and 'CONNECT' requests, giving 27 × 8 = 216 types of traffic. Passing these 216 types of traffic through each of the 250 configurations and recording the protection method yields protection records for 250 × 216 = 54,000 scenarios. The recorded results have three parts: the first part is the configuration item combination, the second is the name of the traffic sent, and the last column is the protection method applied by the system under test.
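The scenario counts follow directly from a Cartesian product, as the short sketch below shows. The traffic names are placeholders, since the 27 real traffic types are not listed here, and `make_record` is a hypothetical helper that only mirrors the three-part row layout described above.

```python
from itertools import product

METHODS = ["GET", "POST", "PUT", "DELETE", "HEAD", "OPTIONS", "TRACE", "CONNECT"]
traffic_types = [f"traffic_{i:02d}" for i in range(1, 28)]  # 27 placeholder names
config_ids = range(1, 251)                                   # 250 pairwise configurations

flows = [f"{traffic}:{method}" for traffic, method in product(traffic_types, METHODS)]
scenarios = list(product(config_ids, flows))

def make_record(config, flow, protection):
    """One result row: configuration values, then traffic name, then protection method."""
    return [*config.values(), flow, protection]

print(len(flows), "traffic types,", len(scenarios), "scenarios")
```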

With the data in hand, it could be handed over to the AI. But our team has only testing experts, no AI experts. We asked AI experts within Tencent for advice. Once they understood our needs they considered the idea feasible, but the concrete implementation still troubled us a great deal. AI knowledge is very different from testing knowledge, and learning it from scratch felt like reading an indecipherable text.

But as long as you are willing to think and keep learning, there are always more solutions than difficulties. I found a data mining tool that is easy for AI novices to use. After repeated study and practice, I concluded that its components could be applied in our solution. The model I built is as follows:

PCA stands for principal component analysis. We use it to help find the N configuration items that have a large influence on the result. The configuration items are sorted by their effect on the result, and the output is a one-dimensional list. In the actual design, the processing order of the configuration switches is certainly a network rather than a line, so this result is for reference only. The classification tree quantifies the impact of the configuration items on the result, and I personally find the information output by this component more valuable.

The result of the PCA analysis is shown here. In our case the curve is quite smooth, which indicates that no configuration item has a particularly large impact.

The Rank component ranks the configuration items by their impact on the results. Compared against the design flow chart from development, the resulting order is roughly the same, which gives preliminary confirmation that the approach is reasonably reliable.
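One common way to rank features against a categorical result is information gain. The sketch below uses it purely as an illustration. The article does not say which scoring method the tool's ranking component applied, and the rows and column names are invented.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(n / total * log2(n / total) for n in Counter(labels).values())

def information_gain(rows, item, target):
    """Entropy of the target minus its weighted entropy after splitting on one item."""
    groups = {}
    for row in rows:
        groups.setdefault(row[item], []).append(row[target])
    remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder

def rank_items(rows, items, target):
    """Configuration items sorted from most to least informative about the target."""
    return sorted(items, key=lambda item: information_gain(rows, item, target), reverse=True)

# Hypothetical recorded results: two configuration items and the observed protection.
rows = [
    {"drop_os": "linux", "udp_guard": 0, "protection": "drop"},
    {"drop_os": "linux", "udp_guard": 1, "protection": "drop"},
    {"drop_os": "0", "udp_guard": 0, "protection": "forward"},
    {"drop_os": "0", "udp_guard": 1, "protection": "forward"},
]

ranking = rank_items(rows, ["udp_guard", "drop_os"], "protection")
print(ranking)
```

In this toy data the protection result is fully determined by `drop_os`, so it ranks first, while `udp_guard` carries no information about the result.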

The following figure shows the results produced by the classification tree:

Let's take a typical example to show how to find a problem by following the AI's hints. After processing the data, the AI produces a large classification tree, in which each result in the data is marked with a color. The yellow, purple, white, and green shown in the figure correspond to the data for 4 kinds of results. The yellow area at the root represents the data whose protection method is dropos_*, 74 records in total.
The configuration item most relevant to this result is drop_@anti_.conf.
The leaf node on the left represents:
When drop_@anti_.conf is configured as android, ios, linux,
the protection method is dropos_*.
The leaf node on the right represents:
When drop_@anti_.conf is configured as 0,
the protection method is **_Trans.

According to the protection logic of the system under test, there is indeed a problem here. This function discards traffic with specific OS fingerprints, and when I ran the use cases I sent traffic only from a Linux system. If the function worked correctly, traffic would be discarded only when linux is configured. The AI analysis, however, shows traffic being discarded when drop_@anti_.conf is configured as android, ios, win, or linux. In other words, when it is configured as android, ios, or win, the OS is identified inaccurately. Let's make a note of this point first.

The configuration item in the box at the bottom is the second most relevant configuration item for the result. Continue by observing its leaf nodes, paying special attention to the ratio of results in each leaf node. In this example, the ratios are close when the configuration item takes different values. The conclusion is clear: this is a sign of low coupling.
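The ratio comparison can also be done numerically. The following sketch computes, for each value of a configuration item, the share of each protection result, and then the largest gap between those shares. The column names and rows are hypothetical, and the gap measure is just one simple way to express "the ratios are close".

```python
from collections import Counter, defaultdict

def result_ratios(rows, item, target):
    """For each value of `item`, the share of every protection result."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[item]][row[target]] += 1
    return {
        value: {result: n / sum(c.values()) for result, n in c.items()}
        for value, c in counts.items()
    }

def max_ratio_gap(ratios):
    """Largest difference in any result's share between two values of the item."""
    results = {r for dist in ratios.values() for r in dist}
    return max(
        max(d.get(r, 0.0) for d in ratios.values())
        - min(d.get(r, 0.0) for d in ratios.values())
        for r in results
    )

# Hypothetical records: switch_a does not affect the result, switch_b decides it.
rows = [
    {"switch_a": 0, "switch_b": "on", "protection": "drop"},
    {"switch_a": 1, "switch_b": "on", "protection": "drop"},
    {"switch_a": 0, "switch_b": "off", "protection": "forward"},
    {"switch_a": 1, "switch_b": "off", "protection": "forward"},
]

gap_a = max_ratio_gap(result_ratios(rows, "switch_a", "protection"))
gap_b = max_ratio_gap(result_ratios(rows, "switch_b", "protection"))
print("switch_a gap:", gap_a, "| switch_b gap:", gap_b)
```

A gap near 0 means the result distribution barely changes with the item's value, which matches the low-coupling signal described above. A gap near 1 means the item largely decides the result.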

Following the information shown in the classification tree, open the original table, hide the irrelevant columns, and put the related configuration items side by side. The problem then becomes visible.
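The same filtering can be scripted instead of done by hand in a spreadsheet. In this sketch the column names, values, and the matching condition are hypothetical. Line numbers are kept so that a suspicious scenario can be traced back to its row in the original table.

```python
def focus(rows, keep_columns, condition):
    """Yield (line_number, row restricted to keep_columns) for rows matching condition."""
    for line_number, row in enumerate(rows, start=1):
        if condition(row):
            yield line_number, {col: row[col] for col in keep_columns}

# Hypothetical records: all traffic was sent from a Linux host.
rows = [
    {"drop_os": "android,ios,win", "udp_guard": 1, "flow": "linux_flow:GET", "protection": "dropos"},
    {"drop_os": "linux", "udp_guard": 0, "flow": "linux_flow:GET", "protection": "dropos"},
    {"drop_os": "0", "udp_guard": 1, "flow": "linux_flow:GET", "protection": "forward"},
]

def dropped_without_linux(row):
    """Traffic was dropped by OS fingerprint although linux is not in the drop list."""
    return "linux" not in row["drop_os"].split(",") and row["protection"] == "dropos"

hits = list(focus(rows, ["drop_os", "protection"], dropped_without_linux))
for line_number, reduced in hits:
    print(line_number, reduced)
```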

Using the line number of the problem scenario, find the corresponding configuration and reproduce the problem in the test environment. The reproduced problem is shown in the figure, and the configuration used to reproduce it is as follows:

Expected: traffic sent from Linux should not match the policy and should be forwarded. Actual: the traffic was dropped.

This example shows that, guided by the AI, we successfully discovered that the OS fingerprint function can indeed misidentify the OS in a specific scenario. It also shows that using AI to analyze the data is a reliable method. I think the core value of AI for testing is that it visualizes complex data and makes analysis easier.

To sum up, this method addresses the following pain point: deep-seated bugs caused by coupling between multiple parameters exist, though not in large numbers, and finding them requires parameter combination testing, which is very costly. The method verifies the coupling between multiple factors at a small cost. We automatically generated test cases for 54,000 scenarios, which took 3.5 days to run. After AI analysis of the results, two bugs have been confirmed with development. If use cases for these 54,000 scenarios were written by hand at the current rate of 30 use cases per person per day, it would take one person 4.9 years without holidays. With this method, generating the combinations takes only a few minutes and running them takes 3.5 days. At this exploratory stage, we estimate that the analysis can be completed in 10 days, which greatly improves test efficiency.
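The manual-effort estimate can be reproduced with simple arithmetic, assuming one person working every day of a 365-day year:

```python
SCENARIOS = 54_000
CASES_PER_PERSON_DAY = 30  # current manual rate quoted in the article
DAYS_PER_YEAR = 365        # "without holidays"

person_days = SCENARIOS / CASES_PER_PERSON_DAY
person_years = person_days / DAYS_PER_YEAR

print(f"{person_days:.0f} person-days, about {person_years:.1f} years")
```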

About Tencent WeTest

Tencent WeTest is a one-stop quality open platform officially launched by Tencent. It draws on more than ten years of quality management experience and is dedicated to building quality standards and improving product quality. Tencent WeTest provides mobile developers with R&D tools such as compatibility testing, cloud real devices, performance testing, and security protection. Its expert team safeguards the quality of your products across 5 dimensions and 41 indicators.

Follow Tencent WeTest to learn more practical testing insights
WeTest Tencent Quality Open Platform: focus on games, improve quality
