vivo Internet Server Team - Li Qingxin
C/C++ development efficiency has long been criticized by developers in the industry, and the same goes for the efficiency of writing unit tests, so developers are reluctant to spend time on them. Can we raise a project's test coverage by making unit tests faster to write?
This article introduces a tool, built on GCC plugins, that improves the unit testing efficiency of C/C++ developers. We hope it gives you some ideas for making unit testing more efficient.
1. Motivation
The above figure shows the basic process of C/C++ unit testing. Writing unit tests is a relatively large part of the daily development workload. At present, C/C++ unit test code has to be written by hand, and stubbing private methods is even more troublesome.
At present there are no open source frameworks or tools in the industry that generate unit tests automatically, although some commercial automated testing tools exist. The following figure lists automated testing tools and unit testing libraries:
Even with open source test libraries such as gtest, we still need to write a lot of unit test code. For private and protected class methods, writing test cases is even slower, because manual stubbing (mocking) is required. We also analyzed existing test cases and found many boundary cases that are essentially fixed or follow clear patterns, such as the maximum and minimum values of int.
How can we make unit tests faster to write, and so improve the development efficiency and program quality of C/C++ developers? We can extract information about functions, classes, and so on from the source file and then generate the corresponding unit test cases. Generating test cases automatically requires information such as function declarations and class declarations, so how should we obtain it?
For example: the following function definition:
void test(int arg) {}
We hope to get the function's return value type, function name, function parameter type, and function scope from the function definition above. Usually we can get it in the following ways:
1.1 Method 1: Using Regular Expressions
Unfortunately, C/C++ syntax is complicated. For declarations such as the following, several regular expressions have to be combined to obtain the function declaration and other information:
void test(int arg){}
void test1(template<template<string>> arg,...){}
void test2(int(*func)(int ,float,...),template<template<string>> arg2){}
Then you need to write a series of regular expressions:
- Extract function name, parameter name: [a-zA-Z_][a-zA-Z0-9_]*
- Extract function return value: ^[a-zA-Z_][a-zA-Z0-9_]*
This extracts the keywords, but the approach has a big problem: how do we judge whether the code in the file is valid C/C++ at all?
1.2 Method 2: Use flex/bison to analyze C/C++ source files
This is a sound approach, but the workload is huge: it amounts to implementing the lexer and parser of a simplified compiler, and it has to cope with many different syntax forms. bison does solve the problem above of judging whether the syntax is correct, but the approach is still complex.
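To make the limitation concrete, here is a minimal sketch of the regular-expression approach using std::regex. The Signature struct, the extract_signature function, and the pattern itself are illustrative assumptions, not part of any tool described in this article. The pattern assumes a single-word return type and a parameter list without nested parentheses.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical record for what the pattern captures.
struct Signature {
    bool matched = false;
    std::string return_type;
    std::string name;
    std::string params;  // raw text between the first '(' and the first ')'
};

// Naive extraction: one identifier as the return type, one as the name, then
// a parameter list that is assumed to contain no nested parentheses.
Signature extract_signature(const std::string& line) {
    static const std::regex pattern(
        R"re(^\s*([A-Za-z_]\w*)\s+([A-Za-z_]\w*)\s*\(([^)]*)\))re");
    Signature sig;
    std::smatch m;
    if (std::regex_search(line, m, pattern)) {
        sig.matched = true;
        sig.return_type = m[1].str();
        sig.name = m[2].str();
        sig.params = m[3].str();
    }
    return sig;
}
```

For `void test(int arg){}` this yields the return type `void`, the name `test`, and the parameter text `int arg`. For the `test2` line above it still reports a match, but the parameter text is cut off at the first `)` and comes back as `int(*func`. The regular expression cannot tell that the result is wrong, which is exactly the problem described above.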
1.3 Method 3: Generate code using compiled AST
Usually the process of GCC compilation we know is the following four stages:
Source File -> Preprocessing -> Compile -> Assemble -> Link
In fact, GCC's pipeline is considerably more elaborate than this, in order to support multiple programming languages and different CPU architectures, as shown in the following figure:
The figure above shows how GCC processes source code, including its optimization passes. GENERIC, produced by the front end, is the abstract syntax tree (AST) that GCC builds for the source code during compilation, and it is independent of the source language. Since the AST is built during compilation, we can use a GCC plugin to extract key information from it, such as function return values, function names, and parameter types. This approach is also difficult: there is little reference material in the industry, and the meaning of each AST node can only be worked out by reading the GCC source code.
The solution for automatically generating unit test cases described in this article is based on method 3. We call it TU (Translate Unit). Let's look at what it can do.
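Whichever way the information is obtained, the extracted facts can be held in a small record and turned into test source text. The following is a minimal sketch of that idea. The FunctionInfo struct, the emit_test_case function, and the shape of the generated text are all hypothetical and are not taken from the tool's actual output. The emitted text uses gtest's TEST macro.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record of the facts extracted for one function.
struct FunctionInfo {
    std::string scope;                     // "" for a free function, "ns::" otherwise
    std::string return_type;
    std::string name;
    std::vector<std::string> param_types;
};

// Emit the source text of one gtest-style case that calls the function with
// the given argument literals. No assertion is emitted: the generated case
// only checks that the call does not crash.
std::string emit_test_case(const FunctionInfo& fn,
                           const std::vector<std::string>& args,
                           int case_id) {
    std::ostringstream out;
    out << "TEST(" << fn.name << "Boundary, Case" << case_id << ") {\n";
    out << "    " << fn.scope << fn.name << "(";
    for (std::size_t i = 0; i < args.size(); ++i) {
        if (i != 0) out << ", ";
        out << args[i];
    }
    out << ");\n}\n";
    return out.str();
}
```

For the earlier `void test(int arg)` example, calling `emit_test_case` with the single argument literal `0` and case number 0 produces a three-line test body that simply calls `test(0)`.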
2. Results
2.1 Generating boundary test cases with TU, with zero changes to business code
In this example, we generate boundary test cases for the business code without modifying it at all. Every combination of boundary values for the function parameters is covered, which greatly reduces the risk of missing test cases. You may notice that a test case generated this way contains no assertions. Even so, it can still reveal whether any boundary value causes a coredump in the unit under test.
What if you want to add assertions and mock functions? With the C++11 attribute syntax [[ ]], you only need to add an annotation in TU's format where the method is declared or defined, which does not intrude on the business logic.
2.2 Use the annotation tu::case to generate user-defined test cases
In many cases the boundary test cases generated by default cannot cover the core logic, so we also provide tu::case, which lets users define their own test cases and assertions. For example, given a method int foo(int x, long y), suppose you want to add a test case with a return value of 123 and function arguments 1 and 1000. Just add the following code before the function declaration:
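The idea of covering every combination of boundary values can be sketched as a Cartesian product over per-type boundary sets. This is only an illustration: the boundary set chosen here (minimum, -1, 0, 1, maximum) and the names boundary_values and boundary_cases are assumptions, since the exact values the tool uses are not listed in this article.

```cpp
#include <cassert>
#include <limits>
#include <utility>
#include <vector>

// Assumed boundary set for a signed integral type: min, -1, 0, 1, max.
template <typename T>
std::vector<T> boundary_values() {
    return {std::numeric_limits<T>::min(), static_cast<T>(-1), static_cast<T>(0),
            static_cast<T>(1), std::numeric_limits<T>::max()};
}

// Every combination of boundary values for a two-parameter function.
template <typename A, typename B>
std::vector<std::pair<A, B>> boundary_cases() {
    std::vector<std::pair<A, B>> cases;
    for (A a : boundary_values<A>()) {
        for (B b : boundary_values<B>()) {
            cases.emplace_back(a, b);
        }
    }
    return cases;
}
```

For a function taking `(int, long)`, `boundary_cases<int, long>()` yields 5 x 5 = 25 argument pairs. The count grows multiplicatively with the number of parameters, which is why writing these cases by hand is tedious.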
[[tu::case("NE","123","1","1000")]]
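As a sketch of where the annotation sits, the snippet below attaches it to a declaration of foo. The function body is hypothetical and exists only so the example is complete. How each field is interpreted (for instance the first field "NE") is up to the plugin and is not assumed here. Without the plugin loaded, the compiler simply ignores the unknown attribute, typically with a warning, which the pragma suppresses.

```cpp
#include <cassert>

// GCC and Clang warn about attributes they do not recognise; silence that
// warning for this translation unit.
#pragma GCC diagnostic ignored "-Wattributes"

// The annotation from the text, attached to the declaration.
[[tu::case("NE", "123", "1", "1000")]]
int foo(int x, long y);

// Hypothetical body, present only so that the example is complete.
int foo(int x, long y) {
    return static_cast<int>(x + y);
}
```

Although `case` is a C++ keyword, the language treats a keyword inside an attribute name as an ordinary identifier, so `tu::case` is syntactically valid. The behavior of foo is unchanged by the annotation, which is what keeps this approach out of the way of the business logic.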
2.3 Use the annotation tu::mock to automatically generate a mock method
During development we often need to mock a method, that is, temporarily replace it with a stand-in that keeps the same call signature. For example, when a function accesses Redis or a database, those access methods usually have to be mocked so that the functions calling them can be unit tested. To make this quick, we provide the tu::mock annotation: the developer adds the annotation, and TU automatically generates the corresponding mock function. For example, to mock the foo_read method and have the mocked function return 10:
3. TU implementation plan
3.1 What is AST?
GENERIC, GIMPLE and RTL together make up GCC's intermediate languages. With GIMPLE at the core, preceded by GENERIC and followed by RTL, they form a three-layer transition that bridges the gap between source files and target instructions.
During parsing, GCC stores every language construct it recognizes in a data structure called TREE. This TREE is GCC's syntax tree (AST), and this representation is called GENERIC. It also serves as GCC's symbol table, because variable names, types, and so on are all associated with TREE nodes.
Let's look at GCC's AST representation using GCC's compile options:
3.2 AST (Abstract syntax tree)
GCC can dump the AST when the compile option -fdump-tree-all is added. The content of the dump file is as follows:
For the description of each type of AST, please refer to: https://gcc.gnu.org/onlinedocs/gccint/Types.html
As the figure above shows, GCC's output does expose the dependencies between nodes, but it is hard to follow and less intuitive than the output generated by clang. This makes it awkward to read, but it does not prevent us from extracting AST information in code.
3.3 Scheme
As shown in the figure above, we use different plugin events to collect and save the AST information, header file information, and function annotations (attributes) of the source file under test. GCC saves the callbacks that a plugin registers for each event in an array:
During compilation, GCC checks whether a callback has been set for each event and, if so, calls it. TU mainly uses the following plugin events:
- PLUGIN_INCLUDE_FILE is used to get the header files included by the current file
- PLUGIN_OVERRIDE_GATE is used to obtain ordinary functions and classes
- PLUGIN_PRE_GENERICIZE is used to obtain the instantiations of template functions
- PLUGIN_ATTRIBUTES is used to implement custom attributes or annotations (tu::case, tu::mock, ...)
All plugin events supported by GCC are shown in the following figure (from the GCC 6.3.0 source code):
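The register-then-dispatch mechanism described above can be modeled in a few lines. This is a simplified, self-contained model and not GCC's own code: the Event enum, the EventRegistry class, and the collect_header callback are hypothetical names, and the model assumes that the data passed with the include-file event is the header's file name.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical subset of events, named after the ones listed above.
enum Event { EV_INCLUDE_FILE, EV_OVERRIDE_GATE, EV_PRE_GENERICIZE, EV_ATTRIBUTES, EV_COUNT };

using Callback = void (*)(void* event_data, void* user_data);

struct Registration {
    Callback callback;
    void* user_data;
};

// One slot per event, each holding the callbacks registered for it.
class EventRegistry {
public:
    void register_callback(Event event, Callback callback, void* user_data) {
        slots_[event].push_back(Registration{callback, user_data});
    }

    // Call every callback registered for the event; return how many ran.
    std::size_t invoke(Event event, void* event_data) const {
        for (const Registration& r : slots_[event]) {
            r.callback(event_data, r.user_data);
        }
        return slots_[event].size();
    }

private:
    std::array<std::vector<Registration>, EV_COUNT> slots_;
};

// Example callback: record the name of each included header.
void collect_header(void* event_data, void* user_data) {
    auto* headers = static_cast<std::vector<std::string>*>(user_data);
    headers->push_back(static_cast<const char*>(event_data));
}
```

A plugin in this model registers `collect_header` for `EV_INCLUDE_FILE` once at start-up. Each time the compiler reaches that point it calls `invoke`, and events with no registered callback cost nothing beyond the lookup.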
4. Comparison of the ease of use of TU plug-ins
If you only need boundary testing, you only have to modify the build script, such as the CMake configuration, to add the corresponding plugin parameters.
5. Advantages of using TU
- Easy to adopt: boundary unit tests require zero changes to business code
- Every combination of boundary values for the function parameters is covered, which greatly reduces the risk of missing test cases and removes a lot of repetitive work
- Quickly generates user-defined test cases, mock methods, and so on
6. Functions supported by TU
7. Summary and Outlook
1. This article compared three methods for automatically generating test cases. The comparison is as follows:
2. The article also introduced TU's features and its approach to generating test cases automatically from the GCC AST.
At present, TU can generate test cases automatically during the build, which has greatly lowered the barrier to unit testing and improved unit test coverage. In the future, we hope to integrate TU with IDEs to explore more efficient and convenient ways of generating test cases for a specified method, for example generating the test cases for the current function or method with a keyboard shortcut.