Traffic recording and playback in practice

Article guide

This article introduces how traffic recording and playback can be applied to stress testing. After reading it, you will know how we integrated an open-source recording tool with our internal systems, how we extended it to record Dubbo traffic, how we resolved jar version conflicts using the Java class loading mechanism, and how traffic recording is applied in automated testing and what value it brings there. The article is about 14,000 words, with 17 figures. It summarizes the work I was responsible for over the past year and covers many technical points; I learned a lot while writing it, and I hope it is useful to you as well. My abilities are limited, so corrections are welcome. The chapters are arranged as follows:

1. Introduction

This article records and summarizes the project I led over the past year: traffic recording and playback. The project mainly provides stress testing services to business teams. As project lead, I did about 70% of the work, so it holds many memories for me. From requirement gathering, technical research, selection and verification, problem handling, and solution design, to launching a minimum viable system within two weeks, rolling it out, supporting the mid-year and year-end full-link stress tests, iterative optimization, adding Dubbo traffic recording, and finally delivering value in new scenarios, I was deeply involved in every step and learned a great deal, including but not limited to the Go language, networking, the details of the Dubbo protocol, and the Java class loading mechanism. The value the project produced also made me very happy. In the year since it went live, it has helped business lines find more than a dozen performance problems and helped the middleware team discover many serious problems in basic components. Overall, this project means a lot to me personally, and I benefited greatly from it. This article focuses on implementation ideas rather than code; interested readers can build their own version along the same lines. Let's get started.

2. Project background

The project originated from a request by a business team: stress test with real online traffic to make stress tests more "real". The business team felt the old stress testing platform (built on JMeter) was unrealistic because its test data lacked diversity and covered too little code. A typical stress test targets an application's TOP 30 interfaces, and enriching the test data for these interfaces by hand is very costly. Based on this requirement, we investigated several tools and finally chose GoReplay, written in Go, as our traffic recording and playback tool. Why we chose it is discussed next.

3. Technical selection and verification

3.1 Technical selection

When we started the selection, I lacked experience and did not consider many factors; I only evaluated candidates on two dimensions, functionality and popularity. First, the functionality had to meet our needs, for example traffic filtering, so that specified interfaces could be recorded on demand. Second, candidates should be backed by major companies and have many stars on GitHub. Based on these two requirements, the following tools were shortlisted:

Figure 1: Technical selection

The first option is the open-source tool jvm-sandbox-repeater, which is built on JVM-Sandbox. It intercepts the target interface through bytecode enhancement to obtain the interface's parameters and return values, an effect equivalent to around advice in AOP.

The second option is GoReplay, written in Go. Under the hood it relies on the pcap library for traffic capture. The well-known tcpdump also relies on pcap, so GoReplay can be regarded as a minimalist tcpdump: it supports only a single protocol, recording HTTP traffic.

The third option is Nginx's traffic mirroring module, ngx_http_mirror_module. With this module, traffic can be mirrored to another machine, which achieves traffic recording.

The fourth option is a sub-product of Alibaba Cloud Yunxiao, the dual-engine regression test platform. As the name suggests, this system is built for regression testing, whereas we needed stress testing, so many of its functions would go unused.

After comparing and screening, we chose GoReplay as our traffic recording tool. Before analyzing GoReplay's pros and cons, let's look at the problems with the other tools.

jvm-sandbox-repeater is built on JVM-Sandbox. Using it requires loading the code of both projects into the target application, which is intrusive to the application's runtime environment. If the code of either project has a bug that causes something like an OOM, the target application can be badly affected. In addition, because it serves a niche, JVM-Sandbox is not widely used and its community is not very active, so we worried that problems would not be fixed promptly upstream. This option was put on hold.
ngx_http_mirror_module looked like a good choice, and it comes from a "famous family". But it has problems too. First, it only supports HTTP traffic, and we would certainly need Dubbo traffic recording later. Second, the module has to mirror requests, which inevitably consumes the machine's TCP connections and network bandwidth; since our recording would run continuously on the gateway, this overhead had to be considered. Finally, the module cannot mirror only specified interfaces, and turning mirroring on or off requires changing the nginx configuration. Changing online configuration on demand is not feasible, especially for core applications such as the gateway. Taking these factors together, we dropped this option as well.
Alibaba Cloud's regression test platform was still maturing when we evaluated it and was cumbersome to use. It is also a sub-product of Yunxiao and is not sold separately. Moreover, it mainly targets regression testing, which differs considerably from our scenario, so it was dropped too.

Next, let's look at GoReplay's advantages and disadvantages, starting with the advantages:

It is a single program with no dependencies other than the pcap library and needs no configuration, so environment preparation is very simple
It is a lightweight executable that runs directly; you just pass in the appropriate parameters to start recording, so it is easy to use
It has many stars on GitHub, is well known, and has an active community
It supports traffic filtering, multi-speed playback, rewriting request parameters during playback, and more, which functionally meets our needs
It consumes few resources and does not intrude into the JVM runtime of business applications, so its impact on target applications is small

For a company built on the Java stack, GoReplay's Go implementation means a very different technology stack, which makes future maintenance and extension a real problem. On that basis alone it would be reasonable to rule it out. However, its advantages were compelling, and after weighing the pros and cons of the other options, we chose GoReplay. You may wonder why we didn't choose tcpdump. There are two reasons. First, our needs are fairly modest, and tcpdump felt like using a cannon to kill mosquitoes. Second, tcpdump seemed too complex for us to control (cue tears over my lack of skill), so we didn't seriously consider it at the start.

Option	Language	Open source	Advantages	Disadvantages
GoReplay	Go	✅	1. Open-source project with simple code, easy to customize 2. Single binary with few dependencies and no configuration, so environment preparation is simple 3. Very lightweight and easy to use 4. Fairly rich functionality that meets all our needs 5. Built-in playback that uses recorded data directly, with no separate development needed 6. Low resource consumption and no intrusion into the target application's JVM runtime, so the impact is small 7. Provides a plug-in mechanism whose implementation language is unrestricted, making it easy to extend	1. Not widely used, no endorsement from large companies, and not mature enough 2. Quite a few problems; the official project explicitly advises against using version 1.2.0 3. Following from the previous point, it demands a lot from users: you must be able to read the source code yourself, and official responses are slow 4. The community edition supports only HTTP, not binary protocols, and its core logic is coupled to HTTP, making extension troublesome 5. Command-line only, with no built-in service, so it is hard to integrate
JVM-Sandbox jvm-sandbox-repeater	Java	✅	1. Records Java methods directly through bytecode enhancement, which is very powerful 2. Richer functionality that fits the requirements better 3. Transparent to business code, with no code changes required	1. Somewhat intrusive to the application runtime; if something goes wrong, the application may be affected 2. The tool leans toward regression testing, so some functions don't fit our scenario; for example, its playback cannot drive high-speed stress tests 3. Low community activity, with a risk of maintenance stopping 4. The underlying implementation is genuinely complex and costly to maintain. Cue tears over my lack of skill again 😢 5. Needs to work together with other supporting systems, so integration costs are not low
ngx_http_mirror_module	C	✅	1. Made by nginx, so maturity is assured 2. Relatively simple configuration	1. Inconvenient to start and stop, and no filtering support 2. Only works with nginx, which limits where it can be used
Alibaba Cloud dual-engine regression test platform	-	❌	-	-

3.2 Selection verification

Once the selection was made, we immediately verified functionality, performance, and resource consumption to check whether it met our requirements. Based on our needs, we verified the following:

Recording: verify that recorded traffic is complete, both in the number of requests and in the accuracy of request data, and check resource consumption under heavy traffic
Traffic filtering: verify that traffic for a specified interface can be filtered out, and that the filtered traffic is complete
Playback: verify that playback works as expected and that the number of replayed requests meets expectations
Multi-speed playback: verify that the speed-up works as expected, and check resource consumption during high-speed playback

All of the above checks passed offline at the time, the results were good, and everyone was quite satisfied. However, when multi-speed playback was verified in production, it could not generate enough load: it topped out at about 600 QPS, and no matter how much we raised the speed, QPS stayed at that level. Our colleagues in the business line ran several rounds of online tests with different recordings, with no improvement. At first we suspected a machine resource bottleneck, but CPU and memory usage were very low, and TCP connections and bandwidth had plenty of headroom, so resources were not the bottleneck. This exposed a flaw in our process: early on we had only tested the tool's functionality, not its performance, so the issue was not caught early.

So I built an offline test service with nginx and tomcat, ran some performance tests, and easily reached several thousand QPS. Seeing this result, I was completely baffled 😭. It later turned out that the offline service's RT was far shorter than that of the online services. After making the handler thread sleep for a random tens to hundreds of milliseconds, the offline results came very close to what we saw online. At this point we could basically narrow the problem down to GoReplay itself. But GoReplay is written in Go, and none of us had Go experience. The answer felt within reach, yet we had no idea where to start, which was very frustrating. Later, the team leads decided to invest time in reading GoReplay's source code to find the problem, and that is how I started learning Go. We had planned to reach a preliminary conclusion in two weeks, but found the problem within one.

It turned out that the documentation of GoReplay v1.1.0 differs significantly from its code, so following the documentation does not produce the expected behavior. The details are as follows:

Figure 2: GoReplay instructions

Let's look at what the documentation says about --output-http-workers. The parameter specifies how many goroutines send HTTP requests concurrently; the default value is 0, meaning unlimited. Now let's see how the code (output_http.go) implements it:

Figure 3: GoReplay logic for deciding the number of concurrent goroutines

The documentation says the number of goroutines sending HTTP requests is unlimited by default, yet the code sets it to 10, which is a big difference. Why are 10 goroutines not enough? Because each goroutine blocks in place waiting for the response, so the QPS that 10 goroutines can generate is limited. Once we found the cause, we set --output-http-workers explicitly, and multi-speed playback finally reached the required QPS in verification.

After this problem, we had serious doubts about GoReplay. The problem felt fairly basic, and if something like this could happen, there might be others later, so we used it with some unease. Of course, the project has few maintainers and is essentially a personal project; it has not been adopted at scale, especially without backing from large companies, so problems like this are understandable and there is no need to be too harsh. From then on, we simply dealt with problems as they came up. After all, the code is available, so we can always audit it ourselves as a white box.

3.3 Summary and reflection

Let me talk about the mistakes in the selection process. As described above, I made some serious mistakes during selection and verification, and learned a vivid lesson from them. In the selection stage, I assumed that more stars meant a more established project, which seems naive in hindsight. Maturity matters more than popularity: a stable tool means fewer pitfalls and leaving work on time 🤣. Observability must also be considered; otherwise you will feel helpless when troubleshooting.

In the verification stage, functional verification was mostly fine, but performance verification was only a token effort, and it fell apart during verification with colleagues from the business line. So performance testing must not be skimped during verification. Discovering such problems after going live puts you in a very passive position.

Here is a summary of the lessons from this selection, to consult in future technical selections. The selection dimensions are summarized as follows:

Dimension	Description
Functionality	1. Can the option meet the requirements? If not, what would secondary development cost?
Maturity	1. Is the option widely used in its field? For example, in Java web development, the Spring stack is well known. 2. Options in niche fields may not be widely used; in that case you have to go through the issues yourself, search for write-ups of pitfalls, and make your own assessment
Observability	1. Is there a way to observe internal state? For example, GoReplay periodically prints its internal status data 2. Also consider whether it can easily be connected to the company's monitoring system; watching it by eye is too laborious

The verification lessons are summarized as follows:

Verify, one requirement at a time, that each function of the option behaves as expected; a verification checklist confirmed item by item helps
Performance-test the option from every relevant angle and watch all kinds of resource consumption along the way. For GoReplay, for example, recording, filtering, and playback must all be performance-tested
Verify the option's long-term stability, and observe and analyze any anomalies during the verification period
To be more rigorous, run some failure tests, such as killing the process or disconnecting the network

For more practical advice on technical selection, see Li Yunhua's article "How to use open source projects".

4. Specific practice

With selection and verification complete, the next step was turning the idea into reality. Following the practice of small steps and rapid iteration, we usually plan only the most essential features at project start-up to get the end-to-end flow working, then iterate by requirement priority and improve gradually. Below, I follow the project's iterations.

4.1 Minimum viable system

4.1.1 Introduction to requirements

No.	Category	Requirement	Description
1	Recording	Traffic filtering, recording on demand	Filter traffic by HTTP request path, so that only specified interfaces are recorded
2		Configurable recording duration	The recording duration can be set; usually 10 minutes are recorded, covering the traffic peak
3		Recording task details	Includes recording status, recording result statistics, and similar information
4	Playback	Configurable playback duration	Playback duration can be set from 1 to 10 minutes
5		Configurable playback speed	Traffic is amplified as a multiple of the QPS observed during recording, with a minimum granularity of 1x
6		Manual termination of playback	When a problem is found in the application under test, playback can be stopped manually
7		Playback task details	Includes playback status and playback result statistics

The above is the requirement list at project start-up. These are the most basic requirements; once they are done, we have a minimum viable system.

4.1.2 Introduction to the technical solution

4.1.2.1 Architecture diagram

Figure 4: Phase-one architecture of the stress testing system

The architecture diagram above has been simplified and differs somewhat from the real system, but that does not affect the explanation. Note that our gateway service, stress testing machines, and stress testing service each consist of multiple instances, and GoReplay plus its controller are deployed on every gateway and stress testing instance. To keep the diagram simple, only one machine is drawn. Some core processes are introduced below.

4.1.2.2 Gor Controller

Before covering anything else, let me explain the purpose of the Gor controller. In one sentence: this middle layer integrates GoReplay, a command-line tool, with our stress testing system. We developed it ourselves. It was first written in shell (not a satisfying experience 😭) and later rewritten in Go. The Gor controller is mainly responsible for the following:

Controlling GoReplay's lifecycle: starting and terminating the GoReplay process
Hiding GoReplay's usage details, reducing complexity and improving ease of use
Reporting status to the stress testing system before GoReplay starts, after it ends, and at other key events
Processing and returning the data produced by recording and playback
Logging the status data output by GoReplay to make later troubleshooting easier

GoReplay itself provides only the most basic functions. Think of it as a car with just a chassis, wheels, steering wheel, and engine: it can be driven, but with effort. Our Gor controller adds one-key start/stop, power steering, connected-car features, and other enhancements on top, making it much more usable. This is only a rough metaphor, so don't take it too literally. With the controller's purpose clear, the following describes the recording and playback flows.

4.1.2.3 Introduction to the recording process

The user's recording command is first sent to the stress testing service. The stress testing service could have sent the command straight to the Gor controller over SSH, but for security reasons it has to be routed through the operations (O&M) system. After the Gor controller receives the command and validates its parameters, it starts GoReplay. When recording ends, the Gor controller reports the status back to the stress testing system, which determines whether the recording task has finished. The detailed process is as follows:

The user sets the recording parameters and submits a recording request to the stress testing service
The stress testing service creates a stress testing task and generates the recording command from the user's parameters
The recording command is delivered to the target machine through the O&M system
The Gor controller receives the command, reports a "recording about to start" status to the stress testing service, and then starts GoReplay
When recording ends, GoReplay exits and the Gor controller reports a "recording finished" status to the stress testing service
The Gor controller sends the remaining information back to the stress testing system
Once the stress testing service determines that the recording task is over, it notifies the stress testing machine to fetch the recorded data into a local file
The recording task ends

Note that to use GoReplay's multi-speed playback, the recorded data must be stored in a file. The speed is then set with parameters like the following:

# Replay at 3x speed
gor --input-file "requests.gor|300%" --output-http "test.com"

4.1.2.4 Introduction to the playback process

The playback process is basically the same as the recording process, except that playback commands are always sent to the stress testing machines, so the details won't be repeated. Here are a few differences:

Marking replayed traffic as stress test traffic: to distinguish replayed traffic from real traffic, it needs a marker, the stress test flag
Rewriting parameters as needed: for example, changing User-Agent to goreplay, or adding token information for a test account
Collecting GoReplay runtime status: including QPS, task queue backlog, and so on, which helps understand how GoReplay is running
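The first two differences map naturally onto GoReplay's `--http-set-header` flag, and the worker count onto `--output-http-workers`. A minimal sketch of how a controller might assemble a playback command, assuming a hypothetical `ReplayCmd` type and a hypothetical `X-Stress-Test` header as the stress test flag (the real marker name is not given in this article):

```go
package main

import (
	"fmt"
	"sort"
)

// ReplayCmd is a hypothetical playback command for a stress testing machine.
type ReplayCmd struct {
	File    string            // recorded file, e.g. "requests.gor"
	Speed   int               // 3 means 3x, rendered as "|300%"
	Target  string            // service under test
	Workers int               // explicit --output-http-workers value
	Headers map[string]string // headers set or overwritten on every request
}

// replayArgs builds GoReplay flags for a playback run.
func replayArgs(c ReplayCmd) []string {
	args := []string{
		"--input-file", fmt.Sprintf("%s|%d%%", c.File, c.Speed*100),
		"--output-http", c.Target,
		"--output-http-workers", fmt.Sprint(c.Workers),
	}
	// Sort header names so the generated command line is deterministic.
	names := make([]string, 0, len(c.Headers))
	for k := range c.Headers {
		names = append(names, k)
	}
	sort.Strings(names)
	for _, k := range names {
		args = append(args, "--http-set-header", k+": "+c.Headers[k])
	}
	return args
}

func main() {
	fmt.Println(replayArgs(ReplayCmd{
		File: "requests.gor", Speed: 3, Target: "test.com", Workers: 50,
		Headers: map[string]string{"User-Agent": "goreplay", "X-Stress-Test": "true"},
	}))
}
```

Setting the worker count explicitly, rather than relying on the default, avoids the 10-goroutine ceiling described in the selection section.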

4.1.3 Shortcomings

This minimum viable system ran in production for almost 4 months without major problems, but it still had shortcomings, mainly these two:

The command delivery path is somewhat long, which increases the chance of errors and the difficulty of troubleshooting. For example, the O&M system's API occasionally failed, and because key steps were not logged, the problem could not be investigated at first
The Gor controller was written in shell, about 300 lines. Shell syntax differs greatly from Java, and the code is hard to debug. Complex logic, such as generating JSON strings, is also troublesome to write, so maintenance costs are high

These two shortcomings dogged our development and O&M work until later optimizations finally solved them completely.

4.2 Continuous optimization

Figure 5: Optimized architecture of the Gor controller

To address these pain points, we made targeted improvements, chiefly rewriting the Gor controller in Go. The new controller is called gor-server. As the name suggests, it has a built-in HTTP service, so commands from the stress testing service no longer need to detour through the O&M system. At the same time, all modules are now under our control, and development and maintenance are clearly more efficient.

4.3 Supporting Dubbo traffic recording

Internally we use Dubbo as our RPC framework, and applications call each other through Dubbo, so there is considerable demand for recording Dubbo traffic. After gateway traffic recording had delivered results, colleagues responsible for internal systems also wanted to stress test through GoReplay. To meet these internal needs, we extended GoReplay to support recording and playing back Dubbo traffic.

4.3.1 Introduction to the Dubbo protocol

To support Dubbo recording, we first need to understand the Dubbo protocol. Dubbo is a binary protocol, and its encoding rules are shown below:

Figure 6: Dubbo protocol diagram; source: Dubbo official website

Below is a brief introduction to the protocol, describing each field in the order shown in the figure.

Field	Bits	Meaning	Description
Magic High	8	High byte of the magic number	Fixed at 0xda
Magic Low	8	Low byte of the magic number	Fixed at 0xbb
Req/Res	1	Packet type	0 = response, 1 = request
2way	1	Call mode	0 = one-way call, 1 = two-way call
Event	1	Event flag	For example, a heartbeat event
Serialization ID	5	Serializer ID	2 = Hessian2Serialization, 3 = JavaSerialization, 4 = CompactedJavaSerialization, 6 = FastJsonSerialization, ...
Status	8	Response status	20 = OK, 30 = CLIENT_TIMEOUT, 31 = SERVER_TIMEOUT, 40 = BAD_REQUEST, 50 = BAD_RESPONSE, ...
Request ID	64	Request ID	The response header carries the same ID, which associates a request with its response
Data Length	32	Data length	Length of the Variable Part
Variable Part (payload)	-	Data payload	Serialized body; its length is given by Data Length
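The fixed 16-byte header in the table above can be decoded with a few bit masks and big-endian reads. A minimal Go sketch; `DubboHeader` and `parseHeader` are hypothetical names, and in the example the request ID 0 and data length 0xdc match the capture discussed next, while the flag byte 0xc2 (request, two-way, Hessian2) is assumed for illustration.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const headerLen = 16 // fixed Dubbo header size in bytes

// DubboHeader holds the fixed-size header fields.
type DubboHeader struct {
	Request       bool   // Req/Res bit: true for a request
	TwoWay        bool   // 2way bit: caller expects a response
	Event         bool   // Event bit, e.g. heartbeat
	Serialization uint8  // Serialization ID, low 5 bits of the third byte
	Status        uint8  // response status; only meaningful in responses
	RequestID     uint64 // echoed in the response to pair it with the request
	DataLength    uint32 // length of the Variable Part after the header
}

// parseHeader decodes the first 16 bytes of b (big-endian, network byte order).
func parseHeader(b []byte) (DubboHeader, error) {
	if len(b) < headerLen {
		return DubboHeader{}, errors.New("need at least 16 bytes")
	}
	if b[0] != 0xda || b[1] != 0xbb {
		return DubboHeader{}, errors.New("bad magic number")
	}
	flag := b[2]
	return DubboHeader{
		Request:       flag&0x80 != 0,
		TwoWay:        flag&0x40 != 0,
		Event:         flag&0x20 != 0,
		Serialization: flag & 0x1f,
		Status:        b[3],
		RequestID:     binary.BigEndian.Uint64(b[4:12]),
		DataLength:    binary.BigEndian.Uint32(b[12:16]),
	}, nil
}

func main() {
	hdr := []byte{0xda, 0xbb, 0xc2, 0x00, 0, 0, 0, 0, 0, 0, 0, 0, 0x00, 0x00, 0x00, 0xdc}
	h, err := parseHeader(hdr)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", h)
	fmt.Println("total bytes:", headerLen+int(h.DataLength)) // total bytes: 236
}
```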

With the protocol understood, we ran the official demo and captured its packets to study them.

Figure 7: Packet capture of a Dubbo request

First, we can see the two-byte magic number 0xdabb. The next 14 bytes are the rest of the protocol header. Let's analyze them briefly:

Figure 8: Analysis of the Dubbo request header

The annotations above are fairly clear, so only a brief explanation is needed. The third byte shows that this packet is a Dubbo request. Because it is the first request, the request ID is 0. The data length is 0xdc, which is 220 in decimal. Adding the 16-byte header gives a total of 236 bytes, matching the length shown in the packet capture.

4.3.2 Dubbo protocol analysis

To support Dubbo traffic recording, we first need to decode the data according to the Dubbo protocol to determine whether the recorded data is a Dubbo request. So how do we tell whether the data in a recorded TCP segment is a Dubbo request? As follows:

First check that the data length is at least the length of the protocol header, i.e. 16 bytes
Check whether the first two bytes are the magic number 0xdabb
Check whether the 17th bit (the highest bit of the third byte, i.e. the Req/Res flag) is 1; if it is not, the data can be discarded
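The three checks fit in a few lines. A minimal Go sketch; `looksLikeDubboRequest` is a hypothetical name, and this is an illustration of the checks rather than our modified GoReplay code.

```go
package main

import "fmt"

// looksLikeDubboRequest applies the three quick checks in order: enough
// bytes for a header, the 0xdabb magic number, and the Req/Res bit
// (the 17th bit overall, i.e. the top bit of the third byte) set to 1.
func looksLikeDubboRequest(data []byte) bool {
	const headerLen = 16
	if len(data) < headerLen {
		return false
	}
	if data[0] != 0xda || data[1] != 0xbb {
		return false
	}
	return data[2]&0x80 != 0
}

func main() {
	req := []byte{0xda, 0xbb, 0xc2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0xdc}
	resp := []byte{0xda, 0xbb, 0x02, 20, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0x10}
	httpData := []byte("GET /index.html HTTP/1.1\r\n")
	fmt.Println(looksLikeDubboRequest(req), looksLikeDubboRequest(resp), looksLikeDubboRequest(httpData)) // true false false
}
```

The checks are ordered from cheapest to most specific, so non-Dubbo traffic such as HTTP is rejected after a single byte comparison.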

These checks quickly determine whether the data matches the Dubbo request format. If they pass, how do we judge whether the recorded request is complete? Compare the length L1 of the recorded data after the 16-byte header with the length L2 given in the Data Length field, and act on the result. There are three cases:

L1 == L2: the data has been received completely, and no extra handling is needed
L1 < L2: part of the data has not arrived yet; keep waiting for the rest
L1 > L2: extra data has been received that does not belong to the current request; split the received data at L2
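All three cases can be handled by one loop that keeps cutting complete messages off the front of a buffer. A minimal Go sketch, assuming the buffer starts at a message boundary; `splitFrames` and the test helper `frame` are hypothetical names, and production code would also cap Data Length to reject corrupt or hostile input.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const headerLen = 16

// splitFrames cuts buf into complete Dubbo messages (header + payload).
// The returned rest is an incomplete message waiting for more bytes.
func splitFrames(buf []byte) (frames [][]byte, rest []byte) {
	for len(buf) >= headerLen {
		total := headerLen + int(binary.BigEndian.Uint32(buf[12:16]))
		if len(buf) < total {
			break // case 2 (L1 < L2): wait for the rest of the payload
		}
		frames = append(frames, buf[:total])
		buf = buf[total:] // case 1 leaves nothing; case 3 loops on the extra bytes
	}
	return frames, buf
}

// frame builds a fake message: magic, request flag, zero request ID, n payload bytes.
func frame(n int) []byte {
	b := make([]byte, headerLen+n)
	b[0], b[1], b[2] = 0xda, 0xbb, 0xc2
	binary.BigEndian.PutUint32(b[12:16], uint32(n))
	return b
}

func main() {
	one := frame(220)
	f, rest := splitFrames(one)
	fmt.Println(len(f), len(rest)) // 1 0   (case 1)
	f, rest = splitFrames(one[:100])
	fmt.Println(len(f), len(rest)) // 0 100 (case 2)
	two := append(frame(220), frame(10)...)
	f, rest = splitFrames(two[:250])
	fmt.Println(len(f), len(rest)) // 1 14  (case 3)
}
```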

The three scenarios are as follows:

Figure 9: Several situations at the receiving end of the application layer

At this point, some readers will say this is just the classic TCP "sticky packet" and "packet splitting" problem. But I'd rather not use those terms for the cases above. TCP is a byte-stream protocol and has no such thing as "sticky packets" or "split packets". TCP does not care how upper-layer data is structured; to it, everything is bytes, and its only job is to deliver them to the target process reliably and in order. Cases 2 and 3 are for the application layer to handle. Accordingly, Dubbo's code contains the corresponding logic; interested readers can look at the NettyCodecAdapter.InternalDecoder#decode method.

That's all for this section. One question to leave you with: GoReplay's code does not handle case 3, so why does recording HTTP traffic not go wrong?

4.3.3 GoReplay transformation

4.3.3.1 Introduction to transformation

GoReplay's community edition currently supports only HTTP traffic recording. Its commercial edition supports some binary protocols, but not Dubbo. So to meet our internal needs, secondary development was the only option. However, because the community edition's code is tightly coupled to its HTTP handling logic, supporting recording for a new protocol is still fairly troublesome. In our implementation, the changes to GoReplay mainly cover Dubbo protocol identification, Dubbo traffic filtering, and packet completeness checks. Decoding and deserializing the packets is done by a Java program, which converts the deserialized result to JSON for storage. The result looks like this:

Figure 10: Dubbo traffic recording effect

GoReplay uses three monkey emojis 🐵🙈🙉 as the separator between requests, which looks rather funny at first glance.
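The identification and filtering steps boil down to a few bit checks on the Dubbo header. Our changes live in GoReplay's Go code; the sketch below is written in Java only to match the other examples, and the bit operations carry over to Go directly. It assumes the Dubbo 2.x header layout (magic 0xdabb in bytes 0-1; flag byte 2 with 0x80 = request and 0x20 = event). Treating heartbeat events as noise is an example rule, not necessarily the exact filter we used, and DubboTrafficFilter is a hypothetical name.

```java
/** Hypothetical filter deciding whether a captured payload is a Dubbo request worth recording. */
final class DubboTrafficFilter {
    static final int MAGIC_HIGH = 0xda;   // header byte 0
    static final int MAGIC_LOW = 0xbb;    // header byte 1
    static final int FLAG_REQUEST = 0x80; // flag byte (byte 2): request rather than response
    static final int FLAG_EVENT = 0x20;   // flag byte (byte 2): event frame, e.g. heartbeat

    /** Protocol identification: does the payload start with the Dubbo magic number? */
    static boolean isDubbo(byte[] payload) {
        return payload.length >= 2
                && (payload[0] & 0xff) == MAGIC_HIGH
                && (payload[1] & 0xff) == MAGIC_LOW;
    }

    /** Traffic filtering: keep business requests, drop responses and heartbeats. */
    static boolean shouldRecord(byte[] payload) {
        if (!isDubbo(payload) || payload.length < 3) {
            return false;
        }
        int flag = payload[2] & 0xff;
        return (flag & FLAG_REQUEST) != 0 && (flag & FLAG_EVENT) == 0;
    }
}
```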

4.3.3.2 Introduction to the GoReplay plugin mechanism

You may be wondering how GoReplay works together with a Java program. The principle is actually simple. Let's first look at how to enable GoReplay's plugin (middleware) mode:

gor --input-raw :80 --middleware "java -jar xxx.jar" --output-file request.gor

A command can be passed to GoReplay through the --middleware parameter, and GoReplay starts a process to execute it. During recording, GoReplay communicates with the plugin process through that process's standard input and output. The data flow is roughly as follows:

+-------------+     Original request     +--------------+     Modified request      +-------------+
|  Gor input  |----------STDIN---------->|  Middleware  |----------STDOUT---------->| Gor output  |
+-------------+                          +--------------+                           +-------------+
  input-raw                              java -jar xxx.jar                            output-file
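As a sketch of the plugin side of this contract, the hypothetical class below reads one hex-encoded message per line from STDIN, decodes it, and writes the (possibly modified) message back to STDOUT as hex. It assumes GoReplay's documented middleware format: the decoded message starts with a space-separated meta line whose first field is the payload type (1 = request, 2 = original response, 3 = replayed response), followed by a newline and the raw payload. It is a skeleton, not our actual plugin; the Dubbo decoding step is only marked by a comment.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

/** Hypothetical GoReplay middleware skeleton: hex line in, hex line out. */
final class HexLineMiddleware {

    public static byte[] decodeHex(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static String encodeHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(Character.forDigit((b >> 4) & 0xf, 16)).append(Character.forDigit(b & 0xf, 16));
        }
        return sb.toString();
    }

    /** Returns the hex line to emit, or null to drop the message. */
    public static String process(String hexLine) {
        byte[] message = decodeHex(hexLine.trim());
        int newline = -1;
        for (int i = 0; i < message.length; i++) {
            if (message[i] == '\n') {
                newline = i;
                break;
            }
        }
        if (newline < 0) {
            return null; // no meta line: not a message we understand
        }
        String meta = new String(message, 0, newline, StandardCharsets.US_ASCII);
        if (!meta.startsWith("1 ")) {
            return null; // this sketch only forwards requests (payload type 1)
        }
        // A real plugin would decode the payload after the meta line here (e.g. Dubbo -> JSON)
        return encodeHex(message);
    }

    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in, StandardCharsets.US_ASCII));
        String line;
        while ((line = in.readLine()) != null) {
            try {
                String result = process(line);
                if (result != null) {
                    System.out.println(result);
                    System.out.flush(); // GoReplay reads line by line, so don't leave output buffered
                }
            } catch (RuntimeException | StackOverflowError e) {
                System.err.println("skip message: " + e); // STDOUT is reserved for data
            }
        }
    }
}
```

Two design choices in the sketch are worth noting: logs go to STDERR because STDOUT is the data channel, and each message is guarded by its own catch so that one bad message cannot end the loop. Catching StackOverflowError explicitly matters because it is an Error, not an Exception.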

4.3.3.3 Implementation approach for the Dubbo decoding plugin

Decoding the Dubbo protocol is relatively easy to implement; after all, much of the code already exists in the Dubbo framework, and we only need to adapt it as needed. The protocol header is parsed in the DubboCodec#decodeBody method, and the message body in the DecodeableRpcInvocation#decode(Channel, InputStream) method. Since GoReplay has already parsed and handled the numeric header fields, the plugin does not need to parse most of them; it only needs the Serialization ID, which tells us how to perform the subsequent deserialization.
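For reference, here is a minimal Java sketch of reading the fixed 16-byte Dubbo 2.x header, with the serialization ID taken from the low 5 bits of the flag byte (Dubbo's default Hessian2 serialization uses ID 2). Only the serialization ID is needed by our plugin; the request ID and body length are shown for completeness. DubboHeader is a hypothetical class, not part of Dubbo's API.

```java
import java.nio.ByteBuffer;

/** Hypothetical view of selected fields of the fixed 16-byte Dubbo 2.x header. */
final class DubboHeader {
    final byte serializationId; // flag byte & 0x1f, e.g. 2 = hessian2
    final long requestId;       // bytes 4-11
    final int bodyLength;       // bytes 12-15

    private DubboHeader(byte serializationId, long requestId, int bodyLength) {
        this.serializationId = serializationId;
        this.requestId = requestId;
        this.bodyLength = bodyLength;
    }

    static DubboHeader parse(byte[] frame) {
        if (frame.length < 16 || (frame[0] & 0xff) != 0xda || (frame[1] & 0xff) != 0xbb) {
            throw new IllegalArgumentException("not a Dubbo frame");
        }
        ByteBuffer buf = ByteBuffer.wrap(frame); // big-endian, as on the wire
        byte serializationId = (byte) (frame[2] & 0x1f);
        // byte 3 (response status) is not needed for requests
        return new DubboHeader(serializationId, buf.getLong(4), buf.getInt(12));
    }
}
```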

Decoding the message body is a bit more involved. We copied the DecodeableRpcInvocation code into the plugin project and modified it: unnecessary logic was removed and only the decode method was kept, turning it into a utility class. Since it is not convenient for the plugin to import the jar packages of the applications being recorded, the type-dependent logic also had to be removed from decode. The modified code looks roughly like this:

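import java.io.ByteArrayInputStream;
import java.io.IOException;

// Dubbo imports (CodecSupport, ObjectInput) omitted; their packages depend on the Dubbo version
public class RpcInvocationCodec {

    public static MyRpcInvocation decode(byte[] bytes, byte serializationId)
            throws IOException, ClassNotFoundException {
        // Pick the serializer by its ID and wrap the raw body bytes in a stream
        ObjectInput in = CodecSupport.getSerializationById(serializationId)
                .deserialize(null, new ByteArrayInputStream(bytes));

        MyRpcInvocation rpcInvocation = new MyRpcInvocation();
        String dubboVersion = in.readUTF();
        // ......
        rpcInvocation.setMethodName(in.readUTF());

        // Original code: Class<?>[] pts = DubboCodec.EMPTY_CLASS_ARRAY;
        // pts is changed to String[]: generic invocation needs the list of parameter type names.
        // desc2className is a helper in our plugin that turns a type descriptor into class names.
        String[] pts = desc2className(in.readUTF());
        Object[] args = new Object[pts.length];
        for (int i = 0; i < args.length; i++) {
            // Original code: args[i] = in.readObject(pts[i]);
            // No longer depends on concrete types; objects are deserialized directly into Maps
            args[i] = in.readObject();
        }
        rpcInvocation.setArguments(args);
        rpcInvocation.setParameterTypeNames(pts);

        return rpcInvocation;
    }
}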

From a pure coding perspective this was not very difficult, provided you have some understanding of Dubbo's source code. For me, most of the time went into modifying GoReplay, mainly because I was not familiar with Go, so writing and debugging code was slow. When the feature was written, debugged, and producing correct output, I was really happy. That happiness did not last long, though. Soon afterwards, during online verification with colleagues from the business team, the plugin crashed, which was quite embarrassing. I stared at the error message, bewildered, and could not fix it on the spot, so to save a little face I quickly ended the verification 🤪. Investigation showed that when certain deserialized data was converted into JSON, the conversion recursed endlessly and eventually threw a StackOverflowError. Because the plugin's main loop is single-threaded and only catches Exception (StackOverflowError is an Error, not an Exception), the error made the plugin exit.

Figure 11: Circular dependency causes Gson framework to report errors

This error indicates a circular reference between classes, and since our plugin code does not handle circular references, the error itself makes sense. But when I found the business code that triggered it, I could not see any circular reference. I only found the trick after debugging locally. Code similar to the business code is as follows:

public class Outer {
    private Inner inner;

    public class Inner {
        private Long xyz;

        public Inner() {
        }
    }
}

The problem lies in the inner class: a non-static inner class such as Inner implicitly holds a reference to its Outer instance, and, unsurprisingly, it is the compiler that adds it. The source code hides no secrets: decompile the inner class's class file and everything becomes clear.

Figure 12: Decompilation results of internal classes
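The hidden field can also be confirmed without a decompiler. The sketch below mirrors the Outer/Inner example and lists Inner's declared fields via reflection; this$0 shows up and Field#isSynthetic() reports it as synthetic. One caveat, stated as an assumption: recent javac versions may omit this$0 when the inner class never uses its enclosing instance, so this Inner deliberately refers to Outer.this. SyntheticFieldInspector is a hypothetical name.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

class Outer {
    private Inner inner;

    class Inner {
        private Long xyz;

        // Uses the enclosing instance, so the compiler must keep the this$0 field
        Outer owner() {
            return Outer.this;
        }
    }
}

final class SyntheticFieldInspector {
    /** Describes every declared field as "name" or "name (synthetic)". */
    static List<String> describeFields(Class<?> type) {
        List<String> result = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            result.add(f.getName() + (f.isSynthetic() ? " (synthetic)" : ""));
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints Inner's fields; the compiler-generated this$0 is marked synthetic
        System.out.println(describeFields(Outer.Inner.class));
    }
}
```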

This is basic Java knowledge, but I rarely use inner classes, so when I first read the code I missed the circular reference hidden in it. That explains the error, but is this the end of the story? Not quite. In fact, Gson does not throw an error when serializing an Outer object directly. Debugging showed that it excludes this$0. The exclusion logic is as follows:

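public final class Excluder implements TypeAdapterFactory, Cloneable {
    public boolean excludeField(Field field, boolean serialize) {
        // ......

        // Check whether the field is synthetic (compiler-generated, such as this$0)
        if (field.isSynthetic()) {
            return true;
        }

        // ......
    }
}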

So why does an error occur when we convert the recorded traffic into JSON? Because our plugin cannot get the interface's parameter types when deserializing, it deserializes the parameters into Map objects, so this$0 is stored in the Map as an ordinary key-value pair. Gson's filtering rule no longer applies, this$0 cannot be filtered out, and the conversion loops endlessly until the stack overflows. Now that we know the cause, how can the problem be solved? The next section explains.
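The failure mode is easy to reproduce without Gson. Once this$0 is just another Map entry, the inner Map points back to the outer Map, and any naive recursive Map-to-JSON conversion never terminates. The hypothetical converter below is a minimal sketch (not Gson's API, and not the fix we adopted) that tracks the Maps on the current path by identity and emits a marker instead of recursing into a cycle. Identity matters here: putting a self-referencing HashMap into an ordinary HashSet would call its hashCode(), which recurses in exactly the same way.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical Map-to-JSON converter that stops at reference cycles (no string escaping). */
final class CycleSafeJson {

    static String toJson(Object value) {
        StringBuilder sb = new StringBuilder();
        // Identity-based set: equals()/hashCode() of a cyclic Map would recurse forever
        Set<Object> path = Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
        write(value, sb, path);
        return sb.toString();
    }

    private static void write(Object value, StringBuilder sb, Set<Object> path) {
        if (value instanceof Map) {
            Map<?, ?> map = (Map<?, ?>) value;
            if (!path.add(map)) {
                sb.append("\"<cycle>\""); // this Map is already being written higher up
                return;
            }
            sb.append('{');
            boolean first = true;
            for (Map.Entry<?, ?> e : map.entrySet()) {
                if (!first) {
                    sb.append(',');
                }
                first = false;
                sb.append('"').append(e.getKey()).append("\":");
                write(e.getValue(), sb, path);
            }
            sb.append('}');
            path.remove(map); // leaving this Map: shared, non-cyclic references stay allowed
        } else if (value instanceof String) {
            sb.append('"').append(value).append('"');
        } else {
            sb.append(value);
        }
    }
}
```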

4.3.3.4 Tackling the problem head-on

At first I considered cleaning the data in the Map manually, but that turned out to be hard. If the Map has a complex structure, for example many levels of nesting, the cleaning logic becomes difficult to write, and I could not tell what other pitfalls might be lurking. So I gave up on that idea and left this kind of dirty work to the deserialization tool. For that, we need a way to obtain the interface's parameter types: how can the plugin get the parameter types from a business application's API? One option is to download the target application's jar package when the plugin starts and load it with a separate class loader. But there is a catch: the business application's API jar has dependencies of its own. Should those be downloaded recursively as well? The second option is simple and blunt: declare the business applications' API dependencies directly in the plugin project and package it as a fat jar. This needs neither a separate class loader nor recursive downloading of dependencies. The only obvious drawback is that the plugin's pom picks up some unrelated dependencies, which is minor compared with the benefits. For convenience, we depended on the APIs of many business applications at once and ended up with a pom configuration like this:

<project>
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.xxx.middleware</groupId>
    <artifactId>DubboParser</artifactId>
    <version>1.0</version>

    <dependencies>
        <dependency>
            <groupId>com.xxx</groupId>
            <artifactId>app-api-1</artifactId>
            <version>1.0</version>
        </dependency>
        <dependency>
            <groupId>com.xxx</groupId>
            <artifactId>app-api-2</artifactId>
            <version>1.0</version>
        </dependency>
        <!-- ...... -->
    </dependencies>
</project>

Next, we need to change the RpcInvocationCodec#decode method, which in effect means restoring the original code 😓:

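import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Arrays;

// Dubbo imports (CodecSupport, ObjectInput, ReflectUtils) omitted; their packages depend on the Dubbo version
public class RpcInvocationCodec {

    public static MyRpcInvocation decode(byte[] bytes, byte serializationId)
            throws IOException, ClassNotFoundException {
        ObjectInput in = CodecSupport.getSerializationById(serializationId)
                .deserialize(null, new ByteArrayInputStream(bytes));

        MyRpcInvocation rpcInvocation = new MyRpcInvocation();
        String dubboVersion = in.readUTF();
        // ......
        rpcInvocation.setMethodName(in.readUTF());

        // Parse the interface's parameter types from the type descriptor
        String desc = in.readUTF();
        Class<?>[] pts = ReflectUtils.desc2classArray(desc);
        Object[] args = new Object[pts.length];
        for (int i = 0; i < args.length; i++) {
            // Deserialize according to the concrete parameter type
            args[i] = in.readObject(pts[i]);
        }
        rpcInvocation.setArguments(args);
        // MyRpcInvocation stores type names (String[]), so convert the resolved classes
        rpcInvocation.setParameterTypeNames(
                Arrays.stream(pts).map(Class::getName).toArray(String[]::new));

        return rpcInvocation;
    }
}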

With the code adjusted, we went live for verification the next day. Everything worked, which was very gratifying. Soon, however, I realized there were hidden risks: if they ever surfaced in production, they would be hard to troubleshoot.

4.3.3.5 Potential problems

Consider this scenario: the API jars of business applications A and B both depend on some internal common packages, possibly in different versions. How do we resolve this dependency conflict? And what if the internal common package is poorly maintained and has compatibility issues?

Figure 13: Schematic diagram of dependency conflict

For example, the common package versions here conflict, and 3.0 is not compatible with 1.0. How should we handle this?

The simple approach: instead of depending on every business application's API package in the plugin pom, depend on only one. The downside is that the plugin has to be built separately for each application, which we clearly don't want.

A step further: the plugin does not depend on any business application's API package at all, which keeps the plugin code clean and avoids repackaging it every time. How, then, do we get the business application's API jar? The answer is to create a separate project for each API jar and package that project as a fat jar; the plugin then loads business classes through a custom class loader. When the plugin starts, it downloads the jar to the machine according to its configuration. Only one jar needs to be loaded each time, so there is no dependency conflict. This approach solves the problem.
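A minimal sketch of this design, assuming each application's API fat jar has already been downloaded to a local file. Each jar gets its own URLClassLoader whose parent is the JDK's platform (extension) loader, so JDK classes are shared while the plugin's own classpath and other applications' jars stay invisible; that isolation is what keeps conflicting versions of a common package apart. AppApiLoaders and its methods are hypothetical. Depending on the Dubbo version, deserialization may resolve classes through the thread context class loader, so a real plugin may also need to set that to the application's loader before decoding.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical registry with one isolated class loader per application API fat jar. */
final class AppApiLoaders {
    private final Map<String, URLClassLoader> loaders = new ConcurrentHashMap<>();

    /** Registers the (already downloaded) fat jar of one application. */
    void register(String app, File fatJar) throws MalformedURLException {
        URL[] urls = { fatJar.toURI().toURL() };
        // Parent = platform/extension loader: JDK classes are shared, but the plugin's
        // classpath and other applications' jars are invisible to this loader.
        ClassLoader jdkOnly = ClassLoader.getSystemClassLoader().getParent();
        loaders.put(app, new URLClassLoader(urls, jdkOnly));
    }

    /** Resolves a business class (e.g. a parameter type) in the given application's loader. */
    Class<?> loadClass(String app, String className) throws ClassNotFoundException {
        URLClassLoader loader = loaders.get(app);
        if (loader == null) {
            throw new ClassNotFoundException("no API jar registered for " + app);
        }
        return Class.forName(className, false, loader);
    }
}
```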

Going further still: when I read the source code of Alibaba's open-source jvm-sandbox project earlier, I noticed that it implements a class loader with a routing function. Could our plugin build a similar loader? Out of curiosity I tried it, and it worked. The final implementation is as follows:

Figure 14: Schematic diagram of custom class loading mechanism

The first-level class loader routes by "fragments" of the package name, and the second-level class loaders do the actual loading. The applications' API jars are placed in one folder that only the second-level class loaders can load from. Some JDK classes, such as List, must still be loaded by the JVM's built-in class loaders. Finally, I should point out that I built this routing class loader mainly for fun. Although it achieves the goal, the previous approach is the safer choice for a real project.
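The routing idea can be sketched in a few lines. In the hypothetical loader below, JDK classes (java.* and javax.*) follow normal parent delegation, and every other class is routed to a second-level loader chosen by the longest matching package-name prefix. Using a prefix as the "fragment" is an assumption made for the sketch, and this is not jvm-sandbox's code; in practice the second-level loaders would be per-application URLClassLoaders.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical first-level loader that routes classes to second-level loaders by package prefix. */
final class RoutingClassLoader extends ClassLoader {
    private final Map<String, ClassLoader> routes = new ConcurrentHashMap<>();

    RoutingClassLoader(ClassLoader parent) {
        super(parent);
    }

    /** Example: addRoute("com.xxx.order.", orderApiLoader). */
    void addRoute(String packagePrefix, ClassLoader target) {
        routes.put(packagePrefix, target);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        if (name.startsWith("java.") || name.startsWith("javax.")) {
            return super.loadClass(name, resolve); // JDK classes: normal parent delegation
        }
        // Pick the second-level loader with the longest matching prefix
        ClassLoader target = null;
        int bestLength = -1;
        for (Map.Entry<String, ClassLoader> route : routes.entrySet()) {
            String prefix = route.getKey();
            if (name.startsWith(prefix) && prefix.length() > bestLength) {
                target = route.getValue();
                bestLength = prefix.length();
            }
        }
        if (target == null) {
            throw new ClassNotFoundException("no route for " + name);
        }
        return target.loadClass(name);
    }
}
```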

4.4 Bearing fruit: landing a new scenario

At that time, the main and only use of our traffic recording and playback system was stress testing. Once the system was stable, we began to consider what other scenarios it could serve. I had tried jvm-sandbox-repeater during the technology selection stage; its main application scenario is traffic comparison testing. For changes such as code refactoring that do not alter the structure of an interface's return value, a traffic comparison test can verify whether the change introduces problems. The senior colleagues felt that jvm-sandbox-repeater and the underlying jvm-sandbox were somewhat heavyweight and technically complex, and there were no resources to develop and maintain these two tools, so we hoped to build this capability on top of the traffic recording and playback system and get the process running first.

The project was led by the QA team, who developed the traffic replay and diff functions, while we provided the underlying recording capability. The system works as follows:

Figure 15: Schematic diagram of comparison test

Our recording system feeds real-time traffic data to the replayer. As soon as the replayer receives the data, it replays it against both the pre-release and the online environment. After replaying, the replayer obtains the results returned by the two environments and passes them to the comparison module, and the comparison results are finally stored in the database. During the comparison, users can see which requests failed. On the recording side, be sure to filter out the replayed traffic; otherwise the interface's QPS doubles and the replay turns into a stress test 🤣. That almost caused an incident for me.
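The replay and diff modules were built by the QA team, so the following is only a hypothetical sketch of what a comparison step can look like, not their implementation. It assumes both responses have been decoded into nested Maps, Lists, and plain values, walks them in parallel, and reports JSON-path-like locations that differ. Fields that legitimately differ between environments (timestamps, trace IDs) are skipped via an ignore list; ResponseDiff and the path syntax are our own choices.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;

/** Hypothetical comparison step: lists paths where two decoded responses differ. */
final class ResponseDiff {

    static List<String> diff(Object expected, Object actual, Set<String> ignoredPaths) {
        List<String> differences = new ArrayList<>();
        compare("$", expected, actual, ignoredPaths, differences);
        return differences;
    }

    private static void compare(String path, Object a, Object b, Set<String> ignored, List<String> out) {
        if (ignored.contains(path)) {
            return; // noise such as timestamps or trace IDs
        }
        if (a instanceof Map && b instanceof Map) {
            Map<?, ?> ma = (Map<?, ?>) a;
            Map<?, ?> mb = (Map<?, ?>) b;
            Set<Object> keys = new LinkedHashSet<>(ma.keySet());
            keys.addAll(mb.keySet());
            for (Object key : keys) {
                String child = path + "." + key;
                if (!ma.containsKey(key) || !mb.containsKey(key)) {
                    if (!ignored.contains(child)) {
                        out.add(child + ": present on only one side");
                    }
                } else {
                    compare(child, ma.get(key), mb.get(key), ignored, out);
                }
            }
        } else if (a instanceof List && b instanceof List) {
            List<?> la = (List<?>) a;
            List<?> lb = (List<?>) b;
            if (la.size() != lb.size()) {
                out.add(path + ": size " + la.size() + " != " + lb.size());
                return;
            }
            for (int i = 0; i < la.size(); i++) {
                compare(path + "[" + i + "]", la.get(i), lb.get(i), ignored, out);
            }
        } else if (!Objects.equals(a, b)) {
            out.add(path + ": " + a + " != " + b);
        }
    }
}
```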

This project has been online for 3 months and has helped the business lines find 3 serious bugs and 6 general issues, so its value is already showing. Although we did not lead the project, as the provider of the underlying service we are very happy too. I hope we can find more usage scenarios for our system in the future and let it grow into a flourishing tree.

5. Project Outcomes

As of publication, the project has been live for nearly a year. Five applications have been onboarded, and recording and playback have been run about four to five hundred times in total. The usage numbers look a bit modest, mainly because the company's business is toB and there is not much demand for stress testing. Even so, as a stress testing system it has delivered real value, in two main respects:

Performance problem discovery: the stress testing platform found more than a dozen performance problems for the business lines and helped the middleware team discover 6 serious problems in basic components.
Improved efficiency: the new stress testing system is simple and easy to use, and an online traffic recording takes only 10 minutes, whereas the same work used to take one person half a day. That is an efficiency gain of at least 20x and a much better user experience. One piece of evidence is that more than 90% of stress testing tasks are now completed on the new platform.

You may doubt the efficiency numbers. Consider how you would obtain online traffic without a recording tool. The traditional approach is for business developers to modify the interface code and add logs, while keeping an eye on log volume. The modified code is then released to production; for larger applications a release involves dozens of machines, which takes considerable time. Next, the interface parameter data is extracted from the log files, and finally it is converted into stress test scripts. That is the traditional process, and every step is time-consuming. Companies with mature infrastructure can of course obtain interface data from a full-link tracing platform, but most companies probably still have to use the traditional method. On our platform, you only select the target application, the interface, and the recording duration, then click the record button. That is all the user has to do, so the efficiency gain is obvious.

6. Looking to the future

Although the project has been online for a year, manpower is limited (I am basically the only developer and maintainer), so iteration is still fairly slow. Based on problems encountered in practice, here are a few obvious ones that I hope to solve one by one in the future.

1. Full-link node load graph

Currently, during a stress test, testers must open the monitoring pages of many applications on the monitoring platform and keep switching between them. In the future, we hope to display the load graph of every node on the full link and send each node's alerts to the testers, reducing the monitoring cost of stress testing.

2. Status collection and visualization for the stress testing tool

The stress testing tool itself has useful status information, such as the task queue backlog and the current number of goroutines, which can help troubleshoot a failed stress test. For example, if the number of tasks in the queue keeps growing and the number of goroutines stays high, what can we infer? Most likely the application under test is overloaded: its RT (response time) grows, the fixed number of load-generating goroutines stay blocked for a long time, and the queue eventually backs up. GoReplay currently prints this status information to the console, which is inconvenient to check, and there is no alerting, so problems can only be checked passively after something goes wrong. I hope to put this status data on the monitoring platform in the future, which would make the experience much better.

3. Load awareness and automatic adjustment

Currently, the stress testing system is not aware of the load on the business application: whatever state the application under test is in, the system applies load according to the preset settings. Because GoReplay sends requests with a fixed number of goroutines, a slower application automatically receives fewer requests, so this is not a concern for now. But GoReplay's concurrency model may change in the future: for example, if a goroutine were dispatched to send a request as soon as any task appeared in the queue, that would pose a great risk to business applications.

There are other problems, but they are less important, so I won't list them here. In general, our current stress testing needs are modest and the QPS is not high, so many optimizations are not worth doing, such as performance tuning of the load-generating machines or scaling them in and out dynamically. After all, we only have 4 load-generating machines and the default configuration fully meets our needs, so I can't be bothered to tinker with these 🤪. Of course, from the perspective of improving personal technical skills, these optimizations are still valuable, and I can play with them when I have time.

7. Personal gains

7.1 Technical gains

1. Getting started with Go language

Since GoReplay is written in Go and we did run into problems using it, we had to dig into its source code. To master the tool and make troubleshooting and secondary development easier, I set out to learn Go. My current level is introductory, a rookie. Having used Java for a long time, I was quite confused when I started learning Go. Take Go's method definitions, for example:

type Rectangle struct {
    Length uint32
    Width  uint32
}

// Compute the area
func (r *Rectangle) Area() uint32 {
    return r.Length * r.Width
}

At the time, this syntax struck me as very strange: what on earth is that declaration in front of the method name Area? Fortunately, I knew some C, so I asked myself: how would I implement object orientation in C?

#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>

struct Rectangle {
    uint32_t length;
    uint32_t width;

    // Member function pointer declaration
    uint32_t (*Area) (struct Rectangle *rect);
};

uint32_t Area(struct Rectangle *rect) {
    return rect->length * rect->width;
}

struct Rectangle *newRect(uint32_t length, uint32_t width)
{
    struct Rectangle *rp = (struct Rectangle *) malloc(sizeof(struct Rectangle));
    if (rp == NULL) {
        return NULL;
    }
    rp->length = length;
    rp->width = width;

    // Bind the function
    rp->Area = Area;
    return rp;
}

int main()
{
    struct Rectangle *rp = newRect(5, 8);
    if (rp == NULL) {
        return 1;
    }
    // The object is passed explicitly, much like a Go method receiver
    uint32_t area = rp->Area(rp);
    printf("area: %" PRIu32 "\n", area);
    free(rp);
    return 0;
}

Once you understand the code above, you'll see why Go methods are defined this way.

As I learned more, I found that Go's syntax really is similar to C's, and Go even has pointers. Go is often called the C of the 21st century, and the title is well deserved. So while learning, I couldn't help comparing the two and learning Go based on my C experience. That's why the code below horrified me:

func NewRectangle(length, width uint32) *Rectangle {
    var rect Rectangle = Rectangle{length, width}
    return &rect
}

func main() {
    fmt.Println(NewRectangle(4, 5).Area())
}

I expected the operating system to mercilessly hit me with a segmentation fault, but the code compiled and ran without any problem... Huh? Was I wrong? I looked at it again and still thought my reasoning was sound: in C you cannot return a pointer to stack memory, so Go shouldn't allow it either. This is exactly where the two languages differ. The Rectangle above looks as if it is allocated on the stack, but because its address escapes the function, the Go compiler's escape analysis actually allocates it on the heap. In that respect it behaves like Java, where objects live on the heap.

In general, Go's syntax is similar to C's, and C was the language I first learned programming with, so Go feels very familiar and I like it: the syntax is simple, the standard library is rich and easy to use, and the overall experience is good. Of course, I'm still a novice and haven't written any major projects in Go, so my understanding of the language is still shallow. Please forgive any mistakes above.

2. A better grasp of how GoReplay works

I have read basically all of GoReplay's core recording and playback logic and have shared articles about it on our intranet. Here is a brief overview of the tool. GoReplay's design abstracts several concepts: input and output represent the source and destination of data, and middleware sits between the input and output modules to provide an extension mechanism. Inputs and outputs can be combined flexibly and can even form a cluster.

Figure 16: Schematic diagram of GoReplay cluster

During recording, each TCP segment is abstracted as a packet. When a large message has to be split into multiple segments for transmission, the receiver must reassemble the segments in order and deal with problems such as out-of-order and duplicate segments, so that the next module receives a complete, error-free HTTP message. This logic is encapsulated in tcp_message, and one tcp_message corresponds to one or more packets. Downstream logic takes the data out of tcp_message, tags it, and passes it to the middleware (optional) or the output module.
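The reassembly idea can be illustrated with a small Java sketch. It is not GoReplay's tcp_message code: segments are keyed by TCP sequence number in a sorted map, exact retransmissions are ignored, partially overlapping bytes are skipped, and a gap means the message is not ready yet. Sequence-number wraparound and deciding where a message ends (for HTTP, the headers plus Content-Length or chunked encoding) are left out. SegmentAssembler is a hypothetical name.

```java
import java.io.ByteArrayOutputStream;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical per-message reassembler (ignores 32-bit sequence wraparound). */
final class SegmentAssembler {
    // Segments keyed by TCP sequence number; TreeMap iterates them in ascending order
    private final TreeMap<Long, byte[]> segments = new TreeMap<>();

    /** Adds a captured segment; an exact retransmission (same seq) is ignored. */
    void add(long seq, byte[] payload) {
        segments.putIfAbsent(seq, payload);
    }

    /** Returns the contiguous bytes from firstSeq onward, or null if a segment is missing. */
    byte[] assemble(long firstSeq) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long expected = firstSeq; // next sequence number we need
        for (Map.Entry<Long, byte[]> e : segments.entrySet()) {
            long seq = e.getKey();
            byte[] data = e.getValue();
            if (seq > expected) {
                return null; // gap: some segment has not been captured yet
            }
            long overlap = expected - seq; // bytes already taken from earlier segments
            if (overlap >= data.length) {
                continue; // segment is fully covered already
            }
            out.write(data, (int) overlap, data.length - (int) overlap);
            expected = seq + data.length;
        }
        return out.toByteArray();
    }
}
```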

The playback process is relatively simple, but it still follows the input → [middleware] → output flow. Usually the input module is input-file and the output module is output-http. An interesting point in playback is how speed-up playback works: it divides the interval between consecutive requests by the speed factor, and the code that implements it is very simple.
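The speed-up principle amounts to one division. A minimal Java sketch, assuming recorded timestamps in nanoseconds: each gap between consecutive requests is divided by the speed factor, so 2.0 halves every wait and 0.5 doubles it. GoReplay exposes this as a percentage suffix on the input file (for example `--input-file "requests.gor|200%"`); the class and method names below are hypothetical, and the send actions stand in for whatever output is used.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of speed-adjusted playback scheduling. */
final class SpeedUpReplay {

    /** Wait time (ns) before each request: the recorded gap divided by the speed factor. */
    static long[] delays(long[] recordedNanos, double speedFactor) {
        if (speedFactor <= 0) {
            throw new IllegalArgumentException("speedFactor must be positive");
        }
        long[] delays = new long[recordedNanos.length];
        // delays[0] stays 0: the first request is sent immediately
        for (int i = 1; i < recordedNanos.length; i++) {
            long gap = recordedNanos[i] - recordedNanos[i - 1];
            delays[i] = (long) (gap / speedFactor);
        }
        return delays;
    }

    /** Sends the requests in order, sleeping the scaled gap before each one. */
    static void replay(long[] recordedNanos, double speedFactor, Runnable[] sends)
            throws InterruptedException {
        long[] delays = delays(recordedNanos, speedFactor);
        for (int i = 0; i < sends.length; i++) {
            TimeUnit.NANOSECONDS.sleep(delays[i]);
            sends[i].run();
        }
    }
}
```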

In general, the tool's core code is small, yet it is quite feature-rich; give it a try.

3. Deeper knowledge of the Dubbo framework and the class loading mechanism

While implementing Dubbo traffic recording, I read basically all of the decoding logic. I had read it before and written articles about it, but this time I had to customize the code, and having to deal with practical problems taught me more than just reading the source and writing articles. Along the way, because I needed a custom class loader, I also got a better understanding of the class loading mechanism, especially the class loader with a routing function, which was a lot of fun. Of course, learning these technologies is no big deal in itself; what matters is finding and solving problems.

4. Other gains

The other gains are relatively minor, so I won't go into them here; instead I leave them as questions for you to think about:

TCP guarantees in-order delivery of data. Why does GoReplay, which works at the application layer, still have to handle out-of-order data?
What does the HTTP/1.1 communication process look like? What problems arise if two HTTP requests are sent back to back on one TCP connection?

7.2 Lessons and thoughts

1. Be cautious with technology selection

I didn't have much experience in model selection at the beginning, and the inspection dimensions were few and not comprehensive e