Author: Di Mo
Review & proofreading: Fengyun
Editing & Typesetting: Wen Yan
Introduction
Performance Testing Service (PTS) is Alibaba Cloud's SaaS-based performance testing tool. Ten years have passed since it was first built to accurately simulate the traffic peaks of Double Eleven. It now supports tens of thousands of stress testing tasks across the group every year, including Double Eleven itself, and serves as the "early verifier" of Alibaba's internal Double Eleven technical architecture.
As a SaaS-based stress testing tool, PTS supports launching stress testing tasks on demand and can generate millions of concurrent connections and tens of millions of TPS. It is also 100% compatible with JMeter. It provides features such as scenario orchestration, API debugging, traffic customization, and traffic recording, so business stress test scripts can be created quickly. Its load generators cover the nodes of multiple carriers in hundreds of cities across the country, making it possible to accurately simulate users at different tiers accessing a business system and helping businesses quickly improve system performance and stability. PTS has been widely used in retail, finance, online education, and other fields.
Today, PTS capabilities are upgraded again. The stress testing protocol upgrade further expands the range of supported protocols and applicable scenarios, so you no longer need to worry about being unable to stress test systems built on different technical architectures. The new low-threshold, massive-traffic self-service capability frees stress testing teams from development, operations, and maintenance work: start a stress test with one click and you have self-service stress testing at millions of concurrent users. Finally, the productized ability to run safe, non-intrusive write stress tests in the production environment means that simply installing a probe gives you write stress testing in production, so that no business scenario is "left behind" during production stress tests, and system performance and capacity can be assessed more comprehensively and accurately.
The newly released/upgraded features are as follows:
- Support HTTP 2 protocol.
- Support streaming media RTMP/HLS protocol.
- Support WebSocket protocol.
- Support MQTT protocol.
- Support SpringCloud/Dubbo microservice protocol.
- Self-service stress testing with up to 1 million concurrent users.
- Safe, non-intrusive write stress testing in the production environment.
Stress testing protocol support upgrade
Protocols are the "language" in which application systems communicate, and in the face of today's diversified scenarios, the protocols adopted by different types of systems are quietly changing. HTTP, the most widely used transport protocol, mainly carries text content and has carried the bulk of Internet traffic in the past. Now that we face rich, diversified content, HTTP is clearly no longer the only choice for our technical services. Streaming media protocols act as the porters of the video content you watch; when you post "YYDS" on a video, your client talks to the server over the WebSocket protocol; the smart watch on your wrist and the smart appliances in your home may keep their data in sync with cloud services over the MQTT protocol; and even when you are simply browsing text content, the underlying protocol is evolving from HTTP 1.1 to HTTP 2, to HTTP 3, and beyond.
For technicians, developers, and testers facing rapid business iteration, understanding each interaction protocol is itself a headache. Stress testing is similar: we obviously cannot accept the cost of building a custom stress testing tool for each system. As a stress testing tool, PTS brings you a brand-new upgrade in protocol support, as follows:
- Support HTTP 2 protocol.
- Support streaming media RTMP/HLS protocol.
- Support WebSocket protocol.
- Support MQTT protocol.
- Support SpringCloud/Dubbo microservice protocol.
Support for HTTP 2 stress testing
Since HTTP 1.x was released in 1997, our systems have used it to serve content for a long time. In the past decade, Internet content and the number of Internet users have exploded, and HTTP 1.x can no longer meet the needs of the modern web. More and more companies have begun to upgrade from HTTP 1.x to HTTP 2 in exchange for better page load performance and security. You can get a feel for the performance improvement brought by HTTP 2 from the following picture.
Compared with HTTP 1.1, HTTP 2's main improvements include the following:
- Binary transmission.
- Header compression.
- Multiplexing.
- Server push.
- Improved security.
As the comparison above shows, the performance of HTTP 2 is clearly better than that of HTTP 1.x. The key features that improve performance are binary transmission, header compression, and multiplexing. Let's look at the basic principles of these three features.
Binary transmission
A binary protocol is more efficient to parse than plain text. In HTTP 2, the transmitted content is broken up into frames, and all communication under the same domain name is completed on a single connection. The original message structure is modified: each message is composed of one or more frames, multiple frames can be sent interleaved, and they are reassembled according to the stream identifier in the frame header, as shown in the following figure:
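The reassembly idea can be sketched in a few lines of Python. This is a minimal illustration of the concept, not a real HTTP 2 frame parser: each frame carries a stream identifier, so frames from different streams can be interleaved on one connection and regrouped on arrival.

```python
from collections import defaultdict

def reassemble(frames):
    """Group interleaved (stream_id, payload) frames by stream identifier
    and concatenate each stream's payload in arrival order."""
    streams = defaultdict(list)
    for stream_id, payload in frames:
        streams[stream_id].append(payload)
    return {sid: b"".join(parts) for sid, parts in streams.items()}

# Frames from two streams arrive interleaved on a single connection.
frames = [(1, b"GET /a"), (3, b"GET"), (1, b" HTTP/2"), (3, b" /b")]
print(reassemble(frames))  # {1: b'GET /a HTTP/2', 3: b'GET /b'}
```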
Header compression
Because HTTP 1.x is stateless, each request must carry a full set of headers, and for many requests the headers are even larger than the body. Oversized headers increase transmission cost, and when many field values are repeated across thousands of request and response messages under the same domain name, that is a waste of resources.
Therefore, on top of binary transmission, HTTP 2 adds header compression. With the HPACK compression algorithm, a dictionary is built on both the client and the server, and index numbers are used to represent repeated strings; compression efficiency can reach 50% to 90%. As shown in the two requests in the following figure, the first request sends all of the headers, while the second request only needs to send the data that differs, improving efficiency:
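The indexing idea behind HPACK can be sketched as follows. This is a toy illustration of the mechanism only, not the real HPACK algorithm (which also has a static table and Huffman coding): the first time a header is sent, both ends add it to a mirrored table, and later requests send just a small integer index.

```python
class HeaderTable:
    """Toy sketch of HPACK-style indexing: literals on first use,
    small integer indices for repeats (not the real HPACK wire format)."""

    def __init__(self):
        self.table = []   # dynamic table, mirrored on client and server
        self.index = {}   # (name, value) -> position in the table

    def encode(self, headers):
        out = []
        for entry in headers:
            if entry in self.index:
                out.append(("idx", self.index[entry]))  # repeat: index only
            else:
                self.table.append(entry)
                self.index[entry] = len(self.table) - 1
                out.append(("lit", entry))              # first use: literal
        return out

enc = HeaderTable()
first = enc.encode([(":method", "GET"), ("host", "pts.console.aliyun.com")])
second = enc.encode([(":method", "GET"), ("host", "pts.console.aliyun.com")])
print(second)  # [('idx', 0), ('idx', 1)] -- only two tiny indices resent
```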
Multiplexing
HTTP 2 supports multiplexing. Multiplexing removes the browser's limit on the number of concurrent requests to the same domain name and avoids the overhead of opening a new TCP connection for each request. Building on the binary framing described above, HTTP 2 lets the client and server decompose HTTP messages into independent frames and reassemble them at the other end, achieving full request and response multiplexing. As shown in the figure below, the client is transmitting a data frame (stream 5) to the server while the server is transmitting an interleaved sequence of frames from streams 1 and 3 to the client, so three streams are transmitting data in parallel.
With HTTP 2's binary transmission and multiplexing, the old situation no longer exists in which the browser's per-domain limit on persistent TCP connections meant that only one request at a time could be processed on each pipeline: the head-of-line blocking of requests is gone. This is the basis of HTTP 2's efficiency improvement.
In theory, HTTP 2 is backward compatible with HTTP 1.x: if the client does not support HTTP 2, the server automatically falls back to HTTP 1.x. In a performance stress testing scenario, the examples above show that HTTP 2 and HTTP 1.x perform differently. If the stress testing engine does not support HTTP 2, connections are silently downgraded to HTTP 1.x during the test. Given that today's mainstream browsers and devices support HTTP 2, the results of such a stress test will be biased.
Therefore, PTS now supports HTTP 2. After creating a scenario in the PTS console, no further action is needed: during the stress test, the result of the negotiation with the server determines whether HTTP 1.x or HTTP 2 is used, ensuring that the stress test scenario matches reality.
Support for streaming media stress testing
With the rise of Internet live streaming in recent years, Internet content is quietly undergoing earth-shaking changes. From the early days of e-commerce and game live streams to this year's online education live streams during the epidemic, more and more live formats built on streaming media are being presented to the public. From a technical point of view, unlike back-end services based on the HTTP protocol, a live streaming system is a brand-new system architecture. How to simulate a user watching a video, the way HTTP-based tests simulate user requests, has become a new technical problem.
First, let's look at a complete model of the live streaming architecture, which clearly shows its macro-level structure:
From the figure, we can see the three main modules of a live streaming system:
- The push (ingest) side.
- The streaming media server.
- The playback side.
The push side's main job is to capture the host's audio and video data and push it to the streaming media server. The streaming media server converts the pushed data into the specified formats and delivers it to the playback side, so that users on different playback clients can watch. (Cloud vendors today also provide complete off-the-shelf solutions for the streaming media server.) The playback side, in short, pulls the audio and video and plays them, presenting the content to the user.
As you can see, what connects these three key modules is the streaming media transport protocol. In general, a streaming media architecture does not require the push side and the playback side to use the same protocol. The current mainstream streaming media protocols are as follows:
PTS already supports the RTMP/HLS protocols. As shown in the figure below, combined with PTS's scenario orchestration capabilities, you can realistically simulate users watching different videos; and combined with the regional customization features of the PTS load engine, you can easily simulate the user behavior of a large-scale live event to safeguard the stability of the live streaming business.
Support for WebSocket stress testing
From the earlier analysis of HTTP, you can see that HTTP is a stateless, connectionless, one-way application layer protocol using a request/response model. Before HTTP 2, communication could only be initiated by the client, with the server responding to the request. This communication model has a drawback: the server cannot actively push messages to the client.
In real-time scenarios, this drawback cannot meet user needs. Before WebSocket, two approaches were usually used to keep information up to date:
- Ajax polling.
- Long polling.
Ajax polling is very simple: the browser sends a request every few seconds to ask the server whether there is new information. Long polling is similar, but uses a blocking model: after the client opens a connection, the server does not return a response until a message is available; once it returns, the client immediately establishes a new connection, and the cycle repeats.
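The long-polling idea can be sketched with a blocking queue. This is a minimal, server-side sketch under simplifying assumptions (a single in-process queue standing in for the real message source); it only illustrates the blocking behavior described above.

```python
import queue
import threading

def long_poll(q, timeout):
    """Server-side long poll: block until a message arrives or the
    timeout expires, instead of returning an empty response at once."""
    try:
        return q.get(timeout=timeout)
    except queue.Empty:
        return None  # client reconnects and polls again

messages = queue.Queue()
# Simulate the server producing a message 0.2s after the client connects.
threading.Timer(0.2, messages.put, args=("new comment",)).start()
print(long_poll(messages, timeout=5))  # blocks ~0.2s, then: new comment
```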
As you can see, both approaches repeatedly establish HTTP connections and wait for the server to respond, without really changing the request/response model. WebSocket emerged to solve this: once the client and server establish a connection, the server can actively push information to the client, ensuring real-time delivery while reducing performance overhead.
In essence, WebSocket is a full-duplex communication protocol on top of TCP and is completely different from HTTP, but its handshake relies on the HTTP protocol. If you capture the traffic, you can easily find messages like the following:
GET /chat HTTP/1.1
Host: server.pts.console.aliyun.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: xxxxxxxxxxxxxxxxxxxx
Sec-WebSocket-Protocol: chat, superchat
Sec-WebSocket-Version: 13
Origin: https://pts.console.aliyun.com
As you can see, every WebSocket connection starts with an HTTP request during the handshake phase. Through this HTTP request, the client tells the server the supported WebSocket version, the subprotocols, the origin, and the host. The key part of the message is the Upgrade header, which asks the server to upgrade the current HTTP request to the WebSocket protocol. If the server supports it, it must return status code 101:
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept:xxxxxxxxxxxxxxxxxxxx
Once this response is returned, the WebSocket connection is established, and data is then transmitted according to the WebSocket protocol.
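The Sec-WebSocket-Accept value in the 101 response is not arbitrary: per RFC 6455, the server appends a fixed GUID to the client's Sec-WebSocket-Key, then returns the base64-encoded SHA-1 digest. A minimal sketch, using the example key from the RFC:

```python
import base64
import hashlib

# GUID fixed by RFC 6455 for computing Sec-WebSocket-Accept.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

def websocket_accept(sec_websocket_key: str) -> str:
    """Derive the Sec-WebSocket-Accept value the server must return
    in its 101 Switching Protocols response (RFC 6455)."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode()).digest()
    return base64.b64encode(digest).decode()

# Key/accept pair from RFC 6455, section 1.3:
print(websocket_accept("dGhlIHNhbXBsZSBub25jZQ=="))
# s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

This check proves to the client that the peer actually speaks WebSocket rather than being a plain HTTP server echoing headers.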
As mentioned earlier, WebSocket is a new protocol created to overcome the real-time limitations of the request/response model. In practice, WebSocket is widely used in scenarios with very high real-time requirements, such as online games, stock and fund tickers, live sports updates, chat rooms, bullet-screen comments, and online education.
By supporting the WebSocket protocol, PTS lets these scenarios quickly verify system performance and capacity through stress testing, just like HTTP-based test scenarios.
Support for MQTT stress testing
MQTT is a lightweight publish/subscribe messaging protocol originally developed at IBM, and it is now an important part of the Internet of Things. The protocol is supported on virtually all platforms, can connect almost any networked object to the outside world, and can serve as the communication protocol for sensors and actuators.
The MQTT protocol itself does not distinguish between clients (terminals) and servers (the cloud). In the MQTT model, all client communication is forwarded by an MQTT broker using the publish/subscribe pattern. The architecture in a typical IoT scenario looks roughly as follows:
Compared with the aforementioned HTTP protocol, MQTT has the following features:
- Low protocol overhead. MQTT is a binary protocol, and its fixed header can be as short as 2 bytes.
- Support for push.
- High tolerance for unstable networks. The protocol natively supports sessions, can automatically recover after a link is broken, and guarantees message delivery quality (QoS).
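The low overhead comes partly from how compactly lengths are encoded. MQTT's fixed header stores the "remaining length" as a variable-length integer: 7 data bits per byte, with the high bit set when more bytes follow, so payloads up to 127 bytes need only a single length byte. A sketch of that encoding:

```python
def encode_remaining_length(n: int) -> bytes:
    """Encode MQTT's variable-length 'remaining length' field:
    7 data bits per byte, continuation bit 0x80 when more bytes follow."""
    out = bytearray()
    while True:
        byte, n = n % 128, n // 128
        if n > 0:
            byte |= 0x80  # continuation bit: another length byte follows
        out.append(byte)
        if n == 0:
            return bytes(out)

print(encode_remaining_length(64).hex())   # 40   -> one length byte
print(encode_remaining_length(321).hex())  # c102 -> two length bytes
```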
Given these features, MQTT is a natural fit for the rapidly growing IoT field. Data from recent years shows MQTT's share of the IoT field gradually increasing, even surpassing the traditional HTTP protocol.
Therefore, to meet the stress testing needs of IoT scenarios, PTS has launched MQTT stress testing, which supports testing both self-built MQTT services and Alibaba Cloud's Message Queue for MQTT. As shown in the figure below, you can quickly create a stress testing scenario in the console:
Support for microservice protocol (SpringCloud/Dubbo) stress testing
With a monolithic application architecture, as the business grows, application deployment and operations become slower and more complex, and the agile development model becomes impossible to sustain as headcount increases. The microservice architecture emerged to solve these problems.
Structurally, a microservice architecture splits the functionality of one application into multiple loosely coupled services that call each other over some protocol (RPC, HTTP, and so on). This completes the shift from a monolithic to a distributed architecture, providing more flexible development and deployment and reducing the complexity of development and operations.
The following figure shows a business case. After a user's request enters the store-web application via HTTP, it calls back-end services such as store-cart and store-product via RPC.
Now imagine the following scenario: under a microservice architecture, rather than driving traffic in from store-web, we want to stress test back-end services such as store-cart and store-product separately. If the stress testing tool does not support the relevant microservice protocols, this scenario cannot be tested at all; and even if the tool supports some microservice protocols, it must be deployed inside the VPC where the microservices live before it can run the test. The whole process is time-consuming and laborious.
To solve these problems, PTS has launched new microservice stress testing capabilities, supporting mainstream microservice protocols such as SpringCloud and Dubbo while automatically connecting to the user's VPC, making it easy to stress test microservices quickly. As shown below:
Stress testing capability upgrade
The predecessor of PTS was Alibaba's full-link stress testing, whose original purpose was to faithfully simulate users across the country flocking to Tmall to buy goods at midnight on Double Eleven. Before 2013, stress tests were basically simulations in an offline environment. Offline simulation is relatively simple to implement and low risk, and it can uncover certain performance problems; but its call patterns are completely different from real online traffic, and the authenticity of the data and environment cannot be guaranteed, so system performance cannot be evaluated accurately. Offline stress testing is usually used to check whether a single system has a performance bottleneck, and it has little reference value for capacity estimation. For the system to withstand the midnight peak of Double Eleven, we needed a more accurate mode of stress testing to evaluate online capacity.
The concept of online stress testing was proposed inside Alibaba as early as 2010. Through single-machine traffic diversion, we gained the ability to stress test a single machine online and, for the first time, accurately measure its performance limits. But traffic-diversion testing is based on a single machine, and the corresponding capacity planning is likewise evaluated per application. In a large-scale distributed architecture, computing capacity per application ignores the overall call relationships and upstream and downstream dependencies: we cannot evaluate the actual carrying capacity of the whole chain, from user login through core pages to transaction payment. In addition, the data center, network, middleware, storage, and other links are full of uncertainty. Full-link stress testing changed this situation. By adapting the application systems, full-link stress testing lets the online environment handle normal traffic and test traffic at the same time, supporting online cluster read/write stress tests that do not affect normal user access and yielding the truest data on actual online carrying capacity.
Today we stand at the special moment of Double Eleven. Every year at midnight on Double Eleven, users from all over the country flock to Tmall to buy goods; from a technical perspective, that means tens of millions of HTTP requests hitting the system in an instant. The reason Alibaba's systems can withstand traffic peaks of this scale is inseparable from the full-link stress testing rehearsals before Double Eleven.
Standing on the shoulders of full-link stress testing, PTS has productized its two major capabilities: massive-traffic stress testing and write stress testing in the production environment. With PTS, users can launch business-scale traffic from across the country at low cost, while covering all online stress testing scenarios, including write requests, for the most realistic simulation of events like Double Eleven.
Massive-traffic stress testing
Faced with ever-growing business scale, many users of self-built stress testing platforms share a worry: how to generate the traffic of a super-large event. Building and maintaining the environment is costly, and problems in a self-developed engine or its load generators can prevent the load from ramping up.
As shown in the figure above, PTS's on-demand traffic generation supports self-service stress testing with up to 1 million concurrent users. Whether you are running a small concurrent test in a daily testing scenario or simulating a super-large event, just click to generate the traffic; none of the problems above need to concern you.
Productized safe, non-intrusive write stress testing in the production environment
As mentioned earlier, Alibaba's full-link stress testing adapts the application systems so that the online environment can handle normal traffic and test traffic at the same time, supporting online cluster read/write stress tests that do not affect normal user access and yielding the truest data on actual online carrying capacity.
The challenge of write stress testing in the production environment lies mainly in two aspects: first, ensuring the safety of the writes and avoiding polluting online data; second, being as non-intrusive to the business code as possible, so the business does not have to undergo extensive changes.
Drawing on many years of practice with Alibaba's full-link stress testing, we have distilled the prerequisites for safe write stress testing in the production environment:
- Ensure that the stress-test marker is never lost.
The stress-test traffic must be correctly identifiable at every link. The marker is attached at the traffic entry layer, and the middleware recognizes it and passes it downstream, so the marker is never lost along the entire chain and downstream applications and storage also receive it.
- Ensure that the stress-test flow is never interrupted.
The stress-test traffic must be able to run end to end without being blocked and must return the expected business results. The business's application layer needs to be adapted for full-link testing: when it recognizes the stress-test marker, it should bypass validation logic such as parameter checks and security checks, for example mobile number format validation, user status validation, and other special business validation logic.
- Ensure that the stress-test data never pollutes production data.
Stress-test data must not pollute normal online business data. A full-link scenario usually includes multiple read and write steps. To isolate the test data, once the storage middleware recognizes the stress-test marker, it writes the data into shadow tables to keep it apart from real data. To make the simulation more realistic, the base data in the shadow tables (buyers, sellers, products, shops, and so on) is constructed from real data plus a fixed offset; during migration it is sampled, filtered, and desensitized to ensure data security, while remaining broadly consistent with the real data in volume.
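The three rules above can be sketched in a few small functions. This is a hypothetical illustration only: the header name, shadow-table prefix, and offset are invented for the example and do not reflect PTS's actual implementation, which lives in the probe and middleware rather than business code.

```python
# Illustrative names only -- not PTS's real marker or conventions.
STRESS_HEADER = "x-stress-test"   # marker attached at the traffic entrance
SHADOW_PREFIX = "shadow_"         # shadow-table naming convention
BASE_OFFSET = 10_000_000          # fixed offset for constructed base data

def downstream_headers(headers: dict) -> dict:
    """Rule 1: pass the marker to downstream calls so it is never lost."""
    return {k: v for k, v in headers.items() if k == STRESS_HEADER}

def target_table(table: str, headers: dict) -> str:
    """Rule 3: writes from marked traffic go to the shadow table."""
    if headers.get(STRESS_HEADER) == "1":
        return SHADOW_PREFIX + table
    return table

def shadow_user_id(real_user_id: int) -> int:
    """Base data in shadow tables is real data plus a fixed offset."""
    return real_user_id + BASE_OFFSET

headers = {STRESS_HEADER: "1", "host": "store-web"}
print(target_table("orders", headers))  # shadow_orders
print(downstream_headers(headers))      # {'x-stress-test': '1'}
```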
The production-environment write stress testing probe released by PTS already provides all three capabilities. You only need to deploy the probe on the application (mainstream middleware is supported) and configure the corresponding rules, without changing any business code. As shown in the figure below, combined with PTS's load generation capability, you can launch a production-environment stress test whenever needed.
Finally
The capabilities above are the new features PTS launched at the Yunqi (Apsara) Conference. Students interested in PTS are welcome to scan the QR code to join the group chat. For the Double Eleven carnival, we have not only launched a JMeter-exclusive resource pack but also discounts across the entire product line, with prices starting as low as 0.99 yuan; everyone is welcome to buy!
Related links
- Alibaba Cloud PTS: https://pts.console.aliyun.com/#/overviewpage
- PTS resource pack purchase: https://common-buy.aliyun.com/?commodityCode=ptsbag#/buy
Scan the code and join the PTS user exchange group
For more information, search for the WeChat account (AlibabaCloud888) to add the cloud native assistant and get more updates!