Authors: Aurora Senior Test Engineers Wang Suhua & Yang Wanwu
- Requirement background
By introducing the Locust distributed load testing tool, this document aims to help everyone quickly grasp the basic methods of performance testing and improve work efficiency.
- Glossary
Performance testing: using automated test tools to simulate a variety of normal, peak, and overload conditions, and measuring the system's various performance indicators under them.
Response time: measures the performance of a single interface; it can be understood simply as the elapsed time from when a user sends a request until the server returns the response data.
Number of concurrent users: the number of users submitting requests to the system at the same physical moment. The requests may belong to the same scenario or to different ones.
TPS: short for Transactions Per Second, the number of transactions per second; it measures the performance of a single interface. A transaction is one round trip in which the client sends a request and the server returns a response.
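To make the two metrics concrete, here is a small sketch with invented numbers (purely illustrative, not measurements from this test) that computes average response time and TPS from a batch of request timings:

```python
# Sketch: computing average response time and TPS from request timings.
# The timestamps below are hypothetical, purely for illustration.

# (start, end) timestamps in seconds for 8 completed transactions
timings = [(0.0, 0.12), (0.1, 0.25), (0.2, 0.31), (0.3, 0.45),
           (0.5, 0.61), (0.6, 0.74), (0.8, 0.93), (0.9, 1.02)]

# response time = time from sending the request to receiving the response
response_times = [end - start for start, end in timings]
avg_response = sum(response_times) / len(response_times)

# TPS = completed transactions / elapsed wall-clock time
elapsed = max(end for _, end in timings) - min(start for start, _ in timings)
tps = len(timings) / elapsed

print(f"avg response time: {avg_response:.3f}s, TPS: {tps:.1f}")
```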
Single push: each push message carries only one user ID (reaching multiple users requires sending multiple messages).
Multi-push: one push message can carry multiple user IDs at the same time (multiple users are reached with a single message).
- Test strategy
3.1 Requirement analysis:
Based on the requirements, the following scenarios need to be tested:
Single push via the domain name, to find the maximum send TPS
Multi-push via the domain name, to find the maximum send TPS
3.2 Test method
1. Preliminary preparation for the performance test:
Analyze the business scenario: determine what the scenario covers; the scope can be broad, so discuss with development and product to fix the scope of this round of testing.
Analyze the test strategy: decide which test scenarios to design.
Analyze the production environment.
Choose a testing tool: decide how the performance test will be executed.
2. Purpose of the performance test:
Performance testing simulates production-like load in advance to find potential bottlenecks in the system, so that the system can be tuned and the risk of server downtime reduced.
It can also evaluate the load capacity of the software under test at different load levels and provide performance data for system deployment.
3. Performance degradation curve analysis:
The performance curve trends downward as the number of users grows, as shown in the figure:
The blue line represents TPS and the yellow line represents response time.
As pressure increases, response time initially stays low, that is, before point A. It then starts to climb until point B, the highest load the business can still tolerate, while TPS still has room to grow. Increasing the pressure further, TPS peaks at point C. Beyond C, response time keeps rising while TPS actually falls.
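The A/B/C reasoning above can be sketched numerically. Given a hypothetical series of (user count, TPS, response time) samples (the figures below are invented for illustration; real numbers come from the test tool), the saturation point C is simply where TPS peaks:

```python
# Sketch: locating point C (maximum TPS) on a hypothetical performance curve.
samples = [
    # (users, tps, response_time_ms)
    (10,   95,  20),
    (20,  180,  25),
    (40,  330,  40),
    (80,  520,  90),   # point C: TPS peaks here
    (120, 470, 210),   # past C: TPS drops while latency keeps climbing
    (160, 400, 380),
]

# point C is the sample with the highest TPS
point_c = max(samples, key=lambda s: s[1])
print(f"max TPS {point_c[1]} reached at {point_c[0]} users")
```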
3.3 Test tool
Use the distributed load testing tool Locust.
3.4 Test environment
3.4.1 Pressure test data flow chart
3.4.2 System under test
The API service uses a dual-machine, dual-node deployment; the dependent services are deployed on the same machines, also as dual nodes.
Server resources: 4 cores, 16 GB RAM.
Domain-name load test: the domain name is forwarded to the servers through the load balancer (LBS).
4 Test tool Locust
4.1 Brief
JMeter and LoadRunner are long-established performance testing tools; both implement virtual users as threads. Locust instead implements concurrent users with coroutines. A coroutine is an execution unit lighter than a thread, and many coroutines can run inside a single thread. As a result, Locust can not only be deployed across multiple machines for load generation, but a distributed deployment can also be completed on a single host.
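Locust itself is built on gevent; purely to illustrate the coroutine idea with the standard library (this is a generic sketch, not Locust's implementation), here is how many simulated "users" can run concurrently inside one thread with asyncio:

```python
# Generic illustration of coroutine-based concurrency. Locust uses gevent;
# this stdlib asyncio sketch only shows many "users" sharing one thread.
import asyncio

async def simulated_user(uid: int, results: list):
    await asyncio.sleep(0.01)  # stand-in for a network request
    results.append(uid)

async def main(n_users: int) -> list:
    results: list = []
    # all coroutine "users" run concurrently inside a single thread
    await asyncio.gather(*(simulated_user(i, results) for i in range(n_users)))
    return results

done = asyncio.run(main(100))
print(f"{len(done)} simulated users finished in one thread")
```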
4.2 Load testing tool: Locust, version 0.14.5
4.3 Locust local host deployment steps
1. First, use Python and the locust library to write the ABC.py file that the test will run.
Script description:
Create a class TPush(TaskSet) that inherits from TaskSet, and define the interfaces under test and related information inside it; self.client provides get and post methods used the same way as requests.
@task marks a method as a user behavior; the number in parentheses is the behavior's execution weight: the larger the value, the more often it runs (default 1 if omitted).
on_start method: called when a simulated user starts executing the TaskSet class.
The WebsiteUser class sets the basic properties of the generated load:
task_set: points to the TaskSet class that defines the user behaviors;
min_wait: minimum wait time between task executions for a simulated user, in milliseconds;
max_wait: maximum wait time between task executions for a simulated user, in milliseconds.
2. Start a Locust master:
After writing the locustfile to run, create a master with the command "locust -f xxxx.py --master". The master does not create concurrent users itself; it mainly monitors the slaves and collects statistics.
2.1 Web mode: you can use Python's os.system function to issue the cmd command directly from a script.
2.2 No-web mode: likewise, os.system can issue the cmd command from a script.
Of course, you can also type the command directly in a terminal.
Now run the py file; output like the following means the Locust distributed master node has started.
3. Start the Locust slaves
Slaves are the branch nodes that actually create the concurrent users. Copy the master locustfile and rename the copy; the file each slave runs must be identical to the master's. How many slaves you can run depends on how many CPU cores the machine has: as a rule, each slave uses one core to create concurrent users.
ABC.py → ABC_slave1.py
3.1 Each slave file then issues the cmd command from its main function.
3.2 Alternatively, type the slave command directly in a terminal.
After starting multiple slaves, check the following to confirm whether the distributed deployment succeeded.
Once all slaves are up, master console output like the following indicates that the Locust distributed deployment succeeded.
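The master/slave startup described above boils down to a handful of Locust 0.14.x command-line flags. The sketch below assembles the commands (file names, host, and counts are placeholders from the example); the resulting strings can be passed to os.system as mentioned earlier, or typed by hand:

```python
# Sketch: assembling the Locust 0.14.x master/slave start commands.
# File names, master host, and counts are placeholders for illustration.

def master_cmd(locustfile, no_web=False, clients=0, hatch_rate=0, run_time=""):
    cmd = f"locust -f {locustfile} --master"
    if no_web:
        # no-web mode takes user count, hatch rate, and run time up front
        cmd += f" --no-web -c {clients} -r {hatch_rate} -t {run_time}"
    return cmd

def slave_cmd(locustfile, master_host="127.0.0.1"):
    return f"locust -f {locustfile} --slave --master-host={master_host}"

print(master_cmd("ABC.py"))                     # web-mode master
print(master_cmd("ABC.py", no_web=True,
                 clients=100, hatch_rate=1, run_time="60s"))
print(slave_cmd("ABC_slave1.py"))               # one slave per CPU core
```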
4. Open the web UI; the distributed slaves can be viewed there as well.
login page
Before starting the load test, fill in the total number of users to simulate (Number of total users to simulate), the number of users added per second (Spawn rate), and the target domain name (Host).
Spawning stops once the number of running users reaches the configured total.
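A direct consequence is that the ramp-up phase lasts roughly total users ÷ spawn rate seconds, which is worth keeping in mind when choosing a run length. A trivial sketch (numbers are only examples):

```python
import math

# Ramp-up lasts until the spawned users reach the configured total.
def ramp_up_seconds(total_users: int, spawn_rate: float) -> int:
    return math.ceil(total_users / spawn_rate)

print(ramp_up_seconds(100, 1))  # 100 users at 1/s: full load after ~100 s
print(ramp_up_seconds(160, 3))  # the single-push setting used later
```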
View the number of slave nodes
The figure above shows that the slaves are all in the running state and the load is evenly distributed.
Test result data download
When running a load test in web mode, Locust offers no way to set an execution time, so a web-mode run keeps going until you click stop. If you want a web-mode run to stop automatically after a period of time, add the stop_timeout parameter to the configuration; its value is the execution time in seconds.
```python
from locust import HttpLocust  # Locust 0.14.x API

class WebsiteUser(HttpLocust):
    task_set = Login   # the set of tasks to perform (TaskSet defined earlier)
    min_wait = 1000    # minimum wait time between tasks, ms
    max_wait = 2000    # maximum wait time between tasks, ms
    stop_timeout = 60  # number of seconds after which the run stops
```
5 Judging an interface's performance bottleneck through the Locust web UI
With 100 total users, adding 1 user per second, we let the test run for a while and then check the Locust web UI:
TPS has stabilized at a certain value (9.8). Clicking Charts shows the detailed curves of TPS, latency, and user count.
Before TPS reaches 10, it rises as the number of users grows; after it reaches 10, TPS stays at about 10 no matter how many more users are added, and only the latency keeps increasing.
Then check the server's resource usage:
The CPU is fully occupied by the program, so we can provisionally conclude that this interface's TPS is 10 and the bottleneck is the CPU.
6. Data collection practice
Single push interface: /single
Multi-push interface: /mutiplie
7. Data analysis
During the load test, memory consumption was much less significant than CPU consumption. For system stability, performance indicators should be read under steady-state conditions, before the service starts to degrade. Analyzing the collected data leads to the following conclusions:
Single push: adding 3 users per second up to 160 concurrent users, CPU consumption is about 80%; continuing the load test at this level gives the performance indicators below.
Multi-push: adding 2 users per second up to 120 concurrent users, CPU consumption is about 78%; continuing the load test at this level gives the performance indicators below.
Note: for multi-push, considering the body size, the number of deviceTokens was set to 3. This makes no difference to the interface itself, since resources are consumed only when the data is persisted. In addition, with the experience from single push and the system logic of multi-push, the resource consumption of multi-push is easy to locate, so relatively little data was collected for it.
8. Test conclusion
Single-push interface: the measured TPS is 1757/s, and 90% of requests respond within 0.1 s. Production runs on multiple machines, so this TPS only reflects the current configuration and deployment; the main path to higher throughput is hardware upgrades.
Multi-push interface: the measured TPS is 1342/s, and 90% of requests respond within 0.1 s. Likewise, this TPS only reflects the current configuration and deployment, and the main path to higher throughput is hardware upgrades.
The data quoted in this article only represents service performance in this specific load test environment.