What should a qualified performance test specialist possess? Is performance testing really only testing? It is not that simple!

I am Xixie, the person in charge of elastic computing performance testing at Alibaba Cloud.

I started building the Alibaba Cloud Elastic Computing performance test team in 2018. We went from completing one set of performance tests in a week, to triggering a fully automated set of performance tests in just 1 minute, to collating the final results with one click. Internally, the platform is named Kaitian Axe.

Kaitian Axe now handles all performance testing for the whole of elastic computing (new technologies, new hardware, new instance specifications, and so on) to ensure that online performance stays stable. Besides performance testing, the team is also responsible for solving customers' performance problems. During this period I was also responsible for the "Transformers" project, which productized the underlying resource management of physical machines, so that the ever-changing instance specifications online can now be rolled out with a single template.

Here I share with you some of what I have learned about performance testing over the past 3 years.

What is performance testing?

Performance testing is different from functional testing. Functional testing verifies whether a function works; performance testing verifies whether a performance target is met.

To verify performance, you need a set of performance "rulers": SPEC CPU and UnixBench for CPU performance; netperf, iperf, and sockperf for network performance; fio for storage performance; and STREAM for memory bandwidth.

The tools above are all micro-benchmarks. If one area scores poorly, you can usually raise a defect directly with the relevant team for optimization. What customers actually perceive, however, is their real business scenario, so you also need to simulate customer scenarios. For example, customers often use nginx, Redis, and MySQL, and testing with these real workloads shows the strengths and weaknesses of the current server. Only good performance testing gives you confidence, and only good performance analysis tells you where the weak spots are and where to focus effort!

What does performance testing involve?

The importance of performance goes without saying, so I will skip it. A server has five major parts: CPU, memory, storage, network, and OS, and performance testing verifies the performance of each of them. The section above briefly introduced which tools to use. But if performance testing were just running tests, it would be too simple; in fact, there is a lot to consider.

The considerations in performance testing show up mainly in the following aspects:

  1. Which tools are used for verification? How do you ensure the performance tests are complete? How do you build performance tests quickly and continuously, eliminate manual runs, and turn testing into an engineering system?
  2. How do you design a performance test, and how do you target the design? A performance test is really a test of an expectation.
  3. How do you analyze the resulting performance data, and which metrics should you focus on?
  4. Is there a performance difference, and why? Is it a virtualization issue, a physical machine issue, an OS issue, a benchmark issue, or something else?
  5. How do you analyze abnormal performance data? Usually only an architect on a business team can solve this kind of problem.

These questions may still be a bit abstract, so let me give a few more specific examples.

First, the CPU-related topics:

● Differences between Intel, AMD, and ARM: instruction set differences

● Differences between CPU generations

● CPU clock speed: base frequency, turbo frequency, P0-n, P0-1

● Whether CPUs are pinned

● Hyper-threading switch: low-level switch, OS-level switch

● NUMA architecture: membind, interleave...

● Power policy: performance, powersave, and the related C-states

● TDP: when the turbo frequency falls short of expectations, check whether the CPU is being limited by TDP

● L3 Cache size

● Memory bandwidth, memory latency

● OS: kernel version, CPU vulnerability mitigation switches

● CPU microcode

● Software: glibc version, etc.

● Compiler versions and vendor-specific compilers: AOCC, icc

All of the above is basic knowledge needed for CPU testing, and sometimes you have to go even deeper to see a problem, for example by looking at the CPU's PMU (performance monitoring unit). Knowledge of the other modules is just as essential.
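
Several items on the CPU checklist can be recorded automatically next to each test result, which makes later comparisons easier to explain. The following is a minimal Python sketch, assuming a Linux host; it is not part of Kaitian Axe. The function names and the selection of files are illustrative assumptions, and some of these sysfs/procfs files are absent on certain kernels, architectures, or virtual machines, in which case the value is reported as None.

```python
from pathlib import Path
from typing import Optional

# Linux sysfs/procfs files for a few checklist items, relative to a root
# directory so the function can also be pointed at a copied tree.
FACT_FILES = {
    "smt_active": "sys/devices/system/cpu/smt/active",
    "governor": "sys/devices/system/cpu/cpu0/cpufreq/scaling_governor",
    "numa_nodes_online": "sys/devices/system/node/online",
    "kernel_version": "proc/sys/kernel/osrelease",
}


def read_text(path: Path) -> Optional[str]:
    """Return stripped file contents, or None if missing or unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return None


def collect_cpu_facts(root: str = "/") -> dict:
    """Collect a small CPU/OS environment snapshot to store with test results."""
    base = Path(root)
    facts = {name: read_text(base / rel) for name, rel in FACT_FILES.items()}

    # One file per known CPU vulnerability, each holding its mitigation status.
    vuln_dir = base / "sys/devices/system/cpu/vulnerabilities"
    facts["vulnerabilities"] = (
        {p.name: read_text(p) for p in sorted(vuln_dir.iterdir())}
        if vuln_dir.is_dir()
        else {}
    )

    # Keep the model name and microcode revision of the first CPU listed.
    cpuinfo = read_text(base / "proc/cpuinfo") or ""
    for line in cpuinfo.splitlines():
        key, _, value = line.partition(":")
        key = key.strip()
        if key in ("model name", "microcode") and key not in facts:
            facts[key] = value.strip()
    return facts


if __name__ == "__main__":
    for name, value in collect_cpu_facts().items():
        print(f"{name}: {value}")
```

Saving this snapshot with every run means that when two results differ, the first check is whether the hyper-threading state, governor, microcode, or mitigation settings differ too.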

Besides knowledge, the completeness of each module's tests is also very important. For example, when analyzing network performance, focus on the following:

● Network bandwidth: multi-stream, single-stream

● Network PPS: multi-stream, single-stream

● Number of concurrent network sessions (connections)

● Number of new network connections

● Network latency

● Performance of long-lived and short-lived network connections

● Network concurrency: do the results above still hold when multiple machines run concurrently?

● Network performance stability: whether latency stays stable as network pressure grows (higher PPS, higher bandwidth, or more sessions)

● Network packet loss rate and retransmission rate

Once the network performance data is in, how do you read it? The following figure is a common quantile chart:

[Figure: latency statistics]

In most cases, when analyzing and characterizing a data set, you cannot just look at min, average, and max; quantile data is very important, and so is volatility.
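
To make this concrete, here is a minimal Python sketch that reports quantiles and a volatility measure alongside min, average, and max. The function name `summarize_latency`, the use of the coefficient of variation as the volatility measure, and the sample data are assumptions for illustration, not the author's tooling.

```python
import statistics


def summarize_latency(samples_us):
    """Summarize latency samples (microseconds) with quantiles and volatility."""
    if len(samples_us) < 2:
        raise ValueError("need at least two samples")
    # quantiles(n=100) returns the 99 cut points p1..p99; "inclusive" treats
    # the samples as the whole population, so results stay within [min, max].
    pct = statistics.quantiles(samples_us, n=100, method="inclusive")
    mean = statistics.fmean(samples_us)
    stdev = statistics.pstdev(samples_us)
    return {
        "min": min(samples_us),
        "avg": mean,
        "p50": pct[49],
        "p90": pct[89],
        "p99": pct[98],
        "max": max(samples_us),
        # Coefficient of variation: a unitless measure of volatility.
        "cv": stdev / mean if mean else float("nan"),
    }


if __name__ == "__main__":
    # Hypothetical data: 98 fast requests and 2 slow outliers.
    samples = [50.0] * 98 + [900.0, 1000.0]
    for name, value in summarize_latency(samples).items():
        print(f"{name:>4}: {value:.2f}")
```

With this made-up data the average is 68 while p50 is 50 and p99 is about 901, so the average alone hides both the typical case and the tail.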

If you can answer all of these questions well, you are a true expert.

Why is performance testing not so simple?

If performance testing were only testing, it would be fairly simple. Of course, some tests are complicated in themselves, such as interference tests (simulating noisy neighbors). But in general, the testing step is a matter of engineering automation and efficiency improvement.


The topics in the previous section are hard to put into practice. Take memory latency: on an Intel machine whose two physical CPUs appear as 2 NUMA nodes, you need to consider both the memory latency within the local node and the memory latency across nodes. Once you know this, behavior that shows up later in applications may have an explanation.
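
As a first step before measuring anything, the NUMA layout itself can be read from Linux sysfs. The sketch below is illustrative and assumes a Linux host; the function names are hypothetical. Note that the values in each node's `distance` file are relative distances (local access is 10 by convention), not latencies in nanoseconds, so they only indicate which node pairs to compare with a real memory latency benchmark.

```python
from pathlib import Path


def parse_distance_row(text):
    """Parse one sysfs 'distance' file, e.g. '10 21' -> [10, 21]."""
    return [int(field) for field in text.split()]


def read_numa_distances(node_root="/sys/devices/system/node"):
    """Return {node_id: [distance to each node]} read from Linux sysfs."""
    distances = {}
    for node_dir in Path(node_root).glob("node[0-9]*"):
        node_id = int(node_dir.name[len("node"):])
        row = (node_dir / "distance").read_text()
        distances[node_id] = parse_distance_row(row)
    return dict(sorted(distances.items()))


def remote_pairs(distances):
    """List (src, dst, distance) for every cross-node pair.

    Assumes the columns of each row follow the sorted node ids.
    """
    node_ids = sorted(distances)
    return [
        (src, dst, dist)
        for src in node_ids
        for dst, dist in zip(node_ids, distances[src])
        if dst != src
    ]


if __name__ == "__main__":
    for src, dst, dist in remote_pairs(read_numa_distances()):
        print(f"node{src} -> node{dst}: relative distance {dist} (local = 10)")
```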

For example, a performance test can be broken down into steps as follows:

[Figure: performance test plan]

Only one of these steps is the actual test step, and it requires the shallowest skills. It is a common misunderstanding that performance testing is just the "test" part. Of course it is not!

Breakdown, design, analysis: if you are not an experienced performance engineer, how can you break a task down, design the tests, or analyze the results?

For example, take an Intel Cascade Lake CPU performance test. The breakdown step requires a lot of background knowledge, which determines the expectations for your later performance analysis:

  1. CPU differences: clock speed settings, differences between CPU generations, etc.
  2. Virtualization solution: whether CPUs are pinned and whether they span sockets.
  3. Adapted underlying technology: whether network and storage are optimized.
  4. Instance form: what the specifications are and how CPUs are bound.

And many more.

Next comes the performance test itself. Testing a series of performance items requires a complete performance test plan so that results can be compared comprehensively. The industry has mature suites for CPU performance, such as SPEC CPU, SPEC JBB, and UnixBench, but no comprehensive server performance test suite, which leaves customers confused when they move to the cloud: how can the performance of cloud servers be measured comprehensively? All of this has to be accumulated and enriched over time until it forms a complete system.
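
One conventional way to fold several benchmark results into a single comparison figure is the geometric mean of per-benchmark ratios against a baseline machine, which is how SPEC CPU derives its overall score. The Python sketch below only illustrates that idea; the article does not say how Kaitian Axe aggregates results, and the benchmark names and scores here are hypothetical.

```python
import math


def geometric_mean_ratio(candidate, baseline):
    """Geometric mean of candidate/baseline over the benchmarks both share.

    Assumes higher-is-better, positive scores; invert latency-style
    metrics before passing them in.
    """
    shared = sorted(set(candidate) & set(baseline))
    if not shared:
        raise ValueError("no benchmarks in common")
    log_sum = sum(math.log(candidate[name] / baseline[name]) for name in shared)
    return math.exp(log_sum / len(shared))


if __name__ == "__main__":
    # Hypothetical scores for two instance types.
    baseline = {"cpu": 100.0, "net_pps": 200.0, "storage_iops": 50.0}
    candidate = {"cpu": 200.0, "net_pps": 200.0, "storage_iops": 25.0}
    ratio = geometric_mean_ratio(candidate, baseline)
    print(f"overall ratio vs baseline: {ratio:.3f}")
```

In this made-up case a doubled CPU score and a halved storage score cancel out to an overall ratio of about 1.0, which is also a reminder that a single figure can hide large per-module differences and should be read together with the individual results.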

After the performance test is over, you need to do performance analysis. I covered how to read network performance data earlier, but sometimes strange data turns up that needs to be explained.

Here is a practical example: the network single-stream performance test. Sometimes the results fluctuate severely. At first I thought network performance itself was unstable. Investigation showed that when, by chance, the CPU handling interrupts and the CPU running the network process are on the same core, performance is very poor. After forcibly separating them with taskset, so that interrupts and the network test process run on different CPUs, single-stream performance went up and became stable.
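
The same separation can be sketched in Python for a test process that pins itself. This is a rough illustration under stated assumptions, not the procedure used in the investigation: it assumes Linux, assumes the NIC's interrupt rows in /proc/interrupts contain the device name (for example a hypothetical "eth0"), and treats a CPU as "quiet" if it has handled no interrupts for that device since boot. It separates logical CPUs only; it does not check whether two logical CPUs are hyper-threads of the same physical core, and interrupt affinity can change later.

```python
import os


def irq_counts_by_cpu(interrupts_text, device):
    """Sum per-CPU interrupt counts for /proc/interrupts rows naming `device`."""
    lines = interrupts_text.splitlines()
    if not lines:
        return {}
    cpus = [int(name[3:]) for name in lines[0].split()]  # header: CPU0 CPU1 ...
    totals = {cpu: 0 for cpu in cpus}
    for line in lines[1:]:
        fields = line.split()
        if device not in line or len(fields) < 1 + len(cpus):
            continue
        # fields[0] is the IRQ label ("24:"); the next fields are the counts.
        for cpu, count in zip(cpus, fields[1:1 + len(cpus)]):
            totals[cpu] += int(count)
    return totals


def cpus_without_irqs(totals, allowed):
    """Allowed CPUs that have handled no interrupts for the device so far."""
    return sorted(cpu for cpu in allowed if totals.get(cpu, 0) == 0)


def pin_away_from_irqs(device, interrupts_path="/proc/interrupts"):
    """Pin the calling process to one quiet CPU; return it, or None if none.

    Linux only: uses os.sched_getaffinity and os.sched_setaffinity.
    """
    with open(interrupts_path) as f:
        totals = irq_counts_by_cpu(f.read(), device)
    quiet = cpus_without_irqs(totals, os.sched_getaffinity(0))
    if not quiet:
        return None
    # Same effect as running the process under taskset with this one CPU.
    os.sched_setaffinity(0, {quiet[0]})
    return quiet[0]
```

Calling `pin_away_from_irqs("eth0")` at the start of a test driver, before launching the single-stream workload, keeps the workload off the CPUs that have been servicing the NIC's interrupts.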

Therefore, performance testing depends on whether your knowledge base is deep enough, whether you fully understand the business, the test design, and the final analysis, and whether you can determine what results to expect. None of this is achieved overnight!

What is the solution? I think there are three main points:

Learn: if you analyze a business system from top to bottom, every layer can be explored in depth: business architecture, the OS, a programming language, underlying virtualization, and CPU microarchitecture are all worth learning. I recommend the book "Top of Performance", which can get you started.

Practice: every case is an opportunity and is worth studying.

Delve in: you will definitely run into problems. Do not give up when you do, and carry forward the geek spirit!

Basic skills required by performance testers

At the end of the article, I will share the basic skills that I think performance testers should possess.

Computer systems knowledge: computer organization, operating systems, compiler principles, computer networks, software testing, Linux kernel analysis... These university courses are required. Book knowledge alone is not enough; you also need to be able to analyze real problems.

Architectural thinking: any requirement or problem needs to be examined from a high vantage point, with thought given to how to design for it. This is especially true of customer problems: is the current architecture reasonable, and in which direction does the problem lie?

Automated testing: automating manual tests is necessary to free up productivity, and it also makes online inspections easy. For special problems, you also need the ability to construct special test cases.

Analysis skills: you should be proficient with performance analysis tools such as top, vmstat, mpstat, pidstat, iostat, sar, and flame graphs, plus advanced skills such as BPF.

Practical work experience: this is extremely important when designing cases. Someone with experience knows what is needed and what is not!

A spirit of digging deep: a system touches every area, from upper-level business to low-level implementation, and there will be many unknowns that require painstaking research.

This may sound a bit abstract, especially architectural thinking. You only develop some understanding after being exposed to many cases. So, generally speaking, it is very difficult for a fresh graduate to do performance testing well.

For example, there are usually many micro-benchmarks, such as tests of network PPS and network bandwidth. What is their actual impact on users? The natural idea is to bring the user's case over and test that, but this simple sentence is really hard to carry out. Why is the user's case designed the way it is? Behind every case is a design by a business architect.

How should the user's case be tested? How does the customer actually stress the system? Where should the pressure on the server be set? How can someone who has never worked on real business requirements know what is reasonable? Would it not be laughable to have such people designing user-case performance tests? Therefore, a real case performance test must be run by a colleague with front-line experience, who knows what level of pressure is appropriate, whether the server's performance is sufficient, and whether it needs to be improved.

Summary

In summary, performance testing is not that simple, and it is not just testing. The requirements on a performance engineer are much higher than those on a tester: a performance engineer is a performance architect and an all-rounder!

Finally, please remember that a good performance engineer is "fed" by cases, and the sharp sword leaves its sheath only after thousands of trials! Let us encourage each other.

Copyright Notice: The content of this article is contributed voluntarily by real-name registered users of Alibaba Cloud, and the copyright belongs to the original author. The Alibaba Cloud Developer Community does not own the copyright and does not assume the corresponding legal responsibility. For specific rules, please refer to the "Alibaba Cloud Developer Community User Service Agreement" and the "Alibaba Cloud Developer Community Intellectual Property Protection Guidelines". If you find suspected plagiarism in this community, fill in the infringement complaint form to report it. Once verified, the community will immediately delete the suspected infringing content.
