Number Series Technology Open-Sources Takin, the World's First Full-Link Stress Testing Platform for Production Environments
On June 25, Number Series Technology, a well-known Chinese high-availability specialist, announced that it was open-sourcing one of its core products: the source code of its full-link stress testing platform, now officially named Takin.
The project has been released on GitHub as the world's first open source full-link stress testing platform. By open-sourcing Takin, the company aims to give more businesses a way to assure performance with a low barrier to entry, low cost, and high efficiency.
What is full-link stress testing in the production environment?
Full-link stress testing is a way to bring a system to a well-defined performance target at the lowest cost. It helps safeguard business continuity and gives the IT system the ability to withstand fragility and locate problems quickly.
An IT system is assembled by engineers from a set of basic components to fit specific business scenarios. The limits of those components and the unpredictability of the code introduce considerable uncertainty into the system as a whole, and that uncertainty makes the system fragile under "risk" scenarios such as traffic peaks. How, then, can a system be made anti-fragile?
Full-link stress testing in the production environment answers this by realistically simulating the business behavior of a "risk" scenario while monitoring system performance in real time. Sources of uncertainty are identified and located ahead of time, then addressed so that system resources can be allocated in better proportion. The goal is for the system to handle "risk" scenarios calmly and reach its expected performance targets at the lowest hardware cost. Run as a routine, stable practice in production, stress testing keeps the performance of the IT system stable over the long term.
Performance testing has evolved from offline to online in four stages:
1 Demand-driven stress testing stage
Demand-driven stress testing usually relies on simple tools to test a single interface or a single system, along with some basic analysis of performance problems. In many cases there is no dedicated testing team, so developers have to run the stress tests themselves.
2 Performance regression system stage
A dedicated performance test team is set up and builds an offline performance testing platform, with the ability to stress test complex scenarios across the full link and to locate performance problems.
Three problems are typical at this stage:
- Many companies run performance tests offline yet still hit many problems once the system is online. Judging the production environment by stress test results from the test environment does not work well.
- As the business grows and marketing campaigns multiply, test engineers struggle to safeguard each campaign, and recurring problems during campaigns damage the company's image.
- Stress testing throughput cannot keep up with growing demand, so some projects go live without any performance stress test and production failures become frequent.
To remove the uncertainty of stress testing in the test environment, performance stress testing began to move into the production environment, which marks the start of the production stress testing stages.
3 Production read-only business stress test stage
Building on the test-environment regression system, this stage adds performance stress testing of read-only business in production. Teams gain hands-on experience with stress testing in production, build a regression system for production performance stress tests, and develop the ability to analyze performance problems found in read-only production stress tests.
4 Full-business full-link stress test stage
Building on the previous stage, this stage adds performance stress testing of write operations, so that full-link stress tests cover the entire business. The team can now stress test every business flow and locate the problems found. More mature setups go further and add system protection capabilities such as degradation, rate limiting, and fault drills.
As Cao Xuefeng, CEO of Number Series Technology, put it: "Our original intention in open-sourcing Takin is actually very simple: to let more companies use a good product, to help them deliver a better user experience, and to free up more energy for growing their business. I believe that everyone's feedback from using the product will in turn benefit its development and iteration, creating a virtuous circle of mutual benefit."
At present, most companies still use traditional performance stress testing methods, but with the rise of distributed and microservice architectures these methods can no longer assure system performance. Number Series Technology has therefore decided to open-source its full-link stress testing product for production environments and has officially named it Takin.
The significance of open-sourcing Takin goes beyond that. The defining traits of open source are openness, inclusiveness, and innovation. The company hopes that an open way of working will stimulate technical innovation and attract more outstanding developers to help co-create full-link stress testing technology for production environments, so that the technology becomes more practical and fits a wider range of usage scenarios.
Microservice architecture is widely used in modern systems. The combined effect of business complexity and system complexity makes it difficult to ensure and maintain high availability across the whole system, and it also weighs heavily on R&D efficiency.
To remove performance bottlenecks and keep the system highly available, the system has to be performance tested. Traditional performance testing, however, has three major shortcomings: it runs against a simulated environment, it covers only part of the system, and it treats the system as a black box.
Performance stress testing in the production environment is widely regarded as the best solution, but it is also very challenging. It can easily pollute production data such as databases and logs, and cleaning test data out of the production environment is complicated and dangerous. Full-link stress testing technology for production environments emerged to address this.
As the first open source product for full-link stress testing in production environments, Takin can greatly reduce the complexity of building a production full-link stress testing platform. It gives companies the core capabilities of production stress testing, including link governance, data isolation, and performance bottleneck location, without intruding on business code.
What is Takin?
Takin is an open source, Java-based system that can be embedded in each application node without intruding on business code, enabling full-link performance testing in the production environment. It is suited to complex microservice architectures.
Takin architecture diagram
Takin has the following 4 features:
(1) Zero intrusion into business code: onboarding, data collection, and logic control require no changes to business code;
(2) Data isolation: performance tests can run without polluting production data and logs, so write-type interfaces can be tested directly in the production environment;
(3) Link management: it helps teams map the business links that run through a microservice architecture, using technical means to obtain link information from a functional perspective;
(4) Performance bottleneck location: the test results directly show which nodes of the microservice architecture are performance bottlenecks on the link.
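To make the data isolation idea in feature (2) more concrete, the following is a minimal Java sketch and not Takin's actual implementation. It assumes that test traffic carries a mark in a request header and that writes from marked traffic are redirected to a separate "shadow" table. The class name ShadowRouting, the header name X-Stress-Test, and the PT_ table prefix are all hypothetical and chosen only for illustration.

```java
import java.util.Map;

/** Hypothetical sketch: keep stress-test writes away from business tables. */
final class ShadowRouting {
    // Assumed header that marks a request as stress-test traffic (not from Takin).
    static final String TEST_FLAG_HEADER = "X-Stress-Test";
    // Assumed naming convention for shadow tables (not from Takin).
    static final String SHADOW_PREFIX = "PT_";

    /** Returns true when the request headers carry the stress-test mark. */
    static boolean isStressTest(Map<String, String> headers) {
        return "true".equalsIgnoreCase(headers.get(TEST_FLAG_HEADER));
    }

    /** Business table for normal traffic, shadow table for test traffic. */
    static String resolveTable(String table, boolean stressTest) {
        return stressTest ? SHADOW_PREFIX + table : table;
    }

    public static void main(String[] args) {
        Map<String, String> testRequest = Map.of(TEST_FLAG_HEADER, "true");
        Map<String, String> userRequest = Map.of();
        // Test traffic is routed to the shadow table, user traffic is untouched.
        System.out.println(resolveTable("orders", isStressTest(testRequest)));
        System.out.println(resolveTable("orders", isStressTest(userRequest)));
    }
}
```

In a zero-intrusion setup this decision would be made by the probe rather than by business code. The sketch only shows the routing rule itself.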
Takin's open source content
Takin's open source content consists of three parts: the agent (probe), the console, and the big data module. The agent is implanted in a Java application, where it collects performance data, controls where test traffic flows, and reports data to the big data module. The big data module performs real-time computation on that data and stores it. The console manages and presents these business processes. Together, the three parts provide routine full-link stress testing in the production environment with no intrusion into business code.
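The division of labor described above can be illustrated with a small Java sketch. It is a simplified assumption of how reported timings might be aggregated to point at a bottleneck node, not the design of Takin's big data module. The class LinkMetrics, its methods, and the node names in the usage note are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: aggregate probe timings and find the slowest node. */
final class LinkMetrics {
    // node name -> {total latency in milliseconds, number of samples}
    private final Map<String, long[]> totals = new HashMap<>();

    /** Called once per timing sample that a probe reports for a node. */
    void report(String node, long millis) {
        long[] t = totals.computeIfAbsent(node, k -> new long[2]);
        t[0] += millis;
        t[1]++;
    }

    /** Average latency for a node, or 0.0 if nothing was reported for it. */
    double averageMillis(String node) {
        long[] t = totals.get(node);
        return t == null ? 0.0 : (double) t[0] / t[1];
    }

    /** Node with the highest average latency, or null if no samples exist. */
    String slowestNode() {
        String slowest = null;
        double worst = -1.0;
        for (String node : totals.keySet()) {
            double avg = averageMillis(node);
            if (avg > worst) {
                worst = avg;
                slowest = node;
            }
        }
        return slowest;
    }
}
```

For example, after reporting 10 ms for a node named gateway and 120 ms and 80 ms for a node named order-service, slowestNode() returns order-service, which is the kind of result a console could then present.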
The GitHub open source address is as follows:
open source community:
About Number Series Technology: Founded in 2016 by several senior experts from Alibaba, Number Series Technology is a leading specialist in system high availability in China. It focuses on solving governance and performance problems in microservice architectures and on assuring the performance and stability of enterprise systems. Its product matrix covers full-link stress testing, E2E inspection, fault drills, and other modules, and the company is committed to helping businesses raise system availability to 99.99%.