Performance is an objective indicator that can be expressed in technical metrics such as response time and throughput, but it is also a subjective experience: users perceive performance differently from engineers, and different users perceive it differently from one another.
Performance testing
Performance test metrics
Different perspectives imply different performance standards, and each standard has its own test metrics. The commonly used metrics are as follows:
- Response time: the time an application needs to perform an operation, measured from when the request is issued until the final response data is received. Response time is the most important performance metric of a system; it directly reflects how "fast" the system is.
- Concurrency: the number of requests the system can process at the same time, which also reflects the system's load characteristics.
- Throughput: the number of requests the system processes per unit of time, reflecting its overall processing capacity.
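These three metrics are related: under steady load, throughput is roughly concurrency divided by average response time (a rearrangement of Little's law). The sketch below illustrates this rule of thumb; the function name is my own, not from the text.

```python
def throughput(concurrency: int, avg_response_time_s: float) -> float:
    """Rough steady-state throughput (requests/second): concurrency / response time."""
    return concurrency / avg_response_time_s

# 100 requests in flight, each taking 0.5 s on average -> about 200 requests/s
print(throughput(100, 0.5))  # → 200.0
```

The relation also shows why response time matters so much: at a fixed concurrency, halving the response time doubles the throughput.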
Performance test methods
Performance testing can be subdivided into performance testing, load testing, stress testing, and stability testing.
Performance testing applies steadily increasing access pressure to the system in order to obtain its performance metrics and maximum load capacity. In a test environment, "increasing access pressure" means continuously raising the number of concurrent requests issued by the test program. Generally speaking, performance follows the parabolic curve shown in the figure.
In the initial stage (section a to b), as the number of concurrent requests increases, the system uses relatively few resources while achieving good processing capacity. This section is the website's daily operating range, where most of its access load is concentrated; testing within it is called performance testing. The goal is to evaluate whether system performance meets requirements and design targets.
As the pressure continues to increase, the system's processing capacity grows more slowly until it reaches a maximum (point c), the system's maximum load point. Testing this section is called load testing. The goal is to evaluate the maximum access load the system can withstand while still operating normally, for instance when an emergency pushes traffic beyond the daily level.
Beyond this point, further pressure reduces the system's processing capacity while consuming ever more resources, until resource consumption reaches its limit (point d). This can be regarded as the system's collapse point: past it, the system can no longer process any requests. Testing this section is called stress testing. The goal is to evaluate the maximum access load that may cause the system to crash.
Performance optimization strategies
If the system has performance problems, analyze each stage of the request-handling chain, check for possible performance bottlenecks, and locate the problem.
Locating a website's performance bottleneck follows the same approach as locating a program's: check the logs of each stage of request processing and find the stage whose response time is unreasonable or exceeds expectations; then examine monitoring data to determine whether the main factor is memory, disk, network, or CPU, whether the cause is a code problem or an unreasonable architecture design, or whether system resources are simply insufficient.
After locating the specific cause of a performance problem, optimization begins. Following the website's layered architecture, optimization falls into three categories: web front-end performance optimization, application server performance optimization, and storage server performance optimization.
Web front-end performance optimization
Generally speaking, the web front end is everything before the site's business logic: browser loading, website views, the image service, the CDN service, and so on. The main optimization methods are optimizing browser access, using a CDN, and using a reverse proxy.
Browser access optimization
Reduce HTTP requests
HTTP is a stateless application-layer protocol: each request must establish a communication link and transfer data, and on the server side each HTTP request typically occupies an independent thread. These communications and services are expensive, so reducing the number of HTTP requests effectively improves access performance.
The main means of reducing HTTP requests are merging CSS, merging JavaScript, and merging images. Combine the JavaScript and CSS needed for one page visit into a single file each, so the browser needs only one request per type. Images can likewise be merged into one sprite; if each image has a different hyperlink, CSS offsets can map mouse clicks to different URLs.
Use browser cache
For a website, static resource files such as CSS, JavaScript, logos, and icons change relatively rarely. Caching these files in the browser can greatly improve performance. By setting the Cache-Control and Expires attributes in the HTTP header, you can enable browser caching and control how long it lasts.
When a static resource does change, the change must reach client browsers promptly. This can be achieved by changing the file name: generate a new JS file and update the reference to it in the HTML. Static resources should also be updated in batches, spaced out over time, to avoid a sudden mass cache expiry in users' browsers, which would concentrate cache refreshes and cause a spike in server load and network congestion.
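As a concrete illustration of the headers mentioned above, here is a minimal sketch that builds Cache-Control and Expires response headers; the helper function is hypothetical, but the header names and formats follow the HTTP standard.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

def cache_headers(max_age_seconds: int) -> dict:
    """Build HTTP response headers that enable browser caching of a static resource."""
    expires = datetime.now(timezone.utc) + timedelta(seconds=max_age_seconds)
    return {
        "Cache-Control": f"public, max-age={max_age_seconds}",
        # Expires must be an RFC 7231 HTTP-date in GMT
        "Expires": format_datetime(expires, usegmt=True),
    }

headers = cache_headers(86400)  # cache static assets for one day
```

A server framework would attach these headers to responses for CSS, JavaScript, and image files; modern caches honor Cache-Control first, with Expires as a fallback.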
Enable compression
Compressing files on the server side and decompressing them in the browser effectively reduces the amount of data transmitted. However, compression also puts some pressure on both server and browser; when bandwidth is plentiful but server resources are scarce, the trade-off should be weighed.
Reduce cookie transmission
On the one hand, a cookie is included in every request and response, and an oversized cookie seriously affects data transmission. Consider carefully which data really needs to be written into the cookie, and minimize the amount of data it carries.
On the other hand, for static resources such as CSS and scripts, sending cookies is meaningless. Consider serving static resources from an independent domain name so that requests for them carry no cookies, reducing cookie transmission.
CDN acceleration
A CDN (Content Delivery Network) is essentially still a cache: data is cached at the location nearest to the user, so the user can obtain it as fast as possible.
CDNs are deployed in the machine rooms of network operators, who are also the end users' network service providers, so the first hop of a user's request reaches a CDN server. When the CDN holds the resources the browser requests, it returns them directly, speeding up user access and reducing load on the data center.
Reverse proxy
A reverse proxy server sits at the edge of the website's machine room and receives HTTP requests on behalf of the web servers, which helps protect the website's security. Beyond security, a reverse proxy can also speed up web requests when configured with caching.
When a user first accesses static content, it is cached on the reverse proxy, so later users receive it directly from the proxy, speeding up web responses. Some websites also cache popular dynamic content on the proxy; when that content changes, an internal notification mechanism invalidates the proxy's cache, and the proxy reloads and re-caches the latest version.
In addition, a reverse proxy can provide load balancing, improving the website's performance under high concurrency.
Application server performance optimization
The application server handles the website's business: the site's business code is deployed here. The main optimization methods are caching, clustering, and asynchronous processing.
Cache
The first law of performance optimization: prefer caching. A cache stores data in a storage medium with relatively fast access so the system can serve it quickly.
Usage strategy
The first prerequisite for caching is the existence of hotspot data. Without hotspot access, most data will be evicted from the cache before it is ever read again. The cached data's read-to-write ratio must also be high enough, at least around 2:1, for caching to be worthwhile.
Data inconsistency
Cached data is generally given an expiry time; once it expires, the data must be reloaded from the database. The application must therefore tolerate inconsistency for a certain period. In internet applications this delay is usually acceptable, but each application should be judged on its own. An alternative strategy is to update the cache immediately whenever the data is updated, but this adds system overhead and transaction-consistency issues.
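The expiry-based strategy described above can be sketched in a few lines. This is a toy illustration, not a production cache; the class and loader names are my own.

```python
import time

class TTLCache:
    """Minimal expiry-based cache: entries past their TTL are reloaded from a backing store."""
    def __init__(self, ttl_seconds: float, load_from_db):
        self.ttl = ttl_seconds
        self.load_from_db = load_from_db  # fallback loader, e.g. a database query
        self.store = {}                   # key -> (value, expires_at)

    def get(self, key):
        entry = self.store.get(key)
        if entry is not None and entry[1] > time.monotonic():
            return entry[0]               # fresh hit: stale for at most ttl seconds
        value = self.load_from_db(key)    # missing or expired: reload from the database
        self.store[key] = (value, time.monotonic() + self.ttl)
        return value

calls = []
cache = TTLCache(60.0, lambda k: calls.append(k) or f"db:{k}")
```

Within the TTL window, repeated reads never touch the database; the price is that updates made in the database during that window are invisible to readers.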
Cache preheating
A freshly started cache system holds no data, and while the cache is being populated, system performance and database load are both poor. It is therefore best to load hot data when the cache system starts. This preloading is called cache preheating.
Cache Availability
As the business grows, the cache bears most of the data-access pressure. If the cache service crashes, the database cannot withstand that pressure and goes down as well, making the entire website unavailable.
Cache availability can be improved to a degree by distributing cached data across multiple servers in a distributed cache cluster. When one cache server goes down, only part of the cached data is lost, and reloading that part from the database does not significantly impact the database.
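Distributing cached data across a cluster requires a routing rule that maps each key to one server. A minimal sketch follows, using simple hash-modulo routing; note that real clusters often prefer consistent hashing, which moves far less data when a node is added or removed. The node names are illustrative.

```python
import hashlib

def cache_node(key: str, nodes: list) -> str:
    """Route a cache key to one server in the cluster (simple hash-modulo scheme)."""
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return nodes[digest % len(nodes)]

nodes = ["cache-0", "cache-1", "cache-2"]
assignments = {k: cache_node(k, nodes) for k in ["user:1", "user:2", "user:3"]}
```

With this layout, losing one node loses only the keys that hashed to it; the other two thirds of the cache stay warm.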
Cache penetration
If a non-existent key is requested continuously at high concurrency, whether through faulty business logic or a malicious attack, every request falls through to the database because the data is not in the cache. This puts heavy pressure on the database and may even crash it. Simple countermeasures are to cache the non-existent result (null) or to use a Bloom filter.
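The null-caching countermeasure can be sketched as follows; the class is a toy illustration, and the sentinel trick simply lets the cache distinguish "known to be absent" from "never looked up".

```python
_MISSING = object()  # sentinel cached for keys that do not exist in the database

class PenetrationSafeCache:
    """Cache that also stores 'not found' results, so repeated misses skip the database."""
    def __init__(self, load_from_db):
        self.load_from_db = load_from_db  # returns None when the key does not exist
        self.store = {}

    def get(self, key):
        if key in self.store:
            value = self.store[key]
            return None if value is _MISSING else value
        value = self.load_from_db(key)          # only the first miss reaches the database
        self.store[key] = _MISSING if value is None else value
        return value

db_hits = []
def fake_db(key):
    db_hits.append(key)
    return None  # simulate a key that does not exist

cache = PenetrationSafeCache(fake_db)
cache.get("ghost"); cache.get("ghost"); cache.get("ghost")
```

In production the cached null should also carry a short expiry time, so that a key created later becomes visible; a Bloom filter achieves the same protection with far less memory, at the cost of occasional false positives.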
Asynchronous processing
Using message queues to make calls asynchronous improves both the scalability and the performance of the system. Because a message queue server processes messages much faster than a database (and also scales better), it effectively reduces the user's response latency and relieves the database's load.
The message queue also has an excellent peak-shaving effect: transaction messages generated in a short, high-concurrency burst are stored in the queue and processed asynchronously, flattening the concurrent transactions of the peak period. In e-commerce promotions, reasonable use of a message queue can effectively absorb the surge of orders at the start of the event.
Note that because the response is returned to the user as soon as the data is written to the message queue, the data may still fail later business validation, database writes, and so on. After introducing a message queue for asynchronous processing, the business process must be adjusted accordingly. For example, after an order is submitted, the order data is written to the message queue, but the user cannot immediately be told the order succeeded; only after the queue's order-consumer process has handled the order, or even after the goods have left the warehouse, is the user notified by email or SMS that the order succeeded, to avoid transaction disputes.
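The order flow above can be sketched with Python's standard-library queue and a worker thread standing in for the message queue server and consumer process; all names here are illustrative.

```python
import queue
import threading

order_queue = queue.Queue()
processed = []

def order_consumer():
    """Consumer process: does the real work (validation, DB write, notification) later."""
    while True:
        order = order_queue.get()
        if order is None:          # shutdown signal
            break
        processed.append(order)    # stand-in for validate + write to database + notify user
        order_queue.task_done()

def submit_order(order):
    """Web-tier handler: enqueue and return immediately."""
    order_queue.put(order)
    return "order received"        # not "order succeeded" -- it has not been processed yet

worker = threading.Thread(target=order_consumer, daemon=True)
worker.start()
for i in range(3):
    submit_order({"order_id": i})
order_queue.join()                 # wait for the consumer to drain the queue
order_queue.put(None)
```

The key business change is visible in `submit_order`: the web tier acknowledges receipt, not success, and final confirmation reaches the user through a later notification.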
Note: Anything that can be done later should be done later.
Cluster
Under high concurrent access, load-balancing technology can build a cluster of multiple servers for one application and distribute concurrent requests across them. This prevents a single server from responding slowly under excessive load and gives user requests better response latency.
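The simplest distribution policy a load balancer can use is round-robin: rotate requests across the cluster's servers in turn. A minimal sketch, with hypothetical server names:

```python
import itertools

class RoundRobinBalancer:
    """Simplest load-balancing policy: hand each request to the next server in rotation."""
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)

    def pick(self) -> str:
        return next(self._cycle)

lb = RoundRobinBalancer(["app-1", "app-2", "app-3"])
targets = [lb.pick() for _ in range(6)]
```

Real balancers add health checks, weights, and session affinity on top of a policy like this, but the core idea of spreading requests is the same.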
Code optimization
Optimizing business code can greatly improve system performance. There are many code optimization methods; here we summarize the more important ones.
Multithreading
Concurrent multi-user access is a basic requirement of application systems. A large website may see tens of thousands of concurrent users, and a single server may handle hundreds. From the perspective of resource utilization, there are two main reasons to use multithreading: IO blocking and multiple CPUs.
Web applications generally run inside a web server container, which usually manages the threads for user requests. But whether a thread is managed by the container or created by the application itself, how many threads should a server start? Assuming the server runs only tasks of one type, a simplified estimate for reference is: number of threads = [task execution time / (task execution time - IO wait time)] × number of CPU cores.
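The estimation formula above can be written directly as a function; the name and the rounding choice are mine.

```python
def optimal_threads(task_time_ms: float, io_wait_ms: float, cpu_cores: int) -> int:
    """Simplified thread-count estimate:
    threads = [task time / (task time - IO wait)] * CPU cores."""
    cpu_time = task_time_ms - io_wait_ms   # the portion actually spent on the CPU
    return round(task_time_ms / cpu_time * cpu_cores)

# A task taking 100 ms, of which 90 ms is IO wait, on 8 cores:
# 100 / (100 - 90) * 8 = 80 threads
print(optimal_threads(100, 90, 8))  # → 80
```

The intuition: while one thread waits on IO, the CPU can run others, so the more IO-bound the task, the more threads each core can keep busy. For a pure CPU task (no IO wait), the formula collapses to one thread per core.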
One issue that needs attention in multithreaded programming is thread safety, that is, synchronization. The main means of achieving it are using stateless objects and thread-local objects (avoiding shared data) and using locks.
Resource reuse
While the system is running, minimize the creation and destruction of expensive system resources such as database connections, network connections, threads, and complex objects. From a programming perspective, there are two main patterns for resource reuse: singleton and object pool.
Although the singleton is one of the more criticized patterns in the classic GoF catalogue, web development today mostly uses the anemic model, in which objects from Service down to Dao are stateless and need not be created repeatedly; in that situation the singleton pattern is natural.
The object pool pattern reduces object creation and resource consumption by reusing instances. Connection pools and thread pools are essentially object pools, and their pool-management logic is basically the same.
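The shared pool-management logic mentioned above amounts to: pre-create instances, hand them out, and take them back. A toy sketch (class and factory names are mine):

```python
import queue

class ObjectPool:
    """Minimal object pool: reuse expensive instances (connections, threads, buffers)
    instead of creating and destroying them per use."""
    def __init__(self, factory, size: int):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(factory())  # pre-create all instances up front

    def acquire(self):
        return self._pool.get()        # blocks when the pool is exhausted

    def release(self, obj):
        self._pool.put(obj)            # return the instance for reuse

created = []
def make_conn():
    created.append(1)
    return object()                    # stand-in for an expensive connection

pool = ObjectPool(make_conn, size=1)
a = pool.acquire()
pool.release(a)
b = pool.acquire()                     # reuses the released instance, no new creation
```

This is exactly the shape of a connection pool or thread pool: the cost of creation is paid once at startup, and steady-state traffic only pays for the acquire/release handoff.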
Garbage Collection
Today's mainstream programming languages such as Java, PHP, and Go all have automatic garbage collection. Garbage collection can have a large impact on a system's performance characteristics. Understanding the garbage collection mechanism of the language you use helps with program optimization and parameter tuning, and with writing memory-friendly code.
Storage performance optimization
In application systems, massive data reads and writes put huge pressure on the disk. Although a cache can absorb some of the read pressure, the disk remains the system's most serious bottleneck.
Mechanical hard disks and solid-state drives
The mechanical hard disk is still the most commonly used disk. A motor drives the head arm to move the head to the specified position on the platter to access data, so a mechanical disk's performance differs greatly between sequential and random access.
A solid-state drive has no mechanical parts; data is stored in silicon chips that retain it persistently, so it can be accessed randomly and quickly, much like memory. Most data access in application systems is random, and in that case SSDs perform better.
B+ tree and LSM tree
To improve data access, file systems and database systems usually store data sorted, to speed up retrieval. This requires keeping the data ordered after updates, inserts, and deletes. Traditional relational databases use the B+ tree, as shown in the figure.
The B+ tree is an N-ary sorted tree optimized for disk storage. It is stored on disk in units of tree nodes. Starting from the root, the search finds the node number and disk location of the required data, loads the node into memory, and continues searching until the data is found.
At present, many NoSQL products use the LSM tree as their main data structure, as shown in the figure.
An LSM tree can be regarded as an N-level merge tree. Write operations happen in memory and always create a new record (a modification records the new value; a deletion records a delete flag). In memory the data is kept as a sorted tree; when its size exceeds a set threshold, it is merged with the newest sorted tree on disk. When that on-disk tree in turn exceeds its own threshold, it is merged with the sorted tree at the next level down. During merging, the latest updated data overwrites older data (or is recorded as a new version).
A read operation always searches the in-memory sorted tree first; if the data is not found there, it searches the on-disk sorted trees in order, newest to oldest.
A data update on an LSM tree needs no disk access and completes in memory, which is much faster than on a B+ tree. When the workload is write-heavy and reads focus on recently written data, the LSM tree greatly reduces the number of disk accesses and speeds up access.
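The write and read paths described above can be captured in a toy sketch. This models only the shape of the algorithm (a memtable flushed as immutable sorted runs, reads checking newest data first); it ignores compaction, delete flags, and persistence.

```python
class TinyLSM:
    """Toy LSM tree: writes go to an in-memory sorted table; when it fills it is
    flushed as an immutable sorted run; reads check memory, then runs newest-first."""
    def __init__(self, memtable_limit: int = 2):
        self.memtable = {}
        self.limit = memtable_limit
        self.runs = []                   # flushed sorted runs, newest first

    def put(self, key, value):
        self.memtable[key] = value       # updates simply record a new entry in memory
        if len(self.memtable) >= self.limit:
            run = dict(sorted(self.memtable.items()))
            self.runs.insert(0, run)     # stand-in for "merge a sorted tree to disk"
            self.memtable = {}

    def get(self, key):
        if key in self.memtable:         # in-memory tree is searched first
            return self.memtable[key]
        for run in self.runs:            # newer runs shadow older ones
            if key in run:
                return run[key]
        return None

lsm = TinyLSM()
lsm.put("a", 1); lsm.put("b", 2)   # second put triggers a flush
lsm.put("a", 99)                   # the newer value for "a" lives only in the memtable
```

Note how the update to `"a"` touched no "disk" at all, and how a read finds the newest value first even though an older one still exists in a flushed run.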
Extended reading: [Distributed-Basic] Data Storage and Retrieval
RAID
RAID (Redundant Array of Inexpensive Disks) technology mainly improves disk access performance and enhances disk availability and fault tolerance. Server-class machines support multiple disks; with RAID, data can be read and written concurrently across several disks and backed up. Several RAID levels are in common use, as shown in the figure.
RAID 0
When data is written from the memory buffer to disk, it is split into N parts according to the number of disks and written to all N disks simultaneously, so the overall write speed is N times that of a single disk; reads work the same way. RAID 0 therefore has extremely fast reads and writes, but it keeps no backup: if any one of the N disks fails, data integrity is destroyed and the data on all disks is effectively lost.
RAID 1
When data is written to disk, the same data is written to two disks at once, so the failure of either disk loses no data. Insert a new disk and it is repaired automatically by copying, giving extremely high reliability.
RAID 10
RAID 10 combines RAID 0 and RAID 1: all disks are split into two halves that mirror each other's writes (the RAID 1 part), while within each half of N/2 disks the data is striped for concurrent reads and writes (the RAID 0 part). This improves both reliability and performance, but disk utilization is low: half the disks hold backup copies.
RAID 3
On a single server, two disks rarely fail at the same time. If, when one disk fails, the data on the other disks can be used to recover it, then reliability and performance are both ensured, and disk utilization improves greatly.
When data is written, it is split into N-1 parts written concurrently to N-1 disks, and parity data is recorded on the Nth disk. If any one disk fails (including the parity disk), it can be repaired from the data on the other N-1 disks.
However, when data is modified frequently, every modification forces the Nth disk to rewrite its parity data. Frequent writes make the parity disk fail sooner than the others and require frequent replacement, so RAID 3 is rarely used in practice.
RAID 5
RAID 5 is very similar to RAID 3, but the parity data is not confined to the Nth disk; it is written across all disks in rotation. Parity updates are thus spread evenly over all disks, avoiding RAID 3's hot parity disk.
RAID 6
If the data requires higher reliability and must survive two simultaneous disk failures, RAID 6 can be used. RAID 6 is similar to RAID 5, but data is written to only N-2 disks, and two sets of parity information (generated with different algorithms) are written in rotation across the remaining two disks.
The comparison of various RAID technologies is shown in the table below.
RAID type | Access speed | Data reliability | Disk utilization |
---|---|---|---|
RAID 0 | Very fast | Very low | 100% |
RAID 1 | Slow | Very high | 50% |
RAID 10 | Medium | Very high | 50% |
RAID 5 | Fast | High | (N - 1)/N |
RAID 6 | Fast | High | (N - 2)/N |
RAID can be implemented in hardware, via a dedicated RAID card or direct motherboard support, or in software. RAID is widely used with traditional relational databases and file systems, but in the NoSQL stores and distributed file systems that large websites prefer, RAID has largely fallen out of favor.