
The so-called scalability of a system means that its service processing capacity can be expanded or reduced simply by changing the number of deployed servers, without changing the system's software or hardware design.

Basic design

In the gradual evolution of an application system, the most important technical means is the server cluster: the processing capacity of the whole cluster is enhanced by continuously adding servers to it.

Generally speaking, the scalability design of a system falls into two categories: one achieves scalability by physically separating functions, the other achieves scalability for a single function through a cluster. In the former, different servers deploy different services and provide different functions; in the latter, multiple servers in a cluster deploy the same service and provide the same function.

In the early days of an application system, processing capacity is increased by adding servers that take over some functions and services from the existing servers. This separation can be vertical or horizontal: vertical separation stratifies the business into layers, while horizontal separation splits off different businesses.

Deploying different functions separately achieves a certain degree of scalability, but as traffic grows, even when the separation reaches the smallest granularity a single server cannot meet the business scale. A server cluster must then be used: the same service is deployed on multiple servers, which form a cluster and provide the service to the outside world as a whole.

Application layer design

Application servers should be designed to be stateless. If the servers deployed with the same application form a cluster, each user request can be sent to any server in the cluster for processing. As long as user requests can be distributed to the servers in the cluster according to some rule, those servers constitute an application server cluster.

The device that distributes requests this way is the so-called load balancer. Load balancing is an indispensable basic technique for websites: it not only improves a website's availability but also enables its scalability. The concrete implementations are diverse, but the basic techniques are essentially the following.

HTTP redirect

An HTTP redirect server is an ordinary application server whose only job is to compute a real web server address from the user's HTTP request, write that address into an HTTP redirect response (status code 302), and return it to the user's browser.
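
A minimal sketch of such a redirect server, assuming a round-robin choice over hypothetical web server addresses:

```python
# An HTTP redirect load balancer: compute a real web server address and
# return it to the browser in a 302 response.
import itertools
from http.server import BaseHTTPRequestHandler, HTTPServer

WEB_SERVERS = itertools.cycle(["http://114.100.80.1", "http://114.100.80.2"])

class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(302)
        self.send_header("Location", next(WEB_SERVERS) + self.path)
        self.end_headers()

HTTPServer(("", 8080), RedirectHandler).serve_forever()
```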

The advantage of this scheme is simplicity. The disadvantages: the browser needs two requests to the server to complete one visit, so performance is poor; the redirect server's own processing capacity may become a bottleneck, limiting the scalability of the whole cluster; and 302 redirects may be judged by search engines as SEO cheating, lowering the search ranking. This scheme is therefore rarely seen in practice.

DNS resolution

We can register multiple A records for the application's domain name in the DNS server. Each resolution request then computes a different IP address according to the load balancing algorithm and returns it, so the multiple servers configured in the A records form a cluster and achieve load balancing.
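
From the client side, the effect of multiple A records is easy to observe; example.com is a placeholder domain:

```python
# Resolve a domain and list all addresses returned by its A records.
import socket

infos = socket.getaddrinfo("example.com", 80, proto=socket.IPPROTO_TCP)
print({info[4][0] for info in infos})   # one IP per A record
```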

The advantage of DNS domain name resolution load balancing is that it hands the load balancing work to DNS, saving the website the trouble of managing and maintaining load balancing servers. In addition, many DNS services support resolution based on geographic location, resolving the domain name to the address of the server geographically closest to the user, which speeds up access.

But DNS domain name resolution load balancing also has disadvantages. DNS resolution is multi-level, and each level may cache A records; when a server goes offline, it can take a long time for a modified A record to take effect, during which DNS still resolves the domain name to the offline server and user requests fail. Moreover, control over DNS load balancing lies with the domain name service provider, so the website cannot make further improvements or exercise more powerful management.

Large websites therefore always use DNS domain name resolution only partially, as the first level of load balancing: the set of servers obtained by domain name resolution are not the physical servers that actually provide the web service, but internal servers that themselves provide load balancing. This group of internal load balancing servers performs load balancing and distributes requests to the real web servers.

Reverse proxy

Earlier we mentioned that a reverse proxy can cache resources to improve website performance. Most reverse proxy servers also provide load balancing: they manage a group of web servers and forward requests to different web servers according to the load balancing algorithm, and the responses processed by the web servers also return to the user through the reverse proxy server.

Because the reverse proxy server forwards requests at the HTTP protocol level, this is also called application layer load balancing. Its advantage is that it is integrated with the reverse proxy function and easy to deploy. Its disadvantage is that the reverse proxy server is a transit point for all requests and responses, so its performance may become a bottleneck.
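
For illustration, a toy application layer load balancer can be written with Python's standard library; the backend addresses are hypothetical, and a production site would use a real reverse proxy such as nginx:

```python
# A toy reverse proxy: forward each request to a round-robin backend and
# relay the backend's response to the client.
import itertools
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

BACKENDS = itertools.cycle(["http://127.0.0.1:8081", "http://127.0.0.1:8082"])

class ProxyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        with urllib.request.urlopen(next(BACKENDS) + self.path) as upstream:
            body = upstream.read()
            status = upstream.status
        self.send_response(status)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

HTTPServer(("", 8080), ProxyHandler).serve_forever()
```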

IP load balancing

At the network layer, load balancing can be performed by modifying the destination address of the request. After the user's request packet reaches the load balancing server, the server computes a real web server address according to the load balancing algorithm and rewrites the packet's destination IP address to that address. After the real web server finishes processing, the response packet returns to the load balancing server, which rewrites the packet's source address to its own IP address and sends it to the user's browser.

The key question is how the real web server's response packet gets back to the load balancing server. One solution is for the load balancing server to modify the source address while modifying the destination IP, setting the packet's source address to its own IP, i.e., source address translation (SNAT), so that the web server's response returns to the load balancer. Another solution is to make the load balancing server also serve as the gateway of the real server cluster, so that all response traffic passes through it.

IP load balancing completes data distribution in the kernel, so it has better processing performance than reverse proxy load balancing (which distributes data in user-space applications). However, since all requests and responses must pass through the load balancing server, the cluster's maximum response throughput is restricted by the bandwidth of the load balancer's network card.

So, can the load balancing server only distribute requests, while the response data is directly returned from the real physical server to the user?

Data link layer load balancing

As the name implies, data link layer load balancing modifies the MAC address at the data link layer of the communication protocol. During distribution, the IP address is not modified; only the destination MAC address is changed. By configuring the virtual IP of every machine in the real server cluster to be the same as the load balancer's IP address, data distribution is achieved without modifying the source or destination IP address of the packet.

Since the IP of the real server that processes the request is the same as the destination IP of the request packet, no address translation through the load balancing server is needed, and the response packet can be returned directly to the user's browser, preventing the load balancer's network card bandwidth from becoming a bottleneck. This load balancing method is also called direct routing (DR).

[Figure: data link layer load balancing with direct routing (DR)]

As shown in the figure, after a user request reaches the load balancing server 114.100.80.10, the load balancer changes the destination MAC address of the request to 00:0c:29:d2. Because the virtual IP address of every server in the web server cluster is the same as the load balancer's IP address, the data can be transmitted normally to the server with MAC address 00:0c:29:d2. After processing, that server sends the response to the website's gateway server, and the gateway server sends the packet directly to the user's browser; the response data does not pass through the load balancing server.

Data link layer load balancing is currently a widely used load balancing method. The best link layer load balancing open source product on the Linux platform is LVS (Linux Virtual Server).

Load balancing algorithm

The foregoing described how request data is delivered to the web server; the concrete load balancing algorithms are usually the following (minimal sketches of them appear after the list):

  • Round Robin (RR): all requests are distributed to the application servers in turn; suitable when all servers have the same hardware.
  • Weighted Round Robin (WRR): on the basis of round robin, requests are distributed according to configured weights that reflect each server's hardware performance.
  • Random: requests are randomly distributed to the application servers. In many cases this scheme is simple and practical; even when server hardware differs, a weighted random algorithm can be used.
  • Least Connections: record the number of connections (requests) each server is currently handling and send new requests to the server with the fewest; a weighted least connections variant is likewise possible.
  • Source Hashing: hash the source IP address of the request to pick the application server, so that requests from the same IP are always handled by the same server.
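
As a hedged illustration, here is a minimal Python sketch of these algorithms; the server names and weights are hypothetical:

```python
# Minimal sketches of the load balancing algorithms above; SERVERS and
# WEIGHTS are hypothetical stand-ins for a real server list.
import hashlib
import itertools
import random

SERVERS = ["web1", "web2", "web3"]
WEIGHTS = {"web1": 5, "web2": 3, "web3": 2}
_round_robin = itertools.cycle(SERVERS)

def round_robin() -> str:
    return next(_round_robin)                  # each server in turn

def weighted_random() -> str:
    # Weighted choice; a weighted round robin would cycle proportionally.
    return random.choices(SERVERS, weights=[WEIGHTS[s] for s in SERVERS])[0]

def least_connections(active: dict) -> str:
    # active maps server -> number of in-flight requests.
    return min(SERVERS, key=lambda s: active[s])

def source_hash(client_ip: str) -> str:
    # The same client IP always hashes to the same server.
    h = int(hashlib.md5(client_ip.encode()).hexdigest(), 16)
    return SERVERS[h % len(SERVERS)]
```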

Cache design

A distributed cache cluster differs from an application server cluster, where every server deploys the same application: the data cached on different servers of a distributed cache cluster is different, so a cache request cannot be handled by just any server in the cluster. The server that holds the needed data must be located before the cache can be accessed. This seriously constrains the scalability design of a distributed cache cluster, because a newly added cache server holds no data yet, while a cache server taken offline still holds much of the site's hot data.

A newly added cache server must have the least possible impact on the whole distributed cache cluster; that is, after the new server joins, as much of the data already cached across the cluster as possible should still be accessible. This is the main goal of the scalability design of a distributed cache cluster.

Routing algorithm

In a distributed cache server cluster, the routing algorithm is very important to the management of the cluster: like the load balancing algorithm, it decides which server in the cluster should be accessed.

Remainder Hash algorithm

Remainder Hash is the simplest routing algorithm: divide the hash value of the cached data's KEY by the number of servers, and the remainder is the subscript into the server list. Because hash codes are fairly random, the remainder Hash routing algorithm distributes the cached data quite evenly across the whole server cluster.
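
In code the route is one modulo operation; the addresses are hypothetical, and a stable hash such as CRC32 is used here because Python's built-in hash() is randomized per process:

```python
# Remainder Hash routing: the remainder of hash(KEY) divided by the server
# count is the index into the (hypothetical) server list.
import zlib

SERVERS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

def route(key: str) -> str:
    return SERVERS[zlib.crc32(key.encode()) % len(SERVERS)]
```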

A slight improvement to the remainder Hash algorithm yields weighted routing, analogous to weighted load balancing. In fact, if the scalability of the cache server cluster need not be considered, remainder Hash can satisfy the vast majority of cache routing requirements.

However, serious problems arise when the distributed cache cluster needs to be expanded. It is easy to calculate that expanding from 3 servers to 4 causes about 75% (3/4) of the cached data to miss, and the proportion grows with cluster size: adding one server to a cluster of 100 raises the miss probability to 99% (N/(N+1)).
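
A quick simulation, assuming uniformly distributed hash values, reproduces these numbers:

```python
# Growing from 3 to 4 servers: a key stays put only when its hash gives the
# same remainder mod 3 and mod 4, which happens for ~25% of keys.
total = 100_000
moved = sum(1 for h in range(total) if h % 3 != h % 4)
print(moved / total)   # ~0.75, i.e. about three quarters of the cache misses
```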

One workaround is to expand the cache cluster when site traffic is lowest, when the load impact on the database is minimal, and then gradually warm the cache with simulated requests so that the data is redistributed across the cache servers. But this places demands on the business scenario: a specific time window must be chosen, which is quite restrictive.

Consistent Hash algorithm

The consistent Hash algorithm implements the Hash mapping from KEY to the cache server through a data structure called a consistent Hash ring, as shown in the figure.

[Figure: the consistent Hash ring]

The algorithm proceeds as follows. First construct an integer ring of length 0~2³² (called the consistent Hash ring) and place the cache server nodes on it according to the hash value of each node's name (whose range is also 0~2³²). Then compute the hash value of the KEY of the data to be cached (range also 0~2³²) and search clockwise on the ring for the cache server node closest to the KEY's hash value; this completes the KEY-to-server mapping lookup.
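
The lookup is easy to express in code. Below is a minimal sketch of a consistent Hash ring, assuming MD5 as the hash function; a sorted list plus binary search (bisect) stands in for the binary search tree mentioned later:

```python
# A minimal consistent Hash ring; MD5 is an assumed hash function and a
# sorted list with bisect plays the role of the binary search tree.
import bisect
import hashlib

RING_SIZE = 2 ** 32

def _hash(name: str) -> int:
    """Map a node name or cache KEY onto the 0..2^32 ring."""
    return int(hashlib.md5(name.encode()).hexdigest(), 16) % RING_SIZE

class HashRing:
    def __init__(self, nodes):
        self._ring = sorted((_hash(n), n) for n in nodes)

    def add_node(self, node: str) -> None:
        bisect.insort(self._ring, (_hash(node), node))

    def get_node(self, key: str) -> str:
        # Clockwise search: the first node whose hash is >= hash(key),
        # wrapping around to the start of the ring if necessary.
        i = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[i % len(self._ring)][1]

ring = HashRing(["NODE0", "NODE1", "NODE2"])
server = ring.get_node("KEY0")   # the node clockwise-nearest to KEY0
ring.add_node("NODE3")           # only keys in one ring segment move
```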

When the cache server cluster needs to be expanded, only the hash value of the newly added node's name (NODE3) needs to be placed on the consistent Hash ring. Since each KEY searches clockwise for the nearest node, the new node affects only one segment of the entire ring, shown as the dark section in the figure below.

[Figure: the consistent Hash ring after adding NODE3; the affected segment is shown dark]

After the new node NODE3 is added, most of the original KEYs still map to their original nodes; only KEY3 and KEY0 move from NODE1 to NODE3. This ensures that most of the cached data continues to hit.

In concrete implementations, this consistent Hash ring of length 2³² is usually realized with a binary search tree: the Hash lookup is actually a search for the smallest value in the tree that is not less than the value being looked up. Of course, the rightmost leaf node of the tree is joined to the leftmost leaf node to close the ring.

However, the above algorithm still has a small problem. Assuming the loads of the original three servers were roughly equal, the newly added node NODE3 only takes over part of NODE1's load. That means NODE0 and NODE2 now carry roughly twice the cached data volume and load pressure of NODE1 and NODE3. This result is obviously not what we want.

The solution is also very simple: almost any problem in computing can be solved by adding a virtual layer, and the load imbalance of the consistent Hash algorithm is no exception. Each physical cache server is virtualized as a group of virtual cache servers, and the hash values of the virtual servers are placed on the Hash ring. A KEY first finds a virtual server node on the ring and from it obtains the information of the physical server.

In this way, when a new physical server node is added, a group of virtual nodes is added to the ring. If the number of virtual nodes is large enough, this group will affect virtual nodes spread across the whole ring, and those existing virtual nodes correspond to different physical nodes. The final result: adding a new cache server spreads its impact fairly evenly over all the servers already in the cluster, as shown in the figure.

[Figure: the consistent Hash ring with virtual nodes]

Obviously, the more virtual nodes per physical node, the more balanced the load between physical nodes and the more even the impact of a newly added physical server on the existing ones; but too many virtual nodes hurt performance. So, in practice, how many virtual nodes should a physical server get? A common empirical value is 150, but the number should be adjusted according to the cluster size and the required load balancing accuracy.
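
Under the same assumptions as the earlier sketch, the virtual layer only changes where nodes are placed on the ring; the "NODE1#37"-style suffixed names are an illustrative convention:

```python
# A sketch of the virtual-node variant: each physical server is placed on
# the ring REPLICAS times under suffixed names that map back to it.
import bisect
import hashlib

REPLICAS = 150  # the empirical value mentioned above

def _hash(name: str) -> int:
    return int(hashlib.md5(name.encode()).hexdigest(), 16) % (2 ** 32)

class VirtualHashRing:
    def __init__(self, nodes):
        self._ring = []
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # 150 virtual nodes scatter the physical node around the ring, so a
        # new server takes a little load from every existing server.
        for i in range(REPLICAS):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def get_node(self, key: str) -> str:
        i = bisect.bisect_left(self._ring, (_hash(key), ""))
        return self._ring[i % len(self._ring)][1]
```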

Data storage layer design

The scalability of the data storage server cluster puts higher requirements on data durability and availability, because the data storage servers must guarantee the availability and correctness of the data under all circumstances.

Scalability design of relational database cluster

At present, mainstream relational databases support data replication, and this feature makes it easy to scale the database. The figure below shows a MySQL cluster scalability scheme based on data replication.

[Figure: MySQL cluster scalability scheme based on master-slave replication]

In this architecture, multiple MySQL instances are divided into master and slaves. Data writes go to the master server, and the master synchronizes data to the other (slave) servers in the cluster; data reads and offline operations such as data analysis are carried out on the slave servers.
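
A minimal sketch of the read-write splitting this architecture implies; the DSNs are hypothetical, and real systems usually route inside a data access layer or middleware:

```python
# Route writes to the master and spread reads across the slaves.
import random

MASTER = "mysql://master:3306"                      # hypothetical DSN
SLAVES = ["mysql://slave1:3306", "mysql://slave2:3306"]

def choose_server(sql: str) -> str:
    if sql.lstrip().lower().startswith("select"):   # reads -> any slave
        return random.choice(SLAVES)
    return MASTER                                   # writes -> the master
```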

Besides master-slave read-write splitting, the business segmentation pattern mentioned earlier can also be applied to the database: the data tables of different businesses are deployed on different database clusters, commonly known as splitting the database by business.

In a large-scale system, even after the database is split and master-slave replication is in place, some tables still hold too much data for a single table, and sharding is needed: one table is split and stored across multiple databases. A commonly used middleware for such database and table sharding is Mycat.
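
The core idea of sharding can be sketched as follows; the naming scheme and shard counts are made up for illustration, and middleware such as Mycat performs this kind of routing (plus SQL rewriting) internally:

```python
# Route each row to one of db_count * tables_per_db physical shards
# by its sharding key (here, user_id).
def locate(user_id: int, db_count: int = 4, tables_per_db: int = 8) -> str:
    slot = user_id % (db_count * tables_per_db)     # 32 shards in total
    return f"db_{slot // tables_per_db}.user_{slot % tables_per_db}"

print(locate(12345))   # -> db_3.user_1
```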

Compared with the powerful functionality of a relational database itself, today's database and table sharding middleware is rather rudimentary and restricts the use of certain relational database features. But when a website faces the pressure of ever-growing massive business data, it has to rely on the cluster scalability of a distributed relational database. At that point the business must sidestep the shortcomings of distributed relational databases: avoid transactions or replace them with transaction compensation mechanisms, and decompose data access logic to avoid JOIN operations, etc.

Further reading:
[Distributed-Key Points] Data Replication
[Distributed-Key Points] Data Partition

