Abstract: When people talk about "distributed", "microservice", and "cloud native", much of the discussion is about how "software services" achieve elastic scaling and load balancing. The API gateway is the first hurdle and a key component. Let's take a look at its development history, current state, and future direction.

As an important part of modern distributed, microservice, and cloud-native systems, the API gateway is also an important topic of discussion. Let's talk about load balancing and the current state of API gateways.

Now that everyone is talking about "distributed", "microservice", and "cloud native", the discussion goes beyond the "software service" function itself to how services can scale to support large applications. Load balancing and the API gateway are the first hurdle and key components of that. Let's take a look at their development history, current state, and future direction.

Load balancing

When it comes to gateways, load balancing has to come up first. Generally there are two approaches: server-side load balancing and client-side load balancing (such as Spring Cloud Ribbon & Eureka). For non-REST services with high real-time requirements, client-side load balancing is a common choice. In games such as Honor of Kings and League of Legends, the daily activities shown on entering the game are REST services, while team matches are usually non-REST services. The same applies to online collaboration products: the server side of WeLink's IM/Presence functions can be built as REST services, whereas REST cannot meet the real-time requirements of WeLink Meeting's real-time collaboration. Such features mostly use client-side load balancing, and the basic process is as follows:
image.png

1. A registration/discovery mechanism connects the load-balancing gateway and the servers.
2. The client requests a list of conference servers from the gateway; the gateway applies some business logic to compute a candidate server list and returns it to the client.
3. With the server list in hand, the client sends a probe message to each server and chooses one based on the probe response time (the network conditions between client and server) and the server's load.
4. The client and the chosen server establish a business connection over an agreed protocol.
5. If either the client or the server fails, the client repeats steps 2-4 to restore the business connection.
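The server-selection logic in step 3 can be sketched as a simple scoring function over the probe results. This is an illustrative sketch only: the field names, weights, and the linear scoring formula are assumptions for demonstration, not the actual algorithm of any product mentioned here.

```python
def pick_server(candidates, rtt_weight=0.7, load_weight=0.3):
    """Pick the candidate with the lowest combined score.

    candidates: list of dicts like {"host": ..., "rtt_ms": ..., "load": 0..1}.
    """
    def score(server):
        # Lower probe round-trip time and lower reported load are both better.
        return rtt_weight * server["rtt_ms"] + load_weight * (server["load"] * 100)
    return min(candidates, key=score)

servers = [
    {"host": "conf-a.example.com", "rtt_ms": 40, "load": 0.9},
    {"host": "conf-b.example.com", "rtt_ms": 55, "load": 0.2},
]
best = pick_server(servers)  # conf-b wins: a slightly slower probe, but far less loaded
```

A real client would also re-probe periodically, since both network conditions and server load drift over time (which is what step 5 handles on failure).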

As seen above, client-side load balancing involves a relatively complicated process, but once a connection is established its performance is optimal. However, it cannot offer the continuity guarantees of a server-side REST service: once a server fails, there is a brief service interruption. The scheme therefore suits applications with high real-time requirements (such as those mentioned above). For REST services, L4 load balancing (F5, LVS, Nginx, HAProxy, etc.) and L7 load balancing (HAProxy, Nginx, Apache, MySQL Proxy, etc.) are the common approach. At this ArchSummit, some vendors introduced their L4 load-balancing solutions. Let's take a rough look at the overall architecture and development history of load balancing in distributed systems.

Looking at its development, load balancing has evolved roughly as follows as application scale grows:
image.png
image.png
image.png
image.png

The current state of API Gateway

image.png
image.png

With the popularity of microservices, API gateways have developed vigorously. An API gateway's job is to distribute incoming requests and provide customers with a unified API entrance. There are usually two patterns: a single API gateway, or Backend for Frontend (BFF) API gateways. Is there a third mode? For a product I built earlier, the basic structure of the gateway was as follows:
image.png

1. The client must log in first. The login authentication response includes a list of API URLs for the different services.
2. The client then uses the corresponding API URL for each API request, so the client itself performs most of the API distribution work.
3. At this point the API gateway's main task is no longer the original API forwarding function; it is to simplify the back-end services by implementing their common, shared functions.
4. In this scenario there may even be no API gateway at all: APIs can face the application services directly, and the application services then dispatch to microservices for business processing.
image.png
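Steps 1-2 above can be sketched as a client that keeps the service-to-URL map delivered in the login response and builds request URLs from it, doing the "API distribution" itself. The response format, service names, and URLs below are hypothetical.

```python
class ApiClient:
    """Client-side API dispatch driven by the login response (illustrative)."""

    def __init__(self, login_response):
        # Assumed shape: {"api_urls": {"im": "https://im.example.com/v1", ...}}
        self.api_urls = login_response["api_urls"]

    def url_for(self, service, path):
        # Join the service's base URL with the request path,
        # tolerating a trailing/leading slash on either side.
        base = self.api_urls[service]
        return base.rstrip("/") + "/" + path.lstrip("/")

client = ApiClient({"api_urls": {"im": "https://im.example.com/v1"}})
client.url_for("im", "/messages")  # "https://im.example.com/v1/messages"
```

With this in place, the gateway behind those URLs can stay thin, which matches point 3 above: its value shifts from routing to shared back-end concerns.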

Back to the API gateway itself: its core design goal is uninterrupted business on the data plane. The services behind an API gateway are diverse, and the design of client APIs and applications is not under your control, so you can hardly require every connected service and client to be fault-tolerant. The gateway itself must therefore process every request correctly and meet a high SLA. Mainstream API gateways in the industry are built on Nginx, mainly for the following reasons:

  • Support for hot restart
    Upgrading a data-plane component is a high-risk action: any fault may interrupt connections and destabilize the system. Unless your front-end LB (load balancer) can drain traffic quickly (and even then, in-flight requests may still be forcibly interrupted), hot restart of the data plane is critical.
  • Support for subscription-based dynamic routing
    API routes change relatively frequently, and the timeliness requirements are high. With a periodic full-synchronization scheme, syncing tens of thousands of entries at a time would slow the system down. Adding a subscription-based routing service center is therefore important: the routing data in etcd can be subscribed to and takes effect in near real time, and with only incremental data the performance pressure stays low.
  • Support for plug-in management
    Nginx offers a rich plug-in ecosystem. Different APIs and different users need different processing pipelines; if every request went through the same pipeline, there would inevitably be redundant work. Plug-in management improves performance to some extent and also allows processing chains to be added quickly during upgrades.
  • High-performance forwarding
    API gateways generally work as reverse proxies in front of many back-end APIs, and self-developed gateways often hit performance bottlenecks. Nginx's excellent performance and efficient traffic throughput are its core competitiveness.
  • Stateless and scalable
    The API gateway carries all requests of the entire system and must scale elastically with business volume. Through the service center and Nginx configuration management, nodes can be quickly added to or removed from the cluster and synchronized to LVS, achieving rapid horizontal scaling.
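The subscription-based routing idea above can be sketched as a local route table that applies an incremental stream of put/delete events, in the style of what an etcd watch would deliver, instead of re-syncing the full route set. The event format and the longest-prefix matching rule are illustrative assumptions.

```python
class RouteTable:
    """Local routing state kept current by incremental watch events (sketch)."""

    def __init__(self):
        self.routes = {}  # API path prefix -> upstream address

    def apply(self, event):
        # Assumed event shape: {"type": "put"|"delete", "key": ..., "value": ...}
        if event["type"] == "put":
            self.routes[event["key"]] = event["value"]
        elif event["type"] == "delete":
            self.routes.pop(event["key"], None)

    def resolve(self, path):
        # Longest-prefix match against the registered routes.
        best = None
        for prefix in self.routes:
            if path.startswith(prefix) and (best is None or len(prefix) > len(best)):
                best = prefix
        return self.routes.get(best)

table = RouteTable()
table.apply({"type": "put", "key": "/api/user", "value": "user-svc:8080"})
table.apply({"type": "put", "key": "/api", "value": "default-svc:8080"})
table.resolve("/api/user/42")  # "user-svc:8080"
```

Because each event touches only one route, applying updates stays cheap even with tens of thousands of entries, which is the point made above about incremental data.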

As systems grow more and more complex, many service modules not only handle their own business but also take on non-business responsibilities such as authentication and authorization, rate limiting, caching, logging, monitoring, retries, and circuit breaking. A batch of open-source API gateway implementations emerged to absorb these:

  • Tyk: an open-source, lightweight, fast and scalable API gateway that supports quotas and rate limits, authentication and data analytics, and multiple users and organizations, and provides a full RESTful API. (Written in Go)
  • Kong: can be thought of as an OpenResty application; OpenResty runs on top of Nginx and extends it with Lua. (Kong = OpenResty + Nginx + Lua)
  • Orange: an API gateway based on OpenResty that provides monitoring and management of APIs and custom rules, such as access statistics, traffic splitting, API redirection, API authentication, and a web application firewall. (Used by Tencent)
  • Netflix Zuul: an edge service that provides dynamic routing, monitoring, resiliency, security, and more. Zuul is Netflix's JVM-based router and server-side load balancer. (Spring Cloud)
  • Ambassador: an open-source microservice API gateway built on the Envoy proxy. It supports rapid publishing, monitoring, and updating for multiple teams, works as a Kubernetes ingress controller and load balancer, and integrates seamlessly with Istio. (Kubernetes-native)
  • Others... (for example apiaxle, an API gateway implemented in Node.js, and api-umbrella, implemented in Ruby.)

Beyond the functions above, as API gateways take on more and more responsibilities, their performance bottlenecks become more prominent. At this ArchSummit, some companies showed special functions of their own, and in their product evolution they have made architectural optimizations, mainly:

  • Implement the gateway in C++ to improve performance (*)
    – At this conference, two or three companies were using C++ implementations to improve performance. This has little to do with architecture; it is mostly a difference in the programming language itself.
  • Use a proprietary protocol to accelerate the mapping between APIs and services
    – Several companies do this, such as Tencent.
  • Layer the gateway into stable and changeable parts, enabling architectural evolution of the gateway's value-added services (*)
    – This is at the architecture level. My understanding is that it mainly concerns architecture evolution and operations: the customer-facing (stable) part and the back-end-service-facing (unstable) part are implemented and deployed in separate layers, so the customer-facing gateway service stays essentially unchanged. When back-end services are expanded or refactored, system upgrades have minimal impact on customers (details were not presented).
  • Implement rate limiting in the gateway, making back-end services more stable and simpler
    – This is easy to understand and is standard practice; it simplifies the design and implementation of back-end application services and microservices. Implementations differ between products: when Tencent introduced its game API gateway, it mentioned creating connection sessions statically at service start and removing the dynamic creation/recycling mechanism to achieve optimal performance.
  • Implement authentication in the gateway, simplifying the business logic of back-end services (suitable for authentication, not for permissions)
    – This is also a conventional approach with the same goal of simplifying back-end application services and microservices. It mostly suits authentication, but permission management may not fit some special application scenarios. For example, some applications divide permissions finely within a single function, down to content-level access control; a gateway generally cannot implement this on behalf of the service, which must still implement it itself.
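The rate-limiting point above is commonly implemented at the gateway as a token bucket: a request is admitted only if a token is available, and tokens refill at a fixed rate up to a burst capacity. The sketch below is a minimal single-process version for illustration, not the implementation of any gateway discussed here.

```python
import time

class TokenBucket:
    """Per-route or per-client rate limiter (minimal sketch)."""

    def __init__(self, rate, capacity, now=time.monotonic):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity    # start full
        self.now = now            # injectable clock, eases testing
        self.last = now()

    def allow(self):
        # Refill tokens for the time elapsed since the last call, capped
        # at capacity, then spend one token if available.
        t = self.now()
        self.tokens = min(self.capacity, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A gateway would typically keep one bucket per API key or per route and return HTTP 429 when `allow()` is false; a clustered deployment needs shared state (e.g. in Redis), which this sketch does not cover.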

Summary

Looking at the development of the gateway, it has gone through generation after generation of evolution: the evolution of its own architecture and the accumulation of functions keep driving its iteration. In this era when everything goes to the cloud, the concept of cloud native has been accepted across industries and raised to a very high level; even some traditional network equipment services must move to the cloud.

In developing and evolving products, we also "copy, learn, and change".

  • For a familiar business with mature, excellent solutions, "copy": use them directly instead of reinventing the wheel behind closed doors.
  • For the transformation and evolution of a different business, there is little direct experience to draw on, but there is architecture and knowledge from related fields. We cannot copy them, but we can "learn": study their ideas and their essence.
  • Finally, "change": any common solution or architecture solves common problems, but precisely because of that commonality it inevitably has some "common problems" of its own.

Appendix: Some architecture diagrams of the Arch Summit API Gateway:

image.png
image.png
image.png
image.png
image.png



Huawei Cloud Developer Alliance
