
This article was originally shared by Zhou Jiahui, a senior development engineer on the microservice technical team at Bilibili (Station B).

1. Introduction

If you used Bilibili in 2015, you will not have forgotten how the site went down now and then on weekdays and without fail on weekends that year.

That was also the year in which the number of submissions to Bilibili surged and traffic grew exponentially. The all-in-one PHP stack that had served the site until then began to show its age: it was hard to operate, monitor, and troubleshoot, and its call paths were impossible to trace to the bottom.

It was in that same year that Bilibili officially began to rebuild the site in Go. From then on, Bilibili's API gateway technology began its continuous evolution from 0 to 1.

  • Supplementary note: This API gateway is also developed as open source. For details, see "12. Source code of this article" below.

PS: The API gateway described in this article mainly handles short-lived HTTP connections. Although that differs somewhat from long-connection technology, the architectural design ideas and practices are closely related, so the article is included in this "Long Connection Gateway Technology Topic" series.

Learning and exchange:

  • Introductory article on mobile IM development: "One Article Is Enough for Beginners: Developing Mobile IM from Scratch"
  • Open source IM framework source code: https://github.com/JackJiang2011/MobileIMSDK

(This article has been published simultaneously at: http://www.52im.net/thread-3941-1-1.html )

2. About the author


Zhou Jiahui: senior development engineer at Bilibili. He takes simplicity as his core technical design principle and pursues back-end architectures that are as simple and effective as possible.

He joined Bilibili in 2017 and has worked on accounts, gateways, and basic libraries. A teacher of copy-and-paste (C/V) coding and a memorizer of technical documents. An open source community enthusiast, a security technology enthusiast, an active user in the cloud computing industry, and a skilled worker in network engineering. An epic bug producer, well versed in the many scenarios that bugs create.

3. Series catalogue

This article is the 8th in the series. The full catalogue is as follows:

"Long Connection Gateway Technology Topic (1): Summary of Production-level TCP Gateway Technology Practice of Jingdongmei"
"Long Connection Gateway Technology Topic (2): Zhihu Technology Practice of High-performance Long-connection Gateway with Tens of Millions of Concurrent Connections"
"Long Connection Gateway Technology Topic (3): The Technological Evolution of the Hundred-million-scale Mobile Terminal Access Layer Gateway"
"Long Connection Gateway Technology Topic (4): iQIYI WebSocket Real-time Push Gateway Technology Practice"
"Long Connection Gateway Technology Topic (5): Himalaya Self-developed Billion-level API Gateway Technology Practice"
"Long Connection Gateway Technology Topic (6): Graphite Document Single Machine 500,000 WebSocket Long Connection Architecture Practice"
"Long Connection Gateway Technology Topic (7): Architecture Evolution of Xiaomi Xiaoai's 1.2 Million Long Connection Access Layer"
"Long Connection Gateway Technology Topic (8): The Evolution of Microservice-based API Gateway at Station B from 0 to 1" (* this article)

4. Officially refactoring Bilibili with Go

In view of the technical problems described in the introduction, the financial team officially began to refactor Bilibili in Go in 2015.

Bilibili's first Go project, bilizone, was written by Mr. Guan Guan (Hao Guanwei) over a single weekend.

commit 4ccb1497ca6d94cec0ea1b2555dd1859e6f4f223
Author: felixhao <g 1@gmail.com>
Date: Wed Jul 1 18:55:00 2015 +0800

 project init

commit 6e338bc0ee638621e01918adb183747cf2a9e567
Author: Hao Guanwei <h * @bilibili.com>
Date: Wed Jul 1 11:21:18 2015 +0800

 readme


▲ Hao Guanwei: Architect of Bilibili Main Station Technology Center

In fact, bilizone was still a large, all-in-one application. The main purpose of the bilizone refactoring was to turn PHP logic that nobody could understand any more into a reasonably standard Go application.

Its biggest contribution at the time was to give client applications a basically stable data structure, relatively reliable interfaces, and more effective monitoring.

However, because bilizone was still a monolithic application, it inherited the shortcomings of a monolith:

1) High code complexity: methods were misused, timeout settings were chaotic, and a change in one place could affect everything else;
2) One failure takes down everything: the most common causes were unreasonable timeout settings, goroutines piling up in large numbers, and the resulting avalanche;
3) High testing and maintenance costs: even a small change required a full regression test, and every release was nerve-racking for operations.

So although Bilibili crashed less often by this point, the problem of one failure bringing everything down remained serious.

5. Bilibili's microservice-based architecture takes shape

To address the shortcomings that bilizone faced as a monolithic application, the next refactoring gave Bilibili's overall architecture its initial microservice-based shape.

To move bilibili to a microservice model, we split the single bilizone application into multiple independent business applications, such as accounts, manuscripts (video submissions), and advertisements. These businesses exposed their APIs directly through SLB (the load balancer).

The call mode at that time is shown in the following figure:

However, as functions were split, we exposed a batch of microservices to the outside world, and the lack of a single unified exit point caused many difficulties.

The main ones were:

1) Clients communicated directly with the microservices and were tightly coupled to them;
2) Clients had to issue multiple requests and aggregate the data themselves, which meant a heavy workload and high latency;
3) Protocols were hard to unify: each department did things differently, and the client had to absorb the differences;
4) API adaptation for each kind of client ("end") was coupled into the internal services;
5) Multi-client compatibility logic was complex, and every service had to handle it;
6) Common logic such as security authentication and rate limiting could not be consolidated in one place.

6. Microservice architecture based on BFF mode

To address the problems that this initial microservice architecture brought, and to consolidate all client-facing ("end") processing in one place, we naturally thought of adding a component called app-interface. This is the BFF (Backend for Frontend) pattern.

The working mode of app-interface is shown in the following figure:

With this BFF, we can aggregate large amounts of data inside the service layer and design coarse-grained APIs around business scenarios.

This also brought many advantages for the later evolution of the services:

1) Lightweight interaction: protocols are simplified and calls are aggregated;
2) Differentiated service: data is trimmed and aggregated, and APIs are customized per client;
3) Dynamic upgrades: existing systems are upgraded compatibly, and the service is updated rather than the protocol;
4) Better communication efficiency: collaboration evolves into a model of a mobile business team working with a gateway group.

A BFF can be regarded as an adaptation service: it adapts the back-end microservices to the needs of the client (mainly aggregation, trimming, and format adaptation) and exposes a friendly, unified API to client devices so that they can easily reach the back-end services. It may also carry requirements such as event tracking, logging, and statistics.
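The aggregation and trimming role described above can be sketched in Go roughly as follows. This is an illustrative sketch only, not the actual app-interface code: the types (Account, Archive, PlayPage) and the fetch functions are hypothetical stand-ins for real back-end RPC calls.

```go
package main

import (
	"fmt"
	"sync"
)

// Hypothetical back-end responses; in a real BFF these would come from RPCs.
type Account struct {
	Name  string
	Email string // internal field the client does not need
}

type Archive struct {
	Title string
	State int // internal field the client does not need
}

// PlayPage is the coarse-grained, client-facing payload the BFF returns.
type PlayPage struct {
	Author string
	Title  string
}

func fetchAccount(uid int64) (Account, error) {
	return Account{Name: fmt.Sprintf("user-%d", uid), Email: "hidden"}, nil
}

func fetchArchive(aid int64) (Archive, error) {
	return Archive{Title: fmt.Sprintf("video-%d", aid), State: 1}, nil
}

// buildPlayPage calls both back ends concurrently, then trims the results
// down to the fields the client actually needs.
func buildPlayPage(uid, aid int64) (PlayPage, error) {
	var (
		wg             sync.WaitGroup
		acc            Account
		arc            Archive
		accErr, arcErr error
	)
	wg.Add(2)
	go func() {
		defer wg.Done()
		acc, accErr = fetchAccount(uid)
	}()
	go func() {
		defer wg.Done()
		arc, arcErr = fetchArchive(aid)
	}()
	wg.Wait()
	if accErr != nil {
		return PlayPage{}, accErr
	}
	if arcErr != nil {
		return PlayPage{}, arcErr
	}
	return PlayPage{Author: acc.Name, Title: arc.Title}, nil
}

func main() {
	page, err := buildPlayPage(42, 7)
	fmt.Println(page, err)
}
```

The client makes one request and receives one payload, instead of calling the account and manuscript services separately and merging the results itself.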

However, the BFF of this period had a fatal problem of its own: the whole app-interface was a single point of failure. A serious code defect or a traffic flood could bring down the cluster and make every interface unavailable.

7. Microservice architecture based on multiple sets of BFF patterns

To address the problems of the single-BFF architecture described in the previous section, we iterated further and split app-interface along business lines.

This gave rise to a model with multiple sets of BFFs:

From this model onward, the way Bilibili's microservice interfaces were exposed was basically settled, and the model was rolled out across the company.

8. The era of vertical BFF mode (2016 to 2019)

Following on from the previous section: once Bilibili's gateway architecture had developed into multiple sets of vertical BFFs, the development team iterated steadily on this model for a long time.

Then, as Bilibili's business grew, the team expanded, and the organization was restructured several times, independent businesses such as live broadcasting and e-commerce began to appear. We discuss the development of these businesses in detail later.

After these adjustments, the responsibilities of one team became clearer: the main-site gateway group.

The main responsibility of the main-site gateway group is to maintain the BFF gateways for the functions described above. At that time, the main traffic entry point of bilibili was the pink App. The business components of the pink App can be summarized briefly as follows.

Main-site businesses:

1) BFFs maintained by the gateway group, such as recommendations and the manuscript playback page;
2) BFFs maintained by the business layer, such as comments, bullet comments (danmaku), and accounts.

Independent businesses:

1) E-commerce service;
2) Live broadcast service;
3) Dynamic service.

The BFFs of the main-site businesses therefore fall into two categories:

1) BFFs owned by the gateway group;
2) BFFs maintained by the business teams themselves.

The technology stacks of the two kinds of BFF are basically the same, and their basic functions and responsibilities are similar. The reason for the division is to let the gateway group focus on iterating client-side features, without having to understand interfaces that belong to self-contained business scenarios. For example, the login page should be maintained by the account team, whose members have more specialized security expertise.

Here is a brief description of how to decide which BFFs take part in a new requirement:

1) If the feature can be completed independently by the business layer's own BFF, the gateway group does not need to be involved;
2) If the feature is a client-side feature requirement, for example a composite business such as the recommendation feed that has to integrate with many departments across the company, the gateway group takes part in developing the BFF.

At that time, the back-end engineers of the main-site technical department followed these two rules, which were basically enough to support rapid business development and iteration.

I call this period the era of vertical BFFs, because basically every main-site business had a gateway of one form or another; each business exposed its interfaces through its own gateway, and the gateway interacted directly with the SLB.

9. Service-based unified API gateway architecture

Following on from the previous section, let's look at a few important businesses: e-commerce, live broadcasting, and Dynamic.

E-commerce and live broadcasting did not originate in the same period. Live broadcasting was born during the main site's PHP period, while e-commerce came relatively later.

At that time, the live broadcasting technology stack consisted of C++, PHP, and Go. In the early days, most of the business logic was implemented in PHP and C++; later, the team gradually began to use the main site's Go stack for some business logic. PHP was responsible for providing interfaces to clients, and C++ mainly implemented the core business functions. So we can simply think of live broadcasting as using a BFF gateway written in PHP.

The Dynamic team actually grew out of the live broadcasting team, so its technology stack was basically the same as that of live broadcasting at the time, and it needs no further discussion here.

As is well known, most e-commerce teams use Java with Spring or Dubbo.

Because these businesses had almost nothing in common in their implementations, and everyone gradually converged on gRPC as the protocol, there was no real push to unify the technology stacks: it was enough that the services could communicate with each other.

As the Bilibili team grew further and traffic kept increasing, and after many online failures and incident reviews, people gradually discovered various problems with this architecture.

The main problems were:

1) A single complex module makes subsequent business integration difficult. According to Conway's Law, a complex, aggregated BFF does not match an organization of multiple teams, which leads to high communication and coordination costs between teams and low delivery efficiency;
2) There is a lot of cross-cutting logic, such as security authentication, log monitoring, rate limiting, and circuit breaking. As features are iterated over time, the code becomes more complex and technical debt piles up.

At this point we needed a component that handles these cross-cutting concerns, so that routing, authentication, rate limiting, security, and similar functions can be upgraded and released in one place. This calls for splitting the architecture into layers, and so teams began to introduce a business-based "unified API gateway" architecture (as shown in the figure below).

In the new architecture, the unified gateway plays an important role: it is a powerful tool for decoupling, splitting, and later upgrade and migration.

With the unified gateway in place, a single BFF can be decoupled and split, and each business-line team can develop and deliver its own microservices independently, which greatly improves R&D efficiency.

In addition, once the cross-cutting logic has been moved out of the BFF and into the gateway, BFF developers can focus on delivering business logic, which achieves an architectural separation of concerns.

10. From service-based multi-gateway to global unified gateway (2022-present)

Over the past two or three years, most business teams have ended up with their own business gateway and an independent team to maintain it, and have invested considerably in gateway functionality.

However, as Bilibili's business develops and the company-level middleware keeps evolving, integrating each piece of middleware separately on every gateway would incur huge labor and communication costs. Inconsistent implementation standards and inconsistent operating methods would also prevent the API gateways from delivering their full benefit.

The microservice team has therefore developed a standard API gateway for use within Bilibili (the global unified API gateway). It draws together the traffic-governance experience accumulated in the various earlier gateways and refines the design of the related functions.

Besides the conventional rate limiting, circuit breaking, degradation, and traffic coloring, the API gateway also provides additional capabilities built on these basic functions and on the company's various middleware.

These additional, more advanced API quality governance functions mainly include:

1) Full-link grayscale (canary) release;
2) Traffic sampling, analysis, and playback;
3) Traffic security control;
...

Once a business team has connected to the API gateway, it gets all of these functions together, which supports rapid business iteration.

11. More than just an API gateway

While developing the API gateway, we also pay close attention to the business teams' experience of developing and integrating APIs. We will use the gateway as the starting point for a unified, standard API specification and provide business teams with a more effective API development ecosystem.

This API development ecosystem may include:

1) API business-domain planning, to simplify SRE operations;
2) A standard API metadata platform;
3) Accurate API documentation and debugging tools;
4) A type-safe API integration SDK;
5) An API compatibility guarantee service.

The API gateway is a milestone in our API governance ecosystem. We hope to hear everyone's opinions as we develop it, and we welcome more voices to help us clarify our thinking.

This API gateway is being developed as open source, and everyone's guidance is welcome (for the source code, see "12. Source code of this article").

12. Source code of this article

Main address: https://github.com/go-kratos/gateway
Backup address: https://github.com/52im/gateway
Or download the attachment from the original link: http://www.52im.net/thread-3941-1-1.html

