
Authors: Tang Changzheng,

MSE service governance helped us quickly implement full-link grayscale capability in a low-cost, non-intrusive way, which further improved the stability of our system and made our iterative releases of new requirements safer.

- Tang Changzheng, Architect at Caller Technology

Caller Technology entered the shared charging field in 2014. It defined and created the industry and is its earliest company. Its main business covers self-service power bank rental, development of customized shopping mall navigation machines, advertising display equipment, and advertising distribution services. Caller Technology has a full product line, including large, medium, and small cabinets and desktop models. Its services now reach more than 90% of the cities in the country, with more than 200 million registered users, meeting user needs across all scenarios.

Caller Technology has rich business scenarios and many systems. Its technical architecture has completed containerization and microservice transformation, and its microservice frameworks are Spring Cloud and Dubbo. With rapid growth in recent years, the number of power bank equipment nodes and the business volume have increased quickly, and the overall application architecture has kept evolving along with the business. Microservice governance is a necessary step as microservices deepen. Today I will share how Caller Technology went deeper into microservices, in four parts:

  • Origin: a review of Caller Technology's business, its architecture evolution, and its current pain points.
  • First sight: why we chose Alibaba Cloud Microservices Engine (hereinafter referred to as MSE) during technology selection.
  • Landing: how we implemented full-link grayscale and lossless online and offline capabilities step by step, at low cost and in a short period of time.
  • Outlook: MSE and Caller Technology will work together to go further down the road of microservices.

Origin

Caller Technology's internal technology trends can be summarized in three points:

  • Full implementation of microservices
  • Full access to K8s
  • Demand for fast iteration and stable releases

In October 2019, Caller Technology's services began a comprehensive microservice transformation, and the containerization transformation was completed; by December 2020, Caller Technology had fully adopted microservices and fully moved onto K8s.

As Caller Technology's microservice adoption deepened, we gradually faced a series of challenges. In general, these challenges fall into three areas: efficiency, stability, and cost. The purpose of adopting microservices is to make business iteration more efficient, but as the number of microservices grows and call links become longer, the efficiency problems can outweigh the architectural dividend of the microservice architecture itself unless further governance is carried out.

Therefore, in June 2021, Caller Technology built out observability for its microservices; in September 2021, it began building microservice governance capabilities.

The advantages of full containerization

In summary, containerization has the following advantages

  • Easy deployment and greatly improved release efficiency
  • Elastic scaling
  • Significant savings in server costs
  • Operation and maintenance cost reduction

Here is a brief look at the benefits that comprehensive containerization brought to Caller Technology's systems. First, application deployment became very convenient, and because K8s is standardized, CI/CD was simplified as well, so overall release efficiency improved greatly. Second, applications deployed on K8s naturally gain elastic scaling, which lets them cope with traffic surges. Third, on K8s, services use resources on demand. Previously, servers were provisioned for peak load and kept fixed over the long term, so resource utilization was relatively low; it is now much higher, which saves server costs. Finally, traditional cluster operation and maintenance is cumbersome and demands a great deal from operations staff, who must be proficient in Lua and Ansible scripts and also understand cloud product network configuration, monitoring, and operations, so the operation and maintenance cost of the system is very high. The standardized interface of Alibaba Cloud Container Service ACK solves the problems of high-density deployment and system operation and maintenance, and greatly reduces that cost.

The demand for stable releases: the three-pronged approach

In daily releases, we often hold the following mistaken ideas:

  • This change is small and the requirement is urgent, so it can be released directly without testing.
  • The release does not need to go through the grayscale process; it can simply go online quickly.
  • Grayscale release is useless, just a formality; after the grayscale step, the release goes fully online immediately, without waiting to observe.
  • Grayscale release is important, but a grayscale environment is hard to build, so the time-consuming and labor-intensive work is given low priority.

Any of these ideas could lead us to a bad launch.

Alibaba has a three-pronged concept of safe production: grayscale release, observability, and rollback. All engineers must master the grayscale, observation, and rollback functions of the release system.

Frequent releases are the norm on the Internet, and Caller Technology is no exception. Grayscale release, observability, and rollback are must-have capabilities for a microservice system and a key factor in its stability. When a new version of a service goes online, diverting a small portion of traffic to the new version allows problems to be discovered in time and prevents large-scale failures. The industry already has relatively mature release strategies, such as blue-green release, A/B testing, and canary release. These strategies mainly focus on how to release a single service.

Caller Technology has a large number of microservices with intricate dependencies between them. Hard isolation of multiple environments would greatly increase cost and complicate releases. Sometimes a feature release depends on several services being upgraded and launched together, and we want the new versions of all of these services to be verified with a small amount of traffic at the same time. This is done by building environment isolation from the Ingress gateway through the entire back end, so that multiple services at different versions can be verified in grayscale together. This is the full-link grayscale capability in microservice governance.

The challenges of building it ourselves

Caller Technology considered developing microservice governance in-house, and its architect Tang Changzheng has participated in the Dubbo open source community. Developing microservice governance is not especially difficult for Caller Technology, but self-developed governance components still carry the following unavoidable costs.

  • High access cost
  • High maintenance cost
  • Single function, inflexible, low scalability
  • Problems are hard to locate

In production, microservice frameworks usually bring in service governance logic as an SDK that business code depends on, so every change or upgrade to that logic requires each microservice to modify its code, which makes access and upgrades very costly. Extending the governance functions of an open source service framework also means assigning people to maintain and operate the governance components. In addition, a self-built solution tends to stay very close to the current business, so its functions end up thin and narrow, with weak future extensibility. Full-link grayscale also involves many technical details, such as dynamic routing, node tagging, traffic tagging, and distributed tracing, and implementing them is expensive. Finally, because service frameworks such as Dubbo and Spring Cloud are complex, and because the number of microservices grows and call links get longer, locating and resolving governance problems becomes a headache. With the support of professional teams such as those behind Spring Cloud Alibaba and Dubbo, going deeper into microservices becomes much less stressful.

First sight

When I first came into contact with MSE service governance, many of its features matched our needs. The following points were especially attractive for our microservice governance transformation.

  • No intrusion

MSE's microservice governance capabilities are implemented with Java Agent bytecode enhancement, which seamlessly supports all Spring Cloud and Dubbo versions released over roughly the past five years. Users can adopt it without changing a single line of code: enable MSE Microservice Governance Professional Edition, configure it online, and the settings take effect in real time.

  • Easy access

You only need to install mse-pilot from the application marketplace of Alibaba Cloud Container Service, enable namespace-level service governance in the MSE console, and restart the application. Removing service governance is just as easy: disable service governance in the console and uninstall mse-pilot, without changing the existing business architecture. You can turn it on or off at any time, with no lock-in.

  • Powerful and Continuous Development

Service governance covers the whole life cycle, from development and testing through operation, so that engineers can focus more on the business itself.

MSE microservice governance also provides the following solutions, which address the difficulties of microservice governance and quickly raise an enterprise's governance capability. For stability: emergency diagnosis, troubleshooting, and recovery for online faults; stability solutions for online releases; and full-link grayscale solutions for microservices. For cost reduction and efficiency: cost reduction and isolation solutions for daily test environments; solutions for seamless migration of microservices to the cloud; and solutions for improving efficiency in microservice development and testing.

  • Visualization

MSE Service Governance Professional Edition provides a visual view of microservice governance traffic.

For grayscale traffic, you can see at a glance whether a release is in grayscale and how much traffic is in grayscale; changes take effect in real time once the routing rules are configured.

For lossless online and offline scenarios, MSE protects the entire life cycle end to end, and you can see at a glance whether any traffic is lost and where the loss occurs.

  • Embrace cloud native

A cloud-native system built around K8s emphasizes flexible scheduling across the cluster: resources can be scheduled freely in units of Pods, and a Pod's IP changes when it is rescheduled. In traditional service governance systems, governance policies are usually configured per IP. MSE configures microservice governance policies per label instead, which is a more cloud-native approach.

MSE is also deeply integrated with K8s and offers a variety of complete solutions for that environment: lossless online and offline keeps applications from losing traffic during elastic scaling; CI/CD built with Jenkins enables canary release on K8s; and full-link grayscale is implemented on top of Ingress, among others.
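The difference between IP-based and label-based policies can be illustrated with a small sketch. This is only an illustration of the idea, not MSE's implementation; the `Pod` class, the label keys `app` and `version`, and the addresses are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Pod:
    name: str
    ip: str
    labels: Dict[str, str] = field(default_factory=dict)


def matches_ip(policy_ip: str, pod: Pod) -> bool:
    """IP-based policy: tied to one address, so it breaks when the Pod moves."""
    return pod.ip == policy_ip


def matches_labels(selector: Dict[str, str], pod: Pod) -> bool:
    """Label-based policy: matches any Pod carrying every selector label."""
    return all(pod.labels.get(key) == value for key, value in selector.items())


# Hypothetical Pod before and after rescheduling: same labels, new IP.
before = Pod("order-7d9f", "172.16.0.12", {"app": "order", "version": "gray"})
after = Pod("order-5c2b", "172.16.3.40", dict(before.labels))
selector = {"app": "order", "version": "gray"}

print(matches_ip("172.16.0.12", after))   # the IP policy no longer applies
print(matches_labels(selector, after))    # the label policy still applies
```

The label selector keeps matching after rescheduling because it depends only on metadata that travels with the workload, not on the address it happens to receive.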

Some limitations of MSE at the time

Of course, when I first came into contact with MSE microservice governance in September 2021, its full-link capability still had some limitations. First, it supported only the microservice gateways Spring Cloud Gateway and Zuul, plus cloud gateways, and could not support a self-built Nginx gateway. Second, in Dubbo scenarios it only supported routing by interface parameters, which meant operations staff also had to understand how business interfaces were implemented; this was too fine-grained and too costly to use in production. Third, the entry point for full-link grayscale traffic could only be an HTTP gateway or an application, not a TCP gateway.

Landing

After in-depth discussions with Caller Technology's architect, we further abstracted and summarized users' grayscale scenarios. Only by going deep into the business can we truly understand customer needs. We summarized the following three scenarios.

MSE full-link grayscale scenarios

Scenario 1: Automatically dye traffic that passes through a tagged machine to achieve full-link grayscale

  • Once traffic enters a tagged node, subsequent calls preferentially select nodes with the same tag. In other words, traffic passing through a tagged node is "dyed" with that tag.
  • If a tagged call link cannot find a node with the same tag, it falls back to an untagged node.
  • If a tagged call link passes through an untagged node and later reaches a service that has nodes with the tag, tag-based routing resumes.
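The three rules above can be sketched as a small routing simulation. This is a minimal illustration under stated assumptions, not how MSE implements it: the `Instance` class, the functions, the service names (`order`, `payment`, `data`), the tag value `gray`, and the addresses are all hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Instance:
    service: str
    address: str
    tag: Optional[str] = None  # None marks an untagged (baseline) node


def select_instance(candidates: List[Instance], traffic_tag: Optional[str]) -> Instance:
    """Prefer nodes carrying the traffic's tag; otherwise fall back to untagged nodes."""
    if traffic_tag is not None:
        same_tag = [i for i in candidates if i.tag == traffic_tag]
        if same_tag:
            return random.choice(same_tag)
    untagged = [i for i in candidates if i.tag is None]
    if not untagged:
        raise LookupError("no untagged instance available for fallback")
    return random.choice(untagged)


def dye(traffic_tag: Optional[str], instance: Instance) -> Optional[str]:
    """Untagged traffic takes the tag of the node it enters; an existing tag is kept."""
    return traffic_tag if traffic_tag is not None else instance.tag


def route_chain(entry: Instance, downstream: List[str],
                registry: Dict[str, List[Instance]]) -> List[str]:
    """Follow one request from its entry node through the downstream services."""
    tag = dye(None, entry)
    path = [entry.address]
    for service in downstream:
        instance = select_instance(registry[service], tag)
        tag = dye(tag, instance)  # the tag survives hops through untagged nodes
        path.append(instance.address)
    return path


# Hypothetical registry: "payment" has no gray node, "data" has one.
REGISTRY = {
    "payment": [Instance("payment", "10.0.1.1")],
    "data": [Instance("data", "10.0.2.1"), Instance("data", "10.0.2.2", "gray")],
}
GRAY_ENTRY = Instance("order", "10.0.0.2", "gray")
BASE_ENTRY = Instance("order", "10.0.0.1")

print(route_chain(GRAY_ENTRY, ["payment", "data"], REGISTRY))
print(route_chain(BASE_ENTRY, ["payment", "data"], REGISTRY))
```

A request entering through the gray node falls back to the only `payment` node and then returns to the gray `data` node, while a request entering through the baseline node stays on untagged nodes throughout.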

Scenario 2: Achieve full-link grayscale by adding a specific header to the traffic

The client adds an identifier for the target environment to the request, and the access layer uses that identifier to forward the request to the gateway for the corresponding environment. That gateway uses an isolation plug-in to identify the matching project isolation environment, and the request then stays within that isolation environment end to end.

Scenario 3: Achieve full-link grayscale through custom routing rules

A specified header is added to grayscale requests, and the entire call link transparently passes that header along. You only need to configure header-based routing rules in the relevant applications, and grayscale requests carrying the header will enter the grayscale machines. Full-link grayscale can thus be applied on demand.

We believe Scenario 1 fully meets Caller Technology's full-link grayscale needs, and it is also what most cloud customers ask for. Scenarios 2 and 3 serve as more advanced options.

Because full-link grayscale works by tagging and dyeing traffic at the application, we support any traffic entry point, including grayscale for Ingress and self-built gateways. Alongside application-level grayscale, it remains compatible with custom routing, which gives Caller Technology a more flexible way to cover its full-link grayscale scenarios.
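The header-based rule in Scenario 3 can be sketched as follows. This is a hedged illustration rather than MSE's rule format: the `HeaderRule` class, both functions, and the header name `x-gray-env` are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class HeaderRule:
    header: str      # header to inspect (matched case-insensitively)
    value: str       # value that marks a grayscale request
    target_tag: str  # tag of the machines that should receive the request


def resolve_tag(headers: Dict[str, str], rules: List[HeaderRule]) -> Optional[str]:
    """Return the tag of the first matching rule, or None for baseline traffic."""
    normalized = {name.lower(): value for name, value in headers.items()}
    for rule in rules:
        if normalized.get(rule.header.lower()) == rule.value:
            return rule.target_tag
    return None


def forward_headers(incoming: Dict[str, str], passthrough: Iterable[str]) -> Dict[str, str]:
    """Copy the headers that must be passed along unchanged to the next hop."""
    wanted = {name.lower() for name in passthrough}
    return {name: value for name, value in incoming.items() if name.lower() in wanted}


# Hypothetical rule: requests carrying "x-gray-env: gray" go to gray machines.
RULES = [HeaderRule("x-gray-env", "gray", "gray")]
request_headers = {"X-Gray-Env": "gray", "Cookie": "session=abc"}

print(resolve_tag(request_headers, RULES))
print(forward_headers(request_headers, ["x-gray-env"]))
```

Each application on the link would evaluate the rule against the headers it receives and pass the marker header to its own outgoing calls, so the routing decision stays consistent along the whole link.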

Full-link grayscale landing solution for Caller Technology

Caller Technology's business architecture is as follows. The top layer is the user-facing side, such as mobile clients. A self-built Nginx gateway serves as the access layer, and the service layer consists of various services built on Spring Cloud and Dubbo.

The architecture of Caller Technology's full-link grayscale landing is as follows:

Traffic diversion is configured at the Nginx layer: 10% of the traffic enters the grayscale environment, and 90% enters the untagged (online production) environment. Traffic that passes through the grayscale environment is automatically dyed by MSE with the tag of that environment, and full-link grayscale routing then keeps the traffic within the grayscale environment. If a service has no grayscale machines, for example when the payment center only has online machines, the traffic goes to the online environment for that hop. When a later service on the link, such as our data center, does have grayscale machines, the grayscale traffic returns to the grayscale environment at the data center.
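The 10%/90% split can be illustrated with a small sketch of hash-based bucketing. The source does not say how the Nginx layer was configured, so this only shows one common way to divide traffic deterministically; the functions, the choice of a user ID as the key, and the environment names are assumptions.

```python
import hashlib


def bucket(key: str, buckets: int = 100) -> int:
    """Map a stable request key to a bucket in [0, buckets)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def choose_environment(key: str, gray_percent: int = 10) -> str:
    """Send roughly gray_percent of keys to the gray entry, the rest to baseline."""
    return "gray" if bucket(key) < gray_percent else "baseline"


# Hypothetical keys: the same key always lands in the same environment.
sample = [choose_environment(f"user-{n}") for n in range(10_000)]
gray_share = sample.count("gray") / len(sample)
print(f"gray share: {gray_share:.1%}")
```

Hashing a stable key, rather than picking at random per request, keeps a given user in one environment across requests, which makes grayscale observations easier to interpret.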

MSE service warm-up capability

Releasing during daytime peak hours usually causes loss of business traffic, so our engineers had to make changes during off-peak hours at night. Staying up late and working overtime greatly reduced their happiness. If changes can be released during the daytime traffic peak without losing traffic, research and development efficiency improves greatly.

Caller Technology ran into exactly this problem. When an application is released under heavy business traffic, the service has only just started. Because of the application's cold start, its capacity at that moment is often lower than normal, but online traffic cannot tell that the service has just started, and large volumes keep flowing in. The system is then overloaded and crashes, which causes traffic loss. If a microservice application can warm up, traffic grows gradually along a curve so that the service is fully warmed up, and the application can start safely even under high concurrency and heavy traffic.

MSE provides an agent-based, non-intrusive warm-up method for microservice applications, which gives applications service warm-up capability without any code changes.
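The idea of traffic growing along a curve can be sketched with a warm-up weight for a weighted load balancer. The formula, the 120-second window, the base weight of 100, and the exponent are illustrative assumptions, not MSE's actual parameters or algorithm.

```python
from typing import List


def warmup_weight(uptime_s: float, warmup_s: float = 120.0,
                  base_weight: int = 100, curve: float = 2.0) -> int:
    """Weight of an instance that has been up for uptime_s seconds.

    The weight rises from 1 to base_weight over the warm-up window; a curve
    exponent above 1 keeps early traffic low and ramps faster near the end.
    """
    if warmup_s <= 0 or uptime_s >= warmup_s:
        return base_weight
    progress = max(uptime_s, 0.0) / warmup_s
    return max(1, int(base_weight * progress ** curve))


def traffic_share(weights: List[int], index: int) -> float:
    """Expected share of requests for one instance under weighted selection."""
    return weights[index] / sum(weights)


# Two warmed-up instances (weight 100 each) plus one that has just started.
for uptime in (0, 30, 60, 90, 120):
    new_weight = warmup_weight(uptime)
    share = traffic_share([100, 100, new_weight], 2)
    print(f"uptime={uptime:>3}s weight={new_weight:>3} share={share:.1%}")
```

Under these assumptions, the new instance receives well under 1% of requests right after start and reaches an equal one-third share only once the warm-up window has passed.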

Outlook

MSE Service Governance Professional Edition provides core capabilities such as full-link grayscale, outlier instance removal, canary release, and observability of microservice governance traffic in a non-intrusive way. It helps Caller Technology quickly build a complete microservice governance system on the cloud in a more economical and efficient way, effectively improving online stability and ensuring 99.9% service availability.

As Caller Technology goes deeper into microservices, more scenarios are emerging beyond full-link grayscale and lossless online and offline. Governance of the microservice life cycle will cover release, operation, troubleshooting, fault recovery, and full-link traffic governance. MSE microservice governance will keep working with Caller Technology to continue improving microservice R&D efficiency and service availability.

Click here to see more information about MSE!

