Introduction: Moving to the cloud has never been a smooth journey. Along the way we always run into difficulties and challenges. Thanks to the growing maturity of cloud-native technology, these problems will find corresponding solutions.

Author: Nahai

Best practices for device-cloud interconnection: https://help.aliyun.com/document_detail/200032.html

Background

In the cloud-native era, cloud vendors at home and abroad have delivered substantial technical dividends. How to make good use of cheap, stable, and efficient cloud infrastructure is a major question today. On the cloud, we can easily create infrastructure and middleware such as virtual networks, virtual machines, databases, and message queues, and we can use PaaS and Serverless services such as Container Service, EDAS, SAE, and Function Compute to reduce the burden of managing applications.

But it is not all smooth sailing. Moving applications to the cloud is an unstoppable trend, yet developers soon discovered its other side: a fragmented development experience caused by the network gap between the local environment and the cloud. Before moving to the cloud, developers could complete the closed loop of coding, testing, and joint debugging locally. After moving to the cloud, the databases, caches, message queues, and other microservice applications are all deployed inside a virtual network on the cloud, and the development loop can no longer be completed locally.


If you have very deep pockets, you might consider a physical dedicated line to connect the networks. After all, you would only need to pay millions of dollars in fiber installation fees, in-building cable lease fees, port fees, traffic fees, and so on, while also convincing the security team to let the environments be fully connected.

If you are a professional operations engineer, you might consider building a VPN to connect the networks. But after spending the effort to set up a VPN server, you find that colleagues still struggle to use it, and they complain:

  • "As soon as the VPN is on, all local network traffic is forwarded to the cloud, and I can't do anything else!"
  • "Besides configuring the VPN, I also have to configure application runtime parameters. Too much trouble!"
  • "Why can't the cloud service call my local service? Has the cloud network route been added?"
  • ...

Faced with these problems, the operations engineer is worn out too...

Now we provide a plug-in that works out of the box, with no large outlay of money or manpower. All you need to do is flip one switch in the IDE, and applications launched from the IDE can access the databases, message queues, caches, and other microservices in the cloud environment. The plug-in takes care of everything else.

Introduction

The tool is a "device-cloud interconnection" plug-in that we developed in-house. "Device" refers to the developer's machine, and "cloud" refers to the network on the cloud. The plug-in enables two-way communication between the device and the cloud, without the problems of a traditional VPN.


Device-cloud interconnection is built into Alibaba Cloud Toolkit (ACT), a cloud tooling product, and supports IntelliJ IDEA and Eclipse. To install it, search for "Alibaba Cloud Toolkit" in the IDE's plug-in marketplace. For example, in IntelliJ IDEA:

[Figure: searching for "Alibaba Cloud Toolkit" in the IntelliJ IDEA plug-in marketplace]

We started developing device-cloud interconnection in 2018. Since then we have shipped many major and minor versions, passed three milestones, and the feature has been used by hundreds of thousands of people. The sections below introduce its features and how they work.

Device-cloud interconnection 1.0

Version 1.0 solved two-way interconnection between the local environment and the cloud: local services can access cloud resources, and they can also exchange calls with cloud services.

Two-way interconnection

The following figure shows the core architecture of device-cloud interconnection, which consists of two modules: the channel service and the proxy machine.

[Figure: core architecture of device-cloud interconnection, with the channel service and the proxy machine]

The modules work as follows:

  • Proxy machine : Forwards traffic on the cloud side. The requirements for the proxy machine are very low; an ECS instance with ordinary specifications can serve as a bare-bones proxy machine. In addition, Linux distributions such as Debian, Ubuntu, and Red Hat already include the underlying libraries that device-cloud interconnection depends on, so no additional software needs to be installed.
  • Channel service : Forwards traffic on the local side. When you turn on the device-cloud interconnection switch and start the application, the plug-in launches a channel service process locally. This process has a single responsibility: forwarding traffic between the local application and the cloud proxy machine.

Communication between the channel service and the proxy machine uses an encrypted channel, so a man-in-the-middle cannot read the data in transit. For microservice applications, we combine Java's native proxy parameters with a self-developed traffic interception mechanism to forward application traffic to the channel service.

When a developer starts the application in the IDE, the plug-in automatically launches the channel service and injects the relevant parameters into the application. After startup, application traffic is forwarded to the channel service automatically, with no manual intervention.
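To make "Java native proxy parameters" concrete, here is a minimal sketch of how a launcher could point an application's sockets at a local forwarder. It assumes the channel service exposes a SOCKS endpoint on localhost, which the article does not state; `socksProxyHost` and `socksProxyPort` are standard JDK networking properties, while the class name, port 1080, and `app.jar` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: JVM options a launcher could add so sockets go through a local forwarder. */
final class ProxyLaunchArgs {

    /**
     * Builds the standard JDK system properties that route the application's
     * TCP sockets through a SOCKS proxy on localhost.
     * channelPort is a hypothetical port for the local channel service.
     */
    static List<String> socksProxyArgs(int channelPort) {
        List<String> args = new ArrayList<>();
        args.add("-DsocksProxyHost=127.0.0.1");
        args.add("-DsocksProxyPort=" + channelPort);
        return args;
    }

    public static void main(String[] unused) {
        // Assemble the command line a run configuration might end up with.
        List<String> command = new ArrayList<>();
        command.add("java");
        command.addAll(socksProxyArgs(1080));
        command.add("-jar");
        command.add("app.jar"); // hypothetical application archive
        System.out.println(String.join(" ", command));
    }
}
```

Because the options are injected at launch, the application code itself needs no changes, which matches the "no manual intervention" behaviour described above. The plug-in's additional traffic interception mechanism is not modelled here.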

Architecturally, device-cloud interconnection resembles a VPN: both have a server side and a client side. In practice, however, the two differ considerably. The following figure compares them:

[Figure: comparison of device-cloud interconnection and VPN]

Take "cloud accesses local" as an example. Both support it, but they work differently. With a VPN, network routes must be configured manually before other cloud services can reach local services; otherwise the local network is unreachable. With device-cloud interconnection, adaptations in the microservice framework let the cloud service call the proxy machine, which forwards the call to the local application, so no network route has to be set up. In terms of ease of use and security, device-cloud interconnection has the advantage over VPN.

Device-cloud interconnection 2.0

In version 1.0, we implemented two-way communication between the local environment and the cloud, which covered the most basic development needs. In real-world use, customers asked for more.

One of our customers has a large R&D team that uses device-cloud interconnection for development, and they ran into a problem during joint debugging: a service call initiated by developer A is sometimes routed to other nodes and never reaches the intended local node of developer B. The cause is the routing mechanism of the microservice framework. When a service has multiple nodes in the environment, calls are distributed with a random (or round-robin) algorithm. The more microservice modules there are and the longer the call chain, the worse the problem becomes.

In 2.0, we added precise multi-person joint debugging. With it, service requests go exactly where you point them, which greatly improves the efficiency of joint debugging. We also added proxy-based remote debugging, which makes it easier to debug microservice nodes in the cloud environment from a local machine.

In addition, through cross-product support, device-cloud interconnection 2.0 serves developers who use cloud-native products such as EDAS, SAE, and MSE, and it has been well received.

Multi-person precise joint debugging

The following figure shows a typical multi-person joint debugging scenario:

[Figure: typical multi-person joint debugging scenario]

Xiao Wang owns service A and Xiao Zhang owns service B. They have finished coding for an iteration and are now debugging the two services together. Because the microservice framework uses a random (or round-robin) invocation strategy, two problems arise:

  • Xiao Ma, a tester, is running functional tests in the same environment. His test requests get routed to the local nodes of Xiao Wang and Xiao Zhang, so the tests do not behave as expected.
  • Test requests initiated by Xiao Wang get routed to other nodes instead of his own node and Xiao Zhang's node, so joint debugging is very inefficient.

With precise multi-person joint debugging, only the requests initiated by Xiao Wang are routed to his local node and Xiao Zhang's local node, while Xiao Ma's test requests are served only by the stable cloud environment.

What Xiao Wang and Xiao Zhang need to do is fairly simple. They enable the full-link traffic control feature in the console and create a traffic control environment for testing. The environment can be configured with request identification rules that decide whether a request is a test request based on dimensions such as Cookie, Header, and request parameters. If a request matches, it is routed to the nodes in that environment.

Xiao Wang and Xiao Zhang then add their local nodes to the test environment from the IDE, as shown below:

[Figure: adding a local node to the test environment in the IDE]

Once configured, only requests that match the rule are routed to the nodes of Xiao Wang and Xiao Zhang. In the following figure, only requests whose Header contains "test" are sent to their nodes:

[Figure: only requests whose Header contains "test" are routed to the local nodes]
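The routing rule can be illustrated with a small sketch. It is not the product's implementation: it models node selection for a single service, treats a request as a test request when a chosen header contains a marker string, and otherwise falls back to round-robin over the stable cloud nodes. The class name, the header name `x-debug-env`, and the node names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Sketch only: header-based node selection for one service. */
final class TaggedRouter {
    private final List<String> stableNodes; // cloud nodes; assumed non-empty
    private final List<String> debugNodes;  // developers' local nodes
    private final String headerName;
    private final String marker;
    private int counter = 0;

    TaggedRouter(List<String> stableNodes, List<String> debugNodes,
                 String headerName, String marker) {
        this.stableNodes = new ArrayList<>(stableNodes);
        this.debugNodes = new ArrayList<>(debugNodes);
        this.headerName = headerName;
        this.marker = marker;
    }

    /** A request counts as a test request when the header contains the marker. */
    boolean isTestRequest(Map<String, String> headers) {
        String value = headers.get(headerName);
        return value != null && value.contains(marker);
    }

    /** Test requests go to debug nodes; all others go round-robin to stable nodes. */
    String route(Map<String, String> headers) {
        List<String> pool = (isTestRequest(headers) && !debugNodes.isEmpty())
                ? debugNodes : stableNodes;
        String node = pool.get(Math.floorMod(counter, pool.size()));
        counter++;
        return node;
    }

    public static void main(String[] args) {
        TaggedRouter router = new TaggedRouter(
                Arrays.asList("cloud-node-1", "cloud-node-2"),
                Arrays.asList("xiao-zhang-local"),
                "x-debug-env", "test");
        // Tagged request: served by the developer's local node.
        System.out.println(router.route(Collections.singletonMap("x-debug-env", "test")));
        // Untagged request: served by a stable cloud node.
        System.out.println(router.route(Collections.<String, String>emptyMap()));
    }
}
```

The key property is isolation in both directions: untagged traffic such as Xiao Ma's tests never reaches a local node, and tagged traffic never lands on a node outside the debugging environment.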

Remote debugging

Remote debugging has always been an important troubleshooting technique, but it is not simple in a cloud-native environment. By default, microservice nodes on the cloud are usually not reachable from the public network. To debug remotely, we have to open public network access to the target node and set up security policies that allow traffic to the debugging port.

Suppose there are three services A, B, and C, each with three nodes. We then need to create three security groups and bind nine public network cards to the machine nodes, as shown below:

[Figure: remote debugging with one public network card per node and one security group per service]

This approach has the following problems:

  • Wasted cost : Every microservice node needs its own public network card, so cost grows with the number of test nodes.
  • Complex configuration : On the cloud, machine nodes are often managed with an elastic scaling policy so that they are created when needed and released when done. Every time a new machine node is created, its public network card and security group must be configured separately, which is cumbersome.
  • Security risk : Exposing microservice nodes to the public network increases the security risk.

In some scenarios, security requirements even forbid attaching public network cards to intranet machine nodes. To address these problems, device-cloud interconnection supports proxy-based remote debugging, as shown below:

[Figure: proxy-based remote debugging through the channel service and the proxy machine]

The debugging request is forwarded by the channel service to the proxy machine, which then forwards it to the target node being debugged. The channel between the channel service and the proxy machine is encrypted. For scenarios with very strict security requirements, a security group (or whitelist) policy can be used to further harden the proxy machine.
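The forwarding step can be pictured as a plain TCP relay: the debugger connects to a local port, and every byte is copied to the target and back. The sketch below shows only that idea. It leaves out the encryption and the proxy-machine hop described above, and the class and method names are hypothetical. An in-process echo server stands in for the remote debugging port so the example can run on its own.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

/** Sketch only: an unencrypted TCP relay, the core idea behind transparent forwarding. */
final class TcpRelay {

    /** Listens on an ephemeral loopback port and relays each connection to the target. */
    static ServerSocket start(String targetHost, int targetPort) throws IOException {
        ServerSocket listener = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        daemon(() -> {
            while (true) {
                final Socket client;
                try {
                    client = listener.accept();
                } catch (IOException e) {
                    return; // listener was closed
                }
                try {
                    Socket upstream = new Socket(targetHost, targetPort);
                    daemon(() -> pump(client, upstream)); // local -> target
                    daemon(() -> pump(upstream, client)); // target -> local
                } catch (IOException e) {
                    closeQuietly(client); // target unreachable
                }
            }
        });
        return listener;
    }

    /** Copies bytes one way until the stream ends, then closes both sockets. */
    private static void pump(Socket from, Socket to) {
        try {
            InputStream in = from.getInputStream();
            OutputStream out = to.getOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
                out.flush();
            }
        } catch (IOException ignored) {
            // One side went away; fall through to cleanup.
        } finally {
            closeQuietly(from);
            closeQuietly(to);
        }
    }

    private static void closeQuietly(Socket socket) {
        try { socket.close(); } catch (IOException ignored) { }
    }

    private static void daemon(Runnable task) {
        Thread thread = new Thread(task);
        thread.setDaemon(true);
        thread.start();
    }

    /** Demo: sends a message through the relay to a local echo server and returns the reply. */
    static String roundTrip(String message) {
        try (ServerSocket echo = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            daemon(() -> {
                try {
                    Socket s = echo.accept();
                    pump(s, s); // echo: write back whatever arrives
                } catch (IOException ignored) {
                    // demo server closed before a client connected
                }
            });
            try (ServerSocket relay = start(echo.getInetAddress().getHostAddress(), echo.getLocalPort());
                 Socket client = new Socket(InetAddress.getLoopbackAddress(), relay.getLocalPort())) {
                client.setSoTimeout(5000);
                byte[] payload = message.getBytes(StandardCharsets.UTF_8);
                client.getOutputStream().write(payload);
                client.getOutputStream().flush();
                byte[] reply = new byte[payload.length];
                int read = 0;
                while (read < reply.length) {
                    int n = client.getInputStream().read(reply, read, reply.length - read);
                    if (n == -1) break;
                    read += n;
                }
                return new String(reply, 0, read, StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("ping"));
    }
}
```

In a real setup the target would be the debugging port of a cloud node, reached via the proxy machine, and the IDE's debugger would connect to the relay's local port instead of a public address.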

To use it, you only need to create an Alibaba Cloud Remote Debug configuration. Its contents are essentially the same as the IDE's built-in remote debugging configuration, except that it supports connecting through a proxy, as shown below:

[Figure: Alibaba Cloud Remote Debug configuration]

It has the following configuration items:

  • Proxy: The cloud proxy machine. At run time, the plug-in automatically launches the channel service and connects to the proxy machine, with no manual intervention.
  • Host: The IP address of the target machine node to debug, 172.16.0.1 in the figure.
  • Port: The debugging port on the target machine, 5005 in the figure.

Cloud native product support

Device-cloud interconnection 2.0 supports the three major Alibaba Cloud microservice products: EDAS (Enterprise Distributed Application Service), SAE (Serverless App Engine), and MSE (Microservices Engine). All three provide microservice governance capabilities and address the needs of different enterprises. Their characteristics are as follows:

  • Enterprise Distributed Application Service (EDAS) : A one-stop PaaS platform for application lifecycle management and monitoring. It supports deployment on Kubernetes and ECS, and it supports releasing, running, and governing applications written in Java, Go, Python, PHP, .NET Core, and other languages without code intrusion. For Java, it supports all versions of Spring Cloud and Apache Dubbo released in the past five years. Multi-language applications can enable Service Mesh with one click.
  • Serverless App Engine (SAE) : Combines the Serverless architecture with the microservice architecture. Resources are used and billed on demand, which saves idle computing resources, removes IaaS operations work, and improves development and operations efficiency. SAE supports popular microservice frameworks such as Spring Cloud and Dubbo, and supports deployment through the console, Jenkins, Yunxiao (Alibaba Cloud's DevOps platform), and plug-ins. Besides microservice applications, you can also deploy applications in any language through Docker images.
  • Microservices Engine (MSE) : Targets the mainstream open source microservice ecosystem and helps users build microservice systems on open source technology more stably, more conveniently, and at lower cost. It provides fully managed registry and configuration centers (compatible with Nacos, ZooKeeper, and Eureka), gateways (compatible with Zuul, Kong, and Spring Cloud Gateway), and non-intrusive service governance that enhances the open source offerings.

So whether you use EDAS, SAE, or MSE, you can use device-cloud interconnection to develop for the cloud more efficiently. In the plug-in, the configuration steps for the three products are essentially the same, apart from differences inherent in the products. The configuration page looks like this:

[Figure: device-cloud interconnection configuration page in the plug-in]

In the future, we will support interconnection with more cloud-native products on Alibaba Cloud, and we also plan to serve cloud-native developers outside Alibaba Cloud, so stay tuned.

Device-cloud interconnection 3.0

Version 2.0 solved interconnection between Java applications and the cloud and polished many details, but it lacked support for containers and for diagnostics. We added these capabilities in version 3.0.

If you are a Kubernetes user, you can use the Kubernetes proxy capability of the 3.0 plug-in, with no need to set up a separate cloud proxy machine.

If you use a language other than Java, or your application has specific requirements for its runtime environment, you can use the container-level interconnection capability of the 3.0 plug-in to run the application locally in Docker. Inside the Docker container, the application can access cloud services and resources as usual, and traffic is forwarded through a proxy automatically.

If you are puzzled by a call exception in a locally running application, you can use the local link diagnosis capability of the 3.0 plug-in. It collects the call chain of the local application in one place, so the cause of the exception is visible at a glance.

The following sections describe these features in detail.

Kubernetes proxy

In version 3.0, the Kubernetes proxy capability opens the proxy channel automatically through the Kubernetes cluster.

When developing against Kubernetes, we use the kubectl command and a kubeconfig file to communicate with the API Server and access containers in the cluster. The API Server authenticates, authorizes, and encrypts each request. If public network access to the API Server is enabled, then when we run interactive commands locally through kubectl, the API Server acts as an intermediate proxy, as shown below:

[Figure: the API Server acting as an intermediate proxy for local kubectl commands]

Building on this, the device-cloud interconnection 3.0 plug-in calls kubectl to create a temporary proxy container when the application starts. With the API Server and the temporary proxy container connected, local applications can access cloud services and other resources. The overall path is as follows:

[Figure: traffic path from the local application through the API Server and the temporary proxy container]

The proxy container uses 64 MB to 128 MB of node memory and is deleted automatically when the local application stops.

Plug-in configuration is also simple. You only need to set the kubeconfig file in the plug-in and select a Kubernetes namespace:

[Figure: Kubernetes proxy configuration in the plug-in, with kubeconfig file and namespace]

When the local application starts, the plug-in uses the kubeconfig file to call kubectl, create the temporary container, open the channel, and forward traffic. When the application terminates, the plug-in uses the kubeconfig file to call kubectl and delete the temporary container.
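The lifecycle above (create a temporary container at start, delete it at stop) could map onto kubectl invocations roughly as sketched below. The article does not list the exact commands the plug-in runs, so this is an assumption throughout: in particular, using `kubectl port-forward` to open the channel is a guess, and the pod name, image, and ports are hypothetical. The kubectl subcommands and flags themselves are standard.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch only: kubectl command lines for a temporary proxy pod's lifecycle. */
final class KubectlProxyCommands {

    /** Creates a standalone pod that is not restarted once it exits. */
    static List<String> createPod(String kubeconfig, String namespace,
                                  String podName, String image) {
        return Arrays.asList("kubectl", "--kubeconfig", kubeconfig, "--namespace", namespace,
                "run", podName, "--image=" + image, "--restart=Never");
    }

    /** Forwards a local port to a port of the pod through the API Server (assumed mechanism). */
    static List<String> portForward(String kubeconfig, String namespace,
                                    String podName, int localPort, int podPort) {
        return Arrays.asList("kubectl", "--kubeconfig", kubeconfig, "--namespace", namespace,
                "port-forward", "pod/" + podName, localPort + ":" + podPort);
    }

    /** Deletes the temporary pod; succeeds even if it is already gone. */
    static List<String> deletePod(String kubeconfig, String namespace, String podName) {
        return Arrays.asList("kubectl", "--kubeconfig", kubeconfig, "--namespace", namespace,
                "delete", "pod", podName, "--ignore-not-found");
    }

    public static void main(String[] args) {
        String kubeconfig = "/path/to/kubeconfig"; // hypothetical path
        String image = "example/proxy:latest";     // hypothetical image
        System.out.println(String.join(" ", createPod(kubeconfig, "dev", "proxy-tmp", image)));
        System.out.println(String.join(" ", portForward(kubeconfig, "dev", "proxy-tmp", 1080, 1080)));
        System.out.println(String.join(" ", deletePod(kubeconfig, "dev", "proxy-tmp")));
    }
}
```

Each list can be handed to `new ProcessBuilder(command).inheritIO().start()` to run it. Running the delete command from a shutdown hook is one way to make sure the temporary pod goes away when the local application stops.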

Container-level interconnection

Container-level interconnection means that a Docker container is started locally and your microservice application runs inside it, interconnected with the cloud environment. It is the best fit for the following scenarios:

  • The application is written in a language other than Java.
  • The application has specific requirements for the operating system it runs on.

In this mode, both the microservice application and the channel service run in containers. The overall interaction is as follows:

[Figure: container-level interconnection, with the application container and the channel service container]

At the implementation level, container-level interconnection uses iptables to intercept traffic and forward it to the channel service in the proxy container; the channel service then forwards the data to the target address through the cloud proxy. Architecturally, this resembles the Sidecar pattern of Service Mesh: the application container forwards traffic to the channel service container (the sidecar container). The difference is that the channel container only forwards data transparently, whereas a Service Mesh sidecar also performs service discovery and governance.
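As a rough illustration of iptables-based interception, the sketch below builds one NAT rule that redirects outbound TCP traffic for a cloud address range to a local listening port. The plug-in's actual rules are not given in the article. The sketch assumes the application and the forwarder share a network namespace, as sidecars do, so that `REDIRECT` can reach the forwarder. The CIDR, port, and class name are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch only: an iptables rule that redirects outbound TCP traffic to a local forwarder. */
final class InterceptRule {

    /**
     * Appends a rule to the nat table's OUTPUT chain: TCP connections to
     * cloudCidr are redirected to channelPort on the local machine.
     */
    static List<String> redirectRule(String cloudCidr, int channelPort) {
        return Arrays.asList(
                "iptables", "-t", "nat", "-A", "OUTPUT",
                "-p", "tcp", "-d", cloudCidr,
                "-j", "REDIRECT", "--to-ports", String.valueOf(channelPort));
    }

    public static void main(String[] args) {
        // 172.16.0.0/12 and 15001 are placeholders for a VPC range and a forwarder port.
        System.out.println(String.join(" ", redirectRule("172.16.0.0/12", 15001)));
    }
}
```

A real setup would also need to exempt the forwarder's own outbound connections from the rule, otherwise its traffic would be redirected back to itself. Because interception happens at the network layer, it works for applications in any language, which is why this mode suits non-Java applications.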

To use it, run the container through the plug-in's Alibaba Microservice Container configuration, as shown below:

[Figure: Alibaba Microservice Container run configuration]

If the application in the container is a Java application, the plug-in also supports quick debugging with no extra parameters to set. When starting the application, the plug-in injects JDWP debugging parameters through environment variables to open the debugging port. Combined with IntelliJ IDEA's automatic detection, the Java application in the container can be debugged with one click on Attach debugger, as shown below:

[Figure: container log output in the IDE with the Attach debugger link]

As the figure shows, the plug-in prints the container application's log output in the IDE window. The line "Listening for transport dt_socket at address: 5005" indicates that the Java application in the container has opened its debugging port. Click Attach debugger and the IDE connects to that port, after which you can debug the code, as shown below:

[Figure: debugging the Java application in the container from the IDE]
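Injecting JDWP parameters through an environment variable might look like the sketch below. The article does not name the variable the plug-in uses; `JAVA_TOOL_OPTIONS` is assumed here because the JVM reads it at startup without any change to the launch command. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch only: enabling JDWP in a container through an environment variable. */
final class JdwpEnv {

    /**
     * JDWP agent option: listen for a debugger (server=y) without pausing
     * startup (suspend=n). The "*:" prefix binds all interfaces on JDK 9+,
     * which is needed for the port to be reachable from outside the container.
     */
    static String jdwpOption(int port) {
        return "-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=*:" + port;
    }

    /** Environment for the application container; the variable choice is an assumption. */
    static Map<String, String> containerEnv(int port) {
        Map<String, String> env = new HashMap<>();
        env.put("JAVA_TOOL_OPTIONS", jdwpOption(port));
        return env;
    }

    public static void main(String[] args) {
        // Print each entry in the KEY=VALUE form accepted by `docker run -e`.
        for (Map.Entry<String, String> entry : containerEnv(5005).entrySet()) {
            System.out.println(entry.getKey() + "=" + entry.getValue());
        }
    }
}
```

With `server=y` the JVM prints the "Listening for transport dt_socket" line once the port is open, which is the log line the IDE can detect. The container's debugging port must also be published to the host, for example with `docker run -p 5005:5005`.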

Local link diagnosis

During development, have you ever run into this? A downstream service interface returns 500. You know only that the call failed, not why. You ask the module's developer to look into it, and after a long wait he replies, "I'm a bit busy right now, I'll look later." When he finally has time to investigate, he finds that the problem is in another module and asks you to find yet another colleague...

Situations like this are common in development, and a small problem often takes a lot of time and effort to track down. This is a classic use case for distributed tracing. We have integrated tracing into device-cloud interconnection, so that local call chains are also reported to the cloud and the problem is visible at a glance when something goes wrong.

For example, suppose the environment has three services: a trading center, a commodity center, and an inventory center. You are verifying a new version's features together with a tester. While testing the ordering flow on the page, the tester finds that placing an order fails, as shown below:

[Figure: the order fails on the page]

Because many modules are involved, troubleshooting takes a long time. The device-cloud interconnection 3.0 plug-in integrates the Java Agent of ARMS (Application Real-Time Monitoring Service), which collects call-chain information through non-intrusive instrumentation and reports it to the ARMS server for centralized collection and analysis. When an exception occurs, you only need to look up the call chain by TraceId on the cloud, and the problem becomes clear:

[Figure: call chain for the failed request in the ARMS console]

TraceId is an underlying concept of distributed tracing. It is generated at the front-end page and passed through to downstream nodes. For convenience, the plug-in also provides a switch to print the local call chain. When it is turned on, the plug-in outputs information about the local application's service call chain, as shown below:

[Figure: local call chain output in the IDE]

The output contains the following information:

  • TraceId : Identifies the request end to end. In a distributed microservice call, the TraceId is passed from the front-end application node to every node in the downstream chain. With this TraceId, you can look up the whole processing chain in the EDAS console or the ARMS console.
  • Service : The request-handling entry point of the current application, such as a Spring Cloud service, Dubbo service, or HSF service.
  • API : The method signature in the processing chain.
  • Line : The line number in the method where the processing occurs.
  • Cost : The time taken by this method and its downstream processing, in milliseconds.
  • Ext : Extended information, including the request status code, database SQL, resource target address, and so on.
  • Console link : A link to this trace as collected in the ARMS console. Click it to view the full trace directly.
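The way a TraceId travels with a request can be sketched in a few lines. This is a generic illustration, not the ARMS agent's mechanism: each node reuses the identifier it receives, or creates one if it is the first hop, and attaches it to every downstream call. The header name `x-trace-id`, the class name, and the printed layout are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Sketch only: passing a trace identifier along a call chain and printing one entry. */
final class TraceSketch {
    static final String TRACE_HEADER = "x-trace-id"; // hypothetical header name

    /** Reuses the caller's TraceId, or creates one when this node is the first hop. */
    static String resolveTraceId(Map<String, String> incomingHeaders) {
        String existing = incomingHeaders.get(TRACE_HEADER);
        if (existing != null && !existing.isEmpty()) {
            return existing;
        }
        return UUID.randomUUID().toString().replace("-", "");
    }

    /** Headers to attach to a downstream call so the next node sees the same TraceId. */
    static Map<String, String> downstreamHeaders(String traceId) {
        Map<String, String> headers = new HashMap<>();
        headers.put(TRACE_HEADER, traceId);
        return headers;
    }

    /** One line of call-chain output; the layout is illustrative, not the plug-in's format. */
    static String formatEntry(String traceId, String service, String api,
                              int line, long costMs, String ext) {
        return String.format("TraceId=%s Service=%s API=%s Line=%d Cost=%dms Ext=%s",
                traceId, service, api, line, costMs, ext);
    }

    public static void main(String[] args) {
        String traceId = resolveTraceId(new HashMap<String, String>());
        // A downstream node receiving these headers resolves the same identifier.
        String seenDownstream = resolveTraceId(downstreamHeaders(traceId));
        System.out.println(formatEntry(seenDownstream, "inventory-center",
                "deduct(String,int)", 42, 17, "status=500"));
    }
}
```

Because every node logs under the same identifier, a single lookup by TraceId brings together entries from all services involved in the request.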

Click the Console link to view the upstream and downstream processing of the request, as shown below:

[Figure: upstream and downstream call chain of the request in the ARMS console]

You can also drill into the processing details of each service:

[Figure: processing details within a service]

Having seen this, do you have more ideas for troubleshooting now? :)

Final thoughts

The cloud-native wave is unstoppable, and moving business to the cloud is a path every enterprise must take. But moving to the cloud has never been a smooth journey, and we will always run into difficulties and challenges along the way. Thanks to the growing maturity of cloud-native technology, these problems will find corresponding solutions.

In developer tooling, we are among the leading explorers among domestic cloud vendors. From the incubation of device-cloud interconnection 1.0 in 2018 to version 3.0 in 2021, we ran into problems and challenges of all sizes, and all of them were eventually resolved. This capability brings great convenience to developers on public and private clouds, letting them complete the closed loop of development, testing, and joint debugging locally.

We will continue to provide better, more powerful, and easier-to-use cloud-native tools for developers, so stay tuned.
