Author: Duan Jia, Xinsheng
The history of cloud computing is the history of virtualization technology. Over the past 20 years, cloud computing and the Internet have driven each other's rapid development, and central cloud technology has become shared infrastructure for the whole of society. With the continued development of the Internet of Things and artificial intelligence, and especially of the industrial Internet, central cloud computing has begun to show its limits, and decentralized edge computing once again carries high expectations. If central cloud computing was driven by technological innovation, then edge computing must be driven by business value.
So what exactly is edge computing? How is it classified? What is its relationship with the central cloud? This article peels back the layers and lays out, in plain terms, our understanding of and thinking on edge computing and cloud native.
Understanding and thinking about edge computing
Definition of edge computing
There is currently no precise definition of edge computing. From the IT cloud computing perspective, edge computing is seen as an extension of central cloud computing. The Edge Computing Industry Alliance defines edge computing as: "an open platform at the network edge, close to the source of things or data, that converges network, computing, storage, and application core capabilities and provides edge intelligent services nearby, meeting the key requirements of industry digitalization for agile connection, real-time business, data optimization, application intelligence, and security and privacy protection." From the CT telecom perspective, edge computing was originally known as Mobile Edge Computing (MEC). The European Telecommunications Standards Institute (ETSI) defines MEC as: "Mobile edge computing provides an IT service environment and cloud computing capabilities at the edge of the mobile network, within the radio access network (RAN) and in close proximity to mobile subscribers."
Each definition has its own emphasis, but the core idea is essentially the same: edge computing is a new form of distributed computing that applies the core technology of cloud computing on edge infrastructure; in effect, on-site cloud computing close to the data source.
With its powerful data centers, central cloud computing provides business applications with large-scale pooled, elastically scalable compute, storage, and network infrastructure services, and suits non-real-time, long-cycle data and business decision-making scenarios. Edge computing focuses on real-time, short-cycle data and local decision-making scenarios, such as today's popular live audio and video streaming, IoT, the industrial Internet, virtual reality, and even the Metaverse, where the workload sinks to a location near the terminal device or end user, achieving lower network latency and a better user experience.
"Octopus-style" edge computing
The edge is distributed computing relative to the center. The core goal of edge computing is fast decision-making: it extends the computing power of the central cloud over the "last mile". It therefore cannot stand apart from the central cloud; rather, under an overall cloud-edge-end architecture, centralized management-and-control decisions coexist with decentralized autonomous decisions at the edge: octopus-style edge computing.
An octopus has only about 40% of its neurons in its central brain; the remaining 60% are distributed across its legs, forming a structure of one brain for overall coordination plus N cerebellums for decentralized execution. The one brain excels at global scheduling and at non-real-time, long-cycle big data processing and analysis; the N cerebellums focus on local, small-scale data processing and suit field-level, real-time, short-cycle intelligent analysis and fast decisions.
Octopus-style edge computing adopts a distributed, cloud-edge integrated architecture of central cloud plus edge computing: after massive numbers of terminals collect data, real-time decision-making over small-scale local data is completed at the edge, while complex, large-scale global decision-making is aggregated to the central cloud for in-depth analysis and processing.
The location of edge computing
Edge computing sits between the central cloud and the terminal, sinking cloud computing capability from the center toward the edge and solving specific business needs through a cloud-edge synergy architecture that minimizes transmission latency, which is the core value of edge computing. The network path between the central cloud and the terminal runs through the access network (about 30 kilometers, 5 to 10 ms delay), the aggregation network, the intercity network (50 to 100 kilometers, 15 to 30 ms delay), and the backbone network (about 200 kilometers, about 50 ms delay), finally reaching the data center (assuming all IDC data centers hang off the backbone). The latency figures are statistical values under normal network congestion, that is, the actual delays perceived by the business side; they are not precise, but they are sufficient to support architecture decisions.
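These figures are rough, but they are what architecture decisions actually run on. As a minimal illustration (a hypothetical helper; the worst-case per-tier delays are taken from the figures above and from the tier descriptions later in this article), placement can be reduced to picking the most centralized tier that still fits the latency budget:

```go
package main

import "fmt"

// tier pairs a deployment location with its worst-case delay from the
// terminal, using the statistical figures quoted in the text.
type tier struct {
	name    string
	worstMs int
}

// Ordered from nearest (cheapest latency, most expensive to operate)
// to farthest (central cloud on the backbone).
var tiers = []tier{
	{"on-site / local computing", 5},
	{"access-network edge", 10},
	{"intercity / regional cloud", 30},
	{"backbone / central cloud", 50},
}

// farthestTier picks the most centralized tier whose worst-case delay
// still meets the budget; more centralized usually means cheaper to run.
func farthestTier(budgetMs int) string {
	chosen := "none: even on-site computing exceeds the budget"
	for _, t := range tiers {
		if t.worstMs <= budgetMs {
			chosen = t.name
		}
	}
	return chosen
}

func main() {
	for _, budget := range []int{4, 12, 35, 80} {
		fmt.Printf("latency budget %2d ms -> %s\n", budget, farthestTier(budget))
	}
}
```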
As cloud computing capability sinks from the center to the edge, the number of nodes increases, coverage shrinks, and the cost of operation and maintenance services rises rapidly. Given the domestic network landscape (China has multiple backbone networks: China Telecom CHINANET and CN2, China Unicom CNCNET, and China Mobile CMNET), backbone network nodes, intercity network nodes, aggregation network nodes, access network nodes, and tens of thousands of on-site business computing nodes can all host edge computing, so the scope is too broad to form a unified standard. Therefore we say that central cloud computing is defined by technology, while edge computing is defined by the network and by business requirements.
The edge computing ecosystem has many participants. The three key types of service provider (cloud vendors, equipment vendors, and telecom operators), plus some new AI service providers, all extend from their existing strengths to win more customers and market space. Equipment vendors are using the Internet of Things to gradually build single-function professional clouds; cloud vendors are sinking from centralized public clouds to distributed regional clouds, interconnected through cloud networking into clouds with larger coverage. In the Internet era, operators were squeezed out entirely by public clouds and booming mobile applications and could only act as pipes; in the edge computing era, however, services and networks define edge computing, and operators have returned to center stage and become irreplaceable.
Types of Edge Computing
(1) Network-defined edge computing:
By optimizing the network path between the terminal and the cloud center, central cloud capability is pushed down progressively closer to the terminal so that services can be accessed nearby. From center to edge it divides into three types: regional cloud/central cloud, edge cloud/edge computing, and edge computing/local computing:
Regional cloud/central cloud: extends central cloud computing services from the backbone network into the regions, achieving full regional coverage, eliminating the time spent on the backbone, and optimizing network latency to about 30 ms, though logically this is still a central cloud service.
Edge cloud/edge computing: extends central cloud computing services along the operator's network nodes, building small- and medium-scale cloud services or cloud-like service capabilities and optimizing network latency to about 15 ms, e.g. multi-access edge computing (MEC) and CDN.
Edge computing/local computing: mainly on-site equipment and service capability close to the terminal; the terminal's logic is stripped out to realize autonomous intelligent services at the edge, with the cloud controlling the edge's resource scheduling, application management, and business orchestration, optimizing network latency to about 5 ms, e.g. all-in-one machines and smart routers.
In general, network-defined edge computing mostly serves consumer Internet business and new 2C business, sinking the cloud center's capabilities and data to the edge in advance. Beyond the classic CDN, video, and voice services, there is also this year's red-hot Metaverse.
Most current consumer Internet services are served by central cloud computing deployed on the backbone network, with a delay of 30 ms to 50 ms, far smaller than the latency of the cloud's own back-end business processing. The original motivation for sinking computing power to the edge was to disperse the massive request pressure on the central cloud and to optimize user experience; for the business, this is icing on the cake rather than a lifeline.
A word here about the operator network. Central cloud computing virtualizes the entire network inside the data center, the intra-cloud network, deriving VPC, load balancing, and many other products; outside the data center it almost completely shields the operator network, exposing only elastic public IP and Internet egress bandwidth services; central cloud computing and the operator network are not integrated. The evolution from central cloud to edge computing, however, relies heavily on the network to link center and edge: if the central cloud is the brain and edge computing the intelligent tentacles, then the network is the nerves and blood vessels. In reality, overall network planning and construction predate the rise of cloud computing and were never dedicated to it, so central cloud computing and the operator network need to converge, that is, cloud-network integration. Its ultimate goal is network-based scheduling and orchestration of cloud capability and cloud-style rapid definition of network capability, in the hope that new service demands and cloud technology innovation will drive a profound transformation, upgrade, and opening of operators' network architecture.
At present, network capability greatly limits the development of cloud computing, especially in edge computing and IoT construction. Cloud-network integration and computing-power networks remain a game exclusive to the operators, and the much-hyped disruptive change of the new 5G generation has so far solved only massive device access and low-latency device connectivity, while the back-end supporting facilities and solutions clearly cannot keep up. For now, 5G remains in the awkward position of looking for its business. Going forward, 5G will bring greater change and value in the physical industries (ports, docks, mines, and so on) than in the consumer field.
(2) Business-defined edge computing:
Beyond consumer Internet edge scenarios, edge computing scenarios derive mostly from the physical industries and the intelligent society.
Physical industry scenarios carry, for historical reasons, large amounts of heterogeneous infrastructure at the edge and on site. Building an edge computing platform driven by business needs means not only integrating and reusing this existing infrastructure, but also sinking central cloud computing technology and capability to the edge and the field, so that a large stock of business operations can be managed and controlled from the cloud and massive data can be unified into the lake, supporting the digital transformation of the whole enterprise.
In scenarios derived from the intelligent society, the newer the business, the more sensitive it is to network latency and the larger its data volume, with structured data gradually giving way to unstructured data, which requires the support of advanced intelligent technologies such as artificial intelligence and neural networks.
Currently, new latency-sensitive business scenarios adopt a distributed architecture of cloud-based master control with real-time computation on the equipment site, reducing the hard dependency on the network. Business-defined edge computing divides into two types: smart device/professional cloud and industry edge/industry cloud:
Smart device/professional cloud: based on cloud computing capability, provides integrated, competitive solutions around smart devices, covering the smart device, cloud services, and end-to-cloud edge services, e.g. video surveillance cloud and G7 freight IoT;
Industry edge/industry cloud: based on cloud computing capability, provides suite products and solutions around industry applications and scenarios, e.g. logistics cloud and aerospace cloud.
In general, business-defined edge computing serves smart devices and the physical industries. For smart devices, from single-function devices such as AGVs, dense storage, and robotic arms to ultra-complex devices such as drones and driverless vehicles, cloud computing capability not only supports the device control and management applications but also extends to the edge side, solving the problem that cloud-based products cannot be centrally and uniformly managed. For the physical industries, cloud computing technology combined with abstraction of industry scenarios builds common industry products and solutions; with industrial Internet construction accelerating, this is the key direction for the future development of edge computing.
Summary
For large enterprises, cloud-edge scenarios are very complex. Building central cloud and edge computing platforms must not only meet business needs but also confront many infrastructure problems: on the central cloud, multi-cloud usage and multi-cloud interconnection; on edge network links, multi-operator backbone networks, multi-cloud carrier networks, and multi-cloud cloud-network integration; on device-side access, multi-operator 5G network sharing; and so on. Many of these can only be handled through governance and cannot be solved entirely at the technical platform level.
In general, edge computing covers a broad scope and diverse scenarios, and the industry still lacks classic cases and standards. To land edge computing, one must therefore start from overall planning around real business scenarios and needs, and build gradually toward value.
Kubernetes moves from the center to the edge
Kubernetes follows an application-centric architecture: one technology system supports any workload on any infrastructure. Downward, it shields infrastructure differences and provides unified scheduling and orchestration of underlying resources; upward, container images standardize applications and automate workload deployment. It breaks through the boundary of central cloud computing, seamlessly extending cloud capability to the edge and the field and quickly forming a cloud-edge integrated infrastructure.
Extending cloud-native technology from the center to the edge not only unifies the technical architecture of cloud-edge infrastructure but also allows services to be freely orchestrated and deployed across cloud and edge. Compared with Kubernetes' revolutionary innovation in the central cloud, edge scenarios bring obvious advantages but also fatal shortcomings: edge-side resources are limited and networks are constrained and unstable, so different Kubernetes edge solutions must be chosen for different business scenarios.
Kubernetes Architecture and the Challenges of Edge
Kubernetes is a typical distributed architecture: the Master control node is the cluster's "brain", responsible for managing nodes, scheduling Pods, and controlling the cluster's running state; Worker nodes run containers and monitor and report their running status. In edge computing scenarios, the following challenges are obvious:
1. The strongly consistent, centralized state storage architecture is the crowning product of central cloud computing; it delivers continuous business service through orchestration and scheduling over large-scale pooled resources.
2. The Master control node and Worker nodes synchronize state and tasks in real time through the List-Watch mechanism, but the traffic is heavy, and Worker nodes depend entirely on the Master for data persistence, with no autonomy of their own.
3. Kubelet carries too much logic: compatibility layers for the various container runtimes, plus Device Plugin hardware drivers, consume up to 700 MB of running resources, too heavy a burden for resource-constrained edge nodes and especially for low-spec edge devices.
Edge computing spans a large scope with complex scenarios and no unified standard, and the mainline version of the open source Kubernetes community has no adaptation plan for edge scenarios.
Kubernetes edge operation solution
For the cloud-edge distributed architecture of central cloud plus edge computing, Kubernetes must be adapted into an architecture suited to distributed edge deployment, with unified administration through multi-cluster management: central cloud control, edge operation. Overall there are three solutions:
- Cluster: sink a standard Kubernetes cluster to the edge. The advantage is that no customized Kubernetes development is needed and multiple Kubernetes versions can be supported, so the business truly runs the same architecture on cloud and edge; the disadvantage is heavy management resource usage. This solution suits regional cloud/central cloud, edge computing/local computing, and large-scale industry edge scenarios.
- Single Node: simplify Kubernetes and deploy it on a single-node device. The advantages match the cluster solution; the disadvantages are that Kubernetes capabilities are incomplete, resource usage raises device cost, and a consistent cloud-edge architecture for business applications cannot be guaranteed; the deployment runs without solving the real problem.
- Remote Node: based on secondary development that enhances and extends Kubernetes, decouple it into a cloud-edge distributed architecture: Master control nodes deployed centrally, Worker nodes deployed at the dispersed edge.
Beyond that, consistency is the pain point of edge computing: adding a cache at the edge enables edge autonomy during network disconnection while still guaranteeing data consistency when the network is healthy. There is also the weight of Kubelet: with Kubernetes dropping Docker, Kubelet has begun to slim down; meanwhile hardware iterates quickly, and compared with the modest hardware cost it is worth preserving the nativeness and generality of Kubernetes. Ideally the Kubernetes community itself would provide an edge-adapted solution and consider adding a caching mechanism to Kubelet.
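To make the caching idea concrete, here is a deliberately minimal Go sketch of a read-through cache: reads are proxied to the cloud, the last good response is kept, and when the cloud-edge link drops the edge keeps serving the cached copy. This only illustrates the mechanism; YurtHub (discussed later) implements it properly with on-disk persistence, authentication, and request filtering. All addresses below are placeholders.

```go
package main

import (
	"io"
	"log"
	"net/http"
	"sync"
)

// edgeCache keeps the last successful GET response per path, so the edge
// node can keep answering when the cloud-edge link is broken.
type edgeCache struct {
	upstream string // cloud endpoint; the address in main() is a placeholder
	mu       sync.RWMutex
	data     map[string][]byte
}

func (c *edgeCache) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	resp, err := http.Get(c.upstream + r.URL.Path)
	if err == nil {
		body, readErr := io.ReadAll(resp.Body)
		resp.Body.Close()
		if readErr == nil && resp.StatusCode == http.StatusOK {
			c.mu.Lock()
			c.data[r.URL.Path] = body // refresh the cache on every good read
			c.mu.Unlock()
			w.Write(body)
			return
		}
	}
	// Cloud unreachable or failed: serve the last cached copy (autonomy).
	c.mu.RLock()
	body, ok := c.data[r.URL.Path]
	c.mu.RUnlock()
	if !ok {
		http.Error(w, "cloud unreachable and nothing cached", http.StatusBadGateway)
		return
	}
	w.Write(body)
}

func main() {
	cache := &edgeCache{
		upstream: "http://cloud-apiserver.example:8080", // placeholder
		data:     map[string][]byte{},
	}
	// Local proxy port is arbitrary for this sketch.
	log.Fatal(http.ListenAndServe("127.0.0.1:8080", cache))
}
```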
Kubernetes edge containers grow rapidly
Kubernetes has become the de facto standard for container orchestration and scheduling. For edge computing scenarios, domestic public cloud vendors have each open sourced Kubernetes-based edge computing cloud-native projects. OpenYurt, contributed by Alibaba Cloud to CNCF, adopts the Remote Node solution and is the industry's first open source, non-intrusive edge computing cloud-native platform; holding to the non-intrusive design philosophy of "Extending your native Kubernetes to Edge", it aims to cover all edge computing scenarios. Huawei, Tencent, Baidu, and others have also open sourced their own edge container platforms.
The rapid growth of edge containers has driven innovation in the field, but to some extent it also makes the choice harder when building an edge computing platform. From the technical architecture perspective, the edge container products share the same general idea, decoupling Kubernetes to fit cloud-edge, weak-network, resource-scarce edge scenarios, and are essentially the same; the same holds from the product feature perspective, as they basically all cover cloud-edge collaboration, edge autonomy, and unitized deployment.
How to build a cloud-edge integrated cloud-native platform
At this stage, building cloud-edge integrated cloud-native infrastructure capability around the Kubernetes container platform is the best choice for an edge computing platform. Unified multi-cluster container management in the cloud brings distributed clusters under unified administration and standardizes Kubernetes cluster specifications and configurations:
- Standard cluster (large-scale): supports clusters of more than 400 nodes, configured as ETCD + Master on 3 nodes of 8C16G, Prometheus + Ingress on 5 nodes of 8C16G, and N Worker nodes; mainly for cloud-native application scenarios at large business scale;
- Standard cluster (medium-scale): supports clusters of more than 100 nodes, with ETCD + Master + Prometheus on 3 nodes of 8C16G and N Worker nodes; mainly for medium-scale business scenarios;
- Edge-native container cluster: cluster management nodes are deployed in the cloud and edge nodes are deployed separately at business sites, supporting applications for a single business scenario, such as protocol-parsing applications for IoT device access or AI algorithm models for video surveillance analysis.
The optimal cluster solution is chosen per business scenario. The edge container cluster differs markedly from the others: the other clusters work like central cloud clusters, with basic resources centrally pooled and all applications sharing the whole cluster's resources, whereas in the edge container cluster the Master management nodes are deployed centrally and shared, while the Worker nodes are scattered across business sites, added on demand in self-service fashion, operated and maintained autonomously, and used exclusively.
A unified open source product for edge containers is unlikely in the short term, so at this stage it is recommended to integrate edge-native container clusters through the standard Kubernetes API, a moderate solution compatible with all edge containers. If one must be chosen, the recommendation is OpenYurt: non-intrusive by design, with the more elegant overall technical architecture and implementation.
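A minimal sketch of that integration approach using client-go (the kubeconfig paths are hypothetical): every cluster, standard or edge, is just another Kubernetes API endpoint, so the management plane can fan out identical calls without anything vendor-specific:

```go
package main

import (
	"context"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// One kubeconfig per managed cluster; names and paths are placeholders.
	kubeconfigs := map[string]string{
		"central":      "/etc/kube/central.kubeconfig",
		"edge-site-01": "/etc/kube/edge-site-01.kubeconfig",
	}
	for name, path := range kubeconfigs {
		cfg, err := clientcmd.BuildConfigFromFlags("", path)
		if err != nil {
			log.Fatalf("%s: %v", name, err)
		}
		client, err := kubernetes.NewForConfig(cfg)
		if err != nil {
			log.Fatalf("%s: %v", name, err)
		}
		// The same standard API call works against every cluster.
		nodes, err := client.CoreV1().Nodes().List(context.TODO(), metav1.ListOptions{})
		if err != nil {
			log.Fatalf("%s: %v", name, err)
		}
		fmt.Printf("cluster %s: %d nodes\n", name, len(nodes.Items))
	}
}
```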
OpenYurt: Open Source Practice of Intelligent Edge Computing Platform
OpenYurt, built on the upstream open source Kubernetes project, is a distribution adapted to edge scenarios: the industry's first intelligent edge computing platform based on the cloud-native technology system with "zero" intrusion. With full "cloud-edge-end integration" capability, it enables efficient delivery, operation and maintenance, and management of massive edge computing services and heterogeneous computing power.
Design Principles
OpenYurt adopts the industry's current mainstream cloud-edge distributed collaboration architecture of "central control, edge operation", consistently follows the idea of "Extending your native Kubernetes to Edge", and abides by the following design principles:
- The "cloud-edge integration" principle: on the basis of guaranteeing the same user experience and product capability as the central cloud, cloud-native capability is sunk to the edge through the cloud-edge control channel, realizing unified management of massive intelligent edge nodes and business applications, a major infrastructure breakthrough for the cloud-native architecture.
- The "zero intrusion" principle: the APIs exposed to users are exactly those of native Kubernetes. By proxying node network traffic, a new layer of encapsulation and abstraction is added over Worker-node application lifecycle management, realizing unified management and scheduling of distributed Worker node resources and applications, while following the "Upstream First" open source principle;
- The "low load" principle: on the basis of guaranteeing the platform's features and reliability, and with platform generality in mind, the resources of every component are strictly limited, following a design philosophy of minimization and simplification so as to cover the widest range of edge devices and scenarios.
- The "one stack" principle: OpenYurt not only implements the enhanced edge operation and management functions but also provides matching operation and maintenance tools, achieving one-click, efficient conversion between native Kubernetes clusters and edge-capable Kubernetes clusters;
Features
Building on Kubernetes' powerful container orchestration and scheduling, OpenYurt adapts and enhances it for limited edge resources and for constrained, unstable networks. It extends central cloud-native capability to distributed edge nodes so edge services respond nearby with low latency, and it provides a reverse, secure operation and maintenance channel for convenient, efficient, cloud-centralized management of edge devices and applications. Its core features are as follows:
1. Edge node autonomy: in edge scenarios, the cloud-edge control network cannot be guaranteed continuously stable. Enhanced adaptation resolves the native Worker node's statelessness, that is, its hard dependence on the Master node for data and strongly consistent state, which does not fit edge scenarios. Thus when the cloud-edge network falters, edge workloads are not evicted and the business keeps serving normally; even if an edge node restarts while disconnected, the business still recovers. In short, temporary autonomy for edge nodes (a sketch of enabling it per node follows this feature list).
2. Collaborative operation and maintenance channel: in edge scenarios, the cloud-edge network is not on the same network plane and edge nodes are not exposed on the public network, so central control cannot establish an effective network channel to edge nodes and all native Kubernetes operation APIs (logs/exec/metrics) fail. The enhanced adaptation establishes a reverse channel between central control and the edge node when the node is initialized, and proxies the traffic of the native Kubernetes operation APIs (logs/exec/metrics), realizing centralized, unified operation and maintenance;
3. Edge unitized workloads: in edge scenarios, the business side generally adopts a cloud-edge collaborative distributed architecture of "centralized control, decentralized operation". For the management side, the same business must be deployed simultaneously to nodes in different regions; on the edge side, Worker nodes are generally scattered across a wide area with strong regionality, and nodes in different regions have no network interconnection, no resource sharing, heterogeneous resources, and other clear isolation properties. The enhanced adaptation implements unitized management and scheduling of edge workloads across the three layers of resources, applications, and traffic.
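As promised above, here is a hedged sketch of enabling autonomy for one node. It assumes the node annotation key documented by OpenYurt (node.beta.openyurt.io/autonomy); verify it against your OpenYurt version. The kubeconfig path and node name are placeholders, and the operation itself is a plain Kubernetes node update:

```go
package main

import (
	"context"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", "/etc/kube/central.kubeconfig") // placeholder path
	if err != nil {
		log.Fatal(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}
	nodeName := "edge-node-01" // placeholder node name
	node, err := client.CoreV1().Nodes().Get(context.TODO(), nodeName, metav1.GetOptions{})
	if err != nil {
		log.Fatal(err)
	}
	if node.Annotations == nil {
		node.Annotations = map[string]string{}
	}
	// Annotation key per OpenYurt documentation at the time of writing;
	// treat it as an assumption and check your OpenYurt release.
	node.Annotations["node.beta.openyurt.io/autonomy"] = "true"
	if _, err := client.CoreV1().Nodes().Update(context.TODO(), node, metav1.UpdateOptions{}); err != nil {
		log.Fatal(err)
	}
	log.Printf("node %s marked autonomous: its workloads stay put during cloud-edge outages", nodeName)
}
```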
Through the OpenYurt open source community, more participants are brought into joint construction, and co-development adds more optional professional functions; OpenYurt's features keep improving and widening their coverage:
1. Edge device management: in edge scenarios, the end devices are the platform's real objects of service. Based on the cloud-native philosophy, an abstract, non-intrusive, extensible device management standard model seamlessly fuses the Kubernetes workload model with the IoT device management model, covering the last mile of platform-enabled business. Integration with the EdgeX Foundry open source project has been completed through this standard model, greatly improving the efficiency of edge device management.
2. Local resource management: in edge scenarios, existing block devices or persistent memory devices on edge nodes are initialized into convenient cloud-native container storage. Two kinds of local storage device are supported: (1) LVM created from block devices or persistent memory devices; (2) QuotaPath created from block devices or persistent memory devices.
OpenYurt Design Architecture and Principles
(1) Design Architecture
Native Kubernetes is a centralized distributed architecture: the Master control node handles management, scheduling, and control of the cluster's running state, while Worker nodes run containers and monitor and report their running status.
On top of native Kubernetes, OpenYurt decouples and adapts its fully centralized distributed architecture (Cloud Master, Cloud Worker) into a centrally controlled, edge-running one (Cloud Master, Edge Worker) for edge scenarios, forming a central brain plus multiple distributed cerebellums: the octopus-style cloud-edge collaborative distributed architecture. Its core points are:
1. Metadata that was centrally stored with strong consistency is distributed to edge nodes, and the native Kubernetes scheduling mechanism is adjusted so that abnormal states of autonomous nodes do not trigger rescheduling, achieving temporary edge node autonomy;
2. Kubernetes capability is kept complete and consistent, and compatibility with the existing cloud-native ecosystem is preserved, sinking the cloud-native system to the edge as far as possible;
3. The central model of large-scale pooled resources, with multiple applications scheduled onto shared resources, is adapted to small regional scale and even single-node resources, enabling finer-grained unitized workload orchestration and management in edge scenarios;
4. Facing real edge business needs, and through the open community, it seamlessly integrates device management, edge AI, streaming data, and more, delivering out-of-the-box general platform capability for actual edge business and enabling more edge application scenarios.
(2) Implementation principle
Following the cloud-native architecture philosophy, OpenYurt realizes a cloud-edge collaborative distributed architecture for edge computing scenarios, with the center controlling edge operation:
- For edge node autonomy: on the one hand, the new YurtHub component implements an Edge-to-Cloud request proxy whose caching mechanism persists the latest metadata on edge nodes; on the other hand, the new YurtControllerManager component takes over native Kubernetes scheduling so that abnormal states of edge autonomous nodes do not trigger rescheduling;
- For complete Kubernetes capability and ecosystem compatibility: the new YurtTunnel component builds a Cloud-to-Edge request reverse channel, guaranteeing consistent capability and user experience for Kubectl, Prometheus, and other central operation and maintenance products; cloud-native ecosystem capability sinks to the edge, including the various workloads, Ingress routing, and so on;
- For edge unitized management capability: the new YurtAppManager component, together with NodePool, YurtAppSet (formerly UnitedDeployment), YurtAppDaemon, ServiceTopology, and others, achieves unitized management of edge resources, workloads, and traffic across those three layers;
- For empowering real edge business platforms: the new NodeResourceManager makes edge storage convenient to use, and the introduction of YurtEdgeXManager/YurtDeviceController realizes cloud-native management of edge devices.
Core components
All of OpenYurt's new functions and components are implemented as Addons and Controllers. Its core required and optional components are as follows:
1. YurtHub (required): has two operating modes, edge and cloud. It runs as a Static Pod on every cloud and edge node, acting as a SideCar for node traffic and proxying the traffic between components on the node and kube-apiserver; in edge mode, YurtHub caches the data to achieve temporary edge node autonomy.
2. YurtTunnel (required): consists of a Server and an Agent. It builds a mutually authenticated, encrypted cloud-edge reverse tunnel and forwards the request traffic of the native Kubernetes operation and maintenance APIs (logs/exec/metrics) from the cloud center to the edge. The Server is deployed in the cloud center as a Deployment workload; the Agent is deployed on edge nodes as a DaemonSet workload.
3. YurtControllerManager (required): a cloud-center controller that takes over the NodeLifeCycle Controller of native Kubernetes so that Pods on autonomous edge nodes are not evicted when the cloud-edge network misbehaves; it also includes YurtCSRController, which approves certificate requests from edge nodes.
4. YurtAppManager (required): implements unitized management and scheduling of edge workloads, comprising NodePool (node pool management), YurtAppSet (formerly UnitedDeployment; node-pool-scoped business workloads), and YurtAppDaemon (node-pool-scoped DaemonSet workloads). It is deployed in the cloud center as a Deployment workload.
5. NodeResourceManager (optional): manages local storage resources on edge nodes, dynamically configuring the host's local resources through ConfigMap changes. Deployed on edge nodes as a DaemonSet workload.
6. YurtEdgeXManager/YurtDeviceController (optional): manages edge devices in cloud-native fashion; EdgeX Foundry integration is currently supported. YurtEdgeXManager is deployed in the cloud center as a Deployment workload; YurtDeviceController is deployed on edge nodes as a YurtAppSet workload, and one YurtDeviceController can be deployed per node pool (NodePool).
7. Operation and maintenance components (optional): to standardize cluster management, the OpenYurt community provides the YurtCluster Operator, which offers a cloud-native cluster API and configuration and automates the deployment and configuration of OpenYurt components on standard Kubernetes, covering the full life cycle of an OpenYurt cluster. The older Yurtctl tool is recommended only for test environments.
Beyond the core and optional professional functions, OpenYurt continues to pursue cloud-edge integration, pushing the rich cloud-native ecosystem to the edge as far as possible: edge container storage, the edge daemon workload DaemonSet, edge network access with Ingress Controller, and more are already in place, with Service Mesh, Kubeflow, Serverless, and other capabilities on the roadmap.
Current challenges
(1) Cloud-edge network
In edge computing discussions, the cloud-edge network is often described as poor and unstable. In fact, China's basic networks have been comprehensively upgraded since 2015, particularly after the full completion of the "Xueliang Project", and are much improved. According to the 48th Statistical Report on China's Internet Development, fixed-network access at 100 Mbps or above accounts for 91.5%, and wireless access is already high-quality 4G and 5G.
The real challenge lies in the networking of the cloud-edge link. When using the public cloud, the public cloud shields the data center network and provides only Internet egress bandwidth; connecting cloud and edge over the Internet usually requires solving only secure data transmission and access, which is not complicated. For private, self-built IDC scenarios, opening up the cloud-edge network is not easy: operator networks are not fully commercialized, and private IDCs sit behind layers of firewalls and other complexities, so implementation requires professional network staff.
(2) List-Watch mechanism and cloud-edge traffic
The List-Watch mechanism is the design essence of Kubernetes: components actively monitor for relevant events and data, keeping every component loosely coupled and mutually independent yet logically unified. A List request returns the full data set, and once a Watch fails a re-list is needed. However, Kubernetes has already optimized control-plane data synchronization: a node's kubelet watches only its own node's data, and although kube-proxy watches all Service data, the volume remains controllable. With compact binary (Protobuf) encoding, these control messages are tiny compared with business data. Pressure-test monitoring on a cluster of 1,200 nodes bears this out.
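The mechanism is easy to see with client-go (a sketch; the kubeconfig path is a placeholder): list once for a consistent snapshot, then watch from the returned resourceVersion so that only incremental events cross the cloud-edge link:

```go
package main

import (
	"context"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", "/etc/kube/config") // placeholder path
	if err != nil {
		log.Fatal(err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatal(err)
	}
	// List once: the expensive, full-data call.
	pods, err := client.CoreV1().Pods("default").List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}
	// Watch from that resourceVersion: only deltas flow afterwards. If the
	// watch breaks and the version has expired, the client must re-list,
	// which is exactly the costly case discussed above.
	w, err := client.CoreV1().Pods("default").Watch(context.TODO(),
		metav1.ListOptions{ResourceVersion: pods.ResourceVersion})
	if err != nil {
		log.Fatal(err)
	}
	for ev := range w.ResultChan() {
		fmt.Println("event:", ev.Type)
	}
}
```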
The real challenge is distribution of base and application images. Today, even in the central cloud, various technologies are still being explored to relieve the bottleneck of fast image distribution. Edge AI applications in particular typically consist of an inference application plus a model library: the application image is comparatively small, while the model library is huge and, with continual self-learning, needs frequent updates; updating the model library more efficiently will require yet more technologies and solutions.
(3) Edge resources and computing power
Edge resource situations must be considered per scenario. For operator-network edge computing and consumer-facing edge computing, resources are relatively plentiful, and the biggest challenges are resource sharing and isolation. Edges in the physical industries often have no small amount of IDC support, with resources quite sufficient to sink the whole cloud-native system. At the smart device edge, resources are relatively scarce, but typically an intelligent edge box connects devices on one end and the central control service on the other; judging from the overall configuration of current AI edge boxes, hardware is improving quickly. In the long run, edge computing power will rise rapidly to meet the needs of more complex and intelligent scenarios.
(4) Kubelet is heavy and takes up a lot of resources to run
To address the complaint that Kubelet is heavy and resource-hungry, one must first understand how node resources are allocated and used. A node's resources are usually divided into four layers, from bottom to top:
- Resources required to run the operating system and system daemons (such as SSH, systemd, etc.);
- Resources required to run the Kubernetes agents, such as the Kubelet, the container runtime, and the node problem detector;
- The resources available to the Pod;
- Resources reserved up to the eviction threshold.
There is no fixed standard for each layer's resource allocation; it must be weighed against the circumstances of the cluster. Amazon Kubernetes computes the Kubelet memory reservation as Reserved memory = 255 MiB + 11 MiB * MAX_POD_PER_INSTANCE. Assuming 32 running Pods, up to about 90% of memory remains allocatable to the business, so, relatively speaking, Kubelet's resource usage is not that heavy.
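Worked through in code (the 8 GiB node size is an assumed example, not from the text), the formula supports the claim:

```go
package main

import "fmt"

// reservedMemoryMiB applies the Amazon Kubernetes heuristic quoted above:
// reserved memory = 255 MiB + 11 MiB per schedulable Pod.
func reservedMemoryMiB(maxPods int) int {
	return 255 + 11*maxPods
}

func main() {
	maxPods := 32
	reserved := reservedMemoryMiB(maxPods) // 255 + 11*32 = 607 MiB
	totalMiB := 8 * 1024                   // assumed: an 8 GiB edge node
	fmt.Printf("reserved %d MiB of %d MiB -> %.1f%% left for Pods\n",
		reserved, totalMiB, 100*float64(totalMiB-reserved)/float64(totalMiB))
	// Prints roughly 92.6% left, consistent with the ~90% figure above.
}
```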
The configuration must also be tuned to the business's high-availability requirements; for edge scenarios, running a large number of Pods on a single node is generally not recommended.
Cloud-edge management-operation collaboration model for business applications
A distributed business application architecture based on the central cloud differs fundamentally from a cloud-edge distributed collaborative one. In the central cloud, the approach leans on DDD-style business domains, splitting a complex business system into relatively independent services to build a loosely coupled distributed application. In cloud-edge distributed scenarios, the emphasis shifts to centralized control and operations with decentralized operation support: management and operations systems are concentrated in the cloud center for centralized control, while the applications supporting real-time business operation sink to the edge for low-latency, fast response.
Viewed from the business application: the two layers of finance/operations and planning/management are control-and-operations applications that need unified aggregation through the central cloud for centralized, strong control; they are insensitive to latency but have high requirements for security and big-data analysis capability. The three layers of control, sensing/execution, and production process are operation-support applications; the central cloud can still be preferred, but if the business scenario is latency-sensitive, only edge computing capability can deliver the distributed low-latency response.
Viewed from request-response: latency-insensitive needs (above 50 ms) can simply be carried by central cloud computing and cloud-vendor edge products (CDN); latency-sensitive needs (below 10 ms), which the operators' backbone network cannot satisfy, call for building an edge computing platform, along with the substantial investment and staffing the business must then face.
Take physical logistics as an example. In the classic OTW system (OMS order management system, TMS transportation management system, WMS warehouse management system), O and T are typical management and operations systems, best deployed in the central cloud, where data aggregation enables cross-regional business such as order splitting and multimodal transport. W, the warehouse management system, managing everything within the warehouse's four walls, is an operation-support application; and since warehouses generally include some automation equipment, deploying W at the edge is worth considering.
Summary
For building an edge computing platform, the cloud-native technology system centered on Kubernetes is without doubt the best current choice and construction path. Yet the cloud-native system is vast and its components complex; sinking it to the edge will meet great challenges and difficulties, while also holding huge opportunities and room for imagination. To truly land the cloud-native system at the edge, business applications must work together across concepts, system design, architecture design, and more, in order to bring out the advantages and value of the edge.
If you are interested, you can also join the OpenYurt project's DingTalk group by searching the group number 31993519.