A so-called shortcut gets blocked once enough people take it. The same is true of the Internet highway.
Author: Haraoka | Editor: IMMENSE
01 Origin: The classic architecture of "acceleration"
CDNs didn't exist when the Internet was born.
Without CDN acceleration, a large number of user requests must traverse the Internet backbone to fetch content from the origin site.
In the 1980s, Internet technology began to enter civilian use. People mostly connected over dial-up, and because there were few users and little bandwidth, neither the backbone network nor the servers came under pressure.
As the Internet developed, the number of users exploded, and the arrival of broadband access put growing pressure on origin servers and backbone networks.
Because of the long network distance and congestion on the backbone, end-to-end request latency becomes very high, user requests cannot be answered in time, and the user experience suffers badly.
In early CDN architecture design, the core goal was to achieve "acceleration" through content distribution. The essential logic was to "move" files from the origin site to a place close to the user, shortening the physical distance that content travels and thereby producing the "acceleration" effect.
Given this premise, the technical focus is on letting as little traffic as possible pass through the edge cluster back to the origin site, that is, on raising the content hit rate as far as possible.
In practice, vendors have put most of their technical investment here: first terminating as many requests as possible at the edge, and second adding a cache layer upstream (which many vendors call an intermediate source) to "intercept" back-to-origin traffic.
Therefore, in classic CDN static acceleration, the node architecture is logically layered: edge -> first-level parent -> ... -> Nth-level parent -> origin site.
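The layered lookup can be illustrated with a minimal Python sketch. It is not any vendor's implementation: the `CacheTier` class, the `fetch` function, the tier names, and the fill-on-miss policy are all assumptions made for illustration, and a plain dictionary stands in for real cache storage.

```python
from typing import Callable, Dict, List, Tuple


class CacheTier:
    """One cache layer (edge or parent); a dict stands in for real storage."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.store: Dict[str, bytes] = {}


def fetch(url: str, tiers: List[CacheTier],
          origin: Callable[[str], bytes]) -> Tuple[bytes, str]:
    """Walk edge -> parents -> origin; return (content, layer that served it)."""
    for depth, tier in enumerate(tiers):
        if url in tier.store:
            body, served_by = tier.store[url], tier.name
            break
    else:
        # Every layer missed, so the request goes back to the origin site.
        depth, body, served_by = len(tiers), origin(url), "origin"
    # Fill every layer in front of the one that answered, so the next
    # request for this URL terminates closer to the user.
    for tier in tiers[:depth]:
        tier.store[url] = body
    return body, served_by


origin_calls = []


def origin(url: str) -> bytes:
    """Hypothetical origin site that records how often it is hit."""
    origin_calls.append(url)
    return b"content of " + url.encode()


tiers = [CacheTier("edge"), CacheTier("parent-1"), CacheTier("parent-2")]
first = fetch("/logo.png", tiers, origin)    # misses every layer
second = fetch("/logo.png", tiers, origin)   # terminates at the edge
print(first[1], second[1], len(origin_calls))
```

Only the first request reaches the origin; the second one is answered at the edge, which is exactly the traffic the layered design tries to "intercept".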
With a CDN, traffic traversing the Internet backbone is drastically reduced as a large number of requests find what they need at the edge.
This effectively relieves traffic pressure on the backbone network and also saves bandwidth costs for the SP (service provider), which in turn has promoted the rapid growth of Internet services.
02 Disadvantage: Loss of control in dynamic scenes
However, in some scenarios, the classic CDN technical architecture is not a panacea.
In Internet businesses such as e-commerce, interactive social media, and blogs, a large share of content is dynamic: it cannot be cached and must be fetched from the origin in real time.
An e-commerce platform, for example, involves user registration, login, online payment, flash sales (seckill), and other scenarios that require dynamic acceleration.
In terms of traffic, a domain's network-wide traffic shrinks layer by layer as it moves up the hierarchy, and finally returns to the origin site from only a few nodes. When the content is relatively hot, the back-to-origin volume is even smaller.
At a micro level, the general logic is to place content on the edge node closest to the customer. The parent nodes behind it follow the same logic: the first-level parent sits as close as possible to the edge, and the second-level parent as close as possible to the first-level parent.
The end state is that the CDN's nodes are concentrated near the clients.
As a result, whenever a file misses every node inside the CDN and must be fetched from the origin, the request inevitably travels over a relatively long public-network link that the CDN does not control.
In terms of quality, degradation caused by back-to-origin traffic does not necessarily have a large impact on the overall quality of the domain.
As an intuitive example, if a customer domain's CDN hit rate is 95%, back-to-origin traffic accounts for only 5%, so even if response times for that portion become abnormal, only about 5% of overall traffic is affected.
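The arithmetic behind this example can be written out as a short Python sketch. The latency figures (20 ms at the edge, 200 ms for a healthy origin fetch, 1000 ms for a degraded one) are made-up assumptions chosen only to show the proportions; only the 95% hit rate comes from the text.

```python
def mean_response_ms(hit_rate: float, edge_ms: float, origin_ms: float) -> float:
    """Traffic-weighted mean response time for one domain."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be in [0, 1]")
    return hit_rate * edge_ms + (1.0 - hit_rate) * origin_ms


# Assumed, illustrative latencies in milliseconds.
EDGE_MS = 20.0
HEALTHY_ORIGIN_MS = 200.0
DEGRADED_ORIGIN_MS = 1000.0

for label, hit_rate in (("static, 95% hit rate", 0.95),
                        ("dynamic, 0% hit rate", 0.0)):
    healthy = mean_response_ms(hit_rate, EDGE_MS, HEALTHY_ORIGIN_MS)
    degraded = mean_response_ms(hit_rate, EDGE_MS, DEGRADED_ORIGIN_MS)
    print(f"{label}: {healthy:.0f} ms -> {degraded:.0f} ms "
          f"({1.0 - hit_rate:.0%} of requests affected)")
```

Under these assumptions the same origin degradation moves the static domain's mean from about 29 ms to about 69 ms, while a fully dynamic domain absorbs the whole jump from 200 ms to 1000 ms.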
By the same argument, consider traffic that must go back to the origin 100% of the time, such as login, form submission, recommendation lists, and payment. When such traffic is moved onto a CDN static acceleration platform, whose nodes are highly concentrated at the edge and reach the origin over a long, uncontrollable public-network link, overall quality easily gets out of control.
03 Thinking: The Core of Dynamic Acceleration
For purely dynamic traffic, the core problem is relatively clear:
Once customer traffic reaches a CDN edge node, the request still has to cross a long physical distance to the customer's origin site. How the CDN can guarantee low-latency, highly stable service quality over that distance is the core issue.
From the edge access side, users' dynamic traffic is essentially all HTTPS. Because CDN edge nodes are widely distributed, the client's TCP and SSL handshakes can be offloaded to a nearby edge node, which greatly improves on what would otherwise be several handshake round trips to the origin site over a long distance.
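A rough back-of-the-envelope model in Python shows why offloading helps. It assumes one round trip for the TCP handshake, two for a full TLS 1.2 handshake (TLS 1.3 needs one), and ignores server processing time; the RTT values and function names are hypothetical, and the edge is assumed to already hold an open connection to the origin.

```python
TCP_HANDSHAKE_RTTS = 1     # SYN / SYN-ACK before the client can send data
TLS12_HANDSHAKE_RTTS = 2   # full TLS 1.2 handshake (TLS 1.3 needs 1)
REQUEST_RTTS = 1           # request out, first response byte back


def direct_first_byte_ms(client_origin_rtt_ms: float) -> float:
    """Client speaks HTTPS straight to the origin over the long path."""
    rtts = TCP_HANDSHAKE_RTTS + TLS12_HANDSHAKE_RTTS + REQUEST_RTTS
    return rtts * client_origin_rtt_ms


def offloaded_first_byte_ms(client_edge_rtt_ms: float,
                            edge_origin_rtt_ms: float) -> float:
    """Handshakes terminate at a nearby edge node; the edge forwards the
    request over a connection to the origin that is assumed to be open."""
    handshakes = (TCP_HANDSHAKE_RTTS + TLS12_HANDSHAKE_RTTS) * client_edge_rtt_ms
    request = REQUEST_RTTS * (client_edge_rtt_ms + edge_origin_rtt_ms)
    return handshakes + request


# Assumed RTTs: 160 ms client to origin, 10 ms client to edge,
# 150 ms edge to origin.
print(direct_first_byte_ms(160.0))
print(offloaded_first_byte_ms(10.0, 150.0))
```

With these assumed numbers the first byte arrives after 640 ms on the direct path and after 190 ms through the edge, because the three handshake round trips now cost 10 ms each instead of 160 ms.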
From the perspective of transmission inside the CDN network, achieving optimal latency requires the shortest, best link together with the most efficient transmission over that link.
"As the saying goes, 'build the road well, run the car well': both capabilities must be in place at the same time to achieve the best acceleration."
No matter how good the link is, it cannot make up for extra interaction overhead during transmission, such as unnecessary TCP or SSL handshakes.
We call these two capabilities "routing capability" and "transmission capability"; the corresponding core technologies are dynamic routing and transmission optimization.
04 "Build the Road Well": Transmission Optimization as a Core Technology
Dynamic traffic demands low latency and is mostly made up of small payloads, where a request completes in a single network interaction. TCP optimization in a traditional CDN is built around large-file downloads and therefore contributes little here.
The fundamental reason is that:
Most TCP optimizations today rely on statistics and measurements over many packets to estimate quantities such as the network's minimum delay and maximum window, and then adjust how many packets are sent and how often. That approach clearly does not apply when the whole exchange is a single network interaction, as in typical dynamic scenarios such as bullet comments (danmaku), transaction payment, and login.
For dynamic traffic acceleration, the first-packet time (essentially equal to the response time) is therefore a core metric. In large-file scenarios, by contrast, a download can take seconds or longer, so the first packet makes up only a small share of the total completion time.
For dynamic traffic, the first packet is basically everything. Its duration is of the same order of magnitude as a single TCP handshake, so any extra handshake over the long link during transmission has a huge impact.
Of the two core capabilities for dynamic traffic, "transmission capability" really comes down to 0-RTT. Here 0-RTT means that, apart from the single payload transfer that must happen between CDN nodes, there are no additional round trips on the network (hence the "0").
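One common way to approach this property is to keep pre-established, persistent connections on the edge-to-origin leg so that the handshake is paid off the request path. The Python sketch below only counts round trips under that assumption; `OriginLink`, `prewarm`, and the three-RTT handshake cost are illustrative choices, not a description of any specific product's mechanism.

```python
from dataclasses import dataclass, field
from typing import Dict

HANDSHAKE_RTTS = 3   # assumed: 1 for TCP + 2 for a full TLS 1.2 handshake


@dataclass
class OriginLink:
    """Counts the round trips spent on the edge-to-origin leg."""
    keep_alive: bool
    open_to: Dict[str, bool] = field(default_factory=dict)
    round_trips: int = 0

    def prewarm(self, origin: str) -> None:
        """Pay the handshake ahead of time, off the request path."""
        self.open_to[origin] = True

    def send(self, origin: str) -> int:
        """Send one request; return the extra (non-payload) RTTs it cost."""
        extra = 0 if self.open_to.get(origin) else HANDSHAKE_RTTS
        self.round_trips += extra + 1            # +1 for the payload itself
        self.open_to[origin] = self.keep_alive   # a closed link re-handshakes
        return extra


cold = OriginLink(keep_alive=False)
warm = OriginLink(keep_alive=True)
warm.prewarm("origin.example.com")

cold_extra = [cold.send("origin.example.com") for _ in range(3)]
warm_extra = [warm.send("origin.example.com") for _ in range(3)]
print(cold_extra, cold.round_trips)
print(warm_extra, warm.round_trips)
```

The pre-warmed link spends exactly one round trip per request, which is the "0" in the sense defined above, while the cold link spends four.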
To provide this capability, Alibaba Cloud's full-site acceleration has, after years of refinement, built a user-space application network that gives the CDN edge a transmission pipeline to the origin site with zero handshake overhead at runtime.
05 "Run the Car Well": Dynamic Routing as a Core Technology
On the routing system, this article mainly offers some observations for readers to think about further, drawn from the years of business experience and evolution behind Alibaba Cloud's full-site acceleration (DCDN).
As mentioned earlier, under the default CDN architecture, back-to-origin traffic travels over a long public-network link. That link may have to cross provinces, countries, or even continents, or pass through the networks of different types of operators.
WAN routing is shaped by many complex regional and commercial policies, so detours and similar problems are common.
An effective solution builds on the CDN's widely distributed nodes: by probing between nodes and exploiting the extensive connectivity between CDN nodes and the various operators, it constructs "path cutting" to avoid, as far as possible, the problems that arise when crossing long links.
So-called "path cutting" means building multi-segment TCP connections to steer the data so that, at the routing level, it follows the intended link as closely as possible.
This kind of routing differs from ordinary Layer 3 routing.
Because dynamic service traffic is a specific scenario, route selection has to take extra factors into account: the link from the node to the customer's origin site, service characteristics, HTTP and HTTPS traffic characteristics, the differences between TCP and UDP, and long versus short connections all have subtle effects on service traffic.
For computing the optimal path through the network (as shown in the figure below), there are many relevant algorithms to draw on.
"The core problem of optimal path calculation is how to construct the graph, that is, how to measure and normalize its edges, which is a very important topic."
Beyond measuring and defining "edges" when constructing the graph, we should also pay attention to the "node" dimension. The classic optimal-route algorithms from academia do not take link or node capacity into account.
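As a concrete reference point, the classic capacity-unaware calculation can be sketched with Dijkstra's algorithm over a graph whose edge weights are normalized from probe measurements. The `edge_weight` formula (RTT plus a fixed penalty per unit of loss), the node names, and the measurements are assumptions for illustration only; how to normalize edges well is precisely the open question raised above.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, Dict[str, float]]


def edge_weight(rtt_ms: float, loss_rate: float,
                loss_penalty_ms: float = 1000.0) -> float:
    """Assumed normalization: fold packet loss into an RTT-like cost."""
    return rtt_ms + loss_rate * loss_penalty_ms


def shortest_path(graph: Graph, src: str, dst: str) -> Tuple[float, List[str]]:
    """Dijkstra's algorithm; edge weights must be non-negative."""
    best = {src: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, src)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == dst:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry
        for nxt, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                prev[nxt] = node
                heapq.heappush(heap, (new_cost, nxt))
    if dst not in best:
        raise ValueError(f"no path from {src} to {dst}")
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return best[dst], path[::-1]


# Hypothetical probe results: the direct public link is lossy,
# while the two-segment path through a relay node is clean.
graph: Graph = {
    "edge": {"relay": edge_weight(40, 0.0), "origin": edge_weight(180, 0.03)},
    "relay": {"origin": edge_weight(120, 0.001)},
}
cost, path = shortest_path(graph, "edge", "origin")
print(cost, path)
```

Note that nothing in this calculation knows how much traffic a link or node can carry, so every flow between the same pair of endpoints receives the same answer.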
As a result, following the output of such an algorithm causes traffic to converge on a particular link or node, which backfires and degrades the quality of that link.
A vivid metaphor: a so-called shortcut gets blocked once enough people take it.
Traditional classic algorithms stop working properly once link capacity limits come into play, and new models are needed to handle such problems.
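To make the capacity problem concrete, here is a deliberately simple greedy heuristic in Python: each demand takes the lowest-latency candidate path that still has spare capacity on every link. This is only one assumed way to respect capacity, not an optimal flow model and not the approach of any particular system; the paths, capacities, and demands are invented for illustration.

```python
from typing import Dict, List, Tuple

Link = Tuple[str, str]


def assign_flows(paths: List[Tuple[float, List[Link]]],
                 capacity: Dict[Link, float],
                 demands: List[float]) -> List[int]:
    """For each demand, return the index of the chosen path, or -1 if no
    candidate path has enough residual capacity on all of its links."""
    residual = dict(capacity)
    by_latency = sorted(range(len(paths)), key=lambda i: paths[i][0])
    chosen = []
    for demand in demands:
        pick = -1
        for i in by_latency:
            links = paths[i][1]
            if all(residual[link] >= demand for link in links):
                for link in links:
                    residual[link] -= demand
                pick = i
                break
        chosen.append(pick)
    return chosen


# Candidate paths as (latency in ms, list of links); values are assumed.
paths = [
    (160.0, [("edge", "relay"), ("relay", "origin")]),   # the "shortcut"
    (210.0, [("edge", "origin")]),                       # slower, roomier
]
capacity = {("edge", "relay"): 10.0, ("relay", "origin"): 10.0,
            ("edge", "origin"): 100.0}
chosen = assign_flows(paths, capacity, demands=[4.0, 4.0, 4.0])
print(chosen)
```

The first two demands fit on the shortcut; the third would overload it and is placed on the slower path instead of degrading the fast one.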
Another problem to consider at the routing level is that classic routing algorithms are stateless.
This means two successive routing runs have no relationship to each other, so results can differ widely from run to run and traffic swings violently across the network, which poses major challenges and risks to the stability and processing capacity of the system.
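One simple way to add state is hysteresis: keep the current route unless a rival is better by a clear margin. The Python sketch below assumes a 10% switching margin and made-up latency samples; `StickyRouteSelector` is a hypothetical name, and real systems would likely combine this with other damping techniques.

```python
from typing import Dict, Optional


class StickyRouteSelector:
    """Keeps the current route unless a rival is clearly better."""

    def __init__(self, switch_margin: float = 0.10) -> None:
        self.switch_margin = switch_margin
        self.current: Optional[str] = None

    def select(self, latency_ms: Dict[str, float]) -> str:
        best = min(latency_ms, key=latency_ms.get)
        if self.current not in latency_ms:
            self.current = best   # first run, or the old route disappeared
        elif latency_ms[best] < latency_ms[self.current] * (1 - self.switch_margin):
            self.current = best   # gain is large enough to justify a switch
        return self.current


# Assumed measurements from four consecutive routing cycles.
samples = [
    {"A": 100.0, "B": 102.0},
    {"A": 101.0, "B": 99.0},
    {"A": 100.0, "B": 103.0},
    {"A": 130.0, "B": 100.0},
]
stateless = [min(s, key=s.get) for s in samples]
selector = StickyRouteSelector()
sticky = [selector.select(s) for s in samples]
print(stateless)
print(sticky)
```

The stateless choice flips between A and B on measurement noise in every cycle, while the sticky selector stays on A until B is better by a meaningful amount.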
The last issue at the "route selection" level is to distinguish clearly between what should be done at the node level and what should be done at the route selection level.
In SDN terms, the node level is the data plane and the routing level is the control plane. In other words: what should the control plane control, and what can it control?
In common industry solutions, routing is essentially centralized, so naturally the interaction between nodes and the center should not be too frequent.
The routing level has to collect and aggregate data and then make decisions and issue policies, all of which inevitably introduces delay.
For example, if one cycle of task processing and delivery takes 10 minutes, the system must have enough buffer. That buffer mainly takes two forms: leaving a certain margin, and making a certain amount of prediction.
In short, every result the routing system computes carries an implicit SLA (Service Level Agreement) toward the node data plane.
For example, a routing system might guarantee that, for the next 10 minutes and as long as traffic does not exceed the xx threshold, its current result keeps latency within xx milliseconds with 99.9% probability. For second-level link flaps, outages, or quality degradation, the node data plane then needs its own disaster recovery and fallback strategy, because the central routing system can hardly provide effective support on the time scale of its own interaction cycle.
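The division of labor can be sketched as follows: the central routing system delivers an ordered list of routes once per cycle, and the node fails over locally, within seconds, when the active route keeps failing. The `LocalFailover` class, the three-failure threshold, and the route names are assumptions for illustration, not a description of a specific data plane.

```python
from typing import List


class LocalFailover:
    """Data-plane guard: follows the centrally computed route order but
    fails over on its own after consecutive request failures."""

    def __init__(self, routes: List[str], max_failures: int = 3) -> None:
        if not routes:
            raise ValueError("need at least one route")
        self.routes = list(routes)   # ordered by the central routing system
        self.max_failures = max_failures
        self.active = 0
        self.failures = 0

    def report(self, ok: bool) -> str:
        """Record one request outcome; return the route to use next."""
        if ok:
            self.failures = 0
        else:
            self.failures += 1
            has_backup = self.active + 1 < len(self.routes)
            if self.failures >= self.max_failures and has_backup:
                self.active += 1     # local decision, no round trip to center
                self.failures = 0
        return self.routes[self.active]

    def update(self, routes: List[str]) -> None:
        """Apply a fresh result from the central routing system."""
        if not routes:
            raise ValueError("need at least one route")
        self.routes, self.active, self.failures = list(routes), 0, 0


guard = LocalFailover(["relay-path", "direct-public"])
outcomes = [True, False, False, False, True]
used = [guard.report(ok) for ok in outcomes]
print(used)
```

After three consecutive failures the node moves to the backup route by itself; the next central cycle can then confirm or replace that choice through `update`.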
Looking at future evolution purely from the routing side: the traditional approach relies on scenario-specific probing and manually specified policies, and then builds the graph and computes paths from the results. Probing is essentially out-of-band sampling which, statistically speaking, aims to construct a sample that reflects the overall or actual business traffic as closely as possible. In system optimization and iteration, this approach always leaves some gap in how well it fits the business.
In actual business development, however, a site-wide service that mixes dynamic and static traffic requires a technical architecture built with broader considerations and a more comprehensive perspective, in both "transmission" and "routing".
Historically, the technology behind dynamic acceleration services has evolved from the problems the static CDN architecture runs into in specific scenarios, iterating continuously into a differentiated architecture and technology stack.