High-performance weapon: CDN, and why I suggest you learn it!

CDN overview

CDN stands for Content Delivery Network. The basic idea is to route around the bottlenecks and links on the Internet that can hurt the speed and stability of data transmission, so that content is delivered faster and more reliably.

A CDN works by caching the origin site's resources on CDN nodes. When a request hits a node's cache, the node returns the resource to the client immediately, so that not every request has to go back to the origin site. This avoids network congestion, relieves pressure on the origin site, and keeps resource access fast for users.

To give a real-life example: when we buy goods on a certain e-commerce platform, the order can be delivered the same day. The basic principle is that the platform builds local warehouses all over the country. When a user buys an item, smart warehouse allocation ships it from the nearest warehouse, which shortens the delivery time.

For the distribution of inventory, refer to the figure below: factory (origin site) -> regional warehouse (second-level cache) -> local warehouse (first-level cache).

A content delivery network is like the smart warehouse network described above. It addresses the access latency caused by geographic distribution, bandwidth, and server performance, and it suits scenarios such as site acceleration, video on demand, and live streaming. It lets users fetch the content they need from a nearby node, eases Internet congestion, and improves the response speed and success rate of website visits.

`The birth of CDN`

The CDN was born more than twenty years ago to relieve the excessive pressure on content origin servers and on the backbone network. In 1995, MIT professor Tom Leighton led graduate student Danny Lewin and several other top researchers in an attempt to solve network congestion with mathematics.

They used mathematical algorithms to handle the dynamic routing of content, and eventually solved a problem that had plagued Internet users. Later, Jonathan Seelig, an MBA student at the Sloan School of Management, joined Leighton's team. They then began to carry out their business plan and formally founded the company on August 20, 1998, naming it Akamai. Through intelligent content distribution over the Internet, Akamai put an end to the awkward situation known as the "World Wide Wait".

In 1998, ChinaCache, China's first CDN company, was established.

`How CDN works`

`Access CDN`

Before a site is connected to a CDN, visiting its domain name resolves directly to the IP address of the real server. The whole process is as follows (the picture is a bit simple)

When we need to accelerate a website, we register the domain name to be accelerated and its origin site with the CDN provider, then go into the domain's own DNS configuration and change the A record to a CNAME record. The Alibaba Cloud acceleration setup is shown below for reference:

`CDN access process`

1. When the user requests an image, the query first goes to the local DNS (LDNS). If the LDNS has a cached answer (a hit), it is returned directly to the user.
2. On an LDNS miss, the query is forwarded to the authoritative DNS.
3. The authoritative DNS returns the CNAME for the domain, picwebws.pstatp.com.wsglb0.com., and its corresponding IP address (actually the IP address of the DNS scheduling system).
4. The resolution request is sent to the DNS scheduling system, which assigns the best node IP address for the request.
5. The resolved IP address is returned to the user.
6. The user accesses the cache server, which responds to the request and delivers the requested content to the user's device.
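
The steps above can be sketched as a toy model. This is only an illustration under assumptions: the domain names, regions, and IP addresses are hypothetical, the scheduling system is reduced to a region lookup, and TTL expiry of the LDNS cache is ignored.

```python
# Toy model of the CDN resolution flow; every name and IP here is hypothetical.
AUTHORITATIVE = {
    # The accelerated domain carries a CNAME that points at the CDN's scheduling DNS.
    "img.example.com": ("CNAME", "img.example.com.cdn-glb.example.net"),
}
NODES = {"north": "203.0.113.10", "south": "203.0.113.20"}


def dispatch(cname, client_region):
    """Stand-in for the DNS scheduling system: choose a node IP by client region."""
    return NODES.get(client_region, NODES["north"])


class LocalDNS:
    def __init__(self):
        self.cache = {}

    def resolve(self, domain, client_region):
        if domain in self.cache:                 # step 1: LDNS hit
            return self.cache[domain]
        rtype, value = AUTHORITATIVE[domain]     # steps 2-3: miss, ask the authoritative DNS
        if rtype == "CNAME":
            ip = dispatch(value, client_region)  # steps 4-5: scheduling DNS picks a node
        else:
            ip = value                           # plain A record: no CDN involved
        self.cache[domain] = ip                  # later lookups are answered from the cache
        return ip


ldns = LocalDNS()
edge_ip = ldns.resolve("img.example.com", "south")  # step 6 would then connect to edge_ip
```

Note that once the answer is cached, the LDNS keeps returning the same node regardless of who asks, which is the root of the DNS scheduling problems discussed later.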

Figure: HUAWEI CLOUD site-wide acceleration diagram

`What problem does CDN solve`

`Too much pressure on the backbone network`

In 1995, Tom Leighton led a team that tried to solve network congestion with mathematics, in order to relieve the excessive pressure on the backbone network. As more and more people went online, the traffic throughput of the backbone's core nodes could not keep up with the growth in Internet users. With a CDN, user traffic can be kept off the backbone network.

The backbone network can be thought of as a local area network on a global scale. Tier 1 Internet Service Providers (ISPs) connect their high-speed optical fiber networks together to form the backbone of the Internet, enabling efficient transmission of traffic between different geographic areas.

1. Local area network

A Local Area Network (LAN) is a group of computers interconnected within a limited area. For example, at university, when the external network was cut off after midnight, we could still play CS or Warcraft through the router. That works because the machines are interconnected over the local area network, which allows them to share data and exchange information.

2. Backbone network

Here the overall network architecture of China Telecom serves as a reference. The backbone network can be understood as a nationwide local area network. Traffic flows through the core nodes so that the entire network is interconnected, which is why we call it the Internet.

Beijing, Shanghai, and Guangzhou are the super cores of ChinaNet. In addition to super cores, ChinaNet also has ordinary cores such as Tianjin, Xi'an, Nanjing, Hangzhou, Wuhan, and Chengdu.

`The middle mile of the "three kilometers"`

A network request usually covers a distance of "three kilometers":

The first kilometer: from the origin site to the origin site's ISP access point
The second kilometer: from the origin site's ISP access point to the visiting user's ISP access point
The third kilometer (the last kilometer): from the user's ISP access point to the user's client

The CDN network layer is mainly used to accelerate the second kilometer (the middle mile).

In CDN infrastructure, two tiers of servers are usually used for acceleration:

L1 (lower layer): the closer to the user, the better. It usually caches static data that can be cached, and is called the last mile (lastmile).
L2 (upper layer): the closer to the origin site, the better. It is called the first mile (firstmile). When L1 misses the cache, or the content is not cacheable, L1 passes the request through to L2. If L2 also misses, or the content cannot be cached, the request is passed on to L2's upstream (which may be the origin site, or an L3). L2 also aggregates traffic and requests, which reduces the volume of back-to-origin requests (for cacheable content) and the pressure on the origin site.
The part between L1 and L2 is the CDN's "internal network", called the middle mile (middlemile).

`The composition of CDN`

`Global Load Balance (GLB)`

When a user visits a website that uses a CDN service, the domain name resolution request is ultimately handled by the "intelligent scheduling DNS".
It applies a set of predefined strategies to give the user the address of the node closest to them, so that the user gets fast service.
It also has to stay in communication with the CDN nodes distributed in various places, tracking each node's health, capacity, and other information, to make sure user requests are sent to nearby nodes that are actually available.
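
As a rough sketch of that selection logic, the function below filters nodes by health and load and prefers the client's own region. The `Node` fields, the `max_load` threshold, and the "least loaded wins" rule are assumptions made for illustration; real schedulers use richer signals such as latency measurements and operator (carrier) information.

```python
from dataclasses import dataclass


@dataclass
class Node:
    ip: str
    region: str
    healthy: bool
    load: float  # fraction of capacity in use, 0.0 to 1.0


def pick_node(nodes, client_region, max_load=0.9):
    """Return the least-loaded usable node, preferring the client's region."""
    usable = [n for n in nodes if n.healthy and n.load < max_load]
    if not usable:
        return None  # nothing available; a real system would fall back to the origin
    local = [n for n in usable if n.region == client_region]
    candidates = local or usable  # fall back to other regions if no local node is usable
    return min(candidates, key=lambda n: n.load)


nodes = [
    Node("203.0.113.10", "north", True, 0.5),
    Node("203.0.113.11", "north", True, 0.2),
    Node("203.0.113.20", "south", False, 0.1),  # failed health check
    Node("203.0.113.21", "south", True, 0.95),  # saturated
]
```

With this data, a client in "south" is sent to a "north" node, because both southern nodes are filtered out. This is the kind of decision that depends on the scheduler tracking node health and capacity.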

`Cache server`

The main function of the cache server is to cache hot data. The data types include static resources (html, js, css, etc.), multimedia resources (img, mp3, mp4, etc.), and dynamic data (edge rendering).

Well-known open source software used in CDNs includes:

Squid
Varnish
Nginx
OpenResty
ATS
HAProxy

For specific comparison, please refer to: https://blog.csdn.net/joeyon1985/article/details/46573281

CDN's layered architecture

`Origin site`

The origin site is the original site where the content is published. Files on the website are added, deleted, and changed on the origin site, and every object fetched by the cache servers comes from the origin site.

`CDN scheduling strategy`

`DNS scheduling`

Scheduling is based on the location and operator of the egress IP of the requester's local DNS.

Problems with DNS scheduling:

DNS caches are not refreshed before the TTL expires. When a node becomes abnormal, automatic rescheduling is therefore significantly delayed, which directly affects online business access.
Many local DNS servers do not support the EDNS protocol, so the CDN cannot obtain the client's real IP. Most of the time the CDN can only decide based on the local DNS IP, and cross-region scheduling often occurs.

`HTTP DNS scheduling`

The client sends a request to a fixed HTTP DNS address and obtains the resolution result from the response. This improves the accuracy of resolution (unlike DNS scheduling, which can only decide based on the local DNS IP) and largely avoids problems such as DNS hijacking.

Of course, this model has its own problems. For example, every time the client loads a URL it may trigger an HTTP DNS query, which places high demands on performance and network access.

`302 dispatch`

Real-time traffic scheduling based on the client IP and a 302 scheduling cluster.

Let's look at an example:

When the URL is accessed, the request is first sent to the scheduling cluster. The client information available at this point is the client's egress IP (in most cases, the same). The algorithm can be the same as in DNS-based scheduling, except that the decision is based on the client's egress IP instead of the local DNS egress IP.
The browser receives the 302 response, follows the URL in the Location header, and issues a new HTTP request. This time the target IP of the request is a CDN edge node, and that node responds with the actual file content.
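
The scheduling cluster's side of this exchange can be sketched as a function that maps the client's egress IP to an edge host and builds the redirect. The prefix table and host names are hypothetical, and matching on a string prefix is a simplification of real IP geolocation.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical mapping from client IP prefix to edge node host name.
EDGE_BY_PREFIX = {
    "198.51.100.": "edge-a.cdn.example.net",
    "203.0.113.": "edge-b.cdn.example.net",
}
DEFAULT_EDGE = "edge-a.cdn.example.net"


def schedule_302(url, client_ip):
    """Return (status, headers) redirecting the client to an edge node for the same path."""
    edge = next(
        (host for prefix, host in EDGE_BY_PREFIX.items() if client_ip.startswith(prefix)),
        DEFAULT_EDGE,
    )
    parts = urlsplit(url)
    # Keep scheme, path, and query; swap only the host for the chosen edge node.
    location = urlunsplit((parts.scheme, edge, parts.path, parts.query, parts.fragment))
    return 302, {"Location": location}


status, headers = schedule_302("http://video.example.com/movie.mp4?x=1", "203.0.113.7")
```

Because the decision is made per request, the mapping can change at any moment without waiting for a DNS TTL to expire.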

Advantages of 302 scheduling:

Real-time scheduling: because no local DNS cache is involved, it suits peak shaving on the CDN, which matters a great deal for cost control.
High accuracy: scheduling uses the client's egress IP directly.

Disadvantages of 302 scheduling:

Every request needs an extra redirect, which is unfriendly to latency-sensitive services. It is generally only suitable for large files.

`AnyCast BGP routing scheduling`

Based on a BGP anycast routing strategy, only a small number of external IPs are exposed, and the routing strategy can be adjusted quickly.

Currently AWS CloudFront and Cloudflare both use this method to perform scheduling at the routing level.

This method resists DDoS attacks well and reduces network congestion.

Of course, its cost and design are more complex, so CDNs in China still use unicast.

`Some concepts`

`How CDN works`

Data is cached locally in key-value form, with the URL mapped to the locally cached content. The storage structure is similar to a Map, and the cache is implemented with a hash table plus a linked list.
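
A minimal sketch of such a structure, using Python's `OrderedDict` (a hash table that also keeps entries in a linked order). The least-recently-used eviction policy and the `UrlCache` name are assumptions for illustration; the article does not say which eviction policy a CDN node actually uses.

```python
from collections import OrderedDict


class UrlCache:
    """URL -> content cache with a fixed capacity and LRU eviction (assumed policy)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()  # hash lookup plus ordering, like hash + linked list

    def get(self, url):
        if url not in self.items:
            return None              # cache miss
        self.items.move_to_end(url)  # mark as most recently used
        return self.items[url]

    def put(self, url, body):
        self.items[url] = body
        self.items.move_to_end(url)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used entry


cache = UrlCache(capacity=2)
cache.put("/a.png", b"A")
cache.put("/b.png", b"B")
cache.get("/a.png")        # touch /a.png so /b.png becomes the oldest entry
cache.put("/c.png", b"C")  # exceeds capacity, so /b.png is evicted
```

The hash part gives constant-time lookup by URL, while the ordering makes it cheap to find the entry to evict.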

`CDN hit rate`

The hit rate is a core metric for the quality of a CDN service. When the resource a user requests is already in the cache system, it can be returned to the user directly, which counts as a CDN hit. If the CDN cache does not contain the resource, a back-to-origin request is triggered.
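
As a small illustration, the hit rate over a period can be computed from the hit and miss counts. The `hit_rate` helper is hypothetical, and this is the request-based ratio; providers may also report a byte-based ratio.

```python
def hit_rate(hits, misses):
    """Fraction of requests served from cache; 0.0 when there were no requests."""
    total = hits + misses
    return hits / total if total else 0.0


rate = hit_rate(hits=90, misses=10)  # 90 of 100 requests served from cache
```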

`CDN back to source`

When the CDN's local cache misses, a back-to-origin request is triggered:

1. The first-level cache asks the second-level cache whether it has the data. If so, the data is returned to the first-level cache.
2. On a second-level cache miss, the second-level cache sends a back-to-origin request to the origin site. Once it gets the result, it stores it in its local cache and returns the data to the first-level cache.
3. The first-level cache receives the data, caches it locally, and returns it to the user.
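
The back-to-origin chain can be sketched as cache tiers that each delegate to an upstream on a miss. The `CacheTier` class and the `origin` function are hypothetical stand-ins; expiry, cacheability rules, and error handling are left out.

```python
class CacheTier:
    """One cache level that falls back to an upstream callable on a miss."""

    def __init__(self, name, upstream):
        self.name = name
        self.upstream = upstream  # callable(url) -> bytes: next tier or the origin
        self.store = {}
        self.misses = 0

    def fetch(self, url):
        if url in self.store:
            return self.store[url]  # hit: answer locally
        self.misses += 1
        body = self.upstream(url)   # miss: go one level up
        self.store[url] = body      # cache the result on the way back down
        return body


def origin(url):
    """Stand-in for the origin site."""
    return ("content of " + url).encode()


l2 = CacheTier("L2", origin)
l1 = CacheTier("L1", l2.fetch)

first = l1.fetch("/a.png")   # misses L1 and L2, goes to the origin
second = l1.fetch("/a.png")  # served from L1, no upstream traffic
```

A second L1 node that shares the same L2 would miss locally but hit in L2, which is how the upper tier reduces the number of requests that reach the origin site.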

`CDN warm-up data`

The access patterns described above are all based on Pull mode: user requests decide which hot data ends up in the CDN cache. For big promotion events, we often need to warm up the edge nodes in advance, so that the flood of visitors arriving when the promotion opens does not put excessive pressure on the origin site. In that case, Push mode is used.
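
A minimal sketch of Push-mode warm-up, assuming each edge cache can be modeled as a dictionary and that `fetch_from_origin` is some hypothetical function that retrieves a URL from the origin site. Real CDNs expose this as a prefetch or preload operation in their console or API, whose details vary by provider.

```python
def warm_up(urls, edge_caches, fetch_from_origin):
    """Fetch each URL from the origin once and store it on every edge cache."""
    for url in urls:
        body = fetch_from_origin(url)  # one origin request per URL
        for cache in edge_caches:
            cache[url] = body          # push the content to each edge node


origin_requests = []


def fake_origin(url):
    """Hypothetical origin that records which URLs were requested."""
    origin_requests.append(url)
    return b"promo page"


edges = [{}, {}, {}]
warm_up(["/sale.html"], edges, fake_origin)
```

After the warm-up, the first user request at any edge node is already a cache hit, so the origin site sees one request per URL rather than one per node or per user.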

`Summary of the characteristics of CDN`

1. Resource access acceleration: local caches speed up access to corporate sites (especially sites with many images and static pages) and greatly improve their stability.

2. Eliminating inter-operator interconnection bottlenecks: mirroring removes the impact of the bottleneck at the interconnection between different operators, accelerates traffic across operators, and ensures that users on different networks all get good access quality.

3. Remote acceleration: DNS load balancing automatically selects the fastest cache server for remote users, which speeds up remote access.

4. Bandwidth optimization: remote mirror cache servers are generated automatically, and remote users read data from them. This reduces the bandwidth used by remote access, spreads network traffic, and lowers the load on the origin site's web server.

5. Cluster attack resistance: widely distributed CDN nodes and the intelligent redundancy between them help resist intrusion and reduce the impact of DDoS attacks on the website, while maintaining good service quality.

`Pay attention, don't get lost`

Alright everyone, that is the entire content of this article. Every week I publish a few high-quality articles on interviews and common technology stacks. Thank you for reading this far. If you think this article is well written, please give it a like, a favorite, and a share! Thank you for your support and recognition. See you in the next article!


九灵
