CDN overview
CDN is short for Content Delivery Network, also called a content distribution network. The basic idea is to route around the bottlenecks and links on the Internet that can hurt the speed and stability of data transmission, so that content is delivered faster and more reliably.
A CDN works by caching the origin site's resources on each CDN node. When a request hits a node's resource cache, the node returns the resource to the client immediately, so that not every request has to go to the origin site. This avoids network congestion, relieves pressure on the origin site, and keeps resource access fast for users.
To give an everyday example: when we buy goods from a certain e-commerce mall, the order can be delivered the same day. The basic principle is to build local warehouses all over the country. When a user purchases an item, the smart warehouse distribution model selects the warehouse nearest to the consumer for delivery, which shortens the logistics delivery time.
For the distribution of commodity inventory, please refer to the figure below: factory (origin site) -> regional warehouse (second-level cache) -> local warehouse (first-level cache).
A content distribution network is like the smart warehouse distribution network described above. It addresses the access latency caused by geographic distribution, bandwidth, and server performance, and suits scenarios such as site acceleration, video on demand, and live streaming. It lets users fetch the content they need from a nearby node, eases Internet congestion, and improves the response speed and success rate of website visits.
The birth of CDN
CDN was born some twenty years ago to address the excessive pressure on content origin servers and the backbone transmission network. In 1995, MIT professor Tom Leighton led graduate student Danny Lewin and several other top researchers in an attempt to solve network congestion with mathematics.
They used mathematical algorithms to handle the dynamic routing of content, and eventually solved a problem that had plagued Internet users. Later, Jonathan Seelig, an MBA student at the Sloan School of Management, joined Leighton's team. They then began to carry out their business plan and formally founded the company, named Akamai, on August 20, 1998. Through intelligent Internet distribution, Akamai put an end to the awkward "World Wide Wait".
In 1998, China's first CDN company, ChinaCache, was established.
How CDN works
Access CDN
Before a site is connected to a CDN, visiting its domain name returns the IP address of the real (origin) server directly. The whole process is as follows (the picture is a bit simple):
To accelerate a website, we register the domain name to be accelerated and its origin domain with the CDN operator, then change the A record in our own domain's DNS configuration to a CNAME record. Alibaba Cloud's acceleration setup is shown below for reference:
CDN access process
- 1. When the user accesses the image content, the domain name is first resolved by the local DNS (LDNS). If the LDNS has a cached answer, it is returned directly to the user.
- 2. On an LDNS miss, the query is forwarded to the authoritative DNS.
- 3. The authoritative DNS returns the IP address corresponding to the domain's CNAME, picwebws.pstatp.com.wsglb0.com. (this is actually the IP address of the DNS scheduling system).
- 4. The domain name resolution request is sent to the DNS scheduling system, which allocates the best node IP address for the request.
- 5. The resolved IP address is returned to the user.
- 6. The user accesses the cache server at that address, and the cache server responds to the request and transmits the required content to the user terminal.
Figure: HUAWEI CLOUD site-wide acceleration diagram
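The resolution steps above can be sketched roughly as follows. This is a toy, in-memory model rather than a real resolver: the domain names, the region-to-IP table, and the `LocalDNS` and `schedule` names are all assumptions made for illustration, and the IP addresses come from documentation ranges.

```python
from typing import Dict

# Hypothetical records, for illustration only.
AUTHORITATIVE_CNAME = {"img.example.com": "img.example.com.cdn-scheduler.example."}
# Made-up scheduling table: resolver region -> "best" edge node IP.
EDGE_BY_REGION = {"beijing": "203.0.113.10", "guangzhou": "203.0.113.20"}
DEFAULT_EDGE = "203.0.113.1"


def schedule(cname: str, resolver_region: str) -> str:
    """Stand-in for the CDN's DNS scheduling system."""
    return EDGE_BY_REGION.get(resolver_region, DEFAULT_EDGE)


class LocalDNS:
    """A toy local DNS (LDNS) that caches answers and otherwise asks upstream."""

    def __init__(self, region: str) -> None:
        self.region = region
        self.cache: Dict[str, str] = {}

    def resolve(self, domain: str) -> str:
        # Step 1: answer from the LDNS cache when possible.
        if domain in self.cache:
            return self.cache[domain]
        # Steps 2-3: on a miss, the authoritative DNS answers with a CNAME.
        cname = AUTHORITATIVE_CNAME[domain]
        # Step 4: the CNAME is handled by the DNS scheduling system.
        ip = schedule(cname, self.region)
        # Step 5: cache the resolved edge IP and return it.
        self.cache[domain] = ip
        return ip
```

Step 6, the actual HTTP request to the cache server, is left out; the sketch only covers how the user ends up with an edge node's IP instead of the origin's.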
What problem does CDN solve
Too much pressure on the backbone network
In 1995, Tom Leighton led a team that tried to solve network congestion with mathematics, in order to relieve the excessive pressure on the backbone network. As more and more teenagers went online, the traffic throughput of the backbone's core nodes was no longer enough to support the growth in Internet users. With a CDN, much of the user traffic no longer needs to flow through the backbone network.
The backbone network can be thought of as a global-scale "local area network". Tier 1 Internet Service Providers (ISPs) connect their high-speed optical fiber networks together to form the backbone of the Internet, enabling efficient transmission of traffic between different geographic areas.
1. Local area network
A Local Area Network (LAN) is a group of computers interconnected within a certain area. For example, in college, when Internet access is cut off after 12 o'clock at night, we can still play CS or Warcraft through the router. That works over the LAN interconnection, which allows data sharing and communication between machines.
2. Backbone network
Here we take the overall network architecture of China Telecom as a reference. The backbone network can be understood as a nationwide local area network. Traffic flows through the core nodes so that the entire network is interconnected, which is why we call it the Internet.
Beijing, Shanghai, and Guangzhou are the super cores of ChinaNet. In addition to super cores, ChinaNet also has ordinary cores such as Tianjin, Xi'an, Nanjing, Hangzhou, Wuhan, and Chengdu.
Middlemile of three kilometers
Network access is usually described as covering a distance of "three kilometers":
- The first kilometer is: origin site to the ISP access point of the origin site
- The second kilometer is: the ISP access point of the origin site to the ISP access point of the visiting user
- The third kilometer (the last kilometer) is: user ISP access point to user client
The CDN network layer is mainly used to accelerate the second kilometer (middlemile).
In the CDN infrastructure, two-level servers are usually used for acceleration:
- L1 (lower layer): the closer to the user, the better. It usually caches static, cacheable data and is called lastmile (last mile).
- L2 (upper layer): the closer to the origin site, the better; it is called firstmile (first mile). When L1 misses the cache, or the content is not cacheable, the request is passed through L1 to L2. If L2 also misses, or the content cannot be cached, the request is passed on to L2's upstream (which may be the origin site, or an L3). L2 also aggregates traffic and requests, which reduces the volume of back-to-source requests (for cacheable content) and relieves pressure on the origin site.
- The part between L1 and L2 is the "internal network" of the CDN, which is called middlemile (middle mile).
The composition of CDN
Global Load Balance (GLB)
- When a user visits a website that joins the CDN service, the domain name resolution request will ultimately be handled by the "intelligent scheduling DNS".
- It uses a set of pre-defined strategies to give the user the address of the node closest to them, so that the user gets fast service.
- At the same time, it needs to maintain communication with CDN nodes distributed in various places, track the health status, capacity and other information of each node, and ensure that user requests are distributed to nearby available nodes.
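As a rough illustration of the selection logic described in this list, the sketch below picks a node from health, load, and region information. The `Node` fields, the `pick_node` function, and the `max_load` threshold are assumptions for the example; a real GLB uses far richer signals and strategies.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    ip: str
    region: str
    healthy: bool
    load: float  # fraction of capacity in use, 0.0 to 1.0


def pick_node(nodes: List[Node], user_region: str,
              max_load: float = 0.9) -> Optional[Node]:
    """Prefer a healthy, non-overloaded node in the user's region.

    Falls back to any healthy, non-overloaded node, and returns None
    when no node is usable at all.
    """
    usable = [n for n in nodes if n.healthy and n.load < max_load]
    if not usable:
        return None
    local = [n for n in usable if n.region == user_region]
    candidates = local or usable
    # Among the candidates, take the least loaded one.
    return min(candidates, key=lambda n: n.load)
```

For example, with one healthy and one unhealthy node in "beijing" and a healthy node in "guangzhou", a "beijing" user gets the healthy local node, while a "shanghai" user falls back to whichever usable node is least loaded.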
Cache server
The main function of the cache server is to cache hot data. The data types include static resources (html, js, css, etc.), multimedia resources (img, mp3, mp4, etc.), and dynamic data (edge rendering).
Well-known open source software related to CDN includes:
- Squid
- Varnish
- Nginx
- OpenResty
- ATS
- HAProxy
For specific comparison, please refer to: https://blog.csdn.net/joeyon1985/article/details/46573281
CDN's layered architecture
Source station
The origin site (source station) is the original site where the content is published. Adding, deleting, and changing files on the website are all done on the origin site; in addition, all objects fetched by the cache servers come from the origin site.
CDN scheduling strategy
DNS scheduling
Scheduling is based on the location and operator (ISP) of the egress IP of the requester's local DNS.
Problems with DNS scheduling:
- A cached DNS record is not refreshed until its TTL expires. When a node becomes abnormal, automatic rescheduling is therefore delayed considerably, which directly affects online business access.
- Many local DNS servers do not support the EDNS client subnet extension, so the CDN cannot obtain the client's real IP. Most of the time, the CDN can only decide based on the local DNS IP, and cross-regional scheduling often occurs.
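The TTL problem can be made concrete with a small simulation. It is only a sketch: `TTLCache`, the 300-second TTL, and the addresses are assumptions for illustration, and time is passed in explicitly so the behaviour is easy to follow.

```python
from typing import Callable, Dict, Tuple


class TTLCache:
    """A toy resolver cache: an answer is reused until its TTL expires."""

    def __init__(self, lookup: Callable[[str], str], ttl: float) -> None:
        self.lookup = lookup  # hypothetical upstream scheduler lookup
        self.ttl = ttl        # seconds
        self.entries: Dict[str, Tuple[str, float]] = {}

    def resolve(self, domain: str, now: float) -> str:
        cached = self.entries.get(domain)
        if cached and now < cached[1]:
            return cached[0]  # still within TTL: upstream is not asked
        ip = self.lookup(domain)
        self.entries[domain] = (ip, now + self.ttl)
        return ip


# The scheduler's current answer; changing it simulates a node failover.
current = {"ip": "203.0.113.10"}
cache = TTLCache(lambda domain: current["ip"], ttl=300)

first = cache.resolve("img.example.com", now=0)
current["ip"] = "203.0.113.20"  # the scheduler moves traffic to another node
stale = cache.resolve("img.example.com", now=100)  # cache still holds the old node
fresh = cache.resolve("img.example.com", now=300)  # TTL expired: new node
```

Between the failover and the TTL expiry, clients behind this cache keep being sent to the old node, which is the scheduling delay described above.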
HTTP DNS scheduling
The client requests a fixed HTTP DNS address and obtains the resolution result from the response. This improves the accuracy of resolution (unlike DNS scheduling, which can only decide based on the local DNS IP) and largely avoids problems such as DNS hijacking.
Of course, this model also has some problems. For example, every time the client loads a URL, an HTTP DNS query may be generated, which places high demands on performance and network access.
302 scheduling
Real-time traffic scheduling based on client IP and 302 scheduling cluster.
Let's look at an example:
- After the client accesses the URL, the request is sent to the scheduling cluster. The client information available there is the client's egress IP (in most cases, the same). The algorithm can be the same as in DNS-based scheduling, except that the decision is based on the client's egress IP instead of the local DNS egress IP.
- The browser receives the 302 response, follows the URL in the Location header, and issues a new HTTP request. This time the target IP is a CDN edge node, which responds with the actual file content.
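A minimal sketch of what the scheduling cluster might compute for each request is shown below. The network-to-edge table, the host names, and the `schedule_302` function are hypothetical; only the idea of answering with status 302 and a Location header comes from the example above.

```python
import ipaddress
from typing import Dict, Tuple

# Hypothetical mapping from client networks to edge node hosts.
EDGE_BY_NETWORK = {
    ipaddress.ip_network("198.51.100.0/24"): "edge-a.cdn.example",
    ipaddress.ip_network("192.0.2.0/24"): "edge-b.cdn.example",
}
DEFAULT_EDGE = "edge-default.cdn.example"


def schedule_302(client_ip: str, path: str) -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) redirecting the client to an edge node."""
    addr = ipaddress.ip_address(client_ip)
    host = DEFAULT_EDGE
    for network, edge in EDGE_BY_NETWORK.items():
        if addr in network:  # decide by the client's egress IP
            host = edge
            break
    return 302, {"Location": f"http://{host}{path}"}
```

Because the decision is made per request, changing the table takes effect on the very next request, with no DNS TTL to wait for.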
Advantages of 302 scheduling:
- Real-time scheduling: because no local DNS cache is involved, it is well suited to CDN peak shaving, which matters a great deal for cost control.
- High accuracy: scheduling uses the client's egress IP directly.
Disadvantages of 302 scheduling:
- Every request incurs a redirect, which is unfriendly to latency-sensitive services. It is generally only suitable for large files.
AnyCast BGP routing scheduling
Based on the BGP AnyCast routing strategy, only a few external IPs are provided, and the routing strategy can be adjusted quickly.
Currently, AWS CloudFront and Cloudflare both use this method to perform scheduling at the routing level.
This method resists DDoS attacks well and reduces network congestion.
Of course, this method is more costly and more complicated to design, so domestic CDNs are still using unicast.
Some concepts
How CDN works
Data is cached locally in key-value form, mapping each URL to its locally cached content. The storage structure is similar to a Map, implemented as a hash table plus a linked list.
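One common way to combine a hash table with a linked list is an LRU cache. The article does not say which eviction policy is used, so LRU is an assumption here, and `UrlCache` is a hypothetical name. The sketch relies on Python's `OrderedDict`, which keeps hash-based lookup together with an ordering of keys.

```python
from collections import OrderedDict
from typing import Optional


class UrlCache:
    """URL -> cached body, evicting the least recently used entry when full."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.items: "OrderedDict[str, bytes]" = OrderedDict()

    def get(self, url: str) -> Optional[bytes]:
        if url not in self.items:
            return None              # cache miss
        self.items.move_to_end(url)  # mark as most recently used
        return self.items[url]

    def put(self, url: str, body: bytes) -> None:
        self.items[url] = body
        self.items.move_to_end(url)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used URL
```

Lookup by URL and reordering on access both stay constant time, which is the usual reason for pairing the two structures.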
CDN hit rate
The hit rate is a core measure of CDN service quality. When the resource a user requests is already in the cache system, it is returned to the user directly, which counts as a CDN hit; if the CDN cache does not hold the resource, the back-to-source action is triggered.
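Measured by request count, the hit rate is simply hits divided by total requests. The helper below is a small sketch; the `hit_rate` name and the "HIT"/"MISS" status strings are assumptions about how a log might record each request.

```python
from typing import Iterable


def hit_rate(statuses: Iterable[str]) -> float:
    """Fraction of requests served from cache; statuses are 'HIT' or 'MISS'."""
    hits = total = 0
    for status in statuses:
        total += 1
        if status == "HIT":
            hits += 1
    # Avoid dividing by zero when there were no requests at all.
    return hits / total if total else 0.0
```

For instance, three hits out of four requests give a hit rate of 0.75, meaning one request in four went back to the source.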
CDN back to source
When the CDN's local cache misses, the back-to-source action is triggered. The first-level cache asks the second-level cache whether it has the data; if so, the data is returned to the first-level cache.
On a second-level cache miss, the second-level cache sends a back-to-source request to the origin site. After the result comes back, it caches the data locally and returns it to the first-level cache.
The first-level cache receives the data, caches it locally, and returns it to the user.
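The back-to-source chain can be sketched with two chained cache tiers. This is an in-memory illustration only; `CacheTier`, `origin`, and the `origin_requests` list used to count origin fetches are names invented for the example.

```python
from typing import Callable, Dict, List


class CacheTier:
    """One cache level; on a miss it asks its upstream (another tier or the origin)."""

    def __init__(self, name: str, upstream: Callable[[str], bytes]) -> None:
        self.name = name
        self.upstream = upstream
        self.store: Dict[str, bytes] = {}

    def fetch(self, url: str) -> bytes:
        if url in self.store:
            return self.store[url]  # cache hit
        body = self.upstream(url)   # miss: go one level up
        self.store[url] = body      # cache locally before returning
        return body


origin_requests: List[str] = []


def origin(url: str) -> bytes:
    """Stand-in for the origin site; records every request it receives."""
    origin_requests.append(url)
    return f"content of {url}".encode()


l2 = CacheTier("L2", upstream=origin)    # second-level cache
l1 = CacheTier("L1", upstream=l2.fetch)  # first-level cache, nearest the user
```

Calling `l1.fetch("/a.png")` twice reaches the origin only once: the first call fills both tiers on the way back, and the second is answered by the first-level cache.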
CDN warm-up data
The access modes mentioned above are all based on the Pull mode, in which user requests determine which hot data ends up in the CDN cache. For big promotion scenarios, we often need to warm up data in advance (push it to the edge nodes), so that the flood of visitors at the start of the promotion does not put excessive pressure on the origin site. In this case, the Push mode is used.
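A warm-up job in Push mode might look roughly like the sketch below, where each edge node's cache is modelled as a plain dictionary. The `warm_up` function and its parameters are hypothetical; real CDNs expose warm-up through their own consoles or APIs, which are not shown here.

```python
from typing import Callable, Dict, Iterable, List


def warm_up(edge_stores: List[Dict[str, bytes]], urls: Iterable[str],
            fetch_origin: Callable[[str], bytes]) -> int:
    """Push each URL's content into every edge store.

    Returns the number of origin fetches made.
    """
    fetched = 0
    for url in urls:
        body = fetch_origin(url)   # one origin read per URL ...
        fetched += 1
        for store in edge_stores:  # ... fanned out to all edge nodes
            store[url] = body
    return fetched
```

The point of the design is that the origin is read once per URL before the promotion starts, instead of once per edge node (or more) under peak traffic.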
Summary of the characteristics of CDN
1. Resource access acceleration: local caching speeds up access to corporate sites (especially sites with many pictures and static pages) and greatly improves their stability.
2. Eliminating the interconnection bottleneck between operators: the mirroring service removes the impact of bottlenecks at the interconnection points between different operators, accelerates traffic across operators, and ensures that users on different networks get good access quality.
3. Remote acceleration: using DNS load balancing, remote users are automatically directed to the fastest cache server, which speeds up remote access.
4. Bandwidth optimization: remote mirror cache servers are generated automatically for the origin server, and remote users read data from the cache server. This reduces the bandwidth used by remote access, spreads network traffic, and lowers the load on the origin web server.
5. Cluster anti-attack: widely distributed CDN nodes and the intelligent redundancy mechanism between nodes help guard against intrusion and reduce the impact of various DDoS attacks on the website, while maintaining good service quality.