Challenges facing enterprise edge applications
A CDN deploys edge servers around the world and caches Internet content on the servers closest to users. This reduces user access latency and greatly reduces the traffic that crosses the Internet core network, so using a CDN has become the default choice for Internet services. Traditional website protection, however, focuses on the origin server: customers buy firewalls, WAFs, and similar products to protect their core business content from malicious scraping. These traditional methods fall short when business traffic is delivered through a CDN:
- Deployment: these products sit in front of the origin server and mainly protect it. In a CDN architecture most pages are cached on the CDN, so crawlers can scrape the customer's sensitive business data directly from the CDN without ever reaching the origin.
- Identification: detection mainly relies on embedding JavaScript in the customer's pages. This essentially modifies those pages, which is highly intrusive; it also works only for web services and has no effect on API services.
- Mitigation: the usual approach is frequency control, which rate-limits high-frequency IPs and similar features. This is easy to bypass. Crawlers now commonly draw IPs from proxy pools and randomize request headers, which leaves frequency control with no stable characteristic to act on.
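To see why per-IP frequency control is easy to bypass, the following minimal Python sketch models a hypothetical fixed-window limiter (`PerIpRateLimiter` is a name introduced here for illustration, not a CDN product API) with an assumed limit of 10 requests per IP per minute. The same 1,000 requests are sent once from a single IP and once spread across an assumed pool of 200 proxy IPs.

```python
from collections import defaultdict


class PerIpRateLimiter:
    """Hypothetical fixed-window limiter keyed on client IP (illustration only)."""

    def __init__(self, max_requests: int, window_seconds: int) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        # (ip, window index) -> number of requests seen from that IP in that window
        self.counts: dict[tuple[str, int], int] = defaultdict(int)

    def allow(self, ip: str, now: float) -> bool:
        window = int(now // self.window_seconds)
        self.counts[(ip, window)] += 1
        return self.counts[(ip, window)] <= self.max_requests


def simulate(num_requests: int, ips: list[str]) -> int:
    """Send num_requests in the same window, cycling through ips; return how many pass."""
    limiter = PerIpRateLimiter(max_requests=10, window_seconds=60)
    passed = 0
    for i in range(num_requests):
        ip = ips[i % len(ips)]
        if limiter.allow(ip, now=30.0):
            passed += 1
    return passed


single_ip = simulate(1000, ["203.0.113.7"])
proxy_pool = simulate(1000, [f"198.51.100.{n}" for n in range(1, 201)])
print(single_ip, proxy_pool)  # 10 1000
```

With a single IP only 10 requests get through, but with the proxy pool each IP sends just 5 requests, stays under the limit, and every request passes.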
The CDN now carries a large share of the main site's business, and it must preserve the browsing and transaction experience while preventing content from being maliciously scraped. As more business data is cached on CDN edge servers, edge security matters more and more. Machine traffic management based on the edge cloud emerged to address these CDN edge security risks and protect customers' application data.
Implementation and advantages of edge cloud machine traffic management
The analysis and processing flow of machine traffic management based on CDN edge nodes is shown in the following figure:
Internet traffic generally falls into categories such as normal user access, commercial search engine access, and malicious crawler access. Machine traffic management extracts request message characteristics at the edge, identifies the request type from those characteristics, and blocks malicious crawler access at the edge, so that cached CDN resources are not maliciously crawled.
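The extract, identify, and block steps above can be sketched in Python as follows. This is a minimal illustration, not the actual detection logic: the article does not disclose its features, so the `Features` fields, the marker lists, and the "browser User-Agent without Accept-Language" rule are all assumptions chosen for readability. A real deployment would use far richer signals and would, for example, verify search-engine crawlers by reverse DNS rather than trusting the User-Agent.

```python
from dataclasses import dataclass
from enum import Enum


class RequestType(Enum):
    NORMAL_USER = "normal_user"
    SEARCH_ENGINE = "search_engine"
    MALICIOUS_CRAWLER = "malicious_crawler"


@dataclass(frozen=True)
class Features:
    """Hypothetical per-request features extracted at the edge node."""
    user_agent: str
    header_names: tuple[str, ...]  # lower-cased, in arrival order


def extract_features(headers: list[tuple[str, str]]) -> Features:
    names = tuple(name.lower() for name, _ in headers)
    user_agent = next((v for n, v in headers if n.lower() == "user-agent"), "")
    return Features(user_agent=user_agent, header_names=names)


# Illustrative markers only (assumed, not taken from any product).
SEARCH_ENGINE_MARKERS = ("Googlebot", "bingbot", "Baiduspider")
SCRIPT_CLIENT_MARKERS = ("python-requests", "curl/", "Go-http-client")


def classify(f: Features) -> RequestType:
    if any(m in f.user_agent for m in SEARCH_ENGINE_MARKERS):
        return RequestType.SEARCH_ENGINE
    if not f.user_agent or any(m in f.user_agent for m in SCRIPT_CLIENT_MARKERS):
        return RequestType.MALICIOUS_CRAWLER
    # Claims to be a browser but omits a header that browsers normally send.
    if f.user_agent.startswith("Mozilla/") and "accept-language" not in f.header_names:
        return RequestType.MALICIOUS_CRAWLER
    return RequestType.NORMAL_USER


def handle(headers: list[tuple[str, str]]) -> int:
    """Return an HTTP status: block malicious crawlers at the edge, serve the rest."""
    request_type = classify(extract_features(headers))
    return 403 if request_type is RequestType.MALICIOUS_CRAWLER else 200


samples = [
    [("User-Agent", "Mozilla/5.0 (compatible; Googlebot/2.1)"), ("Accept", "*/*")],
    [("User-Agent", "python-requests/2.31.0"), ("Accept", "*/*")],
    [("User-Agent", "Mozilla/5.0 (Windows NT 10.0)"), ("Accept", "text/html"),
     ("Accept-Language", "zh-CN")],
]
for headers in samples:
    print(classify(extract_features(headers)).value, handle(headers))
# search_engine 200
# malicious_crawler 403
# normal_user 200
```

Because classification looks only at the request message, the same logic applies to API requests as well as web pages, and no JavaScript has to be injected into the customer's pages.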
Machine traffic management offers the following advantages:
- Built on the CDN edge network: requests to a domain are classified by the characteristics of the request message as either normal requests or malicious machine requests, which helps users manage their own traffic and block malicious requests.
- Real-time visibility: each request is labeled with its message type in real time, and the mix of message types in current business traffic is displayed clearly. Customers can see how access types are distributed on their own website and act on abnormal message types.
- Mitigation by message type rather than IP: as long as a malicious request's message type stays the same, attackers cannot bypass the policy by randomizing header fields or by rotating IPs through a rapid-redial proxy IP pool.
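The difference between disposing by IP and disposing by message type can be illustrated with the following Python sketch. The "message type" key here is an assumption made for illustration: the browser engine family plus the ordered list of header names, deliberately ignoring the client IP and all other header values. The simulated crawler (`crawler_request`, a hypothetical helper) uses a new IP and new random cookie and referer values on every request, but keeps the same header layout.

```python
import random
import string


def engine_family(user_agent: str) -> str:
    # Check AppleWebKit before Gecko: Chrome/Safari UAs also contain "like Gecko".
    for marker in ("Trident", "AppleWebKit", "Gecko"):
        if marker in user_agent:
            return marker
    return "other"


def message_type(headers: list[tuple[str, str]]) -> str:
    """Hypothetical message-type key: browser engine plus header names in order.

    Other header values and the client IP are deliberately excluded, so
    rotating them does not change the key.
    """
    user_agent = next((v for n, v in headers if n.lower() == "user-agent"), "")
    names = ",".join(name.lower() for name, _ in headers)
    return f"{engine_family(user_agent)}|{names}"


def crawler_request(rng: random.Random) -> tuple[str, list[tuple[str, str]]]:
    """Simulated crawler: new proxy IP and random header values on every request."""
    ip = f"10.{rng.randint(0, 255)}.{rng.randint(0, 255)}.{rng.randint(1, 254)}"
    token = "".join(rng.choices(string.ascii_lowercase, k=12))
    headers = [
        ("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Trident/7.0; rv:11.0) like Gecko"),
        ("Accept", "*/*"),
        ("Cookie", f"sid={token}"),
        ("Referer", f"https://example.com/item/{token}"),
    ]
    return ip, headers


rng = random.Random(42)
first_ip, first_headers = crawler_request(rng)
blocked_ips = {first_ip}                       # IP-based disposal
blocked_types = {message_type(first_headers)}  # message-type-based disposal

ip_hits = type_hits = 0
for _ in range(1000):
    ip, headers = crawler_request(rng)
    if ip in blocked_ips:
        ip_hits += 1
    if message_type(headers) in blocked_types:
        type_hits += 1
print(ip_hits, type_hits)
```

Every one of the 1,000 follow-up requests matches the blocked message type, while the IP blocklist almost never matches because the crawler keeps switching addresses. The sketch only holds while the message type stays stable; a crawler that also reshuffles its header layout would need a different feature set.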
Real-world results of machine traffic management
In the Double 11 business scenario, machine traffic management identified all traffic visiting the main site's product detail pages and classified bot traffic at a fine granularity. The core strategy was to allow legitimate commercial crawlers such as search engines while restricting or blocking malicious crawlers.
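A policy of this shape (allow search engines, restrict or block malicious crawlers) can be expressed as a table from request type to action. The Python sketch below is illustrative only: the type labels, including the hypothetical `suspected_crawler` tier used for "restrict", the 5-requests-per-minute budget, and the status codes are assumptions, not values from the article. Note that the rate limit is keyed on the request type, so rotating source IPs does not reset it.

```python
from enum import Enum


class Action(Enum):
    ALLOW = "allow"
    RATE_LIMIT = "rate_limit"
    BLOCK = "block"


# Hypothetical policy table; the type labels are illustrative.
POLICY = {
    "normal_user": Action.ALLOW,
    "search_engine": Action.ALLOW,        # legitimate commercial crawlers
    "suspected_crawler": Action.RATE_LIMIT,
    "malicious_crawler": Action.BLOCK,
}


class TypeRateLimiter:
    """Fixed-window budget shared by all requests of one type, whatever their source IP."""

    def __init__(self, max_per_window: int, window_seconds: int) -> None:
        self.max_per_window = max_per_window
        self.window_seconds = window_seconds
        self.counts: dict[tuple[str, int], int] = {}

    def allow(self, request_type: str, now: float) -> bool:
        key = (request_type, int(now // self.window_seconds))
        self.counts[key] = self.counts.get(key, 0) + 1
        return self.counts[key] <= self.max_per_window


def decide(request_type: str, limiter: TypeRateLimiter, now: float) -> int:
    """Map a classified request to an HTTP status code."""
    action = POLICY.get(request_type, Action.ALLOW)  # unknown types fail open
    if action is Action.BLOCK:
        return 403
    if action is Action.RATE_LIMIT and not limiter.allow(request_type, now):
        return 429
    return 200


limiter = TypeRateLimiter(max_per_window=5, window_seconds=60)
print(decide("search_engine", limiter, now=1.0))      # 200
print(decide("malicious_crawler", limiter, now=1.0))  # 403
statuses = [decide("suspected_crawler", limiter, now=1.0) for _ in range(8)]
print(statuses)  # [200, 200, 200, 200, 200, 429, 429, 429]
```

Unknown types default to `ALLOW` in this sketch so that a gap in the policy table never blocks real users; a stricter deployment might choose the opposite default.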
By analyzing traffic to the detail pages and the behavioral characteristics of the requests, the team found that nearly 40% of requests were malicious. Before Double 11, enabling the mitigation policy allowed the main site to block more than 70% of crawler traffic. The following figure compares traffic before and after mitigation was enabled: the blue line shows the traffic trend with the mitigation policy off, and the green line shows the trend with it on. The blocking effect is clearly visible, and actual business operations were not affected.
On the day of Double 11, the access characteristics of the requests remained essentially unchanged. In the end, the policy intercepted hundreds of millions of malicious requests from millions of malicious IPs, covering tens of millions of product IDs targeted by crawlers.
As CDN machine traffic management took on more of the main site's protection, the team found that some requests crawling main-site content were getting past the protection policy, meaning the crawlers' request behavior had changed. Analysis of a sudden online surge in QPS showed that the mutated crawler mainly used the IE browser engine and drew its source IPs from a large pool of rapid-redial proxy IPs, both clear hallmarks of a commercial crawler. Once this was reported, an emergency plan was quickly put in place and the abnormal message type was mitigated.
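The analysis described above (spot a QPS surge, then find what the surge traffic has in common) can be approximated with a few aggregate checks. The Python sketch below is a hypothetical illustration: the 3x-average surge threshold, the per-request `engine`/`ip` fields, and the simulated window are assumptions, not the team's actual tooling.

```python
from collections import Counter
from statistics import mean


def is_surge(history_qps: list[float], current_qps: float, factor: float = 3.0) -> bool:
    """Flag a surge when current QPS exceeds `factor` times the recent average."""
    return current_qps > factor * mean(history_qps)


def dominant_value(window: list[dict[str, str]], key: str) -> tuple[str, float]:
    """Most common value of `key` among requests in the window, with its share."""
    value, count = Counter(r[key] for r in window).most_common(1)[0]
    return value, count / len(window)


def ip_diversity(window: list[dict[str, str]]) -> float:
    """Distinct source IPs per request; values near 1.0 mean IPs are rarely reused."""
    return len({r["ip"] for r in window}) / len(window)


# Simulated surge window: 900 requests with the IE (Trident) engine from 900
# distinct IPs, plus 100 ordinary requests from a single IP.
window = [{"engine": "Trident", "ip": f"10.0.{i // 250}.{i % 250}"} for i in range(900)]
window += [{"engine": "AppleWebKit", "ip": "192.0.2.10"} for _ in range(100)]

print(is_surge([100, 110, 95, 105], current_qps=800))  # True
print(dominant_value(window, "engine"))                 # ('Trident', 0.9)
print(ip_diversity(window))                             # 0.901
```

A single dominant engine combined with almost no IP reuse is the kind of pattern the text describes: the engine value gives a stable feature to mitigate on, while the IP diversity explains why IP-based blocking would not help.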