
File upload is an unavoidable part of sharing on the Internet. Whenever we post to Weibo or WeChat Moments, we use the image upload feature: local pictures, videos, audio, and other files are uploaded to the application's server for other users to browse or download. As a result, websites take in a large amount of data every day. That data brings in users, but it also brings security problems.

In a website's storage space, developers often find all kinds of junk files, such as xml, html, and apk files. These files may carry injected advertisements or spread content such as pornographic videos, which seriously affects the website's business. They reach the storage space through the file upload feature: if the upload program does not check or strictly filter the data submitted by users, tampered data can easily be uploaded to the server.

File upload is one of the most easily exploited links in data security. To reduce the disruption caused by malicious uploads, we first need to understand how they work.

The role of file types and file extensions

Computer data is generally stored on hardware such as hard disks. A hard disk is like a large warehouse, so to make data easier to store and manage, we created the concept of a file: the operating system wraps a piece of stored data in a file.

However, as the Internet developed from plain text files to today's multimedia files such as images, audio, and video, we came to store more and more files, of more and more types, and of ever larger size. If these files were not distinguished from one another, finding them would be extremely troublesome, so file formats (or file types) came into being. Each type of file can be saved in one or more file formats, and each file format usually has one or more extensions that help users and applications identify it.

For example, in a file named README.txt, .txt is the file extension, which indicates a plain text file. Such a file might be an explanatory document whose content is plain text.

On Windows, txt files open in Notepad by default

The extension also helps the operating system decide how to open a file. Take score.doc: doc files can be opened with Word, so when a Windows user double-clicks a .doc file, Windows looks up the extension "doc" in a table it maintains to find a program that can open files with that extension, such as Word. The system then starts Word automatically and tells it to load the file.

This shows that when Windows opens a file, the extension in the file name is all it needs to find the corresponding program. Changing a file's extension therefore also changes the file's default program. And if the file's content does not match the format the program expects, opening it will produce an error or unexpected results.
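The lookup described above can be sketched in a few lines of Java. This is only an illustration: the `ExtensionAssociation` class and its two-entry table are hypothetical stand-ins for the table the operating system maintains, not an actual Windows API.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Illustrative only: mimics choosing a default program from the file extension. */
final class ExtensionAssociation {
    // Hypothetical association table; a real system keeps this in its own database.
    private static final Map<String, String> DEFAULT_PROGRAMS = Map.of(
            "txt", "Notepad",
            "doc", "Word");

    /** Returns the lower-cased text after the last dot, or "" if there is none. */
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return "";
        }
        return fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    /** Looks up the program by extension only; the file content is never inspected. */
    static Optional<String> defaultProgramFor(String fileName) {
        return Optional.ofNullable(DEFAULT_PROGRAMS.get(extensionOf(fileName)));
    }

    public static void main(String[] args) {
        System.out.println(defaultProgramFor("README.txt"));
        System.out.println(defaultProgramFor("score.doc"));
        // Renaming score.doc to score.txt changes the program that is chosen,
        // even though the bytes inside the file stay the same.
        System.out.println(defaultProgramFor("score.txt"));
    }
}
```

Because the lookup never reads the file, renaming alone is enough to change which program is launched, which is exactly why a mismatched extension leads to errors on opening.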

How the browser identifies the files it opens

As Internet tools have become more capable, we open files in a browser more and more often rather than locally. So how does the browser determine the file type of the resource it is accessing? It relies on a response header.

When the user enters a URL, the server hosting the resource responds with a Content-Type header whose value is the MIME type of the file. If the browser supports that format, it tries to render and display the file.

Unlike Windows, browsers usually rely on the MIME type rather than the file extension to decide how to handle a URL. It is therefore very important to send the correct MIME type in the response header. If it is misconfigured, the browser may misinterpret the file's content and handle downloaded files incorrectly, which affects the normal operation of the website.
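To make the role of the response header concrete, here is a minimal sketch that uses the HTTP server bundled with the JDK. The `ContentTypeDemo` class and the `/photo.jpg` path are made up for illustration. The server sends whatever Content-Type it is told to send, and the client sees that header regardless of how the URL ends.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

final class ContentTypeDemo {
    /** Starts a local server on a free port that serves body at path with the given Content-Type. */
    static HttpServer serve(String path, String contentType, byte[] body) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
            server.createContext(path, exchange -> {
                // The header, not the URL suffix, tells the client what the body is.
                exchange.getResponseHeaders().set("Content-Type", contentType);
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream out = exchange.getResponseBody()) {
                    out.write(body);
                }
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Requests the path and returns the Content-Type header the server sent. */
    static String fetchContentType(HttpServer server, String path) {
        try {
            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + path);
            HttpURLConnection conn = (HttpURLConnection) uri.toURL().openConnection();
            try {
                return conn.getContentType();
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] html = "<html><body>hello</body></html>".getBytes(StandardCharsets.UTF_8);
        HttpServer server = serve("/photo.jpg", "text/html", html);
        try {
            System.out.println(fetchContentType(server, "/photo.jpg"));
        } finally {
            server.stop(0);
        }
    }
}
```

A browser that receives this response would treat the body as a web page, because the header says so, even though the path ends in .jpg.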

How malicious files are uploaded

At the beginning, we mentioned that malicious resources get onto a site through file upload. Such a file appears to be in a normal format, but opening it produces special effects such as a page redirect. How is this done?

The principle is simple: the MIME type has been tampered with. Take a picture named test.jpg as an example. Although the URL ends in jpg, the real type of the resource is text/html, that is, a web page.

A web page can embed JS code, and that code can redirect the user who opens the file to a specified website. Although this looks similar to DNS hijacking, it is actually different. To learn more about DNS hijacking, see "[Vernacular Popular Science] Talk about the little knowledge of DNS".
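One way to spot such a disguise is to compare what the file name claims with what the first bytes of the file actually look like. The sketch below is a best-effort illustration, not a complete detector: the `DisguisedFileCheck` class is hypothetical, and it recognizes only a handful of well-known signatures (JPEG, PNG, GIF, and HTML that begins with a doctype or an html tag).

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

final class DisguisedFileCheck {
    /** Best-effort sniff of a few well-known signatures; returns null when unknown. */
    static String sniffType(byte[] head) {
        if (startsWith(head, 0xFF, 0xD8, 0xFF)) {
            return "image/jpeg";
        }
        if (startsWith(head, 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A)) {
            return "image/png";
        }
        if (startsWith(head, 'G', 'I', 'F', '8')) {
            return "image/gif";
        }
        String text = new String(head, StandardCharsets.ISO_8859_1)
                .trim().toLowerCase(Locale.ROOT);
        if (text.startsWith("<!doctype html") || text.startsWith("<html")) {
            return "text/html";
        }
        return null;
    }

    /** Compares the leading bytes of data with the expected signature bytes. */
    private static boolean startsWith(byte[] data, int... signature) {
        if (data.length < signature.length) {
            return false;
        }
        for (int i = 0; i < signature.length; i++) {
            if ((data[i] & 0xFF) != signature[i]) {
                return false;
            }
        }
        return true;
    }

    /** True when the name claims to be a JPEG but the content does not look like one. */
    static boolean isDisguisedJpeg(String fileName, byte[] head) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        boolean claimsJpeg = lower.endsWith(".jpg") || lower.endsWith(".jpeg");
        return claimsJpeg && !"image/jpeg".equals(sniffType(head));
    }
}
```

For a file named test.jpg whose content starts with an html tag, `sniffType` reports text/html and `isDisguisedJpeg` returns true, which matches the situation described above.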

The true identity of test.jpg

Since this kind of malicious file works by tampering with the file's MIME type, can we reduce the upload of and access to such files by restricting the MIME type? Yes, and there are other methods as well. We will explain them one by one, using cloud storage as the example.

Ways to prevent malicious file uploads

Identity Traceability-TOKEN Upload

With TOKEN upload, the TOKEN is calculated from identifying information about the file the client is uploading. It controls how long the upload remains valid and fixes the upload directory or file suffix. Unlike the usual approach, in which all users authenticate and upload with a single operator's credentials held on the server, the TOKEN authentication provided by cloud storage allows more fine-grained permission control.

Once TOKEN authentication is enabled, we can assign each user an independent identifier, so that the files a user uploads are stored in a separate directory named after that identifier.

X-Upyun-Uri-Prefix = /<service-name>/client_37ascii     // user identifier prefix, corresponding to a directory in storage, e.g. /client_37ascii/
X-Upyun-Uri-Postfix = .jpg  // restricts the suffix of uploaded files

Files uploaded this way can be traced back quickly through the identifier, so we can find out who uploaded a large number of malicious files and deal with them.
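The general idea of a token that pins the validity period, the directory prefix, and the file suffix can be sketched as follows. This is not Upyun's actual TOKEN algorithm, which this article does not describe. The `UploadToken` class, the payload layout, and the choice of HMAC-SHA256 are all assumptions made purely for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical scheme: an HMAC over the prefix, suffix, and expiry time. */
final class UploadToken {
    /** Signs "prefix\npostfix\nexpiry" with HMAC-SHA256 and returns URL-safe Base64. */
    static String sign(String secret, String uriPrefix, String uriPostfix, long expiresAtEpochSeconds) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            String payload = uriPrefix + "\n" + uriPostfix + "\n" + expiresAtEpochSeconds;
            byte[] signature = mac.doFinal(payload.getBytes(StandardCharsets.UTF_8));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(signature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 is not available", e);
        }
    }

    /** Accepts an upload only if the token is genuine, unexpired, and the path fits the limits. */
    static boolean allows(String secret, String token, String uriPrefix, String uriPostfix,
                          long expiresAtEpochSeconds, String uploadPath, long nowEpochSeconds) {
        byte[] expected = sign(secret, uriPrefix, uriPostfix, expiresAtEpochSeconds)
                .getBytes(StandardCharsets.UTF_8);
        // Constant-time comparison of the presented token with the recomputed one.
        boolean signatureOk = MessageDigest.isEqual(expected, token.getBytes(StandardCharsets.UTF_8));
        return signatureOk
                && nowEpochSeconds <= expiresAtEpochSeconds
                && uploadPath.startsWith(uriPrefix)
                && uploadPath.endsWith(uriPostfix);
    }
}
```

Because the prefix is part of the signed payload, a client holding a token for /demo-service/client_37ascii/ cannot reuse it to write into another user's directory, and every stored file can be traced back to the identifier in its path.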

Document Proof-Content-Type

The second way is to restrict the file type. We can force the MIME type of uploaded files, for example forcing uploaded images to an image Content-Type. Then even if a malicious file is uploaded, the browser is made to parse it as an image when accessing it. Parsing fails, which blocks access to the malicious file.

Both the REST API and the FORM API of cloud storage support forcing the Content-Type. The FORM API supports several kinds of restriction, including the MIME type, the allowed file extensions, and the file size range.

For specific usage, we take the Java SDK Form API upload as an example:

import java.util.HashMap;
import java.util.Map;

// FormUploader and Params come from the Upyun Java SDK.
// Initialize the uploader
FormUploader uploader = new FormUploader(BUCKET_NAME, OPERATOR_NAME, OPERATOR_PWD);
// Initialize the policy parameter Map
final Map<String, Object> paramsMap = new HashMap<String, Object>();
// Add the SAVE_KEY parameter
paramsMap.put(Params.SAVE_KEY, savePath);
// Add upload restrictions
paramsMap.put(Params.CONTENT_TYPE, "image/jpeg"); // force the file MIME type
paramsMap.put(Params.ALLOW_FILE_TYPE, "jpg,jpeg,png"); // restrict the file extension
paramsMap.put(Params.CONTENT_LENGTH_RANGE, "102400,1024000"); // restrict the file size, in bytes
// Perform the upload
uploader.upload(paramsMap, file);

As long as any of the above parameters is set, cloud storage inspects the content of each uploaded file and compares the detected value with the value specified for the upload. If they match, the upload is allowed; if not, a 403 status code is returned.
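The matching step can be pictured with a small sketch. It is a simplification under stated assumptions: the `UploadPolicyCheck` class is hypothetical, it checks only the extension list and the size range in the same string formats as the parameters above, and it does not reproduce the content detection that the cloud storage service performs on its side.

```java
import java.util.Arrays;
import java.util.Locale;

/** Hypothetical server-side check mirroring the allow-file-type and content-length-range values. */
final class UploadPolicyCheck {
    /**
     * Returns 200 when the upload satisfies both restrictions, otherwise 403.
     *
     * @param allowFileType      comma-separated extensions, e.g. "jpg,jpeg,png"
     * @param contentLengthRange "min,max" in bytes, e.g. "102400,1024000"
     */
    static int check(String fileName, long sizeBytes, String allowFileType, String contentLengthRange) {
        int dot = fileName.lastIndexOf('.');
        String extension = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        boolean extensionAllowed = Arrays
                .asList(allowFileType.toLowerCase(Locale.ROOT).split(","))
                .contains(extension);

        String[] range = contentLengthRange.split(",");
        long min = Long.parseLong(range[0].trim());
        long max = Long.parseLong(range[1].trim());
        boolean sizeAllowed = sizeBytes >= min && sizeBytes <= max;

        return extensionAllowed && sizeAllowed ? 200 : 403;
    }
}
```

With the values from the example above, a 200,000-byte cat.jpg passes, while page.html or a 50-byte cat.jpg is rejected with 403.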

With Content-Type enforced on upload, the disguised picture can no longer be opened

Access Denied-Edge Rule

The two methods above keep malicious files out at upload time. What about files that have already been uploaded?

Malicious resources that have already been uploaded disguised as pictures can be caught with edge rules. By implicitly adding a cloud image-processing parameter to each picture link, the malicious file can no longer be displayed normally and a 405 error is returned instead.

Edge rule

Access result

Beyond pictures, if a large number of malicious APKs turn up in the picture space, edge rules can also quickly block access to them.

Edge rule configuration in the console

Access result

Popular Links-Statistical Analysis

If there are many files of many types, troubleshooting takes a long time, and you need to find the problem immediately, take a look at Upyun's log analysis feature. It tallies, every day, the access statistics for the domain names under each service, and can report TOP 1000 data along dimensions such as popular files, popular clients, popular referers, resource status codes, file sizes, and popular IPs.

Statistical analysis helps you take a comprehensive inventory of your services. If a certain resource or IP is accessed unusually often, you can locate and investigate it in time.
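If you only have raw access logs, the same kind of ranking is easy to approximate yourself. The sketch below assumes a hypothetical log format in which the client IP is the first space-separated field on each line. The `TopClients` class is an illustration and is unrelated to the log analysis feature itself.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

final class TopClients {
    /** Counts requests per client IP and returns the busiest IPs first, at most limit entries. */
    static List<Map.Entry<String, Long>> topIps(List<String> logLines, int limit) {
        return logLines.stream()
                .filter(line -> !line.trim().isEmpty())
                // Assumed format: the IP is the first space-separated field.
                .map(line -> line.trim().split(" ", 2)[0])
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()))
                .entrySet().stream()
                // Highest count first; ties are broken by IP so the order is stable.
                .sorted(Map.Entry.<String, Long>comparingByValue(Comparator.reverseOrder())
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(limit)
                .collect(Collectors.toList());
    }
}
```

Swapping the field that is extracted (for example the request path instead of the IP) gives a ranking of popular files in the same way.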

Information Security-Content Identification

If your website has a lot of traffic and the methods above are not enough, you can also take a look at Tianqing and Tiance, two content recognition tools from Upyun and Fanwei.

Both tools are built around AI security detection and use machine learning classifiers to "intelligently" review content such as pictures and videos, gradually turning content review from a manual profession into algorithms and models. This frees up manpower, greatly improves processing efficiency, and helps companies cut costs. The tools provide content security early warning, content security data, and content security review services, forming a complete solution for network content security.

So far, they have provided a number of Internet companies and government departments with a low-latency, high-precision, visualized one-stop content security service covering everything from engine identification to manual review.

Recommended reading

[Vernacular Popular Science] Talk about the little knowledge of DNS

Online fraud? Network streaking? Is it all because of HTTP?


云叔_又拍云