
background

Recently I came across a strange website: it opens fine in a browser, but returns 403 when accessed through curl or code. My first guess was User-Agent checking, so I sent the browser's UA with the request, and still got 403. No problem, I thought, the server must be checking other request headers too. I opened the Network panel in the browser, copied every request header, and sent them all, so that my request and the browser's were identical at the HTTP level. It should have been impossible to fail, yet it still returned 403.

The site in question: https://pixabay.com

analysis

There is no black magic in how a server verifies a client, since everything travels over TCP. If the browser and I send the same HTTP message, the server cannot tell us apart at the HTTP level, so the check can only happen at the TLS layer. I therefore captured packets with Wireshark to look for differences in the TLS handshake. During the handshake the client sends the server a Client Hello message, and this message is a likely candidate for distinguishing browser requests from non-browser ones: in it the client must tell the server which cipher suites, TLS versions, and so on it supports, and this information varies with the client implementation. First I captured the Client Hello of a normal browser request, as shown below:

Then I made the same request with curl and captured its packets, as shown below:

The two messages clearly differ a lot. After comparing them field by field, I found that the 403 is most likely caused by the curl request lacking the supported_versions extension. In the browser's message, this extension looks as shown in the figure:

It indicates support for TLSv1.2 and TLSv1.3, and the protocol finally negotiated in the handshake is TLSv1.3. As the two captures above show, the browser uses TLSv1.3 while curl uses TLSv1.2, so TLSv1.3 may be required for the request to succeed.
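To check for this extension without eyeballing a capture, the Client Hello bytes can be parsed directly. Below is a minimal sketch in Node.js. The helper names `supportedVersions` and `fakeHello` are my own, the input is assumed to be the raw handshake message (starting at the handshake type byte, without the 5-byte TLS record header), and the sample message is synthetic rather than taken from the captures above:

```javascript
// Extension number of supported_versions (RFC 8446, section 4.2.1).
const EXT_SUPPORTED_VERSIONS = 0x002b;
const VERSION_NAMES = { 0x0301: "TLSv1.0", 0x0302: "TLSv1.1", 0x0303: "TLSv1.2", 0x0304: "TLSv1.3" };

// Hypothetical helper: return the versions listed in the supported_versions
// extension of a ClientHello, or null if the extension is absent.
function supportedVersions(hello) {
  if (hello[0] !== 0x01) throw new Error("not a ClientHello");
  let off = 4;                          // handshake type (1) + length (3)
  off += 2 + 32;                        // legacy_version + random
  off += 1 + hello.readUInt8(off);      // session_id
  off += 2 + hello.readUInt16BE(off);   // cipher_suites
  off += 1 + hello.readUInt8(off);      // compression_methods
  if (off >= hello.length) return null; // no extensions block at all
  const end = off + 2 + hello.readUInt16BE(off);
  off += 2;
  while (off + 4 <= end) {
    const type = hello.readUInt16BE(off);
    const len = hello.readUInt16BE(off + 2);
    off += 4;
    if (type === EXT_SUPPORTED_VERSIONS) {
      const listLen = hello.readUInt8(off); // 1-byte length, then 2-byte versions
      const versions = [];
      for (let i = 0; i < listLen; i += 2) {
        const v = hello.readUInt16BE(off + 1 + i);
        versions.push(VERSION_NAMES[v] || `0x${v.toString(16)}`);
      }
      return versions;
    }
    off += len;                         // skip extensions we do not care about
  }
  return null;
}

// Build a tiny synthetic ClientHello for demonstration (not a usable handshake).
function fakeHello(extensions) {
  const head = Buffer.concat([
    Buffer.from([0x03, 0x03]),             // legacy_version
    Buffer.alloc(32),                      // random (all zeros here)
    Buffer.from([0x00]),                   // empty session_id
    Buffer.from([0x00, 0x02, 0x13, 0x01]), // one cipher suite
    Buffer.from([0x01, 0x00]),             // null compression only
  ]);
  const extLen = Buffer.alloc(2);
  extLen.writeUInt16BE(extensions.length);
  const body = Buffer.concat([head, extLen, extensions]);
  const header = Buffer.from([0x01, 0x00, body.length >> 8, body.length & 0xff]);
  return Buffer.concat([header, body]);
}

const browserLike = fakeHello(Buffer.from([0x00, 0x2b, 0x00, 0x05, 0x04, 0x03, 0x04, 0x03, 0x03]));
console.log(supportedVersions(browserLike));                // [ 'TLSv1.3', 'TLSv1.2' ]
console.log(supportedVersions(fakeHello(Buffer.alloc(0)))); // null
```

A `null` result corresponds to the curl case described above, where the extension is missing entirely.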

verify

I immediately searched for how to specify the TLS version in curl, and found that adding the --tlsv1.3 option is all it takes:

 $ curl -I --tlsv1.3 'https://pixabay.com/'  \
> -H 'accept-language: zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6' \
> -H 'user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.5060.114 Safari/537.36 Edg/103.0.1264.49'
HTTP/2 200
date: Fri, 22 Jul 2022 02:40:35 GMT
content-type: text/html; charset=utf-8
cf-ray: 72e8cffc18c73d5a-HKG
cache-control: s-maxage=86400
content-language: en
vary: Accept-Encoding, Cookie, Accept-Language
cf-cache-status: MISS
content-security-policy: frame-ancestors none
expect-ct: max-age=604800, report-uri="https://report-uri.cloudflare.com/cdn-cgi/beacon/expect-ct"
referrer-policy: strict-origin-when-cross-origin
x-frame-options: DENY
set-cookie: __cf_bm=Cy4a751rDND6kHhu.RzEr5DpqnaxRdpUxaMfNfkya0A-1658457635-0-AS1DaewDqNjWHZ/m74A88bNyEG0EFsZAwmsm/ON5QQEuh8B6XOS7PkSnhGgXPLV+LtEvzOKTy/WWHmwY63uGlD0=; path=/; expires=Fri, 22-Jul-22 03:10:35 GMT; domain=.pixabay.com; HttpOnly; Secure; SameSite=None
server: cloudflare
alt-svc: h3=":443"; ma=86400, h3-29=":443"; ma=86400

After repeated testing, I found that besides specifying TLSv1.3, the request must also carry the accept-language and user-agent headers and must use the HTTP/2 protocol. All three conditions are required; none can be omitted.
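To confirm which TLS version and application protocol a connection actually ends up with, the handshake result can be inspected from Node.js. This is a rough sketch using the built-in `tls` module. The `probeOptions` and `probe` helpers are hypothetical names of mine, and the sketch only reports what was negotiated; it does not send any HTTP request:

```javascript
const tls = require("tls");

// Hypothetical helper: tls.connect() options that pin a single TLS version
// and offer HTTP/2 first through ALPN.
function probeOptions(host, version) {
  return {
    host,
    port: 443,
    servername: host,                  // SNI, needed by most CDN-hosted sites
    minVersion: version,
    maxVersion: version,
    ALPNProtocols: ["h2", "http/1.1"],
  };
}

// Connect, report the negotiated TLS version and ALPN protocol, then close.
function probe(host, version) {
  return new Promise((resolve, reject) => {
    const socket = tls.connect(probeOptions(host, version), () => {
      resolve({ tls: socket.getProtocol(), alpn: socket.alpnProtocol });
      socket.end();
    });
    socket.on("error", reject);
  });
}
```

Calling `probe("pixabay.com", "TLSv1.3").then(console.log)` should print the negotiated pair, which makes it easy to check the TLS 1.3 and HTTP/2 conditions separately from the header conditions.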

nodejs access

As mentioned above, HTTP/2 is required, but the popular HTTP clients on the market basically only support HTTP/1.1, so I had to start from the built-in http2 module. After some tinkering, the request succeeded. The code is as follows:

const http2 = require("http2");

function get(host, path) {
  return new Promise((resolve, reject) => {
    // Force TLS 1.3; http2.connect negotiates HTTP/2 (h2) through ALPN.
    const session = http2.connect(`https://${host}`, {
      minVersion: "TLSv1.3",
      maxVersion: "TLSv1.3",
    });

    // Tear the session down before rejecting so the process can exit.
    const fail = (err) => {
      session.destroy();
      reject(err);
    };
    session.on("error", fail);

    const req = session.request({
      [http2.constants.HTTP2_HEADER_AUTHORITY]: host,
      [http2.constants.HTTP2_HEADER_METHOD]: http2.constants.HTTP2_METHOD_GET,
      [http2.constants.HTTP2_HEADER_PATH]: path,
      // The curl test above found accept-language necessary too, so send it as well.
      "accept-language": "zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6",
      "user-agent":
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.127 Safari/537.36 Edg/100.0.1185.50",
    });

    req.setEncoding("utf8");
    let data = "";
    req.on("data", (chunk) => {
      data += chunk;
    });
    req.on("end", () => {
      session.close();
      // Resolve even for an empty body; otherwise the promise never settles.
      resolve(data);
    });
    req.on("error", fail);
    req.end();
  });
}

(async function () {
  try {
    const data = await get("pixabay.com", "/");
    console.log(data);
  } catch (err) {
    console.error("request failed:", err);
  }
})();

in-depth

Although the request now succeeded, in the spirit of exploration I kept digging and found that Cloudflare has an official blog post dedicated to this TLS interception technique:
https://blog.cloudflare.com/monsters-in-the-middleboxes/

One of its paragraphs supports my conjecture. In short, Cloudflare maintains a set of browser TLS fingerprints. When it receives a Client Hello, it checks the message against this set, and if nothing matches, it blocks the request. This filters out most requests that do not come from a browser.
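The idea of a fingerprint set can be illustrated with a small sketch. How Cloudflare actually computes its fingerprints is not stated here, so the code below uses JA3, a publicly documented scheme, purely as a stand-in. The function names and all sample values are made up for illustration:

```javascript
const crypto = require("crypto");

// GREASE values (RFC 8701) are placeholders that browsers insert; JA3 ignores them.
function isGrease(v) {
  return (v & 0x0f0f) === 0x0a0a && (v >> 8) === (v & 0xff);
}

// JA3 string: five comma-separated fields from the Client Hello, each a list
// of decimal values joined by "-".
function ja3String({ version, ciphers, extensions, curves, pointFormats }) {
  const join = (list) => list.filter((v) => !isGrease(v)).join("-");
  return [version, join(ciphers), join(extensions), join(curves), join(pointFormats)].join(",");
}

// The fingerprint itself is the MD5 hex digest of that string.
function ja3Hash(hello) {
  return crypto.createHash("md5").update(ja3String(hello)).digest("hex");
}

// Made-up Client Hello fields standing in for a browser.
const sample = {
  version: 771,
  ciphers: [0x0a0a, 4865, 4866], // the first entry is a GREASE value
  extensions: [0, 43, 10],       // 43 = supported_versions
  curves: [29, 23],
  pointFormats: [0],
};
console.log(ja3String(sample)); // 771,4865-4866,0-43-10,29-23,0

// A server-side check could then be a simple set lookup.
const knownBrowserFingerprints = new Set([ja3Hash(sample)]);
const curlLike = { ...sample, extensions: [0, 10] }; // no supported_versions
console.log(knownBrowserFingerprints.has(ja3Hash(curlLike))); // false
```

Because the fingerprint covers the whole list of cipher suites and extensions, changing one HTTP header or one curl flag does not necessarily make a client look like a browser.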

I'm MonkeyWie. From time to time I share notes on Java, Golang, front-end, Docker, and k8s on my WeChat official account.
