Follow the WeChat public account K哥爬虫 (Brother K Crawler) or join the QQ group 808574309 for more articles on advanced crawling, JS/Android reverse engineering, and related techniques.

Introduction

When you analyze a site's JavaScript, the simpler code usually consists of functions declared one after another, for example:

function a() {console.log("a")}
function b() {console.log("b")}
function c() {console.log("c")}

On slightly more complex sites, however, you will often run into a code structure like this:

!function(i) {
    function n(t) {
        return i[t].call(a, b, c, d)
    }
}([
    function(t, e) {}, 
    function(t, e, n) {}, 
    function(t, e, r) {}, 
    function(t, e, o) {}
]);

This pattern is very common in JavaScript and may look trivial to anyone familiar with the language. Most crawler engineers, however, write their code in Python or Java and can be confused by this syntax. Because it shows up constantly when extracting a site's JS encryption code, understanding it is very important for crawler engineers.

This pattern does not seem to have an official name. It is a form of modular programming, and because it is the shape of the bundles that webpack generates, most people simply call it "webpack". The example above is hard to read, so here is a cleaned-up version:

!function (allModule) {
    function useModule(whichModule) {
        allModule[whichModule].call(null, "hello world!");
    }
    useModule(0)
}([
    function module0(param) {console.log("module0: " + param)},
    function module1(param) {console.log("module1: " + param)},
    function module2(param) {console.log("module2: " + param)},
]);

Running the code above prints module0: hello world!. With the descriptive variable and function names, the general idea should be clear: useModule(0) selects the first function in the array, passes "hello world!" to module0, and module0 prints it.

Looking closely at the code, it relies mainly on two pieces of syntax: !function(){}() and function.call(). The following sections introduce them one by one.

Function declaration and function expression

In ECMAScript (the standard that JavaScript implements), the two most common ways to create a function object are function declarations and function expressions. The ECMAScript specification makes it clear that a function declaration must always carry an identifier, that is, the function name, while a function expression may omit it.

A function declaration gives the function a name and is loaded into the scope before any code runs, so the function can be called either before or after its declaration:

test("Hello World!")

function test(arg) {
    console.log(arg)
}

A function expression creates an anonymous function and assigns it to a variable. The function is defined only when execution reaches the expression, so it can be called only after that point; calling it earlier throws an error:

var test = function (arg) {
    console.log(arg)
}

test("Hello World!")
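The difference between the two forms can be seen side by side in a small sketch. The names declared, expressed, and log are hypothetical and chosen only for illustration:

```javascript
// Hypothetical names, for illustration only.
var log = [];

// A function declaration is hoisted together with its body,
// so calling it before the declaration works.
log.push(declared());

function declared() {
    return "declaration: ok";
}

// With a var-assigned function expression only the variable name is hoisted.
// At this point `expressed` is still undefined, so calling it throws.
try {
    expressed();
} catch (err) {
    log.push("expression: " + err.name);
}

var expressed = function () {
    return "expression: ok";
};

// After the assignment has run, the call succeeds.
log.push(expressed());

console.log(log.join("\n"));
// declaration: ok
// expression: TypeError
// expression: ok
```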

IIFE: Immediately Invoked Function Expressions

IIFE stands for Immediately Invoked Function Expression, also known as a self-executing function, an immediately executed function, or a self-executing anonymous function. It is a syntax pattern: a function expression (named or anonymous) that is executed as soon as it is created. Variables declared inside an IIFE cannot be accessed from outside, so IIFEs are mainly used to isolate scope and avoid polluting the surrounding namespace.
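As a minimal sketch of that scope isolation (the names counter, count, and increment are hypothetical), the variable below lives only inside the IIFE, and the outside world can reach it only through the object the IIFE returns:

```javascript
// Hypothetical example: `count` is private to the IIFE.
var counter = (function () {
    var count = 0; // not visible outside this function

    return {
        increment: function () {
            count += 1;
            return count;
        }
    };
})();

console.log(counter.increment()); // 1
console.log(counter.increment()); // 2

// The variable itself never leaked into the outer scope.
console.log(typeof count); // undefined
```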

IIFE basic syntax

An IIFE can be written in several ways. The main forms are:

1. Put a unary operator before the anonymous function and () after it:

!function () {
    console.log("I AM IIFE")
}();

-function () {
    console.log("I AM IIFE")
}();

+function () {
    console.log("I AM IIFE")
}();

~function () {
    console.log("I AM IIFE")
}();

2. Add () after the anonymous function, then wrap the whole thing in ():

(function () {
    console.log("I AM IIFE")
}());

3. Wrap the anonymous function in () first, then add ():

(function () {
    console.log("I AM IIFE")
})();

4. With an arrow function expression, wrap the arrow function in () first, then add ():

(() => {
  console.log("I AM IIFE")
})()

5. Put the void keyword before the anonymous function and () after it. void evaluates the expression that follows it but does not return a value (the result is always undefined):

void function () {
    console.log("I AM IIFE")
}();

Sometimes you will also see a semicolon placed before or after an IIFE, for example:

;(function () {
    console.log("I AM IIFE")
}())

;!function () {
    console.log("I AM IIFE")
}()

An IIFE is usually used as a self-contained module, and on its own it normally causes no problems. It is still advisable to put a semicolon before or after it so that it is cleanly separated from the code around it; otherwise unexpected errors can occur.
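One such unexpected error is sketched below. The names calls, first, and second are hypothetical. When the line before an IIFE ends without a semicolon, the parser treats the IIFE's opening parenthesis as a call on the preceding expression:

```javascript
var calls = [];

// The semicolon after this function expression is deliberately missing.
// The parser reads the next "(" as a call, so the function below is invoked
// with the IIFE's function as its argument, and the function it returns is
// then invoked by the trailing "()".
var first = function () {
    calls.push("first");
    return function () {
        calls.push("returned");
    };
}
(function () {
    calls.push("never runs as an IIFE");
})();

console.log(calls);        // [ 'first', 'returned' ]
console.log(typeof first); // undefined

// A leading semicolon isolates the IIFE from whatever precedes it.
var second = function () {
    calls.push("second");
}
;(function () {
    calls.push("iife");
})();

console.log(calls);         // [ 'first', 'returned', 'iife' ]
console.log(typeof second); // function
```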

IIFE parameter passing

To pass arguments to an IIFE, put them in the trailing ():

var text = "I AM IIFE";

(function (param) {
    console.log(param)
})(text);

// I AM IIFE

var dict = {name: "Bob", age: "20"};

(function (param) {
    console.log(param.name);
})(dict);

// Bob

var list = [1, 2, 3, 4, 5];

(function (param) {
    var sum = 0;
    for (var i = 0; i < param.length; i++) {
        sum += param[i];
    }
    console.log(sum);
})(list);

// 15

Function.prototype.call() / apply() / bind()

Function.prototype.call(), Function.prototype.apply(), and Function.prototype.bind() are commonly used methods. They serve the same purpose: changing what this refers to inside a function. The differences are as follows:

  • call() executes the function immediately and accepts one or more arguments, separated by commas;
  • apply() executes the function immediately and accepts the arguments as a single array;
  • bind() does not execute the function immediately. It returns a new function with this bound, to be called later, and accepts arguments in the same form as call().

call()

call() accepts multiple parameters. The first, thisArg, specifies what this refers to inside the function body. In non-strict mode, a thisArg of null or undefined is automatically replaced with the global object (window in the browser); in strict mode, this stays exactly as passed (null or undefined). From the second parameter on, each argument is passed to the function in order. The basic syntax is as follows:

function.call(thisArg, arg1, arg2, ...)

Example:

function test(a, b, c) {
    console.log(a + b + c)
}

test.call(null, 1, 2, 3)  // 6

function test() {
    console.log(this.firstName + " " + this.lastName)
}

var data = {firstName: "John", lastName: "Doe"}
test.call(data)  // John Doe
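The strict and non-strict handling of thisArg can be checked with a short sketch. The names sloppy and strict are hypothetical. The functions are built with the Function constructor only so that their strictness does not depend on how the surrounding file is loaded (a function created this way is non-strict unless its own body opts in):

```javascript
// Hypothetical helpers; each one simply returns its `this`.
var sloppy = new Function("return this;");
var strict = new Function('"use strict"; return this;');

// Non-strict: null and undefined are replaced by the global object.
console.log(sloppy.call(null) === globalThis);      // true
console.log(sloppy.call(undefined) === globalThis); // true

// Strict: `this` is exactly what was passed.
console.log(strict.call(null) === null);            // true
console.log(strict.call(undefined) === undefined);  // true

// Non-strict mode also wraps primitives in objects; strict mode does not.
console.log(typeof sloppy.call(1)); // object
console.log(typeof strict.call(1)); // number
```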

apply()

apply() accepts two parameters. The first, thisArg, works the same way as in call(). The second is an indexed collection: since ECMAScript 5 it can be either an array or an array-like object. apply() passes the elements of this collection to the called function as its arguments. The basic syntax is as follows:

function.apply(thisArg, [arg1, arg2, ...])

Example:

function test(a, b, c) {
    console.log(a + b + c)
}

test.apply(null, [1, 2, 3])  // 6

function test() {
    console.log(this.firstName + " " + this.lastName)
}

var data = {firstName: "John", lastName: "Doe"}
test.apply(data)  // John Doe
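A brief sketch of the array and array-like cases follows. The names numbers, arrayLike, countArgs, and forward are hypothetical:

```javascript
// A real array is spread into individual arguments.
var numbers = [3, 9, 4];
console.log(Math.max.apply(null, numbers)); // 9

// An array-like object (indexed keys plus a length) works the same way.
var arrayLike = {0: 3, 1: 9, 2: 4, length: 3};
console.log(Math.max.apply(null, arrayLike)); // 9

// `arguments` is array-like too, which makes apply() handy for forwarding
// every argument of one function to another.
function countArgs() {
    return arguments.length;
}

function forward() {
    return countArgs.apply(null, arguments);
}

console.log(forward("a", "b")); // 2
```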

bind()

bind() takes the same parameters as call(), but instead of executing the function it returns a new function. The basic syntax is as follows:

function.bind(thisArg, arg1, arg2, ...)

Example:

function test(a, b, c) {
    console.log(a + b + c)
}

test.bind(null, 1, 2, 3)()  // 6

function test() {
    console.log(this.firstName + " " + this.lastName)
}

var data = {firstName: "John", lastName: "Doe"}
test.bind(data)()  // John Doe
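Because bind() only returns a function, arguments can be supplied in two stages, and the bound this stays fixed afterwards. A small sketch with hypothetical names (sum, addToThree, fullName, boundName):

```javascript
function sum(a, b, c) {
    return a + b + c;
}

// Fix the first two arguments now, supply the last one later.
var addToThree = sum.bind(null, 1, 2);
console.log(addToThree(3));  // 6
console.log(addToThree(10)); // 13

function fullName() {
    return this.firstName + " " + this.lastName;
}

var boundName = fullName.bind({firstName: "John", lastName: "Doe"});
console.log(boundName()); // John Doe

// Once bound, `this` cannot be overridden by a later call().
console.log(boundName.call({firstName: "Jane", lastName: "Roe"})); // John Doe
```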

Understanding webpack

With this background in place, let's return to modular programming, that is, the webpack pattern mentioned earlier:

!function (allModule) {
    function useModule(whichModule) {
        allModule[whichModule].call(null, "hello world!");
    }
    useModule(0)
}([
    function module0(param) {console.log("module0: " + param)},
    function module1(param) {console.log("module1: " + param)},
    function module2(param) {console.log("module2: " + param)},
]);

First, the whole snippet is an IIFE. The argument passed to it is an array of three functions, module0, module1, and module2, which can be regarded as three modules; the IIFE receives them through the parameter allModule. Inside the IIFE there is a function useModule(), which can be regarded as the module loader: it decides which module to use. In the example, useModule(0) calls the first module. The loader uses call() to set this inside the module function and pass arguments, then invokes the module to produce the output.

Rewriting webpack

The webpack-style modular code that we often meet in crawler reverse engineering can be rewritten quite easily. The following piece of encryption code serves as an example:

var CryptoJS = require("crypto-js");

!function (func) {
    function acvs() {
        var kk = func[1].call(null, 1e3);
        var data = {
            r: "I LOVE PYTHON",
            e: kk,
            i: "62bs819idl00oac2",
            k: "0123456789abcdef"
        }
        return func[0].call(data);
    }

    console.log("Encrypted text: " + acvs())

    function odsc(account) {
        var cr = false;
        var regExp = /(^\d{7,8}$)|(^0\d{10,12}$)/;
        if (regExp.test(account)) {
            cr = true;
        }
        return cr;
    }

    function mkle(account) {
        var cr = false;
        var regExp = /^([a-zA-Z0-9_\.\-\+])+\@(([a-zA-Z0-9\-])+\.)+([a-zA-Z0-9]{2,4})+$/;
        if (regExp.test(account)) {
            cr = true;
        }
        return cr;
    }

}([
    function () {
        for (var n = "", t = 0; t < this.r.length; t++) {
            var o = this.e ^ this.r.charCodeAt(t);
            n += String.fromCharCode(o)
        }
        return encodeURIComponent(n)
    },
    function (x) {
        return Math.ceil(x * Math.random())
    },
    function (e) {
        var a = CryptoJS.MD5(this.k);
        var c = CryptoJS.enc.Utf8.parse(a);
        var d = CryptoJS.AES.encrypt(e, c, {
            iv: this.i
        });
        return d + ""
    },
    function (e) {
        var b = CryptoJS.MD5(this.k);
        var d = CryptoJS.enc.Utf8.parse(b);
        var a = CryptoJS.AES.decrypt(e, d, {
            iv: this.i
        }).toString(CryptoJS.enc.Utf8);
        return a
    }
]);

The entry point of the encryption is acvs(). It calls only the first and second functions in the IIFE's argument list (func[0] and func[1]); the remaining functions are distractions. The first function reads r and e from this, and both can just as well be passed in directly as parameters, so the code can be rewritten as follows:

function a(r, e) {
    for (var n = "", t = 0; t < r.length; t++) {
        var o = e ^ r.charCodeAt(t);
        n += String.fromCharCode(o)
    }
    return encodeURIComponent(n)
}

function b(x) {
    return Math.ceil(x * Math.random())
}

function acvs() {
    var kk = b(1e3);
    var r = "I LOVE PYTHON";
    return a(r, kk);
}

console.log("Encrypted text: " + acvs())
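Since the key comes from Math.random(), the output differs on every run, which makes it hard to tell whether a rewrite is faithful. One way to check, sketched here under the assumption that the key can be pinned to a fixed value during testing (the names originalModule, rewritten, and fixedKey are hypothetical), is to feed both versions the same inputs and compare:

```javascript
// The module as it appears in the IIFE's argument list: reads r and e from `this`.
var originalModule = function () {
    for (var n = "", t = 0; t < this.r.length; t++) {
        var o = this.e ^ this.r.charCodeAt(t);
        n += String.fromCharCode(o)
    }
    return encodeURIComponent(n)
};

// The rewritten version: takes r and e as ordinary parameters.
function rewritten(r, e) {
    for (var n = "", t = 0; t < r.length; t++) {
        var o = e ^ r.charCodeAt(t);
        n += String.fromCharCode(o)
    }
    return encodeURIComponent(n)
}

// A fixed key replaces the random one so that both outputs are comparable.
var fixedKey = 123;
var fromOriginal = originalModule.call({r: "I LOVE PYTHON", e: fixedKey});
var fromRewritten = rewritten("I LOVE PYTHON", fixedKey);

console.log(fromOriginal === fromRewritten); // true
```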

Summary

After reading this article you may think webpack is nothing special. It does look fairly simple here, but when we analyze real sites it is often not as straightforward as the example above. This article aims to give a basic understanding of how webpack-style modular code works. Building on these principles, Brother K will walk through more complex real-world webpack cases in follow-up articles. Stay tuned!


K哥爬虫

Research and sharing on Python web crawlers, JS reverse engineering, and related techniques.