Origin

Tree-shaking was first proposed by Rich Harris in Rollup.

It was created to reduce the size of the final bundle.

Here is the description from MDN:

tree-shaking is a term commonly used to describe the behavior of removing dead-code in the context of JavaScript.

It relies on the import and export statements in ES2015 to detect whether code modules are exported, imported, and used by JavaScript files.

In modern JavaScript applications, we use module bundlers (like webpack or Rollup) to automatically remove unreferenced code when bundling multiple JavaScript files into a single file. This is important for preparing the code for release, so that the final file has a clean structure and minimal size.

tree-shaking VS dead code elimination

Any discussion of tree-shaking has to mention dead code elimination, or DCE for short.

Many people regard tree-shaking as just one technique for achieving DCE. If the two are the same thing, with the same end goal (less code), why give it the new name tree-shaking?

Rich Harris, who coined the term, answers this in his article "Tree-shaking versus dead code elimination".

He uses the analogy of making a cake. The original text is as follows:

Bad analogy time: imagine that you made cakes by throwing whole eggs into the mixing bowl and smashing them up, instead of cracking them open and pouring the contents out. Once the cake comes out of the oven, you remove the fragments of eggshell, except that's quite tricky so most of the eggshell gets left in there.

You'd probably eat less cake, for one thing.

That's what dead code elimination consists of — taking the finished product, and imperfectly removing bits you don't want. tree-shaking, on the other hand, asks the opposite question: given that I want to make a cake, which bits of what ingredients do I need to include in the mixing bowl?

Rather than excluding dead code, we're including live code. Ideally the end result would be the same, but because of the limitations of static analysis in JavaScript that's not the case. Live code inclusion gets better results, and is prima facie a more logical approach to the problem of preventing our users from downloading unused code.

To put it simply: DCE is like putting whole eggs into the cake and picking the eggshell out once it is baked, whereas tree-shaking removes the eggshell first and then makes the cake. Both aim for the same result, but the process is completely different.

dead code

Dead code generally has the following characteristics:

  • The code is unreachable, so it is never executed
  • The result of executing the code is never used
  • The code only affects dead variables (written but never read)
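As a rough illustration, all three kinds of dead code can show up in a single function. The function and variable names below are made up for this sketch and do not come from any library.

```javascript
// Hypothetical example: each comment names the kind of dead code it marks.
function price(amount) {
    var unusedTotal = amount * 2;     // result is computed but never used
    var log = [];
    log = ['written', 'never read'];  // only affects a dead (write-only) variable

    if (false) {
        amount = 0;                   // unreachable: the condition is always false
    }

    return amount + 1;
    amount = -1;                      // unreachable: placed after return
}

console.log(price(1)); // 2
```

Removing every commented line leaves the behavior of price unchanged, which is exactly what makes that code "dead".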

Use webpack to bundle the following code with mode: 'development':

 function app() {
    var test = '我是app'; // "I am app"
    function set() {
        return 1;
    }
    return test;
    test = '无法执行'; // "cannot be executed": unreachable after the return
    return test;
}

export default app;

Final bundled result:

 eval(
    "function app() {\n    var test = '我是app';\n    function set() {\n        return 1;\n    }\n    return test;\n    test = '无法执行';\n    return test;\n}\n\napp();\n\n\n//# sourceURL=webpack://webpack/./src/main.js?"
);

You can see that the bundled result still contains the code that can never run.

Does webpack not support dead code elimination? That's right: webpack itself does not.

It turns out that dead code elimination in a webpack build is done not by webpack itself but by the minifier, the well-known uglify family of tools.

Reading the source code shows that with mode: 'development', minimization is switched off, so the terser-webpack-plugin plugin is never loaded.

 // lib/config/defaults.js
D(optimization, 'minimize', production);
A(optimization, 'minimizer', () => [
    {
        apply: (compiler) => {
            // Lazy load the Terser plugin
            const TerserPlugin = require('terser-webpack-plugin');
            new TerserPlugin({
                terserOptions: {
                    compress: {
                        passes: 2
                    }
                }
            }).apply(compiler);
        }
    }
]);

// lib/WebpackOptionsApply.js
if (options.optimization.minimize) {
    for (const minimizer of options.optimization.minimizer) {
        if (typeof minimizer === 'function') {
            minimizer.call(compiler, compiler);
        } else if (minimizer !== '...') {
            minimizer.apply(compiler);
        }
    }
}

terser-webpack-plugin in turn uses Terser, a fork of uglify-es, to do the actual work.

Now we bundle with mode: 'production'.

 // Result after formatting
(() => {
    var r = {
            225: (r) => {
                r.exports = '我是app';
            }
        },
    // ...
})();

In the final result, the unreachable code has been removed. The minifier also compressed the code and deleted comments, among other things.

tree shaking doesn't work

Tree shaking essentially eliminates unused code by analyzing the static structure of ES modules.

ES module features

  • import can only appear as a statement at the top level of a module, not inside a function or an if block. (ECMA-262 15.2)
  • Imported module names can only be string constants. (ECMA-262 15.2.2)
  • Regardless of where an import statement appears, all imports must already be resolved when the module is initialized. (ECMA-262 15.2.1.16.4 - 8.a)
  • An import binding is immutable, similar to const. For example, you can't import { a } from './a' and then assign something else to a. (ECMA-262 15.2.1.16.4 - 12.c.3)

Quoted from You Yuxi
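Because import statements sit at the top level and name their modules with string constants, a tool can list a file's dependencies without running it. Here is a minimal sketch of that idea. The listStaticImports helper is hypothetical, and a regular expression stands in for a real parser such as acorn, so it only handles simple one-line imports.

```javascript
// Simplified sketch: a regex stands in for a real parser.
// It handles `import ... from '...'` and bare `import '...'` statements only.
function listStaticImports(source) {
    const pattern = /^\s*import\s+(?:[^'"]*?\s+from\s+)?['"]([^'"]+)['"]/gm;
    const specifiers = [];
    let match;
    while ((match = pattern.exec(source)) !== null) {
        specifiers.push(match[1]); // the module name is always a string literal
    }
    return specifiers;
}

const source = [
    "import App from './app';",
    "import { firstName } from './names';",
    "import './polyfill';",
    "const lazy = () => load('./not-static');" // a run-time call, not a static import
].join('\n');

console.log(listStaticImports(source)); // [ './app', './names', './polyfill' ]
```

The last line of the sample source is ignored on purpose: which file it loads depends on run-time behavior, which is exactly what static analysis cannot see.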

Let's take a look at the effect of tree shaking.

Suppose we have a module:

 // ./src/app.js
export const firstName = 'firstName'

export function getName ( x ) {
    return x.a
}

getName({ a: 123 })

export function app ( x ) {
    return x * x * x;
}

export default app;

Below are the test cases for the entry file, numbered 1 to 7 (no case 5 is listed).

 // 1*********************************************
// import App from './app'

// export function main() {
//     var test = '我是index';
//     return test;
// }

// console.log(main)

// 2*********************************************

// import App from './app'

// export function main() {
//     var test = '我是index';
//     console.log(App(1))
//     return test;
// }

// console.log(main)


// 3*********************************************

// import App from './app'

// export function main() {
//     var test = '我是index';
//     App.square(1)
//     return test;
// }

// console.log(main)


// 4*********************************************

// import App from './app'

// export function main() {
//     var test = '我是index';
//     let methodName = 'square'
//     App[methodName](1)
//     return test;
// }

// console.log(main)

// 6*********************************************

// import * as App from './app'

// export function main() {
//     var test = '我是index';
//     App.square(1)
//     return test;
// }

// console.log(main)

// 7*********************************************

// import * as App from './app'

// export function main() {
//     var test = '我是index';
//     let methodName = 'square'
//     App[methodName](1)
//     return test;
// }

// console.log(main)

Use the simplest webpack configuration for bundling:

 // webpack.config.js
module.exports = {
    entry: './src/index.js',
    output: {
        filename: 'dist.js'
    },
    mode: 'production'
};

The results show that dead code was eliminated in all of the first six cases; only the seventh failed:

 /* ... */
const r = 'firstName';
function o(e) {
    return e.a;
}
function n(e) {
    return e * e * e;
}
o({ a: 123 });
const a = n;
console.log(function () {
    return t.square(1), '我是index';
});

I haven't studied this in detail, so I can only guess: JavaScript's dynamic nature makes static analysis difficult. The analysis is purely static, so it cannot yet handle a full namespace import combined with dynamic (computed) property access.
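One way to picture the difference between cases 6 and 7 is the toy analysis below. It is not webpack's actual code: usedNamespaceExports is a hypothetical helper that inspects hand-written ESTree-style MemberExpression nodes and gives up as soon as a property name is only known at run time.

```javascript
// Toy analysis over hand-written ESTree-style MemberExpression nodes.
// Returns the export names known to be used, or 'ALL' when it cannot tell.
function usedNamespaceExports(namespaceName, memberExpressions) {
    const used = new Set();
    for (const node of memberExpressions) {
        if (node.object.name !== namespaceName) continue;
        if (!node.computed) {
            used.add(node.property.name);           // App.square -> 'square'
        } else if (node.property.type === 'Literal') {
            used.add(String(node.property.value));  // App['square'] -> 'square'
        } else {
            return 'ALL';                           // App[methodName]: known only at run time
        }
    }
    return [...used];
}

const staticAccess = {
    type: 'MemberExpression', computed: false,
    object: { type: 'Identifier', name: 'App' },
    property: { type: 'Identifier', name: 'square' }
};
const dynamicAccess = {
    type: 'MemberExpression', computed: true,
    object: { type: 'Identifier', name: 'App' },
    property: { type: 'Identifier', name: 'methodName' }
};

console.log(usedNamespaceExports('App', [staticAccess]));                // [ 'square' ]
console.log(usedNamespaceExports('App', [staticAccess, dynamicAccess])); // ALL
```

With a static access the analysis can name the one export that is needed. With a computed access it has to assume every export may be used, so nothing can be dropped safely.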

For more on how tree shaking is executed, refer to the following link:

Of course, resourceful programmers are not stumped by this: where static analysis falls short, the developer can manually mark files as side-effect-free.

tree shaking and sideEffects

sideEffects supports two forms: false or an array.

  • If none of your code contains side effects, simply set the property to false
  • If some of your code does have side effects, provide an array of those files instead
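To make "side effect" concrete, here is a small sketch in which each module is modelled as a plain function that runs its top-level code. Everything in it (registry, pureModule, polyfillModule, shouldKeep) is hypothetical and only mimics the kind of decision a bundler makes. It is not webpack's implementation.

```javascript
// Each "module" is modelled as a function that runs its top-level code.
const registry = {};

// Side-effect-free: evaluating it only produces exports.
function pureModule() {
    return { square: (x) => x * x };
}

// Has a side effect: evaluating it changes state outside the module.
function polyfillModule() {
    registry.installed = true;
    return {};
}

// Hypothetical bundler decision: keep a module if one of its exports is used,
// or if it has not been declared side-effect-free.
function shouldKeep(moduleInfo) {
    return moduleInfo.usedExports.length > 0 || moduleInfo.sideEffects !== false;
}

polyfillModule();
console.log(registry.installed); // true: dropping this module would change behavior
console.log(pureModule().square(3)); // 9

console.log(shouldKeep({ usedExports: [], sideEffects: false }));         // false
console.log(shouldKeep({ usedExports: [], sideEffects: true }));          // true
console.log(shouldKeep({ usedExports: ['square'], sideEffects: false })); // true
```

A module like polyfillModule must be listed in the sideEffects array. Otherwise a bundler that trusts sideEffects: false is free to drop it when none of its exports are imported.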

It can be set in package.json.

 // boolean
{
  "sideEffects": false
}

// array
{
  "sideEffects": ["./src/app.js", "*.css"]
}

It can also be set in module.rules .

 module.exports = {
  module: {
    rules: [
      {
        test: /\.jsx?$/,
        exclude: /(node_modules)/,
        use: {
          loader: 'babel-loader',
        },
        sideEffects: false // in a rule, sideEffects takes a boolean
      }
    ]
  },
}

Set sideEffects: false and bundle again:

 var e = {
            225: (e, r, t) => {
                (e = t.hmd(e)).exports = '我是main';
            }
        },

Only the code of the main.js module is left, and the code of app.js has been eliminated.

usedExports

Besides sideEffects, webpack provides another way to mark code for elimination: the usedExports configuration option.

The information collected by optimization.usedExports is used by other optimizations and by code generation: for example, unused exports are not generated, and export names are mangled to single-character identifiers when all usages are compatible. Dead code elimination in minifiers benefits from this option and can remove unused exports.

It is enabled by default with mode: 'production'.

 module.exports = {
  //...
  optimization: {
    usedExports: true,
  },
};

usedExports relies on terser to check whether code has side effects. If an export is not used and has no side effects, it is marked as unused harmony during bundling.

Finally, DCE tools such as Terser and UglifyJS "shake" this dead code off.

The principle of tree shaking

Tree shaking itself also relies on static analysis.

Static code analysis is a technique that scans program code through lexical analysis, syntax analysis, control-flow analysis, data-flow analysis, and similar methods, without running the code, to verify that it meets standards for conformance, security, reliability, maintainability, and other criteria.

Tree shaking requires ES module syntax: it depends on the ES6 import and export statements.

Next, let's look at how a very early version of rollup implements tree shaking.

  1. Initialize a Module from the content of the entry module, and use acorn to parse it into an AST
  2. Analyze the AST: look for the import and export keywords to establish dependencies
  3. Analyze the AST: collect the functions, variables, and other declarations that exist in the current module
  4. Analyze the AST again to collect how each function and variable is used. Code is collected by following dependencies, so a function or variable that is never used is never collected
  5. Examine each collected identifier. If it comes from an import, create a Module for it and repeat the steps above; otherwise, store the corresponding code in a single shared result
  6. Generate the bundle from the final result
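The steps above can be sketched on a toy model. This is not rollup's code: the AST work of steps 1 to 4 is assumed to be done already, so each statement is a plain object listing the names it defines and depends on. The modules table and the shake, define and includeStatement functions are invented for illustration, and exports are assumed to keep their local names.

```javascript
// Toy model: each statement already lists what it defines and depends on.
const modules = {
    './index.js': {
        imports: { cube: { source: './math.js', name: 'cube' } },
        statements: [
            { code: 'console.log(cube(3));', defines: [], dependsOn: ['cube'] }
        ]
    },
    './math.js': {
        imports: {},
        statements: [
            { code: 'function mul(a, b) { return a * b; }', defines: ['mul'], dependsOn: [] },
            { code: 'function cube(x) { return mul(mul(x, x), x); }', defines: ['cube'], dependsOn: ['mul'] },
            { code: 'function square(x) { return mul(x, x); }', defines: ['square'], dependsOn: ['mul'] }
        ]
    }
};

function shake(entryPath) {
    const result = [];           // included code, dependencies first
    const included = new Set();  // statements that were already included

    function includeStatement(modulePath, statement) {
        if (included.has(statement)) return;
        included.add(statement);
        for (const name of statement.dependsOn) define(modulePath, name);
        result.push(statement.code);
    }

    // Include the statement that defines `name`, following imports (step 5).
    function define(modulePath, name) {
        const mod = modules[modulePath];
        const imported = mod.imports[name];
        if (imported) return define(imported.source, imported.name);
        const statement = mod.statements.find((s) => s.defines.includes(name));
        if (statement) includeStatement(modulePath, statement);
    }

    // Every statement of the entry module is treated as live code.
    for (const statement of modules[entryPath].statements) {
        includeStatement(entryPath, statement);
    }
    return result.join('\n'); // step 6: generate the bundle
}

console.log(shake('./index.js'));
```

The output contains mul, cube and the console.log call, in that order. square is never reached from the entry module, so it is simply never included, which matches the "include live code" idea from Rich Harris's article.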

Source version: v0.3.1

Create a bundle from the entry file, then call its build method to start bundling.

 export function rollup ( entry, options = {} ) {
    const bundle = new Bundle({
        entry,
        resolvePath: options.resolvePath
    });

    return bundle.build().then( () => {
        return {
            generate: options => bundle.generate( options ),
            write: ( dest, options = {} ) => {
                let { code, map } = bundle.generate({
                    dest,
                    format: options.format,
                    globalName: options.globalName
                });

                code += `\n//# ${SOURCEMAPPING_URL}=${basename( dest )}.map`;

                return Promise.all([
                    writeFile( dest, code ),
                    writeFile( dest + '.map', map.toString() )
                ]);
            }
        };
    });
}

Internally, build calls the fetchModule method, which reads the file content by file name with readFile and creates a Module.

 build () {
    return this.fetchModule( this.entryPath, null )
        .then( entryModule => {
            this.entryModule = entryModule;

            if ( entryModule.exports.default ) {
                let defaultExportName = makeLegalIdentifier( basename( this.entryPath ).slice( 0, -extname( this.entryPath ).length ) );
                while ( entryModule.ast._scope.contains( defaultExportName ) ) {
                    defaultExportName = `_${defaultExportName}`;
                }

                entryModule.suggestName( 'default', defaultExportName );
            }

            return entryModule.expandAllStatements( true );
        })
        .then( statements => {
            this.statements = statements;
            this.deconflict();
        });
}

fetchModule ( importee, importer ) {
    return Promise.resolve( importer === null ? importee : this.resolvePath( importee, importer ) )
        .then( path => {
                /*
                    cache handling (omitted)
                */

                this.modulePromises[ path ] = readFile( path, { encoding: 'utf-8' })
                    .then( code => {
                        const module = new Module({
                            path,
                            code,
                            bundle: this
                        });

                        return module;
                    });

            return this.modulePromises[ path ];
        });
}

The file content is then parsed into an AST with acorn.

 // Module constructor (excerpt)
export default class Module {
    constructor ({ path, code, bundle }) {
        /*
        initialization (omitted)
        */
        this.ast = parse(code, {
            ecmaVersion: 6,
            sourceType: 'module',
            onComment: (block, text, start, end) =>
                this.comments.push({ block, text, start, end })
        });
        this.analyse();
    }

    // ...
}

Traverse the top-level nodes, looking for import and export declarations. This step is what is usually meant by analyzing the static structure of ESM.

Information about imports is collected into this.imports, and information about exports into this.exports.

 this.ast.body.forEach( node => {
    let source;
    if ( node.type === 'ImportDeclaration' ) {
        source = node.source.value;

        node.specifiers.forEach( specifier => {
            const isDefault = specifier.type === 'ImportDefaultSpecifier';
            const isNamespace = specifier.type === 'ImportNamespaceSpecifier';

            const localName = specifier.local.name;
            const name = isDefault ? 'default' : isNamespace ? '*' : specifier.imported.name;

            if ( has( this.imports, localName ) ) {
                const err = new Error( `Duplicated import '${localName}'` );
                err.file = this.path;
                err.loc = getLocation( this.code.original, specifier.start );
                throw err;
            }

            this.imports[ localName ] = {
                source, // module id
                name,
                localName
            };
        });
    }

    else if ( /^Export/.test( node.type ) ) {
        if ( node.type === 'ExportDefaultDeclaration' ) {
            const isDeclaration = /Declaration$/.test( node.declaration.type );

            this.exports.default = {
                node,
                name: 'default',
                localName: isDeclaration ? node.declaration.id.name : 'default',
                isDeclaration
            };
        }

        else if ( node.type === 'ExportNamedDeclaration' ) {
            // export { foo } from './foo';
            source = node.source && node.source.value;

            if ( node.specifiers.length ) {
                node.specifiers.forEach( specifier => {
                    const localName = specifier.local.name;
                    const exportedName = specifier.exported.name;

                    this.exports[ exportedName ] = {
                        localName,
                        exportedName
                    };

                    if ( source ) {
                        this.imports[ localName ] = {
                            source,
                            localName,
                            name: exportedName
                        };
                    }
                });
            }

            else {
                let declaration = node.declaration;

                let name;

                if ( declaration.type === 'VariableDeclaration' ) {
                    name = declaration.declarations[0].id.name;
                } else {
                    name = declaration.id.name;
                }

                this.exports[ name ] = {
                    node,
                    localName: name,
                    expression: declaration
                };
            }
        }
    }
});

 analyse () {
        // imports and exports, indexed by ID
        this.imports = {};
        this.exports = {};

        // walk the AST to find the import and export relationships
        this.ast.body.forEach( node => {
            let source;

            // import foo from './foo';
            // import { bar } from './bar';
            if ( node.type === 'ImportDeclaration' ) {
                source = node.source.value;

                node.specifiers.forEach( specifier => {
                    const isDefault = specifier.type === 'ImportDefaultSpecifier';
                    const isNamespace = specifier.type === 'ImportNamespaceSpecifier';

                    const localName = specifier.local.name;
                    const name = isDefault ? 'default' : isNamespace ? '*' : specifier.imported.name;

                    if ( has( this.imports, localName ) ) {
                        const err = new Error( `Duplicated import '${localName}'` );
                        err.file = this.path;
                        err.loc = getLocation( this.code.original, specifier.start );
                        throw err;
                    }

                    this.imports[ localName ] = {
                        source, // module id
                        name,
                        localName
                    };
                });
            }

            else if ( /^Export/.test( node.type ) ) {
                // export default function foo () {}
                // export default foo;
                // export default 42;
                if ( node.type === 'ExportDefaultDeclaration' ) {
                    const isDeclaration = /Declaration$/.test( node.declaration.type );

                    this.exports.default = {
                        node,
                        name: 'default',
                        localName: isDeclaration ? node.declaration.id.name : 'default',
                        isDeclaration
                    };
                }

                // export { foo, bar, baz }
                // export var foo = 42;
                // export function foo () {}
                else if ( node.type === 'ExportNamedDeclaration' ) {
                    // export { foo } from './foo';
                    source = node.source && node.source.value;

                    if ( node.specifiers.length ) {
                        // export { foo, bar, baz }
                        node.specifiers.forEach( specifier => {
                            const localName = specifier.local.name;
                            const exportedName = specifier.exported.name;

                            this.exports[ exportedName ] = {
                                localName,
                                exportedName
                            };

                            if ( source ) {
                                this.imports[ localName ] = {
                                    source,
                                    localName,
                                    name: exportedName
                                };
                            }
                        });
                    }

                    else {
                        let declaration = node.declaration;

                        let name;

                        if ( declaration.type === 'VariableDeclaration' ) {
                            name = declaration.declarations[0].id.name;
                        } else {
                            name = declaration.id.name;
                        }

                        this.exports[ name ] = {
                            node,
                            localName: name,
                            expression: declaration
                        };
                    }
                }
            }
        });

        // find functions, variables, classes, block scopes, etc., and link them by reference
        analyse( this.ast, this.code, this );     
}

Next, find functions, variables, classes, block scopes, and so on, and link them according to how they reference each other.

magicString is used to give each statement node the ability to modify its source content.

Traverse the entire AST. First initialize a Scope as the namespace of the current module; whenever a function or block scope is entered, create a new Scope. Each Scope is linked to its parent, forming a tree of scopes that mirrors the namespaces.

Variable and function declarations are associated with the current Scope by adding their identifier names to it. At this point, the functions and variables declared in each node have been collected.
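A scope chain of this kind can be sketched in a few lines. The method names (add, contains, findDefiningScope) and the depth property follow the calls that appear in the analyse code in this article, but the bodies are my own assumptions and not a copy of rollup's Scope.

```javascript
// Minimal sketch of a scope chain; the method bodies are assumptions.
class Scope {
    constructor(options = {}) {
        this.parent = options.parent || null;
        this.depth = this.parent ? this.parent.depth + 1 : 0;
        this.names = options.params ? options.params.slice() : [];
        this.isBlockScope = !!options.block;
    }

    // `var` and function declarations hoist out of block scopes;
    // `let` declarations stay in the block they appear in.
    add(name, isBlockDeclaration) {
        if (!isBlockDeclaration && this.isBlockScope) {
            this.parent.add(name, isBlockDeclaration);
        } else {
            this.names.push(name);
        }
    }

    contains(name) {
        return !!this.findDefiningScope(name);
    }

    // Walk up the parent links until a scope that declares `name` is found.
    findDefiningScope(name) {
        if (this.names.includes(name)) return this;
        return this.parent ? this.parent.findDefiningScope(name) : null;
    }
}

const moduleScope = new Scope();
moduleScope.add('app', false);

const functionScope = new Scope({ parent: moduleScope, params: ['x'], block: false });
const blockScope = new Scope({ parent: functionScope, block: true });
blockScope.add('hoisted', false); // var: lands in functionScope
blockScope.add('local', true);    // let: stays in blockScope

console.log(blockScope.findDefiningScope('hoisted') === functionScope); // true
console.log(blockScope.findDefiningScope('local') === blockScope);      // true
console.log(blockScope.findDefiningScope('app').depth);                 // 0
console.log(moduleScope.contains('x'));                                 // false
```

A depth of 0 identifies the module's top-level scope, which is the only place where names matter for deciding whether a whole statement has to be included.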

Next, traverse the AST again to find out, for each variable and function, whether it is only read or also modified.

Identifiers are found through Identifier nodes. If an identifier is not defined in a nested scope (it is either a top-level name or not declared in this module at all) and is not defined by the statement itself, the statement depends on it, and the name is stored in the _dependsOn collection.

Next, AssignmentExpression, UpdateExpression, and CallExpression nodes are used to find which identifiers are modified or passed as call arguments. The results are stored in _modifies.
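The read and write bookkeeping can be tried out on a hand-written ESTree-style node. This is a simplified sketch under stated assumptions: the walk, rootName, collect and id helpers are made up for it, and scope lookup is left out entirely, so every identifier is treated as a top-level name.

```javascript
// Generic walk over ESTree-style nodes (plain objects with a `type`).
function walk(node, parent, visit) {
    if (!node || typeof node.type !== 'string') return;
    visit(node, parent);
    for (const key of Object.keys(node)) {
        const child = node[key];
        if (Array.isArray(child)) child.forEach((c) => walk(c, node, visit));
        else if (child && typeof child === 'object') walk(child, node, visit);
    }
}

// The root identifier of `a.b.c` is `a`.
function rootName(node) {
    while (node.type === 'MemberExpression') node = node.object;
    return node.type === 'Identifier' ? node.name : null;
}

function collect(statement) {
    const dependsOn = new Set();
    const modifies = new Set();
    walk(statement, null, (node, parent) => {
        if (node.type === 'Identifier') {
            // disregard the `bar` in `foo.bar`
            const isProperty = parent && parent.type === 'MemberExpression' &&
                !parent.computed && node === parent.property;
            if (!isProperty) dependsOn.add(node.name);
        }
        let targets = [];
        if (node.type === 'AssignmentExpression') targets = [node.left];
        else if (node.type === 'UpdateExpression') targets = [node.argument];
        else if (node.type === 'CallExpression') targets = node.arguments; // may be mutated by the callee
        for (const target of targets) {
            const name = rootName(target);
            if (name) modifies.add(name);
        }
    });
    return { dependsOn: [...dependsOn], modifies: [...modifies] };
}

const id = (name) => ({ type: 'Identifier', name });

// Hand-written AST for: total = prices.sum(tax);
const statement = {
    type: 'ExpressionStatement',
    expression: {
        type: 'AssignmentExpression', operator: '=',
        left: id('total'),
        right: {
            type: 'CallExpression',
            callee: { type: 'MemberExpression', computed: false, object: id('prices'), property: id('sum') },
            arguments: [id('tax')]
        }
    }
};

console.log(collect(statement));
// { dependsOn: [ 'total', 'prices', 'tax' ], modifies: [ 'total', 'tax' ] }
```

Note that tax ends up in modifies even though it is only passed as an argument: without knowing what the callee does, the analysis has to assume the argument may be changed.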

 function analyse(ast, magicString, module) {
    var scope = new Scope();
    var currentTopLevelStatement = undefined;

    function addToScope(declarator) {
        var name = declarator.id.name;
        scope.add(name, false);

        if (!scope.parent) {
            currentTopLevelStatement._defines[name] = true;
        }
    }

    function addToBlockScope(declarator) {
        var name = declarator.id.name;
        scope.add(name, true);

        if (!scope.parent) {
            currentTopLevelStatement._defines[name] = true;
        }
    }

    // first we need to generate comprehensive scope info
    var previousStatement = null;
    var commentIndex = 0;

    ast.body.forEach(function (statement) {
        currentTopLevelStatement = statement; // so we can attach scoping info

        Object.defineProperties(statement, {
            _defines: { value: {} },
            _modifies: { value: {} },
            _dependsOn: { value: {} },
            _included: { value: false, writable: true },
            _module: { value: module },
            _source: { value: magicString.snip(statement.start, statement.end) }, // TODO don't use snip, it's a waste of memory
            _margin: { value: [0, 0] },
            _leadingComments: { value: [] },
            _trailingComment: { value: null, writable: true } });

        var trailing = !!previousStatement;

        // attach leading comment
        do {
            var comment = module.comments[commentIndex];

            if (!comment || comment.end > statement.start) break;

            // attach any trailing comment to the previous statement
            if (trailing && !/\n/.test(magicString.slice(previousStatement.end, comment.start))) {
                previousStatement._trailingComment = comment;
            }

            // then attach leading comments to this statement
            else {
                statement._leadingComments.push(comment);
            }

            commentIndex += 1;
            trailing = false;
        } while (module.comments[commentIndex]);

        // determine margin
        var previousEnd = previousStatement ? (previousStatement._trailingComment || previousStatement).end : 0;
        var start = (statement._leadingComments[0] || statement).start;

        var gap = magicString.original.slice(previousEnd, start);
        var margin = gap.split('\n').length;

        if (previousStatement) previousStatement._margin[1] = margin;
        statement._margin[0] = margin;

        walk(statement, {
            enter: function (node) {
                var newScope = undefined;

                magicString.addSourcemapLocation(node.start);

                switch (node.type) {
                    case 'FunctionExpression':
                    case 'FunctionDeclaration':
                    case 'ArrowFunctionExpression':
                        var names = node.params.map(getName);

                        if (node.type === 'FunctionDeclaration') {
                            addToScope(node);
                        } else if (node.type === 'FunctionExpression' && node.id) {
                            names.push(node.id.name);
                        }

                        newScope = new Scope({
                            parent: scope,
                            params: names, // TODO rest params?
                            block: false
                        });

                        break;

                    case 'BlockStatement':
                        newScope = new Scope({
                            parent: scope,
                            block: true
                        });

                        break;

                    case 'CatchClause':
                        newScope = new Scope({
                            parent: scope,
                            params: [node.param.name],
                            block: true
                        });

                        break;

                    case 'VariableDeclaration':
                        node.declarations.forEach(node.kind === 'let' ? addToBlockScope : addToScope); // TODO const?
                        break;

                    case 'ClassDeclaration':
                        addToScope(node);
                        break;
                }

                if (newScope) {
                    Object.defineProperty(node, '_scope', { value: newScope });
                    scope = newScope;
                }
            },
            leave: function (node) {
                if (node === currentTopLevelStatement) {
                    currentTopLevelStatement = null;
                }

                if (node._scope) {
                    scope = scope.parent;
                }
            }
        });

        previousStatement = statement;
    });

    // then, we need to find which top-level dependencies this statement has,
    // and which it potentially modifies
    ast.body.forEach(function (statement) {
        function checkForReads(node, parent) {
            if (node.type === 'Identifier') {
                // disregard the `bar` in `foo.bar` - these appear as Identifier nodes
                if (parent.type === 'MemberExpression' && node !== parent.object) {
                    return;
                }

                // disregard the `bar` in { bar: foo }
                if (parent.type === 'Property' && node !== parent.value) {
                    return;
                }

                var definingScope = scope.findDefiningScope(node.name);

                if ((!definingScope || definingScope.depth === 0) && !statement._defines[node.name]) {
                    statement._dependsOn[node.name] = true;
                }
            }
        }

        function checkForWrites(node) {
            function addNode(node, disallowImportReassignments) {
                while (node.type === 'MemberExpression') {
                    node = node.object;
                }

                // disallow assignments/updates to imported bindings and namespaces
                if (disallowImportReassignments && has(module.imports, node.name) && !scope.contains(node.name)) {
                    var err = new Error('Illegal reassignment to import \'' + node.name + '\'');
                    err.file = module.path;
                    err.loc = getLocation(module.code.toString(), node.start);
                    throw err;
                }

                if (node.type !== 'Identifier') {
                    return;
                }

                statement._modifies[node.name] = true;
            }

            if (node.type === 'AssignmentExpression') {
                addNode(node.left, true);
            } else if (node.type === 'UpdateExpression') {
                addNode(node.argument, true);
            } else if (node.type === 'CallExpression') {
                node.arguments.forEach(function (arg) {
                    return addNode(arg, false);
                });
            }

            // TODO UpdateExpressions, method calls?
        }

        walk(statement, {
            enter: function (node, parent) {
                // skip imports
                if (/^Import/.test(node.type)) return this.skip();

                if (node._scope) scope = node._scope;

                checkForReads(node, parent);
                checkForWrites(node, parent);

                //if ( node.type === 'ReturnStatement')
            },
            leave: function (node) {
                if (node._scope) scope = scope.parent;
            }
        });
    });

    ast._scope = scope;
}

In the previous step, we attached the declarations of functions, variables, classes, block scopes, and so on to the current statement node. Now this information is gathered from the nodes and stored on the Module.

 //  
this.ast.body.forEach( statement => {
    Object.keys( statement._defines ).forEach( name => {
        this.definitions[ name ] = statement;
    });

    Object.keys( statement._modifies ).forEach( name => {
        if ( !has( this.modifications, name ) ) {
            this.modifications[ name ] = [];
        }

        this.modifications[ name ].push( statement );
    });
});

From this, we can see what each statement depends on and what it modifies.

Once the entry module has been processed, we traverse its statement nodes and call define for each name recorded in _dependsOn.

If a name in _dependsOn is found in this.imports, the identifier is imported from another module, so fetchModule is called and the logic above is repeated.

If it is an ordinary function or variable, the corresponding statement is collected. When this finishes, every statement that is actually needed has been collected; anything that was not collected is unused code and has been filtered out.

Finally, the statements are assembled into the bundle and written to the output file through fs.

Final thoughts

There are many more points worth exploring in tree shaking, such as:

  • CSS tree shaking
  • Webpack's tree shaking implementation
  • How to keep tree shaking from failing
  • ...

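袋鼠云数栈UED

We are the 袋鼠云数栈 UED team, committed to building an excellent one-stop data middle-platform product. We stay true to the craftsman's spirit, keep exploring the front-end path, and accumulate and share experience with the community.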


下一篇 »
CSS SandBox