
Node.js currently has two module systems: CommonJS (CJS for short) and ECMAScript modules (ESM for short). This article covers three topics:

  1. The internals of CommonJS
  2. The ESM module system on the Node.js platform
  3. The differences between CommonJS and ESM, and how to use the two systems together

First, let's look at why a module system is needed at all.

Why have a module system

A good language needs a module system, because it addresses basic needs that come up in everyday engineering:

  • Splitting functionality into modules keeps the code organized and easier to understand, and lets us develop and test each submodule independently
  • Functionality can be packaged once and then imported by other modules, which improves reusability
  • Encapsulation: a module only needs to document its inputs and outputs, while the internal implementation stays hidden, which lowers the cost of understanding it
  • Dependency management: a good module system lets developers build new modules on top of existing third-party modules, and lets users import the modules they want together with everything on their dependency chain

At first, JavaScript had no good module system, and pages pulled in resources through multiple script tags. As systems grew more complex, the script tag approach could no longer meet the needs, so the community began to define module systems such as AMD and UMD.
Node.js is a server-side JavaScript runtime. Unlike HTML in the browser, it has no script tags for importing files and relies entirely on js files in the local file system. Node.js therefore implemented a module system based on the CommonJS specification.
The ES2015 specification, released in 2015, gave JavaScript a formal standard for modules. A module system built to this standard is called ESM, and it makes module management more consistent between the browser and the server.

CommonJS modules

CommonJS is built on two basic ideas:

  • Users can import a module from the local file system through the require function
  • A module publishes its functionality to the outside through two special variables, exports and module.exports

Module loader

Here is a minimal implementation of a module loader.
First comes the function that loads a module's content. It wraps the module source in a function so that the code runs in a private scope and does not pollute the global environment, then runs the wrapper with eval.

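    const fs = require('fs')

    function loadModule(filename, module, require) {
      // Wrap the module source in a function so it gets its own scope
      const wrappedSrc = `
        (function (module, exports, require) {
          ${fs.readFileSync(filename, 'utf-8')}
        })(module, module.exports, require)
      `
      // Run the wrapper; the module's code fills in module.exports
      eval(wrappedSrc)
    }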

The code reads the module content with readFileSync. As a rule, the synchronous versions of the file system APIs should be avoided, but here the choice is deliberate: CommonJS loads modules synchronously so that, when several modules are imported, they are loaded and executed in the correct dependency order.
Now let's implement the require function:

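    function require(moduleName) {
      // Resolve the module name to a full path, used as the module id
      const id = require.resolve(moduleName);
      // Already loaded: return the cached exports
      if (require.cache[id]) {
        return require.cache[id].exports
      }

      // Module metadata
      const module = {
        exports: {},
        id,
      }

      // Cache the module before running it
      require.cache[id] = module;

      // Run the module's code, which populates module.exports
      loadModule(id, module, require);

      // Return the exported variables
      return module.exports
    }

    require.cache = {};
    require.resolve = (moduleName) => {
      // Resolve the full module id from moduleName
    }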

The code above implements a simple require function. Several parts of this homemade module system deserve an explanation:

  • Given a moduleName, first resolve the module's full path (how the resolution works is discussed later) and store the result in the id variable
  • If the module has already been loaded, return the result from the cache immediately
  • If the module has not been loaded yet, set up its environment: create a module object with an exports property. The module's code populates this object when it exports its API
  • Cache the module object
  • Call loadModule with the newly created module object; the function runs the module's source, which fills in its exports
  • Return the content exported by the module

Module resolution algorithm

As mentioned earlier, the module name must be resolved to a full path. Given a module name, the resolve function returns the module's full path; that path is used both to load the module's code and to identify the module uniquely. The resolve function handles three main cases:

  • File modules: if moduleName starts with /, it is treated as an absolute path and returned as is. If it starts with ./, it is treated as a relative path, calculated from the directory of the module that requested the load
  • Core modules: if moduleName does not start with / or ./, the algorithm first looks for a Node.js core module with that name
  • Package modules: if no core module matches moduleName, the algorithm starts from the directory of the requesting module and looks for a matching module in its node_modules directory, loading it if it exists. If not, the search moves up the directory tree one level at a time, checking each node_modules directory, until it reaches the root of the filesystem

This way, two modules can depend on different versions of the same package and both still load correctly.
For example, take the following directory structure:

    myApp
      - index.js
      - node_modules
          - depA
              - index.js
          - depB
              - index.js
              - node_modules
                  - depA
          - depC
              - index.js
              - node_modules
                  - depA

In the example above, myApp, depB, and depC all depend on depA, but the module each of them loads is different:

  • In /myApp/index.js, the module loaded is /myApp/node_modules/depA
  • In /myApp/node_modules/depB/index.js, the module loaded is /myApp/node_modules/depB/node_modules/depA
  • In /myApp/node_modules/depC/index.js, the module loaded is /myApp/node_modules/depC/node_modules/depA

Node.js manages dependencies well because of this module resolution algorithm, which makes it possible to handle thousands of packages without naming conflicts or version incompatibilities.
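The three cases and the upward search can be sketched as follows. This is a simplified illustration, not the real resolver: classify and nodeModulesDirs are hypothetical helpers, POSIX paths are assumed, and file extensions, package.json fields, and the other details of the actual algorithm are ignored.

```javascript
const path = require('path').posix; // POSIX paths keep the output predictable
const { builtinModules } = require('module');

// Which of the three cases does a module name fall into?
function classify(moduleName) {
  if (moduleName.startsWith('/')) return 'absolute file';
  if (moduleName.startsWith('./') || moduleName.startsWith('../')) {
    return 'relative file';
  }
  if (builtinModules.includes(moduleName)) return 'core module';
  return 'package';
}

// Directories searched for a package, nearest first.
function nodeModulesDirs(fromDir) {
  const dirs = [];
  let dir = fromDir;
  while (true) {
    // A directory that is itself called node_modules is not extended again.
    if (path.basename(dir) !== 'node_modules') {
      dirs.push(path.join(dir, 'node_modules'));
    }
    const parent = path.dirname(dir);
    if (parent === dir) break; // reached the filesystem root
    dir = parent;
  }
  return dirs;
}

console.log(classify('./b'));  // relative file
console.log(classify('fs'));   // core module
console.log(classify('depA')); // package
console.log(nodeModulesDirs('/myApp/node_modules/depB'));
// [ '/myApp/node_modules/depB/node_modules',
//   '/myApp/node_modules',
//   '/node_modules' ]
```

For depB the nearest node_modules directory comes first, which is why it picks up its own copy of depA before the one that belongs to myApp.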

Circular dependencies

Many people treat circular dependencies as a purely theoretical design problem, but they show up in real projects, so it is worth knowing how CommonJS handles them. The require function implemented earlier already hints at the risk. The following example illustrates it.
[Figure: dependency diagram of main.js, a.js, and b.js]
The module main.js depends on two modules, a.js and b.js. At the same time, a.js depends on b.js, and b.js in turn depends on a.js, which creates a circular dependency. Here is the source code:

    // a.js
    exports.loaded = false;
    const b = require('./b');
    module.exports = {
      b,
      loaded: true
    }

    // b.js
    exports.loaded = false;
    const a = require('./a')
    module.exports = {
      a,
      loaded: false
    }

    // main.js
    const a = require('./a');
    const b = require('./b');
    console.log('A ->', JSON.stringify(a))
    console.log('B ->', JSON.stringify(b))

Running main.js gives the following result:

[Image: console output of main.js]
As the result shows, circular dependencies are risky in CommonJS. When b.js imports a.js, it gets incomplete content: it sees only the state a.js was in at the moment b.js required it, not the final state of a.js once it has finished loading.
The following diagram illustrates the process:
[Figure: UML diagram of the loading sequence]
The steps are explained below:

  1. The process starts in main.js, which begins by importing a.js
  2. The first thing a.js does is export a value called loaded, set to false
  3. a.js then requires b.js
  4. Like a.js, b.js first exports loaded as false
  5. b.js continues executing and requires a.js
  6. Because the system has already started processing a.js, b.js immediately receives whatever a.js has exported so far
  7. b.js then replaces its exports with a new object, in which loaded is false
  8. Now that b.js has finished executing, control returns to a.js, which receives the exported state of b.js
  9. a.js continues executing and sets its exported loaded value to true
  10. Finally, main.js executes

As shown above, because execution is synchronous, the a.js module that b.js imports is incomplete and does not reflect the final state of a.js.
The example shows the effect of a circular dependency on a small scale; in a large project the consequences can be much more serious.

Everyday usage of CommonJS is relatively simple, so for reasons of space this article does not cover it.

ESM

ESM is part of the ECMAScript 2015 specification, which establishes a unified module system for JavaScript across execution environments. An important difference from CommonJS is that ES modules are static: import statements must be written at the top level of the module. In addition, the module specifier must be a constant string; it cannot be an expression that is evaluated at runtime.
For example, we cannot import ES modules like this:

if (condition) {
  import module1 from 'module1'
} else {
  import module2 from 'module2'
}

CommonJS, on the other hand, can import different modules based on a condition:

let module = null
if (condition) {
  module = require("module1")
} else {
  module = require("module2")
}

This may seem stricter than CommonJS, but it is precisely this static import mechanism that makes it possible to analyze dependencies statically and remove code that will never be executed. This technique is called tree-shaking.

Module loading process

To understand how the ESM system works and how it handles circular dependencies, we need to understand how the system parses and executes JavaScript code.

Stages of loading modules

The goal of the interpreter is to build a graph that describes the dependencies between the modules to be loaded. This graph is called the dependency graph.
The interpreter uses the dependency graph to work out how modules depend on each other and in which order their code should be executed. For example, when asked to execute a js file, the interpreter starts from that entry point and looks for all import statements. Whenever it finds one, it recurses into the imported module in a depth-first manner until all the code has been parsed.
This process can be divided into three phases:

  1. Parsing: find all import statements and recursively load the content of each module from its file
  2. Instantiation: for each exported entity, keep a named reference in memory without assigning a value to it yet. The import and export statements are used to establish the links between modules; no js code is executed at this point
  3. Execution: Node.js finally runs the code, so that every exported entity receives its actual value

In CommonJS, files are executed while their dependencies are being resolved. When the runtime reaches a require call, all the code before it has already been executed, because require does not have to appear at the beginning of the file; it can appear anywhere.
ESM is different: the three phases are separate, and the complete dependency graph must be built before any code is executed.

Circular dependencies

Let's rewrite the earlier CommonJS circular dependency example using ESM:

    // a.js
    import * as bModule from './b.js';
    export let loaded = false;
    export const b = bModule;
    loaded = true;

    // b.js
    import * as aModule from './a.js';
    export let loaded = false;
    export const a = aModule;
    loaded = true;

    // main.js
    import * as a from './a.js';
    import * as b from './b.js';
    console.log("A =>", a)
    console.log("B =>", b)

Note that JSON.stringify cannot be used here, because the module objects contain circular references.
[Image: console output of main.js]
The output shows that a.js and b.js each observe the other in its complete, final state, unlike in CommonJS, where a module can receive an incomplete state.

Parsing

Let's analyze the process:
[Figure: UML diagram of the parsing phase]

Taking the figure above as an example:

  1. Parsing starts from main.js. The interpreter finds an import statement and moves into a.js
  2. In a.js it finds another import statement and moves into b.js
  3. In b.js it finds an import statement that refers to a.js. Because a.js has already been visited, this path is not followed again
  4. b.js has no other import statements, so the interpreter returns to a.js, which has no other import statements either, and then returns to the entry file main.js. There it finds an import of b.js, but that module has already been visited, so this path is not followed either

In this phase the interpreter starts from the entry point and analyzes the dependencies between modules. It only cares about import statements, loads the modules those statements refer to, and explores the dependency graph depth-first, visiting each module once. The traversal yields a tree-like structure, which the interpreter later uses to decide the order in which to execute the code.
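The traversal described above can be sketched with a plain depth-first search. The import lists are written by hand here as an assumption (a real loader extracts them by parsing each file), and imports and buildGraph are illustrative names.

```javascript
// Hypothetical import lists, as a parser would extract them.
const imports = {
  'main.js': ['a.js', 'b.js'],
  'a.js': ['b.js'],
  'b.js': ['a.js'],
};

function buildGraph(entry) {
  const visited = new Set();
  const order = [];
  (function visit(file) {
    if (visited.has(file)) return; // already explored: the cycle stops here
    visited.add(file);
    order.push(file);
    for (const dep of imports[file]) visit(dep);
  })(entry);
  return order;
}

console.log(buildGraph('main.js')); // [ 'main.js', 'a.js', 'b.js' ]
```

Each module is visited exactly once, so the cycle between a.js and b.js does not cause infinite recursion.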

Instantiation

In this phase, the interpreter starts at the bottom of the tree and works its way to the top. For each module it visits, it looks up all the properties the module exports and builds a table in memory that maps each exported name to the value it will eventually hold.
As shown below:

[Figure: flowchart of the order in which the modules are instantiated]
The figure shows the order in which the modules are instantiated:

  1. The interpreter starts with b.js and finds that it exports loaded and a
  2. It then analyzes a.js and finds that it exports loaded and b
  3. Finally, it analyzes main.js and finds that it exports nothing

The exports map built in this phase only records the relationship between each exported name and the value it will have. The values themselves are not initialized in this phase.
After this pass, the interpreter makes another pass, this time linking the names exported by each module to the modules that import them, as shown in the following figure:

[Figure: flowchart of the links between exports and imports]
The steps this time are:

  1. b.js is linked to the exports of a.js; this link is called aModule
  2. a.js is linked to the exports of b.js; this link is called bModule
  3. Finally, main.js is linked to the exports of a.js and b.js

None of the values are initialized at this stage. We only establish the links so that they point to the right places; the values themselves are determined in the next phase.

Execution

In this phase, the system finally executes the code in each file. It walks the original dependency graph bottom-up, in depth-first post-order, and executes the files one by one as it visits them. In this example, main.js is executed last. This order guarantees that by the time the main logic runs, the values exported by every module have been initialized.
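A post-order traversal of the same example graph gives the execution order. As before, this is only a sketch: the import lists are hand-written assumptions, and executionOrder is an illustrative name.

```javascript
// Hypothetical import lists for the three files in the example.
const imports = {
  'main.js': ['a.js', 'b.js'],
  'a.js': ['b.js'],
  'b.js': ['a.js'],
};

function executionOrder(entry) {
  const visited = new Set();
  const order = [];
  (function visit(file) {
    if (visited.has(file)) return;
    visited.add(file);
    for (const dep of imports[file]) visit(dep);
    order.push(file); // recorded only after its dependencies: post-order
  })(entry);
  return order;
}

console.log(executionOrder('main.js')); // [ 'b.js', 'a.js', 'main.js' ]
```

The entry file comes last, after everything it depends on.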

[Figure: UML diagram of the execution order]
The specific steps in the figure are:

  1. Execution starts with b.js. The first line executed initializes the module's loaded export to false
  2. Next, aModule is assigned to a. At this point a holds a reference to the a.js module
  3. Then loaded is set to true. All the exported values of b.js are now determined
  4. Now a.js executes. It first initializes its loaded export to false
  5. Next, the b export receives its value, a reference to bModule
  6. Finally, loaded is changed to true. All the values exported by a.js are now determined

After these steps, the system can execute main.js itself. By then every exported property of every module has been evaluated. Because modules are imported by reference rather than by copy, each module can see the complete final state of the other even when there is a circular dependency between them.
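The "by reference rather than by copy" behaviour can be imitated in ordinary JavaScript with getters. This is only a model, written under the assumption that a namespace object reads the exporting module's variables on every access; makeModule, vars, and namespace are made-up names, and real ES modules report uninitialized bindings with an error instead of returning undefined.

```javascript
// Live bindings modelled with getters: a namespace reads the exporting
// module's current variables instead of holding copies.
function makeModule(names) {
  const vars = {}; // the module's own variables
  const namespace = {};
  for (const name of names) {
    Object.defineProperty(namespace, name, {
      enumerable: true,
      get: () => vars[name],
    });
  }
  return { vars, namespace };
}

// Instantiation: the export names exist, the values are not assigned yet.
const a = makeModule(['loaded', 'b']);
const b = makeModule(['loaded', 'a']);

// Execution in post-order: b.js first, then a.js.
b.vars.loaded = false;
b.vars.a = a.namespace; // a link to a.js, not a snapshot
b.vars.loaded = true;

a.vars.loaded = false;
a.vars.b = b.namespace;
a.vars.loaded = true;

// What main.js observes: b sees the final state of a although b ran first.
console.log(b.namespace.a.loaded);            // true
console.log(a.namespace.b.loaded);            // true
console.log(a.namespace.b.a === a.namespace); // true
```

Compare this with the CommonJS simulation, where b kept an early exports object of a with loaded still false.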

Differences between CommonJS and ESM and interoperability

Here are some important differences between CommonJS and ESM, and how to use the two kinds of modules together when necessary.

ESM does not support some references provided by CommonJS

CommonJS provides several key references that are not available in ESM, including require, exports, module.exports, __filename, and __dirname. Using any of them in an ES module causes a ReferenceError.
In ESM, the special object import.meta provides information about the current module. In particular, import.meta.url holds the URL of the current module file, in a form similar to file:///path/to/current_module.js. From this URL we can reconstruct the absolute paths represented by __filename and __dirname:

    import { fileURLToPath } from 'url';
    import { dirname } from 'path';
    // Convert the module URL to a file path, then take its directory
    const __filename = fileURLToPath(import.meta.url);
    const __dirname = dirname(__filename);
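The conversion itself can be tried outside an ES module. The sketch below is written as a CommonJS script so that it runs directly; it assumes a made-up module location (demo/current_module.js under the current directory) and builds the URL with pathToFileURL, where an ES module would read import.meta.url.

```javascript
const path = require('path');
const { fileURLToPath, pathToFileURL } = require('url');

// Hypothetical module location; import.meta.url would hold a URL like this.
const modulePath = path.resolve('demo', 'current_module.js');
const moduleUrl = pathToFileURL(modulePath).href;

// The same two steps as in the ESM snippet above.
const filename = fileURLToPath(moduleUrl);
const dirnameOfModule = path.dirname(filename);

console.log(moduleUrl.startsWith('file://'));          // true
console.log(filename === modulePath);                  // true
console.log(dirnameOfModule === path.resolve('demo')); // true
```

Going from path to URL and back yields the original path, which is why fileURLToPath is the right tool here rather than manual string slicing.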

The require function of CommonJS can also be recreated in an ES module as follows:

    import { createRequire } from 'module';
    const require = createRequire(import.meta.url)

You can now use this require() function to load CommonJS modules from within an ES module.

Using one module system from the other

As shown above, module.createRequire lets an ES module load a CommonJS module. Alternatively, a CommonJS module can be imported directly with the import statement. In that case the value of module.exports is available as the default export; named imports work only when Node.js can detect the export names statically, so they are not guaranteed:

    import pkg from 'commonJS-module'
    import { method1 } from 'commonJS-module' // throws if the named export cannot be detected

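The reverse direction is more limited: CommonJS code cannot use the static import statement, and require() has traditionally been unable to load ES modules. The asynchronous import() expression is the usual way to load an ES module from CommonJS code.
In addition, a plain ESM import statement cannot load a JSON file as a module, whereas CommonJS handles this easily.
The following import statement reports an error: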

    import json from './data.json'

If you need to import a JSON file, you can again use the createRequire function:

    import { createRequire } from 'module';
    const require = createRequire(import.meta.url);
    const data = require("./data.json");
    console.log(data)
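The createRequire approach can be tested end to end with a throwaway file. The sketch below is a CommonJS script for convenience; the temporary directory, the JSON content, and the name requireFrom are assumptions made for the demo, and in an ES module the argument to createRequire would be import.meta.url.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { createRequire } = require('module');

// Temporary directory standing in for a project folder (hypothetical data).
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'json-demo-'));
fs.writeFileSync(
  path.join(dir, 'data.json'),
  JSON.stringify({ name: 'demo', items: [1, 2] })
);

// createRequire takes the file the new require should resolve relative to;
// that file does not have to exist.
const requireFrom = createRequire(path.join(dir, 'index.js'));
const data = requireFrom('./data.json');

console.log(data.name);         // demo
console.log(data.items.length); // 2

fs.rmSync(dir, { recursive: true, force: true }); // clean up
```

The JSON file is parsed and returned as a plain object, exactly as require does in a CommonJS module.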

Summary

This article explains how the two module systems in Node.js work. Understanding these mechanisms helps us avoid bugs that are otherwise hard to track down.


云中歌