Definitely the clearest one - NodeJS module system

NodeJS currently has two module systems: CommonJS (CJS for short) and ECMAScript modules (ESM for short). This article covers three topics:

How CommonJS works internally
The ESM module system on the NodeJS platform
The differences between CommonJS and ESM, and how to interoperate between the two systems
First, let's look at why a module system is needed at all.

Why have a module system

A good language needs a module system, because it addresses several basic engineering needs:

Splitting functionality into modules keeps the code organized and easier to understand, and lets us develop and test each submodule independently
Functionality can be packaged once and then imported directly by other modules, which improves reusability
Encapsulation: a module only needs to expose a simple, documented interface to the outside world, while its internal implementation stays hidden, which lowers the cost of understanding it
Dependency management: a good module system lets developers build new modules on top of existing third-party modules, and lets users import the module they want together with the modules on its dependency chain
In its early days JavaScript had no real module system, and pages pulled in their resources through multiple script tags. As applications grew more complex, the script-tag approach could no longer keep up, so the community began defining module systems of its own, such as AMD and UMD.
NodeJS is a server-side JavaScript runtime. Unlike a browser page, it has no HTML script tags for importing files and relies entirely on the js files of the local file system, so NodeJS implemented a module system based on the CommonJS specification.
The ES2015 specification, released in 2015, gave JavaScript a formal standard for modules. The module system built on this standard is called ESM, and it makes module management more consistent between the browser and the server.

CommonJS modules

The CommonJS specification rests on two basic ideas:

Users can import a module from the local file system through the require function
A module publishes its functionality to the outside through two special variables, exports and module.exports
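To make the two ideas concrete, here is a minimal sketch. It assumes a scratch directory created under the system temp folder, and the module names logger.js and greet.js are made up for illustration. One module adds a property to exports, the other replaces the exported value through module.exports.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical scratch directory holding two tiny modules.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cjs-basics-'));

// logger.js adds a property to the exports object it was given.
fs.writeFileSync(
  path.join(dir, 'logger.js'),
  "exports.info = (msg) => 'info: ' + msg;\n"
);

// greet.js replaces the whole exported value through module.exports.
fs.writeFileSync(
  path.join(dir, 'greet.js'),
  "module.exports = (name) => 'hello ' + name;\n"
);

const logger = require(path.join(dir, 'logger.js'));
const greet = require(path.join(dir, 'greet.js'));

const infoLine = logger.info('started');
const greeting = greet('node');
console.log(infoLine); // info: started
console.log(greeting); // hello node
```

In the first case require returns an object with an info method. In the second case it returns the function itself, because module.exports was reassigned.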
Module loader
Here is a minimal implementation of a module loader.
First comes the function that loads the content of a module. It wraps the module source in a function so that the code runs in a private scope and does not pollute the global environment, and then runs that wrapper with eval.
```
const fs = require('fs');

function loadModule(filename, module, require) {
  // Wrap the module source in a function so it gets its own private scope
  const wrappedSrc = `
    (function (module, exports, require) {
      ${fs.readFileSync(filename, 'utf-8')}
    })(module, module.exports, require)
  `;
  eval(wrappedSrc);
}
```
The module content is read with readFileSync. As a rule, the synchronous versions of the file system APIs should be avoided, but here the choice is deliberate: CommonJS loads modules synchronously so that multiple modules are loaded in their proper dependency order.
Now let's implement the require function:
```
function require(moduleName) {
  const id = require.resolve(moduleName);
  if (require.cache[id]) {
    return require.cache[id].exports;
  }

  // Module metadata
  const module = {
    exports: {},
    id,
  };

  require.cache[id] = module;

  loadModule(id, module, require);

  // Return the exported values
  return module.exports;
}

require.cache = {};
require.resolve = (moduleName) => {
  // Resolve the full module id from moduleName
};
```
The code above implements a simple require function. Several parts of this home-made module system deserve an explanation:
Given the moduleName, it first resolves the full path of the module (how resolution works is discussed later) and stores the result in the id variable
If the module has already been loaded, the cached result is returned immediately
If the module has not been loaded yet, it sets up an environment for the first load. Specifically, it creates a module object with an exports property; the module's own code fills this object in when it exports its API
It caches the module object
It calls loadModule with the newly created module object, which reads the module source and runs it so that the module can populate its exports
It returns the exported content of the module to the caller
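The steps above can be put together into something runnable. The following is only a sketch under stated assumptions: toyRequire is a hypothetical name, resolution is reduced to path.resolve on a file path that includes its extension, and the demo module counter.js is written to a scratch directory. It is not how NodeJS itself implements require.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Run a module's source inside a function wrapper with its own scope.
function loadModule(filename, module, requireFn) {
  const source = fs.readFileSync(filename, 'utf8');
  const wrapper = eval(`(function (module, exports, require) {\n${source}\n})`);
  wrapper(module, module.exports, requireFn);
}

const cache = {};

// toyRequire is a hypothetical name. Assumption: moduleName is always a
// relative or absolute file path that includes its extension.
function toyRequire(moduleName, baseDir) {
  const id = path.resolve(baseDir, moduleName);
  if (cache[id]) {
    return cache[id].exports; // already loaded: reuse the cached exports
  }
  const module = { exports: {}, id };
  cache[id] = module; // cache before running the module code
  // Nested require calls resolve relative to the module being loaded.
  loadModule(id, module, (name) => toyRequire(name, path.dirname(id)));
  return module.exports;
}

// Demo: a module with private state, required twice.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'toy-loader-'));
fs.writeFileSync(
  path.join(dir, 'counter.js'),
  'let n = 0;\nexports.next = () => ++n;\n'
);

const first = toyRequire('./counter.js', dir);
const second = toyRequire('./counter.js', dir);
const values = [first.next(), second.next()];

console.log(first === second); // true: the second call hit the cache
console.log(values); // [ 1, 2 ]: both names share one module instance
```

Because the module object is cached before its code runs, the module source is evaluated only once, and every caller shares the same exports object.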
Module Resolution Algorithm
Resolving the full path of a module was mentioned earlier. Given a module name, the resolve function returns the full path of the module. That path is used both to load the module's code and to identify the module uniquely. The resolve function mainly deals with the following three cases:
File modules: if moduleName starts with /, it is treated as an absolute path and returned as is. If moduleName starts with ./, it is treated as a relative path, calculated from the directory of the module that issued the require call
Core modules: if moduleName does not start with / or ./, the algorithm first tries to find a NodeJS core module with that name
Package modules: if no core module matches moduleName, the algorithm starts from the directory of the requiring module and looks for a matching module in its node_modules directory, loading it if it exists. If not, it moves up to the parent directory and checks the node_modules directory there, continuing all the way to the root of the file system
This way, two modules can depend on different versions of the same package and both still load correctly.
Take the following directory structure as an example:
```
myApp
  - index.js
  - node_modules
      - depA
          - index.js
      - depB
          - index.js
          - node_modules
              - depA
      - depC
          - index.js
          - node_modules
              - depA
```
In the example above, myApp, depB, and depC all depend on depA, but the modules they actually load are different. For example:
In /myApp/index.js, the module loaded is /myApp/node_modules/depA
In /myApp/node_modules/depB/index.js, the module loaded is /myApp/node_modules/depB/node_modules/depA
In /myApp/node_modules/depC/index.js, the module loaded is /myApp/node_modules/depC/node_modules/depA
NodeJS can manage dependencies this well because of the module resolution algorithm at its core, which lets it handle thousands of packages while avoiding name conflicts and version incompatibilities.
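The upward search through node_modules directories can be sketched in a few lines. This is an illustration, not the real NodeJS resolver: nodeModulesCandidates and resolvePackage are hypothetical helpers, and a hard-coded set of paths stands in for the directory tree of the example instead of touching the file system.

```javascript
const path = require('path');

// Hypothetical helper: the node_modules directories to search, nearest first.
// path.posix keeps the output identical on every platform.
function nodeModulesCandidates(fromDir) {
  const candidates = [];
  let dir = fromDir;
  while (true) {
    // A directory that is itself named node_modules gets no nested lookup.
    if (path.posix.basename(dir) !== 'node_modules') {
      candidates.push(path.posix.join(dir, 'node_modules'));
    }
    const parent = path.posix.dirname(dir);
    if (parent === dir) break; // reached the file system root
    dir = parent;
  }
  return candidates;
}

// Assumption: this set stands in for the directory tree of the example.
const installed = new Set([
  '/myApp/node_modules/depA',
  '/myApp/node_modules/depB',
  '/myApp/node_modules/depB/node_modules/depA',
  '/myApp/node_modules/depC',
  '/myApp/node_modules/depC/node_modules/depA',
]);

// Hypothetical helper: return the first matching package directory, if any.
function resolvePackage(name, fromDir) {
  for (const dir of nodeModulesCandidates(fromDir)) {
    const candidate = path.posix.join(dir, name);
    if (installed.has(candidate)) return candidate;
  }
  return null; // not found anywhere up to the root
}

const fromApp = resolvePackage('depA', '/myApp');
const fromDepB = resolvePackage('depA', '/myApp/node_modules/depB');
const fromDepC = resolvePackage('depA', '/myApp/node_modules/depC');
console.log(fromApp);  // /myApp/node_modules/depA
console.log(fromDepB); // /myApp/node_modules/depB/node_modules/depA
console.log(fromDepC); // /myApp/node_modules/depC/node_modules/depA
```

The nearest node_modules directory wins, which is why each of the three requesters ends up with its own copy of depA.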
Circular dependency
Many people regard circular dependencies as a purely theoretical design problem, but they are quite likely to show up in real projects, so it is worth knowing how CommonJS handles them. The risk is already visible in the require function implemented earlier. The following example illustrates it.

Suppose a module main.js depends on two modules, a.js and b.js. a.js depends on b.js, and b.js in turn depends on a.js, which creates a circular dependency. Here is the source code:
```
// a.js
exports.loaded = false;
const b = require('./b');
module.exports = {
  b,
  loaded: true,
};

// b.js
exports.loaded = false;
const a = require('./a');
module.exports = {
  a,
  loaded: true,
};

// main.js
const a = require('./a');
const b = require('./b');
console.log('A ->', JSON.stringify(a));
console.log('B ->', JSON.stringify(b));
```
Running main.js gives the following result

As the result shows, CommonJS carries a real risk with circular dependencies. When b.js imports a.js, what it receives is incomplete: it only reflects the state a.js was in at the moment it required b.js, not the state a.js has once it has finished loading.
The following diagram illustrates this process:
[Figure: UML diagram (1)]
The process, step by step:

Everything starts from main.js, which begins by importing the a.js module
The first thing a.js does is export a value called loaded, set to false
a.js then requires the b.js module
Like a.js, b.js first exports loaded as false
b.js continues executing and requires a.js
Because the system has already started processing a.js, the require call immediately returns whatever a.js has exported so far, and b.js keeps that value
b.js finishes executing and replaces its exports with a new object that holds this view of a.js together with its own final loaded value
Now that b.js has run to completion, control returns to a.js, which receives the exports of b.js
a.js continues executing and replaces its exports with a new object whose loaded value is true
Finally, main.js resumes and runs its remaining code
As the steps show, because execution is synchronous, the a.js module that b.js imports is incomplete and does not reflect the final state of a.js.
The example shows what a circular dependency can lead to, and the problem becomes harder to spot in large projects.
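The stale view can be reproduced with a stripped-down cycle. This sketch assumes a scratch directory under the system temp folder and reduces b.js to the single thing that matters here, its reference to a.js.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical scratch directory with a two-module cycle.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cjs-cycle-'));

fs.writeFileSync(path.join(dir, 'a.js'), [
  'exports.loaded = false;',
  "const b = require('./b');",
  'module.exports = { b, loaded: true }; // replaces the object b already holds',
].join('\n'));

fs.writeFileSync(path.join(dir, 'b.js'), [
  "const a = require('./a'); // a.js is still running at this point",
  'module.exports = { a };',
].join('\n'));

const a = require(path.join(dir, 'a.js'));
const b = require(path.join(dir, 'b.js'));

console.log(a.loaded);   // true: the finished a.js
console.log(b.a.loaded); // false: the partial exports b.js received
console.log(b.a === a);  // false: b.js holds the object a.js later replaced
```

b.js ends up with the original exports object of a.js, which a.js discards when it assigns a new object to module.exports. That is why the two views never converge.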

Basic usage is relatively simple and, for reasons of space, is not covered in this article.

ESM

ESM is part of the ECMAScript 2015 specification, which establishes a unified module system for JavaScript that works across execution environments. An important difference between ESM and CommonJS is that ES modules are static: import statements must be written at the top level of a module. In addition, the module specifier must be a constant string and cannot be an expression that has to be evaluated at runtime.
For example, ES modules cannot be imported like this:

```
if (condition) {
  import module1 from 'module1'
} else {
  import module2 from 'module2'
}
```

CommonJS, by contrast, can import different modules depending on a condition:

```
let module = null
if (condition) {
  module = require("module1")
} else {
  module = require("module2")
}
```

This looks stricter than CommonJS, but it is precisely this static import mechanism that makes it possible to analyze dependencies statically and remove code that is never used. This is called tree-shaking.

Module loading process

To understand how the ESM system works and how it handles circular dependencies, we need to understand how the system parses and executes JavaScript code.

Stages of loading modules

The goal of the interpreter is to build a graph that describes the dependencies between the modules to be loaded. This graph is called the dependency graph.
The interpreter uses the dependency graph to work out how modules depend on each other and in what order their code should run. For example, when asked to execute a js file, the interpreter starts from that entry point and looks for all import statements. Whenever it finds one, it recurses into the imported module in a depth-first manner until all the code has been parsed.
This can be broken down into three phases:

Parsing: find all import statements and recursively load the content of each module from the corresponding file
Instantiation: for every exported entity, keep a named reference in memory without assigning it a value yet. The links between modules are established from the import and export keywords, and no js code is executed at this point
Execution: NodeJS finally runs the code, so that the exported entities receive their actual values
In CommonJS, files are executed while their dependencies are being resolved. When execution reaches a require call, all the code before it has already run, because require does not have to appear at the beginning of a file and can be called anywhere.
ESM is different: the three phases are separate, and the complete dependency graph must be built before any code starts to execute.
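The difference in ordering is easy to observe. The sketch below is an assumption-laden demo rather than a description of NodeJS internals: it writes two tiny CommonJS files and two tiny ES module files (hypothetical names) into a scratch directory and runs each entry file in a child NodeJS process to capture the order of the log lines.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { execFileSync } = require('child_process');

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'load-order-'));
const write = (name, source) => fs.writeFileSync(path.join(dir, name), source);
// Run a file in a fresh NodeJS process and return its output lines.
const run = (name) =>
  execFileSync(process.execPath, [path.join(dir, name)], { encoding: 'utf8' })
    .trim()
    .split('\n');

// CommonJS: the dependency runs when execution reaches require().
write('dep.cjs', "console.log('dep');\n");
write(
  'main.cjs',
  "console.log('main start');\nrequire('./dep.cjs');\nconsole.log('main end');\n"
);

// ESM: the import is processed before any line of main.mjs runs.
write('dep.mjs', "console.log('dep');\n");
write(
  'main.mjs',
  "console.log('main start');\nimport './dep.mjs';\nconsole.log('main end');\n"
);

const cjsOrder = run('main.cjs');
const esmOrder = run('main.mjs');
console.log(cjsOrder); // [ 'main start', 'dep', 'main end' ]
console.log(esmOrder); // [ 'dep', 'main start', 'main end' ]
```

In the ES module version the dependency logs first even though the import statement is written on the second line, because the whole graph is loaded and linked before main.mjs starts executing.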
Circular dependency
Let's rewrite the earlier CommonJS circular-dependency example with ESM:
```
// a.js
import * as bModule from './b.js';
export let loaded = false;
export const b = bModule;
loaded = true;

// b.js
import * as aModule from './a.js';
export let loaded = false;
export const a = aModule;
loaded = true;

// main.js
import * as a from './a.js';
import * as b from './b.js';
console.log('A =>', a);
console.log('B =>', b);
```
Note that JSON.stringify cannot be used here, because the exported objects contain circular references.

In the output, both a.js and b.js see each other in full, unlike in CommonJS, where a module can end up with an incomplete view of another module.
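Since the module objects cannot be serialized, one way to check the claim is to print a few booleans instead. This is a sketch under assumptions: the three files are written with the .mjs extension into a scratch directory so that NodeJS treats them as ES modules, and the entry file is run in a child process.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { execFileSync } = require('child_process');

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'esm-cycle-'));
const write = (name, lines) =>
  fs.writeFileSync(path.join(dir, name), lines.join('\n'));

write('a.mjs', [
  "import * as bModule from './b.mjs';",
  'export let loaded = false;',
  'export const b = bModule;',
  'loaded = true;',
]);
write('b.mjs', [
  "import * as aModule from './a.mjs';",
  'export let loaded = false;',
  'export const a = aModule;',
  'loaded = true;',
]);
write('main.mjs', [
  "import * as a from './a.mjs';",
  "import * as b from './b.mjs';",
  '// Print plain booleans instead of the circular module objects.',
  'console.log(JSON.stringify({',
  '  aLoaded: a.loaded,',
  '  bLoaded: b.loaded,',
  '  bSeesA: b.a.loaded,',
  '  aSeesB: a.b.loaded,',
  '  sameObject: b.a === a,',
  '}));',
]);

const output = execFileSync(process.execPath, [path.join(dir, 'main.mjs')], {
  encoding: 'utf8',
});
const result = JSON.parse(output);
console.log(result); // every field is true
```

Each module reports the other as fully loaded, and the a that b.mjs holds is the very same module object that main.mjs imports.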

Parsing

Let's walk through the process:
[Figure: UML diagram (2)]

Taking the figure above as an example:

Parsing starts from main.js, where the first import statement is found, leading into a.js
In a.js another import statement is found, leading into b.js
In b.js an import statement that points back to a.js is found. Because a.js has already been visited, this path is not explored again
b.js has no other import statements, so the interpreter returns to a.js, which has none left either, and then back to the entry file main.js. There it finds an import of b.js, but that module has already been visited, so this path is not explored again
After this depth-first traversal, the module dependency graph has taken the shape of a tree, and the interpreter later executes the code by following it
In this phase the interpreter starts from the entry point and works out the dependencies between modules. It only cares about import statements: it loads the modules those statements refer to and explores the dependency graph depth-first. Each module is visited only once, so the result is a tree-like structure.
Instantiation
In this phase the interpreter starts at the bottom of the tree and works its way to the top. For each module it visits, it looks up all the properties the module exports and builds an in-memory map from each exported name to the value that name will eventually hold.
As shown below:

[Figure: flowchart]
The figure shows the order in which the modules are instantiated:

The interpreter starts with the b.js module and finds that it exports loaded and a
It then analyzes the a.js module and finds that it exports loaded and b
Finally, it analyzes the main.js module and finds that it exports nothing
The exports map built during instantiation only records which exported name goes with which future value. The values themselves are not initialized in this phase.
After that, the interpreter makes another pass, in which it links the names exported by each module to the modules that import them, as shown in the following figure:

[Figure: flowchart (1)]
The steps this time are:

Module b.js is linked to the content exported by module a.js, under the name aModule
Module a.js is linked to the content exported by module b.js, under the name bModule
Finally, module main.js is linked to the content exported by a.js and b.js, under the names a and b
No value is initialized in this phase either. Only the links are established, so that they point at the places where the values will live; the values themselves are determined in the next phase.
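That a link exists before its value does can be observed directly. In the sketch below (hypothetical .mjs files in a scratch directory, run in a child process), b.mjs reads an export of a.mjs while a.mjs has not executed yet, and then exposes a function that reads the same link later.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { execFileSync } = require('child_process');

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'esm-linking-'));
const write = (name, lines) =>
  fs.writeFileSync(path.join(dir, name), lines.join('\n'));

write('a.mjs', [
  "import './b.mjs';",
  'export let loaded = false;',
  'loaded = true;',
]);
write('b.mjs', [
  "import * as aModule from './a.mjs';",
  '// b.mjs runs before a.mjs, so a.loaded is linked but has no value yet.',
  'let seen;',
  'try {',
  '  seen = String(aModule.loaded);',
  '} catch (err) {',
  '  seen = err.name;',
  '}',
  'export const seenWhileLoading = seen;',
  '// A function that runs later reads the same link and gets the final value.',
  'export const readLater = () => aModule.loaded;',
]);
write('main.mjs', [
  "import './a.mjs';",
  "import { seenWhileLoading, readLater } from './b.mjs';",
  'console.log(JSON.stringify({ seenWhileLoading, later: readLater() }));',
]);

const output = execFileSync(process.execPath, [path.join(dir, 'main.mjs')], {
  encoding: 'utf8',
});
const result = JSON.parse(output);
console.log(result); // { seenWhileLoading: 'ReferenceError', later: true }
```

Reading the export too early throws a ReferenceError instead of returning a stale copy, and the same link yields the final value once a.mjs has run.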
Execution
In this phase the system finally executes the code in each file. It walks the original dependency graph bottom-up, in depth-first post-order, and executes the files it visits one by one. In this example main.js is executed last. This order guarantees that by the time the main logic of the program runs, all the values exported by the modules have been initialized.

[Figure: UML diagram]
The specific steps in the figure are:

Execution starts with b.js. The first line to run initializes the loaded export of the module to false
Next, aModule is assigned to the export a, so a now holds a reference to the a.js module
Then loaded is set to true. At this point all the values exported by b.js are fully determined
Now a.js is executed. It first initializes its loaded export to false
Next, the b export of the module receives its value, which is a reference to bModule
Finally, loaded is changed to true. At this point all the values exported by a.js are determined as well
Once these steps are complete, the system can execute main.js. By then every exported property of every module has been evaluated. Because the system imports modules by reference rather than by copying, each module can see the complete final state of the other even when there is a circular dependency between them.
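Importing by reference rather than by copy can be shown with a small counter. This is a sketch with hypothetical file names, written to a scratch directory and run in child processes. The CommonJS version exports a copy of a number, while the ES module version exports the variable itself.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { execFileSync } = require('child_process');

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'live-bindings-'));
const write = (name, lines) =>
  fs.writeFileSync(path.join(dir, name), lines.join('\n'));
// Run an entry file in a child process and parse its JSON output.
const run = (name) =>
  JSON.parse(
    execFileSync(process.execPath, [path.join(dir, name)], { encoding: 'utf8' })
  );

// CommonJS: the exported object stores a copy of the number.
write('counter.cjs', [
  'let count = 0;',
  'module.exports = { count, increment: () => { count += 1; } };',
]);
write('main.cjs', [
  "const counter = require('./counter.cjs');",
  'const before = counter.count;',
  'counter.increment();',
  'console.log(JSON.stringify([before, counter.count]));',
]);

// ESM: the imported name stays linked to the exporting module's variable.
write('counter.mjs', [
  'export let count = 0;',
  'export const increment = () => { count += 1; };',
]);
write('main.mjs', [
  "import { count, increment } from './counter.mjs';",
  'const before = count;',
  'increment();',
  'console.log(JSON.stringify([before, count]));',
]);

const cjsCounts = run('main.cjs');
const esmCounts = run('main.mjs');
console.log(cjsCounts); // [ 0, 0 ]
console.log(esmCounts); // [ 0, 1 ]
```

The importer of the ES module sees the update without importing again, which is the same mechanism that lets modules in a cycle observe each other's final state.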
Differences between CommonJS and ESM, and using them together
Here are some important differences between CommonJS and ESM, and how to use the two kinds of modules together when necessary.
ESM does not support some references provided by CommonJS
CommonJS provides several key references that ESM does not support, including require, exports, module.exports, __filename, and __dirname. Using any of them in an ES module results in a ReferenceError.
In ESM, the special object import.meta gives access to information about the current file. Specifically, import.meta.url holds the URL of the current module, in a form like file:///path/to/current_module.js. From this URL we can reconstruct the two absolute paths represented by __filename and __dirname:
```
import { fileURLToPath } from 'url';
import { dirname } from 'path';
// Convert the file: URL of this module into an absolute file path
const __filename = fileURLToPath(import.meta.url);
const __dirname = dirname(__filename);
```
The CommonJS require function can also be recreated inside an ES module:
```
import { createRequire } from 'module';
const require = createRequire(import.meta.url)
```
Now this require() function can be used to load CommonJS modules from within an ES module.
Using modules from the other system
The module.createRequire function can be used in an ES module to load a CommonJS module. Besides that, CommonJS modules can also be imported directly with the import syntax. However, only the default export is guaranteed to be available this way:
```
import pkg from 'commonJS-module'
import { method1 } from 'commonJS-module' // may throw: named exports are not always available
```
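A minimal sketch of the default-import case, assuming a made-up CommonJS file legacy.cjs in a scratch directory: the ES module receives the whole module.exports object as its default import and reaches method1 through it.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { execFileSync } = require('child_process');

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'cjs-default-'));
const write = (name, lines) =>
  fs.writeFileSync(path.join(dir, name), lines.join('\n'));

// legacy.cjs is a hypothetical CommonJS module exporting one method.
write('legacy.cjs', [
  "module.exports = { method1: () => 'called method1' };",
]);

// The default import of a CommonJS module is its module.exports value.
write('main.mjs', [
  "import pkg from './legacy.cjs';",
  'console.log(pkg.method1());',
]);

const output = execFileSync(process.execPath, [path.join(dir, 'main.mjs')], {
  encoding: 'utf8',
}).trim();
console.log(output); // called method1
```

Going through the default import works regardless of how the CommonJS module assembles its exports, which makes it the safer choice of the two import forms.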
But the reverse is not possible in CommonJS: we cannot require an ES module from a CommonJS module.
In addition, ESM does not support importing json files as modules with a plain import statement, which is easy to do in CommonJS.
The following import statement reports an error:
```
import json from './data.json'
```
If you need to import a json file, you also need to use the createRequire function:
```
import { createRequire } from 'module';
const require = createRequire(import.meta.url);
const data = require("./data.json");
console.log(data)
```

Summary

This article explained how the two module systems in NodeJS work. Understanding these mechanisms helps us avoid some hard-to-diagnose bugs in the code we write.

云中歌
