The sixth part of the Webpack series: How to write a loader

This article runs to about 5,000 words and gives an in-depth analysis of the characteristics, operating mechanism, and development techniques of Webpack loaders. Likes and follows are welcome. Writing takes effort; reprinting in any form without the author's consent is prohibited!

There is already a great deal of material about Webpack loaders on the Internet, and it is hard to say anything new about them. Still, a series of Webpack blog posts cannot skip the topic, so I read more than 20 open source projects and tried to summarize, as comprehensively as possible, the knowledge and techniques you need when writing a loader.

So, let's get started.

Meet Loader

In short, I think a loader is a content translator with side effects!

At its core, a Webpack loader is just a content converter: it turns all kinds of resources into standard JavaScript content, for example:

css-loader converts css into the form __WEBPACK_DEFAULT_EXPORT__ = ".a{ xxx }"
html-loader converts html into the form __WEBPACK_DEFAULT_EXPORT__ = "<!DOCTYPE xxx"
vue-loader is more complicated: it converts a .vue file into multiple JavaScript functions, corresponding to template, js, css, and custom blocks

So why is this conversion needed? Essentially because Webpack only recognizes text that conforms to the JavaScript specification (other parsers were added in Webpack 5). In the make phase, when parsing module content, Webpack calls acorn to convert the text into an AST object, then analyzes the code structure and the module's dependencies. This logic does not work for images, json, Vue SFCs, and similar resources, so a loader has to step in and transform them into a form Webpack can understand.

Plugins are Webpack's other extension mechanism, and a more powerful one. A plugin can insert specialized processing logic into the hooks of various objects and can cover the entire Webpack lifecycle; its capability, flexibility, and complexity are all much greater than a loader's.

Loader basics

At the code level, Loader is usually a function with the following structure:

module.exports = function (source, sourceMap, data) {
  // source is the loader's input: either the file content or the result of the previous loader
  // sourceMap and data are optional
  return source;
};

The loader function receives three parameters:

source : The resource input. For the first loader executed, it is the content of the resource file; for subsequent loaders, it is the result of the previous loader
sourceMap : Optional; the sourcemap structure of the code
data : Optional; other information that needs to be passed along the loader chain. For example, posthtml/posthtml-loader passes an AST object through this parameter

Among them, source is the most important. What most loaders do is translate source into another form of output, as in the core source code of webpack-contrib/raw-loader:

//... 
export default function rawLoader(source) {
  // ...

  const json = JSON.stringify(source)
    .replace(/\u2028/g, '\\u2028')
    .replace(/\u2029/g, '\\u2029');

  const esModule =
    typeof options.esModule !== 'undefined' ? options.esModule : true;

  return `${esModule ? 'export default' : 'module.exports ='} ${json};`;
}

The function of this code is to wrap the text content into a JavaScript module, for example:

// source
I am Tecvan

// output
module.exports = "I am Tecvan"

After modular packaging, this text content turns into a resource module that Webpack can handle, and other modules can also reference and use it.
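
As a minimal sketch of the same idea, the loader below wraps arbitrary text in a CommonJS module and is then called by hand like an ordinary function. The name textLoader and the sample input are made up for illustration; this is not raw-loader's actual code.

```javascript
// Minimal sketch of a text-to-module loader (hypothetical name: textLoader).
function textLoader(source) {
  // JSON.stringify escapes quotes and newlines, so the text becomes a valid JS string literal.
  const json = JSON.stringify(source)
    .replace(/\u2028/g, '\\u2028')
    .replace(/\u2029/g, '\\u2029');
  return `module.exports = ${json};`;
}

module.exports = textLoader;

// Call the loader by hand, passing the file content the way Webpack would.
const output = textLoader('I am "Tecvan"\nsecond line');
console.log(output);
```

Because a loader is just a function from content to content, it can be exercised like this without starting Webpack at all.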

Return multiple results

The example above uses a return statement to return the result. A loader can also return more information through the this.callback interface, for use by downstream loaders or by Webpack itself. For example, in webpack-contrib/eslint-loader:

export default function loader(content, map) {
  // ...
  linter.printOutput(linter.lint(content));
  this.callback(null, content, map);
}

Here this.callback(null, content, map) returns the translated content and the sourcemap at the same time. The full signature of callback is:

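this.callback(
    // Error information; pass null when the loader runs normally
    err: Error | null,
    // The translated result
    content: string | Buffer,
    // Sourcemap information for the source code
    sourceMap?: SourceMap,
    // Any value that needs to be passed between loaders;
    // often used to pass an AST object to avoid repeated parsing
    data?: any
);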
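
To make the signature concrete, here is a small sketch of a loader that reports its result through this.callback, run against a hand-written stand-in for the loader context. The loader upperCaseLoader, the fakeContext object, and the originalLength field are all hypothetical.

```javascript
// Hypothetical loader that passes extra data downstream through this.callback.
function upperCaseLoader(content, map) {
  const result = content.toUpperCase();
  // Arguments: err (null on success), content, sourcemap, arbitrary data.
  this.callback(null, result, map, { originalLength: content.length });
}

// Hand-written stand-in for the loader context; only `callback` is mocked.
const received = [];
const fakeContext = {
  callback: (...args) => received.push(args),
};

upperCaseLoader.call(fakeContext, 'hello', null);
console.log(received[0]);
```

Note that the loader returns nothing here: once this.callback is used, the result travels through the callback rather than the return value.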

Asynchronous processing

For asynchronous or CPU-intensive operations, a loader can also return its result asynchronously, as in the core logic of webpack-contrib/less-loader:

import less from "less";

async function lessLoader(source) {
  // 1. Get the asynchronous callback function
  const callback = this.async();
  // ...

  let result;

  try {
    // 2. Call less to translate the module content into css
    result = await (options.implementation || less).render(data, lessOptions);
  } catch (error) {
    // ...
  }

  const { css, imports } = result;

  // ...

  // 3. Translation finished; return the result
  callback(null, css, map);
}

export default lessLoader;

In less-loader, the logic has three steps:

Call this.async to get the asynchronous callback function. Webpack then marks the loader as asynchronous and suspends the current execution queue until callback is triggered
Call the less library to translate the less resource into standard css
Call the asynchronous callback to return the result

The asynchronous callback returned by this.async has the same signature as the this.callback introduced in the previous section, so I won't repeat it here.
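
The three steps can be sketched without less or Webpack. In the example below, delayLoader and runLoader are hypothetical names, the timer stands in for real asynchronous work, and the context object mocks only this.async.

```javascript
// Hypothetical async loader: waits on a timer, then reports through the callback from this.async().
function delayLoader(source) {
  // 1. Ask for the asynchronous callback.
  const callback = this.async();
  // 2. Do the asynchronous work (simulated with a timer here).
  setTimeout(() => {
    // 3. Hand the result back.
    callback(null, source.trim());
  }, 10);
}

// Minimal mock: this.async() returns a callback that settles a promise.
function runLoader(loader, source) {
  return new Promise((resolve, reject) => {
    const context = {
      async: () => (err, content) => (err ? reject(err) : resolve(content)),
    };
    loader.call(context, source);
  });
}

runLoader(delayLoader, '  body { color: red }  ').then((css) => console.log(css));
```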

Cache

Loaders give developers a convenient way to extend Webpack, but the content translation performed inside loaders is usually CPU-intensive, which can cause performance problems in single-threaded Node. An asynchronous loader also suspends the rest of the loader queue until it triggers its callback, so a little carelessness can make the whole loader chain take too long.

For this reason, Webpack by default caches a loader's result until the resource or one of its dependencies changes. Developers need a basic understanding of this. If necessary, you can explicitly opt out of caching through this.cacheable:

module.exports = function(source) {
  this.cacheable(false);
  // ...
  return output;
};
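
As an illustration of when opting out makes sense, here is a sketch of a loader whose output depends on the clock, so a cached result would go stale. The loader buildTimeLoader and the cacheContext mock are hypothetical; the mock only records the flag passed to cacheable.

```javascript
// Hypothetical loader whose output depends on the clock, so a cached result would be stale.
function buildTimeLoader(source) {
  // Opt out of Webpack's default caching of loader results.
  this.cacheable(false);
  return `${source}\nexport const builtAt = ${Date.now()};`;
}

// Stand-in context that only records the flag passed to cacheable().
const cacheContext = {
  cacheableFlag: true,
  cacheable(flag = true) {
    this.cacheableFlag = flag;
  },
};

const code = buildTimeLoader.call(cacheContext, 'export const a = 1;');
console.log(cacheContext.cacheableFlag, code);
```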

Context and Side Effect

Besides acting as a content converter, a running loader can also influence the Webpack compilation process through context interfaces, producing side effects beyond content conversion.

Context information is available through this. The this object is created by the NormalModule.createLoaderContext function before the loader is called. Commonly used interfaces include:

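const loaderContext = {
    // Get the configuration of the current loader
    getOptions: schema => {},
    // Add a warning
    emitWarning: warning => {},
    // Add an error; note that this does not interrupt the Webpack run
    emitError: error => {},
    // Resolve the concrete path of a resource file
    resolve(context, request, callback) {},
    // Emit a file directly; it skips later chunk and module processing and is written straight to fs
    emitFile: (name, content, sourceMap, assetInfo) => {},
    // Add an extra dependency file;
    // in watch mode, a change to a dependency file triggers recompilation of the resource
    addDependency(dep) {},
};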

Among them, addDependency, emitFile, emitError, and emitWarning all have side effects on the subsequent compilation. For example, less-loader contains this code:

  try {
    result = await (options.implementation || less).render(data, lessOptions);
  } catch (error) {
    // ...
  }

  const { css, imports } = result;

  imports.forEach((item) => {
    // ...
    this.addDependency(path.normalize(item));
  });

To explain: the code first calls less to compile the file content, then traverses all import statements (result.imports in the example above) and calls this.addDependency on each one to register the imported resources as dependencies. When any of those files changes, recompilation is triggered.
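
The same side effect can be sketched without less. The hypothetical loader below finds `@import "file";` statements with a regular expression (a simplification; less-loader gets the list from the less compiler) and registers each target through this.addDependency. The mock context supplies this.context, the directory of the module being processed, and records what was registered.

```javascript
const path = require('path');

// Hypothetical loader: registers every `@import "file";` target as a dependency.
function importTrackingLoader(source) {
  const importPattern = /@import\s+["']([^"']+)["'];/g;
  for (const match of source.matchAll(importPattern)) {
    // Resolve relative to the directory of the file being processed.
    this.addDependency(path.resolve(this.context, match[1]));
  }
  return source;
}

// Mock context: `context` is the resource's directory, addDependency just records paths.
const dependencies = [];
const mockContext = {
  context: path.resolve('src', 'styles'),
  addDependency: (file) => dependencies.push(file),
};

importTrackingLoader.call(
  mockContext,
  '@import "./vars.less";\n@import "./mixins.less";\n.a { color: red; }'
);
console.log(dependencies);
```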

Loader chain call

In practice, multiple loaders can be configured for one resource file. Following the configured order, the loaders are executed from front to back (pitch) and then from back to front, forming a content translation workflow. For example, with the following configuration:

module.exports = {
  module: {
    rules: [
      {
        test: /\.less$/i,
        use: [
          "style-loader",
          "css-loader",
          "less-loader",
        ],
      },
    ],
  },
};

This is a typical less scenario. For files with the .less suffix, three loaders (less, css, and style) are configured to process the resource together. Webpack reads the content of the less file and passes it to less-loader; the result of less-loader is passed to css-loader; the result of css-loader is passed to style-loader; and the final output is whatever style-loader returns. In simplified form, the flow is: xxx.less -> less-loader -> css-loader -> style-loader -> final result.

In the example above, the three loaders play the following roles:

less-loader : Converts less to css and outputs css content, which cannot be used directly in the Webpack system
css-loader : Wraps the css content into something like module.exports = "${css}" , so that the wrapped content conforms to JavaScript syntax
style-loader : Does something very simple: it wraps the css module in a require statement and, at runtime, calls functions such as injectStyle to inject the content into a style tag on the page

Each of the three loaders completes part of the conversion, forming a call chain from right to left. The chained design has two advantages. First, each loader keeps a single responsibility, which reduces code complexity to some extent. Second, fine-grained functions can be assembled into complex and flexible processing chains, which improves the reusability of each individual loader.
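
The right-to-left chain can be modeled in a few lines. The three functions below are toy stand-ins, not the real less-loader, css-loader, or style-loader, and runChain is a hypothetical helper that covers only the normal (non-pitch) phase.

```javascript
// Three toy stand-ins for less-loader, css-loader and style-loader (not the real implementations).
const toyLess = (source) => source.replace(/@color/g, 'red');
const toyCss = (css) => `module.exports = ${JSON.stringify(css)};`;
const toyStyle = (js) => `/* inject into <style> at runtime */\n${js}`;

// `use` lists loaders in configuration order; the normal phase applies them from last to first.
function runChain(use, source) {
  return use.reduceRight((content, loader) => loader(content), source);
}

const result = runChain([toyStyle, toyCss, toyLess], '.a { color: @color; }');
console.log(result);
```

Each function only knows about its own input and output, which is what lets the same css step be reused behind less, sass, or plain css sources.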

However, this is only part of the picture. The chain as described has two problems:

Once the loader chain starts, every loader has to run before it ends; there is no chance to interrupt it, short of explicitly throwing an exception
In some scenarios a loader does not need to care about the specific content of the resource, yet it can only run after the source content has been read

To solve these two problems, Webpack layers the concept of pitch on top of loaders.

Loader Pitch

There are many articles about loaders on the Internet, but most of them do not cover pitch in depth, and do not make clear why the pitch feature was designed or what its common use cases are.

In this section, I will discuss loader pitch from three angles: what, how, and why.

What is pitch

Webpack allows a function named pitch to be attached to the loader function. At runtime, pitch is executed earlier than the loader itself, for example:

const loader = function (source){
    console.log('runs later')
    return source;
}

loader.pitch = function(requestString) {
    console.log('runs first')
}

module.exports = loader

The full signature of the Pitch function:

function pitch(
    remainingRequest: string, previousRequest: string, data = {}
): void {
}

It takes three parameters:

remainingRequest : The request string for the loaders after the current one, plus the resource
previousRequest : The list of loaders that come before the current loader
data : An object for passing information between a loader's pitch phase and its normal phase; during normal execution it is available as this.data

These parameters are not complicated, but they are closely tied to the request string (requestString). Let's look at an example to deepen our understanding:

module.exports = {
  module: {
    rules: [
      {
        test: /\.less$/i,
        use: [
          "style-loader", "css-loader", "less-loader"
        ],
      },
    ],
  },
};

The parameters received in css-loader.pitch are:

// The loaders after css-loader, plus the resource path
remainingRequest = less-loader!./xxx.less
// The loaders before css-loader
previousRequest = style-loader
// Default value
data = {}
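
How these values follow from the loader list can be sketched with a small helper. pitchArguments is a hypothetical function written for this article; real Webpack builds the strings from resolved absolute paths, while short names are used here for readability.

```javascript
// Sketch: derive the pitch arguments for the loader at position `index`.
// Real Webpack uses resolved absolute paths; short names keep the example readable.
function pitchArguments(loaders, resource, index) {
  return {
    // Everything after the current loader, ending with the resource itself.
    remainingRequest: [...loaders.slice(index + 1), resource].join('!'),
    // Everything before the current loader.
    previousRequest: loaders.slice(0, index).join('!'),
    data: {},
  };
}

const loaders = ['style-loader', 'css-loader', 'less-loader'];
const args = pitchArguments(loaders, './xxx.less', 1); // index 1 is css-loader
console.log(args);
```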

Scheduling logic

In Chinese, pitch translates as throw, tone, intensity, the highest point of something, and so on. I think this name is entirely to blame for the feature being overlooked, yet behind it lies a whole lifecycle concept for how loaders are executed.

In terms of implementation, execution of the loader chain is divided into three phases: pitch, resource parsing, and execution. The design closely resembles the DOM event model: pitch corresponds to the capture phase, execution corresponds to the bubbling phase, and between the two Webpack reads and parses the resource content, which corresponds to the AT_TARGET phase of the DOM event model.

In the pitch phase, Webpack executes each loader.pitch function (if any) one by one from left to right, in configuration order. A developer can return any value from pitch to interrupt execution of the rest of the chain.
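
The scheduling can be modeled with a small runner. This is a simplified sketch written for this article, not Webpack's actual loader-runner: runLoaders and the a/b/c loaders are hypothetical, everything is synchronous, and pitch arguments are omitted.

```javascript
// Simplified model of the pitch -> read -> normal flow; not Webpack's actual loader-runner.
function runLoaders(loaders, readResource) {
  const log = [];
  let content;
  let index = 0;

  // Pitch phase: left to right.
  for (; index < loaders.length; index++) {
    const { name, pitch } = loaders[index];
    if (!pitch) continue;
    log.push(`${name}.pitch`);
    const returned = pitch();
    if (returned !== undefined) {
      // A returned value skips the remaining loaders and the resource read.
      content = returned;
      break;
    }
  }

  // Only read the resource if no pitch interrupted the chain.
  if (index === loaders.length) {
    log.push('read resource');
    content = readResource();
  }

  // Normal phase: right to left, starting just before the interrupting loader (if any).
  for (let i = index - 1; i >= 0; i--) {
    log.push(loaders[i].name);
    content = loaders[i].normal(content);
  }
  return { content, log };
}

const chain = [
  { name: 'a', normal: (s) => `a(${s})`, pitch: () => undefined },
  { name: 'b', normal: (s) => `b(${s})`, pitch: () => 'from b.pitch' },
  { name: 'c', normal: (s) => `c(${s})` },
];
const { content, log } = runLoaders(chain, () => 'file');
console.log(log, content);
```

In this run, b.pitch returns a value, so c never runs, the file is never read, and only a's normal function sees the value returned by b.pitch.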

So why design pitch at all? After analyzing open source projects such as style-loader, vue-loader, and to-string-loader, I personally sum it up in one word: blocking!

Example: style-loader

Let’s review the less loading chain mentioned earlier:

less-loader : Converts less content into standard css
css-loader : Wraps css content into a JavaScript module
style-loader : Applies the exported result of the JavaScript module to the page through link tags, style tags, and so on, so that the css code runs correctly in the browser

In fact, style-loader is only responsible for making css run in the browser. It does not really need to care about the specific content, which makes it a good fit for pitch. The core code:

// ...
// The loader itself does no processing
const loaderApi = () => {};

// pitch assembles the module code from its arguments
loaderApi.pitch = function loader(remainingRequest) {
  //...

  switch (injectType) {
    case 'linkTag': {
      return `${
        esModule
          ? `...`
          // Import the runtime module
          : `var api = require(${loaderUtils.stringifyRequest(
              this,
              `!${path.join(__dirname, 'runtime/injectStylesIntoLinkTag.js')}`
            )});
            // Import the css module
            var content = require(${loaderUtils.stringifyRequest(
              this,
              `!!${remainingRequest}`
            )});

            content = content.__esModule ? content.default : content;`
      } // ...`;
    }

    case 'lazyStyleTag':
    case 'lazySingletonStyleTag': {
        //...
    }

    case 'styleTag':
    case 'singletonStyleTag':
    default: {
        // ...
    }
  }
};

export default loaderApi;

Key points:

loaderApi is an empty function and does no processing
loaderApi.pitch assembles the result by string concatenation; the exported code includes:
- An import of the runtime module runtime/injectStylesIntoLinkTag.js
- A re-import of the css file that reuses the remainingRequest parameter

The results are roughly as follows:

var api = require('xxx/style-loader/lib/runtime/injectStylesIntoLinkTag.js')
var content = require('!!css-loader!less-loader!./xxx.less');

Note that once the pitch function of style-loader returns this content, the subsequent loaders are not executed and the current call chain is interrupted.

Webpack then goes on to parse and build the result returned by style-loader, and encounters the inline loader statement:

var content = require('!!css-loader!less-loader!./xxx.less');

So from Webpack's point of view, the loader chain is actually invoked twice for the same file: the first time it is interrupted at the pitch of style-loader, and the second time style-loader is skipped because of the inline loader request.
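
The reason style-loader is skipped the second time is the `!!` prefix. According to Webpack's documentation, an inline request prefixed with `!` ignores configured normal loaders, `-!` ignores configured pre and normal loaders, and `!!` ignores all configured loaders, so only the loaders written in the request itself are applied. The sketch below splits such a request into its parts; parseInlineRequest is a hypothetical helper, not a Webpack API.

```javascript
// Sketch of splitting an inline request into prefix, loaders and resource.
// `!`  : skip configured normal loaders
// `-!` : skip configured pre and normal loaders
// `!!` : skip all configured loaders (pre, normal and post)
function parseInlineRequest(request) {
  const [, prefix, rest] = /^(!!|-!|!)?(.*)$/.exec(request);
  const parts = rest.split('!');
  return {
    prefix: prefix || '',
    loaders: parts.slice(0, -1),
    resource: parts[parts.length - 1],
  };
}

const parsed = parseInlineRequest('!!css-loader!less-loader!./xxx.less');
console.log(parsed);
```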

Similar techniques appear in other repositories as well, such as vue-loader. Interested readers can see my article "Webpack case-vue-loader principle analysis" posted on the ByteFE public account; I won't go into it here.

Advanced skills

Development tools

Webpack provides two useful tools for loader developers, both of which appear frequently in open source loaders:

webpack/loader-utils : Provides a set of utility functions for reading configuration, serializing and deserializing request strings, computing hash values, and so on
webpack/schema-utils : Parameter validation tool

The interfaces of these tools are explained clearly in their readmes, so I won't go into detail. Here are two examples that come up often when writing a loader: how to obtain and validate user configuration, and how to build output file names.

Obtain and verify the configuration

Loaders usually provide configuration options so that developers can customize their behavior. Users set them through the use.options attribute in the Webpack configuration file, for example:

module.exports = {
  module: {
    rules: [{
      test: /\.less$/i,
      use: [
        {
          loader: "less-loader",
          options: {
            cacheDirectory: false
          }
        },
      ],
    }],
  },
};

Inside the loader, use the getOptions function from the loader-utils library to obtain the user configuration, and the validate function from the schema-utils library to check that the parameters are valid, as css-loader does:

// css-loader/src/index.js
import { getOptions } from "loader-utils";
import { validate } from "schema-utils";
import schema from "./options.json";


export default async function loader(content, map, meta) {
  const rawOptions = getOptions(this);

  validate(schema, rawOptions, {
    name: "CSS Loader",
    baseDataPath: "options",
  });
  // ...
}

When validating with schema-utils, you need to declare the configuration schema in advance. It is usually kept in a separate json file, such as "./options.json" in the example above.
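
To show the shape of such a schema and how a loader consumes validated options, here is a sketch that runs without Webpack. The schema, the bannerLoader loader, and its prefix option are hypothetical. The loader reads options through the getOptions(schema) context method listed earlier, and the check inside the mock is a hand-written stand-in for schema-utils that only understands flat type rules.

```javascript
// Hypothetical schema, in the JSON Schema shape that an options.json file would hold.
const schema = {
  type: 'object',
  properties: {
    prefix: { type: 'string' },
  },
  additionalProperties: false,
};

// Hypothetical loader: reads its options from the context and applies a default.
function bannerLoader(source) {
  const options = this.getOptions(schema);
  const prefix = options.prefix || 'generated';
  return `/* ${prefix} */\n${source}`;
}

// Mock context. The check below is a hand-written stand-in for schema-utils and only
// understands flat "type" rules plus unknown-key rejection.
function createContext(userOptions) {
  return {
    getOptions(optionSchema) {
      for (const [key, value] of Object.entries(userOptions)) {
        const rule = optionSchema.properties[key];
        if (!rule) throw new Error(`Unknown option "${key}"`);
        if (typeof value !== rule.type) {
          throw new Error(`Option "${key}" must be a ${rule.type}`);
        }
      }
      return userOptions;
    },
  };
}

console.log(bannerLoader.call(createContext({ prefix: 'demo' }), 'const a = 1;'));
```

Validating up front means a misconfigured option fails the build with a readable message instead of producing subtly wrong output later.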

Building the output file name

Webpack supports setting output.filename, the name of the output file, with a pattern such as [path]/[name]-[hash].js. Rules at this level usually need no attention, but in some scenarios, such as webpack-contrib/file-loader, the result has to be built from the asset's file name.

file-loader supports importing text or binary files such as png, jpg, and svg into a JS module, and writes the file to the output directory. This raises a problem: if the file is named a.jpg and Webpack outputs it as [hash].jpg, how do the two correspond? Here you can use interpolateName from loader-utils; file-loader uses it to obtain the path and name of the resource it writes:

import { getOptions, interpolateName } from 'loader-utils';

export default function loader(content) {
  const options = getOptions(this);
  const context = options.context || this.rootContext;
  const name = options.name || '[contenthash].[ext]';

  // Build the final output name
  const url = interpolateName(this, name, {
    context,
    content,
    regExp: options.regExp,
  });

  let outputPath = url;
  // ...

  let publicPath = `__webpack_public_path__ + ${JSON.stringify(outputPath)}`;
  // ...

  if (typeof options.emitFile === 'undefined' || options.emitFile) {
    // ...

    // Emit (write out) the file
    this.emitFile(outputPath, content, null, assetInfo);
  }
  // ...

  const esModule =
    typeof options.esModule !== 'undefined' ? options.esModule : true;

  // Return the module content
  return `${esModule ? 'export default' : 'module.exports ='} ${publicPath};`;
}

export const raw = true;

The core logic of the code:

Call interpolateName according to the loader configuration to build the full path of the target file
Call the context's this.emitFile interface to write out the file
Return module.exports = ${publicPath} so that other modules can reference the file path
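
To give a feel for what the name interpolation step does, here is a much simplified stand-in built only on Node's standard library. It is not the loader-utils implementation: the interpolate function is hypothetical, it supports only [name], [ext], and [contenthash], and the choice of md5 is an assumption made for this sketch.

```javascript
const crypto = require('crypto');
const path = require('path');

// Simplified stand-in for interpolateName: supports only [name], [ext] and [contenthash].
// The md5 algorithm is an assumption made for this sketch.
function interpolate(resourcePath, template, content) {
  const { name, ext } = path.parse(resourcePath);
  const hash = crypto.createHash('md5').update(content).digest('hex');
  return template
    .replace(/\[name\]/g, name)
    .replace(/\[ext\]/g, ext.slice(1)) // drop the leading dot
    .replace(/\[contenthash\]/g, hash);
}

const url = interpolate(
  '/project/src/a.jpg',
  '[name].[contenthash].[ext]',
  Buffer.from('fake image bytes')
);
console.log(url);
```

Because the hash is derived from the content, the same file always maps to the same output name, which is how a.jpg and its hashed output stay linked.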

Besides file-loader, css-loader and eslint-loader also use this interface. If you are interested, check their source code.

Unit test

Writing unit tests for a loader pays off well. For developers, there is no need to write demos or set up a test environment by hand; for end users, a project with a reasonable level of test coverage usually means higher and more stable quality.

After reading more than 20 open source projects, I summarized a commonly used unit testing procedure for Webpack loaders, taking Jest as an example:

Create a Webpack instance and run the loader
Get the loader's result and compare it against expectations
Check whether any error occurred during execution

How to run Loader

There are two ways. The first is to run in the node environment and call the Webpack API, using code instead of the command line to run the compilation. Many projects use this method, such as vue-loader, stylus-loader, and babel-loader. Its advantage is that the result is closest to what end users get; its disadvantage is lower running efficiency (which can be ignored).

Take posthtml/posthtml-loader as an example. It creates and runs a Webpack instance before starting the test:

// posthtml-loader/test/helpers/compiler.js
const webpack = require('webpack')
const MemoryFS = require('memory-fs')

module.exports = function (fixture, config, options) {
  config = { /*...*/ }

  options = Object.assign({ output: false }, options)

  // Create the Webpack instance
  const compiler = webpack(config)

  // Output the build result through MemoryFS to avoid writing to disk
  if (!options.output) compiler.outputFileSystem = new MemoryFS()

  // Run, and return the result as a promise
  return new Promise((resolve, reject) => compiler.run((err, stats) => {
    if (err) return reject(err)
    // Resolve asynchronously with the compilation result
    resolve(stats)
  }))
}

Tips:
As shown in the example above, the statement compiler.outputFileSystem = new MemoryFS() makes Webpack write its output to memory, which avoids disk writes and speeds up compilation.

The other way is to write a set of mock methods that simulate the Webpack runtime environment, as emaphp/underscore-template-loader does. Its advantage is that it runs faster; its disadvantages are a larger development workload and lower generality. It is enough to be aware of it.
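
A minimal sketch of the mock approach: build an object that imitates only the context methods the loader touches, call the loader with it as this, and inspect what was recorded. The loader nonEmptyLoader and the helper createMockContext are hypothetical and are not taken from underscore-template-loader.

```javascript
// Hypothetical loader under test: warns on empty input, otherwise passes content through.
function nonEmptyLoader(source) {
  if (source.trim() === '') {
    this.emitWarning(new Error('empty module'));
  }
  return source;
}

// Hand-written mock of the few context methods this loader touches.
function createMockContext() {
  const calls = { warnings: [], errors: [] };
  return {
    calls,
    emitWarning: (warning) => calls.warnings.push(warning),
    emitError: (error) => calls.errors.push(error),
  };
}

const context = createMockContext();
const output = nonEmptyLoader.call(context, '   ');
console.log(output === '   ', context.calls.warnings.length);
```

The mock has to grow every time the loader uses another context interface, which is the workload and generality cost mentioned above.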

Comparing results

When the previous example finishes running, it returns the result via resolve(stats). The stats object contains almost all information about the compilation, including timing, assets, modules, chunks, errors, and warnings. I covered it in more depth in my earlier article on practical Webpack analysis tools; interested readers can look there.

In a test, you can read the compiled output with the help of the stats object, as in style-loader's implementation:

// style-loader/src/test/helpers/readAsset.js
import path from 'path'

function readAsset(compiler, stats, asset) {
  const usedFs = compiler.outputFileSystem
  const outputPath = stats.compilation.outputOptions.path
  let targetFile = asset
  const queryStringIdx = targetFile.indexOf('?')

  if (queryStringIdx >= 0) {
    // Strip the query string to get the output file path
    targetFile = targetFile.substr(0, queryStringIdx)
  }

  // Read the file content
  return usedFs.readFileSync(path.join(outputPath, targetFile)).toString()
}

To explain: this code first computes the output path of the asset, then calls the readFileSync method of outputFileSystem to read the file content.

Next, there are two ways to analyze the content:

Call Jest's expect(xxx).toMatchSnapshot() assertion to check whether the current result matches the previous one, which keeps results consistent across modifications. Many projects use this method extensively
Interpret the resource content and check whether it meets expectations. For example, the unit tests of less-loader compile the same code twice, once through Webpack and once by calling the less library directly, and then check whether the two results are the same

Readers interested in this are strongly encouraged to look at the test directory of less-loader.

Checking for exceptions

Finally, you need to determine whether any exception occurred during compilation. This can also be read from the stats object:

export default function getErrors(stats) {
  const errors = stats.compilation.errors.sort()
  return errors.map(
    e => e.toString()
  )
}

In most cases you expect the compilation to finish without errors, so it is enough to check that the resulting array is empty. In some cases you may need to check that a specific exception is thrown; then you can assert with expect(xxx).toMatchSnapshot() and compare against the snapshot before and after a change.
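
The check can be tried out without running a build. In the sketch below, fakeStats is a hand-built stand-in for the stats object, containing only the compilation.errors field the helper reads; this variant converts the errors to strings before sorting, so the order is stable.

```javascript
// Collect error messages from a stats-like object in a stable order.
function getErrors(stats) {
  return stats.compilation.errors.map((e) => e.toString()).sort();
}

// Hand-built stand-in for the stats object that compiler.run() would provide.
const fakeStats = {
  compilation: { errors: [new Error('b failed'), new Error('a failed')] },
};

const errors = getErrors(fakeStats);
console.log(errors);
// A clean build would give an empty array:
console.log(getErrors({ compilation: { errors: [] } }).length === 0);
```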

Debugging

There are some tips that improve debugging efficiency when developing a loader, including:

Use the ndb tool for breakpoint debugging
Use npm link to link the loader module into a test project
Use the resolveLoader configuration item to add the loader's directory to the test project, for example:

// webpack.config.js
module.exports = {
  resolveLoader:{
    modules: ['node_modules','./loaders/'],
  }
}

Irrelevant summary

This is the seventh article in the Webpack principle analysis series. To be honest, I did not expect to write this much when I started, and I will keep focusing on front-end engineering. My goal is to turn this into a book of my own. Likes and follows are welcome, and if you spot any omissions or have questions, please leave a comment to discuss.

Previous articles
[Summary of 4D characters] One article understands the core principles of
Ten minutes to refine Webpack: module.issuer attribute detailed
bit difficult webpack knowledge point: Dependency Graph in-depth analysis
bit difficult knowledge points: Webpack Chunk subcontracting rules in detail
[Recommended collection] Webpack 4+ collection of excellent learning materials
Webpack series fifth: a thorough understanding of Webpack runtime