A step-by-step guide to implementing a Babel plugin~

If you already know the basics of Babel, feel free to skip the prerequisite knowledge and go straight to the "Plugin writing" section.

Prerequisite knowledge

What is an AST

To learn Babel, you first need to understand the AST.

Here is Wikipedia's explanation:

In computer science, an Abstract Syntax Tree (AST), or simply a Syntax tree, is an abstract representation of the grammatical structure of source code. It represents the grammatical structure of the programming language in the form of a tree, and each node on the tree represents a structure in the source code.

"An abstract representation of the grammatical structure of source code" is the phrase to emphasize; it is the key to understanding the AST. In plain words, an AST describes our code as a tree-shaped data structure that follows an agreed specification, so that JS engines and transpilers can understand it.

For example:

This is much like how today's frameworks use a `virtual DOM` to describe the structure of the `real DOM` before operating on it. For lower-level code, the AST is the tool for describing the code itself.

Of course, ASTs are not unique to JS. Code in any language can be converted into a corresponding AST, and there are many specifications for AST structure. The one most commonly used in JS is estree. This is, of course, only a simplified picture.

`What does an AST look like`

Now that we know the basic concept, what does an AST actually look like?

The website astexplorer.net generates ASTs online; we can try generating one there to study its structure.

`How Babel processes code`

Q: How many steps does it take to put an elephant into a refrigerator?

Open fridge -> stuff elephant -> close fridge

The same is true for Babel. Babel compiles code through the AST: it first converts the code into an AST, then processes the AST, and finally converts the processed AST back into code.

That is, the following process:

code is converted to AST -> process AST -> AST is converted to code

Each step has a more formal name:

Parse -> Transform -> Generate

`Parse`

The parser converts source code into an abstract syntax tree (AST).

The main task of this stage is to convert code into an AST. It goes through two steps: lexical analysis and syntax analysis. When the parse phase begins, the source text is scanned first, and lexical analysis happens during that scan. How should we understand this analysis? If we compare a piece of code to a sentence, lexical analysis splits the sentence into words. Just as "I am eating" can be split into "I", "am", and "eating", code can be split too. For example:

const a = '1'

It is split into the finest-grained words (tokens):

'const', 'a', '=', '1'

This is what the lexical analysis stage does.
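To make the splitting step tangible, here is a minimal sketch of a toy tokenizer. It is not how @babel/parser works internally; the function name tokenize and the regular expression are assumptions made up for this illustration, and it only understands identifiers, integers, quoted strings and a handful of punctuators.

```javascript
// Toy tokenizer for illustration only; @babel/parser is far more thorough.
function tokenize(code) {
  // identifiers or keywords | integers | quoted strings | a few punctuators
  const tokenPattern = /[A-Za-z_$][\w$]*|\d+|'[^']*'|"[^"]*"|[=;(){}]/g;
  // match() with the g flag returns every match and skips whitespace
  return code.match(tokenPattern) || [];
}

// Splits the statement into four tokens: const, a, = and the string '1'
console.log(tokenize("const a = '1'"));
```

Unlike the simplified list above, this sketch keeps the quotes around the string token, which is closer to what a real lexer needs in order to tell a string from a number.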

After lexical analysis is complete, the resulting tokens are handed over to syntax analysis. The task of this step is to build the AST from the tokens: it walks through the tokens and assembles them into a tree with a specific structure. This tree is the AST.

As shown in the figure below, you can see the structure of the above statement, and a few important pieces of information stand out. The outermost layer is a VariableDeclaration, whose kind is const. Its declarations field holds a VariableDeclarator object, in which the two key values a and 1 can be found.

Besides these, the AST also carries information such as line numbers, which will not be covered in detail here. In any case, this is what our AST finally looks like.
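For readers without the figure at hand, the structure described above can be sketched as a plain object. This is a hand-written, trimmed-down approximation of what Babel produces for const a = '1'; position fields such as start, end and loc are left out, and the variable names are only for this illustration.

```javascript
// Hand-written, simplified AST for: const a = '1'
// Position information (start, end, loc) is omitted for brevity.
const ast = {
  type: 'File',
  program: {
    type: 'Program',
    body: [
      {
        type: 'VariableDeclaration',
        kind: 'const',
        declarations: [
          {
            type: 'VariableDeclarator',
            id: { type: 'Identifier', name: 'a' },
            init: { type: 'StringLiteral', value: '1' },
          },
        ],
      },
    ],
  },
};

const declaration = ast.program.body[0];
console.log(declaration.kind);                        // const
console.log(declaration.declarations[0].id.name);     // a
console.log(declaration.declarations[0].init.value);  // 1
```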

So how do we convert code to an AST in Babel? At this stage we use @babel/parser, the parser provided by Babel. It was formerly called Babylon, and it was not written from scratch by the Babel team: it is based on a fork of the acorn project.

It provides a parse method that converts a code string into an AST, for example parser.parse('const a = 1'), which is the same call used in the classic case later in this article.

For more information, see the @babel/parser page in the official documentation.

`Transform`

After the parse phase we have the AST. Babel then performs a depth-first traversal of it with @babel/traverse, and plugins are triggered at this stage: they access each type of AST node through visitor functions. Taking the above code as an example, we can write a VariableDeclaration function to visit VariableDeclaration nodes; it is called every time a node of this type is encountered, in the form VariableDeclaration(path, state) { ... }.

The method accepts two parameters:

`path`

path is the path currently being visited. It contains the node, the parent node, and many methods for operating on nodes. With these methods the AST can be added to, updated, moved, deleted, and so on.

`state`

state contains the current plugin's information and parameters, and can also be used to pass custom data between nodes.
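The visitor idea can be tried without installing anything. The following is a simplified stand-in for @babel/traverse, written only for illustration: toyTraverse, the path object with just node and parent, and sharedState are all assumptions of this sketch, and the real NodePath offers far more.

```javascript
// Simplified stand-in for @babel/traverse, for illustration only.
function toyTraverse(node, visitor, state, parent = null) {
  if (!node || typeof node.type !== 'string') return;
  const path = { node, parent }; // the real NodePath carries far more
  const visit = visitor[node.type];
  if (visit) visit(path, state);
  // Depth-first: descend into every child node or array of child nodes.
  for (const key of Object.keys(node)) {
    const child = node[key];
    if (Array.isArray(child)) {
      child.forEach(item => toyTraverse(item, visitor, state, node));
    } else if (child && typeof child === 'object') {
      toyTraverse(child, visitor, state, node);
    }
  }
}

// Simplified AST for: const a = 1
const tree = {
  type: 'Program',
  body: [{
    type: 'VariableDeclaration',
    kind: 'const',
    declarations: [{
      type: 'VariableDeclarator',
      id: { type: 'Identifier', name: 'a' },
      init: { type: 'NumericLiteral', value: 1 },
    }],
  }],
};

const sharedState = { visited: [] };
toyTraverse(tree, {
  VariableDeclaration(path, state) {
    state.visited.push(path.node.kind); // record what we saw
    path.node.kind = 'var';             // then modify the node in place
  },
  Identifier(path, state) {
    state.visited.push(path.node.name);
  },
}, sharedState);

console.log(sharedState.visited); // [ 'const', 'a' ]
console.log(tree.body[0].kind);   // var
```

The same state object reaches every visitor call, which is the mechanism that lets data travel between nodes.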

`Generate`

Generate prints the transformed AST as target code and produces a sourcemap.

This stage is relatively simple. After the AST has been processed in the transform stage, the task here is to convert it back into code. The AST is traversed depth-first, the corresponding code is generated from the information in each node, and the matching sourcemap is generated along the way.
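As a rough sketch of what "generate code from the information in each node" means, here is a toy generator that handles only the few node types needed for a simple variable declaration. It is not @babel/generator: toyGenerate is a made-up name, and sourcemaps are left out entirely.

```javascript
// Toy code generator covering only a handful of node types.
function toyGenerate(node) {
  switch (node.type) {
    case 'Program':
      return node.body.map(toyGenerate).join('\n');
    case 'VariableDeclaration':
      return `${node.kind} ${node.declarations.map(toyGenerate).join(', ')};`;
    case 'VariableDeclarator':
      return node.init
        ? `${toyGenerate(node.id)} = ${toyGenerate(node.init)}`
        : toyGenerate(node.id);
    case 'Identifier':
      return node.name;
    case 'NumericLiteral':
      return String(node.value);
    case 'StringLiteral':
      return JSON.stringify(node.value);
    default:
      throw new Error(`Unsupported node type: ${node.type}`);
  }
}

// Simplified AST for: var a = 1
const program = {
  type: 'Program',
  body: [{
    type: 'VariableDeclaration',
    kind: 'var',
    declarations: [{
      type: 'VariableDeclarator',
      id: { type: 'Identifier', name: 'a' },
      init: { type: 'NumericLiteral', value: 1 },
    }],
  }],
};

console.log(toyGenerate(program)); // var a = 1;
```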

`A classic example`

As the saying goes, the best way to learn is hands-on, so let's try a simple classic case together: convert the ES6 const in the example above to an ES5 var.

`Step 1: Convert to AST`

Generating the AST with @babel/parser is straightforward and follows the parse stage described above. After this step, the ast variable holds the converted AST.

const parser = require('@babel/parser');
const ast = parser.parse('const a = 1');

`Step 2: Process the AST`

Process the AST with @babel/traverse.

By analyzing the generated AST, we can see that the kind field of VariableDeclaration holds const. So can we simply rewrite kind to var? Let's try:

const parser = require('@babel/parser');
const traverse = require('@babel/traverse').default

const ast = parser.parse('const a = 1');
traverse(ast, {
    VariableDeclaration(path, state) {
      // access the actual AST node through path.node
      path.node.kind = 'var'
    }
});

At this point we have rewritten kind to var on a hunch, but we cannot yet tell whether it actually took effect, so we need to convert the AST back to code to see the result.

`Step 3: Generate code`

Generate code from the AST with @babel/generator.

const parser = require('@babel/parser');
const traverse = require('@babel/traverse').default
const generate = require('@babel/generator').default

const ast = parser.parse('const a = 1');
traverse(ast, {
    VariableDeclaration(path, state) {
      path.node.kind = 'var'
    }
});

// pass the processed AST to generate
const transformedCode = generate(ast).code
console.log(transformedCode)

Let's look at the result:

The run completes and prints var a = 1; which is exactly the effect we want~

`How to develop plugins`

The classic case above gives a rough idea of how to use Babel, but how do we write the plugins we use day to day?

Plugin development follows the same basic idea, except that a plugin only needs to care about the transform stage.

A plugin exports either an object or a function. A function must return an object, and in both cases the work is done inside the object's visitor, just as above. The function receives several parameters: api exposes a series of methods provided by Babel, options holds the parameters passed when the plugin is used, and dirname is a directory path supplied during processing.

Rewritten as a plugin, the case above looks like this:

module.exports = {
    visitor: {
        VariableDeclaration(path, state) {
          path.node.kind = 'var'
        }
    }
}
// or in function form
module.exports = (api, options, dirname) => {
    return {
        visitor: {
          VariableDeclaration(path, state) {
            path.node.kind = 'var'
          }
        }
    }
}

`Plugin writing`

Building on the prerequisite knowledge, let's walk through the development of a Babel plugin step by step. First, we define the core requirements of the plugin:

A function can be automatically inserted and called.
The dependencies of the inserted function are imported automatically.
Both the function that receives the insertion and the function to insert can be specified through comments. If no position comment is given, the call is inserted at the first line of the function body by default.

The basic effect is as follows:

Before processing

// log declares the method that should be inserted and called
// @inject:log
function fn() {
    console.log(1)
    // use @inject:code to specify the insertion line
    // @inject:code
    console.log(2)
}

After processing

// import the package xxx, which is configured later in the plugin options
import log from 'xxx'
function fn() {
    console.log(1)
    log()
    console.log(2)
}

`Sorting out the approach`

Now that we understand the general requirements, let's not rush into coding. First we should think about where to start and anticipate the problems we will have to handle along the way.

Find the functions marked with @inject and check whether they contain an @inject:code position mark.
Import the appropriate package for every inserted function.
When a mark is matched, insert the function call. We also need to handle functions in their various forms, such as object methods, IIFEs, arrow functions, and so on.

`Design plugin parameters`

To keep the plugin flexible, we need a sensible rule for its parameters. The plugin accepts an object:

key is the name of the function to insert.
kind is the import form. There are three forms, named, default, and namespaced; this design follows babel-helper-module-imports.
- named corresponds to the form import { a } from "b"
- default corresponds to the form import a from "b"
- namespaced corresponds to the form import * as a from "b"
require is the package name of the dependency.

For example, to insert the log method, which is imported from the log4js package in the named form, the parameters look like this:

// babel.config.js
module.exports = {
  plugins: [
    // path to the js file of our plugin
    ['./babel-plugin-myplugin.js', {
      log: {
        // the import form is named
        kind: 'named',
        require: 'log4js'
      }
    }]
  ]
}
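To make the three kind values concrete, here is a small sketch that turns one entry of the options object into the import statement it stands for. renderImport is a hypothetical helper written for this illustration; it is not part of the plugin and not an API of babel-helper-module-imports.

```javascript
// Hypothetical helper: render the import statement an option entry describes.
// `name` is the key of the entry, `kind` and `require` are its fields.
function renderImport(name, { kind, require: source }) {
  switch (kind) {
    case 'named':
      return `import { ${name} } from "${source}"`;
    case 'default':
      return `import ${name} from "${source}"`;
    case 'namespaced':
      return `import * as ${name} from "${source}"`;
    default:
      throw new Error(`Unknown kind: ${kind}`);
  }
}

console.log(renderImport('log', { kind: 'named', require: 'log4js' }));
// import { log } from "log4js"
console.log(renderImport('log', { kind: 'default', require: 'log4js' }));
// import log from "log4js"
console.log(renderImport('log', { kind: 'namespaced', require: 'log4js' }));
// import * as log from "log4js"
```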

`Start`

Now that we know what to do and have designed the parameter rules, we can start.

First, we open https://astexplorer.net/ and generate an AST from the code to be processed, which makes the structure easier to sort out before we write the actual code.

We begin with the function declaration statement. Let's analyze its AST structure and how to handle it, starting with this demo:

// @inject:log
function fn() {
    console.log('fn')
}

The generated AST structure is as follows. There are two key attributes:

leadingComments holds the comments in front of the node. It contains one element, the @inject:log written in our demo.
body is the content of the function body. The console.log('fn') from the demo lives here, and this is what we operate on when inserting code.

So leadingComments tells us whether a function needs an insertion, and operating on body lets us perform the insertion.

First we have to find the FunctionDeclaration layer, because only this layer carries the leadingComments attribute, and then go through its comments to match the function that needs to be inserted. Then we insert the matched function into the body, paying attention to which body is actually insertable. The body of a FunctionDeclaration is not an array but a BlockStatement, which represents the function body and has a body of its own, so the position we actually operate on is the body of this BlockStatement.

The code is as follows:

module.exports = (api, options, dirname) => {

  return {
    visitor: {
      // match function declaration nodes
      FunctionDeclaration(path, state) {
        // path.get('body') returns the NodePath that wraps path.node.body
        const pathBody = path.get('body')
        if(path.node.leadingComments) {
          // keep only the comments that contain @inject:xxx
          const leadingComments = path.node.leadingComments.filter(comment => /\@inject:(\w+)/.test(comment.value) )
          leadingComments.forEach(comment => {
            const injectTypeMatchRes = comment.value.match(/\@inject:(\w+)/)
            // matched
            if( injectTypeMatchRes ) {
              // the first capture group is the xxx in @inject:xxx, so take it out
              const injectType = injectTypeMatchRes[1]
              // get the keys of the plugin options and check whether xxx was declared there
              const sourceModuleList = Object.keys(options)
              if( sourceModuleList.includes(injectType) ) {
                // search the body for a @code:xxx comment
                // comments cannot be accessed directly, so check the leadingComments of every AST node in the body
                // note the double backslash: inside a template literal a single \s collapses to a plain "s"
                const codeIndex = pathBody.node.body.findIndex(block => block.leadingComments && block.leadingComments.some(comment => new RegExp(`@code:\\s?${injectType}`).test(comment.value) ))
                // if no position is declared, insert at the first line by default
                if( codeIndex === -1 ) {
                  // operate on the body of the `BlockStatement`
                  pathBody.node.body.unshift(api.template.statement(`${injectType}()`)());
                }else {
                  pathBody.node.body.splice(codeIndex, 0, api.template.statement(`${injectType}()`)());
                }
              }
            }
          })
        }
      }
    }
  }
}

After writing it, let's look at the result: log was inserted successfully. Because we did not use @code:log, it was inserted at the first line by default.

Next we try the @code:log marker, changing the demo code to the following:

// @inject:log
function fn() {
    console.log('fn')
    // @code:log
}

Run the code again to see the result, it is indeed successfully inserted at the @code:log position
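The comment-matching logic can also be tried in isolation on plain objects, without Babel. In this sketch findInjectType and findCodeIndex are hypothetical helper names, the label field exists only so the result is easy to print, and the mock nodes imitate only the leadingComments and body fields discussed above.

```javascript
// Returns the xxx of the first "@inject:xxx" comment, or null if none matches.
function findInjectType(comments = []) {
  for (const comment of comments) {
    const match = comment.value.match(/@inject:(\w+)/);
    if (match) return match[1];
  }
  return null;
}

// Returns the index of the first statement preceded by "@code:xxx", or -1.
function findCodeIndex(statements, injectType) {
  const marker = new RegExp(`@code:\\s?${injectType}`);
  return statements.findIndex(stmt =>
    (stmt.leadingComments || []).some(comment => marker.test(comment.value)));
}

// Mock of a function whose second statement carries the position marker.
const fnNode = {
  leadingComments: [{ value: ' @inject:log' }],
  body: [
    { type: 'ExpressionStatement', label: 'first' },
    {
      type: 'ExpressionStatement',
      label: 'second',
      leadingComments: [{ value: ' @code:log' }],
    },
  ],
};

const injectType = findInjectType(fnNode.leadingComments);
const codeIndex = findCodeIndex(fnNode.body, injectType);
// Fall back to the first line when no marker is found.
const insertAt = codeIndex === -1 ? 0 : codeIndex;
fnNode.body.splice(insertAt, 0, {
  type: 'ExpressionStatement',
  label: `${injectType}()`,
});

console.log(fnNode.body.map(stmt => stmt.label)); // [ 'first', 'log()', 'second' ]
```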

With our first case, the function declaration, handled, some people may ask: what about arrow functions without a function body? For example:

// @inject:log
() => true

Is this a problem? No problem!

If there is no function body, we can simply give it one. How?

Again, start by analyzing the AST structure. The outermost layer is an ExpressionStatement, and inside it is the ArrowFunctionExpression. This is quite different from the structure generated for the function declaration earlier. There is no need to get lost in all these layers, though; we only need the information that is useful to us. In one sentence: whichever layer has leadingComments is the layer we look for. Here leadingComments is on the ExpressionStatement, so that is the one we visit.

With the structure analyzed, how do we tell whether there is a function body? Remember the BlockStatement we saw in the body when handling the function declaration above; in the body of our arrow function we see a BooleanLiteral instead. So checking the type of body tells us whether there is a function body. Concretely, we can use the type-checking method path.isBlockStatement() provided by Babel.

module.exports = (api, options, dirname) => {

  return {
    visitor: {
      ExpressionStatement(path, state) {
        // reach the ArrowFunctionExpression
        const expression = path.get('expression')
        // skip expression statements that are not arrow functions
        if (!expression.isArrowFunctionExpression()) return
        const pathBody = expression.get('body')
        if(path.node.leadingComments) {
          // keep only the comments that contain @inject:xxx
          const leadingComments = path.node.leadingComments.filter(comment => /\@inject:(\w+)/.test(comment.value) )

          leadingComments.forEach(comment => {
            const injectTypeMatchRes = comment.value.match(/\@inject:(\w+)/)
            // matched
            if( injectTypeMatchRes ) {
              // the first capture group is the xxx in @inject:xxx, so take it out
              const injectType = injectTypeMatchRes[1]
              // get the keys of the plugin options and check whether xxx was declared there
              const sourceModuleList = Object.keys(options)
              if( sourceModuleList.includes(injectType) ) {
                // check whether there is a function body
                if (pathBody.isBlockStatement()) {
                  // search the body for a @code:xxx comment
                  // comments cannot be accessed directly, so check the leadingComments of every AST node in the body
                  // note the double backslash: inside a template literal a single \s collapses to a plain "s"
                  const codeIndex = pathBody.node.body.findIndex(block => block.leadingComments && block.leadingComments.some(comment => new RegExp(`@code:\\s?${injectType}`).test(comment.value) ))
                  // if no position is declared, insert at the first line by default
                  if( codeIndex === -1 ) {
                    pathBody.node.body.unshift(api.template.statement(`${injectType}()`)());
                  }else {
                    pathBody.node.body.splice(codeIndex, 0, api.template.statement(`${injectType}()`)());
                  }
                }else {
                  // no function body
                  // use the `@babel/template` api exposed by babel to build an AST from a code snippet
                  const ast = api.template.statement(`{${injectType}();return BODY;}`)({BODY: pathBody.node});
                  // replace the original body
                  pathBody.replaceWith(ast);
                }
              }
            }
          })
        }
      }
    }
  }
}

As you can see, the only additions are the function-body check, the generated function body with the inserted code, and the replacement of the original node with the new AST. Everything else follows the same logic as the function declaration case.
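The "give it a function body" step can be sketched on plain AST-shaped objects as well. ensureBlockBody below is a hypothetical helper that builds the nodes by hand instead of going through @babel/template, purely to show the shape of the result.

```javascript
// Hypothetical helper: make sure an arrow function has a block body and
// put a call to `calleeName` at the top of it.
function ensureBlockBody(arrowFn, calleeName) {
  const call = {
    type: 'ExpressionStatement',
    expression: {
      type: 'CallExpression',
      callee: { type: 'Identifier', name: calleeName },
      arguments: [],
    },
  };
  if (arrowFn.body.type === 'BlockStatement') {
    // Already has a function body: insert at the first line.
    arrowFn.body.body.unshift(call);
  } else {
    // Expression body: wrap it as `{ call(); return <original expression>; }`.
    arrowFn.body = {
      type: 'BlockStatement',
      body: [call, { type: 'ReturnStatement', argument: arrowFn.body }],
    };
  }
  return arrowFn;
}

// Simplified AST for: () => true
const arrow = {
  type: 'ArrowFunctionExpression',
  params: [],
  body: { type: 'BooleanLiteral', value: true },
};

ensureBlockBody(arrow, 'log');
console.log(arrow.body.type);                       // BlockStatement
console.log(arrow.body.body[0].expression.callee.name); // log
console.log(arrow.body.body[1].argument.value);     // true
```

The original expression is kept as the argument of the return statement, so the function still returns the same value after the rewrite.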

For the @babel/template API used to generate the AST, see the @babel/template documentation.

Handling functions in other forms works in much the same way. In summary:

Analyze the AST and find the node where leadingComments is located -> find the node where the insertable body is located -> write the insertion logic

There are many more cases to handle in practice, such as object properties, IIFEs, function expressions, and so on. The approach is the same, so I will not repeat it here. I will post the complete code of the plugin at the bottom of the article.

`Automatic import`

With the first requirement done, on to the second: how do we automatically import the package we use? For log4js in the case above, the processed code should have this added automatically:

import { log } from 'log4js'

Thinking it through, there are two situations to handle:

log has already been imported
the variable name log is already taken

For situation 1, we need to check whether log4js has already been imported and whether log was imported from it in the named form. For situation 2, we need to give log a unique alias and make sure that alias is used in all subsequent code insertions. Both require the automatic import logic to run at the beginning of the file.
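Situation 1 can be sketched on plain objects shaped like Babel's ImportDeclaration nodes. findExistingLocalName is a hypothetical helper for this illustration only; it returns the local name under which the function is already available, or null when an import still has to be added.

```javascript
// Hypothetical helper: look for an existing import of `key` from the package
// named in option.require and return the local name it is bound to.
function findExistingLocalName(importDeclarations, key, option) {
  for (const declaration of importDeclarations) {
    if (declaration.source.value !== option.require) continue;
    for (const specifier of declaration.specifiers) {
      // import a from "b": a default specifier only has `local`
      if (option.kind === 'default' && specifier.type === 'ImportDefaultSpecifier') {
        return specifier.local.name;
      }
      // import { a as x } from "b": compare the imported name, return the local one
      if (option.kind === 'named' && specifier.type === 'ImportSpecifier'
          && specifier.imported.name === key) {
        return specifier.local.name;
      }
    }
  }
  return null;
}

// Models: import { log as myLog } from 'log4js'
const existingImports = [{
  type: 'ImportDeclaration',
  source: { type: 'StringLiteral', value: 'log4js' },
  specifiers: [{
    type: 'ImportSpecifier',
    imported: { type: 'Identifier', name: 'log' },
    local: { type: 'Identifier', name: 'myLog' },
  }],
}];

console.log(findExistingLocalName(existingImports, 'log', { kind: 'named', require: 'log4js' }));   // myLog
console.log(findExistingLocalName(existingImports, 'log', { kind: 'default', require: 'log4js' })); // null
```

Returning the local name rather than the imported one matters: if the file already aliases the import, the inserted call has to use that alias.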

We have the general idea, but how do we run the automatic import logic ahead of everything else? When in doubt, look at the AST structure. The outermost layer of the AST is the File node, which has a comments attribute containing all the comments in the current file; with it we could work out in advance which functions need to be inserted and import them. Looking further down, inside it is a Program node. It is visited before any other type of node, so this is where we implement the automatic import logic.

Tip: Babel provides the path.traverse method, which can be used to synchronously visit and process the child nodes under the current node.

As shown in the figure:

The code is as follows:

const importModule = require('@babel/helper-module-imports');

// ......
{
    visitor: {
      Program(path, state) {
        // hang a copy of options on state; the original options must not be mutated
        state.options = JSON.parse(JSON.stringify(options))

        path.traverse({
          // first visit the existing import nodes to check whether log has already been imported
          ImportDeclaration (curPath) {
            const requirePath = curPath.get('source').node.value;
            // iterate over options
            Object.keys(state.options).forEach(key => {
              const option = state.options[key]
              // same package
              if( option.require === requirePath ) {
                const specifiers = curPath.get('specifiers')
                specifiers.forEach(specifier => {

                  // default import
                  if( option.kind === 'default' ) {
                    // check the specifier type
                    if( specifier.isImportDefaultSpecifier() ) {
                      // found an existing default import; an ImportDefaultSpecifier only has
                      // `local` (no `imported`), so store the local name in identifierName for later use
                      option.identifierName = specifier.get('local').toString()
                    }
                  }

                  // named import
                  if( option.kind === 'named' ) {
                    if( specifier.isImportSpecifier() ) {
                      // found an existing named import of the same function
                      if( specifier.node.imported.name === key ) {
                        option.identifierName = specifier.get('local').toString()
                      }
                    }
                  }
                })
              }
            })
          }
        });


        // handle packages that have not been imported yet
        Object.keys(state.options).forEach(key => {
          const option = state.options[key]
          // require is set and no identifierName was found
          if( option.require && !option.identifierName )  {

            // default form
            if( option.kind === 'default' ) {
              // add a default import
              // generate a unique variable name, roughly like _log2
              option.identifierName = importModule.addDefault(path, option.require, {
                nameHint: path.scope.generateUid(key)
              }).name;
            }

            // named form
            if( option.kind === 'named' ) {
              option.identifierName = importModule.addNamed(path, key, option.require, {
                nameHint: path.scope.generateUid(key)
              }).name
            }
          }

          // without require it is treated as a global method and no import is added
          if( !option.require ) {
            option.identifierName = key
          }
        })
      }
    }
}

In the Program node, we first make a copy of the received plugin options and hang it on state; as mentioned earlier, state can be used to pass data between AST nodes. Then we visit the ImportDeclaration nodes under Program to see whether log4js has already been imported. If it has, the local name is recorded in the identifierName field. Once the import statements have been visited, the identifierName field tells us whether the import already exists. If not, we create the import with @babel/helper-module-imports and use the generateUid method provided by Babel to create a unique variable name.
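The idea behind a unique alias can be sketched in a few lines. toyGenerateUid below only imitates the _log, _log2 naming pattern mentioned in the code comments; it is an assumption-laden stand-in, whereas the real path.scope.generateUid checks the actual bindings and references of the scope.

```javascript
// Toy imitation of unique-name generation; `takenNames` is a Set of
// identifiers already used in the file (an assumption of this sketch).
function toyGenerateUid(name, takenNames) {
  let counter = 1;
  let uid;
  do {
    // first candidate is _name, later ones are _name2, _name3, ...
    uid = counter > 1 ? `_${name}${counter}` : `_${name}`;
    counter++;
  } while (takenNames.has(uid));
  takenNames.add(uid); // reserve the name so the next call skips it
  return uid;
}

const taken = new Set(['log', '_log']);
const firstAlias = toyGenerateUid('log', taken);
const secondAlias = toyGenerateUid('log', taken);
console.log(firstAlias, secondAlias); // _log2 _log3
```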

As a result, the previous code needs a small adjustment: we can no longer use the method name extracted from the @inject:xxx comment directly and should use identifierName instead. The key part of the code becomes:

if( sourceModuleList.includes(injectType) ) {
  // check whether there is a function body
  if (pathBody.isBlockStatement()) {
    // search the body for a @code:xxx comment
    // comments cannot be accessed directly, so check the leadingComments of every AST node in the body
    // note the double backslash: inside a template literal a single \s collapses to a plain "s"
    const codeIndex = pathBody.node.body.findIndex(block => block.leadingComments && block.leadingComments.some(comment => new RegExp(`@code:\\s?${injectType}`).test(comment.value) ))
    // if no position is declared, insert at the first line by default
    if( codeIndex === -1 ) {
      // use identifierName
      pathBody.node.body.unshift(api.template.statement(`${state.options[injectType].identifierName}()`)());
    }else {
      // use identifierName
      pathBody.node.body.splice(codeIndex, 0, api.template.statement(`${state.options[injectType].identifierName}()`)());
    }
  }else {
    // no function body
    // use the `@babel/template` api exposed by babel to build an AST from a code snippet

    // use identifierName
    const ast = api.template.statement(`{${state.options[injectType].identifierName}();return BODY;}`)({BODY: pathBody.node});
    // replace the original body
    pathBody.replaceWith(ast);
  }
}

The final effect is as follows:

We have now implemented automatic function insertion and automatic import of dependent packages.

`End`

This article is a summary of my notes after studying the "Babel Plug-in Clearance Cheats" booklet. Like most people, I started out wanting to write Babel plugins without knowing where to begin, so this article mainly follows my own train of thought while writing one. I hope it gives you some ideas too.

The full version already supports inserting custom code snippets. The complete code has been uploaded to github and published to npm. Stars and issues are welcome.

Giving a star is a favor, not giving it is an accident, haha.