Vue3 source code analysis (2): AST parser

In the previous article, we started from the entry point at packages/vue/src/index.ts and followed the compilation process of a Vue object. There we mentioned that the baseCompile function generates an AST (abstract syntax tree) during execution. This is undoubtedly a critical step: only once the AST exists can we traverse its nodes to perform transform operations, such as processing v-if and v-for, or analyzing nodes so that those meeting the conditions can be statically hoisted. All of these rely on the AST generated beforehand. So today we will look at AST parsing and see how Vue parses templates.

Generate AST abstract syntax tree

First, let's revisit how baseCompile creates the ast and how the ast is used afterwards:

export function baseCompile(
  template: string | RootNode,
  options: CompilerOptions = {}
): CodegenResult {

  /* earlier logic omitted */

  const ast = isString(template) ? baseParse(template, options) : template

  transform(
    ast,
    {/* options omitted */}
  )

  return generate(
    ast,
    extend({}, options, {
      prefixIdentifiers
    })
  )
}

With the logic we don't need to care about replaced by comments, the function body is now very clear:

Generate the ast object
Pass the ast object to the transform function to transform the AST nodes
Pass the ast object to the generate function and return the compilation result
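
The three steps can be pictured as a small pipeline. The sketch below is a toy stand-in, not Vue's implementation: toyParse, toyTransform, toyGenerate, and toyCompile are hypothetical names, and the "AST" is just a flat list of text and interpolation nodes.

```typescript
// Toy pipeline: parse -> transform -> generate. All names are hypothetical.
type ToyNode =
  | { type: "text"; content: string }
  | { type: "interpolation"; expression: string };

interface ToyRoot {
  children: ToyNode[];
}

// Step 1: split the template into text nodes and {{ }} nodes.
function toyParse(template: string): ToyRoot {
  const children: ToyNode[] = [];
  const re = /\{\{(.*?)\}\}/g;
  let lastIndex = 0;
  let match: RegExpExecArray | null;
  while ((match = re.exec(template)) !== null) {
    if (match.index > lastIndex) {
      children.push({ type: "text", content: template.slice(lastIndex, match.index) });
    }
    children.push({ type: "interpolation", expression: match[1].trim() });
    lastIndex = re.lastIndex;
  }
  if (lastIndex < template.length) {
    children.push({ type: "text", content: template.slice(lastIndex) });
  }
  return { children };
}

// Step 2: mutate the AST in place (here: prefix each expression with `ctx.`).
function toyTransform(root: ToyRoot): void {
  for (const node of root.children) {
    if (node.type === "interpolation") {
      node.expression = `ctx.${node.expression}`;
    }
  }
}

// Step 3: turn the AST into a code string.
function toyGenerate(root: ToyRoot): string {
  const parts = root.children.map(node =>
    node.type === "text" ? JSON.stringify(node.content) : `String(${node.expression})`
  );
  return `return ${parts.join(" + ")}`;
}

// Same shape as baseCompile: parse only when given a string.
function toyCompile(template: string | ToyRoot): string {
  const ast = typeof template === "string" ? toyParse(template) : template;
  toyTransform(ast);
  return toyGenerate(ast);
}
```

As in baseCompile, the transform step works on the same ast object that the generate step later reads, which is why the parse result has to exist before anything else can happen.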

Here we focus on how the ast is generated. A ternary operator decides: if the template argument is a string, baseParse is called to parse the template string; otherwise the template is used directly as the ast object. What does baseParse do to generate the ast? Let's take a look at the source code:

export function baseParse(
  content: string,
  options: ParserOptions = {}
): RootNode {
  const context = createParserContext(content, options) // create the parser context object
  const start = getCursor(context) // capture the cursor position that tracks parsing progress
  return createRoot( // create and return the root node
    parseChildren(context, TextModes.DATA, []), // parse the child nodes, which become the root node's children
    getSelection(context, start)
  )
}

I added comments to baseParse to make the role of each call easier to understand. First the parser context is created, and then the cursor information is read from that context. Since nothing has been parsed yet, the column, line, and offset properties of the cursor all point to the start of the template. After that, the root node is created and returned. At this point the AST is generated and parsing is complete.
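
To make the cursor idea concrete, here is a minimal sketch of a context that tracks offset, line, and column while the source is consumed. The names ToyPosition, ToyContext, createToyContext, getToyCursor, and toyAdvanceBy are hypothetical; Vue's real context carries more fields than shown here.

```typescript
// Hypothetical, simplified parser context: only the fields needed to track position.
interface ToyPosition {
  offset: number; // index from the start of the template
  line: number;   // 1-based line number
  column: number; // 1-based column number
}

interface ToyContext extends ToyPosition {
  source: string; // the part of the template that is still unparsed
}

function createToyContext(content: string): ToyContext {
  return { source: content, offset: 0, line: 1, column: 1 };
}

// Take a snapshot of the current position (a copy, not a live reference).
function getToyCursor(context: ToyContext): ToyPosition {
  const { offset, line, column } = context;
  return { offset, line, column };
}

// Consume `count` characters and keep offset, line, and column in sync.
function toyAdvanceBy(context: ToyContext, count: number): void {
  const consumed = context.source.slice(0, count);
  for (let i = 0; i < consumed.length; i++) {
    if (consumed[i] === "\n") {
      context.line++;
      context.column = 1;
    } else {
      context.column++;
    }
  }
  context.offset += consumed.length;
  context.source = context.source.slice(count);
}
```

Because the cursor is a copy, a snapshot taken before parsing a node still holds the start position after the context has moved on, which is what makes it usable for building a location range later.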

Create the root node of the AST

export function createRoot(
  children: TemplateChildNode[],
  loc = locStub
): RootNode {
  return {
    type: NodeTypes.ROOT,
    children,
    helpers: [],
    components: [],
    directives: [],
    hoists: [],
    imports: [],
    cached: 0,
    temps: 0,
    codegenNode: undefined,
    loc
  }
}

Looking at the code of createRoot, we can see that it returns a root node object of type RootNode, and the children argument we pass in becomes the children property of the root node. This is easy to picture if you think of a tree data structure. So the key to generating the ast lies in the parseChildren function. Even without reading its source, the name suggests that it parses child nodes. Next, let's look at parseChildren, the most critical function in AST parsing. As usual, I have trimmed the logic in the function body to make it easier to follow.

Parse child nodes

function parseChildren(
  context: ParserContext,
  mode: TextModes,
  ancestors: ElementNode[]
): TemplateChildNode[] {
  const parent = last(ancestors) // the parent of the nodes being parsed
  const ns = parent ? parent.ns : Namespaces.HTML
  const nodes: TemplateChildNode[] = [] // stores the parsed nodes

  // Keep parsing nodes until the end of the current tag is reached
  while (!isEnd(context, mode, ancestors)) {/* logic omitted */}

  // Handle whitespace to make the output more compact
  let removedWhitespace = false
  if (mode !== TextModes.RAWTEXT && mode !== TextModes.RCDATA) {/* logic omitted */}

  // Drop the removed whitespace nodes and return the parsed nodes array
  return removedWhitespace ? nodes.filter(Boolean) : nodes
}

From the code above we can see that parseChildren receives three parameters: context, the parser context; mode, the text mode; and ancestors, the array of ancestor nodes. The function first takes the parent of the current node from the ancestors array, determines the namespace, and creates an empty array to store the parsed nodes. Then a while loop checks whether the end of the current tag has been reached. As long as it has not, the loop body classifies the remaining template source and parses it accordingly. After the loop comes a piece of logic that handles whitespace, and once that is done the parsed nodes array is returned. Now that we have a rough picture of how parseChildren runs, let's look at the core of the function: the logic inside the while loop.

Inside the while loop, the parser first checks the text mode, and only tries the branches described below when the mode is DATA or RCDATA. The tag-related branches additionally require the mode to be DATA.

The first case checks whether the "Mustache" syntax (double curly braces) of Vue templates needs to be parsed. If the current context is not inside a v-pre directive, which skips expression compilation, and the template source starts with the configured opening delimiter (context.options.delimiters[0], the double curly brace by default), the interpolation is parsed. This also shows that if you have special needs and do not want double curly braces for interpolation, you only need to change the delimiters option before compiling.
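
As a rough illustration of how configurable delimiters drive this branch, here is a minimal sketch. toyParseInterpolation is a hypothetical helper, not Vue's parseInterpolation, and it leaves out error reporting and location tracking.

```typescript
// Hypothetical helper: parse one interpolation at the start of `source`.
// Returns the trimmed expression and the remaining source, or undefined
// when `source` does not start with the opening delimiter or the closing
// delimiter is missing.
function toyParseInterpolation(
  source: string,
  delimiters: [string, string] = ["{{", "}}"]
): { expression: string; rest: string } | undefined {
  const [open, close] = delimiters;
  if (source.slice(0, open.length) !== open) return undefined;
  const closeIndex = source.indexOf(close, open.length);
  if (closeIndex === -1) return undefined;
  return {
    expression: source.slice(open.length, closeIndex).trim(),
    rest: source.slice(closeIndex + close.length)
  };
}
```

Passing a different pair, for example ["${", "}"], changes which prefix triggers the branch without touching any other part of the parser.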

Next, if the first character is "<" and the second character is "!", the parser tries to parse a comment. A source starting with <!-- is parsed as a comment, <!DOCTYPE is ignored by parsing it as a bogus comment, and <![CDATA[ is parsed as CDATA only outside the HTML namespace.

Then, if the second character is "/", the source starts with "</", which is how an end tag begins, so the parser tries to match an end tag. If the third character is ">", the tag name is missing: an error is reported and the parser advances three characters, skipping "</>".

If the source starts with "</" and the third character is an ASCII letter (the check is case-insensitive), the parser parses the end tag.

If the first character of the template source is "<" and the second character is an ASCII letter, the parseElement function is called to parse the element.

If none of these branches produced a node, the source is treated as text and parseText is called to parse it.

Finally, the generated node is added to the nodes array and returned at the end of the function.

This is the logic inside the while loop, the most important part of parseChildren. Along the way we saw how interpolations, comment nodes, start and end tags, and text content are parsed. The simplified code is shown below; you can read it alongside the explanation above. The comments in the original source are also very detailed.

while (!isEnd(context, mode, ancestors)) {
  const s = context.source
  let node: TemplateChildNode | TemplateChildNode[] | undefined = undefined

  if (mode === TextModes.DATA || mode === TextModes.RCDATA) {
    if (!context.inVPre && startsWith(s, context.options.delimiters[0])) {
      /* Not inside v-pre and the source starts with the opening delimiter `{{`: parse as interpolation */
      node = parseInterpolation(context, mode)
    } else if (mode === TextModes.DATA && s[0] === '<') {
      // The second character of the source is '!'
      if (s[1] === '!') {
        // Starts with '<!--': parse as a comment
        if (startsWith(s, '<!--')) {
          node = parseComment(context)
        } else if (startsWith(s, '<!DOCTYPE')) {
          // Starts with '<!DOCTYPE': ignore the DOCTYPE and parse it as a bogus comment
          node = parseBogusComment(context)
        } else if (startsWith(s, '<![CDATA[')) {
          // Starts with '<![CDATA[': parse CDATA, but only outside the HTML namespace
          if (ns !== Namespaces.HTML) {
            node = parseCDATA(context, ancestors)
          }
        }
      // The second character of the source is '/'
      } else if (s[1] === '/') {
        // The third character is '>': the end tag has no name, so report an error and advance three characters
        if (s[2] === '>') {
          emitError(context, ErrorCodes.MISSING_END_TAG_NAME, 2)
          advanceBy(context, 3)
          continue
        // The third character is an ASCII letter: parse the end tag
        } else if (/[a-z]/i.test(s[2])) {
          parseTag(context, TagType.End, parent)
          continue
        } else {
          // Anything else: parse as a bogus comment
          node = parseBogusComment(context)
        }
      // The second character is an ASCII letter (case-insensitive): parse as an element
      } else if (/[a-z]/i.test(s[1])) {
        node = parseElement(context, ancestors)

      // The second character is '?': parse as a bogus comment
      } else if (s[1] === '?') {
        node = parseBogusComment(context)
      } else {
        // None of the above: report that the first character of the tag name is invalid
        emitError(context, ErrorCodes.INVALID_FIRST_CHARACTER_OF_TAG_NAME, 1)
      }
    }
  }
  
  // If none of the branches above produced a node, parse the source as text
  if (!node) {
    node = parseText(context, mode)
  }
  
  // If node is an array, push each item into nodes; otherwise push the node itself
  if (isArray(node)) {
    for (let i = 0; i < node.length; i++) {
      pushNode(nodes, node[i])
    }
  } else {
    pushNode(nodes, node)
  }
}
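
The branch order of the loop can be condensed into a small classifier, which may be easier to experiment with than the full parser. This is only a sketch that mirrors the simplified excerpt above for DATA mode: classify and the Branch labels are hypothetical names, and instead of parsing anything it just reports which branch would be taken.

```typescript
// Hypothetical classifier mirroring the branch order of the simplified loop (DATA mode).
type Branch =
  | "interpolation" | "comment" | "bogus-comment" | "cdata"
  | "missing-end-tag-name" | "end-tag" | "element"
  | "invalid-tag-start" | "text";

function classify(
  s: string,
  inVPre = false,
  open = "{{",
  inHtmlNamespace = true
): Branch {
  if (!inVPre && s.slice(0, open.length) === open) return "interpolation";
  if (s[0] !== "<") return "text";
  if (s[1] === "!") {
    if (s.slice(0, 4) === "<!--") return "comment";
    if (s.slice(0, 9) === "<!DOCTYPE") return "bogus-comment";
    if (s.slice(0, 9) === "<![CDATA[" && !inHtmlNamespace) return "cdata";
    return "text"; // no node was produced, so the text fallback applies
  }
  if (s[1] === "/") {
    if (s[2] === ">") return "missing-end-tag-name";
    // `?? ""` matters: RegExp.test(undefined) would test the string "undefined".
    if (/[a-z]/i.test(s[2] ?? "")) return "end-tag";
    return "bogus-comment";
  }
  if (/[a-z]/i.test(s[1] ?? "")) return "element";
  if (s[1] === "?") return "bogus-comment";
  return "invalid-tag-start"; // an error is emitted, then the text fallback applies
}
```

Note that the letter checks use a case-insensitive regular expression, so "<DIV>" takes the element branch just like "<div>".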

Parse the template element

In the while loop, each branch assigns node the return value of a parse function for a particular node type. Here I will go through parseElement, the function that parses elements, in detail, because elements are what we use most often in templates.

Here is a trimmed version of the parseElement source, followed by a walkthrough of its logic.

function parseElement(
  context: ParserContext,
  ancestors: ElementNode[]
): ElementNode | undefined {
  // Parse the start tag
  const parent = last(ancestors)
  const element = parseTag(context, TagType.Start, parent)

  // Return immediately for self-closing tags and void tags, e.g. `<img>`, `<br>`, `<hr>`
  if (element.isSelfClosing || context.options.isVoidTag(element.tag)) {
    return element
  }

  // Parse the child nodes recursively
  ancestors.push(element)
  const mode = context.options.getTextMode(element, parent)
  const children = parseChildren(context, mode, ancestors)
  ancestors.pop()

  element.children = children

  // Parse the end tag
  if (startsWithEndTagOpen(context.source, element.tag)) {
    parseTag(context, TagType.End, parent)
  } else {
    emitError(context, ErrorCodes.X_MISSING_END_TAG, 0, element.loc.start)
    if (context.source.length === 0 && element.tag.toLowerCase() === 'script') {
      const first = children[0]
      if (first && startsWith(first.loc.source, '<!--')) {
        emitError(context, ErrorCodes.EOF_IN_SCRIPT_HTML_COMMENT_LIKE_TEXT)
      }
    }
  }
  // Update the element's location info
  element.loc = getSelection(context, element.loc.start)

  return element
}

First we get the parent node of the current node, and then call parseTag to parse the start tag.

The execution of the parseTag function is roughly as follows:

Match the tag name
Parse the element's attributes and store them in the props property
Check for a v-pre directive; if present, set the inVPre property of the context to true
Detect a self-closing tag; if the tag is self-closing, set the isSelfClosing property to true
Determine the tagType: ELEMENT, COMPONENT, or SLOT
Return the generated element object
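
The steps above can be sketched with a small regex-based function. This is not the source of parseTag: toyParseStartTag, ToyAttr, and ToyTag are hypothetical names, the attribute handling is deliberately naive, and the component check (any uppercase letter in the tag name) is an assumption that is much cruder than Vue's real logic.

```typescript
// Hypothetical, regex-based stand-in for the steps listed above.
interface ToyAttr {
  name: string;
  value: string | undefined; // undefined for boolean attributes such as `disabled`
}

interface ToyTag {
  tag: string;
  props: ToyAttr[];
  isSelfClosing: boolean;
  hasVPre: boolean;
  tagType: "ELEMENT" | "COMPONENT" | "SLOT";
  rest: string; // source remaining after the tag
}

function toyParseStartTag(source: string): ToyTag | undefined {
  // 1. Match the tag name.
  const nameMatch = /^<([a-z][^\t\r\n\f />]*)/i.exec(source);
  if (!nameMatch) return undefined;
  const tag = nameMatch[1];
  let rest = source.slice(nameMatch[0].length);

  // 2. Collect attributes until `>` or `/>` is reached.
  const props: ToyAttr[] = [];
  const attrRe = /^\s*([^\s"'<>\/=]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'=<>]+)))?/;
  let m: RegExpExecArray | null;
  while ((m = attrRe.exec(rest)) !== null) {
    props.push({ name: m[1], value: m[2] ?? m[3] ?? m[4] });
    rest = rest.slice(m[0].length);
  }

  // 3. Check for v-pre.
  const hasVPre = props.some(p => p.name === "v-pre");

  // 4. Detect a self-closing tag and consume the end of the tag.
  const closeMatch = /^\s*(\/?)>/.exec(rest);
  if (!closeMatch) return undefined;
  const isSelfClosing = closeMatch[1] === "/";
  rest = rest.slice(closeMatch[0].length);

  // 5. Decide the tag type (assumption: uppercase letter means component).
  let tagType: ToyTag["tagType"] = "ELEMENT";
  if (tag === "slot") tagType = "SLOT";
  else if (/[A-Z]/.test(tag)) tagType = "COMPONENT";

  // 6. Return the element-like object.
  return { tag, props, isSelfClosing, hasVPre, tagType, rest };
}
```

The sketch returns a flag for v-pre instead of mutating a shared context, which keeps it self-contained; in the real parser the flag lives on the context so that nested parsing can see it.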

For reasons of space I will not post the source of parseTag here; interested readers can look it up themselves.

After obtaining the element object, parseElement checks whether the element is a self-closing tag or a void tag such as <img>, <br>, or <hr>. If so, the element object is returned directly.

Otherwise the element's child nodes are parsed: the element is pushed onto the ancestors stack, and parseChildren is called recursively.

const parent = last(ancestors)

Looking back at this line, which appears in both parseChildren and parseElement, you can see that once the element has been pushed onto the stack, the parent obtained inside the recursive call is the current element. After the children are parsed, ancestors.pop() removes the current element from the stack, and the parsed children are assigned to the element's children property, which completes the parsing of the element's child nodes. It is a neat design.

Finally, the end tag is matched, the loc location information of the element is set, and the parsed element object is returned.

Example: template element parsing

Consider the template below. The figure shows the state of the ancestor stack at each step of parsing.

<div>
  <p>Hello World</p>
</div>

[Figure: parseElement]

The yellow rectangle in the figure is the stack. When parsing begins, parseChildren first encounters the div tag and calls parseElement. The div element is parsed by parseTag and pushed onto the stack, and its child nodes are parsed recursively. parseChildren is called a second time, encounters the p element, and calls parseElement, which pushes the p tag onto the stack. At this point the stack holds two tags, div and p. To parse the children of p, parseChildren is called a third time. This time no tag matches and no node is produced by the branches, so parseText generates a text node with the content "Hello World" and returns it.

Once this text node has been added to the children property of the p tag, the children of p are fully parsed. The p element is popped off the ancestor stack, its end tag is parsed, and the element object for the p tag is returned.

The node for the p tag has now been generated and is returned to the parseChildren call that was parsing the children of div.

After the div tag receives the node of the p tag, it adds it to its own children property and is popped off the stack. The ancestor stack is now empty. Once the end tag of the div has been parsed, the div element is returned.

Finally, the first call to parseChildren receives the node object for the div and returns its result. That result is passed to createRoot as the children argument, which produces the root node object and completes the AST parsing.
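
The walkthrough above can be reproduced with a toy recursive-descent parser. This is a minimal sketch for well-formed input only: toyParseChildren, toyParseElement, and the trace array are hypothetical, attributes and comments are not supported, and skipping whitespace-only text is a crude stand-in for Vue's whitespace handling.

```typescript
// Toy recursive-descent parser for input such as `<div><p>Hello World</p></div>`.
interface ToyText { type: "text"; content: string }
interface ToyElement { type: "element"; tag: string; children: ToyChild[] }
type ToyChild = ToyText | ToyElement;

interface ToyState {
  source: string;  // the part of the template that is still unparsed
  trace: string[]; // snapshots of the ancestor stack, for illustration only
}

const VOID_TAGS = ["img", "br", "hr"];

function toyParseChildren(state: ToyState, ancestors: ToyElement[]): ToyChild[] {
  const nodes: ToyChild[] = [];
  // Stop at the end of input or at any end tag.
  while (state.source.length > 0 && state.source.slice(0, 2) !== "</") {
    if (/^<[a-z]/i.test(state.source)) {
      nodes.push(toyParseElement(state, ancestors));
    } else {
      // Text runs until the next `<`; searching from index 1 guarantees progress.
      const end = state.source.indexOf("<", 1);
      const length = end === -1 ? state.source.length : end;
      const content = state.source.slice(0, length);
      state.source = state.source.slice(length);
      if (content.trim() !== "") nodes.push({ type: "text", content });
    }
  }
  return nodes;
}

function toyParseElement(state: ToyState, ancestors: ToyElement[]): ToyElement {
  // Parse the start tag, e.g. `<div>` or `<br/>`.
  const open = /^<([a-z][^\s/>]*)\s*\/?>/i.exec(state.source);
  if (!open) throw new Error("expected a start tag without attributes");
  const element: ToyElement = { type: "element", tag: open[1], children: [] };
  state.source = state.source.slice(open[0].length);
  if (open[0].slice(-2) === "/>" || VOID_TAGS.indexOf(element.tag) !== -1) {
    return element;
  }

  // Push, parse the children recursively, then pop.
  ancestors.push(element);
  state.trace.push(ancestors.map(a => a.tag).join(" > "));
  element.children = toyParseChildren(state, ancestors);
  ancestors.pop();

  // Consume the matching end tag.
  const close = `</${element.tag}>`;
  if (state.source.slice(0, close.length) !== close) {
    throw new Error(`missing end tag for <${element.tag}>`);
  }
  state.source = state.source.slice(close.length);
  return element;
}
```

Running it on the example template with an empty ancestors array records the stack as "div" and then "div > p" in state.trace, and the ancestors array is empty again once the outer call returns.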

Postscript

In this article we started from the baseParse function, which is called to generate the ast and returns the result of createRoot, and then walked through parseChildren, the function that parses child nodes, looking in detail at one of its specific parsers, parseElement. Finally, a simple template example showed how Vue's parser works and how the ancestor stack changes along the way, giving a fuller picture of the parser's workflow.

If this article helped you understand the workflow of the parser in Vue3, I hope you will give it a like. ❤️


Originalix
