Full-link tracking for Node.js applications: full-link information acquisition

Full-link tracking technology has two core elements: full-link information acquisition and full-link information storage and display.

Node.js applications are no exception. The topic is covered in two articles: this first article introduces full-link information acquisition for Node.js applications, and the second introduces full-link information storage and display.

1. Node.js application full link tracking system

Setting Serverless aside, mainstream Node.js architectures in the industry currently follow one of two designs:

General architecture: only SSR and BFF, without servers or microservices;
Full-scenario architecture: SSR, BFF, servers, and microservices.

The structure description diagram corresponding to the above two schemes is shown in the following figure:

Under both architectures, Node.js faces the same problem:

As the request link grows longer and more services are called, including various microservice calls, the following needs arise:

How to quickly pinpoint the problem when a request throws an exception;
How to quickly find the cause when a request responds slowly;
How to quickly locate the root cause of a problem from the log files.

To meet these needs, we need a technology that aggregates the key information of each request and chains all the request links together, so that we know how many service and microservice calls a request makes, and in which request context a given service or microservice is called.

This technology is the full link tracking of Node.js applications. It is an indispensable technical guarantee for Node.js in complex server-side business scenarios.

In short, Node.js applications need full-link tracking. Having covered why it is necessary, the rest of this article introduces how to obtain full-link information in a Node.js application.

2. Full link information acquisition

Acquiring full-link information is the most important part of full-link tracking technology. Storage and display can only follow once acquisition is complete.

In multi-threaded languages such as Java and Python, full-link information acquisition relies on a thread context such as ThreadLocal. Node.js is single-threaded and completes asynchronous operations through IO callbacks, so there is no per-thread context to lean on, which makes obtaining full-link information inherently difficult. So how can this problem be solved?

3. Industry solutions

Because Node.js is designed around a single thread and non-blocking IO, there are so far four main solutions for full-link information acquisition:

domain: Node.js API;
zone.js: a product of the Angular community;
Explicit delivery: passing the context manually or mounting it in middleware;
Async Hooks: Node.js API.

Of these four solutions, domain has been abandoned because of serious memory leaks. The implementation of zone.js is heavy-handed and its API is relatively obscure; its most critical drawback is that monkey patching can only intercept APIs, not language-level features. Explicit delivery is too cumbersome and intrusive. On balance, the best solution is the fourth one, Async Hooks, which has the following advantages:

It is a core module added in Node.js 8.x, is also used by the official Node.js maintainers, and has no memory leaks;
It is well suited to implicit link tracking, is minimally intrusive, and is currently the best solution for implicit tracking;
It provides an API to track the life cycle of asynchronous resources in Node.js;
Context relationships can be built with async_hooks.
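
For contrast, the explicit delivery approach from the list of solutions can be sketched as follows. This is only an illustration of why it is intrusive: the TraceContext shape and the three function names are hypothetical and do not come from any real framework.

```typescript
// Hypothetical context shape; the field names are assumptions for illustration.
interface TraceContext {
  traceId: string
  path: string[]
}

// Every function on the link has to accept the context and forward it by hand.
function handleRequest(traceId: string): TraceContext {
  const ctx: TraceContext = { traceId, path: ['handleRequest'] }
  return callBff(ctx)
}

function callBff(ctx: TraceContext): TraceContext {
  ctx.path.push('callBff')
  return callMicroservice(ctx)
}

function callMicroservice(ctx: TraceContext): TraceContext {
  ctx.path.push('callMicroservice')
  return ctx
}

const result = handleRequest('req-1')
```

Each new layer needs an extra parameter, and a single function that forgets to forward ctx breaks the link. Implicit tracking aims to remove exactly this burden.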

With the advantages covered, let's look at how to obtain full-link information through Async Hooks.

4. Async Hooks [asynchronous hooks]

4.1 Async Hooks concept

Async Hooks is a core module added in Node.js v8.x. It provides APIs to track the life cycle of asynchronous resources in Node.js, which helps us correctly trace the processing logic and relationships of asynchronous calls. In code, writing import asyncHook from 'async_hooks' is enough to import the async_hooks module.

Summarized in one sentence: async_hooks is used to track the life cycle of asynchronous resources in Node.js.

At the time of writing, the current stable version of Node.js is v14.17.0. The figure below compares the Async Hooks API across versions:

The figure shows that the API has changed a lot. This is because, from version 8 through version 14, async_hooks has remained at Stability: 1 - Experimental.

Stability: 1 - Experimental: the feature is still under development; future changes may not be backward compatible, and the feature may even be removed. Using it in a production environment is not recommended.

That is not a blocker, though. Trusting the official team, we implement our full-link information acquisition solution on the Node.js v9.x API. For an introduction to the Async Hooks API and its basic usage, see the official documentation; the core knowledge is explained below.

The following sections systematically introduce the design and implementation of the full-link information acquisition solution based on Async Hooks, referred to below as zone-context.

4.2 Understand the core knowledge of async_hooks

Before introducing zone-context, we need a correct understanding of the core knowledge of async_hooks, summarized in the following six points:

Every function, whether asynchronous or synchronous, provides a context, which we call an async scope. Recognizing this is essential to understanding async_hooks;
Each async scope has an asyncId that identifies it. Within the same async scope the asyncId is always the same. Each time an asynchronous resource is created, the asyncId is incremented automatically, so it is globally unique;
Each async scope has a triggerAsyncId, which indicates which async scope triggered the current one;
With asyncId and triggerAsyncId we can trace the entire asynchronous call relationship and link, which is the core of full-link tracking;
The async_hooks.createHook function registers listener functions for init and the other events that occur during the life cycle of each asynchronous resource;
The same async scope may be called and executed multiple times. However many times it runs, its asyncId stays the same. Through the listener functions we can easily track how many times it runs, when it runs, and in which context.

These six points are essential to understanding async_hooks. It is precisely because of these characteristics that async_hooks can handle full-link information acquisition for Node.js applications so well.
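
The relationship between asyncId and triggerAsyncId can be observed directly with the AsyncResource class exported by async_hooks. The following is a minimal sketch rather than part of zone-context; the type name DemoResource is made up for the example.

```typescript
import { AsyncResource, executionAsyncId } from 'async_hooks'

// asyncId of the async scope that is running right now.
const outerId = executionAsyncId()

// Creating a resource allocates a new asyncId. Its triggerAsyncId defaults to
// the asyncId of the scope in which it was created.
const resource = new AsyncResource('DemoResource')
const createdFromOuter = resource.triggerAsyncId() === outerId

// Inside runInAsyncScope, the current asyncId is the resource's own asyncId.
const insideId = resource.runInAsyncScope(() => executionAsyncId())
const insideMatches = insideId === resource.asyncId()

// Once the callback returns, the previous async scope is restored.
const restored = executionAsyncId() === outerId
```

All three booleans are expected to be true, which matches points 2 and 3 of the list: every scope has its own asyncId, and the triggerAsyncId points back to the scope that created it.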

With that in place, the design and implementation of zone-context are introduced below.

5. zone-context

5.1 Architecture design

The overall architecture design is shown in the figure below:

The core logic is as follows: when an asynchronous resource is created (called), async_hooks picks it up. The captured asynchronous resource information is then processed, assembled into the required data structure, and stored in the invoke tree. When the asynchronous resource ends, a gc operation is triggered to delete and reclaim data in the invoke tree that is no longer needed.

From the above core logic, we can know that this architecture design needs to implement the following three functions:

Asynchronous resource (call) monitoring
invoke tree
gc

Now let's introduce the realization of the above three functions one by one.

5.2 Asynchronous resource (call) monitoring

How do we monitor asynchronous calls?

This is done with async_hooks, which tracks the life cycle of Node.js asynchronous resources. The code is as follows:

import asyncHook from 'async_hooks'

asyncHook
  .createHook({
    init(asyncId, type, triggerAsyncId) {
      // Fired when an asynchronous resource is created (called)
    },
  })
  .enable()

As you can see, the implementation is very simple, yet it tracks every asynchronous operation.

In the section on the core knowledge of async_hooks, we mentioned that the entire asynchronous call relationship and link can be traced through asyncId and triggerAsyncId. Looking at the parameters of init, you will find that both asyncId and triggerAsyncId are present, and that they are passed implicitly rather than by hand. This means we can obtain both values in the init event on every asynchronous call. The invoke tree depends on these two parameters.
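
To see the init event fire, the hook can record its arguments into an array. The sketch below is an illustration under stated assumptions: the events array and the DemoResource type name are invented for the example. Events are collected in memory because, according to the Node.js documentation, calling console.log inside a hook callback is itself an asynchronous operation and leads to infinite recursion.

```typescript
import { AsyncResource, createHook, executionAsyncId } from 'async_hooks'

interface InitEvent {
  asyncId: number
  type: string
  triggerAsyncId: number
}

// Collected in memory; avoid console.log inside the hook callback.
const events: InitEvent[] = []

const hook = createHook({
  init(asyncId, type, triggerAsyncId) {
    events.push({ asyncId, type, triggerAsyncId })
  },
})
hook.enable()

const currentId = executionAsyncId()
// 'DemoResource' is a made-up type name used only for this sketch.
const resource = new AsyncResource('DemoResource')

hook.disable()

// The init event for the resource carries both ids without any manual passing.
const demoEvent = events.find((e) => e.type === 'DemoResource')
```

Here demoEvent.asyncId should equal resource.asyncId() and demoEvent.triggerAsyncId should equal currentId.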

After introducing asynchronous call monitoring, the implementation of invoke tree will be introduced below.

5.3 Combination of invoke tree design and asynchronous call monitoring

5.3.1 Design

The overall design idea of invoke tree is shown in the figure below:

The specific code is as follows:

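interface ITree {
  [key: string]: {
    // asyncId of the first asynchronous resource on the call link
    rootId: number
    // triggerAsyncId of the asynchronous resource
    pid: number
    // asyncIds of the asynchronous resources created by this asynchronous resource
    children: Array<number>
  }
}

const invokeTree: ITree = {}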

We create one large object, invokeTree, in which each property represents the complete call link of one asynchronous resource. The key and value of each property mean the following:

The key is the asyncId of the asynchronous resource.
The value is an object that aggregates all the link information for this asynchronous resource. See the comments in the code above for the meaning of each of its properties.

With this design, the key information of any asynchronous resource in the entire request link can be retrieved, and through rootId the context of the root node can be collected.
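
As an illustration of how this structure is read, the sketch below walks from any node back to the root by following pid. The sample data and the helper name pathToRoot are hypothetical; only the ITree shape is taken from the article.

```typescript
interface ITree {
  [key: string]: {
    rootId: number
    pid: number
    children: Array<number>
  }
}

// Hand-written sample: 3 is the root, 5 was triggered by 3, and 10 by 5.
const sampleTree: ITree = {
  3: { rootId: 3, pid: -1, children: [5] },
  5: { rootId: 3, pid: 3, children: [10] },
  10: { rootId: 3, pid: 5, children: [] },
}

// Follow pid upwards until a node is no longer found (the root has pid -1).
function pathToRoot(tree: ITree, asyncId: number): number[] {
  const path: number[] = []
  let current = asyncId
  let node = tree[current]
  while (node) {
    path.push(current)
    current = node.pid
    node = tree[current]
  }
  return path
}

const chain = pathToRoot(sampleTree, 10) // [10, 5, 3]
```

Because every node also stores rootId, the root context can be reached in one lookup; walking pid is only needed when the intermediate callers matter.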

5.3.2 Combined with asynchronous call monitoring

The invoke tree is now designed, but how do we associate asyncId and triggerAsyncId with invokeTree in the init event of asynchronous call monitoring?

The code is as follows:

asyncHook
  .createHook({
    init(asyncId, type, triggerAsyncId) {
      // Look up the parent node
      const parent = invokeTree[triggerAsyncId]
      if (parent) {
        invokeTree[asyncId] = {
          pid: triggerAsyncId,
          rootId: parent.rootId,
          children: [],
        }
        // Save the current node's asyncId into the parent node's children array
        invokeTree[triggerAsyncId].children.push(asyncId)
      }
    }
  })
  .enable()

The code above works in the following steps:

When an asynchronous call is detected, first check whether the invokeTree object contains a property whose key is triggerAsyncId;
If it does, the asynchronous call is on a tracked link, so it is stored: asyncId becomes the key, and the value is an object with the three properties pid, rootId, and children, whose meanings were described above;
The asyncId of the current asynchronous call is then appended to the children property of the triggerAsyncId entry in invokeTree;
If it does not, the asynchronous call is not on a tracked link, and nothing is stored in the invokeTree object.
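
The steps above can be replayed on a scripted sequence of events. The sketch below extracts the logic of the init callback into a plain function so that it runs without a real hook; the function name recordInit and the ids are assumptions made for the illustration.

```typescript
interface ITree {
  [key: string]: {
    rootId: number
    pid: number
    children: Array<number>
  }
}

// Same logic as the init callback above, written as a plain function.
// Returns true when the call was on a tracked link and has been stored.
function recordInit(tree: ITree, asyncId: number, triggerAsyncId: number): boolean {
  const parent = tree[triggerAsyncId]
  if (!parent) {
    return false // not on a tracked link: store nothing
  }
  tree[asyncId] = { pid: triggerAsyncId, rootId: parent.rootId, children: [] }
  parent.children.push(asyncId)
  return true
}

// Start with a single tracked root whose asyncId is 3.
const tree: ITree = { 3: { pid: -1, rootId: 3, children: [] } }

const tracked = recordInit(tree, 5, 3) // 3 is in the tree, so 5 is stored
const nested = recordInit(tree, 8, 5) // 5 is now in the tree, so 8 inherits rootId 3
const ignored = recordInit(tree, 9, 1) // 1 is not tracked, so 9 is skipped
```

After these three calls the tree holds the keys 3, 5, and 8, and node 8 carries rootId 3 even though it was triggered by 5.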

So far, the design of the invoke tree and how to combine asynchronous call monitoring have been introduced. The design and implementation of gc function will be introduced below.

5.4 gc

5.4.1 Purpose

We know that the number of asynchronous calls is very large. If gc operations are not performed, the invoke tree will become larger and larger, and the memory of the node application will be gradually filled with these data, so the invoke tree needs to be garbage collected.

5.4.2 Design

The gc design is as follows: when an asynchronous resource ends, garbage collection is triggered. It looks for all asynchronous resources triggered by this asynchronous resource, then searches recursively with the same logic until every reclaimable asynchronous resource has been found.

The gc code is as follows:

interface IRoot {
  [key: string]: Object
}
 
// Holds the context of each root node
const root: IRoot = {}
 
function gc(rootId: number) {
  if (!root[rootId]) {
    return
  }
 
  // Recursively collect the ids of all descendant nodes
  const collectionAllNodeId = (rootId: number) => {
    const {children} = invokeTree[rootId]
    let allNodeId = [...children]
    for (let id of children) {
      // Append the descendants of this child (each node has one parent, so ids never repeat)
      allNodeId = [...allNodeId, ...collectionAllNodeId(id)]
    }
    return allNodeId
  }
 
  const allNodes = collectionAllNodeId(rootId)
 
  for (let id of allNodes) {
    delete invokeTree[id]
  }
 
  delete invokeTree[rootId]
  delete root[rootId]
}

Core gc logic: collectionAllNodeId recursively finds the ids of all reclaimable asynchronous resources. The properties keyed by these ids are then deleted from invokeTree, and finally the root node is deleted.
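
The effect of this cleanup can be checked on a hand-built tree. Note that the sketch below is a variant, not the article's code: it swaps the recursion for an explicit stack, which avoids deep call stacks on long links. The function names and sample data are assumptions.

```typescript
interface ITree {
  [key: string]: {
    rootId: number
    pid: number
    children: Array<number>
  }
}

// Iterative variant of the id collection: a stack replaces the recursion.
function collectDescendants(tree: ITree, rootId: number): number[] {
  const found: number[] = []
  const stack: number[] = [rootId]
  while (stack.length > 0) {
    const id = stack.pop() as number
    const node = tree[id]
    if (!node) continue // already removed or never tracked
    for (const child of node.children) {
      found.push(child)
      stack.push(child)
    }
  }
  return found
}

// Delete every descendant of rootId, then the root node itself.
function removeLink(tree: ITree, rootId: number): void {
  for (const id of collectDescendants(tree, rootId)) {
    delete tree[id]
  }
  delete tree[rootId]
}

const sample: ITree = {
  3: { rootId: 3, pid: -1, children: [5, 6] },
  5: { rootId: 3, pid: 3, children: [7] },
  6: { rootId: 3, pid: 3, children: [] },
  7: { rootId: 3, pid: 5, children: [] },
  20: { rootId: 20, pid: -1, children: [] }, // a separate link that must survive
}

const descendants = collectDescendants(sample, 3).sort((a, b) => a - b)
removeLink(sample, 3)
```

After removeLink(sample, 3), only the unrelated link rooted at 20 remains in the tree.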

You may have noticed the root object declared in the code. What is it?

root holds the root node object that we set up when we start listening to an asynchronous call. Link information can be attached to this node object manually, so that additional tracking information, such as error information and elapsed time, can be added to the full-link tracking.

5.5 Everything is ready; only the final piece is missing

The asynchronous event listener, the invoke tree, and gc are all designed. So how do we tie them together? For example, if we want to monitor an asynchronous resource, how do we combine the invoke tree with that asynchronous resource?

Three functions complete the combination: ZoneContext, setZoneContext, and getZoneContext. They are introduced one by one below:

5.5.1 ZoneContext

This is a factory function used to create an asynchronous resource instance, the code is as follows:

// Factory function
async function ZoneContext(fn: Function) {
  // Create the asynchronous resource instance
  const asyncResource = new asyncHook.AsyncResource('ZoneContext')
  let rootId = -1
  return asyncResource.runInAsyncScope(async () => {
    try {
      rootId = asyncHook.executionAsyncId()
      // Save the context for rootId
      root[rootId] = {}
      // Initialize invokeTree
      invokeTree[rootId] = {
        pid: -1, // the triggerAsyncId of rootId defaults to -1
        rootId,
        children: [],
      }
      // Run the asynchronous call
      await fn()
    } finally {
      gc(rootId)
    }
  })
}

This function contains the following line of code:

const asyncResource = new asyncHook.AsyncResource('ZoneContext')

What does this line of code mean?

It creates an asynchronous resource instance of type ZoneContext. Through the properties and methods of this instance, the asynchronous resource can be controlled at a finer grain.

What is the asyncResource.runInAsyncScope method for?

Calling the instance's runInAsyncScope method wraps the incoming asynchronous call (fn) so that it runs inside the async scope of this resource. This guarantees that the code executed in that scope can be traced in the invokeTree we set up, giving finer-grained control over asynchronous calls. After execution finishes, gc is called to reclaim the memory.
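
Why the wrapping matters can be shown with a synchronous sketch: a resource created inside runInAsyncScope is attached to the tree, while one created outside is ignored. This is a simplified illustration, not the zone-context implementation; the local tree variable and the type names OutsideDemo, ZoneContextDemo, and InsideDemo are invented for the example.

```typescript
import { AsyncResource, createHook, executionAsyncId } from 'async_hooks'

interface ITree {
  [key: string]: { rootId: number; pid: number; children: Array<number> }
}
const tree: ITree = {}

// Same association logic as the init callback shown earlier in the article.
const hook = createHook({
  init(asyncId, _type, triggerAsyncId) {
    const parent = tree[triggerAsyncId]
    if (parent) {
      tree[asyncId] = { pid: triggerAsyncId, rootId: parent.rootId, children: [] }
      parent.children.push(asyncId)
    }
  },
})
hook.enable()

// Created outside any tracked scope: its trigger is not in the tree.
const outside = new AsyncResource('OutsideDemo')

const rootResource = new AsyncResource('ZoneContextDemo')
const inside = rootResource.runInAsyncScope(() => {
  const rootId = executionAsyncId() // equals rootResource.asyncId()
  tree[rootId] = { pid: -1, rootId, children: [] }
  // Created inside the scope: its trigger is rootId, so init attaches it.
  return new AsyncResource('InsideDemo')
})

hook.disable()
```

The tree ends up with two entries: the root registered inside the scope and the InsideDemo resource as its child. OutsideDemo never appears.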

5.5.2 setZoneContext

Used to set additional tracking information for asynchronous calls. The code is as follows:

function setZoneContext(obj: Object) {
  const curId = asyncHook.executionAsyncId()
  const root = findRootVal(curId)
  // Do nothing when the current call is not on a tracked link
  if (root) {
    Object.assign(root, obj)
  }
}

Object.assign(root, obj) merges the passed obj into the root context object of the link that the current asyncId (curId) belongs to. In this way, we can attach whatever information we want to track to the asynchronous calls we care about.

5.5.3 getZoneContext

Used to get the root context object (the value stored under rootId) for the current asynchronous call. The code is as follows:

function findRootVal(asyncId: number) {
  const node = invokeTree[asyncId]
  return node ? root[node.rootId] : null
}
function getZoneContext() {
  const curId = asyncHook.executionAsyncId()
  return findRootVal(curId)
}

Passing asyncId to findRootVal returns the value of the root object stored under the key rootId. This gives us back the information we set for tracking and closes the loop.
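
For comparison only: newer Node.js versions (v12.17 and v13.10 onwards) ship an AsyncLocalStorage class in the same async_hooks module that offers a similar set/get pattern out of the box. The sketch below is not part of zone-context, which targets the v9.x API; the TraceInfo shape and the helper readContext are assumptions made for the illustration.

```typescript
import { AsyncLocalStorage } from 'async_hooks'

// Hypothetical context shape used only in this sketch.
interface TraceInfo {
  msg: string
  code: number
}

const storage = new AsyncLocalStorage<TraceInfo>()

// Plays the role of getZoneContext: no context is passed in as a parameter.
function readContext(): TraceInfo | undefined {
  return storage.getStore()
}

// run() plays the role of ZoneContext plus setZoneContext: the store is
// visible to everything called inside the callback.
const seen = storage.run({ msg: 'trace info', code: 1 }, () => readContext())

// Outside of run() there is no store.
const outsideRun = readContext()
```

The same store is also visible to asynchronous calls started inside the callback, which is the behavior zone-context builds by hand with invokeTree and root.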

So far, we have explained the core design and implementation of Node.js application full-link information acquisition. The logic is a bit abstract, and more thinking and understanding are needed to have a deeper grasp of the acquisition of full-link tracking information.

Finally, we use the design and implementation of this full link tracking to show a tracking demo.

5.6 Use zone-context

5.6.1 Determine the nesting relationship of asynchronous calls

In order to better explain the nesting relationship of asynchronous calls, we have simplified it here, and no invoke tree is output. The example code is as follows:

import fs from 'fs'

// Track the asynchronous call to function A
ZoneContext(async () => {
  await A()
})

// Asynchronous function A makes an asynchronous call to function B
async function A() {
  // Print the asyncId of function A
  fs.writeSync(1, `asyncId of function A -> ${asyncHook.executionAsyncId()}\n`)
  Promise.resolve().then(() => {
    // Print the asyncId while A's asynchronous call is running
    fs.writeSync(1, `asyncId when A runs async promiseA -> ${asyncHook.executionAsyncId()}\n`)
    B()
  })
}

// Asynchronous function B makes an asynchronous call to function C
async function B() {
  // Print the asyncId of function B
  fs.writeSync(1, `asyncId of function B -> ${asyncHook.executionAsyncId()}\n`)
  Promise.resolve().then(() => {
    // Print the asyncId while B's asynchronous call is running
    fs.writeSync(1, `asyncId when B runs async promiseB -> ${asyncHook.executionAsyncId()}\n`)
    C()
  })
}

// Asynchronous function C
function C() {
  const obj = getZoneContext()
  // Print the asyncId of function C
  fs.writeSync(1, `asyncId of function C -> ${asyncHook.executionAsyncId()}\n`)
  Promise.resolve().then(() => {
    // Print the asyncId while C's asynchronous call is running
    fs.writeSync(1, `asyncId when C runs async promiseC -> ${asyncHook.executionAsyncId()}\n`)
  })
}

The output is:

asyncId of function A -> 3
asyncId when A runs async promiseA -> 8
asyncId of function B -> 8
asyncId when B runs async promiseB -> 13
asyncId of function C -> 13
asyncId when C runs async promiseC -> 16

Just from the output, the following can be derived:

After function A makes its asynchronous call, the asyncId is 8, and the asyncId of function B is also 8, which shows that B is called by A;
After function B makes its asynchronous call, the asyncId is 13, and the asyncId of function C is also 13, which shows that C is called by B;
After function C makes its asynchronous call, the asyncId is 16, and no other function has an asyncId of 16, which means C calls no other function;
From these three points, the nesting relationship of asynchronous calls on this link is: A -> B -> C.

At this point, we can clearly and quickly see who called whom.

5.6.2 Additional settings for tracking information

On the basis of the above example code, add the following code:

ZoneContext(async () => {
  const ctx = { msg: 'Full link tracking information', code: 1 }
  setZoneContext(ctx)
  await A()
})

function A() {
  // Same code as in the previous demo
}

function B() {
  // Same code as in the previous demo
  D()
}

// Asynchronous function C
function C() {
  const obj = getZoneContext()
  Promise.resolve().then(() => {
    fs.writeSync(1, `getZoneContext in C -> ${JSON.stringify(obj)}\n`)
  })
}

// Synchronous function D
function D() {
  const obj = getZoneContext()
  fs.writeSync(1, `getZoneContext in D -> ${JSON.stringify(obj)}\n`)
}

The output is as follows:

getZoneContext in D -> {"msg":"Full link tracking information","code":1}

getZoneContext in C -> {"msg":"Full link tracking information","code":1}

As the output shows, the tracking information is set before function A runs; A then calls B, and B calls C and D. Both C and D can access the tracking information that was set.

This shows that when locating and analyzing problems in nested asynchronous calls, the key tracking information set at the top level can be obtained through getZoneContext. An exception in a nested asynchronous call can therefore be quickly traced back to the top-level asynchronous call that caused it.

5.6.3 Invoke tree with complete tracking information

The example code is as follows:

ZoneContext(async () => {
  await A()
})
async function A() {
  Promise.resolve().then(() => {
    fs.writeSync(1, `invokeTree when function A makes its asynchronous call -> ${JSON.stringify(invokeTree)}\n`)
    B()
  })
}
async function B() {
  Promise.resolve().then(() => {
    fs.writeSync(1, `invokeTree when function B makes its asynchronous call -> ${JSON.stringify(invokeTree)}\n`)
  })
}

The output is as follows:

invokeTree when function A makes its asynchronous call -> {"3":{"pid":-1,"rootId":3,"children":[5,6,7]},"5":{"pid":3,"rootId":3,"children":[10]},"6":{"pid":3,"rootId":3,"children":[9]},"7":{"pid":3,"rootId":3,"children":[8]},"8":{"pid":7,"rootId":3,"children":[]},"9":{"pid":6,"rootId":3,"children":[]},"10":{"pid":5,"rootId":3,"children":[]}}

invokeTree when function B makes its asynchronous call -> {"3":{"pid":-1,"rootId":3,"children":[5,6,7]},"5":{"pid":3,"rootId":3,"children":[10]},"6":{"pid":3,"rootId":3,"children":[9]},"7":{"pid":3,"rootId":3,"children":[8]},"8":{"pid":7,"rootId":3,"children":[11,12]},"9":{"pid":6,"rootId":3,"children":[]},"10":{"pid":5,"rootId":3,"children":[]},"11":{"pid":8,"rootId":3,"children":[]},"12":{"pid":8,"rootId":3,"children":[13]},"13":{"pid":12,"rootId":3,"children":[]}}

According to the output results, the following information can be derived:

1. The rootId of this asynchronous call link (the initial asyncId, which is also the value of the top node) is 3.

2. When function A makes its asynchronous call, the call link is as shown in the figure below:

3. When function B makes its asynchronous call, the call link is as shown in the figure below:

The call link diagrams clearly show the relationships and ordering among all the asynchronous calls, which provides strong technical support for troubleshooting and performance analysis of asynchronous calls.
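
A call link like the ones in the figures can also be produced as text from the invokeTree data. The sketch below is one possible way to do it; the helper name renderTree is hypothetical, and the data is copied from the first invokeTree output above.

```typescript
interface ITree {
  [key: string]: { rootId: number; pid: number; children: Array<number> }
}

// Render a node and all of its descendants, indenting two spaces per level.
function renderTree(tree: ITree, id: number, depth = 0): string[] {
  const node = tree[id]
  if (!node) return []
  const lines = [`${'  '.repeat(depth)}${id}`]
  for (const child of node.children) {
    lines.push(...renderTree(tree, child, depth + 1))
  }
  return lines
}

// invokeTree as printed when function A makes its asynchronous call.
const treeAtA: ITree = {
  3: { pid: -1, rootId: 3, children: [5, 6, 7] },
  5: { pid: 3, rootId: 3, children: [10] },
  6: { pid: 3, rootId: 3, children: [9] },
  7: { pid: 3, rootId: 3, children: [8] },
  8: { pid: 7, rootId: 3, children: [] },
  9: { pid: 6, rootId: 3, children: [] },
  10: { pid: 5, rootId: 3, children: [] },
}

const text = renderTree(treeAtA, 3).join('\n')
// 3
//   5
//     10
//   6
//     9
//   7
//     8
```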

6. Summary

At this point, the design, implementation, and demo cases for obtaining full-link information from Node.js applications have all been introduced. Acquiring full-link information is the most important part of a full-link tracking system. Once the information has been acquired, the next step is to store and display it.

The next article will explain how to store and display the acquired information in a professional and user-friendly way, based on the open-source OpenTracing protocol.

Author: vivo Internet front-end team-Yang Kun