Developing a Front-End Error and Performance Monitoring SDK in TypeScript

Front-end error monitoring and performance data matter a great deal for business stability. No matter how careful we are during development, exceptions in the production environment are inevitable, and we often only learn about them after they have happened online. Page performance data directly reflects the user experience, so collecting it is just as important.

Complete third-party solutions currently include Sentry internationally, and Fundebug and FrontJS in China. They provide a front-end SDK plus a data service, usually with a free quota; beyond it you need a paid plan. The front-end SDK monitors errors and performance on the client. On the back-end service, users create applications, each application is assigned an APPKEY, and the SDK then reports data automatically.

This article leaves the data service aside and focuses only on front-end monitoring: how a web page can monitor and collect this data, and how to combine these features into a monitoring SDK written in TypeScript.

Since the goal is to collect data, we first need to decide which data we might need. Currently that includes:

Page error data
Page resource loading
Page performance data
API request data
Device and browser data
Page access data
User behavior data
...

Let's look at how to obtain each of these:

Page error data

window.onerror: hooking it (AOP style) catches runtime errors, whether they are thrown synchronously or in asynchronous callbacks.
window.onerror does not catch resource loading errors, because those error events do not bubble. They can be caught with window.addEventListener("error", handler, true) in the capture phase. Since this listener also receives JS runtime errors, events must be filtered by target to avoid reporting the same error twice.
window.onerror does not catch unhandled Promise rejections; those are caught with the unhandledrejection event.

Page resource loading errors

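// onResourceError: the SDK's callback for resource failures, defined elsewhere
declare const onResourceError: ((url: string) => void) | undefined;

window.addEventListener(
  "error",
  function (event) {
    const target: any = event.target;
    const isElementTarget =
      target instanceof HTMLScriptElement ||
      target instanceof HTMLLinkElement ||
      target instanceof HTMLImageElement;
    // JS runtime errors are dispatched with window as the target: skip them here
    if (!isElementTarget) return;

    const url = target.src || target.href;
    onResourceError?.(url);
  },
  true // capture phase: resource error events do not bubble up to window
);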

Page logic errors and unhandled Promise rejections

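// onError / onUnHandleRejection: the SDK's callbacks, defined elsewhere
declare const onError:
  | ((info: { msg: string | Event; url?: string; line?: number; column?: number; error?: Error }) => void)
  | undefined;
declare const onUnHandleRejection: ((e: PromiseRejectionEvent) => void) | undefined;

// Keep any handlers the page has already installed
const oldOnError = window.onerror;
const oldUnHandleRejection = window.onunhandledrejection;

window.onerror = function (...args) {
  // Call the original handler first and keep its return value
  // (returning true suppresses the browser's default error logging)
  const result = oldOnError?.apply(window, args);

  const [msg, url, line, column, error] = args;
  onError?.({ msg, url, line, column, error });
  return result;
};

window.onunhandledrejection = function (e: PromiseRejectionEvent) {
  if (oldUnHandleRejection) {
    oldUnHandleRejection.call(window, e);
  }

  onUnHandleRejection?.(e);
};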

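In Vue, errors thrown inside components are caught by Vue itself and do not reach window.onerror, so register Vue.config.errorHandler = function (err, vm, info) {} instead (app.config.errorHandler in Vue 3). The handler also receives the component instance and extra information, giving more context.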
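As a rough illustration, the hook can report the error while still calling any handler the application registered earlier. This is a sketch, not the SDK's useVueErrorListener: VueLike, installVueErrorHandler and the report callback are hypothetical names, and the structural type only models the Vue 2 Vue.config object (Vue 3 exposes the same hook on app.config with a slightly different instance type).

```typescript
// Shape of Vue 2's global error hook: (error, component instance, info string).
type VueErrorHandler = (err: unknown, vm: unknown, info: string) => void;

// Only the part of the Vue 2 constructor this sketch touches.
interface VueLike {
  config: { errorHandler?: VueErrorHandler };
}

// Hypothetical report callback; in the SDK this is where "vuejsError" would be emitted.
type ReportVueError = (data: { message: string; info: string; vm: unknown }) => void;

function installVueErrorHandler(vue: VueLike, report: ReportVueError): void {
  const previous = vue.config.errorHandler; // handler the app may have registered already

  vue.config.errorHandler = (err, vm, info) => {
    report({
      message: err instanceof Error ? err.message : String(err),
      info, // Vue's description of where the error happened, e.g. a lifecycle hook
      vm, // component instance, useful for extracting the component name
    });
    // Once a handler is set, Vue no longer logs the error itself, so keep it visible.
    if (previous) previous(err, vm, info);
    else console.error(err);
  };
}
```

In a Vue 2 app this would be called once, before new Vue(...), e.g. installVueErrorHandler(Vue, data => ...).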

In React 16 and later, a class component can implement the componentDidCatch lifecycle method to act as an error boundary; it receives errors thrown while rendering its child components, together with component stack information:

componentDidCatch(error, info) {
    console.log(error, info);
}

Page performance data

The performance metrics we usually focus on are:

White screen time: from the moment the user enters the address and presses Enter until the page starts to show content;
First screen time: from the moment the user enters the address and presses Enter until the content of the first screen has been rendered;
User-operable time: when DOMReady fires and the page starts responding to click events;
Total load time: when window.onload fires.

White screen time

White screen time runs from the moment the user enters the site (typing a URL, refreshing, following a link, etc.) until the page first displays content.
This covers the DNS lookup, establishing the TCP connection (plus the TLS handshake when HTTPS is used), sending the first HTTP request, receiving the HTML document, and finishing parsing of the document's head.

First screen time

First screen time is harder to measure because it involves images, asynchronous rendering, and more. Watching a page load, we can see that the main factor affecting the first screen is image loading, so the time the first screen finishes rendering can be obtained by recording when the images in the first screen finish loading.

If the page contains iframes, their loading time also has to be taken into account.
In IE, GIF images may fire the load event repeatedly, so the duplicates must be excluded.
With asynchronous rendering, first screen time should be measured after the asynchronously loaded data has been inserted.
Important CSS background images can be measured by requesting their URLs from JS (the browser will not download them twice).
If there are no images, use the JS execution time as the first screen time, i.e. the time the text appears.
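The image-based rules above can be reduced to a small calculation. The sketch below assumes the page code has already recorded, for each image, its URL, its offset from the top of the viewport, and the time its load event fired; ImageSample and estimateFirstScreenTime are hypothetical names, and iframes and asynchronous rendering are not handled.

```typescript
// One observed image: its URL, its offset from the top of the viewport, and when its
// load event fired (all times in ms on the same clock, e.g. performance.now()).
interface ImageSample {
  src: string;
  top: number;
  loadTime: number;
}

// Estimate first screen time as the latest load among images that start inside
// the first viewport; with no such images, return `fallback` (e.g. the time the
// page's JS ran and text appeared).
function estimateFirstScreenTime(images: ImageSample[], viewportHeight: number, fallback: number): number {
  const firstLoadBySrc = new Map<string, number>();
  for (const img of images) {
    if (img.top >= viewportHeight) continue; // below the first screen
    // Keep only the first load event per URL (IE may fire load repeatedly for GIFs).
    if (!firstLoadBySrc.has(img.src)) firstLoadBySrc.set(img.src, img.loadTime);
  }
  if (firstLoadBySrc.size === 0) return fallback;
  return Math.max(...Array.from(firstLoadBySrc.values()));
}
```

In a page, each sample could be captured in an img element's load handler from getBoundingClientRect().top and performance.now().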

User operable time

This is when the DOM has been parsed, so the DOMReady time can be used, because event handlers are usually bound at that point.

Getting performance data on the web is simple: use the Performance interface built into the browser.

Page performance data collection

The Performance interface exposes performance-related information about the current page. It is part of the High Resolution Time API and also integrates the Performance Timeline API, Navigation Timing API, User Timing API, and Resource Timing API.

Many of these timestamps come in start/end pairs, so the duration of each key stage of page loading is simply their difference. The commonly used ones are listed below. performance.timing belongs to the original Navigation Timing API (Level 1), which is deprecated in favor of PerformanceNavigationTiming but is still widely supported.

const timingInfo = window.performance.timing;

// DNS lookup time
timingInfo.domainLookupEnd - timingInfo.domainLookupStart;

// TCP connection time
timingInfo.connectEnd - timingInfo.connectStart;

// Time to first byte (TTFB)
timingInfo.responseStart - timingInfo.navigationStart;

// *: DOMReady time (corresponds to the DOMContentLoaded event)
timingInfo.domContentLoadedEventStart - timingInfo.navigationStart;

// Document download time
timingInfo.responseEnd - timingInfo.responseStart;

// Time spent preparing the new page
timingInfo.fetchStart - timingInfo.navigationStart;

// Redirect time
timingInfo.redirectEnd - timingInfo.redirectStart;

// AppCache time
timingInfo.domainLookupStart - timingInfo.fetchStart;

// Time spent unloading the previous document
timingInfo.unloadEventEnd - timingInfo.unloadEventStart;

// Request time
timingInfo.responseEnd - timingInfo.requestStart;

// From the end of the response until the DOM is interactive
timingInfo.domInteractive - timingInfo.responseEnd;

// DOM processing time (domInteractive to domComplete)
timingInfo.domComplete - timingInfo.domInteractive;

// *: total time from navigation start to the end of load
timingInfo.loadEventEnd - timingInfo.navigationStart;

// *: white screen time
timingInfo.responseStart - timingInfo.fetchStart;

// *: first screen time (approximation)
timingInfo.domComplete - timingInfo.fetchStart;
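The same stages can also be measured with the newer Navigation Timing Level 2 entry (PerformanceNavigationTiming). The sketch below takes a plain object holding a subset of its fields so it can run outside a browser; NavigationTimingLike, navigationMetrics and the result keys are hypothetical names. Level 2 timestamps are relative to the start of navigation, so the differences against navigationStart above become the raw values.

```typescript
// Subset of PerformanceNavigationTiming used here. All values are milliseconds
// relative to the start of navigation, which is why TTFB, DOMReady and load
// need no subtraction.
interface NavigationTimingLike {
  fetchStart: number;
  domainLookupStart: number;
  domainLookupEnd: number;
  connectStart: number;
  connectEnd: number;
  requestStart: number;
  responseStart: number;
  responseEnd: number;
  domInteractive: number;
  domContentLoadedEventStart: number;
  domComplete: number;
  loadEventEnd: number;
}

function navigationMetrics(t: NavigationTimingLike) {
  return {
    dns: t.domainLookupEnd - t.domainLookupStart,
    tcp: t.connectEnd - t.connectStart,
    ttfb: t.responseStart,
    domReady: t.domContentLoadedEventStart,
    request: t.responseEnd - t.requestStart,
    domProcessing: t.domComplete - t.domInteractive,
    load: t.loadEventEnd,
    whiteScreen: t.responseStart - t.fetchStart,
  };
}
```

In a browser the entry would come from performance.getEntriesByType("navigation")[0], read after the load event has finished so that loadEventEnd is set.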

API request data

API request data mainly covers request duration and request errors. Duration can be measured by timing each request while intercepting XMLHttpRequest and fetch, and errors can be detected from the readyState and status properties of the xhr.

XMLHttpRequest interception: rewrite methods on XMLHttpRequest.prototype, listen for state changes when a request is sent, and call the SDK hooks from there.
XMLHttpRequest.readyState has five possible values:

0 (UNSENT): the request has not been initialized (open() has not been called).
1 (OPENED): open() has been called, but send() has not.
2 (HEADERS_RECEIVED): send() has been called, and the response headers and status are available.
3 (LOADING): the response body is being received; part of the data may already be available.
4 (DONE): the request has finished, successfully or not, and the full response can be used.

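// Keep references to the original methods
const open = XMLHttpRequest.prototype.open;
const send = XMLHttpRequest.prototype.send;

XMLHttpRequest.prototype.open = function (
  method: string,
  url: string | URL,
  async: boolean = true,
  username?: string | null,
  password?: string | null
) {
  // ...omitted
  // Forward the caller's arguments so synchronous requests and credentials are preserved
  return open.call(this, method, url, async, username, password);
};
XMLHttpRequest.prototype.send = function (...rest: any[]) {
  // ...omitted
  const body = rest[0];

  this.addEventListener("readystatechange", function () {
    if (this.readyState === 4) {
      if (this.status >= 200 && this.status < 300) {
        // ...omitted: request succeeded
      } else {
        // ...omitted: request failed
      }
    }
  });
  return send.call(this, body);
};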
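The snippet above omits how request details travel from open() to send() and how the duration is measured. One possible way is sketched here under assumptions: patchXhr, the META symbol and the payload fields are hypothetical, and the prototype is passed in so the same logic can patch XMLHttpRequest.prototype in a browser or a fake class in a test.

```typescript
type RequestEvent = "reqStart" | "reqEnd" | "reqError";
type Emit = (event: RequestEvent, data: Record<string, unknown>) => void;

// The two methods being patched; XMLHttpRequest.prototype fits this shape.
interface XhrPrototype {
  open: (...args: any[]) => any;
  send: (...args: any[]) => any;
}

// Hypothetical key for per-request data carried from open() to send().
const META = Symbol("monitorMeta");

function patchXhr(proto: XhrPrototype, emit: Emit, now: () => number = Date.now): void {
  const originalOpen = proto.open;
  const originalSend = proto.send;

  proto.open = function (this: any, method: string, url: string, ...rest: any[]) {
    this[META] = { method, url: String(url) };
    emit("reqStart", this[META]);
    // Pass every argument through unchanged (async flag, user, password).
    return originalOpen.call(this, method, url, ...rest);
  };

  proto.send = function (this: any, ...args: any[]) {
    const startTime = now();
    this.addEventListener("readystatechange", () => {
      if (this.readyState !== 4) return; // only DONE carries the final status
      const data = { ...this[META], status: this.status, duration: now() - startTime };
      // status 0 (network error, CORS failure, abort) is reported as an error too
      emit(this.status >= 200 && this.status < 300 ? "reqEnd" : "reqError", data);
    });
    return originalSend.apply(this, args);
  };
}
```

In a browser this would be called once as patchXhr(XMLHttpRequest.prototype, emit).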

Fetch interception: redefine window.fetch with Object.defineProperty

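// Keep a reference to the original fetch
const originFetch = window.fetch;

Object.defineProperty(window, "fetch", {
  configurable: true,
  enumerable: true,
  get() {
    return (url: string, options: any = {}) => {
      return originFetch(url, options)
        .then((res) => {
            // ...
            return res; // always hand the response back to the caller
        })
    };
  }
});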

Device and browser data

Parse the information exposed by the navigator API (mainly navigator.userAgent); the third-party package mobile-detect can do this parsing for us.
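To show what such parsing involves, here is a deliberately simplified user-agent parser. It is not mobile-detect's API: parseUserAgent and DeviceInfo are hypothetical names, and the regular expressions only cover common cases (a maintained library handles far more edge cases, such as iPads reporting a desktop user agent).

```typescript
interface DeviceInfo {
  isMobile: boolean;
  os: "iOS" | "Android" | "Windows" | "macOS" | "Linux" | "unknown";
  browser: "Edge" | "Chrome" | "Firefox" | "Safari" | "unknown";
}

function parseUserAgent(ua: string): DeviceInfo {
  const isMobile = /Mobi|Android|iPhone|iPad|iPod/i.test(ua);

  // Order matters: iOS UAs also contain "Mac OS X", Android UAs also contain "Linux".
  let os: DeviceInfo["os"] = "unknown";
  if (/iPhone|iPad|iPod/.test(ua)) os = "iOS";
  else if (/Android/.test(ua)) os = "Android";
  else if (/Windows/.test(ua)) os = "Windows";
  else if (/Mac OS X/.test(ua)) os = "macOS";
  else if (/Linux/.test(ua)) os = "Linux";

  // Order matters: Edge UAs also contain "Chrome/", and Chrome UAs also contain "Safari/".
  let browser: DeviceInfo["browser"] = "unknown";
  if (/Edg\//.test(ua)) browser = "Edge";
  else if (/Chrome\//.test(ua)) browser = "Chrome";
  else if (/Firefox\//.test(ua)) browser = "Firefox";
  else if (/Safari\//.test(ua)) browser = "Safari";

  return { isMobile, os, browser };
}
```

In the browser it would be called as parseUserAgent(navigator.userAgent).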

Page access data

Global data includes the URL, the page title, and a user identifier. The SDK can automatically assign a random ID to the web session so that the events of a single user can be identified.
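A minimal sketch of such a random user ID, assuming it is kept in Web Storage so the same ID is reused on later page views. getOrCreateUserId and the storage key are hypothetical; the storage is passed in so sessionStorage (one ID per tab session) or localStorage (one ID across visits) can be chosen.

```typescript
// Minimal storage interface; window.localStorage and window.sessionStorage satisfy it.
interface KeyValueStorage {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

const USER_ID_KEY = "__monitor_uid"; // hypothetical storage key

// Timestamp plus random base-36 digits. Not cryptographically strong;
// only meant to tell users apart in the collected data.
function createId(): string {
  return Date.now().toString(36) + Math.random().toString(36).slice(2, 10);
}

function getOrCreateUserId(storage: KeyValueStorage): string {
  let id = storage.getItem(USER_ID_KEY);
  if (!id) {
    id = createId();
    storage.setItem(USER_ID_KEY, id);
  }
  return id;
}
```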

User behavior data

This mainly includes clicks on page elements, console output, and the user's mouse movement trail.

Element clicks: event delegation on window
Console output: rewrite the console methods
Mouse movement trail: the third-party library rrweb
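Rewriting the console can be sketched as wrapping each method so the call is captured and then forwarded to the original. This is an illustration under assumptions: wrapConsole and the entry shape are hypothetical, and the console object is passed in (the global console in a browser) so it can be tested with a fake.

```typescript
type ConsoleLevel = "log" | "info" | "warn" | "error";
type ConsoleLike = Record<ConsoleLevel, (...args: unknown[]) => void>;

// Hypothetical SDK hook receiving each captured console call.
type OnConsole = (entry: { level: ConsoleLevel; args: unknown[]; time: number }) => void;

function wrapConsole(
  target: ConsoleLike,
  onConsole: OnConsole,
  levels: ConsoleLevel[] = ["log", "info", "warn", "error"]
): () => void {
  const originals = new Map<ConsoleLevel, (...args: unknown[]) => void>();
  let reporting = false; // guards against recursion if onConsole itself logs

  for (const level of levels) {
    const original = target[level];
    originals.set(level, original);
    target[level] = (...args: unknown[]) => {
      if (!reporting) {
        reporting = true;
        try {
          onConsole({ level, args, time: Date.now() });
        } finally {
          reporting = false;
        }
      }
      original.apply(target, args); // keep the normal console output
    };
  }

  // Returns a function that restores the original methods.
  return () => {
    originals.forEach((fn, level) => {
      target[level] = fn;
    });
  };
}
```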

Below is the design of a unified monitoring SDK for all of this data.

SDK development

To keep the modules decoupled, I chose an event-based publish/subscribe design, and the SDK is split into several core modules. Since it is written in TypeScript with descriptive naming, comments appear only in key places; see the GitHub repository at the end of the article for the complete implementation.

class: WebMonitor: core monitoring class
class: AjaxInterceptor: intercepts Ajax requests
class: ErrorObserver: monitors global errors
class: FetchInterceptor: intercepts fetch requests
class: Reporter: reports the collected data
class: Performance: collects performance data
class: RrwebObserver: integrates rrweb to record user behavior trails
class: SpaHandler: handles single-page applications (SPA)
util: DeviceUtil: helper functions for collecting device information
event: event center

Events provided by the SDK

Events exposed to users are listed below; events whose names start with _ are internal to the framework.

export enum TrackerEvents {
  // Events exposed to SDK users
  performanceInfoReady = "performanceInfoReady",  // page performance data is ready
  reqStart = "reqStart",  // API request started
  reqEnd = "reqEnd",   // API request completed
  reqError = "reqError",  // request failed
  jsError = "jsError",  // page logic error
  vuejsError = "vuejsError",  // Vue error event
  unHandleRejection = "unHandleRejection",  // unhandled Promise rejection
  resourceError = "resourceError",  // resource loading error
  batchErrors = "batchErrors",  // errors merged into one report to reduce the number of requests
  mouseTrack = "mouseTrack",  // user mouse behavior tracking
}
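The batchErrors event merges errors into one report. The SDK's actual batching logic is not shown here; the sketch below is one common approach, flushing when either a size limit or a delay is reached. createErrorBatcher, maxSize and delayMs are hypothetical names.

```typescript
// Collects errors and hands them to `flush` in batches, so many errors cost one request.
function createErrorBatcher<T>(flush: (batch: T[]) => void, maxSize = 10, delayMs = 2000) {
  let queue: T[] = [];
  let timer: ReturnType<typeof setTimeout> | undefined;

  const flushNow = () => {
    if (timer !== undefined) {
      clearTimeout(timer);
      timer = undefined;
    }
    if (queue.length === 0) return;
    const batch = queue;
    queue = [];
    flush(batch);
  };

  return {
    add(error: T) {
      queue.push(error);
      if (queue.length >= maxSize) flushNow(); // size limit reached: send immediately
      else if (timer === undefined) timer = setTimeout(flushNow, delayMs); // otherwise wait a bit
    },
    flushNow, // e.g. call when the page is being hidden so queued errors are not lost
  };
}
```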

How to use

import { WebMonitor } from "femonitor-web";
const monitor = WebMonitor.init();
/* Listen to a single event: replace [event] with an event name from TrackerEvents */
monitor.on([event], (emitData) => {});
/* Or listen to all events */
monitor.on("event", (eventName, emitData) => {});
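A minimal event center consistent with this usage might look like the sketch below. It is an assumption inferred from the two on(...) calls, not the SDK's actual emitter: customEmit notifies the event's own listeners with the data and wildcard "event" listeners with the event name and data, and on/off return this so calls can be chained.

```typescript
type Listener = (...args: any[]) => void;

class EventCenter {
  private listeners = new Map<string, Set<Listener>>();

  on(event: string, listener: Listener): this {
    if (!this.listeners.has(event)) this.listeners.set(event, new Set());
    this.listeners.get(event)!.add(listener);
    return this; // returning `this` enables chaining: center.on(...).on(...)
  }

  off(event: string, listener: Listener): this {
    this.listeners.get(event)?.delete(listener);
    return this;
  }

  customEmit(event: string, data: unknown): void {
    // Listeners of this event receive only the data...
    this.listeners.get(event)?.forEach((fn) => fn(data));
    // ...while wildcard "event" listeners also receive the event name.
    this.listeners.get("event")?.forEach((fn) => fn(event, data));
  }
}
```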

Core modules

WebMonitor, errorObserver, ajaxInterceptor, fetchInterceptor, performance

WebMonitor

Integrates the framework's other classes, deep-merges the options passed in with the default configuration, and initializes each module according to the result:

this.initOptions(options);

this.getDeviceInfo();
this.getNetworkType();
this.getUserAgent();

this.initGlobalData(); // set global data that is attached to every event as globalData
this.initInstances();
this.initEventListeners();
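The option merge in initOptions can be pictured with the sketch below, a hand-written recursive merge standing in for the deepmerge step. mergeOptions and the option names in the usage are hypothetical. One deliberate difference: the deepmerge package concatenates arrays by default, whereas this sketch lets an array in the user's options replace the default array.

```typescript
type PlainObject = Record<string, unknown>;

// Simplified check: any non-null, non-array object counts as mergeable.
function isPlainObject(value: unknown): value is PlainObject {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Recursively merge user options over defaults without mutating either.
// Nested objects are merged key by key; arrays and primitives replace the default.
function mergeOptions<T extends PlainObject>(defaults: T, overrides: PlainObject = {}): T {
  const result: PlainObject = { ...defaults };
  for (const [key, value] of Object.entries(overrides)) {
    if (value === undefined) continue; // undefined means "not set": keep the default
    const base = result[key];
    result[key] = isPlainObject(base) && isPlainObject(value) ? mergeOptions(base, value) : value;
  }
  return result as T;
}
```

For example, mergeOptions({ tracker: { jsError: true, ajax: true } }, { tracker: { ajax: false } }) keeps jsError enabled and turns only ajax off.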

API

All methods support chaining:

on: listen for an event
off: remove an event listener
useVueErrorListener: enable Vue error monitoring to get more detailed component data
changeOptions: modify the configuration
configData: set global data

errorObserver

Hooks window.onerror and window.onunhandledrejection, parses each error (including its stack trace via ErrorStackParser) into the data to emit, and listens for resource loading errors in the capture phase.

window.onerror = function (...args) {
  // Call the original handler
  if (oldOnError) {
    oldOnError(...args);
  }

  const [msg, url, line, column, error] = args;

  const stackTrace = error ? ErrorStackParser.parse(error) : [];
  const msgText = typeof msg === "string" ? msg : msg.type;
  const errorObj: IError = {};

  myEmitter.customEmit(TrackerEvents.jsError, errorObj);
};

window.onunhandledrejection = function (error: PromiseRejectionEvent) {
  if (oldUnHandleRejection) {
    oldUnHandleRejection.call(window, error);
  }

  const errorObj: IUnHandleRejectionError = {};
  myEmitter.customEmit(TrackerEvents.unHandleRejection, errorObj);
};

window.addEventListener(
  "error",
  function (event) {
    const target: any = event.target || event.srcElement;
    const isElementTarget =
      target instanceof HTMLScriptElement ||
      target instanceof HTMLLinkElement ||
      target instanceof HTMLImageElement;
    if (!isElementTarget) return false;

    const url = target.src || target.href;

    const errorObj: BaseError = {};
    myEmitter.customEmit(TrackerEvents.resourceError, errorObj);
  },
  true
);

ajaxInterceptor

Intercepts Ajax requests by rewriting the open and send methods of XMLHttpRequest, and emits custom events.

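const open = XMLHttpRequest.prototype.open; // original methods
const send = XMLHttpRequest.prototype.send;

XMLHttpRequest.prototype.open = function (method: string, url: string | URL, async = true, user?: string | null, password?: string | null) {
  const reqStartRes: IAjaxReqStartRes = {};

  myEmitter.customEmit(TrackerEvents.reqStart, reqStartRes);
  // Forward the caller's arguments so synchronous requests and credentials are preserved
  return open.call(this, method, url, async, user, password);
};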

XMLHttpRequest.prototype.send = function (...rest: any[]) {
  const body = rest[0];
  const requestData: string = body;
  const startTime = Date.now();

  this.addEventListener("readystatechange", function () {
    if (this.readyState === 4) {
      if (this.status >= 200 && this.status < 300) {
        const reqEndRes: IReqEndRes = {};

        myEmitter.customEmit(TrackerEvents.reqEnd, reqEndRes);
      } else {
        const reqErrorObj: IHttpReqErrorRes = {};
        
        myEmitter.customEmit(TrackerEvents.reqError, reqErrorObj);
      }
    }
  });
  return send.call(this, body);
};

fetchInterceptor

Intercepts fetch by redefining window.fetch, and emits custom events.

Object.defineProperty(window, "fetch", {
  configurable: true,
  enumerable: true,
  get() {
    return (url: string, options: any = {}) => {
      const reqStartRes: IFetchReqStartRes = {};
      myEmitter.customEmit(TrackerEvents.reqStart, reqStartRes);

      return originFetch(url, options)
        .then((res) => {
          const status = res.status;
          const reqEndRes: IReqEndRes = {};

          const reqErrorRes: IHttpReqErrorRes = {};

          if (status >= 200 && status < 300) {
            myEmitter.customEmit(TrackerEvents.reqEnd, reqEndRes);
          } else {
            // skip the SDK's own report requests (`self` is the interceptor instance)
            if (url !== self._options.reportUrl) {
              myEmitter.customEmit(TrackerEvents.reqError, reqErrorRes);
            }
          }

          return Promise.resolve(res);
        })
        .catch((e: Error) => {
          const reqErrorRes: IHttpReqErrorRes = {};
          myEmitter.customEmit(TrackerEvents.reqError, reqErrorRes);
          throw e; // re-throw so the caller still sees the network error
        });
    };
  }
});
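The key properties of a safe fetch wrapper are that it returns the response untouched, re-throws network errors, and ignores the SDK's own reporting endpoint so a failed report does not trigger another report. The sketch below isolates those rules; wrapFetch, requestUrl and the payload fields are hypothetical, and the original fetch is passed in so the logic can be tested with a fake.

```typescript
type Emit = (event: "reqStart" | "reqEnd" | "reqError", data: Record<string, unknown>) => void;

// Extract a URL from a string, a Request-like object, or a URL object.
function requestUrl(input: unknown): string {
  if (typeof input === "string") return input;
  const maybeRequest = input as { url?: unknown } | null;
  if (maybeRequest && typeof maybeRequest.url === "string") return maybeRequest.url;
  return String(input); // URL objects stringify to their href
}

function wrapFetch<R extends { status: number }>(
  originFetch: (input: any, init?: any) => Promise<R>,
  emit: Emit,
  reportUrl: string,
  now: () => number = Date.now
): (input: any, init?: any) => Promise<R> {
  return (input, init) => {
    const url = requestUrl(input);
    // Never monitor the SDK's own reporting requests.
    if (url === reportUrl) return originFetch(input, init);

    const startTime = now();
    emit("reqStart", { url });
    return originFetch(input, init).then(
      (res) => {
        const data = { url, status: res.status, duration: now() - startTime };
        emit(res.status >= 200 && res.status < 300 ? "reqEnd" : "reqError", data);
        return res; // hand the response back untouched
      },
      (error: unknown) => {
        emit("reqError", { url, status: 0, duration: now() - startTime, error });
        throw error; // keep the rejection visible to the caller
      }
    );
  };
}
```

In a browser it could be installed as window.fetch = wrapFetch(window.fetch.bind(window), emit, reportUrl).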

performance

Reads page timing from the Performance API and emits an event once the data is complete, i.e. after the load event has finished and loadEventEnd has been recorded.

const {
  domainLookupEnd,
  domainLookupStart,
  connectEnd,
  connectStart,
  responseEnd,
  requestStart,
  domComplete,
  domInteractive,
  domContentLoadedEventEnd,
  loadEventEnd,
  navigationStart,
  responseStart,
  fetchStart
} = this.timingInfo;

const dnsLkTime = domainLookupEnd - domainLookupStart;
const tcpConTime = connectEnd - connectStart;
const reqTime = responseEnd - requestStart;
const domParseTime = domComplete - domInteractive;
const domReadyTime = domContentLoadedEventEnd - fetchStart;
const loadTime = loadEventEnd - navigationStart;
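const fpTime = responseStart - fetchStart; // white screen time, not the Paint Timing "first-paint" entry
const fcpTime = domComplete - fetchStart; // first screen approximation, not the Paint Timing "first-contentful-paint" entry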

const performanceInfo: IPerformanceInfo<number> = {
  dnsLkTime,
  tcpConTime,
  reqTime,
  domParseTime,
  domReadyTime,
  loadTime,
  fpTime,
  fcpTime
};

myEmitter.emit(TrackerEvents.performanceInfoReady, performanceInfo);
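Because loadEventEnd is only filled in once the load event's handlers have returned, the code above has to be scheduled after load. A minimal sketch of that scheduling, assuming structural stand-ins for document and window so it can be exercised outside a browser; afterLoad, DocLike and WinLike are hypothetical names.

```typescript
// Minimal stand-ins for document and window.
interface DocLike {
  readyState: string;
}
interface WinLike {
  addEventListener(type: "load", listener: () => void): void;
}

// Run `collect` only after the load event has been fully handled.
function afterLoad(doc: DocLike, win: WinLike, collect: () => void): void {
  // setTimeout(..., 0) moves the work to a later task, after the load handlers have returned.
  const defer = () => {
    setTimeout(collect, 0);
  };
  if (doc.readyState === "complete") defer();
  else win.addEventListener("load", defer);
}
```

In a page this would be afterLoad(document, window, () => { /* read timing and emit performanceInfoReady */ }).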

See the GitHub repository below for the complete SDK implementation. Stars, forks, and issues are welcome.

Web front-end monitoring SDK: https://github.com/alex1504/femonitor-web