An analysis of the principles behind some technical points of a front-end monitoring SDK

A complete front-end monitoring platform includes three parts: data collection and reporting, data sorting and storage, and data display.

This article covers the first part: data collection and reporting. The following figure outlines the content of this article to give you a general overview:

Theory alone is hard to absorb, so I wrote a simple monitoring SDK based on the technical points covered in this article. You can use it to write a few simple demos to deepen your understanding; it works best when read alongside this article.

Performance data collection

The Chrome development team has proposed a series of metrics for measuring web page performance:

FP (first-paint), the time from when the page is loaded to when the first pixel is drawn on the screen
FCP (first-contentful-paint), the time from when the page is loaded to when any part of the page content is rendered on the screen
LCP (largest-contentful-paint), the time from when the page is loaded to when the largest text block or image element is rendered on the screen
CLS (layout-shift), the cumulative score of all unexpected layout shifts that occur between when the page starts loading and when its lifecycle state becomes hidden

These four performance metrics need to be obtained through PerformanceObserver (they can also be obtained through performance.getEntriesByName(), but that call does not notify you when the event fires). PerformanceObserver is a performance monitoring object used to observe performance measurement events.

FP

FP (first-paint), the time from when the page is loaded to when the first pixel is drawn on the screen. In practice, FP can be understood as the white screen time.

The measurement code is as follows:

const entryHandler = (list) => {        
    for (const entry of list.getEntries()) {
        if (entry.name === 'first-paint') {
            observer.disconnect()
        }

       console.log(entry)
    }
}

const observer = new PerformanceObserver(entryHandler)
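// The buffered option also delivers entries recorded before observe() was called, so it does not matter if this code is added after the event has already fired.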
observer.observe({ type: 'paint', buffered: true })

The content of FP can be obtained through the above code:

{
    duration: 0,
    entryType: "paint",
    name: "first-paint",
    startTime: 359, // FP time
}

Among them, startTime is the drawing time we want.

FCP

FCP (first-contentful-paint), the time from when the page is loaded to when any part of the page content is rendered on the screen. For this indicator, "content" refers to text, images (including background images), <svg> elements or non-white <canvas> elements.

In order to provide a good user experience, the FCP score should be controlled within 1.8 seconds.

Measurement code:

const entryHandler = (list) => {        
    for (const entry of list.getEntries()) {
        if (entry.name === 'first-contentful-paint') {
            observer.disconnect()
        }
        
        console.log(entry)
    }
}

const observer = new PerformanceObserver(entryHandler)
observer.observe({ type: 'paint', buffered: true })

The content of FCP can be obtained through the above code:

{
    duration: 0,
    entryType: "paint",
    name: "first-contentful-paint",
    startTime: 459, // FCP time
}

Among them, startTime is the drawing time we want.

LCP

LCP (largest-contentful-paint), the time from when the page is loaded to when the largest text block or image element is rendered on the screen. The LCP metric reports the render time of the largest image or text block visible in the visible area, relative to when the page first started loading.

A good LCP score should be controlled within 2.5 seconds.

Measurement code:

const entryHandler = (list) => {
    if (observer) {
        observer.disconnect()
    }

    for (const entry of list.getEntries()) {
        console.log(entry)
    }
}

const observer = new PerformanceObserver(entryHandler)
observer.observe({ type: 'largest-contentful-paint', buffered: true })

The content of LCP can be obtained through the above code:

{
    duration: 0,
    element: p,
    entryType: "largest-contentful-paint",
    id: "",
    loadTime: 0,
    name: "",
    renderTime: 1021.299,
    size: 37932,
    startTime: 1021.299,
    url: "",
}

Among them, startTime is the drawing time we want. element refers to the DOM element drawn by LCP.

The difference between FCP and LCP is: FCP is triggered as soon as any content is drawn, and LCP is triggered when the maximum content is rendered.

The types of elements examined by LCP are:

<img> elements
<image> elements inside an <svg> element
<video> elements (the cover image is used)
Elements with a background image loaded through the url() function (as opposed to a CSS gradient)
Block-level elements that contain text nodes or other inline-level text child elements

CLS

CLS (layout-shift), the cumulative score of all unexpected layout shifts that occur between when the page starts loading and when its lifecycle state becomes hidden.

The layout offset score is calculated as follows:

layout offset score = impact score * distance score

The impact score measures the impact of unstable elements on the visible area between two frames.
The distance score is the greatest distance that any unstable element has moved in the frame (horizontally or vertically), divided by the maximum dimension of the visible area (width or height, whichever is greater).

In its simplest form, CLS is the sum of all the layout offset scores.
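
As a worked example of the formula above, here is a minimal sketch that computes the score for a single element that moves between two frames. It assumes the element lies entirely inside the visible area in both frames. The names layoutShiftScore, viewport, before and after are illustrative only; browsers compute this internally and expose the result as the value of a layout-shift entry.

```javascript
// Area shared by two rectangles of the form { x, y, width, height }.
function intersectionArea(a, b) {
    const w = Math.min(a.x + a.width, b.x + b.width) - Math.max(a.x, b.x)
    const h = Math.min(a.y + a.height, b.y + b.height) - Math.max(a.y, b.y)
    return w > 0 && h > 0 ? w * h : 0
}

// Assumption: both rectangles are fully inside the viewport.
function layoutShiftScore(viewport, before, after) {
    const area = rect => rect.width * rect.height

    // Impact score: share of the visible area covered by the element
    // in either of the two frames (the union of both positions).
    const unionArea = area(before) + area(after) - intersectionArea(before, after)
    const impact = unionArea / (viewport.width * viewport.height)

    // Distance score: the larger of the horizontal and vertical movement,
    // divided by the larger viewport dimension.
    const moved = Math.max(Math.abs(after.x - before.x), Math.abs(after.y - before.y))
    const distance = moved / Math.max(viewport.width, viewport.height)

    return impact * distance
}

// An element covering the top half of a 400 x 800 viewport moves down by 200px:
// impact 0.75 * distance 0.25 = 0.1875
console.log(layoutShiftScore(
    { width: 400, height: 800 },
    { x: 0, y: 0, width: 400, height: 400 },
    { x: 0, y: 200, width: 400, height: 400 },
))
```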

When a DOM element shifts position between two rendered frames, a layout offset is recorded (as shown in the figure).

The rectangle in the figure above has moved from the upper left corner to the right, which counts as a layout offset. CLS also has the concept of a session window: one or more individual layout offsets that occur in rapid succession, with less than 1 second between consecutive offsets and a maximum duration of 5 seconds for the whole window.

For example, the second session window in the figure above contains four layout offsets. The interval between consecutive offsets must be less than 1 second, and the time between the first and the last offset cannot exceed 5 seconds, for them to be regarded as one session window. If these conditions are not met, a new session window starts. Some people may ask why it is done this way. It is the conclusion the Chrome team reached after a large number of experiments and research; see Evolving the CLS metric.

There are three calculation methods for CLS:

Accumulate
Take the average of all session windows
Take the maximum value in all session windows

Accumulate

That is, all the layout offset scores from the beginning of the page load are added together. However, this calculation method is not friendly to pages with a long life cycle. The longer the page persists, the higher the CLS score.

Take the average of all session windows

This calculation method is not based on a single layout offset, but based on the session window. Add up the values of all session windows and take the average value. But this calculation method also has disadvantages.

As can be seen from the above figure, the first session window produced a relatively large CLS score, and the second session window produced a relatively small CLS score. If you take their average value as the CLS score, you can't see the health of the page at all. The original page has a lot of offset in the early stage and less offset in the later stage. The current average value cannot reflect this situation.

Take the maximum value in all session windows

This is currently the best calculation method. It takes only the maximum value across all session windows, which reflects the worst case of page layout offset. For details, see Evolving the CLS metric.

The following is the measurement code for the third calculation method:

let sessionValue = 0
let sessionEntries = []
const cls = {
    subType: 'layout-shift',
    name: 'layout-shift',
    type: 'performance',
    pageURL: getPageURL(),
    value: 0,
}

const entryHandler = (list) => {
    for (const entry of list.getEntries()) {
        // Only count layout shifts without recent user input.
        if (!entry.hadRecentInput) {
            const firstSessionEntry = sessionEntries[0]
            const lastSessionEntry = sessionEntries[sessionEntries.length - 1]

            // If the entry occurred less than 1 second after the previous entry and
            // less than 5 seconds after the first entry in the session, include the
            // entry in the current session. Otherwise, start a new session.
            if (
                sessionValue
                && entry.startTime - lastSessionEntry.startTime < 1000
                && entry.startTime - firstSessionEntry.startTime < 5000
            ) {
                sessionValue += entry.value
                sessionEntries.push(formatCLSEntry(entry))
            } else {
                sessionValue = entry.value
                sessionEntries = [formatCLSEntry(entry)]
            }

            // If the current session value is larger than the current CLS value,
            // update CLS and the entries contributing to it.
            if (sessionValue > cls.value) {
                cls.value = sessionValue
                cls.entries = sessionEntries
                cls.startTime = performance.now()
                lazyReportCache(deepCopy(cls))
            }
        }
    }
}

const observer = new PerformanceObserver(entryHandler)
observer.observe({ type: 'layout-shift', buffered: true })

With the description above in mind, the code should be easy to follow. The measurement content of a single layout offset is as follows:

{
  duration: 0,
  entryType: "layout-shift",
  hadRecentInput: false,
  lastInputTime: 0,
  name: "",
  sources: (2) [LayoutShiftAttribution, LayoutShiftAttribution],
  startTime: 1176.199999999255,
  value: 0.000005752046026677329,
}

The value field in the entry is the layout offset score.
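
To see how the three calculation methods differ on the same data, here is a small offline sketch that works on plain objects shaped like layout-shift entries (only startTime, value and hadRecentInput are used). The function names toSessionWindows and summarizeCLS are my own and are not part of any browser API or of the SDK.

```javascript
// Group entries into session windows: a gap of 1 second or more since the
// previous offset, or 5 seconds or more since the first offset of the window,
// starts a new window. Returns the summed value of each window.
function toSessionWindows(entries) {
    const windows = []
    let current = null

    for (const entry of entries) {
        // Offsets caused by recent user input are not counted.
        if (entry.hadRecentInput) continue

        if (
            current
            && entry.startTime - current.lastTime < 1000
            && entry.startTime - current.firstTime < 5000
        ) {
            current.value += entry.value
            current.lastTime = entry.startTime
        } else {
            current = { value: entry.value, firstTime: entry.startTime, lastTime: entry.startTime }
            windows.push(current)
        }
    }

    return windows.map(item => item.value)
}

// Compute CLS with each of the three methods described above.
function summarizeCLS(entries) {
    const windows = toSessionWindows(entries)
    const sum = windows.reduce((total, value) => total + value, 0)

    return {
        accumulate: sum,
        average: windows.length ? sum / windows.length : 0,
        max: windows.length ? Math.max(...windows) : 0,
    }
}
```

With a large early window and a small late one, the average hides the early instability while the maximum still reports it, which is the argument for the third method.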

DOMContentLoaded, load events

When the pure HTML is fully loaded and parsed, the DOMContentLoaded event will be triggered, without waiting for css, img, iframe to load.

When the entire page and all dependent resources such as style sheets and images have finished loading, the load event will be triggered.

Although these two performance indicators are relatively old, they can still reflect some conditions of the page. It is still necessary to monitor them.

import { lazyReportCache } from '../utils/report'

['load', 'DOMContentLoaded'].forEach(type => onEvent(type))

function onEvent(type) {
    function callback() {
        lazyReportCache({
            type: 'performance',
            subType: type.toLocaleLowerCase(),
            startTime: performance.now(),
        })

        window.removeEventListener(type, callback, true)
    }

    window.addEventListener(type, callback, true)
}

First screen rendering time

In most cases, the first screen rendering time can be obtained through the load event. The exceptions are some special cases, such as images and DOM that are loaded asynchronously.

<script>
    setTimeout(() => {
        document.body.innerHTML = `
            <div>
                <!-- a lot of code omitted... -->
            </div>
        `
    }, 3000)
</script>

In this case, the first screen rendering time cannot be obtained through the load event. Instead, we need to use MutationObserver to get it. MutationObserver triggers a callback when the monitored DOM elements change.

First screen rendering time calculation process:

Use MutationObserver to monitor the document object, and trigger a callback whenever the DOM changes.
Determine whether the DOM element is in the first screen. If it is, call performance.now() in a requestAnimationFrame() callback to obtain the current time as its drawing time.
Compare the drawing time of the last DOM element with the load times of all images in the first screen, and use the maximum value as the first screen rendering time.

Monitor the DOM

const next = window.requestAnimationFrame ? requestAnimationFrame : setTimeout
const ignoreDOMList = ['STYLE', 'SCRIPT', 'LINK']
    
observer = new MutationObserver(mutationList => {
    const entry = {
        children: [],
    }

    for (const mutation of mutationList) {
        if (mutation.addedNodes.length && isInScreen(mutation.target)) {
             // ...
        }
    }

    if (entry.children.length) {
        entries.push(entry)
        next(() => {
            entry.startTime = performance.now()
        })
    }
})

observer.observe(document, {
    childList: true,
    subtree: true,
})

The above code monitors DOM changes. It also needs to filter out tags such as style, script and link.

Determine whether it is above the fold

There may be a lot of content on a page, but users can only see the content of one screen at most. Therefore, when counting the rendering time of the first screen, you need to limit the scope and limit the rendering content to the current screen.

const viewportWidth = window.innerWidth
const viewportHeight = window.innerHeight

// Whether the DOM object is inside the screen
function isInScreen(dom) {
    const rectInfo = dom.getBoundingClientRect()
    if (rectInfo.left < viewportWidth && rectInfo.top < viewportHeight) {
        return true
    }

    return false
}
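
Note that the check above only compares the top-left corner with the right and bottom edges of the viewport, so an element that sits entirely above or to the left of the viewport would also pass. If that matters for your pages, a stricter variant could look like the sketch below. It takes the rectangle as a plain object so it can be tried without a DOM; the name isRectInViewport is my own, and left, top, right and bottom are the fields returned by getBoundingClientRect().

```javascript
// Returns true when the rectangle overlaps the viewport at all.
// rect: { left, top, right, bottom } in viewport coordinates.
function isRectInViewport(rect, viewportWidth, viewportHeight) {
    return (
        rect.right > 0
        && rect.bottom > 0
        && rect.left < viewportWidth
        && rect.top < viewportHeight
    )
}

// In the browser this could be called as:
// isRectInViewport(dom.getBoundingClientRect(), window.innerWidth, window.innerHeight)
```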

Use requestAnimationFrame() to get the DOM drawing time

When the DOM change triggers the MutationObserver event, it only means that the DOM content can be read, but it does not mean that the DOM is drawn on the screen.

As can be seen from the above figure, when the MutationObserver event is triggered, document.body can already be read as having content, but in fact nothing has been drawn on the screen on the left. So call requestAnimationFrame() to get the current time as the DOM drawing time after the browser has painted.

Compare the loading time of all pictures on the first screen

function getRenderTime() {
    let startTime = 0
    entries.forEach(entry => {
        if (entry.startTime > startTime) {
            startTime = entry.startTime
        }
    })

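    // Compare with the load times of all images on the current page and take the maximum value
    // The image request must start before startTime and its response must end after startTime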
    performance.getEntriesByType('resource').forEach(item => {
        if (
            item.initiatorType === 'img'
            && item.fetchStart < startTime 
            && item.responseEnd > startTime
        ) {
            startTime = item.responseEnd
        }
    })
    
    return startTime
}

Optimization

The current code has not been optimized yet. There are two main points to consider:

When should the rendering time be reported?
How should DOM that is added asynchronously be handled?

The first point is that the rendering time must be reported after the DOM no longer changes. Generally, the DOM no longer changes after the load event is triggered. So we can report at this point in time.

The second point is to report after the LCP event is triggered. Regardless of whether the DOM is loaded synchronously or asynchronously, it needs to be drawn, so you can listen to the LCP event, and only allow reporting after the event is triggered.

Combining the above two solutions together, there is the following code:

let isOnLoaded = false
executeAfterLoad(() => {
    isOnLoaded = true
})


let timer
let observer
function checkDOMChange() {
    clearTimeout(timer)
    timer = setTimeout(() => {
        // Calculate the first screen rendering time once the load and LCP events have fired and the DOM tree no longer changes
        if (isOnLoaded && isLCPDone()) {
            observer && observer.disconnect()
            lazyReportCache({
                type: 'performance',
                subType: 'first-screen-paint',
                startTime: getRenderTime(),
                pageURL: getPageURL(),
            })

            entries = null
        } else {
            checkDOMChange()
        }
    }, 500)
}

checkDOMChange() is called every time a MutationObserver event is triggered, so it needs to be debounced.

API request duration

To measure how long API requests take, we need to monitor XMLHttpRequest and fetch.

Monitor XMLHttpRequest

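// Keep references to the native methods before overriding them
const originalProto = XMLHttpRequest.prototype
const originalOpen = originalProto.open
const originalSend = originalProto.send

originalProto.open = function newOpen(...args) {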
    this.url = args[1]
    this.method = args[0]
    originalOpen.apply(this, args)
}

originalProto.send = function newSend(...args) {
    this.startTime = Date.now()

    const onLoadend = () => {
        this.endTime = Date.now()
        this.duration = this.endTime - this.startTime

        const { status, duration, startTime, endTime, url, method } = this
        const reportData = {
            status,
            duration,
            startTime,
            endTime,
            url,
            method: (method || 'GET').toUpperCase(),
            success: status >= 200 && status < 300,
            subType: 'xhr',
            type: 'performance',
        }

        lazyReportCache(reportData)

        this.removeEventListener('loadend', onLoadend, true)
    }

    this.addEventListener('loadend', onLoadend, true)
    originalSend.apply(this, args)
}

How do we judge whether an XHR request succeeded? Check whether its status code is between 200 and 299. If it is, the request succeeded; otherwise it failed.

Monitor fetch

const originalFetch = window.fetch

function overwriteFetch() {
    window.fetch = function newFetch(url, config) {
        const startTime = Date.now()
        const reportData = {
            startTime,
            url,
            method: (config?.method || 'GET').toUpperCase(),
            subType: 'fetch',
            type: 'performance',
        }

        return originalFetch(url, config)
        .then(res => {
            reportData.endTime = Date.now()
            reportData.duration = reportData.endTime - reportData.startTime

            const data = res.clone()
            reportData.status = data.status
            reportData.success = data.ok

            lazyReportCache(reportData)

            return res
        })
        .catch(err => {
            reportData.endTime = Date.now()
            reportData.duration = reportData.endTime - reportData.startTime
            reportData.status = 0
            reportData.success = false

            lazyReportCache(reportData)

            throw err
        })
    }
}

For fetch, you can judge whether the request succeeded from the ok field of the response: if it is true, the request succeeded; otherwise it failed.

Note that the request time measured here may differ from the time shown in Chrome DevTools. DevTools measures the time of the HTTP request itself, from sending to the end of the response. However, xhr and fetch are asynchronous requests, and the callback function is called only after the request completes. When the event fires, the callback is put in the message queue and the browser processes it later, so there is an additional waiting period in between.

Resource loading time, cache hit rate

The resource and navigation events can be monitored through PerformanceObserver. If the browser does not support PerformanceObserver, you can fall back to performance.getEntriesByType(entryType).

When the resource event is triggered, the corresponding resource list can be obtained. Each resource object contains the following fields:

From these fields we can extract some useful information:

{
    name: entry.name, // resource name
    subType: entryType,
    type: 'performance',
    sourceType: entry.initiatorType, // resource type
    duration: entry.duration, // resource load time
    dns: entry.domainLookupEnd - entry.domainLookupStart, // DNS lookup time
    tcp: entry.connectEnd - entry.connectStart, // time to establish the TCP connection
    redirect: entry.redirectEnd - entry.redirectStart, // redirect time
    ttfb: entry.responseStart, // time to first byte
    protocol: entry.nextHopProtocol, // request protocol
    responseBodySize: entry.encodedBodySize, // response body size
    responseHeaderSize: entry.transferSize - entry.encodedBodySize, // response header size
    resourceSize: entry.decodedBodySize, // resource size after decompression
    isCache: isCache(entry), // whether the cache was hit
    startTime: performance.now(),
}

Determine whether the resource hits the cache

These resource objects have a transferSize field, which indicates the size of the fetched resource, including the response headers and the response body. If this value is 0, the resource was read directly from the cache (strong cache). If this value is not 0 but the encodedBodySize field is 0, the resource went through the negotiated cache (encodedBodySize is the size of the response body).

function isCache(entry) {
    // Read directly from the cache, or 304
    return entry.transferSize === 0 || (entry.transferSize !== 0 && entry.encodedBodySize === 0)
}

If neither condition is met, the cache was missed. The cache hit rate is then the number of resources that hit the cache divided by the total number of resources.
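
As a minimal sketch of that ratio, the function below applies the same isCache() rule to a list of resource entries. cacheHitRate is a name I made up for illustration, and the entries can be plain objects with transferSize and encodedBodySize fields. One caveat to keep in mind: for cross-origin resources served without a Timing-Allow-Origin header, browsers report these size fields as 0, so such resources would be counted as cache hits by this rule.

```javascript
function isCache(entry) {
    // Read directly from the cache, or revalidated with a 304
    return entry.transferSize === 0 || (entry.transferSize !== 0 && entry.encodedBodySize === 0)
}

// Share of entries that hit the cache, between 0 and 1.
function cacheHitRate(entries) {
    if (entries.length === 0) return 0

    const hits = entries.filter(isCache).length
    return hits / entries.length
}

// In the browser this could be fed with:
// cacheHitRate(performance.getEntriesByType('resource'))
```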

Browser back/forward cache (bfcache)

bfcache is a kind of memory cache that saves the entire page in memory. When the user navigates back, they can see the entire page immediately without reloading it. According to the article bfcache, Firefox and Safari have always supported bfcache, while Chrome supports it only in newer versions of its mobile browser. But when I tried it, only Safari supported it; maybe my Firefox version was wrong.

But bfcache also has a drawback. When the user navigates back and the page is restored from bfcache, the code of the original page is not executed again. For this, the browser provides a pageshow event, where you can put the code that needs to run again.

window.addEventListener('pageshow', function(event) {
  // If this property is true, the page was restored from bfcache
  if (event.persisted) {
    console.log('This page was restored from the bfcache.');
  } else {
    console.log('This page was loaded normally.');
  }
});

For pages restored from bfcache, we also need to collect FP, FCP, LCP and other timings.

onBFCacheRestore(event => {
    requestAnimationFrame(() => {
        ['first-paint', 'first-contentful-paint'].forEach(type => {
            lazyReportCache({
                startTime: performance.now() - event.timeStamp,
                name: type,
                subType: type,
                type: 'performance',
                pageURL: getPageURL(),
                bfc: true,
            })
        })
    })
})

The above code is easy to understand. After the pageshow event fires, subtract the event's timestamp from the current time; the difference is the drawing time for the metric. Note that the values of these metrics for pages restored from bfcache are generally very small, around 10 ms. So add an identification field bfc: true to them, so they can be excluded when computing performance statistics.

FPS

Using requestAnimationFrame() we can calculate the FPS of the current page.

const next = window.requestAnimationFrame 
    ? requestAnimationFrame : (callback) => { setTimeout(callback, 1000 / 60) }

const frames = []

export default function fps() {
    let frame = 0
    let lastSecond = Date.now()

    function calculateFPS() {
        frame++
        const now = Date.now()
        if (lastSecond + 1000 <= now) {
            // now - lastSecond is in milliseconds, so frame is multiplied by 1000
            const fps = Math.round((frame * 1000) / (now - lastSecond))
            frames.push(fps)
                
            frame = 0
            lastSecond = now
        }
    
        // Avoid reporting too often: cache a certain number of samples before reporting
        if (frames.length >= 60) {
            report(deepCopy({
                frames,
                type: 'performance',
                subType: 'fps',
            }))
    
            frames.length = 0
        }

        next(calculateFPS)
    }

    calculateFPS()
}

The code logic is as follows:

First record an initial time, then add 1 to the frame count each time requestAnimationFrame() fires. After one second has passed, the current frame rate is the number of frames divided by the elapsed time.
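
The same arithmetic can be tried without a browser by feeding in a list of frame timestamps. The sketch below is a simplified, offline version of that logic: fpsSamples is a hypothetical name, the timestamps are in milliseconds, and the first timestamp is treated as the initial time rather than as a frame.

```javascript
// Turn a list of frame timestamps (ms) into one FPS sample per elapsed second.
function fpsSamples(timestamps) {
    const samples = []
    if (timestamps.length === 0) return samples

    let frame = 0
    let lastSecond = timestamps[0]

    for (let i = 1; i < timestamps.length; i++) {
        frame++
        const now = timestamps[i]

        if (now - lastSecond >= 1000) {
            // The elapsed time is in milliseconds, so multiply the frame count by 1000.
            samples.push(Math.round((frame * 1000) / (now - lastSecond)))
            frame = 0
            lastSecond = now
        }
    }

    return samples
}

// One frame every 20 ms for two seconds gives two samples of 50 FPS.
const everyTwentyMs = Array.from({ length: 101 }, (_, i) => i * 20)
console.log(fpsSamples(everyTwentyMs)) // [ 50, 50 ]
```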

When three consecutive FPS values below 20 appear, we can conclude that the page is janky. For details, see How to monitor webpage stuck.

export function isBlocking(fpsList, below = 20, last = 3) {
    let count = 0
    for (let i = 0; i < fpsList.length; i++) {
        if (fpsList[i] && fpsList[i] < below) {
            count++
        } else {
            count = 0
        }

        if (count >= last) {
            return true
        }
    }

    return false
}

Vue route change rendering time

We already know how to calculate the first screen rendering time, but how do we calculate the rendering time caused by a route switch in an SPA? This article uses Vue as an example to explain my approach.

export default function onVueRouter(Vue, router) {
    let isFirst = true
    let startTime
    router.beforeEach((to, from, next) => {
        // On first entry, the rendering time is already covered by other metrics
        if (isFirst) {
            isFirst = false
            return next()
        }

        // Add a field to the router to indicate whether the rendering time should be calculated
        // Only route changes need it
        router.needCalculateRenderTime = true
        startTime = performance.now()

        next()
    })

    let timer
    Vue.mixin({
        mounted() {
            if (!router.needCalculateRenderTime) return

            this.$nextTick(() => {
                // Code that runs only after the entire view has been rendered
                const now = performance.now()
                clearTimeout(timer)

                timer = setTimeout(() => {
                    router.needCalculateRenderTime = false
                    lazyReportCache({
                        type: 'performance',
                        subType: 'vue-router-change-paint',
                        duration: now - startTime,
                        startTime: now,
                        pageURL: getPageURL(),
                    })
                }, 1000)
            })
        },
    })
}

The code logic is as follows:

The router.beforeEach() hook is triggered when the route is switched. The current time is recorded as the rendering start time in the hook's callback.
Use Vue.mixin() to inject a function into the mounted() hook of every component. Each call is debounced.
When the mounted() hook of the last component fires, all components under this route have been mounted. The rendering time can then be obtained in this.$nextTick().

One more situation must be considered. Components may also change when the route has not been switched, and in that case the rendering time should not be calculated in mounted(). So a needCalculateRenderTime field is added and set to true when the route is switched, indicating that the rendering time can be calculated.

Error data collection

Resource loading error

Use addEventListener() to listen for the error event in the capture phase to catch resource loading failures.

// Catch resource loading failures: js, css, img...
window.addEventListener('error', e => {
    const target = e.target
    if (!target) return

    if (target.src || target.href) {
        const url = target.src || target.href
        lazyReportCache({
            url,
            type: 'error',
            subType: 'resource',
            startTime: e.timeStamp,
            html: target.outerHTML,
            resourceType: target.tagName,
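            // event.path is non-standard; composedPath() is the standard equivalent
            paths: (e.composedPath ? e.composedPath() : (e.path || [])).map(item => item.tagName).filter(Boolean),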
            pageURL: getPageURL(),
        })
    }
}, true)

JS error

Use window.onerror to monitor JS errors.

// Listen for JS errors
window.onerror = (msg, url, line, column, error) => {
    lazyReportCache({
        msg,
        line,
        column,
        error: error?.stack, // error can be null, for example for cross-origin script errors
        subType: 'js',
        pageURL: url,
        type: 'error',
        startTime: performance.now(),
    })
}

Promise error

Use addEventListener() to listen for the unhandledrejection event to catch unhandled promise rejections.

// Listen for promise errors; the drawback is that column information is not available
window.addEventListener('unhandledrejection', e => {
    lazyReportCache({
        reason: e.reason?.stack,
        subType: 'promise',
        type: 'error',
        startTime: e.timeStamp,
        pageURL: getPageURL(),
    })
})

sourcemap

Code in the production environment is generally minified, and the sourcemap files are not published with it. As a result, error messages from production code are hard to read. We can use source-map to restore these minified error messages.

When the code reports an error, we can get the corresponding file name, line number and column number:

{
    line: 1,
    column: 17,
    file: 'https://www.xxx.com/bundle.js',
}

Then call the following code to restore:

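// Assumes Node.js and the source-map package (npm install source-map)
const fs = require('fs')
const path = require('path')
const sourceMap = require('source-map')

async function parse(error) {
    const mapObj = JSON.parse(getMapFileContent(error.url))
    const consumer = await new sourceMap.SourceMapConsumer(mapObj)
    // Strip the ./ from paths like webpack://source-map-demo/./src/index.js
    const sources = mapObj.sources.map(item => format(item))
    // Map the minified line and column back to the original position and source file
    const originalInfo = consumer.originalPositionFor({ line: error.line, column: error.column })
    // sourcesContent holds the original source of each file; look it up by file name
    const originalFileContent = mapObj.sourcesContent[sources.indexOf(originalInfo.source)]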
    return {
        file: originalInfo.source,
        content: originalFileContent,
        line: originalInfo.line,
        column: originalInfo.column,
        msg: error.msg,
        error: error.error
    }
}

function format(item) {
    return item.replace(/(\.\/)*/g, '')
}

function getMapFileContent(url) {
    return fs.readFileSync(path.resolve(__dirname, `./maps/${url.split('/').pop()}.map`), 'utf-8')
}

Every time the project is packaged, if the sourcemap is turned on, then each js file will have a corresponding map file.

bundle.js
bundle.js.map

At this point, the js file is placed on the static server for users to access, and the map file is stored on the server for restoring error messages. The source-map library can restore the original error information from the minified error information. For example, the error position after minification is line 1, column 47, and the real position after restoration may be line 4, column 10. Besides the position, the original source code can also be obtained.

The figure above is an example of code error restoration. Since this part is outside the scope of the SDK, I opened a separate repository for it. If you are interested, you can take a look.

Vue error

window.onerror cannot catch Vue errors; you need to use the API provided by Vue to monitor them.

Vue.config.errorHandler = (err, vm, info) => {
    // Print the error to the console
    console.error(err)

    lazyReportCache({
        info,
        error: err.stack,
        subType: 'vue',
        type: 'error',
        startTime: performance.now(),
        pageURL: getPageURL(),
    })
}

Behavioral data collection

PV, UV

PV (page view) is the number of page views, and UV (unique visitor) is the number of users who visit. Every visit to a page counts as one PV, while multiple visits by the same user on the same day count as only one UV.

On the front end, you only need to report a PV each time a page is entered. UV statistics are computed on the server side by analyzing the reported data.

export default function pv() {
    lazyReportCache({
        type: 'behavior',
        subType: 'pv',
        startTime: performance.now(),
        pageURL: getPageURL(),
        referrer: document.referrer,
        uuid: getUUID(),
    })
}
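
How the server derives UV from these reports depends on the backend, which is outside the scope of this article. As a rough sketch of the idea only: group the PV records by day and count the distinct uuid values in each day. The function name countDailyUV and the timestamp field (milliseconds since the epoch, added by the server on receipt, with days cut in UTC) are assumptions of mine, since the startTime reported above comes from performance.now() and is not a wall-clock time.

```javascript
// records: [{ uuid, timestamp }] where timestamp is a hypothetical
// server-side receive time in milliseconds since the epoch.
function countDailyUV(records) {
    const visitorsByDay = new Map()

    for (const { uuid, timestamp } of records) {
        // Use the UTC date (YYYY-MM-DD) as the day key.
        const day = new Date(timestamp).toISOString().slice(0, 10)
        if (!visitorsByDay.has(day)) {
            visitorsByDay.set(day, new Set())
        }
        visitorsByDay.get(day).add(uuid)
    }

    const result = {}
    for (const [day, visitors] of visitorsByDay) {
        result[day] = visitors.size
    }

    return result
}
```

The number of records per day is the PV, while the size of each set is the UV, so repeated visits by the same uuid on one day raise PV but not UV.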

Time on page

Record an initial time when the user enters the page. When the user leaves the page, subtract the initial time from the current time to get how long the user stayed. This calculation can be done in the beforeunload event.

export default function pageAccessDuration() {
    onBeforeunload(() => {
        report({
            type: 'behavior',
            subType: 'page-access-duration',
            startTime: performance.now(),
            pageURL: getPageURL(),
            uuid: getUUID(),
        }, true)
    })
}
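
If the timing should start at some other moment than page load (for example when the SDK is initialized), the subtraction can be made explicit. This is only a sketch: `createStayTimer` is a hypothetical helper, and the clock is injectable so the logic can be checked outside a browser.

```javascript
// Hypothetical helper: remembers when it was created and returns the elapsed time on demand.
function createStayTimer(now = () => performance.now()) {
    const enterTime = now()
    return () => now() - enterTime
}

// Browser wiring (skipped when there is no window, e.g. in Node)
if (typeof window !== 'undefined') {
    const getStayDuration = createStayTimer()
    window.addEventListener('beforeunload', () => {
        const duration = getStayDuration()
        // Hand `duration` to the SDK's report() here
        console.log('stayed for', duration, 'ms')
    })
}
```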

`Page visit depth`

Recording page visit depth is useful. Take two different campaign pages, a and b: if the average visit depth of a is only 50% and that of b is 80%, b is more popular with users. Based on this, you can make targeted changes to campaign page a.

In addition, visit depth and time on page can be combined to identify fake orders on e-commerce sites. For example, one person enters the page, immediately drags it to the bottom, and waits a while before purchasing; another scrolls down the page slowly and finally buys. Although both spend the same amount of time on the page, the first person is clearly more likely to be placing fake orders.

The calculation of page visit depth is slightly more involved:

1. When the user enters the page, record the current time, the scrollTop value, the visible height of the page, and the total height of the page.
2. The moment the user scrolls the page, the scroll event is triggered. In its callback, use the data recorded in step 1 to calculate the page visit depth and stay time.
3. When the user scrolls to a certain point and stops to continue reading, record the current time, scrollTop value, visible height, and total height again.
4. Repeat step 2...

The specific code is as follows:

let timer
let startTime = 0
let hasReport = false
let pageHeight = 0
let scrollTop = 0
let viewportHeight = 0

export default function pageAccessHeight() {
    window.addEventListener('scroll', onScroll)

    onBeforeunload(() => {
        const now = performance.now()
        report({
            startTime: now,
            duration: now - startTime,
            type: 'behavior',
            subType: 'page-access-height',
            pageURL: getPageURL(),
            value: toPercent((scrollTop + viewportHeight) / pageHeight),
            uuid: getUUID(),
        }, true)
    })

    // After the page has loaded, record the initial visit height and time
    executeAfterLoad(() => {
        startTime = performance.now()
        pageHeight = document.documentElement.scrollHeight || document.body.scrollHeight
        scrollTop = document.documentElement.scrollTop || document.body.scrollTop
        viewportHeight = window.innerHeight
    })
}

function onScroll() {
    clearTimeout(timer)
    const now = performance.now()
    
    if (!hasReport) {
        hasReport = true
        lazyReportCache({
            startTime: now,
            duration: now - startTime,
            type: 'behavior',
            subType: 'page-access-height',
            pageURL: getPageURL(),
            value: toPercent((scrollTop + viewportHeight) / pageHeight),
            uuid: getUUID(),
        })
    }

    timer = setTimeout(() => {
        hasReport = false
        startTime = now
        pageHeight = document.documentElement.scrollHeight || document.body.scrollHeight
        scrollTop = document.documentElement.scrollTop || document.body.scrollTop
        viewportHeight = window.innerHeight        
    }, 500)
}

function toPercent(val) {
    if (val >= 1) return '100%'
    return (val * 100).toFixed(2) + '%'
}
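
To make the depth formula concrete, here is a small worked example. `visitDepth` is a hypothetical pure version of the `(scrollTop + viewportHeight) / pageHeight` expression used above, with a guard for the case where the page height has not been measured yet.

```javascript
// Hypothetical pure version of the depth calculation used in the code above.
function visitDepth(scrollTop, viewportHeight, pageHeight) {
    if (!(pageHeight > 0)) return 0 // layout not measured yet
    return Math.min(1, (scrollTop + viewportHeight) / pageHeight)
}

// A 2800px page seen through an 800px viewport:
const atTop = visitDepth(0, 800, 2800)       // 800 / 2800, roughly 0.29
const scrolled = visitDepth(600, 800, 2800)  // 1400 / 2800 = 0.5
const atBottom = visitDepth(2000, 800, 2800) // 2800 / 2800 = 1
```

Even without scrolling, the depth is never zero on a measured page, because the first screen is already visible.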

`User clicks`

By using addEventListener() to listen for the mousedown and touchstart events, we can collect the size of the area the user clicked, the position of the click within the whole page, and the content of the clicked element.

export default function onClick() {
    ['mousedown', 'touchstart'].forEach(eventType => {
        let timer
        window.addEventListener(eventType, event => {
            // event.path is non-standard and has been removed from Chrome, so use composedPath().
            // It must be read synchronously: the path is empty once the event has finished dispatching.
            const paths = event.composedPath().map(item => item.tagName).filter(Boolean)
            clearTimeout(timer)
            timer = setTimeout(() => {
                const target = event.target
                const { top, left } = target.getBoundingClientRect()

                lazyReportCache({
                    top,
                    left,
                    eventType,
                    pageHeight: document.documentElement.scrollHeight || document.body.scrollHeight,
                    scrollTop: document.documentElement.scrollTop || document.body.scrollTop,
                    type: 'behavior',
                    subType: 'click',
                    target: target.tagName,
                    paths,
                    startTime: event.timeStamp,
                    pageURL: getPageURL(),
                    outerHTML: target.outerHTML,
                    innerHTML: target.innerHTML,
                    width: target.offsetWidth,
                    height: target.offsetHeight,
                    viewport: {
                        width: window.innerWidth,
                        height: window.innerHeight,
                    },
                    uuid: getUUID(),
                })
            }, 500)
        })
    })
}

`Page jump`

Use addEventListener() to listen for the popstate and hashchange page jump events. Note that calling history.pushState() or history.replaceState() does not trigger the popstate event. It is only triggered by a browser action, such as the user clicking the browser's back button (or JavaScript code calling history.back() or history.forward()). Likewise, pushState() and replaceState() do not trigger hashchange.

export default function pageChange() {
    let from = ''
    window.addEventListener('popstate', () => {
        const to = getPageURL()

        lazyReportCache({
            from,
            to,
            type: 'behavior',
            subType: 'popstate',
            startTime: performance.now(),
            uuid: getUUID(),
        })

        from = to
    }, true)

    let oldURL = ''
    window.addEventListener('hashchange', event => {
        const newURL = event.newURL

        lazyReportCache({
            from: oldURL,
            to: newURL,
            type: 'behavior',
            subType: 'hashchange',
            startTime: performance.now(),
            uuid: getUUID(),
        })

        oldURL = newURL
    }, true)
}
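
Because pushState() and replaceState() fire neither event, navigations made through them go unnoticed by the listeners above. One common workaround, which is not part of the SDK described in this article, is to wrap the two methods. A minimal sketch, where `patchHistory` and its callback signature are hypothetical:

```javascript
// Hypothetical helper: wraps pushState/replaceState on a history-like object so that
// `onChange` is called after each programmatic navigation. Returns an undo function.
function patchHistory(historyObj, onChange) {
    const originals = {}
    for (const method of ['pushState', 'replaceState']) {
        const original = historyObj[method]
        originals[method] = original
        historyObj[method] = function (...args) {
            const result = original.apply(this, args)
            onChange(method, args[2]) // args[2] is the optional URL argument
            return result
        }
    }
    // Calling the returned function puts the original methods back
    return () => {
        for (const method of Object.keys(originals)) historyObj[method] = originals[method]
    }
}
```

In a browser this would be called as `patchHistory(window.history, (method, url) => { ... })`, with the callback reporting an entry similar to the popstate one.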

`Vue routing changes`

Vue can use the router.beforeEach hook to monitor routing changes.

export default function onVueRouter(router) {
    router.beforeEach((to, from, next) => {
        // The first page load does not need to be counted
        if (!from.name) {
            return next()
        }

        const data = {
            params: to.params,
            query: to.query,
        }

        lazyReportCache({
            data,
            name: to.name || to.path,
            type: 'behavior',
            subType: ['vue-router-change', 'pv'],
            startTime: performance.now(),
            from: from.fullPath,
            to: to.fullPath,
            uuid: getUUID(),
        })

        next()
    })
}

`Data reporting`

`Reporting method`

Several methods can be used for data reporting. The simple SDK I wrote combines two of them: sendBeacon where it is available, and XMLHttpRequest as the fallback. The advantages of reporting with sendBeacon are clear.

Using the sendBeacon() method lets the user agent send data to the server asynchronously when it has the opportunity, without delaying the unloading of the page or affecting the loading performance of the next navigation. This solves the usual problems with submitting analytics data: the data is sent reliably, the transmission is asynchronous, and the loading of the next page is not affected.

In browsers that do not support sendBeacon, we can use XMLHttpRequest to report. An HTTP request consists of two steps: sending and receiving. For reporting, we only need to make sure the request is sent; whether the response is received does not matter. To check this, I ran an experiment: in the beforeunload handler, I used XMLHttpRequest to transmit 30kb of data (the data to be reported is rarely that large). I tried several browsers, and the data was sent successfully in each. Of course, the result also depends on hardware performance and network conditions.
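
The fallback logic described above might be sketched as follows. `sendReport` and its `env` parameter are hypothetical names; `env` defaults to the global object and exists only so that the browser APIs can be replaced with stubs.

```javascript
// Hypothetical transport: prefers sendBeacon and falls back to XMLHttpRequest.
function sendReport(url, data, env = globalThis) {
    const body = JSON.stringify(data)
    const nav = env.navigator

    // sendBeacon() returns false when the browser refuses to queue the data
    if (nav && typeof nav.sendBeacon === 'function' && nav.sendBeacon(url, body)) {
        return 'beacon'
    }

    const xhr = new env.XMLHttpRequest()
    xhr.open('POST', url)
    xhr.send(body) // fire and forget: the response is not needed
    return 'xhr'
}
```

The return value only indicates which transport was used, which can be handy when debugging the SDK.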

`Reporting time`

There are three options for when to report:

Delay reporting with requestIdleCallback/setTimeout.
Report in the beforeunload callback function.
Cache the data to be reported and report it once a certain amount has accumulated.

It is recommended to combine all three:

Cache the data first, and once a certain amount has been cached, report it using requestIdleCallback/setTimeout.
When the user leaves the page, report all the data that has not been reported yet in one go.
`Summary`

Theory alone is hard to digest. For this reason, I wrote a simple monitoring SDK based on the technical points covered in this article. You can use it to write some simple demos to deepen your understanding; it works best when read together with this article.



谭光志
