Hello everyone, my name is Yang Chenggong.
When it comes to monitoring systems, most people first think of back-end monitoring. That makes sense: server performance, database performance, API traffic, and the health of various services are all back-end concerns. The front end is seen mainly as the UI layer, concerned with page layout and design. There seems to be nothing to monitor, so the idea of monitoring rarely comes up.
So the common assumption is that as long as the back end is stable and under control, the application is too. But is that really the case?
In recent years the front end has developed rapidly. Thanks to the continuous evolution of JavaScript and ever more capable browsers, the front end can do more and more, and front-end applications have become correspondingly complex. Problems we never used to run into are now surfacing one after another.
For example, Xiao Ming is a front-end programmer. One day a user reported that clicking a button on a certain page did nothing. Xiao Ming immediately found the button and clicked it. Huh? It worked fine. He then tested with several different accounts, and it still worked. This time, Xiao Ming was stumped.
What to do? I believe front-end programmers all over the world react to strange problems the same way. Xiao Ming told the user: it may be a browser cache problem; could you force a refresh, or try logging out and back in? The user followed the suggestion and it worked! The user sent Xiao Ming a string of "Thanks 🙏". Xiao Ming smiled awkwardly and quickly replied, "No big deal".
Two days later, another user reported the same problem. Xiao Ming reached for the same cure-all, and it worked again. But was the problem really solved? No! Xiao Ming tried many times and still could not reproduce the exception. There are many possible causes, such as:
- Data problem: a certain attribute may be missing
- Front-end problem: an exception while the JS code executes
- Interface (API) problem: the interface may not respond, or may not return the expected value
Yet under normal circumstances nothing goes wrong, and Xiao Ming has tested many times. The problem must occur only in some specific scenario, one we can neither identify nor capture.
Bugs like these lurk in our systems like landmines that will explode at some point. Worst of all, even after one explodes it is hard for us to find where, which makes our "demining operations" difficult.
One sunny afternoon, Xiao Ming was sitting on the toilet pondering life. Suddenly an idea flashed through his mind: "If the system could automatically grab the abnormal data and save it the moment a user triggers an exception, and I could then view that data somewhere in an admin back end, couldn't I find the cause of the error right away?"
Xiao Ming slapped his thigh. Yes! Why didn't I think of it sooner? This way, whenever an exception occurs, the abnormal data is captured automatically. The next time an error happens online, we don't need to wait for user feedback; we can discover it ourselves and locate the cause immediately. Isn't that killing two birds with one stone?
I believe many front-end veterans have been troubled by the same problems and, like Xiao Ming, gradually arrived at this idea: "Save the abnormal data when an error occurs, for later investigation." Through continuous practice, this idea gradually evolved into today's front-end monitoring.
Of course, today's front-end monitoring covers more than abnormal data; any data that helps product analysis can be monitored. So I think of front-end monitoring as collecting the key data users generate while using the system, storing it in a database, and then querying and analyzing it later. A complete implementation of this is called a front-end monitoring system.
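To make Xiao Ming's idea concrete, here is a minimal sketch of "save the abnormal data at the moment of the error". It is not our production code: the record fields, `buildErrorRecord`, and `withCapture` are names made up for illustration, and the `save` callback stands in for whatever actually sends the record to the server.

```typescript
// Shape of one stored error record. All field names are illustrative assumptions.
interface MonitorErrorRecord {
  kind: "js_error";
  message: string;
  stack: string;
  pageUrl: string;
  userId: string;
  timestamp: number;
}

// Context the page supplies at capture time (assumed to be known to the collector).
interface CaptureContext {
  pageUrl: string;
  userId: string;
  now?: () => number; // injectable clock, defaults to Date.now
}

// Turn any thrown value into a record that can be stored for later investigation.
function buildErrorRecord(thrown: unknown, ctx: CaptureContext): MonitorErrorRecord {
  return {
    kind: "js_error",
    message: thrown instanceof Error ? thrown.message : String(thrown),
    stack: thrown instanceof Error && thrown.stack ? thrown.stack : "",
    pageUrl: ctx.pageUrl,
    userId: ctx.userId,
    timestamp: (ctx.now ?? Date.now)(),
  };
}

// Run a handler; if it throws, save a record and rethrow so behavior is unchanged.
function withCapture<T>(
  handler: () => T,
  ctx: CaptureContext,
  save: (record: MonitorErrorRecord) => void,
): T {
  try {
    return handler();
  } catch (thrown) {
    save(buildErrorRecord(thrown, ctx));
    throw thrown;
  }
}
```

In a browser, the same `buildErrorRecord` could be fed from the global `error` and `unhandledrejection` events on `window`, so that errors are captured even where no handler was wrapped explicitly.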
What specific problems can front-end monitoring solve?
The example above traced where front-end monitoring comes from and roughly described how it tracks online errors, so you should have a preliminary sense of what front-end monitoring means. Now let's look at real projects and explore in detail what problems it can solve.
Exception reporting
The first is exception reporting. Just as in the example, an exception occurs online that is sometimes hard for us to reproduce. And if no user reports it, we don't even know the problem exists, which leaves users with the impression that our product is unstable. Front-end monitoring is therefore a key safeguard for online stability and for hearing about exceptions promptly.
Besides front-end exceptions, we can also catch interface exceptions. Front-end programmers sometimes joke that they are the designated scapegoats: whenever product, QA, or users hit a problem, they come to the front end first. Sometimes a short-tempered boss starts shouting the moment he walks over, and the humble front-end developer doesn't dare say a word, because the cause will only be clear after investigation. When the investigation shows it was an interface problem all along, the scolding was for nothing, and it feels deeply unfair.
But with front-end monitoring, we immediately have the error message, page, request address, parameters, and so on from the moment of the exception, and we can tell where the problem lies. The next time there is an online incident, the front end can calmly and objectively say which side is at fault. And when someone tries to shift the blame, the front end can bravely say no. After all, with the evidence in hand, why should I let you yell at me?
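As a rough illustration of the "evidence" described above, the sketch below turns the outcome of one interface call into a failure record holding the page, request address, and parameters. The types and the `toApiFailure` function are hypothetical, and treating every status of 400 or above as a failure is a simplifying assumption.

```typescript
// What was requested, and from which page. Field names are illustrative.
interface ApiCallInfo {
  pageUrl: string;
  apiUrl: string;
  method: string;
  params: unknown;
}

// Either the server answered with a status code, or no response arrived at all.
type ApiOutcome =
  | { type: "response"; httpStatus: number }
  | { type: "network_failure"; reason: string };

interface ApiFailureRecord extends ApiCallInfo {
  kind: "api_error";
  httpStatus: number | null; // null when no response arrived
  message: string;
}

// Returns a record to store when the call failed, or null when it succeeded.
function toApiFailure(call: ApiCallInfo, outcome: ApiOutcome): ApiFailureRecord | null {
  if (outcome.type === "network_failure") {
    return { kind: "api_error", ...call, httpStatus: null, message: outcome.reason };
  }
  if (outcome.httpStatus >= 400) {
    return {
      kind: "api_error",
      ...call,
      httpStatus: outcome.httpStatus,
      message: "HTTP " + outcome.httpStatus,
    };
  }
  return null;
}
```

A request wrapper around fetch or axios could call `toApiFailure` after every request and report any non-null result, which is what lets the front end show exactly which interface failed and with which parameters.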
Performance monitoring
Tracking exceptions is the most practical part of front-end monitoring, but it is not the only one; performance monitoring is also critical.
Front-end projects today are very large. If code quality is low or the project architecture is poorly designed, performance problems come easily: slow first-screen loading, janky pages, blank screens, repeated resource requests, and so on. These can be monitored through data collection. By measuring things like rendering time, the number of interface requests, and the total size of requested resources for a page, we can discover performance problems in time.
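The kind of per-page calculation mentioned above might look like the sketch below. It assumes the collector has already gathered one sample per loaded resource; `ResourceSample`, `PagePerfSummary`, and `summarizePage` are illustrative names, and "end of the last resource" is only a rough proxy for load time, not a real rendering metric.

```typescript
// One loaded resource, similar in spirit to a browser Resource Timing entry.
interface ResourceSample {
  url: string;
  startMs: number; // request start, relative to navigation start
  durationMs: number;
  bytes: number;
}

interface PagePerfSummary {
  requestCount: number;
  totalBytes: number;
  lastResourceEndMs: number; // rough proxy for "everything has loaded"
  repeatedUrls: string[]; // URLs requested more than once
}

function summarizePage(samples: ResourceSample[]): PagePerfSummary {
  const timesSeen = new Map<string, number>();
  let totalBytes = 0;
  let lastResourceEndMs = 0;
  for (const sample of samples) {
    timesSeen.set(sample.url, (timesSeen.get(sample.url) ?? 0) + 1);
    totalBytes += sample.bytes;
    lastResourceEndMs = Math.max(lastResourceEndMs, sample.startMs + sample.durationMs);
  }
  const repeatedUrls = Array.from(timesSeen.entries())
    .filter(([, count]) => count > 1)
    .map(([url]) => url);
  return { requestCount: samples.length, totalBytes, lastResourceEndMs, repeatedUrls };
}
```

In a browser, the samples could be derived from `performance.getEntriesByType("resource")`, mapping each entry's `name`, `startTime`, `duration`, and `transferSize` onto the fields above.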
So in addition to "solving problems", what value does front-end monitoring have?
Operational Feedback Tool
In fact, besides helping programmers keep optimizing and improving applications, front-end monitoring also plays an indispensable role for product and operations teams. Specifically, by collecting user behavior data through event tracking (often called "buried points"), you can run statistical analysis on how the online product is used: overall PV/UV, the number of visits to a feature, visit times, click-through rate, and so on. This data helps product and operations understand what is actually happening, which in turn helps improve the product.
Behavior data can describe very accurately how a feature is used or how a person uses the product. Of course, its volume is much larger than that of exception data. Exception monitoring collects data only when an exception occurs, whereas behavior data is, in theory, collected whenever a user uses our product and interacts with it.
Of course, monitoring is multifaceted, and which data to collect depends on the situation. In short, by designing collection rules and then collecting the data, you can learn about any aspect of the product. This is very flexible and not limited to the well-known metrics.
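For readers new to PV/UV, here is a small sketch of how those two numbers fall out of collected behavior data: PV counts every page view, UV counts distinct visitors. The `TrackedVisit` shape and both functions are assumptions for illustration; in practice this counting would happen in database queries rather than in memory.

```typescript
// One page-view event collected by the tracking code. Field names are illustrative.
interface TrackedVisit {
  visitorId: string; // user ID or device ID, depending on the business
  page: string;
  timestamp: number;
}

interface TrafficStats {
  pv: number; // page views: every visit counts
  uv: number; // unique visitors: each visitor counts once
}

function countTraffic(visits: TrackedVisit[]): TrafficStats {
  const visitors = new Set<string>();
  for (const visit of visits) {
    visitors.add(visit.visitorId);
  }
  return { pv: visits.length, uv: visitors.size };
}

// The same statistics, broken down per page.
function countTrafficByPage(visits: TrackedVisit[]): Map<string, TrafficStats> {
  const grouped = new Map<string, TrackedVisit[]>();
  for (const visit of visits) {
    const bucket = grouped.get(visit.page) ?? [];
    bucket.push(visit);
    grouped.set(visit.page, bucket);
  }
  const result = new Map<string, TrafficStats>();
  grouped.forEach((bucket, page) => {
    result.set(page, countTraffic(bucket));
  });
  return result;
}
```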
Why build it in-house?
As front-end monitoring has developed, mature third-party platforms have naturally appeared. The three most commonly used in China are:
- sentry
- webfunny
- fundebug
First, sentry and fundebug are paid platforms when used as hosted services: the more data you have, the higher the cost, so they are effectively data hosting platforms. webfunny can be deployed privately, but its functions are fixed and the code cannot be changed. That is its disadvantage: it is not flexible enough for custom features.
Therefore, although mature monitoring systems already exist on the market, many teams still choose to build their own. One reason is that the data stays on your own servers at no extra cost; the other is flexibility, since you can customize features. For example, pushing a message to your own DingTalk or WeCom (enterprise WeChat) when an exception is triggered requires a monitoring system that is highly flexible.
Then there are the custom collection rules mentioned above, which I think are the most important reason. Different rules collect different data, so a third party's standard collection rules may not meet your company's needs. For example, some companies need the device ID as the unique identifier, while others need the user ID. This is determined by the business, and every company is different.
Our front-end team built its own front-end monitoring platform. The advantage is that we can define our own collection rules, design our own database fields, and keep the data on our own platform. It is highly flexible and reliable, and it can meet our own varied needs.
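To show what a "custom collection rule" could mean in code, the sketch below makes the unique identifier and the extra business fields configurable. Everything here (`CollectionRules`, `RawClientInfo`, `applyRules`) is a hypothetical design for illustration, not the schema of our platform.

```typescript
// Which field identifies a visitor is a business decision, so it is configurable.
interface CollectionRules {
  identifyBy: "userId" | "deviceId";
  extraFields: string[]; // names of business fields to copy into every record
}

// Whatever the client knows about itself at collection time.
interface RawClientInfo {
  userId?: string;
  deviceId?: string;
  [field: string]: unknown;
}

interface CollectedRecord {
  uniqueId: string;
  extras: Record<string, unknown>;
}

// Apply one company's rules to the raw client info.
function applyRules(info: RawClientInfo, rules: CollectionRules): CollectedRecord {
  const uniqueId = info[rules.identifyBy];
  if (typeof uniqueId !== "string" || uniqueId === "") {
    throw new Error("missing " + rules.identifyBy);
  }
  const extras: Record<string, unknown> = {};
  for (const field of rules.extraFields) {
    if (field in info) {
      extras[field] = info[field]; // fields the client did not provide are skipped
    }
  }
  return { uniqueId, extras };
}
```

With rules like these, one company can set `identifyBy` to `"deviceId"` and another to `"userId"` without touching the collection code itself.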
Self-developed front-end monitoring technology stack
Let's start with the conclusion: our company's front-end monitoring is built by the front-end team itself, so the technology stack is React + Node.js + MongoDB.
This is a fairly conventional technical solution. Because the front-end team builds it, the stack is dominated by JS. It is also something front-end developers can study and fully understand, so it can be regarded as a standard solution.
For the Node.js part, we use the express framework to write the interfaces. They fall into two categories: write, and query/statistics. Data collected on the front end is stored by calling a write interface; later, the monitoring panel displays data by calling the query interfaces.
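The two interface categories can be sketched without any framework. The code below is a simplified stand-in, not our express code: `MemoryStore` replaces the database so the example runs on its own, and `handleWrite` and `handleCount` are hypothetical handler functions that an express route could call with the parsed request body or query parameters.

```typescript
interface StoredMonitorRecord {
  kind: string; // e.g. "js_error", "api_error", "page_view"
  timestamp: number;
  payload: unknown;
}

// In-memory stand-in for the database so the sketch is self-contained.
class MemoryStore {
  private rows: StoredMonitorRecord[] = [];
  insert(row: StoredMonitorRecord): void {
    this.rows.push(row);
  }
  find(match: (row: StoredMonitorRecord) => boolean): StoredMonitorRecord[] {
    return this.rows.filter(match);
  }
}

interface HandlerReply {
  httpStatus: number;
  data: unknown;
}

// "Write" category: validate the reported body, then store it.
function handleWrite(store: MemoryStore, body: unknown, now: number): HandlerReply {
  if (typeof body !== "object" || body === null) {
    return { httpStatus: 400, data: "body must be an object" };
  }
  const { kind, payload } = body as { kind?: unknown; payload?: unknown };
  if (typeof kind !== "string") {
    return { httpStatus: 400, data: "kind must be a string" };
  }
  store.insert({ kind, timestamp: now, payload });
  return { httpStatus: 200, data: "ok" };
}

// "Query/statistics" category: count records of one kind in [fromMs, toMs).
function handleCount(store: MemoryStore, kind: string, fromMs: number, toMs: number): HandlerReply {
  const rows = store.find(
    (row) => row.kind === kind && row.timestamp >= fromMs && row.timestamp < toMs,
  );
  return { httpStatus: 200, data: { kind, count: rows.length } };
}
```

Keeping the handlers independent of the web framework also makes them easy to test without starting a server.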
Behind the interfaces is the MongoDB database, which stores the data we collect. Why MongoDB? Mainly because its write performance is high. As mentioned above, a monitoring system writes very frequently when collecting behavior data, so write performance matters a great deal, while query requirements are less demanding.
There is also a difficult point here: once a large amount of data has been collected, we need statistical analysis along various dimensions. For example:
- Ranking of the number of visits and visit duration of users in a certain period of time
- Visit frequency and stay time ranking of pages in a certain period of time
- Statistics on the number and proportion of interface errors reported in a certain period of time
These more complex statistics rely mainly on MongoDB's aggregation queries. Basic grouping statistics are fine for front-end developers to write, but queries this complex stretched us thin. What did we do? We chewed through the MongoDB aggregation documentation for a long time, looking up operators one by one against our requirements to see which could do the job, and ended up going through almost every aggregation operator.
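As a taste of what such a query looks like, here is a sketch for the third example in the list above: the number and proportion of interface errors in a time period, grouped by interface address. The stages `$match`, `$group`, and `$sort` are standard MongoDB aggregation stages; the field names (`kind`, `timestamp`, `apiUrl`) are assumptions about the stored documents, and the proportion is computed in application code to keep the pipeline short.

```typescript
// A pipeline is just an array of stage objects in MongoDB's aggregation syntax.
type PipelineStage = Record<string, unknown>;

// Count interface errors per address within [fromMs, toMs), most frequent first.
function apiErrorsByUrlPipeline(fromMs: number, toMs: number): PipelineStage[] {
  return [
    { $match: { kind: "api_error", timestamp: { $gte: fromMs, $lt: toMs } } },
    { $group: { _id: "$apiUrl", count: { $sum: 1 } } },
    { $sort: { count: -1 } },
  ];
}

// Shape of each row the pipeline above would return.
interface GroupedCount {
  _id: string;
  count: number;
}

// Add each group's share of the total, computed from the grouped rows.
function withProportion(rows: GroupedCount[]): Array<GroupedCount & { proportion: number }> {
  const total = rows.reduce((sum, row) => sum + row.count, 0);
  return rows.map((row) => ({
    ...row,
    proportion: total === 0 ? 0 : row.count / total,
  }));
}
```

With the official Node.js driver, the array returned by `apiErrorsByUrlPipeline` would be passed to `collection.aggregate(...)`, and the resulting rows to `withProportion`.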
Once the interfaces are done, we finally use React to build an admin back end that presents the data as charts and tables, so we can see how the online product is being used in real time.
Of course, there is one more step: writing a notification interface that connects to DingTalk or WeCom (enterprise WeChat) and sends a notification when an exception is triggered, so that we learn about it in time. The notification carries enough information to show at a glance what went wrong. For more detailed error information, we look it up in the exception panel.
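A notification of this kind could be assembled roughly as follows. This is a sketch under assumptions: the `msgtype: "text"` body follows the text-message shape used by DingTalk custom robot webhooks as I understand it (check the current DingTalk or WeCom documentation before relying on it), and `AlertInput` and `buildAlert` are illustrative names. Sending the payload to the webhook URL is left out.

```typescript
// The facts we want in the alert. Field names are illustrative.
interface AlertInput {
  message: string;
  pageUrl: string;
  apiUrl?: string; // present only for interface exceptions
  occurredAt: number; // epoch milliseconds
}

// Text-message body, assumed to match the DingTalk custom robot webhook format.
interface DingTalkTextMessage {
  msgtype: "text";
  text: { content: string };
}

function buildAlert(input: AlertInput): DingTalkTextMessage {
  const lines = [
    "[Front-end monitor] Exception detected",
    "Message: " + input.message,
    "Page: " + input.pageUrl,
  ];
  if (input.apiUrl) {
    lines.push("API: " + input.apiUrl);
  }
  lines.push("Time: " + new Date(input.occurredAt).toISOString());
  return { msgtype: "text", text: { content: lines.join("\n") } };
}
```

The server would POST this object as JSON to the robot's webhook address whenever a new exception record is written.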
In short, we first check comprehensively for interface exceptions. Once we have confirmed there is no problem with the data, we then check the front end. Efficiency goes up and the burden goes down. Isn't this the best of both worlds?
In the end, this small system we developed by ourselves played a great role after the product was launched, and was praised by the boss, which encouraged us to continue to improve it~
More resources
This article comes from the official account "Programmer Chenggong". The author, Yang Chenggong, focuses on sharing front-end engineering and architecture. Follow me for more hard-core knowledge.
If you have any questions or suggestions about this article, feel free to reach out. Thank you for reading 🙏