
Background

Getting H5 pages to open within a second (the "seconds-open" rate) is a common challenge, so Dewu tackled it with the client and H5 working together. This article walks through how the seconds-open rate was raised from 30% to 75% through combined client + H5 optimization (1 + 1 > 2). Follow-up work on interface pre-requests, client-side pre-rendering, and preloading 2.0 will push the metric further still.

Why optimize?

According to a Global Web Performance Matters report on e-commerce:

  • 47% of users expect a page to load within 2 seconds.
  • 52% of online users say page loading speed affects their loyalty to a site.
  • Every extra second of load time reduces page views by 11% and user satisfaction by 16%.
  • Nearly half of mobile users abandon a page that has not loaded within 10 seconds.

Overall system architecture diagram:

Indicator selection

First, a word about FMP, the indicator we use to measure seconds-open. Why not FCP or LCP? FCP fires as soon as anything is rendered, and LCP has poor compatibility. Dewu wants to measure seconds-open from the user's point of view, so our FMP is defined as the span from the moment the user taps to open a WebView until the first-screen content is fully presented.

With the indicator defined, let's look at which time costs a complete FMP covers.

The rest of the article is split into two parts: client-side optimization and H5 optimization.

Client-side optimization

We improve first-screen speed through HTML preloading, HTML pre-requests, offline packages, interface pre-requests, connection keep-alive, pre-rendering, and other means. Preloading, pre-requests, and offline packages each raise the seconds-open rate by roughly 10%.

HTML preload

The client is configured to download the HTML main document in advance; when the user visits, the locally downloaded document is used directly, eliminating the HTML network request and speeding up page opening.

How to determine which pages to download

We stand on the shoulders of earlier work here. The Dewu app has many resource slots, such as banners, the "King Kong" grid, and mid-page slots, and the content shown in these slots is already produced by the recommendation algorithm, so those slots can simply be marked for preloading.

Page cache management

Once a page is preloaded it obviously cannot stay cached forever, so when should the cache be refreshed?

  • Preloaded pages are stored in memory; the cache is cleared when the app is closed.
  • The maximum cache lifetime is controlled manually through a configurable expiration time (a minimal sketch follows this list).
  • After a page is entered, an asynchronous thread is started to refresh its HTML document.
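As a rough illustration of the rules above, here is a minimal Kotlin sketch of an in-memory preload cache with a configurable expiration time and an asynchronous refresh hook. The class and function names are hypothetical, not Dewu's actual implementation.

```kotlin
import java.util.concurrent.ConcurrentHashMap

// Minimal sketch of an in-memory HTML preload cache (hypothetical names).
data class CachedHtml(
    val html: String,
    val fetchedAtMs: Long,
    val maxAgeMs: Long                    // the configurable expiration time
) {
    fun isExpired(nowMs: Long = System.currentTimeMillis()) =
        nowMs - fetchedAtMs > maxAgeMs
}

object HtmlPreloadCache {
    // Held in memory only, so it disappears with the process when the app is closed.
    private val cache = ConcurrentHashMap<String, CachedHtml>()

    fun get(url: String): String? =
        cache[url]?.takeUnless { it.isExpired() }?.html

    fun put(url: String, html: String, maxAgeMs: Long) {
        cache[url] = CachedHtml(html, System.currentTimeMillis(), maxAgeMs)
    }

    // Refresh the cached document on a background thread,
    // e.g. when the page is entered or (later) when the WebView is closed.
    fun refreshAsync(url: String, maxAgeMs: Long, download: (String) -> String?) {
        Thread { download(url)?.let { put(url, it, maxAgeMs) } }.start()
    }
}
```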

Reality check:

However, during the later grayscale rollout reality taught us a lesson: some SSR pages carry state, for example coupon-claiming pages. That state is rendered by the SSR service, and when the user first enters the page they have not yet claimed the coupon, so refreshing the HTML document at that moment captures exactly that state. After the user claims the coupon, closes the page, and opens it again, the cached page still invites them to claim the coupon, only to tell them on tap that it has already been claimed.

Improvement measures

  1. When the H5 page opens, components whose state may change re-request their interfaces to fetch the latest state data.
  2. The client now refreshes the HTML document when the WebView is closed, instead of when the page is entered.

Online results

With the problem solved, is the engineer's job done? If you think the work ends once the feature ships, pause and ask what the goal actually is. The goal is to raise the seconds-open rate; preloading is only a means to that end, and once the feature is live we still need to know how much it actually contributes. So after development is finished, we track the online results to see how preloading performs. As the figure below shows, enabling preloading raises the seconds-open rate by more than 10%.

Challenges encountered

  1. The preloaded pages are mostly served by the SSR service, and preloading silently generates a large volume of requests that Dewu's SSR service could not handle at the time.
  2. Even if the SSR service could cope, the load would propagate down the entire backend service chain.

SSR service expansion

The natural answer to server pressure is more machines, so we doubled the number of SSR instances and kept ramping up the share of users with preloading enabled. Even so, the service could not absorb the resulting QPS, and a second problem surfaced: the algorithm team's servers started alarming. The ramp-up plan was blocked again.

The game changer: CDN

CDN caching reduces pressure on both the SSR service and the backend service chain, so why not use it? We will leave that as a teaser here; the H5 optimization section covers it in detail.

Client-side changes to cooperate

  • Enable preloading fully for CDN domains, and keep the original rollout ratio for non-CDN domains.

Splash-ad preload

While doing this we also analyzed page traffic and found that a large share of traffic comes from splash-ad landings. So could the HTML documents behind splash ads be preloaded too?

Splash-ad page preload strategy

  1. De-duplicate the preload list: the splash-ad list may contain the same page several times with different background images and effective times.
  2. Add effective-time configuration: some pages in the splash-ad list will only be shown at a specific future time.
  3. Add allowlist/blocklist control: the splash-ad list may contain third-party partner pages whose PV statistics would be skewed by preloading.

Preload outlook

Since the HTML can be downloaded in advance, can we go one step further and fetch the page's resources in advance as well, so that opening the page needs far fewer network requests and content appears sooner? We also need to consider how this interacts with the offline package described below.

HTML pre-request

While the WebView is being initialized, the client requests the HTML main document in parallel; once both the WebView init and the download are done, the document is rendered, which cuts the user's waiting time. After a successful request, the WebView loads the local HTML and the document is saved for the next visit. With HTML pre-requests enabled, the seconds-open rate improves by about 8%.
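Conceptually, the pre-request runs the HTML download in parallel with WebView preparation. A minimal Kotlin/coroutines sketch of that idea, reusing the HtmlPreloadCache sketch above (the downloadHtml parameter is a hypothetical fetcher, not the actual Dewu code):

```kotlin
import android.webkit.WebView
import kotlinx.coroutines.*

// Sketch: start the HTML download in parallel with WebView preparation,
// then render once both are ready.
fun openWithPreRequest(
    scope: CoroutineScope,
    webView: WebView,
    url: String,
    downloadHtml: suspend (String) -> String?        // hypothetical network fetcher
) {
    scope.launch(Dispatchers.Main) {
        val htmlDeferred = async(Dispatchers.IO) { downloadHtml(url) }
        // ...the WebView finishes its own initialization on the main thread here...
        val html = htmlDeferred.await()
        if (html != null) {
            // Use the downloaded document directly and keep it for the next visit.
            HtmlPreloadCache.put(url, html, maxAgeMs = 24 * 60 * 60 * 1000L)
            webView.loadDataWithBaseURL(url, html, "text/html", "utf-8", null)
        } else {
            webView.loadUrl(url)                      // fall back to a normal network load
        }
    }
}
```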

Pre-request vs pre-load

In essence, HTML preloading and HTML pre-requests differ only in when the document is downloaded. Preloading downloads it after app startup without any user action, while a pre-request downloads it only when the user taps to open the H5 page. If the user opens the page a second time and the locally downloaded HTML has not expired, it is used directly, at which point the behavior is identical to preloading.

Challenges encountered

After launch we found that pre-requests only raised the seconds-open rate by about 2%. Analysis uncovered two problems:

  1. The cache lifetime was too short: pages were configured to expire after only 10 minutes, so users had to download them again after that. Could the cache time be extended?
  2. H5 pages had no self-refresh capability and therefore could not tolerate a longer cache time, the same issue we hit with preloaded HTML.

Digging deeper

We used a low-end device to analyze the entire seconds-open time chain locally. Why a low-end device? It acts as a natural slow-motion replay, which exposes problems to the greatest extent.

Android: load the H5 page in parallel with native layout inflation

The figure shows that the time spent before the H5 page starts loading falls inside activityStart(), which includes onCreateView; the largest cost is layout inflation via inflate(). Because the WebView object is created in advance and taken straight from an object pool, the remaining cost lies in initialization: the WebView's own WebViewChromiumFactoryProvider.startYourEngines (about 87 µs, well under 1 ms) plus other WebView setup such as Jockey bridge initialization.

The seconds-open measurement covers everything from view initialization to the WebView's URL load, which reveals an optimization point: loadUrl can be moved earlier so that H5 page loading and native layout inflation run in parallel. In onCreateView we create and return a bare FrameLayout, call WebView loadUrl immediately, and only then let the main thread inflate the full layout; once inflation finishes, the views are added to the FrameLayout. This removes the inflation time from the loadUrl path and saves roughly 15 ms on mid/high-end devices and 30-50 ms on low-end devices.
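A minimal Kotlin sketch of the idea, assuming a fragment-based web container; the layout resource and the WebView pool helper are hypothetical placeholders:

```kotlin
import android.os.Bundle
import android.view.LayoutInflater
import android.view.View
import android.view.ViewGroup
import android.webkit.WebView
import android.widget.FrameLayout
import androidx.fragment.app.Fragment

// Sketch: let the H5 load run in parallel with native layout inflation.
class WebFragment : Fragment() {

    private lateinit var root: FrameLayout
    private lateinit var webView: WebView

    override fun onCreateView(
        inflater: LayoutInflater, parent: ViewGroup?, savedInstanceState: Bundle?
    ): View {
        // 1. Return a bare container immediately instead of inflating the full layout.
        root = FrameLayout(requireContext())
        webView = obtainPooledWebView()
        // 2. Kick off the page load right away.
        webView.loadUrl(requireArguments().getString("url")!!)
        // 3. Inflate the real layout afterwards and attach everything to the container.
        root.post {
            // R.layout.fragment_web is a hypothetical layout resource.
            val content = inflater.inflate(R.layout.fragment_web, root, false)
            root.addView(content)
            root.addView(webView)
        }
        return root
    }

    // Placeholder: in practice the WebView comes from a pre-created pool.
    private fun obtainPooledWebView(): WebView = WebView(requireContext())
}
```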

Move HTML downloading on both platforms forward to the routing stage

The HTML pre-request originally fired when the native page was entered, which is already 100 ms after the user's tap. Could the download start earlier? After some exploration we chose to hook into the routing phase: it is a single, centrally controlled place, and the gap from the user's tap is negligible. This moves the HTML download earlier by more than 80 ms on average.
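A minimal sketch of the router hook, assuming a simple interceptor-style router; the interceptor API and route shape here are hypothetical, not a specific router library:

```kotlin
// Sketch of a routing-stage interceptor that kicks off the HTML pre-request
// as soon as a route to an H5 page is resolved (interceptor API is hypothetical).
interface RouteInterceptor {
    fun intercept(route: Route): Route
}

data class Route(val path: String, val params: Map<String, String>)

class HtmlPrefetchInterceptor(
    private val prefetch: (String) -> Unit      // e.g. enqueue the HTML download on an IO thread
) : RouteInterceptor {
    override fun intercept(route: Route): Route {
        val h5Url = route.params["url"]
        if (route.path == "/web" && h5Url != null) {
            prefetch(h5Url)                     // download starts ~80 ms earlier than page entry
        }
        return route                            // routing continues unchanged
    }
}
```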

The flow at this time becomes as follows.

Some readers may ask: why not start the download at the click itself, since the click-to-routing gap still costs something?

  1. It is hard to maintain at the code level: hooking the download into click handlers means invading business code at thousands of entry points.
  2. Offline testing showed that the time from click to routing is almost negligible.

Final online results

With the above problems solved, the cache lifetime was extended to one day, and HTML pre-requests now raise the seconds-open rate by about 8%, close to the effect of preloading.

Offline package

The CSS, JS, and other resources an H5 page needs are bundled into a compressed package ahead of time. The client downloads and unpacks it after app startup, and later H5 visits are matched against the local offline resources to speed up access.

Android implementation

Resource interception on Android is straightforward: WebViewClient provides shouldInterceptRequest, where we check whether a request should be served from the offline package. If so we return a WebResourceResponse; otherwise we return null and let the WebView load it normally.
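A minimal Kotlin sketch of that interception; the OfflinePackage.lookup helper mapping a URL to a local file is hypothetical:

```kotlin
import android.webkit.WebResourceRequest
import android.webkit.WebResourceResponse
import android.webkit.WebView
import android.webkit.WebViewClient
import java.io.File
import java.io.FileInputStream

// Sketch: serve matching resources from the unpacked offline package,
// return null for everything else so the WebView loads it over the network.
class OfflineWebViewClient : WebViewClient() {
    override fun shouldInterceptRequest(
        view: WebView, request: WebResourceRequest
    ): WebResourceResponse? {
        val local: File = OfflinePackage.lookup(request.url.toString()) ?: return null
        val mime = when (local.extension) {
            "js" -> "application/javascript"
            "css" -> "text/css"
            "png" -> "image/png"
            else -> "application/octet-stream"
        }
        return WebResourceResponse(mime, "utf-8", FileInputStream(local))
    }
}

object OfflinePackage {
    // Hypothetical: map a remote URL to a file inside the unpacked offline package.
    fun lookup(url: String): File? = null
}
```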

iOS implementation

The iOS side, however, ran into difficulties, and we investigated the following options:

Option 1: NSURLProtocol interception

This approach uses the private WKBrowsingContextController class and registerSchemeForCustomProtocol, obtained via reflection, to route http and https requests through NSURLProtocol. It can intercept requests, but the body of POST requests is lost. Worse, NSURLProtocol is global once registered: we only want it to intercept pages that use the offline package, but there is no way to scope it, so it intercepts requests from every page, including third-party partner pages, which is clearly unacceptable.

Option 2: Intercept requests through a custom scheme

In iOS 11 and above, there is an API for loading custom resources: WKURLSchemeHandler.

The idea is to rewrite the page URL to a custom scheme, for example https://fast.dewu.com to duapp://fast.dewu.com, register that scheme on the client, and have the front end rewrite resource references in the page into protocol-relative form (such as src="//fast.dewu.com/xxx") so they inherit the custom scheme and can be intercepted. During testing, however, we found that for security reasons the APIs only allow cross-domain requests from whitelisted domains, and multiple domains cannot be configured, so this option was dropped.

Option 3: hook handlesURLScheme

This option still uses WKURLSchemeHandler, but hooks WKWebView's handlesURLScheme method so that http and https requests can be proxied. Requests can be intercepted this way, but two problems arise:

Request body loss

This has largely been fixed since iOS 11.3, and only blob-type bodies are still lost. JS proxies fetch and XMLHttpRequest: when a request is made, the body is passed to native through the JSBridge, the client performs the request, and when it completes the client calls back into the JS.

Cookie loss

We proxy reads and writes of document.cookie and let the client manage cookies. Extra care is needed here: cross-domain checks must be performed to stop malicious pages from tampering with cookies.

Challenges encountered

With development complete and the feature launched, the first batch of online data came in: enabling the offline package raised the seconds-open rate by about 10% on Android, but on iOS it actually lowered it. After the fixes described below, iOS also gains more than 10%.

Android and iOS implementation differences

Comparison showed that interception on Android is lightweight: each request can be individually judged and either intercepted or passed through.

On iOS, however, once a page is intercepted every http and https request in it goes through the client, and there is no way to hand a request back to the WebView.

iOS cache issue fix

Normally the WebView's own cache serves page resources on a second visit, but once requests are proxied through the client that cache no longer applies, so the client has to implement its own caching:

  1. Decide which resources can be cached, and for how long, from the HTTP caching headers (see the sketch after this list)
  2. Add a custom policy so that only certain resource types are cached
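A rough Kotlin sketch of those two rules, deciding cacheability from the response's Cache-Control header plus a type allowlist. This is deliberately simplified; real HTTP caching also considers Expires, Vary, revalidation, and case-insensitive header lookup:

```kotlin
// Sketch: decide whether a proxied response may be cached and for how long,
// based (simplistically) on its Cache-Control header and a custom type allowlist.
private val cacheableTypes = setOf("text/css", "application/javascript", "image/png", "image/webp")

fun cacheTtlMillis(headers: Map<String, String>, contentType: String): Long? {
    // Custom policy: only certain resource types are cached at all.
    if (cacheableTypes.none { contentType.startsWith(it) }) return null
    val cacheControl = headers["Cache-Control"]?.lowercase() ?: return null
    if ("no-store" in cacheControl || "no-cache" in cacheControl) return null
    val maxAge = Regex("max-age=(\\d+)").find(cacheControl)
        ?.groupValues?.get(1)?.toLongOrNull() ?: return null
    return maxAge * 1000                                   // seconds -> milliseconds
}
```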

Offline package download error rate management

As the figure below shows, the offline-package download error rate was around 6%, which is clearly unacceptable. After a series of optimizations it now hovers around 0.3%.

Let's first look at the pre-optimization flow chart and its problem points:

Tracking data showed a large number of unknown-host errors, failed requests, and dropped connections. Code analysis revealed that downloads had no queue control: multiple offline packages were downloaded concurrently and the tasks competed for resources. The following optimizations address these problems (a sketch of the queue follows the list):

  1. A retry mechanism for failed downloads, with a dynamically configurable retry count, to mitigate request failures and dropped connections.
  2. A download task queue with a dynamically configurable concurrency limit, so download tasks no longer compete for resources.
  3. Under weak or no network, downloads are deferred until the network recovers.
  4. Offline-package downloads support HTTPDNS, to handle domains that fail to resolve.
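A minimal sketch of points 1 and 2, a coroutine-based download queue with bounded concurrency and retries; the names and backoff are illustrative, not the production code:

```kotlin
import kotlinx.coroutines.*
import kotlinx.coroutines.sync.Semaphore
import kotlinx.coroutines.sync.withPermit

// Sketch: offline-package download queue with bounded concurrency and retries.
class PackageDownloader(
    maxConcurrent: Int,                        // dynamically configurable concurrency
    private val maxRetries: Int,               // dynamically configurable retry count
    private val scope: CoroutineScope
) {
    private val permits = Semaphore(maxConcurrent)

    fun enqueue(url: String, download: suspend (String) -> Boolean) {
        scope.launch(Dispatchers.IO) {
            permits.withPermit {
                repeat(maxRetries + 1) { attempt ->
                    if (download(url)) return@launch        // success
                    delay(1000L * (attempt + 1))            // simple backoff before retrying
                }
                // Still failed after retries: report via tracking so the error rate is visible.
            }
        }
    }
}
```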

The following is the flow chart after optimization:

Outlook:

Offline resources are currently stored directly on disk, so every access pays disk I/O time; on low-end devices this fluctuates between 0 and 10 ms. The next step is to use memory sensibly: cap the memory size, file count, and even file types, and evict and refresh in-memory files with an LRU strategy.

Interface pre-request

Having the client fire the H5 page's first-screen API request is far earlier than waiting for page initialization, HTML download, and JS download and execution, so it shaves time off the user's first-screen wait. In local tests the interface pre-request moved the call forward by more than 100 ms, letting users see content sooner.

How it works

After the app starts, the client fetches the configuration and saves the page URLs that support pre-requests along with their corresponding interface information. When the user opens a WebView for such a page, the client fires the corresponding pre-request in parallel and stores the result. When the JS starts fetching first-screen data, it first asks the client whether the response is already available; if it is, no new request is needed, otherwise the JS issues its own request and the two race. The overall flow chart is shown below.

Configuration platform

How does the client know which interface a page needs to request, and with what parameters? A configuration platform is indispensable; it supports the following (a sketch of the client-side consumer follows the list):

  1. Map each page URL that needs pre-requests to the API URL and parameters that should be requested
  2. Provide a review step so that misconfigurations do not get released and go live
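A minimal Kotlin sketch of the client side that consumes such a configuration: it fires the configured API call when the WebView opens, stores the response, and hands it to JS over the bridge if it is already there (all names are hypothetical):

```kotlin
import java.util.concurrent.ConcurrentHashMap

// Sketch of the client-side pre-request cache (hypothetical names).
data class PreRequestConfig(val pageUrl: String, val apiUrl: String, val params: Map<String, String>)

class PreRequestManager(
    private val configs: List<PreRequestConfig>,                    // delivered by the config platform
    private val httpGet: (String, Map<String, String>) -> String?   // injected network call
) {
    private val responses = ConcurrentHashMap<String, String>()

    // Called when the user opens a WebView for `pageUrl`; runs in parallel with page loading.
    fun onPageOpen(pageUrl: String) {
        val cfg = configs.firstOrNull { pageUrl.startsWith(it.pageUrl) } ?: return
        Thread {
            httpGet(cfg.apiUrl, cfg.params)?.let { responses[cfg.apiUrl] = it }
        }.start()
    }

    // Called from the JSBridge: JS asks whether the first-screen data is already here.
    // If this returns null, the JS issues its own request and the two race.
    fun takeResponse(apiUrl: String): String? = responses.remove(apiUrl)
}
```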

QA

Since we already have SSR server-side rendering, why do we still need interface pre-requests?

Even with SSR, some first-screen components are served as skeletons and only request their data after the page has rendered and the JS has executed, and some pages are SPAs. Interface pre-requests complement both cases well.

Pre-established connections & connection keep-alive

After this is enabled, the P90 DNS time drops from 80 ms to 0 ms, the P90 TCP connection time drops from 65 ms to 0, the average DNS time drops from 55 ms to 4.3 ms, and the average TCP connection time drops from 30 ms to 2.5 ms.

Where network request time goes

The figure above shows that a network request can only be sent after DNS resolution, TCP connection establishment, and the SSL handshake. Can that time be saved?

Common client networking frameworks such as OkHttp fully support HTTP/1.1 and HTTP/2, including connection reuse. Knowing how connection reuse works, we can exploit it: while the app sits on the splash screen, connections to key domains are established in advance, so that once the user enters the corresponding page the request completes faster and the experience improves. Under poor network conditions the benefit of pre-connection should, in theory, be even larger.

Implementation plan

A connection is established by issuing a HEAD request to the domain ahead of time; the networking framework automatically places the connection in its pool and, by default, releases it after 5 minutes of inactivity. Repeating the HEAD request within that window keeps the connection alive.

The connection pool size also matters: if the pool is small but there are many domains, the pre-established connections are easily evicted, so either consolidate domains or enlarge the pool.
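A minimal OkHttp-based sketch of the pre-connection: issue a HEAD request to each key domain so the connection lands in the pool, and repeat it inside the idle window to keep it alive. The domains, pool size, and scheduling shown here are illustrative, not Dewu's actual configuration:

```kotlin
import okhttp3.ConnectionPool
import okhttp3.OkHttpClient
import okhttp3.Request
import java.io.IOException
import java.util.concurrent.TimeUnit

// Sketch: warm up connections to key domains while the app is on the splash screen.
val client = OkHttpClient.Builder()
    // Enlarge the pool so pre-built connections are not evicted too early.
    .connectionPool(ConnectionPool(10, 5L, TimeUnit.MINUTES))
    .build()

fun preconnect(domains: List<String>) {
    for (domain in domains) {
        val request = Request.Builder().url("https://$domain/").head().build()
        client.newCall(request).enqueue(object : okhttp3.Callback {
            override fun onResponse(call: okhttp3.Call, response: okhttp3.Response) {
                response.close()       // connection stays in the pool for reuse
            }
            override fun onFailure(call: okhttp3.Call, e: IOException) {
                // Ignore: pre-connection is best-effort.
            }
        })
    }
}

// Repeat within the ~5-minute idle window to keep connections alive, e.g.:
// scheduler.scheduleAtFixedRate({ preconnect(keyDomains) }, 0, 4, TimeUnit.MINUTES)
```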

Online results

Will pre-established connections add load on the server? Certainly they can. The feature itself is rolled out gradually via grayscale, and once HTML pages are hosted on the CDN we enable it fully for CDN domains, where capacity is not a concern.

The figure below shows the online effect: P90 DNS time drops from 80 ms to 0 ms, P90 TCP connection time drops from 65 ms to 0, average DNS time drops from 55 ms to 4.3 ms, and average TCP connection time drops from 30 ms to 2.5 ms.

Pre-rendering

The client renders a page in a WebView ahead of time, so when the user visits it can be shown immediately, giving an instant-open effect. This cannot be enabled for every page, and it has drawbacks:

  1. It consumes extra client resources, must run when the main thread is idle, and the number of pre-rendered pages has to be capped.
  2. Pages with on-entry effects, such as a red-envelope rain that starts as soon as the page opens, are not suitable for pre-rendering and must be excluded.

The [School Season] page below is a pre-rendered H5 page in production; when it is opened, the content is already rendered and there is no visible wait.

Going forward we plan to extend this capability to the general WebView and enable it for major promotions and high-PV pages.

H5 optimization

SSR server-side rendering

With traditional client-side rendering, an H5 page fetches its data via AJAX and fills it into templates in the browser to build the page shown to the user. Server-side rendering (SSR) moves the data fetching and template filling to the server and returns a fully rendered page to the client.

SSR improves the seconds-open rate by 15% on average. Because rendering happens on the server, it adds server load, especially once HTML preloading is enabled. How do we handle that?

Initial optimizations:

  1. Interface caching: the Node service injects a Redis instance into ctx, and business code caches interface responses inside the server-rendering logic, covering configuration delivery and the A/B interface.
  2. Static page caching: pages that involve no interface calls and look the same for every user are rendered to static HTML via renderToHtml and written to the cache.
  3. Caching of pages without user state: these pages mostly show identical content and the server-requested data is consistent, so the server decides whether to cache based on whether a login state is present.
  4. Personalized ("a thousand faces for a thousand people") content is switched from SSR to CSR and shown as a skeleton first. That content comes from the algorithm interface, which is slow to respond, so this shortens server response time and shows content to users sooner.

The game changer: CDN

Even with all these optimizations, the service still could not satisfy the demand from preloading, and analysis showed the network phase itself was slow, so we finally brought out the big gun: CDN. The pages had stayed off the CDN for several reasons:

  • Personalized content: different users see different content.

Optimization 4 above addressed this by converting the personalized SSR content to CSR. Now that the page is on the CDN, we plan to move that part back to SSR so users immediately see products instead of a skeleton, and then refresh the content via CSR.

  • Page state changes and the cache cannot be refreshed in time.

This was solved in the client-side preloading section above: after the page opens, the interfaces are requested again to refresh the data and keep it accurate. The workload is considerable, though: more than 30 components need state refreshing, and every component developed afterwards has to account for it.

  • HTML template changes cannot be propagated in time.

Template content changes in two scenarios: in the page builder, operations staff can dynamically edit the template to change the page structure (low frequency); and after each project release the template content must be updated (high frequency).

For this, whenever content changes we automatically call the CDN provider's cache-purge API to refresh the cached content.

Pre-rendered HTML

Render the SPA page with Puppeteer and save the resulting HTML document; combined with the refresh strategy above and CDN hosting, the SPA page feels as smooth as SSR.

The main implementation uses the webpack plugin prerender-spa-plugin: configure the routes to pre-render, and the corresponding pages are produced at build time. The approach is generic, but every page onboarded needs manual verification.

The humble CSS bundle size optimization

CSS loading is well known to block HTML rendering; in the end we cut the shared first-screen CSS from 118 KB to 38 KB. The figure below shows the SSR page's loading waterfall simulated in Chrome under a weak network: styles.fb201fce.chunk.css takes 18 s to download and blocks rendering, so although the HTML main document downloads in 2.38 s, actual rendering happens after 20 s.

The optimization idea is simple: inline the CSS needed for the first screen directly into the HTML returned by the SSR service, and split the remaining CSS so it is loaded on demand.

With the idea settled, the question was how to implement it. I first tried MiniCssExtractPlugin, which splits CSS into separate files so each JS chunk gets its own CSS file, but it requires webpack 5, and the project was on Next.js 9.5. I then tried upgrading to Next.js 12, only to hit build errors from packages that did not yet support it. After a day of failed fixes, and with no guarantee the upgrade would not introduce other stability issues, I set that path aside and looked for another way.

Persistence paid off: reading the Next.js source showed that all shared CSS was being grouped by splitChunks at build time. Since the project's components are imported dynamically, I modified the webpack options in next.config.js, removed the splitChunks.cacheGroups.styles configuration, and fell back to the default chunks: async setting so styles are pulled in on demand.

Image optimization

Avoid image src being empty

Even when the src attribute is an empty string, the browser still issues an HTTP request to the server; this matters all the more when the SSR server is already under pressure, so it deserves special attention.

Image compression and format selection

WebP's advantage is a better compression algorithm that yields smaller files with quality indistinguishable to the naked eye; it supports lossless and lossy modes, alpha transparency, and animation, and converts from JPEG and PNG consistently well.

Select the appropriate resolution by passing parameters to the image server

Detail optimization

Bundling optimization

  • Split page components so the resources needed for above-the-fold content load first
  • Use webpack splitChunks to split common dependencies effectively and improve cache reuse
  • Load components on demand
  • Use tree shaking to reduce code size

Lazy-load non-critical JS and CSS

  • defer, async, and dynamically loaded JS
  • Lazy-load JS on iOS devices

Media resource loading optimization

  • Lazy-load images and videos
  • Compress resources and pick an appropriate resolution via image-server parameters

Other resource optimization

  • Defer analytics/tracking reports so they do not block the onload event
  • Optimize custom fonts by generating a trimmed font package with fontmin

Page rendering optimization

  • Page rendering time optimization

    • SSR page first-screen CSS inlined (Critical CSS)

  • Use compositing layers judiciously
  • Reduce layout jank by setting widths and heights in advance
  • Reduce reflow and repaint

Code-level optimization

  • Split up long-running tasks

    • Use a Web Worker to reduce main-thread time

  • Use rAF (requestAnimationFrame) callbacks to run code logic when the thread is idle
  • Avoid overly deep CSS nesting

Monitoring

To help developers measure and improve front-end page performance, the W3C performance working group introduced the Performance API; within it, the Navigation Timing API provides automatic, accurate page performance data. Dewu's front-end performance monitoring also pulls its indicators from the Performance API for reporting and statistical analysis.

System architecture

Data collected by the SDK is reported to the Alibaba Cloud SLS log platform, cleaned in real time with Flink, and stored in ClickHouse. The platform backend reads ClickHouse and performs various aggregations before the data is used.

Metrics dashboard

Before optimizing you must first establish monitoring metrics (what internet companies like to call a "handle"). Without them, no matter how much you optimize you cannot tell what effect it had, let alone what to do next or which issues remain. So metrics come before optimization, and their accuracy must of course be guaranteed.

The metrics dashboard mainly provides the following:

  1. Quickly view, for a chosen time window, the breakdown by app version, device manufacturer, device model, OS version, and network type, and filter on these fields when troubleshooting.
  2. The middle area shows the overall metrics of interest, plus client-side time and H5 seconds-open time for active pages.
  3. The bottom area shows the seconds-open time of each business domain.
  4. Both the average and the 90th percentile (P90) are shown. The weakness of the average is that extremes get averaged away, something everyone has surely "experienced". Briefly, the P90 value means that 90% of accesses are faster than it; likewise the P50 means that 50% of accesses are faster than it. Percentiles are obtained by sorting all timing samples from smallest to largest (see the sketch after this list).
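For reference, computing a percentile from timing samples is just a sort plus an index, as in this small sketch (nearest-rank style, simplified):

```kotlin
// Sketch: P90 of timing samples = the value below which ~90% of sorted samples fall.
fun percentile(samplesMs: List<Long>, p: Double): Long {
    require(samplesMs.isNotEmpty() && p in 0.0..100.0)
    val sorted = samplesMs.sorted()
    val index = ((p / 100.0) * (sorted.size - 1)).toInt()
    return sorted[index]
}

// Usage: percentile(loadTimesMs, 90.0) gives the P90 load time.
```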

White-screen monitoring

After the optimizations above, most users can open H5 pages within a second. But there are always exceptions: user network and system environments vary widely, and the WebView itself can crash internally. When something goes wrong, the user may see nothing but a white page, so further work has to detect white screens and respond to them.

The most intuitive white-screen check is to take a screenshot of the WebView and scan its pixels: if the number of non-solid-color pixels exceeds a threshold, the page is considered not blank. We first obtain a Bitmap of the WebView view, scale it down to a fixed resolution (for example width 100 with proportional height), and walk its pixels; if more than 5% of them differ from the background color, the page is treated as non-blank (a sketch follows). Many cases fall outside this simple rule, though, so we also analyze the screenshots with image recognition, which tells us reliably whether the current screen is blank, still loading, a special page, and so on.
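A minimal Kotlin sketch of that pixel scan, assuming a white background and illustrative thresholds:

```kotlin
import android.graphics.Bitmap
import android.graphics.Color

// Sketch: scale the WebView screenshot down and count pixels that differ from white.
// If fewer than ~5% of pixels are non-background, the page is treated as a white screen.
fun isWhiteScreen(screenshot: Bitmap, threshold: Double = 0.05): Boolean {
    val scaledWidth = 100
    val scaledHeight = (screenshot.height * scaledWidth / screenshot.width).coerceAtLeast(1)
    val small = Bitmap.createScaledBitmap(screenshot, scaledWidth, scaledHeight, false)
    var nonWhite = 0
    val total = small.width * small.height
    for (x in 0 until small.width) {
        for (y in 0 until small.height) {
            if (small.getPixel(x, y) != Color.WHITE) nonWhite++
        }
    }
    return nonWhite.toDouble() / total < threshold
}
```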

White screens are an important indicator. We alert on sudden rises in the overall white-screen rate and on newly appearing white-screen pages, so developers can step in and start troubleshooting promptly.

Performance problem discovery

Potential page problems are surfaced mainly through CDN coverage monitoring, HTTP request monitoring, network monitoring (load failures, abnormal latency, abnormal transfer sizes), and image monitoring (uncompressed images, abnormal resolutions). There is also a problem-analysis page: enter a page URL and it helps locate problems and suggests fixes.

CDN coverage monitoring

The importance of the CDN is self-evident: it speeds up resource access and thus improves the user experience. We analyze online tracking data to list resources not served from the CDN and push the relevant business teams to fix them.

HTTP request monitoring

Why monitor HTTP requests? Let's first look at the new features of HTTPS relative to HTTP:

  1. Content encryption: hybrid encryption prevents a man-in-the-middle from reading the plaintext.
  2. Identity verification: certificates let the client confirm it is talking to the genuine server.
  3. Data integrity: transmitted content cannot be forged or tampered with in transit.

Plain HTTP, by contrast, is easy for a man-in-the-middle to read or even tamper with. For the security of our services we need to upgrade remaining HTTP traffic across the board, and monitoring is how we find it.

Network monitoring

When a page's seconds-open rate is low we need to find out why: is its interface slow to respond, or is the page requesting oversized resources? And when a network request fails, we should know immediately rather than waiting passively for user feedback.

Image monitoring

This covers uncompressed images, abnormal resolutions, image transfers larger than 300 KB, and image resources larger than 1 MB.

Page problem analysis

With so many tools listed above, business developers may reasonably ask: what exactly is wrong with my page? You can hardly expect them to check every tool one by one to find which anomaly belongs to the page they own. The page-problem-analysis feature reuses the existing capabilities and aggregates the results for a single page path.

Exception monitoring

H5 exceptions have always been monitored with Sentry, but Sentry alone cannot gauge how severe a problem is for the product because it is not correlated with PV and DAU data. It has no business-domain association, so exceptions cannot be broken down by business domain. User behavior logs are not yet joined with those from the native side, so problem analysis often hits the wall of incomplete context. And under high QPS, Sentry rate-limits and drops some exception data.

Since Sentry already covers basic troubleshooting and analysis, we do not plan to duplicate it; instead we build what it lacks. For the problems above we designed the following:

  • Exception metrics

    • Trends for overall exception rate, per-page exception rate, and affected-user rate
    • Distribution of each problem across dimensions (OS version, app version, H5 release version, network, etc.) and by business domain

  • Stronger exception aggregation

    • The exception list can be sorted by newest, top PV, exception count, and affected-user count
    • Third-party SDK exceptions, interface exceptions, and other exception types are distinguished

Future outlook

Although the seconds-open rate has passed 75%, we also track another important metric, the P90 load time, and we remain committed to improving the end user's H5 experience. Once P90 optimization is done, we may move on to deeper optimization of the P95 time.

Summary

Finally, I would like to thank everyone who contributed to the Dewu H5 seconds-open effort, and the H5 team; they are all excellent, and optimization methods and ideas keep pouring out of them.

We have now walked systematically through the background and the whole journey from defining the metric to optimizing and shipping seconds-open, in three parts: client, H5, and monitoring. If this article helped you, give it a thumbs up! If you still have questions or ideas after reading, feel free to discuss them in the comments.

Finally, here is the overall optimization mind map:

Text/XU MING

Follow Dewu Tech and be the most fashionable technologist!

