Author: Wang Wenhua (Lian Mo)
Qianniu is Alibaba's open, multi-terminal work platform for merchants. Every day it serves millions of active merchants running their businesses on mobile and desktop, with functions that include store management, customer service reception, and messaging.
Qianniu itself has an open architecture: second and third parties can provide services to merchants through the open system, which we call the plug-in system. We call it a plug-in system because we define a number of open nodes and standards along the merchant's business link, and each business party implements those standards to deliver the corresponding function. Because these standards and specifications exist, different plug-ins can be chained together, so a merchant's choice of different plug-ins does not break the closed loop of a function.
The following are the open nodes defined by Qianniu:
Opening up has brought in second-party businesses and third-party ISVs, letting Qianniu make fuller use of external resources and services, for example to speed up development and to meet merchants' customization needs. A third-party plug-in goes through four stages: ISV development, launch on the service market, purchase by the merchant, and use on Qianniu. When a merchant has not chosen a default plug-in for an operation, Qianniu applies a set of rules that steer users to the free version of a plug-in first, and the ISV can guide the merchant to upgrade the order during the trial in order to make a profit. But opening up also brings its own problem: the merchant experience.
To improve the merchant experience, we launched an open experience upgrade project. After sustained governance, Qianniu's average monthly open public opinion (negative user feedback about open plug-ins) has fallen by 50%. So how did Qianniu design the overall prevention and governance plan?
The problem and its cause
The characteristics of Qianniu's open public opinion
Because of the nature of openness, Qianniu's open public opinion is scattered and its causes are complex. Qianniu hosts a large number of tools provided by second- and third-party teams. Some second-party tools have a long history and receive little maintenance investment, and the technical capabilities of ISVs are uneven. There are several open technology stacks: early H5, mid-period QAP (an open framework built on weex), and mini programs. The plug-in startup link is long, spanning more than 7 technical stages from the front end to the ISV server, and is easily affected by jitter in the network and in the various services. These many sources of instability make open public opinion hard to govern.
What is the core issue of open experience?
Open experience problems vary, but three types make up the core:
- The overall plug-in opening link is long: launch, commercial ordering, and container running and loading can each affect plug-in startup;
- Qianniu's main-account and sub-account permission control lets the main account restrict a sub-account's permissions in each function, so sub-accounts get blocked by insufficient permissions during use;
- ISV and second-party businesses have many logic problems of their own, and Qianniu, as the platform, currently lacks sufficient awareness of online problems and effective governance tools.
The overall plan of prevention and treatment
- Optimize the plug-in startup link: improve the fault tolerance of the technical products on the startup link and raise the opening success rate to over 99.7%.
- Build a closed loop for permission applications: make permission application and approval more efficient and improve the experience of sub-accounts using plug-ins.
- Establish a data measurement system: build up the levers for driving business-side optimization.
Qianniu hosts many businesses, and the causes of public opinion are complex and change quickly, so solving public opinion problems one by one is not enough. Governance of open businesses is gradual. On the one hand, known problems must be solved; on the other hand, measurement standards and stability monitoring must be established at the core nodes of the plug-in startup and running phases to consolidate the governance results. The decline in public opinion is driven by a cycle of governance, monitoring, prevention, and optimization.
Startup link optimization
Introduction to the startup process
- Protocol routing: Qianniu officially defines a set of open nodes (slots) and corresponding standard protocols (e.g. tradeDetail for viewing order details), and ISVs implement the functions they take on according to the protocol. At this stage, the configured protocol is parsed and routed to the appkey of the default plug-in set by the user or configured by operations;
- Plug-in meta information search: Find the plug-in meta information corresponding to the target appkey from the list of plug-ins issued by the server;
- Permission verification: verify whether the sub-account has permission to open this plugin;
- Commercialization guarantee: Complete the free version subscription for new users or users whose subscription relationship has expired;
- Pre-authorization QAP: For third-party plug-ins, explicit user authorization is required to allow third-party ISVs to access data;
- Container routing and rendering: According to the plug-in meta-information, business parameters are assembled and handed over to the corresponding container for rendering.
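The stages above can be read as a pipeline in which each stage either enriches the launch context or aborts with an error. The following is a minimal sketch of that idea, not Qianniu's implementation: the names `LaunchContext`, `StageError`, and `launch`, the sample appkey, and the sample configuration are all hypothetical, and only the first three stages are shown.

```python
from dataclasses import dataclass
from typing import Optional


class StageError(Exception):
    """Raised by a stage to abort the launch."""


@dataclass
class LaunchContext:
    protocol: str                       # open node, e.g. "tradeDetail"
    is_sub_account: bool = False
    appkey: Optional[str] = None
    meta: Optional[dict] = None
    failed_stage: Optional[str] = None  # kept as error context
    error: Optional[str] = None


# Hypothetical sample data standing in for server-issued configuration.
DEFAULT_PLUGIN = {"tradeDetail": "app-001"}
PLUGIN_LIST = {"app-001": {"type": "miniapp", "sub_account_allowed": False}}


def route_protocol(ctx):
    if ctx.protocol not in DEFAULT_PLUGIN:
        raise StageError(f"no plug-in configured for {ctx.protocol}")
    ctx.appkey = DEFAULT_PLUGIN[ctx.protocol]


def find_meta(ctx):
    if ctx.appkey not in PLUGIN_LIST:
        raise StageError(f"meta information missing for {ctx.appkey}")
    ctx.meta = PLUGIN_LIST[ctx.appkey]


def check_permission(ctx):
    if ctx.is_sub_account and not ctx.meta["sub_account_allowed"]:
        raise StageError("sub-account lacks permission")


STAGES = [
    ("protocol_routing", route_protocol),
    ("meta_lookup", find_meta),
    ("permission_check", check_permission),
    # Commercialization guarantee, pre-authorization and container
    # routing would follow in the same shape.
]


def launch(ctx, stages=STAGES):
    """Run stages in order; stop at the first failure and record it."""
    for name, stage in stages:
        try:
            stage(ctx)
        except StageError as exc:
            ctx.failed_stage, ctx.error = name, str(exc)
            return False
    return True
```

Recording the name of the failing stage on the context is what later makes it possible to break failures down by stage.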
Full link monitoring
First, we need to locate the causes and distribution of plug-in startup failures and use them to decide the follow-up governance and optimization plan. Although it is theoretically possible to analyze the logs for each piece of public opinion, in practice the workload is heavy and the approach lacks a global statistical view. So the first step is to establish full-link monitoring of plug-in startup, retain the error context, and compute an accurate startup success rate and failure cause distribution as the measurement basis for optimization.
The tracking dimensions include the target plug-in appkey, the technology type (H5, QAP, mini program), the error stage, the error description, the source or entry from which the plug-in was opened, and the start and end time of each stage.
These dimensions serve several purposes:
- Alarms can be configured on different dimensions, such as a sudden drop in the success rate of H5 plug-ins or a significant increase in errors at a certain stage;
- When the overall success rate changes, the trends of different dimensions can be compared to quickly locate the layer where the problem lies;
- The error stage makes it easy to view the error distribution by stage and to optimize the success rate stage by stage;
- The source and entry information gives more context about the scenario in which a problem occurred.
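To make the role of these dimensions concrete, here is a small sketch of how tracked launch events could be aggregated into a success rate per dimension and a failure distribution per stage. The record layout (`LaunchEvent`) and the function names are assumptions for illustration; the article does not describe the actual event schema or reporting platform API.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LaunchEvent:
    appkey: str
    tech_type: str                      # "H5", "QAP" or "miniapp"
    source: str                         # entry from which the plug-in was opened
    error_stage: Optional[str] = None   # None means the launch succeeded
    error_desc: str = ""


def success_rate_by(events, dimension):
    """Success rate grouped by one dimension, e.g. "tech_type" or "appkey"."""
    totals, successes = Counter(), Counter()
    for event in events:
        key = getattr(event, dimension)
        totals[key] += 1
        if event.error_stage is None:
            successes[key] += 1
    return {key: successes[key] / totals[key] for key in totals}


def failure_distribution(events):
    """Number of failed launches per error stage."""
    return Counter(e.error_stage for e in events if e.error_stage is not None)
```

Grouping by `tech_type` supports an alarm such as "H5 success rate dropped", while `failure_distribution` gives the stage-by-stage view used to prioritize optimization.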
Dedicated plug-in startup optimization
Full-link monitoring reveals two types of errors: failures in the front link of plug-in opening, and subscription relationships that were not established.
Fault tolerance of the front link
The main cause of startup link errors is that key information for the startup link, such as plug-in meta-information and mini program packages, is missing because of a weak network or server jitter. This is addressed through exception compensation. Qianniu identified the top plug-ins on the core business links, built their meta-information into the client, and pre-downloaded mini program packages based on subscription relationships and scenario information. After optimization, the hit rate of mini program packages rose from 85% to 97%, and the overall number of failures fell by 55%.
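One way to picture exception compensation for meta-information is a lookup that falls back to the built-in copy when the server list cannot be fetched. This is a sketch under assumptions: the function name `resolve_meta`, the fallback order, and the exception types treated as "weak network" are illustrative choices, not details from the article.

```python
def resolve_meta(appkey, fetch_remote, builtin):
    """Return (meta, origin) where origin is "remote", "builtin" or "missing".

    fetch_remote: callable taking an appkey; may raise on network trouble.
    builtin: dict of meta-information shipped with the client for top plug-ins.
    """
    try:
        meta = fetch_remote(appkey)
        if meta is not None:
            return meta, "remote"
    except (TimeoutError, ConnectionError):
        pass  # weak network or server jitter: fall through to compensation
    if appkey in builtin:
        return builtin[appkey], "builtin"
    return None, "missing"
```

Returning the origin alongside the meta-information lets monitoring count how often the compensation path is actually taken.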
Commercialization guarantee plan upgrade
On Qianniu, a merchant must establish a subscription relationship with a plug-in before using it, under a commercial model of periodic ordering. Before the plug-in is used, Qianniu renews the free version for the merchant to ensure that the main functions are available. A review of the old solution showed that the original product link did not cover scenarios such as order failure. After the commercialization guarantee plan was upgraded, the number of related errors dropped by 56.7% and the time spent in the front link dropped by 170ms. The strategy is as follows:
- Exception compensation: the server was coordinated to add order status information. If the order is still being established, the query is delayed and retried, and the plug-in opens once the relationship is established (the ordering process involves external systems such as Huijin, and the time to take effect fluctuates greatly). If the order cannot succeed (e.g. the ISV was penalized and ordering was frozen), the merchant is guided to a similar replacement plug-in (ranking is weighted by quality score, which drives ISVs to optimize).
- Performance optimization: a frequency control strategy was added to the pre-order link to reduce how often the pre-check is called. Post-renewal was added to extend the validity period and avoid entering the pre-process; it also covers scenarios in which a mini program is not opened through the plug-in process. For the most commonly used default plug-ins, silent renewal is performed when idle.
Results of the dedicated startup optimization: the success rate rose from 99% to 99.7%, and the front link time fell from 350ms to 130ms.
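The exception compensation and frequency control strategies described above can be sketched roughly as follows. The status values, retry count, delay, and minimum interval are assumptions chosen for illustration; the article does not state the real order states or thresholds.

```python
import time
from enum import Enum


class OrderStatus(Enum):
    ACTIVE = "active"      # subscription relationship established
    PENDING = "pending"    # order still being established
    BLOCKED = "blocked"    # order cannot succeed, e.g. ordering frozen


def ensure_subscription(query_status, max_retries=3, delay_s=0.5, sleep=time.sleep):
    """Return the next action: "open", "recommend_alternative" or "report_failure"."""
    for attempt in range(max_retries + 1):
        status = query_status()
        if status is OrderStatus.ACTIVE:
            return "open"
        if status is OrderStatus.BLOCKED:
            return "recommend_alternative"
        if attempt < max_retries:
            sleep(delay_s)  # order still pending: wait, then query again
    return "report_failure"


def should_precheck(last_check_ts, now_ts, min_interval_s=3600.0):
    """Frequency control: skip the pre-order check if one ran recently."""
    return last_check_ts is None or now_ts - last_check_ts >= min_interval_s
```

Passing `sleep` as a parameter keeps the retry loop testable without real delays.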
Permission application link construction
Sellers use main accounts and sub-accounts for team collaboration, and sub-accounts run into open experience problems caused by insufficient permissions. This year, Qianniu built a permission application link on mobile to improve the sub-account experience for merchants and the efficiency of permission approval.
Application link:
- Second-party link: second-party permission checks are implemented flexibly and have no unified convergence point, so Qianniu exposes a client API that second parties call proactively to trigger the application guidance, which covers the general need.
- Third-party link: an aspect (interceptor) detects insufficient permissions and automatically triggers the application prompt. For historical reasons, third-party mini programs fall into two categories by permission granularity, each with its own detection method:
  a. Mini programs upgraded from QAP have fine-grained authorization flags, and the authorization granularity is the permission package (detected by matching the error code of the TOP response);
  b. Mini programs that joined directly as mini programs have no fine-grained authorization flags, and the authorization granularity is the mini program application (detected by matching the error code of the getAuthUserInfo API).
  The application process is triggered by monitoring the mini program's API calls. For the former category, the TOP API name must be sent to the server in exchange for the permission package and permission point information before a work order is created to notify the main account for approval.
Approval link: the main account receives the pending-approval message and taps it to open the corresponding permission approval details page.
After optimization, the number of insufficient sub-account permission errors fell by 57%, and related public opinion fell by 52%.
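The two detection paths for third-party mini programs can be sketched as a single check over intercepted API results. The error code constants below are hypothetical placeholders, since the article does not give the real TOP or getAuthUserInfo error codes, and `PermissionRequest` and the sample API name are likewise illustrative.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical error codes; the real values are not given in the article.
TOP_PERMISSION_DENIED = "TOP_PERMISSION_DENIED"
AUTH_USER_INFO_DENIED = "AUTH_USER_INFO_DENIED"


@dataclass(frozen=True)
class PermissionRequest:
    appkey: str
    granularity: str               # "package" or "app"
    top_api: Optional[str] = None  # used to look up the permission package


def detect_insufficient_permission(appkey, upgraded_from_qap, api_name, error_code):
    """Return a PermissionRequest if the API result signals missing permission."""
    if upgraded_from_qap:
        # Package-level granularity: match the TOP response error code.
        if error_code == TOP_PERMISSION_DENIED:
            return PermissionRequest(appkey, "package", top_api=api_name)
        return None
    # App-level granularity: match the getAuthUserInfo error code.
    if api_name == "getAuthUserInfo" and error_code == AUTH_USER_INFO_DENIED:
        return PermissionRequest(appkey, "app")
    return None
```

For a package-level request, the `top_api` field carries the API name that would be exchanged on the server for permission package and permission point information before the work order is created.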
Data measurement system construction
The functionality and quality of open tools depend mainly on business logic and on service availability and stability. It is therefore necessary to define core indicators, monitor online exceptions, and handle them promptly. Building the data measurement system mainly involves building somatosensory (user-perceived) indicators, optimizing the public opinion SOP, and building the open experience market (an overall dashboard).
Somatosensory index construction
Using the mini program event mechanism, we established business success rates for TOP and cloud applications, built the somatosensory white screen rate (H5, weex, and mini program), and extended the white screen detection scheme for mini programs. These indicators have caught online problems on the business side many times and prompted rollbacks and fixes.
Plug-in core running quality indicators
By establishing indicators for the core nodes of a running Qianniu plug-in, shown in the figure below, the stability of online plug-ins can be monitored comprehensively. In addition to common purely technical indicators such as interface business success rate, bridge API success rate, and JS error rate, Qianniu has also built a somatosensory white screen rate to reflect the online running quality of plug-ins.
Somatosensory white screen rate
Although technical indicators exist for the core nodes, they cannot fully cover the scenarios in which a function is unavailable. For example, one day, when a cloud application was scaled out, a configuration problem on the new machines caused the order data to come back empty even though the interface call succeeded. Conversely, an error in a technical indicator, such as some JS errors, does not necessarily mean the function is unavailable. It is therefore necessary to establish indicators that measure availability directly from the user's perceived experience. The most common form of unavailability for Qianniu plug-ins is the white screen.
Definition of white screen: a page is counted as a white screen when, within a certain period of time, its elements cannot be displayed in time, so that the page layout fails to appear, an error fallback page appears, or part of the content does not appear.
Detection scheme: in the Qianniu mini program scenario, detection is divided into three stages: noise scene filtering, white screen detection, and result reporting. Result reporting uses the Motu tracking platform for reporting and alerting, so it is not covered in detail here.
Noise scene filtering:
- After mini program page A is opened, the ISV code may automatically jump or switch to another page B, and A is only actually rendered when the user visits it. If detection runs while A is not visible, A would be judged a white screen. This false white screen comes from a technical difference, does not affect what the user perceives, and is not a detection target;
- Many of Qianniu's mini programs are provided by third parties and must be authorized before they can access user data. If the mini program page is blank at detection time because it is still in the authorization process, detecting the authorization pop-up avoids a misjudgment;
- Some mini program pages use same-layer rendering, for example pages that use components such as video and maps. These elements are not normal HTML elements and cannot be detected by JS, but the pages are not white screens, so they are filtered out by configuring a page whitelist.
Detection strategy:
The main strategy for white screen detection is to count the number of effective elements, meaning carriers of effective information such as text and pictures. In mini programs and H5 plug-ins, statistics about the interface elements are obtained by injecting JS into the webview. Unlike most consumer-facing apps, Qianniu has many input-heavy pages. For example, the answer page of the Q&A plug-in is mostly a large input box; such a page is not considered a white screen, which is determined by checking whether input-related components are present. In addition, if a merchant has no orders, the page may contain a large blank area, so empty-state copy such as "not yet" is whitelisted to avoid misjudging the page as a white screen.
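Putting the noise filters and the detection strategy together, the decision logic might look like the sketch below. It operates on a page snapshot that is assumed to have been collected by the injected JS. The `PageSnapshot` fields, the threshold of three effective elements, and the result labels are assumptions for illustration, not values from the article.

```python
from dataclasses import dataclass, field

# Empty-state copy treated as valid content; "not yet" is the example
# given in the text, and a real list would be configured per business.
EMPTY_STATE_PHRASES = ("not yet",)


@dataclass
class PageSnapshot:
    path: str
    visible: bool = True
    auth_dialog_shown: bool = False
    texts: list = field(default_factory=list)
    image_count: int = 0
    input_count: int = 0


def classify(snapshot, same_layer_pages=frozenset(), min_effective=3):
    """Return "ok", "white_screen" or a "skipped:..." reason."""
    # Stage 1: noise scene filtering.
    if not snapshot.visible:
        return "skipped:not_visible"
    if snapshot.auth_dialog_shown:
        return "skipped:authorizing"
    if snapshot.path in same_layer_pages:
        return "skipped:whitelisted"
    # Stage 2: white screen detection.
    if snapshot.input_count > 0:
        return "ok"  # input-heavy pages are not white screens
    texts = [t.strip() for t in snapshot.texts if t.strip()]
    if any(p in t.lower() for t in texts for p in EMPTY_STATE_PHRASES):
        return "ok"  # legitimate empty state, e.g. no orders
    effective = len(texts) + snapshot.image_count
    return "ok" if effective >= min_effective else "white_screen"
```

Only "ok" and "white_screen" results would feed the white screen rate; skipped pages are excluded so that noise scenes do not inflate the indicator.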
Classic cases
Second-party business white screen
In April, a second-party business line showed a white screen when accessed over a 4G network. The white screen rate indicator raised an alert in time, showed a clear change at minute-level granularity, and fell back after the fix, which avoided a surge in public opinion.
Third-party business revision triggers rate limiting
In March, we received an alert that a third-party plug-in's calls to a logistics interface were failing, mainly because they were being rate limited. The success rate was only 80%, significantly below the average. The cause was that the ISV's new version queried logistics information in the order list, which produced too many calls to the logistics interface and triggered rate limiting. After the ISV was notified and made adjustments at the product level, the success rate rose significantly.
Public opinion SOP program
Much of the user feedback on Qianniu comes from the global feedback portal and lacks plug-in context. Moreover, users have little awareness of plug-ins, so their feedback rarely points to a specific one, which makes it hard to analyze public opinion and to drive fixes. Qianniu's approach is to record plug-in usage, show the most recently used tools when a user submits feedback about a plug-in, guide the user to select the target plug-in, and attach the target plug-in information and a problem category to the feedback, which makes statistics and alerting easier. After launch, the proportion of open public opinion carrying a plug-in appkey rose from 11.95% to 95%.
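A minimal sketch of the "recently used plug-ins" idea follows: usage is recorded in a small most-recent-first structure, and the selected appkey and category are attached to the feedback record. The class name, capacity, and record fields are hypothetical.

```python
from collections import OrderedDict


class RecentPlugins:
    """Remember the most recently used plug-ins, newest first."""

    def __init__(self, capacity=5):
        self.capacity = capacity
        self._used = OrderedDict()

    def record(self, appkey):
        self._used.pop(appkey, None)      # re-using a plug-in moves it to the front
        self._used[appkey] = True
        while len(self._used) > self.capacity:
            self._used.popitem(last=False)  # drop the least recently used

    def candidates(self):
        """Appkeys to show on the feedback page, most recent first."""
        return list(reversed(self._used))


def build_feedback(text, appkey, category):
    """Attach the target plug-in and problem category to the feedback."""
    return {"text": text, "appkey": appkey, "category": category}
```

With the appkey attached at submission time, feedback can be counted and alerted on per plug-in instead of being analyzed by hand.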
Open experience market
By integrating public opinion, technical indicators, and somatosensory data, Qianniu has established an open experience market (an overall dashboard) that makes the plug-in experience trackable and measurable. A plug-in quality score is built from each plug-in's public opinion volume, technical indicators, and somatosensory indicators. The quality of Qianniu plug-ins is visible at a glance, and the score has become an important lever for pushing second and third parties to optimize.
Summary
Qianniu optimized the fault tolerance of plug-in startup, built the product link for sub-account permission applications, and improved the availability of its own links. By building somatosensory indicators and improving public opinion feedback and analysis, it established an open experience market that gives a global view of the running quality of online plug-ins. This makes it possible both to find online problems in time and to use data to drive targeted optimization of second- and third-party businesses, improving merchants' experience of using open tools on Qianniu.