Author: Xu

Hello everyone, I'm Xu Ge, product manager of Alibaba Cloud's cloud-native ARMS. Today I bring you the third lesson of the observability series: "Business & Digital Experience Management Scenario Interpretation". This talk is divided into three parts. The first part covers the necessity of digital experience management from two aspects: its impact on the business and its value to enterprises. The second part introduces the product capabilities of ARMS in digital experience management. The third part shares best practices based on customer cases.

The need for digital experience management

Why do we need digital experience management? Overseas research reports show that 70% of users say the speed at which web pages open directly affects their willingness to shop online. Amazon also found that every additional 100 milliseconds of page load time reduced overall sales by 1%. In short, user experience directly affects business performance. So what value does digital experience management bring to enterprises? We believe it is reflected in three aspects:

The first is quantification. You may have heard the saying, "If you can't quantify it, you can't optimize it." Subjective user experience is therefore quantified into specific indicators, and visual analysis capabilities help enterprises understand the overall level of experience on user terminals and the problems that exist. Besides quantifying the experience indicators of our own products, we can also obtain benchmark indicators for our industry, and even experience indicators of competing products.

The second is insight. With quantified data, digital experience tools let us gain insight from the data and put it to use. Examples include locating availability and page performance problems and delimiting the scope of impact, that is, determining whether a problem is tied to a region, an operator, or a device.

The third is optimization. Based on these insights, the ARMS user experience tools also provide optimization suggestions for experience problems so that we can fix them in a targeted manner: find problems faster, reduce business impact, and shorten overall time-to-fix.

Because digital experience is so important to an enterprise, ARMS provides a comprehensive set of tools for digital experience scenarios. There are generally two approaches to digital experience management. One is synthetic observation, more commonly known as dial testing. The other is real traffic observation. For synthetic observation, ARMS provides the cloud dial test product. For real traffic, ARMS provides two products: front-end performance analysis and APP performance analysis.

Put simply, cloud dial test performs active, simulated access to the target website from pre-built detection points of different types, in different regions, on different operators and devices, and obtains availability and performance indicators. Because cloud dial test works in black-box mode, it can also collect and analyze the experience indicators of competing products. For real traffic observation, ARMS offers front-end performance analysis for the web and APP performance analysis for mobile. For web front ends, ARMS supports websites, H5 pages, and mini programs. It provides operations-related analysis, including PV/UV and other data, as well as page performance analysis. In addition, for API requests it can be combined with ARMS application performance analysis to provide end-to-end call-chain correlation analysis. On the mobile side, APP performance analysis provides crash analysis, performance analysis, and remote log pulling for iOS and Android applications, along with multi-dimensional analysis across devices, operators, and networks.

So what is the difference between the two, and which scenarios are they suitable for? Here is a brief summary:

First, in terms of traffic, cloud dial test does not use real traffic; it uses simulated access traffic. Front-end performance analysis and APP performance analysis are based on real traffic. As a result, cloud dial test can manage the performance of websites or API interfaces even when there is no user traffic, whereas front-end performance analysis and APP performance analysis need real traffic to realize digital experience management.

Second, in terms of approach, cloud dial test is an active method. It actively accesses the website or APP, finds experience and other related problems faster and earlier, and lets us resolve them before users run into them. Front-end performance analysis and APP performance analysis are more passive: the relevant indicators become available for analysis only after users generate access traffic.

Finally, in terms of data volume, the frequency and number of cloud dial test visits can be set and controlled in advance, so the amount of data is relatively small. Front-end performance analysis and APP performance analysis collect real traffic data, and every interaction event on the website or APP produces corresponding indicators and logs, so the amount of data is large.

In summary, cloud dial test is better suited to obtaining benchmark experience indicators. For example, if a region has no user traffic yet, you can dial test the website from that region to learn its overall experience indicators. You can also dial test the websites of industry competitors to obtain industry benchmarks. Front-end performance analysis and APP performance analysis are based on real traffic, so they capture the actual experience indicators of the website or APP; for example, after a new version is released, they can verify whether the overall experience meets expectations. In addition, cloud dial test is suited to diagnosing short-term experience problems, while front-end performance analysis and APP performance analysis are suited to long-term tracking of APP or website performance and identifying potential problems. In other words, cloud dial test helps us answer known questions, such as whether a website is available. It cannot answer unknown questions: when you do not know where the problem is, real traffic performance analysis is the better fit.

Therefore, in the digital experience management scenario, the combination of the two can provide enterprises with a full range of digital experience management.

ARMS Digital Experience Management Product Capability Introduction

Next, we explain the core capabilities of cloud dial test, front-end performance analysis, and APP performance analysis. In a nutshell, cloud dial test deploys observation points around the world to simulate real users as closely as possible, accessing the target website or APP from various regions to track its availability and performance.

Cloud dial test has the following advantages:

  • A large number of detection points are distributed around the world, including IDC data center detection points and last-mile detection points on end users' networks.
  • Compared with application performance analysis, it requires neither specialized skills nor embedded code. It is non-intrusive: dial testing a website needs no R&D cooperation, and a dial test can be configured in three minutes.
  • As a proactive method, it tests around the clock (7×24) at minute-level frequency and finds problems before users do.
  • It offers a variety of detection models, including availability analysis, web page performance analysis, DNS hijacking analysis, and CDN quality analysis.

First, availability analysis. In digital experience management, availability is the first problem that needs to be solved; only once a site is available does it make sense to discuss access performance and error and exception analysis. With cloud dial test, you can select observation points in different regions and on different operators to actively visit the website. Each successful visit is marked as an effective visit, and dividing the number of effective visits by the total number of observations gives the website's availability. We also provide long-term availability trend analysis as well as drill-down: for any single dial test, you can view its access details to help locate the key points that cause availability problems.
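The availability formula described above (effective visits divided by total observations) can be sketched in a few lines of Python. This is only an illustration: the `Probe` record, its fields, and the sample data are assumptions made for the example and do not reflect ARMS's internal data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    """One dial test observation (hypothetical record layout)."""
    region: str
    operator: str
    success: bool


def availability(probes):
    """Effective visits / total observations; None when there are no observations."""
    probes = list(probes)
    if not probes:
        return None
    return sum(p.success for p in probes) / len(probes)


def availability_by(probes, key):
    """Availability per value of one dimension, e.g. 'region' or 'operator'."""
    groups = {}
    for p in probes:
        groups.setdefault(getattr(p, key), []).append(p)
    return {value: availability(group) for value, group in groups.items()}


sample = [
    Probe("Beijing", "China Telecom", True),
    Probe("Beijing", "China Unicom", True),
    Probe("Hangzhou", "China Telecom", False),
    Probe("Hangzhou", "China Mobile", True),
]

print(availability(sample))                # 0.75
print(availability_by(sample, "region"))   # {'Beijing': 1.0, 'Hangzhou': 0.5}
```

Grouping the same observations by region or operator is the simplest form of the drill-down mentioned above: the overall figure says that something is wrong, and the per-dimension figures hint at where.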

The second scenario is performance observation, which in cloud dial test covers three aspects. The first is web page performance, including first screen time and 100K time, as well as DNS time, TCP time, download time, SSL handshake time, and blocking time at the network layer. The second is network performance, mainly latency and DNS query time. The third is file transfer: cloud dial test captures indicators such as average transfer speed and first packet time, so scenarios that depend on file transfer can be observed as well.
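To show how the network-layer timings listed above might be used, here is a minimal Python sketch that takes the median of each phase across several dial test samples and reports which phase dominates. The field names (`dns_ms`, `tcp_ms`, and so on) and the sample values are hypothetical; they are not an ARMS export format.

```python
from statistics import median

# Hypothetical field names for the network-layer phases named in the text.
PHASES = ("dns_ms", "tcp_ms", "ssl_ms", "blocking_ms", "download_ms")


def median_phase_times(samples):
    """Median duration of each phase across all samples."""
    return {phase: median(s[phase] for s in samples) for phase in PHASES}


def dominant_phase(samples):
    """Name of the phase with the largest median duration."""
    medians = median_phase_times(samples)
    return max(medians, key=medians.get)


samples = [
    {"dns_ms": 20, "tcp_ms": 35, "ssl_ms": 60, "blocking_ms": 5, "download_ms": 400},
    {"dns_ms": 25, "tcp_ms": 40, "ssl_ms": 55, "blocking_ms": 8, "download_ms": 380},
    {"dns_ms": 180, "tcp_ms": 38, "ssl_ms": 58, "blocking_ms": 6, "download_ms": 420},
]

print(median_phase_times(samples))
print(dominant_phase(samples))  # download_ms
```

The median is used here instead of the mean so that a single slow outlier, such as the 180 ms DNS lookup in the third sample, does not distort the picture.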

The third scenario is hijacking analysis. Cloud dial test analyzes common hijacking types, including DNS hijacking, traffic hijacking, and element hijacking. In addition, cloud dial test can check the quality of DNS and CDN, including real-time analysis of the DNS resolution strategy and the performance status of each host node, so that the DNS resolution strategy can be adjusted according to the results.

Cloud dial test can also evaluate the service quality of CDN providers during CDN selection to support the decision. After a CDN service is purchased, cloud dial test can continue to probe the CDN, and the resulting CDN resolution data can be used to optimize the CDN scheduling strategy.

Finally, thanks to its active, black-box nature, cloud dial test also enables competitive analysis. By running active dial tests against competitors' websites, we can learn their experience indicators, guide the optimization of our own website, and put ourselves in a relatively favorable competitive position.
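The article does not describe how ARMS detects DNS hijacking. One common approach, sketched below as an assumption, is to compare the addresses each detection point resolves against an allow-list of addresses the site owner expects. The probe point names are made up, and the IP addresses come from the ranges reserved for documentation.

```python
def find_suspect_resolutions(observations, expected_ips):
    """Return {probe point: unexpected IPs} for points that resolved outside the allow-list.

    observations: mapping of probe point name -> iterable of resolved IP strings.
    expected_ips: every address the domain may legitimately resolve to.
    """
    expected = set(expected_ips)
    suspects = {}
    for point, ips in observations.items():
        unexpected = sorted(set(ips) - expected)
        if unexpected:
            suspects[point] = unexpected
    return suspects


# Hypothetical allow-list and dial test results.
expected = {"192.0.2.10", "192.0.2.11", "198.51.100.20"}
observations = {
    "beijing-isp-a": ["192.0.2.10"],
    "shanghai-isp-a": ["192.0.2.11", "198.51.100.20"],
    "guangzhou-isp-b": ["203.0.113.66"],
}

print(find_suspect_resolutions(observations, expected))
# {'guangzhou-isp-b': ['203.0.113.66']}
```

For a domain served through a CDN, different regions legitimately resolve to different addresses, so the allow-list has to cover every edge address in use; otherwise the check produces false alarms.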

Next, let's talk about ARMS's product capabilities for real traffic digital experience management, namely front-end performance analysis and APP performance analysis. Both are based on real traffic access data. These digital experience management tools for different terminals analyze digital experience from multiple perspectives, such as page performance, error and exception analysis, and network requests, and provide multi-dimensional analysis by device, network operator, and so on.

ARMS's real traffic digital experience management products have the following characteristics:

1. Multi-platform compatibility. Web, H5, and mini programs are supported, including mini programs on common platforms such as WeChat, Alipay, and DingTalk. For APPs, user terminals such as iOS and Android are supported.

2. Combined with ARMS application performance analysis and tracing, a page's API requests can be associated with the back-end call chain to achieve end-to-end performance analysis and problem location.

3. Access is simple: no manual event tracking is needed, and a variety of access methods are supported.

4. In addition to analysis capabilities, online diagnostic capabilities are provided to assist in locating the root cause of a problem.

The first front-end performance analysis capability to discuss is end-to-end performance analysis. ARMS front-end performance analysis can analyze API performance across multiple dimensions, such as version, operating system, device, browser, region, and network. It can also be linked with application performance analysis to achieve end-to-end call analysis, helping users locate the specific application and code that cause API requests to fail or slow down.
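The article does not say how ARMS links a front-end API request to the back-end call chain. A common technique, assumed here purely for illustration, is to attach a trace ID to each request and join front-end and back-end records on that ID. The record layout (`trace_id`, `service`, `self_ms`) and the sample values below are hypothetical.

```python
def spans_for_request(frontend_request, backend_spans):
    """Back-end spans that share the request's trace ID, slowest self time first."""
    matched = [
        span for span in backend_spans
        if span["trace_id"] == frontend_request["trace_id"]
    ]
    return sorted(matched, key=lambda span: span["self_ms"], reverse=True)


# A slow API request observed in the browser (hypothetical data).
request = {"url": "/api/course/list", "trace_id": "t-1001", "duration_ms": 1850}

# Spans reported by back-end services; self_ms excludes time spent in child calls.
spans = [
    {"trace_id": "t-1001", "service": "gateway", "self_ms": 40},
    {"trace_id": "t-1001", "service": "course-service", "self_ms": 210},
    {"trace_id": "t-1001", "service": "mysql", "self_ms": 1500},
    {"trace_id": "t-2002", "service": "gateway", "self_ms": 35},
]

culprit = spans_for_request(request, spans)[0]
print(culprit["service"])  # mysql
```

Sorting by self time instead of total duration matters because spans are nested: the entry span always has the longest total duration, which says nothing about where the time was actually spent.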

The second capability is multi-dimensional analysis. Front-end performance analysis supports analyzing performance indicators by geography and by terminal dimensions, including browser, device, operating system, resolution, and network. In a given scenario it can pinpoint which dimension a problem belongs to, whether it is a device problem, a regional problem, or a network problem, and provide data to support business decisions.
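The idea of deciding whether a slowdown is a regional, network, or device problem can be illustrated with a small Python sketch. It groups page-load samples by each dimension and reports the dimension whose groups differ the most. The record fields and the "largest gap between group means" heuristic are assumptions chosen for this example, not a description of ARMS's algorithm.

```python
from collections import defaultdict
from statistics import mean


def mean_by_dimension(records, dimension, metric="load_ms"):
    """Mean of the metric for each value of one dimension."""
    buckets = defaultdict(list)
    for record in records:
        buckets[record[dimension]].append(record[metric])
    return {value: mean(values) for value, values in buckets.items()}


def most_skewed_dimension(records, dimensions, metric="load_ms"):
    """Dimension with the largest gap between its best and worst group mean."""
    spreads = {}
    for dimension in dimensions:
        means = mean_by_dimension(records, dimension, metric)
        spreads[dimension] = max(means.values()) - min(means.values())
    return max(spreads, key=spreads.get), spreads


# Hypothetical page-load samples reported by real users.
records = [
    {"region": "Shanghai", "network": "wifi", "browser": "Chrome", "load_ms": 900},
    {"region": "Shanghai", "network": "4g", "browser": "Safari", "load_ms": 1000},
    {"region": "Chengdu", "network": "wifi", "browser": "Safari", "load_ms": 3100},
    {"region": "Chengdu", "network": "4g", "browser": "Chrome", "load_ms": 3000},
]

dimension, spreads = most_skewed_dimension(records, ["region", "network", "browser"])
print(dimension)  # region
print(spreads)    # {'region': 2100, 'network': 0, 'browser': 100}
```

In this sample the network and browser groups look almost identical while the two regions differ by more than two seconds, which points to a regional problem rather than a device or network one.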

The last capability is JS error analysis. ARMS reports the number of JS errors, the error rate, and the business impact of each error across different dimensions to help us make business decisions.
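As a rough illustration of these three figures, the sketch below computes an error count, an error rate, and the number of affected users from a list of page views. The article does not define how ARMS calculates the error rate; the definition used here (page views with at least one JS error divided by all page views) and the record layout are assumptions.

```python
from collections import Counter


def js_error_summary(page_views):
    """Summarize JS errors across page views (hypothetical record layout)."""
    total = len(page_views)
    with_errors = [pv for pv in page_views if pv["errors"]]
    counts = Counter(message for pv in page_views for message in pv["errors"])
    return {
        "error_count": sum(counts.values()),
        # Assumed definition: share of page views that hit at least one error.
        "error_rate": len(with_errors) / total if total else 0.0,
        # A simple proxy for impact: distinct users who saw an error.
        "affected_users": len({pv["user"] for pv in with_errors}),
        "top_errors": counts.most_common(3),
    }


page_views = [
    {"user": "u1", "errors": []},
    {"user": "u2", "errors": ["TypeError: x is undefined"]},
    {"user": "u2", "errors": ["TypeError: x is undefined",
                              "ReferenceError: y is not defined"]},
    {"user": "u3", "errors": []},
]

summary = js_error_summary(page_views)
print(summary["error_count"])     # 3
print(summary["error_rate"])      # 0.5
print(summary["affected_users"])  # 1
```

Counting affected users alongside the raw error count helps separate an error that hits one user repeatedly from one that hits many users once.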

The digital experience management products for APP are also briefly introduced here.

The first is stability analysis. ARMS APP performance analysis addresses stability problems in three ways. The first is crash analysis, covering crashes and aborts. The second is exception analysis, which proactively detects exceptions such as memory leaks and main-thread I/O. The third is multi-dimensional analysis, which breaks problems down by version, device, operator, region, and network and reports the proportion for each dimension, helping us determine root cause and impact. Detailed drill-down into stability problems is also supported to help locate the specific cause.

The second is API performance analysis, which can be combined with ARMS application performance analysis to achieve end-to-end network performance analysis. In addition to statistics on the APP's network performance, it can jump to the back-end application call chain with one click to quickly locate which microservice or component, or even which line of code, is causing a slow call.

Finally, let's talk about the remote log pulling capability of APP performance analysis. ARMS's approach to these logs is relatively lightweight: there is no need for event tracking, log collection, or a full-text search system. Once the APP SDK is integrated, ARMS pulls crash logs on demand, reconstructs the error scene, and helps quickly locate complex problems. You can also create a pull task for a specific device, version, and system to actively pull logs from a user's APP device. Runtime environment information, such as device memory and CPU, is pulled along with the logs to assist in locating the problem. In addition to active pulling, intelligent pulling is available for crash scenarios: when a crash event is detected, ARMS automatically creates a task, intelligently selects devices, obtains logs from the problematic devices in advance, and preserves the scene to save troubleshooting time.

Digital Experience Management Best Practices

The above is an introduction to the product capabilities of ARMS in digital experience management. Finally, we share some best practices based on several customer cases.

The first case is Jieka Robot. Jieka Robot is a domestic intelligent robot manufacturing service provider that works closely with more than 300 automated airlines around the world to serve global customers. To better serve these customers, Jieka Robot treats online marketing as one of its important marketing channels and runs a large number of overseas advertisements on Google. To ensure the effectiveness of online marketing, Jieka Robot must first ensure that the landing pages of its official website can be accessed normally. If a page or the official website has availability or performance issues, it not only affects the conversion rate but may also cause Google to stop serving the ads. After communicating with the observability team, Jieka Robot decided to use ARMS cloud dial test to continuously test its overseas official website. Through dial test tasks that continuously measured the official website's performance, two problems were found:

First, CDN scheduling in some regions, mainly the eastern United States and Southeast Asia, was not accurate enough and did not follow the optimal scheduling scheme. Second, some large image files on the official website slowed down page loading. Based on these two findings, Jieka Robot worked with its CDN supplier to fully optimize the CDN scheduling logic for the eastern United States and Southeast Asia, and also pushed the R&D team to compress the page images. Subsequent tests showed that the website opening speed increased by 50%, which fully guarantees the effect of online marketing.

The second case is about ARMS front-end performance analysis. As a leader in the domestic children's programming education industry, Walnut Programming has seen its overall business volume grow very rapidly. As the business developed, the system architecture became more and more complex, and the back end adopted a distributed microservice architecture. Improving the observability of this distributed system was a big problem at the time.

For the online education industry, user experience is very important because it directly determines brand image and conversion rate. However, with a microservice architecture, even a simple teaching session for one user may involve calls between different applications and some third-party service interfaces, so a failure or bottleneck in any link of the chain may affect the user experience. After evaluating open source options and enterprise-level solutions, Walnut Programming decided to use ARMS front-end performance analysis, combined with application performance analysis, to realize digital experience management for its teaching terminals. The first thing that impressed them was how quickly front-end performance analysis could be adopted: no event tracking was needed, and introducing a single script into the front-end code was enough to report monitoring data. The second was the end-to-end performance insight gained by combining it with application performance analysis, which made it possible to quickly locate the root cause of a problem. The third was multi-dimensional analysis: ARMS front-end performance analysis can aggregate and analyze performance by geographic location, operating system, resolution, network operator, and other dimensions to pinpoint the causes of performance bottlenecks. Finally, ARMS's alerting capability lets the operation and maintenance team become aware of experience problems as early as possible, truly finding a problem in 5 minutes, isolating it in 10 minutes, and fixing it in 30 minutes. For Walnut Programming, the ARMS observability system helped reduce operation and maintenance workload by more than 30% and shorten the average fault location time by 60%, which greatly improved the user experience and laid a solid foundation for the sustainable development of the business.

That concludes the customer cases for the different digital experience management products.

Visit the ARMS official website for more details!

