Author|Bai Yu
In the era of e-commerce, traffic has become a core competitive advantage for enterprises, and flash sales and snap-up promotions have become essential marketing methods. Since Taobao launched the Double Eleven event, promotions by major e-commerce platforms and brand owners have sprung up like bamboo shoots after rain. When a service has to reach a very large audience, availability becomes the key to e-commerce operations and to website operation and maintenance. Faced with the traffic surge of a big promotion, enterprises must handle a huge number of users spread across different regions and countries while keeping the business running stably. Take an e-commerce company with tens of millions of registered users as an example. During a promotion, the company faces a simultaneous influx of nearly tens of millions of users from different regions, and the availability of the system affects the success of the promotion.
For e-commerce websites, slow loading or unavailability often means that the earlier marketing spend has been wasted. The result is not only the loss of tens of millions of orders but also damage to brand reputation. In a promotion such as Double 11, where traffic is much higher than usual, the social impact of any availability problem is multiplied as well.
Therefore, for big promotions such as Double 11, both e-commerce platforms and self-built sites run stress tests in advance to find system performance bottlenecks and plan capacity accordingly. But are stress testing and capacity expansion enough? Far from it. Stress testing evaluates website performance and capacity mostly from the perspective of the merchant or platform, and lacks methods for evaluating performance from the user's perspective.
Optimizing a website from the user's perspective is not simply a matter of expanding IaaS-layer resources; it requires tuning every link in the entire browsing path. Without a testing tool that can simulate a large number of users in different regions of the world and reproduce the behavior of real users, predicting where the performance bottlenecks or failure points of a complex shopping site lie is close to an impossible task.
Let's take the product pre-sale activity of a well-known large e-commerce website as an example. The goal is to test the performance of the website system before the product reservation and snap-up activities begin, find the system bottlenecks, and guide optimization so that the reservation and snap-up activities go smoothly.
This test is a global dial test covering the shop page, product detail page, and order page of the website system, and it tests the performance of each module as well as the system as a whole. It needs to simulate the simultaneous operation of a large number of real users in different regions of the world and check page response times, to ensure that the system responds promptly when users in different regions browse and that no unexpected errors or delays degrade the user experience.
After collecting and integrating the relevant performance and experience indicators with the help of tools, we start the analysis. The analysis centers on the performance and experience data of real users, so it should roughly follow the real user access path: terminal, network, operating system. During the analysis we need a sufficient sample size and our own weighting of how much each indicator affects the user experience. Here we focus on the terminal and network parts.
(1) Survey of global availability
Before the big promotion, according to the markets we serve, we select real-user monitoring points on different operators in major cities and provinces across the country, and even in overseas cities, and run multiple rounds of network dial tests against the landing page address of the website. We evaluate the performance of domain names, IPs, and APIs on indicators such as latency, packet loss rate, and availability, and produce an overall availability report. Regions or operators with poor availability are then singled out for remediation.
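As a rough illustration of how such an availability report could be assembled, the sketch below groups probe results by region and operator and flags the groups that fall under an availability target. The `Probe` record, its field names, and the 99% target are assumptions made for this example; they are not the data format or thresholds of any particular dial test product.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass
class Probe:
    """One dial test result (hypothetical record layout)."""
    region: str        # city, province, or country of the monitoring point
    operator: str      # network operator of the monitoring point
    latency_ms: float  # round-trip latency measured by this probe
    sent: int          # packets sent
    received: int      # packets received
    ok: bool           # whether the target was reachable


def availability_report(probes, min_availability=0.99):
    """Aggregate probes per (region, operator), worst availability first.

    min_availability is an assumed target, not a recommended value.
    """
    groups = defaultdict(list)
    for p in probes:
        groups[(p.region, p.operator)].append(p)

    report = []
    for (region, operator), items in groups.items():
        sent = sum(p.sent for p in items)
        received = sum(p.received for p in items)
        availability = sum(p.ok for p in items) / len(items)
        report.append({
            "region": region,
            "operator": operator,
            "avg_latency_ms": mean(p.latency_ms for p in items),
            "packet_loss": 1 - received / sent if sent else 1.0,
            "availability": availability,
            "needs_attention": availability < min_availability,
        })
    report.sort(key=lambda row: row["availability"])
    return report
```

Sorting by availability puts the regions and operators that most need remediation at the top of the report.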
(2) User experience evaluation of the core path page
User experience determines the effect of promotional activities. Page opening speed in particular directly determines whether users stay. Research data shows that if a page takes 6 to 8 seconds to open, most visitors leave, and if it takes 12 seconds, 99% of users leave. Evaluating the user experience before the big promotion is therefore another area we need to focus on.
For user experience, we first sort out the core browsing path of users and then optimize and manage the pages on that path. Through browser tasks in the cloud dial test, we obtain core experience indicators such as first screen time and 100K time for users in different regions and on different operators. The first screen time matters most: every page on the core browsing path must meet its first screen time requirement.
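One way to check a first screen time requirement is to compare a high percentile of the measured times against a budget for each page and region. The sketch below does this with a nearest-rank percentile. The sample tuple layout, the 2000 ms budget, and the choice of the 90th percentile are all assumptions for illustration, not values taken from the text.

```python
from collections import defaultdict


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list; pct is an integer 1..100."""
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # ceiling division, no float error
    return ordered[max(rank, 1) - 1]


def slow_pages(samples, budget_ms=2000, pct=90):
    """Return {(page, region): percentile_ms} for groups over the budget.

    samples: iterable of (page, region, first_screen_ms) tuples
    (a hypothetical layout). budget_ms and pct are assumed targets.
    """
    by_key = defaultdict(list)
    for page, region, first_screen_ms in samples:
        by_key[(page, region)].append(first_screen_ms)

    over_budget = {}
    for key, times in by_key.items():
        value = percentile(times, pct)
        if value > budget_ms:
            over_budget[key] = value
    return over_budget
```

Using a percentile rather than the average keeps a handful of fast samples from hiding a slow experience for a meaningful share of users.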
(3) Evaluation of DNS resolution effect
DNS resolution is the part most easily overlooked. The lesson from Facebook's incident a short while ago is still fresh, so we also focus on DNS. More than 1,000 monitoring points around the world, including real-user monitoring points, send requests to the target domain name 24 hours a day to monitor the availability and resolution performance of the DNS service. DNS dial tests also support specifying recursive or iterative queries and the resolution server, so flexible dial test parameters can simulate real user visits as closely as possible.
After a periodic dial test task runs, Alibaba Cloud dial test generates reports on DNS resolution time in different regions and lists the details of the DNS request for each dial test, including the A record address, DNS time, and DNS resolution process, which helps users quickly analyze and locate DNS resolution problems. In addition, DNS alarms configured for availability and resolution performance issues buy time to fix problems before users notice and report them, which improves user satisfaction and reduces economic losses.
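To give a feel for what a single DNS probe records, here is a minimal sketch that times one lookup through the local system resolver using Python's standard `socket.getaddrinfo`. It is only an approximation: it cannot choose recursive or iterative queries or a specific resolution server, and local caching may shorten the measured time. The function name and the shape of the returned dictionary are assumptions for this example.

```python
import socket
import time


def time_dns_lookup(hostname, resolve=socket.getaddrinfo, clock=time.perf_counter):
    """Time one DNS lookup and collect the resolved addresses.

    resolve and clock are injectable so the function can be exercised
    without network access; by default the system resolver is used.
    """
    start = clock()
    try:
        answers = resolve(hostname, None)
        # Each answer is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] holds the IP address.
        addresses = sorted({answer[4][0] for answer in answers})
        error = None
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        addresses = []
        error = str(exc)
    dns_ms = (clock() - start) * 1000
    return {
        "hostname": hostname,
        "addresses": addresses,
        "dns_ms": dns_ms,
        "error": error,
    }
```

Calling `time_dns_lookup("example.com")` from several locations on a schedule and storing the results would yield the raw material for a per-region resolution time report.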
(4) CDN quality monitoring
As websites carry more and more images and video, many e-commerce websites use CDN services to solve slow access in different regions and on different operators, to increase loading speed, reduce bandwidth costs, and increase content availability and redundancy. To monitor CDN quality, select LastMile (real netizen) monitoring points that match the target user groups, for example in North America, Europe, South America, and Southeast Asia, configure a browser dial test task, and run dial tests against the big promotion website.
By analyzing the dial test logs, we can see in real time how the CDN performs after deployment: whether the performance of the host nodes has improved and whether availability is stable, whether target customers correctly hit the corresponding host node and whether the match is reasonable, whether CDN nodes are synchronized with the origin site, and whether element publishing is in place and remains effective over time. Based on these criteria, the CDN configuration strategy is then adjusted and optimized.
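A simplified sketch of this kind of log analysis is shown below. It summarizes, per client region, how often requests hit the CDN cache, how often they were served by a node in the client's own region, and how often they succeeded. The log record keys (`client_region`, `node_region`, `cache_hit`, `status`) are hypothetical, and treating "same region" as a correct node match is a deliberate simplification of real CDN scheduling.

```python
from collections import defaultdict


def cdn_summary(records):
    """Summarize dial test log records per client region.

    records: iterable of dicts with the hypothetical keys
    client_region, node_region, cache_hit (bool), and status (HTTP code).
    """
    stats = defaultdict(lambda: {"total": 0, "hits": 0, "matched": 0, "ok": 0})
    for rec in records:
        s = stats[rec["client_region"]]
        s["total"] += 1
        s["hits"] += bool(rec["cache_hit"])
        # Simplification: a node in the client's own region counts as a match.
        s["matched"] += rec["node_region"] == rec["client_region"]
        s["ok"] += 200 <= rec["status"] < 400

    return {
        region: {
            "hit_rate": s["hits"] / s["total"],
            "node_match_rate": s["matched"] / s["total"],
            "availability": s["ok"] / s["total"],
        }
        for region, s in stats.items()
    }
```

A low node match rate for a region suggests that users there are being routed to distant nodes, which is one of the signals that could prompt a change to the CDN configuration.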
Every year on the eve of Double 11, full-link stress testing is a must for enterprises, which use it to keep finding problems, iterate on optimizations, and comprehensively verify business stability. Cloud dial testing is a perfect complement to full-link stress testing. It analyzes the user experience in the big promotion scenario comprehensively from the user's perspective, so that users have a better buying experience, and it continues to play an irreplaceable role as the business evolves.
About Cloud Dial Test
As a business-oriented, non-intrusive, cloud native monitoring product, Cloud Dial Test is well suited to this job. Through Alibaba Cloud's global service network, it simulates real user behavior and monitors the availability and performance of websites and their networks, services, and API ports around the clock. It locates problems at the page element level, network request level, and network link level. A rich set of monitoring items and analysis models helps companies find and locate performance bottlenecks and experience blind spots in time, reduce operational risk, and improve service experience and efficiency.
(1) Global monitoring node coverage
More than 200,000 LastMile (LM) monitoring points worldwide, more than 500 IDC monitoring nodes, 400+ operators at home and abroad, and hundreds of thousands of registered members ensure that the monitoring scale keeps up with the ever-increasing business scale.
(2) No need to embed code, ready to use out of the box
Monitoring is zero-intrusion: you only need to enter the URL and perform simple configuration, with no R&D support required. A complete website performance analysis report is available within a few minutes. Resource packs and pay-as-you-go billing are both available to meet the needs of operation and maintenance testing.
(3) Business-oriented, with a variety of preset analysis models
The monitoring cycle is as fine as one minute, with more than 20 monitoring parameters in 7 categories and support for multiple mainstream protocols, providing 7×24 fine-grained real-time fault monitoring, alerting, and performance analysis for sites and business ports. From the end customer's perspective, multi-dimensional analysis by region and operator, drill-down into the details of a single sample, and a rich set of indicators and chart types make it possible to visually locate a problem, the affected area, and its root cause. This shortens analysis time, improves operation and maintenance efficiency, and enables fine-grained monitoring.
(4) Intelligent alarm, precise positioning
Real-time alarms are available for first screen time, overall performance, and availability. Rich alarm policy settings are deeply integrated with the Alibaba Cloud Alarm Center to effectively shorten MTTR. Page element-level errors can be detected, and the cause of a problem can be traced to a single network request, which makes problem location more efficient.
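As a generic illustration of a threshold alarm on a metric such as first screen time, the sketch below fires only after several consecutive samples exceed a threshold, which helps avoid alerts on a single noisy measurement. This is a common pattern, not the alarm policy of Cloud Dial Test or the Alibaba Cloud Alarm Center; the function name and the default of three consecutive breaches are assumptions.

```python
def first_alert_index(values, threshold, consecutive=3):
    """Return the index of the sample at which the alarm would fire.

    The alarm fires once `consecutive` samples in a row exceed
    `threshold`; returns None if that never happens.
    """
    streak = 0
    for index, value in enumerate(values):
        streak = streak + 1 if value > threshold else 0
        if streak >= consecutive:
            return index
    return None
```

For example, with first screen times in milliseconds and a threshold of 2000, a series with two slow samples, a recovery, and then three slow samples would fire on the last of those three.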
Click on the link below to learn more!
https://www.aliyun.com/activity/middleware/website/performance/test?spm=5176.20960838.0.0.6d33305emfVIyC