
I. Introduction

As the mall's business channels kept expanding and the number of promotion types kept growing, the original mall v2.0 architecture could no longer keep up with the demand for new promotional activities. The promotion system needed to be built independently, decoupled from the mall, and dedicated to supporting the mall's marketing activities.

In this series we describe the problems encountered while building the vivo mall promotion system and how we solved them, and share our architecture design experience.

II. System Framework

2.1 Business Overview

Before introducing the business architecture, let's briefly review how the promotion capabilities of the vivo mall were built up and where they stand today. The promotion function in mall v2.0 had the following problems:

1. The promotion model was not abstract enough, maintenance was chaotic, and there was no independent activity inventory.

2. The stacking and mutual-exclusion relationships between activities were managed chaotically, and there was no unified promotion pricing capability.

The pricing logic for the product detail page, the shopping cart, and the order in the mall's core transaction link was maintained in three separate places, as shown in the figure below. As discounts were added or promotion rules changed, the amount of duplicated development on the mall side grew significantly.

(Figure 2-1. Before the promotion pricing is unified)

3. Promotion performance could not keep up with the scale of activities and often degraded the performance of the main mall.

Because the promotion logic was coupled with the mall system, it could not be optimized for performance on its own, so the system could not support large promotions in increasingly frequent high-traffic scenarios.

Based on these pain points, in the first phase we made the promotion system independent, decoupled it from the mall, and built its core capabilities:

Promotion Management

A unified discount model and configuration management interface are abstracted for all discount activities, providing functions such as creating, modifying, and querying activities, as well as data statistics. In addition, unified activity inventory management was built independently so that activity resources can be controlled in one place.

Promotion Pricing

Built on a flexible, abstract pricing engine, the promotion pricing model defines layered pricing together with unified discount stacking rules and a unified pricing procedure. All core links of the vivo mall have been moved onto this promotion pricing capability, so discounted prices are calculated the same way across the whole link, as shown in the following figure:

(Figure 2-2. After the promotion pricing is unified)
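To make the benefit of a single pricing entry point concrete, here is a minimal sketch in which the detail page, cart, and order would all call the same function. The names `Discount` and `price_item`, the layer numbers, and the rule itself (the largest discount wins within a layer, and layers stack in order) are assumptions made for illustration only; the actual engine design is not described in this article.

```python
from dataclasses import dataclass
from decimal import Decimal
from itertools import groupby


@dataclass(frozen=True)
class Discount:
    name: str
    layer: int            # hypothetical: lower layers are applied first
    amount_off: Decimal   # flat reduction, to keep the sketch simple


def price_item(list_price, discounts):
    """Assumed rule: within a layer the largest discount wins; layers stack."""
    price = list_price
    ordered = sorted(discounts, key=lambda d: d.layer)
    for _, group in groupby(ordered, key=lambda d: d.layer):
        best = max(group, key=lambda d: d.amount_off)
        price = max(Decimal("0"), price - best.amount_off)  # never below zero
    return price


discounts = [
    Discount("flash sale", 1, Decimal("200")),
    Discount("member price", 1, Decimal("150")),
    Discount("coupon", 2, Decimal("50")),
]
final_price = price_item(Decimal("2999"), discounts)
```

Because every caller goes through `price_item`, a new discount type only has to be taught to one function instead of three.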

Once the core capabilities of the first phase were in place, business needs were largely met and the variety of discounts grew. With that growth came new operational pain points:

  • Operators could not preview a configured promotion in advance to check whether it behaved as expected;
  • As discount types multiplied, a single product carried more and more discounts and the configuration became increasingly complicated, so configuration mistakes that caused online incidents became very easy to make.

For this reason, we started the second phase of the promotion system, focusing on these operational pain points:

  • Provide a time travel function that lets a user "travel" to a point in the future and inspect promotional activities before they go live;
  • Provide price monitoring which, guided by the "Mall Marketing Price Capability Matrix", applies multi-dimensional monitoring before, during, and after an event in order to "reduce the probability of errors, and stop losses in time for errors."
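The time travel idea can be sketched as replacing "now" with a preview time when promotions are evaluated. This is only an illustration under assumed names (`Promotion`, `effective_now`, `active_promotions`); how the real system carries the preview time and restricts who may use it is not covered here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Promotion:
    name: str
    start: datetime
    end: datetime


def effective_now(real_now: datetime, travel_to: Optional[datetime]) -> datetime:
    """Use the preview time when one is set, otherwise the real clock."""
    return travel_to if travel_to is not None else real_now


def active_promotions(promos, real_now, travel_to=None):
    """Return the names of promotions active at the effective time."""
    now = effective_now(real_now, travel_to)
    return [p.name for p in promos if p.start <= now < p.end]


promos = [
    Promotion("spring sale", datetime(2024, 3, 1), datetime(2024, 3, 8)),
    Promotion("double eleven", datetime(2024, 11, 11), datetime(2024, 11, 12)),
]
today = datetime(2024, 3, 2)
live = active_promotions(promos, today)
preview = active_promotions(promos, today, travel_to=datetime(2024, 11, 11, 0, 30))
```

Here `live` reflects what shoppers see today, while `preview` shows what an operator would see after "travelling" to the start of a future event.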

2.2 Promotions and coupons

The main purpose of a promotion is to communicate product discounts to users, give them real savings, and encourage them to buy, which in turn drives new-user acquisition and increases sales. From this perspective, coupons are also a form of promotion.

However, the coupon system of the vivo mall was not merged into the promotion system when the latter was made independent, for two reasons:

  • First, the coupon system had already been separated out in mall v2.0, was integrated with many upstream businesses, and was already a mature middle-office system;
  • Second, coupons have business characteristics that other promotional offers do not, such as issuing coupons and claiming coupons.

Given the cost of redesign and migration, coupons were left out of the promotion system's own capabilities. Coupons are still part of a product's price discount, however, so promotion pricing relies on the coupon system to supply coupon discounts.

2.3 Business Structure & Process

So far we have sorted out the approximate capability matrix of the entire promotion system. The overall architecture design is as follows:

(Figure 2-3. Promotion system architecture)

With the promotion system made independent, the mall's overall shopping flow interacts with it as follows:

(Figure 2-4. The updated mall shopping flow)

III. Technical Challenges

As a middle-office capability system, the promotion system faces the following technical challenges:

  • Promotion types and discount stacking rules are complex and change often. How can the system stay extensible, keep up with changing discount requirements, and improve development and operational efficiency?
  • How can the system meet high-performance requirements under high concurrency in high-traffic scenarios such as new product launches and the Double Eleven promotion?
  • Upstream callers cannot be fully trusted and downstream dependencies are not fully reliable. In such an environment, how can the overall stability of the system be improved and high availability be ensured?

We arrived at the following technical solutions based on the characteristics of our own business.

3.1 Scalability

Scalability was improved mainly in two areas:

  • The discount model: a unified discount model and configuration management interface are abstracted for all discount activities;
  • The promotion pricing engine: a pricing engine was built and the pricing models were unified.

The detailed design will be explained in a follow-up article.

3.2 High Concurrency/High Performance

Caching

Caching is almost a "silver bullet" for performance problems, and the promotion system uses it extensively, both as a Redis cache and as a local in-process cache. Caching raises the question of data consistency. With Redis this is relatively easy to handle, but with a local cache it is harder, because each instance holds its own copy. A local cache should therefore be reserved for data that changes infrequently and for business scenarios that can tolerate some inconsistency.
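The trade-off described above can be seen in a small two-level cache sketch. A plain dict stands in for Redis, and `TwoLevelCache`, `loader`, and `local_ttl` are hypothetical names; the only consistency mechanism assumed here is a short local TTL, which bounds how long an instance can serve stale data.

```python
import time


class TwoLevelCache:
    def __init__(self, remote, loader, local_ttl=5.0, clock=time.monotonic):
        self.remote = remote        # dict standing in for Redis
        self.loader = loader        # loads from the database on a full miss
        self.local_ttl = local_ttl  # seconds a local copy may be served
        self.clock = clock          # injectable so the sketch can be tested
        self.local = {}             # key -> (expires_at, value)

    def get(self, key):
        entry = self.local.get(key)
        if entry is not None and entry[0] > self.clock():
            return entry[1]                      # local hit, possibly stale
        if key in self.remote:
            value = self.remote[key]             # remote hit
        else:
            value = self.loader(key)             # full miss: load and fill
            self.remote[key] = value
        self.local[key] = (self.clock() + self.local_ttl, value)
        return value
```

With a five-second TTL, an update written to the remote cache becomes visible on each instance at most five seconds later, which is the kind of inconsistency the business scenario has to accept before a local cache is used.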

Batching

The promotion system is a typical read-heavy, write-light workload, and the biggest cost on the read path is IO: database queries, Redis calls, and remote calls to third parties. Converting these IO operations to batch form, trading space for time and reducing the number of round trips, is another major performance optimization.
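The effect of batching is easy to see by counting round trips. In this sketch `CountingStore` is a dict-backed stand-in for Redis whose `mget` mirrors the Redis MGET command; the key layout `price:<sku>` and the chunk size are assumptions.

```python
class CountingStore:
    """Dict-backed stand-in for Redis that counts round trips."""

    def __init__(self, data):
        self.data = dict(data)
        self.round_trips = 0

    def get(self, key):
        self.round_trips += 1
        return self.data.get(key)

    def mget(self, keys):
        self.round_trips += 1
        return [self.data.get(k) for k in keys]


def prices_one_by_one(store, sku_ids):
    """One round trip per SKU."""
    return {sku: store.get(f"price:{sku}") for sku in sku_ids}


def prices_batched(store, sku_ids, chunk_size=100):
    """One round trip per chunk; chunking keeps each request bounded."""
    result = {}
    for i in range(0, len(sku_ids), chunk_size):
        chunk = sku_ids[i:i + chunk_size]
        values = store.mget([f"price:{sku}" for sku in chunk])
        result.update(zip(chunk, values))
    return result
```

For 250 SKUs, the first function makes 250 round trips and the second makes 3, while both return the same mapping.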

Simplification and asynchronous processing

Keep the implementation of each function lean and move non-core tasks off the request path so they run asynchronously. Examples include cache processing after an activity is edited, message synchronization after resources are pre-reserved, and message notifications when a group-join status changes.
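A minimal sketch of moving a non-core task off the request path follows. It uses an in-process queue and a worker thread purely for illustration; a production system would more likely use a message queue, and `save_activity` and the task names are hypothetical.

```python
import queue
import threading

tasks = queue.Queue()
done = []  # records finished background work, for demonstration only


def worker():
    """Run queued jobs until a None sentinel arrives."""
    while True:
        job = tasks.get()
        try:
            if job is None:
                break
            job()
        finally:
            tasks.task_done()


def save_activity(activity_id, db):
    db[activity_id] = "saved"  # core path: done synchronously
    # Non-core path: the cache refresh is queued and handled later.
    tasks.put(lambda: done.append(f"refresh-cache:{activity_id}"))
    return "ok"


thread = threading.Thread(target=worker, daemon=True)
thread.start()

db = {}
result = save_activity(42, db)
tasks.join()  # only so the demonstration can observe the background work
```

The caller gets its response as soon as the core write finishes; the cache refresh no longer adds to request latency.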

Hot and cold data separation

Besides IO, the factor with the greatest impact on performance in a read-heavy, write-light scenario is data volume. The promotion system also holds user-level data, such as discount resource reservation records and user group information. This data is time-dependent and shows a hot-tail effect: in most cases only the most recent records are needed. Separating hot data from cold data is the best choice for such scenarios.
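As a rough sketch of hot and cold separation, two dicts below stand in for a hot table and an archive table. The 90-day window and the function names are assumptions for illustration, not the values used in the real system.

```python
from datetime import datetime, timedelta

HOT_WINDOW = timedelta(days=90)  # assumed retention for the hot store


def archive(hot, cold, now):
    """Move records older than HOT_WINDOW from the hot store to the cold store."""
    expired = [k for k, rec in hot.items() if now - rec["created_at"] > HOT_WINDOW]
    for key in expired:
        cold[key] = hot.pop(key)


def find(key, hot, cold):
    """Read the small hot store first; fall back to cold only on a miss."""
    record = hot.get(key)
    return record if record is not None else cold.get(key)


now = datetime(2024, 6, 1)
hot = {
    "r1": {"created_at": datetime(2024, 5, 20)},
    "r2": {"created_at": datetime(2023, 12, 1)},
}
cold = {}
archive(hot, cold, now)
```

Most reads are served from the small hot store, so its indexes stay small and fast, while older records remain reachable through the fallback.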

3.3 System stability

Rate limiting and degradation

Using the company's rate-limiting components, non-core service functions are rate limited and degraded so that the core services of the system remain fully available under high concurrency.
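The company's rate-limiting component is not described here, so the sketch below substitutes a plain token bucket to show the idea: when a non-core feature exceeds its budget, it returns a cheap fallback instead of consuming resources. `TokenBucket` and `recommended_items` are hypothetical names.

```python
import time


class TokenBucket:
    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.clock = clock          # injectable so the sketch can be tested
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self):
        """Refill based on elapsed time, then try to take one token."""
        now = self.clock()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def recommended_items(bucket, compute):
    """Non-core feature: degrade to an empty result when over the limit."""
    return compute() if bucket.allow() else []
```

Core pricing calls would not be wrapped this way; only features the page can live without are degraded.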

Idempotency

All interfaces are idempotent, so that retries triggered by a network timeout on the caller's side do not leave the system in an abnormal state.
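One common way to make an interface idempotent is to have the caller send a unique request ID and to return the stored result when that ID is seen again. The sketch below assumes this approach; `ReservationService` and its fields are hypothetical, and in a real system the results map would typically be a database table with a unique index on the request ID.

```python
import threading


class ReservationService:
    def __init__(self, stock):
        self.stock = stock
        self.results = {}             # request_id -> first result
        self.lock = threading.Lock()  # keeps check-and-record atomic

    def reserve(self, request_id, quantity):
        with self.lock:
            if request_id in self.results:
                return self.results[request_id]  # retry: replay first result
            if quantity > self.stock:
                result = {"ok": False, "remaining": self.stock}
            else:
                self.stock -= quantity
                result = {"ok": True, "remaining": self.stock}
            self.results[request_id] = result
            return result
```

A caller that times out and retries with the same request ID gets the original answer, and the stock is deducted only once.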

Circuit breaking

Calls to external systems are wrapped with Hystrix circuit breakers, so that a failure in an external system cannot bring down the entire promotion system.

Monitoring and alerting

Error-log alerts on the logging platform, service analysis alerts on the call chain, and the monitoring and alerting functions of the company's middleware and basic components allow us to detect system anomalies as early as possible.
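Hystrix itself is a Java library; the sketch below is not its API but a minimal stand-alone illustration of the circuit breaker pattern it implements. After a number of consecutive failures the breaker opens and calls go straight to a fallback; after a cool-down period one trial call is let through. All names and thresholds are assumptions.

```python
import time


class CircuitBreaker:
    def __init__(self, max_failures=3, reset_after=10.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # seconds before a trial call is allowed
        self.clock = clock              # injectable so the sketch can be tested
        self.failures = 0
        self.opened_at = None           # None means the circuit is closed

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()       # open: skip the external call entirely
            # Cool-down elapsed: fall through and allow one trial call.
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures or self.opened_at is not None:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            return fallback()
        self.failures = 0
        self.opened_at = None           # success closes the circuit
        return result
```

For example, if the coupon service is failing, promotion pricing could fall back to a price without coupon discounts rather than waiting on every request.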

IV. Pitfalls Encountered

4.1 Redis SCAN command usage

When clearing Redis cache data, some cache keys were located and cleared through pattern matching, which relies on the Redis SCAN command under the hood.

SCAN is a cursor-based iterator. Each call returns a new cursor, and the caller passes that cursor to the next SCAN call to continue the iteration from where it left off.

Compared with the KEYS command, SCAN does not return all matching results at once, which reduces the risk of a single command blocking Redis. That does not mean SCAN can be used casually. A full iteration still has to visit every key in the keyspace, and the MATCH pattern is applied only after elements have been retrieved, so with a large data set SCAN carries a risk similar to KEYS: it can easily raise the Redis load and slow down responses, which in turn affects the stability of the whole system.

(Figure 4-1 Redis load increases)

(Figure 4-2 Redis response with spikes)
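The cursor contract, and the reason a pattern-based clear is expensive, can be shown with an in-memory stand-in. `FakeRedis` below only mimics the SCAN contract (start at cursor 0, stop when the returned cursor is 0); a real Redis cursor is opaque rather than a list index, and the key names are made up for the example.

```python
class FakeRedis:
    """In-memory stand-in that mimics the SCAN cursor contract only."""

    def __init__(self, keys):
        self.keys = list(keys)
        self.examined = 0  # how many keys the server had to look at

    def scan(self, cursor, prefix, count=10):
        batch = self.keys[cursor:cursor + count]
        self.examined += len(batch)
        next_cursor = cursor + count
        if next_cursor >= len(self.keys):
            next_cursor = 0  # 0 signals that the iteration is complete
        # The filter is applied after the batch has been fetched.
        return next_cursor, [k for k in batch if k.startswith(prefix)]


def scan_all(client, prefix):
    """Feed each returned cursor back in until the server returns 0."""
    cursor, found = 0, []
    while True:
        cursor, keys = client.scan(cursor, prefix)
        found.extend(keys)
        if cursor == 0:
            return found


keys = [f"other:{i}" for i in range(995)] + [f"promo:7:sku:{i}" for i in range(5)]
client = FakeRedis(keys)
matches = scan_all(client, "promo:7:")
```

Only 5 keys match, yet all 1,000 keys are examined across 100 calls; the cost grows with the size of the keyspace, not with the number of matches.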

The solution was:

  • Optimize the Redis key design to reduce unnecessary cache keys;
  • Stop using the SCAN command and clear the cache by looking up keys through exact match.
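Exact-match clearing works when the key names can be derived from data the system already has. The sketch below assumes a hypothetical key layout of one key per (activity, SKU) pair and uses a dict in place of Redis; with a real client the derived keys would be passed to a delete command such as DEL.

```python
def cache_keys_for(activity_id, sku_ids):
    """Hypothetical key layout: one key per (activity, SKU) pair."""
    return [f"promo:{activity_id}:sku:{sku}" for sku in sku_ids]


def clear_activity_cache(cache, activity_id, sku_ids):
    """Delete exactly the keys this activity owns; no keyspace scan needed."""
    removed = 0
    for key in cache_keys_for(activity_id, sku_ids):
        if cache.pop(key, None) is not None:
            removed += 1
    return removed


cache = {
    "promo:7:sku:1001": "9.5",
    "promo:7:sku:1002": "9.0",
    "promo:8:sku:1001": "8.8",
}
removed = clear_activity_cache(cache, 7, [1001, 1002, 1003])
```

The work is proportional to the number of SKUs in the activity, regardless of how many other keys the Redis instance holds.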

4.2 Hot key issues

The promotion system relies heavily on Redis caching to improve performance, and most cached data is keyed by SKU. In business scenarios such as new product launches and promotions for specific phone models, hot keys arise very easily.

Hot keys cluster on particular nodes, which unbalances the load across the Redis cluster and makes the whole system unstable. This problem cannot be solved simply by adding machines. The following figure shows the Redis load during an online stress test:

There are two commonly used solutions:

  • Hashing scheme: hash the Redis key so that its data is spread evenly across the Redis Cluster nodes, removing the clustering effect of hot keys.
  • Multi-level caching scheme: add a local cache for hot keys to maximize access performance and reduce the load on Redis nodes.

We adopted the multi-level caching scheme. Drawing on an excellent open source hot-key caching framework, we customized and extended a hot-key solution that supports hot-key detection, local caching, cluster broadcasting, and hot-key preheating. Hot keys are detected in near real time and broadcast to every instance in the cluster, which then caches them locally. This largely prevents floods of repeated calls from hitting the distributed cache and improves the efficiency of the system.
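The detection-plus-local-cache part of this approach can be sketched in a single process as follows. This is not the open source framework mentioned above; `HotKeyDetector`, the fixed threshold, and the omission of time windows, cluster broadcasting, and preheating are all simplifications for illustration.

```python
from collections import Counter


class Remote:
    """Dict-backed stand-in for Redis that counts reads."""

    def __init__(self, data):
        self.data = data
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self.data[key]


class HotKeyDetector:
    def __init__(self, threshold):
        self.threshold = threshold
        self.counts = Counter()  # a real system would reset this per time window
        self.hot = set()

    def record(self, key):
        """Count one access and report whether the key is now hot."""
        self.counts[key] += 1
        if self.counts[key] >= self.threshold:
            self.hot.add(key)
        return key in self.hot


def get(key, detector, local, remote):
    if key in local:
        return local[key]        # hot key served from the local cache
    value = remote.get(key)
    if detector.record(key):
        local[key] = value       # promote the key once it is detected as hot
    return value
```

With a threshold of 3, ten reads of the same SKU key reach the remote cache only three times; the remaining seven are served locally.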

V. Summary

This article is an overview of the vivo mall promotion system. It briefly reviews how the system's business capabilities were built and how it is architected, and shares the technical problems we encountered and their solutions. In future articles we will cover the design of each core functional module of the promotion system (promotion management, promotion pricing, price monitoring, and time travel) one by one, so stay tuned.

Author: vivo Internet official mall development team
