
1. Introduction

As the user base grew rapidly, the monolithic architecture of vivo official mall v1.0 gradually exposed its drawbacks: modules became increasingly bloated, development efficiency dropped, performance bottlenecks appeared, and the system grew hard to maintain.

The v2.0 architecture upgrade, started in 2017, physically split the system into vertical systems along business-module lines. Each resulting business line has its own responsibilities, exposes its capabilities as services, and together they support the main site business.

The commodity module is the core of the entire transaction link. As more and more modules were added, system performance suffered severely, so a service-oriented transformation became imperative.

This article describes the problems we encountered while building the vivo mall commodity system and how we solved them, and shares our experience in architecture design.

2. The evolution of the commodity system

We separated the commodity module from the mall into an independent commodity system and gradually pushed it down to the foundation layer, where it provides standardized basic services to the mall, search, membership, and marketing systems.

The commodity system architecture diagram is as follows:

The early commodity system was rather messy and contained many business modules, such as commodity activities, flash sales (seckill), and inventory management. As the business kept growing, the commodity system carried more and more business logic, which made it hard to extend and maintain.

We therefore decided to gradually push the commodity business down until it became the lowest-level, most basic business system, providing high-performance services to its many callers. The following sections describe the upgrade history of the commodity system.

2.1 Splitting out commodity activities and gifts

As commodity activities multiplied and their gameplay diversified, the number of activity-related attributes grew accordingly. These attributes are not strongly tied to commodity information and lean more toward user marketing, so they should not be coupled with the core commodity business. We therefore merged them into the mall promotion system.

Gifts are not limited to mobile phones and accessories; they may also be points, memberships, and so on. These do not belong in the commodity system or the commodity module, so gifts were merged into the mall promotion system at the same time.

2.2 An independent flash-sale system

As is well known, a flash-sale (seckill) activity has the following characteristics:

  • Limited time: the time window is very short, and the activity ends once the set time has passed
  • Limited quantity: the quantity on offer is very small, far lower than the actual inventory
  • High traffic: the low price attracts a very large number of users

Given these characteristics, running a flash sale well is not something achieved overnight. Because system resources were shared, a sudden traffic spike could cause other services in the commodity system to deny service and risk blocking the core transaction link. We therefore split flash sales out into a separate flash-sale system that serves external callers on its own.

2.3 Establishment of the agency sales system

Our mall mainly sells mobile phones and mobile phone accessories, so the range of product categories is relatively narrow. To address the lack of non-phone categories, the operations team considered cooperating with well-known e-commerce companies to bring in more product categories.

To make later expansion easier and avoid intruding on the existing system, we set up a dedicated subsystem to handle the agency sales business. The long-term goal is a complete platform that offers open APIs, so that other e-commerce companies can proactively integrate with our business.

2.4 Splitting out inventory

Pain points of inventory management:

  • Our inventory was tracked at the product level, with a single field holding the quantity. Every product edit required adjusting that product's inventory, so inventory could not be managed dynamically;
  • The marketing system also had its own activity inventory mechanism, so the management entry points were scattered and only loosely related;
  • Saleable inventory and activity inventory were both configured against actual inventory, which made configuration errors likely.

Based on these pain points, and to make inventory easier for operations staff to manage while laying the foundation for selling against actual inventory in the future, we set up an inventory center that provides the following main functions:

  • Real-time synchronization with the actual inventory in ecms;
  • Estimating the shipping warehouse and shipping time of a product from the warehouse distribution of its actual inventory, and from those the expected delivery time;
  • Low-inventory warnings, calculated from available inventory, average monthly sales, and similar figures, which dynamically remind operations staff to reorder.
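
As a rough illustration of the low-inventory warning, the sketch below estimates how many days the available stock would last at the average sales rate. The names `StockSnapshot`, `days_of_cover`, and `needs_reorder`, the 30-day month, and the 7-day threshold are all assumptions made for this example; the article does not describe the actual formula.

```python
from dataclasses import dataclass


@dataclass
class StockSnapshot:
    sku: str                    # hypothetical SKU identifier
    available: int              # units currently available for sale
    avg_monthly_sales: float    # average units sold per month


def days_of_cover(snapshot: StockSnapshot, days_per_month: int = 30) -> float:
    """Estimate how many days the available stock lasts at the average sales rate."""
    daily_sales = snapshot.avg_monthly_sales / days_per_month
    if daily_sales <= 0:
        return float("inf")     # no recorded sales, so the stock is never exhausted
    return snapshot.available / daily_sales


def needs_reorder(snapshot: StockSnapshot, min_days: float = 7.0) -> bool:
    """Flag a SKU whose estimated cover is below the assumed threshold."""
    return days_of_cover(snapshot) < min_days
```

A scheduled job could run `needs_reorder` over all SKUs and notify operations staff about the ones that return `True`.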

3. Challenges

As the lowest-level system, the commodity system's main challenges are stability, high performance, and data consistency.

3.1 Stability

  • Avoid single-node bottlenecks: choose the number of nodes based on stress-test results, so that capacity is not wasted but the cluster can still cope with sudden traffic.
  • Rate limiting and degradation: core interfaces are rate limited so that system availability comes first. When traffic puts too much pressure on the system, non-core business is degraded so that core business is protected first.
  • Reasonable timeouts: set reasonable timeouts for Redis and database access. They should not be too long, otherwise application threads can be exhausted under heavy traffic.
  • Monitoring and alerting: logs are standardized and fed into the company's log monitoring and alerting platform, so that problems are discovered proactively and promptly.
  • Circuit breaking: calls to external interfaces go through a circuit breaker, so that a failing external interface does not drag down the system.
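
To make the circuit-breaking idea concrete, here is a minimal sketch of a breaker that opens after a number of consecutive failures and allows a trial call after a cool-down. The class name, thresholds, and injectable `clock` are assumptions for illustration; the article does not say which circuit-breaker component the team uses.

```python
import time


class CircuitBreaker:
    """Open after `max_failures` consecutive errors; allow a trial call after `reset_after` seconds."""

    def __init__(self, max_failures=3, reset_after=5.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock          # injectable so the behaviour can be tested
        self.failures = 0
        self.opened_at = None       # None means the circuit is closed

    def call(self, func, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()   # still open: skip the external call entirely
            self.opened_at = None   # cool-down elapsed: let one trial call through
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0           # a success resets the failure count
        return result
```

While the circuit is open, callers get the fallback immediately instead of waiting on a failing dependency, which keeps application threads free for core traffic.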

3.2 High performance

Multi-level cache

To speed up queries and reduce pressure on the database, we use a multi-level cache. The interfaces are wired to a hotspot cache component that detects hot data dynamically: hot data is served directly from the local cache, and everything else is read from Redis.
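
The lookup path described above might look roughly like the following sketch. A plain dict stands in for Redis, and `is_hot` is a hypothetical callback standing in for the hotspot detection component; neither reflects the real component's API.

```python
class MultiLevelCache:
    """Serve hot keys from an in-process cache and everything else from the remote cache."""

    def __init__(self, remote, is_hot):
        self.local = {}         # in-process cache, only used for hot keys
        self.remote = remote    # dict standing in for Redis
        self.is_hot = is_hot    # hypothetical hotspot detector: key -> bool

    def get(self, key):
        if self.is_hot(key):
            if key in self.local:
                return self.local[key]      # hot and cached locally: no network call
            value = self.remote.get(key)
            if value is not None:
                self.local[key] = value     # keep a local copy for later requests
            return value
        return self.remote.get(key)         # not hot: read from the remote cache
```

A real local cache would also expire its entries after a short time so that stale data does not linger; that is left out here to keep the lookup order visible.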

Read-write separation

The database uses a read-write separation architecture: the primary database handles updates, and the replica databases handle queries.
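
A minimal sketch of how statements could be routed under this architecture is shown below. The `ReadWriteRouter` class and the round-robin choice of replica are assumptions for illustration; in practice this routing is usually handled by a data-source proxy or middleware rather than hand-written code.

```python
class ReadWriteRouter:
    """Send SELECT statements to replicas in round-robin order, everything else to the primary."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)
        self._next = 0

    def route(self, sql):
        parts = sql.split(None, 1)
        verb = parts[0].upper() if parts else ""
        if verb == "SELECT" and self.replicas:
            target = self.replicas[self._next % len(self.replicas)]
            self._next += 1
            return target
        return self.primary     # writes, and reads when no replica is configured
```

Because replication is asynchronous in most setups, a read that must see a just-committed write may still need to go to the primary.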

Interface rate limiting

Interfaces that operate on the database directly go through a rate-limiting component, which prevents sudden traffic or non-standard calls from increasing database pressure and affecting other interfaces.
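
The article does not say which algorithm the rate-limiting component uses. As one common choice, the sketch below shows a token bucket; the class name and parameters are assumptions for illustration.

```python
import time


class TokenBucket:
    """Allow up to `capacity` requests in a burst, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock              # injectable so the behaviour can be tested
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self):
        now = self.clock()
        # Refill according to the elapsed time, never beyond the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                    # caller should reject or degrade the request
```

An interface that hits the database would call `allow()` before running its query and return a degraded response when it gets `False`.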

We did run into some pitfalls in the early days:

1. The product list query created too many Redis keys, risking Redis running out of memory

Because this is a list query, we cached results under a unique key obtained by hashing the input parameters. The input can contain many products, and in some scenarios the parameters change constantly. Given the number of possible permutations and combinations, almost every request produced a new key, went back to the source, and was cached again, which could lead to database denial of service or Redis memory overflow.

Option 1: loop over the input list, fetch each product from Redis individually, and then return the combined result;

This solves the memory overflow caused by too many keys, but it clearly adds a lot of network round trips. With dozens of keys, the impact on performance is significant. Is there a way to reduce the network round trips? That leads to the second option.

Option 2: we enhanced our Redis component. Because Redis Cluster does not support MGET across keys in different hash slots, we implemented an mget-style batch read with pipelines. Each product only needs to be cached once, and the batch read greatly improves query speed.

This solves both the excessive number of keys and the repeated network round trips of the first option. In stress-test comparisons, the second option performed more than 50% better than the first, and the more keys there are, the more pronounced the effect.
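
The difference between the two options can be sketched with an in-memory stand-in for a Redis cluster that counts network round trips. Everything here is an assumption for illustration: `FakeCluster` is not a real client, and it places keys with CRC32 modulo the node count, whereas a real Redis Cluster uses CRC16 over 16384 hash slots.

```python
import zlib
from collections import defaultdict


class FakeCluster:
    """In-memory stand-in for a Redis cluster that counts network round trips."""

    def __init__(self, n_nodes):
        self.nodes = [dict() for _ in range(n_nodes)]
        self.round_trips = 0

    def node_of(self, key):
        return zlib.crc32(key.encode()) % len(self.nodes)

    def set(self, key, value):
        self.nodes[self.node_of(key)][key] = value

    def get(self, key):
        self.round_trips += 1               # one round trip per key
        return self.nodes[self.node_of(key)].get(key)

    def pipeline_get(self, node, keys):
        self.round_trips += 1               # one round trip per pipelined batch
        return [self.nodes[node].get(k) for k in keys]


def product_key(product_id):
    return f"product:{product_id}"          # one cache entry per product


def get_one_by_one(cluster, product_ids):
    """Option 1: fetch each product with its own request."""
    return {pid: cluster.get(product_key(pid)) for pid in product_ids}


def get_batched(cluster, product_ids):
    """Option 2: group keys by node and send one pipeline per node."""
    by_node = defaultdict(list)
    for pid in product_ids:
        by_node[cluster.node_of(product_key(pid))].append(pid)
    result = {}
    for node, pids in by_node.items():
        values = cluster.pipeline_get(node, [product_key(p) for p in pids])
        result.update(zip(pids, values))
    return result
```

With per-product keys, the number of cached entries is bounded by the number of products rather than by the number of parameter combinations, and the batched read needs at most one round trip per node instead of one per product.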

2. Hot data turned a single Redis node into a bottleneck

The mall often holds new product launch events. After an event, users jump straight to the new product's detail page, which then receives a sudden, extremely large volume of traffic for the same data. This leaves the Redis nodes unevenly loaded: some are under 10% while others exceed 90%, and conventional scaling does not help.

We have the following solutions for hot data:

  • Hash the key so that it is spread across different nodes
  • Use a local cache
At first, we built a local caching component on the open source Caffeine library. It counts requests locally and caches the data once a threshold is reached. The cache time varies by business scenario but is generally no more than 15 seconds; its main purpose is to solve the hot data problem.
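
The threshold-based behaviour can be sketched as follows. This is a plain Python illustration, not Caffeine and not the team's component: the class name, the per-key counting window, and the default values other than the 15-second upper bound are assumptions.

```python
import time


class HotKeyCache:
    """Cache a key locally once it is requested `threshold` times within `window` seconds."""

    def __init__(self, loader, threshold=100, window=1.0, ttl=15.0, clock=time.monotonic):
        self.loader = loader        # fetches from the next tier, for example Redis
        self.threshold = threshold
        self.window = window
        self.ttl = ttl              # local entries live for at most this many seconds
        self.clock = clock          # injectable so the behaviour can be tested
        self.counts = {}            # key -> (window_start, hits)
        self.cache = {}             # key -> (expires_at, value)

    def get(self, key):
        now = self.clock()
        entry = self.cache.get(key)
        if entry and entry[0] > now:
            return entry[1]                     # fresh local copy
        start, hits = self.counts.get(key, (now, 0))
        if now - start >= self.window:
            start, hits = now, 0                # start a new counting window
        hits += 1
        self.counts[key] = (start, hits)
        value = self.loader(key)
        if hits >= self.threshold:
            self.cache[key] = (now + self.ttl, value)   # key is hot: cache it locally
        return value
```

The short time-to-live bounds how stale a locally cached product can be, which is why it is kept to seconds rather than minutes.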

Later, we replaced it with a hotspot cache component developed in-house, which supports dynamic hotspot detection, hotspot reporting, cluster broadcasting, and other functions.

3.3 Data consistency

1. Data consistency with Redis is relatively easy to handle, using the "Cache Aside Pattern":

For read requests, read the cache first and return directly on a hit; on a miss, read the database and then populate the cache. For write requests, update the database first and then delete the cache entry.
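
A minimal sketch of the pattern, with dicts standing in for the database and Redis and a hypothetical `ProductRepository` class:

```python
class ProductRepository:
    """Cache-aside access: reads populate the cache, writes invalidate it."""

    def __init__(self, db, cache):
        self.db = db            # dict standing in for the database
        self.cache = cache      # dict standing in for Redis

    def read(self, key):
        value = self.cache.get(key)
        if value is not None:
            return value                # cache hit
        value = self.db.get(key)        # cache miss: read the database
        if value is not None:
            self.cache[key] = value     # populate the cache for later reads
        return value

    def write(self, key, value):
        self.db[key] = value            # update the database first
        self.cache.pop(key, None)       # then delete the cache entry
```

Deleting the entry rather than updating it means the next read reloads the value from the database, so two concurrent writers cannot leave their cache updates in the wrong order.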

2. Although inventory was split out, its maintenance entry point remained in the commodity system. This leads to cross-database operations, which ordinary single-database transactions cannot handle.

At first, we handled this by catching exceptions and rolling back the local transaction. It is somewhat more cumbersome to work with, but it does solve the problem.
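
The exception-capture approach might look roughly like the sketch below, which uses SQLite as a stand-in for the commodity database. The table, the function name, and the `update_remote_stock` callback (standing in for the call to the inventory system) are assumptions for illustration.

```python
import sqlite3


def update_product_and_stock(conn, product_id, name, stock, update_remote_stock):
    """Update the local product row and the remote stock; roll back locally on any failure."""
    try:
        conn.execute("UPDATE product SET name = ? WHERE id = ?", (name, product_id))
        update_remote_stock(product_id, stock)  # cross-database call; may raise
        conn.commit()                           # commit only after the remote call succeeds
        return True
    except Exception:
        conn.rollback()                         # undo the local change
        return False
```

One limitation of this approach is that if the remote call succeeds but the local commit then fails, the remote change is not undone, which is the kind of gap a distributed transaction component is designed to close.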

Later, we built a distributed transaction component on the open source Seata project, adapting its code to bring it into the company's base components. It is now integrated and in use.

4. Summary

This article described how we split up the mall commodity system and gradually pushed it down to become the most basic system, giving it a single, focused responsibility and the ability to provide high-performance commodity services. It also shared the technical problems we encountered along the way and how we solved them. Follow-up articles will cover the evolution of the inventory system and distributed transactions, so stay tuned.

Author: vivo official website mall development team-Ju Changjiang
