Double Eleven is here again, and the best gift for engineers is a technical guide to big promotions. After years of development, big promotions are no longer limited to e-commerce; industries of all kinds now run their campaigns in a similar way. The automotive industry has 818, and e-commerce has 618 and 11.11, to name a few. Big-promotion scenarios pose many new challenges for infrastructure software, databases included, and have also produced many best practices.

Ahead of Double Eleven, PingCAP is holding a series of in-depth conversations with users such as Autohome, Bitauto, JD.com, and Zhongtong (ZTO Express) to explore two questions: what technical problems lie behind sales figures that soar year after year, and what kind of architecture can reliably carry the traffic peaks?

818 Global Automobile Festival

China's internet has three major shopping festivals: 11.11, 618, and 818.

Everyone is familiar with 618 and 11.11, but 818 is more unusual: it is a carnival created specifically for car buyers. Autohome's "818 Global Car Night" is the auto industry's top festival, created jointly by Autohome and Hunan Satellite TV. As of this year, it has been held successfully three times.

Compared with the other two shopping festivals, 818 is arguably one of a kind: no comparable event exists in any other country, including those with the most developed auto markets. Zhang Fan, a senior engineer at Autohome, explained: “I personally feel that no country on the planet can surpass China in e-commerce and online transactions. As for why Autohome was the first to do this: first of all, Autohome is the world's most visited automotive website. It is this cohesion and user base that lets Autohome run such an event and bring it to so many users. In addition, the original intention of the event is to give car owners and car enthusiasts an opportunity, similar to 11.11 and 618, to get a real discount when buying a car, so it has been widely welcomed by users.”

The "818 Global Car Night", co-produced by Autohome and Hunan Satellite TV, started in 2019 and has now run for three years. Unlike a traditional big promotion, it pushes the car shopping festival to its peak by pairing a live TV broadcast with synchronized in-app activities, bringing the auto industry a car-buying feast every August.

Challenges Brought by the 818 Live Broadcast

Zhang Fan said frankly that the live broadcast is the hardest part of Autohome's 818 event. Unlike a pre-recorded show, a live broadcast is full of variables: the program may run long, the host may improvise, and the back end may face sudden bursts of data processing. The highlights of the gala, such as the one-yuan flash sale for a car, the red envelope lottery, and the Super Koi draw, are the segments with the highest user participation and the highest traffic peaks. The start and end of each of these activities must be coordinated between the stage and the back end with second-level precision.

On the day of the broadcast, Autohome sends a team to the Hunan Satellite TV broadcast site to stay in real-time contact with the "war room" in Beijing over several channels at once: mobile phones, landlines, 5G walkie-talkies, and online video links. Because the TV broadcast signal usually lags the on-site signal by about one minute, once the on-site host announces that a flash sale is starting, the back end has only about one minute to prepare. One minute later, millions of viewers in front of their TVs see the "three, two, one" countdown on their phones, the flash-sale button lights up, and they can tap it to take part. There is no room for error in this process; what happens on the phone must match the broadcast exactly.
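
The timing constraint described above comes down to simple arithmetic. The sketch below is only an illustration, not Autohome's implementation: it assumes a fixed 60-second broadcast delay and a 3-second on-screen countdown, and the function name `schedule_flash_sale` and the sample timestamp are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Tuple

# Assumptions for illustration: the article says the TV signal lags the
# on-site signal by about one minute and that viewers see a "3, 2, 1" countdown.
BROADCAST_DELAY = timedelta(seconds=60)
COUNTDOWN = timedelta(seconds=3)


def schedule_flash_sale(announced_on_site: datetime) -> Tuple[datetime, datetime]:
    """Return (countdown_start, button_live) for an on-site announcement time."""
    # TV viewers hear the announcement one broadcast delay after the studio does.
    button_live = announced_on_site + BROADCAST_DELAY
    # The countdown must finish exactly when the button goes live.
    countdown_start = button_live - COUNTDOWN
    return countdown_start, button_live


announced = datetime(2021, 8, 18, 20, 30, 0)  # made-up timestamp
countdown_start, button_live = schedule_flash_sale(announced)
print("countdown starts:", countdown_start.time())
print("button goes live:", button_live.time())
```

In practice the delay would have to be measured rather than hard-coded, which is one reason the on-site team and the war room stay in constant contact.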

For Zhang Fan and his colleagues in the "war room", the whole process was plain to see. The room has large data screens, large monitoring screens, the on-site feed, and the TV broadcast feed. Every time a flash sale or a red envelope round starts, the lines on the monitoring screens spike almost vertically as the number of participants and interactions rises. The team calls these business-metric curves "electrocardiograms". When popular stars appear in the broadcast, the spikes can be 2 to 4 times higher than in other periods.

Meanwhile, the on-site data screen displays about 20 metrics in real time, refreshing every 1 to 2 seconds: the number of event participants, the number of user interactions, the distribution of prizes, and even, for the current round of the one-yuan flash sale, who the participating users are, where they are, and which car has been won.

These real-time data are seen not only by the back-end staff but are also displayed at the broadcast site, where they play an important role in building the atmosphere of the live event. For example, when viewers see on screen that the gala is popular and that many people really are taking part in the one-yuan car flash sale, it encourages them to join in as well.
In this process, the real-time data screen does more than keep up with real-time transactions; it also feeds real-time analytics back to the on-site host. When the host announces the winners almost in real time, the atmosphere of the gala is pushed even higher, which plays a key role in attracting more participants. And the more expensive the flash-sale cars become, the higher the traffic peaks the back-end systems have to absorb. Compared with Autohome's everyday business, traffic during the gala grew more than tenfold, so the pressure on the entire system is easy to imagine.

Autohome's Big-Promotion Solution: A Full Suite of Distributed Systems

Big-promotion scenarios usually require systems that can scale out quickly and stay highly available, and distributed systems are naturally suited to this. Autohome uses a full suite of distributed systems, including databases, queues, and caches.

Among them, the distributed database provides three main capabilities: horizontal scalability, disaster tolerance, and cloud readiness. TiDB, built on a distributed architecture, has supported these from the start, and they have been well validated in Autohome's scenarios.

Tao Huixiang, head of databases at Autohome, said that traditional relational databases such as MySQL and SQL Server often hit the load ceiling of a single machine when the data volume becomes very large. With TiDB, every layer, from TiDB Server to TiKV and PD, can be scaled out horizontally, and performance improves roughly linearly as nodes are added, which meets Autohome's performance and scalability requirements.

818 is Autohome's most important event of the year, so the system has to be as reliable and stable as possible. For this 818 event, Autohome therefore deployed its TiDB cluster across three data centers in the same city on the public cloud, so that a problem in any single data center would not affect the service quality of the event as a whole.

In the same-city three-data-center solution, a single TiDB cluster is deployed across three data centers in one city, and data is replicated among them by the cluster itself through the Raft protocol. All three data centers can serve reads and writes at the same time. When any one data center fails, the cluster restores service automatically, without manual intervention, while preserving data consistency. This architecture successfully supported the business during the 818 gala and ran very stably.
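
Whether a Raft group keeps serving after a data center fails depends on whether a majority of its replicas is still alive. The following minimal sketch checks that condition for a few replica placements. The placements and the names `majority` and `survives_any_single_dc_loss` are hypothetical; the article does not describe how replicas were actually distributed.

```python
def majority(replicas: int) -> int:
    """Smallest number of replicas that forms a Raft majority."""
    return replicas // 2 + 1


def survives_any_single_dc_loss(placement: dict) -> bool:
    """True if a majority of replicas remains after losing any one data center.

    `placement` maps a data center name to the number of replicas it holds.
    """
    total = sum(placement.values())
    return all(total - lost >= majority(total) for lost in placement.values())


# Hypothetical placements, for illustration only.
placements = {
    "3 replicas, 1-1-1": {"dc1": 1, "dc2": 1, "dc3": 1},
    "5 replicas, 2-2-1": {"dc1": 2, "dc2": 2, "dc3": 1},
    "5 replicas, 3-1-1": {"dc1": 3, "dc2": 1, "dc3": 1},
}

for name, placement in placements.items():
    print(name, "->", survives_any_single_dc_loss(placement))
```

The point of the comparison is that no single data center may hold a majority of the replicas on its own; otherwise losing that data center also loses the quorum.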

Autohome 818 TiDB cluster overall architecture diagram

For this 818 project, Autohome used the latest TiDB version at the time, 5.1.1, and the MySQL version was Percona 5.7.25;

TiFlash is a key component of TiDB's HTAP architecture. It is a columnar storage extension of TiKV, used mainly for OLAP workloads. TiFlash is deployed across regions to improve disaster tolerance, and Autohome uses it to run the statistical analysis SQL whose results are shown on the big screen in real time;

TiCDC is TiDB's incremental data replication tool, implemented by pulling TiKV change logs. It can restore data to a state consistent with any upstream TSO and lets other systems subscribe to data changes. TiCDC is deployed across regions and replicates TiDB cluster data to the downstream MySQL database in real time as a backup for emergencies, improving business disaster tolerance;

The MySQL primary-replica deployment serves as the emergency fallback and degradation path for the TiDB cluster, further improving business disaster tolerance.

Database Stress Testing

Before the 818 event, the database team and the business teams carried out rigorous failure drills and stress tests to ensure the high availability of the back end.

Tao Huixiang said that Autohome runs many kinds of failure drills. For the database alone, the team drilled primary database failures and data center failures, three rounds in total. TiDB performed very well in every round: recovery from a TiKV failure generally took a few tens of seconds, and in some drills only about 20 seconds. Even when an entire data center failed, the cluster switched over automatically within one minute.

To keep the event running smoothly, PingCAP community technical experts have supported Autohome for three consecutive years. During this year's stress tests, the community experts and Autohome's DBAs tuned the cluster together, resolved the write hotspot problem, and improved performance severalfold. At the 818 peak, TiDB supported 90.48 million interactions from app users during the gala and sustained a peak write rate of 400,000 rows per second, with P99 SQL latency staying below 30 ms. TiCDC also performed strongly, replicating to the downstream MySQL at up to 130,000 rows per second. The cross-center TiFlash MPP architecture powered the near-real-time big-screen display of figures such as the total number of assists, flash-sale totals, and the users participating in each lottery round.
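
The article does not say how the write hotspot was resolved. As general background, hotspots in a range-partitioned store typically appear when keys increase monotonically, so that every new row lands in the same key range; TiDB provides table options such as `SHARD_ROW_ID_BITS` and `AUTO_RANDOM` that scatter row IDs for this reason. The toy simulation below only illustrates the idea. The region count, key width, and the `scattered_key` function are hypothetical and do not reflect TiKV's actual key encoding.

```python
from collections import Counter

# Toy model: a 16-bit key space split evenly into 8 contiguous ranges ("regions").
NUM_REGIONS = 8
KEY_BITS = 16
SHARD_BITS = 3  # 2**3 == NUM_REGIONS
REGION_SIZE = (1 << KEY_BITS) // NUM_REGIONS


def region_of(key: int) -> int:
    """Index of the contiguous key range that holds `key`."""
    return key // REGION_SIZE


def scattered_key(seq: int) -> int:
    """Place a few shard bits in the high bits of the key (illustrative only)."""
    shard = seq % (1 << SHARD_BITS)
    low = seq & (REGION_SIZE - 1)
    return (shard << (KEY_BITS - SHARD_BITS)) | low


seqs = range(1000, 2000)  # 1,000 consecutive inserts
sequential_load = Counter(region_of(s) for s in seqs)
scattered_load = Counter(region_of(scattered_key(s)) for s in seqs)

print("sequential keys:", dict(sequential_load))
print("scattered keys: ", dict(sorted(scattered_load.items())))
```

With sequential keys every write in this toy model hits a single range, while the scattered keys spread the same writes evenly across all eight ranges.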

Tao Huixiang rated TiDB's performance in the big promotion very highly: TiDB is well suited to scenarios with data volumes above the one-billion level. For one thing, TiDB's analytical capability is real-time; for another, its data storage capacity is far better than that of traditional databases such as SQL Server. TiDB combines the strengths of traditional data warehouses and traditional relational databases, which makes it a good fit for businesses of this scale.

Best Practices and Future Plans

Autohome's database team also distilled several best practices from this 818 promotion:

For an architecture like the same-city three-center, five-replica deployment, the network latency between data centers should be as low as possible, preferably within 2 ms;

For OLTP workloads, the stress-test bottleneck is usually TiKV disk I/O; with ordinary SSDs, RAID 0 can be used to increase IOPS;

When an availability zone fails, manual intervention is normally not required, but to avoid severe performance degradation it is recommended to manually reduce the replica count from five to three;

Design table schemas and indexes carefully to avoid hotspots, run full stress tests together with the business teams, and find and fix problems as early as possible during stress testing.
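
The article does not explain why reducing five replicas to three helps after an availability zone fails. One plausible contributing factor is quorum arithmetic, sketched below under stated assumptions: replicas are placed 2-2-1 across three zones, and a zone holding two replicas is lost. The helper `spare_acks` is a hypothetical name.

```python
def majority(replicas: int) -> int:
    """Smallest number of replicas that forms a Raft majority."""
    return replicas // 2 + 1


def spare_acks(configured: int, alive: int) -> int:
    """Live replicas beyond the majority needed to commit a write.

    0 means every live replica must acknowledge each write;
    a negative value means no majority is available at all.
    """
    return alive - majority(configured)


# Assumed layout: 5 replicas placed 2-2-1; a zone holding 2 replicas fails.
alive = 5 - 2

print("still configured for 5 replicas:", spare_acks(5, alive))
print("reconfigured to 3 replicas:     ", spare_acks(3, alive))
```

With the group still configured for five replicas, all three survivors must acknowledge every write, so the slowest one sets the commit latency. After the group is reconfigured to three replicas, a write commits once any two of them respond.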

Given the good results in this event, Tao Huixiang said that Autohome will continue to roll out TiDB in more of its businesses. For example, much of Autohome's data has so far been processed in Hive, so the team only learns the next day what happened the day before. With TiDB, the user data and business metrics that operations need can be analyzed and pushed in near real time, at second-level granularity; the delay is expected to shrink to 5 to 10 seconds. The business side can then see almost immediately what users have just done and which data has been updated.

For enterprises, 618 and 11.11 do more than support business innovation; they are also a major drill and full-link exercise for their own technical architecture. Such extreme tests greatly strengthen a company's IT architecture, organizational processes, and engineering skills. The experience and insight gained along the way also speed up day-to-day business innovation, improve the efficiency of technology-driven innovation, and create new engines for growth.


PingCAP