This article is based on Xiao Bo's live talk at the 2021 vivo Developer Conference. Reply [2021VDC] on the official account to obtain the materials for the Internet technology track topics.

1. Background of database and storage platform construction

Using history as a mirror lets us see rise and fall, and the same is true of technology. Before introducing the platform, let's first review how vivo's Internet business has developed in recent years.

Let's go back three years and look at how vivo's Internet products have developed. In November 2018, vivo's mobile Internet reached a cumulative total of more than 220 million users. In 2019, Internet applications such as the app store, browser, video, and wallet each exceeded 10 million daily active users. In 2020, the browser alone exceeded 100 million daily active users. By 2021, online users (excluding export markets) totaled 270 million, dozens of applications had more than 100 million monthly active users, and database and storage products had reached a scale of 4,000+ servers and 50,000+ database instances.

What was the database and storage platform like three years ago?

If the state of database services in 2018 had to be summed up in one phrase, I think "as precarious as a pile of stacked eggs" fits best, which was mainly reflected in the following points:

  • Availability of the online database environment was often affected by inefficient SQL, human error, unreasonable infrastructure, and the limited robustness of open source products.
  • Changes were not standardized, change and routine operation and maintenance efficiency was low, there was no platform support, and changes were made directly from command-line terminals.
  • Database usage costs were extremely high, and a lot of additional cost was incurred to cope with increasingly complex business scenarios.
  • Security capabilities were not sound enough; data classification and grading, passwords, accounts, and permissions were not standardized.

Let's take a look at the changes in some operational data of the vivo database over the years.

Compared with the end of 2017 and the beginning of 2018, over the past three years the number of database instances has grown nearly 5 times, the number of database servers maintained has grown 6.8 times, the single-machine deployment density of database instances has increased more than 5 times, and, measured per person, the number of database instances maintained has grown 14.9 times.

These figures show that vivo's Internet business has in fact been developing rapidly in recent years. In the process of this rapid growth, whether judged by the service quality experienced by users or by internal cost and efficiency, solving the data storage problem had become urgent, so in 2018 we launched a plan to build our own database and storage platform. After several years of construction we have built up some initial capabilities, which I will briefly introduce below.

2. Capacity building of database and storage platform

First, let's introduce the database and storage platform products as a whole, which are mainly divided into two layers.

  • The first layer consists of our database and storage products, including relational databases, non-relational databases, and storage services.
  • The second layer consists mainly of tool products, including an R&D and operations platform that provides unified management of databases and storage, a data transmission service, GUI-based ("white screen") operation and maintenance tools, and products such as SQL audit, SQL optimization, and data backup.

The tool products are mainly self-developed. For the underlying database and storage products we give priority to mature open source products, and at the same time we extend open source products or build purely self-developed ones where that better serves business development. Below is a brief introduction to the capabilities of some of these products.

The DaaS platform (Database as a Service) aims to provide a highly self-service, highly intelligent, highly available, and low-cost platform for using and managing data storage. It covers the entire life cycle of database and storage products, from service application and deployment through maintenance to decommissioning, and delivers value to the company and users in four ways.

  • The first is improving the availability of database products: through inspection, monitoring, planning, and fault tracking, we prevent faults in advance, handle them promptly when they occur, and review and summarize afterwards to close the loop on the whole process.
  • The second is improving R&D efficiency: R&D engineers use the database in a self-service way, with functions such as change detection and optimization diagnosis that reduce manual communication, making project changes standardized and the process clear, thereby improving R&D efficiency.
  • The third is improving data security, comprehensively ensuring it through a series of means such as permission control, password control, data encryption, data desensitization, operation auditing, and backup encryption.
  • The fourth is reducing the operating costs of database and storage products: first, automated processes reduce DBAs' repetitive work and improve human efficiency; second, service orchestration and resource scheduling improve the resource utilization of database and storage services and continuously reduce operating costs.

After several years of construction, we have made some progress on the above work. We now handle thousands of demand work orders per month, more than 90% of which R&D engineers can complete by themselves. Service availability has been maintained above four nines in recent years. Platform coverage of our six database and storage products has reached more than 85%, and we try to keep capabilities consistent across database scenarios. For data changes, for example, we support pre-change statement review, backup of changed data, one-click rollback of change operations, and audit trails of change records for MySQL, Elasticsearch, MongoDB, and TiDB.

vivo's DTS is a fully self-developed data transmission service built around our own business needs. It mainly provides data interaction between data sources such as RDBMS, NoSQL, and OLAP systems, integrating data migration, subscription, synchronization, and backup into one service. Functionally it has three main features:

  • The first is the stability of synchronization links and the guarantee of data reliability. By leveraging the characteristics of each data source product, we ensure that data is neither duplicated nor lost, providing the business with a 99.99% service availability guarantee.
  • The second is support for a variety of heterogeneous database types at the functional level. In addition to common functions such as synchronization, migration, and subscription, we also support centralized storage and retrieval of changed data.
  • The third is fault disaster recovery. DTS supports node-level failover, can restore a synchronization link within seconds, and supports resumable transfer from breakpoints, which effectively handles transfer interruptions caused by hardware, network, and other failures (a minimal sketch of the breakpoint-resume idea follows this list).
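The following is a minimal sketch of the checkpoint idea behind breakpoint resume, assuming a hypothetical source that exposes change events with monotonically increasing positions (for MySQL this would be a binlog file name and offset). It illustrates the principle only and is not vivo's DTS implementation; source and target are hypothetical client objects.

    import json
    import os

    CHECKPOINT_FILE = "dts_checkpoint.json"  # hypothetical local checkpoint store

    def load_checkpoint():
        """Return the last persisted position, or None when starting fresh."""
        if os.path.exists(CHECKPOINT_FILE):
            with open(CHECKPOINT_FILE) as f:
                return json.load(f)["position"]
        return None

    def save_checkpoint(position):
        """Persist the position only after the target write has succeeded."""
        with open(CHECKPOINT_FILE, "w") as f:
            json.dump({"position": position}, f)

    def sync_loop(source, target):
        """Pull change events from the last checkpoint and apply them.

        After a crash or network interruption, restarting this loop resumes
        from the last saved position instead of re-reading from the start.
        """
        position = load_checkpoint()
        for event in source.read_changes(since=position):
            target.apply(event)              # idempotent apply, so no duplicates
            save_checkpoint(event.position)  # checkpoint after apply, so no loss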

Let's take a look at some of the work we have done at the underlying data storage layer, starting with the MySQL database.

As the most popular database, MySQL carries the bulk of the relational database services in vivo. The MySQL 2.0 in the figure above is our internal architecture version; over the past few years our architecture has evolved through two versions.

In the first version, in order to quickly solve the availability problems we faced at the time, version 1.0 was built on MHA plus self-developed components.

It has now evolved to version 2.0 and no longer depends on components such as MHA. Architecturally, the service access layer of version 2.0 lets the business connect via DNS or a name service, and a self-developed proxy layer has been added in the middle that is 100% compatible with MySQL syntax and protocol. On the proxy we have implemented three levels of read-write separation control, traffic control, transparent data encryption, a SQL firewall, log auditing, and other functions.

The proxy layer, combined with the underlying high-availability components, implements automatic and manual failover for MySQL clusters, and the availability of the HA control components themselves is guaranteed by the Raft mechanism. MySQL itself uses a primary-replica architecture and can be deployed across IDCs within the same region. Cross-region synchronization can be handled with the DTS product mentioned above; cross-region multi-active is not yet supported and belongs to the planned 3.0 architecture.
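As an illustration of the read-write separation done at the proxy layer, here is a minimal sketch that routes statements to a primary or replica connection. The primary and replicas objects are hypothetical connections with an execute(sql) method; vivo's actual proxy works at the MySQL protocol level, which this simplified example does not attempt to reproduce.

    import random

    class ReadWriteRouter:
        """Route SQL statements to the primary or to one of the replicas."""

        WRITE_PREFIXES = ("insert", "update", "delete", "replace",
                          "create", "alter", "drop", "truncate")

        def __init__(self, primary, replicas):
            self.primary = primary      # hypothetical connection objects
            self.replicas = replicas

        def execute(self, sql, in_transaction=False):
            stmt = sql.lstrip().lower()
            # Writes, and reads inside a transaction, go to the primary
            # to avoid anomalies caused by replication lag.
            if in_transaction or stmt.startswith(self.WRITE_PREFIXES):
                return self.primary.execute(sql)
            # Plain reads can be spread across the replicas.
            return random.choice(self.replicas).execute(sql)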

Redis, as a very popular and excellent KV storage service, is widely used within vivo. Over the course of vivo's Internet development we have used the standalone version of Redis as well as the primary-replica version; by now everything has been upgraded to cluster mode, and cluster-mode features such as automatic failover and elastic scaling have helped us solve many problems.

However, when a single cluster grows to the TB level and to more than 500 nodes, many problems remain. Driven by the need to solve these problems, we have done some modification and development work on Redis, mainly in three areas:

  • The first is multi-datacenter availability for Redis Cluster; for this we developed a multi-active version of Redis based on Redis Cluster.
  • The second is strengthening Redis data persistence, including reworking the AOF log, introducing AEP persistent-memory hardware, and the planned forkless RDB.
  • The third is enhancements to Redis replication and cluster mode, including asynchronous replication, file caching, and water-level control, as well as optimizing the time complexity of Redis Cluster commands. The complexity of these commands had caused us a lot of operational trouble; through algorithmic optimization it has been reduced considerably, and this code has been contributed back to the community (a small illustration of the problem follows this list).
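To make the command-complexity point concrete, the sketch below connects to a single node with redis-py and counts the topology lines returned by CLUSTER NODES; both the reply size and the server-side work of such commands grow with the number of nodes, which is why they become noticeably expensive on clusters with 500+ nodes. The host name is a placeholder.

    import redis

    # Connect to any one node of the cluster (placeholder address).
    node = redis.Redis(host="redis-node-1.example.internal", port=6379,
                       decode_responses=True)

    # CLUSTER NODES returns one text line per node in the cluster.
    topology = node.execute_command("CLUSTER", "NODES")
    lines = topology.splitlines()
    masters = [line for line in lines if "master" in line]

    print(f"{len(lines)} nodes total, {len(masters)} masters")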

After making these optimizations on Redis, we found that in-memory KV storage alone still could not meet business needs. Larger storage scales are coming, and disk-based KV storage products are needed for tiered data storage, so we developed our own disk KV storage product.

When we started the disk KV storage service R&D project, we clarified the basic demands of the business for storage.

The first is compatibility with the Redis protocol, so that projects originally using the Redis service can switch over easily.

The second is low storage cost, large storage space, and high performance, combined with basic operational requirements such as automatic failover and rapid scaling in and out.

In the end we chose to implement the disk KV storage service with TiKV as the underlying storage engine, encapsulating the Redis commands and the Redis protocol on top. Another reason for choosing TiKV is that TiDB is already part of our overall storage product system, which reduces the learning cost for operations staff and lets them get started quickly.

We have also developed a series of peripheral tools, such as a bulk-load tool for importing data from the big data ecosystem into the disk KV store, data backup and restore tools, and a Redis-to-disk-KV synchronization tool, which greatly reduce the cost of business migration. Our disk KV storage product is now widely used internally and supports multiple TB-level storage scenarios (see the sketch below for what protocol compatibility means for business code).
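Because the service speaks the Redis protocol, switching a project over is, in the simplest case, just a change of endpoint. The sketch below uses redis-py against placeholder addresses and only illustrates the compatibility idea; it is not vivo's client SDK.

    import redis

    # Before: in-memory Redis.  After: the disk KV service, which speaks
    # the same protocol (both endpoints are placeholders).
    cache = redis.Redis(host="redis.example.internal", port=6379)
    disk_kv = redis.Redis(host="disk-kv.example.internal", port=6379)

    # The calling code does not change when switching between the two.
    disk_kv.set("order:20211216:1001", "pending", ex=86400)
    print(disk_kv.get("order:20211216:1001"))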

We know that in addition to structured and semi-structured data, businesses also need to store and access a large amount of unstructured data. It is against this background that vivo's object and file storage services were built.

The object and file storage services share a unified storage base, and the storage space can be expanded to the EB level and beyond. The upper layer exposes a standard object storage protocol and the POSIX file protocol. Businesses can use the object storage protocol to access files, pictures, videos, software packages, and so on, while the standard POSIX file protocol lets businesses access storage as if it were a local file system, supporting scenarios such as HPC and AI training, including GPU model training over tens of billions of small files.
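As an illustration of access over a standard object storage protocol, here is a minimal sketch assuming the service exposes an S3-compatible API (a common choice, though the article does not name the exact protocol). The endpoint, bucket, and credentials are placeholders.

    import boto3

    # Placeholder endpoint and credentials for an S3-compatible object store.
    s3 = boto3.client(
        "s3",
        endpoint_url="https://object-store.example.internal",
        aws_access_key_id="AK_PLACEHOLDER",
        aws_secret_access_key="SK_PLACEHOLDER",
    )

    # Upload a software package and read its size back.
    with open("app-1.0.0.apk", "rb") as f:
        s3.put_object(Bucket="app-packages", Key="releases/app-1.0.0.apk", Body=f)

    obj = s3.get_object(Bucket="app-packages", Key="releases/app-1.0.0.apk")
    print(obj["ContentLength"])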

For picture and video files we have also added common processing capabilities such as watermarking, thumbnail generation, cropping, and transcoding.

The above briefly introduced some product capabilities of the vivo database and storage platform. Next, let's talk about our exploration and thinking on some technical directions during the platform's construction.

3. Exploration and thinking of database and storage technology

In platform construction, improving R&D and O&M efficiency is a common topic, and there are many well-built platforms and products in the industry, but how to improve R&D and O&M efficiency specifically for data storage is discussed much less.

Our understanding is:

  • First, resource delivery must be agile enough and must shield enough of the underlying technical details. To this end we manage IDC self-built databases, cloud databases, and databases self-built on cloud hosts in a unified way, on and off the cloud, providing a unified operational view and reducing the cost of use for operations and R&D.
  • Second, to improve efficiency it is not enough to focus only on the production environment; we need effective means to uniformly manage multiple environments such as development, testing, pre-release, and production, achieving a consistent experience while keeping data and permissions securely isolated.
  • Finally, borrowing from the DevOps approach, we logically divide the whole platform into two domains: the R&D domain and the O&M domain.

In the R&D domain, we need to think about how to solve the efficiency problems R&D engineers face with database and storage products. It is not enough to deliver a database instance and let them create databases and tables on the platform; many operations happen during coding, such as constructing test data and writing the logic for inserts, deletes, updates, and queries.

We hope the platform can participate in these coding processes so as to maximize R&D efficiency.

In the O&M domain, we believe a good measure is how many times you need to log in to a server during daily operations. All O&M actions should be standardized and automated, and some operations can become intelligent in the future. For the interaction between R&D and O&M, our goal is to reduce interaction: the fewer people a process involves, the more efficient it is, so we let the system make decisions and make everything self-service. Next, let's look at some exploration and thinking about security.

Security is no trivial matter, so we plan and design database security and data security as separate topics. The basic principle is that rights and responsibilities are clearly defined. On the database side, this involves account passwords and the like.

Together with the SDK team we have developed a password encryption and transmission scheme: the database password appears as cipher text to both R&D and operations, and remains cipher text in the project configuration files.
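A minimal sketch of this principle: the configuration file stores only cipher text, and the application decrypts it in memory right before opening a connection. The sketch assumes a symmetric key delivered out of band (here via an environment variable) and uses Fernet from the cryptography package purely for illustration; it is not vivo's SDK scheme, and the cipher text shown is a placeholder.

    import os
    from cryptography.fernet import Fernet

    # What the project configuration file contains: cipher text only.
    DB_PASSWORD_CIPHERTEXT = b"gAAAAAB...placeholder..."

    def get_db_password():
        """Decrypt the password in memory; the plaintext never lands in
        config files or logs. The key itself is delivered out of band."""
        key = os.environ["DB_PASSWORD_KEY"].encode()
        return Fernet(key).decrypt(DB_PASSWORD_CIPHERTEXT).decode()

    # The decrypted value is passed straight to the driver, for example:
    # conn = pymysql.connect(host=..., user=..., password=get_db_password())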

For the data itself, we work with the security team to automatically label and identify sensitive data and to classify and grade it, and we strictly control queries, exports, and changes involving sensitive data through permission control, permission escalation approval, and usage tracking based on digital watermarking, so that an after-the-fact audit can trace who viewed which data at what time. We also perform transparent encryption and decryption of sensitive data, so the data written to the storage medium is stored encrypted.

In the same way, backup data and logs are also stored encrypted. This is what we are doing at present, and there are still many capabilities to build in the security field in the future. Next, let's look at data changes.

For data change scenarios, we mainly focus on two points:

  • The first is whether the data change itself will affect the running business. For this we have built the ability to change table structures and table data without locking, and we have set up three lines of defense before, during, and after a change goes live to prevent bad SQL or queries from reaching the production environment (a pre-change review sketch follows this list). If a rollback is needed during or after a change, we also provide a one-click rollback plan.
  • The second is change efficiency. For multiple environments and multiple clusters we provide one-click synchronized data changes. To further improve the user experience we also provide a GUI-based schema design platform. With these foundations in place, we have opened all the capabilities in this scenario to R&D engineers, who can now perform data changes by themselves 24 hours a day, greatly improving change efficiency.
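To give a flavor of the first line of defense (pre-change statement review), here is a minimal rule-based sketch; the rules shown are generic examples and not vivo's actual review rule set.

    import re

    def review_statement(sql):
        """Return a list of findings for one change statement (generic rules)."""
        findings = []
        lowered = sql.strip().rstrip(";").lower()

        # DML without a WHERE clause would touch every row in the table.
        if re.match(r"^(update|delete)\b", lowered) and " where " not in f" {lowered} ":
            findings.append("UPDATE/DELETE without a WHERE clause")

        # SELECT * makes later schema changes riskier.
        if re.match(r"^select\s+\*", lowered):
            findings.append("SELECT * is discouraged")

        # Destructive DDL should go through a separate approval flow.
        if re.match(r"^(drop|truncate)\b", lowered):
            findings.append("DROP/TRUNCATE requires extra approval")

        return findings

    print(review_statement("DELETE FROM orders"))    # flags the missing WHERE
    print(review_statement("DROP TABLE tmp_users"))  # flags destructive DDL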

4. Exploration and thinking on cost and storage

Next, I will introduce some of our thinking on cost.

We manage cost mainly from four aspects:

  • The first is budget management and control: resources are de-physicalized, business budgets are reported in terms of resources, and at the budget control level server consumption is forecast and continuously revised to keep the utilization water level healthy.
  • The second is how database services are deployed. We have gone through several stages: the earliest was one instance per machine, which wasted a lot of resources; this later evolved into standardized package-based deployment, where storage resources of the same type but different package sizes are deployed together, and algorithmic optimization continuously improves resource utilization (a simple packing sketch follows this list).
  • The third is mixed deployment of resources with different profiles, such as deploying the database proxy layer together with object storage data nodes: one is CPU-intensive and the other is storage-intensive, so they complement each other. The next stage should be cloud-native separation of storage and compute, which we are still exploring.
  • The fourth is that after a service is deployed we must keep watching its running state, inspect and warn on capacity, and scale cluster configurations up or down in time to keep everything running in an orderly way. At the same time we need to watch how the business is running, reclaim storage clusters whose data has gone offline in time, and reduce the number of zombie clusters.
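As an illustration of the package-based placement idea mentioned above, here is a minimal first-fit-decreasing sketch that packs instance packages (memory in GB) onto hosts. The package sizes and host capacity are made-up numbers, and a real scheduler of course also considers CPU, disk, IO, and fault domains.

    def pack_instances(packages_gb, host_capacity_gb):
        """First-fit-decreasing: place each package on the first host with
        enough remaining memory, opening a new host when none fits."""
        hosts = []       # remaining capacity of each opened host
        placement = []   # (package_gb, host_index)
        for pkg in sorted(packages_gb, reverse=True):
            for i, free in enumerate(hosts):
                if pkg <= free:
                    hosts[i] -= pkg
                    placement.append((pkg, i))
                    break
            else:
                hosts.append(host_capacity_gb - pkg)
                placement.append((pkg, len(hosts) - 1))
        return placement, len(hosts)

    # Made-up mix of 8/16/32/64 GB instance packages on 128 GB hosts.
    placement, host_count = pack_instances([64, 32, 32, 16, 16, 8, 8, 8], 128)
    print(f"{host_count} hosts needed")  # 2 hosts in this example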

Another aspect of cost is the iteration of hardware resources, which is also very important, but I won't go into it here. Next, let's look at the storage service system.

For object and file storage, we mainly focus on two points:

  • The first is cost. For data redundancy we use erasure coding (EC), and the EC spans IDCs, so the failure of any single IDC does not affect data reliability. We have also introduced high-density, large-capacity storage servers to raise the storage density of a single rack as much as possible. Note that the operating cost after a server is purchased cannot be ignored, and there is still a lot of room for optimization there. We also provide lossless transparent data compression and lifecycle management, cleaning up expired data and archiving cold data in time, and keep reducing storage cost through a variety of means (a small EC-overhead calculation follows this list).
  • The second is performance. We provide bucket- and object-granularity IO isolation in the underlying storage engine. By introducing open source components such as Alluxio, we provide server-side plus client-side caching to improve hot-data read performance. In the storage engine we have introduced Open CAS and io_uring to further improve the whole machine's disk IO throughput.
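To make the cost benefit of EC concrete, here is a small calculation comparing the raw-capacity overhead of 3-replica storage with a k+m erasure-coded layout. The 4+2 scheme below is a made-up example, not necessarily the parameters vivo uses.

    def storage_overhead(data_blocks, parity_blocks):
        """Raw bytes stored per logical byte for a k+m erasure-coded layout."""
        return (data_blocks + parity_blocks) / data_blocks

    replica_overhead = 3.0                  # three full copies
    ec_overhead = storage_overhead(4, 2)    # e.g. 4 data + 2 parity blocks

    print(f"3-replica: {replica_overhead:.2f}x raw capacity per logical byte")
    print(f"EC 4+2   : {ec_overhead:.2f}x raw capacity per logical byte")
    print(f"saving   : {1 - ec_overhead / replica_overhead:.0%}")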

The above covers some of the exploration and thinking behind the capabilities we are currently building. Finally, let's take a look at our future plans.

5. Future planning

At the storage service layer, we will continue to improve the storage service matrix, polish the products, and provide more diverse storage offerings to better meet business needs. On top of the existing storage products, the storage service layer will also provide some SaaS services to meet more business demands. At the functional layer, we break the work down into four parts:

  • Data basic services: the basic functions of the storage products, including deployment, scaling, migration, monitoring and alerting, backup and recovery, and offline reclamation.
  • Data services: storage products are essentially carriers of data, and we also have requirements for the data itself, the most basic being query and change performance optimization, data governance, and how to reach deep into the business coding process.
  • Storage autonomy services: initially divided into four parts, namely performance autonomy, capacity autonomy, intelligent diagnosis, and scenario services. Autonomous services improve DBAs' job satisfaction on the one hand and greatly improve the robustness and stability of the system itself on the other.
  • Data security services: some capabilities have already been built, but they are not yet systematic and will need more investment in the future.

In the future, the entire storage service system will be integrated into the company's overall hybrid cloud architecture to provide users with a one-stop, standardized experience. That's all for this sharing.

Author: vivo Internet Database Team - Xiao Bo
