Is sub-database sub-table really suitable for your system? Talk about how to choose between sub-database sub-table and NewSQL

Once upon a time, "segment databases when concurrency is high, and tables when data is large" has become the bible for dealing with MySQL data growth problems.

Interviewers like to ask, bloggers like to write, and candidates like to memorize. It seems that a closed loop has been formed.

But have you ever thought about, is _ sub-database sub-table really suitable for your system? _

sub-table

When the business was just developed, all traffic was sent to a MySQL, and all user information fell into the user table.

Later, the amount of data in the user table became larger and larger.

So, you do a vertical split, splitting the original user table into a new user table and user\_details table.

After such a split, the user's information is scattered into two tables, the data volume of the user table becomes smaller, and the problem that the data volume of the user table is too large is temporarily solved.

However, with the development of the business, the online traffic has become larger and larger, and a single MySQL can no longer bear the pressure of the traffic.

When a single library cannot withstand the pressure, it needs to be divided into libraries.

Branch library

As the name implies, sub-library is to split a library into multiple libraries, so that multiple libraries can share the pressure of traffic.

Splitting into multiple libraries also means that tables are divided, that is to say, a sub-library must be divided into a table, and a sub-table may not necessarily be divided into a library.

We can divide the implementation of sub-database and sub-table into three types according to _partial application_ or _partial DB_:

JDBC proxy mode
DB proxy mode
Sharding On MySQL DB Mode

JDBC proxy mode

The JDBC proxy pattern is a decentralized architectural pattern. ShardingSphere-JDBC is a typical implementation of JDBC proxy mode.

Services are usually provided in the form of jar packages, allowing clients to directly connect to the database. This mode does not require additional deployment and dependencies, and can be understood as an enhanced version of the JDBC driver.

Although the JDBC proxy mode is simple, it violates the principle of DB transparency and is highly intrusive, requiring different drivers to be written for different languages.

Meituan's Zebra, MTDDL, and Ali's TDDL are all based on this model.

DB proxy mode

The DB proxy pattern is a centralized architecture pattern. ShardingSphere-Proxy is a classic implementation of the DB proxy mode.

This mode aims to achieve a transparent database proxy and is independent of application deployment. Because of independent deployment, there are no restrictions on heterogeneous languages and will not cause intrusion to applications.

The DB proxy mode consumes fewer connections than the JDBC proxy mode, and the performance is relatively better.

However, the centralized design also brings single-point problems. In order to maintain high availability and high performance, it is necessary to introduce VIPs such as LVS/F5 to achieve load balancing of traffic. Lengthens the application-to-database link, which in turn improves response time.

Ali's MyCat, Meituan Atlas and Baidu Heisenberg are implementations based on the DB proxy model.

Sharding On MySQL

Sharding On MySQL is equivalent to shielding the operation of sub-database and sub-table, which is the result of the combination of operation and maintenance and middleware. A typical example is Ali's DRDS.

This mode makes the sub-database and sub-table blurred, and to the application, it is more like a new type of database encapsulating MySQL.

Although it has become easier for users to use, but behind the simplicity is the precipitation of operation and maintenance, and the problem of sub-database and sub-tables still exists.

The cost of sub-library and sub-table

There are many ways to implement sub-database and sub-table, but the implementation of different modes seems to make up for the defect that MySQL does not support distributed.

The sub-database and sub-tables force MySQL to achieve a pseudo "distributed" state, which also brings some new problems, such as:

Functional limitations : Cross-dimensional joins, aggregations, and subqueries no longer exist after sub-databases and sub-tables, and global constraints such as unique keys and foreign keys can only be guaranteed by business, and DB is gradually weakened into storage.
Operation and maintenance complexity : The management of multiple database tables after sub-database and sub-tables is troublesome, the operation and maintenance cost is very high, and the data query is also very troublesome.
Sharding Key Problem : Queries that are not Sharding Keys require additional redundant processing, and other nodes such as Elasticsearch and ClickHouse need to be introduced, which further increases the complexity of the system.
Unique ID problem : The unique ID cannot be guaranteed after sub-database and sub-table, and the unique ID needs to be modified.
Distributed transaction problem : MySQL's own XA flexible transaction performance is too low, and a new distributed transaction solution needs to be introduced.

NewSQL

As can be seen from the above, sub-database sub-table needs to sacrifice some functions of MySQL, and also brings many new problems.

Is there a solution that can not only have the functions of MySQL, but also support the scalability of data?

Have. That is NewSQL.

NewSQL is a class of relational database management systems designed to provide the scalability of NoSQL systems for online transaction processing (OLTP) workloads, while maintaining the ACID guarantees of traditional database systems.

The well-known NewSQL in China include Ali's OceanBase, Tencent's TDSQL, and PingCAP's TiDB. They have both the functionality of MySQL and the ability to be distributed and scalable.

The author can only say that I have a little understanding of Ali's OceanBase, so I won't describe it too much.

Let's focus on Tencent's TDSQL and PingCAP's TiDB.

As can be seen from the architecture diagrams of the two (with some modules omitted), the architectures of TDSQL and TiDB have only some naming differences, which can be said to be almost identical.

The two are divided into three parts as a whole:

Computation : Responsible for accepting client connections, performing SQL parsing and optimization, and finally generating a distributed execution plan and forwarding it to the underlying storage layer for execution. (TDSQL: SQL Engine, TiDB: TiDB-Server)
Storage : Distributed _KV storage_, similar to NoSQL database, supports elastic expansion and contraction. (TDSQL: TDStore, TiDB: TiKV)
Control : The meta-information management module of the entire cluster is the brain of the entire cluster. (TDSQL: TDMetaCluster, TiDB: Placement Driver)

The core storage modules (TDStore/TiKV) of the two are developed based on RocksDB, and both are _KV storage_ modes.

RocksDB is an LSM-tree architecture engine developed by Facebook based on LevelDB that provides key-value storage and read-write functions.
The bottom layer uses _WAL (Write Ahead Log) technology_ and _Sorted String Table_, which has higher write throughput than the B-tree storage engine.

NewSQL smooth access solution

Because the author has landed on TiDB, I will use TiDB as an example to describe how to access NewSQL, so as to achieve smooth migration without affecting online use.

Step 1: In the initial state, all online reads and writes fall to MySQL.

Step 2 : Connect TiDB to the system as a slave node of MySQL. All online reads and writes still fall to MySQL. At the end of the day, use scripts or tasks to verify whether the MySQL data is consistent with the TiDB data. This step mainly verifies MySQL data synchronization. There is no problem with TiDB.

Step 3 : Switch some reads to TiDB. This step mainly verifies that there is no problem with the data read synchronized by TiDB, and that the system SQL can be executed normally in TiDB.

Step 4 : Break the synchronization between MySQL and TiDB, double write MySQL and TiDB, and all online read traffic will fall to MySQL.

Step 5 : Cut part of the read traffic to TiDB, and verify that the data written by TiDB can be read normally. At this stage, some idempotent tasks can be executed on two data sources at the same time to verify whether the two data are consistent.

Step 6 : Switch all online read traffic to TiDB, while maintaining double writing, and switch to MySQL immediately if there is a problem.

Step 7 : Cut off the write traffic of MySQL, use MySQL as a slave library of TiDB, and use it as a downgrade.

The basis of the whole solution is: TiDB is compatible with the MySQL protocol and the MySQL ecosystem .

This solution is designed on the basis of completely distrusting TiDB , and it verifies the fit between TiDB and MySQL, so the overall solution will be cumbersome, and some steps can be omitted according to the actual situation.

Is NewSQL really that good?

NewSQL is not omnipotent, and there is no need to deify NewSQL. Several well-known NewSQLs in China have some functional defects more or less. Take TiDB as an example:

The auto-increment ID of TiDB can only guarantee the auto-increment on a single TiKV, and cannot guarantee the global auto-increment.
Since TiKV storage is arranged according to the binary order of keys, using auto-incrementing ID may cause hot block effect.
TiDB defaults to the RC (read committed) transaction isolation level, and does not support the RR (repeatable read) isolation level. Although it provides SI (Snapshot Isolation) that is basically equivalent to RR, it still has the problem of write skew
The select point performance of TiDB is much worse than that of MySQL, and it can barely tie with MySQL with a data volume of hundreds of millions.
Because the bottom layer is based on the Raft protocol for data synchronization, the latency of TiDB will be higher than that of MySQL.
...

Therefore, NewSQL is not a dragon-killing knife, and the impact of these defects needs to be evaluated according to practical applications.

Applications of NewSQL

NewSQL has actually been developed in China for many years. OceanBase was born in 2010, TDSQL can be traced back to 2004, and TiDB was born in 2015.

The three have accumulated a lot of customer cases at home and abroad.

OceanBase

OceanBase has covered 100% of Ant Group's core links, supporting all five business sectors. Currently running billions of different SQLs, hundreds of petabytes of data, and millions of server cores.
ICBC uses OceanBase for its entire business, including but not limited to deposits, loans, payment settlements, and innovative businesses.
With features such as hybrid cloud architecture, high availability, and Oracle compatibility, OceanBase supports the digital transformation of MYbank 's core systems through distributed middleware, financial suites, and integrated solutions for mobile development platforms.
China Merchants Bank 's "mass market quotation system" and "historical income system" use OceanBase as the underlying database.

TDSQL

WeBank has realized the privatization deployment of TDSQL, which is a typical two-site multi-center architecture.
The underlying storage of Futu Securities ' Hong Kong stock trading system and Soochow Securities' new generation core trading system is TDSQL.
Businesses such as digital Guangdong province affairs and Shenzhen subway code rides are all run on TDSQL.
Ping An Bank, Agricultural Bank of China, Hua Xia Bank, and Bank of China all have related businesses on TDSQL.

TiDB

For the online payment business of Bank of Beijing , all bank cards of Bank of Beijing are bound to payment operations such as Alipay and WeChat. The back-end database runs on TiDB, and it is a typical dual-site, three-center, and same-city dual-active architecture. This Business is very critical. If the business is interrupted for more than a certain period of time, it needs to be reported to the CBRC.
Japan's No. 1 payment company - Paypay , wallet and payment business are all on TiDB.
China Life 's life and property insurance business is replacing Oracle with TiDB.
All KFC member login systems, including KFC's APP and third-party logins, use TiDB for the background database. This business was launched in April 2020 and has experienced many KFC promotions and other activities. Currently, KFC's background payment The system has also been switched to TiDB.
McDonald 's account and order system are all based on TiDB. If there is a problem with TiDB, all McDonald's stores in China, including online and offline ordering systems, will not be able to operate normally.
The core and most profitable micro- loan business of WeBank, the full batch processing business in the background runs on TiDB.

How to choose sub-database sub-table and NewSQL?

Sub-database sub-table is a heavyweight solution, it will bring many new problems, and the requirements for infrastructure and operation and maintenance are also very high.

NewSQL is powerful but also flawed.

How to make a decision needs to be comprehensively judged according to the current situation of the system and the situation of the company.

Sub-library and sub-table is a heavyweight solution. If lightweight solutions such as _read-write separation_ and _cold-hot separation_ can solve the problem, there is no need to use _sub-library and sub-table_.

If cache shunting and read-write separation can't handle it, and you are in an Internet company, the infrastructure can still be maintained and the operation and maintenance can keep up, _ database and table _ is still the first choice;

But if you are in a traditional enterprise with poor or no infrastructure, then you can consider _NewSQL_.

There is no high or low technology. The technology that can solve the problem is a good technology. When choosing a technical solution, don’t show off your skills, and don’t over-design!

Welcome to the personal public account [CoderW]

Is sub-database sub-table really suitable for your system? Talk about how to choose between sub-database sub-table and NewSQL

sub-table

Branch library

JDBC proxy mode

DB proxy mode

Sharding On MySQL

The cost of sub-library and sub-table

NewSQL

NewSQL smooth access solution

Is NewSQL really that good?

Applications of NewSQL

OceanBase

TDSQL

TiDB

How to choose sub-database sub-table and NewSQL?

References

CoderW

引用和评论

从扫码登录的原理分析QQ大量被盗事件，到底是腾讯的问题还是用户的问题？

70k star，取代Postman！这款轻量级API工具，太香了！

C++ 中 VS 项目引入公共配置文件

MySQL慢查询日志：性能优化的终极指南

大数据从业者必知必会的Hive SQL调优技巧

MySQL 备份 Shell 脚本：支持远程同步与阿里云 OSS 备份

疯狂推荐！从零开始 Dify 部署全攻略！