PolarDB-X 2.0: What is the experience of using a transparent distributed database?


PolarDB-X 2.0 video walkthrough: https://yqh.aliyun.com/live/polardbx2021

Transparent distribution is a PolarDB-X capability that will be released soon. It allows applications to use PolarDB-X as if they were using a stand-alone database.

Compared with traditional middleware-style "distributed databases", PolarDB-X with transparent distribution no longer requires the application to deal with the concept of a partition key. The table creation statements and application code developed on stand-alone MySQL can be migrated to PolarDB-X and run there directly.

This article introduces the new transparent distributed experience of PolarDB-X.

Install WordPress on PolarDB-X

WordPress is open source blogging software that uses MySQL as its database. Here we install WordPress on PolarDB-X to experience PolarDB-X's transparent distributed capabilities.

We will follow three simple steps:

Create a table directly without modifying DDL
Run without modifying the application
Run a stress test and do some tuning

The results are summarized as follows:

Using the official WordPress image without any modification, the installer automatically completes table creation and data initialization on PolarDB-X, all with standard MySQL syntax.
For this WordPress instance, the PolarDB-X monitoring data show that the load and data volume of each node are balanced.
With SQL analysis, DAS, and the other tools provided by PolarDB-X, you can easily find the hot SQL in the system.
The DBA can optimize system performance directly with DDL statements, such as creating indexes or changing the data distribution, without modifying the application.

How PolarDB-X achieves transparent distribution

Let me share with you how PolarDB-X achieves transparent distribution.

Transparent data partition

PolarDB-X is a typical Shared-Nothing distributed database, and its simplified architecture is as follows:

Its core components are the stateless compute node (CN) and the stateful storage node (DN).

To understand PolarDB-X's transparent distribution capabilities, we must first understand how data is distributed on PolarDB-X.

In PolarDB-X, a table is composed of multiple indexes: the primary key and any secondary indexes. PolarDB-X partitions each index independently, and the partition key of an index is the key of that index.

For example, in a typical e-commerce scenario, the orders table has a primary key (id) and two indexes (on seller_id and buyer_id):

create table orders (
   id bigint,
   buyer_id varchar(64) comment 'buyer',   -- length 64 is illustrative
   seller_id varchar(64) comment 'seller', -- length 64 is illustrative
   primary key(id),
   index sdx(seller_id),
   index bdx(buyer_id)
);

The primary key index is partitioned by id
Index sdx is partitioned by seller_id
Index bdx is partitioned by buyer_id
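The per-index partitioning listed above can be modeled with a short Python sketch. This is only an illustration under assumptions: the shard count, the use of CRC32 as the hash, and the helper names `shard_of` and `placement` are hypothetical and do not describe PolarDB-X's actual hash function or routing code.

```python
import zlib

NUM_SHARDS = 4  # assumed shard count per index, for illustration only

# Each index of `orders` and the column it is keyed on (and therefore partitioned by).
INDEX_KEYS = {"PRIMARY": "id", "sdx": "seller_id", "bdx": "buyer_id"}


def shard_of(value, num_shards=NUM_SHARDS):
    """Map a key value to a shard number with a stable hash (CRC32 is an assumption)."""
    return zlib.crc32(str(value).encode("utf-8")) % num_shards


def placement(row):
    """Return, for each index, the shard that holds this row's index entry."""
    return {name: shard_of(row[col]) for name, col in INDEX_KEYS.items()}


row_a = {"id": 1, "buyer_id": "b42", "seller_id": "s7"}
row_b = {"id": 2, "buyer_id": "b99", "seller_id": "s7"}

# Both rows share a seller, so their sdx entries land on the same shard,
# while their primary key and bdx entries are placed independently.
print(placement(row_a))
print(placement(row_b))
```

The point of the model is that a query filtering on seller_id only needs the sdx shard for that seller, and a query filtering on id only needs one primary key shard, even though the application never declared a partition key.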

As shown below:

After sharding the indexes, PolarDB-X scatters these shards across different storage nodes and balances the load according to the amount of data and other information, as shown in the following figure:

In PolarDB-X, the partition key can be omitted from the table creation statement, and PolarDB-X will still shard and load-balance the table automatically.
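To make the idea of balancing shards by data volume concrete, here is a minimal sketch of one possible strategy: place the largest shard first, always on the currently lightest node. The greedy policy, the function name `balance`, and the shard sizes are assumptions for illustration; the article does not say which algorithm PolarDB-X uses, and it mentions that other information besides data volume is taken into account.

```python
import heapq


def balance(shard_sizes, num_nodes):
    """Greedy placement: biggest shard first, always onto the lightest node.

    Returns {node: (total_size, [shard names])}.
    """
    # Heap entries are (current load, node id, shards on that node).
    # Node ids are unique, so the list in third position is never compared.
    heap = [(0, node, []) for node in range(num_nodes)]
    heapq.heapify(heap)
    for shard, size in sorted(shard_sizes.items(), key=lambda kv: -kv[1]):
        load, node, shards = heapq.heappop(heap)
        shards.append(shard)
        heapq.heappush(heap, (load + size, node, shards))
    return {node: (load, sorted(shards)) for load, node, shards in heap}


# Hypothetical shard sizes in GB.
sizes = {"p0": 40, "p1": 35, "p2": 30, "p3": 25, "p4": 20, "p5": 10}
layout = balance(sizes, num_nodes=3)
for node in sorted(layout):
    print(node, layout[node])
```

With these sizes the three nodes end up holding 50, 55, and 55 GB, which is the kind of even spread the monitoring data in the WordPress experiment reflects.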

Therefore, when an application migrates to PolarDB-X, you can export the table creation statements from stand-alone MySQL and execute them directly in PolarDB-X without modification.

Transparent distributed transaction

Distributed transactions are the most important basic capability in PolarDB-X. Applications rely on them heavily, because they spare the business from rewriting its transaction code; at the same time, PolarDB-X itself uses transactions to maintain indexes.

The distributed transactions of PolarDB-X have the following characteristics:

Like Spanner, they provide external consistency, the strongest consistency level
The syntax is fully compatible with MySQL, so the application does not need to be modified
The behavior supports MySQL-compatible RC and RR isolation levels

The principles of PolarDB-X distributed transactions are covered by many articles in our column, so I won't repeat them here. Readers who are interested in the principles can refer to these articles:

https://zhuanlan.zhihu.com/p/329978215

https://zhuanlan.zhihu.com/p/338535541

https://zhuanlan.zhihu.com/p/355413022

Online DDL

PolarDB-X supports various types of Online DDL. Here are some representative DDL types.

Index maintenance

Different from the indexes of stand-alone MySQL, the indexes of PolarDB-X are global indexes, including the following types:

Normal index
Unique index
Clustered index

Among them, the clustered index is an index type that PolarDB-X adds relative to MySQL. It includes all the columns of the table, which avoids the cost of going back to the primary table to fetch the remaining columns.
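The difference between an ordinary index and a clustered index can be modeled in a few lines of Python. This is a toy model under assumptions: plain dictionaries stand in for the primary table and the indexes, `name` is assumed to be unique, and the helper names are hypothetical.

```python
# Primary table: id -> full row.
primary = {
    1: {"id": 1, "name": "alice", "city": "Hangzhou"},
    2: {"id": 2, "name": "bob", "city": "Beijing"},
}

# Ordinary index on name: stores only the primary key of the matching row.
ordinary_idx = {row["name"]: row["id"] for row in primary.values()}

# Clustered index on name: stores a full copy of the row.
clustered_idx = {row["name"]: dict(row) for row in primary.values()}


def via_ordinary(name):
    """Index lookup, then a second lookup in the primary table."""
    row_id = ordinary_idx[name]   # read 1: the index entry
    row = primary[row_id]         # read 2: back to the primary table
    return row, 2


def via_clustered(name):
    """A single lookup, because the index entry already holds every column."""
    return clustered_idx[name], 1


print(via_ordinary("bob"))
print(via_clustered("bob"))
```

In a distributed setting the second read of the ordinary index may have to go to a different storage node, which is why removing it matters; the price is that the clustered index stores every column a second time.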

Indexes in PolarDB-X are created through DDL, and all of these operations are Online, so they do not block the business.

For example:

Create an ordinary index: CREATE INDEX idx1 ON t1(name)
Create a clustered index: CREATE CLUSTERED INDEX idx1 ON t1(name)

INSTANT ADD COLUMN

Adding a column is the most common type of DDL in business systems. In MySQL, the time it takes to add a column depends on the amount of data (in MySQL 8.0, adding a column at the end of the table is INSTANT).

In PolarDB-X, adding a column at any position is INSTANT. This means that adding a column takes a constant time, on the order of seconds, regardless of the amount of data, and has no impact on the business.
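One common way to make a column addition independent of the data volume is to change only the table metadata and fill in the default value when old rows are read. The sketch below models that idea; the `Table` class and its methods are hypothetical, and the article does not state that PolarDB-X implements INSTANT ADD COLUMN this way.

```python
class Table:
    """Toy table whose ADD COLUMN only edits metadata (an assumed model)."""

    def __init__(self, columns):
        self.columns = list(columns)  # current column order
        self.defaults = {}            # column -> default for rows written before it existed
        self.rows = []                # each stored row keeps only the columns it was written with

    def insert(self, **values):
        self.rows.append(dict(values))

    def add_column(self, name, default=None, after=None):
        """Constant-time in the number of rows: no stored row is touched."""
        pos = len(self.columns) if after is None else self.columns.index(after) + 1
        self.columns.insert(pos, name)
        self.defaults[name] = default

    def select_all(self):
        """Rows written before the column existed get its default at read time."""
        return [
            tuple(row.get(col, self.defaults.get(col)) for col in self.columns)
            for row in self.rows
        ]


t = Table(["id", "name"])
t.insert(id=1, name="a")
t.add_column("age", default=0, after="id")  # added in the middle, not at the end
t.insert(id=2, age=30, name="b")
print(t.select_all())  # [(1, 0, 'a'), (2, 30, 'b')]
```

Because `add_column` never iterates over `rows`, its cost is the same for a table with ten rows or ten billion, which matches the constant-time behavior described above.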

Partition adjustment

PolarDB-X supports four table distribution strategies: Hash, Range, List, and Broadcast. Because Hash avoids the hotspots caused by continuous writes of adjacent keys, PolarDB-X uses the Hash strategy by default. In most cases, this strategy meets the performance needs of the system well.
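The hotspot argument for Hash can be checked with a small simulation. The partition count, the range width, and the use of CRC32 are assumptions chosen for illustration, not PolarDB-X parameters.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 4
RANGE_WIDTH = 1_000_000  # assumed: each range partition covers one million ids


def range_partition(key):
    """Range strategy: consecutive ids fall into the same partition."""
    return min(key // RANGE_WIDTH, NUM_PARTITIONS - 1)


def hash_partition(key):
    """Hash strategy: a stable hash of the key picks the partition."""
    return zlib.crc32(str(key).encode("utf-8")) % NUM_PARTITIONS


# 1,000 consecutive inserts, as produced by an auto-increment style key.
new_ids = range(2_500_000, 2_501_000)

range_hits = Counter(range_partition(k) for k in new_ids)
hash_hits = Counter(hash_partition(k) for k in new_ids)

print("range:", dict(range_hits))  # range: {2: 1000}
print("hash: ", dict(hash_hits))
```

Under Range, every one of the 1,000 writes hits partition 2, so a single shard absorbs the whole write load. Under Hash, the same writes are spread over several partitions. The trade-off is that Range keeps adjacent keys together, which can favor range scans, and that is one reason to switch strategies later through DDL.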

If the business is already running and you want a more suitable partitioning strategy to improve system performance, you can adjust it with DDL statements in PolarDB-X, and PolarDB-X will reorganize the table data according to the new partitioning strategy.

For example:

Change the table's partitioning strategy to Hash: ALTER TABLE t1 PARTITION BY HASH(name)
Change the table's number of shards to 32: ALTER TABLE t1 PARTITION BY HASH(name) PARTITIONS 32
Turn the table into a broadcast table: ALTER TABLE t1 BROADCAST
Change the table's partitioning strategy to Range: ALTER TABLE t1 PARTITION BY RANGE(id)

DDL statements can be used to convert between any two partitioning strategies:

Adaptive backfill speed

Many readers have probably had this experience: a DDL operation runs on a very large table, and because of the data volume it cannot finish within one day. To avoid affecting the business, someone manually adjusts parameters to lower the DDL backfill speed when the daytime business peak arrives, and raises it again at night after the peak is over.

The backfill in PolarDB-X will automatically adjust the speed according to the current system load.

For example:

In this example, there are four stages:

At the beginning there is no business load, and the DDL backfill speed rises to 250,000 rows/s
The business load starts to rise, and the DDL backfill speed quickly drops to 130,000 rows/s
The business TPS is stable at about 15,000, and the DDL backfill speed is stable at 130,000 rows/s
After the DDL ends, the business TPS is stable at about 16,000

From this example, we can see that the backfill speed of PolarDB-X DDL is automatically adjusted according to the business load, and during the DDL period, the impact on the business TPS is small.
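The behavior in this example resembles a feedback loop: raise the backfill rate while the system has headroom, and back off when the business load rises. The sketch below is one simple way to model such a loop. The thresholds, step sizes, load values, and the function `next_rate` are all assumptions for illustration; the article does not describe the actual control algorithm.

```python
MAX_RATE = 250_000   # rows/s, borrowed from the peak seen in the example
MIN_RATE = 10_000    # rows/s, assumed floor so the DDL always makes progress
TARGET_LOAD = 0.7    # assumed utilization the throttle tries to stay under


def next_rate(current_rate, load):
    """Return the backfill rate for the next interval, given the load (0.0 to 1.0)."""
    if load > TARGET_LOAD:
        proposed = current_rate * 0.5      # business is busy: back off quickly
    else:
        proposed = current_rate + 20_000   # headroom available: speed up gradually
    return int(max(MIN_RATE, min(MAX_RATE, proposed)))


rate = MIN_RATE
history = []
# Hypothetical load trace: idle system, then a business peak, then a steady state.
for load in [0.1] * 15 + [0.9] * 3 + [0.6] * 5:
    rate = next_rate(rate, load)
    history.append(rate)

print(history)
```

In this trace the rate climbs to the 250,000 rows/s ceiling while the system is idle, halves three times during the simulated peak, and then recovers step by step, with no manual parameter changes in between.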

Make Online more Online

In order to further reduce the impact on the business during DDL, PolarDB-X also uses a number of technologies, such as:

Multi-version metadata, see: https://zhuanlan.zhihu.com/p/347885003
DDL that can be paused and canceled
MDL deadlock detection

We will introduce the details of these technologies in future articles. Please follow our Zhihu column: https://www.zhihu.com/org/polardb-x

Summary

PolarDB-X's transparent distribution capabilities will greatly reduce the cost of migrating applications from a stand-alone database to a distributed database. We will also make it more transparent in the future. Some of the things we are working on include:

More fine-grained scheduling strategies
Visual display of hot data, with intelligent diagnosis linked to SQL audit analysis
Partition-level truncate on tables that have global indexes
Rolling cleanup of data by time
And more
