Introduction: In recent years, a number of excellent domestic database products have emerged. This article selects three typical domestic distributed databases for comprehensive comparison and stress testing, showing the current state of domestic distributed database development.
For any application system, data is the core asset that carries the business, and the database that stores it is one of the core systems. With the continuous advancement of localization, building application systems on domestic databases has become increasingly important and is now the mainstream choice in database selection.
In recent years, the domestic database market has flourished: many excellent products have emerged, and major vendors are investing heavily in database R&D. This article selects three typical domestic distributed databases for comprehensive comparison and stress testing, and analyzes the current state of domestic distributed databases for readers' reference.
## Test environment and database architecture

### PolarDB-X
| Environment | Parameters |
| --- | --- |
| PolarDB-X version | polarx-kernel_5.4.11-16270254_xcluster-20210719 |
| Node specification | 16 cores, 64 GB |
| Number of nodes | 5 (CN 16 cores 64 GB + DN 16 cores 64 GB) |
Database architecture:
### OceanBase

Database architecture:
### TiDB
Database architecture:
## Analysis of stress test metrics

### Sysbench stress test

#### 1. Stress test parameter configuration
| Parameter | Value |
| --- | --- |
| --rand-type | uniform |
| --table-size | 10000000 |
| --table-num | 16 |
| --db-ps-mode | disable |
| --auto-inc | false |
| --range-size | 5 |
| --skip-trx | off |
Test table structure:

```sql
CREATE TABLE sbtest1 (
  id  int(11) NOT NULL,
  k   int(11) NOT NULL DEFAULT '0',
  c   char(120) NOT NULL DEFAULT '',
  pad char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (id),
  KEY k_1 (k)
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4;
```
#### 2. Scenario description

* There are 16 tables in total, each with 10 million rows of data, uniformly distributed.
* TiDB scenario: distributed via Range horizontal partitioning (by default, TiDB automatically balances all tables' data by Range, so a given test table's data is spread evenly across multiple machines).
* OceanBase scenario: a single table, the mode recommended by the official website and created automatically by the unmodified sysbench script, referred to here as a non-partitioned table; and a table distributed via Hash horizontal partitioning, referred to as a partitioned table.
* PolarDB-X scenario: a single table created automatically by the unmodified sysbench script, referred to here as a non-partitioned table; a table distributed via Hash horizontal partitioning using a local index, referred to as a partitioned table; and the same Hash-partitioned table using a GSI (global secondary index). See the DDL sketch below.
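To make the scenario variants concrete, here is a minimal DDL sketch (MySQL-compatible syntax for the hash split; the GSI statement follows PolarDB-X's style and the exact syntax varies by version, so treat both as illustrative):

```sql
-- Partitioned-table variant of sbtest1: hash-split on the primary key.
-- KEY k_1 stays a local index, co-located with each partition's rows.
CREATE TABLE sbtest1 (
  id  int(11) NOT NULL,
  k   int(11) NOT NULL DEFAULT '0',
  c   char(120) NOT NULL DEFAULT '',
  pad char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (id),
  KEY k_1 (k)
) PARTITION BY HASH(id) PARTITIONS 16;

-- PolarDB-X GSI variant (sketch): the secondary index on k is itself
-- hash-partitioned across nodes, so lookups on k can route to one shard
-- instead of broadcasting to all of them.
CREATE GLOBAL INDEX g_k ON sbtest1 (k) PARTITION BY HASH(k);
```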
#### 3. Test result data
### TPC-C (5,000 warehouses)

TPC-C is a benchmark specification aimed at online transaction processing (OLTP) systems, often called online business processing systems. Released in July 1992, it has been adopted by almost all mainstream foreign vendors of OLTP software and hardware platforms, which publish TPC-C results that are continually refreshed as computer technology advances.

TPC-C simulates a complex online transaction processing workload and tests a database's transaction processing capability under high pressure. The stress tests below summarize the maximum tpmC achieved by the three distributed databases:
```bash
# Import data: 5,000 warehouses
tiup bench tpcc --warehouses 5000 -D tpcc -H xxx -P xxx -T threads_num prepare
# Run
tiup bench tpcc run -U root --db tpcc2 --host xxx --port xxx --time xxx --warehouses 5000 --threads xxx
```
### TPC-H (100 GB)

TPC-H (a business intelligence / decision support benchmark) is a test suite developed by the Transaction Processing Performance Council (TPC) to simulate decision support applications. It is widely used in academia and industry to evaluate the performance of decision support systems.

This kind of test comprehensively evaluates a system's overall analytical computing capability and places high demands on vendors; it also has real commercial relevance, being widely applied in bank credit and credit card analysis, telecom operations analysis, tax analysis, and decision analysis in the tobacco industry. The following uses TPC-H 100 GB to compare the analytical capabilities of the three distributed databases:
```bash
# Import data: 100 GB
tiup bench tpch prepare --host xxx --port xxx --db tpch_100 --sf 100 --analyze --threads xxx
# Run
tiup bench tpch run --host xxx --port xxx --db tpch_100 --sf 100 --check=true
```
### DDL capabilities
#### 1. Scenario description

The test data is the lineitem table generated by TPC-H 100 GB: a single table with 600 million rows.
#### 2. Parallel DDL test
The parallel DDL test checks whether DDL statements can execute concurrently. While a previous long-running DDL has not yet finished, we add columns to the same lineitem table, create a table in the same database, and create an index on a small table (such as the nation table). If these later operations return immediately, the database supports parallel DDL.
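A minimal sketch of the test flow, assuming two concurrent sessions (the added column and probe table names are illustrative):

```sql
-- Session 1: long-running DDL on the 600-million-row lineitem table.
ALTER TABLE lineitem ADD COLUMN c_extra1 varchar(32);

-- Session 2, issued while session 1 is still running. If each statement
-- returns immediately, the database supports parallel DDL.
ALTER TABLE lineitem ADD COLUMN c_extra2 varchar(32);  -- same table
CREATE TABLE t_ddl_probe (id int PRIMARY KEY);         -- same database
CREATE INDEX idx_n_name ON nation (n_name);            -- small table
```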
### Hot row updates

When database resources are limited and a large number of requests contend for them, heavy lock contention is inevitable (updating a row causes that record to be locked on the index), server resources are burned, and the request success rate stays low (in other words, server resources are wasted and the cost-performance is poor). The hot row update test examines the database's lock management and transaction behavior under high concurrency and high pressure.
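The workload boils down to many clients updating the same row concurrently; a minimal sketch (the table and column names are illustrative):

```sql
-- Every client hammers the same row. Throughput is bounded by how fast
-- the engine hands the row lock from one committed transaction to the next.
UPDATE account SET balance = balance + 1 WHERE id = 1;
```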
### Read/write separation

Scenario introduction: consistent read controls whether data read on a read-only node is consistent, covering strong consistent reads and weak consistent reads; read-only node delay control bounds, for reads issued by the business, the maximum replication delay the standby database may have.
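As an illustration of the knobs involved, two of the products expose this through a session variable or a hint; a sketch assuming TiDB's tidb_replica_read variable and OceanBase's READ_CONSISTENCY hint:

```sql
-- TiDB: route reads to follower replicas instead of the Raft leader.
SET SESSION tidb_replica_read = 'follower';

-- OceanBase: a weak consistent read that tolerates replica lag in
-- exchange for offloading the leader.
SELECT /*+ READ_CONSISTENCY(WEAK) */ c FROM sbtest1 WHERE id = 1;
```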
### Partition change capabilities

Scenario introduction: partition rule changes verify the database's ability to adjust its data distribution. Adjusting the partitioning strategy lets online tables flexibly adapt to business changes, especially converting a single table to a partitioned (distributed) table, or a single table to a broadcast table.
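A sketch of what such online repartitioning DDL can look like (modeled on PolarDB-X-style syntax; the table names and exact statements are assumptions):

```sql
-- Single table -> partitioned (distributed) table: re-split rows by hash.
ALTER TABLE t_order PARTITION BY HASH(order_id);

-- Single table -> broadcast table: keep a full replica on every node so
-- joins against it never cross the network.
ALTER TABLE t_config BROADCAST;
```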
### Special scenarios

#### 1. Large transactions
The test data is the lineitem table generated by TPC-H 100 GB, a single table with 600 million rows:

```sql
SELECT * FROM lineitem;
UPDATE lineitem SET L_PARTKEY = L_PARTKEY + 1;
```
Test Results:
#### 2. JSON type
#### 3. Single-machine table count

The single-machine table count test measures the maximum number of tables (partitions) a single machine can hold in a complex business scenario. It verifies the database's metadata management capability and examines how supporting more tables per machine reduces storage costs in a distributed deployment.
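The pressure can come from thousands of small tables or, equivalently, from a heavily partitioned table, since each partition carries its own metadata; a sketch (the partition count is an arbitrary illustration):

```sql
-- A high partition count stresses the metadata manager much like a large
-- number of independent small tables would.
CREATE TABLE meta_stress (
  id bigint NOT NULL,
  v  varchar(64),
  PRIMARY KEY (id)
) PARTITION BY HASH(id) PARTITIONS 8192;
```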
#### 4. Impact of dropping a large table

TiDB, OceanBase, and PolarDB-X can all drop a large table smoothly without affecting online business.
#### 5. Emergency SQL throttling

Scenario introduction: emergency throttling is used in online emergencies to rate-limit certain bad or problematic SQL, so that while most of the business continues to run normally, the offending SQL is constrained; it can be used for rapid online recovery.
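As one concrete shape this can take, PolarDB-X offers concurrency-control (CCL) rules; a sketch (the rule name, account, keyword, and exact clause set here are illustrative assumptions, so check the product documentation):

```sql
-- Cap a problematic query pattern at 2 concurrent executions instead of
-- killing it, so the rest of the workload keeps its resources
-- ('app'@'%' is a placeholder account).
CREATE CCL_RULE IF NOT EXISTS limit_bad_sql
  ON tpcc.* TO 'app'@'%'
  FOR SELECT
  FILTER BY KEYWORD('order_line')
  WITH MAX_CONCURRENCY = 2;
```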
#### 6. Resource isolation

Scenario introduction: verifies whether automatic resource isolation between OLTP and OLAP workloads is supported. OLAP usually consumes large amounts of resources for query and analysis; without resource isolation, it may affect the availability and stability of online services.
#### 7. Dynamic index binding

Scenario introduction: used to test execution plan binding capability.
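A sketch using TiDB's SQL binding syntax as the example, applied to the sysbench table above (the query shape is illustrative):

```sql
-- Pin the optimizer to the k_1 index for this query shape, regardless of
-- what the cost model would otherwise pick.
CREATE GLOBAL BINDING FOR
  SELECT * FROM sbtest1 WHERE k = 100
USING
  SELECT /*+ USE_INDEX(sbtest1, k_1) */ * FROM sbtest1 WHERE k = 100;
```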
## Test result analysis
### TiDB
1. The laboratory feature plan cache was enabled; it is not recommended for direct production use. With it off (the production default), point_select performance drops by about 60%, and a deployment of roughly 100 cores achieves only about 360,000 QPS.
2. The sysbench workload contains many `where id between xx and xx` queries, but in real business, range queries purely on a user id or transaction id are rarely meaningful; time-range queries are far more common. Under TiDB's Range partitioning strategy, partition pruning lets such a query touch only 1 data shard, whereas the Hash-based strategies of PolarDB-X and OceanBase touch 5 data shards, so TiDB's data layout holds a certain advantage in sysbench's pure index-lookup workload (see the sketch after this list). P.S. Given this Range-vs-Hash difference, PolarDB-X ran a comparative Range-partition test on the read-only scenario: compared with Hash partitioning, Range brought roughly a 45% performance improvement (280,000 vs. 190,000).
3. In the TPC-C scenario, TiDB is at a clear overall disadvantage.
4. In the TPC-H scenario, performance is good in TiFlash mode, but in normal TiKV mode some SQL statements cannot return results.
5. In the special scenarios, the performance of index-creation DDL needs improvement, JSON is supported but not recommended for production, and hot row updates hit obvious bottlenecks.
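Regarding point 2, a sketch of why the partitioning strategy changes the shard count for sysbench's range queries (range-size = 5, as configured above):

```sql
-- A typical sysbench range read:
SELECT c FROM sbtest1 WHERE id BETWEEN 1000 AND 1004;
-- Range partitioning: ids 1000..1004 are contiguous, so pruning leaves a
-- single shard to visit.
-- Hash partitioning: the 5 ids hash independently, so up to 5 shards are
-- visited.
```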
### OceanBase
1. Non-partitioned tables (commonly understood as single tables): inside OceanBase, tables are balanced across multiple nodes at table granularity; a given table's data lives on only one node, while different tables may live on different nodes, so this mode tests pure single-machine capability. The 16 tables in the sysbench scenario are completely independent during the test, so OceanBase can fully exploit "multiple single machines" to reach a higher total throughput. In this mode it shows a 30% to 70% advantage over TiDB. In real business, the multiple-independent-single-table mode generally has to be combined with application-side sharding (sub-database and sub-table).
2. Partitioned tables: OceanBase supports distributing one table's data across multiple machines for row-level horizontal scalability. Partitioned tables incur extra costs such as distributed transactions and cross-shard aggregation queries, which is precisely where distributed capability is tested. In the sysbench results, the gap between partitioned and non-partitioned tables is huge: in write and mixed read-write scenarios, the partitioned table reaches only about 1/5 of the single-table result, so distributed transaction performance needs further improvement.
3. Excellent performance in the TPC-C scenario. In the TPC-H scenario, overall performance with parallel computing on row storage is good.
4. In the special scenarios, JSON is not supported, and hot row updates hit obvious bottlenecks.
### PolarDB-X
1. Non-partitioned tables (commonly understood as single tables): PolarDB-X supports distributing tables to different nodes through the locality mode; a given table's data lives on only one node, which tests pure single-machine capability. In pure-read and mixed read-write scenarios it holds a 2x to 2.5x performance advantage over TiDB.
2. Partitioned tables: in PolarDB-X, one table's data can be distributed across multiple machines, implemented in essentially the same way as the distributed tables of TiDB and OceanBase. Overall write-only performance is better than TiDB's; in the most common business scenario, mixed read-write, both partitioned and single tables perform far better than OceanBase's. Non-partitioned tables have a clear performance advantage over TiDB, and partitioned tables are roughly on par with TiDB.
3. Excellent performance in the TPC-C scenario. In the TPC-H scenario, overall performance with parallel computing on row storage is good.
4. In the special scenarios, fast ADD COLUMN DDL still needs optimization, JSON is supported, and the optimizations for hotspot updates are evident.
5. PolarDB-X has the best support for partition rule changes, covering essentially all common partition change strategies.
## Summary
1. PolarDB-X, OceanBase, and TiDB perform similarly in distributed horizontal scaling; the differences are not large.
2. TiDB has some good experimental features (such as plan cache and JSON) that help performance and usability considerably, but they are currently not recommended for production.
3. OceanBase's model is the more complicated one; testing it requires a full understanding of partitioned tables versus non-partitioned (single) tables. In non-partitioned (single-table) mode the performance is good, the focus is pure single-machine capability, and the results are acceptable, slightly lower than MySQL. In partitioned-table mode, however, performance drops considerably, so businesses need to distinguish between the two.
4. PolarDB-X has good functionality and ease of use; its support for JSON, large transactions, and hotspot updates is relatively complete. In non-partitioned (single-table) mode its pure MySQL single-machine capability is outstanding; in partitioned-table mode, performance can be scaled further through distributed capabilities, and its partition change strategies are the most complete. (End)