Author: Que Xiuping
A test member of the dble team of Aikesen, mainly responsible for dble requirements testing, automation writing and community question answering. Passionate about rua Yongzheng.
Source of this article: original contribution
*The original content is produced by the open source community of Aikesheng, and the original content shall not be used without authorization. For reprinting, please contact the editor and indicate the source.
1. Functional Background
We know that dble is a highly scalable distributed middleware based on MySQL, and MySQL is good at online transaction processing (OLTP), so in the face of more and more online analysis (OLAP) requirements, MySQL seems a little stretched . In order to provide good online analysis (OLAP) capabilities, dble provides a solution in version 3.22.01, which can support ClickHouse in the back-end nodes, and provide powerful online analysis (OLAP) services with the help of ClickHouse's capabilities.
So what is ClickHouse? Why choose ClickHouse? ClickHouse is a columnar database management system (DBMS) for online analytics (OLAP). For most queries, columnar databases process at least 100 times faster than row-based database MySQL.
How is dble implemented? Here's a simple operation.
2. Environmental preparation
Deploy four docker containers on the server:
1. Deploy two MySQL databases of version 5.7.25.
2. Deploy the dble environment of version 3.22.01. For installation, please refer to: [0.3 Quick Start · dble manual] ( https://actiontech.github.io/dble-docs-cn/0.overview/0.3_dble_quick_start.html )
3. Deploy the ClickHouse database of version 22.6.1. For installation, please refer to: [Install | ClickHouse Docs] ( https://clickhouse.com/docs/zh/getting-started/install/ ) to verify that ClickHosue is available.
# clickhouse-client -u test --password password -h 127.0.0.1
ClickHouse client version 22.6.1.1985 (official build).
Connecting to 127.0.0.1:9000 as user test.
Connected to ClickHouse server version 22.6.1 revision 54455.
clickhouse-1 :) show databases;
SHOW DATABASES
Query id: b2a417e7-7a76-4461-896d-961540eb60a1
┌─name─────────┐
│ INFORMATION_SCHEMA │
│ default │
│ information_schema │
│ system │
└────────────┘
4 rows in set. Elapsed: 0.003 sec.
3. dble+ClickHouse
1. Enter {install_dir}/dble/conf/user.xml and add user analysisUser.
<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE dble:user SYSTEM "user.dtd">
<dble:user xmlns:dble="http://dble.cloud/">
<managerUser name="root" password="password"/>
<analysisUser name="ana1" password="password" dbGroup="ha_group3" />
</dble:user>
user.xml configuration note:
(1). An analysisUser corresponds to only one dbGroup.
(2). Multiple analysisUsers can refer to the same dbGroup.
2. Enter {install_dir}/dble/conf/db.xml and add ha_group3
<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE dble:db SYSTEM "db.dtd">
<dble:db xmlns:dble="http://dble.cloud/">
<dbGroup rwSplitMode="0" name="ha_group3" delayThreshold="100" >
<heartbeat>select user()</heartbeat>
<dbInstance name="hostM3" password="password" url="172.100.9.13:9004" user="test" maxCon="1000" minCon="10" primary="true" databaseType="clickhouse"/>
</dbGroup>
</dble:db>
db.xml configuration note:
(1). The heartbeat statement in heartbeat needs to be the syntax supported by ClickHouse.
(2). The database configured in the dbgroup referenced by the analysisUser user must be the same as the databaseType.
(3). The port of ClickHosue needs to be configured with 9004, and 9004 provides the MySQL protocol.
3. The dble is started successfully, log in with the analysisUser user.
# mysql -h127.0.0.1 -uana1 -ppassword -P8066
#下发show databases语句返回的结果可知,dble未使用分库分表功能,仅展示后端 clickhouse 的数据库信息
mysql> show databases;
+--------------------+
| name |
+--------------------+
| INFORMATION_SCHEMA |
| default |
| information_schema |
| system |
+--------------------+
4 rows in set (0.00 sec)
#进入default库
mysql> use default;
Database changed, 1 warning
#建表
mysql> CREATE TABLE t1 (x String) ENGINE = TinyLog;;
Query OK, 1 row affected (0.03 sec)
#检查表是否正确
mysql> desc t1;
+------+--------+--------------+--------------------+---------+------------------+----------------+
| name | type | default_type | default_expression | comment | codec_expression | ttl_expression |
+------+--------+--------------+--------------------+---------+------------------+----------------+
| x | String | | | | | |
+------+--------+--------------+--------------------+---------+------------------+----------------+
1 row in set (0.00 sec)
#插入数据
mysql> INSERT INTO t1 VALUES(1);
Query OK, 1 row affected (0.00 sec)
#检查数据是否正确插入
mysql> SELECT x, toTypeName(x) FROM t1;
+------+---------------+
| x | toTypeName(x) |
+------+---------------+
| 1 | String |
+------+---------------+
1 row in set (0.00 sec)
Enter ClickHouse to verify.
# clickhouse-client -u test --password password -h 127.0.0.1
#进入默认库
clickhouse-1 :) use default;
USE default
Query id: eac19b96-6da6-4d77-8258-e5a827c31685
Ok.
0 rows in set. Elapsed: 0.002 sec.
# 查看表是否建立成功
clickhouse-1 :) show tables;
SHOW TABLES
Query id: 02f5018d-4b7b-4348-be5a-89fdcdbc3aa5
┌─name─┐
│ t1 │
└──────┘
1 row in set. Elapsed: 0.003 sec.
#查看表是否正确
clickhouse-1 :) desc t1;
DESCRIBE TABLE t1
Query id: 6721fa63-c52c-4236-8c4a-27a1ffdcd059
┌─name─┬─type───┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
│ x │ String │ │ │ │ │ │
└──────┴────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘
1 row in set. Elapsed: 0.002 sec.
# 查看数据是否和dble查询的一致
clickhouse-1 :) SELECT x, toTypeName(x) FROM t1;
SELECT
x,
toTypeName(x)
FROM t1
Query id: 548ea88e-b1a1-4362-912d-ffa770c7c1be
┌─x─┬─toTypeName(x)─┐
│ 1 │ String │
└───┴───────────────┘
2 rows in set. Elapsed: 0.002 sec.
4. Import 4.4G data in ClickHouse and compare the query results of ClickHouse and dble.
#clickhouse容器安装路径下 下载数据,网速好的话五分钟左右可下载完成
wget http://prod.publicdata.landregistry.gov.uk.s3-website-eu-west-1.amazonaws.com/pp-complete.csv
#clickhouse-client -u test --password password -h 127.0.0.1,登录默认库建表
CREATE TABLE uk_price_paid
(
price UInt32,
date Date,
postcode1 LowCardinality(String),
postcode2 LowCardinality(String),
type Enum8('terraced' = 1, 'semi-detached' = 2, 'detached' = 3, 'flat' = 4, 'other' = 0),
is_new UInt8,
duration Enum8('freehold' = 1, 'leasehold' = 2, 'unknown' = 0),
addr1 String,
addr2 String,
street LowCardinality(String),
locality LowCardinality(String),
town LowCardinality(String),
district LowCardinality(String),
county LowCardinality(String),
category UInt8
) ENGINE = MergeTree ORDER BY (postcode1, postcode2, addr1, addr2);
#clickhouse容器安装路径下 导入数据,可能耗时40s左右
clickhouse-local --input-format CSV --structure '
uuid String,
price UInt32,
time DateTime,
postcode String,
a String,
b String,
c String,
addr1 String,
addr2 String,
street String,
locality String,
town String,
district String,
county String,
d String,
e String
' --query "
WITH splitByChar(' ', postcode) AS p
SELECT
price,
toDate(time) AS date,
p[1] AS postcode1,
p[2] AS postcode2,
transform(a, ['T', 'S', 'D', 'F', 'O'], ['terraced', 'semi-detached', 'detached', 'flat', 'other']) AS type,
b = 'Y' AS is_new,
transform(c, ['F', 'L', 'U'], ['freehold', 'leasehold', 'unknown']) AS duration,
addr1,
addr2,
street,
locality,
town,
district,
county,
d = 'B' AS category
FROM table" --date_time_input_format best_effort < pp-complete.csv | clickhouse-client --query "INSERT INTO uk_price_paid FORMAT TSV"
Log in to dble and ClickHouse to see comparative data.
#ClickHouse侧结果
clickhouse-1 :) use default;
0 rows in set. Elapsed: 0.001 sec.
clickhouse-1 :) SELECT count() FROM uk_price_paid;
┌──count()─┐
│ 27176256 │
└──────────┘
1 row in set. Elapsed: 0.003 sec.
clickhouse-1 :) SELECT formatReadableSize(total_bytes) FROM system.tables WHERE name = 'uk_price_paid';
┌─formatReadableSize(total_bytes)─┐
│ 235.39 MiB │
└─────────────────────────────────┘
1 row in set. Elapsed: 0.003 sec.
clickhouse-1 :) SELECT toYear(date) AS year, round(avg(price)) AS price, bar(price, 0, 1000000, 80) FROM uk_price_paid GROUP BY year ORDER BY year;
┌─year─┬──price─┬─bar(round(avg(price)), 0, 1000000, 80)─┐
│ 1995 │ 67933 │ █████▍ │
│ 1996 │ 71507 │ █████▋ │
│ 1997 │ 78536 │ ██████▎ │
│ 1998 │ 85439 │ ██████▋ │
│ 1999 │ 96038 │ ███████▋ │
│ 2000 │ 107486 │ ████████▌ │
│ 2001 │ 118888 │ █████████▌ │
│ 2002 │ 137945 │ ███████████ │
│ 2003 │ 155893 │ ████████████▍ │
│ 2004 │ 178887 │ ██████████████▎ │
│ 2005 │ 189356 │ ███████████████▏ │
│ 2006 │ 203530 │ ████████████████▎ │
│ 2007 │ 219379 │ █████████████████▌ │
│ 2008 │ 217054 │ █████████████████▎ │
│ 2009 │ 213418 │ █████████████████ │
│ 2010 │ 236107 │ ██████████████████▊ │
│ 2011 │ 232803 │ ██████████████████▌ │
│ 2012 │ 238381 │ ███████████████████ │
│ 2013 │ 256923 │ ████████████████████▌ │
│ 2014 │ 279984 │ ██████████████████████▍ │
│ 2015 │ 297263 │ ███████████████████████▋ │
│ 2016 │ 313470 │ █████████████████████████ │
│ 2017 │ 346297 │ ███████████████████████████▋ │
│ 2018 │ 350486 │ ████████████████████████████ │
│ 2019 │ 351985 │ ████████████████████████████▏ │
│ 2020 │ 375697 │ ██████████████████████████████ │
│ 2021 │ 379729 │ ██████████████████████████████▍ │
│ 2022 │ 370402 │ █████████████████████████████▋ │
└──────┴────────┴────────────────────────────────────────┘
28 rows in set. Elapsed: 0.064 sec. Processed 27.18 million rows, 163.06 MB (423.68 million rows/s., 2.54 GB/s.)
#dble侧结果
mysql> SELECT count() FROM uk_price_paid;
+----------+
| count() |
+----------+
| 27176256 |
+----------+
1 row in set (0.00 sec)
mysql>
mysql> SELECT count() FROM uk_price_paid;
+----------+
| count() |
+----------+
| 27176256 |
+----------+
1 row in set (0.00 sec)
mysql> SELECT toYear(date) AS year, round(avg(price)) AS price, bar(price, 0, 1000000, 80) FROM uk_price_paid GROUP BY year ORDER BY year;
+------+--------+-----------------------------------------------------------------------------------------------+
| year | price | bar(round(avg(price)), 0, 1000000, 80) |
+------+--------+-----------------------------------------------------------------------------------------------+
| 1995 | 67933 | █████▍ |
| 1996 | 71507 | █████▋ |
| 1997 | 78536 | ██████▎ |
| 1998 | 85439 | ██████▋ |
| 1999 | 96038 | ███████▋ |
| 2000 | 107486 | ████████▌ |
| 2001 | 118888 | █████████▌ |
| 2002 | 137945 | ███████████ |
| 2003 | 155893 | ████████████▍ |
| 2004 | 178887 | ██████████████▎ |
| 2005 | 189356 | ███████████████▏ |
| 2006 | 203530 | ████████████████▎ |
| 2007 | 219379 | █████████████████▌ |
| 2008 | 217054 | █████████████████▎ |
| 2009 | 213418 | █████████████████ |
| 2010 | 236107 | ██████████████████▊ |
| 2011 | 232803 | ██████████████████▌ |
| 2012 | 238381 | ███████████████████ |
| 2013 | 256923 | ████████████████████▌ |
| 2014 | 279984 | ██████████████████████▍ |
| 2015 | 297263 | ███████████████████████▋ |
| 2016 | 313470 | █████████████████████████ |
| 2017 | 346297 | ███████████████████████████▋ |
| 2018 | 350486 | ████████████████████████████ |
| 2019 | 351985 | ████████████████████████████▏ |
| 2020 | 375697 | ██████████████████████████████ |
| 2021 | 379729 | ██████████████████████████████▍ |
| 2022 | 370402 | █████████████████████████████▋ |
+------+--------+-----------------------------------------------------------------------------------------------+
28 rows in set (0.07 sec)
Fourth, dble+MySQL+ClickHouse
1. Modify the configuration of user.xml
<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE dble:user SYSTEM "user.dtd">
<dble:user xmlns:dble="http://dble.cloud/">
<managerUser name="root" password="password"/>
<shardingUser name="test" password="password" schemas="schema1"/>
<analysisUser name="ana1" password="password" dbGroup="ha_group3" />
</dble:user
2. Modify the configuration of db.xml
<!DOCTYPE dble:db SYSTEM "db.dtd">
<dble:db xmlns:dble="http://dble.cloud/">
<dbGroup rwSplitMode="0" name="ha_group1" delayThreshold="100" >
<heartbeat>select user()</heartbeat>
<dbInstance name="hostM1" password="password" url="172.100.9.5:3307" user="test" maxCon="1000" minCon="10" primary="true">
</dbInstance>
</dbGroup>
<dbGroup rwSplitMode="0" name="ha_group2" delayThreshold="100" >
<heartbeat>select user()</heartbeat>
<dbInstance name="hostM2" password="password" url="172.100.9.6:3307" user="test" maxCon="1000" minCon="10" primary="true">
</dbInstance>
</dbGroup>
<dbGroup rwSplitMode="0" name="ha_group3" delayThreshold="100" >
<heartbeat>select user()</heartbeat>
<dbInstance name="hostM3" password="password" url="172.100.9.13:9004" user="test" maxCon="1000" minCon="10" primary="true" databaseType="clickhouse"/>
</dbGroup>
</dble:db>
db.xml configuration note
(1). The parameter value of databaseType in the dbgroup referenced by the shardingUser user can only be MySQL. When databaseType is not set, the default is MySQL.
(2). The dbGroup referenced by shardingUser needs to be referenced by the shardingNode in the sharding.xml corresponding to the configured schemas
3. Modify the sharding.xml configuration
<?xml version='1.0' encoding='utf-8'?>
<!DOCTYPE dble:sharding SYSTEM "sharding.dtd">
<dble:sharding xmlns:dble="http://dble.cloud/">
<schema shardingNode="dn1" name="schema1" sqlMaxLimit="100">
<shardingTable name="sharding_4_t1" shardingNode="dn1,dn2,dn3,dn4" function="hash-four" shardingColumn="id"/>
</schema>
<shardingNode dbGroup="ha_group1" database="db1" name="dn1" />
<shardingNode dbGroup="ha_group2" database="db1" name="dn2" />
<shardingNode dbGroup="ha_group1" database="db2" name="dn3" />
<shardingNode dbGroup="ha_group2" database="db2" name="dn4" />
<function class="Hash" name="hash-four">
<property name="partitionCount">4</property>
<property name="partitionLength">1</property>
</function>
</dble:sharding>
4. The dble is successfully started, and the shardingUser user and the analysisUser user are used to log in respectively.
#mysql -h127.0.0.1 -utest -ppassword -P8066 分库分表用户
mysql> show databases;
+----------+
| DATABASE |
+----------+
| schema1 |
+----------+
1 row in set (0.01 sec)
#mysql -h127.0.0.1 -uana1 -ppassword -P8066 analysisUser用户
mysql> show databases;
+--------------------+
| name |
+--------------------+
| INFORMATION_SCHEMA |
| default |
| information_schema |
| system |
+--------------------+
4 rows in set (0.00 sec)
One possible business architecture:
1. OLTP: The client will send a request to dble, and dble (shardingUser, rwSplitUser) will send the statement to MySQL, and then return the request.
2. Replication: The MySQL data in the dble backend will be synchronized to the MySQL summary database. In order to use ClickHouse to analyze the data, use the tool to synchronize the data of the MySQL summary database to ClickHouse.
3. OLAP: The client will send a request to dble, and dble (analysisUser) will send the statement to ClickHouse, and then return the request. Realize data analysis of dble+MySQL+ClickHouse.
**粗体** _斜体_ [链接](http://example.com) `代码` - 列表 > 引用
。你还可以使用@
来通知其他用户。