foreword
- The test environment of this article
ClickHouse
version is22.3 LTS
-
SummingMergeTree
Official documentation: https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/summingmergetree/ -
AggregatingMergeTree
Official Documentation: https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/aggregatingmergetree - The principle is pre-aggregation, which can reduce the number of table scans by several orders of magnitude during statistics. A bit
flink
flow computing flavor.
initialization
Test data generation
Create library
CREATE DATABASE qbit;
create table
CREATE TABLE download ( userid UInt32, when DateTime, size Float32 ) ENGINE = MergeTree PARTITION BY toYYYYMM(when) ORDER BY(userid, when);
insert data
INSERT INTO qbit.download SELECT rand() % 5, now() + number * 60 as when, rand() % 100000000 FROM system.numbers LIMIT 10000;
View data
SELECT * FROM qbit.download LIMIT 10;
View daily downloads ( scan 10,000 data )
SELECT userid, toYYYYMMDD(when) AS day, count() AS cnt, sum(size) AS size FROM qbit.download GROUP BY userid, day ORDER BY cnt DESC LIMIT 10
SummingMergeTree Quasi-real-time Kanban
Method 1 (view1)
Create View
view1
CREATE MATERIALIZED VIEW qbit.view1 ENGINE = SummingMergeTree((cnt, size)) ORDER BY (userid, day) POPULATE AS SELECT userid, toYYYYMMDD(when) AS day, toInt64(1) AS cnt, toFloat64(size) AS size FROM qbit.download
Query view
view1
(you can see that the results are consistent with the previous query)SELECT * FROM qbit.view1 ORDER BY cnt DESC LIMIT 10
Method 2 (view2)
Create View
view2
CREATE MATERIALIZED VIEW qbit.view2 ENGINE = SummingMergeTree PARTITION BY day ORDER BY (userid, day) POPULATE AS SELECT userid, toYYYYMMDD(when) AS day, count() AS cnt, sum(size) AS size FROM qbit.download GROUP BY userid, day
Query view
view2
(You can see that the results are consistent with the previous query)SELECT * FROM qbit.view2 ORDER BY cnt DESC LIMIT 10
AggregatingMergeTree Quasi-real-time Kanban
Create a view (view3)
CREATE MATERIALIZED VIEW qbit.view3 ENGINE = AggregatingMergeTree PARTITION BY day ORDER BY (userid, day) POPULATE AS SELECT userid, toYYYYMMDD(when) AS day, countState() AS cnt, sumState(size) AS size FROM qbit.download GROUP BY userid, day
Query view
view3
(you can see that the results are consistent with the previous query)SELECT userid, day, countMerge(cnt) AS cnt, sumMerge(size) AS size FROM qbit.view3 GROUP BY userid, day ORDER BY cnt DESC LIMIT 10
insert update data
Insert another 20000 pieces of data
INSERT INTO qbit.download SELECT rand() % 5, now() + (number * 60) AS when, rand() % 100000000 FROM system.numbers LIMIT 20000
- View the data volume of the bottom table
Directly query the bottom table statistics
SELECT userid, toYYYYMMDD(when) AS day, count() AS cnt, sum(size) AS size FROM qbit.download GROUP BY userid, day ORDER BY cnt DESC LIMIT 10
query
view1
statisticsSELECT * FROM qbit.view1 ORDER BY cnt DESC LIMIT 10
The query result is inconsistent with the direct query bottom table,
OPTIMIZE
is consistent with the bottom table query resultOPTIMIZE TABLE qbit.view1 FINAL
If you don't want to force the merge summary, you can also change the query statement
SELECT userid, day, sum(cnt) AS cnt, sum(size) AS size FROM qbit.view1 GROUP BY userid, day ORDER BY cnt DESC LIMIT 10
You can directly
select count
to verify that even if there is no merge summary, the number of data in the view will be much less than the number of data in the bottom tablequery
view2
statisticsSELECT * FROM qbit.view2 ORDER BY cnt DESC LIMIT 10
The query result is inconsistent with the direct query bottom table,
OPTIMIZE
is consistent with the bottom table query resultOPTIMIZE TABLE qbit.view2 FINAL
If you don't want to force the merge summary, you can also change the query statement
SELECT userid, day, sum(cnt) AS cnt, sum(size) AS size FROM qbit.view2 GROUP BY userid, day ORDER BY cnt DESC LIMIT 10
Query
view3
Statistics.AggregatingMergeTree
itself usesgroup by
, so don't needOPTIMIZE
to get the final result.SELECT userid, day, countMerge(cnt) AS cnt, sumMerge(size) AS size FROM qbit.view3 GROUP BY userid, day ORDER BY cnt DESC LIMIT 10
View view
View database
mylog
what viewsSELECT database, name FROM system.tables WHERE (engine = 'View') AND (database = 'mylog')
View the creation statement of the view
mylog.log_view
SHOW CREATE TABLE mylog.log_view
references
- ClickHouse Materialized Views Illuminated, Part 1
- ClickHouse optimizes queries with SummingMergeTree
- AggregatingMergeTree is equivalent to replacing SummingMergeTree
This article is from qbit snap
**粗体** _斜体_ [链接](http://example.com) `代码` - 列表 > 引用
。你还可以使用@
来通知其他用户。