前言
- 本文测试环境
ClickHouse
版本为22.3 LTS
SummingMergeTree
官方文档:https://clickhouse.com/docs/e...AggregatingMergeTree
官方文档:https://clickhouse.com/docs/e...- 原理就是预聚合,可以把统计时扫表条数减少几个数量级。有点
flink
流式计算的味道。
初始化
测试数据生成
创建库
CREATE DATABASE qbit;
创建表
CREATE TABLE download ( userid UInt32, when DateTime, size Float32 ) ENGINE = MergeTree PARTITION BY toYYYYMM(when) ORDER BY(userid, when);
插入数据
INSERT INTO qbit.download SELECT rand() % 5, now() + number * 60 as when, rand() % 100000000 FROM system.numbers LIMIT 10000;
查看数据
SELECT * FROM qbit.download LIMIT 10;
查看每天的下载量(扫描 10000 条数据)
SELECT userid, toYYYYMMDD(when) AS day, count() AS cnt, sum(size) AS size FROM qbit.download GROUP BY userid, day ORDER BY cnt DESC LIMIT 10
SummingMergeTree 准实时看板
方式一(view1)
创建视图
view1
CREATE MATERIALIZED VIEW qbit.view1 ENGINE = SummingMergeTree((cnt, size)) ORDER BY (userid, day) POPULATE AS SELECT userid, toYYYYMMDD(when) AS day, toInt64(1) AS cnt, toFloat64(size) AS size FROM qbit.download
查询视图
view1
(可以看到跟前面的查询结果是一致的)SELECT * FROM qbit.view1 ORDER BY cnt DESC LIMIT 10
方式二(view2)
创建视图
view2
CREATE MATERIALIZED VIEW qbit.view2 ENGINE = SummingMergeTree PARTITION BY day ORDER BY (userid, day) POPULATE AS SELECT userid, toYYYYMMDD(when) AS day, count() AS cnt, sum(size) AS size FROM qbit.download GROUP BY userid, day
查询视图
view2
(可以看到跟前面的查询结果是一致的)SELECT * FROM qbit.view2 ORDER BY cnt DESC LIMIT 10
AggregatingMergeTree 准实时看板
创建视图(view3)
CREATE MATERIALIZED VIEW qbit.view3 ENGINE = AggregatingMergeTree PARTITION BY day ORDER BY (userid, day) POPULATE AS SELECT userid, toYYYYMMDD(when) AS day, countState() AS cnt, sumState(size) AS size FROM qbit.download GROUP BY userid, day
查询视图
view3
(可以看到跟前面的查询结果是一致的)SELECT userid, day, countMerge(cnt) AS cnt, sumMerge(size) AS size FROM qbit.view3 GROUP BY userid, day ORDER BY cnt DESC LIMIT 10
插入更新数据
再插入 20000 条数据
INSERT INTO qbit.download SELECT rand() % 5, now() + (number * 60) AS when, rand() % 100000000 FROM system.numbers LIMIT 20000
- 查看底表数据量
直接查询底表统计
SELECT userid, toYYYYMMDD(when) AS day, count() AS cnt, sum(size) AS size FROM qbit.download GROUP BY userid, day ORDER BY cnt DESC LIMIT 10
查询
view1
统计SELECT * FROM qbit.view1 ORDER BY cnt DESC LIMIT 10
查询结果与直接查询底表不一致,
OPTIMIZE
后与底表查询结果一致OPTIMIZE TABLE qbit.view1 FINAL
如果不想强制合并汇总,也可以改变查询语句
SELECT userid, day, sum(cnt) AS cnt, sum(size) AS size FROM qbit.view1 GROUP BY userid, day ORDER BY cnt DESC LIMIT 10
可以直接
select count
验证,即使没有合并汇总,视图的数据条数也会比底表的数据条数少很多查询
view2
统计SELECT * FROM qbit.view2 ORDER BY cnt DESC LIMIT 10
查询结果与直接查询底表不一致,
OPTIMIZE
后与底表查询结果一致OPTIMIZE TABLE qbit.view2 FINAL
如果不想强制合并汇总,也可以改变查询语句
SELECT userid, day, sum(cnt) AS cnt, sum(size) AS size FROM qbit.view2 GROUP BY userid, day ORDER BY cnt DESC LIMIT 10
查询
view3
统计。AggregatingMergeTree
本身用了group by
,所以不用OPTIMIZE
可以得到最终结果。SELECT userid, day, countMerge(cnt) AS cnt, sumMerge(size) AS size FROM qbit.view3 GROUP BY userid, day ORDER BY cnt DESC LIMIT 10
查看视图
查看数据库
mylog
有哪些视图SELECT database, name FROM system.tables WHERE (engine = 'View') AND (database = 'mylog')
查看视图
mylog.log_view
的创建语句SHOW CREATE TABLE mylog.log_view
参考文献
- ClickHouse Materialized Views Illuminated, Part 1
- ClickHouse 通过 SummingMergeTree 优化查询
- AggregatingMergeTree 等价替换 SummingMergeTree
本文出自 qbit snap
**粗体** _斜体_ [链接](http://example.com) `代码` - 列表 > 引用
。你还可以使用@
来通知其他用户。