Author: Da Rui

Darui, DBA, likes to lose weight, lost more than 30 pounds, and is responsible for the management and maintenance of the company's database cluster.

Source of this article: original contribution

*The original content is produced by the open source community of Aikesheng, and the original content shall not be used without authorization. For reprinting, please contact the editor and indicate the source.


The materialized view is written to the storage table using the method of to , which is as follows:

 CREATE MATERIALIZED VIEW
[IF NOT EXISTS]
[db.]table_name
[ON CLUSTER] TO
[db.]nameAS
SELECT ...

The storage table is specified, so the creation of the materialized view does not need to specify the engine. In the query, the materialized view and the actual storage table get the same data, because they all come from the same storage data.

The materialized view calculates the data written to the original table each time, and writes it to the target table after aggregation. For example, if there is a detailed table recorded at a time of 1s, and at the same time, the aggregated statistics of data need to be aggregated at the minute level (similar needs), the aggregated data can be written to the 1min table by creating a materialized view (this kind of Feels a bit like a trigger)

example

Schedule of 1s records

 CREATE TABLE dba_test.t_1s
(
    `ctime` DateTime64(0),
    `pv` Int64
)
ENGINE = MergeTree
PARTITION BY toDate(ctime)
ORDER BY ctime
SETTINGS index_granularity = 8192

Aggregate data recorded in 1min

 CREATE TABLE dba_test.t_1m
(
    `ctime` DateTime64(0),
    `pv` Int64
)
ENGINE = SummingMergeTree
PARTITION BY toDate(ctime)
ORDER BY ctime
SETTINGS index_granularity = 8192

Materialized view t_1m_mv , the query condition is to aggregate the query results from the 1s table (t_1s) according to the minute level (toStartOfMinute), and rewrite them to the 1min table (t_1m)

materialized view

 CREATE MATERIALIZED VIEW dba_test.t_1m_mv TO dba_test.t_1m
(
    `toStartOfMinute(ctime)` DateTime,
    `pv` Int64
) AS
SELECT
    toStartOfMinute(ctime),
    sum(pv) AS pv
FROM dba_test.t_1s
GROUP BY ctime

write test

 dba-clickhouse-001 :) insert into t_1s values('2022-01-01 00:10:01',1),('2022-01-01 00:10:01',1),('2022-01-01 00:20:01',2),('2022-01-01 00:20:01',2),('2022-01-01 00:30:01',3);

INSERT INTO t_1s VALUES

Query id: 0bf16844-0123-4e25-a3d4-f9b5a5c8db37

Ok.

5 rows in set. Elapsed: 0.003 sec.

dba-clickhouse-001 :) select * from t_1s;

SELECT *
FROM t_1s

Query id: cb442100-37a6-4de7-b6f3-f80f084710dc

┌───────────────ctime─┬─pv─┐
│ 2022-01-01 00:10:01 │  1 │
│ 2022-01-01 00:10:01 │  1 │
│ 2022-01-01 00:20:01 │  2 │
│ 2022-01-01 00:20:01 │  2 │
│ 2022-01-01 00:30:01 │  3 │
└─────────────────────┴────┘

5 rows in set. Elapsed: 0.002 sec.

dba-clickhouse-001 :) select * from t_1m;

SELECT *
FROM t_1m

Query id: f9d2d05d-8ad7-44a4-b66a-ea8c3c758f1f

┌───────────────ctime─┬─pv─┐
│ 1970-01-01 08:00:00 │  9 │
└─────────────────────┴────┘

1 rows in set. Elapsed: 0.002 sec.

The insertion time turned out to be 1970-01-01 08:00:00

Start to verify whether the query statement is wrong

Check that the query results in the materialized view are as expected

 dba-clickhouse-001 :) SELECT
:-]     toStartOfMinute(ctime),
:-]     sum(pv) AS pv
:-] FROM dba_test.t_1s
:-] GROUP BY ctime;

SELECT
    toStartOfMinute(ctime),
    sum(pv) AS pv
FROM dba_test.t_1s
GROUP BY ctime

Query id: 1ecaf07e-c766-40b7-bfa2-0f87ee54abad

┌─toStartOfMinute(ctime)─┬─pv─┐
│    2022-01-01 00:20:00 │  4 │
│    2022-01-01 00:30:00 │  3 │
│    2022-01-01 00:10:00 │  2 │
└────────────────────────┴────┘

3 rows in set. Elapsed: 0.002 sec.

The query results are as expected

Directly through the insert ...select... method to confirm whether the inserted data is as expected

 dba-clickhouse-001 :) insert into t_1m SELECT
:-]     toStartOfMinute(ctime),
:-]     sum(pv) AS pv
:-] FROM dba_test.t_1s
:-] GROUP BY ctime;

INSERT INTO t_1m SELECT
    toStartOfMinute(ctime),
    sum(pv) AS pv
FROM dba_test.t_1s
GROUP BY ctime

Query id: 5db8279a-ffb1-4174-843c-80cee48b448c

Ok.

0 rows in set. Elapsed: 0.002 sec.

dba-clickhouse-001 :) select * from t_1m;

SELECT *
FROM t_1m

Query id: acd79ea7-dc82-49f1-bb71-430a05895f19

┌───────────────ctime─┬─pv─┐
│ 1970-01-01 08:00:00 │  9 │
└─────────────────────┴────┘
┌───────────────ctime─┬─pv─┐
│ 2022-01-01 00:10:00 │  2 │
│ 2022-01-01 00:20:00 │  4 │
│ 2022-01-01 00:30:00 │  3 │
└─────────────────────┴────┘

4 rows in set. Elapsed: 0.002 sec.

Directly inserted, the data is correct, the time is not converted.

It can be confirmed that there is no problem with the query part of the materialized view, it can only be a problem when writing The time is 1970-01-01 08:00:00 more than 8 hours, which means that the time is delayed due to the time zone. Could it be because the inserted data is not standardized, or the "empty" has been converted.

verify

 dba-clickhouse-001 :) insert into t_1m values('',100);

INSERT INTO t_1m VALUES

Query id: af1785ef-dca1-467b-84c6-27f9da6547f6

Ok.

1 rows in set. Elapsed: 0.002 sec.

dba-clickhouse-001 :) select * from t_1m;

SELECT *
FROM t_1m

Query id: 34db2057-7274-4859-898e-6132f8df4465

┌───────────────ctime─┬─pv─┐
│ 1970-01-01 08:00:00 │  9 │
└─────────────────────┴────┘
┌───────────────ctime─┬─pv─┐
│ 2022-01-01 00:10:00 │  2 │
│ 2022-01-01 00:20:00 │  4 │
│ 2022-01-01 00:30:00 │  3 │
└─────────────────────┴────┘
┌───────────────ctime─┬──pv─┐
│ 1970-01-01 08:00:00 │ 100 │
└─────────────────────┴─────┘

5 rows in set. Elapsed: 0.002 sec.

Sure enough, when the inserted data is empty, the time is reset.

Compare the structure of the materialized view and the aggregate table of the target

 dba-clickhouse-001 :) desc t_1m;

DESCRIBE TABLE  t_1m

Query id: 96c6a5ca-e42a-47e1-8212-cbcfefa6ffa4

┌─name──┬─type──────────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
│ ctime │ DateTime64(0) │              │                    │         │                  │                │
│ pv    │ Int64         │              │                    │         │                  │                │
└───────┴───────────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘

2 rows in set. Elapsed: 0.001 sec.

dba-clickhouse-001 :) desc t_1m_mv;

DESCRIBE TABLE  t_1m_mv

Query id: a258f6b5-f195-4386-a9bb-4ec86e7e9bd1

┌─name───────────────────┬─type─────┬─default_type─┬─default_expression─┬─comment─┬─codec_expression─┬─ttl_expression─┐
│ toStartOfMinute(ctime) │ DateTime │              │                    │         │                  │                │
│ pv                     │ Int64    │              │                    │         │                  │                │
└────────────────────────┴──────────┴──────────────┴────────────────────┴─────────┴──────────────────┴────────────────┘

2 rows in set. Elapsed: 0.001 sec.

The time field of the aggregate table is called ctime , and the materialized view is toStartOfMinute(ctime)

Readjust the writing of the materialized view and clean up the data in the t_1m table

 dba-clickhouse-001 :) show create table t_1m_mv\G
statement: CREATE MATERIALIZED VIEW dba_test.t_1m_mv TO dba_test.t_1m
(
    `ctime` DateTime,
    `pv` Int64
) AS
SELECT
    toStartOfTenMinutes(ctime) AS ctime,
    sum(pv) AS pv
FROM dba_test.t_1s
GROUP BY ctime

dba-clickhouse-001 :) insert into t_1s values('2022-01-01 00:10:01',1),('2022-01-01 00:10:01',1),('2022-01-01 00:20:01',2),('2022-01-01 00:20:01',2),('2022-01-01 00:30:01',3);

INSERT INTO t_1s VALUES

Query id: 812d1bbd-55f3-4a8f-b9f7-bbbe93e694af

Ok.

5 rows in set. Elapsed: 0.003 sec.

dba-clickhouse-001 :) select * from t_1m;

SELECT *
FROM t_1m

Query id: 2d1a045a-4e53-4f94-bb6a-fe5e5d58f5c7

┌───────────────ctime─┬─pv─┐
│ 2022-01-01 00:10:00 │  2 │
│ 2022-01-01 00:20:00 │  4 │
│ 2022-01-01 00:30:00 │  3 │
└─────────────────────┴────┘

3 rows in set. Elapsed: 0.002 sec.

in conclusion

The field (t_1m_mv) of the materialized view should be aligned with the field name of the target table (t_1m)

(The expression is not very strict, probably the meaning above)

爱可生开源社区
426 声望211 粉丝

成立于 2017 年,以开源高质量的运维工具、日常分享技术干货内容、持续的全国性的社区活动为社区己任;目前开源的产品有:SQL审核工具 SQLE,分布式中间件 DBLE、数据传输组件DTLE。