foreword

initialization

Test data generation

  • Create library

     CREATE DATABASE qbit;
  • create table

     CREATE TABLE download (
      userid UInt32,
      when DateTime,
      size Float32
    ) ENGINE = MergeTree 
    PARTITION BY toYYYYMM(when)
    ORDER BY(userid, when);
  • insert data

     INSERT INTO qbit.download
    SELECT
      rand() % 5,
      now() + number * 60 as when,
      rand() % 100000000
    FROM system.numbers
    LIMIT 10000;
  • View data

     SELECT * FROM qbit.download LIMIT 10;

    image.png

  • View daily downloads ( scan 10,000 data )

     SELECT
      userid,
      toYYYYMMDD(when) AS day,
      count() AS cnt,
      sum(size) AS size
    FROM qbit.download
    GROUP BY
      userid,
      day
    ORDER BY cnt DESC
    LIMIT 10

    image.png

SummingMergeTree Quasi-real-time Kanban

Method 1 (view1)

  • Create View view1

     CREATE MATERIALIZED VIEW qbit.view1
    ENGINE = SummingMergeTree((cnt, size))
    ORDER BY (userid, day) POPULATE AS
    SELECT
      userid,
      toYYYYMMDD(when) AS day,
      toInt64(1) AS cnt,
      toFloat64(size) AS size
    FROM qbit.download

    image.png

  • Query view view1 (you can see that the results are consistent with the previous query)

     SELECT *
    FROM qbit.view1
    ORDER BY cnt DESC
    LIMIT 10

    image.png

Method 2 (view2)

  • Create View view2

     CREATE MATERIALIZED VIEW qbit.view2
    ENGINE = SummingMergeTree
    PARTITION BY day
    ORDER BY (userid, day) POPULATE AS
    SELECT
      userid,
      toYYYYMMDD(when) AS day,
      count() AS cnt,
      sum(size) AS size
    FROM qbit.download
    GROUP BY
      userid,
      day

    image.png

  • Query view view2 (You can see that the results are consistent with the previous query)

     SELECT *
    FROM qbit.view2
    ORDER BY cnt DESC
    LIMIT 10

    image.png

AggregatingMergeTree Quasi-real-time Kanban

  • Create a view (view3)

     CREATE MATERIALIZED VIEW qbit.view3
    ENGINE = AggregatingMergeTree
    PARTITION BY day
    ORDER BY (userid, day) POPULATE AS
    SELECT
      userid,
      toYYYYMMDD(when) AS day,
      countState() AS cnt,
      sumState(size) AS size
    FROM qbit.download
    GROUP BY
      userid,
      day

    image.png

  • Query view view3 (you can see that the results are consistent with the previous query)

     SELECT
      userid,
      day,
      countMerge(cnt) AS cnt,
      sumMerge(size) AS size
    FROM qbit.view3
    GROUP BY
      userid,
      day
    ORDER BY cnt DESC
    LIMIT 10

    image.png

insert update data

  • Insert another 20000 pieces of data

     INSERT INTO qbit.download SELECT
      rand() % 5,
      now() + (number * 60) AS when,
      rand() % 100000000
    FROM system.numbers
    LIMIT 20000
  • View the data volume of the bottom table
    image.png
  • Directly query the bottom table statistics

     SELECT
      userid,
      toYYYYMMDD(when) AS day,
      count() AS cnt,
      sum(size) AS size
    FROM qbit.download
    GROUP BY
      userid,
      day
    ORDER BY cnt DESC
    LIMIT 10

    image.png

  • query view1 statistics

     SELECT *
    FROM qbit.view1
    ORDER BY cnt DESC
    LIMIT 10

    The query result is inconsistent with the direct query bottom table, OPTIMIZE is consistent with the bottom table query result

     OPTIMIZE TABLE qbit.view1 FINAL

    If you don't want to force the merge summary, you can also change the query statement

     SELECT
      userid,
      day,
      sum(cnt) AS cnt,
      sum(size) AS size
    FROM qbit.view1
    GROUP BY
      userid,
      day
    ORDER BY cnt DESC
    LIMIT 10

    You can directly select count to verify that even if there is no merge summary, the number of data in the view will be much less than the number of data in the bottom table
    image.png

  • query view2 statistics

     SELECT *
    FROM qbit.view2
    ORDER BY cnt DESC
    LIMIT 10

    The query result is inconsistent with the direct query bottom table, OPTIMIZE is consistent with the bottom table query result

     OPTIMIZE TABLE qbit.view2 FINAL

    If you don't want to force the merge summary, you can also change the query statement

     SELECT
      userid,
      day,
      sum(cnt) AS cnt,
      sum(size) AS size
    FROM qbit.view2
    GROUP BY
      userid,
      day
    ORDER BY cnt DESC
    LIMIT 10
  • Query view3 Statistics. AggregatingMergeTree itself uses group by , so don't need OPTIMIZE to get the final result.

     SELECT
      userid,
      day,
      countMerge(cnt) AS cnt,
      sumMerge(size) AS size
    FROM qbit.view3
    GROUP BY
      userid,
      day
    ORDER BY cnt DESC
    LIMIT 10

    image.png

View view

  • View database mylog what views

     SELECT
      database,
      name
    FROM system.tables
    WHERE (engine = 'View') AND (database = 'mylog')
  • View the creation statement of the view mylog.log_view

     SHOW CREATE TABLE mylog.log_view

references

This article is from qbit snap

qbit
268 声望279 粉丝