人工智能 - Cloud Lesson｜Introduction to ClickHouse component of MRS basic principles - 华为云开发者之家

Reading Fengyun is a master of Huawei cloud information. He is good at diversifying the presentation of complex information. There is always a picture (Yuntushuo), a simple blog post (cloud small lesson) or short video (cloud video hall) that it produces. You quickly get started with HUAWEI CLOUD. For more exciting content, please click here

Abstract: open source high-performance, open-source online analysis columnar database management system in 2016. Its data compression ratio is high, based on multi-core parallel computing, vectorized execution and SIMD, it has excellent performance. At present, ClickHouse is widely used in many fields such as Internet advertising, App and Web traffic, telecommunications, finance, and the Internet of Things. It is very suitable for business intelligence application scenarios and has a large number of applications and practices at home and abroad.

This article is shared from the HUAWEI cloud community " [Cloud small lesson] EI Lesson 22 MRS Basic Principles of ClickHouse Component Introduction ", author: Reading Fengyun

ClickHouse is an open source column database for online analysis and processing. It is independent of the Hadoop big data system. Its core features are extreme compression rate and extremely fast query performance. At the same time, ClickHouse supports SQL queries and has good query performance, especially the performance of aggregate analysis and query based on large wide tables, which is an order of magnitude faster than other analytical databases. At present, ClickHouse is widely used in many fields such as Internet advertising, App and Web traffic, telecommunications, finance, and the Internet of Things. It is very suitable for business intelligence application scenarios and has a large number of applications and practices at home and abroad.

Introduction to Key Features of ClickHouse

Complete DBMS function

ClickHouse has a complete database management function, with the basic functions of a DBMS (Database Management System, database management system), as shown below:

DDL (Data Definition Language): You can dynamically create, modify, or delete databases, tables, and views without restarting the service.

DML (Data Manipulation Language): You can dynamically query, insert, modify or delete data.

Permission control: You can set the operating permissions of the database or table according to the user granularity to ensure data security.

Data backup and recovery: Provides a data backup export and import recovery mechanism to meet the requirements of the production environment.

Distributed management: Provides a cluster mode that can automatically manage multiple database nodes.

Columnar storage and data compression

ClickHouse is a database that uses columnar storage. Data is organized in columns. Data belonging to the same column will be stored together, and columns will also be stored in different files.

When performing data query, columnar storage can reduce the data scanning range and the size of data transmission, and improve the efficiency of data query.

Vectorized execution engine

ClickHouse uses the SIMD instructions of the CPU to implement vectorized execution. The full name of SIMD is Single Instruction Multiple Data, which is an implementation of operating multiple pieces of data with a single instruction to improve performance through data parallelism (others include instruction-level parallelism and thread-level parallelism), and its principle is in the CPU register The parallel operation of data is realized at the level, which is more efficient than similar OLAP products.

Copy mechanism

ClickHouse uses ZooKeeper to implement the replication mechanism through the ReplicatedMergeTree engine (Replicated series engine). The replica mechanism is a multi-master architecture, and INSERT statements can be sent to any replica, and the remaining replicas will perform asynchronous replication of data.

The functional advantages of the copy mechanism are as follows:

The design of the ClickHouse copy mechanism can minimize network data transmission to synchronize in different data centers, and can be used to build a cluster architecture with multiple data centers and multiple activities in different places.

The replication mechanism is the basis for realizing high availability, load balancing, and migration/upgrade functions.

The system will monitor the synchronization of the replica data, identify the faulty node, and perform fault recovery when the node returns to normal to ensure the overall high availability of the service.

Data fragmentation and distributed query

ClickHouse provides linear expansion capabilities through fragmentation and distributed table mechanisms.

Fragmentation mechanism: Used to solve the performance bottleneck of a single node. By horizontally splitting the data, the data in a table is split to multiple nodes. There is no duplication of data between different nodes, so that you can increase the sharding pair ClickHouse performs linear expansion.

Distributed table: When querying sharded data, the query is performed through a distributed table. The distributed table engine itself does not store any data, but is only a layer of proxy, which can automatically route to each shard node in the cluster to obtain data, that is Distributed tables need to work together with other data tables.

As shown in the figure below, when querying, we need to query the distributed table table_distributed. The distributed table will automatically route the query request to each shard node and aggregate the results.

Having said so much, how do I use ClickHouse specifically? Don't worry, the following will take you through the ClickHouse client and basic database operations to get you started quickly.

Preparation before operation

The ClickHouse cluster has been created.
ClickHouse client has been installed.

ClickHouse client use

Execute the following command to switch to the client installation directory. The following directory is an example. For the specific client path, please refer to the actual installation path.

cd /opt/Bigdata/client

Execute the following commands to configure environment variables.

source bigdata_env

4. Execute the client command of the ClickHouse component.

clickhouse client --host ClickHouse instance IP --user login name- -password password- -port ClickHouse port number

Clickhouse client command line parameter description is shown in the following table:

Basic operation of ClickHouse database

Create a database:

Basic grammar

CREATE DATABASE [IF NOT EXISTS] database_name

Example of use

Create a table:

Basic grammar

Method 1: creates a table named "table_name" in the specified "database_name" database.

If "database_name" is not included in the table creation statement, the database selected when the client logs in is used as database_name" by default.

CREATE TABLE [IF NOT EXISTS] [database_name.]table_name [ON CLUSTER Cluster name]

(

name1 [type1] [DEFAULT|materialized|ALIAS expr1],

name2 [type2] [DEFAULT|materialized|ALIAS expr2],

...

) ENGINE = engine

Method 2: creates a table with the same structure as table_name2, and can specify different table engine declarations.

If there is no table engine declaration, the created table will use the same table engine as database_name2.table_name2.

CREATE TABLE [IF NOT EXISTS] [database_name.]table_name AS [database_name2.]table_name2 [ENGINE = engine]

Method 3: Use the specified engine to create a table with the same structure as the result of the SELECT clause, and fill it with the result of the SELECT clause.

CREATE TABLE [IF NOT EXISTS] [database_name.]table_name ENGINE = engine AS SELECT ...

Example of use

Insert table data:

Basic grammar

Method One : Insert data in standard format.

INSERT INTO [database_name.]table [(c1, c2, c3)] VALUES (v11, v12, v13), (v21, v22, v23), ...

Method two : Use the result of SELECT to write.

INSERT INTO [database_name.]table [(c1, c2, c3)] SELECT ...

Example of use

Query table data:

Basic grammar

SELECT [DISTINCT] expr_list

[FROM [database_name.]table | (subquery) | table_function] [FINAL]

[SAMPLE sample_coeff]

[ARRAY JOIN ...]

[PREWHERE expr]

[WHERE expr]

[GROUP BY expr_list] [WITH TOTALS]

[HAVING expr]

[ORDER BY expr_list] [WITH FILL] [FROM expr] [TO expr] [STEP expr]

[LIMIT [offset_value, ]n BY columns]

[LIMIT [n, ]m] [WITH TIES]

[UNION ALL ...]

[INTO OUTFILE filename]

[FORMAT format]

Example of use

Modify the table structure:

Basic grammar

ALTER TABLE [database_name].name [ON CLUSTER cluster] ADD|DROP|CLEAR|COMMENT|MODIFY COLUMN ...

ALTER only supports *MergeTree, Merge and Distributed engine tables.

Example of use

Display database and table information

Basic grammar

show databases

show tables

Example of use

Query table structure

Basic grammar

DESC|DESCRIBE TABLE [database_name.]table [INTO OUTFILE filename] [FORMAT format]

Example of use

Delete table:

Basic grammar

DROP [TEMPORARY] TABLE [IF EXISTS] [database_name.]name [ON CLUSTER cluster]

Example of use

Okay, this is the end of this cloud lesson, let's experience more functions of MapReduce (MRS)! jab here

Click to follow and learn about Huawei Cloud's fresh technology for the first time~

Cloud Lesson｜Introduction to ClickHouse component of MRS basic principles

Introduction to Key Features of ClickHouse

Complete DBMS function

Columnar storage and data compression

Vectorized execution engine

Copy mechanism

Data fragmentation and distributed query

Preparation before operation

ClickHouse client use

Basic operation of ClickHouse database

Create a database:

Create a table:

Insert table data:

Query table data:

Modify the table structure:

Display database and table information

Query table structure

Delete table:

华为云开发者联盟

引用和评论

华为云开发者联盟入选 2023 中国技术品牌影响力企业榜，深耕开发者生态

🔥全程不用写代码，我用 AI 程序员写了一个飞机大战

从 DeepSeek 看25年前端的一个小趋势

Open WebUI：开源AI交互平台的全面解析

大模型中的Token究竟是什么？从原理到作用深度解析

OSPO Summit 2025 正式定档！议题征集同步开启

一文掌握 MCP 上下文协议：从理论到实践