image.png
Reading Fengyun is a master of Huawei cloud information. He is good at diversifying the presentation of complex information. There is always a picture (Yuntushuo), a simple blog post (cloud small lesson) or short video (cloud video hall) that it produces. You quickly get started with HUAWEI CLOUD. For more exciting content, please click here

Abstract: open source high-performance, open-source online analysis columnar database management system in 2016. Its data compression ratio is high, based on multi-core parallel computing, vectorized execution and SIMD, it has excellent performance. At present, ClickHouse is widely used in many fields such as Internet advertising, App and Web traffic, telecommunications, finance, and the Internet of Things. It is very suitable for business intelligence application scenarios and has a large number of applications and practices at home and abroad.

This article is shared from the HUAWEI cloud community " [Cloud small lesson] EI Lesson 22 MRS Basic Principles of ClickHouse Component Introduction ", author: Reading Fengyun

ClickHouse is an open source column database for online analysis and processing. It is independent of the Hadoop big data system. Its core features are extreme compression rate and extremely fast query performance. At the same time, ClickHouse supports SQL queries and has good query performance, especially the performance of aggregate analysis and query based on large wide tables, which is an order of magnitude faster than other analytical databases. At present, ClickHouse is widely used in many fields such as Internet advertising, App and Web traffic, telecommunications, finance, and the Internet of Things. It is very suitable for business intelligence application scenarios and has a large number of applications and practices at home and abroad.
image.png

Introduction to Key Features of ClickHouse

Complete DBMS function

ClickHouse has a complete database management function, with the basic functions of a DBMS (Database Management System, database management system), as shown below:

DDL (Data Definition Language): You can dynamically create, modify, or delete databases, tables, and views without restarting the service.

DML (Data Manipulation Language): You can dynamically query, insert, modify or delete data.

Permission control: You can set the operating permissions of the database or table according to the user granularity to ensure data security.

Data backup and recovery: Provides a data backup export and import recovery mechanism to meet the requirements of the production environment.

Distributed management: Provides a cluster mode that can automatically manage multiple database nodes.

Columnar storage and data compression

ClickHouse is a database that uses columnar storage. Data is organized in columns. Data belonging to the same column will be stored together, and columns will also be stored in different files.

When performing data query, columnar storage can reduce the data scanning range and the size of data transmission, and improve the efficiency of data query.

Vectorized execution engine

ClickHouse uses the SIMD instructions of the CPU to implement vectorized execution. The full name of SIMD is Single Instruction Multiple Data, which is an implementation of operating multiple pieces of data with a single instruction to improve performance through data parallelism (others include instruction-level parallelism and thread-level parallelism), and its principle is in the CPU register The parallel operation of data is realized at the level, which is more efficient than similar OLAP products.

Copy mechanism

ClickHouse uses ZooKeeper to implement the replication mechanism through the ReplicatedMergeTree engine (Replicated series engine). The replica mechanism is a multi-master architecture, and INSERT statements can be sent to any replica, and the remaining replicas will perform asynchronous replication of data.
image.png

The functional advantages of the copy mechanism are as follows:

The design of the ClickHouse copy mechanism can minimize network data transmission to synchronize in different data centers, and can be used to build a cluster architecture with multiple data centers and multiple activities in different places.

The replication mechanism is the basis for realizing high availability, load balancing, and migration/upgrade functions.

The system will monitor the synchronization of the replica data, identify the faulty node, and perform fault recovery when the node returns to normal to ensure the overall high availability of the service.

Data fragmentation and distributed query

ClickHouse provides linear expansion capabilities through fragmentation and distributed table mechanisms.

Fragmentation mechanism: Used to solve the performance bottleneck of a single node. By horizontally splitting the data, the data in a table is split to multiple nodes. There is no duplication of data between different nodes, so that you can increase the sharding pair ClickHouse performs linear expansion.

Distributed table: When querying sharded data, the query is performed through a distributed table. The distributed table engine itself does not store any data, but is only a layer of proxy, which can automatically route to each shard node in the cluster to obtain data, that is Distributed tables need to work together with other data tables.

As shown in the figure below, when querying, we need to query the distributed table table_distributed. The distributed table will automatically route the query request to each shard node and aggregate the results.
image.png

Having said so much, how do I use ClickHouse specifically? Don't worry, the following will take you through the ClickHouse client and basic database operations to get you started quickly.

Preparation before operation

  • The ClickHouse cluster has been created.
  • ClickHouse client has been installed.

ClickHouse client use

Log in to the node where the client is installed as the client installation user.

  1. Execute the following command to switch to the client installation directory. The following directory is an example. For the specific client path, please refer to the actual installation path.

cd /opt/Bigdata/client

  1. Execute the following commands to configure environment variables.

source bigdata_env

4. Execute the client command of the ClickHouse component.

clickhouse client --host ClickHouse instance IP --user login name- -password password- -port ClickHouse port number

Clickhouse client command line parameter description is shown in the following table:
image.png

Basic operation of ClickHouse database

Create a database:

Basic grammar

CREATE DATABASE [IF NOT EXISTS] database_name

Example of use
image.png

Create a table:

Basic grammar

Method 1: creates a table named "table_name" in the specified "database_name" database.

If "database_name" is not included in the table creation statement, the database selected when the client logs in is used as database_name" by default.

CREATE TABLE [IF NOT EXISTS] [database_name.]table_name [ON CLUSTER Cluster name]

(

name1 [type1] [DEFAULT|materialized|ALIAS expr1],

name2 [type2] [DEFAULT|materialized|ALIAS expr2],

...

) ENGINE = engine

Method 2: creates a table with the same structure as table_name2, and can specify different table engine declarations.

If there is no table engine declaration, the created table will use the same table engine as database_name2.table_name2.

CREATE TABLE [IF NOT EXISTS] [database_name.]table_name AS [database_name2.]table_name2 [ENGINE = engine]

Method 3: Use the specified engine to create a table with the same structure as the result of the SELECT clause, and fill it with the result of the SELECT clause.

CREATE TABLE [IF NOT EXISTS] [database_name.]table_name ENGINE = engine AS SELECT ...

Example of use
image.png

Insert table data:

Basic grammar

Method One : Insert data in standard format.

INSERT INTO [database_name.]table [(c1, c2, c3)] VALUES (v11, v12, v13), (v21, v22, v23), ...

Method two : Use the result of SELECT to write.

INSERT INTO [database_name.]table [(c1, c2, c3)] SELECT ...

Example of use
image.png

Query table data:

Basic grammar

SELECT [DISTINCT] expr_list

[FROM [database_name.]table | (subquery) | table_function] [FINAL]

[SAMPLE sample_coeff]

[ARRAY JOIN ...]

[GLOBAL] [ANY|ALL|ASOF] [INNER|LEFT|RIGHT|FULL|CROSS] [OUTER|SEMI|ANTI] JOIN (subquery)|table (ON <expr_list>)|(USING <column_list>)

[PREWHERE expr]

[WHERE expr]

[GROUP BY expr_list] [WITH TOTALS]

[HAVING expr]

[ORDER BY expr_list] [WITH FILL] [FROM expr] [TO expr] [STEP expr]

[LIMIT [offset_value, ]n BY columns]

[LIMIT [n, ]m] [WITH TIES]

[UNION ALL ...]

[INTO OUTFILE filename]

[FORMAT format]

Example of use
image.png

Modify the table structure:

Basic grammar

ALTER TABLE [database_name].name [ON CLUSTER cluster] ADD|DROP|CLEAR|COMMENT|MODIFY COLUMN ...

ALTER only supports *MergeTree, Merge and Distributed engine tables.

Example of use
image.png

Display database and table information

Basic grammar

show databases

show tables

Example of use
image.png

Query table structure

Basic grammar

DESC|DESCRIBE TABLE [database_name.]table [INTO OUTFILE filename] [FORMAT format]

Example of use
image.png

Delete table:

Basic grammar

DROP [TEMPORARY] TABLE [IF EXISTS] [database_name.]name [ON CLUSTER cluster]

Example of use
image.png
image.png

Okay, this is the end of this cloud lesson, let's experience more functions of MapReduce (MRS)! jab here

Click to follow and learn about Huawei Cloud's fresh technology for the first time~


华为云开发者联盟
1.4k 声望1.8k 粉丝

生于云,长于云,让开发者成为决定性力量