Reading Fengyun is a master of Huawei cloud information. He is good at diversifying the presentation of complex information. There is always a picture (Yuntushuo), a simple blog post (cloud small lesson) or short video (cloud video hall) that it produces. You quickly get started with HUAWEI CLOUD. For more exciting content, please click here
Abstract: open source high-performance, open-source online analysis columnar database management system in 2016. Its data compression ratio is high, based on multi-core parallel computing, vectorized execution and SIMD, it has excellent performance. At present, ClickHouse is widely used in many fields such as Internet advertising, App and Web traffic, telecommunications, finance, and the Internet of Things. It is very suitable for business intelligence application scenarios and has a large number of applications and practices at home and abroad.
This article is shared from the HUAWEI cloud community " [Cloud small lesson] EI Lesson 22 MRS Basic Principles of ClickHouse Component Introduction ", author: Reading Fengyun
ClickHouse is an open source column database for online analysis and processing. It is independent of the Hadoop big data system. Its core features are extreme compression rate and extremely fast query performance. At the same time, ClickHouse supports SQL queries and has good query performance, especially the performance of aggregate analysis and query based on large wide tables, which is an order of magnitude faster than other analytical databases. At present, ClickHouse is widely used in many fields such as Internet advertising, App and Web traffic, telecommunications, finance, and the Internet of Things. It is very suitable for business intelligence application scenarios and has a large number of applications and practices at home and abroad.
Introduction to Key Features of ClickHouse
Complete DBMS function
ClickHouse has a complete database management function, with the basic functions of a DBMS (Database Management System, database management system), as shown below:
DDL (Data Definition Language): You can dynamically create, modify, or delete databases, tables, and views without restarting the service.
DML (Data Manipulation Language): You can dynamically query, insert, modify or delete data.
Permission control: You can set the operating permissions of the database or table according to the user granularity to ensure data security.
Data backup and recovery: Provides a data backup export and import recovery mechanism to meet the requirements of the production environment.
Distributed management: Provides a cluster mode that can automatically manage multiple database nodes.
Columnar storage and data compression
ClickHouse is a database that uses columnar storage. Data is organized in columns. Data belonging to the same column will be stored together, and columns will also be stored in different files.
When performing data query, columnar storage can reduce the data scanning range and the size of data transmission, and improve the efficiency of data query.
Vectorized execution engine
ClickHouse uses the SIMD instructions of the CPU to implement vectorized execution. The full name of SIMD is Single Instruction Multiple Data, which is an implementation of operating multiple pieces of data with a single instruction to improve performance through data parallelism (others include instruction-level parallelism and thread-level parallelism), and its principle is in the CPU register The parallel operation of data is realized at the level, which is more efficient than similar OLAP products.
Copy mechanism
ClickHouse uses ZooKeeper to implement the replication mechanism through the ReplicatedMergeTree engine (Replicated series engine). The replica mechanism is a multi-master architecture, and INSERT statements can be sent to any replica, and the remaining replicas will perform asynchronous replication of data.
The functional advantages of the copy mechanism are as follows:
The design of the ClickHouse copy mechanism can minimize network data transmission to synchronize in different data centers, and can be used to build a cluster architecture with multiple data centers and multiple activities in different places.
The replication mechanism is the basis for realizing high availability, load balancing, and migration/upgrade functions.
The system will monitor the synchronization of the replica data, identify the faulty node, and perform fault recovery when the node returns to normal to ensure the overall high availability of the service.
Data fragmentation and distributed query
ClickHouse provides linear expansion capabilities through fragmentation and distributed table mechanisms.
Fragmentation mechanism: Used to solve the performance bottleneck of a single node. By horizontally splitting the data, the data in a table is split to multiple nodes. There is no duplication of data between different nodes, so that you can increase the sharding pair ClickHouse performs linear expansion.
Distributed table: When querying sharded data, the query is performed through a distributed table. The distributed table engine itself does not store any data, but is only a layer of proxy, which can automatically route to each shard node in the cluster to obtain data, that is Distributed tables need to work together with other data tables.
As shown in the figure below, when querying, we need to query the distributed table table_distributed. The distributed table will automatically route the query request to each shard node and aggregate the results.
Having said so much, how do I use ClickHouse specifically? Don't worry, the following will take you through the ClickHouse client and basic database operations to get you started quickly.
Preparation before operation
- The ClickHouse cluster has been created.
- ClickHouse client has been installed.
ClickHouse client use
Log in to the node where the client is installed as the client installation user.
- Execute the following command to switch to the client installation directory. The following directory is an example. For the specific client path, please refer to the actual installation path.
cd /opt/Bigdata/client
- Execute the following commands to configure environment variables.
source bigdata_env
4. Execute the client command of the ClickHouse component.
clickhouse client --host ClickHouse instance IP --user login name- -password password- -port ClickHouse port number
Clickhouse client command line parameter description is shown in the following table:
Basic operation of ClickHouse database
Create a database:
Basic grammar
CREATE DATABASE [IF NOT EXISTS] database_name
Example of use
Create a table:
Basic grammar
Method 1: creates a table named "table_name" in the specified "database_name" database.
If "database_name" is not included in the table creation statement, the database selected when the client logs in is used as database_name" by default.
CREATE TABLE [IF NOT EXISTS] [database_name.]table_name [ON CLUSTER Cluster name]
(
name1 [type1] [DEFAULT|materialized|ALIAS expr1],
name2 [type2] [DEFAULT|materialized|ALIAS expr2],
...
) ENGINE = engine
Method 2: creates a table with the same structure as table_name2, and can specify different table engine declarations.
If there is no table engine declaration, the created table will use the same table engine as database_name2.table_name2.
CREATE TABLE [IF NOT EXISTS] [database_name.]table_name AS [database_name2.]table_name2 [ENGINE = engine]
Method 3: Use the specified engine to create a table with the same structure as the result of the SELECT clause, and fill it with the result of the SELECT clause.
CREATE TABLE [IF NOT EXISTS] [database_name.]table_name ENGINE = engine AS SELECT ...
Example of use
Insert table data:
Basic grammar
Method One : Insert data in standard format.
INSERT INTO [database_name.]table [(c1, c2, c3)] VALUES (v11, v12, v13), (v21, v22, v23), ...
Method two : Use the result of SELECT to write.
INSERT INTO [database_name.]table [(c1, c2, c3)] SELECT ...
Example of use
Query table data:
Basic grammar
SELECT [DISTINCT] expr_list
[FROM [database_name.]table | (subquery) | table_function] [FINAL]
[SAMPLE sample_coeff]
[ARRAY JOIN ...]
[GLOBAL] [ANY|ALL|ASOF] [INNER|LEFT|RIGHT|FULL|CROSS] [OUTER|SEMI|ANTI] JOIN (subquery)|table (ON <expr_list>)|(USING <column_list>)
[PREWHERE expr]
[WHERE expr]
[GROUP BY expr_list] [WITH TOTALS]
[HAVING expr]
[ORDER BY expr_list] [WITH FILL] [FROM expr] [TO expr] [STEP expr]
[LIMIT [offset_value, ]n BY columns]
[LIMIT [n, ]m] [WITH TIES]
[UNION ALL ...]
[INTO OUTFILE filename]
[FORMAT format]
Example of use
Modify the table structure:
Basic grammar
ALTER TABLE [database_name].name [ON CLUSTER cluster] ADD|DROP|CLEAR|COMMENT|MODIFY COLUMN ...
ALTER only supports *MergeTree, Merge and Distributed engine tables.
Example of use
Display database and table information
Basic grammar
show databases
show tables
Example of use
Query table structure
Basic grammar
DESC|DESCRIBE TABLE [database_name.]table [INTO OUTFILE filename] [FORMAT format]
Example of use
Delete table:
Basic grammar
DROP [TEMPORARY] TABLE [IF EXISTS] [database_name.]name [ON CLUSTER cluster]
Example of use
Okay, this is the end of this cloud lesson, let's experience more functions of MapReduce (MRS)! jab here
Click to follow and learn about Huawei Cloud's fresh technology for the first time~
**粗体** _斜体_ [链接](http://example.com) `代码` - 列表 > 引用
。你还可以使用@
来通知其他用户。