Foreword
This article consists mainly of excerpts from "ClickHouse Principle Analysis and Application Practice" (ClickHouse原理解析与应用实践).
Zhu Kai (朱凯). Beijing: China Machine Press (机械工业出版社), May 2020 (reprinted October 2021). The book uses ClickHouse version 19.17.4.11.
Excerpts
2.1.7 Multi-Master Architecture
Distributed systems such as HDFS, Spark, HBase, and Elasticsearch all adopt a master-slave architecture, in which a control node acts as the leader and coordinates the whole cluster. ClickHouse, by contrast, adopts a multi-master architecture. Every node in the cluster plays the same role, and a client gets the same result no matter which node it accesses.
This multi-master architecture has several advantages. Because all nodes are peers, the system architecture is simpler: there is no need to distinguish between master nodes, data nodes, and compute nodes, since every node in the cluster has the same function. As a result, it naturally avoids a single point of failure and is well suited to multi-data-center, multi-site active-active deployments.
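As a rough illustration of what "any node can serve the client" means in practice, the sketch below has a client try the peers in random order and fall back to another one when a node is unreachable. The host names, the run_query helper, and the fake_send transport are all hypothetical; this is not a real ClickHouse client API.

```python
import random

# Hypothetical peer nodes; in a multi-master cluster any of them can take the query.
NODES = ["ch1.example.com", "ch2.example.com", "ch3.example.com"]

def run_query(sql, send, nodes=NODES):
    """Try the peers in random order and return (host, reply) from the first one that answers."""
    candidates = list(nodes)
    random.shuffle(candidates)
    last_error = None
    for host in candidates:
        try:
            return host, send(host, sql)
        except ConnectionError as exc:
            last_error = exc  # this peer is down, try the next one
    raise ConnectionError("no node reachable") from last_error

def fake_send(host, sql):
    """Stand-in transport for the example: pretend ch1 is down."""
    if host == "ch1.example.com":
        raise ConnectionError(f"{host} is down")
    return f"{host} answered: {sql}"

host, reply = run_query("SELECT 1", fake_send)
print(host, reply)
```

Because there is no dedicated leader, losing one node only removes one possible entry point; the client simply reaches the cluster through another peer.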
2.1.9 Data Sharding and Distributed Query
Data sharding is the horizontal partitioning of data. It is an effective way to relieve storage and query bottlenecks when dealing with massive data sets, and it is an application of the divide-and-conquer idea.
ClickHouse supports sharding, and sharding relies on clusters. Each cluster consists of one or more shards, and each shard corresponds to one ClickHouse service node. The maximum number of shards is therefore bounded by the number of nodes (a shard can correspond to only one service node).
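The divide-and-conquer idea can be sketched in a few lines: rows are split horizontally across shards, each shard answers a query over its own rows, and the partial results are merged. The modulo rule and the user_id column below are assumptions made for illustration only, not a description of how ClickHouse itself places rows.

```python
def shard_for(key, shard_count):
    """Pick a shard by simple modulo (an illustrative rule only)."""
    return key % shard_count

def split_rows(rows, shard_count):
    """Horizontally partition rows into shard_count lists by user_id."""
    shards = [[] for _ in range(shard_count)]
    for row in rows:
        shards[shard_for(row["user_id"], shard_count)].append(row)
    return shards

def distributed_count(shards):
    """Each shard counts its own rows; the partial counts are then summed."""
    partial = [len(shard) for shard in shards]
    return partial, sum(partial)

rows = [{"user_id": i, "event": "view"} for i in range(10)]
shards = split_rows(rows, 3)
partial, total = distributed_count(shards)
print(partial, total)  # [4, 3, 3] 10
```

Each shard holds only part of the data, so both storage and the per-shard scan shrink as shards are added.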
10.1 Overview of Replicas and Sharding
Regardless of the difference in table engines, from a purely data perspective, sometimes there is only a thin line between replicas and shards.
10.2.3 How Replicas Are Defined
ClickHouse replicas adopt a multi-master architecture: every replica instance can serve as an entry point for both reading and writing data.
10.4 Data Sharding
Each service node in ClickHouse can be called a shard. Data sharding in ClickHouse needs to be used together with the Distributed table engine.
10.4.1 How to configure the cluster
This section describes several typical distributed configurations.
A shard is more of a logical grouping. Whether you are talking about replicas or shards, the data is physically carried by a replica, so from a certain point of view a replica is also a shard.
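- Shards without replicas (node form): defined with node elements, without shard/replica elements
- Shards without replicas (shard/replica form): defined with shard/replica elements, without node elements
- N shards and N replicas: defined with shard/replica elements, without node elements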
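To make the three forms concrete, here is a small sketch that embeds a remote_servers fragment as a string and reads back the shard and replica layout. The cluster names and host names are hypothetical, and the fragment is trimmed to the elements discussed above; a real configuration file wraps this in a root element and usually carries more settings.

```python
import xml.etree.ElementTree as ET

# Hypothetical cluster definitions, one per configuration form listed above.
CONFIG = """
<remote_servers>
    <cluster_node_form>
        <node><host>ch1.example.com</host><port>9000</port></node>
        <node><host>ch2.example.com</host><port>9000</port></node>
    </cluster_node_form>
    <cluster_shard_form>
        <shard><replica><host>ch1.example.com</host><port>9000</port></replica></shard>
        <shard><replica><host>ch2.example.com</host><port>9000</port></replica></shard>
    </cluster_shard_form>
    <cluster_2s_2r>
        <shard>
            <replica><host>ch1.example.com</host><port>9000</port></replica>
            <replica><host>ch2.example.com</host><port>9000</port></replica>
        </shard>
        <shard>
            <replica><host>ch3.example.com</host><port>9000</port></replica>
            <replica><host>ch4.example.com</host><port>9000</port></replica>
        </shard>
    </cluster_2s_2r>
</remote_servers>
"""

def describe(cluster):
    """Return one list of replica hosts per shard of the given cluster element."""
    layout = []
    for child in cluster:
        if child.tag == "node":  # a node is a shard with no extra replicas
            layout.append([child.findtext("host")])
        elif child.tag == "shard":
            layout.append([r.findtext("host") for r in child.findall("replica")])
    return layout

root = ET.fromstring(CONFIG.strip())
layouts = {cluster.tag: describe(cluster) for cluster in root}
for name, layout in layouts.items():
    print(name, "shards:", len(layout), "replicas per shard:", [len(s) for s in layout])
```

The first two clusters describe the same topology in two different notations: two shards, each carried by a single replica. Only the third adds a second replica to each shard.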
To view node information for the distributed cluster:
SELECT * FROM system.clusters;
10.5 Analysis of Distributed Principle
The Distributed table engine is what "distributed table" refers to in ClickHouse. It does not store any data itself; instead, it acts as a transparent proxy over the data shards and automatically routes data to each node in the cluster. The Distributed table engine therefore has to work together with other table engines.
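The routing step can be sketched as follows. This follows the rule described in the ClickHouse documentation for the Distributed engine, as I understand it: the sharding-key value is reduced modulo the total shard weight, and the row goes to the shard whose weight interval contains the remainder. The function name pick_shard and the example weights are illustrative, not part of ClickHouse.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import Sequence

def pick_shard(sharding_value: int, weights: Sequence[int]) -> int:
    """Return the 0-based index of the shard that should receive the row."""
    bounds = list(accumulate(weights))       # upper bounds, e.g. [9, 19] for weights [9, 10]
    remainder = sharding_value % bounds[-1]  # remainder modulo the total weight
    return bisect_right(bounds, remainder)   # first shard whose upper bound exceeds it

# With weights 9 and 10, remainders 0..8 go to shard 0 and 9..18 go to shard 1.
for value in (8, 9, 18, 19):
    print(value, "->", pick_shard(value, [9, 10]))
```

The proxy table only decides where each row goes; the rows themselves are stored by the local tables on each shard, which is why the Distributed engine is always paired with another table engine.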
This article is from qbit snap