
Excerpts

2.1.7 Multi-Master Architecture

Distributed systems such as HDFS, Spark, HBase, and Elasticsearch all adopt a master-slave architecture, in which a single control node acts as the leader and coordinates the whole cluster. ClickHouse, by contrast, adopts a multi-master architecture: every node in the cluster plays the same role, and a client gets the same result no matter which node it connects to.
This multi-master architecture has several advantages. Because all roles are peers, the system architecture is simpler: there is no need to distinguish master nodes, data nodes, and compute nodes, since every node in the cluster provides the same functionality. It also naturally avoids a single point of failure, which makes it well suited to multi-datacenter, geo-distributed active-active deployments.

2.1.9 Data Sharding and Distributed Queries

Data sharding is the horizontal partitioning of data. It is an effective way to overcome storage and query bottlenecks when facing massive data volumes, and it embodies the divide-and-conquer idea.
ClickHouse supports sharding, and sharding relies on the cluster configuration. Each cluster consists of one or more shards, and each shard corresponds to one ClickHouse service node, so the maximum number of shards is bounded by the number of nodes (a shard cannot span more than one service node).

10.1 Overview of Replicas and Sharding

Setting aside the differences between table engines and looking purely at the data, there is sometimes only a thin line between replicas and shards.

10.2.3 How Replicas Are Defined

ClickHouse replicas follow the multi-master architecture: every replica instance can serve as an entry point for both reads and writes.
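As a sketch of what such a definition looks like, the following DDL creates a replicated table. The table name, the ZooKeeper path, and the {shard}/{replica} macros are illustrative assumptions; they must match the macros configured on each node.

 -- Minimal sketch of a replicated table (name and path are placeholders).
 CREATE TABLE replica_demo_local
 (
     id    UInt64,
     value String
 )
 ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/replica_demo_local', '{replica}')
 ORDER BY id;

Because every replica is writable, an INSERT sent to any one of them is propagated to the others through the ZooKeeper-coordinated replication log.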

10.4 Data Sharding

Each ClickHouse service node can be regarded as a shard. Data sharding in ClickHouse must be used together with the Distributed table engine.

10.4.1 How to Configure the Cluster

This section describes several typical distributed configurations (a minimal configuration sketch follows the list below).
A shard is more of a logical grouping: whether we speak of shards or replicas, the physical carrier is always a replica node, so from a certain point of view a replica is also a shard.

  • Shards without replicas, defined with <node> tags (no <shard>/<replica> tags)
  • Shards without replicas, defined with <shard>/<replica> tags (no <node> tags)
  • N shards with N replicas, defined with <shard>/<replica> tags
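As an illustration of the second form, here is a minimal sketch of a cluster definition placed under remote_servers in the server configuration. The cluster name sharding_cluster, the host names, and the ports are placeholders, not values from the original text.

 <remote_servers>
     <sharding_cluster>
         <shard>
             <replica>
                 <host>ch-node-1</host>
                 <port>9000</port>
             </replica>
         </shard>
         <shard>
             <replica>
                 <host>ch-node-2</host>
                 <port>9000</port>
             </replica>
         </shard>
     </sharding_cluster>
 </remote_servers>

Adding more <replica> entries inside each <shard> turns this into the third form, N shards with N replicas.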

View distributed cluster node information

 SELECT * FROM system.clusters;
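The full output of system.clusters is fairly wide; the following narrowed query, using standard columns of that table, is usually enough to check the topology of each cluster.

 SELECT cluster, shard_num, replica_num, host_name, is_local
 FROM system.clusters;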

10.5 How the Distributed Table Engine Works

The Distributed table engine is synonymous with distributed tables in ClickHouse. It does not store any data itself; instead it acts as a transparent proxy for sharded data and automatically routes reads and writes to the nodes of the cluster. A Distributed table therefore has to work together with other data table engines that hold the actual data.
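A minimal sketch of this pairing, reusing the hypothetical sharding_cluster from the configuration sketch above and placeholder table names (run the local DDL on every node, or wrap it in ON CLUSTER if distributed DDL is configured):

 -- Local table on each node: this is where the data actually lives.
 CREATE TABLE demo_local
 (
     id    UInt64,
     value String
 )
 ENGINE = MergeTree
 ORDER BY id;

 -- Distributed table: stores nothing itself, proxies reads and writes
 -- to demo_local on every shard of sharding_cluster, sharding by rand().
 CREATE TABLE demo_all AS demo_local
 ENGINE = Distributed(sharding_cluster, default, demo_local, rand());

A SELECT on demo_all fans out to demo_local on every shard and merges the partial results, while an INSERT into demo_all is split across the shards according to the sharding key.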

This article is from qbit snap
