
Excerpts

2.1.7 Multi-Master Architecture

Distributed systems such as HDFS, Spark, HBase, and Elasticsearch all adopt a master-slave architecture, in which a single control node acts as the leader and coordinates the whole cluster. ClickHouse, by contrast, adopts a multi-master architecture: every node in the cluster plays the same role, and a client gets the same result no matter which node it connects to.
This multi-master architecture has several advantages. Because all roles are peers, the system architecture is simpler: there is no need to distinguish master nodes, data nodes, and compute nodes, since every node in the cluster provides the same functionality. It also naturally avoids a single point of failure, which makes it well suited to multi-datacenter, geo-distributed active-active deployments.

2.1.9 Data Sharding and Distributed Queries

Data sharding is the horizontal partitioning of data. It is an effective way to overcome storage and query bottlenecks when facing massive data volumes, and it embodies the divide-and-conquer idea.
ClickHouse supports sharding, and sharding relies on the cluster configuration. Each cluster consists of one or more shards, and each shard corresponds to one ClickHouse service node, so the maximum number of shards is bounded by the number of nodes (a shard cannot span more than one service node).

10.1 Overview of Replicas and Sharding

Setting aside the differences between table engines and looking purely at the data, there is sometimes only a thin line between replicas and shards.

10.2.3 How Replicas Are Defined

ClickHouse replicas follow the multi-master architecture: every replica instance can serve as an entry point for both reads and writes.
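As a sketch of what such a definition looks like, the following DDL creates a replicated table. The table name, the ZooKeeper path, and the {shard}/{replica} macros are illustrative assumptions; they must match the macros configured on each node.

 -- Minimal sketch of a replicated table (name and path are placeholders).
 CREATE TABLE replica_demo_local
 (
     id    UInt64,
     value String
 )
 ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/replica_demo_local', '{replica}')
 ORDER BY id;

Because every replica is writable, an INSERT sent to any one of them is propagated to the others through the ZooKeeper-coordinated replication log.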

10.4 Data Sharding

Each ClickHouse service node can be regarded as a shard. Data sharding in ClickHouse must be used together with the Distributed table engine.

10.4.1 How to Configure the Cluster

This section describes several typical distributed configurations (a minimal configuration sketch follows the list below).
A shard is more of a logical grouping: whether we speak of shards or replicas, the physical carrier is always a replica node, so from a certain point of view a replica is also a shard.

  • Shards without replicas, defined with <node> tags (no <shard>/<replica> tags)
  • Shards without replicas, defined with <shard>/<replica> tags (no <node> tags)
  • N shards with N replicas, defined with <shard>/<replica> tags
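As an illustration of the second form, here is a minimal sketch of a cluster definition placed under remote_servers in the server configuration. The cluster name sharding_cluster, the host names, and the ports are placeholders, not values from the original text.

 <remote_servers>
     <sharding_cluster>
         <shard>
             <replica>
                 <host>ch-node-1</host>
                 <port>9000</port>
             </replica>
         </shard>
         <shard>
             <replica>
                 <host>ch-node-2</host>
                 <port>9000</port>
             </replica>
         </shard>
     </sharding_cluster>
 </remote_servers>

Adding more <replica> entries inside each <shard> turns this into the third form, N shards with N replicas.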

View distributed cluster node information

 SELECT * FROM system.clusters;
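The full output of system.clusters is fairly wide; the following narrowed query, using standard columns of that table, is usually enough to check the topology of each cluster.

 SELECT cluster, shard_num, replica_num, host_name, is_local
 FROM system.clusters;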

10.5 How the Distributed Table Engine Works

The Distributed table engine is synonymous with distributed tables in ClickHouse. It does not store any data itself; instead it acts as a transparent proxy for sharded data and automatically routes reads and writes to the nodes of the cluster. A Distributed table therefore has to work together with other data table engines that hold the actual data.
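A minimal sketch of this pairing, reusing the hypothetical sharding_cluster from the configuration sketch above and placeholder table names (run the local DDL on every node, or wrap it in ON CLUSTER if distributed DDL is configured):

 -- Local table on each node: this is where the data actually lives.
 CREATE TABLE demo_local
 (
     id    UInt64,
     value String
 )
 ENGINE = MergeTree
 ORDER BY id;

 -- Distributed table: stores nothing itself, proxies reads and writes
 -- to demo_local on every shard of sharding_cluster, sharding by rand().
 CREATE TABLE demo_all AS demo_local
 ENGINE = Distributed(sharding_cluster, default, demo_local, rand());

A SELECT on demo_all fans out to demo_local on every shard and merges the partial results, while an INSERT into demo_all is split across the shards according to the sharding key.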

This article is from qbit snap
