etcd is a key/value storage service for distributed environments built on the Raft consensus algorithm. By taking advantage of these characteristics, applications can share information, configuration, or service-discovery data across the cluster; etcd replicates the data to every node in the cluster and ensures that it always stays correct.

System requirements: >= 8 vCPU + 16GB RAM + 50GB SSD


Install and use

Static configuration means that the node addresses and the size of the cluster are known before the service is configured.

Source code compilation and installation

############################
# Build the latest version
############################

# 1. Clone the project and build it
$ git clone https://github.com/etcd-io/etcd.git && cd etcd
$ ./build

# To build a vendored etcd from the master branch via go get:
# 2. Set the GOPATH environment variable
$ export GOPATH='/Users/example/go'
$ go get -v go.etcd.io/etcd
$ go get -v go.etcd.io/etcd/etcdctl

# 3. Start the service (use ./bin/etcd for a source build, or $GOPATH/bin/etcd for a go get install)
$ ./bin/etcd
$ $GOPATH/bin/etcd

# 4. Basic usage
$ ./bin/etcdctl put foo bar
OK


Deploy a single-node service on a single machine (static)

##################################
# Running etcd in standalone mode
##################################

# 1. Set the address of the node to start
$ export NODE1='172.16.176.52'

# 2. Create a logical storage volume
$ docker volume create --name etcd-data

# 3. Start the etcd service
# The official etcd ports are 2379 for client connections and 2380 for peer communication
# --data-dir: path to the data directory
# --initial-advertise-peer-urls: peer URLs advertised to the rest of the cluster
# --listen-peer-urls: URLs to listen on for peer communication within the cluster
# --advertise-client-urls: client URLs advertised to clients
# --listen-client-urls: URLs to listen on for client connections
# --initial-cluster: initial cluster configuration for bootstrapping
$ docker run -p 2379:2379 -p 2380:2380 --name etcd \
    --volume=etcd-data:/etcd-data \
    quay.io/coreos/etcd:latest \
    /usr/local/bin/etcd \
    --data-dir=/etcd-data --name node1 \
    --initial-advertise-peer-urls http://${NODE1}:2380 \
    --listen-peer-urls http://0.0.0.0:2380 \
    --advertise-client-urls http://${NODE1}:2379 \
    --listen-client-urls http://0.0.0.0:2379 \
    --initial-cluster node1=http://${NODE1}:2380

# 4. List the current status of the cluster members
$ etcdctl --endpoints=http://${NODE1}:2379 member list


Deploy distributed cluster service (static)

################################
# Running a 3 node etcd cluster
################################

# node1
docker run -p 2379:2379 -p 2380:2380 --name etcd-node-1 \
  --volume=/var/lib/etcd:/etcd-data \
  quay.io/coreos/etcd:latest \
  /usr/local/bin/etcd \
  --data-dir=/etcd-data --name etcd-node-1 \
  --initial-advertise-peer-urls "http://10.20.30.1:2380" \
  --listen-peer-urls "http://0.0.0.0:2380" \
  --advertise-client-urls "http://10.20.30.1:2379" \
  --listen-client-urls "http://0.0.0.0:2379" \
  --initial-cluster "etcd-node-1=http://10.20.30.1:2380, etcd-node-2=http://10.20.30.2:2380, etcd-node-3=http://10.20.30.3:2380" \
  --initial-cluster-state "new" \
  --initial-cluster-token "my-etcd-token"

# node2
docker run -p 2379:2379 -p 2380:2380 --name etcd-node-2 \
  --volume=/var/lib/etcd:/etcd-data \
  quay.io/coreos/etcd:latest \
  /usr/local/bin/etcd \
  --data-dir=/etcd-data --name etcd-node-2 \
  --initial-advertise-peer-urls "http://10.20.30.2:2380" \
  --listen-peer-urls "http://0.0.0.0:2380" \
  --advertise-client-urls "http://10.20.30.2:2379" \
  --listen-client-urls "http://0.0.0.0:2379" \
  --initial-cluster "etcd-node-1=http://10.20.30.1:2380, etcd-node-2=http://10.20.30.2:2380, etcd-node-3=http://10.20.30.3:2380" \
  --initial-cluster-state "new" \
  --initial-cluster-token "my-etcd-token"

# node3
docker run -p 2379:2379 -p 2380:2380 --name etcd-node-3 \
  --volume=/var/lib/etcd:/etcd-data \
  quay.io/coreos/etcd:latest \
  /usr/local/bin/etcd \
  --data-dir=/etcd-data --name etcd-node-3 \
  --initial-advertise-peer-urls "http://10.20.30.3:2380" \
  --listen-peer-urls "http://0.0.0.0:2380" \
  --advertise-client-urls "http://10.20.30.3:2379" \
  --listen-client-urls "http://0.0.0.0:2379" \
  --initial-cluster "etcd-node-1=http://10.20.30.1:2380, etcd-node-2=http://10.20.30.2:2380, etcd-node-3=http://10.20.30.3:2380" \
  --initial-cluster-state "new" \
  --initial-cluster-token "my-etcd-token"

# run etcdctl using API version 3
docker exec etcd-node-1 /bin/sh -c "export ETCDCTL_API=3 && /usr/local/bin/etcdctl put foo bar"
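
To read back the value just written, a small verification along the same lines (assuming the same etcd-node-1 container name) could be:

docker exec etcd-node-1 /bin/sh -c "export ETCDCTL_API=3 && /usr/local/bin/etcdctl get foo"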

Deploy a distributed cluster service with docker-compose

# Edit the docker-compose.yml file
version: "3.6"

services:
  node1:
    image: quay.io/coreos/etcd
    volumes:
      - node1-data:/etcd-data
    expose:
      - 2379
      - 2380
    networks:
      cluster_net:
        ipv4_address: 172.16.238.100
    environment:
      - ETCDCTL_API=3
    command:
      - /usr/local/bin/etcd
      - --data-dir=/etcd-data
      - --name
      - node1
      - --initial-advertise-peer-urls
      - http://172.16.238.100:2380
      - --listen-peer-urls
      - http://0.0.0.0:2380
      - --advertise-client-urls
      - http://172.16.238.100:2379
      - --listen-client-urls
      - http://0.0.0.0:2379
      - --initial-cluster
      - node1=http://172.16.238.100:2380,node2=http://172.16.238.101:2380,node3=http://172.16.238.102:2380
      - --initial-cluster-state
      - new
      - --initial-cluster-token
      - docker-etcd

  node2:
    image: quay.io/coreos/etcd
    volumes:
      - node2-data:/etcd-data
    networks:
      cluster_net:
        ipv4_address: 172.16.238.101
    environment:
      - ETCDCTL_API=3
    expose:
      - 2379
      - 2380
    command:
      - /usr/local/bin/etcd
      - --data-dir=/etcd-data
      - --name
      - node2
      - --initial-advertise-peer-urls
      - http://172.16.238.101:2380
      - --listen-peer-urls
      - http://0.0.0.0:2380
      - --advertise-client-urls
      - http://172.16.238.101:2379
      - --listen-client-urls
      - http://0.0.0.0:2379
      - --initial-cluster
      - node1=http://172.16.238.100:2380,node2=http://172.16.238.101:2380,node3=http://172.16.238.102:2380
      - --initial-cluster-state
      - new
      - --initial-cluster-token
      - docker-etcd

  node3:
    image: quay.io/coreos/etcd
    volumes:
      - node3-data:/etcd-data
    networks:
      cluster_net:
        ipv4_address: 172.16.238.102
    environment:
      - ETCDCTL_API=3
    expose:
      - 2379
      - 2380
    command:
      - /usr/local/bin/etcd
      - --data-dir=/etcd-data
      - --name
      - node3
      - --initial-advertise-peer-urls
      - http://172.16.238.102:2380
      - --listen-peer-urls
      - http://0.0.0.0:2380
      - --advertise-client-urls
      - http://172.16.238.102:2379
      - --listen-client-urls
      - http://0.0.0.0:2379
      - --initial-cluster
      - node1=http://172.16.238.100:2380,node2=http://172.16.238.101:2380,node3=http://172.16.238.102:2380
      - --initial-cluster-state
      - new
      - --initial-cluster-token
      - docker-etcd

volumes:
  node1-data:
  node2-data:
  node3-data:

networks:
  cluster_net:
    driver: bridge
    ipam:
      driver: default
      config:
        - subnet: 172.16.238.0/24

# Start the cluster with docker-compose
docker-compose up -d


# Then log in to any node with the following command to test the etcd cluster
docker-compose exec node1 sh

# etcdctl member list
422a74f03b622fef, started, node1, http://172.16.238.100:2380, http://172.16.238.100:2379
ed635d2a2dbef43d, started, node2, http://172.16.238.101:2380, http://172.16.238.101:2379
daf3fd52e3583ffe, started, node3, http://172.16.238.102:2380, http://172.16.238.102:2379

etcd common configuration parameters

--name            # Node name
--data-dir        # Path to the node's data directory, used to store logs and snapshots
--addr            # Advertised IP address and port; defaults to 127.0.0.1:2379
--bind-addr       # Listening address for client connections; defaults to the --addr setting
--peers           # Comma-separated list of cluster members; e.g. 127.0.0.1:2380,127.0.0.1:2381
--peer-addr       # Advertised IP address for cluster communication; defaults to 127.0.0.1:2380
-peer-bind-addr   # Listening address for cluster communication; defaults to the -peer-addr setting
--wal-dir         # Path to the node's WAL directory; if set, WAL files are stored separately from the other data files
--listen-client-urls           # URLs to listen on for client communication
--listen-peer-urls             # URLs to listen on for communication with other nodes
--initial-advertise-peer-urls  # Peer URLs advertised to the rest of the cluster
--advertise-client-urls        # Client URLs advertised to clients
--initial-cluster-token        # Token (ID) of the cluster
--initial-cluster              # All nodes in the initial cluster
--initial-cluster-state new    # Indicates the cluster is being bootstrapped from scratch
--discovery-srv   # DNS SRV domain used for DNS-based dynamic service discovery
--discovery       # URL of an etcd discovery service used for dynamic discovery
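
As a quick illustration, a minimal single-node startup that exercises the URL-related flags above might look like the sketch below; the host IP 10.20.30.1, node name, and data directory are assumptions, not values taken from this article.

# Minimal single-node startup (sketch)
$ etcd --name node1 \
  --data-dir /var/lib/etcd \
  --listen-client-urls http://0.0.0.0:2379 \
  --advertise-client-urls http://10.20.30.1:2379 \
  --listen-peer-urls http://0.0.0.0:2380 \
  --initial-advertise-peer-urls http://10.20.30.1:2380 \
  --initial-cluster node1=http://10.20.30.1:2380 \
  --initial-cluster-state new \
  --initial-cluster-token my-etcd-token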

Data storage

etcd's data storage is somewhat similar to the approach used by the PostgreSQL (PG) database.

etcd currently supports two major versions, v2 and v3, which differ considerably in implementation: on the one hand in the external interfaces they expose, and on the other in the underlying storage engine. A v2 instance keeps everything purely in memory, with no data stored on disk, while a v3 instance supports data persistence.


As we know, etcd provides key/value storage organized like a service catalog.

# Set a key/value pair (v2 API syntax; the v3 API uses etcdctl put)
$ etcdctl set name escape

# Get the value back
$ etcdctl get name
escape

After using etcd for a while, we naturally wonder where the data is stored. By default it is kept under the /var/lib/etcd/default/ directory, which is split into two subdirectories: snap and wal.

  • snap
    • Stores snapshot data, i.e. the state of the etcd data
    • etcd takes snapshots to keep the number of WAL files from growing too large
  • wal
    • Stores the write-ahead logs
    • Its main role is to record the entire history of every data change
    • In etcd, every data change must be written to the WAL before it is committed

# Directory structure
$ tree /var/lib/etcd/default/
default
└── member
    ├── snap
    │   ├── 0000000000000006-0000000000046ced.snap
    │   ├── 0000000000000006-00000000000493fe.snap
    │   ├── 0000000000000006-000000000004bb0f.snap
    │   ├── 0000000000000006-000000000004e220.snap
    │   └── 0000000000000006-0000000000050931.snap
    └── wal
        └── 0000000000000000-0000000000000000.wal

The use of WAL for data storage enables etcd to have two important functions, namely fast recovery from failures and data rollback/redo.

  • Fast failure recovery: when data is damaged, replaying all of the modification operations recorded in the WAL restores the data to its state before the damage occurred.
  • Data rollback and redo: because every modification operation is recorded in the WAL, rolling back or redoing simply means replaying the logged operations in reverse or forward order.

Since the WAL records every change in real time, why are snapshots needed at all? As usage grows, the data stored in the WAL expands rapidly. To keep the disk from filling up, etcd by default takes a snapshot every 10,000 write records, after which the WAL files preceding the snapshot can be deleted. By default, the last 1,000 historical operations can be queried through the API.
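
If the default interval does not suit a workload, the snapshot frequency can be tuned with the --snapshot-count flag at startup; the value below is only an illustrative assumption, not a recommendation from this article.

# Take a snapshot every 5,000 committed transactions instead of the default (sketch)
$ etcd --name node1 --data-dir /var/lib/etcd --snapshot-count 5000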

At the first startup, etcd will store the startup configuration information in the data directory specified by the data-dir parameter. The configuration information includes the ID of the local node, the cluster ID, and the initial cluster information. Users need to avoid restarting etcd from an expired data directory, because nodes started with an expired data directory will be inconsistent with other nodes in the cluster. Therefore, in order to maximize the security of the cluster, once there is any possibility of data corruption or loss, you should remove this node from the cluster, and then add a new node without a data directory.

Raft algorithm

Consensus algorithm to ensure consistency

In almost every distributed system, etcd plays a very important role. Because so much service-discovery and configuration information is stored in etcd, the availability of the whole cluster is often capped by the availability of etcd, and building a highly available cluster out of 3 to 5 etcd nodes is routine practice.

Precisely because etcd runs multiple nodes, handling distributed consistency across those nodes is a challenging problem. The solution to keeping data consistent across multiple nodes is a consensus algorithm, and etcd uses the Raft consensus algorithm.


From the beginning, Raft was designed to be a consensus algorithm that is easy to understand and implement. It is comparable to the Paxos protocol in fault tolerance and performance; the difference is that it decomposes the distributed-consistency problem into several sub-problems, which are then solved one by one.

Each Raft cluster contains multiple servers, and at any moment each server can only be in one of three states: Leader, Follower, or Candidate. In the normal state there is exactly one Leader in the cluster, and all the remaining servers are Followers.


All Follower nodes are passive: they never make requests on their own and only respond to requests from the Leader and Candidates. Every mutating operation from a user is routed to the Leader node for processing. Apart from Leader and Follower, the Candidate state is only a temporary state a node passes through while the cluster is running.

Time in a Raft cluster is divided into terms. Every term starts with a leader election; after the election the cluster enters its normal operating phase, and a new term does not begin until the Leader node fails and a new round of election is triggered.


Each server stores the latest term known to the cluster. The term acts like a monotonically increasing logical clock that keeps the nodes' state in sync, and the term held by the current node is attached to every request sent to other nodes. At the beginning of each term the Raft protocol elects one node of the cluster as the Leader, and that node is responsible for log replication and management in the cluster.


The Raft protocol is usually divided into three sub-problems: leader election, log replication, and safety.

Service discovery

Service discovery is one of the main uses of etcd services

Service discovery is also one of the most common problems in distributed systems: how do processes or services in the same distributed cluster find each other and establish connections? In essence, service discovery means knowing whether any process in the cluster is listening on a UDP or TCP port, and being able to look it up and connect to it by name. Solving service discovery requires the following three pillars, none of which can be missing.

  • A strongly consistent and highly available service storage directory
    • Built on the Raft algorithm, etcd is exactly such a strongly consistent and highly available storage directory.
  • A mechanism for registering services and monitoring their health
    • Services can register themselves in etcd with a key that carries a TTL and keep the registration alive with periodic heartbeats, so that service health is monitored (see the sketch after this list).
  • A mechanism for finding and connecting to services
    • To guarantee connectivity, an etcd proxy can be deployed on every service machine, so that anything able to reach the etcd cluster can connect to the services.
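
As an illustration of the register-with-TTL pattern, the sketch below uses the etcdctl v3 lease commands; the key /services/web/10.0.0.5, the value, and the 30-second TTL are assumptions made for the example.

# 1. Grant a 30-second lease (the command prints a lease ID, e.g. 694d77aa9e38260f)
$ etcdctl lease grant 30

# 2. Register the service under a key bound to that lease
$ etcdctl put --lease=694d77aa9e38260f /services/web/10.0.0.5 '{"addr":"10.0.0.5:8080"}'

# 3. Keep the lease alive as a heartbeat; if the process dies, the key expires with the lease
$ etcdctl lease keep-alive 694d77aa9e38260f

# 4. Consumers discover instances by prefix
$ etcdctl get --prefix /services/web/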


In day-to-day development of cluster-management features, you often want the cluster size to be dynamically adjustable. That in turn requires service discovery: a newly started node can register its own information with the master so that the master adds it to the cluster, and the node removes itself from the cluster when it shuts down. etcd provides exactly this basic service registration and discovery capability, so when we use etcd for service discovery we can focus our energy on the business logic of the service itself.

Instructions


In its key organization, etcd adopts a hierarchical space structure similar to directories in a file system, and database operations revolve around managing the full CRUD life cycle of keys and directories. etcdctl is the command-line client: it offers a concise set of commands, best thought of as a command toolset, that makes it easy to test the service or modify database contents by hand. etcdctl works on the same principles as other xxxctl tools, such as systemctl.

  • Object as key value

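A few representative key/value commands with the v3 API; the key names and values are assumptions used only for illustration:

$ etcdctl put /config/db_host 127.0.0.1   # create or update a key
$ etcdctl get /config/db_host             # read a key
$ etcdctl get --prefix /config/           # read every key under a prefix
$ etcdctl del /config/db_host             # delete a key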

  • Object is directory

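Directory operations exist in the v2 API (the v3 API exposes a flat keyspace and uses key prefixes instead of directories); the examples below assume the v2 API is enabled on the server:

$ ETCDCTL_API=2 etcdctl mkdir /services   # create a directory
$ ETCDCTL_API=2 etcdctl ls /services      # list the contents of a directory
$ ETCDCTL_API=2 etcdctl rmdir /services   # remove an empty directory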

  • Non-database operation commands


# List the members of the etcd cluster
$ etcdctl member list

# Add a member to the etcd cluster
$ etcdctl member add <member>

# Remove a member from the etcd cluster
$ etcdctl member remove <member>

# Update a member of the etcd cluster
$ etcdctl member update <member>

  • etcdctl common configuration parameters

--name  # Node name

Disaster recovery

etcd cluster backup, data recovery, and operations optimization

etcd is designed so that a cluster recovers automatically from temporary failures (such as machine restarts), and a cluster of N members tolerates up to (N-1)/2 permanent member failures; for example, a 5-member cluster tolerates 2. When a member fails permanently, whether from hardware failure or disk corruption, it loses access to the cluster. If the cluster permanently loses more than (N-1)/2 members, it loses quorum and fails: it can no longer reach consensus and therefore cannot accept further updates. To recover data from such a disaster, etcd v3 provides snapshot and restore tooling to rebuild a cluster without losing v3 key data.

etcd certificate creation

Because etcd v3 certificates are bound to IP addresses, a certificate has to be recreated every time an etcd node is added. For convenience, you can follow the link below to generate etcd certificates with cfssl. Details: https://github.com/cloudflare/cfssl

Snapshot key space

To restore a cluster you first need a snapshot of the keyspace from an etcd member. A snapshot can be taken quickly from a live member with the etcdctl snapshot save command, or by copying the member/snap/db file from the etcd data directory. For example, the following command snapshots the keyspace served at $ENDPOINT into the file snapshot.db.

# Back up a data snapshot of the etcd cluster
# $ENDPOINT => http://10.20.30.1:2379
$ ETCDCTL_API=3 etcdctl --endpoints $ENDPOINT snapshot save snapshot.db

# On a single-node etcd, the following command backs up the data;
# for example, back it up every two hours, upload it to S3, and keep the last two days of data
$ ETCDCTL_API=3 etcdctl snapshot save /var/lib/etcd_backup/etcd_$(date "+%Y%m%d%H%M%S").db
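
The two-hour schedule and two-day retention mentioned above are not performed by the single command itself; one way to sketch it is a cron entry plus a cleanup step, as below. The backup directory and schedule are assumptions, and the S3 upload step is left out.

# /etc/cron.d/etcd-backup (sketch): snapshot every 2 hours, remove backups older than two days
0 */2 * * * root ETCDCTL_API=3 etcdctl snapshot save /var/lib/etcd_backup/etcd_$(date "+\%Y\%m\%d\%H\%M\%S").db && find /var/lib/etcd_backup/ -name 'etcd_*.db' -mtime +1 -delete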

Restore the cluster

To restore the cluster, use a snapshot db file previously backed up from any node. The restore is performed with the etcdctl snapshot restore command, which rebuilds an etcd data directory; all members should be restored from the same snapshot. Because some snapshot metadata (in particular the member ID and cluster ID) is overwritten during restore, the members lose their previous identity. Therefore, to start a cluster from a snapshot, a new logical cluster must be started for the recovery.

Verifying snapshot integrity during restore is optional. If the snapshot was taken with etcdctl snapshot save, its hash is checked automatically when restoring with etcdctl snapshot restore. If the snapshot was copied from a data directory, there is no integrity hash, so it can only be restored with --skip-hash-check.
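
For a snapshot that was copied straight from a data directory, the restore would look roughly like the following sketch; the backup path, member name, and data directory are assumptions:

$ ETCDCTL_API=3 etcdctl snapshot restore /backup/member/snap/db \
  --skip-hash-check \
  --name m1 \
  --data-dir /var/lib/etcd-restored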

Create new etcd data directories for a 3-member cluster as follows:

$ etcdctl snapshot restore snapshot.db \
  --name m1 \
  --initial-cluster m1=http://host1:2380,m2=http://host2:2380,m3=http://host3:2380 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-advertise-peer-urls http://host1:2380

$ etcdctl snapshot restore snapshot.db \
  --name m2 \
  --initial-cluster m1=http://host1:2380,m2=http://host2:2380,m3=http://host3:2380 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-advertise-peer-urls http://host2:2380

$ etcdctl snapshot restore snapshot.db \
  --name m3 \
  --initial-cluster m1=http://host1:2380,m2=http://host2:2380,m3=http://host3:2380 \
  --initial-cluster-token etcd-cluster-1 \
  --initial-advertise-peer-urls http://host3:2380

Next, start the etcd services with the new data directories. The restored cluster will now serve the keyspace from the snapshot.

$ etcd \
  --name m1 \
  --listen-client-urls http://host1:2379 \
  --advertise-client-urls http://host1:2379 \
  --listen-peer-urls http://host1:2380 &

$ etcd \
  --name m2 \
  --listen-client-urls http://host2:2379 \
  --advertise-client-urls http://host2:2379 \
  --listen-peer-urls http://host2:2380 &

$ etcd \
  --name m3 \
  --listen-client-urls http://host3:2379 \
  --advertise-client-urls http://host3:2379 \
  --listen-peer-urls http://host3:2380 &

Author: Escape
https://escapelife.github.io/posts/90d30a1e.html

