ZooKeeper study notes (1): basic concepts and simple use

A quick note first: when I learn a new technology, I usually go straight to the official documentation and quote its key points in my articles. I want to record my thought process, and it also helps me practice reading English documentation.

Overview

First, open Bing and search for ZooKeeper. Some readers may ask why Bing rather than Baidu. At the time of writing, a Baidu search for ZooKeeper does not show the official website on the first page:

[Figure: Baidu search results]

[Figure: Baidu search results, page 2]

Now search for ZooKeeper on Bing:

[Figure: Bing search results]

Personally, I feel that Baidu's search quality has been declining, so lately I use Bing more.

[Figure: ZooKeeper documentation]


ZooKeeper is a high-performance coordination service for distributed applications. It exposes common services - such as naming, configuration management, synchronization, and group services - in a simple interface so you don't have to write them from scratch.
In other words, ZooKeeper provides a high-performance coordination service for distributed applications. It exposes common services (naming, configuration management, distributed synchronization, and group services) through a simple interface, so you don't have to implement them yourself.

Takeaway 1: naming, configuration management, distributed synchronization, and group services. At this point we only have a vague idea of what these four mean.

Takeaway 2: it provides high-performance coordination for distributed applications. High performance is always welcome, but what exactly is being coordinated?

Let's talk about distributed systems

Let's look at how server-side architecture evolves. At first, a service runs on a single server. As the number of users keeps growing, one machine can no longer handle the traffic. The natural next step is a cluster: deploy the same application several times and let Nginx or other middleware distribute requests across the instances according to a load-balancing algorithm. For high reliability, however, we cannot build the cluster on a single machine by running several copies and changing the port in each configuration file. If that one machine has a problem, the entire service becomes unavailable; that is putting all our eggs in one basket. To keep the service highly available, we deploy it on multiple machines. Then even if the service on one machine fails, the service as a whole stays available. (Our service here is still a monolith.) This is also distributed deployment, so "distributed" is not necessarily tied to microservices.

[Figure: from single-machine deployment to cluster deployment]
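To make the load-balancing step concrete, here is a minimal round-robin sketch in Python. It is a toy model, not how Nginx is implemented. The instance addresses are made up, and `mark_down` is a hypothetical stand-in for whatever failure detection (failed requests, health checks) a real balancer performs.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Toy round-robin balancer: hands out instances in turn, skipping ones marked down."""

    def __init__(self, instances):
        self.instances = list(instances)
        self.healthy = set(self.instances)
        self._ring = cycle(self.instances)

    def mark_down(self, instance):
        # Hypothetical hook: a real balancer would detect the failure itself.
        self.healthy.discard(instance)

    def next_instance(self):
        # Look at each instance at most once per call, so an all-down cluster cannot loop forever.
        for _ in range(len(self.instances)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy instance available")


lb = RoundRobinBalancer(["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"])
print([lb.next_instance() for _ in range(3)])  # all three instances in turn
lb.mark_down("10.0.0.2:8080")
print([lb.next_instance() for _ in range(4)])  # alternates between 10.0.0.1 and 10.0.0.3
```

Note that the balancer only avoids a dead instance after something calls `mark_down`. Someone has to notice the failure and tell everyone else, and that is exactly the first problem raised next.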

But this introduces new problems:

When a node fails and can no longer serve requests, how does the rest of the cluster find out, so that another node can take over its work?

Example: as data volume and traffic keep growing, a single MySQL instance can no longer handle the load. We build a database cluster to increase its capacity, and for reliability we deploy it on multiple machines, or even in multiple locations.

[Figure: database cluster]

Generally speaking, inserts, updates, and deletes perform far worse than queries, so we designate only a few database nodes for writes and route users' write requests to them. Once a write node has applied a write, the change is propagated to the other nodes. The problem is what happens when the write node goes down. The natural response is to pick one of the read nodes to take over write requests, and at the same time remove the dead node from the cluster.
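The failover step described above can be sketched as a small state change. This is a hypothetical in-memory model, not MySQL's or ZooKeeper's API. The node names are made up, and the promotion rule (promote the read node with the smallest name) is an assumption chosen only to keep the result deterministic.

```python
class DatabaseCluster:
    """Toy model: one write node plus read nodes, with failover when a node dies."""

    def __init__(self, writer, readers):
        self.writer = writer
        self.readers = sorted(readers)

    def route(self, is_write):
        # Writes go to the write node; reads go to a read node (here simply the first one),
        # falling back to the write node if no read node is left.
        if is_write or not self.readers:
            return self.writer
        return self.readers[0]

    def on_node_failed(self, node):
        if node == self.writer:
            if not self.readers:
                raise RuntimeError("no read node left to promote")
            # Assumed policy for this sketch: promote the read node with the smallest name.
            self.writer = self.readers.pop(0)
        elif node in self.readers:
            # A failed read node is simply removed from the cluster.
            self.readers.remove(node)


cluster = DatabaseCluster("db-w", ["db-r2", "db-r1"])
cluster.on_node_failed("db-w")
print(cluster.writer, cluster.readers)  # db-r1 ['db-r2']
```

The bookkeeping itself is trivial. The hard part in a real cluster is getting every machine to agree that the write node really is dead and which node replaces it, and that agreement is a coordination problem.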

How do we ensure that a task is executed only once in a distributed deployment?

Example: a service containing a scheduled task is deployed on both machine A and machine B. How do we make sure the scheduled task runs only once, rather than once on each machine?

Solving problems like these is what ZooKeeper's "coordination" refers to.
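One common way to get run-once behaviour is to let every instance try to create the same znode and run the task only if its create succeeds. This works because ZooKeeper rejects a create for a path that already exists. The sketch below models only that rule in memory. `FakeZk`, `NodeExistsError`, and the lock path are hypothetical stand-ins, not a real client library, and the sketch assumes each run of the task gets its own lock path. With a real ensemble the znode would be ephemeral, so a crashed instance would release it automatically.

```python
class NodeExistsError(Exception):
    """Raised when creating a path that already exists (hypothetical stand-in)."""


class FakeZk:
    """In-memory stand-in for the one ZooKeeper rule we need: create fails if the path exists."""

    def __init__(self):
        self.nodes = {}

    def create(self, path, data=b""):
        if path in self.nodes:
            raise NodeExistsError(path)
        self.nodes[path] = data
        return path


def run_once(zk, lock_path, instance, task):
    """Run task only if this instance wins the create; return True if it ran."""
    try:
        zk.create(lock_path, instance.encode())
    except NodeExistsError:
        return False  # another instance already owns this run
    task()
    return True


zk = FakeZk()
runs = []
for host in ["A", "B"]:  # in reality A and B race concurrently; sequential here for clarity
    run_once(zk, "/locks/daily-report-2024-01-01", host, lambda h=host: runs.append(h))
print(runs)  # ['A']
```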

Basic Concepts and Design Goals

The core concepts become clear from ZooKeeper's design goals.

ZooKeeper is simple.

ZooKeeper allows distributed processes to coordinate with each other through a shared hierarchical namespace which is organized similarly to a standard file system. The namespace consists of data registers - called znodes, in ZooKeeper parlance - and these are similar to files and directories. Unlike a typical file system, which is designed for storage, ZooKeeper data is kept in-memory, which means ZooKeeper can achieve high throughput and low latency numbers.
In other words, ZooKeeper lets distributed processes coordinate with each other through a shared, hierarchical namespace organized much like a file system. The namespace consists of data registers, which ZooKeeper calls znodes, and they are similar to files and directories. Unlike a typical file system, which is designed for storage, ZooKeeper keeps its data in memory, which is what allows it to achieve high throughput and low latency.

[Figure: ZooKeeper's namespace]

This is ZooKeeper's namespace. Like the Linux file system, it is a tree: / is the root, and the nodes below it are like folders. (You can also compare it to the Windows file system, where the root is the drive.) Just as a folder can contain subfolders, a znode can have child znodes, and each znode can also store data. You can think of the path as a key and the znode's data as its value.

There are two basic types of znodes: ephemeral (temporary) nodes and persistent (permanent) nodes. The type is fixed when the node is created and cannot be changed afterwards. An ephemeral node lives only as long as the session that created it. When the session ends, the node is deleted automatically, although it can also be deleted explicitly before then. Ephemeral nodes are not allowed to have children. A persistent node's lifetime is independent of any session; it is removed only when a client explicitly deletes it.

Znodes can also be sequential. If this is requested at creation time, ZooKeeper appends a monotonically increasing sequence number to the znode's name. The counter is maintained by the parent znode, so the number is unique among that parent's children and records the order in which they were created.
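To make these rules concrete, here is a small in-memory model of the namespace. It is a teaching sketch, not ZooKeeper's implementation or client API, and the `ZNodeTree` class and its method names are hypothetical. It encodes the rules above:

- The node type is fixed at creation.
- Ephemeral nodes cannot have children and disappear when their session closes.
- Sequential nodes get a per-parent counter, zero-padded to 10 digits as in the `pig0000000001` output of the zkCli example later.

The counter here simply starts at 0. A real server derives the number from the parent znode's state, so real numbers need not start at 0.

```python
import itertools


class ZNodeTree:
    """Toy model of the znode namespace (not ZooKeeper's implementation or client API)."""

    def __init__(self):
        # path -> {"data": bytes, "owner": owning session id for ephemeral nodes, else None}
        self.nodes = {"/": {"data": b"", "owner": None}}
        self._seq = {}  # parent path -> per-parent sequence counter (simplified)

    @staticmethod
    def _parent(path):
        return path.rsplit("/", 1)[0] or "/"

    def create(self, path, data=b"", session=None, ephemeral=False, sequential=False):
        parent = self._parent(path)
        if parent not in self.nodes:
            raise KeyError(f"parent {parent} does not exist")
        if self.nodes[parent]["owner"] is not None:
            raise ValueError("ephemeral nodes cannot have children")
        if ephemeral and session is None:
            raise ValueError("an ephemeral node needs an owning session")
        if sequential:
            # Append a zero-padded, 10-digit, per-parent increasing number.
            counter = self._seq.setdefault(parent, itertools.count())
            path = f"{path}{next(counter):010d}"
        if path in self.nodes:
            raise ValueError(f"{path} already exists")
        # The node type (ephemeral or persistent) is fixed here and never changes.
        self.nodes[path] = {"data": data, "owner": session if ephemeral else None}
        return path

    def close_session(self, session):
        # Ending a session deletes every ephemeral node it owns; persistent nodes stay.
        for path in [p for p, n in self.nodes.items() if n["owner"] == session]:
            del self.nodes[path]


tree = ZNodeTree()
tree.create("/dog", b"123")                                 # persistent
tree.create("/dog/cat", b"123", session=1, ephemeral=True)  # ephemeral, owned by session 1
print(tree.create("/dog/pig", session=1, ephemeral=True, sequential=True))  # /dog/pig0000000000
tree.close_session(1)
print(sorted(tree.nodes))  # ['/', '/dog']
```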

Features of znodes:

A znode has the characteristics of both a file and a directory. Like a file, it maintains data and metadata such as version numbers and timestamps. Like a directory, it can be part of a path and can have child znodes. Clients can create, read, update, and delete znodes.
Reads and writes are atomic: a read returns all of the data associated with a znode, and a write replaces all of it.
The amount of data a znode can store is limited: each znode holds at most about 1 MB, and in normal use it should hold far less.
Znodes are referenced by path, and paths must be absolute (starting with /).
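The last two rules can be checked on the client side before a request is sent. The helper below is a hypothetical sketch. The limit of 1,048,575 bytes is an assumption based on the server's default `jute.maxbuffer` setting ("just under 1 MB"). The path checks cover only the basics (absolute, no empty segments, no trailing slash, no `.` or `..` segments, no null character), not every rule the server enforces.

```python
MAX_ZNODE_DATA = 0xFFFFF  # assumption: default jute.maxbuffer (1,048,575 bytes)


def validate_path(path):
    """Basic znode path checks; raises ValueError on the first problem found."""
    if not path.startswith("/"):
        raise ValueError("path must be absolute")
    if path == "/":
        return
    if path.endswith("/"):
        raise ValueError("path must not end with '/'")
    for segment in path[1:].split("/"):
        if segment == "":
            raise ValueError("empty path segment")
        if segment in (".", ".."):
            raise ValueError("relative segments such as . and .. are not allowed")
        if "\x00" in segment:
            raise ValueError("null character is not allowed")


def validate_data(data):
    """Reject payloads larger than the assumed per-znode limit."""
    if len(data) > MAX_ZNODE_DATA:
        raise ValueError(f"{len(data)} bytes exceeds the {MAX_ZNODE_DATA}-byte limit")


def is_valid_path(path):
    try:
        validate_path(path)
    except ValueError:
        return False
    return True


print([is_valid_path(p) for p in ["/dog/cat", "dog", "/dog/", "/a/../b"]])  # [True, False, False, False]
```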

ZooKeeper is replicated
Like the distributed processes it coordinates, ZooKeeper itself is intended to be replicated over a set of hosts called an ensemble.
The servers that make up the ZooKeeper service must all know about each other. They maintain an in-memory image of state, along with a transaction logs and snapshots in a persistent store. As long as a majority of the servers are available, the ZooKeeper service will be available.
Clients connect to a single ZooKeeper server. The client maintains a TCP connection through which it sends requests, gets responses, gets watch events, and sends heart beats. If the TCP connection to the server breaks, the client will connect to a different server.
In other words, like the distributed processes it coordinates, ZooKeeper itself is replicated across a set of hosts called an ensemble. The servers in the ensemble must all know about each other. Each keeps an in-memory image of the state, along with transaction logs and snapshots in persistent storage. As long as a majority of the servers are available, the ZooKeeper service is available.
A client connects to a single ZooKeeper server and maintains a TCP connection to it. Over that connection it sends requests, receives responses and watch events, and sends heartbeats. If the TCP connection breaks, the client connects to a different server.

[Figure: the ZooKeeper service]
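"A majority of the servers" is easy to turn into numbers. The snippet below is plain arithmetic in Python, not ZooKeeper code. With n servers, the ensemble needs floor(n/2) + 1 of them running, so it tolerates n minus that many failures.

```python
def quorum_size(n):
    """Smallest number of servers that forms a strict majority of n."""
    return n // 2 + 1


def tolerated_failures(n):
    """How many servers can fail while a majority is still up."""
    return n - quorum_size(n)


def is_available(n, alive):
    """The service is available while at least a majority of the n servers are alive."""
    return alive >= quorum_size(n)


# Columns: servers, quorum size, tolerated failures
for n in range(1, 6):
    print(n, quorum_size(n), tolerated_failures(n))
# 1 1 0
# 2 2 0
# 3 2 1
# 4 3 1
# 5 3 2
```

So the three-node cluster deployed later in this article keeps working with one node down. A fourth node would not let it survive a second failure, which is why ensembles usually have an odd number of servers.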

Conditional updates and watches
ZooKeeper supports the concept of watches. Clients can set a watch on a znode. A watch will be triggered and removed when the znode changes. When a watch is triggered, the client receives a packet saying that the znode has changed.
In other words, a client can set a watch on a znode. When the znode changes, the watch is triggered and then removed, and the client receives a notification that the znode has changed. Because a standard watch fires only once, a client that wants to keep tracking a znode has to set the watch again after each notification.
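The one-shot behaviour matters in practice. Below is a toy, single-process model of it in Python. `WatchedStore`, `get`, and `set` are hypothetical names, not a client library's API, and the config path is made up. The callback receives only the path, not the new data, so the config reader re-reads the znode and re-registers its watch in the same call. That is the usual pattern for following a znode across many changes.

```python
class WatchedStore:
    """Toy model of one-shot watches: each watch fires on the next change, then is removed."""

    def __init__(self):
        self.data = {}
        self.watches = {}  # path -> list of callbacks

    def get(self, path, watch=None):
        if watch is not None:
            self.watches.setdefault(path, []).append(watch)
        return self.data.get(path)

    def set(self, path, value):
        self.data[path] = value
        # Pop the watches before calling them: a watch is removed when it fires.
        for callback in self.watches.pop(path, []):
            callback(path)


store = WatchedStore()
store.set("/config/db_url", "mysql://db1")
seen = []


def on_change(path):
    # Re-read the znode AND re-register the watch; otherwise later changes are missed.
    seen.append(store.get(path, watch=on_change))


store.get("/config/db_url", watch=on_change)
store.set("/config/db_url", "mysql://db2")
store.set("/config/db_url", "mysql://db3")
print(seen)  # ['mysql://db2', 'mysql://db3']
```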

A brief summary

ZooKeeper uses the features above to provide the services mentioned at the beginning:

Naming service: maps service names to IP addresses. If the services in our cluster need to call one another, we can store each service's name and IP address in ZooKeeper znodes and look up the IP address by name when making a call.
Configuration management: refreshes configuration dynamically. Using watches, we store the configuration in a znode and have the application watch that znode. When the znode changes, the application is notified and reloads its configuration.
Data publish/subscribe: also built on watches.
Distributed locks: processes on different hosts compete for a shared resource, and ZooKeeper can act as a distributed lock. For example, if a scheduled task is configured on service A, a distributed lock can ensure that the task runs on only one machine in the cluster.
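As an illustration of the naming service, a common layout is one znode per service with one child per instance address. The sketch below models only that lookup in memory; the `/services/...` layout and the `Registry` class are hypothetical. In a real deployment, each instance would register an ephemeral child, so it disappears from the registry when its session ends.

```python
class Registry:
    """Toy naming service: "/services/<name>" -> set of "ip:port" children."""

    def __init__(self):
        self.children = {}

    def register(self, service, address):
        self.children.setdefault(f"/services/{service}", set()).add(address)

    def unregister(self, service, address):
        # Models what happens when an instance's ephemeral node is removed.
        self.children.get(f"/services/{service}", set()).discard(address)

    def lookup(self, service):
        """Return all registered addresses for a service, sorted for stable output."""
        return sorted(self.children.get(f"/services/{service}", set()))


reg = Registry()
reg.register("order-service", "192.168.1.10:8080")
reg.register("order-service", "192.168.1.11:8080")
reg.unregister("order-service", "192.168.1.10:8080")
print(reg.lookup("order-service"))  # ['192.168.1.11:8080']
```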
To get higher throughput and high availability, we choose distributed deployment. But in computing, solving one problem with some technique usually introduces new ones. Distributed deployment brought us new problems such as coordination between nodes (for example, electing a leader in a primary/replica cluster) and competition for resources. ZooKeeper was created to solve these problems.
Why quote ZooKeeper's official documentation at all? Because I want to record my own learning process. When I first learned Java Web, I looked for videos on Bilibili. While watching, I sometimes wondered how the authors reached their conclusions, for example how they knew ZooKeeper could be used this way. I want first-hand information and my own line of thinking.

Get it running first

Enough talk; let's get ZooKeeper running.

[Figure: ZooKeeper download page]

This time we install and deploy on Linux. Downloading from the official ZooKeeper website is relatively slow from within China, so we download it from a mirror:

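# First cd to /usr, create a zookeeper directory, and create zk1, zk2 and zk3 under it. This time we deploy a three-node cluster.
mkdir -p /usr/zookeeper/zk1 /usr/zookeeper/zk2 /usr/zookeeper/zk3
# Run the following commands in zk1
cd /usr/zookeeper/zk1
wget https://mirrors.tuna.tsinghua.edu.cn/apache/zookeeper/zookeeper-3.7.1/apache-zookeeper-3.7.1-bin.tar.gz --no-check-certificate
# Extract the archive
tar -xzvf apache-zookeeper-3.7.1-bin.tar.gz
# Then create the data and logs directories
mkdir data logs
# Copy everything under zk1 into zk2 and zk3
cp -r /usr/zookeeper/zk1/* /usr/zookeeper/zk2/
cp -r /usr/zookeeper/zk1/* /usr/zookeeper/zk3/
# Create a myid file in each data directory. Every ZooKeeper node needs a myid file recording its id in the cluster; the file contains a single number.
echo "1" > /usr/zookeeper/zk1/data/myid
echo "2" > /usr/zookeeper/zk2/data/myid
echo "3" > /usr/zookeeper/zk3/data/myid
# Then go to the conf folder of zk1's apache-zookeeper-3.7.1-bin and rename zoo_sample.cfg to zoo.cfg
cd /usr/zookeeper/zk1/apache-zookeeper-3.7.1-bin/conf
mv zoo_sample.cfg zoo.cfg
# Configure zoo.cfg as follows (edit the dataDir and clientPort lines zoo_sample.cfg already contains rather than adding duplicates)
# dataDir stores data, dataLogDir stores transaction logs, clientPort is the port clients connect to
dataDir=/usr/zookeeper/zk1/data
dataLogDir=/usr/zookeeper/zk1/logs
clientPort=2181
server.1=127.0.0.1:8881:7771
server.2=127.0.0.1:8882:7772
server.3=127.0.0.1:8883:7773
# Cluster entries follow the template server.id=host:port:port. id is the id in the myid file above; host is the node's IP; the first port is used for communication between nodes, the second for leader election.
# After editing the first file, copy it to zk2 and zk3. Remember to change clientPort, dataDir and dataLogDir in each copy (for example clientPort=2182 and the zk2 paths for zk2, clientPort=2183 and the zk3 paths for zk3).
cp zoo.cfg /usr/zookeeper/zk2/apache-zookeeper-3.7.1-bin/conf/zoo.cfg
cp zoo.cfg /usr/zookeeper/zk3/apache-zookeeper-3.7.1-bin/conf/zoo.cfg
# Start the three nodes
/usr/zookeeper/zk1/apache-zookeeper-3.7.1-bin/bin/zkServer.sh start
/usr/zookeeper/zk2/apache-zookeeper-3.7.1-bin/bin/zkServer.sh start
/usr/zookeeper/zk3/apache-zookeeper-3.7.1-bin/bin/zkServer.sh start
# A normal start prints "Starting zookeeper ... STARTED". To double-check, run jps (each node appears as a QuorumPeerMain process) or zkServer.sh status.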
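Editing three nearly identical files by hand is error-prone; for one thing, Linux paths are case-sensitive. A small script can render each node's settings instead. This is a hypothetical helper, not part of ZooKeeper. It assumes the /usr/zookeeper/zkN layout, the peer and election ports used above, and the clientPort scheme 2180 + N (2181, 2182, 2183). tickTime, initLimit and syncLimit use the values shipped in zoo_sample.cfg.

```python
from pathlib import Path

BASE_DIR = "/usr/zookeeper"          # assumed layout from the steps above
ZK_DIR = "apache-zookeeper-3.7.1-bin"
NODE_IDS = (1, 2, 3)


def render_zoo_cfg(node_id):
    """Return zoo.cfg text for one node of the local three-node ensemble."""
    lines = [
        "tickTime=2000",
        "initLimit=10",
        "syncLimit=5",
        f"dataDir={BASE_DIR}/zk{node_id}/data",
        f"dataLogDir={BASE_DIR}/zk{node_id}/logs",
        f"clientPort={2180 + node_id}",  # assumed scheme: 2181, 2182, 2183
    ]
    # Every node lists the whole ensemble: server.id=host:peerPort:electionPort
    lines += [f"server.{i}=127.0.0.1:{8880 + i}:{7770 + i}" for i in NODE_IDS]
    return "\n".join(lines) + "\n"


def write_all(dry_run=True):
    for node_id in NODE_IDS:
        target = Path(BASE_DIR) / f"zk{node_id}" / ZK_DIR / "conf" / "zoo.cfg"
        text = render_zoo_cfg(node_id)
        if dry_run:
            print(f"--- {target}\n{text}")
        else:
            target.write_text(text)


if __name__ == "__main__":
    write_all(dry_run=True)
```

Run it with `dry_run=True` first to inspect the output. Switch to `dry_run=False` to overwrite the three files, which needs write permission under /usr.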

Create, read, update and delete znodes

Just as Redis has redis-cli, ZooKeeper ships with a command-line client, zkCli.sh. We use it to operate on znodes.

permanent node

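# Connect to zk1
/usr/zookeeper/zk1/apache-zookeeper-3.7.1-bin/bin/zkCli.sh -server 127.0.0.1:2181
# Create a znode: /dog is the path (the key), 123 is its data (the value)
create /dog 123
# Read the data stored in the znode
get /dog
# Now connect to zk2 (assuming zk2's clientPort is 2182) and read the dog znode
/usr/zookeeper/zk2/apache-zookeeper-3.7.1-bin/bin/zkCli.sh -server 127.0.0.1:2182
# Read the data stored in /dog: it is available, because the data is replicated across the ensemble
get /dog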

temporary node

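# Connected to zk1, create an ephemeral node; -e means ephemeral (temporary)
create -e /dog/cat 123
# Connected to zk2, read /dog/cat
get /dog/cat
# In the zk1 client, enter quit to close the current session
quit
# Back in the zk2 client, /dog/cat can no longer be read: the ephemeral node was deleted when its session ended
get /dog/cat

[Figure: creating an ephemeral node]

[Figure: zk2 can read the node]

[Figure: zk2 can no longer read the node]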

Classic case: a fair distributed lock based on ephemeral sequential znodes + watches

The principle is as follows:

[Figure: fair distributed lock]

Client A asks ZooKeeper to create an ephemeral sequential node under the lock directory. Once the node is created, A checks whether its sequence number is the smallest in the lock directory; if it is, A has acquired the lock. Client B does the same, but its number is not the smallest. B therefore sets a watch on the node immediately before its own (the next-smaller sequence number). When A finishes and deletes its node, or A's session ends and the node is removed automatically, B is notified and acquires the lock. Additional clients such as C and D work the same way: each watches its predecessor. The lock is therefore granted in the order the nodes were created, which is what makes it fair.

# Create an ephemeral sequential node under /dog (-s = sequential, -e = ephemeral)
create -s -e /dog/pig s
# Output: Created /dog/pig0000000001
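The same fair-lock logic can be simulated in a single process to see the ordering. This is a sketch, not a client library. The `FairLock` class is hypothetical, and the sequential counter and the "watch your predecessor" notification are modelled in memory. A real implementation would talk to ZooKeeper through a client such as the official Java client, or use Apache Curator, which ships a ready-made lock recipe (`InterProcessMutex`).

```python
class FairLock:
    """In-memory model of the lock recipe: ephemeral sequential nodes + watch on predecessor."""

    def __init__(self, lock_dir="/dog"):
        self.lock_dir = lock_dir
        self.counter = 0
        self.nodes = {}    # node path -> client name
        self.watches = {}  # watched node path -> node of the client waiting on it
        self.granted = []  # order in which clients obtained the lock

    def acquire(self, client):
        """Create this client's sequential node; grant the lock if it is the smallest."""
        node = f"{self.lock_dir}/pig{self.counter:010d}"
        self.counter += 1
        self.nodes[node] = client
        self._check(node)
        return node

    def _check(self, node):
        if node not in self.nodes:
            return  # its owner already gave up
        ordered = sorted(self.nodes)  # zero-padding makes string order match number order
        index = ordered.index(node)
        if index == 0:
            self.granted.append(self.nodes[node])
        else:
            # Watch only the node just before ours, not the whole directory.
            self.watches[ordered[index - 1]] = node

    def release(self, node):
        """Delete the node (unlock, or session end); wake only the next waiter."""
        del self.nodes[node]
        waiter = self.watches.pop(node, None)
        if waiter is not None:
            self._check(waiter)


lock = FairLock()
a = lock.acquire("A")
b = lock.acquire("B")
c = lock.acquire("C")
print(lock.granted)  # ['A']
lock.release(a)
print(lock.granted)  # ['A', 'B']
lock.release(b)
print(lock.granted)  # ['A', 'B', 'C']
```

Watching only the predecessor avoids the "herd effect". When the lock is released, exactly one waiting client is woken instead of all of them.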

Final words

ZooKeeper has other uses as well:

Data publish/subscribe
Service registration and discovery
Distributed configuration center
Naming service
Distributed locks
Master election
Load balancing
Distributed queues

This article only introduces the basic concepts and simple usage. I hope it helps you learn ZooKeeper. I included passages from the English documentation partly to practice reading technical documentation in English.

北冥有只鱼
