What is the Gossip protocol?

Hello, I'm Waiwai.

On New Year's Day, I saw a very outrageous rumor. I won't say what it is. I'm afraid it will dirty everyone's eyes.

However, in one group chat, the person passing it on told it so vividly, and everyone discussed it so enthusiastically, that it was as if they had all been there.

I should have just laughed it off. But after a while, I suddenly slapped my thigh: this is material.

I can talk to you about a consensus algorithm.

When it comes to consensus algorithms, the first things that come to mind are probably strongly consistent algorithms such as Raft, Paxos, and Zab, which are hard to understand.

But there is another consensus algorithm, a weakly consistent one, that is easier to understand: the Gossip protocol.

Gossip: look at this word first and circle it, because it will be on the test. It is CET-6 vocabulary and also a postgraduate entrance exam word, meaning "rumor" or "idle talk".

Next, I'll take you through a brief look at what this "rumor" is all about.

Gossip protocol

The earliest proposal of the Gossip protocol can be traced back to a paper published in 1987: "Epidemic Algorithms for Replicated Database Maintenance"

http://bitsavers.trailing-edge.com/pdf/xerox/parc/techReports/CSL-89-1_Epidemic_Algorithms_for_Replicated_Database_Maintenance.pdf

When I first saw the title of this paper, I was stunned: the word Gossip is nowhere in it.

What it does have is the two words "Epidemic Algorithms", which I happen to know.

Algorithms means algorithms; nothing more to say.

What is Epidemic?

Keep up with current events:

So Epidemic Algorithms translates to epidemic algorithm.

Therefore, the formal name of Gossip should be "epidemic algorithm", but everyone prefers to call it Gossip. After all, who doesn't like to hear a little gossip?

Before getting into the paper, let me set the tone.

What do you think is the most basic, core, and important action in a consensus protocol?

Isn't it updating data?

In order to ensure the consistency of the data of each node, the update operation of the data must be involved.

Therefore, in the introduction part of the paper, three methods are described to update the data:

Direct mail
Anti-entropy
Rumor mongering

Direct mail

Let's not talk nonsense, let's go directly to the picture:

What does the picture above mean?

There are eight small dots in total. Assume each one represents a server; they are all peers, with no central node and no master-slave relationship.

The red node at the top is a node whose data has changed, so it notifies all the remaining nodes of the change directly.

The same is true if data changes occur on other nodes.

Put simply, it is a loop: every time data changes, the node iterates over all the other nodes to keep the data consistent.

The advantages of this scheme are obvious: simple, crude, and direct.

But the disadvantages are as obvious as the advantages. Let's see what the paper says:

Focus on the part after the "but":

First, it is not entirely reliable, because it requires the sending site to know about every other site. In reality, a site does not always know about all the others.

Second, messages (mail) are sometimes lost, and once one is lost, not even eventual consistency can be guaranteed, and the whole thing is doomed.
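
To make the loop and its two weaknesses concrete, here is a minimal Python sketch of direct mail. The names `direct_mail`, `known_sites`, `inboxes`, and `loss_rate` are my own inventions for illustration and do not come from the paper.

```python
import random

def direct_mail(origin, known_sites, update, inboxes, loss_rate=0.0, rng=None):
    """Mail `update` from `origin` to every site the origin knows about.

    All names here are hypothetical; this only illustrates the loop.
    """
    rng = rng or random.Random()
    for site in known_sites[origin]:
        if rng.random() < loss_rate:
            continue  # the mail is lost and nobody retries
        inboxes[site].append(update)

sites = list(range(8))
# Weakness 1: site 0 does not know that site 7 exists.
known_sites = {0: [1, 2, 3, 4, 5, 6]}
inboxes = {s: [] for s in sites}
inboxes[0].append("x=1")  # the update originates at site 0
direct_mail(0, known_sites, "x=1", inboxes)

missed = [s for s in sites if "x=1" not in inboxes[s]]
print(missed)  # [7]
```

Site 7 never hears about the update, and nothing in the scheme will ever repair that. Setting `loss_rate` above zero shows the second weakness in the same way: any site whose mail is dropped stays out of date.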

In fact, direct mail is not the main scheme discussed in the paper; it is listed first only as a lead-in.

We will mainly talk about the other two schemes, Anti-entropy and Rumor mongering.

First, the overall picture:

Anti-entropy propagates all of the data on a node
Rumor mongering propagates only the newly arrived or changed data on a node

To put it bluntly, one is a full sync and the other is an incremental sync.

Anti-entropy

Some readers may be baffled by the word "anti-entropy"; mostly, that is because they don't know what "entropy" is.

Put simply, the popular understanding of "entropy" is "degree of disorder".

For example, if you never tidy your room, your things become more and more disordered; that is, the entropy keeps increasing. Tidying up the room is "anti-entropy".

That simple picture is enough for now, and I won't go deeper. To discuss this properly, you would have to rise to the level of the universe and philosophy.

I'm mainly afraid that you won't understand.

In the paper, the Anti-entropy mode is described as follows:

Each server randomly chooses another server on a regular basis, and the two exchange their content to smooth out any differences between them, which is very reliable.

But (here comes the "but") it requires examining the full contents of both servers. The implication is that the amount of data involved is fairly large, so it cannot be run too frequently.

Analysis and simulation show that anti-entropy, while reliable, propagates updates much more slowly than direct mail.

If two servers never synchronize, the differences between their data grow larger and larger; that is, the entropy keeps increasing.

The purpose of synchronization is to reduce those differences and reach eventual consistency, which is anti-entropy.

That is all there is to the definition.
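
Here is a small Python sketch of what "pick a random partner and smooth out the differences" could look like. It assumes each replica stores `key -> (version, value)` and that the higher version wins, and it uses a push-pull exchange where both sides end up updated. Those choices, and every function name, are my assumptions for illustration rather than the paper's exact design.

```python
import random

def anti_entropy_exchange(a, b):
    """Push-pull exchange: both replicas keep the newest version of every key.

    Each replica is a dict mapping key -> (version, value). Hypothetical layout.
    """
    for key in set(a) | set(b):
        newest = max(a.get(key, (0, None)), b.get(key, (0, None)),
                     key=lambda entry: entry[0])
        a[key] = newest
        b[key] = newest

def anti_entropy_round(replicas, rng):
    """Every replica picks one random partner and compares FULL contents."""
    for i, replica in enumerate(replicas):
        j = rng.choice([k for k in range(len(replicas)) if k != i])
        anti_entropy_exchange(replica, replicas[j])

rng = random.Random(42)
replicas = [{} for _ in range(8)]
replicas[0]["x"] = (1, "hello")  # an update lands on one replica

rounds = 0
while any(r != replicas[0] for r in replicas):
    anti_entropy_round(replicas, rng)
    rounds += 1
print(rounds, replicas[7])
```

Note that `anti_entropy_exchange` walks over every key on both sides even when only one key changed. That is exactly the cost the paper warns about.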

Rumor mongering

Compared with anti-entropy, rumor mongering is easy to understand from the name alone.

For example, I'm a college student, and I don't know everyone in the school. But the students are connected to one another in countless ways.

Suppose one day I happened to run into the campus belle walking alone. I went up to discuss consensus algorithms and other computer science topics with her, and we had an in-depth discussion and exchanged our understanding and views.

The discussion grew more and more intense, and somehow, as we walked, we ended up at Lover's Slope.

There should be a place called Lover's Slope in every university.

Then another girl saw us. She said to her best friend: Do you know Waiwai? Yes, that freshman, the handsome one. I just saw him strolling around Lover's Slope with the campus belle.

Then one person told ten, and ten told a hundred. Soon all the teachers and students in the school knew the news.

The news that "Waiwai and the campus belle are strolling on Lover's Slope" reached eventual consistency through gossip's rumor mode.

The difference between rumor mongering and anti-entropy is that only new or changed information is transmitted; there is no need to transmit the full set of information.

In the example above, it is only necessary to synchronize the latest news: "Waiwai and the campus belle are strolling on Lover's Slope".

There is no need to synchronize information such as "who Waiwai is, who the campus belle is, and where Lover's Slope is".

When discussing rumor mongering and anti-entropy, the paper also gives this definition:

simple epidemics

In this model, a node is in one of two states: infective or susceptible.

A node in the infective state has a data update and needs to share it with (infect) other nodes.

A node in the susceptible state has not yet received the data update from other nodes (it is not infected).

So when I mention "infection" later, know that I took the term from here; I didn't make it up.
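
The two states can be sketched in a few lines of Python. This is a simplified push-style model of my own: every infective node tells one random peer per round, rounds are synchronous, and the run simply stops once nobody is susceptible. The function name and parameters are hypothetical.

```python
import random

def spread_rumor(n_nodes, rng, max_rounds=1000):
    """Push-style rumor spreading with only two states: infective and susceptible.

    Returns (rounds, messages) used until no susceptible node is left.
    """
    infective = {0}  # node 0 holds the new update
    rounds = messages = 0
    while len(infective) < n_nodes and rounds < max_rounds:
        newly_infected = set()
        for node in infective:
            peer = rng.choice([p for p in range(n_nodes) if p != node])
            messages += 1             # only the update itself is sent
            newly_infected.add(peer)  # a susceptible peer becomes infective
        infective |= newly_infected
        rounds += 1
    return rounds, messages

rounds, messages = spread_rumor(50, random.Random(7))
print(rounds, messages)
```

Since the infective set can at most double per round in this model, 50 nodes need at least 6 rounds, and since each message can infect at most one node, at least 49 messages are sent.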

For rumor mongering and anti-entropy, here is a more rigorous description, borrowed from Mr. Zhou Zhiming's "Phoenix Architecture":

http://icyfenix.cn/distribution/consensus/gossip.html

There is a tension between the time it takes to reach consistency and the redundancy of messages propagated through the network. Improving one worsens the other.

For this reason, Gossip offers two possible modes of message propagation: Anti-Entropy and Rumor-Mongering, both of which are rather literary names.

Entropy is a concept that is rare in everyday life but very common in science. It represents the degree of disorder in things.

Anti-entropy means countering disorder, with the goal of increasing the similarity between nodes in the network. In anti-entropy mode, all of a node's data is synchronized to eliminate the differences between nodes, with the goal of complete agreement among all nodes in the network.

However, given that the nodes themselves keep changing, this goal makes the number of messages in the network very large, which imposes a huge transmission overhead.

In rumor mode, the goal is only to spread the message: only data that has newly arrived at a node is sent, that is, only the changes are sent out. The amount of message data is therefore significantly smaller, and the network overhead is relatively low.

A website

Time to come clean: I actually saw this website before I decided to write this article.

This website has a realistic animation that simulates the gossip protocol's synchronization process, and a moving picture is worth a thousand words.

The address is put here first, you can visit and play for yourself:

https://flopezluis.github.io/gossip-simulator/

Let me show you how it works:

Don't worry if you don't understand it yet; at least it looks impressive.

Here is how to use it:

First we look at Nodes and Fanout here.

Nodes is easy to understand: it is the number of nodes. The 40 here is the number of small circles below. For example, I am 18 years old this year, so if I change it to 18 it looks like this:

The real question is: what is this Fanout?

In the carousel at the top of the page, the first picture looks like this:

The answer lies behind its "Learn more" link:

https://managementfromscratch.wordpress.com/2016/04/01/introduction-to-gossip/

This passage explains what Fanout is. It also briefly introduces the basic working principle of the gossip protocol.

It says that gossip protocols are very simple in concept and very simple to code. The basic idea behind them is this:

A node wants to share some information with the other nodes in the network. It periodically selects a random node from the set of nodes and exchanges the information with it, and the node that receives the information does the same.
The information is sent periodically to N targets; N is called the fanout.

Therefore, the earlier Fanout=4 means that each time, a node sends the information it wants to share to 4 other nodes in the cluster.

In the simulator it looks like this:

In the image above you can see many lines, all starting from a single red node.

You create a red node by clicking any one or more of the small circles with the mouse. Once clicked, a circle turns red, and that's it. Red means "infected".

How did the lines above come about?

Once you have a small red circle, click "Show Paths" above and the paths appear:

But didn't we say Fanout=4? Why are there so many paths?

Because although this node knows about that many other nodes, it will choose only 4 of them to infect.
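
In code, "knows many, infects only Fanout of them" is just a random sample without replacement. A minimal Python sketch, where `pick_targets` and `known_peers` are hypothetical names of mine and not taken from the simulator:

```python
import random

def pick_targets(node, known_peers, fanout, rng):
    """Choose up to `fanout` distinct peers that `node` will send to this round."""
    candidates = [p for p in known_peers[node] if p != node]
    return rng.sample(candidates, min(fanout, len(candidates)))

rng = random.Random(1)
known_peers = {0: list(range(1, 11))}  # node 0 knows ten other nodes
targets = pick_targets(0, known_peers, fanout=4, rng=rng)
print(targets)  # four of the ten known peers
```

The `min(...)` guard covers a node that knows fewer peers than the fanout; in that case it simply sends to all of them.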

The above picture is still a bit complicated, so I reduced the parameters a little bit, so it looks a lot cleaner:

There is a node in the cluster that has updated information. This node knows the existence of the other 5 nodes, but it will only push the information to two of them. After clicking the Send Message button, it will look like this:

You can see that there are now three red nodes in the graph above, and two paths have become thicker, meaning that the message was propagated along those paths.

The entire cluster will eventually complete the "infection" and achieve eventual consistency:

At the same time, the gossip protocol is also fault-tolerant:

Following the hints on the page, we can delete some of the paths with the "Delete" button, for example:

Deleting two paths means the nodes at their ends can no longer reach each other directly, but eventually the whole cluster will still be infected.
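
The same experiment can be sketched in Python: build a small fully connected cluster, delete two links, and check that the news still gets everywhere through the remaining paths. The graph layout and the function `gossip_until_converged` are assumptions of mine, not the simulator's code.

```python
import random

def gossip_until_converged(neighbors, start, fanout, rng, max_rounds=1000):
    """Spread from `start` over the given links; return (rounds, infected set)."""
    infected = {start}
    rounds = 0
    while len(infected) < len(neighbors) and rounds < max_rounds:
        for node in list(infected):  # snapshot: new nodes start sending next round
            peers = neighbors[node]
            for peer in rng.sample(peers, min(fanout, len(peers))):
                infected.add(peer)
        rounds += 1
    return rounds, infected

# Six nodes, fully connected at first.
nodes = range(6)
neighbors = {n: [p for p in nodes if p != n] for n in nodes}
# "Delete" two paths: node 0 can no longer talk to nodes 1 and 2 directly.
for a, b in [(0, 1), (0, 2)]:
    neighbors[a].remove(b)
    neighbors[b].remove(a)

rounds, infected = gossip_until_converged(neighbors, start=0, fanout=2,
                                          rng=random.Random(3))
print(rounds, sorted(infected))
```

Nodes 1 and 2 cannot hear the news from node 0 itself, so they can only be infected in the second round or later, relayed by nodes 3, 4, or 5. This only works while the remaining links still connect the cluster; a node with every path deleted would never be reached.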

Let's show it again with a moving picture. You can see that after the path is deleted, this node will no longer communicate with the corresponding node, but the entire cluster still achieves convergence:

You can also open the website and play with it yourself. Here is a little trick:

After clicking the Play button, you can pause at any time, which makes it easier to observe the whole propagation process.

Finally, there is one more key thing in this picture that I haven't mentioned: the formula in it.

This formula is also mentioned in Learn More. It is the complexity of the gossip protocol, O(logN):

For example, if Fanout=4 is set each time, the relationship between the number of nodes and the estimated propagation rounds is as follows:

40 nodes, 2.66 rounds
80 nodes, 3.16 rounds
160 nodes, 3.66 rounds
320 nodes, 4.16 rounds
640 nodes, 4.66 rounds
...

As you can see, each time the number of nodes doubles, the estimated number of propagation rounds grows by only 0.5.
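
The numbers in the list are consistent with taking the logarithm of the node count in base Fanout. I am inferring that from the values themselves, so treat the function below as my reconstruction rather than the website's exact code:

```python
import math

def estimated_rounds(n_nodes, fanout):
    """Estimated rounds to reach all nodes: log base `fanout` of `n_nodes`."""
    return math.log(n_nodes) / math.log(fanout)

for n in (40, 80, 160, 320, 640):
    print(n, round(estimated_rounds(n, 4), 2))
# 40 2.66
# 80 3.16
# 160 3.66
# 320 4.16
# 640 4.66
```

With Fanout=4, doubling the node count adds log base 4 of 2, which is 0.5 rounds, and that is why every step in the list differs by exactly 0.5.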

This is the word mentioned in the earlier Learn More screenshot: Scalable

This is CET-4 vocabulary and may be on the test, so remember it. It means "able to scale".

Clusters that use the gossip protocol scale very nicely.

Other notes

On this website, the most important thing is the animated simulation, but don't ignore the other descriptions on the page.

For example, this passage is, in my view, very important.

Two issues are raised in this passage, and I will go through them one by one.

First, it says that in the website's simulation, all nodes appear to send their messages in lockstep, as if there were a global loop.

The simulation does this because it looks more intuitive.

In a real gossip protocol, however, each node has its own cycle, and there is no synchronization between them, nor any need for it.

What does the above mean?

Let me put it more bluntly. Each node synchronizes messages on its own cycle, say once every 10 seconds. A node doesn't care at all when other nodes trigger their synchronization; it only needs to take care of itself.
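
One way to picture "everyone on their own cycle" is an event queue in which each node has its own timer with a random starting offset. The Python sketch below is a toy model of mine (simulated time, no real networking, hypothetical names), not how any particular system implements it:

```python
import heapq
import random

def simulate_independent_timers(n_nodes, interval, fanout, rng, horizon=600.0):
    """Each node gossips on its own timer; there is no global round.

    Returns the simulated time at which the last node became infected,
    or None if that did not happen before `horizon`.
    """
    infected = {0}
    # Every node starts its timer at a different random offset.
    events = [(rng.uniform(0, interval), node) for node in range(n_nodes)]
    heapq.heapify(events)
    while events:
        now, node = heapq.heappop(events)
        if now > horizon:
            return None
        if node in infected:
            others = [p for p in range(n_nodes) if p != node]
            infected.update(rng.sample(others, fanout))
            if len(infected) == n_nodes:
                return now
        # Schedule this node's next tick; no other node is consulted.
        heapq.heappush(events, (now + interval, node))
    return None

finished_at = simulate_independent_timers(20, interval=10.0, fanout=3,
                                          rng=random.Random(5))
print(finished_at)
```

The only shared thing here is the simulated clock. Each node's next tick depends solely on its own previous tick, which is the point the passage makes.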

The second question is, I think, very important:

How do the nodes know about each other?

One way is that when a node joins a cluster, it must know about at least one node already in the cluster. From the earlier animation, we know that if a node is known by another node, it will eventually be infected.

Then the question arises: how does a new node learn about a node in the cluster when it joins?

Very simple. One solution I know of is manual designation.

Redis Cluster uses a gossip protocol to exchange information. When a new node is to be added to the cluster, the CLUSTER MEET command is required.

http://www.redis.cn/commands/cluster-meet.html

That is manual designation.
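
To see why knowing a single node is enough, here is a toy Python model in which nodes gossip their membership lists. This is only my illustration of the idea; it is not how Redis implements it, and `exchange_membership` and `gossip_membership` are hypothetical names.

```python
import random

def exchange_membership(views, a, b):
    """Two nodes merge their membership views (each learns the other's peers)."""
    merged = views[a] | views[b] | {a, b}
    views[a] = set(merged)
    views[b] = set(merged)

def gossip_membership(views, rng, max_rounds=1000):
    """Every node repeatedly contacts one random peer it already knows."""
    everyone = set(views)
    rounds = 0
    while any(view != everyone for view in views.values()) and rounds < max_rounds:
        for node in list(views):
            peers = sorted(views[node] - {node})
            if peers:
                exchange_membership(views, node, rng.choice(peers))
        rounds += 1
    return rounds

# An existing three-node cluster whose members all know each other.
views = {n: {"a", "b", "c"} for n in ("a", "b", "c")}
# A new node "d" is manually pointed at one seed node, "a" (the "meet" step).
views["d"] = {"d", "a"}

rounds = gossip_membership(views, random.Random(0))
print(rounds, sorted(views["c"]))
```

Node "d" was only ever told about "a", yet after a few rounds "b" and "c" know about "d" as well, because "a" passes the news along.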

Another thing to note is this:

It mentions another simulator site:

https://www.serf.io/docs/internals/simulator.html

It lets you see how long the cluster takes to converge as you adjust these parameters.

The figure above shows the case where the gossip interval (GOSSIP INTERVAL) is 0.2s, the fanout is 3, the total number of nodes is 30, and the packet loss rate and node failure rate are both 0.

In this case, the corresponding time graph to reach eventual consistency is as follows:

Basically it's done in a second.

You can also modify the parameters yourself to see the changes in the corresponding time chart.

For example, if I only modify the number of nodes, and change it from 30 to 3000, the time graph becomes like this:

Convergence was completed around 1.75s.

The number of nodes grew 100-fold, but the time increased by less than 1s, which is really excellent.

Good as that is, let me show you something more dramatic, so you can feel how frighteningly large the spread gets:

As you can see from the animation, the first round or two were fine, and the eye could at least follow roughly what was happening. But in the next few rounds, most nodes were already infected, and yet they kept on spreading the news.

The sheer number of messages makes your scalp tingle.
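
You can put a number on that feeling by counting how many messages land on nodes that already had the news. The Python sketch below uses a simplified synchronous model of my own (random targets, which may repeat or even hit the sender), so its figures are illustrative and will not match the Serf simulator.

```python
import random

def count_messages(n_nodes, fanout, rng, max_rounds=1000):
    """Return (total, redundant) messages sent until every node is infected."""
    infected = {0}
    total = redundant = rounds = 0
    while len(infected) < n_nodes and rounds < max_rounds:
        for sender in list(infected):  # snapshot of this round's senders
            for _ in range(fanout):
                peer = rng.randrange(n_nodes)
                total += 1
                if peer in infected:
                    redundant += 1     # the receiver already had the news
                else:
                    infected.add(peer)
        rounds += 1
    return total, redundant

total, redundant = count_messages(3000, fanout=3, rng=random.Random(2))
print(total, redundant)
```

However the run goes, exactly 2999 messages are useful (one per newly infected node). Everything beyond that is redundancy, which is the price gossip pays for its speed and fault tolerance.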

Six Degrees of Separation Theory

Finally, let's talk about something interesting, called the "Six Degrees of Separation Theory":

In 1967, Harvard psychology professor Stanley Milgram wanted to map the web of human connections that links people to communities. He ran a chain-letter experiment and found the phenomenon of "six degrees of separation". Simply put: "The distance between you and any stranger is no more than six people; that is, you can get to know any stranger through at most six people."

The Six Degrees of Separation Theory is also known as the Small World Theory, and it is closely related to the Gossip protocol.

I saw a related video on Bilibili. I think the explanation is quite clear. If you are interested, you can check it out:

https://www.bilibili.com/video/BV1b7411B7D2?t=31

In the video, there is a picture like this:

Look at that: isn't this just like the website from earlier? It feels so familiar.

When this theory was first put forward, it was still "you can get to know any stranger through at most six people".

But with the rapid development of social networks in recent years, the earth has shrunk into a "global village".

So this number is gradually decreasing:

And if the scope is narrowed a little, for example to the small circle of programmers, the number is even smaller.

Sometimes I get pulled into a group chat for a business integration, and what do you know, there is a former colleague in it. How big can this circle really be?

This article is also on my personal blog; feel free to drop by.

https://www.whywhy.vip/
