[Java Interview] What Is ISR, and Why Does Kafka Introduce It?

Hi everyone, my name is Mic.
One of my followers, with 5 years of work experience, wrote on his resume that he was proficient in Kafka.
In the interview, that claim backfired badly.
The interviewer asked him: "What is ISR, and why was ISR designed?"
He just stared at the interviewer, bewildered.
Let's compare how an ordinary candidate and an expert would answer.

Ordinary candidate:

ISR seems to be a mechanism in Kafka.

The reason for introducing it probably has something to do with data synchronization.

Expert:

OK, I need to answer this question from several perspectives.

First, messages sent to a Kafka Broker are ultimately stored on disk, physically organized as Partitions.

To keep Partitions reliable, Kafka provides a Partition replication mechanism: each Partition has a set of replicas.

Within this replica set there is one Leader Partition, and the others are Follower Partitions.

Messages sent by the producer are first written to the Leader Partition and then copied to the Follower Partitions.

The advantage of this design is that if the node hosting the Leader Partition goes down, a new Leader can be elected from the remaining Partition replicas.

Consumers can then continue to obtain unconsumed data from the new Leader Partition.
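To make the leader-first write path above concrete, here is a minimal in-memory sketch, assuming each replica is nothing more than a list of messages. The class names `PartitionReplication` and `Replica` and the methods `produce`, `replicateTo`, and `replicateAll` are hypothetical illustrations, not Kafka APIs; real Kafka followers pull data from the Leader with fetch requests, which `replicateTo` only loosely imitates.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of one Partition and its replicas (not Kafka's real classes).
class PartitionReplication {
    final Replica leader;
    final List<Replica> followers;

    PartitionReplication(Replica leader, List<Replica> followers) {
        this.leader = leader;
        this.followers = followers;
    }

    // Step 1: the producer's message is always written to the Leader first.
    void produce(String message) {
        leader.log.add(message);
    }

    // Step 2: a follower copies every message it is missing from the Leader's log.
    // (Real Kafka followers pull these messages by sending fetch requests to the Leader.)
    void replicateTo(Replica follower) {
        follower.log.addAll(leader.log.subList(follower.log.size(), leader.log.size()));
    }

    void replicateAll() {
        for (Replica f : followers) {
            replicateTo(f);
        }
    }

    public static void main(String[] args) {
        Replica leader = new Replica(1);
        Replica f2 = new Replica(2);
        Replica f3 = new Replica(3);
        PartitionReplication p = new PartitionReplication(leader, List.of(f2, f3));
        p.produce("m0");
        p.produce("m1");
        p.replicateTo(f2); // f2 has caught up; f3 has not fetched yet
        // Prints "LEO leader=2 f2=2 f3=0"
        System.out.println("LEO leader=" + leader.logEndOffset()
                + " f2=" + f2.logEndOffset() + " f3=" + f3.logEndOffset());
    }
}

// Hypothetical model of one Partition replica: an in-memory list of messages.
class Replica {
    final int brokerId;
    final List<String> log = new ArrayList<>();

    Replica(int brokerId) {
        this.brokerId = brokerId;
    }

    // Log end offset (LEO): the offset the next appended message would receive.
    long logEndOffset() {
        return log.size();
    }
}
```

The gap between the Leader's log end offset and a follower's is exactly the "lag" that the rest of this answer is about.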

In this multi-replica Partition design, there are two key requirements:

Synchronization of replica data
Election of a new leader

Both requirements involve network communication. To avoid performance problems caused by network latency, while still ensuring as far as possible that the newly elected Leader Partition holds the latest data, Kafka designed the ISR scheme.

ISR stands for In-Sync Replicas. It is a set (list) of the replicas whose data is closest to the Leader Partition's data: the Leader itself is always a member, together with the Follower Partitions that keep up with it.

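If a Follower Partition lags too far behind the Leader, it is removed from the ISR list. In current Kafka versions "too far" is time-based: a follower that has not caught up with the Leader within `replica.lag.time.max.ms` is dropped, and it rejoins once it catches up again.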
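The "lags too far behind" rule can be sketched as follows, assuming a time-based check in the spirit of Kafka's `replica.lag.time.max.ms` broker setting. `IsrTracker`, `recordCaughtUp`, and `currentIsr` are hypothetical names for this toy model, and the 30-second threshold in `main` is simply a value chosen for the demo.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy tracker for ISR membership (not Kafka's real implementation).
class IsrTracker {
    // Plays the role of Kafka's replica.lag.time.max.ms setting.
    final long maxLagMs;
    final int leaderId;
    // Follower id -> last time (ms) that follower was fully caught up with the Leader.
    final Map<Integer, Long> lastCaughtUpMs = new LinkedHashMap<>();

    IsrTracker(int leaderId, long maxLagMs) {
        this.leaderId = leaderId;
        this.maxLagMs = maxLagMs;
    }

    // Called whenever a follower has replicated everything the Leader has.
    void recordCaughtUp(int followerId, long nowMs) {
        lastCaughtUpMs.put(followerId, nowMs);
    }

    // The Leader is always in the ISR; a follower stays only if it caught up recently enough.
    Set<Integer> currentIsr(long nowMs) {
        Set<Integer> isr = new TreeSet<>();
        isr.add(leaderId);
        for (Map.Entry<Integer, Long> e : lastCaughtUpMs.entrySet()) {
            if (nowMs - e.getValue() <= maxLagMs) {
                isr.add(e.getKey());
            }
        }
        return isr;
    }

    public static void main(String[] args) {
        IsrTracker t = new IsrTracker(1, 30_000);
        t.recordCaughtUp(2, 100_000); // follower 2 caught up at t=100s
        t.recordCaughtUp(3, 50_000);  // follower 3 last caught up at t=50s
        // Follower 3 has lagged for 60s > 30s, so it is out: prints [1, 2]
        System.out.println(t.currentIsr(110_000));
    }
}
```

If follower 3 later catches up, a new `recordCaughtUp` call puts it back into the set, mirroring how a recovered follower rejoins the ISR.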

Put simply, the nodes in the ISR list hold data that is up to date (or very nearly so), so a subsequent leader election only needs to pick from the ISR list.
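Picking the new Leader only from the ISR can be sketched like this. It is a simplified rule, assuming the first replica in the assigned order that is both alive and in the ISR wins; `IsrLeaderElection` and `electLeader` are hypothetical names, not the Kafka controller's actual code.

```java
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Hypothetical, simplified leader election restricted to the ISR.
class IsrLeaderElection {
    // Walk the assigned replicas in order and return the first one that is
    // both in the ISR and on a live broker.
    static Optional<Integer> electLeader(List<Integer> assignedReplicas,
                                         Set<Integer> isr,
                                         Set<Integer> liveBrokers) {
        for (int replica : assignedReplicas) {
            if (isr.contains(replica) && liveBrokers.contains(replica)) {
                return Optional.of(replica);
            }
        }
        return Optional.empty(); // no in-sync replica is alive
    }

    public static void main(String[] args) {
        List<Integer> assigned = List.of(1, 2, 3);
        Set<Integer> isr = Set.of(1, 2);  // replica 3 was dropped for lagging
        Set<Integer> live = Set.of(2, 3); // broker 1 (the old Leader) is down
        // Prints Optional[2]: replica 3 is alive but not in sync, so it is skipped
        System.out.println(electLeader(assigned, isr, live));
    }
}
```

When no ISR member is alive, the sketch returns an empty result. In Kafka, the `unclean.leader.election.enable` setting governs that case: when it is disabled, the partition waits for an in-sync replica to return rather than promote a lagging one and risk losing data.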

So I think there are two reasons for introducing the ISR scheme:

Keep data synchronization as efficient as possible, because nodes that synchronize too slowly are kicked out of the ISR list.
Avoid data loss, because the data on nodes in the ISR is the closest to the Leader replica.
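Both reasons can be illustrated with the high watermark, the offset up to which messages count as committed and become visible to consumers. This is a minimal sketch, assuming the high watermark is simply the smallest log end offset among ISR members; `HighWatermark` and `highWatermark` are hypothetical names, and real Kafka tracks this incrementally rather than recomputing it.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical helper illustrating the high-watermark idea (not Kafka's real code).
class HighWatermark {
    // Smallest log end offset among ISR members. Offsets below it exist on every
    // in-sync replica, so they survive a failover to any ISR member.
    // Assumes the ISR is non-empty (it always contains the Leader).
    static long highWatermark(Map<Integer, Long> logEndOffsets, Set<Integer> isr) {
        long hw = Long.MAX_VALUE;
        for (int id : isr) {
            hw = Math.min(hw, logEndOffsets.get(id));
        }
        return hw;
    }

    public static void main(String[] args) {
        Map<Integer, Long> leo = Map.of(1, 10L, 2, 8L, 3, 2L);
        Set<Integer> isr = Set.of(1, 2); // replica 3 lags badly and is outside the ISR
        System.out.println(highWatermark(leo, isr)); // prints 8
    }
}
```

Because replica 3 is outside the ISR, its small log end offset does not hold the high watermark back (efficiency), and in this model every offset below 8 already exists on both in-sync replicas, so electing either of them loses no committed message (no data loss).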

The above is my understanding of the problem.

Summary

In my opinion, this question is well worth studying.

Generally speaking, replica data is synchronized in one of two ways: synchronously (blocking) or asynchronously (non-blocking).

However, neither is a particularly good fit: synchronous replication causes performance problems, while asynchronous replication risks data loss.
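The trade-off can be put into numbers with a toy latency model, assuming the cost of a write is just the time spent waiting for follower acknowledgements. The follower latencies in `main` are made-up inputs, and `ReplicationLatency` and its methods are hypothetical names. The ISR-style wait mirrors what a Kafka producer with `acks=all` gets: the Leader waits for the current ISR members, not for every replica.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical toy model comparing how long a write waits under each strategy.
class ReplicationLatency {
    // Synchronous blocking: wait for every follower, so the slowest one dictates latency.
    static long syncAll(Map<Integer, Long> followerAckMs) {
        long worst = 0;
        for (long ms : followerAckMs.values()) {
            worst = Math.max(worst, ms);
        }
        return worst;
    }

    // Asynchronous: acknowledge as soon as the Leader has written, never waiting for followers.
    // Fastest, but if the Leader crashes first, writes no follower has copied are lost.
    static long async() {
        return 0;
    }

    // ISR-style: wait only for followers currently in the ISR.
    static long isrOnly(Map<Integer, Long> followerAckMs, Set<Integer> isr) {
        long worst = 0;
        for (Map.Entry<Integer, Long> e : followerAckMs.entrySet()) {
            if (isr.contains(e.getKey())) {
                worst = Math.max(worst, e.getValue());
            }
        }
        return worst;
    }

    public static void main(String[] args) {
        Map<Integer, Long> ackMs = Map.of(2, 5L, 3, 800L); // follower 3 is slow
        Set<Integer> isr = Set.of(2);                      // so it has been dropped from the ISR
        // Prints "sync=800 async=0 isr=5"
        System.out.println("sync=" + syncAll(ackMs) + " async=" + async()
                + " isr=" + isrOnly(ackMs, isr));
    }
}
```

The ISR-style wait stays close to the fast followers' latency, yet every acknowledged write still exists on all in-sync replicas.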

ISR strikes a good balance between the two. In practice, we can borrow similar design ideas in our own systems.

If you like my work, remember to like, favorite, and follow.


Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY-NC-SA 4.0. Please credit Mic带你学架构 as the source when reposting!
If this article helped you, please follow and like; your support is what keeps me creating. Follow the WeChat public account of the same name for more practical technical content!
