Hi everyone, my name is Mic.
A follower of mine with five years of work experience wrote on his resume that he was proficient in Kafka.
It backfired badly in the interview.
The interviewer asked him: "What is the ISR, and why did Kafka need to design it?"
He could only stare back at the interviewer with a bewildered expression.
Let's look at how ordinary people and an expert would answer.
Ordinary people:
The ISR seems to be some mechanism in Kafka.
The reason for introducing it is probably related to data synchronization.
Expert:
OK, I'll answer this question from a few angles.
First, messages sent to a Kafka broker are ultimately stored on disk, physically organized into Partitions.
To keep a Partition reliable, Kafka provides a replication mechanism, so each Partition has a set of replicas.
Within that set there is one Leader Partition and one or more Follower Partitions.
A message sent by the producer is written to the Leader Partition first, and the Follower Partitions then copy it by fetching from the Leader.
The advantage of this design is that if the node hosting the Leader Partition goes down, a new Leader can be elected from the remaining replicas.
Consumers can then continue to obtain unconsumed data from the new Leader Partition.
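
The leader/follower flow described above can be sketched as a small in-memory model. This is only an illustration, not Kafka's code: the `Replica` and `Partition` classes and their methods are hypothetical names, and promoting the surviving replica with the longest log is a simplifying assumption made for this toy.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of a partition held by a single broker (hypothetical model)."""
    broker_id: int
    log: list = field(default_factory=list)


class Partition:
    """Toy partition: one leader replica plus follower replicas."""

    def __init__(self, broker_ids):
        self.replicas = {b: Replica(b) for b in broker_ids}
        self.leader_id = broker_ids[0]  # assumption: the first broker starts as leader

    def produce(self, message):
        # Producers write to the leader only.
        self.replicas[self.leader_id].log.append(message)

    def replicate(self, follower_id):
        # A follower copies whatever it is missing from the leader's log.
        leader_log = self.replicas[self.leader_id].log
        follower = self.replicas[follower_id]
        follower.log.extend(leader_log[len(follower.log):])

    def fail_leader(self):
        # The leader's broker goes down. Simplification for this toy:
        # promote the surviving replica with the longest log.
        del self.replicas[self.leader_id]
        best = max(self.replicas.values(), key=lambda r: len(r.log))
        self.leader_id = best.broker_id
        return self.leader_id


partition = Partition([1, 2, 3])
partition.produce("m1")
partition.produce("m2")
partition.replicate(2)   # broker 2 now has m1, m2
partition.produce("m3")  # only the leader has m3 so far
partition.replicate(3)   # broker 3 now has m1, m2, m3
new_leader = partition.fail_leader()
```

In this run broker 3 takes over because it holds every message, while broker 2 is still missing `m3`. Finding the most complete replica at failover time is exactly the problem the rest of the answer deals with.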
This multi-replica Partition design has two key requirements.
- Synchronization of replica data
- Election of a new leader
Both of these requirements involve network communication. To avoid the performance problems caused by network latency,
while ensuring as far as possible that the newly elected Leader Partition holds the latest data, Kafka designed the ISR scheme.
ISR stands for in-sync replicas. It is the set of replicas whose data is closest to the Leader Partition: the leader itself plus the Follower Partitions that are keeping up with it.
If a Follower Partition falls too far behind the Leader, it is removed from the ISR.
To put it simply, the replicas in the ISR are the ones most up to date with the Leader, so a later leader election only needs to choose from the ISR.
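
Here is a minimal sketch of that bookkeeping, assuming a time-based lag rule. In real Kafka the allowed window is controlled by the broker setting `replica.lag.time.max.ms`; everything else here (the `IsrTracker` class, its methods, and picking the lowest broker id as the new leader) is hypothetical and only meant to show the idea.

```python
class IsrTracker:
    """Toy ISR bookkeeping for one partition (not Kafka's actual code)."""

    def __init__(self, leader_id, follower_ids, max_lag_ms):
        self.leader_id = leader_id
        self.max_lag_ms = max_lag_ms
        # Time at which each follower was last fully caught up with the leader.
        self.last_caught_up_ms = {f: 0 for f in follower_ids}
        self.isr = {leader_id, *follower_ids}

    def record_caught_up(self, follower_id, now_ms):
        self.last_caught_up_ms[follower_id] = now_ms
        self.isr.add(follower_id)  # a follower that catches up rejoins the ISR

    def shrink(self, now_ms):
        # Drop followers that have not caught up within the allowed window.
        for follower_id, seen_ms in self.last_caught_up_ms.items():
            if now_ms - seen_ms > self.max_lag_ms:
                self.isr.discard(follower_id)

    def elect_leader(self, live_brokers):
        # Only live ISR members are candidates; the lowest id is an arbitrary pick.
        candidates = sorted(b for b in self.isr if b in live_brokers)
        if not candidates:
            return None
        self.leader_id = candidates[0]
        self.last_caught_up_ms.pop(self.leader_id, None)
        self.isr &= set(live_brokers)  # dead brokers leave the ISR
        return self.leader_id


tracker = IsrTracker(leader_id=1, follower_ids=[2, 3], max_lag_ms=30_000)
tracker.record_caught_up(2, now_ms=40_000)
tracker.shrink(now_ms=45_000)      # broker 3 last caught up at t=0, so it is dropped
isr_after_shrink = set(tracker.isr)
new_leader = tracker.elect_leader(live_brokers={2, 3})  # broker 1 has failed
```

Broker 3 is alive but has been out of sync for too long, so it is never considered. The election looks only at the ISR and does not need to compare logs across all replicas.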
So, I think there are two reasons for introducing the ISR scheme:
- Ensure the efficiency of data synchronization as much as possible, because nodes with low synchronization efficiency will be kicked out of the ISR list.
- Avoid data loss, because the node data in the ISR is the closest to the Leader copy.
The above is my understanding of the problem.
Summary
In my opinion, this question is well worth studying.
Generally speaking, replica data is synchronized in one of two ways: synchronous (blocking) or asynchronous (non-blocking).
However, neither is a particularly good fit: the former brings performance problems, and the latter risks data loss.
The ISR strikes a good balance between the two. In our own work, we can borrow similar design ideas.
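
To make the trade-off concrete, here is a rough sketch comparing how long the leader waits before acknowledging a write under each strategy. The function name, broker names, and latency figures are all made up for illustration, and the leader's own local write is taken as 0 ms.

```python
def commit_latency_ms(follower_latency_ms, strategy, isr=None):
    """Time until the leader can acknowledge a write (hypothetical model)."""
    if strategy == "async":
        # Acknowledge right away; followers copy later, so a leader crash can lose data.
        return 0
    if strategy == "sync_all":
        # Wait for every follower, however slow it is.
        return max(follower_latency_ms.values())
    if strategy == "isr":
        # Wait only for the followers that are currently in sync.
        return max((follower_latency_ms[f] for f in isr), default=0)
    raise ValueError(f"unknown strategy: {strategy}")


latencies = {"broker-2": 5, "broker-3": 8, "broker-4": 900}  # broker-4 is a straggler
isr_followers = {"broker-2", "broker-3"}  # broker-4 has already dropped out of the ISR

results = {
    "async": commit_latency_ms(latencies, "async"),
    "sync_all": commit_latency_ms(latencies, "sync_all"),
    "isr": commit_latency_ms(latencies, "isr", isr_followers),
}
```

With these made-up numbers, waiting for every replica is held back by the one slow broker, the asynchronous path returns immediately but with no copy guaranteed, and the ISR path waits only for the two replicas that are keeping up.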
If you like my work, remember to like, favorite, and follow.
Copyright notice: All articles in this blog are licensed under CC BY-NC-SA 4.0 unless otherwise stated. Please credit the source: Mic带你学架构!