
Hi everyone, my name is Mic.

A fan with five years of work experience reached out to me.

He said, "Mr. Mic, if you can answer this question, I will truly admire you."

I was stunned on the spot. Are bets made this casually nowadays?

I asked him what the question was, and he said, "How does Kafka avoid the problem of duplicate consumption?"

Check out the answers from an ordinary person and an expert below!

Ordinary person:

To avoid repeated consumption in Kafka, we can use a design similar to a distributed lock on the message-consumption side.

When I consume a message, I can use a Redis command such as SETNX to record the message in Redis first. If the same message is delivered again, I only need to check whether the key already exists in Redis.
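Here is a minimal sketch of that idea, assuming the Jedis client and that every message carries a unique business ID (the msgId parameter and key prefix are made up for illustration):

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

public class RedisDedup {
    private final Jedis jedis = new Jedis("localhost", 6379);

    /**
     * Atomically claims the message ID. SET ... NX EX only succeeds for the
     * first caller, so a duplicate delivery returns false.
     */
    public boolean tryMarkConsumed(String msgId) {
        String reply = jedis.set("consumed:" + msgId, "1",
                SetParams.setParams().nx().ex(24 * 3600)); // keep the marker for a day
        return "OK".equals(reply); // a null reply means the key already existed
    }

    public void handle(String msgId, String payload) {
        if (!tryMarkConsumed(msgId)) {
            return; // duplicate delivery, skip it
        }
        // ... actual business processing ...
    }
}
```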

Expert:

OK, I will answer this question from several aspects.

First of all, every message stored on a Kafka Broker is marked with an offset.

A Kafka consumer tracks its consumption progress through this offset: each time it finishes consuming a batch of messages, the new offset is committed back to the Broker so that already-consumed messages are not delivered again.


By default, the offset is committed automatically after messages are consumed, which avoids repeated consumption.

The Kafka consumer's auto-commit logic runs at a default interval of 5 seconds (the auto.commit.interval.ms setting); in other words, the offset is committed on the next poll from the Broker once 5 seconds have elapsed.

Therefore, if the application is forcibly killed or shut down while the consumer is still processing, the offset may never be committed, which leads to repeated consumption.
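To make the timing concrete, here is a minimal sketch of that default setup using the standard Kafka Java client; the topic name, group id, and server address are placeholders. If the process dies after processing messages but before the next auto-commit, everything since the last committed offset is redelivered on restart.

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class AutoCommitConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true");      // the default
        props.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "5000"); // the default: 5s

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("demo-topic"));
            while (true) {
                // Offsets are committed inside poll(), at most once per interval.
                // A crash between processing and the next commit means redelivery.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(1000));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                            record.partition(), record.offset(), record.value());
                }
            }
        }
    }
}
```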

In addition, there is another situation where repeated consumption will occur.

Kafka has a partition-balancing mechanism that distributes a topic's partitions evenly across the consumers in a group.

The consumer consumes messages from the partitions assigned to it. If it cannot finish processing a batch of messages within the default 5 minutes (the max.poll.interval.ms setting), Kafka's rebalance mechanism is triggered, and the offset for that batch can no longer be committed automatically.

After the rebalance, the consumer resumes from the last offset that was committed, so the uncommitted batch is delivered again, which also leads to repeated consumption of messages.
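The two settings involved here can be seen concretely below. This is a minimal sketch assuming the standard Kafka Java client; the specific values are illustrative, with the defaults noted in the comments.

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;

import java.util.Properties;

public class RebalanceTuning {

    /** Consumer properties that govern when a slow consumer triggers a rebalance. */
    static Properties withRebalanceTuning(Properties base) {
        Properties props = new Properties();
        props.putAll(base);
        // Default 300000 (5 minutes): if poll() is not called again within this
        // window, the group coordinator considers the consumer dead and rebalances.
        props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "300000");
        // Default 500: pulling fewer records per poll() shortens each batch,
        // making it easier to finish before the interval expires.
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "100");
        return props;
    }
}
```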


Against this background, I think there are several ways to solve the problem of repeated message consumption.

  1. Improve the consumer's processing performance to avoid triggering a rebalance. For example, process messages asynchronously to shorten the time spent on a single message, increase the message-processing timeout, or reduce the number of records pulled from the Broker in a single poll (see the configuration sketch above).
  2. Generate an MD5 digest of each message and save it in MySQL or Redis. Before processing a message, check MySQL or Redis to see whether it has already been consumed. This solution applies the idea of idempotency (see the sketch after this list).
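Here is a minimal sketch of the second option, reusing the hypothetical Jedis setup from earlier and keying on the MD5 of the message body; the key prefix and expiry are illustrative assumptions:

```java
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class IdempotentHandler {
    private final Jedis jedis = new Jedis("localhost", 6379);

    /** Hex-encoded MD5 digest of the raw message body. */
    static String md5(String message) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(message.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // MD5 is always present on the JVM
        }
    }

    public void handle(String message) {
        String key = "msg:md5:" + md5(message);
        // Atomically claim the digest; a null reply means it was already consumed.
        String reply = jedis.set(key, "1", SetParams.setParams().nx().ex(7 * 24 * 3600));
        if (reply == null) {
            return; // duplicate message body, skip it
        }
        // ... business logic runs at most once per distinct message body ...
    }
}
```

The same check could be done against a unique index in MySQL instead; Redis is shown here only because it turns "check and set" into a single atomic command.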

The above is my understanding of the problem.

Summary

The problem of repeated consumption is very important; if it is not handled, it will cause data problems in production.

That is why, in interviews, such questions are a good test of a candidate's technical and hands-on abilities.

As for idempotency itself, I covered it in a previous video; you can look it up.

If you like my work, remember to like, save, and follow.


Copyright notice: All articles in this blog are licensed under CC BY-NC-SA 4.0 unless otherwise stated. Please credit Mic带你学架构 as the source!
If this article helped you, please follow and like; your support is the driving force behind my continued writing. You are also welcome to follow my WeChat public account of the same name for more technical content!
