Background
I haven't shared any Java troubleshooting stories for a long time. Recently I helped a colleague investigate a problem: when consuming with Pulsar, the same message was being consumed repeatedly.
Investigation
When he described the symptom, I was skeptical. From past experience, and from what Pulsar's official documentation and API say, a message is redelivered only when ackTimeout is set on the consumer and processing takes longer than that timeout. ackTimeout is off by default, and a look at the code confirmed that we had not turned it on.
Could the negativeAcknowledge() method have been called? Calling it also triggers redelivery. We use a third-party library, https://github.com/majusko/pulsar-java-spring-boot-starter, and it calls this method only when an exception is thrown.
But after reviewing the code, I found nothing that would throw an exception, and I never saw one occur during the whole process, which was a bit weird.
Reproduction
To understand the whole story, I asked him exactly how he had run into it.
The bug had shown up in the business logic, so he set a breakpoint and single-stepped through the code while a message was being consumed. After one debugging session, he received the same message again shortly afterwards.
Strangely, though, debugging did not lead to repeated consumption every time. As the saying goes, a bug that can be reproduced 100% of the time is already more than half solved.
So the first step in our investigation was to reproduce the problem reliably.
To rule out IDEA as the cause (unlikely as that was), and since pausing in the debugger is, from the code's point of view, just a sleep, we decided to call sleep directly in the consumption logic and see whether the problem would reproduce.
In testing, sleeping for anywhere from a few seconds to tens of seconds did not reproduce it. Finally we tried sleeping for one minute, and something magical happened: it reproduced every time!
Once it could be reproduced reliably, things got easier. My own business code also uses Pulsar, so I planned to reproduce it in my own project, where debugging would be more convenient.
Then the weirdness struck again: I could not reproduce it on my side.
That was actually the expected behavior, but it left me with nothing to debug.
Trusting in modern science, the only difference between the two of us had to be the project itself, so I compared the code on both sides.
    @PulsarConsumer(
            topic = xx,
            clazz = Xx.class,
            subscriptionType = SubscriptionType.Shared
    )
    public void consume(Data msg) {
        log.info("consume msg:{}", msg.getOrderId());
        // Distributed lock backed by Redis, keyed by order ID
        Lock lock = redisLockRegistry.obtain(msg.getOrderId());
        if (lock.tryLock()) {
            try {
                // "process" is a placeholder name; "do" is a reserved word in Java
                orderService.process(msg.getOrderId());
            } catch (Exception e) {
                log.error("consumer msg:{} err:", msg.toString(), e);
            } finally {
                // Throws if the lock has already expired
                lock.unlock();
            }
        }
    }
Sure enough, my colleague's code takes a lock, a distributed lock based on Redis. At that moment it clicked: the lock must have timed out before it was released, and unlocking an expired lock throws an exception. Because that exception comes from lock.unlock() in the finally block, the catch (Exception e) above it never sees it, and it propagates out of the consume method.
To verify this, now that the problem was reproducible, I set a breakpoint where the framework consumes Pulsar messages:
Sure enough, the case was solved. The exception message was very clear: the lock's timeout had already passed.
Once the exception was thrown, the framework negatively acknowledged the message right away and swallowed the exception, which is why we had not noticed it before.
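To make that behavior concrete, here is a minimal sketch of a dispatcher that negatively acknowledges on any exception without logging it. It is a simulation under my own assumptions: RecordingAcker and dispatch are hypothetical names, and none of this is the starter library's or Pulsar's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Simulation only: none of these types come from Pulsar or the starter library.
class SwallowedExceptionSketch {

    /** Hypothetical stand-in for the ack / negative-ack side of a consumer. */
    static class RecordingAcker {
        final List<String> acked = new ArrayList<>();
        final List<String> nacked = new ArrayList<>();

        void acknowledge(String msg) { acked.add(msg); }

        void negativeAcknowledge(String msg) { nacked.add(msg); }
    }

    /** Hypothetical dispatcher: acks on success, nacks on any exception. */
    static void dispatch(String msg, Consumer<String> handler, RecordingAcker acker) {
        try {
            handler.accept(msg);
            acker.acknowledge(msg);
        } catch (Exception e) {
            // The exception is neither logged nor rethrown, so the only visible
            // symptom is that the message is delivered again later.
            acker.negativeAcknowledge(msg);
        }
    }

    public static void main(String[] args) {
        RecordingAcker acker = new RecordingAcker();

        dispatch("order-1", msg -> { /* handler finishes normally */ }, acker);
        dispatch("order-2", msg -> {
            throw new IllegalStateException("lock expired before unlock");
        }, acker);

        System.out.println("acked  = " + acker.acked);
        System.out.println("nacked = " + acker.nacked);
    }
}
```

Nothing in the output of a dispatcher shaped like this hints that an exception occurred, which matches what we saw: no stack trace, only a repeated message.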
A look at the source code of RedisLockRegistry showed that the default timeout is exactly one minute, which is why sleeping for tens of seconds earlier had not reproduced the problem.
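The timing can be illustrated with a small simulation. This is only a sketch: ExpiringLock is a hypothetical stand-in that I wrote for illustration, not the RedisLockRegistry API, and it uses a simulated clock instead of a real sleep so that it runs instantly. The consume method mirrors the tryLock / try / finally-unlock shape of my colleague's code.

```java
// Simulation only: ExpiringLock is a hypothetical stand-in, not the RedisLockRegistry API.
class ExpiringLockSketch {

    /** A lock whose entry in the "store" disappears once ttlMillis have elapsed. */
    static class ExpiringLock {
        private final long ttlMillis;
        private long lockedAt = -1; // -1 means "not held"

        ExpiringLock(long ttlMillis) { this.ttlMillis = ttlMillis; }

        boolean tryLock(long nowMillis) {
            boolean heldAndAlive = lockedAt >= 0 && nowMillis - lockedAt < ttlMillis;
            if (heldAndAlive) {
                return false;
            }
            lockedAt = nowMillis;
            return true;
        }

        void unlock(long nowMillis) {
            boolean expired = lockedAt < 0 || nowMillis - lockedAt >= ttlMillis;
            lockedAt = -1;
            if (expired) {
                throw new IllegalStateException("lock expired before unlock");
            }
        }
    }

    /** Returns true if consumption finishes without an exception escaping. */
    static boolean consume(long workMillis, long ttlMillis) {
        ExpiringLock lock = new ExpiringLock(ttlMillis);
        long start = 0; // simulated clock, in milliseconds
        try {
            if (lock.tryLock(start)) {
                try {
                    // business logic (or a sleep) that "takes" workMillis
                } finally {
                    lock.unlock(start + workMillis);
                }
            }
            return true;
        } catch (IllegalStateException e) {
            return false; // this is where a framework would negative-ack
        }
    }

    public static void main(String[] args) {
        long ttl = 60_000; // one minute, as in the default described above
        for (long work : new long[] {5_000, 30_000, 60_000}) {
            System.out.println("work=" + work + "ms ok=" + consume(work, ttl));
        }
    }
}
```

With a one-minute TTL, the 5-second and 30-second runs finish cleanly and only the 60-second run fails in unlock, which is the same pattern we observed with sleep.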
Summary
Afterwards I asked my colleague why the lock was there, since as far as I could see it was not needed at all. It turned out he had copied the code from someone else and had not given it much thought.
So there are a few lessons to take away from this:
- Ctrl+C/Ctrl+V is convenient, but you still have to think through your own business scenario.
- When using third-party APIs, you need to fully understand what they do and what their parameters mean.
Your likes and shares are the greatest support for me