How to achieve idempotence and de-duplication in the backend?

Interviewer : How about telling me what you have been studying recently? Bring it up and we can discuss it together.

Candidate : Recently I have been reading about "de-duplication" and "idempotency".

Interviewer : Then first tell me how you understand "de-duplication" and "idempotency".

Candidate : I think "idempotence" and "de-duplication" are very similar, and I can't draw a strict line between them.

Candidate : Let me give my personal understanding; I don't know whether it is right.

Candidate : "De-duplication" means filtering repeats of a request or message once it has occurred "N times" within "a certain period of time".

Candidate : "Idempotency" means that no matter when, or how many times, a request or message is processed, the result stays consistent.

Candidate : Both "de-duplication" and "idempotency" need a "unique Key" and somewhere to "store" that unique Key.

Candidate : Take my project as an example. The "message management platform" I maintain has several "de-duplication" rules: "de-duplicate messages with the same content within 5 minutes", "de-duplicate by template within 1 hour", "de-duplicate once a channel reaches a threshold of N messages within one day"...
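
The "threshold of N within one day" rule amounts to a counter that resets when its time window ends. Below is a minimal in-memory sketch of that idea. The class `FrequencyLimiter` and its method names are hypothetical and not from the platform described here; in a shared deployment the counter would more likely live in remote storage (for example a Redis counter with an expiry), which this sketch only imitates with a map.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for a remote counter that expires after a time window. */
class FrequencyLimiter {
    private static final class Window {
        long expiresAtMillis; // moment at which this window's count is discarded
        int count;            // how many times the key has been let through in this window
    }

    private final Map<String, Window> windows = new HashMap<>();
    private final int maxCount;
    private final long windowMillis;

    FrequencyLimiter(int maxCount, long windowMillis) {
        this.maxCount = maxCount;
        this.windowMillis = windowMillis;
    }

    /** Returns true if the key is still under the threshold, false if it should be filtered. */
    synchronized boolean tryAcquire(String key, long nowMillis) {
        Window w = windows.get(key);
        if (w == null || nowMillis >= w.expiresAtMillis) {
            // First occurrence, or the previous window has ended: start a fresh window.
            w = new Window();
            w.expiresAtMillis = nowMillis + windowMillis;
            windows.put(key, w);
        }
        if (w.count >= maxCount) {
            return false; // threshold reached within the window
        }
        w.count++;
        return true;
    }
}
```

The current time is passed in as a parameter so the behavior can be checked without waiting for a real day to pass.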

Candidate : To emphasize once more, the essence of both "idempotence" and "de-duplication" is "unique Key" + "storage".

Interviewer : Then how did you do it?

Candidate : The unique Key differs between business scenarios; the business determines it.

Candidate : There are many storage options, such as "local cache"/"Redis"/"MySQL"/"HBase". The choice also depends on the business.

Candidate : For example, in the "message management platform" scenario I chose "Redis" for storage (its read and write performance is excellent), and Redis keys have an "expiration time", which makes the "within a certain period of time" requirement easy to handle.
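
To illustrate why an expiration time fits "within a certain period of time", here is a small sketch of a key store whose entries expire. It is an in-memory imitation written for this example, not Redis client code; the class `ExpiringKeyStore` is hypothetical. The operation it mimics is "write the key only if it is absent, with a time to live", which Redis exposes as `SET key value NX EX seconds`.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory imitation of "set if absent, with a time to live". */
class ExpiringKeyStore {
    // Maps each key to the moment (in milliseconds) at which it expires.
    private final Map<String, Long> expiryByKey = new HashMap<>();

    /**
     * Records the key if it is absent or already expired.
     * Returns true for a first occurrence, false for a duplicate inside the TTL.
     */
    synchronized boolean putIfAbsent(String key, long ttlMillis, long nowMillis) {
        Long expiresAt = expiryByKey.get(key);
        if (expiresAt != null && nowMillis < expiresAt) {
            return false; // key is still alive: duplicate within the window
        }
        expiryByKey.put(key, nowMillis + ttlMillis);
        return true;
    }
}
```

With a TTL of 5 minutes, a second message with the same key is rejected inside the window and accepted again once the key has expired, with no explicit cleanup logic in the caller.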

Candidate : The unique Key is naturally built differently for each business rule.

Candidate : For example, for "de-duplicate messages with the same content within 5 minutes", I take the MD5 of the request parameters directly as the unique Key. "De-duplicate by template within 1 hour" uses "template ID + userId" as the unique Key, and "per-channel de-duplication within one day" uses "channel ID + userId" as the unique Key...
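
The three keys described above could be built roughly as follows. This is a sketch under assumptions: the class `DedupKeys`, the method names, and the `template:` / `channel:` prefixes are made up for illustration, and the request parameters are assumed to arrive as an already serialized string with a stable field order (otherwise identical requests could hash differently).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helpers that build a unique Key per de-duplication rule. */
class DedupKeys {
    /** Content rule: hex-encoded MD5 of the serialized request parameters. */
    static String contentKey(String requestParams) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(requestParams.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b)); // two lowercase hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    /** Template rule: template ID + userId. */
    static String templateKey(long templateId, long userId) {
        return "template:" + templateId + ":" + userId;
    }

    /** Channel rule: channel ID + userId. */
    static String channelKey(int channelId, long userId) {
        return "channel:" + channelId + ":" + userId;
    }
}
```

MD5 is used here only as a compact fingerprint of the content, as in the interview answer, not for security purposes.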

Interviewer : Since "de-duplication" has come up, have you heard of Bloom filters?

Candidate : Of course I know them.

Interviewer : Tell me about Bloom filters. Why don't you use them?

Candidate : The underlying data structure of a Bloom filter can be understood as a bitmap, or more simply as an array whose elements only store 0 or 1, so it takes up relatively little space.

Candidate : Storing an element in the bitmap really means working out which positions it maps to. A hash algorithm is generally used for this (usually several hash functions), and the resulting positions are set to 1.

Candidate : A position set to 1 indicates "exists", and a position set to 0 indicates "does not exist".

Candidate : A Bloom filter can test whether an element exists using little space, so it can be used for de-duplication, but it also has corresponding shortcomings.

Candidate : Wherever a hash algorithm is used, "hash collisions" are unavoidable, and they lead to "misjudgment".

Candidate : With a Bloom filter, if an element is judged to exist, it "may not" actually exist. If an element is judged not to exist, it definitely does not exist.
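
The bitmap-plus-hash description can be made concrete with a toy filter. The class `SimpleBloomFilter` below is an illustrative sketch, not Guava's or Redis's implementation. As an assumption of this example, it derives several bit positions from two base hashes (`String.hashCode()` and an FNV-1a hash), which is one common way to obtain multiple hash functions cheaply.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Toy Bloom filter: a bitmap plus several hash-derived positions per element. */
class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;      // number of bits in the bitmap
    private final int hashCount; // number of positions set per element

    SimpleBloomFilter(int size, int hashCount) {
        if (size <= 0 || hashCount <= 0) {
            throw new IllegalArgumentException("size and hashCount must be positive");
        }
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    /** Sets every position the element maps to. */
    void add(String element) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(position(element, i));
        }
    }

    /** false means "definitely absent"; true means "possibly present". */
    boolean mightContain(String element) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(position(element, i))) {
                return false; // a 0 bit proves the element was never added
            }
        }
        return true;
    }

    // i-th position: combine two base hashes, then map into [0, size).
    private int position(String element, int i) {
        int h1 = element.hashCode();
        int h2 = fnv1a(element);
        return Math.floorMod(h1 + i * h2, size);
    }

    // 32-bit FNV-1a hash over the UTF-8 bytes of the string.
    private static int fnv1a(String s) {
        int hash = 0x811C9DC5;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xFF);
            hash *= 16777619;
        }
        return hash;
    }
}
```

The asymmetry in the answer falls out of `mightContain`: a single 0 bit is conclusive, while all-1 bits may have been set by other elements. A deliberately tiny bitmap shows this: with `new SimpleBloomFilter(1, 1)`, adding `"a"` makes `mightContain("b")` return true even though `"b"` was never added.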

Candidate : I shouldn't need to explain this, right? (Combine the "hash algorithm" with "a position set to 1 indicates exists, and a position set to 0 indicates does not exist", and the conclusion above follows.)

Candidate : A Bloom filter can't "delete" elements either (this is also a consequence of hashing: a bit may be shared by several elements, so the filter cannot pinpoint the bits that belong to one element alone).

Candidate : If you want to use one, Guava provides a ready-made Bloom filter implementation, but it only works on a single machine.

Candidate : A distributed Bloom filter is generally built on Redis nowadays, but not every company deploys the Redis version of the Bloom filter (there are still limitations; my previous company did not have it).

Candidate : So the projects I am currently responsible for do not use Bloom filters (:

Candidate : If the cost of "de-duplication" is relatively high, consider building "multi-layer filtering" logic.

Candidate : For example, first see whether the "local cache" can filter out part of the traffic, and hand the remaining "strong verification" to "remote storage" (commonly Redis or a DB) for a second round of filtering.
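
A rough sketch of this two-layer idea follows. The class `LayeredDeduplicator` and the `RemoteStore` interface are hypothetical names for this example; `RemoteStore` stands in for whatever remote storage does the strong check. The local set is unbounded here for brevity, whereas a real local cache would presumably need a size limit and expiry.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical two-layer check: local cache first, remote storage for the rest. */
class LayeredDeduplicator {
    /** Stand-in for remote storage; returns true if the key was newly recorded. */
    interface RemoteStore {
        boolean addIfAbsent(String key);
    }

    private final Set<String> localSeen = new HashSet<>();
    private final RemoteStore remote;

    LayeredDeduplicator(RemoteStore remote) {
        this.remote = remote;
    }

    /** Returns true if the key has been seen before, by this node or any other. */
    synchronized boolean isDuplicate(String key) {
        if (localSeen.contains(key)) {
            return true; // filtered locally, no remote round trip
        }
        boolean firstOccurrence = remote.addIfAbsent(key); // authoritative check
        localSeen.add(key); // remember locally so later repeats stay off the remote store
        return !firstOccurrence;
    }
}
```

Only keys that miss the local layer reach the remote store, so repeated duplicates arriving at the same node cost a single remote call.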

Interviewer : Well, I remember the last time you answered questions about Kafka.

Interviewer : At that time you said that order processing was implemented as at-least-once + idempotence.

Interviewer : In the idempotent processing, Redis is used for pre-filtering and a DB unique index is used for the strongly consistent check, which is also done to improve performance, right?

Interviewer : The unique Key seems to have been "order number + order status".

Candidate : Interviewer, your memory is really good!

Candidate : Generally, when we need to verify data consistency we go straight to MySQL (DB). After all, it has transaction support.

Candidate : A "local cache" can serve as the "front" check if the business suits it.

Candidate : Redis offers high-performance reads and writes, so it works as either the front check or the back check (:

Candidate : HBase is generally used in scenarios with large amounts of data (Redis memory is too expensive, and a DB is not flexible enough and is not suited to storing large amounts of data in a single table).

Candidate : As for idempotence, the usual storage is "Redis" or a "database".

Candidate : The most common approach is a database "unique index" to achieve idempotence (several of the projects I am responsible for use this).

Candidate : Building the "unique Key" is business-specific (: generally you concatenate your own business IDs to generate a "meaningful" unique Key.
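
The unique-index approach can be sketched without a database by letting an atomic "insert if absent" play the role of the index. In the example below, `IdempotentOrderHandler` is a hypothetical class, the key follows the "order number + order status" example from the conversation, and `ConcurrentHashMap.putIfAbsent` stands in for an `INSERT` that would fail on a duplicate key. With a real database, the insert and the business update would typically share one transaction.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical handler: a map's atomic putIfAbsent imitates a unique index. */
class IdempotentOrderHandler {
    // Plays the role of a table with a unique index on (orderNo, status).
    private final ConcurrentMap<String, Boolean> processed = new ConcurrentHashMap<>();
    // Counts how often the business logic actually ran.
    private final AtomicInteger sideEffects = new AtomicInteger();

    /** Returns true if the message was processed, false if it was a repeated delivery. */
    boolean handle(String orderNo, String status) {
        String uniqueKey = orderNo + ":" + status; // business IDs concatenated into the key
        if (processed.putIfAbsent(uniqueKey, Boolean.TRUE) != null) {
            return false; // "duplicate key": this order/status pair was already handled
        }
        sideEffects.incrementAndGet(); // business logic runs once per unique key
        return true;
    }

    int sideEffectCount() {
        return sideEffects.get();
    }
}
```

Under at-least-once delivery the same message may arrive several times, but the side effect runs once per key, while a new status for the same order is treated as a distinct event.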

Candidate : Of course, "Redis" and "MySQL" can also be used to implement distributed locks to achieve idempotence (:

Candidate : However, Redis distributed locks cannot fully guarantee safety, and MySQL implements distributed locks (optimistic locks and pessimistic locks still

Candidate : There are many solutions for achieving "idempotence" on the Internet. Essentially they are variants built around "storage" and a "unique Key" that have each been given a name...

Candidate : In general, change the

Interviewer : Um...understood

Welcome to follow my WeChat official account [Java3y] to talk about Java interviews. The Online Interviewer series is updated continuously!
