Interviewer : Why don't you tell me what you've been studying recently? Pick something out and we can discuss it together.
Candidate : Recently I've been reading about "de-duplication" and "idempotency".
Interviewer : Then start by telling me your understanding of "de-duplication" and "idempotency".
Candidate : I think "idempotency" and "de-duplication" are very similar, and I can't draw a strict line between them.
Candidate : Let me give my personal understanding; I'm not sure whether it's right.
Candidate : "De-duplication" means that within a "certain period of time", the same request or message is let through at most "N times"; any further repeats are filtered out.
Candidate : "Idempotency" means that no matter "when" or how many times a request or message is processed, the result must be the same.
Candidate : Whether it's "de-duplication" or "idempotency", you need a "unique key", plus somewhere to "store" that unique key.
Candidate : Take my project as an example. The "message management platform" I maintain has "de-duplication" features: "messages with identical content are de-duplicated within 5 minutes", "template de-duplication within 1 hour", "channel de-duplication once a channel has reached the same user N times in one day"...
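The "reached N times in one day" rule can be pictured as a fixed-window counter. The sketch below is a minimal single-process stand-in for the common Redis `INCR` + `EXPIRE` pattern; the source does not say how the platform implements it, and `WindowCounter`, `tryAcquire`, the key format and N = 3 are all hypothetical. Time is passed in explicitly so the behaviour is easy to test.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class WindowCounterDemo {
    public static void main(String[] args) {
        long oneDay = 24L * 60 * 60 * 1000;
        WindowCounter perChannelPerUser = new WindowCounter(oneDay, 3); // hypothetical N = 3
        String key = "dedup:channel:1:42"; // hypothetical "channel ID + userId" key
        for (int i = 1; i <= 4; i++) {
            // The first three sends in the window pass; the fourth is filtered out.
            System.out.println("send #" + i + " allowed: " + perChannelPerUser.tryAcquire(key, 0));
        }
    }
}

/** Single-process stand-in for a Redis "INCR + EXPIRE" counter: at most `limit` hits per window. */
class WindowCounter {
    private record Entry(long count, long expiresAtMillis) {}

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final long windowMillis;
    private final long limit;

    WindowCounter(long windowMillis, long limit) {
        this.windowMillis = windowMillis;
        this.limit = limit;
    }

    /** Counts this hit and returns true while the key is still within its N allowed hits. */
    boolean tryAcquire(String key, long nowMillis) {
        // compute() runs atomically per key, so concurrent callers cannot lose increments.
        Entry updated = entries.compute(key, (k, e) ->
                (e == null || nowMillis >= e.expiresAtMillis())
                        ? new Entry(1, nowMillis + windowMillis) // first hit opens a new window
                        : new Entry(e.count() + 1, e.expiresAtMillis()));
        return updated.count() <= limit;
    }
}
```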
Candidate : To emphasize the essence of "idempotency" and "de-duplication" once more: "unique key" + "storage".
Interviewer : So how did you implement it?
Candidate : The unique key differs from one business scenario to another; it is determined by the business.
Candidate : There are many storage options, such as a "local cache", "Redis", "MySQL", "HBase" and so on. Which one to choose also depends on the business.
Candidate : For example, for the "message management platform" I chose "Redis" for storage (its read and write performance is excellent), and Redis supports an "expiration time", which conveniently handles the "within a certain period of time" requirement.
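With Redis, "store the key only if it is absent, and let it expire after the window" is a single `SET key value NX EX seconds` command. The sketch below imitates that behaviour in memory so it runs without a Redis server; `TtlDedupStore` and `setIfAbsent` are hypothetical names, and this is an illustration of the idea rather than the platform's actual code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class TtlDedupDemo {
    public static void main(String[] args) {
        TtlDedupStore store = new TtlDedupStore();
        long fiveMinutes = 5 * 60 * 1000L;
        System.out.println(store.setIfAbsent("msg:abc", fiveMinutes, 0));           // true: first time, send it
        System.out.println(store.setIfAbsent("msg:abc", fiveMinutes, 60_000));      // false: duplicate within 5 minutes
        System.out.println(store.setIfAbsent("msg:abc", fiveMinutes, fiveMinutes)); // true: the key has expired
    }
}

/** In-memory imitation of Redis "SET key value NX PX ttl" (hypothetical, for illustration only). */
class TtlDedupStore {
    // key -> absolute expiry time in milliseconds
    private final Map<String, Long> expiries = new ConcurrentHashMap<>();

    /**
     * Stores the key only if it is absent or already expired.
     * Returns true when this caller is the first within the window (process the message),
     * false when it is a duplicate (drop it).
     */
    boolean setIfAbsent(String key, long ttlMillis, long nowMillis) {
        boolean[] inserted = {false};
        expiries.compute(key, (k, expiresAt) -> {
            if (expiresAt == null || nowMillis >= expiresAt) {
                inserted[0] = true;
                return nowMillis + ttlMillis;
            }
            return expiresAt; // still live: keep the original expiry
        });
        return inserted[0];
    }
}
```

Letting the key expire on its own, instead of deleting it after processing, is what turns "seen before" into "seen within the last N minutes".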
Candidate : The unique key, naturally, is constructed differently for each business.
Candidate : For example, for "de-duplicating messages with identical content within 5 minutes", I directly take the MD5 of the request parameters as the unique key; "template de-duplication within one hour" uses "template ID + userId" as the unique key; and "channel de-duplication within one day" uses "channel ID + userId" as the unique key...
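The three keys just described could be built roughly as below. This is a sketch: the `dedup:...` prefixes, the `:` separator and the method names are hypothetical, since the source only names the ingredients (MD5 of the parameters, template ID + userId, channel ID + userId).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

class DedupKeys {
    public static void main(String[] args) {
        // Assumes request parameters are serialized in a stable (e.g. sorted) order;
        // otherwise identical requests could hash to different keys.
        String params = "content=hello&templateId=7&userId=42";
        System.out.println(contentKey(params));
        System.out.println(templateKey(7, 42)); // dedup:template:7:42
        System.out.println(channelKey(1, 42));  // dedup:channel:1:42
    }

    /** "Same content within 5 minutes": MD5 of the request parameters, hex-encoded. */
    static String contentKey(String requestParams) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(requestParams.getBytes(StandardCharsets.UTF_8));
            return "dedup:content:" + HexFormat.of().formatHex(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** "Template within 1 hour": template ID + userId. */
    static String templateKey(long templateId, long userId) {
        return "dedup:template:" + templateId + ":" + userId;
    }

    /** "Channel within 1 day": channel ID + userId. */
    static String channelKey(long channelId, long userId) {
        return "dedup:channel:" + channelId + ":" + userId;
    }
}
```

Each key would then be stored with the TTL of its rule (5 minutes, 1 hour, 1 day).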
Interviewer : Since you mentioned "de-duplication", have you heard of Bloom filters?
Candidate : Of course I have.
Interviewer : Then tell me about Bloom filters. Why didn't you use one?
Candidate : The underlying data structure of a Bloom filter can be understood as a bitmap, or simply as an array whose elements store only 0 or 1, so it takes relatively little space.
Candidate : To store an element, you first have to work out where it goes in the bitmap. Hash functions are generally used for this, and the computed positions are set to 1.
Candidate : A position set to 1 means "exists", and a position at 0 means "does not exist".
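A minimal Bloom filter over a `java.util.BitSet` makes the description concrete. This is an illustrative sketch, not Guava's or Redis's implementation: it assumes k positions derived by double hashing, `(h1 + i * h2) mod m`, with `String.hashCode()` and 32-bit FNV-1a as the two base hashes; the class name and sizes are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

class SimpleBloomFilter {
    public static void main(String[] args) {
        SimpleBloomFilter filter = new SimpleBloomFilter(1 << 16, 3);
        filter.put("order-1001");
        System.out.println(filter.mightContain("order-1001")); // true: added elements are never missed
        System.out.println(filter.mightContain("order-9999")); // usually false, but a false positive is possible
    }

    private final BitSet bits;
    private final int size;      // m: number of bits in the bitmap
    private final int hashCount; // k: number of hash functions

    SimpleBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    /** Sets the k positions derived from the value to 1. */
    void put(String value) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(position(value, i));
        }
    }

    /** false = definitely absent; true = possibly present (all k bits are 1). */
    boolean mightContain(String value) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(position(value, i))) {
                return false;
            }
        }
        return true;
    }

    // Double hashing: the i-th position is (h1 + i * h2) mod m.
    private int position(String value, int i) {
        int h1 = value.hashCode();
        int h2 = fnv1a(value);
        return Math.floorMod(h1 + i * h2, size);
    }

    // 32-bit FNV-1a over the UTF-8 bytes, used as a second, different hash.
    private static int fnv1a(String value) {
        int hash = 0x811c9dc5;
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xff);
            hash *= 0x01000193;
        }
        return hash;
    }
}
```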
Candidate : A Bloom filter can check whether an element exists using little space and can therefore be used for de-duplication, but it has its own drawbacks.
Candidate : Wherever hashing is used, "hash collisions" are unavoidable, and they lead to "misjudgments" (false positives).
Candidate : In a Bloom filter, if an element is judged to exist, it "may not" actually exist; if it is judged not to exist, it definitely does not exist.
Candidate : I shouldn't need to explain this, right? (Combine "hashing" with "a position set to 1 means exists, a position at 0 means does not exist" and the conclusion follows: other elements may have set all of an element's positions to 1, but an element that really was added never leaves any of its positions at 0.)
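How often that "misjudgment" happens can be estimated with the standard textbook approximation, assuming the k hash functions spread positions independently and uniformly over a bitmap of m bits after n insertions:

```latex
p \approx \left(1 - e^{-kn/m}\right)^{k},
\qquad
k_{\text{opt}} = \frac{m}{n}\ln 2
```

For example, with 10 bits per element (m/n = 10) and k = 7, p ≈ (1 − e^{−0.7})^7 ≈ 0.8%. A larger bitmap lowers the false-positive rate but can never make it zero.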
Candidate : Bloom filters also can't "delete" elements (again a limitation of hashing: one bit may be shared by several elements, so a Bloom filter cannot pin down the bits that belong to just one element, and clearing them could "delete" others too).
Candidate : If you want to use one, Guava provides a ready-made Bloom filter implementation, but it is single-machine (in-process) only.
Candidate : For a distributed Bloom filter, Redis is generally used nowadays, but not every company deploys the Redis version of the Bloom filter (there are limitations; my previous company, for example, didn't have it).
Candidate : So the projects I'm currently in charge of don't use Bloom filters (:
Candidate : If "de-duplication" is relatively expensive, consider building "multi-layer filtering" logic.
Candidate : For example, first see whether the "local cache" can filter out part of the traffic, then hand the remaining "strong check" over to "remote storage" (usually Redis or a DB) as a second filter.
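The two-layer idea (local cache first, remote store for the strong check) might look like the sketch below. `LayeredDedup` and the `RemoteStore` interface are hypothetical; in the demo the "remote" store is just a shared in-memory set standing in for Redis or a DB.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class LayeredDedup {
    public static void main(String[] args) {
        // Simulated shared remote store; Set.add returns false when the key already exists.
        Set<String> shared = ConcurrentHashMap.newKeySet();
        LayeredDedup instanceA = new LayeredDedup(shared::add);
        LayeredDedup instanceB = new LayeredDedup(shared::add);
        System.out.println(instanceA.shouldProcess("msg-1")); // true: first time anywhere
        System.out.println(instanceB.shouldProcess("msg-1")); // false: caught by the remote layer
        System.out.println(instanceA.shouldProcess("msg-1")); // false: caught by A's local layer
    }

    /** Hypothetical abstraction over the remote store (e.g. Redis or a DB) used for the strong check. */
    interface RemoteStore {
        /** Atomically records the key; returns false if it was already recorded. */
        boolean putIfAbsent(String key);
    }

    // Unbounded here for brevity; a real local cache needs a size limit and expiry.
    private final Set<String> localCache = ConcurrentHashMap.newKeySet();
    private final RemoteStore remote;

    LayeredDedup(RemoteStore remote) {
        this.remote = remote;
    }

    /** Returns true if the message should be processed, false if it is a duplicate. */
    boolean shouldProcess(String key) {
        // Layer 1: cheap in-process lookup filters repeats this instance has already seen.
        if (localCache.contains(key)) {
            return false;
        }
        // Layer 2: authoritative check shared by all instances.
        boolean firstTime = remote.putIfAbsent(key);
        localCache.add(key); // remember the key so later repeats stop at layer 1
        return firstTime;
    }
}
```

The local layer can only say "duplicate", never "new", which is why the final decision for unseen keys always goes to the remote layer.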
Interviewer : Well, I remember the last time you answered questions about Kafka.
Interviewer : At that time you said that when processing orders you achieved at-least-once delivery + idempotency.
Interviewer : For the idempotent processing, Redis was used for pre-filtering and a DB unique index for the strongly consistent check, which was also done to improve performance, right?
Interviewer : And the unique key seemed to be "order number + order status".
Candidate : Your memory is really good!
Candidate : Generally, when data consistency has to be checked, we go straight to MySQL (the DB); after all, it has transaction support.
Candidate : A "local cache", if the business suits it, can serve as a "front" (pre-)check.
Candidate : Redis, with its high-performance reads and writes, can serve as either the front check or the back check (:
Candidate : HBase is generally used in scenarios with large amounts of data (Redis memory is too expensive, and a DB is not flexible enough and not suited to storing huge amounts of data in a single table).
Candidate : As for idempotency, the usual storage is "Redis" or a "database".
Candidate : The most common approach is to use the database's "unique index" to achieve idempotency (several of the projects I'm responsible for do this).
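The unique-index approach, with the "order number + order status" key from the Kafka discussion, can be sketched as follows. A real table would declare something like `UNIQUE KEY (order_no, status)`; here `FakeProcessedTable` is a hypothetical in-memory stand-in so the code runs without a database. It throws `java.sql.SQLIntegrityConstraintViolationException` to mimic a duplicate insert, though real drivers may report the violation differently (for example, as a plain `SQLException` with a vendor error code).

```java
import java.sql.SQLIntegrityConstraintViolationException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class OrderIdempotencyDemo {
    public static void main(String[] args) {
        OrderEventHandler handler = new OrderEventHandler(new FakeProcessedTable());
        System.out.println(handler.handle("ORD-1001", "PAID"));    // true: first delivery processed
        System.out.println(handler.handle("ORD-1001", "PAID"));    // false: redelivery ignored
        System.out.println(handler.handle("ORD-1001", "SHIPPED")); // true: a new status is a new key
    }
}

/** Hypothetical in-memory stand-in for a table with UNIQUE(order_no, status). */
class FakeProcessedTable {
    private final Set<String> uniqueIndex = ConcurrentHashMap.newKeySet();

    void insert(String orderNo, String status) throws SQLIntegrityConstraintViolationException {
        if (!uniqueIndex.add(orderNo + "#" + status)) {
            throw new SQLIntegrityConstraintViolationException("Duplicate entry " + orderNo + "/" + status);
        }
    }
}

class OrderEventHandler {
    private final FakeProcessedTable table;

    OrderEventHandler(FakeProcessedTable table) {
        this.table = table;
    }

    /** Returns true if the event was processed, false if it was a duplicate. */
    boolean handle(String orderNo, String status) {
        try {
            // In a real system this insert and the business update share one DB transaction,
            // so a failed business update also rolls back the "processed" record.
            table.insert(orderNo, status);
        } catch (SQLIntegrityConstraintViolationException duplicate) {
            return false; // the unique index rejected the key: already handled, safe to acknowledge
        }
        // ... business logic for this order/status would run here ...
        return true;
    }
}
```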
Candidate : Building the "unique key" is a business concern (: generally you splice together your own business IDs to produce a "meaningful" unique key.
Candidate : Of course, "Redis" and "MySQL" can also be used to implement distributed locks to achieve idempotency (:
Candidate : However, Redis distributed locks cannot fully guarantee safety, and implementing distributed locks with MySQL (optimistic locks and pessimistic locks) still
Candidate : There are many "idempotency" solutions on the Internet; in essence they make some variation on "storage" and the "unique key" and then give it a name...
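One of the MySQL variants mentioned, the optimistic lock, usually relies on a version column: `UPDATE orders SET status = ?, version = version + 1 WHERE order_no = ? AND version = ?`, where an affected-row count of 0 means someone else already applied the change. The sketch below simulates that single statement on one in-memory row; `OrderRow` and `updateStatus` are hypothetical. (The pessimistic alternative would lock the row first with `SELECT ... FOR UPDATE` inside a transaction.)

```java
class OptimisticLockDemo {
    public static void main(String[] args) {
        OrderRow row = new OrderRow("CREATED", 0);
        // Two consumers both read version 0, then both try to move the order to PAID.
        int first = row.updateStatus("PAID", 0);
        int second = row.updateStatus("PAID", 0);
        System.out.println(first + " " + second); // prints "1 0"
    }
}

/**
 * In-memory stand-in for one row; updateStatus mimics
 *   UPDATE orders SET status = ?, version = version + 1 WHERE order_no = ? AND version = ?
 * and returns the affected-row count (1 = applied, 0 = stale version, change rejected).
 */
class OrderRow {
    private String status;
    private long version;

    OrderRow(String status, long version) {
        this.status = status;
        this.version = version;
    }

    // synchronized plays the role of the database's row-level atomicity for this one statement.
    synchronized int updateStatus(String newStatus, long expectedVersion) {
        if (version != expectedVersion) {
            return 0;
        }
        status = newStatus;
        version++;
        return 1;
    }

    synchronized String status() {
        return status;
    }

    synchronized long version() {
        return version;
    }
}
```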
Candidate : In general, change the
Interviewer : Um... understood.
Welcome to follow my WeChat official account [Java3y] to talk about Java interviews. The "Online Interviewer" series is being updated continuously!
[Online Interviewer - Mobile] series: updated twice a week!
[Online Interviewer - Desktop] series: updated twice a week!
Original content takes effort! Please like, share and bookmark!