Interviewer: Would you like to talk about message queues today? I see Kafka mentioned in many places in your project.

Candidate: Sure.

Interviewer: Then can you briefly explain the scenarios where you use Kafka?

Candidate: In general, message queues are used for three purposes: decoupling, asynchrony, and peak shaving.

Candidate: Take my own project as an example. I currently maintain a message management platform that exposes interfaces for various business parties to call.

Candidate: When they call the interface, the message is not actually sent synchronously.

Candidate: At the interface layer, the message is simply put onto the message queue, and the result is returned to the caller right away.

Candidate: The advantages of this are (see the sketch after this list):

Candidate: 1. The throughput of the interface improves greatly (since no real send happens inside the call, the interface RT stays very low) [asynchrony]

Candidate: 2. Even if a large volume of calls hits the interface, the system is not affected (the traffic is absorbed by the message queue) [peak shaving]
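A minimal sketch of the pattern described above, assuming a plain Kafka Java producer; the class, topic name, and response string are illustrative, not the actual platform code:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.util.Properties;

public class MessageApi {

    private final KafkaProducer<String, String> producer;

    public MessageApi() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        this.producer = new KafkaProducer<>(props);
    }

    /** Called by business parties; returns as soon as the message is handed to the queue. */
    public String send(String bizId, String payload) {
        // Asynchronous send: the record is buffered and shipped in the background,
        // so the interface RT stays low and the queue absorbs any traffic spike.
        producer.send(new ProducerRecord<>("message-platform-topic", bizId, payload));
        return "accepted"; // respond to the caller immediately
    }
}
```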

Interviewer: Hmm...

Candidate: As another example, I have an advertising order attribution project. Its main job is to take order data and calculate the corresponding commission for each business's ads.

Candidate: The order data is pulled from the message queue.

Candidate: The advantages of this design are (see the consumer sketch after this list):

Candidate: 1. The trading team only needs to write order messages to the message queue, and each business party consumes the order topic for its own use [decoupling] [asynchrony]

Candidate: 2. Even if order QPS spikes sharply, downstream businesses barely notice (since they only consume data from the message queue, the spike does not directly hit the performance of their machines) [peak shaving]
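A minimal sketch of the consuming side described above, assuming a plain Kafka Java consumer; the topic name, group id, and commission method are illustrative, not the real project's code:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class OrderAttributionConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "ad-attribution");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("order-topic"));
            while (true) {
                // The consumer pulls at its own pace, so an order QPS spike
                // does not directly hit this service's machines.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    calculateCommission(record.value());
                }
            }
        }
    }

    private static void calculateCommission(String orderJson) {
        // placeholder for the attribution/commission logic
    }
}
```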

Interviewer: OK, then I want to ask: why do you think a message queue can shave peaks?

Interviewer: Or to put it another way, why can Kafka handle such high QPS?

Candidate: The "core" job of a message queue is to store the data that is produced and then let each business read it back.

Candidate: That is different from ordinary request processing: while handling business logic we may call other people's interfaces, query the database, and so on, a whole series of operations.

Candidate: Kafka has made a lot of optimizations in both "storing" and "reading" data.

Candidate: To give a few examples (see the sketch after this list):

Candidate: When we send messages to or read messages from a topic, multiple partitions actually handle the work [parallelism]

Candidate: When storing messages, Kafka appends to disk sequentially and relies on the operating system's page cache to improve performance [append + cache]

Candidate: It also reduces the number of CPU copies when reading and writing file data [zero copy]
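A small sketch of the parallelism point, assuming Kafka's Java AdminClient; the topic name, partition count, and replication factor are illustrative:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

import java.util.Collections;
import java.util.Properties;

public class CreatePartitionedTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        try (AdminClient admin = AdminClient.create(props)) {
            // 6 partitions, replication factor 2 (illustrative numbers):
            // producers can write the partitions in parallel, and up to 6 consumers
            // in the same group can read the topic concurrently.
            NewTopic topic = new NewTopic("order-topic", 6, (short) 2);
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}
```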

Interviewer: Well, since you mentioned reducing the number of CPU copies, can you tell me about that technique?

Candidate: Sure. That is what is usually called zero-copy.

Candidate: For example, when we call the read function in the normal way, the following steps happen:

Candidate: 1. DMA copies data from the disk into the kernel read buffer

Candidate: 2. The CPU copies data from the kernel read buffer into user space

Candidate: When the write function is called in the normal way, the following steps happen:

Candidate: 1. The CPU copies the user-space data into the socket kernel buffer

Candidate: 2. DMA copies data from the socket kernel buffer to the network card

Candidate: So completing "one read plus one write" takes 2 DMA copies and 2 CPU copies. The DMA copies cannot be avoided; what zero-copy technology saves are the CPU copies.

Candidate: Also, to keep user processes from operating on the kernel directly and to protect the kernel, a context switch happens whenever the application makes a system call, so the process above involves 4 context switches in total (a sketch of this traditional path follows).
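A sketch of that traditional path in Java, assuming we serve a local file over a plain socket; the file name and port are illustrative. read() and write() here go through a user-space byte array, which is exactly the extra CPU copying described above:

```java
import java.io.FileInputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class TraditionalCopy {
    public static void main(String[] args) throws Exception {
        try (ServerSocket server = new ServerSocket(9000);
             Socket client = server.accept();
             FileInputStream in = new FileInputStream("data.log");
             OutputStream out = client.getOutputStream()) {

            byte[] buf = new byte[8192]; // user-space buffer: the CPU must copy data into and out of it
            int n;
            while ((n = in.read(buf)) != -1) { // read(): disk -> kernel buffer (DMA) -> user space (CPU)
                out.write(buf, 0, n);          // write(): user space -> socket buffer (CPU) -> NIC (DMA)
            }
        }
    }
}
```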

Interviewer: ...

Candidate: The main zero-copy techniques today are mmap and sendfile; both reduce context switches and CPU copies to some extent.

Candidate: For example, mmap maps the address of the kernel read buffer into user space, so the kernel read buffer and the application buffer are shared,

Candidate: which eliminates the CPU copy from the kernel read buffer to the user buffer.

Candidate: With mmap, one read plus one write can be simplified to:

Candidate: 1. DMA copies the disk data into the kernel read buffer.

Candidate: 2. The CPU copies data from the kernel read buffer into the socket kernel buffer.

Candidate: 3. DMA copies data from the socket kernel buffer to the network card.

Candidate: Because the kernel read buffer is mapped into user space, one CPU copy is saved (see the mmap sketch below).
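A small sketch of mmap from Java, where FileChannel.map() is the usual way to memory-map a file; the file name is illustrative. Reading through the returned MappedByteBuffer touches the mapped pages directly instead of copying them into a separate heap buffer first:

```java
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class MmapRead {
    public static void main(String[] args) throws Exception {
        try (FileChannel channel = FileChannel.open(Paths.get("data.log"), StandardOpenOption.READ)) {
            // Map the whole file read-only; the buffer is backed by the mapped pages,
            // so no CPU copy from the kernel read buffer into a user-space array is needed.
            MappedByteBuffer mapped =
                    channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            while (mapped.hasRemaining()) {
                mapped.get(); // bytes are faulted in on demand from the page cache
            }
        }
    }
}
```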

Interviewer: Hmm...

Candidate: And sendfile + DMA Scatter/Gather passes only the kernel read buffer's file descriptor and length information to the socket kernel buffer, achieving zero CPU copies.

Candidate: With sendfile + DMA Scatter/Gather, one read plus one write can be simplified to:

Candidate: 1. DMA copies the disk data into the kernel read buffer.

Candidate: 2. The CPU sends the kernel read buffer's file descriptor and length information to the socket buffer.

Candidate: 3. DMA copies data from the kernel read buffer straight to the network card, according to that file descriptor and data length.

Candidate: Back to Kafka.

Candidate: From Producer -> Broker, Kafka persists the data arriving from the network card to disk and uses mmap (reducing the CPU copies from 2 to 1).

Candidate: From Broker -> Consumer, Kafka sends data from disk to the network card and uses sendfile (achieving zero CPU copies; see the transferTo sketch below).
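A sketch of the Broker -> Consumer direction from the Java side, where FileChannel.transferTo() is typically backed by sendfile on Linux; the host, port, and file name are illustrative:

```java
import java.net.InetSocketAddress;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class SendFileTransfer {
    public static void main(String[] args) throws Exception {
        try (FileChannel file = FileChannel.open(Paths.get("segment.log"), StandardOpenOption.READ);
             SocketChannel socket = SocketChannel.open(new InetSocketAddress("localhost", 9000))) {

            long position = 0;
            long size = file.size();
            while (position < size) {
                // transferTo hands the file data to the socket inside the kernel
                // (no user-space copy); it may send fewer bytes than asked, so loop.
                position += file.transferTo(position, size - position, socket);
            }
        }
    }
}
```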

Interviewer: Let me interrupt for a moment. I'm a bit busy, so let me summarize what you have said.

Interviewer: You use Kafka for asynchrony, peak shaving, and decoupling.

Interviewer: Kafka can be so fast because it achieves parallelism, makes full use of the operating system cache, writes sequentially, and uses zero copy.

Interviewer: Right?

Candidate: Yes.

Interviewer: Let's continue next time; I'm a bit busy here.


Welcome to follow my WeChat public account [Java3y] to chat about Java interviews.

The [Online Interviewer - Mobile] series is updated twice a week!
The [Online Interviewer - Computer] series is also being continuously updated, two articles a week!

Original content is not easy to produce!! Please like, bookmark, and share!!

