What is Kafka and what scenarios is it mainly used in?

1. What is Kafka?

Kafka is a distributed publish/subscribe messaging system originally developed at LinkedIn and written in Scala. It is widely used because it scales horizontally and delivers high throughput.

2. Background

Kafka was built as the messaging backbone for LinkedIn's activity stream and operational data processing pipeline. Activity stream data is the most common kind of data that almost every site uses when reporting on website usage.

Activity data includes page views, information about the content being viewed, and search queries. Such data is usually handled by writing each activity to log files and then running statistical analysis over those files periodically.

Operational data refers to server performance data (CPU and I/O utilization, request times, service logs, and so on). It can be aggregated and analyzed in many different ways.

3. Basic structure diagram

4. Explanation of basic concepts

Broker

A Kafka cluster contains one or more servers, each of which is called a broker. Brokers do not track which messages each consumer has consumed, which improves performance. Messages are stored directly on disk and are read and written sequentially, which is fast; this also avoids copying data between JVM memory and system memory and reduces costly object creation and garbage collection.

Producer

The client responsible for publishing messages to a Kafka broker.

Consumer

The message consumer: the client that reads messages from a Kafka broker. Consumers pull data from the broker and then process it.
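The pull model, combined with a broker that keeps no consumption state, can be illustrated with a small in-memory sketch. This is a toy model, not Kafka's client API: `ToyBroker`, `ToyConsumer`, `fetch`, and `poll` are hypothetical names chosen for the example. The point is only that the reading position lives with the consumer, so two consumers can read the same log at different speeds.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBroker:
    """Append-only message log; it stores no per-consumer state."""
    log: list = field(default_factory=list)

    def append(self, message: str) -> int:
        """Append a message and return its offset (its index in the log)."""
        self.log.append(message)
        return len(self.log) - 1

    def fetch(self, offset: int, max_messages: int) -> list:
        """Return up to max_messages starting at the caller-supplied offset."""
        return self.log[offset:offset + max_messages]


@dataclass
class ToyConsumer:
    """Pulls from the broker and remembers its own position."""
    broker: ToyBroker
    offset: int = 0

    def poll(self, max_messages: int = 2) -> list:
        batch = self.broker.fetch(self.offset, max_messages)
        # The consumer, not the broker, advances the reading position.
        self.offset += len(batch)
        return batch


broker = ToyBroker()
for event in ["page_view", "search", "click"]:
    broker.append(event)

fast = ToyConsumer(broker)
slow = ToyConsumer(broker)
first = fast.poll()                     # fast reads two messages
second = fast.poll()                    # then the remaining one
slow_first = slow.poll(max_messages=1)  # slow is still at the start of the log
```

Because the broker only answers "give me messages from offset N", adding another consumer costs it no extra bookkeeping.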

Topic

Every message published to a Kafka cluster belongs to a category, called a Topic. (Physically, messages of different Topics are stored separately. Logically, although the messages of a Topic may be stored on one or more brokers, users only need to specify the Topic to produce or consume data, without worrying about where the data is stored.)

Partition

A Partition is a physical concept; each Topic contains one or more Partitions.

Consumer Group

Each Consumer belongs to a specific Consumer Group. You can specify a group name for each Consumer; a Consumer without a group name belongs to the default group.
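What a group is for: within one Consumer Group each partition is read by a single consumer, while separate groups each receive every message. The sketch below models that with a simple round-robin assignment. It is a simplification under stated assumptions: `assign_partitions` and the consumer names are hypothetical, and Kafka's real assignment strategies are more involved.

```python
def assign_partitions(partitions, consumers):
    """Spread the partitions of a topic over the consumers of one group."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(sorted(partitions)):
        # Each partition gets exactly one owner inside the group.
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


partitions = [0, 1, 2, 3]

# Two groups subscribe to the same topic independently of each other.
group_a = assign_partitions(partitions, ["a1", "a2"])
group_b = assign_partitions(partitions, ["b1"])

print(group_a)  # {'a1': [0, 2], 'a2': [1, 3]}
print(group_b)  # {'b1': [0, 1, 2, 3]}
```

Group A splits the work between two consumers, while group B, with a single consumer, still sees all four partitions.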

Topic & Partition

Logically, a Topic can be thought of as a queue. Every message must specify its Topic, which can be understood simply as stating which queue the message should be put into. So that Kafka's throughput can scale linearly, a Topic is physically divided into one or more Partitions. Each Partition corresponds to a folder on disk, which stores all of that Partition's message and index files.
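The text above does not say how a message ends up in a particular Partition. One common approach is to hash a message key and take the result modulo the number of partitions, so that messages with the same key always land in the same Partition. The sketch below assumes that scheme; `choose_partition` and `route` are hypothetical helpers, and `zlib.crc32` is only a stand-in for whatever hash a real client uses.

```python
import zlib


def choose_partition(key: str, num_partitions: int) -> int:
    """Map a message key to a partition index in [0, num_partitions)."""
    # crc32 is a stand-in hash for illustration only.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(messages, num_partitions):
    """Distribute (key, value) pairs over the partitions, preserving arrival order."""
    partitions = {p: [] for p in range(num_partitions)}
    for key, value in messages:
        partitions[choose_partition(key, num_partitions)].append(value)
    return partitions


messages = [("user-1", "login"), ("user-2", "search"), ("user-1", "logout")]
routed = route(messages, num_partitions=3)

# Both "user-1" events are in the same partition, in the order they were sent.
user1_partition = routed[choose_partition("user-1", 3)]
```

Ordering is preserved only inside one Partition, which is why keeping related messages under the same key matters.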

If you create two topics, topic1 and topic2, with 13 and 19 partitions respectively, a total of 32 folders will be created across the cluster (the cluster used in this article has 8 nodes, and both topic1 and topic2 have a replication-factor of 1).
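The folder count in this example can be checked with a few lines of code. Kafka names each partition folder `<topic>-<partition number>`; the helper names `partition_folders` and `total_folders` are made up for this sketch, and the `replication_factor` parameter assumes one folder per replica.

```python
def partition_folders(topic: str, num_partitions: int) -> list:
    """Folder names for one replica of each partition of a topic."""
    return [f"{topic}-{p}" for p in range(num_partitions)]


def total_folders(topics: dict, replication_factor: int = 1) -> int:
    """Count folders across the cluster, assuming one folder per partition replica."""
    return sum(topics.values()) * replication_factor


topics = {"topic1": 13, "topic2": 19}
folders = [name for t, n in topics.items() for name in partition_folders(t, n)]

print(len(folders))             # 32
print(folders[0], folders[-1])  # topic1-0 topic2-18
print(total_folders(topics))    # 32
```

With a replication-factor above 1, every extra replica adds another copy of each folder somewhere in the cluster.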

5. Applicable scenarios

Messaging

Kafka is a good choice for conventional messaging workloads: partitioning, replication, and fault tolerance give it good scalability and performance. However, at the time this article was written, Kafka did not provide JMS-style enterprise features such as transactions, message delivery guarantees (a message acknowledgment mechanism), or message grouping, so it could only serve as a "regular" messaging system. To some extent it could not yet guarantee that sending and receiving messages was absolutely reliable (for example, messages could be redelivered or lost). Later releases (0.11 and newer) added idempotent producers and transactions.

Website activity tracking

Kafka is an excellent fit for website activity tracking: page views, user actions, and similar events can be sent to Kafka and then used for real-time monitoring or offline statistical analysis.

Metrics

Kafka is often used for operational monitoring data. This involves aggregating statistics from distributed applications to produce centralized feeds of operational data.

Log Aggregation

Kafka's characteristics make it very well suited to serve as a log collection center. Applications can send their operational logs to the Kafka cluster in batches and asynchronously instead of storing them locally or in a database. Kafka can send messages in batches and compress them, so the cost on the producer side is very small. Consumers can then load the logs into Hadoop or other storage and analysis systems.
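The batching and compression idea can be sketched without a real cluster. In the toy example below, `BatchingLogSender` is a hypothetical class and `send` is a hypothetical callable standing in for the network call; a real producer would also do the sending in the background, which this sketch leaves out.

```python
import json
import zlib


class BatchingLogSender:
    """Buffers log lines and hands them off in compressed batches."""

    def __init__(self, send, batch_size: int = 3):
        self.send = send              # hypothetical callable that ships bytes onward
        self.batch_size = batch_size
        self.buffer = []

    def log(self, line: str) -> None:
        """Queue a line; a send happens only once a full batch has accumulated."""
        self.buffer.append(line)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Compress and send whatever is buffered, then clear the buffer."""
        if not self.buffer:
            return
        payload = zlib.compress(json.dumps(self.buffer).encode("utf-8"))
        self.send(payload)
        self.buffer = []


sent = []  # stands in for the cluster: collects every payload that was sent
sender = BatchingLogSender(sent.append, batch_size=3)
for i in range(7):
    sender.log(f"request {i} handled")
sender.flush()  # ship the final partial batch

# The receiving side reverses the steps: decompress, then decode.
decoded = [json.loads(zlib.decompress(p)) for p in sent]
print([len(batch) for batch in decoded])  # [3, 3, 1]
```

Seven log lines result in three sends instead of seven, which is where the savings on the producer side come from.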

Link: https://blog.csdn.net/code52/article/details/50475511

民工哥
