7 reasons to choose Pulsar over Kafka

The following article is from AI Frontline and was translated by the AI Frontline team.

This article was originally created by the WeChat public account "AI Frontline" (ID: ai-front) and may not be reproduced without authorization.

Author | Chris Bartholomew
Translator | Ignorance
Editor | Natalie

AI Frontline guide: Developers of cloud-native distributed applications need a solution that helps them manage messaging infrastructure, so that they can focus on application and microservice development rather than waste time dealing with complex messaging systems.
The first step in building a messaging infrastructure is to select the appropriate messaging middleware. There are many options, from open source frameworks (such as RabbitMQ, ActiveMQ, and NATS) to commercial products (such as IBM MQ or Red Hat AMQ). There is also Kafka. In the end, however, we did not use Kafka and chose Pulsar instead.

Why did we choose Pulsar in the end? Here are 7 reasons to choose Pulsar over Kafka.

1. Stream processing and queuing combined

Pulsar is like a two-in-one product. It handles high-rate real-time scenarios as Kafka does, and it also supports standard message queue patterns such as competing consumers, failover subscriptions, and message fan-out. Pulsar automatically tracks each client's read position and saves this information in a high-performance distributed ledger (BookKeeper).

Unlike Kafka, Pulsar also offers the features of a traditional message queue such as RabbitMQ. A single Pulsar system can therefore serve real-time streams and message queues at the same time.
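To make the idea of a tracked read position concrete, here is a minimal sketch in plain Python. It is a toy in-memory model, not the Pulsar client API: the class name `ToyTopic`, its methods, and the subscription names are all hypothetical, and it assumes a new subscription starts from the earliest message. Each subscription keeps its own cursor, so two subscriptions on the same topic each see every message (fan-out), while reads within one subscription advance a single shared position (queue-like behavior).

```python
from collections import defaultdict


class ToyTopic:
    """Toy topic: an append-only message list plus one cursor per subscription.

    Hypothetical model for illustration only. In Pulsar the cursor is kept
    durably by the system; here it is just an in-memory integer.
    """

    def __init__(self):
        self.messages = []
        # subscription name -> index of the next message to deliver
        self.cursors = defaultdict(int)

    def publish(self, payload):
        self.messages.append(payload)

    def receive(self, subscription):
        """Return the next unread message for this subscription, or None."""
        position = self.cursors[subscription]
        if position >= len(self.messages):
            return None
        self.cursors[subscription] = position + 1
        return self.messages[position]


topic = ToyTopic()
topic.publish("order-1")
topic.publish("order-2")

# Two independent subscriptions both start at the first message (fan-out).
first_for_billing = topic.receive("billing")
first_for_audit = topic.receive("audit")

# The same subscription continues from where it left off.
second_for_billing = topic.receive("billing")
```

The client never passes an offset in this sketch; the topic remembers where each subscription stopped, which is the behavior the section describes.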

2. Partitions are supported, but not required

If you have used Kafka, you know what partitioning is all about. All topics in Kafka are partitioned, which increases throughput: by splitting a topic into partitions and spreading them across brokers, the processing rate of a single topic can be raised considerably. But what if some topics do not need a high processing rate? In that case, wouldn't it be better to skip partitioning and avoid the API and management work that comes with it?

Pulsar lets you do that. If you only need a topic, you can use a topic without partitions. If you need several consumer instances to keep up with the processing rate, you still do not need partitions: Pulsar's shared subscription achieves this.

If you really do need partitions to improve performance further, Pulsar supports them as well.
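The shared-subscription idea can be sketched as follows. This is a simplified illustration in plain Python rather than Pulsar client code: the function `dispatch_shared` and the consumer names are hypothetical, and it assumes plain round-robin delivery with no redelivery or acknowledgement handling.

```python
from itertools import cycle


def dispatch_shared(messages, consumers):
    """Spread messages from one non-partitioned topic across several consumers.

    Hypothetical helper: models a shared subscription as simple round-robin.
    """
    assignment = {name: [] for name in consumers}
    for message, name in zip(messages, cycle(consumers)):
        assignment[name].append(message)
    return assignment


result = dispatch_shared(["m0", "m1", "m2", "m3", "m4"], ["c1", "c2"])
# result == {"c1": ["m0", "m2", "m4"], "c2": ["m1", "m3"]}
```

Note the trade-off this implies: work is shared across consumers without any partitions, but no single consumer sees the messages in full order.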

3. Logs are good, but ledgers are even better

The Kafka development team foresaw the importance of the log for a real-time data exchange system. A log is written by appending, which makes writes very fast. The data in a log is sequential, so it can be read back quickly in the order it was written. Sequential reads and writes are faster than random ones. For any system that guarantees data durability, interaction with persistent storage is a bottleneck, and the log abstraction makes that interaction as efficient as possible.

Logs are good, but when the amount of data grows very large they also bring trouble. Keeping an entire log on a single server becomes a challenge. What do you do when the log fills up the server's storage? How do you scale out? And what if the server holding the log goes down and a new one has to be rebuilt from a replica? Copying a log from one server to another takes a long time, and it is even harder to do while the system must keep serving real-time data.

Pulsar segments the log to avoid copying large chunks of it. Through BookKeeper, Pulsar spreads the log segments across multiple servers. In other words, a log is not stored on a single server, and no single server becomes the bottleneck of the whole system. This makes failure handling and capacity expansion easier: you only need to add new servers, without rebalancing.
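The following toy model illustrates why segmenting helps with expansion. It is a sketch under simplifying assumptions, not how BookKeeper is implemented: the class `SegmentedLog`, the node names, and the "least loaded node" placement rule are hypothetical, and each segment is stored on exactly one node, whereas BookKeeper replicates entries across several storage nodes.

```python
class SegmentedLog:
    """Toy log split into fixed-size segments placed on different storage nodes."""

    def __init__(self, nodes, segment_size):
        self.nodes = list(nodes)
        self.segment_size = segment_size
        self.segments = []  # list of (node name, list of entries)

    def _least_loaded_node(self):
        load = {node: 0 for node in self.nodes}
        for node, _ in self.segments:
            load[node] += 1
        return min(self.nodes, key=lambda node: load[node])

    def append(self, entry):
        # Open a new segment when there is none yet or the current one is full.
        if not self.segments or len(self.segments[-1][1]) >= self.segment_size:
            self.segments.append((self._least_loaded_node(), []))
        self.segments[-1][1].append(entry)

    def add_node(self, node):
        # Existing segments stay where they are; only new segments use the node.
        self.nodes.append(node)

    def placement(self):
        return [node for node, _ in self.segments]


log = SegmentedLog(["bookie-1", "bookie-2"], segment_size=2)
for entry in range(4):
    log.append(entry)
before = log.placement()      # ["bookie-1", "bookie-2"]

log.add_node("bookie-3")
log.append(4)
after = log.placement()       # ["bookie-1", "bookie-2", "bookie-3"]
```

Adding `bookie-3` moves no existing data; the new node simply starts receiving new segments, which is the "no rebalancing" property described above.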

4. Stateless

Cloud-native application developers love statelessness. Stateless components start quickly, are easy to replace, and scale seamlessly. Wouldn't it be better if the messaging middleware were stateless too?

Kafka is not stateless. Each broker holds the complete log of the partitions it hosts. If a broker goes down, not just any broker can take over its work. And if the workload is too high, you cannot simply add a new broker to share the load: the new broker must first synchronize state from a broker that holds a replica of the partition.

In the Pulsar architecture, the broker is stateless. A completely stateless system could not persist messages, so Pulsar does not rely on brokers for message persistence. Instead, message serving and storage are separated: the broker receives data from producers and delivers it to consumers, but the data itself is stored in BookKeeper.

Because Pulsar brokers are stateless, when the workload is high you can simply add a new broker, and it can start taking on load quickly.
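The separation of serving and storage can be sketched with a small model. This is an illustration in plain Python, not Pulsar code: the classes `SharedStorage` and `StatelessBroker` and the topic and broker names are hypothetical, and the sketch leaves out topic ownership, replication, and acknowledgements.

```python
class SharedStorage:
    """Stands in for the storage layer; the only place messages live."""

    def __init__(self):
        self.topics = {}

    def append(self, topic, payload):
        self.topics.setdefault(topic, []).append(payload)

    def read(self, topic, offset):
        return self.topics.get(topic, [])[offset:]


class StatelessBroker:
    """Keeps no message data of its own, only a reference to shared storage."""

    def __init__(self, name, storage):
        self.name = name
        self.storage = storage

    def publish(self, topic, payload):
        self.storage.append(topic, payload)

    def consume(self, topic, offset=0):
        return self.storage.read(topic, offset)


storage = SharedStorage()
broker_1 = StatelessBroker("broker-1", storage)
broker_1.publish("orders", "order-1")
broker_1.publish("orders", "order-2")

# broker-1 disappears; a fresh broker serves the topic without copying any data.
del broker_1
broker_2 = StatelessBroker("broker-2", storage)
recovered = broker_2.consume("orders")  # ["order-1", "order-2"]
```

Because the replacement broker needs nothing but a connection to storage, there is no log to synchronize before it can serve traffic.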

5. Simple geo-replication

Geo-replication (replication across regions) is one of Pulsar's strengths. Pulsar was designed with this feature in mind from the start, and it is easy to configure. Whether you are building a globally distributed application or a disaster recovery solution, Pulsar can handle it.

6. Stable performance

The benchmark test (http://openmessaging.cloud/docs/benchmarks/pulsar/) shows that Pulsar can provide high throughput while maintaining low latency.

7. Fully open source

Pulsar provides many of the same features as Kafka, such as geo-replication, in-stream message processing (Pulsar Functions), connectors (Pulsar IO), SQL-based topic queries (Pulsar SQL), and a schema registry, as well as features that Kafka does not have, such as tiered storage and multi-tenancy. Even better, all of these features are open source.

Conclusion

As shown above, we had many reasons to choose Pulsar for building our messaging infrastructure services. Beyond these reasons, other aspects of Pulsar also make life easier, such as multi-tenancy, namespaces, authentication and authorization, good documentation, and friendly support for Kubernetes.

Original English:
https://kafkaesque.io/7-reasons-we-choose-apache-pulsar-over-apache-kafka/
