This article was originally created by the WeChat public account "AI Frontline" (ID: ai-front) and may not be reproduced without authorization.
Author | Chris Bartholomew
Translator | Ignorance
Editor | Natalie
AI Frontline guide: Developers of cloud-native distributed applications need a solution that helps them manage messaging infrastructure, so that they can focus on application and microservice development rather than spend time dealing with complex messaging systems.
The first step in building a messaging infrastructure is to select the appropriate messaging middleware. There are many options, from open source projects (such as RabbitMQ, ActiveMQ, and NATS) to commercial products (such as IBM MQ or Red Hat AMQ). There is also Kafka. In the end, however, we did not use Kafka and chose Pulsar instead.
Why did we choose Pulsar in the end? Here are 7 reasons to choose Pulsar over Kafka.
1. Streaming and queuing combined
Pulsar is like a two-in-one product. It can not only handle high-rate real-time scenarios like Kafka, but also supports standard message queue modes, such as multi-consumer, failover subscription, and message fan-out. Pulsar will automatically track the reading position of the client and save this information in a high-performance distributed ledger (BookKeeper).
Unlike Kafka, Pulsar also has the features of a traditional message queue (such as RabbitMQ). You therefore only need to run a single Pulsar system to handle real-time streams and message queues at the same time.
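To make the cursor idea concrete, here is a minimal sketch in plain Python. It is a toy in-memory model, not Pulsar's API: the `Topic` class, its methods, and the subscription names are all hypothetical, and every subscription is assumed to start reading from the beginning of the log. It only shows how one log plus one server-side cursor per subscription gives message fan-out.

```python
from collections import defaultdict


class Topic:
    """Toy in-memory topic: one append-only log plus one cursor per subscription."""

    def __init__(self):
        self.log = []                     # messages in publish order
        self.cursors = defaultdict(int)   # subscription name -> next index to read

    def publish(self, message):
        self.log.append(message)

    def receive(self, subscription):
        """Return the next unread message for a subscription, or None if caught up."""
        position = self.cursors[subscription]
        if position >= len(self.log):
            return None
        # The topic, not the client, remembers how far the subscription has read.
        self.cursors[subscription] = position + 1
        return self.log[position]


topic = Topic()
for i in range(3):
    topic.publish(f"event-{i}")

# Fan-out: two independent (hypothetical) subscriptions each see every message.
billing = [topic.receive("billing") for _ in range(3)]
audit = [topic.receive("audit") for _ in range(2)]
```

Because the read position lives with the topic, the "audit" subscription can resume at its third message later without the client having to remember anything.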
2. Partitioning is supported, but not required
If you have used Kafka, you know what partitioning is all about. All topics in Kafka are partitioned, which increases throughput. By spreading the partitions of a topic across different brokers, the processing rate of a single topic can be greatly improved. But what if certain topics do not need a high processing rate? In that case, wouldn't it be better to ignore partitioning altogether and avoid the API and management overhead that comes with it?
Pulsar can do this. If you only need a single topic, you can use one topic without partitions. If you need multiple consumer instances to keep up with the processing rate but do not want to use partitions, Pulsar's shared subscription can achieve this.
If partitioning is really needed to further improve performance, Pulsar can also support the use of partitions.
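The shared-subscription idea can be sketched as follows. This is a simplified illustration rather than Pulsar client code: `dispatch_shared` and the worker names are hypothetical, and it assumes plain round-robin delivery with no acknowledgements or redelivery.

```python
from itertools import cycle


def dispatch_shared(messages, consumers):
    """Hand messages from one unpartitioned topic to consumers in round-robin order."""
    assignment = {name: [] for name in consumers}
    for message, name in zip(messages, cycle(consumers)):
        assignment[name].append(message)
    return assignment


# Five messages from a single topic, shared by two consumer instances.
result = dispatch_shared([f"m{i}" for i in range(5)], ["worker-a", "worker-b"])
```

Each worker gets roughly half of the messages even though the topic has no partitions. The trade-off is that no single worker sees the messages in full publish order.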
3. Logs are good, but distributed ledgers are even better
The Kafka development team foresaw the importance of logs for a real-time data exchange system. A log is written by appending, which is very fast. The data in a log is sequential and can be read quickly in the order it was written. Sequential reads and writes are faster than random ones. For any system that provides data guarantees, interaction with persistent storage is the bottleneck, and the log abstraction maximizes efficiency in this area.
Logs are good, but when they grow very large they bring problems of their own. Keeping an entire log on a single server becomes a challenge. What happens when the log fills up the server's storage? How do you scale out? And if the server holding the log goes down and a new one has to be rebuilt from a replica, what then? Copying a log from one server to another takes a long time, and it is even harder when the system has to keep serving real-time data at the same time.
Pulsar segments the log to avoid copying it in large chunks. Through BookKeeper, Pulsar distributes the log segments across multiple servers. In other words, the log is not stored on a single server, and no single server becomes the bottleneck of the whole system. This makes failure handling and capacity expansion easier: you only need to add new servers, with no rebalancing required.
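The effect of segmenting can be illustrated with a toy model. The sketch below is not how BookKeeper is implemented: `SegmentedLog`, the fixed segment size, the "fewest segments" placement rule, and the bookie names are all assumptions made for illustration, and replication is left out entirely.

```python
class SegmentedLog:
    """Toy model: a log split into fixed-size segments spread across storage nodes."""

    def __init__(self, nodes, segment_size=2):
        self.nodes = list(nodes)
        self.segment_size = segment_size
        self.segments = []   # list of (node, entries) in log order

    def add_node(self, node):
        # No existing segment is moved; the node only becomes eligible for new ones.
        self.nodes.append(node)

    def append(self, entry):
        if not self.segments or len(self.segments[-1][1]) >= self.segment_size:
            # Roll over to a new segment on the node that holds the fewest segments.
            node = min(self.nodes, key=self._segment_count)
            self.segments.append((node, []))
        self.segments[-1][1].append(entry)

    def _segment_count(self, node):
        return sum(1 for owner, _ in self.segments if owner == node)

    def placement(self):
        return [node for node, _ in self.segments]


log = SegmentedLog(["bookie-1", "bookie-2"])
for i in range(4):
    log.append(i)
before = log.placement()

log.add_node("bookie-3")      # expand capacity
for i in range(4, 6):
    log.append(i)
after = log.placement()
```

After the new node is added, the old segments stay where they were and only the newly created segment lands on `bookie-3`, which is the "no rebalancing" property described above.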
4. Stateless brokers
Cloud-native application developers love statelessness. Stateless components start quickly, are replaceable, and scale seamlessly. Wouldn't it be better if the messaging middleware were stateless too?
Kafka is not stateless. Each broker holds the complete log of the partitions it hosts. If a broker goes down, not every broker can take over its work. If the workload is too high, you cannot simply add new brokers to share it: a new broker must first synchronize state with the brokers that hold replicas of the partitions.
In the Pulsar architecture, the broker is stateless. But a completely stateless system cannot persist messages, so Pulsar does not rely on brokers to achieve message persistence. In the Pulsar architecture, data distribution and storage are independent of each other. The broker receives data from the producer, and then sends the data to the consumer, but the data is stored in BookKeeper.
Pulsar's broker is stateless, so if the workload is high, you can directly add a new broker to quickly take over the workload.
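The separation of serving and storage can be sketched with another toy model. Everything here is hypothetical (the `Cluster` class, the broker names, and the "first remaining broker takes over" rule); the point is only that when brokers hold no data, failover is a change of ownership rather than a copy of the log.

```python
class Cluster:
    """Toy model: brokers only own topics; messages live in a shared store."""

    def __init__(self, brokers):
        self.brokers = list(brokers)
        self.store = {}   # topic -> messages (stands in for BookKeeper)
        self.owner = {}   # topic -> broker currently serving it

    def publish(self, topic, message):
        self.owner.setdefault(topic, self.brokers[0])
        self.store.setdefault(topic, []).append(message)

    def remove_broker(self, broker):
        self.brokers.remove(broker)
        for topic, current in list(self.owner.items()):
            if current == broker:
                # Only the ownership record changes; no message data is copied.
                self.owner[topic] = self.brokers[0]

    def read(self, topic):
        return self.owner[topic], list(self.store[topic])


cluster = Cluster(["broker-1", "broker-2"])
cluster.publish("orders", "o1")
cluster.publish("orders", "o2")

cluster.remove_broker("broker-1")          # simulate a broker failure
owner, messages = cluster.read("orders")
```

After the failure, `broker-2` serves the topic immediately with all messages intact, because the messages were never stored on `broker-1` in the first place.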
5. Simple geo-replication
Geo-replication is one of Pulsar's strengths. Pulsar was designed with this feature in mind from the start, and it is easy to configure. Whether you are building a globally distributed application or a disaster recovery solution, Pulsar can handle it.
6. Stable performance
Benchmark tests (http://openmessaging.cloud/docs/benchmarks/pulsar/) show that Pulsar can provide high throughput while maintaining low latency.
7. Fully open source
Pulsar provides many features similar to Kafka's, such as geo-replication, stream processing (Pulsar Functions), connectors (Pulsar IO), SQL-based topic queries (Pulsar SQL), and a schema registry, as well as some features Kafka does not have, such as tiered storage and multi-tenancy. Even better, all of these features are open source.
These are the main reasons we chose Pulsar to build our messaging infrastructure. Beyond them, other Pulsar features also bring a lot of convenience, such as multi-tenancy, namespaces, authentication and authorization, good documentation, and friendly support for Kubernetes.