
Apache Kafka has long been praised for its high performance: it can process messages at high throughput while maintaining low latency. Apache Pulsar, a fast-growing project in the same space, has become a strong competitor.

Articles regularly claim that Pulsar performs better than Kafka, but the raw test data behind those claims is rarely published. In addition, much of the reported data was not produced with the latest versions of Pulsar and Kafka. Both projects evolve quickly, so older results are not representative of current releases. We therefore decided to run a series of performance tests against Kafka 2.3.0 and Pulsar 2.4.0 and publish the results over time.

This series of articles will focus on latency, and subsequent articles will discuss throughput.

Messaging system performance test

Both the Kafka and Pulsar distributions include their own performance testing tools. We could modify either tool to make it work with the other system, but instead we will use a third-party benchmark framework from the OpenMessaging Project.

Refer to: http://openmessaging.cloud/

The OpenMessaging Project is a Linux Foundation Collaborative Project. Its goal is to provide vendor-neutral standards for messaging and streaming technologies that are applicable across programming languages. The project also maintains a performance testing framework that supports multiple messaging technologies.

The code used in the test is in the OpenMessaging GitHub repository.

For reference: https://github.com/openmessaging/openmessaging-benchmark

The benchmark was designed to run on public cloud infrastructure; for our tests we use standard EC2 instances in Amazon Web Services (AWS).

We will publish the complete output of each test on GitHub. You are welcome to analyze the data and draw your own insights, or to run the tests yourself and collect new data. We found that runs on different families of EC2 instances produced similar results, so your results should not differ much from ours.

Although the OpenMessaging benchmark tool already ships with a variety of workloads, we decided to add one more to make the comparison more interesting. The idea comes from a post on the LinkedIn Engineering blog titled "Benchmarking Apache Kafka: 2 Million Writes Per Second (On Three Cheap Machines)".

For reference: https://engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-writes-second-three-cheap-machines

That post was written a long time ago. Hardware is generally better now, although we did not use top-of-the-line test equipment. Spoiler alert: both Kafka and Pulsar can handle 2 million writes per second effortlessly, as we will detail in a follow-up article.

OpenMessaging benchmark

The OpenMessaging benchmark is an open-source, extensible framework. To add a messaging technology to the test, you supply a Terraform configuration, an Ansible playbook, and a Java library that drives the producers and consumers in the test tool. Currently, the Terraform configurations for Kafka and Pulsar are identical, so both launch the same set of EC2 instances in AWS.
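
To give a sense of what the Java side of such an integration involves, here is a simplified, hypothetical sketch of a driver abstraction. The real OpenMessaging benchmark defines its own, richer driver interface, so the names and signatures below are illustrative only.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical, simplified driver abstraction for illustration only.
// The real OpenMessaging benchmark defines its own driver SPI with
// different names and signatures.
public interface MessagingDriver extends AutoCloseable {

    // Create a topic with the given number of partitions.
    CompletableFuture<Void> createTopic(String topic, int partitions);

    // Create a producer bound to a topic.
    BenchmarkProducer createProducer(String topic);

    // Create a consumer that hands each received payload and its publish
    // timestamp (millis) back to the benchmark for latency accounting.
    BenchmarkConsumer createConsumer(String topic, String subscription,
                                     MessageListener listener);

    interface BenchmarkProducer extends AutoCloseable {
        // Asynchronous send; the future completes when the broker acks.
        CompletableFuture<Void> sendAsync(byte[] payload);
    }

    interface BenchmarkConsumer extends AutoCloseable {
    }

    interface MessageListener {
        void onMessage(byte[] payload, long publishTimestampMillis);
    }
}
```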

For a fair comparison, the hardware configuration has to be identical, so the existing benchmark code already makes a side-by-side comparison of Kafka and Pulsar straightforward.

Before the actual runs, we did a dry run of all the tests. The latency tests publish messages at a constant rate, and latency and throughput are recorded at regular intervals.
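
To illustrate the idea of a constant-rate latency test, the following is a minimal sketch rather than the actual benchmark code: it paces asynchronous sends at a fixed target rate and records how long each send takes to be acknowledged. The `sendAsync` method is a placeholder for a real client call.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Minimal constant-rate test loop, for illustration only.
public class ConstantRateSketch {

    public static void main(String[] args) {
        final int targetRatePerSecond = 50_000;
        final long intervalNanos = 1_000_000_000L / targetRatePerSecond;
        final long durationNanos = 10_000_000_000L; // run for 10 seconds

        List<Long> latenciesMicros = Collections.synchronizedList(new ArrayList<>());
        long start = System.nanoTime();
        long next = start;

        while (System.nanoTime() - start < durationNanos) {
            // Wait until the next scheduled send time to hold the rate constant.
            while (System.nanoTime() < next) { /* spin */ }
            next += intervalNanos;

            long sendTime = System.nanoTime();
            sendAsync(new byte[1024]).thenRun(() ->
                    latenciesMicros.add((System.nanoTime() - sendTime) / 1_000));
        }

        List<Long> sorted = new ArrayList<>(latenciesMicros);
        Collections.sort(sorted);
        System.out.printf("p50=%dus p99=%dus%n",
                sorted.get(sorted.size() / 2),
                sorted.get((int) (sorted.size() * 0.99)));
    }

    // Placeholder for a real client call (e.g. a Kafka or Pulsar async send).
    static CompletableFuture<Void> sendAsync(byte[] payload) {
        return CompletableFuture.completedFuture(null);
    }
}
```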

If you want to run the tests yourself, keep the following in mind: running tests on AWS is expensive; the benchmark needs a large number of well-provisioned EC2 instances (important for meaningful software benchmarking); to ensure the hardware does not become the bottleneck you need fairly powerful instances, which cost more (no less than 5 US dollars per hour, about 3,800 US dollars per month); you can stop the environment when you are not testing (for example, overnight); and make sure to delete all resources once the tests are done.

Performance considerations

To interpret the test results, a few important concepts need to be introduced first. First, the subject of the test itself: latency. Next, message durability, in particular what it means to store messages on disk. Finally, the different message replication models used by Kafka and Pulsar.

There are many similarities between Kafka and Pulsar, but there are also some significant differences that affect performance. In order to evaluate these two systems fairly, the following differences need to be understood:

About latency testing

All latency measurements include the network latency between the application and the messaging system. If the network configuration is the same for all tests, the latency contributed by the network is also the same and affects all tests equally. Having identical network latency is what makes the results comparable.

We point this out because our results differ from those in the LinkedIn Engineering blog. That test presumably ran on a dedicated 1 Gbps Ethernet network, while ours ran on public cloud instances with 10 Gbps networking, so our numbers cannot be compared directly with the numbers in that article. However, since the network configuration is constant within our tests, we can compare the latency of Kafka and Pulsar against each other.

We recorded two types of latency: end-to-end latency and publish latency.

End-to-end latency

End-to-end latency is relatively simple: it is the time from when the producer sends a message to when the consumer receives it. In both the Pulsar and Kafka benchmark implementations, the send timestamp is generated automatically by the API used to send the message. When the consumer receives the message, it takes a second timestamp. The difference between the two is the end-to-end latency.
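
As one concrete illustration (using the Kafka consumer; Pulsar's consumer exposes a similar publish timestamp), end-to-end latency can be computed by comparing the producer-assigned record timestamp with the consumer's local clock. This is a sketch rather than the benchmark's actual implementation, and the broker address and topic name below are placeholders.

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class EndToEndLatencySketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("group.id", "latency-test");
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("test-topic")); // placeholder topic
            while (true) {
                ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofMillis(100));
                for (ConsumerRecord<byte[], byte[]> record : records) {
                    // record.timestamp() is the producer-side send time (CreateTime).
                    // The difference to the consumer's clock is the end-to-end latency,
                    // which is only meaningful if the two clocks are synchronized.
                    long endToEndMillis = System.currentTimeMillis() - record.timestamp();
                    System.out.println("end-to-end latency (ms): " + endToEndMillis);
                }
            }
        }
    }
}
```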

The tricky part of end-to-end latency is the clocks used to take the timestamps. Because the two timestamps come from different servers, those clocks must be synchronized. If they are not, the offset between the clocks is folded into the measurement: what you measure is the actual messaging latency plus (or minus) the clock offset. Since clocks drift over time, the problem gets worse in long-running tests.

Ideally, the producer and consumer would run on the same server and take both timestamps from the same clock, eliminating the synchronization problem. However, the benchmark deliberately places producers and consumers on different servers to distribute the load.

The next best solution is to synchronize the clocks across the servers as accurately as possible. AWS provides a free Time Sync Service which, combined with chrony, keeps the clocks of EC2 instances very closely synchronized (within a few microseconds of the reference clock). For the tests, we installed chrony on all client servers and configured them to use the AWS time source.

Publish latency

Publish latency is the time it takes for a message to travel from the producer to the broker and be acknowledged. The broker's ack indicates that the message was received correctly and has been persisted. Producer, broker, and consumer are independent of each other: once the producer receives the ack, responsibility for the message no longer lies with the producer. Consistently low publish latency benefits the producing application; when it has a message ready to send, the broker accepts it quickly, and the application can get back to application-level concerns such as business logic. Taking over responsibility for messages quickly is a key feature of a broker.

In the benchmark, messages are sent asynchronously; the producer does not block waiting for the ack. The send call returns immediately, and a callback fires when the broker acknowledges the message. With asynchronous sending, publish latency may seem less important, but it is not.

Kafka: max.in.flight.requests.per.connection

Pulsar: maxPendingMessages

Both of these settings bound the buffer of unacknowledged messages. When the buffer reaches the configured limit, the producer either blocks or fails, depending on configuration (as sketched below). So if the messaging system does not acknowledge messages quickly, the producing application ends up waiting on the messaging system.
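
The following sketch shows how these limits might be configured in each client. The broker addresses, topic name, and numeric values are placeholders for illustration, not the exact settings used in our tests.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

public class ProducerBufferConfigSketch {
    public static void main(String[] args) throws Exception {
        // Kafka: limit the number of unacknowledged requests per connection.
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put(ProducerConfig.MAX_IN_FLIGHT_REQUESTS_PER_CONNECTION, 1);
        KafkaProducer<byte[], byte[]> kafkaProducer = new KafkaProducer<>(props);

        // Pulsar: bound the queue of pending (unacknowledged) messages and
        // block the producer when the queue is full instead of failing.
        PulsarClient pulsarClient = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // placeholder
                .build();
        Producer<byte[]> pulsarProducer = pulsarClient.newProducer()
                .topic("test-topic") // placeholder
                .maxPendingMessages(1000)
                .blockIfQueueFull(true)
                .create();

        kafkaProducer.close();
        pulsarProducer.close();
        pulsarClient.close();
    }
}
```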

In the benchmark, publish latency is measured from the moment the send method is called to the moment the acknowledgement callback fires. Both timestamps are taken on the producer, so clock synchronization is not an issue.
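
A minimal sketch of that measurement with the Kafka producer is shown below (Pulsar's asynchronous send returns a future that can be timed the same way); the broker address and topic name are placeholders.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class PublishLatencySketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.ByteArraySerializer");

        try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
            byte[] payload = new byte[1024];
            long sendTimeNanos = System.nanoTime();
            // Asynchronous send: the callback fires when the broker acks the message.
            producer.send(new ProducerRecord<>("test-topic", payload), (metadata, exception) -> {
                if (exception == null) {
                    long publishLatencyMicros = (System.nanoTime() - sendTimeNanos) / 1_000;
                    System.out.println("publish latency (us): " + publishLatencyMicros);
                }
            });
            producer.flush();
        }
    }
}
```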

Persistence and flushing messages to disk

Durability in a messaging system means that messages are not lost when the system fails. To guarantee this, messages must be stored in a safe location so that even if the server running the messaging software crashes, the messages survive. Both Kafka and Pulsar ultimately write messages to disk to provide durability.

However, simply asking the operating system to write messages to the file system is not enough. A file-system write puts the data into the write cache, not necessarily safely onto the disk. Because the cache lives in memory, if the server crashes (for example, a power failure or kernel panic), data that was written to the cache but not yet flushed to disk is lost. The operation that forces cached data onto the disk is called fsync. To be sure a message is on disk, fsync must be invoked after the message is written to flush the file cache.
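
To make the distinction concrete: in Java, an explicit fsync corresponds to FileChannel.force(). The sketch below writes a buffer to a file and then forces it to disk; the file path is a placeholder.

```java
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class FsyncSketch {
    public static void main(String[] args) throws Exception {
        Path path = Paths.get("/tmp/message.log"); // placeholder path
        try (FileChannel channel = FileChannel.open(path,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            ByteBuffer message = ByteBuffer.wrap("hello".getBytes(StandardCharsets.UTF_8));
            // write() only hands the data to the OS page cache...
            channel.write(message);
            // ...force(true) is the fsync: it blocks until the data (and metadata)
            // are physically on disk, so a crash after this point cannot lose it.
            channel.force(true);
        }
    }
}
```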

By default, Kafka does not explicitly flush each message to disk; it leaves flushing to the operating system. If the server crashes, data that has not yet been flushed can be lost. This default exists for performance reasons: writing to disk is slower than writing to the in-memory cache, so flushing reduces message-processing performance. Kafka can be configured to flush periodically, or even after every message, but this is not recommended for performance reasons.
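
For example, per-message flushing can be enabled for a single Kafka topic through the topic-level flush.messages setting. The sketch below creates such a topic with the admin client; the broker address, topic name, and counts are placeholders, and this is not necessarily the exact configuration used in our tests.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class PerMessageFlushTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder

        try (AdminClient admin = AdminClient.create(props)) {
            // flush.messages=1 forces an fsync after every message written to this topic.
            Map<String, String> configs = Collections.singletonMap("flush.messages", "1");

            NewTopic topic = new NewTopic("flushed-topic", 1, (short) 3) // placeholder topic
                    .configs(configs);
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}
```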

By default, Pulsar flushes every message to disk. A message is acknowledged to the producer only after it has been stored on disk, which gives a stronger durability guarantee if a server crashes. Pulsar can maintain high performance while providing this guarantee because it stores messages in Apache BookKeeper, a distributed log storage system optimized for exactly this workload. Fsync can also be disabled in Pulsar if desired.

Since flushing to disk affects performance, we test both Kafka and Pulsar twice: once with per-message disk flushing enabled and once with it disabled. This makes the comparison between Kafka and Pulsar more meaningful.

Message replication

Both Kafka and Pulsar keep multiple copies of each message to ensure durability and availability: even if one copy is lost, the message can still be recovered from another. However, replication affects performance differently in the two systems, so we need to make sure the replication behavior is configured comparably in the tests.

Kafka uses a leader-follower replication model. One Kafka broker is elected leader for a partition; all messages are written to the leader first, and the followers read and copy messages from the leader. Kafka tracks whether each follower is "in sync" with the leader. With Kafka, you control the total number of copies of a message (the replication factor) and the minimum number of in-sync replicas (min.insync.replicas) that must have the message before it is considered successfully stored and acknowledged to the producer.

A typical configuration has Kafka store 3 copies of each message and acknowledge it once at least 2 copies (a majority) have been written. This is the configuration we used in all Kafka tests (replication-factor=3, min.insync.replicas=2, acks=all).
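
As a sketch of what that configuration looks like in code (placeholder broker address, topic name, and partition count), the topic is created with replication factor 3 and min.insync.replicas=2, and the producer requests acks=all:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;

public class KafkaReplicationConfigSketch {
    public static void main(String[] args) throws Exception {
        Properties adminProps = new Properties();
        adminProps.put("bootstrap.servers", "localhost:9092"); // placeholder

        // Topic: 3 copies of every message, at least 2 in sync before an ack.
        try (AdminClient admin = AdminClient.create(adminProps)) {
            NewTopic topic = new NewTopic("bench-topic", 16, (short) 3) // placeholder name/partitions
                    .configs(Collections.singletonMap("min.insync.replicas", "2"));
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }

        // Producer: wait for all in-sync replicas to confirm the write.
        Properties producerProps = new Properties();
        producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        producerProps.put(ProducerConfig.ACKS_CONFIG, "all");
        producerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.ByteArraySerializer");
        KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(producerProps);
        producer.close();
    }
}
```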

Pulsar uses a quorum-vote replication model. Multiple copies of a message are written in parallel, and the message is acknowledged once a subset of those copies (the ack quorum) is confirmed to be stored. Whereas the leader-follower model always writes the copies of a given topic partition to the same set of leader and followers, Pulsar can spread the copies evenly across different storage nodes, which improves read and write performance.

To keep the tests consistent, Pulsar is also configured to keep three copies of each message (ensemble=3, write quorum=3, ack quorum=2). This mirrors Kafka's replication behavior: 3 copies of each message, with the message acknowledged to the producer as soon as any 2 of them confirm persistence.
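
In Pulsar, these settings can be applied per namespace through the admin API. The sketch below uses a placeholder admin URL and the default namespace for illustration; it is not necessarily how the benchmark applies the settings.

```java
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.PersistencePolicies;

public class PulsarPersistenceConfigSketch {
    public static void main(String[] args) throws Exception {
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080") // placeholder admin URL
                .build();

        // ensemble=3 bookies, write quorum=3 copies, ack quorum=2 confirmations;
        // the final 0 means no mark-delete rate limit.
        admin.namespaces().setPersistence("public/default", // placeholder namespace
                new PersistencePolicies(3, 3, 2, 0));

        admin.close();
    }
}
```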


