The Reasons Revealed: Why Choose Pulsar Instead of Kafka

Author: Chris Bartholomew

Some time ago, I wrote a blog post on 7 reasons to choose Pulsar instead of Kafka. Since then, I have been preparing a detailed report comparing Kafka and Pulsar, and talking with users of the Pulsar open source project as well as users of our hosted Pulsar service, Kafkaesque (https://kafkaesque.io/).

I found that I had missed some reasons in the previous article, so I wrote this follow-up to add them.

Before adding the new ones, let's quickly review the 7 reasons from the previous article:

  • Streaming and queuing combined. With Kafka or RabbitMQ, a single platform handles only one of the two. Pulsar is a two-in-one product that handles both real-time streams and message queues.
  • Partitioning is supported, but not required. If you only need a topic, you can use a topic without partitions. You do not need partitions just so that multiple consumer instances can keep up with the processing rate.
  • Distributed log. Pulsar's log is distributed and scales horizontally. It is not stored on a single server, so no single server becomes the bottleneck of the whole system.
  • Stateless brokers. Serving data and storing data are independent of each other, which is ideal for cloud-native deployments.
  • Native geo-replication with simple configuration. Whether you are building a globally distributed application or a disaster recovery solution, Pulsar makes it straightforward.
  • Consistent performance. Benchmarks show that Pulsar provides high throughput while keeping latency low.
  • Completely open source. Both the features similar to Kafka's and the features Kafka lacks are open source.

Those are the 7 reasons from the earlier article. If you want to know more about any of them, see the article mentioned at the beginning. That may already seem like a lot of reasons, but I found more.

Multi-tenancy

In the previous article, I should have said more about multi-tenancy. Pulsar's support for multi-tenancy is a very important feature. In an enterprise, the messaging infrastructure is used by multiple teams and multiple projects. Maintaining a separate messaging cluster for each team or project is too expensive and too hard to operate.

Pulsar can have multiple tenants, and each tenant can have multiple namespaces to keep things organized. Add access control, quotas, and rate limits per namespace, and you can imagine a future in which a single cluster serves all of your tenants. Pulsar is not alone in thinking about this; Kafka is considering it too. See Kafka Improvement Proposal (KIP) KIP-37, which has been under discussion for some time.
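To make the tenant and namespace hierarchy concrete, here is a minimal sketch that splits a fully qualified Pulsar topic name of the form persistent://tenant/namespace/topic into its parts. The helper names (TopicName, parse_topic, same_namespace) and the example tenant and namespace names are made up for illustration; they are not part of any Pulsar client library.

```python
from typing import NamedTuple


class TopicName(NamedTuple):
    domain: str      # "persistent" or "non-persistent"
    tenant: str
    namespace: str
    topic: str


def parse_topic(name: str) -> TopicName:
    """Split a fully qualified topic name into domain, tenant, namespace, topic."""
    domain, sep, rest = name.partition("://")
    if not sep or domain not in ("persistent", "non-persistent"):
        raise ValueError(f"unsupported topic name: {name!r}")
    parts = rest.split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic, got: {rest!r}")
    return TopicName(domain, *parts)


def same_namespace(a: str, b: str) -> bool:
    """True when two topics live under the same tenant and namespace."""
    ta, tb = parse_topic(a), parse_topic(b)
    return (ta.tenant, ta.namespace) == (tb.tenant, tb.namespace)


# Hypothetical tenants: two teams sharing one cluster.
clicks = "persistent://marketing/campaigns/clicks"
views = "persistent://marketing/campaigns/views"
orders = "persistent://billing/invoices/orders"
print(parse_topic(clicks))
print(same_namespace(clicks, views), same_namespace(clicks, orders))
```

Topics that share a tenant and namespace are the ones that would be governed together by the per-namespace access control, quotas, and rate limits mentioned above.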

Quorum replication

Next, I will get into more technical detail. To ensure that messages are not lost, a messaging system is configured to keep 2 or 3 copies of each message in case something fails. Kafka uses a follow-the-leader model for this: the leader stores the message, and the followers replicate it from the leader.

Once enough followers have confirmed that they have replicated the message, Kafka is "happy". Pulsar uses a quorum model: it sends the message to a set of nodes, and once enough of them confirm that they have received it, Pulsar is "happy".

Quorum replication is more democratic, with no leader-follower hierarchy. All votes are equal, and the majority wins. But that is politics, not technology. What matters is that, over time, quorum replication tends to give more consistent behavior.

This may explain why Pulsar has more consistent latency. If you want to know more about Kafka and Pulsar latency, see my blog (fair warning: it is a very long post).

In fact, Kafka has been considering quorum replication to improve latency consistency. See the discussion in KIP-250.
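The effect of a quorum on latency can be shown with a small calculation. This is only a sketch under simplifying assumptions: each storage node is reduced to a single acknowledgement latency, the function names are invented for this example, and the "wait for every replica" case is a hypothetical contrast, not a description of how Kafka is configured.

```python
def quorum_commit_latency(ack_latencies_ms, ack_quorum):
    """Time at which a write is confirmed: the ack_quorum-th fastest acknowledgement."""
    if not 1 <= ack_quorum <= len(ack_latencies_ms):
        raise ValueError("ack_quorum must be between 1 and the number of nodes")
    return sorted(ack_latencies_ms)[ack_quorum - 1]


def wait_for_all_latency(ack_latencies_ms):
    """Contrast case: a hypothetical policy that waits for every replica."""
    return max(ack_latencies_ms)


# Three storage nodes; one of them is having a slow moment.
latencies = [4.0, 6.0, 250.0]
print(quorum_commit_latency(latencies, ack_quorum=2))  # 6.0
print(wait_for_all_latency(latencies))                 # 250.0
```

With a quorum of 2 out of 3, the write is confirmed as soon as the two fastest nodes answer, so one slow node does not show up in the latency the producer sees.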

Tiered storage

A major advantage of a stream processing system like Kafka is that it can replay messages that have already been consumed. If the messages were useful the first time around, you can replay them to fix something, or build new applications on top of them, which is also very appealing.

What if you like these messages so much that you want to keep them forever, for example because you are doing event sourcing? That sounds great, but forever is a long time, and storing messages forever can be expensive, especially on the high-performance solid-state drives that the messaging system needs in order to keep running well.

Could those old messages (the ones you might need again someday) be moved to cheaper storage? Wouldn't it be great if you could use cheap cloud storage such as Amazon S3 buckets? You may have guessed where this is going.

With Pulsar's tiered storage, you can automatically offload old messages to practically unlimited, cheap cloud storage and still read them back the same way you read newer messages. I bet Kafka would like to have this feature, and indeed it would: see the discussion in KIP-405.
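The idea can be sketched with a toy log that keeps recent segments on a "hot" tier and moves the oldest ones to a "cold" tier, while readers use one read call for both. The class TieredLog, its parameters, and the two in-memory dictionaries are assumptions made for this illustration; they stand in for fast local disks and object storage and are not Pulsar's implementation.

```python
class TieredLog:
    """Toy append-only log that offloads its oldest segments to a cheaper tier."""

    def __init__(self, segment_size=3, max_hot_segments=2):
        self.segment_size = segment_size
        self.max_hot_segments = max_hot_segments
        self.hot = {}    # segment id -> messages (stands in for fast local disks)
        self.cold = {}   # segment id -> messages (stands in for cheap object storage)
        self.next_offset = 0

    def append(self, message):
        segment = self.next_offset // self.segment_size
        self.hot.setdefault(segment, []).append(message)
        offset = self.next_offset
        self.next_offset += 1
        self._offload_if_needed()
        return offset

    def _offload_if_needed(self):
        # Move the oldest segments to the cold tier once the hot tier is over budget.
        while len(self.hot) > self.max_hot_segments:
            oldest = min(self.hot)
            self.cold[oldest] = self.hot.pop(oldest)

    def read(self, offset):
        # Readers use the same call regardless of which tier holds the message.
        segment, index = divmod(offset, self.segment_size)
        tier = self.hot if segment in self.hot else self.cold
        return tier[segment][index]


log = TieredLog()
for i in range(10):
    log.append(f"m{i}")
print(sorted(log.cold), sorted(log.hot))  # [0, 1] [2, 3]
print(log.read(0), log.read(9))           # m0 m9
```

The point of the sketch is the read path: the caller asks for an offset and does not need to know whether the message is still on the fast tier or has been offloaded.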

End-to-end encryption

Obviously, security is very important, and nobody wants their messages snooped on. Of course, you will use TLS between the client and the messaging system (encryption in transit). Even so, the messaging system must decrypt the connection in order to understand what the client is asking for.

Then it saves the unencrypted message on disk. Of course, you will insist on encrypting the disk so that the message stays safe even if the disk is stolen (encryption at rest). But in both cases, the messaging system needs a key to the data; otherwise, all it would be handling is gibberish.

In many cases, this level of encryption is good enough, but if you want to be sure that no one can pry into your messages, you need end-to-end encryption. Before sending a message, the producer encrypts it with a key shared with the consumer that will receive it. The message is stored encrypted on the messaging system's disks, and the messaging system does not have the key. The messaging system can still do its job, but your messages are gibberish to it, so they stay private.

Pulsar supports end-to-end encryption in its Java client. Kafka has been discussing this feature in KIP-317.
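The data flow can be sketched in a few lines: the producer encrypts, the broker stores opaque bytes, and only the consumer holding the key can recover the message. Everything here is an assumption made for illustration, including the Broker class and the encrypt and decrypt helpers. In particular, the cipher is a toy keystream built from HMAC-SHA256 so the example needs only the standard library; it is not secure and is not what the Pulsar client does. A real system would use a vetted cipher from a cryptography library.

```python
import hashlib
import hmac
import os


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy keystream: HMAC-SHA256 over nonce + counter. Illustration only, NOT secure.
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Producer side: returns a 16-byte nonce followed by the XOR-ed payload."""
    nonce = os.urandom(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def decrypt(key: bytes, blob: bytes) -> bytes:
    """Consumer side: split off the nonce and undo the XOR."""
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


class Broker:
    """Stores and serves opaque bytes; it is never given the key."""

    def __init__(self):
        self.stored = []

    def publish(self, payload: bytes) -> None:
        self.stored.append(payload)

    def consume(self, index: int) -> bytes:
        return self.stored[index]


shared_key = os.urandom(32)          # known to producer and consumer only
broker = Broker()
broker.publish(encrypt(shared_key, b"order 42 shipped"))
print(decrypt(shared_key, broker.consume(0)))
```

The broker in this sketch can store, count, and deliver messages, which is all it needs to do its job, without ever being able to read them.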

Broker load balancing

Pulsar brokers are stateless. Stateless components are a great thing: when one is overloaded, you can add another to take up the load, and new clients can be directed to the new instance as they connect. But that does not help the instance that was overloaded in the first place. You need to move some work from the overloaded instance to the new one. In other words, the load needs to be rebalanced.

Pulsar rebalances broker load automatically. It monitors each broker's CPU, memory, and network usage (not disk; as mentioned, the brokers are stateless) and shifts load to keep things balanced. This means that a hot spot on a single broker is not a reason to expand the broker cluster; you only need to add brokers when the cluster as a whole reaches its capacity.

You can also do broker load balancing with Kafka. However, you must install an additional program, such as LinkedIn's Cruise Control, or use Confluent's rebalancing tool (which is a paid product).
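A much simplified sketch of the rebalancing idea: when a broker's load exceeds a threshold, move some of its work to the least loaded broker. The function rebalance, the single numeric load per unit of work (called a "bundle" here), and the threshold are assumptions for this example; Pulsar's actual load manager weighs several resource metrics and is not reproduced here.

```python
def rebalance(brokers, threshold):
    """Move bundles off brokers whose total load exceeds `threshold`.

    `brokers` maps broker name -> {bundle name: load} and is mutated in place.
    Returns the list of (bundle, source, target) moves that were made.
    """
    def load(name):
        return sum(brokers[name].values())

    moves = []
    for source in sorted(brokers, key=load, reverse=True):
        while load(source) > threshold and len(brokers[source]) > 1:
            target = min(brokers, key=load)
            # Shed the lightest bundle first to avoid overshooting.
            bundle = min(brokers[source], key=brokers[source].get)
            if target == source or load(target) + brokers[source][bundle] > threshold:
                break  # nowhere sensible to move work; the cluster needs more capacity
            brokers[target][bundle] = brokers[source].pop(bundle)
            moves.append((bundle, source, target))
    return moves


# Hypothetical cluster: broker-3 was just added and is empty.
cluster = {
    "broker-1": {"a": 40, "b": 30, "c": 25},
    "broker-2": {"d": 20},
    "broker-3": {},
}
print(rebalance(cluster, threshold=70))  # [('c', 'broker-1', 'broker-3')]
```

Because the brokers hold no message data, a move like this is only a change of ownership; nothing has to be copied between brokers.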

Community and ecosystem

Some people criticized me for not mentioning the size and richness of the Kafka community and ecosystem. That criticism is fair. In the community and ecosystem category, Kafka does better than Pulsar. Kafka has been open source for about 5 years longer than Pulsar, so it has a larger community, more related projects, and more questions and answers on Stack Overflow.

As people continue to contribute new components and integrations, the Pulsar community keeps growing. The people in the Slack community are friendly and helpful. I would also like to add that Pulsar was inspired by Kafka in many ways and stands on the shoulders of a giant. The Kafka project and community deserve praise and respect. Sometimes it may sound as if I do not respect Kafka. In fact, I respect Kafka very much; I just think even more highly of Pulsar.

A reasonable alternative to Kafka

In this article and the previous one, I have listed 12 reasons to choose Pulsar. What is even better is that the more I learn about Pulsar, the more reasons I find, so I may well write a third post on this topic. Stay tuned.

I think Pulsar is a reasonable alternative to Kafka. Pulsar supports most of Kafka's functionality and has additional advantages (I have listed 12 so far). The more people get to know Pulsar, the more momentum it gains. If you are evaluating streaming and/or queuing systems, consider Apache Pulsar.
