Pulsar: Is the next-generation messaging engine really that strong?

Background

We are currently selecting the technology stack for a new business, and that includes choosing message middleware. Given our situation, we want it to meet the following requirements:

Cloud-native friendly: our main language is currently Go, and operation and maintenance should be simple enough.
Official SDKs for multiple languages: we also maintain some Java, Python and other projects.
Ideally, convenient, easy-to-use features such as delayed messages, dead letter queues and multi-tenancy.

Horizontal scalability, high throughput and low latency go without saying. Almost all mature messaging middleware meets these requirements.

Based on these criteria, Pulsar caught our attention.

As an Apache top-level project, Pulsar supports all of the above features well.

Let's take a look at what it excels at.

Architecture

According to the official architecture diagram, Pulsar mainly consists of the following components:

Broker: a stateless component that can be scaled horizontally and mainly handles producer and consumer connections. It is similar to Kafka's broker but does not store data, so scaling it out is easier.
BookKeeper cluster: mainly used for persistent storage of data.
ZooKeeper: stores the metadata of the brokers and of BookKeeper.

At first glance, Pulsar depends on more components than Kafka, which does add system complexity, but the benefits are just as obvious.

Pulsar separates storage from compute, so scaling out is very simple: when more brokers are needed, just add broker nodes, with no other mental burden.

When storage becomes the bottleneck, only BookKeeper needs to be scaled out. No manual rebalancing is needed; BookKeeper spreads the load onto the new nodes automatically.

The same operations are much more complicated in Kafka, where adding brokers also means reassigning partitions to move existing data onto them.

Features

Multi-tenancy

Multi-tenancy is another essential feature: it isolates the data of different businesses and teams within the same cluster.

persistent://core/order/create-order

Taking this topic name as an example: core is the tenant, order is a namespace under that tenant, and create-order is the name of the topic. The persistent:// prefix marks it as a persistent topic.

In practice, tenants are usually divided by business team, and namespaces correspond to the different businesses within that team. This keeps topics clearly organized.
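To make the naming scheme concrete, here is a small Go sketch that splits a fully qualified topic name into domain, tenant, namespace and topic. TopicName and ParseTopicName are hypothetical helpers written for illustration, not part of the Pulsar Go SDK, and this parser only accepts the full {domain}://{tenant}/{namespace}/{topic} form.

```go
package main

import (
	"fmt"
	"strings"
)

// TopicName holds the parts of a fully qualified Pulsar topic name,
// {domain}://{tenant}/{namespace}/{topic}. Hypothetical helper type.
type TopicName struct {
	Domain    string // "persistent" or "non-persistent"
	Tenant    string // e.g. a business team, such as "core"
	Namespace string // e.g. a business line of that team, such as "order"
	Topic     string // e.g. "create-order"
}

// ParseTopicName splits a fully qualified topic name into its parts.
// Simplification: short names like "my-topic" (which Pulsar expands to
// persistent://public/default/my-topic) are rejected rather than expanded.
func ParseTopicName(name string) (TopicName, error) {
	domain, rest, ok := strings.Cut(name, "://")
	if !ok {
		return TopicName{}, fmt.Errorf("missing domain in %q", name)
	}
	if domain != "persistent" && domain != "non-persistent" {
		return TopicName{}, fmt.Errorf("unknown domain %q", domain)
	}
	parts := strings.Split(rest, "/")
	if len(parts) != 3 {
		return TopicName{}, fmt.Errorf("want tenant/namespace/topic, got %q", rest)
	}
	for _, p := range parts {
		if p == "" {
			return TopicName{}, fmt.Errorf("empty segment in %q", name)
		}
	}
	return TopicName{
		Domain:    domain,
		Tenant:    parts[0],
		Namespace: parts[1],
		Topic:     parts[2],
	}, nil
}
```

For example, ParseTopicName("persistent://core/order/create-order") yields tenant core, namespace order and topic create-order, which mirrors the team/business split described above.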

A comparison makes the point. How is this problem handled in messaging middleware without multi-tenancy?

Don't divide anything: all business lines are mixed together. This may not be a big problem while the team is small, but as the number of businesses grows it becomes very troublesome to manage.
Add a layer of abstraction in front of topics, which in essence is implementing multi-tenancy yourself.
Let each business team maintain its own cluster. This also solves the problem, but operation and maintenance complexity naturally increases.

From this, the importance of multi-tenancy is easy to see.

Functions

Pulsar also supports lightweight functions (Pulsar Functions). For example, some messages need to be cleaned and transformed before being published to another topic.

For this kind of requirement you can write a simple function: Pulsar provides an SDK for processing the data conveniently, and you then deploy the function to the broker with the official tooling.

Previously, even simple requirements like this might have meant running a stream processing engine yourself.

Usage

Beyond that, the upper-level APIs, such as producers and consumers, work much like those of other messaging middleware.

For example, Pulsar supports four consumption modes (subscription types):

Exclusive: only one consumer can attach to a subscription (identified by SubscriptionName) at a time; the range of applicable scenarios is narrow.
Failover: building on exclusive mode, multiple consumers can attach at the same time, but only one of them receives messages; if it goes down, one of the others quickly takes over. Useful in some scenarios.
Shared: N consumers can run simultaneously, and messages are delivered to them round-robin; if a consumer goes down without acknowledging a message, that message is redelivered to another consumer. This mode increases consumption throughput, but messages are not consumed in order.
KeyShared: building on shared mode, messages in a topic are effectively grouped by key, and all messages with the same key are consumed in order by the same consumer.

Shared mode, the third one, is probably the most widely used; when message ordering is required, KeyShared mode can be used instead.
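The difference between Shared and KeyShared dispatch can be illustrated with a small, self-contained simulation. Message, DispatchShared and DispatchKeyShared are hypothetical names, not SDK APIs. The real Key_Shared dispatcher in the broker assigns each consumer ranges of the key's hash; this sketch simply uses FNV-1a modulo the consumer count.

```go
package main

import "hash/fnv"

// Message is a minimal stand-in for a Pulsar message with a key.
type Message struct {
	Key     string
	Payload string
}

// DispatchShared assigns messages to n consumers round-robin, as in Shared
// mode: load is spread evenly, but messages with the same key may land on
// different consumers, so per-key ordering is lost.
func DispatchShared(msgs []Message, n int) [][]Message {
	out := make([][]Message, n)
	for i, m := range msgs {
		out[i%n] = append(out[i%n], m)
	}
	return out
}

// DispatchKeyShared picks a consumer by hashing the message key, so every
// message with a given key goes to the same consumer, in publish order.
// Simplification: FNV-1a modulo n instead of Pulsar's hash ranges.
func DispatchKeyShared(msgs []Message, n int) [][]Message {
	out := make([][]Message, n)
	for _, m := range msgs {
		h := fnv.New32a()
		h.Write([]byte(m.Key)) // hash.Hash writes never return an error
		i := int(h.Sum32() % uint32(n))
		out[i] = append(out[i], m)
	}
	return out
}
```

With five messages and two consumers, DispatchShared always splits them three and two regardless of key, while DispatchKeyShared keeps each key on a single consumer even if that makes the split uneven, which is the price paid for ordering.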

SDK

The officially supported SDKs are very rich; I also built a thin wrapper of my own on top of the official Go SDK.

Because we use dig, a lightweight dependency injection library, registering consumers looks like this:

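    // SetUpPulsar, NewConsumer, ConsumerConfigInstance, ConsumerOrder,
    // ConsumerInvoice and StartConsumer are helpers from my own wrapper
    // around the official Go SDK (github.com/apache/pulsar-client-go/pulsar).
    SetUpPulsar(lookupURL)
    container := dig.New()

    // Consumer for create-order messages, handled by ConsumerOrder.
    if err := container.Provide(func() ConsumerConfigInstance {
        return NewConsumer(&pulsar.ConsumerOptions{
            Topic:            "persistent://core/order/create-order",
            SubscriptionName: "order-sub",
            Type:             pulsar.Shared,
            Name:             "consumer01",
        }, ConsumerOrder)
    }); err != nil {
        log.Fatal(err)
    }

    // Consumer for update-order messages, handled by ConsumerInvoice.
    if err := container.Provide(func() ConsumerConfigInstance {
        return NewConsumer(&pulsar.ConsumerOptions{
            Topic:            "persistent://core/order/update-order",
            SubscriptionName: "order-sub",
            Type:             pulsar.Shared,
            Name:             "consumer02",
        }, ConsumerInvoice)
    }); err != nil {
        log.Fatal(err)
    }

    // Resolve every registered consumer from the container and start them.
    if err := container.Invoke(StartConsumer); err != nil {
        log.Fatal(err)
    }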

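The two container.Provide() calls register the consumer objects in the container. Because both constructors return the same type, ConsumerConfigInstance is presumably a dig.Out result object tagged with a value group; dig rejects a second plain provider of an already provided type.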

container.Invoke(StartConsumer) then retrieves all consumer objects from the container and starts consuming on all of them at the same time.

With my limited Go experience, this raised a question for me: does Go really need dependency injection?

Let's look at the benefits of dig:

Objects are managed by the container, which makes it easy to implement singletons.
When the dependencies between objects are complex, a lot of code for creating and obtaining objects can be eliminated, and the dependencies become clearer.

The disadvantages are:

It is less intuitive when reading the code: you can't see at a glance how a dependency was created.
It doesn't fit the simplicity that Go advocates.
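For comparison, this is what similar wiring looks like without a container: each constructor takes its dependencies as parameters and one function builds the graph by hand. All types here (Config, Client, OrderConsumer) and the wire function are hypothetical simplifications, not the wrapper's real types.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for the wrapper's real types.
type Config struct{ LookupURL string }

type Client struct{ cfg Config }

type OrderConsumer struct {
	client *Client
	topic  string
}

// Each constructor declares its dependencies explicitly as parameters.
func NewClient(cfg Config) *Client { return &Client{cfg: cfg} }

func NewOrderConsumer(c *Client, topic string) *OrderConsumer {
	return &OrderConsumer{client: c, topic: topic}
}

func (o *OrderConsumer) String() string {
	return fmt.Sprintf("%s@%s", o.topic, o.client.cfg.LookupURL)
}

// wire builds the object graph by hand. What a container does via
// reflection is spelled out, so every dependency is visible when reading
// the code, and the client is shared (a "singleton") simply by passing
// the same pointer to both consumers.
func wire(lookupURL string) []*OrderConsumer {
	client := NewClient(Config{LookupURL: lookupURL})
	return []*OrderConsumer{
		NewOrderConsumer(client, "persistent://core/order/create-order"),
		NewOrderConsumer(client, "persistent://core/order/update-order"),
	}
}
```

This is the trade-off described above: nothing is hidden, but the hand-written wiring grows with the dependency graph.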

For Java developers used to Spring, this is surely a delight, since it is a familiar flavor; but for Gophers who have never needed anything similar, it doesn't seem essential.

Go dependency injection libraries keep appearing, including some from large companies (dig itself comes from Uber, and Google has wire), which shows there is still demand for them.

I believe many Gophers dislike bringing complex Java concepts into Go. But dependency injection itself is language-agnostic, and many languages have their own implementations. In Java, Spring is not just a dependency injection framework; it has many complex features, and that is what daunts so many developers.

If dependency injection is all you need, as one narrowly scoped feature, it is not complicated to implement and does not add much complexity. Once you understand the concepts, a look at the source code will let you grasp it quickly.
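To back up the claim that a bare-bones container is not hard to build, here is a minimal reflection-based sketch with Provide and Invoke methods. It is a hypothetical toy, not how dig is implemented: one non-variadic constructor per type, each returning a single value, every built value cached as a singleton, and no cycle detection or value groups.

```go
package main

import (
	"fmt"
	"reflect"
)

// Container maps each result type to its constructor and caches the built
// value, so each type is constructed at most once.
type Container struct {
	ctors map[reflect.Type]reflect.Value
	built map[reflect.Type]reflect.Value
}

func NewContainer() *Container {
	return &Container{
		ctors: map[reflect.Type]reflect.Value{},
		built: map[reflect.Type]reflect.Value{},
	}
}

// Provide registers a constructor that returns exactly one value.
func (c *Container) Provide(ctor any) error {
	v := reflect.ValueOf(ctor)
	if v.Kind() != reflect.Func || v.Type().NumOut() != 1 {
		return fmt.Errorf("provide: want func with one result, got %T", ctor)
	}
	out := v.Type().Out(0)
	if _, dup := c.ctors[out]; dup {
		return fmt.Errorf("provide: %s already provided", out)
	}
	c.ctors[out] = v
	return nil
}

// Invoke resolves fn's parameters from the container, then calls fn.
func (c *Container) Invoke(fn any) error {
	v := reflect.ValueOf(fn)
	if v.Kind() != reflect.Func {
		return fmt.Errorf("invoke: want func, got %T", fn)
	}
	args, err := c.args(v.Type())
	if err != nil {
		return err
	}
	v.Call(args)
	return nil
}

// args builds one value per parameter of the function type ft.
func (c *Container) args(ft reflect.Type) ([]reflect.Value, error) {
	args := make([]reflect.Value, ft.NumIn())
	for i := range args {
		v, err := c.get(ft.In(i))
		if err != nil {
			return nil, err
		}
		args[i] = v
	}
	return args, nil
}

// get returns the cached value of type t, building it (and, recursively,
// its dependencies) on first use.
func (c *Container) get(t reflect.Type) (reflect.Value, error) {
	if v, ok := c.built[t]; ok {
		return v, nil
	}
	ctor, ok := c.ctors[t]
	if !ok {
		return reflect.Value{}, fmt.Errorf("no provider for %s", t)
	}
	args, err := c.args(ctor.Type())
	if err != nil {
		return reflect.Value{}, err
	}
	v := ctor.Call(args)[0]
	c.built[t] = v
	return v, nil
}
```

Usage follows the same shape as the dig example: Provide a constructor per type, then Invoke a function whose parameters the container fills in.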

Back to the SDK itself: at this stage the Go SDK has fewer features than the Java version (to be precise, only the Java version is fully featured), but all the core functionality is there, and it does not affect daily use.

Summary

This article covered some basic concepts and benefits of Pulsar and touched on dependency injection in Go. If you are doing technology selection as we are, Pulsar is worth considering.

I will continue to share Pulsar-related content; readers with relevant experience are welcome to share their opinions in the comments.

crossoverJie

引用和评论

StarRocks 升级注意事项

Apache Pulsar 技术系列 - 大规模延迟消息解析

【赵渝强老师】Kafka生产者的执行过程

【赵渝强老师】Kafka生产者的消息发送方式

【赵渝强老师】Kafka的消费者与消费者组

搭建Zookeeper、Kafka集群

【赵渝强老师】Kafka消息的消费模式