Apache Pulsar has become a very popular message middleware lately and is often described as the next generation of messaging middleware. Today, let's take a look at what makes it so impressive.

Overview

Apache Pulsar is a pub/sub messaging platform that uses Apache BookKeeper for persistence. It is a server-to-server messaging middleware, originally developed at Yahoo and open sourced in 2016. It then went through the Apache Incubator and graduated as an Apache top-level project in 2018. It provides the following features:

  • Geo-replication across regions
  • Multi-tenancy
  • Zero data loss
  • Zero rebalancing time
  • Unified queuing and streaming model
  • High scalability
  • High throughput
  • Pulsar Proxy
  • Pulsar Functions

Architecture

[Image]

Pulsar uses a layered architecture that separates the storage layer from the brokers. This architecture gives Pulsar the following benefits:

  • Brokers can be scaled independently
  • Storage (bookies) can be scaled independently
  • ZooKeeper, brokers, and bookies are easier to containerize
  • ZooKeeper provides cluster configuration and state storage

In a Pulsar cluster:

  • One or more brokers handle and load balance incoming messages from producers, dispatch messages to consumers, communicate with the Pulsar configuration store to handle various coordination tasks, store messages in BookKeeper instances (also known as bookies), and rely on a cluster-specific ZooKeeper cluster for certain tasks.
  • A BookKeeper cluster made up of one or more bookies handles persistent storage of messages.
  • A ZooKeeper cluster specific to that cluster handles coordination tasks between Pulsar clusters.

[Image]

For more information about Pulsar's architecture, please refer to: https://pulsar.apache.org/docs/en/concepts-architecture-overview/

Four subscription modes

There are four subscription modes in Pulsar: exclusive, shared, failover, and key_shared. These modes are shown in the figures below.

[Image]

[Image]

[Image]

[Image]

For a detailed introduction, refer to: https://pulsar.apache.org/docs/en/concepts-messaging/
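
To make the four modes more concrete, here is a minimal Python sketch of how each one might route messages to the consumers attached to a single subscription. It is a toy model written for this article, not Pulsar's real dispatcher: the `dispatch` function, the consumer names, and the crc32-based key routing are all assumptions, and failover is shown only in its steady state (one active consumer, no failure).

```python
from itertools import cycle
from zlib import crc32


def dispatch(mode, consumers, messages):
    """Return {consumer: [payloads]} for a list of (key, payload) messages.

    Toy model of one subscription; not Pulsar's real dispatcher.
    """
    if not consumers:
        raise ValueError("at least one consumer is required")
    received = {name: [] for name in consumers}

    if mode == "exclusive":
        # Only a single consumer may attach to the subscription.
        if len(consumers) > 1:
            raise ValueError("exclusive allows only one consumer")
        received[consumers[0]] = [payload for _, payload in messages]
    elif mode == "failover":
        # Several consumers attach, but only the active one receives messages;
        # the others are standbys (takeover on failure is not modeled here).
        received[consumers[0]] = [payload for _, payload in messages]
    elif mode == "shared":
        # Messages are spread round-robin across all consumers.
        for name, (_, payload) in zip(cycle(consumers), messages):
            received[name].append(payload)
    elif mode == "key_shared":
        # Messages with the same key always go to the same consumer.
        # crc32 is a stand-in hash chosen for this sketch.
        for key, payload in messages:
            index = crc32(key.encode("utf-8")) % len(consumers)
            received[consumers[index]].append(payload)
    else:
        raise ValueError(f"unknown subscription mode: {mode}")
    return received


messages = [("k1", "m1"), ("k1", "m2"), ("k2", "m3"), ("k2", "m4")]
shared = dispatch("shared", ["c1", "c2"], messages)
key_shared = dispatch("key_shared", ["c1", "c2"], messages)
print(shared)  # {'c1': ['m1', 'm3'], 'c2': ['m2', 'm4']}
```

In the shared result the two messages with key `k1` end up on different consumers, while in the key_shared result they always land on the same one. Which consumer that is depends on the hash, so it is not shown here.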

Better performance than Kafka

Pulsar's standout strength is performance. According to the benchmark cited below, Pulsar is much faster than Kafka: throughput is about 2.5 times higher and latency is about 40% lower.

Data source: https://streaml.io/pdf/Gigaom-Benchmarking-Streaming-Platforms.pdf

[Image]

[Image]

Note: The comparison uses 1 topic with 1 partition and 100-byte messages. Pulsar can send more than 220,000 messages per second.
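
For a sense of scale, the figures in the note can be converted into payload bandwidth. The snippet below is back-of-the-envelope arithmetic only: it counts message payload and ignores protocol overhead, batching, and replication, and the variable names are chosen for this sketch.

```python
MESSAGE_SIZE_BYTES = 100        # message size used in the comparison
MESSAGES_PER_SECOND = 220_000   # lower bound quoted for Pulsar

payload_bytes_per_second = MESSAGE_SIZE_BYTES * MESSAGES_PER_SECOND
payload_megabytes_per_second = payload_bytes_per_second / 1_000_000

print(f"{payload_megabytes_per_second:.0f} MB/s of payload")  # prints "22 MB/s of payload"
```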

Installation

Install Pulsar from the binary release

# Download the official binary package
[root@centos7 ~]# wget https://archive.apache.org/dist/pulsar/pulsar-2.8.0/apache-pulsar-2.8.0-bin.tar.gz
# Extract the archive
[root@centos7 ~]# tar zxf apache-pulsar-2.8.0-bin.tar.gz
[root@centos7 ~]# cd apache-pulsar-2.8.0
[root@centos7 apache-pulsar-2.8.0]# ll
total 72
drwxr-xr-x 3 root root   225 Jan 22  2020 bin
drwxr-xr-x 5 root root  4096 Jan 22  2020 conf
drwxr-xr-x 3 root root   132 Jul  6 11:47 examples
drwxr-xr-x 4 root root    66 Jul  6 11:47 instances
drwxr-xr-x 3 root root 16384 Jul  6 11:47 lib
-rw-r--r-- 1 root root 31639 Jan 22  2020 LICENSE
drwxr-xr-x 2 root root  4096 Jan 22  2020 licenses
-rw-r--r-- 1 root root  6612 Jan 22  2020 NOTICE
-rw-r--r-- 1 root root  1269 Jan 22  2020 README
# The bin directory contains the commands used to start Pulsar directly

Docker installation (the focus of this section)
[root@centos7 ~]# docker run -it \
 -p 6650:6650 \
 -p 8080:8080 \
 --mount source=pulsardata,target=/pulsar/data \
 --mount source=pulsarconf,target=/pulsar/conf \
 apachepulsar/pulsar:2.8.0 \
 bin/pulsar standalone

HTTP access uses port 8080, and access over the Pulsar binary protocol (used by the Java, Python, and other clients) uses port 6650.
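
As a small illustration of the two endpoints, the sketch below builds the URL a program would use for each port. The helper names and the default host are assumptions made for this example, `/admin/v2/clusters` is used as a sample admin REST path, and nothing here opens a network connection.

```python
BINARY_PORT = 6650  # Pulsar binary protocol, used by the client libraries
HTTP_PORT = 8080    # HTTP, used for the admin REST API


def service_url(host="localhost"):
    """URL to pass to a client library (Java, Python, C++, ...)."""
    return f"pulsar://{host}:{BINARY_PORT}"


def admin_url(path, host="localhost"):
    """URL of an admin REST endpoint served over HTTP."""
    return f"http://{host}:{HTTP_PORT}/{path.lstrip('/')}"


print(service_url())                    # pulsar://localhost:6650
print(admin_url("/admin/v2/clusters"))  # http://localhost:8080/admin/v2/clusters
```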

Pulsar Manager, the official visualization tool, can manage multiple Pulsar clusters from a web UI: https://pulsar.apache.org/docs/en/administration-pulsar-manager/

[root@centos7 ~]# docker pull apachepulsar/pulsar-manager:v0.2.0

[root@centos7 ~]# docker run -it \
   -p 9527:9527 -p 7750:7750 \
   -e SPRING_CONFIGURATION_FILE=/pulsar-manager/pulsar-manager/application.properties \
   apachepulsar/pulsar-manager:v0.2.0

Set the admin user and password

[root@centos7 ~]# CSRF_TOKEN=$(curl http://localhost:7750/pulsar-manager/csrf-token)
# Use double quotes around the headers so the shell expands $CSRF_TOKEN
[root@centos7 ~]# curl \
  -H "X-XSRF-TOKEN: $CSRF_TOKEN" \
  -H "Cookie: XSRF-TOKEN=$CSRF_TOKEN;" \
  -H "Content-Type: application/json" \
  -X PUT http://localhost:7750/pulsar-manager/users/superuser \
  -d '{"name": "admin", "password": "admin123", "description": "test", "email": "mingongge@test.org"}'

{"message":"Add super user success, please login"}
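
The same request can also be prepared from Python. The sketch below only builds the request object with the standard library and does not send it. `build_superuser_request` and the placeholder token are names invented for this example, while the URL, headers, and JSON fields mirror the curl command above.

```python
import json
from urllib.request import Request


def build_superuser_request(csrf_token, name, password, email,
                            base_url="http://localhost:7750"):
    """Build (but do not send) the PUT request that creates the super user."""
    body = {"name": name, "password": password,
            "description": "test", "email": email}
    return Request(
        url=f"{base_url}/pulsar-manager/users/superuser",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "X-XSRF-TOKEN": csrf_token,
            "Cookie": f"XSRF-TOKEN={csrf_token};",
            "Content-Type": "application/json",
        },
    )


# "example-token" stands in for the value returned by /pulsar-manager/csrf-token.
request = build_superuser_request("example-token", "admin", "admin123",
                                  "mingongge@test.org")
print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would actually send it, which requires a running Pulsar Manager and a real token.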

Open http://server_ip:9527 in a browser to reach the login page:

[Image]

Log in with the user name and password you just created, then configure and manage the server

[Image]

List

[Image]

Topic list

[Image]

Topic details

[Image]

Client configuration

Java client

The following is an example of a Java consumer configuration that uses shared subscriptions:

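import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.PulsarClientException;
import org.apache.pulsar.client.api.SubscriptionType;

public class SharedConsumerExample {
    private static final String SERVICE_URL = "pulsar://localhost:6650";
    private static final String TOPIC = "persistent://public/default/mq-topic-1";
    private static final String SUBSCRIPTION = "sub-1";

    public static void main(String[] args) throws PulsarClientException {
        // Connect to the broker over the Pulsar binary protocol
        PulsarClient client = PulsarClient.builder()
                .serviceUrl(SERVICE_URL)
                .build();

        Consumer<byte[]> consumer = client.newConsumer()
                .topic(TOPIC)
                .subscriptionName(SUBSCRIPTION)
                .subscriptionType(SubscriptionType.Shared)
                // If you'd like to restrict the receiver queue size
                .receiverQueueSize(10)
                .subscribe();

        // Release the consumer and the client when finished
        consumer.close();
        client.close();
    }
}

Python client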

Here is an example of a Python consumer configuration that uses shared subscriptions:

from pulsar import Client, ConsumerType

SERVICE_URL = "pulsar://localhost:6650"
TOPIC = "persistent://public/default/mq-topic-1"
SUBSCRIPTION = "sub-1"

client = Client(SERVICE_URL)
consumer = client.subscribe(
    TOPIC,
    SUBSCRIPTION,
    # If you'd like to restrict the receiver queue size
    receiver_queue_size=10,
    consumer_type=ConsumerType.Shared)

C++ client

Here is an example of a C++ consumer configuration that uses a shared subscription:

#include <pulsar/Client.h>

#include <iostream>
#include <string>

using namespace pulsar;

int main() {
    std::string serviceUrl = "pulsar://localhost:6650";
    std::string topic = "persistent://public/default/mq-topic-1";
    std::string subscription = "sub-1";

    Client client(serviceUrl);

    ConsumerConfiguration consumerConfig;
    // ConsumerShared is an enumerator in the pulsar namespace
    consumerConfig.setConsumerType(ConsumerShared);
    // If you'd like to restrict the receiver queue size
    consumerConfig.setReceiverQueueSize(10);

    Consumer consumer;
    Result result = client.subscribe(topic, subscription, consumerConfig, consumer);
    if (result != ResultOk) {
        std::cerr << "Failed to subscribe: " << result << std::endl;
        return 1;
    }

    client.close();
    return 0;
}

For more configuration options and operation guides, see the official documentation, which is very clear: https://pulsar.apache.org/docs/
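
All three client examples address the topic by its full name, which here follows the pattern `persistent://tenant/namespace/topic`. A minimal Python sketch of splitting and checking such a name is shown below. `parse_topic_name` and `TopicName` are hypothetical helpers written for this article, not part of any Pulsar client, and the check covers only the simple three-part form used in these examples.

```python
from typing import NamedTuple


class TopicName(NamedTuple):
    domain: str      # "persistent" or "non-persistent"
    tenant: str      # e.g. "public"
    namespace: str   # e.g. "default"
    topic: str       # e.g. "mq-topic-1"


def parse_topic_name(full_name):
    """Split 'domain://tenant/namespace/topic' into its four parts."""
    domain, separator, rest = full_name.partition("://")
    if not separator or domain not in ("persistent", "non-persistent"):
        raise ValueError(f"unexpected topic domain in {full_name!r}")
    parts = rest.split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected tenant/namespace/topic in {full_name!r}")
    return TopicName(domain, *parts)


name = parse_topic_name("persistent://public/default/mq-topic-1")
print(name.tenant, name.namespace, name.topic)  # public default mq-topic-1
```

A name with a missing separator, such as `persistent://public/defaultmq-topic-1`, is rejected with a `ValueError`, which makes this kind of typo easy to catch before subscribing.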

Summary

As a next-generation distributed message queue, Pulsar has many attractive features, such as geo-replication, multi-tenancy, scalability, and read-write isolation, and these make up for some of the shortcomings of competing products.


民工哥
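Experience shared by a veteran with more than 10 years in IT, who kept teaching himself and grew from a technical novice into the head of the information technology department at an internet company. SegmentFault (思否) Top Writer for 2019, 2020, and 2021.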