This article describes best practices for coding against the Pulsar client and provides production-grade sample code that you can use as a reference when integrating with Pulsar. In a production environment, Pulsar's address is usually obtained from a configuration center or through Kubernetes domain name discovery. That is not the focus of this article, so the examples use the constant PulsarConstant.SERVICE_HTTP_URL instead.
The examples in this article have been uploaded to GitHub.
Early client initialization and configuration
Initialize Client--demo level
import lombok.extern.slf4j.Slf4j;
import org.apache.pulsar.client.api.PulsarClient;
/**
* @author hezhangjian
*/
@Slf4j
public class DemoPulsarClientInit {
private static final DemoPulsarClientInit INSTANCE = new DemoPulsarClientInit();
private PulsarClient pulsarClient;
public static DemoPulsarClientInit getInstance() {
return INSTANCE;
}
public void init() throws Exception {
pulsarClient = PulsarClient.builder()
.serviceUrl(PulsarConstant.SERVICE_HTTP_URL)
.build();
}
public PulsarClient getPulsarClient() {
return pulsarClient;
}
}
The demo-level Pulsar client sets no custom parameters at initialization and does not handle initialization failures: if init fails, the exception is thrown directly to the caller.
Initialize Client--production-ready level
import io.netty.util.concurrent.DefaultThreadFactory;
import lombok.extern.slf4j.Slf4j;
import org.apache.pulsar.client.api.PulsarClient;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
/**
* @author hezhangjian
*/
@Slf4j
public class DemoPulsarClientInitRetry {
private static final DemoPulsarClientInitRetry INSTANCE = new DemoPulsarClientInitRetry();
private volatile PulsarClient pulsarClient;
private final ScheduledExecutorService executorService = Executors.newScheduledThreadPool(1, new DefaultThreadFactory("pulsar-cli-init"));
public static DemoPulsarClientInitRetry getInstance() {
return INSTANCE;
}
public void init() {
executorService.scheduleWithFixedDelay(this::initWithRetry, 0, 10, TimeUnit.SECONDS);
}
private void initWithRetry() {
try {
pulsarClient = PulsarClient.builder()
.serviceUrl(PulsarConstant.SERVICE_HTTP_URL)
.build();
log.info("pulsar client init success");
this.executorService.shutdown();
} catch (Exception e) {
log.error("init pulsar error, exception is ", e);
}
}
public PulsarClient getPulsarClient() {
return pulsarClient;
}
}
In a real environment, we often need a failed Pulsar client initialization not to block the startup of the microservice. That is, if the Pulsar client cannot be started, we keep retrying to create it in the background.
The code above achieves this with a volatile field plus repeated creation attempts on a timer. Once the client is created successfully, the timer thread is destroyed.
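The same retry pattern can be isolated from Pulsar and exercised with only the JDK. The following is a minimal sketch, not the article's implementation: RetryingInitializer and its method names are hypothetical, and the factory stands in for the PulsarClient.builder()...build() call.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: retries a factory on a timer until it succeeds once. */
class RetryingInitializer<T> {
    private final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    private final CountDownLatch ready = new CountDownLatch(1);
    private final Callable<T> factory;
    private final long delayMillis;
    // volatile so that threads calling get() see the instance published by the timer thread
    private volatile T value;

    RetryingInitializer(Callable<T> factory, long delayMillis) {
        this.factory = factory;
        this.delayMillis = delayMillis;
    }

    /** Starts the background attempts; returns immediately. */
    void init() {
        executor.scheduleWithFixedDelay(this::tryOnce, 0, delayMillis, TimeUnit.MILLISECONDS);
    }

    private void tryOnce() {
        try {
            value = factory.call();
            executor.shutdown(); // success: no further attempts, the timer thread exits
            ready.countDown();
        } catch (Exception e) {
            // Swallow the failure so the schedule stays alive and retries after the delay.
            System.err.println("init failed, will retry: " + e);
        }
    }

    /** Returns null until initialization has succeeded. */
    T get() {
        return value;
    }

    /** Waits for the first successful initialization; returns false on timeout or interrupt. */
    boolean awaitReady(long timeout, TimeUnit unit) {
        try {
            return ready.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    boolean isTimerStopped() {
        return executor.isShutdown();
    }
}
```

Callers that use get() must be prepared for null while the dependency is still unavailable, which is the price of letting the service start without it.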
Initialize Client--Commercial level
import io.netty.util.concurrent.DefaultThreadFactory;
import lombok.extern.slf4j.Slf4j;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.SizeUnit;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
/**
* @author hezhangjian
*/
@Slf4j
public class DemoPulsarClientInitUltimate {
private static final DemoPulsarClientInitUltimate INSTANCE = new DemoPulsarClientInitUltimate();
private volatile PulsarClient pulsarClient;
private final ScheduledExecutorService executorService = Executors.newScheduledThreadPool(1, new DefaultThreadFactory("pulsar-cli-init"));
public static DemoPulsarClientInitUltimate getInstance() {
return INSTANCE;
}
public void init() {
executorService.scheduleWithFixedDelay(this::initWithRetry, 0, 10, TimeUnit.SECONDS);
}
private void initWithRetry() {
try {
pulsarClient = PulsarClient.builder()
.serviceUrl(PulsarConstant.SERVICE_HTTP_URL)
.ioThreads(4)
.listenerThreads(10)
.memoryLimit(64, SizeUnit.MEGA_BYTES)
.operationTimeout(5, TimeUnit.SECONDS)
.connectionTimeout(15, TimeUnit.SECONDS)
.build();
log.info("pulsar client init success");
this.executorService.shutdown();
} catch (Exception e) {
log.error("init pulsar error, exception is ", e);
}
}
public PulsarClient getPulsarClient() {
return pulsarClient;
}
}
The commercial-grade Pulsar client adds 5 configuration parameters:
- ioThreads: Netty's IO threads, which are responsible for network IO operations. If business traffic is heavy, increase the number of ioThreads;
- listenerThreads: responsible for invoking the callbacks of consumers started in listener mode. It is recommended to configure more threads than the number of partitions;
- memoryLimit: currently used to limit the maximum memory available to Pulsar producers. It effectively prevents messages from backing up on the producer side during network interruptions, Pulsar failures, and similar scenarios, which would otherwise cause the Java program to run out of memory (OOM);
- operationTimeout: the timeout for some metadata operations. Pulsar defaults to 30s, which is somewhat conservative and can be lowered according to your own network conditions and processing performance;
- connectionTimeout: the timeout for connecting to Pulsar. The configuration principle is the same as above.
Client advanced parameters (memory allocation related)
We can also control the Pulsar client's memory allocation by passing Java system properties. Here are a few important ones:
- pulsar.allocator.pooled: true uses the off-heap memory pool; false uses heap memory allocation without a pool. By default, the efficient off-heap memory pool is used;
- pulsar.allocator.exit_on_oom: whether to shut down the JVM when memory runs out. The default is false;
- pulsar.allocator.out_of_memory_policy: introduced in https://github.com/apache/pulsar/pull/12200 and, at the time of writing, not yet included in an official release. It configures the behavior when off-heap memory is insufficient. The options are FallbackToHeap and ThrowException, and the default is FallbackToHeap. If you do not want the memory used for message serialization to affect heap allocation, configure it to ThrowException.
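These properties can be passed on the command line (for example -Dpulsar.allocator.pooled=true) or set programmatically. The following is a minimal sketch of the programmatic route. It assumes the client reads the properties once, when its allocator is first initialized, so they must be set before any Pulsar client class is used; verify this against your client version. AllocatorProperties is a hypothetical helper name.

```java
/** Hypothetical helper: sets the allocator properties discussed above. */
class AllocatorProperties {
    /** Call this before the first use of the Pulsar client (assumed to read the values only once). */
    static void apply() {
        // Keep the default off-heap pooled allocator.
        System.setProperty("pulsar.allocator.pooled", "true");
        // Do not shut down the JVM when allocation fails.
        System.setProperty("pulsar.allocator.exit_on_oom", "false");
        // Only honored by client versions that contain the pull request mentioned above.
        System.setProperty("pulsar.allocator.out_of_memory_policy", "ThrowException");
    }
}
```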
Producer
Initialize important parameters of producer
maxPendingMessages
The size of the producer's send queue. Configure it reasonably according to the actual traffic of the topic to avoid OOM during network interruptions and Pulsar failures. It is recommended to configure either this parameter or the client-side memoryLimit.
messageRoutingMode
Message routing mode. The default is RoundRobinPartition. Choose according to business needs. If ordering must be preserved, SinglePartition is the usual choice, so that messages with the same key are sent to the same partition.
autoUpdatePartitions
Automatically updates partition information. If the partition information of the topic does not change, this does not need to be enabled, which reduces load on the cluster.
Batching-related parameters
Because batch sending is implemented underneath with scheduled tasks, enabling batching is not recommended if the topic carries few messages. In particular, a large number of scheduled tasks with short intervals can drive up the CPU usage of the Netty threads.
- enableBatching: whether to enable batch sending;
- batchingMaxMessages: the maximum number of messages in a batch;
- batchingMaxPublishDelay: the interval of the scheduled task that sends batches.
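The producer parameters above map onto the Java client's ProducerBuilder. The following configuration sketch assumes a PulsarClient that has already been initialized; the class name ProducerConfigSketch and every numeric value are illustrative only and should be tuned to the traffic of the topic.

```java
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.MessageRoutingMode;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.PulsarClientException;

/** Hypothetical example class; values are placeholders, not recommendations. */
class ProducerConfigSketch {
    static Producer<byte[]> create(PulsarClient client, String topic) throws PulsarClientException {
        return client.newProducer()
                .topic(topic)
                // Bound the send queue so a broker outage cannot exhaust memory.
                .maxPendingMessages(500)
                // Keep messages with the same key on the same partition.
                .messageRoutingMode(MessageRoutingMode.SinglePartition)
                // Partition count is assumed to be fixed, so skip the periodic refresh.
                .autoUpdatePartitions(false)
                // Batching only pays off on topics with enough traffic.
                .enableBatching(true)
                .batchingMaxMessages(100)
                .batchingMaxPublishDelay(10, TimeUnit.MILLISECONDS)
                .create();
    }
}
```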
Static producer initialization
A static producer is one that is not started or shut down as the business changes. In that case, initialize the producer right after the microservice starts and the client has been initialized. The examples are as follows:
One thread per producer, suitable for scenarios with a small number of producers
import io.netty.util.concurrent.DefaultThreadFactory;
import lombok.extern.slf4j.Slf4j;
import org.apache.pulsar.client.api.Producer;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
/**
* @author hezhangjian
*/
@Slf4j
public class DemoPulsarStaticProducerInit {
    private final ScheduledExecutorService executorService = Executors.newScheduledThreadPool(1, new DefaultThreadFactory("pulsar-producer-init"));
    private final String topic;
    private volatile Producer<byte[]> producer;
    public DemoPulsarStaticProducerInit(String topic) {
        this.topic = topic;
    }
    public void init() {
        executorService.scheduleWithFixedDelay(this::initWithRetry, 0, 10, TimeUnit.SECONDS);
    }
    private void initWithRetry() {
        try {
            final DemoPulsarClientInit instance = DemoPulsarClientInit.getInstance();
            producer = instance.getPulsarClient().newProducer().topic(topic).create();
            // Stop retrying once the producer exists; otherwise a new producer would be created every 10 seconds.
            executorService.shutdown();
        } catch (Exception e) {
            log.error("init pulsar producer error, exception is ", e);
        }
    }
    public Producer<byte[]> getProducer() {
        return producer;
    }
}
Multiple producers and one thread, suitable for scenarios with a large number of producers
import io.netty.util.concurrent.DefaultThreadFactory;
import lombok.extern.slf4j.Slf4j;
import org.apache.pulsar.client.api.Producer;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
/**
* @author hezhangjian
*/
@Slf4j
public class DemoPulsarStaticProducersInit {
    private final ScheduledExecutorService executorService = Executors.newScheduledThreadPool(1, new DefaultThreadFactory("pulsar-producer-init"));
    // Created eagerly so that producers.add(...) cannot throw a NullPointerException.
    private final CopyOnWriteArrayList<Producer<byte[]>> producers = new CopyOnWriteArrayList<>();
    // Index of the next topic to initialize; only touched by the single init thread.
    private int initIndex;
    private final List<String> topics;
    public DemoPulsarStaticProducersInit(List<String> topics) {
        this.topics = topics;
    }
    public void init() {
        executorService.scheduleWithFixedDelay(this::initWithRetry, 0, 10, TimeUnit.SECONDS);
    }
    private void initWithRetry() {
        for (; initIndex < topics.size(); initIndex++) {
            try {
                final DemoPulsarClientInit instance = DemoPulsarClientInit.getInstance();
                final Producer<byte[]> producer = instance.getPulsarClient().newProducer().topic(topics.get(initIndex)).create();
                producers.add(producer);
            } catch (Exception e) {
                log.error("init pulsar producer error, exception is ", e);
                // Leave initIndex unchanged so the next run retries this topic.
                return;
            }
        }
        // All producers have been created; stop the timer thread.
        executorService.shutdown();
    }
    public CopyOnWriteArrayList<Producer<byte[]>> getProducers() {
        return producers;
    }
}
Example of dynamically creating and destroying producers
In some businesses, producers are started and destroyed dynamically according to business needs, for example when receiving data from vehicles on the road and sending it to specified topics. Keeping all producers resident in memory would consume a large amount of memory, so we can manage the producer life cycle with something similar to an LRU cache.
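import com.github.benmanes.caffeine.cache.AsyncCacheLoader;
import com.github.benmanes.caffeine.cache.AsyncLoadingCache;
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.RemovalListener;
import lombok.extern.slf4j.Slf4j;
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.ProducerBuilder;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;
/**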
* @author hezhangjian
*/
@Slf4j
public class DemoPulsarDynamicProducerInit {
/**
* topic -- producer
*/
private AsyncLoadingCache<String, Producer<byte[]>> producerCache;
public DemoPulsarDynamicProducerInit() {
this.producerCache = Caffeine.newBuilder()
.expireAfterAccess(600, TimeUnit.SECONDS)
.maximumSize(3000)
.removalListener((RemovalListener<String, Producer<byte[]>>) (topic, value, cause) -> {
log.info("topic {} cache removed, because of {}", topic, cause);
try {
value.close();
} catch (Exception e) {
log.error("close failed, ", e);
}
})
.buildAsync(new AsyncCacheLoader<>() {
@Override
public CompletableFuture<Producer<byte[]>> asyncLoad(String topic, Executor executor) {
return acquireFuture(topic);
}
@Override
public CompletableFuture<Producer<byte[]>> asyncReload(String topic, Producer<byte[]> oldValue,
Executor executor) {
return acquireFuture(topic);
}
});
}
private CompletableFuture<Producer<byte[]>> acquireFuture(String topic) {
CompletableFuture<Producer<byte[]>> future = new CompletableFuture<>();
try {
ProducerBuilder<byte[]> builder = DemoPulsarClientInit.getInstance().getPulsarClient().newProducer().enableBatching(true);
final Producer<byte[]> producer = builder.topic(topic).create();
future.complete(producer);
} catch (Exception e) {
log.error("create producer exception ", e);
future.completeExceptionally(e);
}
return future;
}
}
In this mode, you can handle the returned CompletableFuture<Producer<byte[]>> gracefully in a streaming style.
Sending when message loss is acceptable
final CompletableFuture<Producer<byte[]>> cacheFuture = producerCache.get(topic);
cacheFuture.whenComplete((producer, e) -> {
if (e != null) {
log.error("create pulsar client exception ", e);
return;
}
try {
producer.sendAsync(msg).whenComplete(((messageId, throwable) -> {
if (throwable != null) {
log.error("send producer msg error ", throwable);
return;
}
log.info("topic {} send success, msg id is {}", topic, messageId);
}));
} catch (Exception ex) {
log.error("send async failed ", ex);
}
});
The above is a callback that correctly handles creation failure and send failure. However, Pulsar is not always available in a production environment: sends can fail because of virtual machine failures, Pulsar service upgrades, and so on. If you want to ensure that a message is sent successfully in those cases, you need to retry the send.
Sending that tolerates message loss only in extreme scenarios
// Assumed retry limit, matching the 7 backoff steps used in the explanation of this example.
private static final int MAX_RETRY_TIMES = 7;
final Timer timer = new HashedWheelTimer();
/**
 * @param retryTimes number of retries already performed; pass 0 for the first attempt
 */
private void sendMsgWithRetry(String topic, byte[] msg, int retryTimes) {
    final CompletableFuture<Producer<byte[]>> cacheFuture = producerCache.get(topic);
    cacheFuture.whenComplete((producer, e) -> {
        if (e != null) {
            log.error("create pulsar client exception ", e);
            return;
        }
        try {
            producer.sendAsync(msg).whenComplete(((messageId, throwable) -> {
                if (throwable == null) {
                    log.info("topic {} send success, msg id is {}", topic, messageId);
                    return;
                }
                log.error("send producer msg error ", throwable);
                if (retryTimes < MAX_RETRY_TIMES) {
                    // Exponential backoff: wait 1s, 2s, 4s, ... before each successive retry.
                    timer.newTimeout(timeout -> DemoPulsarDynamicProducerInit.this.sendMsgWithRetry(topic, msg, retryTimes + 1), 1L << retryTimes, TimeUnit.SECONDS);
                }
            }));
        } catch (Exception ex) {
            log.error("send async failed ", ex);
        }
    });
}
Here, a failed send is retried with backoff, so a Pulsar server outage can be tolerated for a period of time. For example, with 7 backoff retries and an initial interval of 1s, an outage of 1+2+4+8+16+32+64=127s can be tolerated, which is enough to meet the requirements of most production environments.
Because an outage can in theory last longer than 127s, in extreme scenarios the failure still has to be returned to the upstream caller.
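The arithmetic above can be checked with a few lines of plain Java. This is only an illustration of the doubling schedule; BackoffSchedule and its method names are hypothetical.

```java
/** Hypothetical helper that reproduces the doubling backoff schedule. */
class BackoffSchedule {
    /** Delay in seconds before retry number attempt (0-based): 1, 2, 4, ... */
    static long delaySeconds(int attempt) {
        return 1L << attempt;
    }

    /** Total outage duration, in seconds, covered by the given number of retries. */
    static long totalSeconds(int retries) {
        long total = 0;
        for (int attempt = 0; attempt < retries; attempt++) {
            total += delaySeconds(attempt);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(totalSeconds(7)); // prints 127
    }
}
```

Each additional retry roughly doubles the tolerated outage, so the retry count is the main knob for trading memory held by pending messages against outage tolerance.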
Strictly guaranteeing order at the producer partition level
The key to strict ordering on the producer side is to send only one message at a time and to send the next one only after the previous send is confirmed successful. This can be implemented in either synchronous or asynchronous mode:
- In synchronous mode, send in a loop until the previous message is sent successfully, then start sending the next message;
- In asynchronous mode, observe the future of the previous send; keep retrying if it fails, and start sending the next message once it succeeds.
It is worth mentioning that in this mode, different partitions can be sent in parallel, using an OrderedExecutor or one thread per partition.
Example of synchronous mode:
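import lombok.extern.slf4j.Slf4j;
import org.apache.pulsar.client.api.MessageId;
import org.apache.pulsar.client.api.Producer;
/**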
* @author hezhangjian
*/
@Slf4j
public class DemoPulsarProducerSyncStrictlyOrdered {
Producer<byte[]> producer;
public void sendMsg(byte[] msg) {
while (true) {
try {
final MessageId messageId = producer.send(msg);
log.info("topic {} send success, msg id is {}", producer.getTopic(), messageId);
break;
} catch (Exception e) {
log.error("exception is ", e);
}
}
}
}
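The asynchronous mode described earlier in this section can be sketched with plain CompletableFuture chaining. This is a minimal, JDK-only sketch under stated assumptions: OrderedAsyncSender is a hypothetical class, and the sendAsync function stands in for something like producer.sendAsync(msg). It retries immediately, whereas a real implementation would wait (back off) between attempts.

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

/** Hypothetical sketch: sends one message at a time, retrying each until it succeeds. */
class OrderedAsyncSender<M, R> {
    private final Function<M, CompletableFuture<R>> sendAsync;
    // Completes when every message enqueued so far has been sent successfully.
    private CompletableFuture<Void> tail = CompletableFuture.completedFuture(null);

    OrderedAsyncSender(Function<M, CompletableFuture<R>> sendAsync) {
        this.sendAsync = sendAsync;
    }

    /** Enqueues msg; it is sent only after all earlier messages have succeeded. */
    synchronized CompletableFuture<R> send(M msg) {
        CompletableFuture<R> result = tail.thenCompose(ignored -> sendUntilSuccess(msg));
        tail = result.thenRun(() -> { });
        return result;
    }

    private CompletableFuture<R> sendUntilSuccess(M msg) {
        CompletableFuture<R> done = new CompletableFuture<>();
        attempt(msg, done);
        return done;
    }

    private void attempt(M msg, CompletableFuture<R> done) {
        CompletableFuture<R> sendFuture;
        try {
            sendFuture = sendAsync.apply(msg);
        } catch (RuntimeException e) {
            // Treat a synchronous failure like a failed future so the chain is never broken.
            sendFuture = new CompletableFuture<>();
            sendFuture.completeExceptionally(e);
        }
        sendFuture.whenComplete((result, error) -> {
            if (error == null) {
                done.complete(result);
            } else {
                // Retry the same message; later messages stay queued behind it.
                attempt(msg, done);
            }
        });
    }
}
```

Because a message's future only completes on success, the chain never skips ahead, which preserves order at the cost of blocking all later messages during an outage. One sender instance per partition keeps partitions independent.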
Consumer
Initialize important parameters of the consumer
receiverQueueSize
Note: when messages cannot be processed in time, the consumer's receive buffer queue backs up in memory. Configure it reasonably to prevent OOM.
autoUpdatePartitions
Automatically updates partition information. If the partition information of the topic does not change, this does not need to be enabled, which reduces load on the cluster.
subscriptionType
The subscription type is determined according to business needs.
subscriptionInitialPosition
The starting position of the subscription, either the earliest or the latest message, chosen according to business needs.
messageListener
Consume in listener mode: you only provide a callback function and do not need to call receive() to pull messages yourself. Unless there are special requirements, listener mode is recommended.
ackTimeout
When the server has pushed a message but the consumer does not acknowledge it in time, the message is pushed to the consumer again after ackTimeout. This is the redelivery mechanism.
Note that redelivery should be used only to retry recoverable errors. For example, if the code decodes the message and decoding fails, redelivery is not appropriate, because the client would keep retrying a message that can never succeed.
If you are not sure, you can also configure the deadLetterPolicy described below to prevent messages from being retried forever.
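One way to keep non-recoverable failures out of the redelivery loop is to classify the outcome before choosing between acknowledge and negativeAcknowledge. The following JDK-only sketch illustrates the idea; RedeliveryDecision, Action, and NonRecoverableException are hypothetical names, and the handler stands in for the business logic.

```java
import java.util.function.Consumer;

/** Hypothetical sketch: decides what to do with a message after handling it. */
class RedeliveryDecision {
    enum Action { ACK, NEGATIVE_ACK, ACK_AND_DISCARD }

    /** Marker for errors that retrying cannot fix, such as a payload that fails to decode. */
    static class NonRecoverableException extends RuntimeException {
        NonRecoverableException(String message) {
            super(message);
        }
    }

    static Action decide(byte[] payload, Consumer<byte[]> handler) {
        try {
            handler.accept(payload);
            return Action.ACK;
        } catch (NonRecoverableException e) {
            // Redelivery would fail the same way every time, so acknowledge and drop (or log) the message.
            return Action.ACK_AND_DISCARD;
        } catch (RuntimeException e) {
            // Possibly transient (for example a downstream outage): ask for redelivery.
            return Action.NEGATIVE_ACK;
        }
    }
}
```

In a listener, ACK and ACK_AND_DISCARD would both map to an acknowledge call, and NEGATIVE_ACK to negativeAcknowledge.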
negativeAckRedeliveryDelay
The delay before redelivery is triggered after the client calls negativeAcknowledge. The precautions for the redelivery mechanism are the same as for ackTimeout.
Note that using ackTimeout and negativeAckRedeliveryDelay at the same time is not recommended. negativeAck is generally the better choice because it gives users more flexible control. If ackTimeout is set unreasonably and the processing time is uncertain, messages may be retried unnecessarily.
deadLetterPolicy
Configures the maximum number of redeliveries and the dead letter topic.
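The consumer parameters above map onto the Java client's ConsumerBuilder, whose method names are used below. This is a configuration sketch that assumes an already initialized PulsarClient; the class name ConsumerConfigSketch, the subscription name, the dead letter topic suffix, and all numeric values are placeholders.

```java
import java.util.concurrent.TimeUnit;
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.DeadLetterPolicy;
import org.apache.pulsar.client.api.MessageListener;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.PulsarClientException;
import org.apache.pulsar.client.api.SubscriptionInitialPosition;
import org.apache.pulsar.client.api.SubscriptionType;

/** Hypothetical example class; values are placeholders, not recommendations. */
class ConsumerConfigSketch {
    static Consumer<byte[]> subscribe(PulsarClient client, String topic,
                                      MessageListener<byte[]> listener) throws PulsarClientException {
        return client.newConsumer()
                .topic(topic)
                .subscriptionName("demo-subscription")
                .subscriptionType(SubscriptionType.Shared)
                .subscriptionInitialPosition(SubscriptionInitialPosition.Earliest)
                // Bound the local buffer so a slow consumer cannot exhaust memory.
                .receiverQueueSize(500)
                .autoUpdatePartitions(false)
                // Use negative acknowledgment rather than ackTimeout for redelivery.
                .negativeAckRedeliveryDelay(30, TimeUnit.SECONDS)
                // After 5 redeliveries, move the message to a dead letter topic.
                .deadLetterPolicy(DeadLetterPolicy.builder()
                        .maxRedeliverCount(5)
                        .deadLetterTopic(topic + "-DLQ")
                        .build())
                .messageListener(listener)
                .subscribe();
    }
}
```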
Consumer initialization principles
A consumer can only work once it has been created successfully. Unlike a producer, which can return a failure to the upstream caller, a consumer has to keep retrying until it is created. The sample code is as follows. Note: consumers and topics can have a one-to-many relationship; a consumer can subscribe to multiple topics.
One thread per consumer, suitable for scenarios with a small number of consumers
import io.netty.util.concurrent.DefaultThreadFactory;
import lombok.extern.slf4j.Slf4j;
import org.apache.pulsar.client.api.Consumer;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
/**
* @author hezhangjian
*/
@Slf4j
public class DemoPulsarConsumerInit {
    private final ScheduledExecutorService executorService = Executors.newScheduledThreadPool(1, new DefaultThreadFactory("pulsar-consumer-init"));
    private final String topic;
    private volatile Consumer<byte[]> consumer;
    public DemoPulsarConsumerInit(String topic) {
        this.topic = topic;
    }
    public void init() {
        executorService.scheduleWithFixedDelay(this::initWithRetry, 0, 10, TimeUnit.SECONDS);
    }
    private void initWithRetry() {
        try {
            final DemoPulsarClientInit instance = DemoPulsarClientInit.getInstance();
            // The consumer builder requires a subscription name; "demo-subscription" is a placeholder.
            consumer = instance.getPulsarClient().newConsumer().topic(topic).subscriptionName("demo-subscription").messageListener(new DemoMessageListener<>()).subscribe();
            // Stop retrying once the consumer has been created.
            executorService.shutdown();
        } catch (Exception e) {
            log.error("init pulsar consumer error, exception is ", e);
        }
    }
    public Consumer<byte[]> getConsumer() {
        return consumer;
    }
}
Multiple consumers and one thread, suitable for scenarios with a large number of consumers
import io.netty.util.concurrent.DefaultThreadFactory;
import lombok.extern.slf4j.Slf4j;
import org.apache.pulsar.client.api.Consumer;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
/**
* @author hezhangjian
*/
@Slf4j
public class DemoPulsarConsumersInit {
    private final ScheduledExecutorService executorService = Executors.newScheduledThreadPool(1, new DefaultThreadFactory("pulsar-consumer-init"));
    // Created eagerly so that consumers.add(...) cannot throw a NullPointerException.
    private final CopyOnWriteArrayList<Consumer<byte[]>> consumers = new CopyOnWriteArrayList<>();
    // Index of the next topic to subscribe to; only touched by the single init thread.
    private int initIndex;
    private final List<String> topics;
    public DemoPulsarConsumersInit(List<String> topics) {
        this.topics = topics;
    }
    public void init() {
        executorService.scheduleWithFixedDelay(this::initWithRetry, 0, 10, TimeUnit.SECONDS);
    }
    private void initWithRetry() {
        for (; initIndex < topics.size(); initIndex++) {
            try {
                final DemoPulsarClientInit instance = DemoPulsarClientInit.getInstance();
                // The consumer builder requires a subscription name; "demo-subscription" is a placeholder.
                final Consumer<byte[]> consumer = instance.getPulsarClient().newConsumer().topic(topics.get(initIndex)).subscriptionName("demo-subscription").messageListener(new DemoMessageListener<>()).subscribe();
                consumers.add(consumer);
            } catch (Exception e) {
                log.error("init pulsar consumer error, exception is ", e);
                // Leave initIndex unchanged so the next run retries this topic.
                return;
            }
        }
        // All consumers have been created; stop the timer thread.
        executorService.shutdown();
    }
    public CopyOnWriteArrayList<Consumer<byte[]>> getConsumers() {
        return consumers;
    }
}
Achieving at-least-once semantics in the consumer
Use manual acknowledgment and acknowledge a message only after it has been processed successfully. If processing fails, you can retry yourself or retry through the negativeAck mechanism.
Synchronous mode example
Note that if message processing times vary widely, synchronous processing can prevent messages that could be handled quickly from being processed promptly.
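import lombok.extern.slf4j.Slf4j;
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.MessageListener;
/**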
* @author hezhangjian
*/
@Slf4j
public class DemoMessageListenerSyncAtLeastOnce<T> implements MessageListener<T> {
@Override
public void received(Consumer<T> consumer, Message<T> msg) {
try {
final boolean result = syncPayload(msg.getData());
if (result) {
consumer.acknowledgeAsync(msg);
} else {
consumer.negativeAcknowledge(msg);
}
} catch (Exception e) {
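// The business method may throw an exception
log.error("exception is ", e);
consumer.negativeAcknowledge(msg);
}
}
/**
* Simulates a business method that executes synchronously
* @param msg the message body
* @return whether processing succeeded
*/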
private boolean syncPayload(byte[] msg) {
return System.currentTimeMillis() % 2 == 0;
}
}
Asynchronous mode example
In asynchronous mode you need to consider memory limits: messages can be consumed from the broker quickly without being blocked by business operations, so there may be a large number of in-flight messages. In Shared or KeyShared mode, this can be limited with maxUnAckedMessage. In Failover mode, you can block message pulling when the consumer is busy, as described in the following section, by checking the number of in-flight messages.
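import lombok.extern.slf4j.Slf4j;
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.MessageListener;
/**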
* @author hezhangjian
*/
@Slf4j
public class DemoMessageListenerAsyncAtLeastOnce<T> implements MessageListener<T> {
@Override
public void received(Consumer<T> consumer, Message<T> msg) {
try {
asyncPayload(msg.getData(), new DemoSendCallback() {
@Override
public void callback(Exception e) {
if (e == null) {
consumer.acknowledgeAsync(msg);
} else {
log.error("exception is ", e);
consumer.negativeAcknowledge(msg);
}
}
});
} catch (Exception e) {
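// The business method may throw an exception
consumer.negativeAcknowledge(msg);
}
}
/**
* Simulates a business method that executes asynchronously
* @param msg the message body
* @param demoSendCallback callback of the asynchronous function
*/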
private void asyncPayload(byte[] msg, DemoSendCallback demoSendCallback) {
if (System.currentTimeMillis() % 2 == 0) {
demoSendCallback.callback(null);
} else {
demoSendCallback.callback(new Exception("exception"));
}
}
}
Blocking message pulling when the consumer is busy
When the consumer cannot keep up, block in the listener method and stop doing business processing, so that too many messages do not accumulate in the microservice and cause OOM. You can control this with a RateLimiter or a Semaphore.
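import lombok.extern.slf4j.Slf4j;
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.MessageListener;
/**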
* @author hezhangjian
*/
@Slf4j
public class DemoMessageListenerAsyncAtLeastOnce<T> implements MessageListener<T> {
@Override
public void received(Consumer<T> consumer, Message<T> msg) {
try {
asyncPayload(msg.getData(), new DemoSendCallback() {
@Override
public void callback(Exception e) {
if (e == null) {
consumer.acknowledgeAsync(msg);
} else {
log.error("exception is ", e);
consumer.negativeAcknowledge(msg);
}
}
});
} catch (Exception e) {
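// The business method may throw an exception
consumer.negativeAcknowledge(msg);
}
}
/**
* Simulates a business method that executes asynchronously
* @param msg the message body
* @param demoSendCallback callback of the asynchronous function
*/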
private void asyncPayload(byte[] msg, DemoSendCallback demoSendCallback) {
if (System.currentTimeMillis() % 2 == 0) {
demoSendCallback.callback(null);
} else {
demoSendCallback.callback(new Exception("exception"));
}
}
}
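A minimal sketch of the Semaphore approach mentioned in this section, using only the JDK. InflightLimiter is a hypothetical class; onMessage plays the role of the listener's received method, and asyncHandler stands in for the asynchronous business call whose completion would trigger acknowledge or negativeAcknowledge.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

/** Hypothetical sketch: caps the number of messages being processed at the same time. */
class InflightLimiter<M> {
    private final Semaphore permits;
    private final Function<M, CompletableFuture<Void>> asyncHandler;

    InflightLimiter(int maxInflight, Function<M, CompletableFuture<Void>> asyncHandler) {
        this.permits = new Semaphore(maxInflight);
        this.asyncHandler = asyncHandler;
    }

    /**
     * Blocks the calling (listener) thread while maxInflight messages are in progress,
     * which in turn stops the listener from taking more messages.
     */
    CompletableFuture<Void> onMessage(M msg) {
        permits.acquireUninterruptibly();
        CompletableFuture<Void> handling;
        try {
            handling = asyncHandler.apply(msg);
        } catch (RuntimeException e) {
            permits.release(); // the handler never started, so give the permit back
            throw e;
        }
        // Release the permit whether processing succeeded or failed.
        return handling.whenComplete((ignored, error) -> permits.release());
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

The permit count bounds memory use directly: at most maxInflight messages are held by business code at any time, regardless of how fast the broker can deliver.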
Consumers strictly following partition order
To achieve strict ordering at the partition level on the consumer side, the messages of a single partition must be processed in order: once processing of a message fails, no other message of that partition can be processed until the failed message has been retried successfully. An example is as follows:
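import lombok.extern.slf4j.Slf4j;
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.MessageListener;
/**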
* @author hezhangjian
*/
@Slf4j
public class DemoMessageListenerSyncAtLeastOnceStrictlyOrdered<T> implements MessageListener<T> {
@Override
public void received(Consumer<T> consumer, Message<T> msg) {
retryUntilSuccess(msg.getData());
consumer.acknowledgeAsync(msg);
}
private void retryUntilSuccess(byte[] msg) {
while (true) {
try {
final boolean result = syncPayload(msg);
if (result) {
break;
}
} catch (Exception e) {
log.error("exception is ", e);
}
}
}
/**
* Simulates a business method that executes synchronously
*
* @param msg the message body
* @return whether processing succeeded
*/
private boolean syncPayload(byte[] msg) {
return System.currentTimeMillis() % 2 == 0;
}
}
Thanks
Thank you Penghui and Luo Tian for reviewing the manuscript.
About the Author
Zhangjian is an Apache Pulsar contributor. He graduated from Xidian University and is a senior engineer at Huawei Cloud Internet of Things, where Pulsar has been put into commercial use on a large scale. For more information, please visit his Jianshu blog.