About Apache Pulsar
Apache Pulsar is a top-level project of the Apache Software Foundation. It is a next-generation, cloud-native distributed messaging and streaming platform that integrates messaging, storage, and lightweight functional computing. It supports multi-datacenter and cross-region data replication, and offers streaming data storage with strong consistency, high throughput, low latency, and high scalability.
GitHub address: http://github.com/apache/pulsar/
This article is translated from: "Using Apache Pulsar With Kotlin" by Gilles Barbier.
Original link: https://gillesbarbier.medium.com/using-apache-pulsar-with-kotlin-3b0ab398cf52

Apache Pulsar, often described as the next generation of Kafka, is a rising star in the developer toolset. Pulsar is a multi-tenant, high-performance solution for server-to-server messaging, often used as the core of scalable applications.

Pulsar can be used with Kotlin because it is written in Java. However, its API doesn't take into account the power that Kotlin brings, such as data classes, coroutines, or reflectionless serialization.

In this article, I will discuss how to use Pulsar with Kotlin.

Use native serialization for message bodies

A default way of defining messages in Kotlin is to use a data class, whose main purpose is to hold data. For such classes, Kotlin automatically provides methods such as equals(), toString(), and copy(), reducing code size and the risk of errors.
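
For example, a minimal sketch with a hypothetical Greeting class (not part of the article's code) shows what you get for free:

data class Greeting(val name: String, val count: Int = 0)

fun main() {
    val greeting = Greeting("pulsar")
    val next = greeting.copy(count = greeting.count + 1) // copy with one field changed
    println(greeting == Greeting("pulsar"))              // true: structural equality
    println(next)                                        // Greeting(name=pulsar, count=1)
}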

Create a Pulsar producer using Java:

Producer<MyAvro> avroProducer = client
    .newProducer(Schema.AVRO(MyAvro.class))
    .topic("some-avro-topic")
    .create();

The Schema.AVRO(MyAvro.class) directive introspects the MyAvro Java class and infers an Avro schema from it. Pulsar uses this schema to check that new producers will only publish messages compatible with existing consumers. However, the Java implementation of Kotlin data classes does not play well with the default serializer used by Pulsar. Luckily, since version 2.7.0, Pulsar lets you plug custom serializers into producers and consumers.

First, you need to install the official Kotlin serialization plugin. Use it to create a message class like this:

@Serializable
data class RunTask(
    val taskName: TaskName,
    val taskId: TaskId,
    val taskInput: TaskInput,
    val taskOptions: TaskOptions,
    val taskMeta: TaskMeta
)
Note the @Serializable annotation. With it, you can use RunTask.serializer() to get a serializer that works without introspection, which makes things a lot more efficient!
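
To see this in action, here is a minimal sketch using the hypothetical Greeting class from above, now annotated with @Serializable and serialized to JSON through its generated serializer, with no reflection involved:

import kotlinx.serialization.Serializable
import kotlinx.serialization.json.Json

@Serializable
data class Greeting(val name: String, val count: Int = 0)

fun main() {
    val json = Json.encodeToString(Greeting.serializer(), Greeting("pulsar", 1))
    println(json) // {"name":"pulsar","count":1}

    val back = Json.decodeFromString(Greeting.serializer(), json)
    println(back == Greeting("pulsar", 1)) // true
}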

Currently, the serialization plugin only supports JSON (plus a few other formats, such as protobuf, still in beta). So we also need the avro4k library to extend it with Avro support.

Using these tools, we can create a Producer<RunTask> and a Consumer<RunTask> like the following:

import com.github.avrokotlin.avro4k.Avro
import com.github.avrokotlin.avro4k.io.AvroEncodeFormat
import io.infinitic.common.tasks.executors.messages.RunTask
import kotlinx.serialization.KSerializer
import org.apache.avro.file.SeekableByteArrayInput
import org.apache.avro.generic.GenericDatumReader
import org.apache.avro.generic.GenericRecord
import org.apache.avro.io.DecoderFactory
import org.apache.pulsar.client.api.Consumer
import org.apache.pulsar.client.api.Producer
import org.apache.pulsar.client.api.PulsarClient
import org.apache.pulsar.client.api.Schema
import org.apache.pulsar.client.api.schema.SchemaDefinition
import org.apache.pulsar.client.api.schema.SchemaReader
import org.apache.pulsar.client.api.schema.SchemaWriter
import java.io.ByteArrayOutputStream
import java.io.InputStream

// Convert T instance to Avro schemaless binary format
fun <T : Any> writeBinary(t: T, serializer: KSerializer<T>): ByteArray {
    val out = ByteArrayOutputStream()
    Avro.default.openOutputStream(serializer) {
        encodeFormat = AvroEncodeFormat.Binary
        schema = Avro.default.schema(serializer)
    }.to(out).write(t).close()

    return out.toByteArray()
}

// Convert Avro schemaless byte array to T instance
fun <T> readBinary(bytes: ByteArray, serializer: KSerializer<T>): T {
    val datumReader = GenericDatumReader<GenericRecord>(Avro.default.schema(serializer))
    val decoder = DecoderFactory.get().binaryDecoder(SeekableByteArrayInput(bytes), null)

    return Avro.default.fromRecord(serializer, datumReader.read(null, decoder))
}

// custom Pulsar SchemaReader
class RunTaskSchemaReader: SchemaReader<RunTask> {
    override fun read(bytes: ByteArray, offset: Int, length: Int) =
        read(bytes.inputStream(offset, length))

    override fun read(inputStream: InputStream) =
        readBinary(inputStream.readBytes(), RunTask.serializer())
}

// custom Pulsar SchemaWriter
class RunTaskSchemaWriter : SchemaWriter<RunTask> {
    override fun write(message: RunTask) = writeBinary(message, RunTask.serializer())
}

// custom Pulsar SchemaDefinition<RunTask>
fun runTaskSchemaDefinition(): SchemaDefinition<RunTask> =
    SchemaDefinition.builder<RunTask>()
        .withJsonDef(Avro.default.schema(RunTask.serializer()).toString())
        .withSchemaReader(RunTaskSchemaReader())
        .withSchemaWriter(RunTaskSchemaWriter())
        .withSupportSchemaVersioning(true)
        .build()

// Create an instance of Producer<RunTask>
fun runTaskProducer(client: PulsarClient): Producer<RunTask> = client
    .newProducer(Schema.AVRO(runTaskSchemaDefinition()))
    .topic("some-avro-topic")
    .create();

// Create an instance of Consumer<RunTask>
fun runTaskConsumer(client: PulsarClient): Consumer<RunTask> = client
    .newConsumer(Schema.AVRO(runTaskSchemaDefinition()))
    .topic("some-avro-topic")
    .subscribe();
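
As a quick usage sketch (assuming a local broker at pulsar://localhost:6650; the RunTask field types are project-specific, so the actual message construction is left out):

fun main() {
    val client = PulsarClient.builder()
        .serviceUrl("pulsar://localhost:6650")
        .build()

    val producer = runTaskProducer(client)
    // producer.send(runTask) // a RunTask built from your domain types

    val consumer = runTaskConsumer(client)
    // val msg = consumer.receive()
    // consumer.acknowledge(msg)

    client.close()
}
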
Sealed class messages and one envelope per topic

Pulsar allows only one type of message per topic. In some situations this does not cover every need, but it can be worked around with an envelope pattern.

First, use a sealed class to define all the message types sent to a single topic:

@Serializable
sealed class TaskEngineMessage() {
    abstract val taskId: TaskId
}

@Serializable
data class DispatchTask(
    override val taskId: TaskId,
    val taskName: TaskName,
    val methodName: MethodName,
    val methodParameterTypes: MethodParameterTypes?,
    val methodInput: MethodInput,
    val workflowId: WorkflowId?,
    val methodRunId: MethodRunId?,
    val taskMeta: TaskMeta,
    val taskOptions: TaskOptions = TaskOptions()
) : TaskEngineMessage()

@Serializable
data class CancelTask(
    override val taskId: TaskId,
    val taskOutput: MethodOutput
) : TaskEngineMessage()

@Serializable
data class TaskCanceled(
    override val taskId: TaskId,
    val taskOutput: MethodOutput,
    val taskMeta: TaskMeta
) : TaskEngineMessage()

@Serializable
data class TaskCompleted(
    override val taskId: TaskId,
    val taskName: TaskName,
    val taskOutput: MethodOutput,
    val taskMeta: TaskMeta
) : TaskEngineMessage()

Then, create an envelope class to wrap these messages:

@Serializable
data class TaskEngineEnvelope(
    val taskId: TaskId,
    val type: TaskEngineMessageType,
    val dispatchTask: DispatchTask? = null,
    val cancelTask: CancelTask? = null,
    val taskCanceled: TaskCanceled? = null,
    val taskCompleted: TaskCompleted? = null,
) {
    init {
        val noNull = listOfNotNull(
            dispatchTask,
            cancelTask,
            taskCanceled,
            taskCompleted
        )

        require(noNull.size == 1)
        require(noNull.first() == message())
        require(noNull.first().taskId == taskId)
    }

    companion object {
        fun from(msg: TaskEngineMessage) = when (msg) {
            is DispatchTask -> TaskEngineEnvelope(
                msg.taskId,
                TaskEngineMessageType.DISPATCH_TASK,
                dispatchTask = msg
            )
            is CancelTask -> TaskEngineEnvelope(
                msg.taskId,
                TaskEngineMessageType.CANCEL_TASK,
                cancelTask = msg
            )
            is TaskCanceled -> TaskEngineEnvelope(
                msg.taskId,
                TaskEngineMessageType.TASK_CANCELED,
                taskCanceled = msg
            )
            is TaskCompleted -> TaskEngineEnvelope(
                msg.taskId,
                TaskEngineMessageType.TASK_COMPLETED,
                taskCompleted = msg
            )
        }
    }

    fun message(): TaskEngineMessage = when (type) {
        TaskEngineMessageType.DISPATCH_TASK -> dispatchTask!!
        TaskEngineMessageType.CANCEL_TASK -> cancelTask!!
        TaskEngineMessageType.TASK_CANCELED -> taskCanceled!!
        TaskEngineMessageType.TASK_COMPLETED -> taskCompleted!!
    }
}

enum class TaskEngineMessageType {
    CANCEL_TASK,
    DISPATCH_TASK,
    TASK_CANCELED,
    TASK_COMPLETED
}

Note how the init block lets Kotlin validate the envelope gracefully! An envelope is easily created with TaskEngineEnvelope.from(msg), and the original message is recovered with envelope.message().
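
For example, a minimal sketch of publishing and unwrapping through the envelope (assuming a Producer<TaskEngineEnvelope> built with a SchemaDefinition analogous to runTaskSchemaDefinition() above):

// Wrap any TaskEngineMessage and send it on the shared topic (hypothetical helper)
fun sendTaskEngineMessage(producer: Producer<TaskEngineEnvelope>, msg: TaskEngineMessage) {
    producer.send(TaskEngineEnvelope.from(msg))
}

// On the consuming side, recover the original message from the envelope
fun unwrap(envelope: TaskEngineEnvelope): TaskEngineMessage = envelope.message()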

Why add an explicit taskId field and one field per message type, instead of a single message: TaskEngineMessage field? Because this way I can use Pulsar SQL to query this topic by taskId, by type, or by a combination of the two.

Build a Worker with coroutines

Using threads in plain Java is complicated and error-prone. Fortunately, Kotlin provides coroutines, a simpler abstraction for asynchronous processing, and channels, a convenient way to transfer data between coroutines.

I can create a Worker from:

  • A single coroutine ("task-engine-message-puller") dedicated to pulling messages from Pulsar
  • N coroutines ("task-engine-$i") to process messages in parallel
  • A single coroutine ("task-engine-message-acknowledger") that acknowledges Pulsar messages after processing

Since I run many workers like this, I have also added a logChannel to collect logs. Note that in order to acknowledge a Pulsar message in a different coroutine than the one that received it, I wrap the TaskEngineMessage into a MessageToProcess<TaskEngineMessage> that carries the Pulsar messageId:

import kotlinx.coroutines.CoroutineName
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.channels.Channel
import kotlinx.coroutines.channels.SendChannel
import kotlinx.coroutines.future.await
import kotlinx.coroutines.isActive
import kotlinx.coroutines.launch
import org.apache.pulsar.client.api.Consumer
import org.apache.pulsar.client.api.Message
import org.apache.pulsar.client.api.MessageId

// TaskEngine and the message classes come from the surrounding project
typealias TaskEngineMessageToProcess = MessageToProcess<TaskEngineMessage>

fun CoroutineScope.startPulsarTaskEngineWorker(
    taskEngineConsumer: Consumer<TaskEngineEnvelope>,
    taskEngine: TaskEngine,
    logChannel: SendChannel<TaskEngineMessageToProcess>?,
    enginesNumber: Int
) = launch(Dispatchers.IO) {

    val taskInputChannel = Channel<TaskEngineMessageToProcess>()
    val taskResultsChannel = Channel<TaskEngineMessageToProcess>()

    // coroutine dedicated to pulsar message pulling
    launch(CoroutineName("task-engine-message-puller")) {
        while (isActive) {
            val message: Message<TaskEngineEnvelope> = taskEngineConsumer.receiveAsync().await()

            try {
                val envelope = readBinary(message.data, TaskEngineEnvelope.serializer())
                taskInputChannel.send(MessageToProcess(envelope.message(), message.messageId))
            } catch (e: Exception) {
                taskEngineConsumer.negativeAcknowledge(message.messageId)
                throw e
            }
        }
    }

    // coroutines dedicated to Task Engine
    repeat(enginesNumber) {
        launch(CoroutineName("task-engine-$it")) {
            for (messageToProcess in taskInputChannel) {
                try {
                    messageToProcess.output = taskEngine.handle(messageToProcess.message)
                } catch (e: Exception) {
                    messageToProcess.exception = e
                }
                taskResultsChannel.send(messageToProcess)
            }
        }
    }

    // coroutine dedicated to pulsar message acknowledging
    launch(CoroutineName("task-engine-message-acknowledger")) {
        for (messageToProcess in taskResultsChannel) {
            if (messageToProcess.exception == null) {
                taskEngineConsumer.acknowledgeAsync(messageToProcess.messageId).await()
            } else {
                taskEngineConsumer.negativeAcknowledge(messageToProcess.messageId)
            }
            logChannel?.send(messageToProcess)
        }
    }
}

data class MessageToProcess<T> (
    val message: T,
    val messageId: MessageId,
    var exception: Exception? = null,
    var output: Any? = null
)
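
A minimal launch sketch for this worker (assumptions: a hypothetical taskEngineEnvelopeConsumer() helper built like runTaskConsumer above but with a SchemaDefinition<TaskEngineEnvelope>, and the project's TaskEngine implementation):

import kotlinx.coroutines.runBlocking

fun main(): Unit = runBlocking {
    val client = PulsarClient.builder()
        .serviceUrl("pulsar://localhost:6650")
        .build()

    startPulsarTaskEngineWorker(
        taskEngineConsumer = taskEngineEnvelopeConsumer(client), // hypothetical helper
        taskEngine = TaskEngine(),                               // project-specific engine
        logChannel = null,
        enginesNumber = 4
    )
}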

Summary

In this article, we covered how to use Pulsar with Kotlin:

  • Encoding messages (including an envelope for Pulsar topics that receive multiple types of messages);
  • Creating Pulsar producers and consumers;
  • Building a simple Worker capable of processing many messages in parallel.
