Introduction: In the TGIP-CN live broadcast on April 11, StreamNative engineer Xu Yunze shared a preview of the new features in KoP 2.8.0. The following is a condensed text version of the talk for your reference.

To watch the replay, scan with WeChat to open the bilibili mini program, video BV1T741147B6, and select episode TGIP-031.

Today's talk is a preview of the new features in KoP (Kafka on Pulsar) 2.8.0. First, a brief introduction: I work at StreamNative, and I am a contributor to Apache Pulsar and one of the main maintainers of KoP.

About the KoP version numbering scheme

First, let's talk about the KoP version number.

Apache Pulsar has major releases, and early KoP version management was rather chaotic, so starting from 2.6.2.0 the KoP version number follows Pulsar's. The KoP master branch updates its Pulsar dependency from time to time, so if you need a new feature you can submit a PR to Pulsar first, and KoP can then pick it up by bumping that dependency. KoP 2.8.0 is a version that can be used in production.

Today's talk covers the following four main points:

  • Why KoP is needed
  • The basic implementation of KoP
  • Recent progress on the KoP 2.8.0-SNAPSHOT version
  • Near-term plans and future outlook

Kafka vs. Pulsar

First, let me share my view of Kafka and Pulsar as two systems. Setting aside a number of smaller features, the two systems are actually quite similar; the biggest difference is their storage model.

A Kafka broker handles both compute and storage. "Compute" here means abstracting the data sent by clients into topics and partitions, possibly applying schema and other processing, and then writing the processed messages to storage. After processing, a Kafka broker writes directly to the local file system. Pulsar is different: it writes to a Bookie (BookKeeper) cluster in which every Bookie node is equal.

This separation of compute and storage brings many benefits. For example, to increase throughput you add brokers; to increase disk capacity you add Bookies, and because every node is equal, no rebalancing is required. There is also no Kafka-style leader/follower relationship among brokers. This is not the focus of this talk, though; the point is that the biggest architectural difference is that Kafka writes to local files while Pulsar writes to Bookies.

That said, I don't think either system is absolutely better than the other, and everyone is free to choose. I simply believe there are many scenarios where Pulsar can be used in place of Kafka.

Migrating from Kafka to Pulsar

Suppose I am attracted by some of Pulsar's advantages and want to migrate from Kafka to Pulsar. What problems will I run into?

  1. Persuading the business teams to replace their clients?
  • "Too much trouble, we don't want to change the business code."
  • Pulsar adaptor? (Pulsar provides a Kafka-compatible adaptor: the Kafka code does not need to change, only the Maven dependency.)
  • "Looks nice, but unfortunately we are not using the Java client."
  • "We don't mind the trouble, but we only know PHP."
  2. What if users connect to external systems directly through Kafka connectors (of which there are nearly a hundred)?
  3. What if users connect to Kafka through an external system's own connector?

KoP (Kafka on Pulsar)

KoP (Kafka on Pulsar) was created to address these obstacles to migrating from Kafka to Pulsar. KoP introduces a Kafka protocol handler plug-in into the Pulsar broker, giving Apache Pulsar support for the native Apache Kafka protocol. With KoP, users can migrate existing Kafka applications and services to Pulsar without modifying any code, and thereby make use of Pulsar's powerful features. For more background on the KoP project, see the related KoP material; I won't repeat it here.

As shown in the figure above, the Protocol Handler was introduced in Pulsar 2.5.0 and runs inside the broker service. The default "Pulsar protocol handler" is really just a concept: it is the built-in path used to talk to Pulsar clients. The Kafka Protocol Handler, by contrast, is loaded dynamically; enabling it in the configuration is equivalent to loading an extra plug-in layer that talks to Kafka clients.

Using KoP is very simple: put the Protocol Handler's NAR package into the protocols subdirectory of the Pulsar installation directory and add the corresponding configuration to broker.conf or standalone.conf. On startup, a Kafka-compatible service listening on port 9092 is started by default, just like Kafka.
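As a minimal sketch of the broker.conf additions (the property names messagingProtocols, protocolHandlerDirectory, and kafkaListeners are taken from the KoP README as I remember it; please verify the exact keys against the KoP version you deploy):

messagingProtocols=kafka
protocolHandlerDirectory=./protocols
kafkaListeners=PLAINTEXT://0.0.0.0:9092
allowAutoTopicCreationType=partitioned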

Clients currently supported by KoP:

  • Java >= 1.0
  • C/C++: librdkafka
  • Golang: sarama
  • NodeJS
  • Other rdkafka-based clients

Protocol Handler

The Protocol Handler is actually an interface, and we can implement our own. The broker's startup process is as follows:

It loads the Protocol Handlers from the configured directory, loads their classes, verifies them via the accept and protocolName methods, and then runs three steps:

  • initialize()
  • start()
  • newChannelInitializer()

The first step loads the Protocol Handler's configuration. The Protocol Handler shares its configuration with the broker, so the same ServiceConfiguration is used here. The start step is the most important one, because it receives the BrokerService parameter.

BrokerService manages all the resources of a broker:

  • connected producers and subscriptions
  • owned topics and their corresponding managed ledgers
  • a built-in admin client and Pulsar client
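To make the interface concrete, here is a minimal sketch of a custom Protocol Handler. The method names follow the steps above and my reading of the broker's ProtocolHandler interface; exact signatures may differ between Pulsar versions, so treat it as an illustration rather than KoP's actual code.

    import java.net.InetSocketAddress;
    import java.util.Collections;
    import java.util.Map;

    import io.netty.channel.ChannelInitializer;
    import io.netty.channel.socket.SocketChannel;
    import org.apache.pulsar.broker.ServiceConfiguration;
    import org.apache.pulsar.broker.protocol.ProtocolHandler;
    import org.apache.pulsar.broker.service.BrokerService;

    public class DemoProtocolHandler implements ProtocolHandler {

        private static final String PROTOCOL_NAME = "demo";

        private ServiceConfiguration conf;
        private BrokerService brokerService;

        @Override
        public String protocolName() {
            return PROTOCOL_NAME;                  // name the broker uses to identify this handler
        }

        @Override
        public boolean accept(String protocol) {
            return PROTOCOL_NAME.equals(protocol); // verification step during loading
        }

        @Override
        public void initialize(ServiceConfiguration conf) {
            this.conf = conf;                      // step 1: shares the broker's configuration
        }

        @Override
        public String getProtocolDataToAdvertise() {
            return "demo://localhost:9092";        // advertised protocol data (illustrative)
        }

        @Override
        public void start(BrokerService service) {
            this.brokerService = service;          // step 2: gain access to the broker's resources
        }

        @Override
        public Map<InetSocketAddress, ChannelInitializer<SocketChannel>> newChannelInitializers() {
            // step 3: map listen addresses (e.g. port 9092) to Netty channel initializers
            return Collections.emptyMap();
        }

        @Override
        public void close() {
            // release any resources created by this handler
        }
    }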

Implementation of KoP

Topic & Partition

Kafka and Pulsar are similar in many respects. A TopicPartition in Kafka is just a string plus an int; a Pulsar topic name is a little more complicated and consists of the following parts:

  • whether it is persistent
  • tenant
  • namespace
  • topic name
  • partition number

KoP has three related configurations:

  • Default tenant: kafkaTenant=public
  • Default namespace: kafkaNamespace=default
  • Forbid automatic creation of non-partitioned topics: allowAutoTopicCreationType=partitioned

Why is there a configuration that forbids automatically creating non-partitioned topics? Because Kafka only has partitioned topics and no concept of a non-partitioned topic. If a non-partitioned topic is created automatically via a Pulsar client, a Kafka client may not be able to access it. KoP also does some simple processing here to map the default tenant and namespace.
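For illustration only, the mapping from a Kafka topic-partition to a full Pulsar partitioned topic name under the default KoP settings can be sketched as follows (the -partition-N suffix is Pulsar's usual partition naming convention; the helper itself is hypothetical, not KoP's code):

    public class KafkaToPulsarTopicName {

        private static final String TENANT = "public";      // kafkaTenant
        private static final String NAMESPACE = "default";  // kafkaNamespace

        // e.g. ("my-topic", 0) -> "persistent://public/default/my-topic-partition-0"
        static String toPulsarTopic(String kafkaTopic, int partition) {
            return String.format("persistent://%s/%s/%s-partition-%d",
                    TENANT, NAMESPACE, kafkaTopic, partition);
        }

        public static void main(String[] args) {
            System.out.println(toPulsarTopic("my-topic", 0));
        }
    }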

Produce & Fetch request

PRODUCE request:

  • Find the PersistentTopic object (including ManagedLedger) by topic name.
  • Convert the message format.
  • Write messages to Bookie asynchronously.

FETCH request:

  • Find the PersistentTopic object by topic name.
  • Find the corresponding ManagedCursor through Offset.
  • Read Entry from the corresponding position of ManagedCursor.
  • After converting the Entry format, the message is returned to the client.

Group Coordinator

The Group Coordinator is responsible for rebalancing and for deciding the mapping between partitions and a group. A group can contain multiple consumers, and which partitions each consumer reads is decided by the Group Coordinator.

When a consumer joins (subscribes to) a group:

  • It sends a JoinGroup request to notify the broker that a new consumer has joined.
  • It then sends a SyncGroup request to obtain the partition assignment.

The broker returns this information to the client, and the consumer sends further requests to fetch its assignment from the broker. The Group Coordinator writes all of this group-related metadata into a special topic.

KoP adds some configuration here as well. This special topic lives in a default namespace and has 8 partitions by default. A Kafka group is roughly equivalent to a Pulsar Failover subscription. If you want Kafka offsets to be recognized by Pulsar clients, you need to acknowledge the MessageId corresponding to each offset. KoP therefore has a component called OffsetAcker, which maintains a group of Pulsar consumers; whenever the Group Coordinator needs to ACK, a consumer for the corresponding partition is created to acknowledge on behalf of the group.

One more concept needs to be mentioned here: the namespace bundle. Recall that the Group Coordinator determines the mapping between consumers and partitions.

In Apache Pulsar, each broker owns some bundle ranges (as in the example above). A topic is hashed by name into one of these ranges, and the broker that owns the range is the owner broker of the topic; subscribing to the topic means connecting to that broker. Two issues need attention here: a bundle may be split (splitting can also be disabled via configuration), and a broker may go down, so the mapping between bundles and brokers can change. To handle both cases, KoP registers a listener that senses changes in bundle ownership; whenever ownership changes, it notifies the Group Coordinator to invoke the corresponding handler.
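A rough sketch of what such a listener can look like is shown below. I am recalling the broker-side interface (a NamespaceBundleOwnershipListener with onLoad/unLoad callbacks registered through the NamespaceService) from memory, so the class and method names should be treated as assumptions and checked against the Pulsar/KoP source.

    import org.apache.pulsar.broker.namespace.NamespaceBundleOwnershipListener;
    import org.apache.pulsar.broker.service.BrokerService;
    import org.apache.pulsar.common.naming.NamespaceBundle;
    import org.apache.pulsar.common.naming.NamespaceName;

    public class GroupCoordinatorBundleListener implements NamespaceBundleOwnershipListener {

        private final NamespaceName kafkaMetadataNamespace;

        public GroupCoordinatorBundleListener(NamespaceName kafkaMetadataNamespace) {
            this.kafkaMetadataNamespace = kafkaMetadataNamespace;
        }

        @Override
        public boolean test(NamespaceBundle bundle) {
            // only care about bundles of the namespace that holds the offsets topic
            return bundle.getNamespaceObject().equals(kafkaMetadataNamespace);
        }

        @Override
        public void onLoad(NamespaceBundle bundle) {
            // this broker became the owner: tell the Group Coordinator to take over
            // the offsets-topic partitions that hash into this bundle
        }

        @Override
        public void unLoad(NamespaceBundle bundle) {
            // ownership moved away (bundle split or broker change): hand the
            // corresponding group metadata back and clean up local state
        }

        public void register(BrokerService brokerService) {
            brokerService.pulsar().getNamespaceService()
                    .addNamespaceBundleOwnershipListener(this);
        }
    }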

Kafka Offset

Let me first introduce two concepts: the Kafka offset and the Pulsar MessageId. A Kafka offset is a 64-bit integer that identifies where a message is stored. Kafka stores messages on the local machine, so a single integer is enough to represent a message's sequence number. Pulsar stores messages on Bookies, which may be spread across multiple machines, so a message's position is identified by a ledger ID and an entry ID. The ledger ID roughly corresponds to a Kafka segment, and the entry ID is approximately equivalent to a Kafka offset. However, an entry in Pulsar is not a single message but a batch of messages, which is why there is also a batch index. Therefore three fields, ledger ID, entry ID, and batch index, are needed together to identify a Pulsar message.

You therefore cannot simply map a Kafka offset to a Pulsar MessageId; such naive handling can even cause Pulsar messages to be lost. Before KoP 2.8.0, a Kafka offset was assembled by allocating 20 bits, 32 bits, and 12 bits to the Pulsar ledger ID, entry ID, and batch index respectively (as shown in the figure above; a sketch of this packing appears after the list below). This allocation strategy works in most cases and keeps Kafka offsets ordered, but it is still hard to come up with a truly "appropriate" split of the MessageId. Problems arise in the following situations:

  • For example, with 20 bits allocated to the ledger ID, the ledger ID space is exhausted after 2^20 ledgers, and the batch index bits are also easy to run out of;
  • When reading entries from the cursor, they can only be read one at a time, otherwise "Maximum offset delta exceeded" errors may occur;
  • Some third-party components (such as Spark) rely on offsets being continuous.
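Purely as an illustration of the pre-2.8.0 packing described above (the constants mirror the 20/32/12-bit split; this is not KoP's actual code):

    public final class LegacyKafkaOffset {

        // pre-2.8.0 split of the 64-bit offset: 20 bits ledger ID, 32 bits entry ID, 12 bits batch index
        private static final int ENTRY_BITS = 32;
        private static final int BATCH_BITS = 12;

        static long encode(long ledgerId, long entryId, int batchIndex) {
            return (ledgerId << (ENTRY_BITS + BATCH_BITS))
                    | (entryId << BATCH_BITS)
                    | batchIndex;
        }

        static long ledgerId(long offset)   { return offset >>> (ENTRY_BITS + BATCH_BITS); }
        static long entryId(long offset)    { return (offset >>> BATCH_BITS) & 0xFFFFFFFFL; }
        static int  batchIndex(long offset) { return (int) (offset & 0xFFF); }

        public static void main(String[] args) {
            long offset = encode(5, 100, 3);
            System.out.printf("ledger=%d entry=%d batch=%d%n",
                    ledgerId(offset), entryId(offset), batchIndex(offset));
        }
    }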

To address these offset problems, engineers from StreamNative and Tencent jointly proposed an optimization based on broker entry metadata: PIP 70 (Introduce lightweight broker entry metadata). The new scheme is shown on the right side of the figure below.

On the left side of the figure: a Pulsar message currently consists of two parts, metadata and payload. The payload is the actual data written, and the metadata is information such as the publish timestamp. The Pulsar broker sends the message to clients and stores the same message in Bookies.

On the right side of the figure is the improvement proposed by PIP 70. In the new scheme, the broker still sends the plain message to clients, but what gets written to the Bookies is a "raw message": the original message with a BrokerEntryMetadata section added. As the figure shows, clients cannot see the raw message; only the broker can. And since, as mentioned earlier, a Protocol Handler has access to everything the broker has, the Protocol Handler can also read the raw message. If Pulsar stores the offset there, KoP can obtain it.

Our implementation works as follows. The BrokerEntryMetadata protobuf definition contains two fields, and the important one here is the second, index, which corresponds to the Kafka offset; this effectively implements a Kafka-style offset inside Pulsar. Two interceptors are involved. The first is ManagedLedgerInterceptor:

    private boolean beforeAddEntry(OpAddEntry addOperation) {
        // if no interceptor, just return true to make sure addOperation will be initiated
        if (managedLedgerInterceptor == null) {
            return true;
        }
        try {
            managedLedgerInterceptor.beforeAddEntry(addOperation,
                    addOperation.getNumberOfMessages());
            return true;
        } catch (Exception e) {
            // (the rest of the method is omitted on the slide: the exception is handled
            // and the add operation does not proceed)
            return false;
        }
    }

The second is BrokerEntryMetadataInterceptor:

    public OpAddEntry beforeAddEntry(OpAddEntry op, int numberOfMessages) {
        if (op == null || numberOfMessages <= 0) {
            return op;
        }
        op.setData(Commands.addBrokerEntryMetadata(op.getData(),
                brokerEntryMetadataInterceptors, numberOfMessages));
        return op;
    }

addOperation contains the bytes sent by the producer and the number of messages, so the interceptor can intercept every produced message. Commands.addBrokerEntryMetadata prepends a BrokerEntryMetadata section to the data of the OpAddEntry. It is added at the front to make parsing easy: first check whether the leading field is the magic number; if it is, read the BrokerEntryMetadata, and if not, parse the ordinary metadata according to the normal protocol. BrokerEntryMetadataInterceptor is, in effect, the interceptor installed on the broker side.
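To illustrate just that parsing decision (the magic-number constant and the helper are hypothetical; the real logic lives in Pulsar's Commands class):

    import io.netty.buffer.ByteBuf;

    public final class BrokerEntryMetadataProbe {

        // hypothetical constant standing in for Pulsar's broker-entry-metadata magic number
        private static final short MAGIC_BROKER_ENTRY_METADATA = (short) 0x0e02;

        // Peeks at the first two bytes of an entry: if they match the magic number,
        // BrokerEntryMetadata follows; otherwise the buffer starts with the normal
        // message metadata and the reader index is rolled back.
        static boolean hasBrokerEntryMetadata(ByteBuf entry) {
            entry.markReaderIndex();
            short magic = entry.readShort();
            if (magic == MAGIC_BROKER_ENTRY_METADATA) {
                return true;            // caller can now read the BrokerEntryMetadata fields
            }
            entry.resetReaderIndex();   // not present: rewind and parse as a normal message
            return false;
        }
    }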

Therefore, it is easy to implement continuous Offset based on BrokerEntryMetadata in KoP:

  • FETCH request: read from Bookies directly and parse the BrokerEntryMetadata;
  • PRODUCE request: pass the ManagedLedger into the context of the asynchronous Bookie write and obtain the offset from the ManagedLedger's interceptor;
  • COMMIT_OFFSET request: write to the __consumer_offsets topic as before, and for Pulsar's cumulative acknowledgement perform a binary search on the ManagedLedger.

Because of these changes, the following configuration must be set for KoP 2.8.0 so that offset operations work correctly:

brokerEntryMetadataInterceptors=org.apache.pulsar.common.intercept.AppendIndexMetadataInterceptor

Encoding and decoding of messages

This is another important improvement in KoP 2.8.0.

Before KoP 2.8.0, producing and consuming messages through KoP required operations such as decompressing and un-batching messages, which added serious latency. We also asked ourselves: why should KoP be compatible with the Pulsar client at all? When migrating from Kafka to Pulsar, in most cases only Kafka clients are involved and Kafka clients rarely need to interoperate with Pulsar clients, so this message conversion is unnecessary. When producing, we can simply write the ByteBuffer inside MemoryRecords directly to the Bookies.
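If I remember correctly, this behavior is controlled by KoP's entry format option; treat the exact key and value below as an assumption to check against the KoP documentation for your version:

# store entries in Kafka's own format and skip the Pulsar conversion on the produce path
entryFormat=kafka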

Consumption is a different story: we use a ManagedCursor to read, and we still need to convert several entries into one ByteBuf. In real application scenarios this turned out to be fairly expensive. Further investigation showed that appendWithOffset recomputes the checksum for every message, so with large batches the number of computations becomes excessive and creates unnecessary overhead. The BIGO team submitted a PR for this problem with a simplified version of appendWithOffset (shown in the figure below) that removes the unnecessary work. Of course, this proposal also builds on the continuous-offset change described earlier.

Performance Test (WIP)

Performance testing is still a work in progress, and some problems have already been found. First, at the peak in the figure below, the end-to-end latency is about 4 to 6 ms, which is acceptable. However, during follow-up investigation I found that full GC pauses frequently reach 600 ms or more, which pushes latency even higher. We are still investigating this problem.

The figures below show the monitoring for HandleProduceRequest, ProduceEncode, MessageQueuedLatency, and MessagePublish. From the monitoring, the latency of HandleProduceRequest (from the moment a PRODUCE request starts being processed until all of its messages have been written to the Bookies) is about 4 ms, similar to the Pulsar client but with one less network round trip.

The main thing to look at is ProduceEncode, the time spent encoding Kafka messages. My test uses Kafka's entry format, and the encoding takes less than 0.1 ms; with Pulsar's entry format, the monitored result is somewhere between a few tenths of a millisecond and a few milliseconds.

There is actually still an issue with the current implementation: a queue is still being used, which is why the MessageQueuedLatency metric appears in the figure below. MessageQueuedLatency measures the time from when a message enters a partition's queue until it is ready for asynchronous sending. We suspected the queue might degrade performance, but the monitoring shows its roughly 0.1 ms delay has little effect.

Finally, MessagePublish is the Bookie write latency, i.e. the time from the asynchronous send of a single partition's messages until they are successfully written to the Bookies. The results are satisfactory, so our next focus is tracking down the source of the GC problem.

KoP Authentication

Before version 2.8.0

If you deploy to the cloud in a real production environment, you need authentication support. Before 2.8.0, KoP's authentication support was fairly simple and limited to the SASL/PLAIN mechanism, built on Pulsar's JSON Web Token authentication. Besides the broker's basic authentication configuration, only one extra setting, saslAllowedMechanism=Plain, was required. On the client side, the tenant/namespace and the token are used as the JAAS username and password:

security.protocol=SASL_PLAINTEXT    # or security.protocol=SASL_SSL if SSL connection is used
sasl.mechanism=PLAIN
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule \
    required username="public/default" password="token:xxx";

Support OAuth 2.0

KoP 2.8.0 recently added support for OAuth 2.0 authentication, i.e. the SASL/OAUTHBEARER mechanism. A quick primer: it relies on a third-party authorization service. First, the client obtains an authorization grant from the resource owner; the resource owner can be something like the authorization code flow of a WeChat official account, or a grant issued in advance by a real person. The client then sends the grant to the authorization server (the OAuth 2 server) and exchanges it for an access token, with which it can access the resource server, here the Pulsar broker, and the broker verifies the token. Because the token is verified by a third party, this approach is relatively secure.

KoP's default handler works the same way as Kafka's. Like Kafka, KoP needs a server-side callback handler on the broker for token validation, configured with:

  • kopOauth2AuthenticateCallbackHandler: handler class
  • kopOauth2ConfigFile: configuration file path

It uses the JAAS approach with a separate configuration file. KoP ships an implementation class that delegates verification to the AuthenticationProvider configured on the Pulsar broker: because KoP holds a BrokerService reference, it has full broker access and can call whatever authentication provider the broker has configured. Therefore only auth.validate.method= needs to be set in that configuration file; its value corresponds to the return value of the provider's getAuthMethodName method. With JWT authentication this method name is token; with OAuth 2 authentication it may be different.
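Putting the broker-side pieces together, a configuration sketch might look like the following; the handler class name is only a placeholder and the property keys come from the list above, so verify them against the KoP documentation before use:

# broker.conf – server-side callback handler for SASL/OAUTHBEARER (class name is illustrative)
kopOauth2AuthenticateCallbackHandler=io.streamnative.pulsar.handlers.kop.security.oauth.OauthValidatorCallbackHandler
kopOauth2ConfigFile=conf/kop-handler.properties

# conf/kop-handler.properties – must match the broker provider's getAuthMethodName()
auth.validate.method=token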

Client

For Kafka clients, KoP provides a Login Callback Handler implementation. Kafka Java client OAuth 2.0 authentication:

sasl.login.callback.handler.class=io.streamnative.pulsar.handlers.kop.security.oauth.OauthLoginCallbackHandler
security.protocol=SASL_PLAINTEXT    # or security.protocol=SASL_SSL if SSL connection is used
sasl.mechanism=OAUTHBEARER
sasl.jaas.config=org.apache.kafka.common.security.oauthbearer.OAuthBearerLoginModule \
    required oauth.issuer.url="https://accounts.google.com" \
    oauth.credentials.url="file:///path/to/credentials_file.json" \
    oauth.audience="https://broker.example.com";

The server callback is used to validate the client's token, while the login callback handler obtains the token from the third-party OAuth 2 service. My implementation is modeled on Pulsar's, and the configuration follows Kafka's JAAS style. There are three main settings: issuerUrl, credentialsUrl, and audience, with the same meaning as in Pulsar's Java client authentication, so you can refer to the Pulsar documentation. For comparison, Pulsar Java client OAuth 2.0 authentication looks like this:

URL issuerUrl = new URL("https://dev-kt-aa9ne.us.auth0.com");
URL credentialsUrl = new URL("file:///path/to/KeyFile.json");
String audience = "https://dev-kt-aa9ne.us.auth0.com/api/v2/";
PulsarClient client = PulsarClient.builder()
    .serviceUrl("pulsar://broker.example.com:6650/")
    .authentication(
        AuthenticationFactoryOAuth2.clientCredentials(issuerUrl, credentialsUrl, audience))
    .build();

In short, KoP's OAuth 2 support consists of a client-side login callback handler and a default server-side callback handler. With plain Kafka you would need to write your own handler for OAuth 2 authentication; with KoP, because the implementation mirrors Pulsar's, no custom handler is needed and it works out of the box.

KoP 2.8.0 other developments

  • Ported Kafka's Transaction Coordinator. To enable transactions, add the following configuration:
    enableTransactionCoordinator=true
    brokerid=<id>
  • Added custom KoP metrics based on PrometheusRawMetricsProvider. This feature was contributed by Chen Hang of BIGO and provides the monitoring shown earlier.
  • Exposed advertised listeners to support proxying through the Envoy Kafka filter. Previously, an unfriendly aspect of KoP was that the configured listener had to be identical to the broker's advertised listener. The new version separates the listener from the advertised listener, which makes proxies possible; for example, an Envoy proxy can be used when deploying on the cloud.
  • Improved support for the Kafka AdminClient. This was overlooked in the past because everyone assumed Pulsar's admin API was enough. In practice, users are accustomed to the Kafka AdminClient, and some components they deploy have an AdminClient built in; if the protocol is not supported, those components cannot be used.

Near-term plan

Pulsar 2.8.0 is aiming for a release at the end of April. Before the official release, we plan to:

  1. Add more detailed metrics.
  2. Troubleshoot problems with continuous memory growth and full GC during stress testing.
  3. Conduct a more systematic performance test.
  4. Deal with recent feedback from the community.
