
Apache Kafka 3.0 was officially released yesterday. It is a major release that touches many areas. In this version, Apache Kafka 3.0 introduces a variety of new features, breaking API changes, and improvements to KRaft: Apache Kafka's built-in consensus mechanism that will replace Apache ZooKeeper™.

Although KRaft is not yet recommended for production (there is a list of known gaps), we have made many improvements to the KRaft metadata and APIs. Exactly-once semantics and partition reassignment support are worth highlighting. We encourage developers to check out KRaft's new features and try it out in development environments.

Also, starting from Apache Kafka 3.0, producers enable the strongest delivery guarantees by default (acks=all, enable.idempotence=true), which means that users now get ordering and durability by default. In addition, don't miss the Kafka Connect task restart enhancements, the Kafka Streams timestamp synchronization improvements, and MirrorMaker2's more flexible configuration options.

Routine changes

  • KIP-750: Deprecation of Java 8 support in Kafka

In 3.0, all components of the Apache Kafka project have deprecated support for Java 8. This gives users time to adjust before the next major version (4.0), when Java 8 support will be removed.

  • KIP-751: Deprecation of Scala 2.12 support in Kafka

Support for Scala 2.12 is also deprecated in Apache Kafka 3.0. As with Java 8, we give users time to adapt because we plan to remove support for Scala 2.12 in the next major version (4.0).

Kafka broker, producer, consumer and admin client

  • KIP-630: Kafka Raft snapshot

One of the main features introduced in 3.0 is the ability of KRaft controllers and KRaft brokers to generate, replicate, and load snapshots for the metadata topic partition named __cluster_metadata. The Kafka cluster uses this topic to store and replicate metadata about the cluster, such as broker configuration, topic partition assignments, leadership, and so on. As this state grows, Kafka Raft snapshots provide an efficient way to store, load, and replicate this information.

  • KIP-746: Modify KRaft metadata records

Experience and continued development since the first release of the Kafka Raft controller have shown that some metadata record types need to be modified when Kafka is configured to run without ZooKeeper (ZK).

  • KIP-730: Producer ID generation in KRaft mode

In 3.0 and KIP-730, the Kafka controller now completely takes over responsibility for generating Kafka producer IDs. The controller does this in both ZK and KRaft modes. This brings us closer to the bridge release, which will allow users to transition from Kafka deployments that use ZK to new deployments that use KRaft.

  • KIP-679: Producer will enable the strongest delivery guarantee by default

Starting from 3.0, Kafka producers turn on idempotence and acknowledgment from all replicas (acks=all) by default. This makes record delivery guarantees stronger by default.
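
As a minimal sketch of what these defaults mean in client code (the topic name and bootstrap server are placeholders), a 3.0 producer behaves as if it were configured like this:

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class StrongDefaultsExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        // These two settings are the new 3.0 defaults, spelled out for clarity.
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("demo-topic", "key", "value"));
        }
    }
}
```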

  • KIP-735: Increase the default consumer session timeout

The default value of the Kafka consumer configuration property session.timeout.ms has been increased from 10 seconds to 45 seconds. This allows consumers to better tolerate transient network failures by default and avoids consecutive rebalances when a consumer only appears to be leaving the group temporarily.
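
For illustration, here is a sketch of a consumer configuration that overrides the new default, for applications that need a different value (the group name and bootstrap server are placeholders):

```java
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;

Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");
// session.timeout.ms now defaults to 45000 ms; set it explicitly only if needed.
props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, 30000);
```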

  • KIP-709: Extend OffsetFetch request to accept multiple group IDs

Fetching the current offsets of a Kafka consumer group has been possible for a while. However, obtaining the offsets of multiple consumer groups required a separate request for each group. In 3.0 and KIP-709, the fetch and AdminClient APIs were extended to support reading the offsets of multiple consumer groups at the same time, within a single request/response.
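
Below is a hedged sketch of the multi-group lookup. The Map-based Admin overload and ListConsumerGroupOffsetsSpec shown here follow the direction of the KIP but may not be available in every client version, so treat them as an assumption to verify; the group names and bootstrap server are placeholders.

```java
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.ListConsumerGroupOffsetsResult;
import org.apache.kafka.clients.admin.ListConsumerGroupOffsetsSpec;

public class MultiGroupOffsetsExample {
    public static void main(String[] args) throws Exception {
        Properties conf = new Properties();
        conf.put("bootstrap.servers", "localhost:9092");
        try (Admin admin = Admin.create(conf)) {
            // One request/response round trip covers both groups (assumed overload).
            ListConsumerGroupOffsetsResult result = admin.listConsumerGroupOffsets(Map.of(
                    "group-a", new ListConsumerGroupOffsetsSpec(),
                    "group-b", new ListConsumerGroupOffsetsSpec()));
            result.partitionsToOffsetAndMetadata("group-a").get()
                  .forEach((tp, oam) -> System.out.println(tp + " -> " + oam.offset()));
        }
    }
}
```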

  • KIP-699: Update FindCoordinator to resolve multiple Coordinators at once

Supporting operations that can be applied to multiple consumer groups at once in an efficient way depends largely on the client's ability to efficiently discover the coordinators of those groups. This becomes possible with KIP-699, which adds support for discovering the coordinators of multiple groups in a single request. Kafka clients have been updated to use this optimization when talking to newer Kafka brokers that support this request.

  • KIP-724: Remove support for message format v0 and v1

Message format v2 has been the default message format since Kafka 0.11.0 was launched in June 2017, four years ago. Therefore, with enough water (or streams) having flowed under the bridge, the 3.0 major release gives us a good opportunity to deprecate the older message formats, i.e. v0 and v1. These formats are rarely used today. In 3.0, users will receive a warning if they configure their brokers to use message format v0 or v1. This option will be removed in Kafka 4.0 (for details and the impact of deprecating the v0 and v1 message formats, see KIP-724).

  • KIP-707: The future of KafkaFuture

When KafkaFuture was introduced to facilitate the implementation of the Kafka AdminClient, versions before Java 8 were still widely used and Kafka officially supported Java 7. Fast forward a few years, and Kafka now runs on Java versions that support the CompletionStage and CompletableFuture class types. With KIP-707, KafkaFuture adds a method that returns a CompletionStage object, enhancing KafkaFuture's usability in a backward-compatible way.
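
As a small sketch (the bootstrap server is a placeholder), the new method lets AdminClient results participate in standard CompletionStage pipelines:

```java
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;

public class KafkaFutureBridgeExample {
    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.put("bootstrap.servers", "localhost:9092");
        try (Admin admin = Admin.create(conf)) {
            // Bridge the KafkaFuture into a CompletionStage and compose on it.
            admin.describeCluster().nodes().toCompletionStage()
                 .thenAccept(nodes -> System.out.println("cluster has " + nodes.size() + " nodes"))
                 .toCompletableFuture()
                 .join();
        }
    }
}
```
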
  • KIP-466: Add support for List serialization and deserialization

KIP-466 adds new classes and methods for the serialization and deserialization of generic lists, a feature that is very useful for both Kafka clients and Kafka Streams.
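
A brief sketch of the new serde factory (the topic name is a placeholder):

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;

// A serde for List<Integer> backed by ArrayList, with Integer as the inner serde.
Serde<List<Integer>> listSerde = Serdes.ListSerde(ArrayList.class, Serdes.Integer());
byte[] bytes = listSerde.serializer().serialize("demo-topic", List.of(1, 2, 3));
List<Integer> roundTripped = listSerde.deserializer().deserialize("demo-topic", bytes);
```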

  • KIP-734: Improve AdminClient.listOffsets to return the offset and timestamp of the record with the largest timestamp

The users' ability to list the offsets of Kafka topics/partitions has been extended. With KIP-734, users can now ask the AdminClient to return the offset and timestamp of the record with the highest timestamp in a topic/partition. (This is not to be confused with the latest offset that the AdminClient already returns, which is the offset of the next record to be written to the topic/partition.) This extension of the existing ListOffsets API allows users to probe the liveliness of a partition by asking for the offset and the timestamp of the most recently written record.
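
A sketch of the new OffsetSpec in use (the topic, partition, and bootstrap server are placeholders):

```java
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.ListOffsetsResult.ListOffsetsResultInfo;
import org.apache.kafka.clients.admin.OffsetSpec;
import org.apache.kafka.common.TopicPartition;

public class MaxTimestampExample {
    public static void main(String[] args) throws Exception {
        Properties conf = new Properties();
        conf.put("bootstrap.servers", "localhost:9092");
        try (Admin admin = Admin.create(conf)) {
            TopicPartition tp = new TopicPartition("demo-topic", 0);
            // Ask for the offset and timestamp of the record with the highest timestamp.
            ListOffsetsResultInfo info = admin
                    .listOffsets(Map.of(tp, OffsetSpec.maxTimestamp()))
                    .partitionResult(tp)
                    .get();
            System.out.printf("offset=%d timestamp=%d%n", info.offset(), info.timestamp());
        }
    }
}
```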

Kafka Connect

  • KIP-745: Connect API to restart connectors and tasks

In Kafka Connect, a connector is represented at runtime as a group of one Connector class instance and one or more Task class instances, and most operations on connectors available through the Connect REST API can be applied to the group as a whole. A notable exception from the beginning has been the restart endpoints for Connector and Task instances: to restart the whole connector, users had to make separate calls to restart the Connector instance and each Task instance. In 3.0, KIP-745 gives users the ability to restart either all or only the failed Connector and Task instances of a connector with a single call. This feature is additive, and the previous behavior of the restart REST API remains unchanged.
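
For illustration, a sketch of the single REST call using the query parameters described by the KIP (the connector name and worker URL are placeholders):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ConnectorRestartExample {
    public static void main(String[] args) throws Exception {
        // Restart the connector and only its failed tasks in one call.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors/my-connector/restart"
                        + "?includeTasks=true&onlyFailed=true"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```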

  • KIP-738: Remove Connect's internal converter properties

After being deprecated in a previous major release (Apache Kafka 2.0), internal.key.converter and internal.value.converter have been removed as configuration properties and prefixes in the configuration of Connect workers. Going forward, the internal Connect topics will exclusively use the JsonConverter to store records without embedded schemas. Any existing Connect clusters that use a different converter must migrate their internal topics to the new format (see KIP-738 for details on the upgrade path).

  • KIP-722: Connector client override is enabled by default

Starting with Apache Kafka 2.3.0, Connect workers could be configured to allow connector configurations to override the Kafka client properties used by a connector. This has been a widely used feature, and a major release offers the opportunity to enable overriding connector client properties by default (connector.client.config.override.policy now defaults to All).

  • KIP-721: Enable connector log contexts in the Connect Log4j configuration

Another feature that was introduced in 2.3.0 but has not been enabled by default until now is the connector log context. This changes in 3.0: the connector context is added by default to the log4j log pattern of Connect workers. Upgrading to 3.0 from a previous release will change the format of log lines exported via log4j, adding connector contexts where appropriate.

Kafka Streams

  • KIP-695: Further improve Kafka Streams timestamp synchronization

KIP-695 enhances the semantics of how Streams tasks choose to fetch records and expands the meaning of the configuration property max.task.idle.ms and its available values. This change required a new method in the Kafka consumer API, currentLag, which can return the consumer lag for a specific partition if it is known locally, without contacting the Kafka broker.
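
A small sketch of the new consumer method (it assumes an already-constructed and subscribed consumer; the topic and partition are placeholders):

```java
import java.util.OptionalLong;

import org.apache.kafka.common.TopicPartition;

// Assumes `consumer` is an existing KafkaConsumer<String, String>.
// currentLag answers from locally cached metadata and is empty when unknown.
OptionalLong lag = consumer.currentLag(new TopicPartition("demo-topic", 0));
lag.ifPresent(l -> System.out.println("locally known lag: " + l));
```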

  • KIP-715: Expose committed offsets in Streams

Starting from 3.0, three new methods have been added to the TaskMetadata interface: committedOffsets, endOffsets, and timeCurrentIdlingStarted. These methods allow Streams applications to track the progress and health of their tasks.
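
As a sketch (assuming a running KafkaStreams instance named streams, accessed through the metadataForLocalThreads() API introduced alongside KIP-744):

```java
import org.apache.kafka.streams.ThreadMetadata;

// Walk every active task on the local threads and report its progress.
for (ThreadMetadata thread : streams.metadataForLocalThreads()) {
    thread.activeTasks().forEach(task -> {
        System.out.println(task.taskId() + " committed: " + task.committedOffsets());
        System.out.println(task.taskId() + " end offsets: " + task.endOffsets());
        task.timeCurrentIdlingStarted()
            .ifPresent(t -> System.out.println(task.taskId() + " idling since " + t));
    });
}
```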

  • KIP-740: Clean up public API TaskId

KIP-740 represents a significant overhaul of the TaskId class. Several methods and all internal fields have been deprecated. The new subtopology() and partition() methods replace the old topicGroupId and partition fields (see KIP-744 for related changes and amendments to KIP-740).

  • KIP-744: Migrate TaskMetadata and ThreadMetadata to an interface with internal implementation

KIP-744 takes the changes proposed by KIP-740 a step further and separates the implementation from the public API for a number of classes. To achieve this, the new interfaces TaskMetadata, ThreadMetadata, and StreamsMetadata were introduced, and the existing classes with the same names were deprecated.

  • KIP-666: Add Instant based method to ReadOnlySessionStore

The Interactive Queries API is extended with a set of new methods in the ReadOnlySessionStore and SessionStore interfaces that accept parameters of the Instant data type. This change affects any custom read-only interactive query session store implementations, which need to implement the new methods.
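
A hedged sketch of one of the Instant-based overloads (it assumes a ReadOnlySessionStore<String, Long> named store obtained via KafkaStreams#store, and the exact method signature should be checked against the KIP; the key and time range are placeholders):

```java
import java.time.Instant;

import org.apache.kafka.streams.kstream.Windowed;
import org.apache.kafka.streams.state.KeyValueIterator;

// Find sessions for a key within an Instant-based time range (assumed overload).
Instant now = Instant.now();
try (KeyValueIterator<Windowed<String>, Long> sessions =
         store.findSessions("user-42", now.minusSeconds(3600), now)) {
    sessions.forEachRemaining(kv ->
        System.out.println(kv.key.window() + " -> " + kv.value));
}
```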

  • KIP-622: Add currentSystemTimeMs and currentStreamTimeMs to ProcessorContext

ProcessorContext adds two new methods in 3.0: currentSystemTimeMs and currentStreamTimeMs. The new methods enable users to query the cached system time and the stream time, respectively, and to use them in a uniform way in production and test code.
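
A minimal sketch of a processor using the new methods (written against the current Processor API; the forwarding logic is illustrative only):

```java
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;

public class TimeStampingProcessor implements Processor<String, String, String, String> {
    private ProcessorContext<String, String> context;

    @Override
    public void init(ProcessorContext<String, String> context) {
        this.context = context;
    }

    @Override
    public void process(Record<String, String> record) {
        long systemTime = context.currentSystemTimeMs(); // cached wall-clock time
        long streamTime = context.currentStreamTimeMs(); // current stream time
        context.forward(record.withValue(
                record.value() + " @" + systemTime + "/" + streamTime));
    }
}
```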

  • KIP-743: Remove the configuration values 0.10.0-2.4 of the Streams built-in metrics version config

Support for the legacy metrics structure of Streams' built-in metrics has been removed in 3.0. KIP-743 removes the values 0.10.0-2.4 from the configuration property built.in.metrics.version. latest is currently the only valid value of this property (it has been the default since 2.5).

  • KIP-741: Change the default SerDe to null

The previous default value for the default SerDe properties has been removed. Streams used to default to ByteArraySerde. Starting with 3.0 there is no default, and users are required to either set their SerDes as needed in the API, or set defaults via DEFAULT_KEY_SERDE_CLASS_CONFIG and DEFAULT_VALUE_SERDE_CLASS_CONFIG in their Streams configuration. The previous default was almost always unsuitable for real applications and caused more confusion than convenience.
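
A sketch of setting explicit defaults in the Streams configuration (the application id and bootstrap server are placeholders):

```java
import java.util.Properties;

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "demo-app");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
// Since 3.0 there is no built-in default serde; set explicit defaults here...
props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());
// ...or pass serdes per operation, e.g. via Consumed.with(...) / Produced.with(...).
```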

  • KIP-733: Change the default replication factor configuration of Kafka Streams

Taking the opportunity of a major release, the default value of the Streams configuration property replication.factor is changed from 1 to -1. This allows new Streams applications to use the default replication factor defined at the Kafka broker, so this configuration value does not need to be set when moving to production. Note that the new default requires Kafka brokers of version 2.5 or newer.

  • KIP-732: Deprecate eos-alpha and replace eos-beta with eos-v2

Another Streams configuration value deprecated in 3.0 is exactly_once as a value of the property processing.guarantee. The value exactly_once corresponds to the original implementation of Exactly Once Semantics (EOS), usable by any Streams application connected to a Kafka cluster of version 0.11.0 or newer. This first implementation of EOS has been superseded by a second implementation of EOS, represented by the value exactly_once_beta in the processing.guarantee property. Going forward, the name exactly_once_beta is also deprecated and replaced by the new name exactly_once_v2. In the next major release (4.0), both exactly_once and exactly_once_beta will be removed, leaving exactly_once_v2 as the only option for EOS delivery guarantees.
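
The migration itself is a one-line configuration change, sketched here:

```java
import java.util.Properties;

import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
// Replace the deprecated "exactly_once" / "exactly_once_beta" values with the new one.
props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);
```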

  • KIP-725: Optimize the configuration of WindowedSerializer and WindowedDeserializer

The configuration properties default.windowed.key.serde.inner and default.windowed.value.serde.inner are deprecated and replaced by a single new property, windowed.inner.class.serde, for use by the consumer client. Kafka Streams users are recommended to configure their windowed SerDes by passing them into the SerDe constructor, and then providing the SerDe wherever it is used in the topology.
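
A sketch of the recommended constructor-based approach for Streams users (the window size is a placeholder):

```java
import java.time.Duration;

import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.streams.kstream.Windowed;
import org.apache.kafka.streams.kstream.WindowedSerdes;

// Build the windowed serde explicitly instead of relying on the
// deprecated default.windowed.*.serde.inner configuration properties.
long windowSizeMs = Duration.ofMinutes(5).toMillis();
Serde<Windowed<String>> windowedSerde =
        WindowedSerdes.timeWindowedSerdeFrom(String.class, windowSizeMs);
```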

  • KIP-633: Deprecation of the 24-hour default value of grace period in Streams

In Kafka Streams, window operations are allowed to process records outside of their windows according to a configuration property called the grace period. Previously, this configuration was optional and easy to miss, resulting in the default of 24 hours. This caused frequent confusion for users of the Suppression operator, because it buffers records until the grace period elapses, thus introducing a 24-hour delay. In 3.0, the Windows classes are enhanced with factory methods that require them to be constructed with either a custom grace period or no grace period at all. The old factory methods with the default 24-hour grace period have been deprecated, along with the corresponding grace() APIs that set this configuration and are incompatible with the new factory methods.
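
For example, the new TimeWindows factory methods make the choice explicit (the durations are placeholders):

```java
import java.time.Duration;

import org.apache.kafka.streams.kstream.TimeWindows;

// Either state the grace period explicitly...
TimeWindows withGrace =
        TimeWindows.ofSizeAndGrace(Duration.ofMinutes(5), Duration.ofSeconds(30));
// ...or opt out of it entirely; the old of()/grace() combination is deprecated.
TimeWindows noGrace = TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5));
```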

  • KIP-623: Add the --internal-topics option to the Streams application reset tool

Use of the Streams application reset tool, kafka-streams-application-reset, becomes more flexible with the addition of a new command-line parameter: --internal-topics. The new parameter accepts a comma-separated list of topic names that correspond to internal topics the tool can schedule for deletion. Combining this new parameter with the existing --dry-run parameter allows users to confirm which topics will be deleted, and to specify a subset of them if necessary, before actually performing the delete operation.

MirrorMaker

  • KIP-720: Deprecate MirrorMaker v1

In 3.0, the first version of MirrorMaker is deprecated. Going forward, development of new features and major improvements will focus on MirrorMaker 2 (MM2).

  • KIP-716: Allow configuring the location of the offset-syncs topic with MirrorMaker2

In 3.0, users can now configure where MirrorMaker2 creates and stores the internal topic it uses to translate consumer group offsets. This allows users of MirrorMaker2 to keep the source Kafka cluster as a strictly read-only cluster and to use a different Kafka cluster to store the offset records (i.e., the target Kafka cluster, or even a third cluster beyond the source and target clusters).

Original link: https://blogs.apache.org/kafka/entry/what-s-new-in-apache6

