How Is Backpressure Handled in Stream Processing? A Deep Dive into Credit-Based Flow Control (with Source Code Analysis)

PowerData

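Editor's note:

An excellent article by PowerData member 阿丞 (A Cheng)!

The following article originally appeared on 阿丞的数据漫谈 (A Cheng's Data Talk), written by 阿阿丞.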


Preface

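To cope with backpressure, a common problem in stream processing, Flink has introduced several mechanisms over time, including TCP-based backpressure in early versions and dynamic scaling, as well as the credit-based flow control mechanism introduced in version 1.5. Together they aim to raise throughput, lower latency, and keep the system stable.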

What is backpressure in stream processing?

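In stream processing, backpressure occurs when data is consumed more slowly than it is produced; the system then throttles the flow to avoid data pile-up and overload. The core idea is that when a downstream node cannot keep up, a feedback mechanism propagates the pressure upstream until the data source (such as a Kafka source) ingests data at a lower rate.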
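The feedback loop described above can be illustrated with a bounded buffer between two threads. The sketch below is a plain-Java analogy, not Flink code: the class name BoundedBackpressureDemo, the capacity of 4 and the 1 ms processing delay are illustrative assumptions. Because ArrayBlockingQueue.put() blocks while the queue is full, the fast producer is throttled to the slow consumer's pace instead of accumulating an unbounded backlog.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Analogy only: a bounded queue between a fast producer and a slow consumer.
class BoundedBackpressureDemo {

    /**
     * Pushes {@code records} items through a queue with {@code capacity} slots to a
     * deliberately slow consumer; returns the largest queue size the producer observed.
     */
    static int run(int capacity, int records) throws InterruptedException {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(capacity);

        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < records; i++) {
                    queue.take();      // remove one record
                    Thread.sleep(1);   // simulate slow downstream processing
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        int maxQueued = 0;
        for (int i = 0; i < records; i++) {
            queue.put(i);              // blocks while the queue is full: this is the backpressure
            maxQueued = Math.max(maxQueued, queue.size());
        }
        consumer.join();
        return maxQueued;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("max queued = " + run(4, 50));
    }
}
```

Calling run(4, 50) returns a value of at most 4: the producer can never get more than four records ahead of the consumer, no matter how fast it is.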

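Main causes of backpressure:

Resource bottlenecks: insufficient CPU or memory, or a poorly chosen parallelism, limits an operator's processing capacity;
Data skew: some nodes receive far more data than others, causing local overload;
Code performance issues: complex logic (such as frequent I/O operations) or inefficient algorithms slow down processing;
Traffic spikes: sudden bursts (such as large promotional events) exceed the system's instantaneous capacity;
External system bottlenecks: for example, slow database writes or high network latency;
Clock synchronization problems: clocks that are out of sync across cluster nodes can lead to misjudged backpressure.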
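To make the data-skew case concrete, the sketch below routes keys to parallel subtasks by hash. It is a simplified, hypothetical model: Flink's keyBy actually hashes keys into key groups before assigning them to subtasks, whereas this sketch uses plain hashCode() modulo the parallelism. With one hot key accounting for 900 of 1,000 records, whichever subtask owns that key receives at least 900 records while the others share the rest, so that subtask becomes the bottleneck that triggers backpressure.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified hash partitioning (hashCode modulo parallelism); Flink's keyBy uses key groups.
class SkewSketch {

    /** Counts how many records each parallel subtask would receive. */
    static int[] partitionCounts(List<String> keys, int parallelism) {
        int[] counts = new int[parallelism];
        for (String key : keys) {
            counts[Math.floorMod(key.hashCode(), parallelism)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 900; i++) {
            keys.add("hot-user");            // one hot key, e.g. a very active account
        }
        for (int i = 0; i < 100; i++) {
            keys.add("user-" + i);           // a long tail of ordinary keys
        }
        System.out.println(Arrays.toString(partitionCounts(keys, 4)));
    }
}
```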

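Flink mainly uses the following means to deal with backpressure:

TCP-based backpressure: relies on TCP's own flow control. When the downstream falls behind the upstream's sending rate, the upstream slows down, which prevents the system from being overloaded.
Dynamic scaling: adjusts task parallelism according to the current load, spreading work over more compute nodes to raise processing capacity.
Flow control mechanism: flow control implemented in a layer above TCP; the receiver dynamically grants credit based on its available buffers, and the sender uses that credit to control precisely how much it sends.
Operator logic optimization: chains multiple operators (for example Filter and Map) into a single Task, reducing thread switching and serialization overhead.
Resource tuning: adds CPU, memory, and other resources as the job requires.
Network optimization: zero-copy transfers and efficient serialization (for example Flink's built-in serializers or Protobuf, which avoid the performance bottleneck of Java serialization).
Other optimizations (for example, handling data skew or tuning external systems).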

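Since version 1.5, Flink has relied mainly on this flow control mechanism, supplemented by the means above, to handle backpressure. The mechanism is called Credit-Based flow control: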

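Principle:

The mechanism is implemented in the network layer (covering flow control, buffer management, and related concerns). The receiver dynamically grants credit according to its locally available buffers; the credit states how much data it can accept. The sender uses the credit it receives to control precisely how much it sends, which avoids data pile-up and network congestion caused by a slow downstream node.

Each Receiver maintains a credit value that represents its available buffer space. When granting credit, the Receiver tells the Sender in a message how much data it can currently accept, measured in buffers. The Sender then sends exactly that much data, which prevents data from piling up.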
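The credit accounting described above can be sketched as a single-channel simulation. Everything here is hypothetical (the class CreditFlowSketch, its nested Receiver, sender state kept as two local variables, one record per buffer, and a consumer that processes one buffer per round); it only models the bookkeeping, not Flink's threads or Netty. The receiver starts with all its buffers as credit, each send consumes one credit, and each buffer the consumer frees becomes new credit, so sender credit plus queued buffers plus unannounced credit always equals the number of buffers.

```java
import java.util.ArrayDeque;

// Hypothetical single-channel model of credit bookkeeping; not Flink's classes.
class CreditFlowSketch {

    static final class Receiver {
        final int buffers;                                // receive buffers of this channel
        final ArrayDeque<Integer> queue = new ArrayDeque<>();
        int unannouncedCredit;                            // free buffers not yet reported

        Receiver(int buffers) {
            this.buffers = buffers;
            this.unannouncedCredit = buffers;             // every buffer is free at start
        }

        /** Hands the accumulated credit to the sender (an AddCredit-style message). */
        int announceCredit() {
            int credit = unannouncedCredit;
            unannouncedCredit = 0;
            return credit;
        }

        void onBuffer(int record) {
            if (queue.size() == buffers) {
                throw new IllegalStateException("sender exceeded its credit");
            }
            queue.add(record);
        }

        /** The task processes one buffer; the freed buffer becomes new credit. */
        void processOne() {
            if (queue.poll() != null) {
                unannouncedCredit++;
            }
        }
    }

    /** Runs send/process/announce rounds; returns the peak number of queued buffers. */
    static int simulate(int buffers, int totalRecords) {
        Receiver receiver = new Receiver(buffers);
        int senderCredit = 0;
        int nextRecord = 0;
        int peak = 0;
        while (nextRecord < totalRecords || !receiver.queue.isEmpty()) {
            senderCredit += receiver.announceCredit();
            while (senderCredit > 0 && nextRecord < totalRecords) {
                receiver.onBuffer(nextRecord++);          // each buffer sent costs one credit
                senderCredit--;
            }
            peak = Math.max(peak, receiver.queue.size());
            receiver.processOne();                        // slow consumer: one buffer per round
        }
        return peak;
    }
}
```

simulate(2, 10) returns 2: however eager the sender is, at most two buffers are ever queued at the receiver, and onBuffer() never throws.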

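Main improvements:

Solves the head-of-line blocking problem of traditional TCP-based backpressure and makes better use of network bandwidth: with TCP-based backpressure, one blocked channel stalls every logical channel multiplexed over the same TCP connection.
Supports zero-copy transfers and fine-grained flow control, reducing data copies between the JVM heap and off-heap memory.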
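Head-of-line blocking arises because Flink multiplexes many logical channels between two TaskManagers over one TCP connection. The toy model below contrasts the two approaches; its class and method names are hypothetical, buffers are labelled 'A' for a healthy channel and 'B' for a channel whose downstream task is stalled with a single free buffer. In the TCP-style variant the receiver reads the shared connection in order and must stop at the first B buffer it cannot store, so A's data behind it is stuck as well. In the credit-style variant the sender holds B's data back once B's credit is spent, and A keeps flowing. The model assumes A's consumer is fast enough that A never runs out of credit.

```java
import java.util.ArrayDeque;
import java.util.List;

// Toy model, not Flink code: channel A is healthy, channel B's task is stalled
// and B has room for exactly one more buffer at the receiver.
class HeadOfLineSketch {

    /** TCP style: the receiver drains the shared connection strictly in order. */
    static int deliveredToATcpStyle(List<Character> outgoing) {
        ArrayDeque<Character> connection = new ArrayDeque<>(outgoing);
        int bFreeSlots = 1;
        int deliveredA = 0;
        while (!connection.isEmpty()) {
            char channel = connection.peek();
            if (channel == 'B') {
                if (bFreeSlots == 0) {
                    break;             // B is full: reading stops, everything behind waits
                }
                bFreeSlots--;
            } else {
                deliveredA++;          // A's task consumes immediately
            }
            connection.poll();
        }
        return deliveredA;
    }

    /** Credit style: the sender only ships a buffer for a channel that has credit. */
    static int deliveredToACreditStyle(List<Character> outgoing) {
        int bCredit = 1;
        int deliveredA = 0;
        for (char channel : outgoing) {
            if (channel == 'B') {
                if (bCredit > 0) {
                    bCredit--;         // B's data goes out only while it has credit
                }                      // otherwise it waits in B's own sender-side queue
            } else {
                deliveredA++;          // assumption: A's fast consumer keeps re-granting credit
            }
        }
        return deliveredA;
    }
}
```

For the sequence B, A, B, A, A, the TCP-style variant delivers only 1 buffer to A, while the credit-style variant delivers all 3.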

Source code analysis

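All data transfer in Flink goes through network buffers, and network communication is built on Netty. Data to be sent is first written into a buffer requested from a buffer pool, and the buffer is recycled once it has been consumed. Through the interaction between ResultSubpartition and InputChannel, together with Netty message encoding and event-driven processing, backpressure takes effect quickly and resource contention is avoided. Compared with the TCP-based approach, this noticeably lowers backpressure latency and improves system stability.

The code discussed here lives in the org.apache.flink.runtime.io.network package.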
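Because every transfer goes through buffers, it helps to picture the pool they come from. Below is a minimal fixed-size pool, loosely inspired by (but much simpler than) Flink's LocalBufferPool and BufferRecycler; the class BufferPoolSketch and its onRecycle callback are hypothetical. Buffers are allocated once, then requested and recycled rather than created per send, and each recycle fires the callback, which is where a receiver could turn freed space into new credit.

```java
import java.util.ArrayDeque;
import java.util.function.IntConsumer;

// Hypothetical fixed-size buffer pool; far simpler than Flink's LocalBufferPool.
class BufferPoolSketch {
    private final ArrayDeque<byte[]> free = new ArrayDeque<>();
    private final IntConsumer onRecycle;   // e.g. "one more buffer free, announce credit"

    BufferPoolSketch(int numBuffers, int bufferSize, IntConsumer onRecycle) {
        for (int i = 0; i < numBuffers; i++) {
            free.add(new byte[bufferSize]);  // allocated once, up front
        }
        this.onRecycle = onRecycle;
    }

    /** Returns a pooled buffer, or null when the pool is exhausted. */
    byte[] request() {
        return free.poll();
    }

    /** Returns a buffer to the pool and notifies the listener. */
    void recycle(byte[] buffer) {
        free.add(buffer);
        onRecycle.accept(1);
    }

    int available() {
        return free.size();
    }
}
```

In this sketch a request() on an exhausted pool returns null instead of allocating, so the caller has to wait until some buffer is recycled.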

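Overall flow:

The consumer grants initial credit to the producer.
When sending data, the producer consumes credit and pauses once its credit runs out.
After processing data, the consumer frees local buffer space.
The producer receives new credit and resumes sending.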
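The four steps can be traced end to end with a toy, single-threaded simulation. The message names echo Flink's PartitionRequest, BufferResponse and AddCredit, but the class CreditProtocolTrace, the one-buffer-per-round consumer and the string trace are illustrative assumptions, not Flink's implementation. The backlog field mirrors the idea that each data buffer also tells the consumer how many buffers are still waiting at the producer.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Toy, single-threaded trace of the four steps; message names echo Flink's, logic is simplified.
class CreditProtocolTrace {

    static List<String> run(int exclusiveBuffers, int records) {
        List<String> trace = new ArrayList<>();
        ArrayDeque<Integer> producerQueue = new ArrayDeque<>();   // buffers waiting at the producer
        for (int i = 0; i < records; i++) {
            producerQueue.add(i);
        }
        ArrayDeque<Integer> consumerBuffers = new ArrayDeque<>(); // filled buffers at the consumer

        // Step 1: the consumer requests the subpartition, granting its initial credit.
        int producerCredit = exclusiveBuffers;
        trace.add("PartitionRequest(credit=" + exclusiveBuffers + ")");

        while (!producerQueue.isEmpty() || !consumerBuffers.isEmpty()) {
            // Step 2: the producer sends while it has credit; the backlog rides along.
            while (producerCredit > 0 && !producerQueue.isEmpty()) {
                int seq = producerQueue.poll();
                producerCredit--;
                consumerBuffers.add(seq);
                trace.add("BufferResponse(seq=" + seq + ", backlog=" + producerQueue.size() + ")");
            }
            if (producerCredit == 0 && !producerQueue.isEmpty()) {
                trace.add("producer paused");
            }

            // Step 3: the consumer processes one buffer and frees it.
            consumerBuffers.poll();

            // Step 4: the freed buffer is announced as new credit, so the producer can resume.
            if (!producerQueue.isEmpty()) {
                producerCredit++;
                trace.add("AddCredit(1)");
            }
        }
        return trace;
    }
}
```

With run(2, 3) the trace is: PartitionRequest(credit=2), BufferResponse(seq=0, backlog=2), BufferResponse(seq=1, backlog=1), producer paused, AddCredit(1), BufferResponse(seq=2, backlog=0).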

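The consumer grants initial credit to the producer

When the consumer starts, requestSubpartitions() obtains a partition request client for the producer's connection and requests the subpartitions it consumes. Note that the final argument 0 passed to requestSubpartition() is a request delay in milliseconds (delayMs), not an initial credit of 0: the initial credit travels inside the PartitionRequest message and equals the number of exclusive buffers per channel (taskmanager.network.memory.buffers-per-channel, 2 by default).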

```java
package org.apache.flink.runtime.io.network.partition.consumer;

public class RemoteInputChannel extends InputChannel {

    public void requestSubpartitions() throws IOException, InterruptedException {
        if (partitionRequestClient == null) {
            LOG.debug(
                    "{}: Requesting REMOTE subpartitions {} of partition {}. {}",
                    this,
                    consumedSubpartitionIndexSet,
                    partitionId,
                    channelStatePersister);
            // Create a client and request the partition
            try {
                partitionRequestClient =
                        connectionManager.createPartitionRequestClient(connectionId);
            } catch (IOException e) {
                // IOExceptions indicate that we could not open a connection to the remote
                // TaskExecutor
                throw new PartitionConnectionException(partitionId, e);
            }

            // Send the PartitionRequest; the initial credit is carried inside the request.
            // The last argument (0) is the request delay in milliseconds, not a credit value.
            partitionRequestClient.requestSubpartition(
                    partitionId, consumedSubpartitionIndexSet, this, 0);
        }
    }
}
```

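When sending data, the producer consumes credit and pauses once its credit runs out

The excerpt below shows the AddCredit message, through which the consumer announces new credit to the producer. On the producer side, each data buffer sent consumes one unit of the accumulated credit, and the producer stops sending on that channel when the credit reaches zero.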

```java
package org.apache.flink.runtime.io.network.netty;

public abstract class NettyMessage {

    static class AddCredit extends NettyMessage {

        // ID (the message type constant) is declared in the full source
        final int credit;
        final InputChannelID receiverId;

        // Builds a new AddCredit message; credit must be > 0
        AddCredit(int credit, InputChannelID receiverId) {
            checkArgument(credit > 0, "The announced credit should be greater than 0");
            this.credit = credit;
            this.receiverId = receiverId;
        }

        // Writes the message to Netty: frame header (with the message ID), credit, receiver ID
        @Override
        void write(ChannelOutboundInvoker out, ChannelPromise promise, ByteBufAllocator allocator)
                throws IOException {
            ByteBuf result = null;
            try {
                result =
                        allocateBuffer(
                                allocator, ID, Integer.BYTES + InputChannelID.getByteBufLength());
                result.writeInt(credit);
                receiverId.writeTo(result);
                out.write(result, promise);
            } catch (Throwable t) {
                handleException(result, null, t);
            }
        }

        // Reads an AddCredit message from the buffer
        static AddCredit readFrom(ByteBuf buffer) {
            // Read the credit
            int credit = buffer.readInt();
            // Read the receiver (input channel) ID
            InputChannelID receiverId = InputChannelID.fromByteBuf(buffer);
            // Return a new AddCredit message instance
            return new AddCredit(credit, receiverId);
        }
    }
}
```
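The AddCredit payload is small: an int credit followed by the receiver channel ID. The sketch below encodes and decodes an equivalent payload with java.nio.ByteBuffer. It is a hypothetical stand-in, not Flink's codec: the channel ID is assumed to be 16 bytes and is modelled with java.util.UUID (two longs) instead of InputChannelID, and the frame header that allocateBuffer() writes is omitted. As in the excerpt, a non-positive credit is rejected when the message is constructed.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

// Hypothetical AddCredit-style payload codec; UUID stands in for InputChannelID.
class AddCreditCodecSketch {

    /** credit (int) + channel ID (two longs). */
    static final int PAYLOAD_LENGTH = Integer.BYTES + 2 * Long.BYTES;

    static final class AddCredit {
        final int credit;
        final UUID receiverId;

        AddCredit(int credit, UUID receiverId) {
            if (credit <= 0) {
                throw new IllegalArgumentException("The announced credit should be greater than 0");
            }
            this.credit = credit;
            this.receiverId = receiverId;
        }
    }

    /** Encodes the payload; the returned buffer is positioned for reading. */
    static ByteBuffer write(AddCredit msg) {
        ByteBuffer buf = ByteBuffer.allocate(PAYLOAD_LENGTH);
        buf.putInt(msg.credit);
        buf.putLong(msg.receiverId.getMostSignificantBits());
        buf.putLong(msg.receiverId.getLeastSignificantBits());
        buf.flip();   // switch from writing to reading
        return buf;
    }

    /** Decodes the fields in the same order they were written. */
    static AddCredit readFrom(ByteBuffer buf) {
        int credit = buf.getInt();
        UUID receiverId = new UUID(buf.getLong(), buf.getLong());
        return new AddCredit(credit, receiverId);
    }
}
```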

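After processing data, the consumer frees local buffer space

The excerpt below is RemoteInputChannel.onBuffer(), which takes each incoming buffer into receivedBuffers. Every buffer also carries the sender's backlog, and onSenderBacklog(backlog) uses it to request floating buffers, which become new credit. Buffers that the task has consumed are recycled back to the channel, and the freed space is likewise turned into credit for the producer.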

```java
package org.apache.flink.runtime.io.network.partition.consumer;

public class RemoteInputChannel extends InputChannel {

    /**
     * Handles the input buffer. This method is taking over the ownership of the buffer and is fully
     * responsible for cleaning it up both on the happy path and in case of an error.
     */
    public void onBuffer(Buffer buffer, int sequenceNumber, int backlog, int subpartitionId)
            throws IOException {
        boolean recycleBuffer = true;

        try {
            // Check that the incoming sequenceNumber matches expectedSequenceNumber
            if (expectedSequenceNumber != sequenceNumber) {
                onError(new BufferReorderingException(expectedSequenceNumber, sequenceNumber));
                return;
            }

            // If the buffer's data type blocks upstream, call onBlockingUpstream() and
            // require the backlog to be 0; otherwise an IllegalArgumentException is thrown.
            if (buffer.getDataType().isBlockingUpstream()) {
                onBlockingUpstream();
                checkArgument(backlog == 0, "Illegal number of backlog: %s, should be 0.", backlog);
            }

            final boolean wasEmpty;
            boolean firstPriorityEvent = false;
            // Synchronize on receivedBuffers for thread safety
            synchronized (receivedBuffers) {
                NetworkActionsLogger.traceInput(
                        "RemoteInputChannel#onBuffer",
                        buffer,
                        inputGate.getOwningTaskName(),
                        channelInfo,
                        channelStatePersister,
                        sequenceNumber);
                // Similar to notifyBufferAvailable(), make sure that we never add a buffer
                // after releaseAllResources() released all buffers from receivedBuffers
                // (see above for details).
                if (isReleased.get()) {
                    return;
                }

                wasEmpty = receivedBuffers.isEmpty();

                SequenceBuffer sequenceBuffer =
                        new SequenceBuffer(buffer, sequenceNumber, subpartitionId);
                DataType dataType = buffer.getDataType();
                if (dataType.hasPriority()) {
                    firstPriorityEvent = addPriorityBuffer(sequenceBuffer);
                    recycleBuffer = false;
                } else {
                    receivedBuffers.add(sequenceBuffer);
                    recycleBuffer = false;
                    if (dataType.requiresAnnouncement()) {
                        firstPriorityEvent = addPriorityBuffer(announce(sequenceBuffer));
                    }
                }
                totalQueueSizeInBytes += buffer.getSize();
                final OptionalLong barrierId =
                        channelStatePersister.checkForBarrier(sequenceBuffer.buffer);
                if (barrierId.isPresent() && barrierId.getAsLong() > lastBarrierId) {
                    // checkpoint was not yet started by task thread,
                    // so remember the numbers of buffers to spill for the time when
                    // it will be started
                    lastBarrierId = barrierId.getAsLong();
                    lastBarrierSequenceNumber = sequenceBuffer.sequenceNumber;
                }
                channelStatePersister.maybePersist(buffer);
                ++expectedSequenceNumber;
            }

            // If this is the first priority event in the queue, notify it
            if (firstPriorityEvent) {
                notifyPriorityEvent(sequenceNumber);
            }
            // If the queue was empty before, notify that the channel is now non-empty
            if (wasEmpty) {
                notifyChannelNonEmpty();
            }
            // A non-negative backlog is passed on, which may request floating buffers and announce credit
            if (backlog >= 0) {
                onSenderBacklog(backlog);
            }
        } finally {
            // Cleanup: recycleBuffer starts as true and is set to false once the buffer has been
            // enqueued, so only a buffer that was not taken over is recycled here
            if (recycleBuffer) {
                buffer.recycleBuffer();
            }
        }
    }
}
```
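The backlog handed to onSenderBacklog() is what lets the consumer grant more credit than its exclusive buffers alone would allow. In Flink's RemoteInputChannel, onSenderBacklog() asks the channel's buffer manager for floating buffers until roughly backlog + initialCredit buffers are available, and newly obtained buffers are announced to the producer as credit. The sketch below models only that arithmetic; the class BacklogCreditSketch and its fields are hypothetical, and receiving or recycling data is left out.

```java
// Hypothetical model of backlog-driven floating-buffer requests (not Flink code).
class BacklogCreditSketch {
    final int initialCredit;   // exclusive buffers per channel, announced in the PartitionRequest
    int availableBuffers;      // buffers this channel could receive into right now
    int floatingPool;          // floating buffers still free in the gate-wide pool
    int unannouncedCredit;     // newly obtained buffers not yet announced via AddCredit

    BacklogCreditSketch(int initialCredit, int floatingPool) {
        this.initialCredit = initialCredit;
        this.availableBuffers = initialCredit;
        this.floatingPool = floatingPool;
    }

    /** Called with the backlog carried by each incoming buffer; returns buffers obtained. */
    int onSenderBacklog(int backlog) {
        int target = backlog + initialCredit;
        int obtained = 0;
        while (availableBuffers < target && floatingPool > 0) {
            floatingPool--;            // borrow one floating buffer from the shared pool
            availableBuffers++;
            obtained++;
        }
        unannouncedCredit += obtained; // each new buffer becomes credit for the producer
        return obtained;
    }
}
```

With initialCredit = 2 and 8 floating buffers, onSenderBacklog(5) obtains 5 floating buffers, and a following onSenderBacklog(20) can only get the remaining 3, leaving 8 units of credit to announce.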

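The producer receives new credit and resumes sending

The excerpt below comes from the consumer-side CreditBasedPartitionRequestClientHandler. It sends queued credit announcements (AddCredit) or resume-consumption requests to the producer one at a time, writing the next message only after the previous write has completed. Once the producer receives an AddCredit message, it adds the credit to that channel and resumes sending.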

```java
package org.apache.flink.runtime.io.network.netty;

class CreditBasedPartitionRequestClientHandler extends ChannelInboundHandlerAdapter
        implements NetworkClientHandler {

    /** Messages to be sent to the producers (credit announcement or resume consumption request). */
    private final ArrayDeque<ClientOutboundMessage> clientOutboundMessages = new ArrayDeque<>();

    /**
     * Tries to write&flush unannounced credits for the next input channel in queue.
     *
     * <p>This method may be called by the first input channel enqueuing, or the complete future's
     * callback in previous input channel, or the channel writability changed event.
     */
    private void writeAndFlushNextMessageIfPossible(Channel channel) {
        if (channelError.get() != null || !channel.isWritable()) {
            return;
        }

        // Process the queued messages
        while (true) {
            ClientOutboundMessage outboundMessage = clientOutboundMessages.poll();

            // The input channel may be null because of the write callbacks
            // that are executed after each write.
            if (outboundMessage == null) {
                return;
            }

            // It is no need to notify credit or resume data consumption for the released channel.
            if (!outboundMessage.inputChannel.isReleased()) {
                Object msg = outboundMessage.buildMessage();
                if (msg == null) {
                    continue;
                }

                // Write and flush and wait until this is done before
                // trying to continue with the next input channel.
                channel.writeAndFlush(msg).addListener(writeListener);

                return;
            }
        }
    }
}
```
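The write-one-then-continue pattern in writeAndFlushNextMessageIfPossible() can be modelled without Netty. In the hypothetical sketch below, OutboundQueueSketch stands in for the handler, a list of strings stands in for channel.writeAndFlush(), a boolean stands in for channel.isWritable(), and drain() plays the role of the write listener calling back after each completed write. As in the excerpt, messages for released channels and messages whose builder returns null are skipped.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical model of the handler's outbound queue; not Flink or Netty code.
class OutboundQueueSketch {

    /** One queued message: whether its input channel is released, and how to build it. */
    static final class Outbound {
        final boolean channelReleased;
        final Supplier<String> builder;   // may return null when there is nothing to send

        Outbound(boolean channelReleased, Supplier<String> builder) {
            this.channelReleased = channelReleased;
            this.builder = builder;
        }
    }

    private final ArrayDeque<Outbound> queue = new ArrayDeque<>();
    final List<String> written = new ArrayList<>();   // stands in for the Netty channel
    boolean writable = true;                         // stands in for channel.isWritable()

    void enqueue(Outbound message) {
        queue.add(message);
    }

    /** Writes at most one message; returns true if something was written. */
    boolean writeNextIfPossible() {
        if (!writable) {
            return false;
        }
        while (true) {
            Outbound next = queue.poll();
            if (next == null) {
                return false;                        // queue drained
            }
            if (next.channelReleased) {
                continue;                            // no credit needed for a released channel
            }
            String msg = next.builder.get();
            if (msg == null) {
                continue;                            // nothing left to announce for this channel
            }
            written.add(msg);                        // real code: writeAndFlush(msg).addListener(...)
            return true;
        }
    }

    /** Simulates the completion listener re-invoking the writer after each write. */
    void drain() {
        while (writeNextIfPossible()) {
            // keep going until nothing is left to write
        }
    }
}
```

Enqueuing a normal credit message, a message for a released channel, a message that builds to null, and a resume request results in only the first and the last being written, in order.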