When a StreamGraph is translated into a JobGraph, Flink generates an OperatorID for every operator and passes it into the corresponding JobVertex. A JobVertex is a node in the JobGraph; each JobVertex contains either a single operator or an operator chain of several operators chained together. If a JobVertex contains a single operator, its id is that operator's OperatorID; if it contains an operator chain, its id is the OperatorID of the chain's head operator. Each OperatorID uniquely identifies one operator, and during state recovery Flink locates the state belonging to a node by its OperatorID.
Entry point
As mentioned above, OperatorIDs are generated while the StreamGraph is translated into a JobGraph; the entry point is StreamingJobGraphGenerator#createJobGraph:
```java
// Generate deterministic hashes for the nodes in order to identify them across
// submission iff they didn't change.
Map<Integer, byte[]> hashes =
        defaultStreamGraphHasher.traverseStreamGraphAndGenerateHashes(streamGraph);

// Generate legacy version hashes for backwards compatibility
List<Map<Integer, byte[]>> legacyHashes = new ArrayList<>(legacyStreamGraphHashers.size());
for (StreamGraphHasher hasher : legacyStreamGraphHashers) {
    legacyHashes.add(hasher.traverseStreamGraphAndGenerateHashes(streamGraph));
}
```
- defaultStreamGraphHasher: the default implementation is StreamGraphHasherV2, which computes the OperatorID for every node; what gets hashed varies depending on whether the StreamNode has a transformationUID set.
- legacyHashes: contains only a StreamGraphUserHashHasher; if the user has set a userHash on an operator, this hasher extracts that userHash and uses it as the OperatorID (both knobs are shown in the sketch after this list).
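Both ids are set through the DataStream API. A minimal sketch follows; the pipeline and the hex value are made up for illustration, while uid and setUidHash are the actual API calls:

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class UidExample {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements("a", "b", "c")
                // transformationUID: StreamGraphHasherV2 hashes this string into the OperatorID
                .map(String::toUpperCase).returns(Types.STRING)
                .uid("upper-case-map")
                // userHash: StreamGraphUserHashHasher takes this value verbatim as the OperatorID
                // (expected to be a 32-character hex string)
                .filter(s -> !s.isEmpty())
                .setUidHash("2e588ce1c86a9d46e2e96b3b7b4eb83c")
                .print();

        env.execute("uid-example");
    }
}
```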
StreamGraphHasherV2
- Collect all source operators and add them to the remaining queue;
- Traverse the remaining queue breadth-first, computing the OperatorID of every node.
```java
public Map<Integer, byte[]> traverseStreamGraphAndGenerateHashes(StreamGraph streamGraph) {
    // The hash function used to generate the hash
    final HashFunction hashFunction = Hashing.murmur3_128(0);
    final Map<Integer, byte[]> hashes = new HashMap<>();

    Set<Integer> visited = new HashSet<>();
    Queue<StreamNode> remaining = new ArrayDeque<>();

    // We need to make the source order deterministic. The source IDs are
    // not returned in the same order, which means that submitting the same
    // program twice might result in different traversal, which breaks the
    // deterministic hash assignment.
    List<Integer> sources = new ArrayList<>();
    for (Integer sourceNodeId : streamGraph.getSourceIDs()) {
        sources.add(sourceNodeId);
    }
    Collections.sort(sources);

    //
    // Traverse the graph in a breadth-first manner. Keep in mind that
    // the graph is not a tree and multiple paths to nodes can exist.
    //

    // Start with source nodes
    for (Integer sourceNodeId : sources) {
        remaining.add(streamGraph.getStreamNode(sourceNodeId));
        visited.add(sourceNodeId);
    }

    // Breadth-first traversal
    StreamNode currentNode;
    while ((currentNode = remaining.poll()) != null) {
        // Generate the hash code. Because multiple path exist to each
        // node, we might not have all required inputs available to
        // generate the hash code.
        // If generation fails, the hashes of this node's inputs are not all
        // computed yet; remove the node from visited so it is revisited later.
        // If generation succeeds, enqueue the node's downstream nodes and mark
        // them visited; marking them here prevents the same node from being
        // enqueued twice, since the graph is not a tree and multiple paths
        // to a node can exist.
        if (generateNodeHash(
                currentNode,
                hashFunction,
                hashes,
                streamGraph.isChainingEnabled(),
                streamGraph)) {
            // Add the child nodes
            for (StreamEdge outEdge : currentNode.getOutEdges()) {
                StreamNode child = streamGraph.getTargetVertex(outEdge);

                if (!visited.contains(child.getId())) {
                    remaining.add(child);
                    visited.add(child.getId());
                }
            }
        } else {
            // We will revisit this later.
            visited.remove(currentNode.getId());
        }
    }

    return hashes;
}
```
generateNodeHash chooses its hash input based on whether the user has set a transformationUID on the StreamNode:
generateUserSpecifiedHash computes the hash over the transformationUID set by the user:
```java
private byte[] generateUserSpecifiedHash(StreamNode node, Hasher hasher) {
    hasher.putString(node.getTransformationUID(), Charset.forName("UTF-8"));
    return hasher.hash().asBytes();
}
```
generateDeterministicHash computes the OperatorID from the topology of the job:
```java
private byte[] generateDeterministicHash(
        StreamNode node,
        Hasher hasher,
        Map<Integer, byte[]> hashes,
        boolean isChainingEnabled,
        StreamGraph streamGraph) {

    // Include stream node to hash. We use the current size of the computed
    // hashes as the ID. We cannot use the node's ID, because it is
    // assigned from a static counter. This will result in two identical
    // programs having different hashes.
    generateNodeLocalHash(hasher, hashes.size());

    // Include chained nodes to hash
    for (StreamEdge outEdge : node.getOutEdges()) {
        if (isChainable(outEdge, isChainingEnabled, streamGraph)) {
            // Use the hash size again, because the nodes are chained to
            // this node. This does not add a hash for the chained nodes.
            generateNodeLocalHash(hasher, hashes.size());
        }
    }

    byte[] hash = hasher.hash().asBytes();

    // Make sure that all input nodes have their hash set before entering
    // this loop (calling this method).
    for (StreamEdge inEdge : node.getInEdges()) {
        byte[] otherHash = hashes.get(inEdge.getSourceId());

        // Sanity check
        if (otherHash == null) {
            throw new IllegalStateException(
                    "Missing hash for input node "
                            + streamGraph.getSourceVertex(inEdge)
                            + ". Cannot generate hash for "
                            + node
                            + ".");
        }

        for (int j = 0; j < hash.length; j++) {
            hash[j] = (byte) (hash[j] * 37 ^ otherHash[j]);
        }
    }

    // ... debug log

    return hash;
}
```
- Put hashes.size() into the buffer of hash source data;
- Check how many downstream operators can be chained to this operator, and put hashes.size() into the buffer once for each of them;
- Hash the buffer to obtain this operator's hash value;
- Look up the hashes of this operator's upstream operators and combine each of them with this operator's hash byte by byte (see the sketch below).
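To make the last two steps concrete, here is a minimal, self-contained sketch (not Flink source). It reproduces the hash computation for a two-node topology source -> map, assuming the two nodes are not chainable (e.g. different parallelism) and that generateNodeLocalHash simply writes the given id into the hasher as an int; class and variable names are mine:

```java
import com.google.common.hash.HashFunction;
import com.google.common.hash.Hasher;
import com.google.common.hash.Hashing;

public class DeterministicHashSketch {

    public static void main(String[] args) {
        // Same hash function and seed as StreamGraphHasherV2
        HashFunction hashFunction = Hashing.murmur3_128(0);

        // Source node: visited first, so its local id is hashes.size() == 0.
        Hasher sourceHasher = hashFunction.newHasher();
        sourceHasher.putInt(0);
        byte[] sourceHash = sourceHasher.hash().asBytes();

        // Map node: visited second, so its local id is hashes.size() == 1.
        Hasher mapHasher = hashFunction.newHasher();
        mapHasher.putInt(1);
        byte[] mapHash = mapHasher.hash().asBytes();

        // Mix in the upstream hash exactly as generateDeterministicHash does.
        for (int j = 0; j < mapHash.length; j++) {
            mapHash[j] = (byte) (mapHash[j] * 37 ^ sourceHash[j]);
        }

        System.out.println("source OperatorID: " + toHex(sourceHash));
        System.out.println("map    OperatorID: " + toHex(mapHash));
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```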
Chaining strategy
Two operators can be chained together only if all of the following conditions hold (a sketch of the user-facing knobs follows the list):
- Chaining is globally enabled by the user: it is on by default and can be disabled via StreamExecutionEnvironment#disableOperatorChaining;
- The downstream operator has the current operator as its only input;
- The two operators belong to the same slotSharingGroup;
- The operator's chaining strategy is not NEVER: the default is ALWAYS, and some transformations allow changing it via setChainingStrategy;
- Data between the two operators is forwarded with a ForwardPartitioner;
- The ShuffleMode is not BATCH;
- The upstream and downstream operators have the same parallelism;
- Chaining is enabled on the StreamGraph: true by default, changeable via StreamGraph::setChaining.
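For illustration, a minimal sketch of the user-facing knobs behind these conditions; the pipeline is made up, while the calls shown are real DataStream API:

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ChainingKnobs {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Globally disable chaining for the whole job (first condition above).
        // env.disableOperatorChaining();

        env.fromElements(1, 2, 3)
                .map(x -> x + 1).returns(Types.INT)
                .startNewChain()         // chaining strategy HEAD: do not chain to the upstream operator
                .filter(x -> x % 2 == 0)
                .disableChaining()       // chaining strategy NEVER: do not chain to any neighbor
                .slotSharingGroup("g1")  // a different slot sharing group also prevents chaining
                .print();

        env.execute("chaining-knobs");
    }
}
```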
Summary
The generation of OperatorIDs can be summarized as follows:
- If the user set a userHash when building the DataStream, that userHash is used directly as the OperatorID;
- If the user set a transformationUID when building the DataStream, the OperatorID is the result of hashing that transformationUID;
- By default, a hash is computed from the operator's position in the topology and its chaining relationship with downstream operators, and that hash is combined byte-wise with the upstream operators' hashes to produce the OperatorID.