When a StreamGraph is translated into a JobGraph, Flink generates an OperatorID for every operator and passes it into the corresponding JobVertex. A JobVertex is a node in the JobGraph; each JobVertex contains either a single operator or an operator chain of several operators chained together. If a JobVertex contains a single operator, its id is that operator's OperatorID; if it contains an operator chain, its id is the OperatorID of the chain's head operator. Each OperatorID uniquely identifies one operator, and during state recovery Flink locates the state belonging to a node by its OperatorID.
Entry point
As mentioned above, OperatorIDs are generated while the StreamGraph is translated into a JobGraph; the entry point is StreamingJobGraphGenerator#createJobGraph:
```java
// Generate deterministic hashes for the nodes in order to identify them across
// submission iff they didn't change.
Map<Integer, byte[]> hashes =
        defaultStreamGraphHasher.traverseStreamGraphAndGenerateHashes(streamGraph);

// Generate legacy version hashes for backwards compatibility
List<Map<Integer, byte[]>> legacyHashes = new ArrayList<>(legacyStreamGraphHashers.size());
for (StreamGraphHasher hasher : legacyStreamGraphHashers) {
    legacyHashes.add(hasher.traverseStreamGraphAndGenerateHashes(streamGraph));
}
```
- defaultStreamGraphHasher: the default implementation is StreamGraphHasherV2, which computes the OperatorID for every node; what gets hashed varies depending on whether the StreamNode has a transformationUID set.
- legacyHashes: contains only a StreamGraphUserHashHasher; if the user has set a userHash on an operator, this hasher extracts that userHash and uses it as the OperatorID (both knobs are shown in the sketch after this list).
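Both ids are set through the DataStream API. A minimal sketch follows; the pipeline and the hex value are made up for illustration, while uid and setUidHash are the actual API calls:

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class UidExample {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements("a", "b", "c")
                // transformationUID: StreamGraphHasherV2 hashes this string into the OperatorID
                .map(String::toUpperCase).returns(Types.STRING)
                .uid("upper-case-map")
                // userHash: StreamGraphUserHashHasher takes this value verbatim as the OperatorID
                // (expected to be a 32-character hex string)
                .filter(s -> !s.isEmpty())
                .setUidHash("2e588ce1c86a9d46e2e96b3b7b4eb83c")
                .print();

        env.execute("uid-example");
    }
}
```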
StreamGraphHasherV2
- Collect all source operators and add them to the remaining queue;
- Traverse the remaining queue breadth-first, computing the OperatorID of every node.
```java
public Map<Integer, byte[]> traverseStreamGraphAndGenerateHashes(StreamGraph streamGraph) {
    // The hash function used to generate the hash
    final HashFunction hashFunction = Hashing.murmur3_128(0);
    final Map<Integer, byte[]> hashes = new HashMap<>();

    Set<Integer> visited = new HashSet<>();
    Queue<StreamNode> remaining = new ArrayDeque<>();

    // We need to make the source order deterministic. The source IDs are
    // not returned in the same order, which means that submitting the same
    // program twice might result in different traversal, which breaks the
    // deterministic hash assignment.
    List<Integer> sources = new ArrayList<>();
    for (Integer sourceNodeId : streamGraph.getSourceIDs()) {
        sources.add(sourceNodeId);
    }
    Collections.sort(sources);

    //
    // Traverse the graph in a breadth-first manner. Keep in mind that
    // the graph is not a tree and multiple paths to nodes can exist.
    //

    // Start with source nodes
    for (Integer sourceNodeId : sources) {
        remaining.add(streamGraph.getStreamNode(sourceNodeId));
        visited.add(sourceNodeId);
    }

    // Breadth-first traversal
    StreamNode currentNode;
    while ((currentNode = remaining.poll()) != null) {
        // Generate the hash code. Because multiple path exist to each
        // node, we might not have all required inputs available to
        // generate the hash code.
        // If generation fails, the hashes of this node's inputs are not all
        // computed yet; remove the node from visited so it is revisited later.
        // If generation succeeds, enqueue the node's downstream nodes and mark
        // them visited; marking them here prevents the same node from being
        // enqueued twice, since the graph is not a tree and multiple paths
        // to a node can exist.
        if (generateNodeHash(
                currentNode,
                hashFunction,
                hashes,
                streamGraph.isChainingEnabled(),
                streamGraph)) {
            // Add the child nodes
            for (StreamEdge outEdge : currentNode.getOutEdges()) {
                StreamNode child = streamGraph.getTargetVertex(outEdge);

                if (!visited.contains(child.getId())) {
                    remaining.add(child);
                    visited.add(child.getId());
                }
            }
        } else {
            // We will revisit this later.
            visited.remove(currentNode.getId());
        }
    }

    return hashes;
}
```
generateNodeHash chooses its hash input based on whether the user has set a transformationUID on the StreamNode:
generateUserSpecifiedHash computes the hash over the transformationUID set by the user:
```java
private byte[] generateUserSpecifiedHash(StreamNode node, Hasher hasher) {
    hasher.putString(node.getTransformationUID(), Charset.forName("UTF-8"));
    return hasher.hash().asBytes();
}
```
generateDeterministicHash computes the OperatorID from the topology of the job:
```java
private byte[] generateDeterministicHash(
        StreamNode node,
        Hasher hasher,
        Map<Integer, byte[]> hashes,
        boolean isChainingEnabled,
        StreamGraph streamGraph) {

    // Include stream node to hash. We use the current size of the computed
    // hashes as the ID. We cannot use the node's ID, because it is
    // assigned from a static counter. This will result in two identical
    // programs having different hashes.
    generateNodeLocalHash(hasher, hashes.size());

    // Include chained nodes to hash
    for (StreamEdge outEdge : node.getOutEdges()) {
        if (isChainable(outEdge, isChainingEnabled, streamGraph)) {
            // Use the hash size again, because the nodes are chained to
            // this node. This does not add a hash for the chained nodes.
            generateNodeLocalHash(hasher, hashes.size());
        }
    }

    byte[] hash = hasher.hash().asBytes();

    // Make sure that all input nodes have their hash set before entering
    // this loop (calling this method).
    for (StreamEdge inEdge : node.getInEdges()) {
        byte[] otherHash = hashes.get(inEdge.getSourceId());

        // Sanity check
        if (otherHash == null) {
            throw new IllegalStateException(
                    "Missing hash for input node "
                            + streamGraph.getSourceVertex(inEdge)
                            + ". Cannot generate hash for "
                            + node
                            + ".");
        }

        for (int j = 0; j < hash.length; j++) {
            hash[j] = (byte) (hash[j] * 37 ^ otherHash[j]);
        }
    }

    // ... debug log

    return hash;
}
```
- Put hashes.size() into the buffer of hash source data;
- Check how many downstream operators can be chained to this operator, and put hashes.size() into the buffer once for each of them;
- Hash the buffer to obtain this operator's hash value;
- Look up the hashes of this operator's upstream operators and combine each of them with this operator's hash byte by byte (see the sketch below).
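To make the last two steps concrete, here is a minimal, self-contained sketch (not Flink source). It reproduces the hash computation for a two-node topology source -> map, assuming the two nodes are not chainable (e.g. different parallelism) and that generateNodeLocalHash simply writes the given id into the hasher as an int; class and variable names are mine:

```java
import com.google.common.hash.HashFunction;
import com.google.common.hash.Hasher;
import com.google.common.hash.Hashing;

public class DeterministicHashSketch {

    public static void main(String[] args) {
        // Same hash function and seed as StreamGraphHasherV2
        HashFunction hashFunction = Hashing.murmur3_128(0);

        // Source node: visited first, so its local id is hashes.size() == 0.
        Hasher sourceHasher = hashFunction.newHasher();
        sourceHasher.putInt(0);
        byte[] sourceHash = sourceHasher.hash().asBytes();

        // Map node: visited second, so its local id is hashes.size() == 1.
        Hasher mapHasher = hashFunction.newHasher();
        mapHasher.putInt(1);
        byte[] mapHash = mapHasher.hash().asBytes();

        // Mix in the upstream hash exactly as generateDeterministicHash does.
        for (int j = 0; j < mapHash.length; j++) {
            mapHash[j] = (byte) (mapHash[j] * 37 ^ sourceHash[j]);
        }

        System.out.println("source OperatorID: " + toHex(sourceHash));
        System.out.println("map    OperatorID: " + toHex(mapHash));
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```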
Chaining strategy
Two operators can be chained together only if all of the following conditions hold (a sketch of the user-facing knobs follows the list):
- Chaining is globally enabled by the user: it is on by default and can be disabled via StreamExecutionEnvironment#disableOperatorChaining;
- The downstream operator has the current operator as its only input;
- The two operators belong to the same slotSharingGroup;
- The operator's chaining strategy is not NEVER: the default is ALWAYS, and some transformations allow changing it via setChainingStrategy;
- Data between the two operators is forwarded with a ForwardPartitioner;
- The ShuffleMode is not BATCH;
- The upstream and downstream operators have the same parallelism;
- Chaining is enabled on the StreamGraph: true by default, changeable via StreamGraph::setChaining.
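For illustration, a minimal sketch of the user-facing knobs behind these conditions; the pipeline is made up, while the calls shown are real DataStream API:

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class ChainingKnobs {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Globally disable chaining for the whole job (first condition above).
        // env.disableOperatorChaining();

        env.fromElements(1, 2, 3)
                .map(x -> x + 1).returns(Types.INT)
                .startNewChain()         // chaining strategy HEAD: do not chain to the upstream operator
                .filter(x -> x % 2 == 0)
                .disableChaining()       // chaining strategy NEVER: do not chain to any neighbor
                .slotSharingGroup("g1")  // a different slot sharing group also prevents chaining
                .print();

        env.execute("chaining-knobs");
    }
}
```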
Summary
The generation of OperatorIDs can be summarized as follows:
- If the user set a userHash when building the DataStream, that userHash is used directly as the OperatorID;
- If the user set a transformationUID when building the DataStream, the OperatorID is the result of hashing that transformationUID;
- By default, a hash is computed from the operator's position in the topology and its chaining relationship with downstream operators, and that hash is combined byte-wise with the upstream operators' hashes to produce the OperatorID.