This article is part of the DataWorks feature practice series. The series analyzes common pain points in implementing business requirements and helps you use DataWorks features more efficiently.

1.png

Recap:

The previous issues covered the key concepts for running tasks in DataWorks. From here on, I will walk you through implementing the most common data development scenarios with DataWorks task nodes. This issue shows how to pass parameters through in DataWorks, that is, how to pass the parameters of upstream tasks transparently to downstream tasks.

Recommended features: the assignment node and the parameter node

In Alibaba Cloud DataWorks, a data development task is ultimately broken down into multiple node tasks. You form a complete data development workflow by setting upstream and downstream dependencies between those nodes, as shown in the following figure.

2.png

The figure above is a simple example. In practice, an upstream node task often generates parameters or a run result that a downstream node task needs. The workflow must therefore pass these parameters or run results from node to node. DataWorks provides two special nodes for different pass-through requirements: the assignment node and the parameter node.

## Part1: Assignment node for passing task results through

When downstream nodes need the results of upstream node tasks, you can use an assignment node to transfer those results between nodes. The assignment node supports three assignment languages: ODPS SQL, SHELL, and Python. Following the assignment rules, it automatically adds an assignment parameter (the outputs parameter) that other nodes can reference.

When you use an assignment node to pass parameters, pay attention to the following three points.

### 1.1 Dependencies between the assignment node and its upstream and downstream nodes

3.png

As shown in the figure above, when you use an assignment node to pass parameters:

  • The assignment nodes (fuzhi\_python, fuzhi\_sql, fuzhi\_shell) must be upstream of the node that references their parameters (down\_compare). The downstream node must also depend directly on each assignment node; that is, each assignment node must be one of the downstream node's immediate parent nodes.
  • When an assignment node is the upstream node of other nodes, you must submit the assignment node first. Otherwise the downstream node cannot parse its parameters during configuration.
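
The two rules above can be read as a configuration check. The following is a minimal sketch under stated assumptions: it is a hypothetical local model, not a DataWorks API. The `Node` class, its `submitted` flag, and `check_reference` are illustrative names; only the node names come from the figure.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical model of a DataWorks node (not a real SDK type)."""
    name: str
    parents: set = field(default_factory=set)  # names of direct upstream nodes
    submitted: bool = False                     # whether the node has been submitted


def check_reference(nodes: dict, consumer: str, producer: str) -> list:
    """List the problems that would stop `consumer` from using `producer`'s outputs."""
    problems = []
    # Rule 1: the consumer must depend directly on the assignment node.
    if producer not in nodes[consumer].parents:
        problems.append(f"{consumer} must depend directly on {producer}")
    # Rule 2: the assignment node must be submitted before its parameters can be parsed.
    if not nodes[producer].submitted:
        problems.append(f"submit {producer} first so its parameters can be parsed")
    return problems


nodes = {
    "fuzhi_python": Node("fuzhi_python", submitted=True),
    "fuzhi_sql": Node("fuzhi_sql", submitted=True),
    "fuzhi_shell": Node("fuzhi_shell"),  # not submitted yet
    "down_compare": Node("down_compare", parents={"fuzhi_python", "fuzhi_sql", "fuzhi_shell"}),
}

print(check_reference(nodes, "down_compare", "fuzhi_python"))  # []
print(check_reference(nodes, "down_compare", "fuzhi_shell"))
# ['submit fuzhi_shell first so its parameters can be parsed']
```

The check returns a list instead of raising so that both problems can be reported at once when a node breaks both rules.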

### 1.2 Context parameter passing between the assignment node and downstream nodes

Configure context parameters in the scheduling configuration of both the assignment node and the referencing node. This establishes the parameter pass-through relationship:

4.png

  • Each assignment node (fuzhi\_python, fuzhi\_sql, fuzhi\_shell) must add the parameter it passes downstream as a node context output parameter of that node.
  • Each downstream node that references an assigned value must add that parameter as a node context input parameter.

Note:

  • Assignment node parameters can be passed only to first-level child nodes. Passing them across nodes, for example to a grandchild node, is not supported.
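
The following hypothetical Python model pairs output and input parameters and enforces the one-level rule. It resolves a downstream node's input parameters against the outputs of its direct parents only. The `"<node>.outputs"` binding string and the `resolve_inputs` helper are assumptions for illustration. They are not the exact format that DataWorks shows in the context parameter table.

```python
def resolve_inputs(parents: dict, outputs: dict, node: str, inputs: dict) -> dict:
    """Resolve a node's input parameters from its direct parents' outputs.

    parents: node name -> set of direct upstream node names
    outputs: node name -> value of that node's outputs parameter
    inputs:  input parameter name -> hypothetical binding "<producer>.outputs"
    """
    resolved = {}
    for param, binding in inputs.items():
        producer, _, key = binding.partition(".")
        if key != "outputs":
            raise ValueError(f"unsupported binding {binding!r}")
        # Assignment-node parameters only reach first-level children.
        if producer not in parents.get(node, set()):
            raise LookupError(f"{node} cannot read {binding}: {producer} is not a direct parent")
        resolved[param] = outputs[producer]
    return resolved


parents = {"down_compare": {"fuzhi_shell"}, "grandchild": {"down_compare"}}
outputs = {"fuzhi_shell": ["2021", "hello"]}

print(resolve_inputs(parents, outputs, "down_compare", {"shell_in": "fuzhi_shell.outputs"}))
# {'shell_in': ['2021', 'hello']}

try:
    resolve_inputs(parents, outputs, "grandchild", {"shell_in": "fuzhi_shell.outputs"})
except LookupError as err:
    print(err)
# grandchild cannot read fuzhi_shell.outputs: fuzhi_shell is not a direct parent
```

To reach the grandchild, the value has to be re-declared at each level or routed through a parameter node, which Part2 covers.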

### 1.3 Assignment languages and assignment results

When a downstream node references the result of an assignment node, the output format of the assignment node determines how the downstream node can reference the parameter. The following table explains how each assignment language produces the assignment parameter (the outputs parameter).

| Assignment language | outputs parameter value | outputs parameter format | outputs parameter size limit |
| --- | --- | --- | --- |
| ODPS SQL | The output of the last SELECT statement is added as this assignment node's output parameter for other nodes to reference. | The output result is passed downstream as a two-dimensional array. | At most 2 MB can be passed. If the output of the assignment statement exceeds this limit, the assignment node fails. |
| SHELL | The data output by the last ECHO statement is added as this assignment node's output parameter for other nodes to reference. | The output result is split into a one-dimensional array on commas (,). | |
| Python | The data output by the last PRINT statement is added as this assignment node's output parameter for other nodes to reference. | The output result is split into a one-dimensional array on commas (,). | |
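
The rules in the table can be mimicked locally to see what a downstream node receives. This is a hypothetical sketch, not DataWorks code. `shell_or_python_outputs` models the "last output line, split on commas" rule. `odps_sql_outputs` models the two-dimensional array and the 2 MB limit. Converting every value to a string and measuring size as the UTF-8 bytes of comma-joined rows are assumptions.

```python
MAX_OUTPUT_BYTES = 2 * 1024 * 1024  # 2 MB limit from the table (byte measure assumed)


def shell_or_python_outputs(stdout: str) -> list:
    """SHELL/Python: the last output line becomes a 1-D array split on commas."""
    last_line = stdout.rstrip("\n").split("\n")[-1]
    return last_line.split(",")


def odps_sql_outputs(rows: list) -> list:
    """ODPS SQL: the result of the last SELECT is passed on as a 2-D array."""
    table = [[str(value) for value in row] for row in rows]
    size = sum(len(",".join(row).encode("utf-8")) for row in table)
    if size > MAX_OUTPUT_BYTES:
        # The table states that the assignment node fails in this case.
        raise RuntimeError("assignment output exceeds 2 MB; the node fails")
    return table


print(shell_or_python_outputs("debug line\n2021-01-01,hello,42\n"))
# ['2021-01-01', 'hello', '42']
print(odps_sql_outputs([(1, "a"), (2, "b")]))
# [['1', 'a'], ['2', 'b']]
```

Only the last line counts in the SHELL/Python model, so earlier debug output does not leak into the parameter. In the SQL model a consumer picks values by row and column, for example `table[1][0]`.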

For more information about the assignment node, see Configure the assignment node in the Help Center.

## Part2: Parameter node for passing parameters through

The parameter node is a special virtual node. It manages the parameters in a business workflow and passes them to task nodes. It supports constant parameters, variable parameters, and pass-through of upstream nodes' parameters. A node that needs these parameters only has to depend directly on the parameter node. Because it is a virtual node, a parameter node does not run computation tasks or produce data. It is mainly used to pass parameters across nodes and to manage parameters.

### 2.1 Cross-node parameter passing

Sometimes a downstream node in a data development workflow needs the output parameters of several upstream nodes at different levels. In that case, add all the parameters the downstream node needs to a parameter node. Then attach the downstream node directly under the parameter node, and it obtains all the required parameters.

5.png

Take the figure above as an example. The sql\_7 node needs the output parameters of the sql\_1, sql\_3, and sql\_4 nodes. You can add a parameter node downstream of sql\_1, sql\_3, and sql\_4 and add all the parameters that sql\_7 needs to it. Then attach sql\_7 downstream of the parameter node, and sql\_7 obtains all its required parameters directly from the parameter node.
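
The cross-node pattern can be modeled as a parameter node that gathers values from several upstream nodes and exposes them to every node attached under it. The `ParameterNode` class below is a hypothetical illustration that uses the node names from the figure. The parameter names and values are made up, and real parameter nodes are configured in the DataWorks console, not in code.

```python
from dataclasses import dataclass, field


@dataclass
class ParameterNode:
    """Hypothetical model: a virtual node that only holds parameters and never runs a task."""
    name: str
    params: dict = field(default_factory=dict)  # parameter name -> value
    children: set = field(default_factory=set)  # nodes attached directly under it

    def collect(self, producer: str, outputs: dict) -> None:
        """Copy an upstream node's output parameters into this node."""
        for key, value in outputs.items():
            self.params[f"{producer}.{key}"] = value

    def params_for(self, node: str) -> dict:
        """A direct child receives every parameter this node holds."""
        if node not in self.children:
            raise LookupError(f"{node} is not attached under {self.name}")
        return dict(self.params)


param_node = ParameterNode("param_for_sql_7", children={"sql_7"})
param_node.collect("sql_1", {"bizdate": "20210101"})
param_node.collect("sql_3", {"region": "cn-hangzhou"})
param_node.collect("sql_4", {"threshold": "100"})

print(param_node.params_for("sql_7"))
# {'sql_1.bizdate': '20210101', 'sql_3.region': 'cn-hangzhou', 'sql_4.threshold': '100'}
```

Prefixing each key with the producer's name keeps parameters from different upstream nodes from overwriting each other when they share a name.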
### 2.2 Parameter management

In a data development workflow, downstream node tasks may need certain constant and variable parameters. You can add all of those parameters to a parameter node and attach the nodes that use them directly under it, where they obtain the parameters they need. This makes it easy to manage every parameter used in the workflow in one place.

6.png

Take the figure above as an example. The sql\_3, sql\_4, sql\_5, and sql\_7 nodes all need parameters. You can add a parameter node and add the parameters that each of these downstream nodes uses to it. Then attach the nodes that need the parameters downstream of the parameter node.

For more information about the parameter node, see Create a parameter node in the Help Center.

## Part3: Comparison of the assignment node and the parameter node
| Comparison item | Assignment node | Parameter node |
| --- | --- | --- |
| Pass-through scenario | Passes node run results | Passes node parameters |
| Pass-through scope | Only to direct child nodes; cross-node pass-through is not supported | Supports cross-node pass-through |
| Node type | A task node that runs assignment tasks; supports three assignment languages: ODPS SQL, SHELL, and Python | Essentially a virtual node; does not run tasks or produce data |
> Copyright notice: The content of this article was contributed by a real-name registered Alibaba Cloud user, and the copyright belongs to the original author. The Alibaba Cloud Developer Community does not own its copyright and does not assume the corresponding legal responsibilities. For specific rules, see the "Alibaba Cloud Developer Community User Service Agreement" and the "Alibaba Cloud Developer Community Intellectual Property Protection Guidelines". If you find suspected plagiarism in this community, fill in the infringement complaint form to report it. Once it is verified, the community will immediately delete the suspected infringing content.

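Alibaba Cloud Developer (阿里云开发者)

Alibaba's official technical account, sharing the technical innovations and hands-on experience of the Alibaba economy and the growth insights of its engineers.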

