The DataWorks function practice series analyzes the pain points that arise when implementing business requirements and helps you use DataWorks features more efficiently.
Previous issues:
- Issue 01 - Data Synchronization Solution: introduces the data synchronization solutions available for different scenarios.
- Issue 02 - Exclusive Data Integration Resource Group: introduces the resource groups and network connection schemes that can be used during data synchronization, and the precautions for each.
- Issue 03 - Production and Development Environment Isolation: introduces how DataWorks standard mode isolates the development environment from the production environment, and the permission requirements of each environment.
The previous issues covered the most important points about running tasks on DataWorks. The following issues explain, step by step, how to implement the most common data development scenarios with DataWorks task nodes. This issue covers parameter pass-through on DataWorks, that is, passing the parameters of upstream tasks through to downstream tasks.
Recommended features: assignment node and parameter node
In Alibaba Cloud DataWorks, a data development task is ultimately broken down into multiple node tasks, and the upstream and downstream dependencies between those nodes form a complete data development workflow, as shown in the following figure.
The above is a simple example. In practice, an upstream node task often generates parameters or a running result that a downstream node task needs to use, so the workflow has to pass those parameters or results between nodes. Depending on what needs to be passed, DataWorks provides two special nodes for this purpose: the assignment node and the parameter node.
Part 1: Assignment node - pass-through of task results
When a downstream node needs to use the result of a task from an upstream node, you can use an assignment node to pass the task result between nodes. The assignment node supports three assignment languages (ODPS SQL, SHELL, and Python) and automatically adds an assignment parameter (the outputs parameter) according to the assignment rules, so that other nodes can reference it.
When using an assignment node to pass parameters, pay attention to the following three points.
1.1 The dependency between the assignment node and the upstream and downstream nodes
As shown in the figure above, when using the assignment node to pass parameters:
- The assignment node (fuzhi\_python, fuzhi\_sql, fuzhi\_shell) must be an upstream node of the node that references its parameter (down\_compare), and the downstream node must depend directly on the assignment node (the assignment node must be a first-level parent of the downstream node).
- When the assignment node is used as an upstream node of other nodes, you must submit the assignment node first so that the downstream node can parse its parameters during configuration.
1.2 The context parameter transparent transmission relationship between the assignment node and the downstream node
The pass-through reference relationship is established through the context parameter configuration in the scheduling configuration of both the assignment node and the node that references it:
- The assignment node (fuzhi\_python, fuzhi\_sql, fuzhi\_shell) must add the parameter to be passed downstream to its node context as an output parameter of this node.
- The downstream node that references the assigned value must add that parameter to its node context as an input parameter of this node.
Note:
- Assignment node parameters can be passed only to first-level child nodes; passing them across multiple levels of nodes is not supported.
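The dependency and context rules above can be pictured with a small Python model. This is only an illustrative sketch, not DataWorks code: the `Node` class, the `resolve_input` helper, and the `grandchild` node are hypothetical names introduced here, and the real resolution happens inside the DataWorks scheduler.

```python
class Node:
    """Toy stand-in for a workflow node (hypothetical, not a DataWorks API)."""

    def __init__(self, name, parents=()):
        self.name = name
        self.parents = list(parents)  # direct (first-level) parents only
        self.outputs = {}             # node context: output parameters of this node


def resolve_input(node, parent_name, param):
    """Look up an input parameter, allowing only direct parents as the source."""
    for parent in node.parents:
        if parent.name == parent_name:
            return parent.outputs[param]
    raise LookupError(
        f"{node.name} does not depend directly on {parent_name}"
    )


# The assignment node publishes its result as the output parameter "outputs".
fuzhi_python = Node("fuzhi_python")
fuzhi_python.outputs["outputs"] = ["a", "b"]

# down_compare depends directly on the assignment node, so it can read the value.
down_compare = Node("down_compare", parents=[fuzhi_python])
value = resolve_input(down_compare, "fuzhi_python", "outputs")

# A node two levels below the assignment node cannot reference it.
grandchild = Node("grandchild", parents=[down_compare])
```

In this model, calling `resolve_input(grandchild, "fuzhi_python", "outputs")` raises `LookupError`, which mirrors the rule that the value reaches only first-level child nodes.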
1.3 Assignment language and assignment results
When a downstream node references the result of an assignment node, the way it references the parameter depends on the output format of the assignment node. The value and format of the assignment parameter (the outputs parameter) for each language are as follows.
Assignment language | outputs parameter value | outputs parameter format | outputs parameter size limit |
--- | --- | --- | --- |
ODPS SQL | The output of the last SELECT statement is used as the assignment parameter value and added as an output parameter of the assignment node for other nodes to reference. | The output result is passed downstream as a two-dimensional array. | The value passed can be at most 2 MB. If the output of the assignment statement exceeds this limit, the assignment node fails. |
SHELL | The data of the last ECHO statement is added as an output parameter of the assignment node for other nodes to reference. | The output result is split on commas (,) into a one-dimensional array. | |
Python | The data of the last print statement is added as an output parameter of the assignment node for other nodes to reference. | The output result is split on commas (,) into a one-dimensional array. | |
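
To make the two output formats concrete, here is a minimal Python sketch that mimics them locally. It is an illustration under stated assumptions, not DataWorks code: the helper names `split_last_line` and `rows_to_matrix` are hypothetical, and treating the "last statement" as the last non-empty line of output is a simplification.

```python
def split_last_line(stdout):
    """Model a SHELL/Python assignment: split the last output line on commas.

    Assumption: the last non-empty line of the captured output stands in
    for the output of the last ECHO/print statement.
    """
    lines = [line for line in stdout.splitlines() if line.strip()]
    if not lines:
        return []
    return lines[-1].split(",")


def rows_to_matrix(rows):
    """Model an ODPS SQL assignment: result rows become a two-dimensional array."""
    return [list(row) for row in rows]


# SHELL/Python style: only the last line counts, giving a one-dimensional array.
one_dim = split_last_line("starting job\nx,y,z\n")
print(one_dim)       # ['x', 'y', 'z']
print(one_dim[1])    # y

# ODPS SQL style: each result row is one inner list.
two_dim = rows_to_matrix([("2021", 10), ("2022", 20)])
print(two_dim[1][0])  # 2022
```

In this model a SHELL or Python result is addressed with a single index, while an ODPS SQL result needs a row index and a column index.
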

Comparison of the assignment node and the parameter node:
Comparison item | Assignment node | Parameter node |
--- | --- | --- |
Pass-through scenario | Passes through node running results | Passes through node parameters |
Pass-through limit | Can be passed only to direct child nodes; cross-node pass-through is not supported | Supports cross-node pass-through |
Node properties | A task node that runs the assignment task; supports three assignment languages: ODPS SQL, SHELL, and Python | Essentially a virtual node; it does not run tasks or generate data |