Welcome to the DataWorks feature practice series, which helps you analyze the pain points you meet when implementing business requirements and use DataWorks features more efficiently.
Previous issues:
- DataWorks Feature Practice Quick Overview, Issue 01 - Data Synchronization Solutions: Introduces the data synchronization solutions available in different scenarios.
- Issue 02 - Exclusive Resource Groups for Data Integration: Introduces the resource groups and network connectivity options available for data synchronization, along with related precautions.
The first two issues covered the main concepts behind data synchronization in DataWorks: synchronization solutions and resource groups. In practice, the development and production environments often need to be isolated: the development environment is used to test data synchronization, and the production environment is used to synchronize production data. This issue introduces the main concepts DataWorks uses to isolate the development and production environments.
Feature recommendation: Standard mode (isolation of the development and production environments)
To accommodate production data with different security control requirements, DataWorks provides two workspace modes: simple mode and standard mode. Simple mode does not distinguish between a development environment and a production environment, whereas standard mode provides both and keeps them separate, so you can run data tasks in each environment independently.
Part 1: DataWorks workspaces in simple mode and standard mode
First, here are the main differences between the two workspace modes.
<span>Simple mode</span> | <span>Standard mode</span> |
<span>In a simple mode workspace, one DataWorks workspace corresponds to one underlying MaxCompute project (or an EMR cluster, Hologres database, etc.), and that environment is regarded as the production (PROD) environment. </span><span class="lake-card-margin-top lake-card-margin-bottom"><img src="https://ucc.alicdn.com/pic/developer-ecology/5905f40277b34b25b42339b43191595c.png" class="image lake-drag-image" alt="Simple working"></span> | <span>In a standard mode workspace, one DataWorks workspace corresponds to two underlying MaxCompute projects (or two EMR clusters, Hologres databases, etc.): one is regarded as the development (DEV) environment and the other as the production (PROD) environment. </span><span class="lake-card-margin-top lake-card-margin-bottom"><img src="https://ucc.alicdn.com/pic/developer-ecology/06025e5dcc2a4570bcea8a5c681a7e4a.png" class="image lake-drag-image" alt="standard working"></span> |
<span>Compute engine type</span> | <span>Environment</span> | <span>Standard mode workspace</span> | <span>Simple mode workspace (development environment / production environment)</span> |
MaxCompute | <span>Development environment</span> | <span>Tasks run by the currently logged-in user (not configurable): the identity defaults to the task executor.</span> | <span>Tasks run from the page (not configurable): the identity defaults to the task executor (the currently logged-in user).</span> <span>Scheduling access identity (configurable):</span><ul><li><span>Alibaba Cloud main account</span></li><li><span>Alibaba Cloud RAM role</span></li><li><span>Task owner: the account identity of the task owner</span></li></ul> |
<span>Production environment</span> | <span>Scheduling access identity (configurable):</span><ul><li><span>Alibaba Cloud main account</span></li><li><span>Alibaba Cloud RAM user</span></li><li><span>Alibaba Cloud RAM role</span></li></ul> | |
<span>E-MapReduce</span> | <span>Development environment</span> | <ul><li><span>Access identity in shortcut mode: the Hadoop user of the cluster is used uniformly.</span></li><li><span>Access identity in security mode: the task executor.</span></li></ul> | <ul><li><span>Access identity in shortcut mode: the Hadoop user of the cluster is used uniformly.</span></li><li><span>Access identity in security mode (configurable):</span><ul><li><span>Task owner</span></li><li><span>Alibaba Cloud main account</span></li><li><span>Alibaba Cloud RAM user</span></li></ul></li></ul> |
<span>Production environment</span> | <ul><li><span>Access identity in shortcut mode: the Hadoop user of the cluster is used uniformly.</span></li><li><span>Access identity in security mode (configurable):</span><ul><li><span>Task owner</span></li><li><span>Alibaba Cloud main account</span></li><li><span>Alibaba Cloud RAM user</span></li></ul></li></ul> | |
<span>Hologres</span> | <span>Development environment</span> | <span>Tasks run from the page (not configurable): the identity defaults to the task executor (the currently logged-in user).</span> | <span>Tasks run from the page (not configurable): the identity defaults to the task executor (the currently logged-in user).</span> <span>Scheduling access identity (configurable):</span><ul><li><span>Alibaba Cloud main account</span></li></ul> |
<span>Production environment</span> | <span>Scheduling access identity (configurable):</span><ul><li><span>Alibaba Cloud main account</span></li><li><span>Alibaba Cloud RAM user</span></li></ul> | |
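As a rough illustration of how the identity rules in the table above might be encoded, the sketch below covers only the MaxCompute rows for a standard mode workspace. The dictionary layout, the label strings, and the `resolve_identity` helper are hypothetical names chosen for this example; they are not part of any DataWorks API.

```python
# Identity rules for MaxCompute in a standard mode workspace, transcribed from
# the table above. The structure and label strings are illustrative assumptions.
MAXCOMPUTE_STANDARD_MODE = {
    "dev": {
        "configurable": False,
        "identities": ["task executor"],
    },
    "prod": {
        "configurable": True,
        "identities": [
            "Alibaba Cloud main account",
            "Alibaba Cloud RAM user",
            "Alibaba Cloud RAM role",
        ],
    },
}


def resolve_identity(env, requested=None):
    """Return the identity a task would run as in the given environment.

    In DEV the identity is fixed, so `requested` must be omitted or match it.
    In PROD the caller must pick one of the listed identities.
    """
    rule = MAXCOMPUTE_STANDARD_MODE[env]
    if not rule["configurable"]:
        fixed = rule["identities"][0]
        if requested is not None and requested != fixed:
            raise ValueError(f"identity in {env} is fixed to {fixed!r}")
        return fixed
    if requested not in rule["identities"]:
        raise ValueError(f"{requested!r} is not a valid identity for {env}")
    return requested


print(resolve_identity("dev"))
print(resolve_identity("prod", "Alibaba Cloud RAM role"))
```

Keeping the rules in a plain dictionary makes it easy to add the E-MapReduce and Hologres rows in the same shape if needed.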
<span>Characteristic</span> | <span>Simple mode</span> | <span>Standard mode</span> |
<span>Permissions overview</span> | <span>In a simple mode workspace, the DataWorks "development" role is mapped to the "Role_Project_Dev" role of the bound MaxCompute project, so the DataWorks development role can read all data in that MaxCompute project by default.</span> | <span>In a standard mode workspace, the DataWorks "development" role is mapped to the "Role_Project_Dev" role of the bound MaxCompute project (DEV environment), so:</span><ul><li><span>The DataWorks development role can read all data in the MaxCompute project (DEV environment) by default.</span></li><li><span>Because no role mapping exists with the MaxCompute project (PROD environment), the DataWorks development role has no permissions on data in the MaxCompute project (PROD environment) by default.</span></li></ul> |
<span>Advantages</span> | <span>Simple, convenient, and easy to use.</span> <span>All data warehouse development work can be done once the data developer is granted the DataWorks development role.</span> | <span>Secure and standardized.</span><ul><li><span>It has a secure, standardized code release process (including code review and code diff viewing) that keeps the production environment stable and avoids unexpected problems caused by code logic, such as the spread of dirty data or task errors.</span></li><li><span>Data access is effectively controlled, which protects data security.</span></li></ul> |
<span>Disadvantages</span> | <span>Risk of instability and insecurity.</span><ul><li><span>The development role can add and modify code at any time without approval and submit it to the scheduling system, which can destabilize the production environment.</span></li><li><span>With the MaxCompute compute engine, the development role has read and write permissions on all tables in the current MaxCompute project by default and can add, delete, and modify tables at will, which puts data security at risk.</span></li></ul> | <span>The process is relatively complex.</span> <span>Generally, one person cannot complete the entire data development and production process alone.</span> |
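The default permission behavior described in the table above can be summarized in a few lines of Python. This is a minimal sketch of the rules as stated in this article, not DataWorks code: `Mode`, `Env`, `environments`, and `developer_can_read_by_default` are hypothetical names, and the sketch ignores any permissions granted separately outside the default role mapping.

```python
from enum import Enum
from typing import List


class Mode(Enum):
    SIMPLE = "simple"
    STANDARD = "standard"


class Env(Enum):
    DEV = "dev"
    PROD = "prod"


def environments(mode: Mode) -> List[Env]:
    """Simple mode has one environment, treated as PROD; standard mode has DEV and PROD."""
    return [Env.PROD] if mode is Mode.SIMPLE else [Env.DEV, Env.PROD]


def developer_can_read_by_default(mode: Mode, env: Env) -> bool:
    """Whether the DataWorks development role can read MaxCompute data by default."""
    if env not in environments(mode):
        raise ValueError(f"{mode.value} mode has no {env.value} environment")
    if mode is Mode.SIMPLE:
        # The role is mapped to Role_Project_Dev on the single bound project.
        return True
    # In standard mode the mapping exists only for the DEV project.
    return env is Env.DEV


for mode in Mode:
    for env in environments(mode):
        print(mode.value, env.value, developer_can_read_by_default(mode, env))
```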
<span>Environment type</span> | <span>Standard mode</span> | <span>Example</span> |
<span>Development environment</span> | <span>project name_dev.table name</span> | <span>If you create a development table user_info under the projectA project, the table name is projectA_dev.user_info.</span> |
<span>Production environment</span> | <span>project name.table name</span> | <span>If you create a production table user_info under the projectA project, the table name is projectA.user_info.</span> |
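The naming convention in the table above can be expressed as a small helper. This is a sketch that assumes the `_dev` suffix shown in the article's example; `Env` and `qualified_table_name` are hypothetical names introduced for illustration.

```python
from enum import Enum


class Env(Enum):
    DEV = "dev"
    PROD = "prod"


def qualified_table_name(project: str, table: str, env: Env) -> str:
    """Build the project-qualified table name for a standard mode workspace.

    Follows the convention in the table above: the development project
    carries a `_dev` suffix, and the production project uses the plain name.
    """
    if not project or not table:
        raise ValueError("project and table must be non-empty")
    suffix = "_dev" if env is Env.DEV else ""
    return f"{project}{suffix}.{table}"


print(qualified_table_name("projectA", "user_info", Env.DEV))   # projectA_dev.user_info
print(qualified_table_name("projectA", "user_info", Env.PROD))  # projectA.user_info
```

Routing every table reference through one helper like this keeps the environment switch in a single place, so the same task code can target either environment.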