This article is part of the DataWorks feature practice series, which analyzes the pain points you meet when implementing business requirements and helps you use DataWorks features more efficiently.

Previous issue: 01 - Data Synchronization Solution

functionpractice2.png

Feature recommendation: exclusive data integration resource group

As introduced in the data synchronization solution in the previous issue, a batch data synchronization task in Data Integration needs computing resources to run. These resources are provided by resource groups. Typically, data is extracted from the machine that hosts the source data source to the machine that hosts the resource group, and then pushed to the machine that hosts the target data source.

functionpractice2-1.png
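
The extract-then-push flow described above can be pictured with a minimal Python sketch. It is only an illustration of the idea, not how DataWorks implements synchronization: `extract`, `sync`, `write_batch`, and `batch_size` are hypothetical names, and the in-memory lists stand in for the real source and target data sources.

```python
from typing import Callable, Iterable, Iterator, List


def extract(source_rows: Iterable, batch_size: int) -> Iterator[List]:
    """Pull rows from the source in batches.

    In the flow described above, this step would run on the machine
    that hosts the resource group (an assumption of this sketch).
    """
    batch: List = []
    for row in source_rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch


def sync(source_rows: Iterable, write_batch: Callable[[List], None],
         batch_size: int = 2) -> int:
    """Extract batches from the source and push each one to the target.

    Returns the number of rows copied.
    """
    total = 0
    for batch in extract(source_rows, batch_size):
        write_batch(batch)  # push to the target data source
        total += len(batch)
    return total


if __name__ == "__main__":
    target: List = []  # stand-in for the target data source
    rows = [{"id": i} for i in range(5)]  # stand-in for the source
    copied = sync(rows, target.extend, batch_size=2)
    print(f"copied {copied} rows")
```

Because every row passes through the resource group, the resource group must be able to reach both data sources over the network, which is why connectivity is a key planning point.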

When performing data synchronization, you can plan which data integration resource group to use according to your actual situation. The key points of resource group planning are network connectivity and performance.

The following parts describe data integration resource groups in terms of their types and performance, and their network connectivity.

### Part1: Type and performance comparison of data integration resource groups

DataWorks Data Integration supports different types of resource groups:

  • Exclusive data integration resource group:
    A resource group for your exclusive use after purchase. Choose an exclusive resource group when many tasks run concurrently and cannot be staggered, and you need dedicated resources to keep data transmission fast and stable.
  • Custom data integration resource group:
    If you have spare server resources, you can use them as a resource group for running tasks in DataWorks.

The following table compares the capabilities of the two types of data integration resource groups:

| Type | Exclusive resource group | Custom resource group |
| --- | --- | --- |
| Ownership of machine resources | Maintained by DataWorks; computing resources used exclusively by your own tenant. | Maintained by you; your own IDC machines. |
| Network | Supports Alibaba Cloud products in a VPC, on the public network, and in any other network environment. | Supports Alibaba Cloud products in a VPC, on the public network, and in any other network environment. |
| Billing method | Subscription billing, priced by machine specification. | Billed monthly according to the DataWorks edition. |
| Supported data sources | All data sources | All data sources |
| Security | High | Depends on the environment of your own machines |
| Task execution efficiency (whether a task can be allocated enough computing resources and run at peak performance) | High | Depends on the environment of your own machines |
| Reliability (whether a task can start on time, and whether network resources are occupied by other tenants while it runs, so that it fails to produce results on time) | High | Depends on the environment of your own machines |
| Application scenarios | A large number of important production-level tasks. | 1) If you have your own computing resources, you can …; 2) All data sources that need to be synchronized are in IDC. |
| Recommendation index | ★★★★★ | |

We strongly recommend that you use an exclusive data integration resource group to run data integration tasks. Before you use an exclusive data integration resource group, you need to complete the network configuration and the workspace binding. You can then choose the network connection plan that matches your data source and configure the connection. For details on the purchase and basic configuration, see Add and use an exclusive data integration resource group.

### Part2: Network connection plan of data integration resource groups

When performing data synchronization, you need to connect the resource group to the database over the network, using the network solution that matches the resource group type and the network environment where the database is located. An overview of the connection plans is as follows.

functionpractice2-2.png

The following scenarios focus on the network connection plans of the exclusive data integration resource group. For the network connection plans of other resource group types, see the Help Center.

#### Scenario 1: The data source has the ability to access the public network

If the data source can be accessed over the public network, the data source and the resource group can communicate with each other directly through the public network.

functionpractice2-3.jpeg

#### Scenario 2: The data source is in the VPC network, and the VPC and DataWorks are in the same region

If the data source is in a VPC, and the VPC and DataWorks are in the same region, you can bind the exclusive data integration resource group to the VPC where the data source is located. You also need to check whether the resource group and the data source are in the same availability zone. If they are not, you need to manually add a route to ensure that the network between the resource group and the data source is connected.
For details, see Add a route.

functionpractice2-4.jpeg

#### Scenario 3: The data source is in the VPC network, and the VPC and DataWorks are in different regions

If the data source is in a VPC, and the VPC and DataWorks are in different regions, you need to bind a VPC to the exclusive data integration resource group, and then use Express Connect, a VPN, or another network connectivity product to connect the VPC bound to the resource group with the VPC where the data source is located. Common network connectivity products include:

* For examples of Cloud Enterprise Network usage scenarios, see Cloud Enterprise Network.
* For examples of Express Connect usage scenarios, see Express Connect.
* For examples of VPN Gateway usage scenarios, see VPN Gateway.

In addition, you still need to manually add a route to ensure network connectivity. For details, see Add a route.

functionpractice2-5.jpeg

#### Scenario 4: The data source is in IDC

If the data source is in an IDC, the plan is similar to Scenario 3. You need to bind a VPC to the exclusive data integration resource group, and then use Express Connect, a VPN, or another network connectivity product to connect the VPC bound to the resource group with the IDC where the data source is located. Common network connectivity products include:

* For examples of Cloud Enterprise Network usage scenarios, see Cloud Enterprise Network.
* For examples of Express Connect usage scenarios, see Express Connect.
* For examples of VPN Gateway usage scenarios, see VPN Gateway.

In addition, you still need to manually add a route to ensure network connectivity. For details, see Add a route.

functionpractice2-6.jpeg

#### Scenario 5: The data source is in the classic network

If the data source is in the classic network, connecting it to the network of a DataWorks resource group is not supported.
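
The five scenarios can be summarized as a small decision function. The Python sketch below is only a reading aid based on the text of this part: the `Network` enum and `connection_plan` function are hypothetical names, not part of any DataWorks API, and the returned strings paraphrase the scenarios.

```python
from enum import Enum


class Network(Enum):
    """Where the data source lives (hypothetical labels for this sketch)."""
    PUBLIC = "public"
    VPC = "vpc"
    IDC = "idc"
    CLASSIC = "classic"


def connection_plan(network: Network, same_region: bool = True) -> str:
    """Map a data source's network environment to the matching scenario.

    same_region only matters for VPC data sources: it states whether the
    VPC and DataWorks are in the same region.
    """
    if network is Network.PUBLIC:
        return "Scenario 1: connect directly over the public network"
    if network is Network.VPC and same_region:
        return ("Scenario 2: bind the resource group to the data source's VPC; "
                "add a route if the availability zones differ")
    if network is Network.VPC:
        return ("Scenario 3: bind a VPC to the resource group, connect the two "
                "VPCs with a network connectivity product, then add a route")
    if network is Network.IDC:
        return ("Scenario 4: bind a VPC to the resource group, connect it to "
                "the IDC with a network connectivity product, then add a route")
    return "Scenario 5: not supported; migrate the data source to a VPC"


if __name__ == "__main__":
    print(connection_plan(Network.VPC, same_region=False))
```
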
For a data source in the classic network, we recommend that you migrate it to a VPC.

PS: The Alibaba Cloud classic network is no longer recommended.

### Part3: Notes - the impact of whitelists

After ensuring network connectivity between the resource group and the data source, you also need to make sure that access is not blocked by whitelist restrictions. For example, once a whitelist is set on some data sources, IP addresses outside the whitelist are denied access, so you need to add the IP addresses of the resource group to the whitelist of the data source.

The IP addresses that need to be added to the data source whitelist differ by the type of data integration resource group. For details, see the Help Center. As an example, when you use an exclusive data integration resource group, you need to obtain the following addresses and add them to the whitelist of the data source.

* vSwitch CIDR block:

functionpractice2-7.png

* The EIP address of the exclusive resource group:

functionpractice2-8.png

## Scenario practice

After understanding the exclusive data integration resource group, you can refer to the following documents for hands-on practice.

* Add and use an exclusive data integration resource group
* Synchronize data to MaxCompute

> Copyright Notice: The content of this article is contributed voluntarily by real-name registered users of Alibaba Cloud. The copyright belongs to the original author. The Alibaba Cloud Developer Community does not own the copyright and does not assume the corresponding legal responsibilities. For specific rules, please refer to the "Alibaba Cloud Developer Community User Service Agreement" and the "Alibaba Cloud Developer Community Intellectual Property Protection Guidelines". If you find suspected plagiarism in this community, fill in the infringement complaint form to report it. Once verified, the community will immediately delete the suspected infringing content.

Alibaba Cloud Developer

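The official Alibaba technology account, sharing technical innovation, hands-on experience, and engineers' growth insights from across the Alibaba economy.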