After 3 years of polishing, what new features have been added to the data construction and management platform Dataphin?

Introduction: Dataphin has gone through three years of iterative polishing since its launch. A new version was released on May 15, 2021. It mainly covers data source type expansion, data integration, real-time development, data service upgrades, and operation and maintenance experience improvements, which together satisfy more user scenarios and improve the R&D experience.

Cloud data middle platform official website: https://dp.alibaba.com/index

Dataphin is an intelligent data construction and management platform
It aims to provide one-stop big data capabilities across the full pipeline from data ingestion to data consumption, covering products, technology, and methodology, and to help enterprises build an intelligent data system that is standardized and unified, integrated, asset-oriented, service-oriented, and self-optimizing in a closed loop.

Overview of new features in this version

Dataphin has gone through three years of iterative polishing since its launch. Version 2.9.4.2 was released on May 15, 2021. It mainly covers data source type expansion, data integration, real-time development, data service upgrades, and operation and maintenance experience improvements, which together satisfy more user scenarios and improve the R&D experience:

Summary of feature upgrade highlights

1. New data source

Added support for the Hologres data source type, so that Hologres can be used in data synchronization and integration and can be read and written by real-time Blink tasks.

For users of Hologres data sources, Dataphin supports integrating Hologres data into the computing engine and into other data sources, and also supports flowing data back from the computing engine and other data sources into Hologres. Blink tasks can read and write Hologres data in real time. In Dataphin's stream-batch integrated scenario, Hologres can serve as the unified storage service, supporting real-time query and real-time write with no redundant data movement, which greatly improves development efficiency.

Functional scenario a. Data source introduction

Functional scenario b. Offline data integration

Functional scenario c. Stream-batch integration

Define the Hologres meta table, specifying the data source and the source table.

Define the stream-batch virtual mirror table, and specify the physical tables that the mirror table maps to on the stream side and on the batch side. Because Hologres supports both offline and real-time reads and writes, the physical tables on both sides can be Hologres tables.

Create a Blink stream-batch integrated task, and specify the stream and batch parameter configuration and the task configuration. The code logic is identical for both, so a single set of code covers two scheduling modes with different timeliness (offline and real-time).
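The idea behind these three steps can be illustrated with a minimal Python sketch. This is not Dataphin or Blink code: `MirrorTable`, `paid_orders`, `run_task`, and the table names are hypothetical, and in-memory lists stand in for Hologres tables. It only shows one piece of logic running against a logical table that resolves to a different physical table in each mode.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


@dataclass(frozen=True)
class MirrorTable:
    """Hypothetical mirror table: one logical name, two physical tables."""
    name: str
    batch_table: str
    stream_table: str

    def resolve(self, mode: str) -> str:
        # Pick the physical table for the requested execution mode.
        if mode == "batch":
            return self.batch_table
        if mode == "stream":
            return self.stream_table
        raise ValueError(f"unknown mode: {mode!r}")


def paid_orders(rows: Iterable[Row]) -> Iterator[Row]:
    """Shared logic, written once: keep only paid orders."""
    for row in rows:
        if row["status"] == "paid":
            yield row


def run_task(
    mirror: MirrorTable,
    mode: str,
    tables: Dict[str, List[Row]],
    logic: Callable[[Iterable[Row]], Iterator[Row]],
) -> List[Row]:
    # The same logic runs in both modes; only the resolved table differs.
    return list(logic(tables[mirror.resolve(mode)]))


# In-memory stand-ins for the two physical tables (assumed names).
TABLES: Dict[str, List[Row]] = {
    "orders_batch": [
        {"id": 1, "status": "paid"},
        {"id": 2, "status": "cancelled"},
    ],
    "orders_stream": [
        {"id": 3, "status": "paid"},
    ],
}

orders = MirrorTable("orders", "orders_batch", "orders_stream")
batch_result = run_task(orders, "batch", TABLES, paid_orders)
stream_result = run_task(orders, "stream", TABLES, paid_orders)
print(batch_result)
print(stream_result)
```

In this sketch the mode is the only thing that changes between the two runs, which mirrors the "one set of code, two scheduling modes" point made above.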

Dataphin stream-batch integrated architecture diagram

2. Blink real-time tasks support code sampling and debugging

Blink real-time tasks support code sampling and debugging. Sample data can come from three sources: a locally uploaded file, data sampled online, or data entered in the online editor. The feature is only compatible with blink-3.6 and above.

Before the upgrade, Blink tasks did not support local debugging, so the correctness of a job's logic could not be verified while it was still in the draft state. The task had to be submitted first and then verified by starting a test instance in operation and maintenance; there was no way to verify it before submission.

After the upgrade, you can sample data locally before submitting a Blink task and see the run result in the Console. You can immediately check whether the code logic is correct and whether the output meets expectations, verifying correctness before the task is submitted, which effectively safeguards task quality and development efficiency. Three sampling methods are supported: automatically sampling data from the source table, uploading a local CSV file (a sample file can be downloaded), or entering data directly in the sampling panel, which is flexible and convenient.
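As a rough illustration of this workflow, the following Python sketch runs a piece of draft logic against sample rows and prints the result to the console. It is not the Dataphin sampling panel or a Blink API: every function name here (`sample_from_csv`, `sample_from_source`, `sample_from_input`, `clicks_only`, `debug_run`) and the sample data are hypothetical, chosen only to mirror the three sampling methods described above.

```python
import csv
import io
from itertools import islice
from typing import Callable, Dict, Iterable, List

Row = Dict[str, str]


def sample_from_csv(text: str) -> List[Row]:
    """Sample data from uploaded CSV content (first line is the header)."""
    return list(csv.DictReader(io.StringIO(text)))


def sample_from_source(source: Iterable[Row], limit: int) -> List[Row]:
    """Sample the first `limit` rows from a source table iterator."""
    return list(islice(source, limit))


def sample_from_input(rows: List[Row]) -> List[Row]:
    """Use rows entered by hand; copied so the caller's list is untouched."""
    return [dict(row) for row in rows]


def clicks_only(rows: Iterable[Row]) -> List[Row]:
    """The draft logic under test: keep only click events."""
    return [row for row in rows if row["event"] == "click"]


def debug_run(logic: Callable[[Iterable[Row]], List[Row]],
              sample: List[Row]) -> List[Row]:
    # Run the draft logic on the sample and print each output row,
    # so the result can be inspected before the task is submitted.
    result = logic(sample)
    for row in result:
        print(row)
    return result


CSV_TEXT = "user,event\nalice,click\nbob,view\n"
csv_result = debug_run(clicks_only, sample_from_csv(CSV_TEXT))
```

The three `sample_from_*` helpers all return the same row shape, so the logic under test does not need to know where the sample came from.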

3. More items displayed per page in the operation and maintenance list

Each page of the operation and maintenance list can now display up to 100 items. The default is 20 per page, and it can be switched among 20/40/60/80/100 per page. In addition, when you view materialized code or run logs, the operation and maintenance task DAG automatically locates the corresponding materialized node, which improves operation and maintenance efficiency.

4. The data service module supports new offline data sources

The data service module now supports AnalyticDB for MySQL 3.0 as an offline data source.

A data service administrator role has been added as the approver for API permission requests, service unit permission requests, and data source permission requests. This strengthens control over data exported through API data services and restricts the overly broad permissions that API developers previously had over APIs: dedicated personnel must now be designated to approve permission requests. API security and the effectiveness of permission control are greatly enhanced as a result.
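The approval rule can be sketched in a few lines of Python. This is an illustration only, not Dataphin's permission model or API: the `Role`, `RequestType`, and `PermissionRequest` types, the `approve` function, and the user names are all hypothetical. It shows the single rule stated above, that only a data service administrator can approve the three kinds of permission request.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Set


class Role(Enum):
    API_DEVELOPER = "api_developer"
    DATA_SERVICE_ADMIN = "data_service_admin"


class RequestType(Enum):
    API = "api"
    SERVICE_UNIT = "service_unit"
    DATA_SOURCE = "data_source"


@dataclass
class PermissionRequest:
    applicant: str
    request_type: RequestType
    target: str
    status: str = "pending"


def approve(request: PermissionRequest, approver: str,
            roles: Dict[str, Set[Role]]) -> PermissionRequest:
    # Only a data service administrator may approve any request type.
    if Role.DATA_SERVICE_ADMIN not in roles.get(approver, set()):
        raise PermissionError(f"{approver} is not a data service administrator")
    request.status = "approved"
    return request


# Hypothetical users and role assignments.
ROLES: Dict[str, Set[Role]] = {
    "dev_li": {Role.API_DEVELOPER},
    "admin_wang": {Role.DATA_SERVICE_ADMIN},
}

request = PermissionRequest("dev_li", RequestType.API, "order_query_api")
try:
    approve(request, "dev_li", ROLES)  # a developer cannot approve
except PermissionError as err:
    print("rejected:", err)
approve(request, "admin_wang", ROLES)
print(request.status)
```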

Version summary

In version V2.9.4.2, released in May, Dataphin delivered functional iterations around data sources, general development, real-time development, operation and maintenance, and data services.

The next iteration is planned to support the CDH 6 computing engine, audit logs, order purchases by sub-accounts, OpenAPI expansion, upgraded monitoring and alerting, the MySQL 8.0 data source, and more, so stay tuned!

Learn more: https://dp.alibaba.com/consult

The data middle platform is the only way for enterprises to achieve digital intelligence. Alibaba holds that a data middle platform is an intelligent big data system that combines methodology, tools, and organization, and that is "fast", "accurate", "complete", "unified", and "connected".

Alibaba Cloud currently offers a range of solutions externally, including a general data middle platform solution as well as data middle platform solutions for segments such as retail, finance, the Internet, and government.

Among them, the Alibaba Cloud data middle platform product matrix is built on Dataphin as its base, with the Quick series as the entry points into business scenarios, including:

-Dataphin, a one-stop, intelligent data construction and management platform;
-Quick BI, intelligent decision-making anytime, anywhere;
-Quick Audience, comprehensive insight, global marketing, intelligent growth;
-Quick A+, a one-stop data operation platform for cross-terminal, full-domain application experience analysis and insight;
-Quick Stock, an intelligent goods operation platform;
-Quick Decision, an intelligent decision platform.

Data middle platform official website: https://dp.alibaba.com


Alibaba Cloud Developer
