Introduction: Alibaba Cloud senior technical expert Chen Changcheng released the one-stop data management platform DMS and its solutions at the "Data Convergence Cloud · Smart Control of the Future" Alibaba Cloud Database Innovation Cloud Summit. Topics include current pain points of enterprise data management, the DMS one-stop data management platform and its core technologies, real-time data warehouse solutions, and corresponding application practices.

The "Data Convergence Cloud · Smart Control of the Future" Alibaba Cloud Database Innovation Cloud Summit and the 3rd Database Performance Challenge Final Award Ceremony have concluded successfully. For more in-depth content, you are welcome to watch the replay of the summit.

Summit replay 📎 https://developer.aliyun.com/live/247301

Slides (PPT) download 📎 https://developer.aliyun.com/topic/download?id=7986

At the summit, Alibaba Cloud senior technical expert Chen Changcheng (Tianyu) delivered a keynote speech entitled "One-stop Data Management DMS and the Latest Solution Release". The topics included current pain points of enterprise data management, the DMS one-stop data management platform and its core technologies, real-time data warehouse solutions, and corresponding application practices.

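[Photo: Chen Changcheng, Alibaba Cloud senior technical expert and head of the Ecosystem Tools Department of the Database Products Business Unit]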

image001.png

The digital economy accounts for a growing share of China's economy. In business management, the concentration of market share among the leading companies of each industry has made refined operations a very important topic, and mining the value of enterprise data has become more and more important.

image003.png

Looking at the entire life cycle of data within an enterprise, which covers data production and storage, data processing and analysis, and data application, there is rarely a single platform that supports all three stages in a unified way. As each business line develops, most companies end up with production data stores defined by the characteristics of each business, and most of their data warehouse analysis is also built independently. How to connect these data systems and mine value across them becomes a difficult problem. At the same time, industry reports predict that in 2022 the proportion of new businesses using real-time data will exceed 50%.

image005.png

In practice, enterprises run into data silos and data management problems. There are many types of databases, data links are complicated to process, maintenance costs are high, and stability is hard to guarantee. How to manage a variety of heterogeneous data in a unified way, and how to manage its security, have become very challenging issues. In this context, Alibaba Cloud Database proposed the concept of a one-stop data management platform, DMS.

image007.png

DMS manages the enterprise's data assets in a unified way, covering database design and development, data integration and processing, data development, data analysis, and data application, with the entire process fully connected. The architecture diagram shows that the bottom layer connects to various heterogeneous data sources, while capabilities such as data lineage, data governance, data orchestration, and task scheduling are built up in the middle layer; these are very important data support capabilities. At the upper layer, application scenarios are turned into products, such as data security management, disaster recovery and multi-active capabilities, data archiving, and real-time data warehouse construction, so that more companies can adopt data solutions with a low barrier to entry.

image009.png

The overall technical architecture is divided into three layers. The underlying basic services provide a data security system, a data asset management system, and a development and operation system. The middle support engine is divided into two parts, the control plane and the data plane. The control plane includes the task execution engine and the engine for stable changes; the data plane includes structure migration, full (batch) data synchronization, real-time streaming data synchronization, data transformation, and multi-source heterogeneous federated query capabilities. The upper layer holds the business functions, mainly data security and database DevOps, along with application scenarios for data integration and data development.

image011.png

DMS contains several important core technical features, including data assets and security, database DevOps capabilities, and data integration and development.

image013.png

For data assets and security, the core work is global data asset management: enterprises can quickly find the data they need and govern their data assets without physically centralizing the data, while security management covers the entire life cycle of the data.

image015.png

Two points are worth expanding on. One is building a knowledge graph of the data. We collect all business data together with its actual physical metadata so that the business can tag it, use schema matching techniques to learn the relationships between data fields, and map the logical business definitions to the physical definitions. At the same time, as the business uses the DMS development platform, it accumulates associations among people, data, and permissions, as well as data tags for business-related fields. Together these form a knowledge graph of the relationships across the entire data asset. This knowledge graph can be used in multiple applications. For example, given heterogeneous data sources of various types, how do you build a wide table for the data warehouse that meets business requirements? When working out the relationships between the data, the enterprise's data engineers do not need to be deeply familiar with every data model, because DMS has already built these capabilities into the system; they only need to select and filter to obtain the wide table. The knowledge graph also makes the company's data governance and data security management more controllable.
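
The idea of learning relationships between fields can be pictured with a toy schema matcher. This is only a minimal sketch and not how DMS implements it: it assumes candidate relationships can be scored by column-name similarity alone, and the table layouts, the 0.8 threshold, and the `match_fields` helper are all hypothetical.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase a column name and drop separators so 'User_ID' ~ 'userid'."""
    return name.lower().replace("_", "").replace("-", "")


def match_fields(left: dict, right: dict, threshold: float = 0.8):
    """Return (left_col, right_col, score) pairs whose names look related.

    `left` and `right` map column name -> type; only same-typed columns
    are compared. The 0.8 threshold is an arbitrary illustrative choice.
    """
    pairs = []
    for lcol, ltype in left.items():
        for rcol, rtype in right.items():
            if ltype != rtype:
                continue
            score = SequenceMatcher(None, normalize(lcol), normalize(rcol)).ratio()
            if score >= threshold:
                pairs.append((lcol, rcol, round(score, 2)))
    # Highest-scoring candidates first.
    return sorted(pairs, key=lambda p: -p[2])


# Hypothetical metadata collected from two heterogeneous sources.
orders = {"order_id": "bigint", "user_id": "bigint", "amount": "decimal"}
users = {"userId": "bigint", "nick_name": "varchar"}
candidates = match_fields(orders, users)
```

Here `candidates` contains the single pair linking `orders.user_id` to `users.userId`, which is the kind of edge a data engineer could then confirm before it is used to assemble a wide table.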

image017.png

The other point is the identification of sensitive data. Once all data within the enterprise is under unified management, the platform can automatically classify it. On top of this classification, it can automatically identify data that is sensitive under five laws and regulations, including GDPR. Once sensitive data is found, companies can apply more than 15 of our masking (desensitization) algorithms in production. We also provide a security proxy, so that companies can query data with dynamic masking without needing a database account.
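
To make classification followed by masking (desensitization) concrete, here is a minimal sketch. The two regular-expression rules and the masking formats are hypothetical illustrations; they are not the detection rules or the algorithms that DMS ships.

```python
import re

# Hypothetical detection rules: category -> pattern. Real classifiers
# would use far richer rules than these two regular expressions.
RULES = {
    "email": re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$"),
    "phone": re.compile(r"^\d{11}$"),
}


def classify(value: str):
    """Return the first sensitive category the value matches, else None."""
    for category, pattern in RULES.items():
        if pattern.match(value):
            return category
    return None


def mask(value: str) -> str:
    """Mask a value according to its detected category."""
    category = classify(value)
    if category == "email":
        local, domain = value.split("@", 1)
        return local[0] + "***@" + domain
    if category == "phone":
        return value[:3] + "****" + value[-4:]
    return value  # not sensitive: return unchanged


row = {"name": "order-17", "contact": "alice@example.com", "mobile": "13812345678"}
masked = {k: mask(v) for k, v in row.items()}
```

Applying the masking at query time, as in the last line, is what lets a caller see usable results without ever receiving the raw values.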

image019.png

The second core capability is DevOps, in which security is integrated with the entire development platform. The platform works somewhat like a workbench for developers: many data sources connect at the bottom, and a rich set of development tools is provided on top. The DMS platform has more than 100,000 weekly active users on the cloud. It helps users with database table structure design, data changes, and the related releases. We provide a security rule engine that is embedded in the entire process of enterprise database development, so developers get the greatest convenience within a controlled permission system. Balancing security and efficiency well is the core idea of the entire design.

image021.png

In essence, the security rule engine ties together the specific operations of an enterprise, such as structure design, data change, and data export, with the corresponding database type (each database type may have different best practices), the corresponding ticket, the personnel involved, and so on, to form a permission mapping across the operator, the operation, and the object operated on. More than two hundred R&D specification templates accumulated within Alibaba are available by default, or an enterprise can define its own rules as needed through a DSL, which makes security rules easy to define.
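
The mapping across operator, operation, and object can be sketched as a first-match rule table. This is an illustrative assumption about the shape of such an engine, not the DMS rule DSL: the `Request` fields, the rule tuples, and the decision names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    operator_role: str   # who: e.g. "developer", "dba"
    action: str          # what: e.g. "data_change", "data_export"
    db_type: str         # on which kind of database: e.g. "mysql"
    environment: str     # where: e.g. "dev", "prod"


# Hypothetical rules, evaluated top to bottom; "*" matches anything.
# Each rule: (role, action, db_type, environment, decision)
RULES = [
    ("dba", "*", "*", "*", "allow"),
    ("developer", "*", "*", "dev", "allow"),
    ("developer", "data_change", "mysql", "prod", "needs_approval"),
    ("*", "*", "*", "*", "deny"),  # default: deny anything unmatched
]


def evaluate(req: Request) -> str:
    """Return the decision of the first rule that matches the request."""
    fields = (req.operator_role, req.action, req.db_type, req.environment)
    for *pattern, decision in RULES:
        if all(p in ("*", f) for p, f in zip(pattern, fields)):
            return decision
    return "deny"


decision = evaluate(Request("developer", "data_change", "mysql", "prod"))
```

With these sample rules, a developer changing data on a production MySQL instance is routed to approval, while the same change in a development environment is allowed directly.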

image023.png

The change module also provides change security, which can be understood as the security capabilities in the enterprise's change and release process, including SQL security review and a formal SQL execution workflow. For table structure changes or large-scale data operations, a change is split into many small batches and SQL is rewritten automatically to avoid jitter in the stability of the source database. Table structure changes that would otherwise lock the table are turned into lock-free changes, and fine-grained changes are kept under safe control.
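
Splitting one large data operation into many small batches can be illustrated with a short sketch. It uses SQLite only so that the example is runnable, and it assumes an integer primary key named `id`; the `delete_in_batches` helper is hypothetical and formats SQL from trusted inputs, whereas a real tool would parse and rewrite the statement.

```python
import sqlite3


def delete_in_batches(conn, table, where, batch_size=1000):
    """Delete matching rows in small primary-key batches.

    Each batch is its own short transaction, so no single statement
    holds locks for long. `table` and `where` are trusted inputs here.
    Returns the number of non-empty batches executed.
    """
    batches = 0
    while True:
        cur = conn.execute(
            f"DELETE FROM {table} WHERE id IN "
            f"(SELECT id FROM {table} WHERE {where} ORDER BY id LIMIT ?)",
            (batch_size,),
        )
        conn.commit()
        if cur.rowcount == 0:
            break
        batches += 1
    return batches


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, level TEXT)")
conn.executemany(
    "INSERT INTO logs (level) VALUES (?)",
    [("debug",) if i % 2 else ("error",) for i in range(10)],
)
conn.commit()

# Five 'debug' rows are removed two at a time instead of in one statement.
batches = delete_in_batches(conn, "logs", "level = 'debug'", batch_size=2)
remaining = conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]
```

A production version would typically also pause between batches or watch replication lag, which is omitted here for brevity.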

image025.png

The next step is to bring out the value of data. We focus on building data transmission links with stream-batch integration, together with a low-code development platform, and build the overall data integration and development capabilities on top of multiple compute engines.

image027.png

Data transmission in DMS is based on Alibaba Cloud's Data Transmission Service (DTS). DTS is the earliest data transmission product released by a mainstream cloud vendor. It provides real-time transmission of multi-source heterogeneous data, and its timeliness and stability have been well tested.

image029.png

Across the entire link, from structure migration to incremental synchronization, complete real-time data transmission is achieved. Semi-structured and unstructured data is also recognized semantically, and metadata is constructed automatically, including custom data types. This allows data to be stored and loaded into the warehouse quickly and turned into data assets that can be analyzed and used.

image031.png

The most important point about the stream-batch integrated data architecture is that the whole system is built on the Record Store in-memory data processing module. Streaming and batch processing are handled consistently, so the entire data processing flow becomes very simple.
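
One way to picture consistent handling of streaming and batch data is a single record-level transformation that both paths share. The sketch below is an assumption made for illustration and says nothing about the internals of the module named above; the `transform` logic, `change_stream` feed, and sample records are hypothetical.

```python
from typing import Iterable, Iterator


def transform(record: dict) -> dict:
    """Record-level logic shared by the batch and streaming paths."""
    return {**record, "amount_cents": round(record["amount"] * 100)}


def process(records: Iterable[dict]) -> Iterator[dict]:
    """Lazily apply `transform`; works on a list or an endless stream."""
    for record in records:
        yield transform(record)


def change_stream() -> Iterator[dict]:
    """Stand-in for a real-time change feed (hypothetical data)."""
    yield {"id": 3, "amount": 0.5}
    yield {"id": 4, "amount": 12.0}


history = [{"id": 1, "amount": 1.25}, {"id": 2, "amount": 3.0}]
batch_result = list(process(history))            # full (batch) load
stream_result = list(process(change_stream()))   # incremental (stream) load
```

Because the logic is written once per record, there is no second code path that could drift out of sync between the initial full load and the later incremental changes.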

image033.png

In the data developer interface, we provide a drag-and-drop way to define the data processing flow: data sources, SQL operation nodes, data transmission nodes, and data transformation can all be defined by dragging and dropping. Both enterprise application engineers and database developers can define data processing this way.
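
Behind a drag-and-drop canvas, a processing flow is typically a directed acyclic graph of nodes that must run in dependency order. The following sketch shows that ordering step with Python's standard `graphlib` module (Python 3.9+); the node names are hypothetical and this is not the DMS scheduler.

```python
from graphlib import TopologicalSorter

# Hypothetical task flow: node -> set of nodes it depends on.
flow = {
    "extract_orders": set(),
    "extract_users": set(),
    "join_orders_users": {"extract_orders", "extract_users"},
    "load_wide_table": {"join_orders_users"},
}


def execution_order(graph: dict) -> list:
    """Return one valid run order; raises graphlib.CycleError on a cycle."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(flow)
```

The two extract nodes have no dependency on each other, so either may come first; the only guarantees are that the join runs after both and the load runs last.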

image035.png

Alibaba Cloud's real-time data warehouse construction solution uses a technical architecture in which the database and the data warehouse are integrated and managed in a unified way. In earlier data links, a large amount of online data was pulled into offline storage for computation, and the results were then written back to the online production system. This process is very long, and the data link and storage costs are correspondingly high. With our solution, you do not need to create the table structure on the target side before the full data initialization; we automatically build the table structure on the target during the batch load. During incremental synchronization, changes to the table structure at the source, or an active/standby switchover at the source, do not affect the stability of the link. Table structure changes are synchronized to the target, so they are handled automatically and transparently across the entire link.
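
The behavior described here, where the target table is created automatically and later structure changes at the source flow through the same link, can be sketched with an in-memory stand-in. The event format, the `Target` class, and the sample rows are hypothetical; this is not the DTS wire format or its implementation.

```python
class Target:
    """In-memory stand-in for a warehouse table (hypothetical)."""

    def __init__(self):
        self.columns = []   # created lazily from the first row seen
        self.rows = {}      # primary key -> row dict

    def apply(self, event: dict) -> None:
        kind = event["kind"]
        if kind == "add_column":          # structure change from the source
            self._ensure_columns([event["column"]])
        elif kind == "upsert":            # insert or update from the source
            row = event["row"]
            self._ensure_columns(row.keys())
            self.rows[row["id"]] = {c: row.get(c) for c in self.columns}
        elif kind == "delete":
            self.rows.pop(event["id"], None)

    def _ensure_columns(self, names) -> None:
        """Create missing columns and backfill existing rows with None."""
        for name in names:
            if name not in self.columns:
                self.columns.append(name)
                for row in self.rows.values():
                    row[name] = None


events = [
    {"kind": "upsert", "row": {"id": 1, "name": "a"}},
    {"kind": "add_column", "column": "city"},
    {"kind": "upsert", "row": {"id": 2, "name": "b", "city": "Hangzhou"}},
    {"kind": "delete", "id": 99},  # deleting an unknown key is a no-op
]
target = Target()
for event in events:
    target.apply(event)
```

No table definition is supplied up front: the columns come from the first row, and the later `add_column` event extends the target without interrupting the row events around it.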

Next, a two-minute video introduces the DMS real-time data warehouse construction solution. How to improve productivity through data has become a direction that enterprises keep exploring, and data warehouses play a key role in it. Traditional data warehouses are generally offline warehouses built on T+1 data integration to support an enterprise's various analyses and services. This approach not only affects the stability of online business, but also struggles to support the real-time needs of frequently changing enterprise data, so a real-time data warehouse is needed. So how do you build an enterprise real-time data warehouse? The following introduces how to use Alibaba Cloud's one-stop data management platform DMS and the cloud-native real-time data warehouse ADB engine to build a real-time data warehouse in which inserts, deletes, and updates lag the online system by less than one second. DMS supports two real-time data warehouse construction schemes: real-time data warehousing, and T+1 periodic snapshots based on real-time zipper tables.

Real-time data warehousing supports two methods. Method 1: synchronize historical full data plus incremental data to the ADB real-time data warehouse in real time through DMS. Method 2: process real-time data through the DMS data transmission and processing module, and then write it into the ADB real-time data warehouse. To meet business requirements for T+1 snapshot data, DMS has introduced a T+1 periodic snapshot solution that does not affect online business. The following describes how to use it.

Through DMS in ticket (work order) mode, you can quickly build periodic snapshots based on real-time data. These support hourly or daily snapshot analysis, as well as retrospective analysis at any business point in time, meeting scenarios in which the business side needs to compute figures such as total deposits, total balance, and total order amount at different times.
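
A zipper table keeps every version of a row together with the interval during which it was valid, which is what makes a snapshot at any point in time possible. The sketch below is a minimal in-memory illustration under that general definition; the field names, the sample balances, and the helpers are hypothetical, and deletes are not handled.

```python
from datetime import datetime

OPEN_END = datetime.max  # marks the currently valid version of a row


def apply_change(zipper: list, key, value, changed_at: datetime) -> None:
    """Close the current version of `key` (if any) and open a new one."""
    for version in zipper:
        if version["key"] == key and version["end"] == OPEN_END:
            version["end"] = changed_at
    zipper.append({"key": key, "value": value, "start": changed_at, "end": OPEN_END})


def snapshot(zipper: list, as_of: datetime) -> dict:
    """Return key -> value as it stood at `as_of` (start <= as_of < end)."""
    return {
        v["key"]: v["value"]
        for v in zipper
        if v["start"] <= as_of < v["end"]
    }


# Hypothetical account balances changing over two days.
balances = []
apply_change(balances, "acct-1", 100, datetime(2021, 6, 1, 9, 0))
apply_change(balances, "acct-2", 50, datetime(2021, 6, 1, 10, 0))
apply_change(balances, "acct-1", 70, datetime(2021, 6, 2, 8, 30))

day1 = snapshot(balances, datetime(2021, 6, 2, 0, 0))    # state at end of June 1
latest = snapshot(balances, datetime(2021, 6, 3, 0, 0))  # state at end of June 2
total_day1 = sum(day1.values())
```

The same table answers both the daily snapshot question and a look back to any earlier moment, because no version is ever overwritten; a total such as `total_day1` is just an aggregate over the chosen snapshot.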

Compared with other solutions, the Alibaba Cloud real-time data warehouse construction solution offers the following advantages: 1. Data timeliness is high, and the real-time link has little impact on the business side; normal business operation is not affected by batch data pulls. 2. It provides one-stop data management with the database and data warehouse integrated; source-side operation and maintenance changes are transparent to the link, and the timeliness, stability, and full-link lineage of data aggregated from multiple sources are guaranteed. 3. Complex real-time data processing and computation logic is built in, and the processing link is short. 4. Low-code operation greatly reduces the difficulty of building a real-time data warehouse, improves construction efficiency, and supports the various real-time scenarios that arise in enterprise digital transformation.

Two application cases are introduced below. The first case: an automobile manufacturer uses the DMS+ADB solution to build a data mart and marketing platform.

image037.png

The second case: a bank uses DMS+ADB to build a T+1 data warehouse solution.

image039.png

Copyright Notice: The content of this article is contributed voluntarily by real-name registered users of Alibaba Cloud, and the copyright belongs to the original author. The Alibaba Cloud Developer Community does not own the copyright and does not assume the corresponding legal responsibilities. For specific rules, please refer to the "Alibaba Cloud Developer Community User Service Agreement" and the "Alibaba Cloud Developer Community Intellectual Property Protection Guidelines". If you find suspected plagiarism in this community, fill in the infringement complaint form to report it. Once verified, the community will immediately delete the suspected infringing content.
