Introduction: This article is based on a talk given at the Enterprise Cloud-Native Database Best Practices Forum of the 2021 Apsara Conference. Chen Changcheng, senior technical expert in the Alibaba Cloud Database Division and head of the ecosystem tools product department, presented "DMS Technology Interpretation: A One-Stop Online Data Management Platform".
This article introduces DMS, a one-stop online data management platform, in three parts. The goal is to show how the one-stop approach to data management helps enterprises build data warehouses agilely and realize the value of their data quickly through low-threshold data development. Everyone is welcome to try it.
- Pain points of enterprise data management
- Cloud Native 2.0 One-Stop Data Management DMS
- Solutions and best practices
1. Pain points of enterprise data management
1) Digital transformation is the strategic focus of enterprise development
Amid the supply-side reforms promoted by the state, many industries continue to make enterprise digitalization a priority in their development. Recent economic reports show that the digital economy's share of China's GDP has risen year by year, and companies themselves have seen their operating efficiency improve. Driven by both policy guidance and corporate ambition, digital transformation is advancing rapidly.
2) The full life cycle of data in the business
Across a business, the data life cycle runs from production through storage, processing, analysis, and application. Different businesses within an enterprise choose different databases to suit their own characteristics, so many types of databases are in use; data warehouses are mostly built independently, and the enterprise's internal systems contain a variety of data stores and data platforms. Today there is no one-stop management platform that covers the entire data life cycle. At the same time, real-time data is becoming a major trend: it is predicted that by 2025, more than 50% of the data for new businesses will be real-time.
3) Pain points in realizing the value of enterprise data
Data silos formed by the many types of data within the enterprise, complex data processing pipelines, and the difficulty of data governance and security management have all become obstacles to realizing the value of data.
2. Cloud native 2.0 one-stop data management DMS
1) Data Management Service DMS
How can an enterprise manage the security of its data in a unified way and realize the value of that data faster? In this context, we propose DMS, a one-stop data management platform. DMS connects enterprise data assets in a unified manner: its bottom layer connects to all heterogeneous data sources and manages them centrally. From database design, development, application, and release through data warehouse construction and data services, DMS builds a unified platform that covers the data life cycle, so that every stage of enterprise data management is connected end to end. This is a new concept that links the entire cycle of online data processing and analysis in an enterprise.
DMS has been developed and refined within Alibaba Group for more than 12 years, gradually covering the entire data life cycle, from the underlying data management infrastructure to data security, database DevOps, and data transmission.
2) One-stop data management DMS technical architecture
The technical architecture mainly has three layers:
- At the bottom, the basic services build a unified data asset system, development and O&M system, and security management system across the entire domain;
- In the middle are the engines supporting the control plane and the data plane. The control plane engines support data security and database DevOps scenarios, for example the ticket (work order) execution engine, the security rule engine, and the stable change engine. The data plane engines include full data transmission, incremental transmission, ETL processing and conversion operators, and multi-source heterogeneous query processing for federated queries;
- At the top are business functions for each scenario, supporting data security, database DevOps, and data integration and development, which together form one-stop, full-link data life cycle management.
Next, let's look at the three core technical characteristics of DMS.
3) DMS core technical characteristics
Data Management DMS: Data Assets and Security
Data assets provide unified management of all of an enterprise's data, so that it can quickly learn what data is available, where it is, and how well it is governed, which makes the data easier to use. Two technical points stand out here:
The first is building a knowledge graph that associates multi-source heterogeneous physical metadata with the related business logic. Metadata definitions and semantic learning of field associations, combined with the relationships between tickets and data recorded as the platform is used, form the input for constructing the data map. Once this data is collected, a relationship graph of the global data assets is built. To lower the threshold for building a data warehouse, a data engineer can specify a few core business fields, and the system uses the association relationships to automatically build a wide table for the warehouse, supporting low-threshold warehouse building and data quality across the entire domain.
The second is data security. DMS supports more than five data security regulations, including GDPR; after choosing a regulation, an enterprise can identify and classify its sensitive data accordingly. Data desensitization (masking) runs through every stage of the data life cycle, including data production, data integration, data development, and value mining, with support for more than 15 types of desensitization.
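The wide-table step can be pictured as a path search over the association graph. The following Python sketch is only an illustration under stated assumptions: the tables, join conditions, and the functions `join_path` and `build_wide_table_sql` are hypothetical, and it does not reproduce how DMS actually builds its knowledge graph or generates wide tables.

```python
from collections import deque

# Hypothetical association graph: (table_a, table_b) -> join condition,
# e.g. as inferred from field semantics and ticket history.
EDGES = {
    ("orders", "customers"): "orders.customer_id = customers.id",
    ("orders", "order_items"): "orders.id = order_items.order_id",
    ("order_items", "products"): "order_items.product_id = products.id",
}


def neighbors(table):
    """Yield (adjacent_table, join_condition) pairs for a table."""
    for (a, b), cond in EDGES.items():
        if a == table:
            yield b, cond
        elif b == table:
            yield a, cond


def join_path(root, target):
    """Breadth-first search for the shortest chain of joins from root to target."""
    queue = deque([(root, [])])
    seen = {root}
    while queue:
        table, path = queue.popleft()
        if table == target:
            return path
        for nxt, cond in neighbors(table):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(nxt, cond)]))
    raise ValueError(f"no association path from {root} to {target}")


def build_wide_table_sql(root, core_fields):
    """Generate a wide-table SELECT from a few core 'table.column' fields."""
    joins = []  # ordered, de-duplicated (table, condition) pairs
    for field in core_fields:
        for step in join_path(root, field.split(".")[0]):
            if step not in joins:
                joins.append(step)
    lines = [f"SELECT {', '.join(core_fields)}", f"FROM {root}"]
    lines += [f"LEFT JOIN {t} ON {c}" for t, c in joins]
    return "\n".join(lines)


print(build_wide_table_sql("orders", ["orders.id", "customers.name", "products.title"]))
```

Breadth-first search picks the shortest join chain, so `products` is reached through `order_items`; a real system would also have to weigh join cardinality and the grain of the resulting wide table.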
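As a rough illustration of classify-then-mask, the Python sketch below classifies columns by matching sample values against regular expressions and then applies a masking function per category. The two detectors, the 80% match threshold, and the masking formats are assumptions; DMS's actual classification rules and desensitization algorithms are not described here.

```python
import re

# Hypothetical detectors: sensitive category -> pattern for a whole value.
DETECTORS = {
    "email": re.compile(r"^[\w.+-]+@[\w-]+\.[\w.-]+$"),
    "phone": re.compile(r"^\+?\d{7,15}$"),
}


def classify_column(samples, threshold=0.8):
    """Return a sensitive category if enough sample values match its pattern."""
    for category, pattern in DETECTORS.items():
        hits = sum(1 for v in samples if pattern.match(v))
        if samples and hits / len(samples) >= threshold:
            return category
    return None


def mask_email(value):
    """Keep the first character of the local part and the whole domain."""
    local, _, domain = value.partition("@")
    return local[:1] + "***@" + domain


def mask_phone(value):
    """Keep only the last four characters."""
    return "*" * (len(value) - 4) + value[-4:]


MASKERS = {"email": mask_email, "phone": mask_phone}


def desensitize(rows, columns):
    """Classify each column from its values, then mask the sensitive ones."""
    categories = {c: classify_column([r[c] for r in rows]) for c in columns}
    return [
        {c: MASKERS[categories[c]](row[c]) if categories[c] else row[c] for c in columns}
        for row in rows
    ]


rows = [{"name": "Alice", "email": "alice@example.com", "phone": "13812345678"}]
print(desensitize(rows, ["name", "email", "phone"]))
```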
DMS DevSecOps has more than 100,000 developers and active users on the cloud. The platform provides a rich set of database developer tools. Built on these tools, data changes, database table design (DDL), and the security rule engine are combined so that, through DevSecOps, enterprises can let business developers design and release database table changes on their own, maximizing their efficiency while keeping those operations secure.
The security rule engine has more than 200 built-in security rule templates, reflecting the different best practices of different database engines. Enterprises can define suitable rules based on these templates, expressing each standardized rule in terms of three factors: the operator (the person performing the action), the database object, and the specific operation. For example, limits on the number of rows changed in a single data correction, limits on the number of rows returned by a single query, and field-level access permissions for individual users are all built on the rule engine.
Change security protects the changes that developers make independently through DevSecOps. For example, a large-volume data operation is split into multiple small-batch operations, and a change that would lock a table is automatically converted into a lock-free change. Security rules that check and intercept changes make them safe and reliable, and releasing these capabilities to enterprise developers speeds up independent R&D iterations.
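To make the three-factor rule model concrete, here is a minimal Python sketch. The `Rule` and `Request` types, wildcard matching with `fnmatchcase`, the row-count limits, and the first-match-wins policy are all assumptions for illustration, not the DMS rule engine's actual schema or semantics; field-level permissions are omitted.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule keyed on the three factors: operator, object, action."""
    operator: str    # user or role pattern, e.g. "dev_*"
    db_object: str   # database object pattern, e.g. "prod.*"
    action: str      # operation type, e.g. "UPDATE" or "SELECT"
    max_rows: int    # limit on rows changed or returned by one operation


@dataclass(frozen=True)
class Request:
    operator: str
    db_object: str
    action: str
    estimated_rows: int


def check(request, rules):
    """Return (allowed, reason). The first matching rule decides; if no rule
    matches, this sketch allows the request (a stricter engine might deny)."""
    for rule in rules:
        if (fnmatchcase(request.operator, rule.operator)
                and fnmatchcase(request.db_object, rule.db_object)
                and request.action == rule.action):
            if request.estimated_rows > rule.max_rows:
                return False, f"{request.action} on {request.db_object} exceeds {rule.max_rows} rows"
            return True, "within limit"
    return True, "no matching rule"


rules = [
    Rule("dev_*", "prod.*", "UPDATE", 1000),   # cap single data corrections
    Rule("*", "prod.*", "SELECT", 5000),       # cap single query result size
]
print(check(Request("dev_alice", "prod.orders", "UPDATE", 50000), rules))
```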
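Splitting a large operation into small batches can be sketched with Python's built-in `sqlite3` module. This is a generic batching pattern, not DMS's implementation: the `batched_delete` helper, the `logs` table, and the assumption of an integer `id` primary key are hypothetical, and automatic lock-free DDL is not shown.

```python
import sqlite3


def batched_delete(conn, table, where, batch_size=1000):
    """Delete rows matching `where` in primary-key order, one small batch per
    transaction. `table` and `where` are trusted inputs in this sketch; a real
    tool must validate them instead of formatting them into SQL."""
    total = 0
    while True:
        cur = conn.execute(
            f"DELETE FROM {table} WHERE id IN "
            f"(SELECT id FROM {table} WHERE {where} ORDER BY id LIMIT ?)",
            (batch_size,),
        )
        deleted = cur.rowcount
        conn.commit()  # end the transaction so locks are held only briefly
        total += deleted
        if deleted < batch_size:  # last (partial or empty) batch
            return total


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, level TEXT)")
conn.executemany(
    "INSERT INTO logs (level) VALUES (?)",
    [("debug" if i % 2 else "info",) for i in range(10_000)],
)
conn.commit()
print(batched_delete(conn, "logs", "level = 'debug'", batch_size=500))
```

Committing after each batch keeps every transaction, and the locks it holds, short, which is the point of splitting the operation.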
The problem enterprises face in digital transformation is how to integrate data in a unified way and unlock its value. We aim to give developers a convenient experience through built-in data integration and low-code development capabilities.
At the bottom, the core data pipeline is built on the real-time heterogeneous data transmission capabilities of DTS (Data Transmission Service), which are mature in data migration, synchronization, and subscription.
With the AnyToAny architecture implemented inside the transmission pipeline, a new data source can be added as a plug-in and quickly connected in real time with the existing heterogeneous data sources. Unstructured data, meanwhile, can be stored in structured form through semantic recognition and type mapping and then mined for value.
With an integrated stream-batch pipeline built internally, a unified in-memory conversion module supports user-defined operators and desensitization algorithms. A conversion defined once applies uniformly to both stream and batch data, and full-data initialization reuses the same conversion logic. When a data warehouse is built in DMS, the pipeline automatically initializes the table schema at the target and moves over the full data and then the incremental data, while the intermediate conversion needs to be defined only once. Database switchovers or DDL changes at the source are synchronized seamlessly to the target data warehouse, realizing an integrated architecture for warehouse building. More than 100 built-in conversion operators greatly consolidate users' data pipelines, make the whole pipeline more stable, and substantially reduce its operation and maintenance cost.
Once data integration is in place, data sources, the cross-database query engine, and the stream and batch transmission pipelines can all be used as operation nodes by drag and drop, letting users define their own data processing flows. Combined with O&M tools, security management, and unified governance, this lets enterprises create production tasks at scale.
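One way to read the plug-in idea is a registry of source readers and target writers that exchange records in a common intermediate format, so any registered source can feed any registered target. The sketch below is a hypothetical illustration; the registry, the `csv_text` and `memory` plug-ins, and `transfer` are not part of DTS or DMS.

```python
from typing import Callable, Dict, Iterable, Iterator

Record = Dict[str, object]  # common intermediate format (an assumption)

READERS: Dict[str, Callable[[dict], Iterator[Record]]] = {}
WRITERS: Dict[str, Callable[[dict, Iterable[Record]], int]] = {}


def reader(name):
    """Register a source plug-in under `name`."""
    def register(fn):
        READERS[name] = fn
        return fn
    return register


def writer(name):
    """Register a target plug-in under `name`."""
    def register(fn):
        WRITERS[name] = fn
        return fn
    return register


@reader("csv_text")
def read_csv_text(cfg):
    """Hypothetical source: CSV held in a string (no quoting support)."""
    header, *rows = cfg["text"].strip().splitlines()
    columns = header.split(",")
    for line in rows:
        yield dict(zip(columns, line.split(",")))


@writer("memory")
def write_memory(cfg, records):
    """Hypothetical target: append records to an in-memory list."""
    before = len(cfg["sink"])
    cfg["sink"].extend(records)
    return len(cfg["sink"]) - before


def transfer(src_type, src_cfg, dst_type, dst_cfg):
    """Any registered source can feed any registered target."""
    return WRITERS[dst_type](dst_cfg, READERS[src_type](src_cfg))


sink = []
print(transfer("csv_text", {"text": "id,name\n1,a\n2,b"}, "memory", {"sink": sink}))
```

With N source plug-ins and M target plug-ins, this layout needs N + M adapters rather than one converter per source-target pair, which is one plausible reason a new source can be connected quickly.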
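The define-once principle can be illustrated by composing operators into a single conversion and applying it on both the batch path (full initialization) and the stream path (incremental change events). This Python sketch assumes a simple `(op, row)` change-event format and hypothetical operators; it is not the DMS conversion module.

```python
from typing import Callable, Dict, Iterable, Tuple

Row = Dict[str, object]
Transform = Callable[[Row], Row]


def compose(*ops: Transform) -> Transform:
    """Chain operators into one conversion that is defined only once."""
    def run(row: Row) -> Row:
        for op in ops:
            row = op(row)
        return row
    return run


def upper_name(row: Row) -> Row:
    return {**row, "name": str(row["name"]).upper()}


def mask_phone(row: Row) -> Row:
    phone = str(row["phone"])
    return {**row, "phone": "*" * (len(phone) - 4) + phone[-4:]}


convert = compose(upper_name, mask_phone)


def load_full(snapshot: Iterable[Row], target: Dict[object, Row], fn: Transform) -> None:
    """Batch path: initialize the target with the full data set."""
    for row in snapshot:
        target[row["id"]] = fn(row)


def apply_incremental(events: Iterable[Tuple[str, Row]], target: Dict[object, Row], fn: Transform) -> None:
    """Stream path: replay change events through the same conversion."""
    for op, row in events:
        if op in ("insert", "update"):
            target[row["id"]] = fn(row)
        elif op == "delete":
            target.pop(row["id"], None)


target: Dict[object, Row] = {}
load_full([{"id": 1, "name": "ann", "phone": "13800001111"}], target, convert)
apply_incremental(
    [("insert", {"id": 2, "name": "bob", "phone": "13900002222"}), ("delete", {"id": 1})],
    target,
    convert,
)
print(target)
```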
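A drag-and-drop flow is essentially a directed acyclic graph of nodes. The sketch below uses the standard-library `graphlib` module (Python 3.9+) to run hypothetical nodes in dependency order; the node names and the `run_flow` helper are illustrative assumptions, not DMS task definitions.

```python
from graphlib import TopologicalSorter

# Hypothetical flow drawn by a user: node -> set of upstream nodes.
flow = {
    "sync_orders": set(),                          # data transmission node
    "sync_users": set(),                           # data transmission node
    "join_query": {"sync_orders", "sync_users"},   # cross-database query node
    "mask_pii": {"join_query"},                    # conversion node
    "publish_report": {"mask_pii"},                # output node
}


def run_flow(flow, handlers):
    """Run each node after all of its upstream nodes; returns the order used."""
    order = list(TopologicalSorter(flow).static_order())
    for node in order:
        handlers[node]()
    return order


log = []
handlers = {name: (lambda name=name: log.append(name)) for name in flow}
print(run_flow(flow, handlers))
```

`static_order()` raises `graphlib.CycleError` if the flow contains a cycle, which makes it a natural validation step for user-drawn flows.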
3. Solutions and best practices
1) A financial company builds a secure data production solution based on DMS+RDS
The company has more than 600 database instances serving many front-line business developers. When those developers need to release changes or operate on databases, they face communication, data security, and efficiency problems. By managing its data sources through DMS and using its unified, secure change process, the company improved front-line development efficiency while ensuring data security and change stability.
2) An operator builds a geo-distributed multi-active architecture based on DMS+PolarDB-X
The figure above shows how the operator built a geo-distributed multi-active solution with DMS and PolarDB-X. In a traditional database deployment, the disaster recovery data center cannot carry business traffic, or can carry only limited traffic, so that infrastructure investment delivers little value; this constrained the capacity of the operator's physical data centers and could not support further business growth. By upgrading to a geo-distributed multi-active architecture with DMS+PolarDB-X, the operator achieved fast disaster recovery switchover while every site carries business traffic, meeting the demands of business expansion.
3) Global Multi-Active Database
Because many companies have a strong demand for geo-distributed multi-active architectures, we have released the RDS global multi-active database. From the RDS console you can purchase a global multi-active database with one click; RDS instances are created automatically in multiple data centers and the architecture is set up for you. The multi-active interface makes business switchover simpler, reducing the implementation cost and management complexity of geo-distributed multi-active deployments.
4) A bank builds a T+1 data warehouse based on DMS+ADB
The figure above shows a bank that built a T+1 data warehouse based on DMS+ADB. Periodic batch integration of the company's data placed a heavy load on the production database and affected business stability, and periodic reports could not support real-time decisions for business activities. To address these pain points, we built a T+1 data warehouse. The zipper table has little impact on the source production database: after the initial full load, only incremental data is synchronized in real time. Periodic reports are produced through scheduled merges, real-time reports are generated on ADB, and historical snapshots that can be traced back to any point in time can be built locally, helping the enterprise meet both periodic reporting and real-time analysis needs.
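A zipper table keeps one row per version of each record, with a validity interval, so any past state can be reconstructed. The Python sketch below illustrates the idea under assumptions: changes arrive as a daily dict of `{key: attrs}` with `None` marking a deletion, intervals are start-inclusive and end-exclusive, and `9999-12-31` marks the open version. In practice such merges are typically written in SQL inside the warehouse; `merge_changes` and `as_of` are hypothetical helpers.

```python
from datetime import date

OPEN_END = date(9999, 12, 31)  # marks the currently valid version


def merge_changes(zipper, changes, biz_date):
    """Merge one day's changes into a zipper table (a list of dict rows).

    `changes` maps key -> new attributes, or None for a deletion. The first
    call with a full snapshot acts as the initial load.
    """
    current = {r["key"]: r for r in zipper if r["end"] == OPEN_END}
    for key, attrs in changes.items():
        old = current.get(key)
        if old is not None and old["attrs"] == attrs:
            continue                      # unchanged: keep the open version
        if old is not None:
            old["end"] = biz_date         # close the previous version
        if attrs is not None:
            zipper.append({"key": key, "attrs": attrs, "start": biz_date, "end": OPEN_END})
    return zipper


def as_of(zipper, day):
    """Rebuild the snapshot valid on `day` (start inclusive, end exclusive)."""
    return {r["key"]: r["attrs"] for r in zipper if r["start"] <= day < r["end"]}


z = merge_changes([], {"c1": {"tier": "silver"}, "c2": {"tier": "gold"}}, date(2021, 10, 1))
merge_changes(z, {"c1": {"tier": "gold"}, "c2": None}, date(2021, 10, 2))
print(as_of(z, date(2021, 10, 1)))
print(as_of(z, date(2021, 10, 2)))
```

Only changed keys touch the table each day, which mirrors the "full load once, then increments" pattern described above.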
Copyright Statement: The content of this article is contributed voluntarily by Alibaba Cloud real-name registered users, and the copyright belongs to the original author. The Alibaba Cloud Developer Community does not own the copyright and does not assume the corresponding legal responsibilities. For specific rules, please refer to the "Alibaba Cloud Developer Community User Service Agreement" and the "Alibaba Cloud Developer Community Intellectual Property Protection Guidelines". If you find suspected plagiarism in this community, fill in the infringement complaint form to report it. Once verified, the community will immediately delete the suspected infringing content.