On October 19, at the 2021 Yunqi Conference, Alibaba Cloud released the DataWorks full-link data governance product system. Built for multiple big data architectures, including data warehouses, data lakes, and lakehouses, DataWorks helps enterprises tame the ever-rising "data suspended river" inside the organization and unlock their data productivity.
Jia Yangqing, Vice President of Alibaba Group and Senior Researcher at the Alibaba Cloud Intelligent Computing Platform Division, speaking on-site
"As the amount of data grows larger and larger, the value of each unit of data becomes smaller and smaller. Full-link data governance lets data move from low quality and low efficiency to high quality and high efficiency."
Jia Yangqing, Vice President of Alibaba Group and Senior Researcher at the Alibaba Cloud Intelligent Computing Platform Division, made this point at the event.
Sediment deposited by the Yellow River keeps raising its riverbed, creating a "suspended river" whose bed sits above the surrounding ground. In Kaifeng, Henan, the riverbed at its highest point is 10 meters above ground level and rises by about 10 cm every year, so the dikes on both banks have to be raised continually. In an enterprise's digital transformation, data volumes keep growing, machine counts keep climbing, and teams keep expanding, but is the transformation actually getting better? For an enterprise, apparent prosperity does not rule out a future "flood".
At Alibaba, Double 11-scale load has become a daily routine: in 2021, the daily data processing volume of the big data computing service MaxCompute surpassed the Double 11 peak of 2020. This growth in data volume has brought heavy pressure on cost and efficiency.
Machine efficiency + human efficiency = data efficiency
Faced with data that expands like this every year, Alibaba's answer is to make data efficiency a core company metric, backed by an integrated big data + AI platform.
On machine efficiency, MaxCompute, the offline data warehouse, now processes 1.7 EB of data in a single day. More telling than the raw volume is that MaxCompute supported 75% growth in data volume with only 10% growth in machines. This reflects its continued pursuit of extreme optimization in underlying storage and performance; it has broken the TPCx-BigBench 100 TB performance world record for five consecutive years. Hologres, the real-time data warehouse, has reached a peak write rate of 596 million records per second and stores up to 2.5 PB in a single table. Over trillions of records, it provides multi-dimensional analysis and serving, with 99.99% of queries returning results within 80 ms. Together, Hologres and MaxCompute form a data warehouse that integrates offline, real-time, analytics, and serving workloads, which greatly simplifies the big data architecture from the bottom up.
Machine efficiency is usually easy to measure, but human efficiency is hard to quantify. Since 2009, DataWorks has been Alibaba Group's unified big data development and governance platform and has supported the build-out of Alibaba's data middle platform. Users vote with their feet on a platform's completeness and ease of use. Today, the large-scale collaborative data platform built on DataWorks has more than 50,000 daily active users. On average, one in three Alibaba employees uses DataWorks. It serves almost every department within Alibaba and has accumulated hundreds of core capabilities for full-link data governance. In FY2020, the combined gains Alibaba realized through data governance exceeded 1 billion yuan.
It can be said that DataWorks, the big data development and governance platform, and the computing engines MaxCompute and Hologres form a "Wintel alliance" for the big data architecture, jointly improving enterprise data efficiency.
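The machine-efficiency figures above imply a per-machine gain that can be worked out directly. The following is a minimal sketch, assuming the 10% machine growth and the 75% data growth refer to the same period and the same fleet; the function name is illustrative only.

```python
# Figures quoted in the article for MaxCompute.
MACHINE_GROWTH = 0.10  # 10% more machines
DATA_GROWTH = 0.75     # 75% more data volume


def per_machine_gain(machine_growth: float, data_growth: float) -> float:
    """Relative increase in data volume handled per machine.

    Assumes both growth rates cover the same period and the same fleet.
    """
    return (1 + data_growth) / (1 + machine_growth) - 1


gain = per_machine_gain(MACHINE_GROWTH, DATA_GROWTH)
print(f"Data handled per machine grew by about {gain:.0%}")  # about 59%
```

Under those assumptions, each machine handles roughly 59% more data than before, which is the efficiency gain the paragraph attributes to storage and performance optimization.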
Construction experience: from small workshop to large platform to agile manufacturing
Neither data governance nor the data middle platform was ever a product dreamed up in an ivory tower; both were honed over many years of practice. Alibaba's digital transformation also went through a slash-and-burn era. Each business team maintained several Hadoop clusters of its own, like a small workshop: whatever was needed got added, and technical components piled up like building blocks. The process was often painful. The platform would release a new feature, and some other component would break for no obvious reason. Engineers would spend a long time troubleshooting and fixing that component, only for another one to go down shortly after the fix was released. Problems kept popping up like a game of whack-a-mole, with no end in sight.
Alibaba therefore launched an ambitious platform unification plan: build one large platform, move from an open source architecture to a self-developed one, and gradually migrate onto MaxCompute. Around the same time, the data middle platform concept began to be promoted within the group, and the data middle platform methodology was gradually implemented on DataWorks, completing the construction of Alibaba's entire data middle platform. Today, business teams from the core e-commerce businesses Tmall and Taobao to Ele.me, Youku, and Hema all carry out one-stop collaborative data development on the same large platform.
However, as a large platform becomes widespread and its user base grows, data governance becomes more complicated. With thousands of tables being generated continuously, a company cannot tell how many irregular statements are eating away at computing resources like termites; how many tables are being copied over and over, creating the illusion of a "data boom"; how much dirty data is constantly being produced and polluting data quality; or how many tables are continually the subject of permission requests, exposing the company to data security risk.
All of these issues pose severe challenges for a large platform. As a result, the large platform is gradually evolving toward agile manufacturing: full-link data governance capabilities provide management and control from a global perspective while decentralizing data decision-making.
DataWorks Full Link Data Governance New Product Release
At the Full-Link Data Governance Summit of the 2021 Yunqi Conference, DataWorks drew on the hundreds of data development and governance capabilities it has accumulated over twelve years to launch a new series of full-link data governance products.
Data Governance Center
Data governance is not only a technical problem for an enterprise's big data team; it is also a problem of organization and management. For the organization as a whole, how should the end result of data governance be measured, and how can the organization's initiative be put to better use? Some companies set up a dedicated data committee to draw up data governance standards, only to find that their platform does not support those standards well. Others buy a data platform but do not know how to use it to carry out data governance.
At Alibaba, we often talk about the health score. Organizationally, the data committee oversees platform teams, business teams, and collaborating teams such as risk control and finance. A business team sets a goal for the year, for example raising its health score from 80 to 90, and starts with areas such as computing and storage. Governance and optimization work is done on the business and production side, and where needed, requirements are also passed to the data platform team so that the engines and data platform products can be optimized and evolved. Everyone works toward the same goal. Because the organization now has a measurable yardstick, each department can write these numbers into its objectives. Long-running activities such as data governance campaigns and team competitions can also be sustained around the health score, which coordinates the organization around data and draws out the initiative of the data governance organization.
The newly released DataWorks Data Governance Center produces an enterprise data governance health score across five dimensions: computing, storage, R&D, quality, and security. Taking a problem-driven approach, it covers the full link with proactive governance before, during, and after the fact, together with data governance health assessment. Enterprise data governance is no longer a "phased project" but a "continuously operated program".
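To make the idea of a five-dimension health score concrete, here is a minimal sketch. The article does not describe how DataWorks actually computes the score, so the 0 to 100 scale per dimension, the equal default weights, and all function names below are assumptions for illustration only.

```python
from typing import Mapping, Optional

# The five dimensions named in the article. Everything else in this sketch
# (0-100 scale, equal default weights) is an illustrative assumption,
# not the DataWorks formula.
DIMENSIONS = ("computing", "storage", "rd", "quality", "security")


def health_score(
    scores: Mapping[str, float],
    weights: Optional[Mapping[str, float]] = None,
) -> float:
    """Weighted average of per-dimension scores, each expected in [0, 100]."""
    if weights is None:
        weights = {d: 1.0 for d in DIMENSIONS}
    for d in DIMENSIONS:
        if d not in scores:
            raise ValueError(f"missing dimension: {d}")
        if not 0 <= scores[d] <= 100:
            raise ValueError(f"score for {d} must be between 0 and 100")
    total_weight = sum(weights[d] for d in DIMENSIONS)
    return sum(scores[d] * weights[d] for d in DIMENSIONS) / total_weight


def weakest_dimension(scores: Mapping[str, float]) -> str:
    """Dimension with the lowest score, i.e. where to start governance work."""
    return min(DIMENSIONS, key=lambda d: scores[d])


# Hypothetical team aiming to move from 80 to 90, as in the example above.
team = {"computing": 72, "storage": 85, "rd": 80, "quality": 78, "security": 85}
print(health_score(team))       # 80.0
print(weakest_dimension(team))  # computing
```

A single number like this is what lets a team write "raise the health score from 80 to 90" into its goals, while the per-dimension breakdown shows where to start.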
Intelligent data modeling
Once a company has built a platform and done a great deal of standardized governance, what is the value to business staff? How much cost was saved and how many problems were solved matter relatively little to them; the business side simply wants to get the data it needs faster. The earlier way of building a data warehouse was therefore mostly bottom-up: move in small, quick steps and satisfy the immediate need first. Today's full-link data governance lets data warehouse construction evolve in a standardized, sustainable direction, combining top-down standardized modeling from the business perspective with bottom-up warehouse building from the development perspective.
The newly released DataWorks intelligent data modeling distills Alibaba's data middle platform construction methodology. Across four areas (data warehouse planning, data standards, dimensional modeling, and data metrics), it describes an enterprise's data from a business perspective. Intelligent data modeling supports rapid modeling, both forward and reverse, and can create models in minutes. It also connects to data development: data models can be published directly to multiple engines, quality rules can be generated with one click, and tables can be published directly with ETL code generated automatically. Business staff can easily see the whole picture of enterprise data and quickly reach metric data and the data models that support analysis and exploration. With everyone in the enterprise working from the same definitions, data can be understood and circulated quickly, so that data decision-making can be genuinely and effectively decentralized.
Hema Xiansheng uses DataWorks intelligent data modeling to implement the new retail industry data model Rex-LDM
Also released at the event were DataWorks real-time synchronization for data integration, intelligent data query, privacy and security computing, the DataWorks open platform, data migration tools, and cloud migration expert services.
The "Global Digital Economy White Paper" issued by the China Academy of Information and Communications Technology in September 2021 reported that China's digital economy reached US$5.4 trillion last year, accounting for nearly one-third of GDP. In the digital economy era, data has become a key factor of production, just as land and labor were in the agricultural and industrial eras. With six full-link data governance capabilities (intelligent data modeling, global data integration, efficient data production, proactive data governance, comprehensive data security, and fast data services), DataWorks supports digital transformation across a wide range of industries. DataWorks currently serves thousands of customers in industries such as digital government, new finance, new retail, energy, industry, transportation, gaming, education, and digital marketing.
The State Grid Big Data Center uses DataWorks for unified management of PB-level data from its headquarters and 27 provincial (municipal) companies, and is accelerating the overall digital transformation and upgrade of the power grid through a governance, monitoring, and operations system for its full-link data middle platform.
Creative Dreamland builds on the open source EMR engine and has replaced its self-developed scheduling system with DataWorks, so that its technical staff can focus more on the business and support data operations in the gaming industry.
China uses DataWorks intelligent data modeling to implement full-link data model governance, which greatly improves the self-service capabilities of data centers, allows enterprise data decision-making to be decentralized, and releases the digital power of new retail.
As enterprises enter the deep-water zone of digital transformation, the "data suspended river" will gradually become a Sword of Damocles hanging over them. Alibaba Cloud is working with customers and partners across industries to manage data well and use data well, so that data converges into advanced productivity.
DataWorks official website: https://www.aliyun.com/product/bigdata/ide