
Data Fabric has appeared on Gartner's annual technology trend lists since 2019 and was named one of the top ten data and analytics technology trends for 2022. What is its value, and how can an enterprise put it into practice?

At the recent Guangzhou stop of QCon, the global software development conference, Guo Yi, technical director of big data products at NetEase Shufan, gave a talk titled "Practice of Logical Data Lake Architecture Based on Data Fabric" and introduced the company's latest Data Fabric practice.


Data Fabric: Benefits and Myths

Gartner defines Data Fabric as a design concept: an integration layer (fabric) of data and connecting processes that supports the design, deployment and use of data systems across platforms, so that data can be delivered flexibly. NetEase Shufan has put Data Fabric into practice under the name "logical data lake". In NetEase Shufan's view, this integration layer is a cross-platform logical model. The logical model shields business users from the complex underlying data architecture: working above the logical model layer, they only need to select a dataset and can use it out of the box. As a result, no matter where the data is stored, this architectural pattern helps enterprises obtain the right data at low cost and in a timely manner, and achieve end-to-end data governance. Guo Yi summed up Data Fabric in two key phrases, logical unity and physical dispersal, which are also the guiding principle of NetEase Shufan's logical data lake practice.
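The idea of "logical unity, physical dispersal" can be illustrated with a minimal Python sketch. It is not NetEase Shufan's implementation: the `LogicalCatalog` class, the dataset names, and the sample rows are all hypothetical, and two in-memory SQLite databases stand in for physically separate systems.

```python
import sqlite3

# Two in-memory SQLite databases stand in for physically separate systems
# (for example a data warehouse and an operational database). Sample data
# and table names are hypothetical.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE dw_orders (order_id INTEGER, amount REAL)")
warehouse.executemany("INSERT INTO dw_orders VALUES (?, ?)", [(1, 10.0), (2, 25.5)])

crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE t_customer (id INTEGER, name TEXT)")
crm.executemany("INSERT INTO t_customer VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])


class LogicalCatalog:
    """Hypothetical logical model: maps a dataset name to its physical location."""

    def __init__(self):
        self._datasets = {}

    def register(self, logical_name, connection, physical_table):
        # Registration is done by the data team, so table names are trusted here.
        self._datasets[logical_name] = (connection, physical_table)

    def read(self, logical_name):
        # The caller only names the dataset; the catalog resolves where it lives.
        connection, table = self._datasets[logical_name]
        return connection.execute(f"SELECT * FROM {table} ORDER BY 1").fetchall()


catalog = LogicalCatalog()
catalog.register("orders", warehouse, "dw_orders")
catalog.register("customers", crm, "t_customer")

# A business user selects datasets by logical name only.
orders = catalog.read("orders")
customers = catalog.read("customers")
print(orders)
print(customers)
```

In this sketch the business-facing code never mentions a connection or a physical table name, which is the sense in which the logical model layer hides the underlying architecture.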

The benefits of Data Fabric are clear. According to Guo Yi, first, it can save 70% of the workload across data discovery, data analysis and data development. Second, it lets business users analyze data more quickly, because data no longer has to enter the lake before it can be analyzed. Third, it gives business users and data teams a unified interface, the logical model layer, which makes collaboration between the two more efficient. In addition, it lets business users consume data on their own, which greatly expands the scope of data use.

To deliver these benefits, Data Fabric requires a complete set of core capabilities, spanning everything from data source to data consumption.


If Data Fabric is so appealing, does that mean the data lakes and data warehouses that enterprises have invested so much effort and resources in building are now useless? Not at all.

Drawing on NetEase Shufan's practical experience, Guo Yi offered four reminders. First, Data Fabric does not mean getting rid of the lake or the warehouse; it builds a decentralized data access layer in which the lake or warehouse can exist as a data source. Second, with large data volumes, Data Fabric can run into performance problems; data can be persisted (materialized) into the lake or warehouse as needed, so Data Fabric does not necessarily have to access the data source directly. Third, Data Fabric simply provides a richer data access interface, which can either access the data source directly or provide more efficient access through materialization. Finally, Data Fabric is not about removing ETL. On the contrary, DataOps and data governance are the foundation of Data Fabric.
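The second and third reminders describe an access layer that can serve a dataset either directly from its source or from a copy persisted into the lake or warehouse. The following Python sketch shows that routing decision under simplifying assumptions. The `DataAccessLayer` class and its method names are hypothetical, and plain Python lists stand in for both the source system and the materialized copy.

```python
class DataAccessLayer:
    """Hypothetical access layer: serve a materialized copy if one exists,
    otherwise read the source directly."""

    def __init__(self):
        self._sources = {}       # dataset name -> callable that reads the source
        self._materialized = {}  # dataset name -> rows persisted in lake/warehouse

    def register_source(self, name, reader):
        self._sources[name] = reader

    def materialize(self, name):
        # Stand-in for an ETL job that copies the data into the lake or warehouse.
        self._materialized[name] = list(self._sources[name]())

    def read(self, name):
        # Prefer the persisted copy when it exists; fall back to direct access.
        if name in self._materialized:
            return "materialized", self._materialized[name]
        return "direct", list(self._sources[name]())


layer = DataAccessLayer()
layer.register_source("clicks", lambda: [("u1", 3), ("u2", 7)])

first = layer.read("clicks")    # no copy yet, so the source is read directly
layer.materialize("clicks")     # persist the dataset on demand
second = layer.read("clicks")   # now served from the materialized copy
print(first[0], second[0])
```

Note that `materialize` is itself an ETL-style job, which is consistent with the point that Data Fabric does not remove ETL. A real system would also need to refresh or invalidate the persisted copy, which this sketch leaves out.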

NetEase Shufan Logical Data Lake: Metadata management is the key

The logical data lake is NetEase Shufan's technical solution for implementing Data Fabric. What drove NetEase Shufan to develop it were the problems encountered while supporting NetEase's own business: a complex data architecture, inefficient data analysis, data departments becoming bottlenecks, and poor resource utilization. Guo Yi presented NetEase Shufan's logical data lake architecture, whose key modules include data source management, the data catalog, metadata management, DataOps full-lifecycle development, the data model layer, and materialized views, covering the management, computation, and use of data.


Among these, metadata management is the key to connecting different data sources and realizing Data Fabric. The NetEase Shufan logical data lake supports metadata management through seven major components: the process engine, indicator system, security center, data map, data standards, model design center, and data quality center. It strictly defines the core processes for publishing metadata both outside and inside the lake, and relies on these components to ensure that the processes are actually carried out.
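One way to picture a strictly defined publishing process is as a gate: metadata reaches the data map only after a set of checks passes. The Python sketch below is purely illustrative. The check labels echo component names from the text, but the rules themselves, the `TableMetadata` type, and the `publish` function are invented for this example and do not describe the actual product.

```python
from dataclasses import dataclass


@dataclass
class TableMetadata:
    name: str
    owner: str
    columns: dict  # column name -> type name


# Hypothetical rules; real data standards and security policies would be richer.
def follows_naming_standard(meta):
    return meta.name == meta.name.lower() and " " not in meta.name


def has_owner(meta):
    return bool(meta.owner)


def has_columns(meta):
    return len(meta.columns) > 0


PUBLISH_CHECKS = [
    ("data standard", follows_naming_standard),
    ("security", has_owner),
    ("model design", has_columns),
]


def publish(meta, data_map):
    """Add metadata to the data map only if every check passes.

    Returns the labels of the failed checks (an empty list means published).
    """
    failures = [label for label, check in PUBLISH_CHECKS if not check(meta)]
    if not failures:
        data_map[meta.name] = meta
    return failures


data_map = {}
accepted = publish(TableMetadata("dw_orders", "data-team", {"order_id": "int"}), data_map)
rejected = publish(TableMetadata("Tmp Table", "", {}), data_map)
print(accepted, rejected, list(data_map))
```

Returning the list of failed checks, rather than a plain boolean, lets the publisher see which rule blocked a table.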


Customer practice has proved the value of NetEase Shufan's logical data lake architecture. One large enterprise customer, for example, introduced the logical data lake to build a one-stop development and operations model. With data development consolidated on the data middle platform, it drove five unifications of data operations: unified logical ingestion into the lake, unified development, unified scheduling, unified governance, and unified services. This improved data delivery efficiency and sharing capabilities and brought multiple benefits. In terms of platform capabilities, the customer successfully introduced mature data middle platform products and the supporting management specifications. In terms of how data work is done, the logical data lake turns business users from requesters into producers, and data developers no longer struggle to find data.

In terms of operational goals achieved, the first is development efficiency: report development efficiency increased by 50%, and development efficiency for visual data application pages doubled. The second is self-service data analysis: self-service now accounts for 30% of all data retrieval and analysis work in each department, and 200 business users have been trained in self-service analysis. As for long-term operational goals, the customer will strengthen the capabilities of the data middle platform, cut over the data marts and data platforms deployed on the local network, change the current 1+N model in the province, and further improve operational efficiency and data security.

Summary

The core goal of an enterprise's digital transformation is to reduce costs and increase efficiency, and making use of the value of data is crucial to it. Data Fabric provides a low-cost way for enterprises to reach this goal smoothly and quickly, and the logical data lake is a proven, effective way to implement it. The logical data lake also allows the NetEase Shufan data technology stack to integrate flexibly with the data architectures of different companies in different industries, helping customers turn stored data into productivity and meet the needs of data-driven business innovation.

