In everyday life, standards are all around us. Food must meet national standards before it can be sold, vehicles can only be driven on the road if they meet emission standards, computer interfaces must follow unified standards before they can connect to peripherals, and so on. In the world of data, data standards matter just as much. We want to put data standards into real practice to help customers solve problems such as insufficient data capitalization, stubbornly low data quality, and inefficient data development, which is why NetEase started building data standards.
Based on our understanding of data standards, this article explains how standards are established, introduces the standard management product we designed around the content and process of standard establishment, and describes how standards are applied in data governance. We hope it sparks some new ideas for you.
1. What are data standards?
In actual production, we generally refer to national standards, local standards, industry standards, and so on when carrying out specific activities, both to ensure that the production process meets regulatory requirements and to facilitate upstream and downstream collaboration. This is why we see standard guidance documents like the following:
Similarly, data standards also exist in the form of documents. Beyond what national and industry standards define, a company usually uses documents to define its own data standards so that all departments can adopt the same data construction specifications and reach a unified consensus.
Although documents are one form of standards, they are unstructured. In practice, standards only impose real normative constraints once we understand and extract the content of those documents and apply it to product design and process activities.
According to the definition in the White Paper on Data Standards Management Practices issued by the China Academy of Information and Communications Technology (CAICT): data standards are the normative constraints that ensure the consistency and accuracy of data as it is used and exchanged inside and outside an organization.
This is undoubtedly correct, but we still need to put the standards into practice. Take the construction of a data center (data middle platform) as an example. The data center emphasizes resource integration, which at the data level means integrating data scattered across isolated, multi-source heterogeneous systems into a unified data service capability. That is a daunting task, and relying on mutual agreement and default trust among the parties involved makes it hard to guarantee that the value of the data is explored and that it becomes a real data asset.
Therefore, we expand the notion of data standards on two fronts. The first is scope: from narrow data standards (normative constraints on the basic data itself, such as format, type, and value range) to standards at the level of the whole data platform (normative constraints on every stage of governance). The second is method: data standards no longer mean a pile of standardization documents, but a system formed from standard requirements, process systems, and technical tools. Through this system we complete the planning, formulation, release, implementation, inspection, and maintenance of standards, achieving the standardization of data and the accumulation of standards.
2. The value of data standards
Before we talk about value, let's talk about the problems that give us headaches. Everyone talks about data standards, but have they really been applied? We hold a stack of standard documents and expect everyone to follow them internally, but what does the implementation actually look like?
When integrating multi-source heterogeneous data, can data warehouse developers really and quickly understand the actual business meaning of that data? If the cost of understanding is high, developers are prone to cognitive bias.
Once the data has been integrated, data warehouse construction can begin. How do we ensure that the data at each layer meets quality requirements? Does it depend on the personal diligence of each developer? For example, we generally standardize data at the DWD layer, and different subject areas are developed by different owners. How can we ensure that the standardized results actually meet the specifications? Can the reliability of the DWS data be guaranteed? Can it still be called a common model layer?
Finally, after data warehouse development is complete, the data needs to be opened up to consumers. We need to expose not only the data itself but also its metadata, to help data users quickly find what they need. If you just pile the data together and only the R&D staff know what it is, where it is, and how to use it, it cannot be called a data asset.
There are many more problems; these are just typical ones. They can all be solved, and the solution is data standards. Solving them may take a long time, because moving standards from management documents to implementation is not easy and requires a change in mindset, but we should always aim to do things right.
Some of the values are listed below, but many more possibilities can be found in the actual application process.
Value 1: Establish a unified data view
Establish a general meta-model specification, support user-defined extensions, and abstract information from multi-source heterogeneous data tables to form a unified metadata layer. Once data development is complete, the results are published to a unified data catalog maintained under the data standards, and catalogs of different dimensions support multi-dimensional filtering to meet the retrieval needs of different users, making assets manageable, usable, and findable.
Value 2: Establish a unified data cognition
First, use standards to complete the standardized description of multi-source heterogeneous data. However a piece of data is named in its source system, it is given a unified name once it enters our platform, so that managers, developers, and users share a unified understanding. For tables outside the warehouse, data standards are associated with table fields to unify their meaning and indicate how the data should be processed later; for tables inside the warehouse, standards should be referenced from the very beginning of model design. A model can be obtained by combining data items, and the data elements form the pool of standard data items: when designing a model, you simply select the required fields from the pool and combine them to assemble the desired model.
Value 3: Establish a quality audit system
Today, quality audits are generally configured manually by users according to business requirements, and the cognitive deviations of different people make data quality hard to control. With data standards, quality audit rules are generated automatically from the representation attributes of data elements, such as their format and type requirements. When a data element is bound to a table field, audit tasks can be generated automatically from the data element's quality requirements, ensuring consistency with the source definition.
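To make this concrete, here is a minimal sketch in Python of deriving audit rules from a data element's representation attributes. All names and structures are illustrative assumptions, not the platform's actual API:

from dataclasses import dataclass
from typing import Optional

@dataclass
class Representation:
    data_type: str                  # e.g. "string"
    max_length: Optional[int]       # e.g. 18 for a citizen ID number
    pattern: Optional[str]          # regex for the format, if any
    allowed_values: Optional[list]  # a bound standard dictionary, if any

def audit_rules(field: str, rep: Representation) -> list:
    # Translate representation attributes into audit-rule expressions.
    rules = [f"{field} IS NOT NULL"]
    if rep.max_length is not None:
        rules.append(f"LENGTH({field}) <= {rep.max_length}")
    if rep.pattern is not None:
        rules.append(f"{field} REGEXP '{rep.pattern}'")
    if rep.allowed_values is not None:
        vals = ", ".join(f"'{v}'" for v in rep.allowed_values)
        rules.append(f"{field} IN ({vals})")
    return rules

# A gender-code field bound to the GB/T 2261.1 code values:
print(audit_rules("gender_code", Representation("string", 1, None, ["0", "1", "2", "9"])))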
Value 4: Future-oriented data governance
We know that the ultimate purpose of tools is to reduce cost and improve efficiency. Efficiency gains depend on process standardization, and a sufficiently standardized process can, to some extent, flow automatically. Future data governance should therefore focus on process automation and stage intelligence, both of which require the support of data standards.
Stage intelligence aims to provide intelligent identification at each stage of the process, such as recognizing the real meaning of fields (mounting data standards), classifying resources, and detecting field enumeration values, thereby reducing manual involvement. In the short term, users change from processors into reviewers; in the long run, user interventions feed back into the recognition model, raising recognition accuracy and reducing labor costs.
Process automation builds on the results of stage intelligence and manual intervention, connecting the stages in series and linking upstream and downstream as seamlessly as possible. When an upstream stage meets the downstream entry conditions, the process can be triggered automatically. Of course, this also requires upstream and downstream to speak a unified language (that is, data standards), which can be verified through trial runs in practice.
There are many more values of standards; for reasons of space we will not enumerate them all, and you can keep discovering application scenarios yourself. Having covered the value of standards, how do we establish them?
3. How to establish data standards?
In the early stages of business development, each business line built its own business system to solve the problems at hand, and to keep internal communication consistent, each accumulated more or less its own local data standards. Building a unified data standard is therefore largely a matter of consolidating these local standards. Generally, current national or industry standards can be collected first and the existing local standards compared against them: this both satisfies regulatory needs and greatly reduces the manpower required for standard formulation.
For details, please refer to the six steps in the establishment of data standards, which are: data standard planning, data standard formulation, data standard release, data standard implementation, data standard inspection, and data standard maintenance.
3.1 Data Standard Planning
Standard planning starts with research and analysis of the enterprise's business and data, clarifying the scope of the data standards in light of actual requirements, and then proceeds step by step according to the situation.
3.1.1 Collection of current standards
Starting from the business process, we can delineate the business entities involved. For general business entities such as people, the corresponding current national standards can be collected: for example, citizen ID numbers should follow the mandatory standard GB 11643, gender codes should follow the recommended standard GB/T 2261.1, administrative divisions should follow GB/T 2260, and so on. For business entities with industry attributes, such as commercial bank collateral, refer to JR/T 0170.1 and JR/T 0170.2.
3.1.2 From local standard to global standard
Collect the local standards each business line (department) has established for cases where current standards cannot be cited or do not exist, review items that share the same business meaning but differ in their standards, and reach agreement within the enterprise to arrive at the final unified data standard.
This process can include the unification of basic data standards, the unification of reference standards, and the unification of indicator data standards.
3.1.3 Discover more data standards
Discovering more standards mainly applies in two situations: first, when local standards are unclear and no current standard applies; second, when the enterprise has many vertical systems across business lines and large volumes of data, lacks sufficient manpower and technical means, and yet wants to set standards from an overall strategic viewpoint. In these cases the data standards management platform (described in detail in Section 4) can be used to identify and extract standards.
There are generally two ways to identify and extract standards:
The first applies when there is a clear need to formulate a particular standard. By defining the concept of the data element (detailed in Section 3.2.2.2), the object class and characteristic described by the data standard are determined; the stock data is then scanned using keyword matching and intelligent identification techniques to find the set of data items consistent with that data element concept. Exploring this set yields the field type distribution, length range, value range distribution, and so on, from which the representation and description of the data element are constructed to form a complete data standard.
The second applies when there is no clear need for a particular standard and we want to explore whether standards should be formulated for certain data items. The system scans the stock data, traverses all field names in the selected data source types, extracts the field names whose occurrence count reaches a duplication threshold, and formulates data standards for them.
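A minimal sketch of this second path, assuming the stock tables are available as plain field-name lists (all table and field names here are invented):

from collections import Counter

tables = {
    "ods_user":  ["user_id", "gender", "age"],
    "ods_order": ["order_id", "user_id", "amount"],
    "ods_pay":   ["pay_id", "user_id", "amount"],
}

THRESHOLD = 2
# Count how often each field name occurs across the stock tables.
counts = Counter(f for fields in tables.values() for f in fields)
candidates = [f for f, n in counts.items() if n >= THRESHOLD]
print(candidates)  # ['user_id', 'amount'] -> candidates for new data elements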
3.2 Data standard formulation
3.2.1 Metadata Standards
The metadata standard mainly regulates the representation and organization of various metadata and assets on the platform.
3.2.1.1 Metamodel formulation
The data center is the foundation and hub of an enterprise's digital transformation: it integrates and capitalizes the massive, multi-source, heterogeneous data of the whole enterprise. But such data is inherently diverse, and giving data managers, users, and developers a unified understanding of it is an urgent problem. The main purpose of good meta-model design is to shield the complexity of the underlying multi-source heterogeneous systems and describe, in one unified language, all kinds of data coming from different application systems and stored in different types of databases.
We know that metadata is data that describes data, and a meta-model is the data description of a model. The four-layer meta-model structure proposed by the OMG (Object Management Group) expresses the relationship between the layers clearly:
As this shows, metadata is a relative concept, and the meta-model is the metadata of metadata. To make this easier to understand, here is an example:
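For instance, the following sketch (with invented names) shows a table meta-model at the M2 layer and one metadata record at the M1 layer conforming to it; the rows stored in the described table would be the M0 layer:

# M2 meta-model: declares which attributes describe a "table".
table_metamodel = ["name", "owner", "layer", "subject_area", "description"]

# M1 metadata: one concrete table description conforming to the meta-model.
order_table_metadata = {
    "name": "dwd_trade_order_di",
    "owner": "data-team",
    "layer": "dwd",
    "subject_area": "trade",
    "description": "Cleaned daily order detail records",
}

# Conformance check: metadata may only use attributes the meta-model declares.
assert set(order_table_metadata) <= set(table_metamodel)

# M0 data: the actual rows stored in the table described above (not shown).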
Meta-models are not limited to table and field meta-models; they also include indicator meta-models, label meta-models, and so on. Although the types of metadata they describe differ, the management approach is the same: in practice they can all be managed under data standards, or maintained in their corresponding subsystems.
3.2.1.2 Naming and coding rules formulation
Naming rules standardize table names, field names, task names, indicator names, label names, and so on, specifying which naming elements a name should use and in what order. Coding rules cover user asset codes, data element internal identifiers, label codes, indicator codes, and so on, specifying the encoding method each kind of code should use.
Therefore, the scope of naming and coding elements must be specified. The first source is existing platform enumerations, such as data layers, subject domains, or other existing classification enumerations; the second is user-defined constants and enumeration values; the third is variable sequence positions provided by the platform. Naming and coding rules are then formed by ordering and combining these elements.
Take the data element as an example:
The first encoding method can be "specified identifier (constant) + 7-digit self-incrementing sequence", which yields codes such as DE0000001;
The second can encode uniformly by classification, along the lines of "first-level classification code + second-level classification code + 3-digit self-incrementing sequence". For example, a data element under first-level classification 01 whose second-level classification is "information identification class (001)" would be encoded as 01001001, and so on.
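Both encoding rules reduce to simple string composition, as in this sketch:

def encode_simple(seq: int) -> str:
    # "DE" constant plus a 7-digit self-incrementing sequence -> DE0000001
    return f"DE{seq:07d}"

def encode_classified(level1: str, level2: str, seq: int) -> str:
    # first-level code + second-level code + 3-digit sequence -> 01001001
    return f"{level1}{level2}{seq:03d}"

print(encode_simple(1))                   # DE0000001
print(encode_classified("01", "001", 1))  # 01001001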
3.2.1.3 Data catalog specification formulation
The data catalog provides flexible ways to organize data. Data warehouse developers, for example, organize data by layer and subject domain, while data managers may care more about asset inventory and want classification schemes based on source system, managing department, and security classification.
When formulating the data catalog, we need to analyze users' demand scenarios and give them the data perspectives best suited to each scenario, making data easier to access and use. Generally, classifications by data source, data warehouse design, and data security are provided first. The description of each classification should include at least the classification name, English name, and internal code, so that it can be used by other modules of the platform, and the classification scheme should support user-defined extension later in the management process.
3.2.2 Basic data standards
3.2.2.1 Formulation of Word Roots
Word roots exist to make standard naming more consistent and unified; they are ultimately applied to field naming and the naming of other assets.
Enterprises can collect word roots from their own accumulated practice to form their own root database. When formulating data elements and dictionaries, English names can then be translated automatically from the entered Chinese names according to the roots.
A complete root entry has three parts: the English abbreviation, the English full name, and the Chinese full name. Multiple Chinese full names are supported, so that fields with the same meaning always translate to the same English abbreviation. For unified management, the encoding and the source of each root should also be specified.
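As a sketch of the automatic translation, assume a tiny root database and greedy longest-match segmentation (the platform's real segmentation may differ):

# Illustrative root database: Chinese full name -> English abbreviation.
ROOTS = {
    "用户": "user",
    "订单": "order",
    "金额": "amt",
    "标识": "id",
}

def translate(chinese_name: str) -> str:
    """Greedy longest-match segmentation against the root database."""
    parts, i = [], 0
    while i < len(chinese_name):
        for j in range(len(chinese_name), i, -1):  # try longest match first
            if chinese_name[i:j] in ROOTS:
                parts.append(ROOTS[chinese_name[i:j]])
                i = j
                break
        else:
            i += 1  # no root covers this character; skip it
    return "_".join(parts)

print(translate("订单金额"))  # order_amt
print(translate("用户标识"))  # user_id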
3.2.2.2 Data element formulation
The data element is the concrete embodiment of basic data standards and the core of data standard management. According to the data standard planning, data elements can be formulated in two ways: by structurally extracting existing standards and managing them on the platform, or by establishing the enterprise's own specialized data elements according to its own needs.
A complete data element consists of three parts: object class, characteristic, and representation. As shown in the figure below, only when the object class and its characteristic are bound to a representation does the data element concept become a real data element.
Object class: a collection of ideas, abstract concepts, or real-world things with clear boundaries and meaning, whose features and behaviors follow the same rules and can be identified; for example, cars, people, orders;
Characteristic: a property common to all individuals of the object class, such as color, gender, age, or price;
Representation: a combination of a value domain and a data type, plus, where necessary, a unit of measure or character set; it covers aspects such as format, range, and length.
The value domain can be given directly as names or code values, by reference, or by binding a data dictionary.
Therefore, the complete data element name should be: "object class word + characteristic word + representation word", such as a person's gender code.
Having understood what data elements are, how do we formulate them? We can refer to Parts 1 to 6 of the GB/T 18391 standard; interested readers can look it up. Below is a structured description of data elements based on our own understanding.
When formulating data elements, we usually describe their basic attributes from six aspects: identification, definition, relationship, representation, management, and additional attributes. The following table is a comprehensive, general-purpose data element description template; in application it should be trimmed or extended according to the enterprise's actual needs.
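The following sketch condenses such a template into a small Python structure, keeping only a few representative attributes per group; apart from the data element naming convention and the GB/T 2261.1 gender codes, everything here is illustrative:

from dataclasses import dataclass, field

@dataclass
class DataElement:
    # identification class
    internal_id: str        # e.g. "DE0000001"
    chinese_name: str
    english_name: str
    # definition class: object class + characteristic
    object_class: str
    characteristic: str
    definition: str
    # representation class
    data_type: str
    value_domain: dict = field(default_factory=dict)  # bound dictionary
    # management class
    status: str = "draft"   # draft / trial / standard / abolished

gender = DataElement(
    internal_id="DE0000001",
    chinese_name="人的性别代码",
    english_name="person_gender_code",
    object_class="person",
    characteristic="gender",
    definition="Code for the gender of a person, per GB/T 2261.1",
    data_type="string",
    value_domain={"0": "unknown", "1": "male", "2": "female", "9": "not stated"},
)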
3.2.2.3 The formulation of data dictionary
The data dictionary is the concrete manifestation of reference data standards. It is generally divided into original dictionaries and standard dictionaries: an original dictionary is the enumeration set of values of some original data item in a source or production system, while a standard dictionary is generally used as the value domain of data elements. During data processing, the mapping from original dictionary to standard dictionary must be completed, which completes the standardization of the dictionary.
The core of a data dictionary is its code value list, which must contain at least two pieces of information, the code and its description, with an optional remark field added as a supplement where necessary.
How to obtain code tables:
Original dictionaries: reverse collection from the database, field enumeration values filled in during metadata registration, value domain distributions calculated during data exploration, and manual entry;
Standard dictionaries: structured extraction from current standards, analysis of standard identification results, and manual entry.
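A minimal sketch of the original-to-standard dictionary mapping performed during data processing (the mapping values are illustrative):

# Source-system codes -> standard gender codes (GB/T 2261.1).
ORIGINAL_TO_STANDARD = {
    "M": "1",
    "F": "2",
    "male": "1",
    "female": "2",
}

def standardize(value: str, default: str = "0") -> str:
    """Map an original code to the standard dictionary; fall back to
    '0' (unknown) when no mapping exists."""
    return ORIGINAL_TO_STANDARD.get(value, default)

rows = ["M", "F", "female", "n/a"]
print([standardize(v) for v in rows])  # ['1', '2', '2', '0']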
3.2.2.4 Data item classification specification formulation
Data item classification is similar to the data catalog and likewise serves the classification needs of different objects in different scenarios, but it classifies at the field level.
When formulating a data item classification, it is necessary to analyze users' demand scenarios and provide different classification schemes for different scenarios. For example, from a management perspective, items can be divided by description object and source document; from a data security perspective, by sensitivity level and security level. The classification scheme should support user-defined extension later in the management process.
In actual use, specific classification values are associated with data elements, and the data elements in turn with fields, achieving rapid classification.
3.2.3 Formulation of technical standards
3.2.3.1 Data Type Mapping Relationship
This records the mapping of data types between different data sources, which makes it easy to create tables quickly in scenarios such as data transmission and distribution and improves the efficiency of configuring transmission tasks.
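For example, a fragment of a MySQL-to-Hive type mapping might look like the following sketch; a real mapping table would cover many more types and source pairs:

MYSQL_TO_HIVE = {
    "TINYINT": "TINYINT",
    "INT": "INT",
    "BIGINT": "BIGINT",
    "DECIMAL": "DECIMAL",
    "VARCHAR": "STRING",
    "TEXT": "STRING",
    "DATETIME": "TIMESTAMP",
    "DATE": "DATE",
}

def map_column(name: str, mysql_type: str) -> str:
    """Produce a Hive column definition from a MySQL column."""
    base = mysql_type.split("(")[0].upper()   # VARCHAR(64) -> VARCHAR
    return f"{name} {MYSQL_TO_HIVE.get(base, 'STRING')}"

print(map_column("order_amt", "DECIMAL(10,2)"))  # order_amt DECIMAL
print(map_column("user_name", "VARCHAR(64)"))    # user_name STRING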
3.2.3.2 Formulation of Heterogeneous Data Development Template
This manages the DDL statement templates of different data sources, covering creation, deletion, updates, and so on, helping data developers quickly generate statements from templates when working with the corresponding database nodes.
3.3 Data Standard Release
Data standards are generally recommended to follow the life cycle of draft, trial, standard, and abolished, though this can be simplified according to the actual situation. Data elements and data dictionaries should follow this life cycle as closely as possible; for word roots, data classifications, meta-models, and the like, the process can be simplified to draft, online, and offline.
Data standard release means that once a standard has been formulated and reaches the development-complete state, it can be submitted for release review; after approval it is applied across the whole system. If revisions are needed later, the latest version must be re-released after the revision is complete.
In addition, version changes and their scope of impact should be checked before release; the release takes effect only after the impact has been evaluated, and the affected parties are notified to make adjustments.
3.4 Data Standard Implementation
The implementation of data standards has two parts: application in the various stages of data governance, and application to new systems and existing (legacy) business systems.
In the data governance process, the applications are mainly the following (the connection between data standards and each module is introduced in detail in Section 5):
Metadata: describe metadata from three aspects (business, technical, and management attributes) and define the specific description items;
Data assets: inventory the various assets, define asset coding and naming specifications, define the classification basis, and bring standards online;
Data quality: establish audit rules and build a quality inspection system;
Data security: classify and grade data, define the basis for classifying data items, and identify sensitive information;
Model design: define data standards for data models, data indicators, dimensions, metrics, and other data;
Data transmission: connect different data sources and source systems, and formulate the exchange basis between different systems and data sources;
Data development: define the data processing basis, field and dictionary mapping logic, and SQL templates for the various data sources.
New business systems: must be designed in strict accordance with the published standards and controlled through the model design products provided by the platform.
Running (legacy) systems: mapping relationships to the standards can be established through data exploration and intelligent identification.
3.5 Data Standard Inspection
After the data standards are implemented, compliance checks against the standards are needed to confirm how well the standards have landed and what effect they have had.
You can refer to relevant indicators: from the standards side, compute standard citation statistics and standardization rates; from the quality side, use table and field quality scores to judge implementation and application effects from multiple perspectives.
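As a sketch, the standardization rate and citation statistics reduce to simple counts (all numbers below are invented for illustration):

fields_total = 1200   # fields in scope for governance
fields_bound = 930    # fields bound to a data element
citations_per_element = {"DE_USER_ID": 42, "DE_GENDER_CODE": 17}

standardization_rate = fields_bound / fields_total
print(f"standardization rate: {standardization_rate:.1%}")  # 77.5%
print("most-cited element:",
      max(citations_per_element, key=citations_per_element.get))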
3.6 Data Standard Maintenance
Maintaining data standards
During actual implementation, current standards may be revised and the enterprise's business rules may change, and the published standards must be revised accordingly. Strictly follow the life cycle requirements: record version changes, evaluate the impact of each change, and re-release for the revision to take effect.
Consolidating data standards
As standards accumulate, we should distill the standards of our own industry. Through this consolidation, we build standard assets, form industry best practices, and enhance the enterprise's standing in the industry.
4. Data standard product introduction
Now that we understand how to establish data standards, we can get started. But to do a good job, one must first sharpen one's tools: a suitable data standard management tool helps us formulate and manage data standards more conveniently and efficiently.
Therefore, based on an analysis of the data standard management process and content, and taking into account how much standard management requirements vary across industries, we designed the functions of the data standard management product. This chapter introduces each module in detail.
4.1 Overall Product Architecture
4.2 Product function modules
4.2.1 Data Standard Statistics Home
It mainly includes standard asset statistics, standardization statistics, standard process statistics, and a comprehensive evaluation of standard construction and use.
4.2.2 Data standard file management
This module manages the various standard documents referenced by the platform and links them to the structured standards, ensuring the credibility of standard sources. In addition, documents whose standards have already been extracted into structured form serve as standard templates preset by the platform for users to use.
4.2.2.1 Data element management
Data element management is the core of standard management. It supports entering data elements through forms and batch import, manages data elements across the standard life cycle (draft, trial, standard, abolished), and supports batch export to satisfy the need to consult data elements in different scenarios. Audit rules are also bound to data elements at definition time, providing a basis for quality inspection.
In addition, it supports comparison between different versions of a data element, surfacing the differences so the risk of a standard change can be evaluated.
4.2.2.2 Data dictionary management
Data dictionary management covers both original and standard dictionaries. The original dictionary can be seen as the value domain distribution of original data items, and the standard dictionary as that of standard data items. Original dictionaries can be entered manually or generated from the value domain distributions produced by data exploration; standard dictionaries follow the same life cycle management as data elements and also support batch import and export.
In subsequent implementation, dictionary tables already present in the platform's databases will be picked up, and the mapping between original and standard dictionaries maintained, making it easy for users to align dictionaries quickly during data processing.
4.2.2.3 Word Root Management
Word root management defines the mapping between English names, English abbreviations, and Chinese names, providing normative input for standard naming. When a user defines a data element, data dictionary, or model field, the entered Chinese name is segmented into words and the English name is generated from the roots.
In addition to the already supported form entry of word roots, batch import will be supported in the future, helping users quickly load an established root list.
4.2.2.4 Data item classification management
Data item classification management provides a three-level hierarchy. The first level is the classification catalog, under which users group classification schemes; the second is the classification scheme, which provides a classification method based on some classification basis for data items (such as the description object); the third is the classification value, which belongs to a classification scheme and is the level at which real data elements are mounted.
Accordingly, data item classification supports managing the basic information of classifications as well as batch association and disassociation of data elements.
4.2.3 Metadata Standard Management
4.2.3.1 Naming and coding rules management
Naming and coding rule management should collect and manage the existing enumeration values in the platform that can serve as naming elements, support user-defined elements, and let users combine elements by clicking or dragging to form naming rules and coding rules.
4.2.3.2 Data Catalog Management
Data catalog management is similar to data item classification management, but the objects classified are different: here the platform's assets are cataloged. It provides various perspectives and schemes for classifying and managing tables, indicators, labels, and so on, and presents a unified asset catalog so that assets are understandable, identifiable, and easy to find.
4.2.4 Technical standard management
4.2.4.1 Data Type Mapping Relationship Management
This manages the mapping of data types between different data sources, as shown in the table below. As the number of data source types grows, the module supports cross-mapping among multiple data source types.
4.2.4.2 DDL Template Management
This manages the DDL statement templates of different data sources, covering creation, deletion, updates, and so on. Templates are referenced during model design or offline development, and the parameters in the template are replaced according to the selected information. Take MySQL table creation as an example:
CREATE TABLE IF NOT EXISTS ${table_name}(
${field_list},
PRIMARY KEY ( ${pk_field_name} )
)ENGINE=InnoDB DEFAULT CHARSET=utf8;
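For illustration, such a template can be rendered with Python's string.Template; the parameter names follow the template above, and the sample values are invented:

from string import Template

ddl = Template(
    "CREATE TABLE IF NOT EXISTS ${table_name}(\n"
    "${field_list},\n"
    "PRIMARY KEY ( ${pk_field_name} )\n"
    ")ENGINE=InnoDB DEFAULT CHARSET=utf8;"
)
print(ddl.substitute(
    table_name="dim_user",
    field_list="user_id BIGINT,\n  user_name VARCHAR(64)",
    pk_field_name="user_id",
))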
4.2.5 Standard Process Management
4.2.5.1 Standard Discovery
Following the standard formulation process, the platform provides the capability to scan the databases, identify candidate standards, and turn the identification results into conclusions, that is, complete data element definitions. Below is a reference screenshot of the page for identification by data element concept.
4.2.5.2 Audit Management
Review management handles the applications for standard life cycle transitions and for standard release. Reviewers evaluate each application and choose to approve or reject it according to the actual situation.
4.2.5.3 Standard Release
Standards are released as whole packages: the data element list of a batch is released as one version, which guarantees a consistent standard reference baseline for the platform. The module needs to support viewing the current changes, submitting release applications, comparing version differences, viewing release history, and so on.
4.2.6 Standard configuration
Standard configuration mainly provides configuration management for the meta-models of data elements and data dictionaries. We provide a fairly comprehensive structured representation of data standards, but different industries need different sets of description items. The meta-model configuration of data standards therefore lets users enable, disable, or add standard description items according to their actual situation.
4.2.6.1 Data element template configuration
4.2.6.2 Data dictionary template configuration
5. Combined practice of data standards and data center
In concrete implementations, we build along the process of requirements, design, development, and delivery. In the requirements and design stage, the current data landscape should be surveyed and the scope of governance and of standard formulation determined, so that the subsequent design can standardize indicators and models, control the quality of metadata and data at the source, and guide the implementation of the development process.
The following shows where data standards sit in the governance process and how they interact with each module.
5.1 Data transfer
Data transmission integrates multi-source heterogeneous data into the big data platform and distributes platform data to other databases. When the target database lacks the corresponding table, a table must be created from the source table, but different data sources have different type systems, so the types must be matched manually; as data source types multiply, matching by human experience becomes very difficult.
The standards maintain the type mapping between different data sources, so when a transmission task is created, the target table structure can be generated quickly from the mapping, enabling one-click table creation.
5.2 Metadata
In our practice, meta-model configuration includes group management of meta-models, management of system built-in items, and management of user-defined items. Meta-model design is currently supported for tables, fields, indicators, and labels.
5.2.1 Table Metamodel Design
5.2.1.1 Group Management
5.2.1.2 System built-in item management
5.2.1.3 Custom item management
5.2.2 Field Metamodel Design
5.2.3 Design of the Indicator Metamodel
5.2.4 Tag Metamodel Design
5.3 Model Design
5.3.1 Hierarchical Planning
In addition to the built-in layers, users can add custom layers.
For tables under each layer, a table name design specification must be configured: the selected naming elements are arranged in a fixed order to obtain the naming rule.
5.3.2 Classification planning
Classification planning uses data catalog management to catalog data resources by scenario on the resource catalog and asset side, meeting different users' needs for finding and using data, for example division by subject domain, by source system, or by security classification.
5.3.3 Standard design of table structure and data items
When designing the table structure, the corresponding data element is automatically recommended from the entered Chinese description (if a standard exists); alternatively, a data element can be selected directly, and the platform automatically backfills the field name, field type, field description, and associated standard data dictionary according to the selection, as shown in the figure below:
In concrete applications, the association is generally made when adding fields in the model design center:
5.4 Data Development
When editing SQL, based on the selected input and output tables and the data element information associated with their fields, fields with the same meaning are automatically mapped to generate SQL quickly; the user only needs to confirm the generated SQL.
In subsequent planning, standards will support visual and automated ETL, assisting users with field mapping and automatically deriving the corresponding processing functions from the audit and desensitization rules associated with data elements, then generating development scripts.
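A rough sketch of the field-mapping idea, assuming every field is bound to a data element identifier (all names invented):

# field -> bound data element, for the input and output tables.
src = {"uid": "DE_USER_ID", "amt": "DE_ORDER_AMT"}
dst = {"user_id": "DE_USER_ID", "order_amt": "DE_ORDER_AMT"}

def generate_sql(src_table: str, dst_table: str) -> str:
    # Fields sharing a data element are assumed to have the same meaning.
    pairs = [(s, d) for d, de_d in dst.items()
                    for s, de_s in src.items() if de_s == de_d]
    select = ",\n  ".join(f"{s} AS {d}" for s, d in pairs)
    return f"INSERT INTO {dst_table}\nSELECT\n  {select}\nFROM {src_table};"

print(generate_sql("ods_order", "dwd_order"))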
5.5 Data Quality
Data standards are the main reference for data quality audit rules. By associating audit rules with data standards, field-level quality verification can be implemented on the one hand, and a more general audit rule system can be constructed directly on the other, ensuring the comprehensiveness and usability of the rules.
5.6 Data Security
Data standards can record business-sensitive data objects and attributes and define the rules related to data security management, so that field-level encryption or desensitization rules can be generated quickly through data element association.
6. Summary
The construction and management of data standards has a long way to go. In the future we will gradually expand the application scenarios of standards to meet the needs of customers in various industries. As the managed content is enriched and the management process improved, standards will serve as the cornerstone of the data center, providing normative guidance and oversight for every module and every stage of the process.