A detailed explanation of the "planning" capability of Dataphin, the core product for building a data center

Introduction: The data center is an upgrade of the traditional data warehouse: a complete system for collecting, building, managing, and using data. Dataphin is a powerful tool for building a data center. Its core advantage is that it brings OneModel, the methodology Alibaba has accumulated over years of data center construction, into data construction and management.


The data center is the most cutting-edge data construction system in the big data field today, but it is not built from nothing: it is an upgrade of the traditional data warehouse, a complete system for collecting, building, managing, and using data. Dataphin is a powerful tool for building a data center. Its core advantage is that it brings the OneModel methodology (one component of the OneData system), which Alibaba has accumulated over years of building data centers, into the construction and management of data. This article focuses on the design concepts behind Dataphin's core planning function.



OneModel divides the construction of the data center into four layers:

  1. Subject Domain Modeling: In the data center, a subject corresponds to a macro-level field of analysis; for example, sales analysis analyzes the subject "sales". A collection of closely related subjects is a subject domain. Each industry can be split into a subject domain model composed of multiple subject domains (roughly ten or more).
  2. Conceptual Modeling: On top of the subject domains, entities and the relationships between entities are added to each subject domain.
  3. Logical Modeling: On top of the conceptual model, the attributes of each entity and the constraints on those attributes are added.
  4. Business Analysis Modeling: This layer captures the important and commonly used analysis methods and analysis perspectives of the industry. Based on the logical model, business analysis questions are converted into Dataphin-specific derived indicators, which are further refined into atomic indicators and business constraints.
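The way each layer builds on the previous one can be illustrated with a small data-structure sketch. This is not Dataphin's API: the class names (`SubjectDomain`, `EntityModel`, `Attribute`, `DerivedIndicator`), the field names, and the sample values are all hypothetical, chosen only to mirror the four layers described above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Attribute:
    """Layer 3 (logical model): an attribute plus its constraint."""
    name: str
    constraint: str = ""


@dataclass
class EntityModel:
    """Layer 2 (conceptual model): an entity; attributes arrive in layer 3."""
    name: str
    attributes: List[Attribute] = field(default_factory=list)


@dataclass
class SubjectDomain:
    """Layer 1 (subject domain model): a named group of related subjects."""
    name: str
    entities: Dict[str, EntityModel] = field(default_factory=dict)
    relationships: List[Tuple[str, str]] = field(default_factory=list)


@dataclass
class DerivedIndicator:
    """Layer 4 (business analysis model): built from an atomic indicator
    and business constraints, the two parts named in the text."""
    name: str
    atomic_indicator: str
    business_constraints: List[str] = field(default_factory=list)


# Layer 1: declare the subject domain (hypothetical example).
transaction = SubjectDomain("transaction")

# Layer 2: add entities and a relationship between them.
transaction.entities["order"] = EntityModel("order")
transaction.entities["payment"] = EntityModel("payment")
transaction.relationships.append(("payment", "order"))  # payment refers to order

# Layer 3: add attributes and their constraints to an entity.
transaction.entities["order"].attributes += [
    Attribute("order_id", "unique, not null"),
    Attribute("amount", ">= 0"),
]

# Layer 4: express a business question as a derived indicator.
online_paid_amount = DerivedIndicator(
    name="online_paid_amount",
    atomic_indicator="paid_amount",
    business_constraints=["channel = 'online'"],
)
```

Each layer only adds information to what the previous layer defined, which is why the layers can be modeled one after another.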



Of the four OneModel layers, subject domain modeling and conceptual modeling are carried out by Dataphin's planning function. The four layers are not aimed at the enterprise-level data center as a whole; they are developed around a single independent business. Multiple independent businesses are then combined into the enterprise-level data center through common dimensions. Dataphin's planning function therefore also covers the division of the enterprise into independent businesses, that is, the division of business segments. Planning does not affect data accuracy or output timeliness, but it is an important data (asset) management function that affects how data is searched, understood, and access-controlled.


Business segments

Enterprises differ in scale and in the complexity and span of their business. Because data reflects the business, each enterprise's data center is different as well. The first step in building a data center is planning, and the first step in planning is to comprehensively sort out the enterprise's business structure and divide it into independent businesses. In Dataphin, this is the division of business segments.

The general principle for dividing business segments is high cohesion and low coupling. The specific process is as follows:

  1. Examine all of the enterprise's business processes. If two business processes have an upstream-downstream relationship or share a common business object, they should be placed in the same business segment. For example, once the purchasing process (purchase order) ends, logistics (the enterprise's inbound flow of goods) generally follows. Logistics depends on purchasing, and goods are a business object common to both processes, so purchasing and logistics should belong to the same business segment. Extending this, list the upstream and downstream processes and the business objects of every business process; all processes that are directly or indirectly connected should belong to the same business segment. Example: in a retail business, purchasing -> purchase logistics -> warehousing -> sales and delivery, and marketing -> sales -> fulfillment -> after-sales, are linked either by upstream-downstream relationships or by goods, so they all belong to the "retail" business segment.
  2. Conversely, if two business processes have no direct or indirect upstream-downstream relationship and no direct or indirect common business object, they should not be placed in the same business segment. Example: the same enterprise may run both retail and real estate. In the real estate business, the process land acquisition -> design -> development -> sales has no upstream-downstream relationship with the retail business processes and cannot be linked to them through any business object, so two separate business segments, "retail" and "real estate", should be created.
  3. Note that some business objects are shared at the enterprise level, such as company employees and administrative geographic divisions (yes, these are business objects too). They would connect every business process in the company into one huge network. Therefore, first identify these enterprise-level business objects. Business processes that are connected only through these objects (with no upstream-downstream relationship) should have that connection cut and be assigned to different business segments.
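The three rules above amount to finding connected components in a graph of business processes. The following is a minimal sketch of that idea using a union-find structure. It is an illustration only, not how Dataphin itself works: the function `divide_business_segments`, its parameters, and all process and object names are hypothetical.

```python
from collections import defaultdict


def divide_business_segments(flows, objects_used, enterprise_objects):
    """Group business processes into segments (connected components).

    flows: iterable of (upstream, downstream) process pairs.
    objects_used: dict mapping each process to the set of business objects it touches.
    enterprise_objects: objects shared company-wide; they never link processes.
    """
    parent = {}

    def find(p):
        parent.setdefault(p, p)
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving keeps trees shallow
            p = parent[p]
        return p

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for process in objects_used:
        find(process)  # register every process, even isolated ones

    # Rule 1: an upstream-downstream relationship links two processes.
    for upstream, downstream in flows:
        union(upstream, downstream)

    # Rules 1 and 3: a shared business object links processes,
    # unless the object is an enterprise-level one.
    first_user = {}
    for process, objects in objects_used.items():
        for obj in objects - set(enterprise_objects):
            if obj in first_user:
                union(first_user[obj], process)
            else:
                first_user[obj] = process

    # Rule 2: processes left unconnected end up in different segments.
    segments = defaultdict(set)
    for process in list(parent):
        segments[find(process)].add(process)
    return sorted(sorted(members) for members in segments.values())


# Hypothetical enterprise that runs both retail and real estate.
flows = [
    ("purchase", "inbound_logistics"),
    ("inbound_logistics", "warehousing"),
    ("warehousing", "sales"),
    ("land_acquisition", "design"),
    ("design", "development"),
    ("development", "property_sales"),
]
objects_used = {
    "purchase": {"goods", "employee"},
    "inbound_logistics": {"goods"},
    "warehousing": {"goods"},
    "sales": {"goods", "region"},
    "after_sales": {"goods"},
    "land_acquisition": {"land_parcel", "region"},
    "design": {"employee"},
    "development": {"land_parcel"},
    "property_sales": {"region"},
}
enterprise_objects = {"employee", "region"}

segments = divide_business_segments(flows, objects_used, enterprise_objects)
for members in segments:
    print(members)
```

If `enterprise_objects` were left empty, "employee" and "region" would merge retail and real estate into a single segment, which is exactly the huge network that rule 3 warns against.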


Subject domain modeling

Subject domain modeling further divides the business within a business segment into multiple subject domains. There is no objective principle for this division; it relies mainly on the data modeler's industry experience and business understanding. Take the retail industry as an example.

The subject domains of the retail industry are divided as shown in the following figure and listed below. The core subject domains are "people", "goods", and "fields":

  1. Common subject domain: data referenced by all business processes, such as geographic location data and corporate personnel and organization data.
  2. Consumer (people) subject domain: mainly business activity data related to user (consumer) operations in a retail enterprise.
  3. Commodity (goods) subject domain: business activity data related to commodity management (category management, brand management, etc.), commodity structure management (commodity grouping), and so on.
  4. Merchant (field) subject domain: data related to sales channels, including offline stores and online e-commerce (self-operated or third-party).
  5. Traffic subject domain: data on consumer visits to stores and related behavior.
  6. Transaction subject domain: contract-based information flow and capital flow data between the retailer and the consumer, such as sales orders, payments, refunds, and returns.
  7. Contract fulfillment subject domain: optional. The retailer delivers goods to the consumer in accordance with the contract (order); this is the logistics data from retailer to consumer.
  8. Service subject domain: mainly after-sales data.
  9. Interaction subject domain: optional. Non-contractual information flow data between retailer and consumer, for example the retailer's interaction with consumers on social media, and consumer comments, shares, and favorites on e-commerce platforms.
  10. Marketing subject domain: data on advertisements, events, coupons, and so on.
  11. Content subject domain: optional. Content the retailer creates to attract traffic, such as commercial advocacy, live-stream selling, and promotional publications.
  12. Supply chain subject domain: the three flows between the retailer and its suppliers, plus the logistics and information flow data within the retailer.


Conceptual modeling

Building on the subject domain model, the conceptual model is the model formed by the entities in each subject domain and the relationships between those entities.


The conceptual model uses the following terms:

  1. Entity: the projection of a business object or business activity into the data world. Entities generally correspond one-to-one with data tables. Some entities share characteristics (they have many attributes in common); these can be abstracted into generalized entities, which have no corresponding data table.
  2. Business object: a kind of entity representing the people and things involved in the business; it can also be a pure concept. For example: consumer (person), commodity (thing), category (concept). In some versions of Dataphin, business objects are also called "dimensions".
  3. Business activity: a kind of entity representing the behavior by which business objects change, or the interaction between business objects. For example: visit behavior, sales behavior. In some versions of Dataphin, business activities are also called "business processes".
  4. Entity relationship: the relationship between entities. There are two main types:
    a. The first is the reference relationship: one entity is an attribute of another entity. For example, the user entity has an address attribute, and the address is itself an entity, so the user entity refers to the address entity. As another example, buyers, sellers, and commodities all participate in an order, so the order entity refers to the buyer entity, the seller entity, and the commodity entity. From a technical point of view, a reference corresponds to a join ("association") in SQL. There are three types of reference relationship, one-to-one, one-to-many, and many-to-many, which describe the quantitative relationship between the instances (records) of the two entities involved.
    b. The second is the inheritance relationship: an entity A is subordinate to another entity B, and A is conceptually more detailed and specific than B. For example, in a retail business you can define a "user" entity; "buyer" and "member" are both users, but more specific (a buyer is a user who has made a transaction, and a member is a user who has joined the membership program). The "buyer" and "member" entities therefore inherit from the "user" entity.
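The terms above can be made concrete with a small sketch of a conceptual model in code. It is a hypothetical illustration, not Dataphin's data model: the `Entity`, `Kind`, and `Cardinality` names are invented here, treating the order as a business activity is an assumption, and the cardinalities assigned to each reference are illustrative guesses rather than facts from the article.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Iterator, Optional


class Kind(Enum):
    BUSINESS_OBJECT = "business object"      # people, things, or pure concepts
    BUSINESS_ACTIVITY = "business activity"  # behavior of, or between, objects


class Cardinality(Enum):
    ONE_TO_ONE = "1:1"
    ONE_TO_MANY = "1:N"
    MANY_TO_MANY = "M:N"


@dataclass
class Entity:
    name: str
    kind: Kind
    parent: Optional["Entity"] = None  # inheritance: a more general entity
    references: Dict[str, Cardinality] = field(default_factory=dict)

    def refer_to(self, target: "Entity", cardinality: Cardinality) -> None:
        """Record a reference relationship (a join, in SQL terms)."""
        self.references[target.name] = cardinality

    def ancestors(self) -> Iterator["Entity"]:
        """Yield the entities this one inherits from, nearest first."""
        node = self.parent
        while node is not None:
            yield node
            node = node.parent

    def is_a(self, other: "Entity") -> bool:
        """True if this entity is `other` or inherits from it."""
        return other is self or any(a is other for a in self.ancestors())


# Business objects; "buyer" and "member" inherit from "user".
user = Entity("user", Kind.BUSINESS_OBJECT)
buyer = Entity("buyer", Kind.BUSINESS_OBJECT, parent=user)
member = Entity("member", Kind.BUSINESS_OBJECT, parent=user)
seller = Entity("seller", Kind.BUSINESS_OBJECT)
commodity = Entity("commodity", Kind.BUSINESS_OBJECT)
address = Entity("address", Kind.BUSINESS_OBJECT)

# A business activity (assumed) that refers to its participating entities.
order = Entity("order", Kind.BUSINESS_ACTIVITY)

# Reference relationships; the cardinalities are illustrative assumptions.
user.refer_to(address, Cardinality.ONE_TO_MANY)
order.refer_to(buyer, Cardinality.ONE_TO_MANY)       # one buyer, many orders
order.refer_to(seller, Cardinality.ONE_TO_MANY)      # one seller, many orders
order.refer_to(commodity, Cardinality.MANY_TO_MANY)  # orders and commodities

print(buyer.is_a(user))          # buyer is a more specific kind of user
print(sorted(order.references))  # entities the order refers to
```

Keeping inheritance (`parent`) separate from references (`references`) reflects the distinction in the text: inheritance says what an entity is, while a reference says which other entities appear among its attributes.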

The above is the design philosophy behind Dataphin's core planning function. I hope it helps you make better use of it.

Copyright Notice: The content of this article is contributed voluntarily by real-name registered users of Alibaba Cloud, and the copyright belongs to the original author. The Alibaba Cloud Developer Community does not own the copyright and does not assume the corresponding legal responsibilities. For specific rules, please refer to the "Alibaba Cloud Developer Community User Service Agreement" and the "Alibaba Cloud Developer Community Intellectual Property Protection Guidelines". If you find suspected plagiarism in this community, fill in the infringement complaint form to report it. Once verified, the community will immediately delete the suspected infringing content.

