Best Practices for Building Aunt Qian's Data Middle Platform

Company Profile

Aunt Qian is a pioneer in community fresh-food chain retail, built around the brand concept of "never sell overnight meat". From its founding it reorganized the standards of the traditional fresh-food industry around freshness and gave the meat and vegetable market a new definition. By trialing and validating the "daily clearance" model and the "scheduled discount" clearance mechanism, the company firmly enforces its no-overnight-sales policy.

As of May 2021, Aunt Qian had a presence in nearly 30 cities across the country, with more than 3,000 stores serving over 10 million families. It operates more than 500 high-quality products across eight categories: vegetables, aquatic products, fruits, pork, other meats, eggs and dairy, processed foods, and general standardized goods. The concept of "never sell overnight meat" returns to the simplest purpose: letting consumers buy truly fresh food. Implementing this concept relies not only on a strong supply chain system but also on the support of technology and digital construction.

Project Background

Aunt Qian's omni-channel data middle platform mainly carries transaction-side data and stays close to the front line of the business in order to empower it. At present, the data middle platform provides online data services for 3,000+ stores, supporting departments and business staff as they analyze data and explore the business value behind it. The team was the first at Aunt Qian to support real-time computing and the first to complete event tracking (data burying points); guided by the idea of data-driven business, it safeguards the growth of Aunt Qian's business.

In the initial stage of the project, with the extreme staffing of just one product manager and one engineer, the team had to complete the initial infrastructure within one month, including but not limited to data warehouse planning, technical architecture, dimensional modeling, data indicators, and data development. So how did our team break through with such limited manpower? Read on.

Building the Data Middle Platform

During the construction of the data middle platform, the choice of architecture and middleware affects the subsequent development process and the complexity of operations and maintenance. Open-source big data components are now blooming everywhere; each has its own characteristics, and each plays the game in its own way when it comes to building a complete big data system.

A dazzling array of big data component combinations:

[Figure 1: Combinations of open-source big data components]

Following a business-oriented principle, we favored managed services with functionality unified as much as possible, so that more energy and time could be spent on business thinking and data-enabled applications.

  • Our hard requirements: an offline computing engine, a real-time computing engine, an OLAP database, a KV database, a data integration component, and a distributed storage system
  • Our soft requirements: elastically adjustable computing resources, easy operations and maintenance, component links as short as possible, and unified batch and stream processing

Given the limited investment in the early stage of the project and the urgent need to bring the business online quickly, and weighing cost against the compatibility and scalability of the system architecture, our team concluded that a cloud-native, fully managed big data solution, DataWorks + MaxCompute + Hologres + Flink, was the best fit for us.

The following is the positioning of each component:

  • DataWorks: a one-stop platform for data integration, development, operations and maintenance, data services, and more
  • MaxCompute: an offline distributed computing engine
  • Hologres: fast query performance, supporting online OLAP, KV point lookups, and real-time reads and writes
  • Flink: a high-performance real-time computing engine

Aunt Qian's Data Middle Platform Architecture V1.0

After the product selection was settled, we built data middle platform V1.0, which is mainly used in query acceleration scenarios for business interfaces. The V1.0 architecture diagram is shown below:

[Figure 2: Data middle platform V1.0 architecture]

Once the architecture and middleware were determined, everything from activating the cloud components to connecting the VPC network could be completed within one day, letting the project team quickly move on to integrating business data and modeling the data domains at the start of the project. Both of these major tasks are carried out in the corresponding modules of DataWorks.

For data ingestion, we do not need to deploy components such as Kafka + Canal to subscribe to the Binlog of the business databases, nor deploy Airflow for task scheduling. Through the Data Integration module of DataWorks, business data can be synchronized to MaxCompute and Hologres, both in real time and offline, with "one click".

For data modeling in DataWorks, we follow the Kimball dimensional modeling methodology to sort out business domains and business processes, partition the data, and define dimension and fact tables, and we use FML (Fast Modeling Language) to turn the logical model into the physical model.

In the business interface query acceleration scenario, we schedule offline MaxCompute data into Hologres internal tables to get a faster query experience. Because the underlying storage of the two is seamlessly connected, synchronization is fast: roughly 100,000 rows can be loaded per second.
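As a rough illustration of this step, the sketch below uses psycopg2 (Hologres is compatible with the PostgreSQL protocol) to map a MaxCompute table through the built-in odps_server foreign server and load it into an internal table; the project name, table names, and connection parameters are placeholders, not Aunt Qian's actual objects.

```python
# A minimal sketch of the "MaxCompute -> Hologres internal table" acceleration step.
# All names and connection parameters below are illustrative placeholders.
import psycopg2

conn = psycopg2.connect(
    host="your-hologres-endpoint", port=80,
    dbname="your_db", user="your_access_id", password="your_access_key",
)
conn.autocommit = True
cur = conn.cursor()

# 1. Map the MaxCompute (ODPS) table as a foreign table via the built-in odps_server.
cur.execute("""
    IMPORT FOREIGN SCHEMA your_odps_project LIMIT TO (ads_store_sales)
    FROM SERVER odps_server INTO public OPTIONS (if_table_exist 'update');
""")

# 2. Load the offline ADS result set into a Hologres internal table for fast serving.
cur.execute("INSERT INTO ads_store_sales_holo SELECT * FROM ads_store_sales;")

cur.close()
conn.close()
```

In practice, a step like this would run as a scheduled node in DataWorks rather than as a standalone script.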

We chose Hologres as an important part of our online business support for the following reasons:

  1. Security: access goes through RAM authentication, making permission management convenient and safe.
  2. Rich indexes: different storage modes (row store or column store) and dedicated index support can be chosen for different query scenarios (see the sketch after this list).
  3. Less data redundancy: the KV and OLAP scenarios are both served by one system, reducing the data redundancy caused by spanning multiple systems.
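The following sketch, again with hypothetical table names and placeholder connection settings, shows how the storage mode is chosen per scenario; in Hologres, set_table_property must be called in the same transaction as CREATE TABLE.

```python
# Row store for KV point lookups, column store for OLAP; names are illustrative.
import psycopg2

conn = psycopg2.connect(host="your-hologres-endpoint", port=80, dbname="your_db",
                        user="your_access_id", password="your_access_key")
conn.autocommit = True
cur = conn.cursor()

# Row store + primary key: serves KV point lookups (e.g. fetch one order by id).
cur.execute("""
    BEGIN;
    CREATE TABLE kv_order_detail (
        order_id   BIGINT PRIMARY KEY,
        store_id   BIGINT,
        pay_amount NUMERIC(18, 2)
    );
    CALL set_table_property('kv_order_detail', 'orientation', 'row');
    COMMIT;
""")

# Column store + distribution/clustering keys: serves OLAP aggregation queries.
cur.execute("""
    BEGIN;
    CREATE TABLE olap_order_detail (
        order_id   BIGINT,
        store_id   BIGINT,
        pay_amount NUMERIC(18, 2),
        pay_time   TIMESTAMPTZ
    );
    CALL set_table_property('olap_order_detail', 'orientation', 'column');
    CALL set_table_property('olap_order_detail', 'distribution_key', 'store_id');
    CALL set_table_property('olap_order_detail', 'clustering_key', 'pay_time');
    COMMIT;
""")

cur.close()
conn.close()
```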

Aunt Qian's Data Middle Platform Architecture V2.0

In data middle platform V2.0, the architecture evolved in stages from purely offline to an offline + real-time Lambda architecture, as shown below:

  1. Synchronization: the Data Integration feature of DataWorks subscribes to the same Binlog in real time and writes to both Hologres and MaxCompute.
  2. Offline link: scenarios that are not latency-sensitive but computationally complex, such as user profiles, the "people-goods-place" label system, and promotion effect reviews, are still produced by offline ETL on MaxCompute and finally aggregated into Hologres for query acceleration.
  3. Real-time link: scenarios with high real-time requirements, such as real-time dashboards, risk control perception, and rule-based alerts, are implemented with the combination of Hologres + Flink.

[Figure 3: Data middle platform V2.0 architecture]

It is worth noting that in V2.0 we only added a real-time computing engine (Flink) and extended one new real-time link: Hologres (source) -> Flink -> Hologres (sink). This is made possible by the native integration between the two components (a sketch follows the list below):

  1. The Flink real-time computing engine natively supports a Hologres connector, with good compatibility and easy reading and writing
  2. Hologres supports primary keys (PK), which helps guarantee exactly-once semantics end to end in Flink scenarios
  3. Hologres adopts an LSM architecture and supports real-time and fine-grained updates
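A minimal PyFlink sketch of this link is shown below. It assumes it runs on Alibaba Cloud Realtime Compute for Apache Flink, where the Hologres connector is built in; the table names, columns, and connector options ('endpoint', 'dbname', 'tablename', 'username', 'password', 'binlog', 'cdcMode') are placeholders that should be checked against the connector documentation.

```python
# A sketch of the Hologres (Binlog source) -> Flink -> Hologres (sink) link.
# Assumes the Alibaba Cloud Hologres connector is available in the Flink runtime;
# all names and connector options are illustrative assumptions.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Source: consume the Hologres Binlog of the order detail table as a changelog.
t_env.execute_sql("""
    CREATE TEMPORARY TABLE dwd_order_src (
        order_id   BIGINT,
        store_id   BIGINT,
        pay_amount DECIMAL(18, 2),
        PRIMARY KEY (order_id) NOT ENFORCED
    ) WITH (
        'connector' = 'hologres',
        'endpoint'  = 'your-hologres-endpoint:80',
        'dbname'    = 'your_db',
        'tablename' = 'dwd_order_detail',
        'username'  = 'your_access_id',
        'password'  = 'your_access_key',
        'binlog'    = 'true',
        'cdcMode'   = 'true'
    )
""")

# Sink: an upsert table keyed by store_id, written back into Hologres.
t_env.execute_sql("""
    CREATE TEMPORARY TABLE dws_store_sales_rt (
        store_id    BIGINT,
        sale_amount DECIMAL(18, 2),
        PRIMARY KEY (store_id) NOT ENFORCED
    ) WITH (
        'connector' = 'hologres',
        'endpoint'  = 'your-hologres-endpoint:80',
        'dbname'    = 'your_db',
        'tablename' = 'dws_store_sales_rt',
        'username'  = 'your_access_id',
        'password'  = 'your_access_key'
    )
""")

# Continuously aggregate the changelog into per-store real-time sales.
t_env.execute_sql("""
    INSERT INTO dws_store_sales_rt
    SELECT store_id, SUM(pay_amount) AS sale_amount
    FROM dwd_order_src
    GROUP BY store_id
""")
```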

A problem commonly faced under the Lambda architecture is how to federate the real-time and offline computation results. The combination of internal and foreign tables in Hologres solves this problem at the design level: real-time results are stored in internal tables, while offline results stay on MaxCompute and are accessed through foreign tables. Because Hologres and MaxCompute are seamlessly connected at the underlying data layer, joining offline and real-time data is convenient.
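For illustration only, and with hypothetical table names, a federated query that merges today's real-time results (internal table) with historical offline results (a foreign table pointing at MaxCompute) might look like this:

```python
# Merging real-time (internal table) and offline (foreign table) results in one
# Hologres query; table, column, and partition names are hypothetical.
import psycopg2

conn = psycopg2.connect(host="your-hologres-endpoint", port=80, dbname="your_db",
                        user="your_access_id", password="your_access_key")
cur = conn.cursor()
cur.execute("""
    SELECT store_id, SUM(sale_amount) AS sale_amount
    FROM (
        SELECT store_id, sale_amount          -- today, written in real time by Flink
        FROM dws_store_sales_rt
        UNION ALL
        SELECT store_id, sale_amount          -- history, foreign table on MaxCompute
        FROM dws_store_sales_offline_foreign
        WHERE ds < to_char(current_date, 'YYYYMMDD')
    ) t
    GROUP BY store_id;
""")
rows = cur.fetchall()
cur.close()
conn.close()
```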

In the purely offline V1.0 architecture, we mostly used offline ETL to push ADS-layer result sets into Hologres for query acceleration. But as the business grew, we were asked to provide a real-time DWD wide-table layer and quasi-real-time, even real-time, thematic DWS-layer data. Technically, therefore, we implemented real-time widening, hot/cold links, and data backfill mechanisms based on the internal and foreign table features of Hologres:

  1. DWD layer: real-time widening + hot/cold links

For example, in the order business scenario, the business side needs to see order changes of the past 30 days in real time, so the hot data of the past 30 days is written into Hologres internal tables in real time by Flink. Internal tables are fast to query, but their storage cost is slightly higher (they are stored on SSD). Data older than 30 days stays on MaxCompute and is accessed through archiving or directly through foreign tables, achieving hot/cold data tiering and reducing the storage cost of infrequently accessed data. In addition, taking advantage of Hologres' primary key support, we use "INSERT ON CONFLICT" semantics to periodically roll over and backfill the data written in real time, giving the data layer a "self-repair" capability and preventing the inconsistencies that real-time writes can cause under certain failures. A sketch of this follows.
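The sketch below illustrates the hot table and the backfill step; the 30-day TTL, the partition column ds, and all table names are assumptions made for the example.

```python
# Hot/cold link and "self-repair" backfill; all names and the TTL are illustrative.
import psycopg2

conn = psycopg2.connect(host="your-hologres-endpoint", port=80, dbname="your_db",
                        user="your_access_id", password="your_access_key")
conn.autocommit = True
cur = conn.cursor()

# Hot data: an internal table holding the last 30 days, expired automatically via TTL.
cur.execute("""
    BEGIN;
    CREATE TABLE dwd_order_hot (
        order_id   BIGINT PRIMARY KEY,
        store_id   BIGINT,
        pay_amount NUMERIC(18, 2),
        pay_time   TIMESTAMPTZ
    );
    CALL set_table_property('dwd_order_hot', 'time_to_live_in_seconds', '2592000');
    COMMIT;
""")

# Periodic backfill: re-read yesterday's partition from the MaxCompute foreign table
# and upsert it, overwriting anything the real-time link may have missed.
cur.execute("""
    INSERT INTO dwd_order_hot (order_id, store_id, pay_amount, pay_time)
    SELECT order_id, store_id, pay_amount, pay_time
    FROM dwd_order_foreign
    WHERE ds = to_char(current_date - 1, 'YYYYMMDD')
    ON CONFLICT (order_id)
    DO UPDATE SET store_id   = EXCLUDED.store_id,
                  pay_amount = EXCLUDED.pay_amount,
                  pay_time   = EXCLUDED.pay_time;
""")

cur.close()
conn.close()
```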

  2. DWS layer: micro-batch scheduling / logical views + federated queries

For example, in BI scenarios the business can accept a delay of roughly 5-10 minutes, so we compute the DWS layer with a combination of micro-batch scheduling and logical views to support the business's BI data interfaces and ad hoc queries. Hot/cold data tiering is again achieved through internal and foreign tables: depending on the business and the data volume, cold data is either kept on MaxCompute and accessed through foreign tables or synchronized into Hologres internal tables to speed up queries. Finally, the queries are merged inside Hologres to achieve a federated query over both. A sketch follows.
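The sketch below illustrates this pattern under stated assumptions: a logical view unifies the hot internal table and the cold foreign table, and a micro-batch job refreshes the hot aggregates. In production the loop would be a DataWorks scheduled node rather than a sleep loop, and all names, the composite key (store_id, ds), and the 10-minute interval are hypothetical.

```python
# DWS pattern: logical view over hot (internal) + cold (foreign) data, refreshed by
# a micro-batch job; names, keys, and the interval are illustrative assumptions.
import time
import psycopg2

conn = psycopg2.connect(host="your-hologres-endpoint", port=80, dbname="your_db",
                        user="your_access_id", password="your_access_key")
conn.autocommit = True
cur = conn.cursor()

# Logical view: BI tools query a single name; Hologres merges hot and cold underneath.
cur.execute("""
    CREATE OR REPLACE VIEW dws_store_sales_all AS
    SELECT store_id, ds, sale_amount FROM dws_store_sales_hot
    UNION ALL
    SELECT store_id, ds, sale_amount FROM dws_store_sales_cold_foreign;
""")

# Micro-batch: every ~10 minutes, re-aggregate today's orders into the hot table
# (assumes dws_store_sales_hot has a primary key on (store_id, ds)).
while True:
    cur.execute("""
        INSERT INTO dws_store_sales_hot (store_id, ds, sale_amount)
        SELECT store_id, to_char(current_date, 'YYYYMMDD'), SUM(pay_amount)
        FROM dwd_order_hot
        WHERE pay_time >= current_date
        GROUP BY store_id
        ON CONFLICT (store_id, ds) DO UPDATE SET sale_amount = EXCLUDED.sale_amount;
    """)
    time.sleep(600)
```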

[Figure 4: DWD/DWS layering with Hologres internal and foreign tables]

Risk control scenario application

The data middle platform V2.0 architecture has been put to use in many of Aunt Qian's business scenarios, such as data services, data reports, and the real-time risk control system. The following describes how it is applied to risk control.

The real-time risk control system combines business records with event tracking logs to screen online risk events such as off-site payments, abnormally large orders, and consumer terminal changes, triggers risk responses in real time, and gives risk control specialists timely data support and faster response capabilities.

[Figure 5: Real-time risk control data flow]

As the figure above shows, the entire life cycle of the real-time risk control data flow passes through Hologres: Binlog as the data source, real-time dimension table lookups, and OLAP analysis. The whole risk control link is event-driven by the Binlog; with Binlog mode enabled, Hologres exposes Binlog information queries at the bottom layer, making it easy to locate consumption offsets and resume consumption. In the dimension table lookup scenario, optimizations such as LRU caching and micro-batch writing can be used to improve the performance of the Lookup Join. Finally, the results are written back to Hologres in real time to serve online analysis and ad hoc queries, and can also be pushed in real time to other business-facing systems to carry out risk responses such as business-side interception and system alerts, closing the loop of the risk control scenario. A sketch of the lookup join follows.
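Under the same assumptions as the earlier Flink sketch (Alibaba Cloud Realtime Compute with the built-in Hologres connector), the lookup join might look like this; the dimension table cache options ('cache', 'cacheSize', 'cacheTTLMs'), the distance rule, and all table and column names are hypothetical and should be verified against the connector documentation.

```python
# Risk-control sketch: Binlog-driven order events enriched against a Hologres
# dimension table via a lookup join; names, options, and the rule are illustrative.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Order events from the Hologres Binlog, with a processing-time attribute for the join.
t_env.execute_sql("""
    CREATE TEMPORARY TABLE order_events (
        order_id  BIGINT,
        store_id  BIGINT,
        pay_lng   DOUBLE,
        pay_lat   DOUBLE,
        proc_time AS PROCTIME()
    ) WITH (
        'connector' = 'hologres',
        'endpoint' = 'your-hologres-endpoint:80', 'dbname' = 'your_db',
        'tablename' = 'dwd_order_detail',
        'username' = 'your_access_id', 'password' = 'your_access_key',
        'binlog' = 'true'
    )
""")

# Store dimension table in Hologres; the LRU cache options are assumed names.
t_env.execute_sql("""
    CREATE TEMPORARY TABLE dim_store (
        store_id  BIGINT,
        store_lng DOUBLE,
        store_lat DOUBLE,
        PRIMARY KEY (store_id) NOT ENFORCED
    ) WITH (
        'connector' = 'hologres',
        'endpoint' = 'your-hologres-endpoint:80', 'dbname' = 'your_db',
        'tablename' = 'dim_store',
        'username' = 'your_access_id', 'password' = 'your_access_key',
        'cache' = 'LRU', 'cacheSize' = '100000', 'cacheTTLMs' = '60000'
    )
""")

# Result table written back to Hologres for online analysis and downstream alerts.
t_env.execute_sql("""
    CREATE TEMPORARY TABLE risk_events (
        order_id BIGINT,
        store_id BIGINT,
        PRIMARY KEY (order_id) NOT ENFORCED
    ) WITH (
        'connector' = 'hologres',
        'endpoint' = 'your-hologres-endpoint:80', 'dbname' = 'your_db',
        'tablename' = 'risk_events',
        'username' = 'your_access_id', 'password' = 'your_access_key'
    )
""")

# Flag orders whose payment location is far from the store ("off-site payment").
t_env.execute_sql("""
    INSERT INTO risk_events
    SELECT o.order_id, o.store_id
    FROM order_events AS o
    JOIN dim_store FOR SYSTEM_TIME AS OF o.proc_time AS d
      ON o.store_id = d.store_id
    WHERE ABS(o.pay_lng - d.store_lng) + ABS(o.pay_lat - d.store_lat) > 0.1
""")
```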

Business value

Aunt Qian's omni-channel data middle platform achieves agile business delivery on top of Alibaba Cloud's big data solution and supports many internal and external application scenarios. The main value it brings to the business is as follows:

  1. Lower resource cost: zero hardware cost, with elastic scaling of resources across all links to handle all kinds of data peaks flexibly
  2. Lower operations cost: fully managed operations and maintenance cut O&M effort and let the team focus its energy on business logic development
  3. Simple architecture: the OLTP, OLAP, and KV scenarios are consolidated into one system (Hologres), shortening the middleware links
  4. High ecosystem compatibility: Hologres integrates closely with the offline computing engine (MaxCompute), the real-time computing engine (Flink), and DataWorks, fits into the cloud-native ecosystem, and is easy to develop against and adapt.

Outlook

As an HSAP (Hybrid Serving and Analytical Processing) product in the cloud-native ecosystem, Hologres matches the positioning and concept of Aunt Qian's omni-channel data middle platform. We hope the following features can be supported on the product side in the near future:

  1. Resource isolation. Hologres currently isolates resources through separate instances; we hope for customizable resource isolation within an instance, for example in the form of tenants.
  2. Time-based elasticity. Adjust resources flexibly according to the business's different peak periods to save cost.
  3. Materialized views. As of Hologres 0.10, materialized views are not yet supported; with them, some micro-batch scheduling scenarios could be reduced.
  4. Hot/cold tiering inside internal tables. Today Hologres keeps cold data on MaxCompute, but its query speed still lags behind internal tables; we hope internal tables themselves can support hot/cold data tiering.

Author: Peng Mingde, currently at Aunt Qian as the project manager of the omni-channel data middle platform and a big data development engineer.

