Flink+Hologres helps the Yi's Home e-commerce platform build a new generation of real-time data warehouse
Introduction: The Hologres+Flink+DataWorks real-time data warehouse solution brings unified data, unified services, unified governance, and unified storage to the business of Yi's Home. It truly works out of the box: what you see is what you get!
Guangzhou Yidejia Network Technology Co., Ltd. (Yi's Home) is a B2B2C e-commerce platform focused on serving women. Its business covers skin care, color cosmetics, nutritional beauty food, custom-made clothing, cross-border e-commerce, and other fields. The project was incubated in 2008 and launched on Tmall in May 2011. Eight distribution centers across the country and brands such as Yanshimei and Yanshan were established one after another. The independent Yi's Home e-commerce platform was launched in 2013, and a full brand upgrade began in 2020. Yi's Home uses Internet-based, proactive service marketing to create a strong connection between skincare advisors and customers. From top to bottom, it strictly follows a business philosophy of building on quality and professionalism, connecting through social trust, and earning recognition through service, and through continuous innovation and accumulation it aims to become a leader in social e-commerce.
Business scenario and pain point analysis
Yi's Home is a B2B2C e-commerce platform that integrates development, design, operation, and sales. In addition to serving millions of members, it also supports thousands of distributors and agents. It has many business applications and large amounts of data, and its requirements for concurrent data queries are high.
The technical department of Yi's Home has developed rapidly over the past three years. Throughout this period it has always put the business first, and for that reason it has carried out a variety of technical upgrades and transformations, such as application integration, splitting into microservices, and aggregation of distributed applications. The status of the department can be analyzed as follows:
- Architecture: there are multiple languages and multiple data sources, and technology upgrades are noticeably intrusive to the business;
- Data: splitting applications created data islands, which in turn caused a large amount of data replication and rebuilding;
- Application: the business side wants to see sales performance data promptly and accurately, so the demand for real-time data is high;
- Efficiency: the demands on processes and tools keep increasing;
- Cost: it is difficult to recruit people who understand both big data and the business, so the cost of building a team is high.
In recent years, the business of Yi's Home has grown rapidly, the amount of data has increased sharply, and the complexity of the business has also increased. Under the existing big data architecture, pain points such as "difficulty building a talent reserve," "business upgrades limited by existing technology," and "high pressure during Double 11 events" urgently needed to be solved.
The technical department of Yi's Home has a very clear definition of what the technology upgrade and transformation must deliver. It mainly concerns elastic scaling of storage, query performance optimization, OLAP, learning cost, query response, scalability, and so on. The core focus is on the following 3 questions:
1) How to quickly complete data cleaning
2) How to complete data verification quickly and accurately
3) How to quickly recover from failures
In technology selection, we have always held to the principle that "technology selection is the primary productive force." We firmly believe that there is no best technical reserve, only a better one; that technology selection reflects a difference in capability; that we must keep improving our ability to get things right the first time; and that open sharing and cognitive upgrading are important.
In the early days, many experiments were carried out with open source big data products such as Hadoop, HBase, Kafka, Azkaban, Spark, and Greenplum. After performance comparisons, Greenplum was adopted. However, it eventually turned out that Greenplum had poor concurrency and was suitable only for analysis scenarios, not for high-concurrency query services.
Together with the big data computing platform team, the technical department of Yi's Home carried out a comprehensive architecture upgrade. The entire architecture is composed of DataWorks, real-time computing Flink, and Hologres. The architecture is simple, the learning cost is very low, and the full pipeline can be run end to end with little effort.
The following sections introduce best practices for the scenarios in which Alibaba Cloud products were put into production at Yi's Home.
1. Customer system practice
The original customer relationship management (CRM) system of Yi's Home was mainly based on MySQL, MQ, Canal, and self-developed applications. To support cut-over upgrades of the business system, the technical department independently developed a set of message middleware, which had a high maintenance cost. The custom data development process built on products such as Binlog, MQ, and OLAP was cumbersome and complex, and its maintenance cost was extremely high. In addition, because the system requires data to be processed in order, the concurrency of data cleaning was restricted.
After the architecture upgrade based on Hologres+DataWorks+real-time computing Flink, database data is written directly to Hologres in real time through DataWorks data integration. Real-time computing Flink then subscribes to the Hologres data for further real-time cleaning, and the result table is updated in the database to serve the business directly.
The overall architecture is clear and simple, with accurate data, end-to-end pure real-time processing, integrated storage and analysis, managed operation and maintenance, and fully automated tooling. The original system took 15 people 3 months to complete, while the current architecture took only 2 days to deploy.
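The cleaning step in this flow can be illustrated with a minimal in-memory sketch. It is not Hologres or Flink code: the `ChangeEvent` shape, the cleaning rules, and the dictionary standing in for the result table are all assumptions made for illustration. It only shows the idea of consuming row-level changes in order, cleaning each one, and upserting the result by primary key.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One row-level change (hypothetical shape, not a real binlog format)."""
    op: str                      # "upsert" or "delete"
    customer_id: int
    name: Optional[str] = None
    phone: Optional[str] = None


def clean(event: ChangeEvent) -> Optional[dict]:
    """Illustrative cleaning rules: trim the name, keep only digits in the phone."""
    name = (event.name or "").strip()
    if not name:
        return None  # rows without a usable name are skipped
    phone = "".join(ch for ch in (event.phone or "") if ch.isdigit())
    return {"customer_id": event.customer_id, "name": name, "phone": phone}


def apply_events(events: Iterable[ChangeEvent],
                 result_table: Dict[int, dict]) -> Dict[int, dict]:
    """Apply events in arrival order; the latest upsert per key wins."""
    for event in events:
        if event.op == "delete":
            result_table.pop(event.customer_id, None)
            continue
        row = clean(event)
        if row is not None:
            result_table[row["customer_id"]] = row
    return result_table


events = [
    ChangeEvent("upsert", 1, "  Alice ", "138-0000-0001"),
    ChangeEvent("upsert", 2, "", "139-0000-0002"),          # skipped: empty name
    ChangeEvent("upsert", 1, "Alice Li", "138 0000 0001"),  # newer version of row 1
    ChangeEvent("upsert", 3, "Bob", None),
    ChangeEvent("delete", 3),                               # row 3 is removed again
]
result = apply_events(events, {})
print(result)
```

Because the result table is keyed by primary key, replaying the same events leaves it in the same final state, which is one reason an upsert-style result table is convenient for this kind of cleaning.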
2. BI performance system practice
The BI performance system can also be understood as a real-time GMV large screen. There are two main requirements for business data:
- Real-time: the data must be up to date;
- Accurate: performance calculations must never be wrong.
The original architecture is shown in the figure below. Raw data is written into MQ in real time from the Binlog through the Canal suite, and the business data is then layered and cleaned by business domain. The task scheduling system updates performance in the order "day-month-quarter-year". This seemingly perfect solution actually has several problems:
- Real-time problem: it looks real-time, but there may be a 5-10 minute delay along the way;
- Concurrency problem: the concurrency of message consumption is limited;
- Operation and maintenance problem: if one of the links in the diagram fails, the rest of the system may fail with it;
- Data cleaning timeliness problem: a single run of the cleaning script may take several minutes, and many other things may happen during that time.
The picture below shows the new architecture of the upgraded BI performance system. Detailed data is synchronized to Hologres in real time through DataWorks, and a real-time computing Flink ETL job based on the Hologres data completes the processing of the "day-month-quarter-year" data. Hologres then provides analysis and query services for upper-level applications. The entire system offers pure real-time scheduling, high real-time performance with second-level latency, full SQL development, and efficient data verification.
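The "day-month-quarter-year" processing described above can be sketched as an incremental rollup. The sketch below is plain Python rather than the Flink SQL the team actually runs, and the class name, order fields, and sample values are hypothetical. It shows one way to keep all four granularities correct when an order is later corrected: retract the previous version of the order before adding the new one. `Decimal` is used so that amounts are not subject to floating-point rounding.

```python
from collections import defaultdict
from datetime import date
from decimal import Decimal
from typing import Dict, List, Tuple

Key = Tuple[str, str]  # (granularity, period label)


def period_keys(day: date) -> List[Key]:
    """Return the day, month, quarter, and year buckets that a date falls into."""
    quarter = (day.month - 1) // 3 + 1
    return [
        ("day", day.isoformat()),
        ("month", f"{day.year}-{day.month:02d}"),
        ("quarter", f"{day.year}-Q{quarter}"),
        ("year", str(day.year)),
    ]


class PerformanceRollup:
    """Keeps running totals per period, updated one order event at a time."""

    def __init__(self) -> None:
        self.totals: Dict[Key, Decimal] = defaultdict(Decimal)
        self.seen: Dict[str, Tuple[date, Decimal]] = {}  # latest version per order

    def on_order(self, order_id: str, day: date, amount: Decimal) -> None:
        previous = self.seen.get(order_id)
        if previous is not None:
            # Retract the earlier version so a correction is not double counted.
            prev_day, prev_amount = previous
            for key in period_keys(prev_day):
                self.totals[key] -= prev_amount
        for key in period_keys(day):
            self.totals[key] += amount
        self.seen[order_id] = (day, amount)


rollup = PerformanceRollup()
rollup.on_order("A1", date(2021, 3, 31), Decimal("100.00"))
rollup.on_order("A2", date(2021, 4, 1), Decimal("50.50"))
rollup.on_order("A1", date(2021, 3, 31), Decimal("80.00"))  # corrected amount

for key in sorted(rollup.totals):
    print(key, rollup.totals[key])
```

Since every event updates all four buckets at once, there is no separate scheduled pass from day to month to quarter to year, which is the part of the old design that introduced delay.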
3. Real-time application data warehouse architecture practice
The technical department of Yi's Home has also been thinking about how to give application developers the ability to do big data development, so that big data is used not only by the big data team but also by the application development teams.
To that end, it implemented a real-time data warehouse architecture based on real-time computing Flink+Hologres+DataWorks. This improves the reusability of the data foundation, makes it easier to adjust data dynamically in response to business changes, and allows data-driven application systems to be built together with the application teams.
4. Group data warehouse architecture practice
In addition to the e-commerce business, the Yi's Home data warehouse team also needs to support the internal business of the group. Like the mainstream data warehouse architectures on the market, the group's data warehouse platform was built on an open source big data stack. It has now been fully upgraded to the Hologres+real-time computing Flink+DataWorks real-time data warehouse architecture.
Business value and empowerment
The value that the new Hologres+real-time computing Flink+DataWorks real-time data warehouse solution brings to the business is mainly as follows:
- Unified data: one solution supports the complete process, and data such as detail tables and dimension tables is unified and orderly.
- Unified service: Hologres directly provides various online services, including data analysis and data services, which reduces the need to build interfaces.
- Unified storage: with Hologres as the unified storage, multiple data sources can be written directly to Hologres, which avoids redundant storage and saves cost.
- Unified governance: DataWorks provides unified standards, unified operations, and unified monitoring, giving the big data development platform unified governance.
From a business perspective, the new big data solution truly works out of the box, and what you see is what you get.
Look to the future
In the field of big data, data scale and business complexity are the two key factors that jointly constrain query performance. The only way through is for our developers to keep polishing their data models. Once the data model reaches a certain level of maturity, performance problems can be solved easily.
Finally, I hope that everyone embraces technology, embraces change, and wins on models, with data serving the business and data serving applications. Let us live for applications and fight for applications.
Author: Liu Songsen, CTO of Yi's Home, Senior Engineer, Associate Professor Title, Visiting Professor of