Introduction: The OneData methodology proposed by Alibaba helps companies clarify how to manage the full life cycle of their data. It is built into the product Dataphin (intelligent data construction and management), which is offered to enterprises through Alibaba Cloud.
Dataphin intelligent data construction and management platform
To meet the big data construction, management, and application needs of every industry, Dataphin provides one-stop capabilities for intelligent data construction and management across the entire chain from data access to data consumption, covering products, technology, and methodology. It helps enterprises build a standardized, integrated, asset-oriented, service-oriented, closed-loop, and self-optimizing intelligent data system to drive innovation.
Dataphin product page: https://www.aliyun.com/product/dataphin
Difficulty is the best coach
Alibaba started to build its own big data system in 2008 and has been committed to serving its diverse businesses with data. Along the way, it ran into various difficulties.
Technical staff bogged down in ad hoc data requests: Alibaba once built a dedicated "ad hoc data request management system" to allocate a time quota for ad hoc data retrieval to each business line. Every month, the quota ran out before month end, and business colleagues kept chasing data engineers to work overtime to fetch data. To change this situation, a dedicated "SQL skills training for business personnel" was set up, in the hope that business staff would learn to retrieve data themselves, which was euphemistically called "empowerment". But the essence behind this is: resources.
Inconsistent data calibers (metric definitions): A difference in data caliber once nearly caused a merchant real losses. The data forecast the merchant saw in the seller back end showed that the registration requirements for a promotional event could be met, so the merchant stocked up in advance and prepared for a big campaign, but the registration ultimately failed. The reason was that the data caliber used on the platform staff side was inconsistent with the one used on the merchant side, so the system judged that the merchant's data did not meet the standard. The problem was eventually solved through coordination, but the essence behind it is: standards.
Working overtime to produce reports, and then being criticized for those reports, became the norm. Fetching the numbers usually took 2-3 hours, and checking discrepancies afterwards took a lot of effort, often 1-2 days at a time. Even then, the final report could cause embarrassment because of caliber differences and data quality issues, and wrong data could even lead to wrong decisions. The essence behind this is: quality.
In addition to these typical scenarios, Alibaba also experienced explosive growth in data volume as its business grew. Data that is neither governed nor managed means that the cost of data storage and computing keeps rising. Cost is therefore also one of the difficulties faced in the big data field.
Exploration kept moving forward. Determined to overcome these difficulties in practice, Alibaba carried out B2B business data construction, e-commerce business data construction, and Alibaba business data construction. Throughout this process of exploring and accumulating experience, it used more systematic data construction to improve data quality, reduce the risk of rebuilding data, and improve the efficiency of data services. After nearly ten years of refinement in practice, Alibaba distilled the OneData methodology for big data construction (OneModel + OneID + OneService). OneModel builds and manages data in a unified way by breaking the work down into data system architecture, data element specification definitions, and data indicator structure. OneID establishes methods for constructing entity objects, object-related behavioral data, and labels, turning the core business elements of the enterprise into assets. OneService builds unified thematic data units on top of data assets, lets data APIs be configured and built, and provides API services, which makes data assets easier to consume and increases their value.
Overcome pain points and create leading big data capabilities
As global digitalization accelerates, enterprises face fiercer market competition, and the dilemmas they encounter in their digital-intelligence transformation are the same pains Alibaba went through early on. As a result, Alibaba Cloud Data Center came into being, working with companies from all walks of life to solve their most prominent data problems:
● Data standard problems: Chimney-style (siloed) development and support for local business needs frequently produce indicators with the same name but different calibers. Different business systems have been launched iteratively over time, so problems such as inconsistent codes for the same object attribute are prominent.
● Data quality problems: Repeated construction leads to long task chains, a large number of tasks, tight computing resources, and poor data timeliness. The documents that define calibers are disconnected from the development code that implements them, so the risk to data accuracy is high.
● Demand response problems: Chimney-style development has long cycles and low efficiency, and application-oriented services are insufficient. As a result, the business gets slow responses and is dissatisfied, while technical staff feel that nothing reusable accumulates and that they are not growing. People who understand both business and data are scarce, so understanding requirements and implementing them involves a lot of communication, and service efficiency is poor.
● Cost and resource problems: The repeated construction caused by chimney-style development wastes technical resources, and bringing things online and taking them offline becomes harder. Changes in source systems or in the business cannot be reflected in the data in time. In addition, non-standard data makes R&D and maintenance more difficult, and a large amount of useless computing and storage wastes resources.
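The "same name, different caliber" problem in the first bullet can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: `MetricDefinition`, `MetricRegistry`, and `CaliberConflict` are hypothetical names invented here, not part of Dataphin. The idea is that each indicator is defined once in a shared registry, and a second definition with the same name but different logic is rejected instead of silently coexisting.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricDefinition:
    """One agreed caliber (definition) of an indicator. All fields are hypothetical."""
    name: str
    expression: str       # how the value is computed
    filters: tuple = ()   # conditions that narrow the population


class CaliberConflict(Exception):
    """Raised when an indicator name is reused with a different definition."""


class MetricRegistry:
    """Keeps a single definition per indicator name."""

    def __init__(self):
        self._metrics = {}

    def register(self, metric):
        existing = self._metrics.get(metric.name)
        # Re-registering an identical definition is harmless; a different one is a conflict.
        if existing is not None and existing != metric:
            raise CaliberConflict(
                f"'{metric.name}' is already defined as {existing}, got {metric}"
            )
        self._metrics[metric.name] = metric
        return metric

    def get(self, name):
        return self._metrics[name]


# Usage: two teams try to define "paid_orders" with different filters.
registry = MetricRegistry()
registry.register(
    MetricDefinition("paid_orders", "COUNT(order_id)", ("status = 'PAID'",))
)

conflict_message = None
try:
    registry.register(
        MetricDefinition(
            "paid_orders", "COUNT(order_id)", ("status IN ('PAID', 'SHIPPED')",)
        )
    )
except CaliberConflict as err:
    conflict_message = str(err)
```

In this sketch the second registration fails, so the disagreement surfaces when the indicator is defined rather than later, when two reports show different numbers under the same name.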
The OneData methodology proposed by Alibaba helps companies clarify how to manage the full life cycle of their data. It is built into the product Dataphin (intelligent data construction and management), which provides services to companies through Alibaba Cloud. In addition to the data integration, development, release, scheduling, and operation and maintenance capabilities involved in the full chain of big data processing, Dataphin also provides data specification definition, logical model definition, automatic code generation, and thematic data service capabilities, so that data construction can be completed efficiently.
Dataphin core product modules
Since its launch in 2018, Dataphin has grown into a complete product. It has gone through multiple rounds of major version upgrades, and its core capability modules are now clearly defined.
1. Environment adaptation
The bottom layer is Dataphin's environment adaptation capability. Dataphin supports different cloud environments and offers options for customers of different sizes and with different deployment requirements, including public cloud multi-tenant, public cloud VPC, private cloud enterprise and agile editions, and on-premises IDC deployment.
2. Engine support
On top of the cloud environment, Dataphin supports different computing engines depending on the environment. Offline computing engines include Alibaba Cloud MaxCompute. Hadoop ecosystem engines include Alibaba Cloud E-MapReduce, CDH5, and CDH6, with support for FusionInsight, CDP, and others coming. For real-time computing, Alibaba Cloud Blink and Flink VVP are supported, and the open source version of Flink will also be supported soon.
3. Data construction
On top of the different cloud environments and computing engines, Dataphin provides the data integration, development, release, scheduling, and operation and maintenance capabilities involved in the full chain of big data processing. It also provides data construction capabilities such as data specification definition, logical model definition, automatic code generation, and theme-based query.
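As a rough illustration of how a specification definition could drive automatic code generation, the sketch below describes a derived indicator as structured data and renders a query from it. Everything here is an assumption made for the example: `DerivedMetricSpec`, `generate_sql`, the table and column names, and the SQL template are hypothetical and do not represent Dataphin's actual model or the code it generates.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class DerivedMetricSpec:
    """A derived indicator described as data instead of hand-written SQL (hypothetical)."""
    name: str
    aggregate: str          # e.g. "SUM(pay_amount)"
    source_table: str
    date_column: str
    period_days: int        # length of the time window, including the end date
    modifiers: tuple = ()   # extra filter conditions
    group_by: tuple = ()    # dimensions to group by


def generate_sql(spec, as_of):
    """Render a query for the window that ends on `as_of`.

    Values are interpolated directly, which is acceptable only because the spec
    is trusted, developer-authored input in this sketch.
    """
    start = as_of - timedelta(days=spec.period_days - 1)
    select_cols = list(spec.group_by) + [f"{spec.aggregate} AS {spec.name}"]
    conditions = [
        f"{spec.date_column} BETWEEN '{start.isoformat()}' AND '{as_of.isoformat()}'"
    ] + list(spec.modifiers)

    sql = (
        f"SELECT {', '.join(select_cols)}\n"
        f"FROM {spec.source_table}\n"
        "WHERE " + "\n  AND ".join(conditions)
    )
    if spec.group_by:
        sql += "\nGROUP BY " + ", ".join(spec.group_by)
    return sql


# Usage: payment amount over the last 7 days, app channel only, per seller.
spec = DerivedMetricSpec(
    name="pay_amount_7d",
    aggregate="SUM(pay_amount)",
    source_table="fact_order",
    date_column="pay_date",
    period_days=7,
    modifiers=("channel = 'app'",),
    group_by=("seller_id",),
)
sql = generate_sql(spec, date(2021, 6, 30))
print(sql)
```

Because the caliber lives in the specification rather than in each developer's SQL, the definition document and the executed code cannot drift apart in this sketch, which is the disconnect described in the data quality bullet earlier.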
4. Asset
Dataphin provides the supporting asset management capabilities: an asset map, asset lineage, asset quality management and monitoring, and resource cost management that improves efficiency. It also provides configurable R&D and management of asset services, so that data assets can quickly serve the business and be fed back into it.