How can cloud-native data applications be accelerated? This open source project is drawing wide attention

On September 17, 2021, at the "2021 OSCAR Open Source Industry Conference", co-hosted by the China Academy of Information and Communications Technology (CAICT) and the China Communications Standards Association (CCSA), the open source project Fluid won the "OSCAR Pinnacle Open Source Project and Open Source Community" award. Fluid was co-initiated by the Alibaba Cloud Native team together with Nanjing University and the Alluxio community. At the same conference, Associate Researcher Gu Rong of Nanjing University's PASALab, a co-initiator of Fluid and chair of its community operations, was named an "Open Source Figure".

Fluid was officially open sourced in September 2020. As many who follow the project know, Fluid is at its core a cloud-native data orchestration and acceleration system. It became a CNCF Sandbox project in May 2021, helping the industry fill in an important piece of the cloud-native AI landscape.

Within just one year, Fluid has received two important recognitions from the open source community at once, a sign that the intersection of cloud native and AI is drawing wide attention. What is the significance and value of this? Taking Fluid as a window onto the bigger picture, we share our views based on its background and real-world practice.

Cloud native + AI: the twin engines of enterprise digital innovation

2021 is the first year of the 14th Five-Year Plan. In March 2021, Xinhua News Agency published the full text of the "Outline of the Fourteenth Five-Year Plan for National Economic and Social Development of the People's Republic of China and Long-Range Objectives Through the Year 2035" (hereinafter the "Outline"). As an important action plan for industrial development and technological innovation over the next five years, the Outline contains three keywords that attract special attention: "artificial intelligence", "cloud computing", and "open source", the last of which appears in the plan for the first time.

As the infrastructure of the digital economy, cloud computing has spread into every industry much like water, electricity, and gas, and it is no exaggeration to say it quietly nourishes everything it touches. In recent years, cloud-native technologies represented by containers, microservices, and DevOps have fully unleashed the service capabilities of the cloud, made infrastructure more agile, and further raised enterprise productivity, which is why cloud native is called "the shortest path to enterprise digital transformation".

As the main resource carried by information infrastructure, data can be regarded as the "lifeblood" of the new infrastructure. The deepening integration of AI and cloud computing also places new demands on computing power and application architecture.

Looking back at the main technical frameworks of the big data and AI era, such as Spark, Hive, and MapReduce, their designs leaned toward data locality in order to reduce data transfer. However, as technical environments and application requirements keep changing, the architecture that separates compute from storage has gradually become mainstream in cloud-native environments, because it balances elastic resource scaling against cost. While this separation improves system elasticity and flexibility, it also creates challenges in computing performance and management efficiency for data-intensive applications such as AI.
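
The trade-off can be made concrete with a toy model. The Python sketch below is not Fluid code: RemoteStore, ReadThroughCache, and train are hypothetical names, and the LRU eviction policy and data sizes are assumptions chosen only for illustration. It counts how many reads reach remote storage when a training job makes several passes over the same data, first with compute reading storage directly and then through a node-local read-through cache.

```python
from collections import OrderedDict


class RemoteStore:
    """Hypothetical stand-in for remote object storage; counts every network read."""

    def __init__(self, files: dict[str, bytes]):
        self._files = files
        self.remote_reads = 0

    def read(self, key: str) -> bytes:
        self.remote_reads += 1
        return self._files[key]


class ReadThroughCache:
    """Hypothetical node-local LRU cache placed in front of remote storage."""

    def __init__(self, store: RemoteStore, capacity: int):
        self._store = store
        self._capacity = capacity
        self._entries = OrderedDict()

    def read(self, key: str) -> bytes:
        if key in self._entries:
            self._entries.move_to_end(key)  # hit: mark as most recently used
            return self._entries[key]
        data = self._store.read(key)  # miss: fetch over the network
        self._entries[key] = data
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return data


def train(reader, keys, epochs: int) -> None:
    """Each epoch reads every sample once, in order."""
    for _ in range(epochs):
        for key in keys:
            reader.read(key)


files = {f"sample-{i}": b"x" for i in range(100)}
keys = list(files)

# Compute reads remote storage directly on every access.
direct = RemoteStore(files)
train(direct, keys, epochs=5)

# Compute reads through a cache large enough to hold the whole data set.
cached_store = RemoteStore(files)
train(ReadThroughCache(cached_store, capacity=100), keys, epochs=5)

print(direct.remote_reads, cached_store.remote_reads)  # 500 100
```

With the capacity equal to the data set size, only the first epoch goes over the network. If the capacity were even one entry smaller, this sequential scan would evict each entry before it is reused and every read would miss, which is why cache capacity matters for repeated-pass workloads like this.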

Running such applications on existing cloud-native orchestration frameworks brings several pain points: high data access latency, difficulty in jointly analyzing multiple data sources, and complex application data pipelines. To address them, the Alibaba Cloud Native team, Nanjing University, and the Alluxio community jointly initiated and open sourced Fluid, a cloud-native data orchestration and acceleration system. Fluid was officially accepted by CNCF as a Sandbox project in May 2021, with the aim of accelerating data-intensive applications and fully embracing cloud native.

Core functions:

Fluid introduces a series of technical innovations in the co-orchestration of cloud-native applications and data, in scheduling optimization, and in data caching. Its core functions include:

Storage-agnostic data object, Dataset: a Custom Resource Definition (CRD) provides a unified abstraction for defining and managing data held in different storage systems, with support for observability and elastic scaling.
Distributed caching to accelerate dataset reads and writes: distributed cache engines are customized and managed by extending the CacheRuntime object. Alluxio and JindoFS are currently supported natively as cache engines.
Intelligent data orchestration built on container scheduling: Kubernetes' container scheduling and scale-out/scale-in capabilities are used to orchestrate data caches intelligently.
Co-scheduling of datasets and applications: the Kubernetes scheduler is extended to be aware of dataset cache information, so applications are scheduled close to their cached data and benefit from fast local reads and writes.
Standard access interface: datasets are accessed through the standard Kubernetes storage interface, PersistentVolumeClaim (PVC), for seamless compatibility with cloud-native applications.
Scenario-oriented performance tuning: for workloads such as deep learning and batch data processing, Fluid provides dataset warm-up, metadata management optimization, small-file I/O optimization, automatic elastic scaling, and other techniques to improve job efficiency across the board.
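
As a rough illustration of how the Dataset, CacheRuntime, warm-up, and PVC pieces fit together, the Python sketch below builds Kubernetes manifests as JSON. The resource kinds and field names follow Fluid's data.fluid.io/v1alpha1 examples as we understand them (Dataset, AlluxioRuntime, DataLoad), but they should be checked against the documentation for the installed Fluid version. The dataset name, namespace, mount point URL, replica count, cache quota, Pod name, and container image are hypothetical placeholders.

```python
import json

# Hypothetical placeholders used only for illustration.
DATASET_NAME = "demo"
NAMESPACE = "default"
MOUNT_POINT = "https://example.com/datasets/demo/"  # replace with a real storage location

# API group/version as used in Fluid's examples; verify for your Fluid release.
FLUID_API = "data.fluid.io/v1alpha1"


def dataset(name: str, mount_point: str) -> dict:
    """Dataset: storage-agnostic description of where the data lives."""
    return {
        "apiVersion": FLUID_API,
        "kind": "Dataset",
        "metadata": {"name": name, "namespace": NAMESPACE},
        "spec": {"mounts": [{"mountPoint": mount_point, "name": name}]},
    }


def alluxio_runtime(name: str, replicas: int, quota: str) -> dict:
    """AlluxioRuntime: cache engine, bound to the Dataset that has the same name."""
    return {
        "apiVersion": FLUID_API,
        "kind": "AlluxioRuntime",
        "metadata": {"name": name, "namespace": NAMESPACE},
        "spec": {
            "replicas": replicas,
            "tieredstore": {
                "levels": [{
                    "mediumtype": "MEM",  # cache in memory
                    "path": "/dev/shm",
                    "quota": quota,
                    "high": "0.95",  # eviction watermarks
                    "low": "0.7",
                }]
            },
        },
    }


def data_load(name: str) -> dict:
    """DataLoad: warms the cache before the application starts reading."""
    return {
        "apiVersion": FLUID_API,
        "kind": "DataLoad",
        "metadata": {"name": f"{name}-warmup", "namespace": NAMESPACE},
        "spec": {"dataset": {"name": name, "namespace": NAMESPACE}},
    }


def app_pod(dataset_name: str) -> dict:
    """Ordinary Pod that reads the Dataset through a standard PVC."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "demo-app", "namespace": NAMESPACE},
        "spec": {
            "containers": [{
                "name": "app",
                "image": "busybox",
                "command": ["sh", "-c", "ls /data && sleep 3600"],
                "volumeMounts": [{"mountPath": "/data", "name": "data-vol"}],
            }],
            "volumes": [{
                "name": "data-vol",
                # Fluid exposes a bound Dataset as a PVC with the Dataset's name.
                "persistentVolumeClaim": {"claimName": dataset_name},
            }],
        },
    }


manifest = {
    "apiVersion": "v1",
    "kind": "List",
    "items": [
        dataset(DATASET_NAME, MOUNT_POINT),
        alluxio_runtime(DATASET_NAME, replicas=2, quota="2Gi"),
        data_load(DATASET_NAME),
        app_pod(DATASET_NAME),
    ],
}

if __name__ == "__main__":
    print(json.dumps(manifest, indent=2))
```

On a cluster with Fluid installed, the printed output could be piped to `kubectl apply -f -`. The application Pod only references an ordinary PVC, which is what lets unmodified cloud-native applications consume cached data without knowing which storage system or cache engine sits behind it.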

Open source has become an important choice for cloud-native AI applications in production

At the conference, He Baohong, director of CAICT's Institute of Cloud Computing and Big Data, Dai Xiaohui, executive deputy secretary-general and vice chairman of the China Communications Standards Association, and other guests shared key views. Open source, they said, is both a new mode of production and a new mode of delivery for the software industry, and after more than 20 years of development it has matured. It fully mobilizes individual initiative, brings ideas together through community collaboration, stimulates technological innovation, leads the development of a new generation of general-purpose technologies, and builds a new model of cooperation. Open code, open rules, and open processes create transparency, and code inspection naturally forms a security front line. Together these ease the concerns of enterprises and individuals about participating and establish a mechanism of trust, which is why open source has become an important choice for enterprises building information systems.

These views have been fully borne out in the Fluid open source community. From the day the project was founded, its co-founders have been committed to helping cloud-native infrastructure embrace data-intensive applications by combining original academic research with industrial practice. In the spirit of open source, they work with the community to build and promote a unified interface for application usage and data management on the Kubernetes platform.

In just one year since it was officially open sourced, Fluid has grown rapidly with the community's help, attracting experts and engineers from companies such as China Telecom, Weibo, BOSS Zhipin, 4Paradigm, and Unisound, who have contributed a great deal of development work. Weibo, China Telecom, Haomo Zhixing, and many other large, well-known IT and Internet companies have successfully applied Fluid to developing and deploying data-intensive applications in production, significantly improving resource utilization and application performance.

Fluid's open source practice has not only earned recognition across many sectors, but also offers enterprises reliable experience and methods for developing and deploying data-intensive applications on the cloud in a cloud-native way, accelerating data circulation, collection, processing, and value mining, and improving application production efficiency.

As a platform for running data-intensive applications that is fully compatible with the native Kubernetes ecosystem, Fluid will evolve toward a more elastic, intelligent, and extensible architecture to keep improving the experience of developers and users. Going forward, Fluid will continue to work side by side with the community and the wider ecosystem, promoting the adoption of cloud-native technologies in AI and other fields and expanding the boundaries of cloud native together with developers worldwide.

Visit https://github.com/fluid-cloudnative/fluid to view the Fluid open source project's GitHub homepage!


Alibaba Cloud Native
