Source | Alibaba Cloud Native

On April 27, 2021, the Cloud Native Computing Foundation (CNCF) announced that Fluid had been accepted as an official CNCF sandbox project following a vote by its Technical Oversight Committee (TOC). Fluid is a cloud-native data orchestration and acceleration system jointly initiated and open sourced by Nanjing University, Alibaba Cloud, and the Alluxio open source community.

Fluid project address:
https://github.com/fluid-cloudnative/fluid

Project Introduction

In cloud-native environments, separating compute from storage improves system elasticity and flexibility, but it also creates challenges in computing performance and management efficiency for data-intensive applications such as big data and AI. Existing cloud-native orchestration frameworks that run such applications suffer from high data access latency, difficulty in analyzing multiple data sources together, and complex data workflows for applications. Fluid was created to solve these problems.

Fluid system architecture diagram

Fluid is a scalable distributed data orchestration and acceleration system that runs on Kubernetes. Its goal is to provide an efficient support platform for data-intensive applications in cloud-native environments. The project was open sourced in September 2020 and has grown rapidly in just over half a year. It has attracted attention and contributions from experts and engineers in many fields, and it is used at large, well-known IT and Internet companies, including Weibo and China Telecom.

Core Functions

Fluid introduces a series of technical innovations in the co-orchestration of cloud-native applications and data, in scheduling optimization, and in data caching. Its core functions include:

  • Storage-agnostic data object (Dataset): A Custom Resource Definition provides a unified abstraction for defining and managing data held in different storage systems, with support for observability and elastic scaling.
  • Distributed caching to accelerate dataset reads and writes: Distributed data cache engines are customized and managed by extending the CacheRuntime object. The Alluxio and JindoFS cache engines are currently supported natively.
  • Intelligent data orchestration based on container scheduling: Fluid builds on the container scheduling and scale-out/scale-in capabilities of Kubernetes to orchestrate the data cache intelligently.
  • Coordinated scheduling of datasets and applications: Fluid extends the Kubernetes scheduler so that it is aware of dataset cache information and can place applications close to their cached data, taking advantage of the performance of local cache reads and writes.
  • Standard access interface: Datasets are accessed through the standard Kubernetes storage interface, the Persistent Volume Claim, which keeps Fluid seamlessly compatible with cloud-native applications.
  • Scenario-oriented performance tuning: For workloads such as deep learning and batch data processing, Fluid provides dataset warm-up, metadata management optimization, small-file I/O optimization, automatic elastic scaling, and other techniques to improve overall task efficiency.
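To make the Dataset, cache runtime, and Persistent Volume Claim pieces more concrete, the following minimal Python sketch builds the three Kubernetes manifests a user would typically submit. It is an illustration under stated assumptions, not an official example: the `data.fluid.io/v1alpha1` API group and the `mounts`, `replicas`, and `tieredstore` field names reflect the Fluid API as commonly documented and should be checked against the CRDs installed in your cluster; the dataset name, image, and `https://example.com/data/` URL are hypothetical; and the sketch assumes that Fluid exposes each Dataset as a Persistent Volume Claim with the same name.

```python
import json


def dataset(name, mount_point):
    """Dataset manifest: describes where the data lives, not how it is cached."""
    return {
        "apiVersion": "data.fluid.io/v1alpha1",  # assumed API group/version
        "kind": "Dataset",
        "metadata": {"name": name},
        "spec": {"mounts": [{"mountPoint": mount_point, "name": name}]},
    }


def alluxio_runtime(name, replicas, quota):
    """Cache runtime manifest; it shares its name with the Dataset it serves."""
    return {
        "apiVersion": "data.fluid.io/v1alpha1",  # assumed API group/version
        "kind": "AlluxioRuntime",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,  # number of cache workers
            "tieredstore": {
                "levels": [
                    # Cache in memory under /dev/shm, up to `quota` per worker.
                    {"mediumtype": "MEM", "path": "/dev/shm", "quota": quota}
                ]
            },
        },
    }


def pod_using(dataset_name, image):
    """Ordinary Pod that reads the dataset through a standard PVC."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": dataset_name + "-reader"},
        "spec": {
            "containers": [
                {
                    "name": "app",
                    "image": image,
                    "volumeMounts": [{"name": "data", "mountPath": "/data"}],
                }
            ],
            "volumes": [
                {
                    "name": "data",
                    # Assumption: the PVC carries the same name as the Dataset.
                    "persistentVolumeClaim": {"claimName": dataset_name},
                }
            ],
        },
    }


if __name__ == "__main__":
    manifests = [
        dataset("demo", "https://example.com/data/"),  # hypothetical source
        alluxio_runtime("demo", replicas=2, quota="2Gi"),
        pod_using("demo", image="nginx"),  # hypothetical application image
    ]
    # JSON is valid input for `kubectl apply -f`, so no YAML library is needed.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": manifests}, indent=2))
```

The application Pod contains nothing Fluid-specific: it only references a Persistent Volume Claim, which is what allows existing cloud-native applications to use a cached dataset without code changes.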

Looking Ahead

The Fluid open source project aims to accelerate the adoption of cloud-native infrastructure by data-intensive applications. It does so by combining original academic research with practical industry experience, and by working with the open source community to build a unified interface for using and managing data on Kubernetes. The Fluid community currently has five core maintainers from Nanjing University, Alibaba, and Alluxio. Gu Rong, an associate researcher at PASALab, Nanjing University, serves as chair of the open source community. Engineers from companies such as China Telecom, Weibo, BOSS Zhipin, 4Paradigm, and Unisound have also contributed a substantial amount of development work.

As a support platform for data-intensive applications that is fully compatible with the native Kubernetes ecosystem, Fluid will evolve toward a more flexible, intelligent, and extensible architecture to keep improving the experience of developers and users. Fluid will continue to work closely with the community and the wider ecosystem to promote the adoption of cloud-native technologies in big data and AI systems, and to expand the boundaries of cloud native together with developers around the world.

