
The first anniversary of open source

Started in 2017, JuiceFS is a cloud-native distributed file system designed to help enterprises solve the challenges they face in multi-cloud, cross-cloud, and hybrid cloud environments: data security and protection, big data architecture upgrades, access to massive numbers of small files, standard storage for Kubernetes, and more. JuiceFS is fully compatible with the POSIX, HDFS, and S3 access protocols, provides a Kubernetes CSI driver, and is available as a fully managed service on public clouds worldwide. To build software that developers cannot put down, we open sourced JuiceFS on GitHub on January 11, 2021.

Today marks the first anniversary of JuiceFS going open source!

One year ago today, we open sourced JuiceFS on GitHub. The original intention was simple: we hoped that open source would let more developers discover, understand, and use JuiceFS. After all, the greatest value of software lies in being used. With JuiceFS open source, users no longer have to worry about a black-box cloud service. Users can download the code and explore what JuiceFS can do; developers can read the code, get to know it from the ground up, come to trust it, and even take part in building it. We hope to create a community culture of mutual respect, in which people not only use JuiceFS but also share new scenarios and ways of using it, discuss its engineering design, and help shape its future direction.

The response from developers exceeded our expectations. In its first week, the open source release appeared on GitHub Trending, Hacker News, InfoQ, and other developer-focused media platforms.

A year on, JuiceFS has made great progress in both community and product. We know how hard it is to persevere, and we will keep forging ahead with an open and connected mindset.

The product is fully upgraded and more open

When JuiceFS was first open sourced, Redis was the only choice of metadata engine. Because Redis keeps its data in memory, it poses challenges for data reliability and scalability. We therefore made the metadata engine code pluggable and added support for relational databases and transactional KV stores such as TiKV, which addresses the reliability and scalability issues and gives users more choices.
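To make the idea of a pluggable metadata engine concrete, here is a minimal sketch in Python. It is not the JuiceFS implementation: the `MetaEngine` interface, the `InMemoryEngine` stand-in, and the `register`/`open_engine` helpers are all hypothetical names invented for illustration. It only shows the general pattern of selecting a backend from the scheme of a metadata URL.

```python
from abc import ABC, abstractmethod
from urllib.parse import urlparse


class MetaEngine(ABC):
    """Hypothetical minimal interface that every metadata backend implements."""

    @abstractmethod
    def set_attr(self, inode: int, attr: dict) -> None: ...

    @abstractmethod
    def get_attr(self, inode: int) -> dict: ...


class InMemoryEngine(MetaEngine):
    """Stand-in backend; a real one would talk to Redis, a SQL database, or TiKV."""

    def __init__(self, url: str):
        self.url = url
        self._attrs = {}

    def set_attr(self, inode: int, attr: dict) -> None:
        self._attrs[inode] = dict(attr)

    def get_attr(self, inode: int) -> dict:
        return dict(self._attrs[inode])


# Registry mapping a URL scheme to an engine class (entries are illustrative).
ENGINES = {}


def register(scheme: str, cls) -> None:
    ENGINES[scheme] = cls


def open_engine(url: str) -> MetaEngine:
    """Pick the backend from the URL scheme, e.g. 'redis://...' or 'tikv://...'."""
    scheme = urlparse(url).scheme
    if scheme not in ENGINES:
        raise ValueError(f"unsupported metadata engine: {scheme!r}")
    return ENGINES[scheme](url)


# For this sketch every scheme is served by the same in-memory stand-in.
for name in ("redis", "mysql", "tikv"):
    register(name, InMemoryEngine)
```

With this shape, the rest of the file system only depends on the interface, so adding a new backend means registering one more class rather than changing callers.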

For the data persistence layer, JuiceFS supports nearly 40 kinds of object storage, covering most of the common options deployed in public cloud, edge cloud, private cloud, and other environments. If we have missed one, please open an issue on GitHub and we will support it as soon as possible. Broadening the JuiceFS ecosystem and making JuiceFS more open are goals we pursue unswervingly.

At the beginning, JuiceFS supported only the most widely used interface, the POSIX API. Since then it has added support for HDFS, the S3 API, Kubernetes CSI, and the Windows operating system, and we will support more flexible access methods in the future. These protocols connect the dots: they weave the data islands scattered across an enterprise into a network, help enterprises open up data between heterogeneous business systems, integrate different technology stacks, connect multiple clouds, and help customers build a more open data storage platform.

JuiceFS also provides metadata backup and import, which gives users extra protection when accidents happen. Backups are written in JSON, which makes the data human-readable and allows metadata to be moved between different metadata engines. Finally, JuiceFS provides a "recycle bin" feature from which accidentally deleted data can be recovered.
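The value of a JSON backup is that it does not depend on any one engine's internal layout. The sketch below illustrates that round trip in Python. The field names (`version`, `entries`, `inode`, `size`, `mode`) and the `dump_meta`/`load_meta` helpers are assumptions made up for this example and do not reflect the real JuiceFS dump format.

```python
import json


def dump_meta(entries: dict) -> str:
    """Serialize an engine-independent metadata tree to readable JSON text."""
    doc = {"version": 1, "entries": entries}  # hypothetical layout
    return json.dumps(doc, indent=2, sort_keys=True)


def load_meta(text: str) -> dict:
    """Parse a dump produced by dump_meta and return the entries."""
    doc = json.loads(text)
    if doc.get("version") != 1:
        raise ValueError("unsupported dump version")
    return doc["entries"]


# Metadata as it might be exported from one engine (illustrative values).
source_entries = {
    "/data/a.txt": {"inode": 2, "size": 1024, "mode": 0o644},
    "/data/b.bin": {"inode": 3, "size": 4096, "mode": 0o600},
}

backup_text = dump_meta(source_entries)   # human-readable backup
restored = load_meta(backup_text)         # could be loaded into another engine
```

Because the backup is plain text, it can be inspected or diffed with ordinary tools before it is imported into a different engine.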

Beyond our continued investment in product openness, we also focus on the openness and ease of use of the documentation. We deeply understand that documentation is an important link between users and the product. Since JuiceFS was open sourced, we have held to the principle of delivering high-quality technology and high-quality documentation side by side.

In 2021, the documentation went through three complete iterations, moving from "professional" to "universal" to "experiential". The work of improving it is still ongoing, with the aim that new users can get started right away and existing users can rely on it with peace of mind. Beyond documentation, JuiceFS has kept its data formats and communication protocols compatible throughout rapid version iteration, preserving forward compatibility so that users can upgrade smoothly.

In the year since JuiceFS was open sourced, the product has changed tremendously, which has made us even more determined to stay on the open source path. We believe it is the right one, because only an open ecosystem has real vitality.

Rich use cases, a co-built ecosystem

In just one year, more than 4,400 developers have given JuiceFS a thumbs-up. They come not only from China but also from Europe, America, Africa, and the Middle East. Although the COVID-19 pandemic cut off our physical connections, the open source community brought us together to contribute to JuiceFS throughout 2021.

Over the past year, more than 40 contributors completed more than 800 pull requests: 800 connections made through GitHub and the developer community. Backed by these 800 connections, JuiceFS released 16 new versions. The community users quietly following JuiceFS behind the scenes have raised the pressure on us, and have also given us a great deal of motivation.

Through WeChat and Slack, we have built user groups with more than 1,500 members, who took part in 9 events. Starting from their own use of JuiceFS, community members gave back 33 pieces of technical content and use-case write-ups. Here, we connect scenarios with users.

A file system is the cornerstone on which all kinds of applications are built. Working with other applications to deliver outstanding performance and a good experience, and so forming an ecosystem, is an important part of the JuiceFS community's work. Over the past year, JuiceFS has gained recognition in several areas and made good progress.

Big data ecosystem

JuiceFS is fully compatible with HDFS and integrates seamlessly with the Hadoop ecosystem. Some customers have replaced HDFS with it to upgrade to an architecture that separates storage from compute.

  • Apache Kylin 4.0 released a solution for building clusters with JuiceFS.
  • By using the data lifecycle features of ClickHouse and Elasticsearch, users can easily implement tiered data storage on JuiceFS, increasing efficiency and reducing costs.

AI ecosystem

Because JuiceFS supports multiple access protocols, it can save a great deal of data migration and scheduling work in business pipelines, and it is fully compatible with mainstream machine learning and deep learning training frameworks.

  • The Megvii technical team contributed the JuiceFS Python SDK, which makes it easier to access JuiceFS data in serverless environments.
  • JuiceFS cache acceleration is the most popular feature in AI training scenarios. PaddlePaddle has integrated JuiceFS into the Paddle Operator to accelerate training.
  • Our partners on the Yunzhisheng team contributed JuiceFSRuntime to the Fluid community.
  • The vector search engine Milvus also released a solution for building distributed clusters based on JuiceFS.
  • The Byzer community has also integrated JuiceFS as a cloud-native file system into their solutions.

Kubernetes ecosystem

JuiceFS is well suited for use as a PV (PersistentVolume), that is, as container-native storage. The community-provided CSI driver and comprehensive documentation are available in the KubeSphere app store, and JuiceFS is just as easy to use in Rancher and in cloud-hosted Kubernetes services.

If you are using JuiceFS, we hope you will share your experience and questions with the JuiceFS community. You will get support and help, and your experience will help many others, which is the value and charm of an open source community.

Multi-industry production environment verification, JuiceFS 1.0 is here

For storage systems, reliability always comes first. JuiceFS takes the innovative approach of storing metadata in mature databases and data in mature object storage, which guarantees reliability from the very beginning. This is the key reason why many technology companies were able to put JuiceFS into production, and run it stably, within half a year of its release. Relying on standard access protocols, JuiceFS uses existing test suites from the open source community to ensure compatibility and reliability, alongside unit tests, stress tests, chaos tests, and performance tests, so that the product can iterate rapidly while every release ships at high quality.

In the year since JuiceFS was open sourced, many companies, including Xiaomi, Shopee, Li Auto, Zhihu, Aerospace Hongtu, and Yaoxin, have deployed JuiceFS in production, where it has been running stably for more than half a year.

  • Xiaomi uses JuiceFS as the storage foundation of its AI platform.
  • Shopee offers JuiceFS as a cloud platform file storage service to its business lines, supporting a variety of business scenarios.
  • Li Auto uses JuiceFS to separate storage from compute in its data warehouse.
  • Zhihu uses JuiceFS to speed up the startup loading of Flink stream computing by 4 times.
  • ....

JuiceFS has been running stably and continuously in the production environments of many Internet and AI companies. It not only reduces costs for customers but also improves the efficiency of data use and shortens the time needed to launch new business. The built-in data protection and encryption also give customers great peace of mind. Over the past year, the number of JuiceFS clusters online each day has grown steadily, from a handful at the start to more than 500 now, and the growth rate remains high. Note that this is only the data we have recorded; we believe there are many more users we have not yet been in contact with.

With support, validation, and continuous feedback from industries at home and abroad, including the Internet, autonomous driving, gene sequencing, financial technology, and intelligent manufacturing, as well as from developers in the community, and after comprehensive evaluation in a wide range of scenarios, the JuiceFS community will release JuiceFS v1.0-beta this week. Community users are welcome to test it and give us feedback; v1.0-GA will follow once we have made improvements based on that feedback.

Rethink open source licensing

Back when it was first released in 2021, JuiceFS supported accessing data only through POSIX after mounting. Applications accessed data through the kernel and did not need to deal with JuiceFS directly, so they were not affected by GPL-family licenses. At that time we therefore adopted AGPL v3, the GPL-family license most widely used in the file storage industry.

As JuiceFS kept iterating, it introduced more access protocols and SDKs (an S3-compatible HTTP protocol and an HDFS-compatible Java SDK), and the license began to affect users who develop commercial products on top of them. At the same time, some open source communities and developers want to integrate JuiceFS into their own projects as a storage foundation, but AGPL v3 is not very compatible with other open source licenses (such as the Apache license). This has prevented more people from enjoying the conveniences JuiceFS provides, such as multi-protocol interoperability and an efficient caching system.

So, in keeping with our original purpose of building the storage product developers love most, the Juicedata team has decided to change the license to Apache 2.0 starting with JuiceFS v1.0.

Redefining file storage, with a promising future

JuiceFS v1.0 is an important milestone. It means that JuiceFS can be used with confidence in production environments across a variety of scenarios, and that it is ready to take on more, and tougher, challenges. After that, the community will keep increasing its investment and bringing more valuable features to everyone, such as the most requested ones: quota management, snapshots, and support for more metadata engines.

With the rapid growth of data volumes, distributed file systems are becoming more and more important. Traditional distributed file systems build a complete stack from the bottom up, which makes them very complex and difficult to master. JuiceFS takes the innovative approach of separating metadata storage from data storage and reusing existing infrastructure, such as mature databases and object storage, as much as possible. Its access protocols are also compatible with all mainstream interfaces. Together, these choices significantly reduce the complexity of a distributed file system and the barrier to using one, and redefine how distributed file systems are built: a single system, combined with different components, can meet unstructured storage needs of different scales and scenarios. At the same time, JuiceFS is cloud-native by design, connects well with the cloud ecosystem, is in line with the general trend of cloud storage development, and has very broad application prospects.
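The separation of metadata from data can be pictured with a toy model. The Python sketch below is only an illustration under simplifying assumptions and is not how JuiceFS is implemented: `ToyFS`, the tiny `BLOCK_SIZE`, and the block key format are invented here, and two plain dictionaries stand in for the database and the object store.

```python
BLOCK_SIZE = 4  # bytes; deliberately tiny so the example is easy to follow


class ToyFS:
    """Toy file system: file contents go to an object store, bookkeeping to a database."""

    def __init__(self, meta: dict, objects: dict):
        self.meta = meta        # stands in for a database (the metadata engine)
        self.objects = objects  # stands in for an object storage bucket

    def write(self, path: str, data: bytes) -> None:
        keys = []
        # Split the content into fixed-size blocks and store each as one object.
        for offset in range(0, len(data), BLOCK_SIZE):
            key = f"{path}#{offset // BLOCK_SIZE}"
            self.objects[key] = data[offset:offset + BLOCK_SIZE]
            keys.append(key)
        # Only the size and the ordered list of block keys live in the metadata.
        self.meta[path] = {"size": len(data), "blocks": keys}

    def read(self, path: str) -> bytes:
        entry = self.meta[path]
        return b"".join(self.objects[key] for key in entry["blocks"])


meta_db, bucket = {}, {}
fs = ToyFS(meta_db, bucket)
fs.write("/hello.txt", b"hello world")
content = fs.read("/hello.txt")
```

In this picture the two stores can be chosen and scaled independently, which is the property the paragraph above describes.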

Although JuiceFS has done a lot of subtraction and tries to avoid reinventing the wheel, building a mature and reliable storage product still requires a huge engineering investment. Over the past year we have further strengthened our engineering team, and many of the new members joined Juicedata after participating in the JuiceFS community. We welcome more like-minded people to join us and create a new era of distributed file storage.

Developing open source products requires continuous capital investment. The commercial services we have spent 4 years validating are also growing rapidly, providing continuous and reliable financial support for the development of JuiceFS. Open source is our sea of stars, and commercialization protects it.

The road is long and hard, but if we keep walking, we will arrive!

You are welcome to follow our project, Juicedata/JuiceFS! (0ᴗ0✿)

