Hello everyone! JuiceFS v0.17 was released on schedule, just in time for the National Day holiday. It is our second release of fall 2021. Let's get straight to what's new.
This update includes more than 80 commits, and 9 members of the JuiceFS community contributed code on GitHub. We sincerely thank every contributor, and we welcome you to join the JuiceFS open source community to contribute code or documentation, or to discuss ideas.
Passed the LTP run of 1,270 tests, with better compatibility on Linux
The latest version of JuiceFS has been further optimized for Linux: it improves support for additional parameters of rename and setxattr, and it passed an LTP run of 1,270 tests.
LTP (Linux Test Project) is a project jointly developed and maintained by IBM, Cisco, and many other companies. It aims to provide the open source community with a test suite for verifying the reliability and stability of Linux, and it includes various tools for testing the Linux kernel and related features.
Test results:
Testcase        Result   Exit Value
--------        ------   ----------
fcntl17         FAIL     7
fcntl17_64      FAIL     7
getxattr05      CONF     32
ioctl_loop05    FAIL     4
ioctl_ns07      FAIL     1
lseek11         CONF     32
open14          CONF     32
openat03        CONF     32
setxattr03      FAIL     6
-----------------------------------------------
Total Tests: 1270
Total Skipped Tests: 4
Total Failures: 5
Kernel Version: 5.4.0-1029-aws
Machine Architecture: x86_64
The skipped and failed tests are mainly due to a few unsupported features. For details, see this document.
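As a quick sanity check, the totals in the report can be recomputed from the listed rows. The sketch below is only illustrative: it assumes the listing contains every non-passing test case, and that CONF marks a skipped test, so the number of passing tests is derived rather than taken from the report.

```python
# Result rows copied from the LTP report above.
REPORT = """\
fcntl17 FAIL 7
fcntl17_64 FAIL 7
getxattr05 CONF 32
ioctl_loop05 FAIL 4
ioctl_ns07 FAIL 1
lseek11 CONF 32
open14 CONF 32
openat03 CONF 32
setxattr03 FAIL 6
"""

TOTAL_TESTS = 1270  # taken from the "Total Tests" line of the report


def tally(report: str) -> dict:
    """Count test cases per result code (FAIL, CONF, ...)."""
    counts = {}
    for line in report.splitlines():
        if not line.strip():
            continue
        _name, result, _exit_value = line.split()
        counts[result] = counts.get(result, 0) + 1
    return counts


counts = tally(REPORT)
failed = counts.get("FAIL", 0)
skipped = counts.get("CONF", 0)
# Assumption: every test not listed above passed.
passed = TOTAL_TESTS - failed - skipped
print(f"failed={failed} skipped={skipped} passed={passed}")
```

The recomputed counts match the "Total Failures" and "Total Skipped Tests" lines of the report.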
Optimize the performance of storing temporary data
To meet Spark's need to store temporary data such as shuffle files, community contributor William Zhu (@allwefantasy) contributed a delayed upload feature to JuiceFS. It lets JuiceFS write data to the local cache disk first. If the data is deleted shortly afterward, it never has to be written to object storage, which gives read and write performance close to that of a local disk. When a large amount of data is written, it is uploaded to object storage automatically to free local disk space, so there is no need to worry about shuffle data filling the disk.
This feature lets JuiceFS serve as an elastic local disk, providing virtually unlimited storage space and low-latency access for temporary data.
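The idea can be illustrated with a small toy model. This is a sketch under stated assumptions, not the JuiceFS implementation: the class name `StagingCache`, the byte-based capacity limit, and the oldest-first upload order are all hypothetical choices made for the example.

```python
class StagingCache:
    """Toy model of delayed upload: writes land locally and are uploaded lazily."""

    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.local = {}         # key -> bytes staged on the local cache disk
        self.object_store = {}  # stand-in for remote object storage
        self.uploads = 0        # number of blocks actually uploaded

    def _staged_bytes(self) -> int:
        return sum(len(v) for v in self.local.values())

    def write(self, key: str, data: bytes) -> None:
        self.local[key] = data
        # Over capacity: upload the oldest staged blocks to free local space.
        while self._staged_bytes() > self.capacity and self.local:
            oldest = next(iter(self.local))  # dicts keep insertion order
            self.object_store[oldest] = self.local.pop(oldest)
            self.uploads += 1

    def read(self, key: str) -> bytes:
        if key in self.local:
            return self.local[key]
        return self.object_store[key]

    def delete(self, key: str) -> None:
        # A block deleted while still staged never reaches object storage.
        if self.local.pop(key, None) is None:
            self.object_store.pop(key, None)


cache = StagingCache(capacity_bytes=8)
cache.write("shuffle-0", b"aaaa")
cache.delete("shuffle-0")            # deleted before any upload happened
cache.write("shuffle-1", b"bbbbbb")
cache.write("shuffle-2", b"cccccc")  # 12 bytes staged > 8, so shuffle-1 is uploaded
print(cache.uploads, sorted(cache.object_store))
```

In this model the short-lived block costs no upload at all, while sustained writes spill to the object store instead of filling the local disk.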
To further improve performance, a new metadata engine (MemKV) that runs in the client's memory has been added. Like other metadata engines, MemKV stores file system metadata, but it is not persistent: once the client is unmounted, the metadata in MemKV is released. Because MemKV runs entirely in memory, it has a clear performance advantage and is well suited to temporary file storage.
TiKV metadata engine improves performance by 5 times in Hadoop scenarios
The JuiceFS Java client resolves paths frequently. The Redis engine resolves multi-level paths on the server side through Lua, whereas the SQL and TiKV engines still need multiple metadata requests to resolve a single path. This has a large performance impact, especially when the path is deep.
To solve this problem, this update introduces a metadata caching mechanism in the JuiceFS Hadoop SDK client, similar to that of the Linux kernel, which controls the expiration time of directories, files, and attributes through parameters. It can be enabled as follows:
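To make the cost concrete, here is a minimal sketch of component-by-component path resolution against a metadata store that counts requests. The names `CountingMeta` and `resolve`, the `(parent inode, name)` key layout, and the root inode number are illustrative assumptions, not the actual JuiceFS data model.

```python
class CountingMeta:
    """Toy metadata store that counts lookup requests."""

    def __init__(self, tree: dict):
        self.tree = tree    # (parent inode, name) -> child inode
        self.requests = 0   # one per lookup, i.e. one round trip each

    def lookup(self, parent: int, name: str) -> int:
        self.requests += 1
        return self.tree[(parent, name)]


def resolve(meta: CountingMeta, path: str, root: int = 1) -> int:
    """Resolve a path one component at a time, as a client without caching would."""
    inode = root
    for name in (part for part in path.split("/") if part):
        inode = meta.lookup(inode, name)
    return inode


# Build a 9-level path /d1/d2/.../d9 (hypothetical layout).
tree = {(i, f"d{i}"): i + 1 for i in range(1, 10)}
meta = CountingMeta(tree)
inode = resolve(meta, "/" + "/".join(f"d{i}" for i in range(1, 10)))
print(inode, meta.requests)
```

Each level of the path costs one metadata request in this model, so a 9-level path needs 9 requests, whereas server-side resolution would need a single one.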
<property>
  <name>juicefs.attr-cache</name>
  <value>3</value>
</property>
<property>
  <name>juicefs.entry-cache</name>
  <value>3</value>
</property>
<property>
  <name>juicefs.dir-entry-cache</name>
  <value>3</value>
</property>
The following is a metadata performance test on a 9-level-deep directory. Enabling metadata caching greatly improves the performance of metadata operations. (Values represent operation latency; smaller is better.)
Note, however, that enabling the metadata cache affects consistency between clients (eventual consistency within a bounded time window). For example, after one client deletes a file, other nodes may still believe the file exists because their cache has not yet expired. It is therefore generally recommended to use this feature in query scenarios. For mixed read-write scenarios, it is recommended to enable caching of directories and attributes and to disable caching of file entries.
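The consistency trade-off can be reproduced with a minimal time-based cache. This is a sketch, not the Hadoop SDK's code: `TTLCache` and the fake clock are hypothetical helpers, and the expiration value of 3 is assumed to be in seconds, matching the configuration above.

```python
class TTLCache:
    """Minimal time-based cache; entries expire `ttl` seconds after being stored."""

    def __init__(self, ttl: float, clock):
        self.ttl = ttl
        self.clock = clock  # callable returning the current time in seconds
        self.items = {}     # key -> (value, expiry time)

    def get(self, key, load):
        now = self.clock()
        hit = self.items.get(key)
        if hit is not None and hit[1] > now:
            return hit[0]   # served from cache; may be stale
        value = load(key)   # cache miss: one metadata request
        self.items[key] = (value, now + self.ttl)
        return value


now = [0.0]                                      # fake clock for a deterministic run
backend = {"/data/part-0": "exists"}             # stand-in for the shared metadata engine
cache_b = TTLCache(ttl=3, clock=lambda: now[0])  # client B's local cache


def load(key):
    return backend.get(key, "missing")


first = cache_b.get("/data/part-0", load)  # B caches "exists"
del backend["/data/part-0"]                # client A deletes the file
now[0] = 1.0
stale = cache_b.get("/data/part-0", load)  # within the window: B still sees "exists"
now[0] = 4.0
fresh = cache_b.get("/data/part-0", load)  # after expiry: B sees "missing"
print(first, stale, fresh)
```

Client B only notices the deletion once its cached entry expires, which is why a short expiration time bounds the inconsistency window without removing it.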
One-minute hands-on performance test, with results clear at a glance
We further refined the output of the built-in performance test tool, bench. While keeping it simple and intuitive, it now highlights key figures: if a metric falls outside the normal range, it is displayed in yellow or even red and deserves special attention.
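The color coding can be thought of as a simple threshold check. The sketch below is purely illustrative: the function `classify` and every threshold in it are hypothetical and do not reflect the ranges the bench tool actually uses.

```python
def classify(value: float, warn: float, critical: float,
             higher_is_better: bool = True) -> str:
    """Map a metric to a display color using two hypothetical thresholds."""
    if not higher_is_better:
        # Flip signs so the same comparisons work for latency-style metrics.
        value, warn, critical = -value, -warn, -critical
    if value >= warn:
        return "green"
    if value >= critical:
        return "yellow"
    return "red"


# Hypothetical thresholds, chosen only for the example.
print(classify(120.0, warn=100.0, critical=50.0))                       # throughput, MiB/s
print(classify(8.0, warn=5.0, critical=10.0, higher_is_better=False))   # latency, ms
print(classify(20.0, warn=5.0, critical=10.0, higher_is_better=False))  # latency, ms
```

With these made-up limits, the three calls return green, yellow, and red respectively.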
For more information about the new version of JuiceFS, visit the GitHub project homepage.
Recommended reading:
How to use JuiceFS to speed up AI model training by 7 times
How to use JuiceFS performance tools for analysis and tuning