Currently, the Linux (ex)FAT drivers do not employ any cluster
allocation strategy to keep fragmentation at bay. As a result, when
multiple processes compete for new clusters to expand files on an exfat
filesystem at the same time, the files end up heavily fragmented. HDDs
are impacted the most, but this can also hurt various forms of flash
memory, depending on the underlying technology.

For instance, modern digital cameras produce multiple media files for a
single video stream. If the application does not take fragmentation into
account, or the system is under memory pressure, the kernel ends up
allocating clusters for those files in an interleaved manner.

Demo script:

	for (( i = 0; i < 4; i += 1 )); do
		dd if=/dev/urandom iflag=fullblock bs=1M count=64 of=frag-$i &
	done
	for (( i = 0; i < 4; i += 1 )); do
		wait
	done
	filefrag frag-*

Result - Linux kernel native exfat, async mount:
	780 extents found
	740 extents found
	809 extents found
	712 extents found

Result - Linux kernel native exfat, sync mount:
	1852 extents found
	1836 extents found
	1846 extents found
	1881 extents found

Result - Windows XP:
	3 extents found
	3 extents found
	3 extents found
	2 extents found

The Windows kernel, on the other hand, seems to space out the clusters
of each file regardless of the underlying storage interface or medium.
A similar strategy has to be employed by the Linux fat filesystems for
efficient utilisation of the storage backend.

In the meantime, userspace applications like rsync may use fallocate to
combat this issue.

This patch may introduce regression-like behaviour in some niche
filesystem-agnostic applications that call fallocate and then write to
the file non-sequentially.
Examples:

 - libtorrent's use of posix_fallocate(), when the first fragment
   received from a peer is near the end of the file
 - "Download accelerators" that issue partial content requests
   (HTTP 206) from multiple threads writing to the same file

The delay incurred in such use cases is documented in WinAPI. Patches
that add the ioctl equivalents of the WinAPI function
SetFileValidData() and `fsutil file queryvaliddata ...` will follow.

Signed-off-by: David Timber
---
 fs/exfat/file.c | 41 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 41 insertions(+)

diff --git a/fs/exfat/file.c b/fs/exfat/file.c
index 90cd540afeaa..4ab7e7e90ae6 100644
--- a/fs/exfat/file.c
+++ b/fs/exfat/file.c
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include
 
 #include "exfat_raw.h"
 #include "exfat_fs.h"
@@ -90,6 +91,45 @@ static int exfat_cont_expand(struct inode *inode, loff_t size)
 	return -EIO;
 }
 
+/*
+ * Preallocate space for a file. This implements exfat's fallocate file
+ * operation, which gets called from the sys_fallocate system call. User
+ * space requests len bytes at offset. In contrast to fat, we only support
+ * "mode 0" because, with the valid data length (VDL) field left unchanged,
+ * it is unnecessary to zero out the newly allocated clusters.
+ */
+static long exfat_fallocate(struct file *file, int mode,
+			    loff_t offset, loff_t len)
+{
+	struct inode *inode = file->f_mapping->host;
+	loff_t newsize = offset + len;
+	int err = 0;
+
+	/* No support for other modes */
+	if (mode != 0)
+		return -EOPNOTSUPP;
+
+	/* No support for dir */
+	if (!S_ISREG(inode->i_mode))
+		return -EOPNOTSUPP;
+
+	if (unlikely(exfat_forced_shutdown(inode->i_sb)))
+		return -EIO;
+
+	inode_lock(inode);
+
+	if (newsize <= i_size_read(inode))
+		goto error;
+
+	/* This is just an expanding truncate */
+	err = exfat_cont_expand(inode, newsize);
+
+error:
+	inode_unlock(inode);
+
+	return err;
+}
+
 static bool exfat_allow_set_time(struct mnt_idmap *idmap,
 				 struct exfat_sb_info *sbi, struct inode *inode)
 {
@@ -771,6 +811,7 @@ const struct file_operations exfat_file_operations = {
 	.fsync		= exfat_file_fsync,
 	.splice_read	= exfat_splice_read,
 	.splice_write	= iter_file_splice_write,
+	.fallocate	= exfat_fallocate,
 	.setlease	= generic_setlease,
 };
 
-- 
2.53.0.1.ga224b40d3f.dirty
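
For anyone who wants to exercise the "mode 0" path from user space, the
following is a minimal sketch, not part of the patch. It assumes glibc
on Linux, where posix_fallocate() issues fallocate() with mode 0 and
falls back to emulation if the filesystem returns EOPNOTSUPP. The helper
name write_preallocated() is hypothetical. It reserves the whole file up
front and then fills it sequentially, which is the access pattern the
commit message expects to benefit.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patch): create `path`, reserve
 * `len` bytes with a mode-0 preallocation, then write the file
 * sequentially. Returns 0 on success or a positive errno value.
 */
int write_preallocated(const char *path, off_t len)
{
	char buf[4096];
	off_t done = 0;
	int fd, err;

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return errno;

	/* posix_fallocate() returns the error number directly. */
	err = posix_fallocate(fd, 0, len);
	if (err)
		goto out;

	memset(buf, 'x', sizeof(buf));
	while (done < len) {
		size_t want = sizeof(buf);
		ssize_t n;

		/* Do not write past the preallocated length. */
		if ((off_t)want > len - done)
			want = (size_t)(len - done);

		n = write(fd, buf, want);
		if (n < 0) {
			if (errno == EINTR)
				continue;
			err = errno;
			break;
		}
		done += n;
	}

out:
	if (close(fd) < 0 && !err)
		err = errno;
	return err;
}
```

Calling write_preallocated("frag-0", 64 << 20) from each of four
concurrent processes would mirror the demo script, and filefrag can then
be used to compare extent counts. No measurements are claimed here.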