From: Zhang Yi Currently, when writing back dirty pages to the filesystem with the dioread_nolock feature enabled and when doing DIO, if the area to be written back is part of an unwritten extent, the EXT4_GET_BLOCKS_IO_CREATE_EXT flag is set during block allocation before submitting I/O. The function ext4_split_convert_extents() then attempts to split this extent in advance. This approach is designed to prevents extent splitting and conversion to the written type from failing due to insufficient disk space at the time of I/O completion, which could otherwise result in data loss. However, we already have two mechanisms to ensure successful extent conversion. The first is the EXT4_GET_BLOCKS_METADATA_NOFAIL flag, which is a best effort, it permits the use of 2% of the reserved space or 4,096 blocks in the file system when splitting extents. This flag covers most scenarios where extent splitting might fail. The second is the EXT4_EXT_MAY_ZEROOUT flag, which is also set during extent splitting. If the reserved space is insufficient and splitting fails, it does not retry the allocation. Instead, it directly zeros out the extra part of the extent, thereby avoiding splitting and directly converting the entire extent to the written type. These two mechanisms also exist when I/Os are completed because there is a concurrency window between write-back and fallocate, which may still require us to split extents upon I/O completion. There is no much difference between splitting extents before submitting I/O. Therefore, It seems possible to defer the splitting until I/O completion, it won't increase the risk of I/O failure and data loss. On the contrary, if some I/Os can be merged when I/O completion, it can also reduce unnecessary splitting operations, thereby alleviating the pressure on reserved space. In addition, deferring extent splitting until I/O completion can also simplify the IO submission process and avoid initiating unnecessary journal handles when writing unwritten extents. Signed-off-by: Zhang Yi Reviewed-by: Jan Kara --- fs/ext4/extents.c | 13 +------------ fs/ext4/inode.c | 4 ++-- 2 files changed, 3 insertions(+), 14 deletions(-) diff --git a/fs/ext4/extents.c b/fs/ext4/extents.c index e53959120b04..c98f7c5482b4 100644 --- a/fs/ext4/extents.c +++ b/fs/ext4/extents.c @@ -3787,21 +3787,10 @@ ext4_convert_unwritten_extents_endio(handle_t *handle, struct inode *inode, ext_debug(inode, "logical block %llu, max_blocks %u\n", (unsigned long long)ee_block, ee_len); - /* If extent is larger than requested it is a clear sign that we still - * have some extent state machine issues left. So extent_split is still - * required. - * TODO: Once all related issues will be fixed this situation should be - * illegal. - */ if (ee_block != map->m_lblk || ee_len > map->m_len) { int flags = EXT4_GET_BLOCKS_CONVERT | EXT4_GET_BLOCKS_METADATA_NOFAIL; -#ifdef CONFIG_EXT4_DEBUG - ext4_warning(inode->i_sb, "Inode (%ld) finished: extent logical block %llu," - " len %u; IO logical block %llu, len %u", - inode->i_ino, (unsigned long long)ee_block, ee_len, - (unsigned long long)map->m_lblk, map->m_len); -#endif + path = ext4_split_convert_extents(handle, inode, map, path, flags, NULL); if (IS_ERR(path)) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index bb8165582840..ffde24ff7347 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -2376,7 +2376,7 @@ static int mpage_map_one_extent(handle_t *handle, struct mpage_da_data *mpd) dioread_nolock = ext4_should_dioread_nolock(inode); if (dioread_nolock) - get_blocks_flags |= EXT4_GET_BLOCKS_IO_CREATE_EXT; + get_blocks_flags |= EXT4_GET_BLOCKS_UNWRIT_EXT; err = ext4_map_blocks(handle, inode, map, get_blocks_flags); if (err < 0) @@ -3744,7 +3744,7 @@ static int ext4_iomap_alloc(struct inode *inode, struct ext4_map_blocks *map, else if (EXT4_LBLK_TO_B(inode, map->m_lblk) >= i_size_read(inode)) m_flags = EXT4_GET_BLOCKS_CREATE; else if (ext4_test_inode_flag(inode, EXT4_INODE_EXTENTS)) - m_flags = EXT4_GET_BLOCKS_IO_CREATE_EXT; + m_flags = EXT4_GET_BLOCKS_CREATE_UNWRIT_EXT; if (flags & IOMAP_ATOMIC) ret = ext4_map_blocks_atomic_write(handle, inode, map, m_flags, -- 2.52.0