From: Aditya Prakash Srivastava Syzbot/stress-ng reported an ABBA deadlock in ext4 when exercising concurrent xattr workloads (using the ea_inode mount/format option). The deadlock occurs between the running transaction and the eviction thread: - Task 1 (stress-ng): Holds a reference to a shared mbcache_entry (ce) and calls ext4_xattr_inode_cache_find() -> ext4_iget() to retrieve the corresponding EA inode. Since the EA inode is currently being evicted, ext4_iget() blocks in __wait_on_freeing_inode() waiting for eviction to complete. - Task 2 (eviction thread): Currently evicting the same EA inode in ext4_evict_ea_inode(). It calls mb_cache_entry_wait_unused(oe) which blocks waiting for Task 1 to release the reference to the mbcache_entry. To break this deadlock, implement a new ext4_iget() configuration flag named EXT4_IGET_NOWAIT. When set, perform a non-blocking lookup of the inode via VFS's find_inode_nowait() API. If the inode is currently being evicted (marked with I_FREEING or I_WILL_FREE) or created (I_CREATING), or if it is not present in the VFS inode cache (cache miss), simply skip it (returning -ENOENT) rather than waiting for eviction/creation to complete, breaking the ABBA cycle. Since we return -ENOENT immediately on a cache miss, we never attempt to allocate a new inode or call iget_locked(), completely eliminating any TOCTOU race window. If the returned inode is I_NEW, wait for its initialization to clear via wait_on_new_inode(). If initialization fails and the inode is unhashed during wait_on_new_inode() waking up (e.g., due to an I/O read error in another thread), safely drop the reference and return -ENOENT. This unhashed check is executed unconditionally on all cache-hit pathways to properly handle concurrent initialization failures. Finally, standard validation checks (including is_bad_inode, EXT4_EA_INODE_FL, file_acl, and xattr flags) are executed as normal inside check_igot_inode() to fully guarantee VFS-layer safety. In ext4_xattr_inode_cache_find(), invoke ext4_iget() with the new EXT4_IGET_NOWAIT flag to perform the non-blocking cache search. Suggested-by: Jan Kara Reported-by: Colin Ian King Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219283 Fixes: 0a46ef234756 ("ext4: do not create EA inode under buffer lock") Signed-off-by: Aditya Prakash Srivastava --- Changes in v6: - Drop the redundant is_freeing tracking variable completely, as pointed out by Jan Kara. - Return -ENOENT instead of -ESTALE on cache misses and unhashed inode checks, as requested by Jan Kara. Changes in v5: - Address two critical concurrency issues flagged by the Sashiko AI bot in v4: 1. Resolve the Time-Of-Check to Time-Of-Use (TOCTOU) race window between find_inode_nowait() and iget_locked() by returning -ESTALE immediately on a VFS cache miss. This completely bypasses fallback to iget_locked() and prevents potential ABBA deadlocks. 2. Fix the improperly nested inode_unhashed() safety check by moving it outside the I_NEW condition block, ensuring it runs unconditionally on all cache-hit pathways to prevent false-positive filesystem corruption errors during concurrent initialization failures. Changes in v4: - Check if the inode was unhashed during wait_on_new_inode() waking up to handle transient initialization failures (like I/O read errors) gracefully. Dropping the reference and returning -ESTALE prevents false filesystem corruption errors (__ext4_error), as found by the Sashiko AI bot. Changes in v3: - Implement a new ext4_iget() configuration flag named EXT4_IGET_NOWAIT to fully contain the non-blocking lookup and VFS-level validations within inode.c, as requested by Jan Kara. - Skip inodes currently being created (I_CREATING), following Jan Kara's direct feedback. - Remove all open-coded match helpers and VFS state-checks from xattr.c. Changes in v2: - Read inode state locklessly using inode_state_read_once() to resolve a lockdep assertion on cache hit. - Manually restore essential inode/ea_inode validations on the retrieved inode (is_bad_inode, EXT4_EA_INODE_FL, file_acl, and xattr checks) to match VFS safety guarantees and prevent using corrupted/failed inodes. fs/ext4/ext4.h | 3 ++- fs/ext4/inode.c | 35 ++++++++++++++++++++++++++++++++--- fs/ext4/xattr.c | 2 +- 3 files changed, 35 insertions(+), 5 deletions(-) diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h index b37c136ea3ab..c76dd0bdd3d8 100644 --- a/fs/ext4/ext4.h +++ b/fs/ext4/ext4.h @@ -3144,7 +3144,8 @@ typedef enum { EXT4_IGET_SPECIAL = 0x0001, /* OK to iget a system inode */ EXT4_IGET_HANDLE = 0x0002, /* Inode # is from a handle */ EXT4_IGET_BAD = 0x0004, /* Allow to iget a bad inode */ - EXT4_IGET_EA_INODE = 0x0008 /* Inode should contain an EA value */ + EXT4_IGET_EA_INODE = 0x0008, /* Inode should contain an EA value */ + EXT4_IGET_NOWAIT = 0x0010 /* Non-blocking lookup (skip if freeing) */ } ext4_iget_flags; extern struct inode *__ext4_iget(struct super_block *sb, unsigned long ino, diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index ce99807c5f5b..a091a43959a3 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -5270,6 +5270,20 @@ void ext4_set_inode_mapping_order(struct inode *inode) mapping_set_folio_order_range(inode->i_mapping, min_order, max_order); } +static int ext4_iget_match(struct inode *inode, u64 ino, void *data) +{ + if (inode->i_ino != ino) + return 0; + spin_lock(&inode->i_lock); + if (inode_state_read(inode) & (I_FREEING | I_WILL_FREE | I_CREATING)) { + spin_unlock(&inode->i_lock); + return -1; + } + __iget(inode); + spin_unlock(&inode->i_lock); + return 1; +} + struct inode *__ext4_iget(struct super_block *sb, unsigned long ino, ext4_iget_flags flags, const char *function, unsigned int line) @@ -5298,9 +5312,24 @@ struct inode *__ext4_iget(struct super_block *sb, unsigned long ino, return ERR_PTR(-EFSCORRUPTED); } - inode = iget_locked(sb, ino); - if (!inode) - return ERR_PTR(-ENOMEM); + if (flags & EXT4_IGET_NOWAIT) { + inode = find_inode_nowait(sb, ino, ext4_iget_match, NULL); + if (!inode) + return ERR_PTR(-ENOENT); + + if (inode_state_read_once(inode) & I_NEW) + wait_on_new_inode(inode); + + if (unlikely(inode_unhashed(inode))) { + iput(inode); + return ERR_PTR(-ENOENT); + } + } else { + inode = iget_locked(sb, ino); + if (!inode) + return ERR_PTR(-ENOMEM); + } + if (!(inode_state_read_once(inode) & I_NEW)) { ret = check_igot_inode(inode, flags, function, line); if (ret) { diff --git a/fs/ext4/xattr.c b/fs/ext4/xattr.c index 982a1f831e22..21b5670d8503 100644 --- a/fs/ext4/xattr.c +++ b/fs/ext4/xattr.c @@ -1550,7 +1550,7 @@ ext4_xattr_inode_cache_find(struct inode *inode, const void *value, while (ce) { ea_inode = ext4_iget(inode->i_sb, ce->e_value, - EXT4_IGET_EA_INODE); + EXT4_IGET_EA_INODE | EXT4_IGET_NOWAIT); if (IS_ERR(ea_inode)) goto next_entry; ext4_xattr_inode_set_class(ea_inode); -- 2.47.3