When kswapd shrinks the dcache, the last iput() on an ext4 inode can trigger ext4_orphan_del(), which calls ext4_reserve_inode_write() and ultimately __ext4_get_inode_loc(). That function calls sb_getblk(), which wraps __getblk() and carries implicit __GFP_NOFAIL. Because kswapd runs with PF_MEMALLOC set, combining NOFAIL with a non-reclaimable context trips WARN_ON_ONCE(current->flags & PF_MEMALLOC) inside __alloc_pages_slowpath(), producing a spurious splat even though the allocation could simply fail and return -ENOMEM to the caller. Switch both sb_getblk() call sites in __ext4_get_inode_loc() to sb_getblk_gfp() with the same flags minus __GFP_NOFAIL (mapping_gfp_constraint(~__GFP_FS) | __GFP_MOVABLE), computing the gfp value once and reusing it for the optional bitmap_bh optimisation fetch. All callers of __ext4_get_inode_loc() -- reached via ext4_get_inode_loc(), __ext4_get_inode_loc_noinmem(), and ext4_get_fc_inode_loc() -- already propagate a non-zero return as an error without aborting the filesystem. Both sb_getblk() call sites in __ext4_get_inode_loc() are converted; the bitmap_bh fetch already falls back to make_io on NULL, so allowing it to fail is a no-op there. Reproduced under syzkaller+FEMU based fuzz tool (FuzzNvme) on x86_64 QEMU, based on mainline 894009e2ef10: WARNING: CPU: 0 PID: 55 at mm/page_alloc.c:4722 __alloc_pages_slowpath Comm: kswapd0 Not tainted 6.19.0+ #14 Call Trace: __alloc_pages_slowpath alloc_pages_mpol folio_alloc_noprof filemap_alloc_folio_noprof __filemap_get_folio grow_dev_folio grow_buffers __getblk_slow bdev_getblk __ext4_get_inode_loc ext4_get_inode_loc ext4_reserve_inode_write ext4_orphan_del ext4_evict_inode evict iput dentry_unlink_inode __dentry_kill shrink_dentry_list prune_dcache_sb super_cache_scan do_shrink_slab shrink_slab shrink_node balance_pgdat kswapd kthread ret_from_fork Related: see d8b90e6387a ("ext4: add ext4_sb_bread_nofail() helper function for ext4_free_branches()") for the same strategy applied to the read path in ext4_free_branches(). Link: https://lore.kernel.org/all/?q=PF_MEMALLOC+nofail+ext4+iput Acked-by: Sungwoo Kim Acked-by: Dave Tian Acked-by: Weidong Zhu Signed-off-by: Chao Shi --- fs/ext4/inode.c | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c index c2c2d6ac7f3..1b2a7bd59b8 100644 --- a/fs/ext4/inode.c +++ b/fs/ext4/inode.c @@ -4859,6 +4859,7 @@ static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino, ext4_fsblk_t block; struct blk_plug plug; int inodes_per_block, inode_offset; + gfp_t gfp; iloc->bh = NULL; if (ino < EXT4_ROOT_INO || @@ -4887,7 +4888,14 @@ static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino, } block += (inode_offset / inodes_per_block); - bh = sb_getblk(sb, block); + /* + * No __GFP_NOFAIL: this can run from reclaim context (kswapd + * shrinker -> iput -> ext4_orphan_del path) where NOFAIL trips + * WARN_ON_ONCE in __alloc_pages_slowpath(). + */ + gfp = mapping_gfp_constraint(sb->s_bdev->bd_mapping, ~__GFP_FS) | + __GFP_MOVABLE; + bh = sb_getblk_gfp(sb, block, gfp); if (unlikely(!bh)) return -ENOMEM; if (ext4_buffer_uptodate(bh)) @@ -4912,7 +4920,7 @@ static int __ext4_get_inode_loc(struct super_block *sb, unsigned long ino, start = inode_offset & ~(inodes_per_block - 1); /* Is the inode bitmap in cache? */ - bitmap_bh = sb_getblk(sb, ext4_inode_bitmap(sb, gdp)); + bitmap_bh = sb_getblk_gfp(sb, ext4_inode_bitmap(sb, gdp), gfp); if (unlikely(!bitmap_bh)) goto make_io; -- 2.43.0