From: Chuck Lever

The group_pin_kill() function iterates the superblock's s_pins list
and invokes each pin's kill callback. Previously, this function was
called only during remount read-only (in reconfigure_super).

Add a group_pin_kill() call in cleanup_mnt() so that pins registered
via pin_insert_sb() receive callbacks during mount teardown as
well. This call runs after mnt_pin_kill() processes the per-mount
m_list, ensuring:

 - Pins registered via pin_insert() receive their callback from
   mnt_pin_kill() (which also removes them from s_list via
   pin_remove()), so group_pin_kill() skips them.

 - Pins registered via pin_insert_sb() are only on s_list, so
   mnt_pin_kill() skips them and group_pin_kill() invokes their
   callback.

This enables subsystems to use pin_insert_sb() for receiving
unmount notifications while avoiding any problematic locking context
that mnt_pin_kill() callbacks must handle.

Because group_pin_kill() operates on the superblock's s_pins list,
unmounting any mount of a filesystem--including bind mounts--triggers
callbacks for all pins registered on that superblock. For NFSD, this
means unmounting an exported bind mount revokes NFSv4 state for the
entire filesystem, even if other mounts remain.

Signed-off-by: Chuck Lever
---
 fs/fs_pin.c    | 14 +++++++-------
 fs/namespace.c |  2 ++
 2 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/fs/fs_pin.c b/fs/fs_pin.c
index 7204b4a5891f..54c1163a9cde 100644
--- a/fs/fs_pin.c
+++ b/fs/fs_pin.c
@@ -54,17 +54,17 @@ EXPORT_SYMBOL_GPL(pin_insert);
  * @m: the vfsmount whose superblock to monitor
  *
  * Registers @pin on the superblock's s_pins list only. Callbacks arrive
- * only from group_pin_kill() (invoked during remount read-only), not
- * from mnt_pin_kill() (invoked during mount namespace teardown).
+ * from group_pin_kill(), invoked during both remount read-only and mount
+ * teardown. Unlike pin_insert(), the pin is not added to the per-mount
+ * mnt_pins list, so mnt_pin_kill() does not invoke the callback.
  *
  * Use this instead of pin_insert() when mnt_pin_kill() callbacks would
- * execute in problematic locking contexts. Because mnt_pin_kill() runs
- * during cleanup_mnt(), callbacks cannot acquire locks also taken during
- * mount table operations without risking AB-BA deadlock.
+ * execute in problematic locking contexts. Callbacks registered via this
+ * function run from group_pin_kill() instead, which may execute under
+ * different locking conditions.
  *
  * After insertion, check SB_ACTIVE to detect racing unmounts. If clear,
- * call pin_remove() and abort. Normal unmount cleanup then occurs through
- * subsystem-specific shutdown paths without pin callback involvement.
+ * call pin_remove() and abort.
  *
  * The callback must call pin_remove() before returning. Callbacks execute
  * with the RCU read lock held.
diff --git a/fs/namespace.c b/fs/namespace.c
index c58674a20cad..a887d45636f5 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -1309,6 +1309,8 @@ static void cleanup_mnt(struct mount *mnt)
 	WARN_ON(mnt_get_writers(mnt));
 	if (unlikely(mnt->mnt_pins.first))
 		mnt_pin_kill(mnt);
+	if (unlikely(!hlist_empty(&mnt->mnt.mnt_sb->s_pins)))
+		group_pin_kill(&mnt->mnt.mnt_sb->s_pins);
 	hlist_for_each_entry_safe(m, p, &mnt->mnt_stuck_children, mnt_umount) {
 		hlist_del(&m->mnt_umount);
 		mntput(&m->mnt);
-- 
2.52.0