From: Chuck Lever

The fs_pin mechanism notifies interested subsystems when a filesystem
is remounted read-only or unmounted. Currently, BSD process accounting
uses this to halt accounting when the target filesystem goes away.
Registered pins receive callbacks from both group_pin_kill() (during
remount read-only) and mnt_pin_kill() (during mount teardown).

NFSD maintains NFSv4 client state associated with the superblocks of
exported filesystems. Revoking this state during unmount requires lock
ordering that conflicts with mnt_pin_kill() context: mnt_pin_kill()
runs during cleanup_mnt() with namespace locks held, but NFSD's state
revocation path acquires these same locks for mount table lookups,
creating AB-BA deadlock potential.

Add pin_insert_sb() to register pins on the superblock's s_pins list
only. Pins registered this way do not receive mnt_pin_kill() callbacks
during mount teardown.

After pin insertion, checking SB_ACTIVE detects racing unmounts. When
the superblock remains active, normal unmount cleanup occurs through
the subsystem's own shutdown path (outside the problematic locking
context) without pin callbacks.

Signed-off-by: Chuck Lever
---
 fs/fs_pin.c            | 29 +++++++++++++++++++++++++++++
 include/linux/fs_pin.h |  1 +
 2 files changed, 30 insertions(+)

diff --git a/fs/fs_pin.c b/fs/fs_pin.c
index 972f34558b97..7204b4a5891f 100644
--- a/fs/fs_pin.c
+++ b/fs/fs_pin.c
@@ -48,6 +48,35 @@ void pin_insert(struct fs_pin *pin, struct vfsmount *m)
 }
 EXPORT_SYMBOL_GPL(pin_insert);
 
+/**
+ * pin_insert_sb - register an fs_pin on the superblock only
+ * @pin: the pin to register (must be initialized with init_fs_pin())
+ * @m: the vfsmount whose superblock to monitor
+ *
+ * Registers @pin on the superblock's s_pins list only. Callbacks arrive
+ * only from group_pin_kill() (invoked during remount read-only), not
+ * from mnt_pin_kill() (invoked during mount namespace teardown).
+ *
+ * Use this instead of pin_insert() when mnt_pin_kill() callbacks would
+ * execute in problematic locking contexts. Because mnt_pin_kill() runs
+ * during cleanup_mnt(), callbacks cannot acquire locks also taken during
+ * mount table operations without risking AB-BA deadlock.
+ *
+ * After insertion, check SB_ACTIVE to detect racing unmounts. If clear,
+ * call pin_remove() and abort. Normal unmount cleanup then occurs through
+ * subsystem-specific shutdown paths without pin callback involvement.
+ *
+ * The callback must call pin_remove() before returning. Callbacks execute
+ * with the RCU read lock held.
+ */
+void pin_insert_sb(struct fs_pin *pin, struct vfsmount *m)
+{
+	spin_lock(&pin_lock);
+	hlist_add_head(&pin->s_list, &m->mnt_sb->s_pins);
+	spin_unlock(&pin_lock);
+}
+EXPORT_SYMBOL_GPL(pin_insert_sb);
+
 void pin_kill(struct fs_pin *p)
 {
 	wait_queue_entry_t wait;
diff --git a/include/linux/fs_pin.h b/include/linux/fs_pin.h
index bdd09fd2520c..24c55329b15f 100644
--- a/include/linux/fs_pin.h
+++ b/include/linux/fs_pin.h
@@ -21,4 +21,5 @@ static inline void init_fs_pin(struct fs_pin *p, void (*kill)(struct fs_pin *))
 
 void pin_remove(struct fs_pin *);
 void pin_insert(struct fs_pin *, struct vfsmount *);
+void pin_insert_sb(struct fs_pin *, struct vfsmount *);
 void pin_kill(struct fs_pin *);
-- 
2.52.0