Commit 5b2b0c6e4577 ("mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap device") introduced SWP_HIBERNATION so that the swap area selected through /dev/snapshot remains protected against swapoff() for the lifetime of the uswsusp session. When user space issues SNAPSHOT_SET_SWAP_AREA again, snapshot_set_swap_area() currently drops the old pin before attempting to pin the new swap area. If the new selection fails, the ioctl returns an error and user space is expected to abort the session. However, preserving the existing pin in that case makes the kernel side more robust against a failed re-selection, while keeping the existing userspace-visible behavior unchanged. Implement this with the existing swap helpers: - look up the requested swap area first - treat re-selecting the already pinned area as a no-op - pin the new area before unpinning the old one - leave the existing pin in place if the new pin attempt fails This keeps the hibernation session protected against swapoff() until /dev/snapshot is closed, even after a failed attempt to switch to a different swap area. Suggested-by: Youngjun Park Signed-off-by: DaeMyung Kang --- Notes (not part of the commit, stripped by git am): Changes in v2: - Drop Fixes: and Cc: stable; reframe as a hardening improvement rather than a regression fix, per Youngjun's feedback that the current behavior is intentional and there is no concrete user-observable harm. - Drop the new repin_hibernation_swap_type() helper. Rework snapshot_set_swap_area() in place using the existing find / pin / unpin helpers as Youngjun suggested; the change now touches only kernel/power/user.c and adds no new API. - Update the subject and commit log accordingly. - Add Suggested-by: trailer. v1: https://lore.kernel.org/lkml/20260414143200.1267932-1-charsyam@gmail.com/ Baseline -------- This patch is generated against linux-next at commit 5b2b0c6e4577 ("mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap device"). Mainline does not yet carry that commit, and neither the helpers used here (find/pin/unpin_hibernation_swap_type) nor the code site this patch modifies exist there. The base-commit trailer at the bottom of the mbox records the exact commit. Testing ------- The behavior change can be exercised entirely through the /dev/snapshot ioctl path; no actual hibernation cycle is required. A targeted assertion test is below; run it as root in a throwaway VM with two active swap block devices and one non-swap block device (three arguments). Run inside a VM on linux-next at 5b2b0c6e4577 with this patch applied: step1: pinned active swap /dev/vda step2: swapoff blocked with EBUSY while pin is held step3: repinned active swap to /dev/vdb step4: swapoff(/dev/vda) succeeded after repinning away step5: repinned swap is blocked with EBUSY step6: bogus SNAPSHOT_SET_SWAP_AREA failed as expected: No such device step7: swapoff(/dev/vdb) is still blocked with EBUSY result: pin preserved across failed re-set (hardened behavior) step8: swapoff succeeded after closing /dev/snapshot Without the patch, step7 instead reports swapoff(/dev/vdb) succeeded after failed re-set because the old pin had been released before the failed pin attempt. What the assertion test covers: - SWP_HIBERNATION is enforced against swapoff (step2, step5); - the success path moves the pin from one active swap to another (step3, step4, step5); - a failed re-selection preserves the existing pin (step6, step7); - the pin lifetime ends on /dev/snapshot close (step8). What it does not cover: - the snapshot_open(O_RDONLY) initial resume-device pin path; - the full suspend-to-disk image create/restore flow; - concurrent swapoff racing against SNAPSHOT_SET_SWAP_AREA; - the type == data->swap idempotent branch (not externally observable since it intentionally skips the bit toggle). A normal sysfs-based suspend-to-disk cycle continues to work; the find_hibernation_swap_type() / pin / unpin paths themselves are unchanged. Build tested with allmodconfig and run-tested with CONFIG_PROVE_LOCKING=y and CONFIG_KASAN=y. The VM was booted with oops=panic panic=-1 so any WARN/Oops/BUG would have halted the run; the full test completed cleanly with no kernel log diagnostics. Reproducer (C source, for reference only -- not added to the tree): // SPDX-License-Identifier: GPL-2.0 /* * Reproduce / verify the SNAPSHOT_SET_SWAP_AREA pin-lifetime behavior. * * Run only inside a throwaway VM. The test manipulates swap state and * leaves the target swap area disabled on success. * * Usage: * ./uswsusp_swapoff_repro * * Exit codes: * 0 = expected (hardened) behavior: pin preserved across failed re-set * 1 = old behavior: pin dropped on failed re-set * 2 = setup error / inconclusive */ #define _GNU_SOURCE #include #include #include #include #include #include #include #include #include #include #include #include #include static int encode_dev(dev_t dev) { unsigned int major_num = major(dev); unsigned int minor_num = minor(dev); /* Match new_encode_dev() / new_decode_dev() in the kernel. */ return (major_num & 0xfff) << 8 | (minor_num & 0xff) | ((minor_num & ~0xff) << 12); } static int get_block_dev(const char *path, dev_t *dev) { struct stat st; if (stat(path, &st) < 0) { fprintf(stderr, "stat(%s): %s\n", path, strerror(errno)); return -errno; } if (!S_ISBLK(st.st_mode)) { fprintf(stderr, "%s is not a block device\n", path); return -EINVAL; } *dev = st.st_rdev; return 0; } static int snapshot_set_swap_area(int fd, dev_t dev, long long offset) { struct resume_swap_area area = { .offset = offset, .dev = encode_dev(dev), }; if (ioctl(fd, SNAPSHOT_SET_SWAP_AREA, &area) < 0) return -errno; return 0; } int main(int argc, char **argv) { const char *p1, *p2, *pb; dev_t d1, d2, db; int fd, ret; bool buggy = false; if (argc != 4) { fprintf(stderr, "usage: %s \n", argv[0]); return 2; } if (geteuid() != 0) { fprintf(stderr, "must run as root\n"); return 2; } p1 = argv[1]; p2 = argv[2]; pb = argv[3]; if (get_block_dev(p1, &d1) < 0 || get_block_dev(p2, &d2) < 0 || get_block_dev(pb, &db) < 0) return 2; fd = open("/dev/snapshot", O_WRONLY); if (fd < 0) { fprintf(stderr, "open(/dev/snapshot): %s\n", strerror(errno)); return 2; } ret = snapshot_set_swap_area(fd, d1, 0); if (ret < 0) { fprintf(stderr, "step1: %s\n", strerror(-ret)); goto setup_err; } printf("step1: pinned active swap %s\n", p1); if (swapoff(p1) == 0) { fprintf(stderr, "step2: swapoff unexpectedly succeeded\n"); close(fd); return 1; } if (errno != EBUSY) { fprintf(stderr, "step2: expected EBUSY, got %s\n", strerror(errno)); goto setup_err; } printf("step2: swapoff blocked with EBUSY while pin is held\n"); ret = snapshot_set_swap_area(fd, d2, 0); if (ret < 0) { fprintf(stderr, "step3: %s\n", strerror(-ret)); goto setup_err; } printf("step3: repinned active swap to %s\n", p2); if (swapoff(p1) < 0) { fprintf(stderr, "step4: swapoff(%s): %s\n", p1, strerror(errno)); goto setup_err; } printf("step4: swapoff(%s) succeeded after repinning away\n", p1); if (swapoff(p2) == 0) { fprintf(stderr, "step5: swapoff unexpectedly succeeded\n"); close(fd); return 1; } if (errno != EBUSY) { fprintf(stderr, "step5: expected EBUSY, got %s\n", strerror(errno)); goto setup_err; } printf("step5: repinned swap is blocked with EBUSY\n"); ret = snapshot_set_swap_area(fd, db, 0); if (!ret) { fprintf(stderr, "step6: bogus unexpectedly succeeded\n"); goto setup_err; } printf("step6: bogus SNAPSHOT_SET_SWAP_AREA failed as expected: %s\n", strerror(-ret)); if (swapoff(p2) == 0) { printf("step7: swapoff(%s) succeeded after failed re-set\n", p2); printf("result: pin was dropped on failure (old behavior)\n"); buggy = true; } else if (errno == EBUSY) { printf("step7: swapoff(%s) is still blocked with EBUSY\n", p2); printf("result: pin preserved across failed re-set (hardened behavior)\n"); } else { fprintf(stderr, "step7: unexpected: %s\n", strerror(errno)); goto setup_err; } close(fd); if (!buggy) { if (swapoff(p2) < 0) { fprintf(stderr, "step8: swapoff(%s): %s\n", p2, strerror(errno)); return 2; } printf("step8: swapoff succeeded after closing /dev/snapshot\n"); } printf("note: re-enable with `swapon %s` and `swapon %s`\n", p1, p2); return buggy ? 1 : 0; setup_err: close(fd); return 2; } kernel/power/user.c | 35 ++++++++++++++++++++++++++--------- 1 file changed, 26 insertions(+), 9 deletions(-) diff --git a/kernel/power/user.c b/kernel/power/user.c index 4406f5644a56..e1ab85db2e95 100644 --- a/kernel/power/user.c +++ b/kernel/power/user.c @@ -218,6 +218,7 @@ static int snapshot_set_swap_area(struct snapshot_data *data, { sector_t offset; dev_t swdev; + int type, swap; if (swsusp_swap_in_use()) return -EPERM; @@ -239,18 +240,34 @@ static int snapshot_set_swap_area(struct snapshot_data *data, } /* - * Unpin the swap device if a swap area was already - * set by SNAPSHOT_SET_SWAP_AREA. + * User space encodes device types as two-byte values, so we need to + * recode them. */ - unpin_hibernation_swap_type(data->swap); + type = find_hibernation_swap_type(swdev, offset); + if (type < 0) + return swdev ? -ENODEV : -EINVAL; - /* - * User space encodes device types as two-byte values, - * so we need to recode them - */ - data->swap = pin_hibernation_swap_type(swdev, offset); - if (data->swap < 0) + if (type == data->swap) { + /* + * Re-selecting the already pinned swap area is a no-op. + * Keep the existing pin and just refresh the cached device id. + */ + data->dev = swdev; + return 0; + } + + swap = pin_hibernation_swap_type(swdev, offset); + if (swap < 0) { + /* + * Preserve the existing pin on failure. This can happen if the + * target swap area disappears before pinning, or via the + * defensive -EBUSY path in pin_hibernation_swap_type(). + */ return swdev ? -ENODEV : -EINVAL; + } + + unpin_hibernation_swap_type(data->swap); + data->swap = swap; data->dev = swdev; return 0; } base-commit: 5b2b0c6e457765adbe96fb2d464ff1bcd3d72158 -- 2.43.0