Hi,

I've been debugging a use-after-free bug in the swap subsystem that
manifests as a crash in free_swap_count_continuations() during swapoff
on zram devices.

== Problem ==

KASAN reports wild-memory-access at address 0xdead000000000100
(LIST_POISON1):

  Oops: general protection fault, probably for non-canonical address 0xfbd59c0000000020
  KASAN: maybe wild-memory-access in range [0xdead000000000100-0xdead000000000107]
  RIP: 0010:__do_sys_swapoff+0x1151/0x1860
  RBP: dead0000000000f8 R13: dead000000000100

The crash occurs when free_swap_count_continuations() iterates over a
list_head containing LIST_POISON values from a previous list_del().

== Root Cause ==

The swap subsystem uses vmalloc_to_page() to get struct page pointers
for the swap_map array, then uses page->private and page->lru for swap
count continuation lists.

When vmalloc allocates high-order pages without __GFP_COMP and splits
them via split_page(), the resulting pages may contain stale data:

1. post_alloc_hook() only clears page->private for the head page
   (page[0])
2. split_page() only calls set_page_refcounted() for tail pages
3. Tail pages retain whatever was in page->private and page->lru from
   previous use - including LIST_POISON values from prior list_del()
   calls

In add_swap_count_continuation() (mm/swapfile.c):

	if (!page_private(head)) {
		INIT_LIST_HEAD(&head->lru);
		set_page_private(head, SWP_CONTINUED);
	}

If head is a vmalloc tail page with stale non-zero page->private, the
INIT_LIST_HEAD is skipped, leaving page->lru with poison values. When
free_swap_count_continuations() later iterates this list, it crashes.

The comment at line 3862 says "Page allocation does not initialize the
page's lru field, but it does always reset its private field" - this
assumption is incorrect for vmalloc pages obtained via split_page().

== Proposed Fix ==

Initialize page->private and page->lru for all pages in split_page().
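
For illustration, the failure mode described under Root Cause, and the
effect of the reset proposed here, can be modelled in userspace. This is
only a sketch under stated assumptions: struct page_model,
MODEL_SWP_CONTINUED, the model_* helpers and the un-offset poison values
are hypothetical stand-ins, not kernel definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-ins for kernel types; NOT the real definitions. */
struct list_head { struct list_head *next, *prev; };

/* Poison markers without the arch-specific POISON_POINTER_DELTA offset. */
#define LIST_POISON1 ((struct list_head *)(uintptr_t)0x100)
#define LIST_POISON2 ((struct list_head *)(uintptr_t)0x122)

/* Arbitrary non-zero marker; the kernel's SWP_CONTINUED value differs. */
#define MODEL_SWP_CONTINUED 1UL

struct page_model {
	unsigned long private;
	struct list_head lru;
};

static void model_init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

/* Mirrors the check quoted from add_swap_count_continuation(). */
static void model_add_continuation(struct page_model *head)
{
	if (!head->private) {
		model_init_list_head(&head->lru);
		head->private = MODEL_SWP_CONTINUED;
	}
}

/* Models the proposed reset: lru on every page, private on tail pages. */
static void model_split_reset(struct page_model *pages, unsigned int order)
{
	unsigned int i;

	assert(pages != NULL);
	model_init_list_head(&pages[0].lru);
	for (i = 1; i < (1u << order); i++) {
		pages[i].private = 0;
		model_init_list_head(&pages[i].lru);
	}
}

static bool model_lru_is_poisoned(const struct page_model *p)
{
	return p->lru.next == LIST_POISON1 || p->lru.prev == LIST_POISON2;
}

/*
 * Build an order-2 block whose first tail page carries stale state,
 * optionally apply the reset, then run the continuation check on it.
 * Returns true if the tail page's lru is still poisoned afterwards.
 */
static bool model_tail_left_poisoned(bool apply_fix)
{
	struct page_model pages[4];

	memset(pages, 0, sizeof(pages));
	pages[1].private = 0x1234;          /* stale non-zero private */
	pages[1].lru.next = LIST_POISON1;   /* left over from list_del() */
	pages[1].lru.prev = LIST_POISON2;

	if (apply_fix)
		model_split_reset(pages, 2);

	model_add_continuation(&pages[1]);
	return model_lru_is_poisoned(&pages[1]);
}
```

In this model, model_tail_left_poisoned(false) leaves the poison
pointers in place because the non-zero private value skips the list
initialization, while model_tail_left_poisoned(true) ends with a valid
empty list.
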
Resetting both fields at split time matches the documented expectation
in mm/vmalloc.c:

  "High-order allocations must be able to be treated as independent
   small pages by callers... Some drivers do their own refcounting on
   vmalloc_to_page() pages, some use page->mapping, page->lru, etc."

--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3122,6 +3122,16 @@ void split_page(struct page *page, unsigned int order)
 	VM_BUG_ON_PAGE(PageCompound(page), page);
 	VM_BUG_ON_PAGE(!page_count(page), page);
 
+	/*
+	 * Split pages may contain stale data from previous use. Initialize
+	 * page->private and page->lru which may have LIST_POISON values.
+	 */
+	INIT_LIST_HEAD(&page->lru);
+	for (i = 1; i < (1 << order); i++) {
+		set_page_private(page + i, 0);
+		INIT_LIST_HEAD(&page[i].lru);
+	}
+
 	for (i = 1; i < (1 << order); i++)
 		set_page_refcounted(page + i);
 	split_page_owner(page, order, 0);

== Testing ==

Reproduced with a stress test cycling swapon/swapoff on 8GB zram under
memory pressure:

- Without patch: crash within ~50 iterations
- With patch: 1154+ iterations, no crash

The bug was originally discovered on Fedora 44 with kernel 6.19.0-rc7
during normal system shutdown after extended use.

== Questions ==

1. Is split_page() the right place for this fix, or should the swap
   code be more defensive about uninitialized vmalloc pages?
2. Should prep_new_page()/post_alloc_hook() initialize all pages in
   high-order allocations, not just the head?
3. Are there other fields besides page->private and page->lru that
   callers of split_page() might expect to be initialized?

Thoughts?

-- 
Best Regards,
Mike Gavrilov.