AI Trajectory

Trajectory:
Seq	Timestamp	Type	Name	Duration
0/0	2026/06/30 18:54	flow	patch-triage	3m Args: null Results: { "EnableConfigs": [ "MEMORY_FAILURE" ], "FocusSymbols": [ "get_any_page", "panic_on_unrecoverable_mf" ], "Reasoning": "The patch modifies the memory failure handler in core MM to short-circuit recovery for kernel-owned pages and optionally panic. This error-handling path is reachable in virtualized environments via madvise(MADV_HWPOISON) on kernel-owned pages mapped in user space (like the vDSO) or via sysfs error injection.", "WorthFuzzing": true }
1/1	2026/06/30 18:54	action	read-patch-diff	0m Args: null Results: { "PatchDiff": "commit cd6649a5d4f7cd49ae8556e2d53436c6b6caba06\nAuthor: syz-cluster \u003ctriage@syzkaller.com\u003e\nDate: Tue Jun 30 18:54:07 2026 +0000\n\n syz-cluster: applied patch under review\n\ndiff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst\nindex b9b0c218bfb440..22cc54cac3b21d 100644\n--- a/Documentation/admin-guide/sysctl/vm.rst\n+++ b/Documentation/admin-guide/sysctl/vm.rst\n@@ -67,6 +67,7 @@ Currently, these files are in /proc/sys/vm:\n - page-cluster\n - page_lock_unfairness\n - panic_on_oom\n+- panic_on_unrecoverable_memory_failure\n - percpu_pagelist_high_fraction\n - stat_interval\n - stat_refresh\n@@ -925,6 +926,85 @@ panic_on_oom=2+kdump gives you very strong tool to investigate\n why oom happens. You can get snapshot.\n \n \n+panic_on_unrecoverable_memory_failure\n+======================================\n+\n+When a hardware memory error (e.g. multi-bit ECC) hits a kernel page\n+that cannot be recovered by the memory failure handler, the default\n+behaviour is to ignore the error and continue operation. This is\n+dangerous because the corrupted data remains accessible to the kernel,\n+risking silent data corruption or a delayed crash when the poisoned\n+memory is next accessed.\n+\n+When enabled, this sysctl triggers a panic on memory failure events\n+hitting kernel-owned pages that the handler cannot recover:\n+``PageReserved`` (firmware reservations, kernel image, vDSO, zero\n+page, and similar memblock-reserved regions), ``PageSlab``,\n+``PageTable``, and ``PageLargeKmalloc``. These are owned by the\n+kernel and the memory failure handler cannot reliably evict their\n+contents.\n+\n+Other unrecoverable kernel-owned populations (vmalloc allocations,\n+kernel stack pages, ...) 
are not currently covered because the\n+handler has no page-type signal that distinguishes them from a\n+userspace folio temporarily off the LRU during migration or\n+compaction. Such pages still go through the standard\n+MF_MSG_GET_HWPOISON path: ``PG_hwpoison`` is set on them and a\n+delayed crash on the next access remains possible. Coverage may\n+grow as the handler gains stronger kernel-ownership signals.\n+\n+Recoverable failure paths are also intentionally left out: in-flight\n+buddy allocations and other transient races with the page allocator\n+can reach the same diagnostic, and panicking on them would risk\n+killing the box for a page destined for userspace where the standard\n+SIGBUS recovery path applies. Pages whose state could not be\n+classified at all are not covered either, since an unknown state is\n+not a sound basis for a panic decision.\n+\n+For many environments it is preferable to panic immediately with a clean\n+crash dump that captures the original error context, rather than to\n+continue and face a random crash later whose cause is difficult to\n+diagnose.\n+\n+Use cases\n+---------\n+\n+This option is most useful in environments where unattributed crashes\n+are expensive to debug or where data integrity must take precedence\n+over availability:\n+\n+* Large fleets, where multi-bit ECC errors on kernel pages are observed\n+ regularly and post-mortem analysis of an unrelated downstream crash\n+ (often seconds to minutes after the original error) consumes\n+ significant engineering effort.\n+\n+* Systems configured with kdump, where panicking at the moment of the\n+ hardware error produces a vmcore that still contains the faulting\n+ address, the affected page state, and the originating MCE/GHES\n+ record — context that is typically lost by the time a delayed crash\n+ occurs.\n+\n+* High-availability clusters that rely on fast, deterministic node\n+ failure for failover, and prefer an immediate panic over silent data\n+ corruption 
propagating to replicas or persistent storage.\n+\n+* Kernel and platform developers reproducing hwpoison issues with\n+ tools such as ``mce-inject`` or error-injection debugfs interfaces,\n+ where panicking on the unrecoverable path makes regressions\n+ immediately visible instead of surfacing as later, unrelated\n+ failures.\n+\n+= =====================================================================\n+0 Try to continue operation (default).\n+1 Panic immediately. If the ``panic`` sysctl is also non-zero then the\n+ machine will be rebooted.\n+= =====================================================================\n+\n+Example::\n+\n+ echo 1 \u003e /proc/sys/vm/panic_on_unrecoverable_memory_failure\n+\n+\n percpu_pagelist_high_fraction\n =============================\n \ndiff --git a/mm/memory-failure.c b/mm/memory-failure.c\nindex 51508a55c4055e..e097fc8262cf83 100644\n--- a/mm/memory-failure.c\n+++ b/mm/memory-failure.c\n@@ -74,6 +74,8 @@ static int sysctl_memory_failure_recovery __read_mostly = 1;\n \n static int sysctl_enable_soft_offline __read_mostly = 1;\n \n+static int sysctl_panic_on_unrecoverable_mf __read_mostly;\n+\n atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);\n \n static bool hw_memory_failure __read_mostly = false;\n@@ -155,6 +157,15 @@ static const struct ctl_table memory_failure_table[] = {\n \t\t.proc_handler\t= proc_dointvec_minmax,\n \t\t.extra1\t\t= SYSCTL_ZERO,\n \t\t.extra2\t\t= SYSCTL_ONE,\n+\t},\n+\t{\n+\t\t.procname\t= \"panic_on_unrecoverable_memory_failure\",\n+\t\t.data\t\t= \u0026sysctl_panic_on_unrecoverable_mf,\n+\t\t.maxlen\t\t= sizeof(sysctl_panic_on_unrecoverable_mf),\n+\t\t.mode\t\t= 0644,\n+\t\t.proc_handler\t= proc_dointvec_minmax,\n+\t\t.extra1\t\t= SYSCTL_ZERO,\n+\t\t.extra2\t\t= SYSCTL_ONE,\n \t}\n };\n \n@@ -980,17 +991,6 @@ static bool has_extra_refcount(struct page_state ps, struct page p,\n \treturn false;\n }\n \n-/\n- Error hit kernel page.\n- * Do nothing, try to be lucky and not touch this 
instead. For a few cases we\n- * could be more sophisticated.\n- /\n-static int me_kernel(struct page_state ps, struct page p)\n-{\n-\tunlock_page(p);\n-\treturn MF_IGNORED;\n-}\n-\n /\n * Page in unknown state. Do nothing.\n * This is a catch-all in case we fail to make sense of the page state.\n@@ -1199,10 +1199,8 @@ static int me_huge_page(struct page_state ps, struct page p)\n #define mlock\t\t(1UL \u003c\u003c PG_mlocked)\n #define lru\t\t(1UL \u003c\u003c PG_lru)\n #define head\t\t(1UL \u003c\u003c PG_head)\n-#define reserved\t(1UL \u003c\u003c PG_reserved)\n \n static struct page_state error_states[] = {\n-\t{ reserved,\treserved,\tMF_MSG_KERNEL,\tme_kernel },\n \t/\n \t free pages are specially detected outside this table:\n \t * PG_buddy pages only make a small fraction of all free pages.\n@@ -1234,7 +1232,6 @@ static struct page_state error_states[] = {\n #undef mlock\n #undef lru\n #undef head\n-#undef reserved\n \n static void update_per_node_mf_stats(unsigned long pfn,\n \t\t\t\t enum mf_result result)\n@@ -1269,6 +1266,15 @@ static void update_per_node_mf_stats(unsigned long pfn,\n \t++mf_stats-\u003etotal;\n }\n \n+static bool panic_on_unrecoverable_mf(enum mf_action_page_type type,\n+\t\t\t\t enum mf_result result)\n+{\n+\tif (!sysctl_panic_on_unrecoverable_mf)\n+\t\treturn false;\n+\n+\treturn type == MF_MSG_KERNEL \u0026\u0026 result == MF_IGNORED;\n+}\n+\n /\n \"Dirty/Clean\" indication is not 100% accurate due to the possibility of\n * setting PG_dirty outside page lock. See also comment above set_page_dirty().\n@@ -1286,6 +1292,9 @@ static int action_result(unsigned long pfn, enum mf_action_page_type type,\n \tpr_err(\"%#lx: recovery action for %s: %s\\n\",\n \t\tpfn, action_page_types[type], action_name[result]);\n \n+\tif (panic_on_unrecoverable_mf(type, result))\n+\t\tpanic(\"Memory failure: %#lx: unrecoverable page\", pfn);\n+\n \treturn (result == MF_RECOVERED \|\| result == MF_DELAYED) ? 
0 : -EBUSY;\n }\n \n@@ -1339,6 +1348,38 @@ static inline bool HWPoisonHandlable(struct page page, unsigned long flags)\n \treturn PageLRU(page) \|\| is_free_buddy_page(page);\n }\n \n+/\n+ * Positive identification of pages the hwpoison handler cannot recover:\n+ * pages owned by kernel internals with no userspace mapping to unmap, no\n+ * file mapping to invalidate, and no migration target.\n+ /\n+static inline bool is_kernel_owned_page(struct page page)\n+{\n+\tstruct page head;\n+\tbool kernel_owned;\n+\n+\t/ PG_reserved is a per-page flag, never set on a compound page. /\n+\tif (PageReserved(page))\n+\t\treturn true;\n+\n+\t/\n+\t * Page-type bits live only on the head page, so resolve any tail\n+\t * first. The check takes no refcount; recheck the head afterwards\n+\t * so a concurrent split or compound free cannot leave us trusting\n+\t * a stale view. A residual free-\u003ealloc-\u003efree cannot be closed here\n+\t * (frozen slab and large-kmalloc pages cannot be pinned), but is\n+\t * harmless: where a wrong verdict could panic, memory_failure() has\n+\t * already set PageHWPoison, which bars the page from the allocator.\n+\t /\n+retry:\n+\thead = compound_head(page);\n+\tkernel_owned = PageSlab(head) \|\| PageTable(head) \|\|\n+\t\t PageLargeKmalloc(head);\n+\tif (head != compound_head(page))\n+\t\tgoto retry;\n+\treturn kernel_owned;\n+}\n+\n static int __get_hwpoison_page(struct page page, unsigned long flags)\n {\n \tstruct folio folio = page_folio(page);\n@@ -1385,6 +1426,19 @@ static int get_any_page(struct page p, unsigned long flags)\n \tif (flags \u0026 MF_COUNT_INCREASED)\n \t\tcount_increased = true;\n \n+\t/\n+\t Page types we know are kernel-owned and cannot be recovered.\n+\t * Short-circuit before the shake_page() / retry loop, which\n+\t * cannot turn any of these into something HWPoisonHandlable().\n+\t * Drop the caller's reference if MF_COUNT_INCREASED took one.\n+\t /\n+\tif (is_kernel_owned_page(p)) {\n+\t\tif 
(count_increased)\n+\t\t\tput_page(p);\n+\t\tret = -ENOTRECOVERABLE;\n+\t\tgoto out;\n+\t}\n+\n try_again:\n \tif (!count_increased) {\n \t\tret = __get_hwpoison_page(p, flags);\n@@ -1432,7 +1486,7 @@ static int get_any_page(struct page p, unsigned long flags)\n \t\tret = -EIO;\n \t}\n out:\n-\tif (ret == -EIO)\n+\tif (ret == -EIO \|\| ret == -ENOTRECOVERABLE)\n \t\tpr_err(\"%#lx: unhandlable page.\\n\", page_to_pfn(p));\n \n \treturn ret;\n@@ -1489,7 +1543,10 @@ static int __get_unpoison_page(struct page page)\n -EIO for pages on which we can not handle memory errors,\n * -EBUSY when get_hwpoison_page() has raced with page lifecycle\n * operations like allocation and free,\n- * -EHWPOISON when the page is hwpoisoned and taken off from buddy.\n+ * -EHWPOISON when the page is hwpoisoned and taken off from buddy,\n+ * -ENOTRECOVERABLE for kernel-owned pages identified by\n+ * is_kernel_owned_page() (PG_reserved, slab,\n+ * page-table, large-kmalloc) that the handler cannot recover.\n /\n static int get_hwpoison_page(struct page p, unsigned long flags)\n {\n@@ -2402,7 +2459,8 @@ int memory_failure(unsigned long pfn, int flags)\n \t * that may make page_ref_freeze()/page_ref_unfreeze() mismatch.\n \t /\n \tres = get_hwpoison_page(p, flags);\n-\tif (!res) {\n+\tswitch (res) {\n+\tcase 0:\n \t\tif (is_free_buddy_page(p)) {\n \t\t\tif (take_page_off_buddy(p)) {\n \t\t\t\tpage_ref_inc(p);\n@@ -2421,7 +2479,19 @@ int memory_failure(unsigned long pfn, int flags)\n \t\t\tres = action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);\n \t\t}\n \t\tgoto unlock_mutex;\n-\t} else if (res \u003c 0) {\n+\tcase 1:\n+\t\t/ Got a refcount on a handlable page. 
/\n+\t\tbreak;\n+\tcase -ENOTRECOVERABLE:\n+\t\t/\n+\t\t * Stable unhandlable kernel-owned page (PG_reserved,\n+\t\t * slab, page tables, large-kmalloc).\n+\t\t * No recovery possible.\n+\t\t /\n+\t\tres = action_result(pfn, MF_MSG_KERNEL, MF_IGNORED);\n+\t\tgoto unlock_mutex;\n+\tdefault:\n+\t\t/ Transient lifecycle race with the page allocator. /\n \t\tres = action_result(pfn, MF_MSG_GET_HWPOISON, MF_IGNORED);\n \t\tgoto unlock_mutex;\n \t}\ndiff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile\nindex e6df968f0971c8..ed321ae709dacf 100644\n--- a/tools/testing/selftests/mm/Makefile\n+++ b/tools/testing/selftests/mm/Makefile\n@@ -174,6 +174,10 @@ TEST_PROGS += ksft_userfaultfd.sh\n TEST_PROGS += ksft_vma_merge.sh\n TEST_PROGS += ksft_vmalloc.sh\n \n+# Destructive: every successful run panics the kernel. Installed and\n+# kept executable, but not run from a default kselftest invocation.\n+TEST_PROGS_EXTENDED += hwpoison-panic.sh\n+\n TEST_FILES := test_vmalloc.sh\n TEST_FILES += test_hmm.sh\n TEST_FILES += va_high_addr_switch.sh\ndiff --git a/tools/testing/selftests/mm/hwpoison-panic.sh b/tools/testing/selftests/mm/hwpoison-panic.sh\nnew file mode 100755\nindex 00000000000000..d953d136733244\n--- /dev/null\n+++ b/tools/testing/selftests/mm/hwpoison-panic.sh\n@@ -0,0 +1,255 @@\n+#!/bin/bash\n+# SPDX-License-Identifier: GPL-2.0\n+#\n+# Verify vm.panic_on_unrecoverable_memory_failure by injecting a hwpoison\n+# error on a kernel-owned page and confirming the kernel panics.\n+#\n+# Three \"kinds\" of kernel-owned page can be targeted, selectable via the\n+# first positional argument (default: rodata):\n+#\n+# rodata - a PG_reserved page in the kernel rodata range\n+# (sourced from /proc/iomem \"Kernel rodata\"). 
Exercises\n+# memory_failure() -\u003e get_any_page() on a PageReserved page.\n+#\n+# slab - a slab page found via /proc/kpageflags (KPF_SLAB).\n+# Exercises memory_failure() -\u003e get_any_page() on a non\n+# PG_reserved kernel-owned page. This path is what catches\n+# regressions where get_any_page() collapses kernel-owned\n+# pages into a transient -EIO instead of -ENOTRECOVERABLE.\n+#\n+# pgtable - a page-table page found via /proc/kpageflags (KPF_PGTABLE).\n+# Same path as slab, different page type.\n+#\n+# This test is DESTRUCTIVE: a successful run crashes the kernel. It is\n+# meant to be executed inside a disposable VM (e.g. virtme-ng) with a\n+# serial console captured by the harness. It is skipped unless the\n+# caller opts in via RUN_DESTRUCTIVE=1.\n+#\n+# Test passes externally: the kernel must panic with\n+# \"Memory failure: \u003cpfn\u003e: unrecoverable page\"\n+# A return from the inject means no panic fired: that is a failure,\n+# unless the target PFN raced to a different page type before injection,\n+# in which case the run is inconclusive and is skipped.\n+#\n+# Author: Breno Leitao \u003cleitao@debian.org\u003e\n+\n+set -u\n+\n+# KTAP output helpers (ktap_print_msg, ktap_skip_all, ktap_exit_fail_msg, ...).\n+DIR=\"$(dirname \"$(readlink -f \"$0\")\")\"\n+# shellcheck source=../kselftest/ktap_helpers.sh\n+source \"${DIR}\"/../kselftest/ktap_helpers.sh\n+\n+sysctl_path=/proc/sys/vm/panic_on_unrecoverable_memory_failure\n+inject_path=/sys/devices/system/memory/hard_offline_page\n+kpageflags_path=/proc/kpageflags\n+unpoison_path=/sys/kernel/debug/hwpoison/unpoison-pfn\n+\n+# /proc/kpageflags bit positions (see include/uapi/linux/kernel-page-flags.h)\n+KPF_SLAB=7\n+KPF_COMPOUND_TAIL=16\n+KPF_HWPOISON=19\n+KPF_NOPAGE=20\n+KPF_PGTABLE=26\n+KPF_RESERVED=32\n+\n+pagesize=$(getconf PAGE_SIZE)\n+\n+kind=${1:-rodata}\n+\n+if [ \"$(id -u)\" -ne 0 ]; then\n+\tktap_skip_all \"must run as root\"\n+\texit \"$KSFT_SKIP\"\n+fi\n+\n+if [ ! 
-w \"$sysctl_path\" ]; then\n+\tktap_skip_all \"$sysctl_path not present (kernel without the sysctl?)\"\n+\texit \"$KSFT_SKIP\"\n+fi\n+\n+if [ ! -w \"$inject_path\" ]; then\n+\tktap_skip_all \"$inject_path not present (no MEMORY_HOTPLUG?)\"\n+\texit \"$KSFT_SKIP\"\n+fi\n+\n+if [ \"${RUN_DESTRUCTIVE:-0}\" != \"1\" ]; then\n+\tktap_skip_all \"destructive test; re-run with RUN_DESTRUCTIVE=1 inside a disposable VM\"\n+\texit \"$KSFT_SKIP\"\n+fi\n+\n+# Pick a PFN inside the kernel image rodata region of /proc/iomem.\n+# This is preferred over a top-level \"Reserved\" entry because top-level\n+# Reserved ranges are often firmware holes that have no backing struct\n+# page; pfn_to_online_page() returns NULL on those and memory_failure()\n+# bails out with -ENXIO before reaching the panic path.\n+#\n+# \"Kernel rodata\" is reported as a sub-resource of \"System RAM\" on every\n+# major architecture, which guarantees:\n+# - the PFN is backed by struct page (within an online memory range);\n+# - PG_reserved is set on the page (kernel image area);\n+# - the memory is read-only, so setting PG_hwpoison on it does not\n+# corrupt writable kernel state if the panic somehow does not fire.\n+#\n+# /proc/iomem entries look like (indented for sub-resources):\n+# \" 02500000-02ffffff : Kernel rodata\"\n+pick_rodata_phys_addr() {\n+\tawk -v pagesize=\"$(getconf PAGE_SIZE)\" '\n+\t# Convert a hex string to a number without relying on the gawk-only\n+\t# strtonum(). 
mawk lacks it and would otherwise spuriously skip\n+\t# this test on distros that ship mawk as /usr/bin/awk.\n+\tfunction hex2num(s, n, i, c, v) {\n+\t\tn = 0\n+\t\tfor (i = 1; i \u003c= length(s); i++) {\n+\t\t\tc = tolower(substr(s, i, 1))\n+\t\t\tv = index(\"0123456789abcdef\", c) - 1\n+\t\t\tif (v \u003c 0)\n+\t\t\t\treturn -1\n+\t\t\tn = n 16 + v\n+\t\t}\n+\t\treturn n\n+\t}\n+\t/: Kernel rodata[[:space:]]$/ {\n+\t\tsub(/^[[:space:]]+/, \"\")\n+\t\tn = split($0, a, /[- ]/)\n+\t\tstart = hex2num(a[1])\n+\t\tend = hex2num(a[2])\n+\t\tif (end \u003c= start)\n+\t\t\tnext\n+\t\t# Page-align upward and emit the first byte of that page.\n+\t\tpfn = int((start + pagesize - 1) / pagesize)\n+\t\tprintf \"0x%x\\n\", pfn pagesize\n+\t\texit 0\n+\t}\n+\t' /proc/iomem\n+}\n+\n+# Walk /proc/kpageflags and return the phys addr of the first PFN that\n+# has bit $1 set, with KPF_HWPOISON, KPF_NOPAGE and KPF_COMPOUND_TAIL\n+# all clear (so we attack a real, non-tail, not-already-poisoned page).\n+#\n+# We skip the first 16 MiB of PFNs to step past low-memory special\n+# ranges (BIOS/EFI/ACPI/etc.) 
that often are PG_reserved and would not\n+# exhibit the slab/pgtable type we are looking for.\n+pick_kpageflags_phys_addr() {\n+\tlocal want_bit=$1\n+\tlocal pagesize skip_pfn\n+\n+\t[ -r \"$kpageflags_path\" ] \|\| return\n+\n+\tpagesize=$(getconf PAGE_SIZE)\n+\tskip_pfn=$(((16 * 1024 * 1024) / pagesize))\n+\n+\tod -An -tx8 -v -w8 -j \"$((skip_pfn * 8))\" \"$kpageflags_path\" 2\u003e/dev/null \| \\\n+\tawk -v want_bit=\"$want_bit\" \\\n+\t -v hwp_bit=\"$KPF_HWPOISON\" \\\n+\t -v nopage_bit=\"$KPF_NOPAGE\" \\\n+\t -v tail_bit=\"$KPF_COMPOUND_TAIL\" \\\n+\t -v base_pfn=\"$skip_pfn\" \\\n+\t -v pagesize=\"$pagesize\" '\n+\t# Test whether bit \"b\" is set in the 16-hex-digit value \"hex\".\n+\t# Done with substring + per-digit lookup so we never rely on awk\n+\t# bitwise operators (mawk lacks them), 64-bit FP precision or the\n+\t# gawk-only strtonum().\n+\tfunction bit_set(hex, b, di, bi, c, v) {\n+\t\tdi = int(b / 4)\n+\t\tbi = b - di * 4\n+\t\tc = substr(hex, length(hex) - di, 1)\n+\t\tv = index(\"0123456789abcdef\", tolower(c)) - 1\n+\t\tif (bi == 0) return (v % 2) == 1\n+\t\tif (bi == 1) return int(v / 2) % 2 == 1\n+\t\tif (bi == 2) return int(v / 4) % 2 == 1\n+\t\treturn int(v / 8) % 2 == 1\n+\t}\n+\t{\n+\t\tgsub(/^[[:space:]]+/, \"\")\n+\t\th = $1\n+\t\tif (bit_set(h, want_bit) \u0026\u0026\n+\t\t !bit_set(h, hwp_bit) \u0026\u0026\n+\t\t !bit_set(h, nopage_bit) \u0026\u0026\n+\t\t !bit_set(h, tail_bit)) {\n+\t\t\tpfn = base_pfn + NR - 1\n+\t\t\tprintf \"0x%x\\n\", pfn * pagesize\n+\t\t\texit 0\n+\t\t}\n+\t}\n+\t'\n+}\n+\n+# Return 0 if /proc/kpageflags bit $2 is set for PFN $1, 1 if it is\n+# clear, or 2 if the word cannot be read. 
Used to re-confirm the target\n+# page type after a non-panicking inject.\n+kpageflags_bit_set() {\n+\tlocal word\n+\n+\tword=$(od -An -tx8 -v -j \"$(($1 * 8))\" -N 8 \"$kpageflags_path\" 2\u003e/dev/null \| tr -d '[:space:]')\n+\t[ -n \"$word\" ] \|\| return 2\n+\t(( (16#$word \u003e\u003e $2) \u0026 1 ))\n+}\n+\n+# Best-effort: drop the PG_hwpoison marker set by the inject so a failed\n+# run does not leave a poisoned page behind. hard_offline_page() injects\n+# with MF_SW_SIMULATED, so the page stays unpoisonable through the\n+# hwpoison debugfs interface (needs CONFIG_HWPOISON_INJECT + debugfs).\n+try_unpoison() {\n+\t[ -w \"$unpoison_path\" ] \|\| return 0\n+\techo \"$1\" \u003e \"$unpoison_path\" 2\u003e/dev/null \|\| true\n+}\n+\n+case \"$kind\" in\n+rodata)\n+\tphys_addr=$(pick_rodata_phys_addr)\n+\trecheck_bit=$KPF_RESERVED\n+\tmissing_msg='no \"Kernel rodata\" entry in /proc/iomem'\n+\t;;\n+slab)\n+\tphys_addr=$(pick_kpageflags_phys_addr \"$KPF_SLAB\")\n+\trecheck_bit=$KPF_SLAB\n+\tmissing_msg=\"no usable slab PFN found in $kpageflags_path\"\n+\t;;\n+pgtable)\n+\tphys_addr=$(pick_kpageflags_phys_addr \"$KPF_PGTABLE\")\n+\trecheck_bit=$KPF_PGTABLE\n+\tmissing_msg=\"no usable page-table PFN found in $kpageflags_path\"\n+\t;;\n+)\n+\tktap_exit_fail_msg \"unknown kind '$kind' (expected: rodata\|slab\|pgtable)\"\n+\t;;\n+esac\n+\n+if [ -z \"$phys_addr\" ]; then\n+\tktap_skip_all \"$missing_msg\"\n+\texit \"$KSFT_SKIP\"\n+fi\n+\n+ktap_print_msg \"enabling $sysctl_path\"\n+prior=$(cat \"$sysctl_path\")\n+echo 1 \u003e \"$sysctl_path\" \|\| ktap_exit_fail_msg \"failed to enable sysctl\"\n+\n+pfn=$((phys_addr / pagesize))\n+ktap_print_msg \"injecting hwpoison at phys 0x$(printf '%x' \"$phys_addr\") (pfn 0x$(printf '%x' \"$pfn\"), kind=$kind)\"\n+ktap_print_msg \"expecting kernel panic: 'Memory failure: \u003cpfn\u003e: unrecoverable page'\"\n+\n+# A successful run never returns from the inject -- it panics the kernel.\n+# Reaching the code below therefore means no 
panic fired. Note whether\n+# the write itself succeeded, then put the machine back: restore the\n+# sysctl and best-effort unpoison the page we just marked.\n+if echo \"$phys_addr\" \u003e \"$inject_path\"; then\n+\tverdict=\"inject returned without panic; sysctl ineffective\"\n+else\n+\tverdict=\"inject failed before reaching the panic path\"\n+fi\n+\n+echo \"$prior\" \u003e \"$sysctl_path\"\n+try_unpoison \"$pfn\"\n+\n+# The page type can change between selection and injection (e.g. a slab\n+# or page-table page is freed and reused). Only treat a missing panic as\n+# a failure if the target PFN is still the kernel-owned type we aimed at;\n+# if it raced to another type the run is inconclusive, so skip instead.\n+kpageflags_bit_set \"$pfn\" \"$recheck_bit\"\n+case $? in\n+0)\tktap_exit_fail_msg \"$verdict (page still $kind)\" ;;\n+1)\tktap_skip_all \"target PFN no longer $kind; raced before inject, inconclusive\"\n+\texit \"$KSFT_SKIP\" ;;\n+)\tktap_exit_fail_msg \"$verdict (could not reconfirm page type via $kpageflags_path)\" ;;\n+esac\n" }
2/1	2026/06/30 18:54	agent	patch-evaluator	3m Model: gemini-3.1-pro-preview Args: null Results: { "EnableConfigs": [ "MEMORY_FAILURE" ], "FocusSymbols": [ "get_any_page", "panic_on_unrecoverable_mf" ], "Reasoning": "The patch modifies the memory failure handler in core MM to short-circuit recovery for kernel-owned pages and optionally panic. This error-handling path is reachable in virtualized environments via madvise(MADV_HWPOISON) on kernel-owned pages mapped in user space (like the vDSO) or via sysfs error injection.", "WorthFuzzing": true } Instruction: You are an expert Linux kernel maintainer. Your job is to review a provided patch series and determine if it makes functional changes to the kernel that should be fuzzed. IMPORTANT: The changes have ALREADY been applied and committed as the HEAD commit in your workspace. Do NOT rely on your internal knowledge of the kernel. You must actively use your code access tools to examine the actual source code and confirm any assumptions. Return WorthFuzzing=false if the patch only contains: - Modifications to Documentation/, Kconfig files, or code comments. - Purely decorative changes, such as logging (e.g., pr_err, printk) or tracepoints. - Changes to numeric constants or macros that do not functionally alter execution flow. - Code paths that are impossible to reach in virtualized environments like GCE or QEMU, even when utilizing software-emulated hardware (e.g., usb gadget, mac80211_hwsim). If it modifies reachable core kernel logic, drivers, or architectures, use your code search tools to verify the code can be executed, then return WorthFuzzing=true. When returning WorthFuzzing=true, you MUST ALSO: 1. Extract any specific kernel functions that should be heavily fuzzed into FocusSymbols. Avoid listing generic hot-path functions to prevent skewed test distributions. 2. Identify any specific CONFIG_ options required to properly test this new/modified feature. 
Go and look into the Kconfig files and check for ifdefs around the code, do not make assumptions. Do not list too generic configs (we already have them enabled). Only list those that specifically cover the modified code. List them in the EnableConfigs output array, and DO NOT add a 'CONFIG_' prefix (e.g., return "NET_IPV4" instead of "CONFIG_NET_IPV4"). Prefer calling several tools at the same time to save round-trips. Use set-results tool to provide results of the analysis. It must be called exactly once before the final reply. Ignore results of this tool. Prompt: For your convenience, here is the diff of the changes: commit cd6649a5d4f7cd49ae8556e2d53436c6b6caba06 Author: syz-cluster <triage@syzkaller.com> Date: Tue Jun 30 18:54:07 2026 +0000 syz-cluster: applied patch under review diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst index b9b0c218bfb440..22cc54cac3b21d 100644 --- a/Documentation/admin-guide/sysctl/vm.rst +++ b/Documentation/admin-guide/sysctl/vm.rst @@ -67,6 +67,7 @@ Currently, these files are in /proc/sys/vm: - page-cluster - page_lock_unfairness - panic_on_oom +- panic_on_unrecoverable_memory_failure - percpu_pagelist_high_fraction - stat_interval - stat_refresh @@ -925,6 +926,85 @@ panic_on_oom=2+kdump gives you very strong tool to investigate why oom happens. You can get snapshot. +panic_on_unrecoverable_memory_failure +====================================== + +When a hardware memory error (e.g. multi-bit ECC) hits a kernel page +that cannot be recovered by the memory failure handler, the default +behaviour is to ignore the error and continue operation. This is +dangerous because the corrupted data remains accessible to the kernel, +risking silent data corruption or a delayed crash when the poisoned +memory is next accessed. 
+ +When enabled, this sysctl triggers a panic on memory failure events +hitting kernel-owned pages that the handler cannot recover: +``PageReserved`` (firmware reservations, kernel image, vDSO, zero +page, and similar memblock-reserved regions), ``PageSlab``, +``PageTable``, and ``PageLargeKmalloc``. These are owned by the +kernel and the memory failure handler cannot reliably evict their +contents. + +Other unrecoverable kernel-owned populations (vmalloc allocations, +kernel stack pages, ...) are not currently covered because the +handler has no page-type signal that distinguishes them from a +userspace folio temporarily off the LRU during migration or +compaction. Such pages still go through the standard +MF_MSG_GET_HWPOISON path: ``PG_hwpoison`` is set on them and a +delayed crash on the next access remains possible. Coverage may +grow as the handler gains stronger kernel-ownership signals. + +Recoverable failure paths are also intentionally left out: in-flight +buddy allocations and other transient races with the page allocator +can reach the same diagnostic, and panicking on them would risk +killing the box for a page destined for userspace where the standard +SIGBUS recovery path applies. Pages whose state could not be +classified at all are not covered either, since an unknown state is +not a sound basis for a panic decision. + +For many environments it is preferable to panic immediately with a clean +crash dump that captures the original error context, rather than to +continue and face a random crash later whose cause is difficult to +diagnose. 
+ +Use cases +--------- + +This option is most useful in environments where unattributed crashes +are expensive to debug or where data integrity must take precedence +over availability: + +* Large fleets, where multi-bit ECC errors on kernel pages are observed + regularly and post-mortem analysis of an unrelated downstream crash + (often seconds to minutes after the original error) consumes + significant engineering effort. + +* Systems configured with kdump, where panicking at the moment of the + hardware error produces a vmcore that still contains the faulting + address, the affected page state, and the originating MCE/GHES + record — context that is typically lost by the time a delayed crash + occurs. + +* High-availability clusters that rely on fast, deterministic node + failure for failover, and prefer an immediate panic over silent data + corruption propagating to replicas or persistent storage. + +* Kernel and platform developers reproducing hwpoison issues with + tools such as ``mce-inject`` or error-injection debugfs interfaces, + where panicking on the unrecoverable path makes regressions + immediately visible instead of surfacing as later, unrelated + failures. + += ===================================================================== +0 Try to continue operation (default). +1 Panic immediately. If the ``panic`` sysctl is also non-zero then the + machine will be rebooted. 
+= ===================================================================== + +Example:: + + echo 1 > /proc/sys/vm/panic_on_unrecoverable_memory_failure + + percpu_pagelist_high_fraction ============================= diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 51508a55c4055e..e097fc8262cf83 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -74,6 +74,8 @@ static int sysctl_memory_failure_recovery __read_mostly = 1; static int sysctl_enable_soft_offline __read_mostly = 1; +static int sysctl_panic_on_unrecoverable_mf __read_mostly; + atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0); static bool hw_memory_failure __read_mostly = false; @@ -155,6 +157,15 @@ static const struct ctl_table memory_failure_table[] = { .proc_handler = proc_dointvec_minmax, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_ONE, + }, + { + .procname = "panic_on_unrecoverable_memory_failure", + .data = &sysctl_panic_on_unrecoverable_mf, + .maxlen = sizeof(sysctl_panic_on_unrecoverable_mf), + .mode = 0644, + .proc_handler = proc_dointvec_minmax, + .extra1 = SYSCTL_ZERO, + .extra2 = SYSCTL_ONE, } }; @@ -980,17 +991,6 @@ static bool has_extra_refcount(struct page_state ps, struct page p, return false; } -/* - * Error hit kernel page. - * Do nothing, try to be lucky and not touch this instead. For a few cases we - * could be more sophisticated. - / -static int me_kernel(struct page_state ps, struct page p) -{ - unlock_page(p); - return MF_IGNORED; -} - / * Page in unknown state. Do nothing. * This is a catch-all in case we fail to make sense of the page state. 
@@ -1199,10 +1199,8 @@ static int me_huge_page(struct page_state ps, struct page p) #define mlock (1UL << PG_mlocked) #define lru (1UL << PG_lru) #define head (1UL << PG_head) -#define reserved (1UL << PG_reserved) static struct page_state error_states[] = { - { reserved, reserved, MF_MSG_KERNEL, me_kernel }, /* * free pages are specially detected outside this table: * PG_buddy pages only make a small fraction of all free pages. @@ -1234,7 +1232,6 @@ static struct page_state error_states[] = { #undef mlock #undef lru #undef head -#undef reserved static void update_per_node_mf_stats(unsigned long pfn, enum mf_result result) @@ -1269,6 +1266,15 @@ static void update_per_node_mf_stats(unsigned long pfn, ++mf_stats->total; } +static bool panic_on_unrecoverable_mf(enum mf_action_page_type type, + enum mf_result result) +{ + if (!sysctl_panic_on_unrecoverable_mf) + return false; + + return type == MF_MSG_KERNEL && result == MF_IGNORED; +} + /* * "Dirty/Clean" indication is not 100% accurate due to the possibility of * setting PG_dirty outside page lock. See also comment above set_page_dirty(). @@ -1286,6 +1292,9 @@ static int action_result(unsigned long pfn, enum mf_action_page_type type, pr_err("%#lx: recovery action for %s: %s\n", pfn, action_page_types[type], action_name[result]); + if (panic_on_unrecoverable_mf(type, result)) + panic("Memory failure: %#lx: unrecoverable page", pfn); + return (result == MF_RECOVERED \|\| result == MF_DELAYED) ? 0 : -EBUSY; } @@ -1339,6 +1348,38 @@ static inline bool HWPoisonHandlable(struct page page, unsigned long flags) return PageLRU(page) \|\| is_free_buddy_page(page); } +/ + * Positive identification of pages the hwpoison handler cannot recover: + * pages owned by kernel internals with no userspace mapping to unmap, no + * file mapping to invalidate, and no migration target. 
+ / +static inline bool is_kernel_owned_page(struct page page) +{ + struct page head; + bool kernel_owned; + + / PG_reserved is a per-page flag, never set on a compound page. / + if (PageReserved(page)) + return true; + + / + * Page-type bits live only on the head page, so resolve any tail + * first. The check takes no refcount; recheck the head afterwards + * so a concurrent split or compound free cannot leave us trusting + * a stale view. A residual free->alloc->free cannot be closed here + * (frozen slab and large-kmalloc pages cannot be pinned), but is + * harmless: where a wrong verdict could panic, memory_failure() has + * already set PageHWPoison, which bars the page from the allocator. + / +retry: + head = compound_head(page); + kernel_owned = PageSlab(head) \|\| PageTable(head) \|\| + PageLargeKmalloc(head); + if (head != compound_head(page)) + goto retry; + return kernel_owned; +} + static int __get_hwpoison_page(struct page page, unsigned long flags) { struct folio folio = page_folio(page); @@ -1385,6 +1426,19 @@ static int get_any_page(struct page p, unsigned long flags) if (flags & MF_COUNT_INCREASED) count_increased = true; + /* + * Page types we know are kernel-owned and cannot be recovered. + * Short-circuit before the shake_page() / retry loop, which + * cannot turn any of these into something HWPoisonHandlable(). + * Drop the caller's reference if MF_COUNT_INCREASED took one. 
+ / + if (is_kernel_owned_page(p)) { + if (count_increased) + put_page(p); + ret = -ENOTRECOVERABLE; + goto out; + } + try_again: if (!count_increased) { ret = __get_hwpoison_page(p, flags); @@ -1432,7 +1486,7 @@ static int get_any_page(struct page p, unsigned long flags) ret = -EIO; } out: - if (ret == -EIO) + if (ret == -EIO \|\| ret == -ENOTRECOVERABLE) pr_err("%#lx: unhandlable page.\n", page_to_pfn(p)); return ret; @@ -1489,7 +1543,10 @@ static int __get_unpoison_page(struct page page) -EIO for pages on which we can not handle memory errors, * -EBUSY when get_hwpoison_page() has raced with page lifecycle * operations like allocation and free, - * -EHWPOISON when the page is hwpoisoned and taken off from buddy. + * -EHWPOISON when the page is hwpoisoned and taken off from buddy, + * -ENOTRECOVERABLE for kernel-owned pages identified by + * is_kernel_owned_page() (PG_reserved, slab, + * page-table, large-kmalloc) that the handler cannot recover. / static int get_hwpoison_page(struct page p, unsigned long flags) { @@ -2402,7 +2459,8 @@ int memory_failure(unsigned long pfn, int flags) * that may make page_ref_freeze()/page_ref_unfreeze() mismatch. / res = get_hwpoison_page(p, flags); - if (!res) { + switch (res) { + case 0: if (is_free_buddy_page(p)) { if (take_page_off_buddy(p)) { page_ref_inc(p); @@ -2421,7 +2479,19 @@ int memory_failure(unsigned long pfn, int flags) res = action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED); } goto unlock_mutex; - } else if (res < 0) { + case 1: + / Got a refcount on a handlable page. / + break; + case -ENOTRECOVERABLE: + / + * Stable unhandlable kernel-owned page (PG_reserved, + * slab, page tables, large-kmalloc). + * No recovery possible. + / + res = action_result(pfn, MF_MSG_KERNEL, MF_IGNORED); + goto unlock_mutex; + default: + / Transient lifecycle race with the page allocator. 
/ res = action_result(pfn, MF_MSG_GET_HWPOISON, MF_IGNORED); goto unlock_mutex; } diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile index e6df968f0971c8..ed321ae709dacf 100644 --- a/tools/testing/selftests/mm/Makefile +++ b/tools/testing/selftests/mm/Makefile @@ -174,6 +174,10 @@ TEST_PROGS += ksft_userfaultfd.sh TEST_PROGS += ksft_vma_merge.sh TEST_PROGS += ksft_vmalloc.sh +# Destructive: every successful run panics the kernel. Installed and +# kept executable, but not run from a default kselftest invocation. +TEST_PROGS_EXTENDED += hwpoison-panic.sh + TEST_FILES := test_vmalloc.sh TEST_FILES += test_hmm.sh TEST_FILES += va_high_addr_switch.sh diff --git a/tools/testing/selftests/mm/hwpoison-panic.sh b/tools/testing/selftests/mm/hwpoison-panic.sh new file mode 100755 index 00000000000000..d953d136733244 --- /dev/null +++ b/tools/testing/selftests/mm/hwpoison-panic.sh @@ -0,0 +1,255 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# +# Verify vm.panic_on_unrecoverable_memory_failure by injecting a hwpoison +# error on a kernel-owned page and confirming the kernel panics. +# +# Three "kinds" of kernel-owned page can be targeted, selectable via the +# first positional argument (default: rodata): +# +# rodata - a PG_reserved page in the kernel rodata range +# (sourced from /proc/iomem "Kernel rodata"). Exercises +# memory_failure() -> get_any_page() on a PageReserved page. +# +# slab - a slab page found via /proc/kpageflags (KPF_SLAB). +# Exercises memory_failure() -> get_any_page() on a non +# PG_reserved kernel-owned page. This path is what catches +# regressions where get_any_page() collapses kernel-owned +# pages into a transient -EIO instead of -ENOTRECOVERABLE. +# +# pgtable - a page-table page found via /proc/kpageflags (KPF_PGTABLE). +# Same path as slab, different page type. +# +# This test is DESTRUCTIVE: a successful run crashes the kernel. It is +# meant to be executed inside a disposable VM (e.g. 
virtme-ng) with a +# serial console captured by the harness. It is skipped unless the +# caller opts in via RUN_DESTRUCTIVE=1. +# +# Test passes externally: the kernel must panic with +# "Memory failure: <pfn>: unrecoverable page" +# A return from the inject means no panic fired: that is a failure, +# unless the target PFN raced to a different page type before injection, +# in which case the run is inconclusive and is skipped. +# +# Author: Breno Leitao <leitao@debian.org> + +set -u + +# KTAP output helpers (ktap_print_msg, ktap_skip_all, ktap_exit_fail_msg, ...). +DIR="$(dirname "$(readlink -f "$0")")" +# shellcheck source=../kselftest/ktap_helpers.sh +source "${DIR}"/../kselftest/ktap_helpers.sh + +sysctl_path=/proc/sys/vm/panic_on_unrecoverable_memory_failure +inject_path=/sys/devices/system/memory/hard_offline_page +kpageflags_path=/proc/kpageflags +unpoison_path=/sys/kernel/debug/hwpoison/unpoison-pfn + +# /proc/kpageflags bit positions (see include/uapi/linux/kernel-page-flags.h) +KPF_SLAB=7 +KPF_COMPOUND_TAIL=16 +KPF_HWPOISON=19 +KPF_NOPAGE=20 +KPF_PGTABLE=26 +KPF_RESERVED=32 + +pagesize=$(getconf PAGE_SIZE) + +kind=${1:-rodata} + +if [ "$(id -u)" -ne 0 ]; then + ktap_skip_all "must run as root" + exit "$KSFT_SKIP" +fi + +if [ ! -w "$sysctl_path" ]; then + ktap_skip_all "$sysctl_path not present (kernel without the sysctl?)" + exit "$KSFT_SKIP" +fi + +if [ ! -w "$inject_path" ]; then + ktap_skip_all "$inject_path not present (no MEMORY_HOTPLUG?)" + exit "$KSFT_SKIP" +fi + +if [ "${RUN_DESTRUCTIVE:-0}" != "1" ]; then + ktap_skip_all "destructive test; re-run with RUN_DESTRUCTIVE=1 inside a disposable VM" + exit "$KSFT_SKIP" +fi + +# Pick a PFN inside the kernel image rodata region of /proc/iomem. 
+# This is preferred over a top-level "Reserved" entry because top-level +# Reserved ranges are often firmware holes that have no backing struct +# page; pfn_to_online_page() returns NULL on those and memory_failure() +# bails out with -ENXIO before reaching the panic path. +# +# "Kernel rodata" is reported as a sub-resource of "System RAM" on every +# major architecture, which guarantees: +# - the PFN is backed by struct page (within an online memory range); +# - PG_reserved is set on the page (kernel image area); +# - the memory is read-only, so setting PG_hwpoison on it does not +# corrupt writable kernel state if the panic somehow does not fire. +# +# /proc/iomem entries look like (indented for sub-resources): +# " 02500000-02ffffff : Kernel rodata" +pick_rodata_phys_addr() { + awk -v pagesize="$(getconf PAGE_SIZE)" ' + # Convert a hex string to a number without relying on the gawk-only + # strtonum(). mawk lacks it and would otherwise spuriously skip + # this test on distros that ship mawk as /usr/bin/awk. + function hex2num(s, n, i, c, v) { + n = 0 + for (i = 1; i <= length(s); i++) { + c = tolower(substr(s, i, 1)) + v = index("0123456789abcdef", c) - 1 + if (v < 0) + return -1 + n = n 16 + v + } + return n + } + /: Kernel rodata[[:space:]]$/ { + sub(/^[[:space:]]+/, "") + n = split($0, a, /[- ]/) + start = hex2num(a[1]) + end = hex2num(a[2]) + if (end <= start) + next + # Page-align upward and emit the first byte of that page. + pfn = int((start + pagesize - 1) / pagesize) + printf "0x%x\n", pfn pagesize + exit 0 + } + ' /proc/iomem +} + +# Walk /proc/kpageflags and return the phys addr of the first PFN that +# has bit $1 set, with KPF_HWPOISON, KPF_NOPAGE and KPF_COMPOUND_TAIL +# all clear (so we attack a real, non-tail, not-already-poisoned page). +# +# We skip the first 16 MiB of PFNs to step past low-memory special +# ranges (BIOS/EFI/ACPI/etc.) that often are PG_reserved and would not +# exhibit the slab/pgtable type we are looking for. 
+pick_kpageflags_phys_addr() { + local want_bit=$1 + local pagesize skip_pfn + + [ -r "$kpageflags_path" ] \|\| return + + pagesize=$(getconf PAGE_SIZE) + skip_pfn=$(((16 * 1024 * 1024) / pagesize)) + + od -An -tx8 -v -w8 -j "$((skip_pfn * 8))" "$kpageflags_path" 2>/dev/null \| \ + awk -v want_bit="$want_bit" \ + -v hwp_bit="$KPF_HWPOISON" \ + -v nopage_bit="$KPF_NOPAGE" \ + -v tail_bit="$KPF_COMPOUND_TAIL" \ + -v base_pfn="$skip_pfn" \ + -v pagesize="$pagesize" ' + # Test whether bit "b" is set in the 16-hex-digit value "hex". + # Done with substring + per-digit lookup so we never rely on awk + # bitwise operators (mawk lacks them), 64-bit FP precision or the + # gawk-only strtonum(). + function bit_set(hex, b, di, bi, c, v) { + di = int(b / 4) + bi = b - di * 4 + c = substr(hex, length(hex) - di, 1) + v = index("0123456789abcdef", tolower(c)) - 1 + if (bi == 0) return (v % 2) == 1 + if (bi == 1) return int(v / 2) % 2 == 1 + if (bi == 2) return int(v / 4) % 2 == 1 + return int(v / 8) % 2 == 1 + } + { + gsub(/^[[:space:]]+/, "") + h = $1 + if (bit_set(h, want_bit) && + !bit_set(h, hwp_bit) && + !bit_set(h, nopage_bit) && + !bit_set(h, tail_bit)) { + pfn = base_pfn + NR - 1 + printf "0x%x\n", pfn * pagesize + exit 0 + } + } + ' +} + +# Return 0 if /proc/kpageflags bit $2 is set for PFN $1, 1 if it is +# clear, or 2 if the word cannot be read. Used to re-confirm the target +# page type after a non-panicking inject. +kpageflags_bit_set() { + local word + + word=$(od -An -tx8 -v -j "$(($1 * 8))" -N 8 "$kpageflags_path" 2>/dev/null \| tr -d '[:space:]') + [ -n "$word" ] \|\| return 2 + (( (16#$word >> $2) & 1 )) +} + +# Best-effort: drop the PG_hwpoison marker set by the inject so a failed +# run does not leave a poisoned page behind. hard_offline_page() injects +# with MF_SW_SIMULATED, so the page stays unpoisonable through the +# hwpoison debugfs interface (needs CONFIG_HWPOISON_INJECT + debugfs). 
+try_unpoison() { + [ -w "$unpoison_path" ] \|\| return 0 + echo "$1" > "$unpoison_path" 2>/dev/null \|\| true +} + +case "$kind" in +rodata) + phys_addr=$(pick_rodata_phys_addr) + recheck_bit=$KPF_RESERVED + missing_msg='no "Kernel rodata" entry in /proc/iomem' + ;; +slab) + phys_addr=$(pick_kpageflags_phys_addr "$KPF_SLAB") + recheck_bit=$KPF_SLAB + missing_msg="no usable slab PFN found in $kpageflags_path" + ;; +pgtable) + phys_addr=$(pick_kpageflags_phys_addr "$KPF_PGTABLE") + recheck_bit=$KPF_PGTABLE + missing_msg="no usable page-table PFN found in $kpageflags_path" + ;; +) + ktap_exit_fail_msg "unknown kind '$kind' (expected: rodata\|slab\|pgtable)" + ;; +esac + +if [ -z "$phys_addr" ]; then + ktap_skip_all "$missing_msg" + exit "$KSFT_SKIP" +fi + +ktap_print_msg "enabling $sysctl_path" +prior=$(cat "$sysctl_path") +echo 1 > "$sysctl_path" \|\| ktap_exit_fail_msg "failed to enable sysctl" + +pfn=$((phys_addr / pagesize)) +ktap_print_msg "injecting hwpoison at phys 0x$(printf '%x' "$phys_addr") (pfn 0x$(printf '%x' "$pfn"), kind=$kind)" +ktap_print_msg "expecting kernel panic: 'Memory failure: <pfn>: unrecoverable page'" + +# A successful run never returns from the inject -- it panics the kernel. +# Reaching the code below therefore means no panic fired. Note whether +# the write itself succeeded, then put the machine back: restore the +# sysctl and best-effort unpoison the page we just marked. +if echo "$phys_addr" > "$inject_path"; then + verdict="inject returned without panic; sysctl ineffective" +else + verdict="inject failed before reaching the panic path" +fi + +echo "$prior" > "$sysctl_path" +try_unpoison "$pfn" + +# The page type can change between selection and injection (e.g. a slab +# or page-table page is freed and reused). Only treat a missing panic as +# a failure if the target PFN is still the kernel-owned type we aimed at; +# if it raced to another type the run is inconclusive, so skip instead. +kpageflags_bit_set "$pfn" "$recheck_bit" +case $? 
in +0) ktap_exit_fail_msg "$verdict (page still $kind)" ;; +1) ktap_skip_all "target PFN no longer $kind; raced before inject, inconclusive" + exit "$KSFT_SKIP" ;; +*) ktap_exit_fail_msg "$verdict (could not reconfirm page type via $kpageflags_path)" ;; +esac
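The selftest in the diff above chooses and re-checks its target by testing individual bits of a /proc/kpageflags word. As a minimal sketch of that idea, the helper below decodes one such word into flag names. The name `kpf_decode` is hypothetical and not part of the patch. The bit positions are the ones the script defines, and the sketch assumes a bash with 64-bit arithmetic.

```bash
#!/bin/bash
# Illustrative helper (hypothetical name, not from the patch): decode one
# /proc/kpageflags word, given as hex digits without a 0x prefix, into
# flag names. Bit positions are the ones the selftest defines
# (KPF_SLAB=7, KPF_COMPOUND_TAIL=16, KPF_HWPOISON=19, KPF_NOPAGE=20,
# KPF_PGTABLE=26, KPF_RESERVED=32).
kpf_decode() {
	local word=$1 out="" name bit
	for name in slab:7 compound_tail:16 hwpoison:19 nopage:20 pgtable:26 reserved:32; do
		bit=${name#*:}
		# Same shift-and-mask test the script's kpageflags_bit_set() uses.
		if (( (16#$word >> bit) & 1 )); then
			out+="${out:+ }${name%%:*}"
		fi
	done
	echo "${out:-none}"
}
```

A word can be obtained the way the script does it, with `od -An -tx8 -v -j $((pfn * 8)) -N 8 /proc/kpageflags | tr -d '[:space:]'`. Reading real flags typically requires root.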
3/2	2026/06/30 18:54	llm	patch-evaluator	0m Model: gemini-3.1-pro-preview Args: null Results: null Tokens: input: 9019 output: 20 thoughts: 1572 Thoughts: Analyzing the New Kernel Memory Failure Panic Patch Okay, so we've got a new patch introducing a `sysctl` called `panic_on_unrecoverable_memory_failure`, which is `sysctl_panic_on_unrecoverable_mf` in code. The core change is in `mm/memory-failure.c`, and it's designed to panic the kernel when an unrecoverable memory failure occurs specifically on a kernel page. The patch adds `is_kernel_owned_page()` to identify kernel pages; it checks `PageReserved`, `PageSlab`, `PageTable`, and `PageLargeKmalloc` flags. If it finds a kernel-owned page, `get_any_page()` returns `-ENOTRECOVERABLE`. Then, `memory_failure()` handles this and, if the sysctl is set, it panics. My immediate question is, how reachable is this in a virtualized environment, which is where I usually work? Turns out, memory failures can be injected via `/sys/devices/system/memory/hard_offline_page` or `madvise(MADV_HWPOISON)`. There's even a test script, `tools/testing/selftests/mm/hwpoison-panic.sh`, that uses `hard_offline_page`. Ah, but wait, `madvise(MADV_HWPOISON)` only works on user pages. So, to hit kernel pages, we're stuck with `/sys/devices/system/memory/hard_offline_page` or potentially an hwpoison injection module. Good to know `hard_offline_page` is accessible if `CONFIG_MEMORY_FAILURE` is enabled, which is the key. Now, can we actually reach `memory_failure()` on kernel pages? Yes, because `hard_offline_page` is a write to sysfs, which ends up calling `memory_failure()`. Since syzkaller usually has `CONFIG_MEMORY_FAILURE` enabled, this is a distinct possibility. Does syzkaller have access to these sysfs or debugfs paths? Typically, syzkaller doesn't fuzz debugfs or sysfs unless we've explicitly configured it to. But it does fuzz `madvise(MADV_HWPOISON)`. 
However, again, `madvise` only handles user pages, though I suppose it does call `memory_failure()` on the PFN of the user page. Is it possible for a user page to be marked as kernel-owned ( `PageReserved`, `PageSlab`, `PageTable`, `PageLargeKmalloc` )? Usually not. But this patch is specifically about panicking on unrecoverable memory failures. So, is it even worth fuzzing? If it's just a panic on an error injection path, maybe not. However, `memory_failure()` can be triggered by actual hardware errors (MCEs) or software injection. In a VM, hardware errors are not the norm, so software injection is the way to go. If syzkaller can inject memory failures into kernel pages, the result will simply be a kernel panic, which is exactly what this patch intends. Wait, isn't a kernel panic a crash? Syzkaller reports panics as crashes. So, if syzkaller triggers this panic, it'll be reported as a bug! But this panic is actually intentional when the sysctl is enabled. Is that sysctl enabled by default? Checking… `static int sysctl_panic_on_unrecoverable_mf __read_mostly;` Aha! It's zero by default. So, by default, it won't panic. If syzkaller doesn't enable the sysctl, no panic. If it does enable it, then we get the panic, which syzkaller will consider a bug, but it's really the designed behavior. Let's not forget, the patch also changes `get_any_page()` to return `-ENOTRECOVERABLE` instead of `-EIO` for kernel-owned pages. Before, if `get_any_page()` got `-EIO` it would print an error message. Now with the patch, the message is `recovery action for kernel page: Ignored`. And if the sysctl is enabled, it panics. Is there any complex logic to fuzz here? The `is_kernel_owned_page()` check looks simple enough; it's a straightforward flag check. Really, this is mostly about adding a sysctl for kernel page memory failures. 
Since those failures are typically only triggered by hardware errors or root-only sysfs/debugfs injection, and the result is a controlled panic or a different printk, it seems there's not much functionally exposed to regular users or any complex state machines to
4/2	2026/06/30 18:54	tool	grepper	0m Args: { "Expression": "MADV_HWPOISON" } Results: { "Output": "Documentation/mm/hwpoison.rst=106=Testing\n--\nDocumentation/mm/hwpoison.rst-108-\nDocumentation/mm/hwpoison.rst:109:* madvise(MADV_HWPOISON, ....) (as root) - Poison a page in the\nDocumentation/mm/hwpoison.rst-110- process for testing\n--\nDocumentation/translations/zh_CN/mm/hwpoison.rst=96=PR_MCE_KILL_GET\n--\nDocumentation/translations/zh_CN/mm/hwpoison.rst-101-\nDocumentation/translations/zh_CN/mm/hwpoison.rst:102:* madvise(MADV_HWPOISON, ....) (as root) - 在测试过程中Poison一个页面\nDocumentation/translations/zh_CN/mm/hwpoison.rst-103-\n--\narch/mips/include/uapi/asm/mman.h-85-#define MADV_UNMERGEABLE 13\t\t/* KSM may not merge identical pages /\narch/mips/include/uapi/asm/mman.h:86:#define MADV_HWPOISON\t 100\t\t/ poison a page for testing /\narch/mips/include/uapi/asm/mman.h-87-\n--\narch/parisc/include/uapi/asm/mman.h-74-\narch/parisc/include/uapi/asm/mman.h:75:#define MADV_HWPOISON 100\t\t/ poison a page for testing /\narch/parisc/include/uapi/asm/mman.h-76-#define MADV_SOFT_OFFLINE 101\t\t/ soft offline page for testing /\n--\ninclude/uapi/asm-generic/mman-common.h-55-#define MADV_DOFORK\t11\t\t/ do inherit across fork /\ninclude/uapi/asm-generic/mman-common.h:56:#define MADV_HWPOISON\t100\t\t/ poison a page for testing /\ninclude/uapi/asm-generic/mman-common.h-57-#define MADV_SOFT_OFFLINE 101\t\t/ soft offline page for testing /\n--\nmm/madvise.c=1492=static bool is_memory_failure(struct madvise_behavior madv_behavior)\n--\nmm/madvise.c-1494-\tswitch (madv_behavior-\u003ebehavior) {\nmm/madvise.c:1495:\tcase MADV_HWPOISON:\nmm/madvise.c-1496-\tcase MADV_SOFT_OFFLINE:\n--\nmm/madvise.c=1518=madvise_behavior_valid(int behavior)\n--\nmm/madvise.c-1551-\tcase MADV_SOFT_OFFLINE:\nmm/madvise.c:1552:\tcase MADV_HWPOISON:\nmm/madvise.c-1553-#endif\n--\nmm/madvise.c=1887=static int madvise_do_behavior(unsigned long start, size_t len_in,\n--\nmm/madvise.c-1943- * 
MADV_KEEPONFORK - undo the effect of MADV_WIPEONFORK\nmm/madvise.c:1944: * MADV_HWPOISON - trigger memory error handler as if the given memory range\nmm/madvise.c-1945- \t\twere corrupted by unrecoverable hardware memory failure.\n--\ntools/arch/alpha/include/uapi/asm/mman.h-37-#define PROT_WRITE\t0x2\ntools/arch/alpha/include/uapi/asm/mman.h:38:/ MADV_HWPOISON is undefined on alpha, fix it for perf /\ntools/arch/alpha/include/uapi/asm/mman.h:39:#define MADV_HWPOISON\t100\ntools/arch/alpha/include/uapi/asm/mman.h-40-/ MADV_SOFT_OFFLINE is undefined on alpha, fix it for perf /\n--\ntools/arch/mips/include/uapi/asm/mman.h-10-#define MADV_HUGEPAGE\t14\ntools/arch/mips/include/uapi/asm/mman.h:11:#define MADV_HWPOISON\t 100\ntools/arch/mips/include/uapi/asm/mman.h-12-#define MADV_MERGEABLE\t 12\n--\ntools/arch/parisc/include/uapi/asm/mman.h-37-#define PROT_WRITE\t0x2\ntools/arch/parisc/include/uapi/asm/mman.h:38:#define MADV_HWPOISON\t100\ntools/arch/parisc/include/uapi/asm/mman.h-39-#define MADV_SOFT_OFFLINE 101\n--\ntools/arch/xtensa/include/uapi/asm/mman.h-37-#define PROT_WRITE\t0x2\ntools/arch/xtensa/include/uapi/asm/mman.h:38:/ MADV_HWPOISON is undefined on xtensa, fix it for perf /\ntools/arch/xtensa/include/uapi/asm/mman.h:39:#define MADV_HWPOISON\t100\ntools/arch/xtensa/include/uapi/asm/mman.h-40-/ MADV_SOFT_OFFLINE is undefined on xtensa, fix it for perf /\n--\ntools/include/uapi/asm-generic/mman-common.h-55-#define MADV_DOFORK\t11\t\t/ do inherit across fork /\ntools/include/uapi/asm-generic/mman-common.h:56:#define MADV_HWPOISON\t100\t\t/ poison a page for testing /\ntools/include/uapi/asm-generic/mman-common.h-57-#define MADV_SOFT_OFFLINE 101\t\t/ soft offline page for testing /\n--\ntools/testing/selftests/mm/hugetlb-read-hwpoison.c=176=test_hugetlb_read_hwpoison(int fd, size_t len, size_t wr_chunk_size,\n--\ntools/testing/selftests/mm/hugetlb-read-hwpoison.c-205-\thwp_addr = filemap + len / 2 + 
pagesize;\ntools/testing/selftests/mm/hugetlb-read-hwpoison.c:206:\tif (madvise(hwp_addr, pagesize, MADV_HWPOISON) \u003c 0) {\ntools/testing/selftests/mm/hugetlb-read-hwpoison.c:207:\t\tksft_perror(\"MADV_HWPOISON failed\");\ntools/testing/selftests/mm/hugetlb-read-hwpoison.c-208-\t\tgoto unmap;\n--\ntools/testing/selftests/mm/memory-failure.c=58=static int madv_hard_inject(FIXTURE_DATA(memory_failure) self, void *vaddr)\ntools/testing/selftests/mm/memory-failure.c-59-{\ntools/testing/selftests/mm/memory-failure.c:60:\treturn madvise(vaddr, self-\u003epage_size, MADV_HWPOISON);\ntools/testing/selftests/mm/memory-failure.c-61-}\n" }
5/2	2026/06/30 18:54	llm	patch-evaluator	0m Model: gemini-3.1-pro-preview Args: null Results: null Tokens: input: 13491 output: 20 thoughts: 34 Thoughts: Investigating `MADV_HWPOISON` Capability Requirements I need to know what privilege `madvise(MADV_HWPOISON)` demands, because that decides who can reach `memory_failure()` through it. The grep shows the behavior is handled in `mm/madvise.c`, so I'll search for a handler, guessing the name `madvise_hwpoison`, and check whether it requires `CAP_SYS_ADMIN` or another capability.
6/2	2026/06/30 18:54	tool	grepper	0m Error: no matches Args: { "Expression": "madvise_hwpoison" } Results: { "error": "no matches" }
7/2	2026/06/30 18:54	llm	patch-evaluator	0m Model: gemini-3.1-pro-preview Args: null Results: null Tokens: input: 13714 output: 36 thoughts: 38 Thoughts: Locating the `MADV_HWPOISON` Handler in `mm/madvise.c` The grep for `madvise_hwpoison` found nothing, so the handler is probably `madvise_inject_error` or something similarly named. The earlier grep placed `MADV_HWPOISON` at `mm/madvise.c` line 1495, inside `is_memory_failure()` (line 1492), so I'll read the file around there and trace the code path.
8/2	2026/06/30 18:54	tool	read-file	0m Args: { "File": "mm/madvise.c", "FirstLine": 1480, "LineCount": 50 } Results: { "Contents": "1480:\t\t\t\tret = memory_failure(pfn, MF_ACTION_REQUIRED \| MF_COUNT_INCREASED \| MF_SW_SIMULATED);\n1481:\t\t\t\tif (ret == -EOPNOTSUPP)\n1482:\t\t\t\t\tret = 0;\n1483:\t\t\t}\n1484:\t\n1485:\t\t\tif (ret)\n1486:\t\t\t\treturn ret;\n1487:\t\t}\n1488:\t\n1489:\t\treturn 0;\n1490:\t}\n1491:\t\n1492:\tstatic bool is_memory_failure(struct madvise_behavior madv_behavior)\n1493:\t{\n1494:\t\tswitch (madv_behavior-\u003ebehavior) {\n1495:\t\tcase MADV_HWPOISON:\n1496:\t\tcase MADV_SOFT_OFFLINE:\n1497:\t\t\treturn true;\n1498:\t\tdefault:\n1499:\t\t\treturn false;\n1500:\t\t}\n1501:\t}\n1502:\t\n1503:\t#else\n1504:\t\n1505:\tstatic int madvise_inject_error(struct madvise_behavior madv_behavior)\n1506:\t{\n1507:\t\treturn 0;\n1508:\t}\n1509:\t\n1510:\tstatic bool is_memory_failure(struct madvise_behavior madv_behavior)\n1511:\t{\n1512:\t\treturn false;\n1513:\t}\n1514:\t\n1515:\t#endif\t/ CONFIG_MEMORY_FAILURE */\n1516:\t\n1517:\tstatic bool\n1518:\tmadvise_behavior_valid(int behavior)\n1519:\t{\n1520:\t\tswitch (behavior) {\n1521:\t\tcase MADV_DOFORK:\n1522:\t\tcase MADV_DONTFORK:\n1523:\t\tcase MADV_NORMAL:\n1524:\t\tcase MADV_SEQUENTIAL:\n1525:\t\tcase MADV_RANDOM:\n1526:\t\tcase MADV_REMOVE:\n1527:\t\tcase MADV_WILLNEED:\n1528:\t\tcase MADV_DONTNEED:\n1529:\t\tcase MADV_DONTNEED_LOCKED:\n" }
9/2	2026/06/30 18:54	llm	patch-evaluator	0m Model: gemini-3.1-pro-preview Args: null Results: null Tokens: input: 14630 output: 21 thoughts: 15 Thoughts: Investigating `madvise_inject_error` The listing shows only the tail of the `CONFIG_MEMORY_FAILURE` code, ending in the `memory_failure()` call, plus the stub `madvise_inject_error` at line 1505. I need the real definition of `madvise_inject_error` to see what it checks before injecting, so I'll grep for it.
10/2	2026/06/30 18:54	tool	grepper	0m Args: { "Expression": "madvise_inject_error" } Results: { "Output": "mm/madvise.c=1338=static int madvise_vma_behavior(struct madvise_behavior madv_behavior)\n--\nmm/madvise.c-1446- /\nmm/madvise.c:1447:static int madvise_inject_error(struct madvise_behavior madv_behavior)\nmm/madvise.c-1448-{\n--\nmm/madvise.c=1492=static bool is_memory_failure(struct madvise_behavior madv_behavior)\n--\nmm/madvise.c-1504-\nmm/madvise.c:1505:static int madvise_inject_error(struct madvise_behavior *madv_behavior)\nmm/madvise.c-1506-{\n--\nmm/madvise.c=1887=static int madvise_do_behavior(unsigned long start, size_t len_in,\n--\nmm/madvise.c-1896-\t\trange-\u003eend = start + len_in;\nmm/madvise.c:1897:\t\treturn madvise_inject_error(madv_behavior);\nmm/madvise.c-1898-\t}\n" }
11/2	2026/06/30 18:54	llm	patch-evaluator	0m Model: gemini-3.1-pro-preview Args: null Results: null Tokens: input: 15187 output: 36 thoughts: 21 Thoughts: Reading `madvise_inject_error` The grep puts the `CONFIG_MEMORY_FAILURE` definition of `madvise_inject_error` at `mm/madvise.c` line 1447. I'll read from just before that line to see which capability it requires and how it obtains the page it hands to `memory_failure()`.
12/2	2026/06/30 18:54	tool	read-file	0m Args: { "File": "mm/madvise.c", "FirstLine": 1440, "LineCount": 40 } Results: { "Contents": "1440:\t\treturn error;\n1441:\t}\n1442:\t\n1443:\t#ifdef CONFIG_MEMORY_FAILURE\n1444:\t/\n1445:\t Error injection support for memory error handling.\n1446:\t /\n1447:\tstatic int madvise_inject_error(struct madvise_behavior madv_behavior)\n1448:\t{\n1449:\t\tunsigned long size;\n1450:\t\tunsigned long start = madv_behavior-\u003erange.start;\n1451:\t\tunsigned long end = madv_behavior-\u003erange.end;\n1452:\t\n1453:\t\tif (!capable(CAP_SYS_ADMIN))\n1454:\t\t\treturn -EPERM;\n1455:\t\n1456:\t\tfor (; start \u003c end; start += size) {\n1457:\t\t\tunsigned long pfn;\n1458:\t\t\tstruct page page;\n1459:\t\t\tint ret;\n1460:\t\n1461:\t\t\tret = get_user_pages_fast(start, 1, 0, \u0026page);\n1462:\t\t\tif (ret != 1)\n1463:\t\t\t\treturn ret;\n1464:\t\t\tpfn = page_to_pfn(page);\n1465:\t\n1466:\t\t\t/\n1467:\t\t\t * When soft offlining hugepages, after migrating the page\n1468:\t\t\t * we dissolve it, therefore in the second loop \"page\" will\n1469:\t\t\t * no longer be a compound page.\n1470:\t\t\t */\n1471:\t\t\tsize = page_size(compound_head(page));\n1472:\t\n1473:\t\t\tif (madv_behavior-\u003ebehavior == MADV_SOFT_OFFLINE) {\n1474:\t\t\t\tpr_info(\"Soft offlining pfn %#lx at process virtual address %#lx\\n\",\n1475:\t\t\t\t\t pfn, start);\n1476:\t\t\t\tret = soft_offline_page(pfn, MF_COUNT_INCREASED);\n1477:\t\t\t} else {\n1478:\t\t\t\tpr_info(\"Injecting memory failure for pfn %#lx at process virtual address %#lx\\n\",\n1479:\t\t\t\t\t pfn, start);\n" }
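The listing above shows `madvise_inject_error()` returning `-EPERM` unless `capable(CAP_SYS_ADMIN)` succeeds. As a rough userspace pre-check, a sketch under stated assumptions, a script could test bit 21 (the number of `CAP_SYS_ADMIN`) in the `CapEff:` mask from /proc/<pid>/status. The helper names below are hypothetical. `CapEff` is relative to the task's own user namespace, so a set bit there does not by itself prove that `capable()` would succeed.

```bash
#!/bin/bash
# CAP_SYS_ADMIN is capability number 21 (include/uapi/linux/capability.h).
CAP_SYS_ADMIN=21

# Hypothetical helper: succeed if the hex capability mask $1 (as printed
# on the CapEff: line of /proc/<pid>/status) has CAP_SYS_ADMIN set.
mask_has_sys_admin() {
	(( (16#$1 >> CAP_SYS_ADMIN) & 1 ))
}

# Hypothetical helper: print the effective capability mask of this shell.
current_capeff() {
	awk '/^CapEff:/ { print $2 }' "/proc/$$/status"
}
```

A caller might write `mask_has_sys_admin "$(current_capeff)" || echo "MADV_HWPOISON would likely fail with EPERM"`.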
13/2	2026/06/30 18:54	llm	patch-evaluator	0m Model: gemini-3.1-pro-preview Args: null Results: null Tokens: input: 16074 output: 36 thoughts: 1074 Thoughts: Analyzing the Implications of `MADV_HWPOISON` and Kernel Panics Okay, so I'm thinking about this `MADV_HWPOISON` thing and how it interacts with the kernel. It seems like this function requires `CAP_SYS_ADMIN`, which is essentially global root access. I know Syzkaller typically runs as root within its own user namespace, but the `capable()` check is against the initial user namespace, meaning it needs actual root. Since Syzkaller operates in a VM, it has that level of access and can call `madvise(MADV_HWPOISON)`. The crucial point is that `madvise(MADV_HWPOISON)` calls `get_user_pages_fast`. Now, I know `get_user_pages_fast` only returns pages mapped within the user's address space. It doesn't give you kernel pages directly. A user usually can't map a kernel page unless there's a bug or it's an intended mechanism like with `/dev/mem` or a driver issue. Oh wait, the vDSO! That's mapped in user space. Could a user then call `madvise(MADV_HWPOISON)` on the vDSO? Let me check if the vDSO is `PageReserved`. Yes, usually vDSO pages are `PageReserved`. If `get_user_pages_fast` returns the vDSO page and we call `madvise(MADV_HWPOISON)` on it, that’s where things get interesting. `memory_failure()` gets called on the vDSO page. Since it's reserved, `is_kernel_owned_page()` will return true. `get_any_page()` will then return `-ENOTRECOVERABLE`. `memory_failure()` will then end up calling `action_result(pfn, MF_MSG_KERNEL, MF_IGNORED)`. And this is where it clicks: if `sysctl_panic_on_unrecoverable_mf` is set to 1, boom, kernel panic! But this isn't a bug. This is explicit design! The sysctl is intended to panic when a memory failure hits a kernel-owned page. The patch even specifically mentions the vDSO, along with the zero page, as examples of kernel-owned pages that trigger a panic in this scenario. 
This is because `MADV_HWPOISON` is meant to simulate hardware memory failures, so causing a "memory failure" on the vDSO will trigger a panic. This is the intended behavior of the sysctl. So, is there even any complex logic that needs fuzzing? The patch itself primarily introduces a sysctl and changes the return value of `get_any_page()` for kernel pages. If we enable the sysctl, Syzkaller will simply find that calling `madvise(MADV_HWPOISON)` on the vDSO or the zero page panics the kernel. Syzkaller would then report this as a bug: "kernel panic: Memory failure: ... unrecoverable page." But this is NOT a bug! It’s the intended outcome of the sysctl. Now, if we tell syzkaller to fuzz this, it will just flood us with false positives. Thankfully, there is a list of known intentional panics that it ignores, or, we could just avoid enabling the sysctl. If the sysctl isn't enabled, the panic doesn't happen. The change then becomes cosmetic because `get_any_page()` returns `-ENOTRECOVERABLE` instead of `-EIO`. Before the patch, it would call `action_result(pfn, MF_MSG_GET_HWPOISON, MF_IGNORED)`. Now, there is only a difference in the printk message – "recovery action for kernel page: Ignored" instead of "recovery action for unknown page: Ignored". This is a negligible change. I'd better double check `get_any_page()` to be sure, though.
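The reasoning above turns on two decisions in the patch: how `memory_failure()` maps the `get_hwpoison_page()` result to a message type, and when `panic_on_unrecoverable_mf()` fires. The following is a userspace model of those two decisions only, not kernel code. The function names `mf_msg_for` and `would_panic` are hypothetical, the free-page case is collapsed into a placeholder, and the errno value 131 for `ENOTRECOVERABLE` is the asm-generic one, which differs on some architectures.

```bash
#!/bin/bash
# Userspace model only; the labels mirror the patch, the logic is simplified.
ENOTRECOVERABLE=131	# asm-generic errno value; an assumption for this sketch

# Map a get_hwpoison_page() result to what the patched memory_failure()
# does next. For 0 the real code handles free/high-order pages; that path
# is not modelled here and is reported as "free_page_path".
mf_msg_for() {
	case $1 in
	0)  echo free_page_path ;;
	1)  echo handlable ;;			# refcount taken, normal recovery
	"-$ENOTRECOVERABLE") echo MF_MSG_KERNEL ;;	# kernel-owned page
	*)  echo MF_MSG_GET_HWPOISON ;;		# transient lifecycle race
	esac
}

# Model of panic_on_unrecoverable_mf(): $1 = sysctl value,
# $2 = message type, $3 = result.
would_panic() {
	[ "$1" -ne 0 ] && [ "$2" = MF_MSG_KERNEL ] && [ "$3" = MF_IGNORED ]
}
```

In this model a panic needs both the sysctl set to 1 and the `-ENOTRECOVERABLE` path. With the sysctl at its default of 0, the only visible change is which message type gets logged.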
14/2	2026/06/30 18:55	tool	read-file	0m Args: { "File": "mm/memory-failure.c", "FirstLine": 1420, "LineCount": 40 } Results: { "Contents": "1420:\t\n1421:\tstatic int get_any_page(struct page p, unsigned long flags)\n1422:\t{\n1423:\t\tint ret = 0, pass = 0;\n1424:\t\tbool count_increased = false;\n1425:\t\n1426:\t\tif (flags \u0026 MF_COUNT_INCREASED)\n1427:\t\t\tcount_increased = true;\n1428:\t\n1429:\t\t/\n1430:\t\t * Page types we know are kernel-owned and cannot be recovered.\n1431:\t\t * Short-circuit before the shake_page() / retry loop, which\n1432:\t\t * cannot turn any of these into something HWPoisonHandlable().\n1433:\t\t * Drop the caller's reference if MF_COUNT_INCREASED took one.\n1434:\t\t /\n1435:\t\tif (is_kernel_owned_page(p)) {\n1436:\t\t\tif (count_increased)\n1437:\t\t\t\tput_page(p);\n1438:\t\t\tret = -ENOTRECOVERABLE;\n1439:\t\t\tgoto out;\n1440:\t\t}\n1441:\t\n1442:\ttry_again:\n1443:\t\tif (!count_increased) {\n1444:\t\t\tret = __get_hwpoison_page(p, flags);\n1445:\t\t\tif (!ret) {\n1446:\t\t\t\tif (page_count(p)) {\n1447:\t\t\t\t\t/ We raced with an allocation, retry. /\n1448:\t\t\t\t\tif (pass++ \u003c GET_PAGE_MAX_RETRY_NUM)\n1449:\t\t\t\t\t\tgoto try_again;\n1450:\t\t\t\t\tret = -EBUSY;\n1451:\t\t\t\t} else if (!PageHuge(p) \u0026\u0026 !is_free_buddy_page(p)) {\n1452:\t\t\t\t\t/ We raced with put_page, retry. /\n1453:\t\t\t\t\tif (pass++ \u003c GET_PAGE_MAX_RETRY_NUM)\n1454:\t\t\t\t\t\tgoto try_again;\n1455:\t\t\t\t\tret = -EIO;\n1456:\t\t\t\t}\n1457:\t\t\t\tgoto out;\n1458:\t\t\t} else if (ret == -EBUSY) {\n1459:\t\t\t\t/\n" }
15/2	2026/06/30 18:55	llm	patch-evaluator	0m Model: gemini-3.1-pro-preview Args: null Results: null Tokens: input: 18658 output: 36 thoughts: 71 Thoughts: Here's my summary, as the expert: Investigating a Potentially Unexpected Memory Page State Okay, so I'm seeing `get_any_page()` being called from `memory_failure()` – which is a bit unusual. The crucial part here is that `pfn_valid()` is returning true but it's not a normal page. That's immediately flagged as a point of interest. Something's clearly not as it should be, and my first instinct is to understand the context. Wait a second, wasn’t that same function call also associated with `get_hwpoison_page()`? My memory is flagging that up. Let me shift focus immediately to `get_hwpoison_page()` because that context is likely related. I need to understand what's happening there, and how `get_any_page()` is getting involved in the hardware poisoning scenario to understand the underlying root cause. Let's dig into that function's details now.
16/2	2026/06/30 18:55	tool	read-file	0m Args: { "File": "mm/memory-failure.c", "FirstLine": 1540, "LineCount": 40 } Results: { "Contents": "1540:\t \n1541:\t Return: 0 on failure or free buddy (hugetlb) page,\n1542:\t * 1 on success for in-use pages in a well-defined state,\n1543:\t * -EIO for pages on which we can not handle memory errors,\n1544:\t * -EBUSY when get_hwpoison_page() has raced with page lifecycle\n1545:\t * operations like allocation and free,\n1546:\t * -EHWPOISON when the page is hwpoisoned and taken off from buddy,\n1547:\t * -ENOTRECOVERABLE for kernel-owned pages identified by\n1548:\t * is_kernel_owned_page() (PG_reserved, slab,\n1549:\t * page-table, large-kmalloc) that the handler cannot recover.\n1550:\t /\n1551:\tstatic int get_hwpoison_page(struct page p, unsigned long flags)\n1552:\t{\n1553:\t\tint ret;\n1554:\t\n1555:\t\tzone_pcp_disable(page_zone(p));\n1556:\t\tif (flags \u0026 MF_UNPOISON)\n1557:\t\t\tret = __get_unpoison_page(p);\n1558:\t\telse\n1559:\t\t\tret = get_any_page(p, flags);\n1560:\t\tzone_pcp_enable(page_zone(p));\n1561:\t\n1562:\t\treturn ret;\n1563:\t}\n1564:\t\n1565:\t/\n1566:\t The caller must guarantee the folio isn't large folio, except hugetlb.\n1567:\t * try_to_unmap() can't handle it.\n1568:\t /\n1569:\tint unmap_poisoned_folio(struct folio folio, unsigned long pfn, bool must_kill)\n1570:\t{\n1571:\t\tenum ttu_flags ttu = TTU_IGNORE_MLOCK \| TTU_SYNC \| TTU_HWPOISON;\n1572:\t\tstruct address_space mapping;\n1573:\t\n1574:\t\tif (folio_test_swapcache(folio)) {\n1575:\t\t\tpr_err(\"%#lx: keeping poisoned page in swap cache\\n\", pfn);\n1576:\t\t\tttu \u0026= ~TTU_HWPOISON;\n1577:\t\t}\n1578:\t\n1579:\t\t/\n" }
17/2	2026/06/30 18:55	llm	patch-evaluator	0m Model: gemini-3.1-pro-preview Args: null Results: null Tokens: input: 19640 output: 36 thoughts: 75 Thoughts: Diving into the `hwpoison` Page Acquisition Logic. `get_hwpoison_page()` is the entry point, and unless `MF_UNPOISON` is set it calls `get_any_page()`. With the patch, `get_any_page()` short-circuits with `-ENOTRECOVERABLE` for kernel-owned pages before it enters the retry loop. Before this change, when `count_increased` was false, such pages would have gone straight into `__get_hwpoison_page(p, flags)`, which decides whether a reference can be taken on the page. I need to understand that function, since that is where the work used to take place. Let's see what `__get_hwpoison_page()` actually does.
18/2	2026/06/30 18:55	tool	read-file	0m Args: { "File": "mm/memory-failure.c", "FirstLine": 1380, "LineCount": 40 } Results: { "Contents": "1380:\t\treturn kernel_owned;\n1381:\t}\n1382:\t\n1383:\tstatic int __get_hwpoison_page(struct page page, unsigned long flags)\n1384:\t{\n1385:\t\tstruct folio folio = page_folio(page);\n1386:\t\tint ret = 0;\n1387:\t\tbool hugetlb = false;\n1388:\t\n1389:\t\tret = get_hwpoison_hugetlb_folio(folio, \u0026hugetlb, false);\n1390:\t\tif (hugetlb) {\n1391:\t\t\t/* Make sure hugetlb demotion did not happen from under us. /\n1392:\t\t\tif (folio == page_folio(page))\n1393:\t\t\t\treturn ret;\n1394:\t\t\tif (ret \u003e 0) {\n1395:\t\t\t\tfolio_put(folio);\n1396:\t\t\t\tfolio = page_folio(page);\n1397:\t\t\t}\n1398:\t\t}\n1399:\t\n1400:\t\t/\n1401:\t\t * This check prevents from calling folio_try_get() for any\n1402:\t\t * unsupported type of folio in order to reduce the risk of unexpected\n1403:\t\t * races caused by taking a folio refcount.\n1404:\t\t */\n1405:\t\tif (!HWPoisonHandlable(\u0026folio-\u003epage, flags))\n1406:\t\t\treturn -EBUSY;\n1407:\t\n1408:\t\tif (folio_try_get(folio)) {\n1409:\t\t\tif (folio == page_folio(page))\n1410:\t\t\t\treturn 1;\n1411:\t\n1412:\t\t\tpr_info(\"%#lx cannot catch tail\\n\", page_to_pfn(page));\n1413:\t\t\tfolio_put(folio);\n1414:\t\t}\n1415:\t\n1416:\t\treturn 0;\n1417:\t}\n1418:\t\n1419:\t#define GET_PAGE_MAX_RETRY_NUM 3\n" }
19/2	2026/06/30 18:55	llm	patch-evaluator	0m Model: gemini-3.1-pro-preview Args: null Results: null Tokens: input: 20510 output: 19 thoughts: 1745 Thoughts: Analysis of the Memory Failure Patch for Fuzzing Suitability Okay, let's break this down. The core change here is that a memory failure injected on a kernel-owned page now makes `get_any_page()` return `-ENOTRECOVERABLE` immediately, skipping the retry logic and `shake_page()`. Instead of `-EIO`, which was previously handled by `action_result(pfn, MF_MSG_GET_HWPOISON, MF_IGNORED)`, we now get `action_result(pfn, MF_MSG_KERNEL, MF_IGNORED)`. A new sysctl, `panic_on_unrecoverable_memory_failure`, can be enabled (it defaults to 0); when set, it triggers a panic if the failure hits a kernel page and the result is `MF_IGNORED`. My first thought is, is this even worth fuzzing? The prompt's criteria are key. The modification is to core kernel logic (`mm/memory-failure.c`). The paths are reachable, both through `madvise(MADV_HWPOISON)` and `/sys/devices/system/memory/hard_offline_page`. The changes do alter the execution flow, specifically short-circuiting the existing retry mechanism. The panic, while intentional, is a functional change. However, the main change is that intentional panic. Enabling the sysctl will lead to syzkaller reporting those panics as bugs. I need to be careful with `FocusSymbols` and `EnableConfigs`. Focusing on `is_kernel_owned_page` or `panic_on_unrecoverable_mf` with that sysctl enabled means syzkaller would hit the panic repeatedly, leading to false positives. But then I remember the instruction: "If it modifies reachable core kernel logic, drivers, or architectures, use your code search tools to verify the code can be executed, then return WorthFuzzing=true." So, even though no complex logic is being introduced, there is a functional change in `mm/memory-failure.c` (a changed return code and a short-circuit), and it meets the criteria. 
So, I need to say `WorthFuzzing=true`. If I say `WorthFuzzing=true`, I have to list `is_kernel_owned_page` and `panic_on_unrecoverable_mf` as focus symbols. Syzkaller should be trying to inject memory failures into kernel pages, and that is what the new logic deals with. The panic, while intentional, is still a modification that could potentially expose issues. The fact that the panic will be detected as a bug should not be a concern. Therefore, because this touches core kernel logic that is reachable and functionally alters the execution, I'm going with `WorthFuzzing=true` and `FocusSymbols=[is_kernel_owned_page, panic_on_unrecoverable_mf]`.
20/2	2026/06/30 18:55	tool	grepper	0m Args: { "Expression": "config MEMORY_FAILURE" } Results: { "Output": "mm/Kconfig=774=config ARCH_SUPPORTS_MEMORY_FAILURE\n--\nmm/Kconfig-776-\nmm/Kconfig:777:config MEMORY_FAILURE\nmm/Kconfig-778-\tdepends on MMU\n" }
21/2	2026/06/30 18:55	llm	patch-evaluator	0m Model: gemini-3.1-pro-preview Args: null Results: null Tokens: input: 23008 output: 21 thoughts: 28 Thoughts: Okay, here's my take on those thoughts, tailored for an expert audience: Configuration Considerations for Memory Testing Alright, so I'm honing in on the crucial configuration parameters for memory testing. My initial gut feeling is that `MEMORY_FAILURE` is the primary setting I'll be working with. That's the go-to, as it directly simulates memory errors, which is what I'm aiming to evaluate and understand. However, I'm already mentally cross-referencing that with another powerful option: `HWPOISON_INJECT`. Now, this is where it gets interesting. While `MEMORY_FAILURE` might be more straightforward for certain scenarios, `HWPOISON_INJECT` offers a different level of control and potentially more realistic simulation, as it can be used to inject memory errors into the system in a way that is more hardware-like. I'll need to consider which approach, or combination of approaches, will give me the most insightful results, depending on my specific test objectives. Both seem valuable, but my assessment on which is best will be made during a deeper evaluation.
22/2	2026/06/30 18:56	tool	grepper	0m Args: { "Expression": "config HWPOISON_INJECT" } Results: { "Output": "mm/Kconfig=777=config MEMORY_FAILURE\n--\nmm/Kconfig-787-\nmm/Kconfig:788:config HWPOISON_INJECT\nmm/Kconfig-789-\ttristate \"HWPoison pages injector\"\n" }
23/2	2026/06/30 18:56	llm	patch-evaluator	0m Model: gemini-3.1-pro-preview Args: null Results: null Tokens: input: 23366 output: 21 thoughts: 245 Thoughts: My Analysis of the `MADV_HWPOISON` Dependency and the WorthFuzzing Question Okay, let's break this down. First, I need to confirm something: is `MADV_HWPOISON` dependent on `CONFIG_MEMORY_FAILURE`? A quick grep in `mm/madvise.c` confirms it; the code directly uses `#ifdef CONFIG_MEMORY_FAILURE` around a function relevant to error injection. So, yes, the dependency is there. Got it. Now, revisiting the `WorthFuzzing` question. This patch introduces a sysctl that intentionally triggers a kernel panic. The implication is clear: syzkaller could be used to directly cause panics. The immediate question is, does fuzzing this feature help uncover other bugs? I'm not immediately seeing it. However, the `get_any_page()` short-circuit is an important functional change: it skips `shake_page()` and the retry logic for kernel-owned pages. If `count_increased` is true, it calls `put_page(p)`. Hang on… `count_increased` depends on the `flags` parameter, specifically whether `flags & MF_COUNT_INCREASED` is set. Okay, I need to see where `MF_COUNT_INCREASED` is used and passed. I'll search for that now.
24/2	2026/06/30 18:56	tool	grepper	0m Args: { "Expression": "MF_COUNT_INCREASED" } Results: { "Output": "include/linux/mm.h=4982=enum mf_flags {\ninclude/linux/mm.h:4983:\tMF_COUNT_INCREASED = 1 \u003c\u003c 0,\ninclude/linux/mm.h-4984-\tMF_ACTION_REQUIRED = 1 \u003c\u003c 1,\n--\nmm/madvise.c=1447=static int madvise_inject_error(struct madvise_behavior madv_behavior)\n--\nmm/madvise.c-1475-\t\t\t\t pfn, start);\nmm/madvise.c:1476:\t\t\tret = soft_offline_page(pfn, MF_COUNT_INCREASED);\nmm/madvise.c-1477-\t\t} else {\n--\nmm/madvise.c-1479-\t\t\t\t pfn, start);\nmm/madvise.c:1480:\t\t\tret = memory_failure(pfn, MF_ACTION_REQUIRED \| MF_COUNT_INCREASED \| MF_SW_SIMULATED);\nmm/madvise.c-1481-\t\t\tif (ret == -EOPNOTSUPP)\n--\nmm/memory-failure.c=1421=static int get_any_page(struct page p, unsigned long flags)\n--\nmm/memory-failure.c-1425-\nmm/memory-failure.c:1426:\tif (flags \u0026 MF_COUNT_INCREASED)\nmm/memory-failure.c-1427-\t\tcount_increased = true;\n--\nmm/memory-failure.c-1432-\t * cannot turn any of these into something HWPoisonHandlable().\nmm/memory-failure.c:1433:\t * Drop the caller's reference if MF_COUNT_INCREASED took one.\nmm/memory-failure.c-1434-\t /\n--\nmm/memory-failure.c=2014=static int get_huge_page_for_hwpoison(unsigned long pfn, int flags,\n--\nmm/memory-failure.c-2026-\t\tgoto out_unlock;\nmm/memory-failure.c:2027:\t} else if (flags \u0026 MF_COUNT_INCREASED) {\nmm/memory-failure.c-2028-\t\tret = MF_HUGETLB_IN_USED;\n--\nmm/memory-failure.c=2167=static void put_ref_page(unsigned long pfn, int flags)\nmm/memory-failure.c-2168-{\nmm/memory-failure.c:2169:\tif (!(flags \u0026 MF_COUNT_INCREASED))\nmm/memory-failure.c-2170-\t\treturn;\n--\nmm/memory-failure.c=2387=int memory_failure(unsigned long pfn, int flags)\n--\nmm/memory-failure.c-2443-\t\t\tres = kill_accessing_process(current, pfn, flags);\nmm/memory-failure.c:2444:\t\tif (flags \u0026 MF_COUNT_INCREASED)\nmm/memory-failure.c-2445-\t\t\tput_page(p);\n--\nmm/memory-failure.c-2520-\t\t 
it is called by soft offline too and it is just called\nmm/memory-failure.c:2521:\t\t * for !MF_COUNT_INCREASED. So here seems to be the best\nmm/memory-failure.c-2522-\t\t * place.\n--\nmm/memory-failure.c=2968=int soft_offline_page(unsigned long pfn, int flags)\n--\nmm/memory-failure.c-2974-\tif (!pfn_valid(pfn)) {\nmm/memory-failure.c:2975:\t\tWARN_ON_ONCE(flags \u0026 MF_COUNT_INCREASED);\nmm/memory-failure.c-2976-\t\treturn -ENXIO;\n--\nmm/memory-failure.c-3019-\t\t\t\ttry_again = false;\nmm/memory-failure.c:3020:\t\t\t\tflags \u0026= ~MF_COUNT_INCREASED;\nmm/memory-failure.c-3021-\t\t\t\tgoto retry;\n" }
25/2	2026/06/30 18:56	llm	patch-evaluator	0m Model: gemini-3.1-pro-preview Args: null Results: null Tokens: input: 24976 output: 36 thoughts: 471 Thoughts: Analyzing Potential Reference Count Issues in Memory Failure Handling Okay, so I'm digging into how `madvise_inject_error` and the subsequent memory failure handling work, and something is starting to smell a bit fishy. I see that `madvise_inject_error` passes `MF_COUNT_INCREASED` because it calls `get_user_pages_fast`, which, as we know, takes a reference on the page. Makes sense, that's expected behavior. Then `memory_failure` kicks in and eventually calls `get_hwpoison_page`. That function, in turn, calls `get_any_page`. Now, here's where my alarm bells start ringing. Inside `get_any_page`, there's this bit of code: ```c if (is_kernel_owned_page(p)) { if (count_increased) put_page(p); ret = -ENOTRECOVERABLE; goto out; } ``` Now, hold on a second. If `is_kernel_owned_page` is true, and `count_increased` is also true, it calls `put_page(p)`. That `put_page` is dropping the reference that `get_user_pages_fast` originally took! But, `memory_failure` also calls `get_hwpoison_page` which returns `-ENOTRECOVERABLE`, and also does some extra processing: ```c res = get_hwpoison_page(p, flags); switch (res) { case 0: ... case 1: /* Got a refcount on a handlable page. / break; case -ENOTRECOVERABLE: / * Stable unhandlable kernel-owned page (PG_reserved, * slab, page tables, large-kmalloc). * No recovery possible. / res = action_result(pfn, MF_MSG_KERNEL, MF_IGNORED); goto unlock_mutex; ``` If `get_hwpoison_page` returns `-ENOTRECOVERABLE`, it branches to `unlock_mutex`. And what does that do? ```c unlock_mutex: mutex_unlock(&mf_mutex); if (hw_memory_failure) pr_err("%#lx: recovery action for %s: %s\n", pfn, action_page_types[MF_MSG_KERNEL], action_name[MF_IGNORED]); return res; ``` Ah, it just returns. Does it actually drop the reference to the page if `get_hwpoison_page` returns `-ENOTRECOVERABLE`? 
That's the key question. I need to be absolutely sure whether we're leaking this reference on this particular path. I'd better check the surrounding code of `memory_failure` around line 2440 to make sure the reference isn't leaked.
26/2	2026/06/30 18:56	tool	read-file	0m Args: { "File": "mm/memory-failure.c", "FirstLine": 2430, "LineCount": 40 } Results: { "Contents": "2430:\t\t}\n2431:\t\n2432:\ttry_again:\n2433:\t\tres = try_memory_failure_hugetlb(pfn, flags);\n2434:\t\t/\n2435:\t\t -ENOENT means the page we found is not hugetlb, so proceed with normal page handling\n2436:\t\t /\n2437:\t\tif (res != -ENOENT)\n2438:\t\t\tgoto unlock_mutex;\n2439:\t\n2440:\t\tif (TestSetPageHWPoison(p)) {\n2441:\t\t\tres = -EHWPOISON;\n2442:\t\t\tif (flags \u0026 MF_ACTION_REQUIRED)\n2443:\t\t\t\tres = kill_accessing_process(current, pfn, flags);\n2444:\t\t\tif (flags \u0026 MF_COUNT_INCREASED)\n2445:\t\t\t\tput_page(p);\n2446:\t\t\taction_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);\n2447:\t\t\tgoto unlock_mutex;\n2448:\t\t}\n2449:\t\n2450:\t\t/\n2451:\t\t * We need/can do nothing about count=0 pages.\n2452:\t\t * 1) it's a free page, and therefore in safe hand:\n2453:\t\t * check_new_page() will be the gate keeper.\n2454:\t\t * 2) it's part of a non-compound high order page.\n2455:\t\t * Implies some kernel user: cannot stop them from\n2456:\t\t * R/W the page; let's pray that the page has been\n2457:\t\t * used and will be freed some time later.\n2458:\t\t * In fact it's dangerous to directly bump up page count from 0,\n2459:\t\t * that may make page_ref_freeze()/page_ref_unfreeze() mismatch.\n2460:\t\t /\n2461:\t\tres = get_hwpoison_page(p, flags);\n2462:\t\tswitch (res) {\n2463:\t\tcase 0:\n2464:\t\t\tif (is_free_buddy_page(p)) {\n2465:\t\t\t\tif (take_page_off_buddy(p)) {\n2466:\t\t\t\t\tpage_ref_inc(p);\n2467:\t\t\t\t\tres = MF_RECOVERED;\n2468:\t\t\t\t} else {\n2469:\t\t\t\t\t/ We lost the race, try again */\n" }
27/2	2026/06/30 18:56	llm	patch-evaluator	0m Model: gemini-3.1-pro-preview Args: null Results: null Tokens: input: 26811 output: 36 thoughts: 126 Thoughts: Analyzing the HWPoison Page Handling Okay, so I'm focusing on this `get_hwpoison_page` function. A crucial point here is that if it returns `-ENOTRECOVERABLE`, the code proceeds to `unlock_mutex`. The core of my concern here is does it manage to drop the page's reference properly after encountering that error? I'm starting to remember the pre-patch behavior of `get_any_page`. It looks like it would call `__get_hwpoison_page`, but only under specific circumstances. Specifically, if `count_increased` was false. I need to circle back to `get_any_page`. If `count_increased` is true, then `__get_hwpoison_page` wasn't called, but now I need to check whether that's problematic. I must verify the logic surrounding that conditional call carefully. This likely has implications for the reference counts and overall memory integrity. I'll need to scrutinize how those two branches behave in tandem to truly understand the change.
28/2	2026/06/30 18:56	tool	read-file	0m Args: { "File": "mm/memory-failure.c", "FirstLine": 1440, "LineCount": 40 } Results: { "Contents": "1440:\t\t}\n1441:\t\n1442:\ttry_again:\n1443:\t\tif (!count_increased) {\n1444:\t\t\tret = __get_hwpoison_page(p, flags);\n1445:\t\t\tif (!ret) {\n1446:\t\t\t\tif (page_count(p)) {\n1447:\t\t\t\t\t/* We raced with an allocation, retry. /\n1448:\t\t\t\t\tif (pass++ \u003c GET_PAGE_MAX_RETRY_NUM)\n1449:\t\t\t\t\t\tgoto try_again;\n1450:\t\t\t\t\tret = -EBUSY;\n1451:\t\t\t\t} else if (!PageHuge(p) \u0026\u0026 !is_free_buddy_page(p)) {\n1452:\t\t\t\t\t/ We raced with put_page, retry. /\n1453:\t\t\t\t\tif (pass++ \u003c GET_PAGE_MAX_RETRY_NUM)\n1454:\t\t\t\t\t\tgoto try_again;\n1455:\t\t\t\t\tret = -EIO;\n1456:\t\t\t\t}\n1457:\t\t\t\tgoto out;\n1458:\t\t\t} else if (ret == -EBUSY) {\n1459:\t\t\t\t/\n1460:\t\t\t\t * We raced with (possibly temporary) unhandlable\n1461:\t\t\t\t * page, retry.\n1462:\t\t\t\t /\n1463:\t\t\t\tif (pass++ \u003c GET_PAGE_MAX_RETRY_NUM) {\n1464:\t\t\t\t\tshake_page(p);\n1465:\t\t\t\t\tgoto try_again;\n1466:\t\t\t\t}\n1467:\t\t\t\tret = -EIO;\n1468:\t\t\t\tgoto out;\n1469:\t\t\t}\n1470:\t\t}\n1471:\t\n1472:\t\tif (PageHuge(p) \|\| HWPoisonHandlable(p, flags)) {\n1473:\t\t\tret = 1;\n1474:\t\t} else {\n1475:\t\t\t/\n1476:\t\t\t * A page we cannot handle. Check whether we can turn\n1477:\t\t\t * it into something we can handle.\n1478:\t\t\t */\n1479:\t\t\tif (pass++ \u003c GET_PAGE_MAX_RETRY_NUM) {\n" }
29/2	2026/06/30 18:56	llm	patch-evaluator	0m Model: gemini-3.1-pro-preview Args: null Results: null Tokens: input: 27773 output: 19 thoughts: 1712 Thoughts: Analysis of the Memory Failure Patch Okay, let's break this down. My initial thought was that this patch, while changing the return value and short-circuiting for kernel pages, was primarily about enabling a panic and therefore probably not worth fuzzing, since syzkaller would likely treat the panic as a bug. But that short-circuit is still potentially a functional change. The problem is that hitting the panic is essentially the main functional change, and that's not what we typically want fuzzing to do. However, I need to stick with the instructions provided. I read the prompt again, and it explicitly states that if core kernel logic is modified and reachable, I should return `WorthFuzzing=true`. This patch does modify core kernel logic within `mm/memory-failure.c`. It alters the behavior of the memory failure handler. And it can be executed in virtualized environments via `madvise(MADV_HWPOISON)` on kernel-owned pages, like the vDSO. Therefore, I need to return `WorthFuzzing=true`. Now, the important part: what should I tell the fuzzer to focus on? I was thinking about `is_kernel_owned_page` initially but then remembered that it is an inline function, so it likely won't appear as a symbol in the coverage. However, the function that calls it (`get_any_page`), is not inline, and is a relevant point of interaction. It seems like a good target. The other potential target is `panic_on_unrecoverable_mf`. I need to make sure that the proper configurations are provided, given that `MEMORY_FAILURE` is needed. I'm going to list it anyway, and double check to be extra certain. `HWPOISON_INJECT` might be needed, but `MADV_HWPOISON` is the primary thing, and I should only list `MEMORY_FAILURE`. 
So, the plan is: WorthFuzzing=true; FocusSymbols: `get_any_page`, `panic_on_unrecoverable_mf`; EnableConfigs: `MEMORY_FAILURE`. The reasoning here is that the patch alters the memory failure handling logic, specifically to short-circuit recovery and optionally trigger a panic for kernel-owned pages. This functionality can be reached via `madvise(MADV_HWPOISON)` on kernel pages, even in a virtualized environment. The selected focus symbols are `get_any_page` and `panic_on_unrecoverable_mf`, as these are the core functions impacted by the patch. The configuration `MEMORY_FAILURE` enables the necessary infrastructure. `get_any_page` is not a performance-critical function.
30/2	2026/06/30 18:57	tool	grepper	0m Args: { "Expression": "get_any_page" } Results: { "Output": "mm/memory-failure.c=1383=static int __get_hwpoison_page(struct page page, unsigned long flags)\n--\nmm/memory-failure.c-1420-\nmm/memory-failure.c:1421:static int get_any_page(struct page p, unsigned long flags)\nmm/memory-failure.c-1422-{\n--\nmm/memory-failure.c=1495=static int __get_unpoison_page(struct page page)\n--\nmm/memory-failure.c-1534- extra care for the error page's state (as done in __get_hwpoison_page()),\nmm/memory-failure.c:1535: * and has some retry logic in get_any_page().\nmm/memory-failure.c-1536- \n--\nmm/memory-failure.c=1551=static int get_hwpoison_page(struct page p, unsigned long flags)\n--\nmm/memory-failure.c-1558-\telse\nmm/memory-failure.c:1559:\t\tret = get_any_page(p, flags);\nmm/memory-failure.c-1560-\tzone_pcp_enable(page_zone(p));\n--\nmm/memory-failure.c=2850=static int soft_offline_in_use_page(struct page page)\n--\nmm/memory-failure.c-2910-\t If we succeed to isolate the folio, we grabbed another refcount on\nmm/memory-failure.c:2911:\t * the folio, so we can safely drop the one we got from get_any_page().\nmm/memory-failure.c-2912-\t * If we failed to isolate the folio, it means that we cannot go further\nmm/memory-failure.c-2913-\t * and we will return an error, so drop the reference we got from\nmm/memory-failure.c:2914:\t * get_any_page() as well.\nmm/memory-failure.c-2915-\t */\n--\ntools/testing/selftests/mm/hwpoison-panic.sh-11-# (sourced from /proc/iomem \"Kernel rodata\"). 
Exercises\ntools/testing/selftests/mm/hwpoison-panic.sh:12:# memory_failure() -\u003e get_any_page() on a PageReserved page.\ntools/testing/selftests/mm/hwpoison-panic.sh-13-#\ntools/testing/selftests/mm/hwpoison-panic.sh-14-# slab - a slab page found via /proc/kpageflags (KPF_SLAB).\ntools/testing/selftests/mm/hwpoison-panic.sh:15:# Exercises memory_failure() -\u003e get_any_page() on a non\ntools/testing/selftests/mm/hwpoison-panic.sh-16-# PG_reserved kernel-owned page. This path is what catches\ntools/testing/selftests/mm/hwpoison-panic.sh:17:# regressions where get_any_page() collapses kernel-owned\ntools/testing/selftests/mm/hwpoison-panic.sh-18-# pages into a transient -EIO instead of -ENOTRECOVERABLE.\n" }
31/2	2026/06/30 18:57	llm	patch-evaluator	0m Model: gemini-3.1-pro-preview Args: null Results: null Tokens: input: 30922 output: 124 thoughts: 25 Thoughts: Confirming the Focus Symbol and Recording the Verdict. The grep confirms that `get_any_page` is defined as a static function in `mm/memory-failure.c` and is called from `get_hwpoison_page()`, so it is a valid focus symbol, unlike the inline helper `is_kernel_owned_page`. With that confirmed, I will call `set-results` to record the verdict: `WorthFuzzing=true`, focus on `get_any_page` and `panic_on_unrecoverable_mf`, and enable `MEMORY_FAILURE`.
32/2	2026/06/30 18:57	tool	set-results	0m Args: { "EnableConfigs": [ "MEMORY_FAILURE" ], "FocusSymbols": [ "get_any_page", "panic_on_unrecoverable_mf" ], "Reasoning": "The patch modifies the memory failure handler in core MM to short-circuit recovery for kernel-owned pages and optionally panic. This error-handling path is reachable in virtualized environments via madvise(MADV_HWPOISON) on kernel-owned pages mapped in user space (like the vDSO) or via sysfs error injection.", "WorthFuzzing": true } Results: { "EnableConfigs": [ "MEMORY_FAILURE" ], "FocusSymbols": [ "get_any_page", "panic_on_unrecoverable_mf" ], "Reasoning": "The patch modifies the memory failure handler in core MM to short-circuit recovery for kernel-owned pages and optionally panic. This error-handling path is reachable in virtualized environments via madvise(MADV_HWPOISON) on kernel-owned pages mapped in user space (like the vDSO) or via sysfs error injection.", "WorthFuzzing": true }
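
The control flow the trajectory reasons about (the short-circuit in `get_any_page()`, the dropped `MF_COUNT_INCREASED` reference, and the panic predicate) can be sketched as a small userspace model. This is only an illustration under stated assumptions, not kernel code: `struct model_page`, `model_get_any_page()` and `model_memory_failure_would_panic()` are hypothetical names, the enums are cut-down stand-ins for the kernel's, every page that is not kernel-owned is simply treated as handlable, and the panic is reported as a boolean instead of being raised. Only `panic_on_unrecoverable_mf()` mirrors the predicate shown in the patch diff.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Cut-down stand-ins for the kernel enums (assumption: only these values matter here). */
enum mf_result { MF_IGNORED, MF_FAILED, MF_DELAYED, MF_RECOVERED };
enum mf_action_page_type { MF_MSG_KERNEL, MF_MSG_GET_HWPOISON, MF_MSG_UNKNOWN };

/* Hypothetical page model: an ownership flag plus a reference count. */
struct model_page {
    bool kernel_owned;   /* stands in for is_kernel_owned_page() */
    int refcount;
};

/* Models /proc/sys/vm/panic_on_unrecoverable_memory_failure (default 0). */
static int sysctl_panic_on_unrecoverable_mf;

/* Same predicate as in the patch: panic only for MF_MSG_KERNEL + MF_IGNORED. */
static bool panic_on_unrecoverable_mf(enum mf_action_page_type type,
                                      enum mf_result result)
{
    if (!sysctl_panic_on_unrecoverable_mf)
        return false;

    return type == MF_MSG_KERNEL && result == MF_IGNORED;
}

/*
 * Models only the new short-circuit at the top of get_any_page():
 * kernel-owned pages return -ENOTRECOVERABLE, and the caller's
 * reference is dropped if MF_COUNT_INCREASED took one.
 */
static int model_get_any_page(struct model_page *p, bool count_increased)
{
    if (p->kernel_owned) {
        if (count_increased) {
            assert(p->refcount > 0);
            p->refcount--;          /* stands in for put_page(p) */
        }
        return -ENOTRECOVERABLE;
    }

    return 1;   /* simplification: everything else is "handlable" */
}

/* Returns true where the patched memory_failure() would reach panic(). */
static bool model_memory_failure_would_panic(struct model_page *p,
                                             bool count_increased)
{
    int res = model_get_any_page(p, count_increased);

    if (res == -ENOTRECOVERABLE)
        return panic_on_unrecoverable_mf(MF_MSG_KERNEL, MF_IGNORED);

    return false;
}
```

In this model, a kernel-owned page injected with `count_increased` set loses exactly one reference inside `model_get_any_page()`, which is the balance the reference-leak question in steps 25/2 to 27/2 is about, and the panic outcome depends only on the sysctl value.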

{
  "EnableConfigs": [
    "MEMORY_FAILURE"
  ],
  "FocusSymbols": [
    "get_any_page",
    "panic_on_unrecoverable_mf"
  ],
  "Reasoning": "The patch modifies the memory failure handler in core MM to short-circuit recovery for kernel-owned pages and optionally panic. This error-handling path is reachable in virtualized environments via madvise(MADV_HWPOISON) on kernel-owned pages mapped in user space (like the vDSO) or via sysfs error injection.",
  "WorthFuzzing": true
}

{
  "PatchDiff": "commit cd6649a5d4f7cd49ae8556e2d53436c6b6caba06\nAuthor: syz-cluster \u003ctriage@syzkaller.com\u003e\nDate:   Tue Jun 30 18:54:07 2026 +0000\n\n    syz-cluster: applied patch under review\n\ndiff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst\nindex b9b0c218bfb440..22cc54cac3b21d 100644\n--- a/Documentation/admin-guide/sysctl/vm.rst\n+++ b/Documentation/admin-guide/sysctl/vm.rst\n@@ -67,6 +67,7 @@ Currently, these files are in /proc/sys/vm:\n - page-cluster\n - page_lock_unfairness\n - panic_on_oom\n+- panic_on_unrecoverable_memory_failure\n - percpu_pagelist_high_fraction\n - stat_interval\n - stat_refresh\n@@ -925,6 +926,85 @@ panic_on_oom=2+kdump gives you very strong tool to investigate\n why oom happens. You can get snapshot.\n \n \n+panic_on_unrecoverable_memory_failure\n+======================================\n+\n+When a hardware memory error (e.g. multi-bit ECC) hits a kernel page\n+that cannot be recovered by the memory failure handler, the default\n+behaviour is to ignore the error and continue operation.  This is\n+dangerous because the corrupted data remains accessible to the kernel,\n+risking silent data corruption or a delayed crash when the poisoned\n+memory is next accessed.\n+\n+When enabled, this sysctl triggers a panic on memory failure events\n+hitting kernel-owned pages that the handler cannot recover:\n+``PageReserved`` (firmware reservations, kernel image, vDSO, zero\n+page, and similar memblock-reserved regions), ``PageSlab``,\n+``PageTable``, and ``PageLargeKmalloc``.  These are owned by the\n+kernel and the memory failure handler cannot reliably evict their\n+contents.\n+\n+Other unrecoverable kernel-owned populations (vmalloc allocations,\n+kernel stack pages, ...) are not currently covered because the\n+handler has no page-type signal that distinguishes them from a\n+userspace folio temporarily off the LRU during migration or\n+compaction.  
Such pages still go through the standard\n+MF_MSG_GET_HWPOISON path: ``PG_hwpoison`` is set on them and a\n+delayed crash on the next access remains possible.  Coverage may\n+grow as the handler gains stronger kernel-ownership signals.\n+\n+Recoverable failure paths are also intentionally left out: in-flight\n+buddy allocations and other transient races with the page allocator\n+can reach the same diagnostic, and panicking on them would risk\n+killing the box for a page destined for userspace where the standard\n+SIGBUS recovery path applies.  Pages whose state could not be\n+classified at all are not covered either, since an unknown state is\n+not a sound basis for a panic decision.\n+\n+For many environments it is preferable to panic immediately with a clean\n+crash dump that captures the original error context, rather than to\n+continue and face a random crash later whose cause is difficult to\n+diagnose.\n+\n+Use cases\n+---------\n+\n+This option is most useful in environments where unattributed crashes\n+are expensive to debug or where data integrity must take precedence\n+over availability:\n+\n+* Large fleets, where multi-bit ECC errors on kernel pages are observed\n+  regularly and post-mortem analysis of an unrelated downstream crash\n+  (often seconds to minutes after the original error) consumes\n+  significant engineering effort.\n+\n+* Systems configured with kdump, where panicking at the moment of the\n+  hardware error produces a vmcore that still contains the faulting\n+  address, the affected page state, and the originating MCE/GHES\n+  record — context that is typically lost by the time a delayed crash\n+  occurs.\n+\n+* High-availability clusters that rely on fast, deterministic node\n+  failure for failover, and prefer an immediate panic over silent data\n+  corruption propagating to replicas or persistent storage.\n+\n+* Kernel and platform developers reproducing hwpoison issues with\n+  tools such as ``mce-inject`` or error-injection debugfs 
interfaces,\n+  where panicking on the unrecoverable path makes regressions\n+  immediately visible instead of surfacing as later, unrelated\n+  failures.\n+\n+= =====================================================================\n+0 Try to continue operation (default).\n+1 Panic immediately.  If the ``panic`` sysctl is also non-zero then the\n+  machine will be rebooted.\n+= =====================================================================\n+\n+Example::\n+\n+     echo 1 \u003e /proc/sys/vm/panic_on_unrecoverable_memory_failure\n+\n+\n percpu_pagelist_high_fraction\n =============================\n \ndiff --git a/mm/memory-failure.c b/mm/memory-failure.c\nindex 51508a55c4055e..e097fc8262cf83 100644\n--- a/mm/memory-failure.c\n+++ b/mm/memory-failure.c\n@@ -74,6 +74,8 @@ static int sysctl_memory_failure_recovery __read_mostly = 1;\n \n static int sysctl_enable_soft_offline __read_mostly = 1;\n \n+static int sysctl_panic_on_unrecoverable_mf __read_mostly;\n+\n atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);\n \n static bool hw_memory_failure __read_mostly = false;\n@@ -155,6 +157,15 @@ static const struct ctl_table memory_failure_table[] = {\n \t\t.proc_handler\t= proc_dointvec_minmax,\n \t\t.extra1\t\t= SYSCTL_ZERO,\n \t\t.extra2\t\t= SYSCTL_ONE,\n+\t},\n+\t{\n+\t\t.procname\t= \"panic_on_unrecoverable_memory_failure\",\n+\t\t.data\t\t= \u0026sysctl_panic_on_unrecoverable_mf,\n+\t\t.maxlen\t\t= sizeof(sysctl_panic_on_unrecoverable_mf),\n+\t\t.mode\t\t= 0644,\n+\t\t.proc_handler\t= proc_dointvec_minmax,\n+\t\t.extra1\t\t= SYSCTL_ZERO,\n+\t\t.extra2\t\t= SYSCTL_ONE,\n \t}\n };\n \n@@ -980,17 +991,6 @@ static bool has_extra_refcount(struct page_state *ps, struct page *p,\n \treturn false;\n }\n \n-/*\n- * Error hit kernel page.\n- * Do nothing, try to be lucky and not touch this instead. 
For a few cases we\n- * could be more sophisticated.\n- */\n-static int me_kernel(struct page_state *ps, struct page *p)\n-{\n-\tunlock_page(p);\n-\treturn MF_IGNORED;\n-}\n-\n /*\n  * Page in unknown state. Do nothing.\n  * This is a catch-all in case we fail to make sense of the page state.\n@@ -1199,10 +1199,8 @@ static int me_huge_page(struct page_state *ps, struct page *p)\n #define mlock\t\t(1UL \u003c\u003c PG_mlocked)\n #define lru\t\t(1UL \u003c\u003c PG_lru)\n #define head\t\t(1UL \u003c\u003c PG_head)\n-#define reserved\t(1UL \u003c\u003c PG_reserved)\n \n static struct page_state error_states[] = {\n-\t{ reserved,\treserved,\tMF_MSG_KERNEL,\tme_kernel },\n \t/*\n \t * free pages are specially detected outside this table:\n \t * PG_buddy pages only make a small fraction of all free pages.\n@@ -1234,7 +1232,6 @@ static struct page_state error_states[] = {\n #undef mlock\n #undef lru\n #undef head\n-#undef reserved\n \n static void update_per_node_mf_stats(unsigned long pfn,\n \t\t\t\t     enum mf_result result)\n@@ -1269,6 +1266,15 @@ static void update_per_node_mf_stats(unsigned long pfn,\n \t++mf_stats-\u003etotal;\n }\n \n+static bool panic_on_unrecoverable_mf(enum mf_action_page_type type,\n+\t\t\t\t      enum mf_result result)\n+{\n+\tif (!sysctl_panic_on_unrecoverable_mf)\n+\t\treturn false;\n+\n+\treturn type == MF_MSG_KERNEL \u0026\u0026 result == MF_IGNORED;\n+}\n+\n /*\n  * \"Dirty/Clean\" indication is not 100% accurate due to the possibility of\n  * setting PG_dirty outside page lock. See also comment above set_page_dirty().\n@@ -1286,6 +1292,9 @@ static int action_result(unsigned long pfn, enum mf_action_page_type type,\n \tpr_err(\"%#lx: recovery action for %s: %s\\n\",\n \t\tpfn, action_page_types[type], action_name[result]);\n \n+\tif (panic_on_unrecoverable_mf(type, result))\n+\t\tpanic(\"Memory failure: %#lx: unrecoverable page\", pfn);\n+\n \treturn (result == MF_RECOVERED || result == MF_DELAYED) ? 
0 : -EBUSY;\n }\n \n@@ -1339,6 +1348,38 @@ static inline bool HWPoisonHandlable(struct page *page, unsigned long flags)\n \treturn PageLRU(page) || is_free_buddy_page(page);\n }\n \n+/*\n+ * Positive identification of pages the hwpoison handler cannot recover:\n+ * pages owned by kernel internals with no userspace mapping to unmap, no\n+ * file mapping to invalidate, and no migration target.\n+ */\n+static inline bool is_kernel_owned_page(struct page *page)\n+{\n+\tstruct page *head;\n+\tbool kernel_owned;\n+\n+\t/* PG_reserved is a per-page flag, never set on a compound page. */\n+\tif (PageReserved(page))\n+\t\treturn true;\n+\n+\t/*\n+\t * Page-type bits live only on the head page, so resolve any tail\n+\t * first.  The check takes no refcount; recheck the head afterwards\n+\t * so a concurrent split or compound free cannot leave us trusting\n+\t * a stale view.  A residual free-\u003ealloc-\u003efree cannot be closed here\n+\t * (frozen slab and large-kmalloc pages cannot be pinned), but is\n+\t * harmless: where a wrong verdict could panic, memory_failure() has\n+\t * already set PageHWPoison, which bars the page from the allocator.\n+\t */\n+retry:\n+\thead = compound_head(page);\n+\tkernel_owned = PageSlab(head) || PageTable(head) ||\n+\t\t       PageLargeKmalloc(head);\n+\tif (head != compound_head(page))\n+\t\tgoto retry;\n+\treturn kernel_owned;\n+}\n+\n static int __get_hwpoison_page(struct page *page, unsigned long flags)\n {\n \tstruct folio *folio = page_folio(page);\n@@ -1385,6 +1426,19 @@ static int get_any_page(struct page *p, unsigned long flags)\n \tif (flags \u0026 MF_COUNT_INCREASED)\n \t\tcount_increased = true;\n \n+\t/*\n+\t * Page types we know are kernel-owned and cannot be recovered.\n+\t * Short-circuit before the shake_page() / retry loop, which\n+\t * cannot turn any of these into something HWPoisonHandlable().\n+\t * Drop the caller's reference if MF_COUNT_INCREASED took one.\n+\t */\n+\tif (is_kernel_owned_page(p)) {\n+\t\tif 
(count_increased)\n+\t\t\tput_page(p);\n+\t\tret = -ENOTRECOVERABLE;\n+\t\tgoto out;\n+\t}\n+\n try_again:\n \tif (!count_increased) {\n \t\tret = __get_hwpoison_page(p, flags);\n@@ -1432,7 +1486,7 @@ static int get_any_page(struct page *p, unsigned long flags)\n \t\tret = -EIO;\n \t}\n out:\n-\tif (ret == -EIO)\n+\tif (ret == -EIO || ret == -ENOTRECOVERABLE)\n \t\tpr_err(\"%#lx: unhandlable page.\\n\", page_to_pfn(p));\n \n \treturn ret;\n@@ -1489,7 +1543,10 @@ static int __get_unpoison_page(struct page *page)\n  *         -EIO for pages on which we can not handle memory errors,\n  *         -EBUSY when get_hwpoison_page() has raced with page lifecycle\n  *         operations like allocation and free,\n- *         -EHWPOISON when the page is hwpoisoned and taken off from buddy.\n+ *         -EHWPOISON when the page is hwpoisoned and taken off from buddy,\n+ *         -ENOTRECOVERABLE for kernel-owned pages identified by\n+ *         is_kernel_owned_page() (PG_reserved, slab,\n+ *         page-table, large-kmalloc) that the handler cannot recover.\n  */\n static int get_hwpoison_page(struct page *p, unsigned long flags)\n {\n@@ -2402,7 +2459,8 @@ int memory_failure(unsigned long pfn, int flags)\n \t * that may make page_ref_freeze()/page_ref_unfreeze() mismatch.\n \t */\n \tres = get_hwpoison_page(p, flags);\n-\tif (!res) {\n+\tswitch (res) {\n+\tcase 0:\n \t\tif (is_free_buddy_page(p)) {\n \t\t\tif (take_page_off_buddy(p)) {\n \t\t\t\tpage_ref_inc(p);\n@@ -2421,7 +2479,19 @@ int memory_failure(unsigned long pfn, int flags)\n \t\t\tres = action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);\n \t\t}\n \t\tgoto unlock_mutex;\n-\t} else if (res \u003c 0) {\n+\tcase 1:\n+\t\t/* Got a refcount on a handlable page. 
*/\n+\t\tbreak;\n+\tcase -ENOTRECOVERABLE:\n+\t\t/*\n+\t\t * Stable unhandlable kernel-owned page (PG_reserved,\n+\t\t * slab, page tables, large-kmalloc).\n+\t\t * No recovery possible.\n+\t\t */\n+\t\tres = action_result(pfn, MF_MSG_KERNEL, MF_IGNORED);\n+\t\tgoto unlock_mutex;\n+\tdefault:\n+\t\t/* Transient lifecycle race with the page allocator. */\n \t\tres = action_result(pfn, MF_MSG_GET_HWPOISON, MF_IGNORED);\n \t\tgoto unlock_mutex;\n \t}\ndiff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile\nindex e6df968f0971c8..ed321ae709dacf 100644\n--- a/tools/testing/selftests/mm/Makefile\n+++ b/tools/testing/selftests/mm/Makefile\n@@ -174,6 +174,10 @@ TEST_PROGS += ksft_userfaultfd.sh\n TEST_PROGS += ksft_vma_merge.sh\n TEST_PROGS += ksft_vmalloc.sh\n \n+# Destructive: every successful run panics the kernel.  Installed and\n+# kept executable, but not run from a default kselftest invocation.\n+TEST_PROGS_EXTENDED += hwpoison-panic.sh\n+\n TEST_FILES := test_vmalloc.sh\n TEST_FILES += test_hmm.sh\n TEST_FILES += va_high_addr_switch.sh\ndiff --git a/tools/testing/selftests/mm/hwpoison-panic.sh b/tools/testing/selftests/mm/hwpoison-panic.sh\nnew file mode 100755\nindex 00000000000000..d953d136733244\n--- /dev/null\n+++ b/tools/testing/selftests/mm/hwpoison-panic.sh\n@@ -0,0 +1,255 @@\n+#!/bin/bash\n+# SPDX-License-Identifier: GPL-2.0\n+#\n+# Verify vm.panic_on_unrecoverable_memory_failure by injecting a hwpoison\n+# error on a kernel-owned page and confirming the kernel panics.\n+#\n+# Three \"kinds\" of kernel-owned page can be targeted, selectable via the\n+# first positional argument (default: rodata):\n+#\n+#   rodata  - a PG_reserved page in the kernel rodata range\n+#             (sourced from /proc/iomem \"Kernel rodata\").  
Exercises\n+#             memory_failure() -\u003e get_any_page() on a PageReserved page.\n+#\n+#   slab    - a slab page found via /proc/kpageflags (KPF_SLAB).\n+#             Exercises memory_failure() -\u003e get_any_page() on a non\n+#             PG_reserved kernel-owned page.  This path is what catches\n+#             regressions where get_any_page() collapses kernel-owned\n+#             pages into a transient -EIO instead of -ENOTRECOVERABLE.\n+#\n+#   pgtable - a page-table page found via /proc/kpageflags (KPF_PGTABLE).\n+#             Same path as slab, different page type.\n+#\n+# This test is DESTRUCTIVE: a successful run crashes the kernel.  It is\n+# meant to be executed inside a disposable VM (e.g. virtme-ng) with a\n+# serial console captured by the harness.  It is skipped unless the\n+# caller opts in via RUN_DESTRUCTIVE=1.\n+#\n+# Test passes externally: the kernel must panic with\n+#   \"Memory failure: \u003cpfn\u003e: unrecoverable page\"\n+# A return from the inject means no panic fired: that is a failure,\n+# unless the target PFN raced to a different page type before injection,\n+# in which case the run is inconclusive and is skipped.\n+#\n+# Author: Breno Leitao \u003cleitao@debian.org\u003e\n+\n+set -u\n+\n+# KTAP output helpers (ktap_print_msg, ktap_skip_all, ktap_exit_fail_msg, ...).\n+DIR=\"$(dirname \"$(readlink -f \"$0\")\")\"\n+# shellcheck source=../kselftest/ktap_helpers.sh\n+source \"${DIR}\"/../kselftest/ktap_helpers.sh\n+\n+sysctl_path=/proc/sys/vm/panic_on_unrecoverable_memory_failure\n+inject_path=/sys/devices/system/memory/hard_offline_page\n+kpageflags_path=/proc/kpageflags\n+unpoison_path=/sys/kernel/debug/hwpoison/unpoison-pfn\n+\n+# /proc/kpageflags bit positions (see include/uapi/linux/kernel-page-flags.h)\n+KPF_SLAB=7\n+KPF_COMPOUND_TAIL=16\n+KPF_HWPOISON=19\n+KPF_NOPAGE=20\n+KPF_PGTABLE=26\n+KPF_RESERVED=32\n+\n+pagesize=$(getconf PAGE_SIZE)\n+\n+kind=${1:-rodata}\n+\n+if [ \"$(id -u)\" -ne 0 ]; then\n+\tktap_skip_all 
\"must run as root\"\n+\texit \"$KSFT_SKIP\"\n+fi\n+\n+if [ ! -w \"$sysctl_path\" ]; then\n+\tktap_skip_all \"$sysctl_path not present (kernel without the sysctl?)\"\n+\texit \"$KSFT_SKIP\"\n+fi\n+\n+if [ ! -w \"$inject_path\" ]; then\n+\tktap_skip_all \"$inject_path not present (no MEMORY_HOTPLUG?)\"\n+\texit \"$KSFT_SKIP\"\n+fi\n+\n+if [ \"${RUN_DESTRUCTIVE:-0}\" != \"1\" ]; then\n+\tktap_skip_all \"destructive test; re-run with RUN_DESTRUCTIVE=1 inside a disposable VM\"\n+\texit \"$KSFT_SKIP\"\n+fi\n+\n+# Pick a PFN inside the kernel image rodata region of /proc/iomem.\n+# This is preferred over a top-level \"Reserved\" entry because top-level\n+# Reserved ranges are often firmware holes that have no backing struct\n+# page; pfn_to_online_page() returns NULL on those and memory_failure()\n+# bails out with -ENXIO before reaching the panic path.\n+#\n+# \"Kernel rodata\" is reported as a sub-resource of \"System RAM\" on every\n+# major architecture, which guarantees:\n+#   - the PFN is backed by struct page (within an online memory range);\n+#   - PG_reserved is set on the page (kernel image area);\n+#   - the memory is read-only, so setting PG_hwpoison on it does not\n+#     corrupt writable kernel state if the panic somehow does not fire.\n+#\n+# /proc/iomem entries look like (indented for sub-resources):\n+#     \"  02500000-02ffffff : Kernel rodata\"\n+pick_rodata_phys_addr() {\n+\tawk -v pagesize=\"$(getconf PAGE_SIZE)\" '\n+\t# Convert a hex string to a number without relying on the gawk-only\n+\t# strtonum().  
mawk lacks it and would otherwise spuriously skip\n+\t# this test on distros that ship mawk as /usr/bin/awk.\n+\tfunction hex2num(s,   n, i, c, v) {\n+\t\tn = 0\n+\t\tfor (i = 1; i \u003c= length(s); i++) {\n+\t\t\tc = tolower(substr(s, i, 1))\n+\t\t\tv = index(\"0123456789abcdef\", c) - 1\n+\t\t\tif (v \u003c 0)\n+\t\t\t\treturn -1\n+\t\t\tn = n * 16 + v\n+\t\t}\n+\t\treturn n\n+\t}\n+\t/: Kernel rodata[[:space:]]*$/ {\n+\t\tsub(/^[[:space:]]+/, \"\")\n+\t\tn = split($0, a, /[- ]/)\n+\t\tstart = hex2num(a[1])\n+\t\tend   = hex2num(a[2])\n+\t\tif (end \u003c= start)\n+\t\t\tnext\n+\t\t# Page-align upward and emit the first byte of that page.\n+\t\tpfn = int((start + pagesize - 1) / pagesize)\n+\t\tprintf \"0x%x\\n\", pfn * pagesize\n+\t\texit 0\n+\t}\n+\t' /proc/iomem\n+}\n+\n+# Walk /proc/kpageflags and return the phys addr of the first PFN that\n+# has bit $1 set, with KPF_HWPOISON, KPF_NOPAGE and KPF_COMPOUND_TAIL\n+# all clear (so we attack a real, non-tail, not-already-poisoned page).\n+#\n+# We skip the first 16 MiB of PFNs to step past low-memory special\n+# ranges (BIOS/EFI/ACPI/etc.) 
that often are PG_reserved and would not\n+# exhibit the slab/pgtable type we are looking for.\n+pick_kpageflags_phys_addr() {\n+\tlocal want_bit=$1\n+\tlocal pagesize skip_pfn\n+\n+\t[ -r \"$kpageflags_path\" ] || return\n+\n+\tpagesize=$(getconf PAGE_SIZE)\n+\tskip_pfn=$(((16 * 1024 * 1024) / pagesize))\n+\n+\tod -An -tx8 -v -w8 -j \"$((skip_pfn * 8))\" \"$kpageflags_path\" 2\u003e/dev/null | \\\n+\tawk -v want_bit=\"$want_bit\" \\\n+\t    -v hwp_bit=\"$KPF_HWPOISON\" \\\n+\t    -v nopage_bit=\"$KPF_NOPAGE\" \\\n+\t    -v tail_bit=\"$KPF_COMPOUND_TAIL\" \\\n+\t    -v base_pfn=\"$skip_pfn\" \\\n+\t    -v pagesize=\"$pagesize\" '\n+\t# Test whether bit \"b\" is set in the 16-hex-digit value \"hex\".\n+\t# Done with substring + per-digit lookup so we never rely on awk\n+\t# bitwise operators (mawk lacks them), 64-bit FP precision or the\n+\t# gawk-only strtonum().\n+\tfunction bit_set(hex, b,    di, bi, c, v) {\n+\t\tdi = int(b / 4)\n+\t\tbi = b - di * 4\n+\t\tc = substr(hex, length(hex) - di, 1)\n+\t\tv = index(\"0123456789abcdef\", tolower(c)) - 1\n+\t\tif (bi == 0) return (v % 2) == 1\n+\t\tif (bi == 1) return int(v / 2) % 2 == 1\n+\t\tif (bi == 2) return int(v / 4) % 2 == 1\n+\t\treturn int(v / 8) % 2 == 1\n+\t}\n+\t{\n+\t\tgsub(/^[[:space:]]+/, \"\")\n+\t\th = $1\n+\t\tif (bit_set(h, want_bit) \u0026\u0026\n+\t\t    !bit_set(h, hwp_bit) \u0026\u0026\n+\t\t    !bit_set(h, nopage_bit) \u0026\u0026\n+\t\t    !bit_set(h, tail_bit)) {\n+\t\t\tpfn = base_pfn + NR - 1\n+\t\t\tprintf \"0x%x\\n\", pfn * pagesize\n+\t\t\texit 0\n+\t\t}\n+\t}\n+\t'\n+}\n+\n+# Return 0 if /proc/kpageflags bit $2 is set for PFN $1, 1 if it is\n+# clear, or 2 if the word cannot be read.  
Used to re-confirm the target\n+# page type after a non-panicking inject.\n+kpageflags_bit_set() {\n+\tlocal word\n+\n+\tword=$(od -An -tx8 -v -j \"$(($1 * 8))\" -N 8 \"$kpageflags_path\" 2\u003e/dev/null | tr -d '[:space:]')\n+\t[ -n \"$word\" ] || return 2\n+\t(( (16#$word \u003e\u003e $2) \u0026 1 ))\n+}\n+\n+# Best-effort: drop the PG_hwpoison marker set by the inject so a failed\n+# run does not leave a poisoned page behind.  hard_offline_page() injects\n+# with MF_SW_SIMULATED, so the page stays unpoisonable through the\n+# hwpoison debugfs interface (needs CONFIG_HWPOISON_INJECT + debugfs).\n+try_unpoison() {\n+\t[ -w \"$unpoison_path\" ] || return 0\n+\techo \"$1\" \u003e \"$unpoison_path\" 2\u003e/dev/null || true\n+}\n+\n+case \"$kind\" in\n+rodata)\n+\tphys_addr=$(pick_rodata_phys_addr)\n+\trecheck_bit=$KPF_RESERVED\n+\tmissing_msg='no \"Kernel rodata\" entry in /proc/iomem'\n+\t;;\n+slab)\n+\tphys_addr=$(pick_kpageflags_phys_addr \"$KPF_SLAB\")\n+\trecheck_bit=$KPF_SLAB\n+\tmissing_msg=\"no usable slab PFN found in $kpageflags_path\"\n+\t;;\n+pgtable)\n+\tphys_addr=$(pick_kpageflags_phys_addr \"$KPF_PGTABLE\")\n+\trecheck_bit=$KPF_PGTABLE\n+\tmissing_msg=\"no usable page-table PFN found in $kpageflags_path\"\n+\t;;\n+*)\n+\tktap_exit_fail_msg \"unknown kind '$kind' (expected: rodata|slab|pgtable)\"\n+\t;;\n+esac\n+\n+if [ -z \"$phys_addr\" ]; then\n+\tktap_skip_all \"$missing_msg\"\n+\texit \"$KSFT_SKIP\"\n+fi\n+\n+ktap_print_msg \"enabling $sysctl_path\"\n+prior=$(cat \"$sysctl_path\")\n+echo 1 \u003e \"$sysctl_path\" || ktap_exit_fail_msg \"failed to enable sysctl\"\n+\n+pfn=$((phys_addr / pagesize))\n+ktap_print_msg \"injecting hwpoison at phys 0x$(printf '%x' \"$phys_addr\") (pfn 0x$(printf '%x' \"$pfn\"), kind=$kind)\"\n+ktap_print_msg \"expecting kernel panic: 'Memory failure: \u003cpfn\u003e: unrecoverable page'\"\n+\n+# A successful run never returns from the inject -- it panics the kernel.\n+# Reaching the code below therefore means no panic 
fired.  Note whether\n+# the write itself succeeded, then put the machine back: restore the\n+# sysctl and best-effort unpoison the page we just marked.\n+if echo \"$phys_addr\" \u003e \"$inject_path\"; then\n+\tverdict=\"inject returned without panic; sysctl ineffective\"\n+else\n+\tverdict=\"inject failed before reaching the panic path\"\n+fi\n+\n+echo \"$prior\" \u003e \"$sysctl_path\"\n+try_unpoison \"$pfn\"\n+\n+# The page type can change between selection and injection (e.g. a slab\n+# or page-table page is freed and reused).  Only treat a missing panic as\n+# a failure if the target PFN is still the kernel-owned type we aimed at;\n+# if it raced to another type the run is inconclusive, so skip instead.\n+kpageflags_bit_set \"$pfn\" \"$recheck_bit\"\n+case $? in\n+0)\tktap_exit_fail_msg \"$verdict (page still $kind)\" ;;\n+1)\tktap_skip_all \"target PFN no longer $kind; raced before inject, inconclusive\"\n+\texit \"$KSFT_SKIP\" ;;\n+*)\tktap_exit_fail_msg \"$verdict (could not reconfirm page type via $kpageflags_path)\" ;;\n+esac\n"
}

gemini-3.1-pro-preview

{
  "EnableConfigs": [
    "MEMORY_FAILURE"
  ],
  "FocusSymbols": [
    "get_any_page",
    "panic_on_unrecoverable_mf"
  ],
  "Reasoning": "The patch modifies the memory failure handler in core MM to short-circuit recovery for kernel-owned pages and optionally panic. This error-handling path is reachable in virtualized environments via madvise(MADV_HWPOISON) on kernel-owned pages mapped in user space (like the vDSO) or via sysfs error injection.",
  "WorthFuzzing": true
}
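The sysfs error-injection route named in the reasoning above can be sketched as a dry run. This is a minimal sketch, not the patch's own test: the two paths are the ones used by the selftest in the patch, the address 0x2500000 is only the illustrative "Kernel rodata" start from that selftest's comment, and `phys_to_pfn` and `inject_cmd` are hypothetical helper names. The script prints the writes instead of performing them, because a real write with the sysctl enabled is expected to panic the kernel.

```shell
#!/bin/bash
# Dry-run sketch: nothing here writes to procfs or sysfs.
# Paths come from the selftest in the patch; helper names are hypothetical.
sysctl_path=/proc/sys/vm/panic_on_unrecoverable_memory_failure
inject_path=/sys/devices/system/memory/hard_offline_page

# Convert a physical address to a page frame number for a given page size.
phys_to_pfn() {
	local phys=$1 pagesize=$2
	printf '0x%x\n' $(( phys / pagesize ))
}

# Print (do not run) the writes that would arm the sysctl and inject the error.
inject_cmd() {
	local phys=$1
	printf 'echo 1 > %s\n' "$sysctl_path"
	printf 'echo 0x%x > %s\n' "$phys" "$inject_path"
}

# Illustrative values only: assumed rodata start and a 4 KiB page size.
phys_to_pfn 0x2500000 4096
inject_cmd 0x2500000
```

On a disposable VM the printed commands could be run as root, which is essentially what the selftest does after selecting a target page.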

You are an expert Linux kernel maintainer.
Your job is to review a provided patch series and determine
if it makes functional changes to the kernel that should be fuzzed.

IMPORTANT: The changes have ALREADY been applied and committed as the HEAD commit in
your workspace. Do NOT rely on your internal knowledge of the kernel. You must actively
use your code access tools to examine the actual source code and confirm any assumptions.

Return WorthFuzzing=false if the patch only contains:
- Modifications to Documentation/, Kconfig files, or code comments.
- Purely decorative changes, such as logging (e.g., pr_err, printk) or tracepoints.
- Changes to numeric constants or macros that do not functionally alter execution flow.
- Code paths that are impossible to reach in virtualized environments like GCE or QEMU,
even when utilizing software-emulated hardware (e.g., usb gadget, mac80211_hwsim).

If it modifies reachable core kernel logic, drivers, or architectures, use your code search
tools to verify the code can be executed, then return WorthFuzzing=true.

When returning WorthFuzzing=true, you MUST ALSO:
1. Extract any specific kernel functions that should be heavily fuzzed into FocusSymbols.
   Avoid listing generic hot-path functions to prevent skewed test distributions.
2. Identify any specific CONFIG_ options required to properly test this new/modified feature.
   Go and look into the Kconfig files and check for ifdefs around the code; do not make assumptions.
   Do not list too generic configs (we already have them enabled). Only list those that
   specifically cover the modified code. List them in the EnableConfigs output array,
   and DO NOT add a 'CONFIG_' prefix (e.g., return "NET_IPV4" instead of "CONFIG_NET_IPV4").
Prefer calling several tools at the same time to save round-trips.


Use the set-results tool to provide the results of the analysis.
It must be called exactly once before the final reply.
Ignore the results returned by this tool.

For your convenience, here is the diff of the changes:
commit cd6649a5d4f7cd49ae8556e2d53436c6b6caba06
Author: syz-cluster <triage@syzkaller.com>
Date:   Tue Jun 30 18:54:07 2026 +0000

    syz-cluster: applied patch under review

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index b9b0c218bfb440..22cc54cac3b21d 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -67,6 +67,7 @@ Currently, these files are in /proc/sys/vm:
 - page-cluster
 - page_lock_unfairness
 - panic_on_oom
+- panic_on_unrecoverable_memory_failure
 - percpu_pagelist_high_fraction
 - stat_interval
 - stat_refresh
@@ -925,6 +926,85 @@ panic_on_oom=2+kdump gives you very strong tool to investigate
 why oom happens. You can get snapshot.
 
 
+panic_on_unrecoverable_memory_failure
+======================================
+
+When a hardware memory error (e.g. multi-bit ECC) hits a kernel page
+that cannot be recovered by the memory failure handler, the default
+behaviour is to ignore the error and continue operation.  This is
+dangerous because the corrupted data remains accessible to the kernel,
+risking silent data corruption or a delayed crash when the poisoned
+memory is next accessed.
+
+When enabled, this sysctl triggers a panic on memory failure events
+hitting kernel-owned pages that the handler cannot recover:
+``PageReserved`` (firmware reservations, kernel image, vDSO, zero
+page, and similar memblock-reserved regions), ``PageSlab``,
+``PageTable``, and ``PageLargeKmalloc``.  These are owned by the
+kernel and the memory failure handler cannot reliably evict their
+contents.
+
+Other unrecoverable kernel-owned populations (vmalloc allocations,
+kernel stack pages, ...) are not currently covered because the
+handler has no page-type signal that distinguishes them from a
+userspace folio temporarily off the LRU during migration or
+compaction.  Such pages still go through the standard
+MF_MSG_GET_HWPOISON path: ``PG_hwpoison`` is set on them and a
+delayed crash on the next access remains possible.  Coverage may
+grow as the handler gains stronger kernel-ownership signals.
+
+Recoverable failure paths are also intentionally left out: in-flight
+buddy allocations and other transient races with the page allocator
+can reach the same diagnostic, and panicking on them would risk
+killing the box for a page destined for userspace where the standard
+SIGBUS recovery path applies.  Pages whose state could not be
+classified at all are not covered either, since an unknown state is
+not a sound basis for a panic decision.
+
+For many environments it is preferable to panic immediately with a clean
+crash dump that captures the original error context, rather than to
+continue and face a random crash later whose cause is difficult to
+diagnose.
+
+Use cases
+---------
+
+This option is most useful in environments where unattributed crashes
+are expensive to debug or where data integrity must take precedence
+over availability:
+
+* Large fleets, where multi-bit ECC errors on kernel pages are observed
+  regularly and post-mortem analysis of an unrelated downstream crash
+  (often seconds to minutes after the original error) consumes
+  significant engineering effort.
+
+* Systems configured with kdump, where panicking at the moment of the
+  hardware error produces a vmcore that still contains the faulting
+  address, the affected page state, and the originating MCE/GHES
+  record — context that is typically lost by the time a delayed crash
+  occurs.
+
+* High-availability clusters that rely on fast, deterministic node
+  failure for failover, and prefer an immediate panic over silent data
+  corruption propagating to replicas or persistent storage.
+
+* Kernel and platform developers reproducing hwpoison issues with
+  tools such as ``mce-inject`` or error-injection debugfs interfaces,
+  where panicking on the unrecoverable path makes regressions
+  immediately visible instead of surfacing as later, unrelated
+  failures.
+
+= =====================================================================
+0 Try to continue operation (default).
+1 Panic immediately.  If the ``panic`` sysctl is also non-zero then the
+  machine will be rebooted.
+= =====================================================================
+
+Example::
+
+     echo 1 > /proc/sys/vm/panic_on_unrecoverable_memory_failure
+
+
 percpu_pagelist_high_fraction
 =============================
 
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 51508a55c4055e..e097fc8262cf83 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -74,6 +74,8 @@ static int sysctl_memory_failure_recovery __read_mostly = 1;
 
 static int sysctl_enable_soft_offline __read_mostly = 1;
 
+static int sysctl_panic_on_unrecoverable_mf __read_mostly;
+
 atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
 
 static bool hw_memory_failure __read_mostly = false;
@@ -155,6 +157,15 @@ static const struct ctl_table memory_failure_table[] = {
 		.proc_handler	= proc_dointvec_minmax,
 		.extra1		= SYSCTL_ZERO,
 		.extra2		= SYSCTL_ONE,
+	},
+	{
+		.procname	= "panic_on_unrecoverable_memory_failure",
+		.data		= &sysctl_panic_on_unrecoverable_mf,
+		.maxlen		= sizeof(sysctl_panic_on_unrecoverable_mf),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE,
 	}
 };
 
@@ -980,17 +991,6 @@ static bool has_extra_refcount(struct page_state *ps, struct page *p,
 	return false;
 }
 
-/*
- * Error hit kernel page.
- * Do nothing, try to be lucky and not touch this instead. For a few cases we
- * could be more sophisticated.
- */
-static int me_kernel(struct page_state *ps, struct page *p)
-{
-	unlock_page(p);
-	return MF_IGNORED;
-}
-
 /*
  * Page in unknown state. Do nothing.
  * This is a catch-all in case we fail to make sense of the page state.
@@ -1199,10 +1199,8 @@ static int me_huge_page(struct page_state *ps, struct page *p)
 #define mlock		(1UL << PG_mlocked)
 #define lru		(1UL << PG_lru)
 #define head		(1UL << PG_head)
-#define reserved	(1UL << PG_reserved)
 
 static struct page_state error_states[] = {
-	{ reserved,	reserved,	MF_MSG_KERNEL,	me_kernel },
 	/*
 	 * free pages are specially detected outside this table:
 	 * PG_buddy pages only make a small fraction of all free pages.
@@ -1234,7 +1232,6 @@ static struct page_state error_states[] = {
 #undef mlock
 #undef lru
 #undef head
-#undef reserved
 
 static void update_per_node_mf_stats(unsigned long pfn,
 				     enum mf_result result)
@@ -1269,6 +1266,15 @@ static void update_per_node_mf_stats(unsigned long pfn,
 	++mf_stats->total;
 }
 
+static bool panic_on_unrecoverable_mf(enum mf_action_page_type type,
+				      enum mf_result result)
+{
+	if (!sysctl_panic_on_unrecoverable_mf)
+		return false;
+
+	return type == MF_MSG_KERNEL && result == MF_IGNORED;
+}
+
 /*
  * "Dirty/Clean" indication is not 100% accurate due to the possibility of
  * setting PG_dirty outside page lock. See also comment above set_page_dirty().
@@ -1286,6 +1292,9 @@ static int action_result(unsigned long pfn, enum mf_action_page_type type,
 	pr_err("%#lx: recovery action for %s: %s\n",
 		pfn, action_page_types[type], action_name[result]);
 
+	if (panic_on_unrecoverable_mf(type, result))
+		panic("Memory failure: %#lx: unrecoverable page", pfn);
+
 	return (result == MF_RECOVERED || result == MF_DELAYED) ? 0 : -EBUSY;
 }
 
@@ -1339,6 +1348,38 @@ static inline bool HWPoisonHandlable(struct page *page, unsigned long flags)
 	return PageLRU(page) || is_free_buddy_page(page);
 }
 
+/*
+ * Positive identification of pages the hwpoison handler cannot recover:
+ * pages owned by kernel internals with no userspace mapping to unmap, no
+ * file mapping to invalidate, and no migration target.
+ */
+static inline bool is_kernel_owned_page(struct page *page)
+{
+	struct page *head;
+	bool kernel_owned;
+
+	/* PG_reserved is a per-page flag, never set on a compound page. */
+	if (PageReserved(page))
+		return true;
+
+	/*
+	 * Page-type bits live only on the head page, so resolve any tail
+	 * first.  The check takes no refcount; recheck the head afterwards
+	 * so a concurrent split or compound free cannot leave us trusting
+	 * a stale view.  A residual free->alloc->free cannot be closed here
+	 * (frozen slab and large-kmalloc pages cannot be pinned), but is
+	 * harmless: where a wrong verdict could panic, memory_failure() has
+	 * already set PageHWPoison, which bars the page from the allocator.
+	 */
+retry:
+	head = compound_head(page);
+	kernel_owned = PageSlab(head) || PageTable(head) ||
+		       PageLargeKmalloc(head);
+	if (head != compound_head(page))
+		goto retry;
+	return kernel_owned;
+}
+
 static int __get_hwpoison_page(struct page *page, unsigned long flags)
 {
 	struct folio *folio = page_folio(page);
@@ -1385,6 +1426,19 @@ static int get_any_page(struct page *p, unsigned long flags)
 	if (flags & MF_COUNT_INCREASED)
 		count_increased = true;
 
+	/*
+	 * Page types we know are kernel-owned and cannot be recovered.
+	 * Short-circuit before the shake_page() / retry loop, which
+	 * cannot turn any of these into something HWPoisonHandlable().
+	 * Drop the caller's reference if MF_COUNT_INCREASED took one.
+	 */
+	if (is_kernel_owned_page(p)) {
+		if (count_increased)
+			put_page(p);
+		ret = -ENOTRECOVERABLE;
+		goto out;
+	}
+
 try_again:
 	if (!count_increased) {
 		ret = __get_hwpoison_page(p, flags);
@@ -1432,7 +1486,7 @@ static int get_any_page(struct page *p, unsigned long flags)
 		ret = -EIO;
 	}
 out:
-	if (ret == -EIO)
+	if (ret == -EIO || ret == -ENOTRECOVERABLE)
 		pr_err("%#lx: unhandlable page.\n", page_to_pfn(p));
 
 	return ret;
@@ -1489,7 +1543,10 @@ static int __get_unpoison_page(struct page *page)
  *         -EIO for pages on which we can not handle memory errors,
  *         -EBUSY when get_hwpoison_page() has raced with page lifecycle
  *         operations like allocation and free,
- *         -EHWPOISON when the page is hwpoisoned and taken off from buddy.
+ *         -EHWPOISON when the page is hwpoisoned and taken off from buddy,
+ *         -ENOTRECOVERABLE for kernel-owned pages identified by
+ *         is_kernel_owned_page() (PG_reserved, slab,
+ *         page-table, large-kmalloc) that the handler cannot recover.
  */
 static int get_hwpoison_page(struct page *p, unsigned long flags)
 {
@@ -2402,7 +2459,8 @@ int memory_failure(unsigned long pfn, int flags)
 	 * that may make page_ref_freeze()/page_ref_unfreeze() mismatch.
 	 */
 	res = get_hwpoison_page(p, flags);
-	if (!res) {
+	switch (res) {
+	case 0:
 		if (is_free_buddy_page(p)) {
 			if (take_page_off_buddy(p)) {
 				page_ref_inc(p);
@@ -2421,7 +2479,19 @@ int memory_failure(unsigned long pfn, int flags)
 			res = action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);
 		}
 		goto unlock_mutex;
-	} else if (res < 0) {
+	case 1:
+		/* Got a refcount on a handlable page. */
+		break;
+	case -ENOTRECOVERABLE:
+		/*
+		 * Stable unhandlable kernel-owned page (PG_reserved,
+		 * slab, page tables, large-kmalloc).
+		 * No recovery possible.
+		 */
+		res = action_result(pfn, MF_MSG_KERNEL, MF_IGNORED);
+		goto unlock_mutex;
+	default:
+		/* Transient lifecycle race with the page allocator. */
 		res = action_result(pfn, MF_MSG_GET_HWPOISON, MF_IGNORED);
 		goto unlock_mutex;
 	}
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index e6df968f0971c8..ed321ae709dacf 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -174,6 +174,10 @@ TEST_PROGS += ksft_userfaultfd.sh
 TEST_PROGS += ksft_vma_merge.sh
 TEST_PROGS += ksft_vmalloc.sh
 
+# Destructive: every successful run panics the kernel.  Installed and
+# kept executable, but not run from a default kselftest invocation.
+TEST_PROGS_EXTENDED += hwpoison-panic.sh
+
 TEST_FILES := test_vmalloc.sh
 TEST_FILES += test_hmm.sh
 TEST_FILES += va_high_addr_switch.sh
diff --git a/tools/testing/selftests/mm/hwpoison-panic.sh b/tools/testing/selftests/mm/hwpoison-panic.sh
new file mode 100755
index 00000000000000..d953d136733244
--- /dev/null
+++ b/tools/testing/selftests/mm/hwpoison-panic.sh
@@ -0,0 +1,255 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Verify vm.panic_on_unrecoverable_memory_failure by injecting a hwpoison
+# error on a kernel-owned page and confirming the kernel panics.
+#
+# Three "kinds" of kernel-owned page can be targeted, selectable via the
+# first positional argument (default: rodata):
+#
+#   rodata  - a PG_reserved page in the kernel rodata range
+#             (sourced from /proc/iomem "Kernel rodata").  Exercises
+#             memory_failure() -> get_any_page() on a PageReserved page.
+#
+#   slab    - a slab page found via /proc/kpageflags (KPF_SLAB).
+#             Exercises memory_failure() -> get_any_page() on a non
+#             PG_reserved kernel-owned page.  This path is what catches
+#             regressions where get_any_page() collapses kernel-owned
+#             pages into a transient -EIO instead of -ENOTRECOVERABLE.
+#
+#   pgtable - a page-table page found via /proc/kpageflags (KPF_PGTABLE).
+#             Same path as slab, different page type.
+#
+# This test is DESTRUCTIVE: a successful run crashes the kernel.  It is
+# meant to be executed inside a disposable VM (e.g. virtme-ng) with a
+# serial console captured by the harness.  It is skipped unless the
+# caller opts in via RUN_DESTRUCTIVE=1.
+#
+# Test passes externally: the kernel must panic with
+#   "Memory failure: <pfn>: unrecoverable page"
+# A return from the inject means no panic fired: that is a failure,
+# unless the target PFN raced to a different page type before injection,
+# in which case the run is inconclusive and is skipped.
+#
+# Author: Breno Leitao <leitao@debian.org>
+
+set -u
+
+# KTAP output helpers (ktap_print_msg, ktap_skip_all, ktap_exit_fail_msg, ...).
+DIR="$(dirname "$(readlink -f "$0")")"
+# shellcheck source=../kselftest/ktap_helpers.sh
+source "${DIR}"/../kselftest/ktap_helpers.sh
+
+sysctl_path=/proc/sys/vm/panic_on_unrecoverable_memory_failure
+inject_path=/sys/devices/system/memory/hard_offline_page
+kpageflags_path=/proc/kpageflags
+unpoison_path=/sys/kernel/debug/hwpoison/unpoison-pfn
+
+# /proc/kpageflags bit positions (see include/uapi/linux/kernel-page-flags.h)
+KPF_SLAB=7
+KPF_COMPOUND_TAIL=16
+KPF_HWPOISON=19
+KPF_NOPAGE=20
+KPF_PGTABLE=26
+KPF_RESERVED=32
+
+pagesize=$(getconf PAGE_SIZE)
+
+kind=${1:-rodata}
+
+if [ "$(id -u)" -ne 0 ]; then
+	ktap_skip_all "must run as root"
+	exit "$KSFT_SKIP"
+fi
+
+if [ ! -w "$sysctl_path" ]; then
+	ktap_skip_all "$sysctl_path not present (kernel without the sysctl?)"
+	exit "$KSFT_SKIP"
+fi
+
+if [ ! -w "$inject_path" ]; then
+	ktap_skip_all "$inject_path not present (no MEMORY_HOTPLUG?)"
+	exit "$KSFT_SKIP"
+fi
+
+if [ "${RUN_DESTRUCTIVE:-0}" != "1" ]; then
+	ktap_skip_all "destructive test; re-run with RUN_DESTRUCTIVE=1 inside a disposable VM"
+	exit "$KSFT_SKIP"
+fi
+
+# Pick a PFN inside the kernel image rodata region of /proc/iomem.
+# This is preferred over a top-level "Reserved" entry because top-level
+# Reserved ranges are often firmware holes that have no backing struct
+# page; pfn_to_online_page() returns NULL on those and memory_failure()
+# bails out with -ENXIO before reaching the panic path.
+#
+# "Kernel rodata" is reported as a sub-resource of "System RAM" on every
+# major architecture, which guarantees:
+#   - the PFN is backed by struct page (within an online memory range);
+#   - PG_reserved is set on the page (kernel image area);
+#   - the memory is read-only, so setting PG_hwpoison on it does not
+#     corrupt writable kernel state if the panic somehow does not fire.
+#
+# /proc/iomem entries look like (indented for sub-resources):
+#     "  02500000-02ffffff : Kernel rodata"
+pick_rodata_phys_addr() {
+	awk -v pagesize="$(getconf PAGE_SIZE)" '
+	# Convert a hex string to a number without relying on the gawk-only
+	# strtonum().  mawk lacks it and would otherwise spuriously skip
+	# this test on distros that ship mawk as /usr/bin/awk.
+	function hex2num(s,   n, i, c, v) {
+		n = 0
+		for (i = 1; i <= length(s); i++) {
+			c = tolower(substr(s, i, 1))
+			v = index("0123456789abcdef", c) - 1
+			if (v < 0)
+				return -1
+			n = n * 16 + v
+		}
+		return n
+	}
+	/: Kernel rodata[[:space:]]*$/ {
+		sub(/^[[:space:]]+/, "")
+		n = split($0, a, /[- ]/)
+		start = hex2num(a[1])
+		end   = hex2num(a[2])
+		if (end <= start)
+			next
+		# Page-align upward and emit the first byte of that page.
+		pfn = int((start + pagesize - 1) / pagesize)
+		printf "0x%x\n", pfn * pagesize
+		exit 0
+	}
+	' /proc/iomem
+}
+
+# Walk /proc/kpageflags and return the phys addr of the first PFN that
+# has bit $1 set, with KPF_HWPOISON, KPF_NOPAGE and KPF_COMPOUND_TAIL
+# all clear (so we attack a real, non-tail, not-already-poisoned page).
+#
+# We skip the first 16 MiB of PFNs to step past low-memory special
+# ranges (BIOS/EFI/ACPI/etc.) that often are PG_reserved and would not
+# exhibit the slab/pgtable type we are looking for.
+pick_kpageflags_phys_addr() {
+	local want_bit=$1
+	local pagesize skip_pfn
+
+	[ -r "$kpageflags_path" ] || return
+
+	pagesize=$(getconf PAGE_SIZE)
+	skip_pfn=$(((16 * 1024 * 1024) / pagesize))
+
+	od -An -tx8 -v -w8 -j "$((skip_pfn * 8))" "$kpageflags_path" 2>/dev/null | \
+	awk -v want_bit="$want_bit" \
+	    -v hwp_bit="$KPF_HWPOISON" \
+	    -v nopage_bit="$KPF_NOPAGE" \
+	    -v tail_bit="$KPF_COMPOUND_TAIL" \
+	    -v base_pfn="$skip_pfn" \
+	    -v pagesize="$pagesize" '
+	# Test whether bit "b" is set in the 16-hex-digit value "hex".
+	# Done with substring + per-digit lookup so we never rely on awk
+	# bitwise operators (mawk lacks them), 64-bit FP precision or the
+	# gawk-only strtonum().
+	function bit_set(hex, b,    di, bi, c, v) {
+		di = int(b / 4)
+		bi = b - di * 4
+		c = substr(hex, length(hex) - di, 1)
+		v = index("0123456789abcdef", tolower(c)) - 1
+		if (bi == 0) return (v % 2) == 1
+		if (bi == 1) return int(v / 2) % 2 == 1
+		if (bi == 2) return int(v / 4) % 2 == 1
+		return int(v / 8) % 2 == 1
+	}
+	{
+		gsub(/^[[:space:]]+/, "")
+		h = $1
+		if (bit_set(h, want_bit) &&
+		    !bit_set(h, hwp_bit) &&
+		    !bit_set(h, nopage_bit) &&
+		    !bit_set(h, tail_bit)) {
+			pfn = base_pfn + NR - 1
+			printf "0x%x\n", pfn * pagesize
+			exit 0
+		}
+	}
+	'
+}
+
+# Return 0 if /proc/kpageflags bit $2 is set for PFN $1, 1 if it is
+# clear, or 2 if the word cannot be read.  Used to re-confirm the target
+# page type after a non-panicking inject.
+kpageflags_bit_set() {
+	local word
+
+	word=$(od -An -tx8 -v -j "$(($1 * 8))" -N 8 "$kpageflags_path" 2>/dev/null | tr -d '[:space:]')
+	[ -n "$word" ] || return 2
+	(( (16#$word >> $2) & 1 ))
+}
+
+# Best-effort: drop the PG_hwpoison marker set by the inject so a failed
+# run does not leave a poisoned page behind.  hard_offline_page() injects
+# with MF_SW_SIMULATED, so the page stays unpoisonable through the
+# hwpoison debugfs interface (needs CONFIG_HWPOISON_INJECT + debugfs).
+try_unpoison() {
+	[ -w "$unpoison_path" ] || return 0
+	echo "$1" > "$unpoison_path" 2>/dev/null || true
+}
+
+case "$kind" in
+rodata)
+	phys_addr=$(pick_rodata_phys_addr)
+	recheck_bit=$KPF_RESERVED
+	missing_msg='no "Kernel rodata" entry in /proc/iomem'
+	;;
+slab)
+	phys_addr=$(pick_kpageflags_phys_addr "$KPF_SLAB")
+	recheck_bit=$KPF_SLAB
+	missing_msg="no usable slab PFN found in $kpageflags_path"
+	;;
+pgtable)
+	phys_addr=$(pick_kpageflags_phys_addr "$KPF_PGTABLE")
+	recheck_bit=$KPF_PGTABLE
+	missing_msg="no usable page-table PFN found in $kpageflags_path"
+	;;
+*)
+	ktap_exit_fail_msg "unknown kind '$kind' (expected: rodata|slab|pgtable)"
+	;;
+esac
+
+if [ -z "$phys_addr" ]; then
+	ktap_skip_all "$missing_msg"
+	exit "$KSFT_SKIP"
+fi
+
+ktap_print_msg "enabling $sysctl_path"
+prior=$(cat "$sysctl_path")
+echo 1 > "$sysctl_path" || ktap_exit_fail_msg "failed to enable sysctl"
+
+pfn=$((phys_addr / pagesize))
+ktap_print_msg "injecting hwpoison at phys 0x$(printf '%x' "$phys_addr") (pfn 0x$(printf '%x' "$pfn"), kind=$kind)"
+ktap_print_msg "expecting kernel panic: 'Memory failure: <pfn>: unrecoverable page'"
+
+# A successful run never returns from the inject -- it panics the kernel.
+# Reaching the code below therefore means no panic fired.  Note whether
+# the write itself succeeded, then put the machine back: restore the
+# sysctl and best-effort unpoison the page we just marked.
+if echo "$phys_addr" > "$inject_path"; then
+	verdict="inject returned without panic; sysctl ineffective"
+else
+	verdict="inject failed before reaching the panic path"
+fi
+
+echo "$prior" > "$sysctl_path"
+try_unpoison "$pfn"
+
+# The page type can change between selection and injection (e.g. a slab
+# or page-table page is freed and reused).  Only treat a missing panic as
+# a failure if the target PFN is still the kernel-owned type we aimed at;
+# if it raced to another type the run is inconclusive, so skip instead.
+kpageflags_bit_set "$pfn" "$recheck_bit"
+case $? in
+0)	ktap_exit_fail_msg "$verdict (page still $kind)" ;;
+1)	ktap_skip_all "target PFN no longer $kind; raced before inject, inconclusive"
+	exit "$KSFT_SKIP" ;;
+*)	ktap_exit_fail_msg "$verdict (could not reconfirm page type via $kpageflags_path)" ;;
+esac

gemini-3.1-pro-preview

							input: 9019
							output: 20
							thoughts: 1572

**Analyzing the New Kernel Memory Failure Panic Patch**

Okay, so we've got a new patch introducing a `sysctl` called `panic_on_unrecoverable_memory_failure`, which is `sysctl_panic_on_unrecoverable_mf` in code. The core change is in `mm/memory-failure.c`, and it's designed to panic the kernel when an unrecoverable memory failure occurs specifically on a kernel page. The patch adds `is_kernel_owned_page()` to identify kernel pages; it checks `PageReserved`, `PageSlab`, `PageTable`, and `PageLargeKmalloc` flags. If it finds a kernel-owned page, `get_any_page()` returns `-ENOTRECOVERABLE`. Then, `memory_failure()` handles this and, if the sysctl is set, it panics.
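
The flag check described here can be approximated from user space on a single `/proc/kpageflags` word. This is a minimal bash sketch, assuming the `KPF_*` bit positions that the patch's selftest defines (`KPF_SLAB=7`, `KPF_PGTABLE=26`, `KPF_RESERVED=32`). The helper name `kpf_kernel_owned` is hypothetical, and large-kmalloc pages are left out because the sketch does not assume a `kpageflags` bit for them:

```bash
#!/bin/bash
# Bit positions as defined in the selftest (hwpoison-panic.sh).
KPF_SLAB=7
KPF_PGTABLE=26
KPF_RESERVED=32

# Hypothetical helper: succeed if a /proc/kpageflags word (hex, with or
# without a 0x prefix) has any of the kernel-owned bits above set.
kpf_kernel_owned() {
    local word=${1#0x}
    local bit

    for bit in "$KPF_SLAB" "$KPF_PGTABLE" "$KPF_RESERVED"; do
        # bash arithmetic is 64-bit, so testing bit 32 is safe.
        if (( (16#$word >> bit) & 1 )); then
            return 0
        fi
    done
    return 1
}

# Example: a word with only bit 7 (slab) set.
if kpf_kernel_owned 0x80; then
    echo "kernel-owned"
else
    echo "not kernel-owned"
fi
```

This only mirrors the user-visible flags; the in-kernel `is_kernel_owned_page()` remains the authority on which pages take the short-circuit.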

My immediate question is, how reachable is this in a virtualized environment, which is where I usually work? Turns out, memory failures can be injected via `/sys/devices/system/memory/hard_offline_page` or `madvise(MADV_HWPOISON)`. There's even a test script, `tools/testing/selftests/mm/hwpoison-panic.sh`, that uses `hard_offline_page`. Ah, but wait, `madvise(MADV_HWPOISON)` only works on *user* pages. So, to hit kernel pages, we're stuck with `/sys/devices/system/memory/hard_offline_page` or potentially an hwpoison injection module. Good to know `hard_offline_page` is accessible if `CONFIG_MEMORY_FAILURE` is enabled, which is the key.

Now, can we actually reach `memory_failure()` on kernel pages? Yes, because `hard_offline_page` is a write to sysfs, which ends up calling `memory_failure()`. Since syzkaller usually has `CONFIG_MEMORY_FAILURE` enabled, this is a distinct possibility. Does syzkaller have access to these sysfs or debugfs paths? Typically, syzkaller doesn't fuzz debugfs or sysfs unless we've explicitly configured it to. But it does fuzz `madvise(MADV_HWPOISON)`. However, again, `madvise` only handles user pages, though I suppose it does call `memory_failure()` on the PFN of the user page. Is it possible for a user page to be marked as kernel-owned ( `PageReserved`, `PageSlab`, `PageTable`, `PageLargeKmalloc` )? Usually not.

But this patch is specifically about panicking on *unrecoverable* memory failures. So, is it even worth fuzzing? If it's just a panic on an error injection path, maybe not. However, `memory_failure()` can be triggered by actual hardware errors (MCEs) or software injection. In a VM, hardware errors are not the norm, so software injection is the way to go. If syzkaller *can* inject memory failures into kernel pages, the result will simply be a kernel panic, which is *exactly* what this patch intends.

Wait, isn't a kernel panic a crash? Syzkaller reports panics as crashes. So, if syzkaller triggers this panic, it'll be reported as a bug! But this panic is actually *intentional* when the sysctl is enabled. Is that sysctl enabled by default? Checking… `static int sysctl_panic_on_unrecoverable_mf __read_mostly;` Aha! It's zero by default. So, by default, it won't panic. If syzkaller doesn't enable the sysctl, no panic. If it *does* enable it, then we get the panic, which syzkaller will consider a bug, but it's really the designed behavior.
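
Whether a given VM would actually panic therefore depends on the runtime value of the sysctl. A small bash sketch of that check follows, assuming the `/proc/sys/vm` path used by the selftest; the helper name `mf_panic_enabled` is hypothetical, and treating any non-zero value as "enabled" is an assumption:

```bash
#!/bin/bash
# Path introduced by the patch; absent on kernels without it.
SYSCTL_PATH=/proc/sys/vm/panic_on_unrecoverable_memory_failure

# Hypothetical helper: succeed only if the sysctl file is readable and
# holds a non-zero value. A missing file is treated as "disabled".
mf_panic_enabled() {
    local path=${1:-$SYSCTL_PATH}
    local val

    [ -r "$path" ] || return 1
    val=$(cat "$path" 2>/dev/null) || return 1
    [ -n "$val" ] && [ "$val" != "0" ]
}

if mf_panic_enabled; then
    echo "panic on unrecoverable memory failure: enabled"
else
    echo "panic on unrecoverable memory failure: disabled or unsupported"
fi
```

The check is read-only, so it is safe to run outside a disposable VM.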

Let's not forget, the patch also changes `get_any_page()` to return `-ENOTRECOVERABLE` instead of `-EIO` for kernel-owned pages. Both values still reach the `pr_err()` at `out:` and print "unhandlable page", but `memory_failure()` now handles `-ENOTRECOVERABLE` in its own case and calls `action_result(pfn, MF_MSG_KERNEL, MF_IGNORED)` rather than falling into the generic `MF_MSG_GET_HWPOISON` branch. And if the sysctl is enabled, it panics.
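
The resulting return-code handling can be modeled as a small lookup. This is a bash sketch that follows the `switch (res)` shown in the diff and uses symbolic names rather than errno numbers; `mf_dispatch` is a hypothetical name, and the `0` case is deliberately coarse because its outcome depends on buddy-allocator checks not modeled here:

```bash
#!/bin/bash
# Hypothetical model of the post-patch switch (res) in memory_failure().
# Input:  symbolic result of get_hwpoison_page().
# Output: the action_result() message type chosen by the diff, or
#         "continue" when handling proceeds past the switch.
mf_dispatch() {
    case "$1" in
    0)
        # Free buddy page or high-order kernel page; not modeled further.
        echo "free-or-high-order"
        ;;
    1)
        # Got a refcount on a handlable page.
        echo "continue"
        ;;
    -ENOTRECOVERABLE)
        # Stable, unhandlable kernel-owned page.
        echo "MF_MSG_KERNEL"
        ;;
    *)
        # Any other negative value: transient lifecycle race.
        echo "MF_MSG_GET_HWPOISON"
        ;;
    esac
}

for res in 1 -ENOTRECOVERABLE -EIO -EBUSY; do
    printf '%s -> %s\n' "$res" "$(mf_dispatch "$res")"
done
```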

Is there any complex logic to fuzz here? The `is_kernel_owned_page()` check looks simple enough; it's a straightforward flag check. Really, this is mostly about adding a sysctl for kernel page memory failures. Since those failures are typically only triggered by hardware errors or root-only sysfs/debugfs injection, and the result is a controlled panic or a different printk, it seems there's not much functionally exposed to regular users or any complex state machines to

{
  "Expression": "MADV_HWPOISON"
}

{
  "Output": "Documentation/mm/hwpoison.rst=106=Testing\n--\nDocumentation/mm/hwpoison.rst-108-\nDocumentation/mm/hwpoison.rst:109:* madvise(MADV_HWPOISON, ....) (as root) - Poison a page in the\nDocumentation/mm/hwpoison.rst-110-  process for testing\n--\nDocumentation/translations/zh_CN/mm/hwpoison.rst=96=PR_MCE_KILL_GET\n--\nDocumentation/translations/zh_CN/mm/hwpoison.rst-101-\nDocumentation/translations/zh_CN/mm/hwpoison.rst:102:* madvise(MADV_HWPOISON, ....) (as root) - 在测试过程中Poison一个页面\nDocumentation/translations/zh_CN/mm/hwpoison.rst-103-\n--\narch/mips/include/uapi/asm/mman.h-85-#define MADV_UNMERGEABLE 13\t\t/* KSM may not merge identical pages */\narch/mips/include/uapi/asm/mman.h:86:#define MADV_HWPOISON\t 100\t\t/* poison a page for testing */\narch/mips/include/uapi/asm/mman.h-87-\n--\narch/parisc/include/uapi/asm/mman.h-74-\narch/parisc/include/uapi/asm/mman.h:75:#define MADV_HWPOISON     100\t\t/* poison a page for testing */\narch/parisc/include/uapi/asm/mman.h-76-#define MADV_SOFT_OFFLINE 101\t\t/* soft offline page for testing */\n--\ninclude/uapi/asm-generic/mman-common.h-55-#define MADV_DOFORK\t11\t\t/* do inherit across fork */\ninclude/uapi/asm-generic/mman-common.h:56:#define MADV_HWPOISON\t100\t\t/* poison a page for testing */\ninclude/uapi/asm-generic/mman-common.h-57-#define MADV_SOFT_OFFLINE 101\t\t/* soft offline page for testing */\n--\nmm/madvise.c=1492=static bool is_memory_failure(struct madvise_behavior *madv_behavior)\n--\nmm/madvise.c-1494-\tswitch (madv_behavior-\u003ebehavior) {\nmm/madvise.c:1495:\tcase MADV_HWPOISON:\nmm/madvise.c-1496-\tcase MADV_SOFT_OFFLINE:\n--\nmm/madvise.c=1518=madvise_behavior_valid(int behavior)\n--\nmm/madvise.c-1551-\tcase MADV_SOFT_OFFLINE:\nmm/madvise.c:1552:\tcase MADV_HWPOISON:\nmm/madvise.c-1553-#endif\n--\nmm/madvise.c=1887=static int madvise_do_behavior(unsigned long start, size_t len_in,\n--\nmm/madvise.c-1943- *  MADV_KEEPONFORK - undo the effect of MADV_WIPEONFORK\nmm/madvise.c:1944: *  
MADV_HWPOISON - trigger memory error handler as if the given memory range\nmm/madvise.c-1945- *\t\twere corrupted by unrecoverable hardware memory failure.\n--\ntools/arch/alpha/include/uapi/asm/mman.h-37-#define PROT_WRITE\t0x2\ntools/arch/alpha/include/uapi/asm/mman.h:38:/* MADV_HWPOISON is undefined on alpha, fix it for perf */\ntools/arch/alpha/include/uapi/asm/mman.h:39:#define MADV_HWPOISON\t100\ntools/arch/alpha/include/uapi/asm/mman.h-40-/* MADV_SOFT_OFFLINE is undefined on alpha, fix it for perf */\n--\ntools/arch/mips/include/uapi/asm/mman.h-10-#define MADV_HUGEPAGE\t14\ntools/arch/mips/include/uapi/asm/mman.h:11:#define MADV_HWPOISON\t 100\ntools/arch/mips/include/uapi/asm/mman.h-12-#define MADV_MERGEABLE\t 12\n--\ntools/arch/parisc/include/uapi/asm/mman.h-37-#define PROT_WRITE\t0x2\ntools/arch/parisc/include/uapi/asm/mman.h:38:#define MADV_HWPOISON\t100\ntools/arch/parisc/include/uapi/asm/mman.h-39-#define MADV_SOFT_OFFLINE 101\n--\ntools/arch/xtensa/include/uapi/asm/mman.h-37-#define PROT_WRITE\t0x2\ntools/arch/xtensa/include/uapi/asm/mman.h:38:/* MADV_HWPOISON is undefined on xtensa, fix it for perf */\ntools/arch/xtensa/include/uapi/asm/mman.h:39:#define MADV_HWPOISON\t100\ntools/arch/xtensa/include/uapi/asm/mman.h-40-/* MADV_SOFT_OFFLINE is undefined on xtensa, fix it for perf */\n--\ntools/include/uapi/asm-generic/mman-common.h-55-#define MADV_DOFORK\t11\t\t/* do inherit across fork */\ntools/include/uapi/asm-generic/mman-common.h:56:#define MADV_HWPOISON\t100\t\t/* poison a page for testing */\ntools/include/uapi/asm-generic/mman-common.h-57-#define MADV_SOFT_OFFLINE 101\t\t/* soft offline page for testing */\n--\ntools/testing/selftests/mm/hugetlb-read-hwpoison.c=176=test_hugetlb_read_hwpoison(int fd, size_t len, size_t wr_chunk_size,\n--\ntools/testing/selftests/mm/hugetlb-read-hwpoison.c-205-\thwp_addr = filemap + len / 2 + pagesize;\ntools/testing/selftests/mm/hugetlb-read-hwpoison.c:206:\tif (madvise(hwp_addr, pagesize, MADV_HWPOISON) \u003c 
0) {\ntools/testing/selftests/mm/hugetlb-read-hwpoison.c:207:\t\tksft_perror(\"MADV_HWPOISON failed\");\ntools/testing/selftests/mm/hugetlb-read-hwpoison.c-208-\t\tgoto unmap;\n--\ntools/testing/selftests/mm/memory-failure.c=58=static int madv_hard_inject(FIXTURE_DATA(memory_failure) * self, void *vaddr)\ntools/testing/selftests/mm/memory-failure.c-59-{\ntools/testing/selftests/mm/memory-failure.c:60:\treturn madvise(vaddr, self-\u003epage_size, MADV_HWPOISON);\ntools/testing/selftests/mm/memory-failure.c-61-}\n"
}

gemini-3.1-pro-preview

							input: 13491
							output: 20
							thoughts: 34

**Investigating `madvise_hwpoison` and Capability Requirements**

Okay, let's dive into this. I need to understand the privilege requirements of the `MADV_HWPOISON` path of `madvise()`. The most direct route is the source, so I'll look in `mm/madvise.c` for a `madvise_hwpoison` function and check whether it requires `CAP_SYS_ADMIN` or any other capability. Knowing the privilege level needed to reach `memory_failure()` this way is the foundation for the rest of the analysis.

{
  "Expression": "madvise_hwpoison"
}

{
  "error": "no matches"
}

gemini-3.1-pro-preview

							input: 13714
							output: 36
							thoughts: 38

**Diving into `madvise_inject_error` and Kernel Memory Management**

The search for `madvise_hwpoison` found nothing, so the handler is probably `madvise_inject_error` or something very similar.

The earlier grep showed `is_memory_failure()` at line 1492 of `mm/madvise.c`, so I'll read the code around there and trace how `MADV_HWPOISON` is handled.

{
  "File": "mm/madvise.c",
  "FirstLine": 1480,
  "LineCount": 50
}

{
  "Contents": "1480:\t\t\t\tret = memory_failure(pfn, MF_ACTION_REQUIRED | MF_COUNT_INCREASED | MF_SW_SIMULATED);\n1481:\t\t\t\tif (ret == -EOPNOTSUPP)\n1482:\t\t\t\t\tret = 0;\n1483:\t\t\t}\n1484:\t\n1485:\t\t\tif (ret)\n1486:\t\t\t\treturn ret;\n1487:\t\t}\n1488:\t\n1489:\t\treturn 0;\n1490:\t}\n1491:\t\n1492:\tstatic bool is_memory_failure(struct madvise_behavior *madv_behavior)\n1493:\t{\n1494:\t\tswitch (madv_behavior-\u003ebehavior) {\n1495:\t\tcase MADV_HWPOISON:\n1496:\t\tcase MADV_SOFT_OFFLINE:\n1497:\t\t\treturn true;\n1498:\t\tdefault:\n1499:\t\t\treturn false;\n1500:\t\t}\n1501:\t}\n1502:\t\n1503:\t#else\n1504:\t\n1505:\tstatic int madvise_inject_error(struct madvise_behavior *madv_behavior)\n1506:\t{\n1507:\t\treturn 0;\n1508:\t}\n1509:\t\n1510:\tstatic bool is_memory_failure(struct madvise_behavior *madv_behavior)\n1511:\t{\n1512:\t\treturn false;\n1513:\t}\n1514:\t\n1515:\t#endif\t/* CONFIG_MEMORY_FAILURE */\n1516:\t\n1517:\tstatic bool\n1518:\tmadvise_behavior_valid(int behavior)\n1519:\t{\n1520:\t\tswitch (behavior) {\n1521:\t\tcase MADV_DOFORK:\n1522:\t\tcase MADV_DONTFORK:\n1523:\t\tcase MADV_NORMAL:\n1524:\t\tcase MADV_SEQUENTIAL:\n1525:\t\tcase MADV_RANDOM:\n1526:\t\tcase MADV_REMOVE:\n1527:\t\tcase MADV_WILLNEED:\n1528:\t\tcase MADV_DONTNEED:\n1529:\t\tcase MADV_DONTNEED_LOCKED:\n"
}

gemini-3.1-pro-preview

							input: 14630
							output: 21
							thoughts: 15

**Investigating `madvise_inject_error`**

My first move is to find where `madvise_inject_error` is defined. I need to see exactly what it does before going further; without the definition, everything else is speculation. So, let's search for it.

{
  "Expression": "madvise_inject_error"
}

{
  "Output": "mm/madvise.c=1338=static int madvise_vma_behavior(struct madvise_behavior *madv_behavior)\n--\nmm/madvise.c-1446- */\nmm/madvise.c:1447:static int madvise_inject_error(struct madvise_behavior *madv_behavior)\nmm/madvise.c-1448-{\n--\nmm/madvise.c=1492=static bool is_memory_failure(struct madvise_behavior *madv_behavior)\n--\nmm/madvise.c-1504-\nmm/madvise.c:1505:static int madvise_inject_error(struct madvise_behavior *madv_behavior)\nmm/madvise.c-1506-{\n--\nmm/madvise.c=1887=static int madvise_do_behavior(unsigned long start, size_t len_in,\n--\nmm/madvise.c-1896-\t\trange-\u003eend = start + len_in;\nmm/madvise.c:1897:\t\treturn madvise_inject_error(madv_behavior);\nmm/madvise.c-1898-\t}\n"
}

gemini-3.1-pro-preview

							input: 15187
							output: 36
							thoughts: 21

**Diving into `madvise_inject_error`**

The search shows `madvise_inject_error` defined at line 1447 of `mm/madvise.c`, so I'll read from just above that. Given the name, I expect this to be the software error-injection entry point behind `MADV_HWPOISON` and `MADV_SOFT_OFFLINE`. I want to see what privilege checks it performs, how it resolves the target page, and which flags it passes to `memory_failure()`.

{
  "File": "mm/madvise.c",
  "FirstLine": 1440,
  "LineCount": 40
}

{
  "Contents": "1440:\t\treturn error;\n1441:\t}\n1442:\t\n1443:\t#ifdef CONFIG_MEMORY_FAILURE\n1444:\t/*\n1445:\t * Error injection support for memory error handling.\n1446:\t */\n1447:\tstatic int madvise_inject_error(struct madvise_behavior *madv_behavior)\n1448:\t{\n1449:\t\tunsigned long size;\n1450:\t\tunsigned long start = madv_behavior-\u003erange.start;\n1451:\t\tunsigned long end = madv_behavior-\u003erange.end;\n1452:\t\n1453:\t\tif (!capable(CAP_SYS_ADMIN))\n1454:\t\t\treturn -EPERM;\n1455:\t\n1456:\t\tfor (; start \u003c end; start += size) {\n1457:\t\t\tunsigned long pfn;\n1458:\t\t\tstruct page *page;\n1459:\t\t\tint ret;\n1460:\t\n1461:\t\t\tret = get_user_pages_fast(start, 1, 0, \u0026page);\n1462:\t\t\tif (ret != 1)\n1463:\t\t\t\treturn ret;\n1464:\t\t\tpfn = page_to_pfn(page);\n1465:\t\n1466:\t\t\t/*\n1467:\t\t\t * When soft offlining hugepages, after migrating the page\n1468:\t\t\t * we dissolve it, therefore in the second loop \"page\" will\n1469:\t\t\t * no longer be a compound page.\n1470:\t\t\t */\n1471:\t\t\tsize = page_size(compound_head(page));\n1472:\t\n1473:\t\t\tif (madv_behavior-\u003ebehavior == MADV_SOFT_OFFLINE) {\n1474:\t\t\t\tpr_info(\"Soft offlining pfn %#lx at process virtual address %#lx\\n\",\n1475:\t\t\t\t\t pfn, start);\n1476:\t\t\t\tret = soft_offline_page(pfn, MF_COUNT_INCREASED);\n1477:\t\t\t} else {\n1478:\t\t\t\tpr_info(\"Injecting memory failure for pfn %#lx at process virtual address %#lx\\n\",\n1479:\t\t\t\t\t pfn, start);\n"
}

gemini-3.1-pro-preview

							input: 16074
							output: 36
							thoughts: 1074

**Analyzing the Implications of `MADV_HWPOISON` and Kernel Panics**

Okay, so I'm thinking about this `MADV_HWPOISON` thing and how it interacts with the kernel. It seems like this function requires `CAP_SYS_ADMIN`, which is essentially global root access. I know Syzkaller typically runs as root within its own user namespace, but the `capable()` check is against the initial user namespace, meaning it needs *actual* root. Since Syzkaller operates in a VM, it has that level of access and can call `madvise(MADV_HWPOISON)`.
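
A quick way to see whether the current process could pass that check is to test the `CAP_SYS_ADMIN` bit in `CapEff` from `/proc/self/status`. This is a bash sketch under two stated assumptions: the helper name `capeff_has_sys_admin` is hypothetical, and the result only describes the capability set in the current user namespace, whereas `capable()` checks against the initial one:

```bash
#!/bin/bash
CAP_SYS_ADMIN=21    # from include/uapi/linux/capability.h

# Hypothetical helper: test the CAP_SYS_ADMIN bit in a CapEff hex mask
# (with or without a 0x prefix).
capeff_has_sys_admin() {
    local mask=${1#0x}
    (( (16#$mask >> CAP_SYS_ADMIN) & 1 ))
}

# The awk child inherits this shell's capability sets, so its CapEff
# line stands in for ours. Inside a user namespace a set bit here is
# necessary but not sufficient for capable(CAP_SYS_ADMIN).
capeff=$(awk '/^CapEff:/ { print $2 }' /proc/self/status 2>/dev/null)
if [ -n "$capeff" ] && capeff_has_sys_admin "$capeff"; then
    echo "CAP_SYS_ADMIN present in effective set"
else
    echo "CAP_SYS_ADMIN absent or unknown (MADV_HWPOISON would likely fail with EPERM)"
fi
```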

The crucial point is that `madvise(MADV_HWPOISON)` calls `get_user_pages_fast`. Now, I know `get_user_pages_fast` *only* returns pages mapped within the user's address space. It doesn't give you kernel pages directly. A user usually can't map a kernel page unless there's a bug or it's an intended mechanism like with `/dev/mem` or a driver issue. Oh wait, the vDSO! That's mapped in user space. Could a user then call `madvise(MADV_HWPOISON)` on the vDSO?

Let me check if the vDSO is `PageReserved`. Yes, usually vDSO pages *are* `PageReserved`. If `get_user_pages_fast` returns the vDSO page and we call `madvise(MADV_HWPOISON)` on it, that’s where things get interesting. `memory_failure()` gets called on the vDSO page. Since it's reserved, `is_kernel_owned_page()` will return true. `get_any_page()` will then return `-ENOTRECOVERABLE`. `memory_failure()` will then end up calling `action_result(pfn, MF_MSG_KERNEL, MF_IGNORED)`.
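
To make the vDSO scenario concrete, the address range a process would pass to `madvise()` can be read from its maps file. The bash sketch below only locates the mapping and injects nothing; the helper name `vdso_range` is hypothetical:

```bash
#!/bin/bash
# Hypothetical helper: print "start end" (hex, no 0x prefix) of the
# [vdso] mapping in a maps-format file, defaulting to /proc/self/maps.
# Returns non-zero if the file is unreadable or has no [vdso] entry.
vdso_range() {
    local maps=${1:-/proc/self/maps}
    local range rest

    [ -r "$maps" ] || return 1
    while read -r range rest; do
        case "$rest" in
        *"[vdso]")
            # range looks like "7ffd1c5f6000-7ffd1c5f8000".
            echo "${range%-*} ${range#*-}"
            return 0
            ;;
        esac
    done < "$maps"
    return 1
}

if r=$(vdso_range); then
    echo "vDSO mapped at: $r"
else
    echo "no [vdso] mapping visible"
fi
```

Whether the pages behind that range are actually `PageReserved` on a given kernel still has to be confirmed separately, for example through `/proc/kpageflags`.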

And this is where it clicks: if `sysctl_panic_on_unrecoverable_mf` is set to 1, *boom*, kernel panic! But this isn't a bug. This is explicit design! The sysctl is intended to panic when a memory failure hits a kernel-owned page. The patch even specifically mentions the vDSO, along with the zero page, as examples of kernel-owned pages that trigger a panic in this scenario. This is because `MADV_HWPOISON` is meant to *simulate* hardware memory failures, so causing a "memory failure" on the vDSO will trigger a panic. This is the intended behavior of the sysctl.

So, is there even any complex logic that needs fuzzing? The patch itself primarily introduces a sysctl and changes the return value of `get_any_page()` for kernel pages. If we enable the sysctl, Syzkaller will simply find that calling `madvise(MADV_HWPOISON)` on the vDSO or the zero page panics the kernel. Syzkaller would then report this as a bug: "kernel panic: Memory failure: ... unrecoverable page." But this is NOT a bug! It’s the intended outcome of the sysctl.

Now, if we tell syzkaller to fuzz this with the sysctl enabled, it will just flood us with false positives. Syzkaller does have a list of known intentional panics that it ignores, or we could simply avoid enabling the sysctl. If the sysctl isn't enabled, the panic doesn't happen, and the change becomes nearly cosmetic: `get_any_page()` returns `-ENOTRECOVERABLE` instead of `-EIO`. Before the patch, `memory_failure()` would call `action_result(pfn, MF_MSG_GET_HWPOISON, MF_IGNORED)`; now it calls `action_result(pfn, MF_MSG_KERNEL, MF_IGNORED)`. The only visible difference is the printk text, which uses the `MF_MSG_KERNEL` page-type string instead of the `MF_MSG_GET_HWPOISON` one. This is a negligible change.

I'd better double check `get_any_page()` to be sure, though.

{
  "File": "mm/memory-failure.c",
  "FirstLine": 1420,
  "LineCount": 40
}

{
  "Contents": "1420:\t\n1421:\tstatic int get_any_page(struct page *p, unsigned long flags)\n1422:\t{\n1423:\t\tint ret = 0, pass = 0;\n1424:\t\tbool count_increased = false;\n1425:\t\n1426:\t\tif (flags \u0026 MF_COUNT_INCREASED)\n1427:\t\t\tcount_increased = true;\n1428:\t\n1429:\t\t/*\n1430:\t\t * Page types we know are kernel-owned and cannot be recovered.\n1431:\t\t * Short-circuit before the shake_page() / retry loop, which\n1432:\t\t * cannot turn any of these into something HWPoisonHandlable().\n1433:\t\t * Drop the caller's reference if MF_COUNT_INCREASED took one.\n1434:\t\t */\n1435:\t\tif (is_kernel_owned_page(p)) {\n1436:\t\t\tif (count_increased)\n1437:\t\t\t\tput_page(p);\n1438:\t\t\tret = -ENOTRECOVERABLE;\n1439:\t\t\tgoto out;\n1440:\t\t}\n1441:\t\n1442:\ttry_again:\n1443:\t\tif (!count_increased) {\n1444:\t\t\tret = __get_hwpoison_page(p, flags);\n1445:\t\t\tif (!ret) {\n1446:\t\t\t\tif (page_count(p)) {\n1447:\t\t\t\t\t/* We raced with an allocation, retry. */\n1448:\t\t\t\t\tif (pass++ \u003c GET_PAGE_MAX_RETRY_NUM)\n1449:\t\t\t\t\t\tgoto try_again;\n1450:\t\t\t\t\tret = -EBUSY;\n1451:\t\t\t\t} else if (!PageHuge(p) \u0026\u0026 !is_free_buddy_page(p)) {\n1452:\t\t\t\t\t/* We raced with put_page, retry. */\n1453:\t\t\t\t\tif (pass++ \u003c GET_PAGE_MAX_RETRY_NUM)\n1454:\t\t\t\t\t\tgoto try_again;\n1455:\t\t\t\t\tret = -EIO;\n1456:\t\t\t\t}\n1457:\t\t\t\tgoto out;\n1458:\t\t\t} else if (ret == -EBUSY) {\n1459:\t\t\t\t/*\n"
}

gemini-3.1-pro-preview

							input: 18658
							output: 36
							thoughts: 71

**Investigating a Potentially Unexpected Memory Page State**

Okay, so `get_any_page()` is what ends up running when `memory_failure()` is invoked on a valid PFN whose page is *not* an ordinary handlable page. That is exactly the case this patch cares about, so I need to understand the surrounding context.

Wait a second: `memory_failure()` doesn't call `get_any_page()` directly; it calls `get_hwpoison_page()`. Let me shift focus to `get_hwpoison_page()` and see how it forwards to `get_any_page()` before drawing conclusions.

{
  "File": "mm/memory-failure.c",
  "FirstLine": 1540,
  "LineCount": 40
}

{
  "Contents": "1540:\t *\n1541:\t * Return: 0 on failure or free buddy (hugetlb) page,\n1542:\t *         1 on success for in-use pages in a well-defined state,\n1543:\t *         -EIO for pages on which we can not handle memory errors,\n1544:\t *         -EBUSY when get_hwpoison_page() has raced with page lifecycle\n1545:\t *         operations like allocation and free,\n1546:\t *         -EHWPOISON when the page is hwpoisoned and taken off from buddy,\n1547:\t *         -ENOTRECOVERABLE for kernel-owned pages identified by\n1548:\t *         is_kernel_owned_page() (PG_reserved, slab,\n1549:\t *         page-table, large-kmalloc) that the handler cannot recover.\n1550:\t */\n1551:\tstatic int get_hwpoison_page(struct page *p, unsigned long flags)\n1552:\t{\n1553:\t\tint ret;\n1554:\t\n1555:\t\tzone_pcp_disable(page_zone(p));\n1556:\t\tif (flags \u0026 MF_UNPOISON)\n1557:\t\t\tret = __get_unpoison_page(p);\n1558:\t\telse\n1559:\t\t\tret = get_any_page(p, flags);\n1560:\t\tzone_pcp_enable(page_zone(p));\n1561:\t\n1562:\t\treturn ret;\n1563:\t}\n1564:\t\n1565:\t/*\n1566:\t * The caller must guarantee the folio isn't large folio, except hugetlb.\n1567:\t * try_to_unmap() can't handle it.\n1568:\t */\n1569:\tint unmap_poisoned_folio(struct folio *folio, unsigned long pfn, bool must_kill)\n1570:\t{\n1571:\t\tenum ttu_flags ttu = TTU_IGNORE_MLOCK | TTU_SYNC | TTU_HWPOISON;\n1572:\t\tstruct address_space *mapping;\n1573:\t\n1574:\t\tif (folio_test_swapcache(folio)) {\n1575:\t\t\tpr_err(\"%#lx: keeping poisoned page in swap cache\\n\", pfn);\n1576:\t\t\tttu \u0026= ~TTU_HWPOISON;\n1577:\t\t}\n1578:\t\n1579:\t\t/*\n"
}

gemini-3.1-pro-preview

							input: 19640
							output: 36
							thoughts: 75

Okay, let's break this down.

**Diving into the `hwpoison` Page Acquisition Logic**

`get_hwpoison_page()` is the entry point, and for anything other than `MF_UNPOISON` it calls `get_any_page()`, which tries to take a reference on the failing page. The patch makes `get_any_page()` return early for pages the kernel owns, so those pages never enter the `shake_page()` / retry loop.

Before this change, a kernel-owned page would have fallen through to `__get_hwpoison_page(p, flags)` whenever the caller had not already taken a reference (`MF_COUNT_INCREASED` clear). That helper decides whether a reference can be taken at all, so I need to see what `__get_hwpoison_page()` actually does.
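To make the effect of the short-circuit concrete, here is a minimal userspace sketch. It is a toy model, not kernel code: `sc_page`, `sc_shake_page()` and `sc_get_any_page()` are hypothetical stand-ins, and only the early return plus a simplified version of the retry arm are modeled. The retry limit is copied from `GET_PAGE_MAX_RETRY_NUM` in the listing above.

```c
#include <stdbool.h>
#include <errno.h>

#define GET_PAGE_MAX_RETRY_NUM 3

/* Hypothetical stand-in for struct page; holds only what the sketch needs. */
struct sc_page {
	bool kernel_owned;	/* models the result of is_kernel_owned_page() */
	bool handlable;		/* models the result of HWPoisonHandlable() */
	int shake_calls;	/* counts modeled shake_page() invocations */
};

static void sc_shake_page(struct sc_page *p)
{
	p->shake_calls++;
}

/*
 * Simplified model of get_any_page(). With short_circuit set, kernel-owned
 * pages return -ENOTRECOVERABLE before the retry loop; without it they go
 * through the shake/retry loop and end up as -EIO.
 */
static int sc_get_any_page(struct sc_page *p, bool short_circuit)
{
	int pass = 0;

	if (short_circuit && p->kernel_owned)
		return -ENOTRECOVERABLE;

try_again:
	if (!p->handlable) {
		if (pass++ < GET_PAGE_MAX_RETRY_NUM) {
			sc_shake_page(p);
			goto try_again;
		}
		return -EIO;
	}
	return 1;
}
```

In this model a kernel-owned page costs zero `sc_shake_page()` calls with the short-circuit and three without it, which is the behavioral difference a fuzzer would observe as a changed return code.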

{
  "File": "mm/memory-failure.c",
  "FirstLine": 1380,
  "LineCount": 40
}

{
  "Contents": "1380:\t\treturn kernel_owned;\n1381:\t}\n1382:\t\n1383:\tstatic int __get_hwpoison_page(struct page *page, unsigned long flags)\n1384:\t{\n1385:\t\tstruct folio *folio = page_folio(page);\n1386:\t\tint ret = 0;\n1387:\t\tbool hugetlb = false;\n1388:\t\n1389:\t\tret = get_hwpoison_hugetlb_folio(folio, \u0026hugetlb, false);\n1390:\t\tif (hugetlb) {\n1391:\t\t\t/* Make sure hugetlb demotion did not happen from under us. */\n1392:\t\t\tif (folio == page_folio(page))\n1393:\t\t\t\treturn ret;\n1394:\t\t\tif (ret \u003e 0) {\n1395:\t\t\t\tfolio_put(folio);\n1396:\t\t\t\tfolio = page_folio(page);\n1397:\t\t\t}\n1398:\t\t}\n1399:\t\n1400:\t\t/*\n1401:\t\t * This check prevents from calling folio_try_get() for any\n1402:\t\t * unsupported type of folio in order to reduce the risk of unexpected\n1403:\t\t * races caused by taking a folio refcount.\n1404:\t\t */\n1405:\t\tif (!HWPoisonHandlable(\u0026folio-\u003epage, flags))\n1406:\t\t\treturn -EBUSY;\n1407:\t\n1408:\t\tif (folio_try_get(folio)) {\n1409:\t\t\tif (folio == page_folio(page))\n1410:\t\t\t\treturn 1;\n1411:\t\n1412:\t\t\tpr_info(\"%#lx cannot catch tail\\n\", page_to_pfn(page));\n1413:\t\t\tfolio_put(folio);\n1414:\t\t}\n1415:\t\n1416:\t\treturn 0;\n1417:\t}\n1418:\t\n1419:\t#define GET_PAGE_MAX_RETRY_NUM 3\n"
}

gemini-3.1-pro-preview

							input: 20510
							output: 19
							thoughts: 1745

**Analysis of the Memory Failure Patch for Fuzzing Suitability**

Okay, let's break this down. The core change here is that kernel pages, when a memory failure is injected, will now lead to `-ENOTRECOVERABLE` being returned immediately from `get_any_page()`, skipping the retry logic and `shake_page()`. Instead of `-EIO`, which was previously handled by `action_result(pfn, MF_MSG_GET_HWPOISON, MF_IGNORED)`, we now get `action_result(pfn, MF_MSG_KERNEL, MF_IGNORED)`. A new sysctl, `panic_on_unrecoverable_memory_failure`, can be enabled, which will trigger a panic *if* the failure occurs on a kernel page and the result is `MF_IGNORED`, which is the new default.
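The decision described above can be sketched as a small userspace model. This is an illustration under assumptions, not the patch itself: the enum and function names are hypothetical, and where exactly the kernel consults the sysctl is not shown in the code read so far.

```c
#include <stdbool.h>
#include <errno.h>

/* Hypothetical stand-ins for MF_MSG_GET_HWPOISON and MF_MSG_KERNEL. */
enum toy_msg { TOY_MSG_GET_HWPOISON, TOY_MSG_KERNEL };

/* What the handler does after reporting an ignored failure. */
enum toy_action { TOY_CONTINUE, TOY_PANIC };

/* Only -ENOTRECOVERABLE is reported as a kernel-page failure. */
static enum toy_msg toy_classify(int res)
{
	return res == -ENOTRECOVERABLE ? TOY_MSG_KERNEL : TOY_MSG_GET_HWPOISON;
}

/*
 * Assumed rule: panic only when the failure was classified as a kernel
 * page and the (modeled) sysctl is enabled; otherwise keep running.
 */
static enum toy_action toy_decide(int res, bool panic_sysctl)
{
	if (toy_classify(res) == TOY_MSG_KERNEL && panic_sysctl)
		return TOY_PANIC;
	return TOY_CONTINUE;
}
```

With the sysctl off, the only visible difference from the old `-EIO` path in this model is the message type, which is why the panic is the part that matters for fuzzing noise.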

My first thought is, is this even worth fuzzing? The prompt's criteria are key. The modification *is* to core kernel logic (`mm/memory-failure.c`). The paths are reachable, both through `madvise(MADV_HWPOISON)` and `/sys/devices/system/memory/hard_offline_page`. The changes *do* alter the execution flow, specifically short-circuiting the existing retry mechanism. The panic, while intentional, *is* a functional change.

However, the main change is that intentional panic. Enabling the sysctl will lead to syzkaller reporting those as bugs. I need to be careful with `FocusSymbols` and `EnableConfigs`. Focusing on `is_kernel_owned_page` or `panic_on_unrecoverable_mf` with that sysctl enabled means syzkaller would hit the panic repeatedly, leading to false positives.

But then I remember syzkaller's job: "If it modifies reachable core kernel logic, drivers, or architectures, use your code search tools to verify the code can be executed, then return WorthFuzzing=true."

So, even though there's no complex logic being introduced, there's a functional change in `mm/memory-failure.c` - a change of return code and a short-circuit, it meets the criteria. So, I need to say `WorthFuzzing=true`.

If I say `WorthFuzzing=true` I *have* to list `is_kernel_owned_page` and `panic_on_unrecoverable_mf` as focus symbols. Syzkaller *should* be trying to inject memory failures into kernel pages, and that is what the new logic deals with.

The panic, while *intentional*, is still a modification that could potentially expose issues, maybe. The fact that the panic will be detected as a bug should not be a concern.

Therefore, because this touches core kernel logic that is reachable and functionally alters the execution, I'm going with `WorthFuzzing=true`.

```
WorthFuzzing=true
FocusSymbols=[is_kernel_owned_page, panic_on_unrecoverable_mf]
```

{
  "Expression": "config MEMORY_FAILURE"
}

{
  "Output": "mm/Kconfig=774=config ARCH_SUPPORTS_MEMORY_FAILURE\n--\nmm/Kconfig-776-\nmm/Kconfig:777:config MEMORY_FAILURE\nmm/Kconfig-778-\tdepends on MMU\n"
}

gemini-3.1-pro-preview

							input: 23008
							output: 21
							thoughts: 28

**Configuration Considerations for Memory Testing**

`MEMORY_FAILURE` is the config that matters here: it builds `mm/memory-failure.c`, so the patched handler does not exist without it.

`HWPOISON_INJECT` is a separate option worth checking. It builds the HWPoison page injector, which gives another way to inject errors at a chosen pfn. Whether it needs to be listed depends on which injection path the fuzzer will actually use, so let me look it up before deciding.

{
  "Expression": "config HWPOISON_INJECT"
}

{
  "Output": "mm/Kconfig=777=config MEMORY_FAILURE\n--\nmm/Kconfig-787-\nmm/Kconfig:788:config HWPOISON_INJECT\nmm/Kconfig-789-\ttristate \"HWPoison pages injector\"\n"
}

gemini-3.1-pro-preview

							input: 23366
							output: 21
							thoughts: 245

**My Analysis of the `MADV_HWPOISON` Dependency and WorthFuzzing Patch**

Okay, let's break this down. First, I need to confirm something: is `MADV_HWPOISON` dependent on `CONFIG_MEMORY_FAILURE`? A quick grep in `mm/madvise.c` confirms it; the code directly uses `#ifdef CONFIG_MEMORY_FAILURE` around a function relevant to error injection. So, yes, the dependency is there. Got it.

Now, revisiting the `WorthFuzzing` question. This patch introduces a sysctl that intentionally triggers a kernel panic, so with the sysctl enabled syzkaller could cause panics directly. Does fuzzing *this* feature help uncover *other* bugs? I'm not immediately seeing it. However, the `get_any_page()` short-circuit is a real functional change: it returns before the `shake_page()` / retry loop is ever entered. It also touches reference counting. If `count_increased` is true, it calls `put_page(p)`, and `count_increased` is set from the `flags` parameter, specifically whether `flags & MF_COUNT_INCREASED` is set. I need to see where `MF_COUNT_INCREASED` is passed in. I'll search for that now.

{
  "Expression": "MF_COUNT_INCREASED"
}

{
  "Output": "include/linux/mm.h=4982=enum mf_flags {\ninclude/linux/mm.h:4983:\tMF_COUNT_INCREASED = 1 \u003c\u003c 0,\ninclude/linux/mm.h-4984-\tMF_ACTION_REQUIRED = 1 \u003c\u003c 1,\n--\nmm/madvise.c=1447=static int madvise_inject_error(struct madvise_behavior *madv_behavior)\n--\nmm/madvise.c-1475-\t\t\t\t pfn, start);\nmm/madvise.c:1476:\t\t\tret = soft_offline_page(pfn, MF_COUNT_INCREASED);\nmm/madvise.c-1477-\t\t} else {\n--\nmm/madvise.c-1479-\t\t\t\t pfn, start);\nmm/madvise.c:1480:\t\t\tret = memory_failure(pfn, MF_ACTION_REQUIRED | MF_COUNT_INCREASED | MF_SW_SIMULATED);\nmm/madvise.c-1481-\t\t\tif (ret == -EOPNOTSUPP)\n--\nmm/memory-failure.c=1421=static int get_any_page(struct page *p, unsigned long flags)\n--\nmm/memory-failure.c-1425-\nmm/memory-failure.c:1426:\tif (flags \u0026 MF_COUNT_INCREASED)\nmm/memory-failure.c-1427-\t\tcount_increased = true;\n--\nmm/memory-failure.c-1432-\t * cannot turn any of these into something HWPoisonHandlable().\nmm/memory-failure.c:1433:\t * Drop the caller's reference if MF_COUNT_INCREASED took one.\nmm/memory-failure.c-1434-\t */\n--\nmm/memory-failure.c=2014=static int get_huge_page_for_hwpoison(unsigned long pfn, int flags,\n--\nmm/memory-failure.c-2026-\t\tgoto out_unlock;\nmm/memory-failure.c:2027:\t} else if (flags \u0026 MF_COUNT_INCREASED) {\nmm/memory-failure.c-2028-\t\tret = MF_HUGETLB_IN_USED;\n--\nmm/memory-failure.c=2167=static void put_ref_page(unsigned long pfn, int flags)\nmm/memory-failure.c-2168-{\nmm/memory-failure.c:2169:\tif (!(flags \u0026 MF_COUNT_INCREASED))\nmm/memory-failure.c-2170-\t\treturn;\n--\nmm/memory-failure.c=2387=int memory_failure(unsigned long pfn, int flags)\n--\nmm/memory-failure.c-2443-\t\t\tres = kill_accessing_process(current, pfn, flags);\nmm/memory-failure.c:2444:\t\tif (flags \u0026 MF_COUNT_INCREASED)\nmm/memory-failure.c-2445-\t\t\tput_page(p);\n--\nmm/memory-failure.c-2520-\t\t * it is called by soft offline too and it is just called\nmm/memory-failure.c:2521:\t\t * 
for !MF_COUNT_INCREASED.  So here seems to be the best\nmm/memory-failure.c-2522-\t\t * place.\n--\nmm/memory-failure.c=2968=int soft_offline_page(unsigned long pfn, int flags)\n--\nmm/memory-failure.c-2974-\tif (!pfn_valid(pfn)) {\nmm/memory-failure.c:2975:\t\tWARN_ON_ONCE(flags \u0026 MF_COUNT_INCREASED);\nmm/memory-failure.c-2976-\t\treturn -ENXIO;\n--\nmm/memory-failure.c-3019-\t\t\t\ttry_again = false;\nmm/memory-failure.c:3020:\t\t\t\tflags \u0026= ~MF_COUNT_INCREASED;\nmm/memory-failure.c-3021-\t\t\t\tgoto retry;\n"
}

gemini-3.1-pro-preview

							input: 24976
							output: 36
							thoughts: 471

**Analyzing Potential Reference Count Issues in Memory Failure Handling**

Okay, so I'm digging into how `madvise_inject_error` and the subsequent memory failure handling work, and something is starting to smell a bit fishy. I see that `madvise_inject_error` passes `MF_COUNT_INCREASED` because it calls `get_user_pages_fast`, which, as we know, *takes* a reference on the page. Makes sense, that's expected behavior.

Then `memory_failure` kicks in and eventually calls `get_hwpoison_page`. That function, in turn, calls `get_any_page`. Now, here's where my alarm bells start ringing. Inside `get_any_page`, there's this bit of code:

```c
	if (is_kernel_owned_page(p)) {
		if (count_increased)
			put_page(p);
		ret = -ENOTRECOVERABLE;
		goto out;
	}
```

Now, hold on a second. If `is_kernel_owned_page` is true and `count_increased` is also true, it calls `put_page(p)`. That `put_page` *drops* the reference that `get_user_pages_fast` originally took. Back in `memory_failure`, the `-ENOTRECOVERABLE` return from `get_hwpoison_page` is then handled like this:

```c
	res = get_hwpoison_page(p, flags);
	switch (res) {
	case 0:
        ...
	case 1:
		/* Got a refcount on a handlable page. */
		break;
	case -ENOTRECOVERABLE:
		/*
		 * Stable unhandlable kernel-owned page (PG_reserved,
		 * slab, page tables, large-kmalloc).
		 * No recovery possible.
		 */
		res = action_result(pfn, MF_MSG_KERNEL, MF_IGNORED);
		goto unlock_mutex;
```

If `get_hwpoison_page` returns `-ENOTRECOVERABLE`, it branches to `unlock_mutex`. And what does that do?

```c
unlock_mutex:
	mutex_unlock(&mf_mutex);
	if (hw_memory_failure)
		pr_err("%#lx: recovery action for %s: %s\n",
			pfn, action_page_types[MF_MSG_KERNEL], action_name[MF_IGNORED]);
	return res;
```

Ah, it just returns. So on this path the only `put_page` would be the one inside `get_any_page`. The key question is whether the reference is dropped exactly once: not leaked, and not dropped a second time by `memory_failure`. I'd better check the surrounding code of `memory_failure` around line 2440 to be sure.
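The reference accounting in question can be checked with a toy userspace model. Everything here is hypothetical scaffolding (`ref_page`, `ref_put_page()`, `ref_get_any_page()`, `ref_inject()`); it assumes the caller pins the page once, as `get_user_pages_fast` does on the madvise path, and that `memory_failure` itself does not put the page again on the `-ENOTRECOVERABLE` arm.

```c
#include <stdbool.h>
#include <errno.h>

/* Same bit position as MF_COUNT_INCREASED in the grep output (1 << 0). */
#define REF_MF_COUNT_INCREASED (1UL << 0)

/* Hypothetical stand-in for struct page with a plain integer refcount. */
struct ref_page {
	int refcount;
	bool kernel_owned;
};

static void ref_put_page(struct ref_page *p)
{
	p->refcount--;
}

/* Models only the short-circuit at the top of get_any_page(). */
static int ref_get_any_page(struct ref_page *p, unsigned long flags)
{
	if (p->kernel_owned) {
		if (flags & REF_MF_COUNT_INCREASED)
			ref_put_page(p);	/* drop the caller's pin */
		return -ENOTRECOVERABLE;
	}
	return 1;	/* other paths are not modeled */
}

/* Models the injection path: pin the page, then report the failure. */
static int ref_inject(struct ref_page *p)
{
	p->refcount++;	/* stands in for the pin taken by the caller */
	return ref_get_any_page(p, REF_MF_COUNT_INCREASED);
}
```

Under those assumptions the refcount after `ref_inject()` equals the refcount before it, so the short-circuit neither leaks the pin nor drops it twice. The real check still has to be made against the kernel source.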

{
  "File": "mm/memory-failure.c",
  "FirstLine": 2430,
  "LineCount": 40
}

{
  "Contents": "2430:\t\t}\n2431:\t\n2432:\ttry_again:\n2433:\t\tres = try_memory_failure_hugetlb(pfn, flags);\n2434:\t\t/*\n2435:\t\t * -ENOENT means the page we found is not hugetlb, so proceed with normal page handling\n2436:\t\t */\n2437:\t\tif (res != -ENOENT)\n2438:\t\t\tgoto unlock_mutex;\n2439:\t\n2440:\t\tif (TestSetPageHWPoison(p)) {\n2441:\t\t\tres = -EHWPOISON;\n2442:\t\t\tif (flags \u0026 MF_ACTION_REQUIRED)\n2443:\t\t\t\tres = kill_accessing_process(current, pfn, flags);\n2444:\t\t\tif (flags \u0026 MF_COUNT_INCREASED)\n2445:\t\t\t\tput_page(p);\n2446:\t\t\taction_result(pfn, MF_MSG_ALREADY_POISONED, MF_FAILED);\n2447:\t\t\tgoto unlock_mutex;\n2448:\t\t}\n2449:\t\n2450:\t\t/*\n2451:\t\t * We need/can do nothing about count=0 pages.\n2452:\t\t * 1) it's a free page, and therefore in safe hand:\n2453:\t\t *    check_new_page() will be the gate keeper.\n2454:\t\t * 2) it's part of a non-compound high order page.\n2455:\t\t *    Implies some kernel user: cannot stop them from\n2456:\t\t *    R/W the page; let's pray that the page has been\n2457:\t\t *    used and will be freed some time later.\n2458:\t\t * In fact it's dangerous to directly bump up page count from 0,\n2459:\t\t * that may make page_ref_freeze()/page_ref_unfreeze() mismatch.\n2460:\t\t */\n2461:\t\tres = get_hwpoison_page(p, flags);\n2462:\t\tswitch (res) {\n2463:\t\tcase 0:\n2464:\t\t\tif (is_free_buddy_page(p)) {\n2465:\t\t\t\tif (take_page_off_buddy(p)) {\n2466:\t\t\t\t\tpage_ref_inc(p);\n2467:\t\t\t\t\tres = MF_RECOVERED;\n2468:\t\t\t\t} else {\n2469:\t\t\t\t\t/* We lost the race, try again */\n"
}

gemini-3.1-pro-preview

							input: 26811
							output: 36
							thoughts: 126

**Analyzing the HWPoison Page Handling**

Okay, so I'm focusing on `get_hwpoison_page`. If it returns `-ENOTRECOVERABLE`, `memory_failure` jumps to `unlock_mutex`. My concern is whether the page reference is handled correctly on that path.

Recalling the pre-patch behavior of `get_any_page`: it called `__get_hwpoison_page` only when `count_increased` was false.

So when `count_increased` is true, `__get_hwpoison_page` was never called and the caller's reference was the only one in play. I need to go back to `get_any_page` and check how the two branches behave now, since that decides whether the reference counts stay balanced.

{
  "File": "mm/memory-failure.c",
  "FirstLine": 1440,
  "LineCount": 40
}

{
  "Contents": "1440:\t\t}\n1441:\t\n1442:\ttry_again:\n1443:\t\tif (!count_increased) {\n1444:\t\t\tret = __get_hwpoison_page(p, flags);\n1445:\t\t\tif (!ret) {\n1446:\t\t\t\tif (page_count(p)) {\n1447:\t\t\t\t\t/* We raced with an allocation, retry. */\n1448:\t\t\t\t\tif (pass++ \u003c GET_PAGE_MAX_RETRY_NUM)\n1449:\t\t\t\t\t\tgoto try_again;\n1450:\t\t\t\t\tret = -EBUSY;\n1451:\t\t\t\t} else if (!PageHuge(p) \u0026\u0026 !is_free_buddy_page(p)) {\n1452:\t\t\t\t\t/* We raced with put_page, retry. */\n1453:\t\t\t\t\tif (pass++ \u003c GET_PAGE_MAX_RETRY_NUM)\n1454:\t\t\t\t\t\tgoto try_again;\n1455:\t\t\t\t\tret = -EIO;\n1456:\t\t\t\t}\n1457:\t\t\t\tgoto out;\n1458:\t\t\t} else if (ret == -EBUSY) {\n1459:\t\t\t\t/*\n1460:\t\t\t\t * We raced with (possibly temporary) unhandlable\n1461:\t\t\t\t * page, retry.\n1462:\t\t\t\t */\n1463:\t\t\t\tif (pass++ \u003c GET_PAGE_MAX_RETRY_NUM) {\n1464:\t\t\t\t\tshake_page(p);\n1465:\t\t\t\t\tgoto try_again;\n1466:\t\t\t\t}\n1467:\t\t\t\tret = -EIO;\n1468:\t\t\t\tgoto out;\n1469:\t\t\t}\n1470:\t\t}\n1471:\t\n1472:\t\tif (PageHuge(p) || HWPoisonHandlable(p, flags)) {\n1473:\t\t\tret = 1;\n1474:\t\t} else {\n1475:\t\t\t/*\n1476:\t\t\t * A page we cannot handle. Check whether we can turn\n1477:\t\t\t * it into something we can handle.\n1478:\t\t\t */\n1479:\t\t\tif (pass++ \u003c GET_PAGE_MAX_RETRY_NUM) {\n"
}

gemini-3.1-pro-preview

							input: 27773
							output: 19
							thoughts: 1712

**Analysis of the Memory Failure Patch**

Okay, let's break this down. My initial thought was that this patch, while changing the return value and short-circuiting for kernel pages, was primarily about enabling a panic and therefore probably not worth fuzzing, since syzkaller would likely treat the panic as a bug. But that short-circuit is still potentially a functional change. The problem is that hitting the panic is essentially the main functional change, and that's not what we typically want fuzzing to do. However, I need to stick with the instructions provided. I read the prompt again, and it explicitly states that if core kernel logic is modified and reachable, I should return `WorthFuzzing=true`.

This patch *does* modify core kernel logic within `mm/memory-failure.c`. It alters the behavior of the memory failure handler. And it can be executed in virtualized environments via `madvise(MADV_HWPOISON)` on kernel-owned pages, like the vDSO. Therefore, I need to return `WorthFuzzing=true`.

Now, the important part: what should I tell the fuzzer to focus on? I was thinking about `is_kernel_owned_page` initially but then remembered that it is an inline function, so it likely won't appear as a symbol in the coverage. However, the function that *calls* it (`get_any_page`), is not inline, and *is* a relevant point of interaction. It seems like a good target. The other potential target is `panic_on_unrecoverable_mf`.

I need to make sure that the proper configurations are provided, given that `MEMORY_FAILURE` is needed. I'm going to list it anyway, and double check to be extra certain. `HWPOISON_INJECT` might be needed, but `MADV_HWPOISON` is the primary thing, and I should only list `MEMORY_FAILURE`.

So, the plan is:

*   **WorthFuzzing=true**
*   **FocusSymbols**:
    *   `get_any_page`
    *   `panic_on_unrecoverable_mf`
*   **EnableConfigs**:
    *   `MEMORY_FAILURE`

The reasoning here is that the patch alters the memory failure handling logic, specifically to short-circuit recovery and optionally trigger a panic for kernel-owned pages. This functionality can be reached via `madvise(MADV_HWPOISON)` on kernel pages, even in a virtualized environment. The selected focus symbols are `get_any_page` and `panic_on_unrecoverable_mf`, as these are the core functions impacted by the patch. The configuration `MEMORY_FAILURE` enables the necessary infrastructure. `get_any_page` is not a performance critical function.

{
  "Expression": "get_any_page"
}

{
  "Output": "mm/memory-failure.c=1383=static int __get_hwpoison_page(struct page *page, unsigned long flags)\n--\nmm/memory-failure.c-1420-\nmm/memory-failure.c:1421:static int get_any_page(struct page *p, unsigned long flags)\nmm/memory-failure.c-1422-{\n--\nmm/memory-failure.c=1495=static int __get_unpoison_page(struct page *page)\n--\nmm/memory-failure.c-1534- * extra care for the error page's state (as done in __get_hwpoison_page()),\nmm/memory-failure.c:1535: * and has some retry logic in get_any_page().\nmm/memory-failure.c-1536- *\n--\nmm/memory-failure.c=1551=static int get_hwpoison_page(struct page *p, unsigned long flags)\n--\nmm/memory-failure.c-1558-\telse\nmm/memory-failure.c:1559:\t\tret = get_any_page(p, flags);\nmm/memory-failure.c-1560-\tzone_pcp_enable(page_zone(p));\n--\nmm/memory-failure.c=2850=static int soft_offline_in_use_page(struct page *page)\n--\nmm/memory-failure.c-2910-\t * If we succeed to isolate the folio, we grabbed another refcount on\nmm/memory-failure.c:2911:\t * the folio, so we can safely drop the one we got from get_any_page().\nmm/memory-failure.c-2912-\t * If we failed to isolate the folio, it means that we cannot go further\nmm/memory-failure.c-2913-\t * and we will return an error, so drop the reference we got from\nmm/memory-failure.c:2914:\t * get_any_page() as well.\nmm/memory-failure.c-2915-\t */\n--\ntools/testing/selftests/mm/hwpoison-panic.sh-11-#             (sourced from /proc/iomem \"Kernel rodata\").  Exercises\ntools/testing/selftests/mm/hwpoison-panic.sh:12:#             memory_failure() -\u003e get_any_page() on a PageReserved page.\ntools/testing/selftests/mm/hwpoison-panic.sh-13-#\ntools/testing/selftests/mm/hwpoison-panic.sh-14-#   slab    - a slab page found via /proc/kpageflags (KPF_SLAB).\ntools/testing/selftests/mm/hwpoison-panic.sh:15:#             Exercises memory_failure() -\u003e get_any_page() on a non\ntools/testing/selftests/mm/hwpoison-panic.sh-16-#             PG_reserved kernel-owned page.  
This path is what catches\ntools/testing/selftests/mm/hwpoison-panic.sh:17:#             regressions where get_any_page() collapses kernel-owned\ntools/testing/selftests/mm/hwpoison-panic.sh-18-#             pages into a transient -EIO instead of -ENOTRECOVERABLE.\n"
}

gemini-3.1-pro-preview

							input: 30922
							output: 124
							thoughts: 25

**Confirming the Focus Symbol and Setting Results**

The search confirms that `get_any_page` is a regular `static int` function defined in `mm/memory-failure.c` and called from `get_hwpoison_page`, so it is a suitable focus symbol for the patched path.

With that confirmed, I will call `set-results` to record the triage verdict: the configs to enable, the focus symbols, and the reasoning.

{
  "EnableConfigs": [
    "MEMORY_FAILURE"
  ],
  "FocusSymbols": [
    "get_any_page",
    "panic_on_unrecoverable_mf"
  ],
  "Reasoning": "The patch modifies the memory failure handler in core MM to short-circuit recovery for kernel-owned pages and optionally panic. This error-handling path is reachable in virtualized environments via madvise(MADV_HWPOISON) on kernel-owned pages mapped in user space (like the vDSO) or via sysfs error injection.",
  "WorthFuzzing": true
}

{
  "EnableConfigs": [
    "MEMORY_FAILURE"
  ],
  "FocusSymbols": [
    "get_any_page",
    "panic_on_unrecoverable_mf"
  ],
  "Reasoning": "The patch modifies the memory failure handler in core MM to short-circuit recovery for kernel-owned pages and optionally panic. This error-handling path is reachable in virtualized environments via madvise(MADV_HWPOISON) on kernel-owned pages mapped in user space (like the vDSO) or via sysfs error injection.",
  "WorthFuzzing": true
}
