AI Trajectory

Trajectory:
Seq	Timestamp	Type	Name	Duration
0/0	2026/06/26 23:05	flow	patch-triage	4m Args: null Results: { "EnableConfigs": [ "MEMORY_FAILURE", "HWPOISON_INJECT" ], "FocusSymbols": [ "memory_failure", "get_any_page", "action_result", "panic_on_unrecoverable_mf", "is_kernel_owned_page" ], "Reasoning": "The patch modifies the core memory failure handling logic (`memory_failure`, `get_any_page`) to explicitly identify and short-circuit on kernel-owned pages (slab, page tables, reserved, large kmalloc). It also introduces a new sysctl to panic on such unrecoverable errors. This is reachable via hardware error injection interfaces (e.g., `/sys/devices/system/memory/hard_offline_page` or `madvise(MADV_HWPOISON)`).", "WorthFuzzing": true }
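The triage result above comes down to a small decision. Only a page that `get_hwpoison_page()` classifies as kernel-owned (`-ENOTRECOVERABLE`) reaches `action_result(pfn, MF_MSG_KERNEL, MF_IGNORED)`, and only that pair can panic when the sysctl is set. The sketch below is a hedged user-space model of that mapping, not kernel code. Its assumptions: the enum values are simplified stand-ins, `case 0` is assumed to take the non-buddy `MF_MSG_KERNEL_HIGH_ORDER` branch, and `would_panic()` and `check_model()` are hypothetical helpers.

```c
/*
 * Hypothetical user-space model of the dispatch added by the patch.
 * Enum values are simplified stand-ins; only the names mirror
 * mm/memory-failure.c.
 */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum mf_action_page_type {
	MF_MSG_KERNEL,
	MF_MSG_KERNEL_HIGH_ORDER,
	MF_MSG_GET_HWPOISON,
};

enum mf_result { MF_IGNORED, MF_FAILED, MF_DELAYED, MF_RECOVERED };

/* Stands in for vm.panic_on_unrecoverable_memory_failure (0 by default). */
static int sysctl_panic_on_unrecoverable_mf;

/* Same predicate as panic_on_unrecoverable_mf() in the patch. */
static bool panic_on_unrecoverable_mf(enum mf_action_page_type type,
				      enum mf_result result)
{
	if (!sysctl_panic_on_unrecoverable_mf)
		return false;
	return type == MF_MSG_KERNEL && result == MF_IGNORED;
}

/*
 * Model of the new switch on get_hwpoison_page()'s return value in
 * memory_failure(). Early-exit branches report MF_IGNORED here; case 0
 * assumes the page could not be taken off the buddy lists.
 */
static bool would_panic(int res)
{
	enum mf_action_page_type type;

	switch (res) {
	case 0:
		type = MF_MSG_KERNEL_HIGH_ORDER;
		break;
	case 1:
		return false;	/* handlable page: normal recovery continues */
	case -ENOTRECOVERABLE:
		type = MF_MSG_KERNEL;	/* reserved, slab, page table, large kmalloc */
		break;
	default:
		type = MF_MSG_GET_HWPOISON;	/* -EIO, -EBUSY, ...: not a panic trigger */
		break;
	}
	return panic_on_unrecoverable_mf(type, MF_IGNORED);
}

/* Usage: walk the cases the patch distinguishes. */
void check_model(void)
{
	sysctl_panic_on_unrecoverable_mf = 0;
	assert(!would_panic(-ENOTRECOVERABLE));	/* default: never panic */

	sysctl_panic_on_unrecoverable_mf = 1;
	assert(would_panic(-ENOTRECOVERABLE));	/* kernel-owned page */
	assert(!would_panic(-EIO));		/* unclassified page */
	assert(!would_panic(-EBUSY));		/* allocator race */
	assert(!would_panic(0));		/* MF_MSG_KERNEL_HIGH_ORDER */
	assert(!would_panic(1));		/* handlable page */
}
```

In this model, `-EIO` and `-EBUSY` still fall through to `MF_MSG_GET_HWPOISON`. That matches the patch documentation, which states that allocator races and pages in an unclassified state are intentionally not panic triggers.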
1/1	2026/06/26 23:05	action	read-patch-diff	0m Args: null Results: { "PatchDiff": "commit 581742259c41b2f1d7712a44b0ab008650850750\nAuthor: syz-cluster \u003ctriage@syzkaller.com\u003e\nDate: Fri Jun 26 23:05:19 2026 +0000\n\n syz-cluster: applied patch under review\n\ndiff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst\nindex b9b0c218bfb440..22cc54cac3b21d 100644\n--- a/Documentation/admin-guide/sysctl/vm.rst\n+++ b/Documentation/admin-guide/sysctl/vm.rst\n@@ -67,6 +67,7 @@ Currently, these files are in /proc/sys/vm:\n - page-cluster\n - page_lock_unfairness\n - panic_on_oom\n+- panic_on_unrecoverable_memory_failure\n - percpu_pagelist_high_fraction\n - stat_interval\n - stat_refresh\n@@ -925,6 +926,85 @@ panic_on_oom=2+kdump gives you very strong tool to investigate\n why oom happens. You can get snapshot.\n \n \n+panic_on_unrecoverable_memory_failure\n+======================================\n+\n+When a hardware memory error (e.g. multi-bit ECC) hits a kernel page\n+that cannot be recovered by the memory failure handler, the default\n+behaviour is to ignore the error and continue operation. This is\n+dangerous because the corrupted data remains accessible to the kernel,\n+risking silent data corruption or a delayed crash when the poisoned\n+memory is next accessed.\n+\n+When enabled, this sysctl triggers a panic on memory failure events\n+hitting kernel-owned pages that the handler cannot recover:\n+``PageReserved`` (firmware reservations, kernel image, vDSO, zero\n+page, and similar memblock-reserved regions), ``PageSlab``,\n+``PageTable``, and ``PageLargeKmalloc``. These are owned by the\n+kernel and the memory failure handler cannot reliably evict their\n+contents.\n+\n+Other unrecoverable kernel-owned populations (vmalloc allocations,\n+kernel stack pages, ...) 
are not currently covered because the\n+handler has no page-type signal that distinguishes them from a\n+userspace folio temporarily off the LRU during migration or\n+compaction. Such pages still go through the standard\n+MF_MSG_GET_HWPOISON path: ``PG_hwpoison`` is set on them and a\n+delayed crash on the next access remains possible. Coverage may\n+grow as the handler gains stronger kernel-ownership signals.\n+\n+Recoverable failure paths are also intentionally left out: in-flight\n+buddy allocations and other transient races with the page allocator\n+can reach the same diagnostic, and panicking on them would risk\n+killing the box for a page destined for userspace where the standard\n+SIGBUS recovery path applies. Pages whose state could not be\n+classified at all are not covered either, since an unknown state is\n+not a sound basis for a panic decision.\n+\n+For many environments it is preferable to panic immediately with a clean\n+crash dump that captures the original error context, rather than to\n+continue and face a random crash later whose cause is difficult to\n+diagnose.\n+\n+Use cases\n+---------\n+\n+This option is most useful in environments where unattributed crashes\n+are expensive to debug or where data integrity must take precedence\n+over availability:\n+\n+* Large fleets, where multi-bit ECC errors on kernel pages are observed\n+ regularly and post-mortem analysis of an unrelated downstream crash\n+ (often seconds to minutes after the original error) consumes\n+ significant engineering effort.\n+\n+* Systems configured with kdump, where panicking at the moment of the\n+ hardware error produces a vmcore that still contains the faulting\n+ address, the affected page state, and the originating MCE/GHES\n+ record — context that is typically lost by the time a delayed crash\n+ occurs.\n+\n+* High-availability clusters that rely on fast, deterministic node\n+ failure for failover, and prefer an immediate panic over silent data\n+ corruption 
propagating to replicas or persistent storage.\n+\n+* Kernel and platform developers reproducing hwpoison issues with\n+ tools such as ``mce-inject`` or error-injection debugfs interfaces,\n+ where panicking on the unrecoverable path makes regressions\n+ immediately visible instead of surfacing as later, unrelated\n+ failures.\n+\n+= =====================================================================\n+0 Try to continue operation (default).\n+1 Panic immediately. If the ``panic`` sysctl is also non-zero then the\n+ machine will be rebooted.\n+= =====================================================================\n+\n+Example::\n+\n+ echo 1 \u003e /proc/sys/vm/panic_on_unrecoverable_memory_failure\n+\n+\n percpu_pagelist_high_fraction\n =============================\n \ndiff --git a/mm/memory-failure.c b/mm/memory-failure.c\nindex 51508a55c4055e..611160c98c6f64 100644\n--- a/mm/memory-failure.c\n+++ b/mm/memory-failure.c\n@@ -74,6 +74,8 @@ static int sysctl_memory_failure_recovery __read_mostly = 1;\n \n static int sysctl_enable_soft_offline __read_mostly = 1;\n \n+static int sysctl_panic_on_unrecoverable_mf __read_mostly;\n+\n atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);\n \n static bool hw_memory_failure __read_mostly = false;\n@@ -155,6 +157,15 @@ static const struct ctl_table memory_failure_table[] = {\n \t\t.proc_handler\t= proc_dointvec_minmax,\n \t\t.extra1\t\t= SYSCTL_ZERO,\n \t\t.extra2\t\t= SYSCTL_ONE,\n+\t},\n+\t{\n+\t\t.procname\t= \"panic_on_unrecoverable_memory_failure\",\n+\t\t.data\t\t= \u0026sysctl_panic_on_unrecoverable_mf,\n+\t\t.maxlen\t\t= sizeof(sysctl_panic_on_unrecoverable_mf),\n+\t\t.mode\t\t= 0644,\n+\t\t.proc_handler\t= proc_dointvec_minmax,\n+\t\t.extra1\t\t= SYSCTL_ZERO,\n+\t\t.extra2\t\t= SYSCTL_ONE,\n \t}\n };\n \n@@ -980,17 +991,6 @@ static bool has_extra_refcount(struct page_state *ps, struct page *p,\n \treturn false;\n }\n \n-/*\n- * Error hit kernel page.\n- * Do nothing, try to be lucky and not touch this
instead. For a few cases we\n- * could be more sophisticated.\n- */\n-static int me_kernel(struct page_state *ps, struct page *p)\n-{\n-\tunlock_page(p);\n-\treturn MF_IGNORED;\n-}\n-\n /*\n * Page in unknown state. Do nothing.\n * This is a catch-all in case we fail to make sense of the page state.\n@@ -1199,10 +1199,8 @@ static int me_huge_page(struct page_state *ps, struct page *p)\n #define mlock\t\t(1UL \u003c\u003c PG_mlocked)\n #define lru\t\t(1UL \u003c\u003c PG_lru)\n #define head\t\t(1UL \u003c\u003c PG_head)\n-#define reserved\t(1UL \u003c\u003c PG_reserved)\n \n static struct page_state error_states[] = {\n-\t{ reserved,\treserved,\tMF_MSG_KERNEL,\tme_kernel },\n \t/*\n \t * free pages are specially detected outside this table:\n \t * PG_buddy pages only make a small fraction of all free pages.\n@@ -1234,7 +1232,6 @@ static struct page_state error_states[] = {\n #undef mlock\n #undef lru\n #undef head\n-#undef reserved\n \n static void update_per_node_mf_stats(unsigned long pfn,\n \t\t\t\t enum mf_result result)\n@@ -1269,6 +1266,15 @@ static void update_per_node_mf_stats(unsigned long pfn,\n \t++mf_stats-\u003etotal;\n }\n \n+static bool panic_on_unrecoverable_mf(enum mf_action_page_type type,\n+\t\t\t\t enum mf_result result)\n+{\n+\tif (!sysctl_panic_on_unrecoverable_mf)\n+\t\treturn false;\n+\n+\treturn type == MF_MSG_KERNEL \u0026\u0026 result == MF_IGNORED;\n+}\n+\n /*\n * \"Dirty/Clean\" indication is not 100% accurate due to the possibility of\n * setting PG_dirty outside page lock. See also comment above set_page_dirty().\n@@ -1286,6 +1292,9 @@ static int action_result(unsigned long pfn, enum mf_action_page_type type,\n \tpr_err(\"%#lx: recovery action for %s: %s\\n\",\n \t\tpfn, action_page_types[type], action_name[result]);\n \n+\tif (panic_on_unrecoverable_mf(type, result))\n+\t\tpanic(\"Memory failure: %#lx: unrecoverable page\", pfn);\n+\n \treturn (result == MF_RECOVERED || result == MF_DELAYED) ? 
0 : -EBUSY;\n }\n \n@@ -1339,6 +1348,36 @@ static inline bool HWPoisonHandlable(struct page *page, unsigned long flags)\n \treturn PageLRU(page) || is_free_buddy_page(page);\n }\n \n+/*\n+ * Positive identification of pages the hwpoison handler cannot recover:\n+ * pages owned by kernel internals with no userspace mapping to unmap, no\n+ * file mapping to invalidate, and no migration target.\n+ */\n+static inline bool is_kernel_owned_page(struct page *page)\n+{\n+\tstruct page *head;\n+\tbool kernel_owned;\n+\n+\t/* PG_reserved is a per-page flag, never set on a compound page. */\n+\tif (PageReserved(page))\n+\t\treturn true;\n+\n+\t/*\n+\t * Page-type bits live only on the head page, so resolve any tail\n+\t * first. The check takes no refcount; recheck the head afterwards\n+\t * so a concurrent split or compound free cannot leave us trusting\n+\t * a stale view. A free-\u003ealloc-\u003efree in the same window is still\n+\t * possible but closing it would require taking a reference here.\n+\t */\n+retry:\n+\thead = compound_head(page);\n+\tkernel_owned = PageSlab(head) || PageTable(head) ||\n+\t\t PageLargeKmalloc(head);\n+\tif (head != compound_head(page))\n+\t\tgoto retry;\n+\treturn kernel_owned;\n+}\n+\n static int __get_hwpoison_page(struct page *page, unsigned long flags)\n {\n \tstruct folio *folio = page_folio(page);\n@@ -1385,6 +1424,19 @@ static int get_any_page(struct page *p, unsigned long flags)\n \tif (flags \u0026 MF_COUNT_INCREASED)\n \t\tcount_increased = true;\n \n+\t/*\n+\t * Page types we know are kernel-owned and cannot be recovered.\n+\t * Short-circuit before the shake_page() / retry loop, which\n+\t * cannot turn any of these into something HWPoisonHandlable().\n+\t * Drop the caller's reference if MF_COUNT_INCREASED took one.\n+\t */\n+\tif (is_kernel_owned_page(p)) {\n+\t\tif (count_increased)\n+\t\t\tput_page(p);\n+\t\tret = -ENOTRECOVERABLE;\n+\t\tgoto out;\n+\t}\n+\n try_again:\n \tif (!count_increased) {\n \t\tret = __get_hwpoison_page(p, 
flags);\n@@ -1432,7 +1484,7 @@ static int get_any_page(struct page *p, unsigned long flags)\n \t\tret = -EIO;\n \t}\n out:\n-\tif (ret == -EIO)\n+\tif (ret == -EIO || ret == -ENOTRECOVERABLE)\n \t\tpr_err(\"%#lx: unhandlable page.\\n\", page_to_pfn(p));\n \n \treturn ret;\n@@ -1489,7 +1541,10 @@ static int __get_unpoison_page(struct page *page)\n * -EIO for pages on which we can not handle memory errors,\n * -EBUSY when get_hwpoison_page() has raced with page lifecycle\n * operations like allocation and free,\n- * -EHWPOISON when the page is hwpoisoned and taken off from buddy.\n+ * -EHWPOISON when the page is hwpoisoned and taken off from buddy,\n+ * -ENOTRECOVERABLE for kernel-owned pages identified by\n+ * is_kernel_owned_page() (PG_reserved, slab,\n+ * page-table, large-kmalloc) that the handler cannot recover.\n */\n static int get_hwpoison_page(struct page *p, unsigned long flags)\n {\n@@ -2402,7 +2457,8 @@ int memory_failure(unsigned long pfn, int flags)\n \t * that may make page_ref_freeze()/page_ref_unfreeze() mismatch.\n \t */\n \tres = get_hwpoison_page(p, flags);\n-\tif (!res) {\n+\tswitch (res) {\n+\tcase 0:\n \t\tif (is_free_buddy_page(p)) {\n \t\t\tif (take_page_off_buddy(p)) {\n \t\t\t\tpage_ref_inc(p);\n@@ -2421,7 +2477,19 @@ int memory_failure(unsigned long pfn, int flags)\n \t\t\tres = action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);\n \t\t}\n \t\tgoto unlock_mutex;\n-\t} else if (res \u003c 0) {\n+\tcase 1:\n+\t\t/* Got a refcount on a handlable page. */\n+\t\tbreak;\n+\tcase -ENOTRECOVERABLE:\n+\t\t/*\n+\t\t * Stable unhandlable kernel-owned page (PG_reserved,\n+\t\t * slab, page tables, large-kmalloc).\n+\t\t * No recovery possible.\n+\t\t */\n+\t\tres = action_result(pfn, MF_MSG_KERNEL, MF_IGNORED);\n+\t\tgoto unlock_mutex;\n+\tdefault:\n+\t\t/* Transient lifecycle race with the page allocator. 
*/\n \t\tres = action_result(pfn, MF_MSG_GET_HWPOISON, MF_IGNORED);\n \t\tgoto unlock_mutex;\n \t}\ndiff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile\nindex e6df968f0971c8..ed321ae709dacf 100644\n--- a/tools/testing/selftests/mm/Makefile\n+++ b/tools/testing/selftests/mm/Makefile\n@@ -174,6 +174,10 @@ TEST_PROGS += ksft_userfaultfd.sh\n TEST_PROGS += ksft_vma_merge.sh\n TEST_PROGS += ksft_vmalloc.sh\n \n+# Destructive: every successful run panics the kernel. Installed and\n+# kept executable, but not run from a default kselftest invocation.\n+TEST_PROGS_EXTENDED += hwpoison-panic.sh\n+\n TEST_FILES := test_vmalloc.sh\n TEST_FILES += test_hmm.sh\n TEST_FILES += va_high_addr_switch.sh\ndiff --git a/tools/testing/selftests/mm/hwpoison-panic.sh b/tools/testing/selftests/mm/hwpoison-panic.sh\nnew file mode 100755\nindex 00000000000000..aafc06e895d01d\n--- /dev/null\n+++ b/tools/testing/selftests/mm/hwpoison-panic.sh\n@@ -0,0 +1,249 @@\n+#!/bin/bash\n+# SPDX-License-Identifier: GPL-2.0\n+#\n+# Verify vm.panic_on_unrecoverable_memory_failure by injecting a hwpoison\n+# error on a kernel-owned page and confirming the kernel panics.\n+#\n+# Three \"kinds\" of kernel-owned page can be targeted, selectable via the\n+# first positional argument (default: rodata):\n+#\n+# rodata - a PG_reserved page in the kernel rodata range\n+# (sourced from /proc/iomem \"Kernel rodata\"). Exercises\n+# memory_failure() -\u003e get_any_page() on a PageReserved page.\n+#\n+# slab - a slab page found via /proc/kpageflags (KPF_SLAB).\n+# Exercises memory_failure() -\u003e get_any_page() on a non\n+# PG_reserved kernel-owned page. 
This path is what catches\n+# regressions where get_any_page() collapses kernel-owned\n+# pages into a transient -EIO instead of -ENOTRECOVERABLE.\n+#\n+# pgtable - a page-table page found via /proc/kpageflags (KPF_PGTABLE).\n+# Same path as slab, different page type.\n+#\n+# This test is DESTRUCTIVE: a successful run crashes the kernel. It is\n+# meant to be executed inside a disposable VM (e.g. virtme-ng) with a\n+# serial console captured by the harness. It is skipped unless the\n+# caller opts in via RUN_DESTRUCTIVE=1.\n+#\n+# Test passes externally: the kernel must panic with\n+# \"Memory failure: \u003cpfn\u003e: unrecoverable page\"\n+# A return from the inject means no panic fired: that is a failure,\n+# unless the target PFN raced to a different page type before injection,\n+# in which case the run is inconclusive and is skipped.\n+#\n+# Author: Breno Leitao \u003cleitao@debian.org\u003e\n+\n+set -u\n+\n+ksft_skip=4\n+sysctl_path=/proc/sys/vm/panic_on_unrecoverable_memory_failure\n+inject_path=/sys/devices/system/memory/hard_offline_page\n+kpageflags_path=/proc/kpageflags\n+unpoison_path=/sys/kernel/debug/hwpoison/unpoison-pfn\n+\n+# /proc/kpageflags bit positions (see include/uapi/linux/kernel-page-flags.h)\n+KPF_SLAB=7\n+KPF_COMPOUND_TAIL=16\n+KPF_HWPOISON=19\n+KPF_NOPAGE=20\n+KPF_PGTABLE=26\n+KPF_RESERVED=32\n+\n+pagesize=$(getconf PAGE_SIZE)\n+\n+kind=${1:-rodata}\n+\n+ksft_print() { echo \"# $*\"; }\n+ksft_exit_skip() { ksft_print \"$*\"; exit \"$ksft_skip\"; }\n+ksft_exit_fail() { echo \"not ok 1 $*\"; exit 1; }\n+\n+if [ \"$(id -u)\" -ne 0 ]; then\n+\tksft_exit_skip \"must run as root\"\n+fi\n+\n+if [ ! -w \"$sysctl_path\" ]; then\n+\tksft_exit_skip \"$sysctl_path not present (kernel without the sysctl?)\"\n+fi\n+\n+if [ ! 
-w \"$inject_path\" ]; then\n+\tksft_exit_skip \"$inject_path not present (no MEMORY_HOTPLUG?)\"\n+fi\n+\n+if [ \"${RUN_DESTRUCTIVE:-0}\" != \"1\" ]; then\n+\tksft_exit_skip \"destructive test; re-run with RUN_DESTRUCTIVE=1 inside a disposable VM\"\n+fi\n+\n+# Pick a PFN inside the kernel image rodata region of /proc/iomem.\n+# This is preferred over a top-level \"Reserved\" entry because top-level\n+# Reserved ranges are often firmware holes that have no backing struct\n+# page; pfn_to_online_page() returns NULL on those and memory_failure()\n+# bails out with -ENXIO before reaching the panic path.\n+#\n+# \"Kernel rodata\" is reported as a sub-resource of \"System RAM\" on every\n+# major architecture, which guarantees:\n+# - the PFN is backed by struct page (within an online memory range);\n+# - PG_reserved is set on the page (kernel image area);\n+# - the memory is read-only, so setting PG_hwpoison on it does not\n+# corrupt writable kernel state if the panic somehow does not fire.\n+#\n+# /proc/iomem entries look like (indented for sub-resources):\n+# \" 02500000-02ffffff : Kernel rodata\"\n+pick_rodata_phys_addr() {\n+\tawk -v pagesize=\"$(getconf PAGE_SIZE)\" '\n+\t# Convert a hex string to a number without relying on the gawk-only\n+\t# strtonum(). 
mawk lacks it and would otherwise spuriously skip\n+\t# this test on distros that ship mawk as /usr/bin/awk.\n+\tfunction hex2num(s, n, i, c, v) {\n+\t\tn = 0\n+\t\tfor (i = 1; i \u003c= length(s); i++) {\n+\t\t\tc = tolower(substr(s, i, 1))\n+\t\t\tv = index(\"0123456789abcdef\", c) - 1\n+\t\t\tif (v \u003c 0)\n+\t\t\t\treturn -1\n+\t\t\tn = n * 16 + v\n+\t\t}\n+\t\treturn n\n+\t}\n+\t/: Kernel rodata[[:space:]]*$/ {\n+\t\tsub(/^[[:space:]]+/, \"\")\n+\t\tn = split($0, a, /[- ]/)\n+\t\tstart = hex2num(a[1])\n+\t\tend = hex2num(a[2])\n+\t\tif (end \u003c= start)\n+\t\t\tnext\n+\t\t# Page-align upward and emit the first byte of that page.\n+\t\tpfn = int((start + pagesize - 1) / pagesize)\n+\t\tprintf \"0x%x\\n\", pfn * pagesize\n+\t\texit 0\n+\t}\n+\t' /proc/iomem\n+}\n+\n+# Walk /proc/kpageflags and return the phys addr of the first PFN that\n+# has bit $1 set, with KPF_HWPOISON, KPF_NOPAGE and KPF_COMPOUND_TAIL\n+# all clear (so we attack a real, non-tail, not-already-poisoned page).\n+#\n+# We skip the first 16 MiB of PFNs to step past low-memory special\n+# ranges (BIOS/EFI/ACPI/etc.) 
that often are PG_reserved and would not\n+# exhibit the slab/pgtable type we are looking for.\n+pick_kpageflags_phys_addr() {\n+\tlocal want_bit=$1\n+\tlocal pagesize skip_pfn\n+\n+\t[ -r \"$kpageflags_path\" ] || return\n+\n+\tpagesize=$(getconf PAGE_SIZE)\n+\tskip_pfn=$(((16 * 1024 * 1024) / pagesize))\n+\n+\tod -An -tx8 -v -w8 -j \"$((skip_pfn * 8))\" \"$kpageflags_path\" 2\u003e/dev/null | \\\n+\tawk -v want_bit=\"$want_bit\" \\\n+\t -v hwp_bit=\"$KPF_HWPOISON\" \\\n+\t -v nopage_bit=\"$KPF_NOPAGE\" \\\n+\t -v tail_bit=\"$KPF_COMPOUND_TAIL\" \\\n+\t -v base_pfn=\"$skip_pfn\" \\\n+\t -v pagesize=\"$pagesize\" '\n+\t# Test whether bit \"b\" is set in the 16-hex-digit value \"hex\".\n+\t# Done with substring + per-digit lookup so we never rely on awk\n+\t# bitwise operators (mawk lacks them), 64-bit FP precision or the\n+\t# gawk-only strtonum().\n+\tfunction bit_set(hex, b, di, bi, c, v) {\n+\t\tdi = int(b / 4)\n+\t\tbi = b - di * 4\n+\t\tc = substr(hex, length(hex) - di, 1)\n+\t\tv = index(\"0123456789abcdef\", tolower(c)) - 1\n+\t\tif (bi == 0) return (v % 2) == 1\n+\t\tif (bi == 1) return int(v / 2) % 2 == 1\n+\t\tif (bi == 2) return int(v / 4) % 2 == 1\n+\t\treturn int(v / 8) % 2 == 1\n+\t}\n+\t{\n+\t\tgsub(/^[[:space:]]+/, \"\")\n+\t\th = $1\n+\t\tif (bit_set(h, want_bit) \u0026\u0026\n+\t\t !bit_set(h, hwp_bit) \u0026\u0026\n+\t\t !bit_set(h, nopage_bit) \u0026\u0026\n+\t\t !bit_set(h, tail_bit)) {\n+\t\t\tpfn = base_pfn + NR - 1\n+\t\t\tprintf \"0x%x\\n\", pfn * pagesize\n+\t\t\texit 0\n+\t\t}\n+\t}\n+\t'\n+}\n+\n+# Return 0 if /proc/kpageflags bit $2 is set for PFN $1, 1 if it is\n+# clear, or 2 if the word cannot be read. 
Used to re-confirm the target\n+# page type after a non-panicking inject.\n+kpageflags_bit_set() {\n+\tlocal word\n+\n+\tword=$(od -An -tx8 -v -j \"$(($1 * 8))\" -N 8 \"$kpageflags_path\" 2\u003e/dev/null | tr -d '[:space:]')\n+\t[ -n \"$word\" ] || return 2\n+\t(( (16#$word \u003e\u003e $2) \u0026 1 ))\n+}\n+\n+# Best-effort: drop the PG_hwpoison marker set by the inject so a failed\n+# run does not leave a poisoned page behind. hard_offline_page() injects\n+# with MF_SW_SIMULATED, so the page stays unpoisonable through the\n+# hwpoison debugfs interface (needs CONFIG_HWPOISON_INJECT + debugfs).\n+try_unpoison() {\n+\t[ -w \"$unpoison_path\" ] || return 0\n+\techo \"$1\" \u003e \"$unpoison_path\" 2\u003e/dev/null || true\n+}\n+\n+case \"$kind\" in\n+rodata)\n+\tphys_addr=$(pick_rodata_phys_addr)\n+\trecheck_bit=$KPF_RESERVED\n+\tmissing_msg='no \"Kernel rodata\" entry in /proc/iomem'\n+\t;;\n+slab)\n+\tphys_addr=$(pick_kpageflags_phys_addr \"$KPF_SLAB\")\n+\trecheck_bit=$KPF_SLAB\n+\tmissing_msg=\"no usable slab PFN found in $kpageflags_path\"\n+\t;;\n+pgtable)\n+\tphys_addr=$(pick_kpageflags_phys_addr \"$KPF_PGTABLE\")\n+\trecheck_bit=$KPF_PGTABLE\n+\tmissing_msg=\"no usable page-table PFN found in $kpageflags_path\"\n+\t;;\n+*)\n+\tksft_exit_fail \"unknown kind '$kind' (expected: rodata|slab|pgtable)\"\n+\t;;\n+esac\n+\n+if [ -z \"$phys_addr\" ]; then\n+\tksft_exit_skip \"$missing_msg\"\n+fi\n+\n+ksft_print \"enabling $sysctl_path\"\n+prior=$(cat \"$sysctl_path\")\n+echo 1 \u003e \"$sysctl_path\" || ksft_exit_fail \"failed to enable sysctl\"\n+\n+pfn=$((phys_addr / pagesize))\n+ksft_print \"injecting hwpoison at phys 0x$(printf '%x' \"$phys_addr\") (pfn 0x$(printf '%x' \"$pfn\"), kind=$kind)\"\n+ksft_print \"expecting kernel panic: 'Memory failure: \u003cpfn\u003e: unrecoverable page'\"\n+\n+# A successful run never returns from the inject -- it panics the kernel.\n+# Reaching the code below therefore means no panic fired. 
Note whether\n+# the write itself succeeded, then put the machine back: restore the\n+# sysctl and best-effort unpoison the page we just marked.\n+if echo \"$phys_addr\" \u003e \"$inject_path\"; then\n+\tverdict=\"inject returned without panic; sysctl ineffective\"\n+else\n+\tverdict=\"inject failed before reaching the panic path\"\n+fi\n+\n+echo \"$prior\" \u003e \"$sysctl_path\"\n+try_unpoison \"$pfn\"\n+\n+# The page type can change between selection and injection (e.g. a slab\n+# or page-table page is freed and reused). Only treat a missing panic as\n+# a failure if the target PFN is still the kernel-owned type we aimed at;\n+# if it raced to another type the run is inconclusive, so skip instead.\n+kpageflags_bit_set \"$pfn\" \"$recheck_bit\"\n+case $? in\n+0)\tksft_exit_fail \"$verdict (page still $kind)\" ;;\n+1)\tksft_exit_skip \"target PFN no longer $kind; raced before inject, inconclusive\" ;;\n+*)\tksft_exit_fail \"$verdict (could not reconfirm page type via $kpageflags_path)\" ;;\n+esac\n" }
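The selftest's awk `bit_set()` helper tests one hex digit at a time. This lets it avoid bitwise operators, `strtonum()`, and reliance on 64-bit floating-point precision. Below is a minimal C sketch of the same per-digit technique, cross-checked against a plain 64-bit shift (the approach `kpageflags_bit_set()` takes with bash arithmetic). The function names and the sample word are illustrative assumptions; the sample sets `KPF_SLAB` (bit 7) and `KPF_RESERVED` (bit 32).

```c
/*
 * Per-digit bit test on a 64-bit word printed as hex, the form in which
 * `od -An -tx8` emits each /proc/kpageflags entry. Illustrative sketch only.
 */
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if bit b is set, 0 if clear, -1 on a non-hex digit. */
static int hex_bit_set(const char *hex, unsigned int b)
{
	static const char digits[] = "0123456789abcdef";
	size_t len = strlen(hex);
	unsigned int di = b / 4;	/* hex digit index, counted from the right */
	unsigned int bi = b % 4;	/* bit index inside that digit */
	const char *p;

	if (di >= len)
		return 0;		/* absent leading digits are zero */
	p = strchr(digits, tolower((unsigned char)hex[len - 1 - di]));
	if (p == NULL)
		return -1;
	return ((int)(p - digits) >> bi) & 1;
}

/* Cross-check against a direct shift of the parsed 64-bit value. */
bool hex_bit_matches_shift(const char *hex, unsigned int b)
{
	unsigned long long word = strtoull(hex, NULL, 16);

	assert(b < 64);
	return hex_bit_set(hex, b) == (int)((word >> b) & 1ULL);
}
```

On the sample word `"0000000100000080"`, bits 7 and 32 report 1, and `KPF_PGTABLE` (bit 26) and `KPF_HWPOISON` (bit 19) report 0.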
2/1	2026/06/26 23:05	agent	patch-evaluator	4m Model: gemini-3.1-pro-preview Args: null Results: { "EnableConfigs": [ "MEMORY_FAILURE", "HWPOISON_INJECT" ], "FocusSymbols": [ "memory_failure", "get_any_page", "action_result", "panic_on_unrecoverable_mf", "is_kernel_owned_page" ], "Reasoning": "The patch modifies the core memory failure handling logic (`memory_failure`, `get_any_page`) to explicitly identify and short-circuit on kernel-owned pages (slab, page tables, reserved, large kmalloc). It also introduces a new sysctl to panic on such unrecoverable errors. This is reachable via hardware error injection interfaces (e.g., `/sys/devices/system/memory/hard_offline_page` or `madvise(MADV_HWPOISON)`).", "WorthFuzzing": true } Instruction: You are an expert Linux kernel maintainer. Your job is to review a provided patch series and determine if it makes functional changes to the kernel that should be fuzzed. IMPORTANT: The changes have ALREADY been applied and committed as the HEAD commit in your workspace. Do NOT rely on your internal knowledge of the kernel. You must actively use your code access tools to examine the actual source code and confirm any assumptions. Return WorthFuzzing=false if the patch only contains: - Modifications to Documentation/, Kconfig files, or code comments. - Purely decorative changes, such as logging (e.g., pr_err, printk) or tracepoints. - Changes to numeric constants or macros that do not functionally alter execution flow. - Code paths that are impossible to reach in virtualized environments like GCE or QEMU, even when utilizing software-emulated hardware (e.g., usb gadget, mac80211_hwsim). If it modifies reachable core kernel logic, drivers, or architectures, use your code search tools to verify the code can be executed, then return WorthFuzzing=true. When returning WorthFuzzing=true, you MUST ALSO: 1. Extract any specific kernel functions that should be heavily fuzzed into FocusSymbols. 
Avoid listing generic hot-path functions to prevent skewed test distributions. 2. Identify any specific CONFIG_ options required to properly test this new/modified feature. Go and look into the Kconfig files and check for ifdefs around the code, do not make assumptions. Do not list too generic configs (we already have them enabled). Only list those that specifically cover the modified code. List them in the EnableConfigs output array, and DO NOT add a 'CONFIG_' prefix (e.g., return "NET_IPV4" instead of "CONFIG_NET_IPV4"). Prefer calling several tools at the same time to save round-trips. Use set-results tool to provide results of the analysis. It must be called exactly once before the final reply. Ignore results of this tool. Prompt: For your convenience, here is the diff of the changes: commit 581742259c41b2f1d7712a44b0ab008650850750 Author: syz-cluster <triage@syzkaller.com> Date: Fri Jun 26 23:05:19 2026 +0000 syz-cluster: applied patch under review diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst index b9b0c218bfb440..22cc54cac3b21d 100644 --- a/Documentation/admin-guide/sysctl/vm.rst +++ b/Documentation/admin-guide/sysctl/vm.rst @@ -67,6 +67,7 @@ Currently, these files are in /proc/sys/vm: - page-cluster - page_lock_unfairness - panic_on_oom +- panic_on_unrecoverable_memory_failure - percpu_pagelist_high_fraction - stat_interval - stat_refresh @@ -925,6 +926,85 @@ panic_on_oom=2+kdump gives you very strong tool to investigate why oom happens. You can get snapshot. +panic_on_unrecoverable_memory_failure +====================================== + +When a hardware memory error (e.g. multi-bit ECC) hits a kernel page +that cannot be recovered by the memory failure handler, the default +behaviour is to ignore the error and continue operation. This is +dangerous because the corrupted data remains accessible to the kernel, +risking silent data corruption or a delayed crash when the poisoned +memory is next accessed. 
+ +When enabled, this sysctl triggers a panic on memory failure events +hitting kernel-owned pages that the handler cannot recover: +``PageReserved`` (firmware reservations, kernel image, vDSO, zero +page, and similar memblock-reserved regions), ``PageSlab``, +``PageTable``, and ``PageLargeKmalloc``. These are owned by the +kernel and the memory failure handler cannot reliably evict their +contents. + +Other unrecoverable kernel-owned populations (vmalloc allocations, +kernel stack pages, ...) are not currently covered because the +handler has no page-type signal that distinguishes them from a +userspace folio temporarily off the LRU during migration or +compaction. Such pages still go through the standard +MF_MSG_GET_HWPOISON path: ``PG_hwpoison`` is set on them and a +delayed crash on the next access remains possible. Coverage may +grow as the handler gains stronger kernel-ownership signals. + +Recoverable failure paths are also intentionally left out: in-flight +buddy allocations and other transient races with the page allocator +can reach the same diagnostic, and panicking on them would risk +killing the box for a page destined for userspace where the standard +SIGBUS recovery path applies. Pages whose state could not be +classified at all are not covered either, since an unknown state is +not a sound basis for a panic decision. + +For many environments it is preferable to panic immediately with a clean +crash dump that captures the original error context, rather than to +continue and face a random crash later whose cause is difficult to +diagnose. 
+ +Use cases +--------- + +This option is most useful in environments where unattributed crashes +are expensive to debug or where data integrity must take precedence +over availability: + +* Large fleets, where multi-bit ECC errors on kernel pages are observed + regularly and post-mortem analysis of an unrelated downstream crash + (often seconds to minutes after the original error) consumes + significant engineering effort. + +* Systems configured with kdump, where panicking at the moment of the + hardware error produces a vmcore that still contains the faulting + address, the affected page state, and the originating MCE/GHES + record — context that is typically lost by the time a delayed crash + occurs. + +* High-availability clusters that rely on fast, deterministic node + failure for failover, and prefer an immediate panic over silent data + corruption propagating to replicas or persistent storage. + +* Kernel and platform developers reproducing hwpoison issues with + tools such as ``mce-inject`` or error-injection debugfs interfaces, + where panicking on the unrecoverable path makes regressions + immediately visible instead of surfacing as later, unrelated + failures. + += ===================================================================== +0 Try to continue operation (default). +1 Panic immediately. If the ``panic`` sysctl is also non-zero then the + machine will be rebooted. 
+= ===================================================================== + +Example:: + + echo 1 > /proc/sys/vm/panic_on_unrecoverable_memory_failure + + percpu_pagelist_high_fraction ============================= diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 51508a55c4055e..611160c98c6f64 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -74,6 +74,8 @@ static int sysctl_memory_failure_recovery __read_mostly = 1; static int sysctl_enable_soft_offline __read_mostly = 1; +static int sysctl_panic_on_unrecoverable_mf __read_mostly; + atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0); static bool hw_memory_failure __read_mostly = false; @@ -155,6 +157,15 @@ static const struct ctl_table memory_failure_table[] = { .proc_handler = proc_dointvec_minmax, .extra1 = SYSCTL_ZERO, .extra2 = SYSCTL_ONE, + }, + { + .procname = "panic_on_unrecoverable_memory_failure", + .data = &sysctl_panic_on_unrecoverable_mf, + .maxlen = sizeof(sysctl_panic_on_unrecoverable_mf), + .mode = 0644, + .proc_handler = proc_dointvec_minmax, + .extra1 = SYSCTL_ZERO, + .extra2 = SYSCTL_ONE, } }; @@ -980,17 +991,6 @@ static bool has_extra_refcount(struct page_state ps, struct page p, return false; } -/* - * Error hit kernel page. - * Do nothing, try to be lucky and not touch this instead. For a few cases we - * could be more sophisticated. - / -static int me_kernel(struct page_state ps, struct page p) -{ - unlock_page(p); - return MF_IGNORED; -} - / * Page in unknown state. Do nothing. * This is a catch-all in case we fail to make sense of the page state. 
@@ -1199,10 +1199,8 @@ static int me_huge_page(struct page_state ps, struct page p) #define mlock (1UL << PG_mlocked) #define lru (1UL << PG_lru) #define head (1UL << PG_head) -#define reserved (1UL << PG_reserved) static struct page_state error_states[] = { - { reserved, reserved, MF_MSG_KERNEL, me_kernel }, /* * free pages are specially detected outside this table: * PG_buddy pages only make a small fraction of all free pages. @@ -1234,7 +1232,6 @@ static struct page_state error_states[] = { #undef mlock #undef lru #undef head -#undef reserved static void update_per_node_mf_stats(unsigned long pfn, enum mf_result result) @@ -1269,6 +1266,15 @@ static void update_per_node_mf_stats(unsigned long pfn, ++mf_stats->total; } +static bool panic_on_unrecoverable_mf(enum mf_action_page_type type, + enum mf_result result) +{ + if (!sysctl_panic_on_unrecoverable_mf) + return false; + + return type == MF_MSG_KERNEL && result == MF_IGNORED; +} + /* * "Dirty/Clean" indication is not 100% accurate due to the possibility of * setting PG_dirty outside page lock. See also comment above set_page_dirty(). @@ -1286,6 +1292,9 @@ static int action_result(unsigned long pfn, enum mf_action_page_type type, pr_err("%#lx: recovery action for %s: %s\n", pfn, action_page_types[type], action_name[result]); + if (panic_on_unrecoverable_mf(type, result)) + panic("Memory failure: %#lx: unrecoverable page", pfn); + return (result == MF_RECOVERED \|\| result == MF_DELAYED) ? 0 : -EBUSY; } @@ -1339,6 +1348,36 @@ static inline bool HWPoisonHandlable(struct page page, unsigned long flags) return PageLRU(page) \|\| is_free_buddy_page(page); } +/ + * Positive identification of pages the hwpoison handler cannot recover: + * pages owned by kernel internals with no userspace mapping to unmap, no + * file mapping to invalidate, and no migration target. 
+ */
+static inline bool is_kernel_owned_page(struct page *page)
+{
+	struct page *head;
+	bool kernel_owned;
+
+	/* PG_reserved is a per-page flag, never set on a compound page. */
+	if (PageReserved(page))
+		return true;
+
+	/*
+	 * Page-type bits live only on the head page, so resolve any tail
+	 * first. The check takes no refcount; recheck the head afterwards
+	 * so a concurrent split or compound free cannot leave us trusting
+	 * a stale view. A free->alloc->free in the same window is still
+	 * possible but closing it would require taking a reference here.
+	 */
+retry:
+	head = compound_head(page);
+	kernel_owned = PageSlab(head) || PageTable(head) ||
+		       PageLargeKmalloc(head);
+	if (head != compound_head(page))
+		goto retry;
+	return kernel_owned;
+}
+
 static int __get_hwpoison_page(struct page *page, unsigned long flags)
 {
 	struct folio *folio = page_folio(page);
@@ -1385,6 +1424,19 @@ static int get_any_page(struct page *p, unsigned long flags)
 	if (flags & MF_COUNT_INCREASED)
 		count_increased = true;
 
+	/*
+	 * Page types we know are kernel-owned and cannot be recovered.
+	 * Short-circuit before the shake_page() / retry loop, which
+	 * cannot turn any of these into something HWPoisonHandlable().
+	 * Drop the caller's reference if MF_COUNT_INCREASED took one.
+	 */
+	if (is_kernel_owned_page(p)) {
+		if (count_increased)
+			put_page(p);
+		ret = -ENOTRECOVERABLE;
+		goto out;
+	}
+
 try_again:
 	if (!count_increased) {
 		ret = __get_hwpoison_page(p, flags);
@@ -1432,7 +1484,7 @@ static int get_any_page(struct page *p, unsigned long flags)
 		ret = -EIO;
 	}
 out:
-	if (ret == -EIO)
+	if (ret == -EIO || ret == -ENOTRECOVERABLE)
 		pr_err("%#lx: unhandlable page.\n", page_to_pfn(p));
 
 	return ret;
@@ -1489,7 +1541,10 @@ static int __get_unpoison_page(struct page *page)
  *         -EIO for pages on which we can not handle memory errors,
  *         -EBUSY when get_hwpoison_page() has raced with page lifecycle
  *         operations like allocation and free,
- *         -EHWPOISON when the page is hwpoisoned and taken off from buddy.
+ *         -EHWPOISON when the page is hwpoisoned and taken off from buddy,
+ *         -ENOTRECOVERABLE for kernel-owned pages identified by
+ *         is_kernel_owned_page() (PG_reserved, slab,
+ *         page-table, large-kmalloc) that the handler cannot recover.
  */
 static int get_hwpoison_page(struct page *p, unsigned long flags)
 {
@@ -2402,7 +2457,8 @@ int memory_failure(unsigned long pfn, int flags)
 	 * that may make page_ref_freeze()/page_ref_unfreeze() mismatch.
 	 */
 	res = get_hwpoison_page(p, flags);
-	if (!res) {
+	switch (res) {
+	case 0:
 		if (is_free_buddy_page(p)) {
 			if (take_page_off_buddy(p)) {
 				page_ref_inc(p);
@@ -2421,7 +2477,19 @@ int memory_failure(unsigned long pfn, int flags)
 			res = action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);
 		}
 		goto unlock_mutex;
-	} else if (res < 0) {
+	case 1:
+		/* Got a refcount on a handlable page. */
+		break;
+	case -ENOTRECOVERABLE:
+		/*
+		 * Stable unhandlable kernel-owned page (PG_reserved,
+		 * slab, page tables, large-kmalloc).
+		 * No recovery possible.
+		 */
+		res = action_result(pfn, MF_MSG_KERNEL, MF_IGNORED);
+		goto unlock_mutex;
+	default:
+		/* Transient lifecycle race with the page allocator. */
 		res = action_result(pfn, MF_MSG_GET_HWPOISON, MF_IGNORED);
 		goto unlock_mutex;
 	}
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index e6df968f0971c8..ed321ae709dacf 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -174,6 +174,10 @@ TEST_PROGS += ksft_userfaultfd.sh
 TEST_PROGS += ksft_vma_merge.sh
 TEST_PROGS += ksft_vmalloc.sh
 
+# Destructive: every successful run panics the kernel. Installed and
+# kept executable, but not run from a default kselftest invocation.
+TEST_PROGS_EXTENDED += hwpoison-panic.sh
+
 TEST_FILES := test_vmalloc.sh
 TEST_FILES += test_hmm.sh
 TEST_FILES += va_high_addr_switch.sh
diff --git a/tools/testing/selftests/mm/hwpoison-panic.sh b/tools/testing/selftests/mm/hwpoison-panic.sh
new file mode 100755
index 00000000000000..aafc06e895d01d
--- /dev/null
+++ b/tools/testing/selftests/mm/hwpoison-panic.sh
@@ -0,0 +1,249 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Verify vm.panic_on_unrecoverable_memory_failure by injecting a hwpoison
+# error on a kernel-owned page and confirming the kernel panics.
+#
+# Three "kinds" of kernel-owned page can be targeted, selectable via the
+# first positional argument (default: rodata):
+#
+#   rodata  - a PG_reserved page in the kernel rodata range
+#             (sourced from /proc/iomem "Kernel rodata"). Exercises
+#             memory_failure() -> get_any_page() on a PageReserved page.
+#
+#   slab    - a slab page found via /proc/kpageflags (KPF_SLAB).
+#             Exercises memory_failure() -> get_any_page() on a non
+#             PG_reserved kernel-owned page. This path is what catches
+#             regressions where get_any_page() collapses kernel-owned
+#             pages into a transient -EIO instead of -ENOTRECOVERABLE.
+#
+#   pgtable - a page-table page found via /proc/kpageflags (KPF_PGTABLE).
+#             Same path as slab, different page type.
+#
+# This test is DESTRUCTIVE: a successful run crashes the kernel. It is
+# meant to be executed inside a disposable VM (e.g. virtme-ng) with a
+# serial console captured by the harness. It is skipped unless the
+# caller opts in via RUN_DESTRUCTIVE=1.
+#
+# Test passes externally: the kernel must panic with
+#   "Memory failure: <pfn>: unrecoverable page"
+# A return from the inject means no panic fired: that is a failure,
+# unless the target PFN raced to a different page type before injection,
+# in which case the run is inconclusive and is skipped.
+#
+# Author: Breno Leitao <leitao@debian.org>
+
+set -u
+
+ksft_skip=4
+sysctl_path=/proc/sys/vm/panic_on_unrecoverable_memory_failure
+inject_path=/sys/devices/system/memory/hard_offline_page
+kpageflags_path=/proc/kpageflags
+unpoison_path=/sys/kernel/debug/hwpoison/unpoison-pfn
+
+# /proc/kpageflags bit positions (see include/uapi/linux/kernel-page-flags.h)
+KPF_SLAB=7
+KPF_COMPOUND_TAIL=16
+KPF_HWPOISON=19
+KPF_NOPAGE=20
+KPF_PGTABLE=26
+KPF_RESERVED=32
+
+pagesize=$(getconf PAGE_SIZE)
+
+kind=${1:-rodata}
+
+ksft_print() { echo "# $*"; }
+ksft_exit_skip() { ksft_print "$*"; exit "$ksft_skip"; }
+ksft_exit_fail() { echo "not ok 1 $*"; exit 1; }
+
+if [ "$(id -u)" -ne 0 ]; then
+	ksft_exit_skip "must run as root"
+fi
+
+if [ ! -w "$sysctl_path" ]; then
+	ksft_exit_skip "$sysctl_path not present (kernel without the sysctl?)"
+fi
+
+if [ ! -w "$inject_path" ]; then
+	ksft_exit_skip "$inject_path not present (no MEMORY_HOTPLUG?)"
+fi
+
+if [ "${RUN_DESTRUCTIVE:-0}" != "1" ]; then
+	ksft_exit_skip "destructive test; re-run with RUN_DESTRUCTIVE=1 inside a disposable VM"
+fi
+
+# Pick a PFN inside the kernel image rodata region of /proc/iomem.
+# This is preferred over a top-level "Reserved" entry because top-level
+# Reserved ranges are often firmware holes that have no backing struct
+# page; pfn_to_online_page() returns NULL on those and memory_failure()
+# bails out with -ENXIO before reaching the panic path.
+#
+# "Kernel rodata" is reported as a sub-resource of "System RAM" on every
+# major architecture, which guarantees:
+#  - the PFN is backed by struct page (within an online memory range);
+#  - PG_reserved is set on the page (kernel image area);
+#  - the memory is read-only, so setting PG_hwpoison on it does not
+#    corrupt writable kernel state if the panic somehow does not fire.
+#
+# /proc/iomem entries look like (indented for sub-resources):
+#   "  02500000-02ffffff : Kernel rodata"
+pick_rodata_phys_addr() {
+	awk -v pagesize="$(getconf PAGE_SIZE)" '
+		# Convert a hex string to a number without relying on the gawk-only
+		# strtonum(). mawk lacks it and would otherwise spuriously skip
+		# this test on distros that ship mawk as /usr/bin/awk.
+		function hex2num(s, n, i, c, v) {
+			n = 0
+			for (i = 1; i <= length(s); i++) {
+				c = tolower(substr(s, i, 1))
+				v = index("0123456789abcdef", c) - 1
+				if (v < 0)
+					return -1
+				n = n * 16 + v
+			}
+			return n
+		}
+		/: Kernel rodata[[:space:]]*$/ {
+			sub(/^[[:space:]]+/, "")
+			n = split($0, a, /[- ]/)
+			start = hex2num(a[1])
+			end = hex2num(a[2])
+			if (end <= start)
+				next
+			# Page-align upward and emit the first byte of that page.
+			pfn = int((start + pagesize - 1) / pagesize)
+			printf "0x%x\n", pfn * pagesize
+			exit 0
+		}
+	' /proc/iomem
+}
+
+# Walk /proc/kpageflags and return the phys addr of the first PFN that
+# has bit $1 set, with KPF_HWPOISON, KPF_NOPAGE and KPF_COMPOUND_TAIL
+# all clear (so we attack a real, non-tail, not-already-poisoned page).
+#
+# We skip the first 16 MiB of PFNs to step past low-memory special
+# ranges (BIOS/EFI/ACPI/etc.) that often are PG_reserved and would not
+# exhibit the slab/pgtable type we are looking for.
+pick_kpageflags_phys_addr() {
+	local want_bit=$1
+	local pagesize skip_pfn
+
+	[ -r "$kpageflags_path" ] || return
+
+	pagesize=$(getconf PAGE_SIZE)
+	skip_pfn=$(((16 * 1024 * 1024) / pagesize))
+
+	od -An -tx8 -v -w8 -j "$((skip_pfn * 8))" "$kpageflags_path" 2>/dev/null | \
+		awk -v want_bit="$want_bit" \
+		    -v hwp_bit="$KPF_HWPOISON" \
+		    -v nopage_bit="$KPF_NOPAGE" \
+		    -v tail_bit="$KPF_COMPOUND_TAIL" \
+		    -v base_pfn="$skip_pfn" \
+		    -v pagesize="$pagesize" '
+		# Test whether bit "b" is set in the 16-hex-digit value "hex".
+		# Done with substring + per-digit lookup so we never rely on awk
+		# bitwise operators (mawk lacks them), 64-bit FP precision or the
+		# gawk-only strtonum().
+		function bit_set(hex, b, di, bi, c, v) {
+			di = int(b / 4)
+			bi = b - di * 4
+			c = substr(hex, length(hex) - di, 1)
+			v = index("0123456789abcdef", tolower(c)) - 1
+			if (bi == 0) return (v % 2) == 1
+			if (bi == 1) return int(v / 2) % 2 == 1
+			if (bi == 2) return int(v / 4) % 2 == 1
+			return int(v / 8) % 2 == 1
+		}
+		{
+			gsub(/^[[:space:]]+/, "")
+			h = $1
+			if (bit_set(h, want_bit) &&
+			    !bit_set(h, hwp_bit) &&
+			    !bit_set(h, nopage_bit) &&
+			    !bit_set(h, tail_bit)) {
+				pfn = base_pfn + NR - 1
+				printf "0x%x\n", pfn * pagesize
+				exit 0
+			}
+		}
+	'
+}
+
+# Return 0 if /proc/kpageflags bit $2 is set for PFN $1, 1 if it is
+# clear, or 2 if the word cannot be read. Used to re-confirm the target
+# page type after a non-panicking inject.
+kpageflags_bit_set() {
+	local word
+
+	word=$(od -An -tx8 -v -j "$(($1 * 8))" -N 8 "$kpageflags_path" 2>/dev/null | tr -d '[:space:]')
+	[ -n "$word" ] || return 2
+	(( (16#$word >> $2) & 1 ))
+}
+
+# Best-effort: drop the PG_hwpoison marker set by the inject so a failed
+# run does not leave a poisoned page behind. hard_offline_page() injects
+# with MF_SW_SIMULATED, so the page stays unpoisonable through the
+# hwpoison debugfs interface (needs CONFIG_HWPOISON_INJECT + debugfs).
+try_unpoison() {
+	[ -w "$unpoison_path" ] || return 0
+	echo "$1" > "$unpoison_path" 2>/dev/null || true
+}
+
+case "$kind" in
+rodata)
+	phys_addr=$(pick_rodata_phys_addr)
+	recheck_bit=$KPF_RESERVED
+	missing_msg='no "Kernel rodata" entry in /proc/iomem'
+	;;
+slab)
+	phys_addr=$(pick_kpageflags_phys_addr "$KPF_SLAB")
+	recheck_bit=$KPF_SLAB
+	missing_msg="no usable slab PFN found in $kpageflags_path"
+	;;
+pgtable)
+	phys_addr=$(pick_kpageflags_phys_addr "$KPF_PGTABLE")
+	recheck_bit=$KPF_PGTABLE
+	missing_msg="no usable page-table PFN found in $kpageflags_path"
+	;;
+*)
+	ksft_exit_fail "unknown kind '$kind' (expected: rodata|slab|pgtable)"
+	;;
+esac
+
+if [ -z "$phys_addr" ]; then
+	ksft_exit_skip "$missing_msg"
+fi
+
+ksft_print "enabling $sysctl_path"
+prior=$(cat "$sysctl_path")
+echo 1 > "$sysctl_path" || ksft_exit_fail "failed to enable sysctl"
+
+pfn=$((phys_addr / pagesize))
+ksft_print "injecting hwpoison at phys 0x$(printf '%x' "$phys_addr") (pfn 0x$(printf '%x' "$pfn"), kind=$kind)"
+ksft_print "expecting kernel panic: 'Memory failure: <pfn>: unrecoverable page'"
+
+# A successful run never returns from the inject -- it panics the kernel.
+# Reaching the code below therefore means no panic fired. Note whether
+# the write itself succeeded, then put the machine back: restore the
+# sysctl and best-effort unpoison the page we just marked.
+if echo "$phys_addr" > "$inject_path"; then
+	verdict="inject returned without panic; sysctl ineffective"
+else
+	verdict="inject failed before reaching the panic path"
+fi
+
+echo "$prior" > "$sysctl_path"
+try_unpoison "$pfn"
+
+# The page type can change between selection and injection (e.g. a slab
+# or page-table page is freed and reused). Only treat a missing panic as
+# a failure if the target PFN is still the kernel-owned type we aimed at;
+# if it raced to another type the run is inconclusive, so skip instead.
+kpageflags_bit_set "$pfn" "$recheck_bit"
+case $? in
+0) ksft_exit_fail "$verdict (page still $kind)" ;;
+1) ksft_exit_skip "target PFN no longer $kind; raced before inject, inconclusive" ;;
+*) ksft_exit_fail "$verdict (could not reconfirm page type via $kpageflags_path)" ;;
+esac
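The selftest picks its targets with two small arithmetic rules that are easy to misread in awk. The C below is a hypothetical userspace restatement, not code from the patch. `rodata_target_phys()`, `kpf_bit()`, `kpf_is_candidate()` and `kpageflags_offset()` are illustrative names. The sketch assumes, as the script's `od -tx8` reads do, that `/proc/kpageflags` holds one 64-bit word per PFN. The bit numbers are the ones the script hardcodes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions copied from the script (include/uapi/linux/kernel-page-flags.h). */
#define KPF_SLAB		7
#define KPF_COMPOUND_TAIL	16
#define KPF_HWPOISON		19
#define KPF_NOPAGE		20
#define KPF_PGTABLE		26
#define KPF_RESERVED		32

/* First page-aligned byte at or above @start, as pick_rodata_phys_addr() emits. */
static uint64_t rodata_target_phys(uint64_t start, uint64_t pagesize)
{
	uint64_t pfn = (start + pagesize - 1) / pagesize;	/* round up */

	return pfn * pagesize;
}

/* Same answer as the awk bit_set() nibble lookup, using a plain shift. */
static bool kpf_bit(uint64_t word, unsigned int bit)
{
	return (word >> bit) & 1;
}

/*
 * Candidate filter from pick_kpageflags_phys_addr(): the wanted type bit
 * is set and the page is not already poisoned, absent, or a compound tail.
 */
static bool kpf_is_candidate(uint64_t word, unsigned int want_bit)
{
	return kpf_bit(word, want_bit) &&
	       !kpf_bit(word, KPF_HWPOISON) &&
	       !kpf_bit(word, KPF_NOPAGE) &&
	       !kpf_bit(word, KPF_COMPOUND_TAIL);
}

/* One 64-bit word per PFN, so PFN n starts at byte n * 8 (the od -j offset). */
static uint64_t kpageflags_offset(uint64_t pfn)
{
	return pfn * sizeof(uint64_t);
}
```

For example, with 4 KiB pages the 16 MiB skip in `pick_kpageflags_phys_addr()` is PFN 4096. That PFN sits at byte offset 32768 into `/proc/kpageflags`.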
3/2	2026/06/26 23:05	llm	patch-evaluator	0m Model: gemini-3.1-pro-preview Args: null Results: null Tokens: input: 8898 output: 25 thoughts: 524 Thoughts: Analyzing the "Panic on Unrecoverable Memory Failure" Patch Okay, so I'm looking at this patch that introduces a new sysctl, `panic_on_unrecoverable_memory_failure`. The goal is to make the kernel panic when a memory failure hits a kernel-owned page that the regular error handler can't recover. The code modifies `mm/memory-failure.c` and adds a selftest, which is good. The core of it is the new `is_kernel_owned_page()` function, which identifies kernel-owned pages by checking `PageReserved`, `PageSlab`, `PageTable`, or `PageLargeKmalloc`. If the page matches, `get_any_page()` returns `-ENOTRECOVERABLE`. `memory_failure()` then handles this error code by calling `action_result()`, which has been updated to consult the new `panic_on_unrecoverable_mf()` helper and panic if it returns true. This directly touches core memory failure handling, so reachability is critical. This could be triggered by genuine hardware memory errors, of course, or by injecting faults from software. `madvise(MADV_HWPOISON)` immediately comes to mind, but it only operates on pages mapped into the caller's address space. However, `/sys/devices/system/memory/hard_offline_page` or something in debugfs might be usable to inject errors on kernel pages. The selftest does in fact use `/sys/devices/system/memory/hard_offline_page`, which is a good sign. The question then is, how do we fuzz this? Syzkaller seems like the obvious choice. It may have a pseudo-syscall for hwpoison injection (I'm not sure of the name), and it can write to `/sys/devices/system/memory/hard_offline_page`. However, I need to remember that Syzkaller doesn't typically focus on hardware error injection unless specifically configured.
But it does have some `madvise(MADV_HWPOISON)` support. This is good to keep in mind. Wait, the sysctl is off by default: `sysctl_panic_on_unrecoverable_mf` has no initializer, so it starts at zero. To fuzz this effectively, we'll have to enable the sysctl first. Luckily, Syzkaller can write to `/proc/sys/vm/panic_on_unrecoverable_memory_failure`. I should dive into the `mm/memory-failure.c` code directly now and use `grepper` to see where the `MEMORY_FAILURE` config option is involved.
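The sysctl entry in the patch uses `proc_dointvec_minmax` with `extra1 = SYSCTL_ZERO` and `extra2 = SYSCTL_ONE`. As a result, the only values a fuzzer (or an admin) can store are 0 and 1. The C below is a rough userspace model of that write path. It assumes that out-of-range writes are rejected with `-EINVAL` and leave the stored value untouched. `sysctl_write_minmax()` and `panic_knob` are hypothetical names.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical model of a write to vm.panic_on_unrecoverable_memory_failure.
 * The patch's ctl_table entry bounds the value to [0, 1]; we assume a
 * value outside [min, max] is rejected and the old value is kept.
 */
static int sysctl_write_minmax(int *data, int val, int min, int max)
{
	if (val < min || val > max)
		return -EINVAL;
	*data = val;
	return 0;
}

/* Like the patch's static int, zero-initialised: the knob starts off. */
static int panic_knob;
```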
4/2	2026/06/26 23:05	tool	grepper	0m Args: { "Expression": "MEMORY_FAILURE", "PathPrefix": "mm/" } Results: { "Output": "mm/Kconfig=702=config DEFAULT_MMAP_MIN_ADDR\n--\nmm/Kconfig-720-\nmm/Kconfig:721:config ARCH_SUPPORTS_MEMORY_FAILURE\nmm/Kconfig-722-\tbool\nmm/Kconfig-723-\nmm/Kconfig:724:config MEMORY_FAILURE\nmm/Kconfig-725-\tdepends on MMU\nmm/Kconfig:726:\tdepends on ARCH_SUPPORTS_MEMORY_FAILURE\nmm/Kconfig-727-\tbool \"Enable recovery from hardware memory errors\"\n--\nmm/Kconfig=735=config HWPOISON_INJECT\nmm/Kconfig-736-\ttristate \"HWPoison pages injector\"\nmm/Kconfig:737:\tdepends on MEMORY_FAILURE \u0026\u0026 DEBUG_KERNEL \u0026\u0026 PROC_FS\nmm/Kconfig-738-\tselect PROC_PAGE_MONITOR\n--\nmm/Kconfig.debug=147=config PAGE_POISONING\n--\nmm/Kconfig.debug-156-\t Note that \"poison\" here is not the same thing as the \"HWPoison\"\nmm/Kconfig.debug:157:\t for CONFIG_MEMORY_FAILURE. This is software poisoning only.\nmm/Kconfig.debug-158-\n--\nmm/Makefile=111=obj-$(CONFIG_DMAPOOL_TEST) += dmapool_test.o\nmm/Makefile:112:obj-$(CONFIG_MEMORY_FAILURE) += memory-failure.o\nmm/Makefile-113-obj-$(CONFIG_HWPOISON_INJECT) += hwpoison-inject.o\n--\nmm/internal.h=1400=static inline bool node_reclaim_enabled(void)\n--\nmm/internal.h-1408- */\nmm/internal.h:1409:#ifdef CONFIG_MEMORY_FAILURE\nmm/internal.h-1410-int unmap_poisoned_folio(struct folio *folio, unsigned long pfn, bool must_kill);\n--\nmm/ksm.c=3155=void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc)\n--\nmm/ksm.c-3223-\nmm/ksm.c:3224:#ifdef CONFIG_MEMORY_FAILURE\nmm/ksm.c-3225-/*\n--\nmm/madvise.c=1339=static int madvise_vma_behavior(struct madvise_behavior *madv_behavior)\n--\nmm/madvise.c-1443-\nmm/madvise.c:1444:#ifdef CONFIG_MEMORY_FAILURE\nmm/madvise.c-1445-/*\n--\nmm/madvise.c=1511=static bool is_memory_failure(struct madvise_behavior *madv_behavior)\n--\nmm/madvise.c-1515-\nmm/madvise.c:1516:#endif\t/* CONFIG_MEMORY_FAILURE */\nmm/madvise.c-1517-\n--\nmm/madvise.c=1519=madvise_behavior_valid(int behavior)\n--\nmm/madvise.c-1550-\tcase MADV_GUARD_REMOVE:\nmm/madvise.c:1551:#ifdef CONFIG_MEMORY_FAILURE\nmm/madvise.c-1552-\tcase MADV_SOFT_OFFLINE:\n--\nmm/memory-failure.c=2618=EXPORT_SYMBOL_GPL(memory_failure);\nmm/memory-failure.c-2619-\nmm/memory-failure.c:2620:#define MEMORY_FAILURE_FIFO_ORDER\t4\nmm/memory-failure.c:2621:#define MEMORY_FAILURE_FIFO_SIZE\t(1 \u003c\u003c MEMORY_FAILURE_FIFO_ORDER)\nmm/memory-failure.c-2622-\n--\nmm/memory-failure.c=2628=struct memory_failure_cpu {\nmm/memory-failure.c-2629-\tDECLARE_KFIFO(fifo, struct memory_failure_entry,\nmm/memory-failure.c:2630:\t\t MEMORY_FAILURE_FIFO_SIZE);\nmm/memory-failure.c-2631-\traw_spinlock_t lock;\n--\nmm/memory-failure.c=2848=static int soft_offline_in_use_page(struct page *page)\n--\nmm/memory-failure.c-2859-\t\t.gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,\nmm/memory-failure.c:2860:\t\t.reason = MR_MEMORY_FAILURE,\nmm/memory-failure.c-2861-\t};\n--\nmm/memory-failure.c-2917-\t\tret = migrate_pages(\u0026pagelist, alloc_migration_target, NULL,\nmm/memory-failure.c:2918:\t\t\t(unsigned long)\u0026mtc, MIGRATE_SYNC, MR_MEMORY_FAILURE, NULL);\nmm/memory-failure.c-2919-\t\tif (!ret) {\n--\nmm/migrate.c=1190=static void migrate_folio_done(struct folio *src,\n--\nmm/migrate.c-1196-\nmm/migrate.c:1197:\tif (reason != MR_MEMORY_FAILURE)\nmm/migrate.c-1198-\t\t/* We release the page in page_handle_poison. */\n--\nmm/page_alloc.c=7609=EXPORT_SYMBOL(is_free_buddy_page);\nmm/page_alloc.c-7610-\nmm/page_alloc.c:7611:#ifdef CONFIG_MEMORY_FAILURE\nmm/page_alloc.c-7612-static inline void add_to_free_list(struct page *page, struct zone *zone,\n--\nmm/page_vma_mapped.c=180=bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)\n--\nmm/page_vma_mapped.c-334-\nmm/page_vma_mapped.c:335:#ifdef CONFIG_MEMORY_FAILURE\nmm/page_vma_mapped.c-336-/*\n--\nmm/show_mem.c=405=void __show_mem(unsigned int filter, nodemask_t *nodemask, int max_zone_idx)\n--\nmm/show_mem.c-427-#endif\nmm/show_mem.c:428:#ifdef CONFIG_MEMORY_FAILURE\nmm/show_mem.c-429-\tprintk(\"%lu pages hwpoisoned\\n\", atomic_long_read(\u0026num_poisoned_pages));\n" }
5/2	2026/06/26 23:05	llm	patch-evaluator	0m Model: gemini-3.1-pro-preview Args: null Results: null Tokens: input: 11750 output: 34 thoughts: 529 Thoughts: Fuzzing Analysis: Unrecoverable Memory Failure Handling Okay, so this patch introduces a new sysctl, `panic_on_unrecoverable_memory_failure`, and modifies `memory_failure()`. My immediate thought is, "Worth fuzzing?" Absolutely, this is new functionality in the kernel that's designed to trigger a panic under specific memory failure conditions. But wait, a panic means a crash, and a fuzzer will see that. The sysctl is disabled by default, which is important. If the fuzzer enables it, and then triggers an unrecoverable memory failure, boom, panic. Is the panic a bug? No, it's the intended behavior when the sysctl is set. So, if the fuzzer hits this, it'll report a "crash," but it's potentially a false positive from a fuzzing perspective, because we intentionally caused the kernel to panic. Now, if the fuzzer can trigger a memory failure on a kernel page and the sysctl is enabled, it will panic. The question then becomes, should we fuzz this? The patch might uncover bugs in the memory failure handling logic itself, or it could simply trigger the intended panic. Hold on, the prompt says, “Return WorthFuzzing=false if the patch only contains code paths that are impossible to reach in virtualized environments..." Can we trigger a memory failure in QEMU? Yes, we can use `madvise(MADV_HWPOISON)` or `/sys/devices/system/memory/hard_offline_page`. Wait, the patch explicitly adds a selftest that uses `/sys/devices/system/memory/hard_offline_page`. That's a clear signal. This is worth fuzzing, because it modifies core kernel logic and it is possible to hit it in a virtualized environment. Now, we need to consider how the fuzzer is configured. If the sysctl is enabled, it panics. If it's disabled, it doesn't. If we enable the sysctl, the fuzzer will definitely hit that panic and report it as a crash. Is that what we want? 
Probably not. A fuzzing setup usually shouldn't be configured to panic the kernel on purpose unless it is meant to test that specific panic condition. Syzkaller would report an intended panic as a crash like any other, so the simpler choice is to leave the sysctl disabled. If we don't enable it, the new `panic_on_unrecoverable_mf()` simply returns false. But the patch also adds `is_kernel_owned_page()`, which is called unconditionally in `get_any_page()`. That sounds interesting; I should look at the `get_any_page()` changes more closely.
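With the sysctl left at 0, `panic_on_unrecoverable_mf()` returns false before it even looks at the page type, so `action_result()` returns exactly what it did before. The sketch below copies the predicate body from the patch. It wraps `action_result()`'s return-value logic in a hypothetical `model_action_result()` that reports the would-be panic instead of calling `panic()`. The enum definitions are simplified stand-ins, not the kernel's.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Simplified stand-ins for the kernel enums (only the values used here). */
enum mf_result { MF_IGNORED, MF_FAILED, MF_DELAYED, MF_RECOVERED };
enum mf_action_page_type { MF_MSG_KERNEL, MF_MSG_GET_HWPOISON };

static int sysctl_panic_on_unrecoverable_mf;	/* 0 unless written */

/* Predicate body as in the patch. */
static bool panic_on_unrecoverable_mf(enum mf_action_page_type type,
				      enum mf_result result)
{
	if (!sysctl_panic_on_unrecoverable_mf)
		return false;

	return type == MF_MSG_KERNEL && result == MF_IGNORED;
}

/*
 * Hypothetical stand-in for action_result(): sets *would_panic where the
 * kernel would call panic(), then returns what action_result() returns.
 */
static int model_action_result(enum mf_action_page_type type,
			       enum mf_result result, bool *would_panic)
{
	*would_panic = panic_on_unrecoverable_mf(type, result);
	return (result == MF_RECOVERED || result == MF_DELAYED) ? 0 : -EBUSY;
}
```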
6/2	2026/06/26 23:06	tool	grepper	0m Args: { "Expression": "is_kernel_owned_page", "PathPrefix": "mm/memory-failure.c" } Results: { "Output": "mm/memory-failure.c=1339=static inline bool HWPoisonHandlable(struct page *page, unsigned long flags)\n--\nmm/memory-failure.c-1355- */\nmm/memory-failure.c:1356:static inline bool is_kernel_owned_page(struct page *page)\nmm/memory-failure.c-1357-{\n--\nmm/memory-failure.c=1419=static int get_any_page(struct page *p, unsigned long flags)\n--\nmm/memory-failure.c-1432-\t */\nmm/memory-failure.c:1433:\tif (is_kernel_owned_page(p)) {\nmm/memory-failure.c-1434-\t\tif (count_increased)\n--\nmm/memory-failure.c=1493=static int __get_unpoison_page(struct page *page)\n--\nmm/memory-failure.c-1545- * -ENOTRECOVERABLE for kernel-owned pages identified by\nmm/memory-failure.c:1546: * is_kernel_owned_page() (PG_reserved, slab,\nmm/memory-failure.c-1547- * page-table, large-kmalloc) that the handler cannot recover.\n" }
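`get_any_page()` now runs `is_kernel_owned_page()` unconditionally, and that check has one subtle part. The type bits are read from the compound head, and the head is re-resolved afterwards. Below is a toy single-threaded model of that shape. `struct toy_page` and `toy_compound_head()` are hypothetical stand-ins, and a NULL `head` means "this page is its own head". Because nothing runs concurrently here, the recheck never fires; the model only shows where the retry sits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy page (hypothetical): a per-page reserved flag, type bits that are
 * only meaningful on the head page, and a head pointer (NULL = own head).
 */
struct toy_page {
	bool reserved;
	bool slab, pgtable, large_kmalloc;
	struct toy_page *head;
};

static struct toy_page *toy_compound_head(struct toy_page *page)
{
	return page->head ? page->head : page;
}

static bool toy_is_kernel_owned_page(struct toy_page *page)
{
	struct toy_page *head;
	bool kernel_owned;

	/* Per-page flag: tested on the page itself, no head lookup. */
	if (page->reserved)
		return true;
retry:
	head = toy_compound_head(page);
	kernel_owned = head->slab || head->pgtable || head->large_kmalloc;
	/* A concurrent split or free would change the head: re-evaluate. */
	if (head != toy_compound_head(page))
		goto retry;
	return kernel_owned;
}
```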
7/2	2026/06/26 23:06	llm	patch-evaluator	0m Model: gemini-3.1-pro-preview Args: null Results: null Tokens: input: 13246 output: 33 thoughts: 264 Thoughts: My Analysis of a Recent Kernel Change Okay, so I've been digging into this kernel code, and here's what I'm seeing. It looks like `is_kernel_owned_page()` gets called unconditionally within `get_any_page()`. Interesting. If that function returns `true`, indicating a kernel-owned page, `get_any_page()` immediately returns `-ENOTRECOVERABLE`. Now, `memory_failure()` handles that specific error code, `-ENOTRECOVERABLE`, and does so by calling `action_result(pfn, MF_MSG_KERNEL, MF_IGNORED)`. The crucial bit is, this changes how `memory_failure()` behaves with kernel-owned pages, even if the relevant sysctl is disabled. Previously, if I understand correctly, it would have gone through `try_again` and eventually given a `-EIO` error after calling `shake_page()`. Now, it’s short-circuiting and returning `-ENOTRECOVERABLE`. This is a functional change in `memory_failure()`. This really is a change in the intended behavior, at least to me. Okay, is this fuzzable? Absolutely. Seems like a good target. I should really focus on `memory_failure`, `get_any_page`, and `is_kernel_owned_page` when I fuzz. Wait a second, `memory_failure` is actually the main entry point to it all. I should see how reachable it is, and yup, it's accessible through `madvise(MADV_HWPOISON)` or `hard_offline_page`. Now I should just look into `madvise(MADV_HWPOISON)`.
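The behavior change described above comes down to how `memory_failure()` now dispatches on the return value of `get_hwpoison_page()`. The sketch below maps each return value to what the patched `switch` does next. `enum mf_next` and `dispatch_get_hwpoison_result()` are hypothetical labels; the kernel performs these actions inline rather than returning them.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical labels for what the patched memory_failure() does next. */
enum mf_next {
	MF_NEXT_FREE_OR_HIGH_ORDER,	/* 0: free buddy page, else MF_MSG_KERNEL_HIGH_ORDER */
	MF_NEXT_HANDLE,			/* 1: refcount held, continue to recovery */
	MF_NEXT_REPORT_KERNEL,		/* -ENOTRECOVERABLE: MF_MSG_KERNEL, MF_IGNORED */
	MF_NEXT_REPORT_GET_HWPOISON,	/* other errors: MF_MSG_GET_HWPOISON, MF_IGNORED */
};

static enum mf_next dispatch_get_hwpoison_result(int res)
{
	switch (res) {
	case 0:
		return MF_NEXT_FREE_OR_HIGH_ORDER;
	case 1:
		return MF_NEXT_HANDLE;
	case -ENOTRECOVERABLE:
		return MF_NEXT_REPORT_KERNEL;
	default:
		return MF_NEXT_REPORT_GET_HWPOISON;
	}
}
```

Before the patch, `-EIO` and every other negative value fell into the `res < 0` branch and were reported as `MF_MSG_GET_HWPOISON`. Now only `-ENOTRECOVERABLE` takes the `MF_MSG_KERNEL` path, and that is the type `panic_on_unrecoverable_mf()` checks for.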
8/2	2026/06/26 23:07	tool	grepper	0m Args: { "Expression": "MADV_HWPOISON", "PathPrefix": "mm/madvise.c" } Results: { "Output": "mm/madvise.c=1493=static bool is_memory_failure(struct madvise_behavior *madv_behavior)\n--\nmm/madvise.c-1495-\tswitch (madv_behavior-\u003ebehavior) {\nmm/madvise.c:1496:\tcase MADV_HWPOISON:\nmm/madvise.c-1497-\tcase MADV_SOFT_OFFLINE:\n--\nmm/madvise.c=1519=madvise_behavior_valid(int behavior)\n--\nmm/madvise.c-1552-\tcase MADV_SOFT_OFFLINE:\nmm/madvise.c:1553:\tcase MADV_HWPOISON:\nmm/madvise.c-1554-#endif\n--\nmm/madvise.c=1888=static int madvise_do_behavior(unsigned long start, size_t len_in,\n--\nmm/madvise.c-1944- * MADV_KEEPONFORK - undo the effect of MADV_WIPEONFORK\nmm/madvise.c:1945: * MADV_HWPOISON - trigger memory error handler as if the given memory range\nmm/madvise.c-1946- *\t\twere corrupted by unrecoverable hardware memory failure.\n" }
9/2	2026/06/26 23:07	llm	patch-evaluator	0m Model: gemini-3.1-pro-preview Args: null Results: null Tokens: input: 14256 output: 32 thoughts: 192 Thoughts: Analyzing `MADV_HWPOISON` and its Implications Okay, so `MADV_HWPOISON` is supported, which is good. But I need to remember that `madvise(MADV_HWPOISON)` only operates on user-mapped pages, so I can't just throw it at anything. Now, what does `is_kernel_owned_page()` treat as kernel-owned? It checks `PageReserved`, `PageSlab`, `PageTable`, and `PageLargeKmalloc`. Can a user actually map a `PageReserved` page? I believe so, through things like `/dev/mem` or some drivers with special capabilities. Slab and page-table pages, though, are normally off-limits to userland. What about `hard_offline_page`? Can it target any physical frame number (PFN), regardless of ownership? I think so. And Syzkaller runs as root in its test VMs, so can it write to `/sys/devices/system/memory/hard_offline_page` to trigger this? First I need to verify whether `hard_offline_page` is even compiled in.
10/2	2026/06/26 23:07	tool	grepper	0m Args: { "Expression": "hard_offline_page", "PathPrefix": "drivers/base/memory.c" } Results: { "Output": "drivers/base/memory.c=604=static ssize_t soft_offline_page_store(struct device *dev,\n--\ndrivers/base/memory.c-619-/* Forcibly offline a page, including killing processes. */\ndrivers/base/memory.c:620:static ssize_t hard_offline_page_store(struct device *dev,\ndrivers/base/memory.c-621-\t\t\t\t struct device_attribute *attr,\n--\ndrivers/base/memory.c=637=static DEVICE_ATTR_WO(soft_offline_page);\ndrivers/base/memory.c:638:static DEVICE_ATTR_WO(hard_offline_page);\ndrivers/base/memory.c-639-#endif\n--\ndrivers/base/memory.c=911=static struct attribute *memory_root_attrs[] = {\n--\ndrivers/base/memory.c-917-\t\u0026dev_attr_soft_offline_page.attr,\ndrivers/base/memory.c:918:\t\u0026dev_attr_hard_offline_page.attr,\ndrivers/base/memory.c-919-#endif\n" }
11/2	2026/06/26 23:07	llm	patch-evaluator	0m Model: gemini-3.1-pro-preview Args: null Results: null Tokens: input: 15111 output: 29 thoughts: 27 Thoughts: Kernel Memory Failure Investigation The `hard_offline_page` attribute lives in `drivers/base/memory.c`, so that is the place to dig. The key question is whether this functionality is gated behind a configuration option. I need to check whether it is compiled under `CONFIG_MEMORY_FAILURE` by finding the enclosing preprocessor directives.
12/2	2026/06/26 23:08	tool	grepper	0m Args: { "Expression": "hard_offline_page", "PathPrefix": "drivers/base/" } Results: { "Output": "drivers/base/memory.c=604=static ssize_t soft_offline_page_store(struct device *dev,\n--\ndrivers/base/memory.c-619-/* Forcibly offline a page, including killing processes. */\ndrivers/base/memory.c:620:static ssize_t hard_offline_page_store(struct device *dev,\ndrivers/base/memory.c-621-\t\t\t\t struct device_attribute *attr,\n--\ndrivers/base/memory.c=637=static DEVICE_ATTR_WO(soft_offline_page);\ndrivers/base/memory.c:638:static DEVICE_ATTR_WO(hard_offline_page);\ndrivers/base/memory.c-639-#endif\n--\ndrivers/base/memory.c=911=static struct attribute *memory_root_attrs[] = {\n--\ndrivers/base/memory.c-917-\t\u0026dev_attr_soft_offline_page.attr,\ndrivers/base/memory.c:918:\t\u0026dev_attr_hard_offline_page.attr,\ndrivers/base/memory.c-919-#endif\n" }
13/2	2026/06/26 23:08	llm	patch-evaluator	0m Model: gemini-3.1-pro-preview Args: null Results: null Tokens: input: 15612 output: 35 thoughts: 11 Thoughts: Analyzing Conditional Compilation My first instinct is to zero in on the preprocessor guard around `hard_offline_page_store()`: the grep shows an `#endif` at line 639 but not the matching `#ifdef`. Knowing which symbol guards this block tells me which Kconfig option must be enabled for the `hard_offline_page` sysfs attribute to exist at all, and therefore whether the fuzzer can reach `memory_failure()` through it.
14/2	2026/06/26 23:08	tool	read-file	0m Args: { "File": "drivers/base/memory.c", "FirstLine": 610, "LineCount": 30 } Results: { "Contents": " 610:\t\tif (!capable(CAP_SYS_ADMIN))\n 611:\t\t\treturn -EPERM;\n 612:\t\tif (kstrtoull(buf, 0, \u0026pfn) \u003c 0)\n 613:\t\t\treturn -EINVAL;\n 614:\t\tpfn \u003e\u003e= PAGE_SHIFT;\n 615:\t\tret = soft_offline_page(pfn, 0);\n 616:\t\treturn ret == 0 ? count : ret;\n 617:\t}\n 618:\t\n 619:\t/* Forcibly offline a page, including killing processes. */\n 620:\tstatic ssize_t hard_offline_page_store(struct device *dev,\n 621:\t\t\t\t\t struct device_attribute *attr,\n 622:\t\t\t\t\t const char *buf, size_t count)\n 623:\t{\n 624:\t\tint ret;\n 625:\t\tu64 pfn;\n 626:\t\tif (!capable(CAP_SYS_ADMIN))\n 627:\t\t\treturn -EPERM;\n 628:\t\tif (kstrtoull(buf, 0, \u0026pfn) \u003c 0)\n 629:\t\t\treturn -EINVAL;\n 630:\t\tpfn \u003e\u003e= PAGE_SHIFT;\n 631:\t\tret = memory_failure(pfn, MF_SW_SIMULATED);\n 632:\t\tif (ret == -EOPNOTSUPP)\n 633:\t\t\tret = 0;\n 634:\t\treturn ret ? ret : count;\n 635:\t}\n 636:\t\n 637:\tstatic DEVICE_ATTR_WO(soft_offline_page);\n 638:\tstatic DEVICE_ATTR_WO(hard_offline_page);\n 639:\t#endif\n" }
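`hard_offline_page_store()` parses a physical address (not a PFN) and shifts it by `PAGE_SHIFT`. It then maps the result of `memory_failure()` back to a write() return value. The C below is a hypothetical model that assumes 4 KiB pages; `TOY_PAGE_SHIFT`, `phys_to_pfn()` and `store_return()` are our names, not the kernel's.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>

#define TOY_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* The sysfs value is a physical address; the store shifts it to a PFN. */
static uint64_t phys_to_pfn(uint64_t phys)
{
	return phys >> TOY_PAGE_SHIFT;
}

/*
 * Return-value mapping of hard_offline_page_store(): -EOPNOTSUPP from
 * memory_failure() counts as success, any other error is passed through,
 * and success consumes the whole write.
 */
static ssize_t store_return(int mf_ret, size_t count)
{
	if (mf_ret == -EOPNOTSUPP)
		mf_ret = 0;
	return mf_ret ? mf_ret : (ssize_t)count;
}
```

`action_result()` returns `-EBUSY` for `MF_IGNORED`. Read together with that, the model suggests that when the panic does not fire, a write targeting a kernel-owned page fails with `EBUSY` rather than returning the byte count.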
15/2	2026/06/26 23:08	llm	patch-evaluator	0m Model: gemini-3.1-pro-preview Args: null Results: null Tokens: input: 16327 output: 35 thoughts: 13 Thoughts: Analyzing the Conditional Compilation Block Lines 610-639 end with `#endif`, so the matching `#ifdef` must be above line 610. The next step is to read the preceding lines to find which symbol guards `soft_offline_page_store()` and `hard_offline_page_store()`, since that symbol decides whether the sysfs attributes are built at all.
16/2	2026/06/26 23:08	tool	read-file	0m Args: { "File": "drivers/base/memory.c", "FirstLine": 590, "LineCount": 20 } Results: { "Contents": " 590:\tout:\n 591:\t\tunlock_device_hotplug();\n 592:\t\treturn ret;\n 593:\t}\n 594:\t\n 595:\tstatic DEVICE_ATTR_WO(probe);\n 596:\t#endif\n 597:\t\n 598:\t#ifdef CONFIG_MEMORY_FAILURE\n 599:\t/*\n 600:\t * Support for offlining pages of memory\n 601:\t */\n 602:\t\n 603:\t/* Soft offline a page */\n 604:\tstatic ssize_t soft_offline_page_store(struct device *dev,\n 605:\t\t\t\t\t struct device_attribute *attr,\n 606:\t\t\t\t\t const char *buf, size_t count)\n 607:\t{\n 608:\t\tint ret;\n 609:\t\tu64 pfn;\n" }
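The store handler in the two read-file results above runs `kstrtoull(buf, 0, &pfn)` and then `pfn >>= PAGE_SHIFT`. Despite the variable name, `hard_offline_page` therefore expects a physical byte address, not a PFN. Below is a minimal userspace approximation of that parsing. `hard_offline_input_to_pfn()` is a hypothetical helper. The whitespace, sign and trailing-newline rules approximate `kstrtoull()` rather than copy it, and the page shift is passed in because `PAGE_SHIFT` varies by architecture.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Approximates hard_offline_page_store(): kstrtoull(buf, 0, &pfn) followed by
 * "pfn >>= PAGE_SHIFT". page_shift is a parameter (12 for 4 KiB pages) because
 * PAGE_SHIFT differs between architectures and kernel configs.
 * Returns 0 and stores the PFN on success, -EINVAL on malformed input.
 */
static int hard_offline_input_to_pfn(const char *buf, unsigned int page_shift,
				     uint64_t *pfn_out)
{
	unsigned long long addr;
	char *end;

	assert(page_shift < 64);

	/* kstrtoull() accepts an optional '+' but no leading whitespace or '-'. */
	if (*buf == '\0' || isspace((unsigned char)*buf) || *buf == '-')
		return -EINVAL;

	errno = 0;
	/* Base 0: "0x" prefix means hex, a leading "0" means octal, else decimal. */
	addr = strtoull(buf, &end, 0);
	if (errno != 0 || end == buf)
		return -EINVAL;

	/* A single trailing newline (as written by echo) is tolerated. */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;

	/* The value written is a physical byte address; drop the in-page offset. */
	*pfn_out = (uint64_t)addr >> page_shift;
	return 0;
}
```

With 4 KiB pages, writing `0x2500000` to the file targets PFN `0x2500`, while writing `4095` targets PFN 0 because the in-page offset is discarded.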
17/2	2026/06/26 23:08	llm	patch-evaluator	0m Model: gemini-3.1-pro-preview Args: null Results: null Tokens: input: 16806 output: 30 thoughts: 317 Thoughts: Fuzzing Strategy for `CONFIG_MEMORY_FAILURE` Changes Okay, so this patch is touching code under `CONFIG_MEMORY_FAILURE`. My initial assessment is that it's altering the behavior of `memory_failure()` and `get_any_page()`, so this is definitely worth a closer look for fuzzing potential. We need to see if it's got functional changes. The `panic_on_unrecoverable_memory_failure` sysctl is off by default, but the fuzzer can easily flip that switch to explore all possibilities. Even if we keep it off, there's a significant change: `is_kernel_owned_page()` will alter the return value of `get_any_page()` to `-ENOTRECOVERABLE` instead of `-EIO`. That, in turn, changes how `memory_failure()` behaves; it now causes it to immediately return `action_result(pfn, MF_MSG_KERNEL, MF_IGNORED)`. This is clearly a functional change, so yes, we should definitely fuzz this. Now, what are the key symbols to target? I'm thinking `memory_failure`, `get_any_page`, `is_kernel_owned_page`, and `panic_on_unrecoverable_mf`. But hold on, `is_kernel_owned_page` and `panic_on_unrecoverable_mf` are marked as static inline or static functions. They're likely inlined, so that affects their usefulness as direct fuzz targets. So, `memory_failure` and `get_any_page` remain the primary focus, and `action_result` is also worth keeping in mind, since it's the return of a critical function. As for configurations, `MEMORY_FAILURE` is the obvious one, but wait, `MEMORY_FAILURE` depends on `ARCH_SUPPORTS_MEMORY_FAILURE`. So we need to make sure we also account for `ARCH_SUPPORTS_MEMORY_FAILURE` in our testing.
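The return-value change described above can be summarised as a small model of the post-patch `switch (res)` in `memory_failure()`, combined with the `panic_on_unrecoverable_mf()` predicate. This is a sketch, not kernel code. The `enum mf_path` labels and both functions are hypothetical, and the model ignores everything that happens after the dispatch, such as the handling that follows `res == 1`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical labels for the four arms of the post-patch switch (res). */
enum mf_path {
	PATH_FREE_OR_HIGH_ORDER,   /* 0: free buddy page or kernel high-order page   */
	PATH_HANDLE_PAGE,          /* 1: refcount taken on a handlable page          */
	PATH_KERNEL_IGNORED,       /* -ENOTRECOVERABLE: MF_MSG_KERNEL, MF_IGNORED    */
	PATH_GET_HWPOISON_IGNORED, /* anything else: MF_MSG_GET_HWPOISON, MF_IGNORED */
};

/* Models the dispatch on get_hwpoison_page()'s return value in memory_failure(). */
static enum mf_path mf_dispatch(int res)
{
	switch (res) {
	case 0:
		return PATH_FREE_OR_HIGH_ORDER;
	case 1:
		return PATH_HANDLE_PAGE;
	case -ENOTRECOVERABLE:
		return PATH_KERNEL_IGNORED;
	default:
		/* e.g. -EIO or -EBUSY: treated as a transient allocator race. */
		return PATH_GET_HWPOISON_IGNORED;
	}
}

/*
 * Models panic_on_unrecoverable_mf(): panic only for the MF_MSG_KERNEL /
 * MF_IGNORED pair, which in this model comes solely from the
 * -ENOTRECOVERABLE arm, and only when the sysctl is non-zero.
 */
static bool mf_would_panic(int res, int sysctl_panic_on_unrecoverable_mf)
{
	return sysctl_panic_on_unrecoverable_mf != 0 &&
	       mf_dispatch(res) == PATH_KERNEL_IGNORED;
}
```

In this model, leaving the sysctl at its default of 0 never panics. An `-EIO` or `-EBUSY` from a race with the allocator never panics either, which matches the documentation's statement that recoverable failure paths are intentionally left out.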
18/2	2026/06/26 23:09	tool	grepper	0m Args: { "Expression": "ARCH_SUPPORTS_MEMORY_FAILURE", "PathPrefix": "arch/" } Results: { "Output": "arch/arm64/Kconfig=2=config ARM64\n--\narch/arm64/Kconfig-73-\tselect ARCH_SUPPORTS_HUGETLBFS\narch/arm64/Kconfig:74:\tselect ARCH_SUPPORTS_MEMORY_FAILURE\narch/arm64/Kconfig-75-\tselect ARCH_SUPPORTS_SHADOW_CALL_STACK if CC_HAVE_SHADOW_CALL_STACK\n--\narch/parisc/Kconfig=2=config PARISC\n--\narch/parisc/Kconfig-22-\tselect ARCH_SUPPORTS_HUGETLBFS if PA20\narch/parisc/Kconfig:23:\tselect ARCH_SUPPORTS_MEMORY_FAILURE\narch/parisc/Kconfig-24-\tselect ARCH_STACKWALK\n--\narch/powerpc/Kconfig=624=config PPC64_SUPPORTS_MEMORY_FAILURE\n--\narch/powerpc/Kconfig-627-\tdefault \"y\" if PPC_POWERNV\narch/powerpc/Kconfig:628:\tselect ARCH_SUPPORTS_MEMORY_FAILURE\narch/powerpc/Kconfig-629-\n--\narch/x86/Kconfig=762=config X86_SUPPORTS_MEMORY_FAILURE\n--\narch/x86/Kconfig-768-\tdepends on X86_64 \|\| !SPARSEMEM\narch/x86/Kconfig:769:\tselect ARCH_SUPPORTS_MEMORY_FAILURE\narch/x86/Kconfig-770-\n" }
19/2	2026/06/26 23:09	llm	patch-evaluator	0m Model: gemini-3.1-pro-preview Args: null Results: null Tokens: input: 17971 output: 189 thoughts: 1101 Thoughts: Analyzing Memory Failure Handling and Related Configurations Okay, so I'm focusing on enabling and understanding the `MEMORY_FAILURE` mechanism. My initial thought is to identify all relevant configurations. I see that `MEMORY_FAILURE` is the primary one, and I'm also considering `HWPOISON_INJECT`. The latter seems interesting for fuzzing, possibly enabling the injection of memory failures. I know it provides `/sys/kernel/debug/hwpoison/corrupt-pfn` and `unpoison-pfn`, and the selftests use `unpoison-pfn`, which confirms its relevance. I'm checking if syzkaller utilizes `HWPOISON_INJECT`. It appears that syzkaller uses `madvise(MADV_HWPOISON)` and `/sys/devices/system/memory/hard_offline_page`, so `HWPOISON_INJECT` is definitely related and worth enabling for effective testing and memory error injection. Therefore, I'll include both `MEMORY_FAILURE` and `HWPOISON_INJECT` in my configuration. Next, I'll examine the code changes in `mm/memory-failure.c`. I'm focusing on the functions directly involved in this process, specifically `memory_failure`, `get_any_page`, `is_kernel_owned_page`, `panic_on_unrecoverable_mf`, and `action_result`. I'm noting that `is_kernel_owned_page` and `panic_on_unrecoverable_mf` are static and potentially inlined, which might affect their symbol availability. Now, I'm verifying the export status of the key functions. I see that `memory_failure` is exported with `EXPORT_SYMBOL_GPL(memory_failure)`, so it's directly accessible. `get_any_page` and `action_result` are static within the file, making their direct use outside this context unlikely, but I'm primarily interested in their internal logic. From what I can tell, the modifications I've read modify the core memory failure handling logic. 
They short-circuit error processing for kernel-owned pages (such as those used by the slab allocator, page tables, or reserved pages). A new sysctl is also introduced to trigger a panic upon encountering unrecoverable errors with these kernel-owned pages. This mechanism is clearly linked to hardware error injection mechanisms such as `/sys/devices/system/memory/hard_offline_page` or `madvise(MADV_HWPOISON)`. To reiterate, I need to double-check that `HWPOISON_INJECT` is a necessity. It is a module which provides debugfs interfaces. Syzkaller can utilize it. Thus, it is desirable to have it enabled. Looking for other dependencies, I can see that `DEBUG_KERNEL` and `PROC_FS` are dependencies for `HWPOISON_INJECT`. They're likely standard and don't require specific consideration. `MMU` is a dependency for `MEMORY_FAILURE`, which is also standard. Based on this analysis, I'm confident in the approach and these configurations are the most important.
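Whether a static helper such as `is_kernel_owned_page()` or `panic_on_unrecoverable_mf()` is usable as a focus symbol depends on whether the compiler kept an out-of-line copy. One way to check on a built kernel is to look the name up in `/proc/kallsyms`. The sketch below does this for any kallsyms-formatted stream. `kallsyms_has_symbol()` is a hypothetical helper. It also accepts a name followed by `.`, on the assumption that the compiler may emit clones of static functions with suffixes such as `.constprop.0` or `.isra.0`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Returns true if "name" appears in a /proc/kallsyms-style stream, where each
 * line is "<address> <type> <name> [module]". A fully inlined static function
 * has no entry; a compiler-generated clone may appear as "<name>.<suffix>".
 */
static bool kallsyms_has_symbol(FILE *f, const char *name)
{
	char line[512];
	size_t len = strlen(name);

	while (fgets(line, sizeof(line), f)) {
		char addr[32], type[8], sym[256];

		if (sscanf(line, "%31s %7s %255s", addr, type, sym) != 3)
			continue;
		/* Exact match, or the name followed by a clone suffix. */
		if (strncmp(sym, name, len) == 0 &&
		    (sym[len] == '\0' || sym[len] == '.'))
			return true;
	}
	return false;
}
```

On a live system, pass `fopen("/proc/kallsyms", "r")`. Without privileges, `kptr_restrict` may zero the addresses, but the names are still listed, which is all this check needs.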
20/2	2026/06/26 23:09	tool	set-results	0m Args: { "EnableConfigs": [ "MEMORY_FAILURE", "HWPOISON_INJECT" ], "FocusSymbols": [ "memory_failure", "get_any_page", "action_result", "panic_on_unrecoverable_mf", "is_kernel_owned_page" ], "Reasoning": "The patch modifies the core memory failure handling logic (`memory_failure`, `get_any_page`) to explicitly identify and short-circuit on kernel-owned pages (slab, page tables, reserved, large kmalloc). It also introduces a new sysctl to panic on such unrecoverable errors. This is reachable via hardware error injection interfaces (e.g., `/sys/devices/system/memory/hard_offline_page` or `madvise(MADV_HWPOISON)`).", "WorthFuzzing": true } Results: { "EnableConfigs": [ "MEMORY_FAILURE", "HWPOISON_INJECT" ], "FocusSymbols": [ "memory_failure", "get_any_page", "action_result", "panic_on_unrecoverable_mf", "is_kernel_owned_page" ], "Reasoning": "The patch modifies the core memory failure handling logic (`memory_failure`, `get_any_page`) to explicitly identify and short-circuit on kernel-owned pages (slab, page tables, reserved, large kmalloc). It also introduces a new sysctl to panic on such unrecoverable errors. This is reachable via hardware error injection interfaces (e.g., `/sys/devices/system/memory/hard_offline_page` or `madvise(MADV_HWPOISON)`).", "WorthFuzzing": true }

{
  "EnableConfigs": [
    "MEMORY_FAILURE",
    "HWPOISON_INJECT"
  ],
  "FocusSymbols": [
    "memory_failure",
    "get_any_page",
    "action_result",
    "panic_on_unrecoverable_mf",
    "is_kernel_owned_page"
  ],
  "Reasoning": "The patch modifies the core memory failure handling logic (`memory_failure`, `get_any_page`) to explicitly identify and short-circuit on kernel-owned pages (slab, page tables, reserved, large kmalloc). It also introduces a new sysctl to panic on such unrecoverable errors. This is reachable via hardware error injection interfaces (e.g., `/sys/devices/system/memory/hard_offline_page` or `madvise(MADV_HWPOISON)`).",
  "WorthFuzzing": true
}

{
  "PatchDiff": "commit 581742259c41b2f1d7712a44b0ab008650850750\nAuthor: syz-cluster \u003ctriage@syzkaller.com\u003e\nDate:   Fri Jun 26 23:05:19 2026 +0000\n\n    syz-cluster: applied patch under review\n\ndiff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst\nindex b9b0c218bfb440..22cc54cac3b21d 100644\n--- a/Documentation/admin-guide/sysctl/vm.rst\n+++ b/Documentation/admin-guide/sysctl/vm.rst\n@@ -67,6 +67,7 @@ Currently, these files are in /proc/sys/vm:\n - page-cluster\n - page_lock_unfairness\n - panic_on_oom\n+- panic_on_unrecoverable_memory_failure\n - percpu_pagelist_high_fraction\n - stat_interval\n - stat_refresh\n@@ -925,6 +926,85 @@ panic_on_oom=2+kdump gives you very strong tool to investigate\n why oom happens. You can get snapshot.\n \n \n+panic_on_unrecoverable_memory_failure\n+======================================\n+\n+When a hardware memory error (e.g. multi-bit ECC) hits a kernel page\n+that cannot be recovered by the memory failure handler, the default\n+behaviour is to ignore the error and continue operation.  This is\n+dangerous because the corrupted data remains accessible to the kernel,\n+risking silent data corruption or a delayed crash when the poisoned\n+memory is next accessed.\n+\n+When enabled, this sysctl triggers a panic on memory failure events\n+hitting kernel-owned pages that the handler cannot recover:\n+``PageReserved`` (firmware reservations, kernel image, vDSO, zero\n+page, and similar memblock-reserved regions), ``PageSlab``,\n+``PageTable``, and ``PageLargeKmalloc``.  These are owned by the\n+kernel and the memory failure handler cannot reliably evict their\n+contents.\n+\n+Other unrecoverable kernel-owned populations (vmalloc allocations,\n+kernel stack pages, ...) are not currently covered because the\n+handler has no page-type signal that distinguishes them from a\n+userspace folio temporarily off the LRU during migration or\n+compaction.  
Such pages still go through the standard\n+MF_MSG_GET_HWPOISON path: ``PG_hwpoison`` is set on them and a\n+delayed crash on the next access remains possible.  Coverage may\n+grow as the handler gains stronger kernel-ownership signals.\n+\n+Recoverable failure paths are also intentionally left out: in-flight\n+buddy allocations and other transient races with the page allocator\n+can reach the same diagnostic, and panicking on them would risk\n+killing the box for a page destined for userspace where the standard\n+SIGBUS recovery path applies.  Pages whose state could not be\n+classified at all are not covered either, since an unknown state is\n+not a sound basis for a panic decision.\n+\n+For many environments it is preferable to panic immediately with a clean\n+crash dump that captures the original error context, rather than to\n+continue and face a random crash later whose cause is difficult to\n+diagnose.\n+\n+Use cases\n+---------\n+\n+This option is most useful in environments where unattributed crashes\n+are expensive to debug or where data integrity must take precedence\n+over availability:\n+\n+* Large fleets, where multi-bit ECC errors on kernel pages are observed\n+  regularly and post-mortem analysis of an unrelated downstream crash\n+  (often seconds to minutes after the original error) consumes\n+  significant engineering effort.\n+\n+* Systems configured with kdump, where panicking at the moment of the\n+  hardware error produces a vmcore that still contains the faulting\n+  address, the affected page state, and the originating MCE/GHES\n+  record — context that is typically lost by the time a delayed crash\n+  occurs.\n+\n+* High-availability clusters that rely on fast, deterministic node\n+  failure for failover, and prefer an immediate panic over silent data\n+  corruption propagating to replicas or persistent storage.\n+\n+* Kernel and platform developers reproducing hwpoison issues with\n+  tools such as ``mce-inject`` or error-injection debugfs 
interfaces,\n+  where panicking on the unrecoverable path makes regressions\n+  immediately visible instead of surfacing as later, unrelated\n+  failures.\n+\n+= =====================================================================\n+0 Try to continue operation (default).\n+1 Panic immediately.  If the ``panic`` sysctl is also non-zero then the\n+  machine will be rebooted.\n+= =====================================================================\n+\n+Example::\n+\n+     echo 1 \u003e /proc/sys/vm/panic_on_unrecoverable_memory_failure\n+\n+\n percpu_pagelist_high_fraction\n =============================\n \ndiff --git a/mm/memory-failure.c b/mm/memory-failure.c\nindex 51508a55c4055e..611160c98c6f64 100644\n--- a/mm/memory-failure.c\n+++ b/mm/memory-failure.c\n@@ -74,6 +74,8 @@ static int sysctl_memory_failure_recovery __read_mostly = 1;\n \n static int sysctl_enable_soft_offline __read_mostly = 1;\n \n+static int sysctl_panic_on_unrecoverable_mf __read_mostly;\n+\n atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);\n \n static bool hw_memory_failure __read_mostly = false;\n@@ -155,6 +157,15 @@ static const struct ctl_table memory_failure_table[] = {\n \t\t.proc_handler\t= proc_dointvec_minmax,\n \t\t.extra1\t\t= SYSCTL_ZERO,\n \t\t.extra2\t\t= SYSCTL_ONE,\n+\t},\n+\t{\n+\t\t.procname\t= \"panic_on_unrecoverable_memory_failure\",\n+\t\t.data\t\t= \u0026sysctl_panic_on_unrecoverable_mf,\n+\t\t.maxlen\t\t= sizeof(sysctl_panic_on_unrecoverable_mf),\n+\t\t.mode\t\t= 0644,\n+\t\t.proc_handler\t= proc_dointvec_minmax,\n+\t\t.extra1\t\t= SYSCTL_ZERO,\n+\t\t.extra2\t\t= SYSCTL_ONE,\n \t}\n };\n \n@@ -980,17 +991,6 @@ static bool has_extra_refcount(struct page_state *ps, struct page *p,\n \treturn false;\n }\n \n-/*\n- * Error hit kernel page.\n- * Do nothing, try to be lucky and not touch this instead. 
For a few cases we\n- * could be more sophisticated.\n- */\n-static int me_kernel(struct page_state *ps, struct page *p)\n-{\n-\tunlock_page(p);\n-\treturn MF_IGNORED;\n-}\n-\n /*\n  * Page in unknown state. Do nothing.\n  * This is a catch-all in case we fail to make sense of the page state.\n@@ -1199,10 +1199,8 @@ static int me_huge_page(struct page_state *ps, struct page *p)\n #define mlock\t\t(1UL \u003c\u003c PG_mlocked)\n #define lru\t\t(1UL \u003c\u003c PG_lru)\n #define head\t\t(1UL \u003c\u003c PG_head)\n-#define reserved\t(1UL \u003c\u003c PG_reserved)\n \n static struct page_state error_states[] = {\n-\t{ reserved,\treserved,\tMF_MSG_KERNEL,\tme_kernel },\n \t/*\n \t * free pages are specially detected outside this table:\n \t * PG_buddy pages only make a small fraction of all free pages.\n@@ -1234,7 +1232,6 @@ static struct page_state error_states[] = {\n #undef mlock\n #undef lru\n #undef head\n-#undef reserved\n \n static void update_per_node_mf_stats(unsigned long pfn,\n \t\t\t\t     enum mf_result result)\n@@ -1269,6 +1266,15 @@ static void update_per_node_mf_stats(unsigned long pfn,\n \t++mf_stats-\u003etotal;\n }\n \n+static bool panic_on_unrecoverable_mf(enum mf_action_page_type type,\n+\t\t\t\t      enum mf_result result)\n+{\n+\tif (!sysctl_panic_on_unrecoverable_mf)\n+\t\treturn false;\n+\n+\treturn type == MF_MSG_KERNEL \u0026\u0026 result == MF_IGNORED;\n+}\n+\n /*\n  * \"Dirty/Clean\" indication is not 100% accurate due to the possibility of\n  * setting PG_dirty outside page lock. See also comment above set_page_dirty().\n@@ -1286,6 +1292,9 @@ static int action_result(unsigned long pfn, enum mf_action_page_type type,\n \tpr_err(\"%#lx: recovery action for %s: %s\\n\",\n \t\tpfn, action_page_types[type], action_name[result]);\n \n+\tif (panic_on_unrecoverable_mf(type, result))\n+\t\tpanic(\"Memory failure: %#lx: unrecoverable page\", pfn);\n+\n \treturn (result == MF_RECOVERED || result == MF_DELAYED) ? 
0 : -EBUSY;\n }\n \n@@ -1339,6 +1348,36 @@ static inline bool HWPoisonHandlable(struct page *page, unsigned long flags)\n \treturn PageLRU(page) || is_free_buddy_page(page);\n }\n \n+/*\n+ * Positive identification of pages the hwpoison handler cannot recover:\n+ * pages owned by kernel internals with no userspace mapping to unmap, no\n+ * file mapping to invalidate, and no migration target.\n+ */\n+static inline bool is_kernel_owned_page(struct page *page)\n+{\n+\tstruct page *head;\n+\tbool kernel_owned;\n+\n+\t/* PG_reserved is a per-page flag, never set on a compound page. */\n+\tif (PageReserved(page))\n+\t\treturn true;\n+\n+\t/*\n+\t * Page-type bits live only on the head page, so resolve any tail\n+\t * first.  The check takes no refcount; recheck the head afterwards\n+\t * so a concurrent split or compound free cannot leave us trusting\n+\t * a stale view.  A free-\u003ealloc-\u003efree in the same window is still\n+\t * possible but closing it would require taking a reference here.\n+\t */\n+retry:\n+\thead = compound_head(page);\n+\tkernel_owned = PageSlab(head) || PageTable(head) ||\n+\t\t       PageLargeKmalloc(head);\n+\tif (head != compound_head(page))\n+\t\tgoto retry;\n+\treturn kernel_owned;\n+}\n+\n static int __get_hwpoison_page(struct page *page, unsigned long flags)\n {\n \tstruct folio *folio = page_folio(page);\n@@ -1385,6 +1424,19 @@ static int get_any_page(struct page *p, unsigned long flags)\n \tif (flags \u0026 MF_COUNT_INCREASED)\n \t\tcount_increased = true;\n \n+\t/*\n+\t * Page types we know are kernel-owned and cannot be recovered.\n+\t * Short-circuit before the shake_page() / retry loop, which\n+\t * cannot turn any of these into something HWPoisonHandlable().\n+\t * Drop the caller's reference if MF_COUNT_INCREASED took one.\n+\t */\n+\tif (is_kernel_owned_page(p)) {\n+\t\tif (count_increased)\n+\t\t\tput_page(p);\n+\t\tret = -ENOTRECOVERABLE;\n+\t\tgoto out;\n+\t}\n+\n try_again:\n \tif (!count_increased) {\n \t\tret = 
__get_hwpoison_page(p, flags);\n@@ -1432,7 +1484,7 @@ static int get_any_page(struct page *p, unsigned long flags)\n \t\tret = -EIO;\n \t}\n out:\n-\tif (ret == -EIO)\n+\tif (ret == -EIO || ret == -ENOTRECOVERABLE)\n \t\tpr_err(\"%#lx: unhandlable page.\\n\", page_to_pfn(p));\n \n \treturn ret;\n@@ -1489,7 +1541,10 @@ static int __get_unpoison_page(struct page *page)\n  *         -EIO for pages on which we can not handle memory errors,\n  *         -EBUSY when get_hwpoison_page() has raced with page lifecycle\n  *         operations like allocation and free,\n- *         -EHWPOISON when the page is hwpoisoned and taken off from buddy.\n+ *         -EHWPOISON when the page is hwpoisoned and taken off from buddy,\n+ *         -ENOTRECOVERABLE for kernel-owned pages identified by\n+ *         is_kernel_owned_page() (PG_reserved, slab,\n+ *         page-table, large-kmalloc) that the handler cannot recover.\n  */\n static int get_hwpoison_page(struct page *p, unsigned long flags)\n {\n@@ -2402,7 +2457,8 @@ int memory_failure(unsigned long pfn, int flags)\n \t * that may make page_ref_freeze()/page_ref_unfreeze() mismatch.\n \t */\n \tres = get_hwpoison_page(p, flags);\n-\tif (!res) {\n+\tswitch (res) {\n+\tcase 0:\n \t\tif (is_free_buddy_page(p)) {\n \t\t\tif (take_page_off_buddy(p)) {\n \t\t\t\tpage_ref_inc(p);\n@@ -2421,7 +2477,19 @@ int memory_failure(unsigned long pfn, int flags)\n \t\t\tres = action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);\n \t\t}\n \t\tgoto unlock_mutex;\n-\t} else if (res \u003c 0) {\n+\tcase 1:\n+\t\t/* Got a refcount on a handlable page. */\n+\t\tbreak;\n+\tcase -ENOTRECOVERABLE:\n+\t\t/*\n+\t\t * Stable unhandlable kernel-owned page (PG_reserved,\n+\t\t * slab, page tables, large-kmalloc).\n+\t\t * No recovery possible.\n+\t\t */\n+\t\tres = action_result(pfn, MF_MSG_KERNEL, MF_IGNORED);\n+\t\tgoto unlock_mutex;\n+\tdefault:\n+\t\t/* Transient lifecycle race with the page allocator. 
*/\n \t\tres = action_result(pfn, MF_MSG_GET_HWPOISON, MF_IGNORED);\n \t\tgoto unlock_mutex;\n \t}\ndiff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile\nindex e6df968f0971c8..ed321ae709dacf 100644\n--- a/tools/testing/selftests/mm/Makefile\n+++ b/tools/testing/selftests/mm/Makefile\n@@ -174,6 +174,10 @@ TEST_PROGS += ksft_userfaultfd.sh\n TEST_PROGS += ksft_vma_merge.sh\n TEST_PROGS += ksft_vmalloc.sh\n \n+# Destructive: every successful run panics the kernel.  Installed and\n+# kept executable, but not run from a default kselftest invocation.\n+TEST_PROGS_EXTENDED += hwpoison-panic.sh\n+\n TEST_FILES := test_vmalloc.sh\n TEST_FILES += test_hmm.sh\n TEST_FILES += va_high_addr_switch.sh\ndiff --git a/tools/testing/selftests/mm/hwpoison-panic.sh b/tools/testing/selftests/mm/hwpoison-panic.sh\nnew file mode 100755\nindex 00000000000000..aafc06e895d01d\n--- /dev/null\n+++ b/tools/testing/selftests/mm/hwpoison-panic.sh\n@@ -0,0 +1,249 @@\n+#!/bin/bash\n+# SPDX-License-Identifier: GPL-2.0\n+#\n+# Verify vm.panic_on_unrecoverable_memory_failure by injecting a hwpoison\n+# error on a kernel-owned page and confirming the kernel panics.\n+#\n+# Three \"kinds\" of kernel-owned page can be targeted, selectable via the\n+# first positional argument (default: rodata):\n+#\n+#   rodata  - a PG_reserved page in the kernel rodata range\n+#             (sourced from /proc/iomem \"Kernel rodata\").  Exercises\n+#             memory_failure() -\u003e get_any_page() on a PageReserved page.\n+#\n+#   slab    - a slab page found via /proc/kpageflags (KPF_SLAB).\n+#             Exercises memory_failure() -\u003e get_any_page() on a non\n+#             PG_reserved kernel-owned page.  
This path is what catches\n+#             regressions where get_any_page() collapses kernel-owned\n+#             pages into a transient -EIO instead of -ENOTRECOVERABLE.\n+#\n+#   pgtable - a page-table page found via /proc/kpageflags (KPF_PGTABLE).\n+#             Same path as slab, different page type.\n+#\n+# This test is DESTRUCTIVE: a successful run crashes the kernel.  It is\n+# meant to be executed inside a disposable VM (e.g. virtme-ng) with a\n+# serial console captured by the harness.  It is skipped unless the\n+# caller opts in via RUN_DESTRUCTIVE=1.\n+#\n+# Test passes externally: the kernel must panic with\n+#   \"Memory failure: \u003cpfn\u003e: unrecoverable page\"\n+# A return from the inject means no panic fired: that is a failure,\n+# unless the target PFN raced to a different page type before injection,\n+# in which case the run is inconclusive and is skipped.\n+#\n+# Author: Breno Leitao \u003cleitao@debian.org\u003e\n+\n+set -u\n+\n+ksft_skip=4\n+sysctl_path=/proc/sys/vm/panic_on_unrecoverable_memory_failure\n+inject_path=/sys/devices/system/memory/hard_offline_page\n+kpageflags_path=/proc/kpageflags\n+unpoison_path=/sys/kernel/debug/hwpoison/unpoison-pfn\n+\n+# /proc/kpageflags bit positions (see include/uapi/linux/kernel-page-flags.h)\n+KPF_SLAB=7\n+KPF_COMPOUND_TAIL=16\n+KPF_HWPOISON=19\n+KPF_NOPAGE=20\n+KPF_PGTABLE=26\n+KPF_RESERVED=32\n+\n+pagesize=$(getconf PAGE_SIZE)\n+\n+kind=${1:-rodata}\n+\n+ksft_print() { echo \"# $*\"; }\n+ksft_exit_skip() { ksft_print \"$*\"; exit \"$ksft_skip\"; }\n+ksft_exit_fail() { echo \"not ok 1 $*\"; exit 1; }\n+\n+if [ \"$(id -u)\" -ne 0 ]; then\n+\tksft_exit_skip \"must run as root\"\n+fi\n+\n+if [ ! -w \"$sysctl_path\" ]; then\n+\tksft_exit_skip \"$sysctl_path not present (kernel without the sysctl?)\"\n+fi\n+\n+if [ ! 
-w \"$inject_path\" ]; then\n+\tksft_exit_skip \"$inject_path not present (no MEMORY_HOTPLUG?)\"\n+fi\n+\n+if [ \"${RUN_DESTRUCTIVE:-0}\" != \"1\" ]; then\n+\tksft_exit_skip \"destructive test; re-run with RUN_DESTRUCTIVE=1 inside a disposable VM\"\n+fi\n+\n+# Pick a PFN inside the kernel image rodata region of /proc/iomem.\n+# This is preferred over a top-level \"Reserved\" entry because top-level\n+# Reserved ranges are often firmware holes that have no backing struct\n+# page; pfn_to_online_page() returns NULL on those and memory_failure()\n+# bails out with -ENXIO before reaching the panic path.\n+#\n+# \"Kernel rodata\" is reported as a sub-resource of \"System RAM\" on every\n+# major architecture, which guarantees:\n+#   - the PFN is backed by struct page (within an online memory range);\n+#   - PG_reserved is set on the page (kernel image area);\n+#   - the memory is read-only, so setting PG_hwpoison on it does not\n+#     corrupt writable kernel state if the panic somehow does not fire.\n+#\n+# /proc/iomem entries look like (indented for sub-resources):\n+#     \"  02500000-02ffffff : Kernel rodata\"\n+pick_rodata_phys_addr() {\n+\tawk -v pagesize=\"$(getconf PAGE_SIZE)\" '\n+\t# Convert a hex string to a number without relying on the gawk-only\n+\t# strtonum().  
mawk lacks it and would otherwise spuriously skip\n+\t# this test on distros that ship mawk as /usr/bin/awk.\n+\tfunction hex2num(s,   n, i, c, v) {\n+\t\tn = 0\n+\t\tfor (i = 1; i \u003c= length(s); i++) {\n+\t\t\tc = tolower(substr(s, i, 1))\n+\t\t\tv = index(\"0123456789abcdef\", c) - 1\n+\t\t\tif (v \u003c 0)\n+\t\t\t\treturn -1\n+\t\t\tn = n * 16 + v\n+\t\t}\n+\t\treturn n\n+\t}\n+\t/: Kernel rodata[[:space:]]*$/ {\n+\t\tsub(/^[[:space:]]+/, \"\")\n+\t\tn = split($0, a, /[- ]/)\n+\t\tstart = hex2num(a[1])\n+\t\tend   = hex2num(a[2])\n+\t\tif (end \u003c= start)\n+\t\t\tnext\n+\t\t# Page-align upward and emit the first byte of that page.\n+\t\tpfn = int((start + pagesize - 1) / pagesize)\n+\t\tprintf \"0x%x\\n\", pfn * pagesize\n+\t\texit 0\n+\t}\n+\t' /proc/iomem\n+}\n+\n+# Walk /proc/kpageflags and return the phys addr of the first PFN that\n+# has bit $1 set, with KPF_HWPOISON, KPF_NOPAGE and KPF_COMPOUND_TAIL\n+# all clear (so we attack a real, non-tail, not-already-poisoned page).\n+#\n+# We skip the first 16 MiB of PFNs to step past low-memory special\n+# ranges (BIOS/EFI/ACPI/etc.) 
that often are PG_reserved and would not\n+# exhibit the slab/pgtable type we are looking for.\n+pick_kpageflags_phys_addr() {\n+\tlocal want_bit=$1\n+\tlocal pagesize skip_pfn\n+\n+\t[ -r \"$kpageflags_path\" ] || return\n+\n+\tpagesize=$(getconf PAGE_SIZE)\n+\tskip_pfn=$(((16 * 1024 * 1024) / pagesize))\n+\n+\tod -An -tx8 -v -w8 -j \"$((skip_pfn * 8))\" \"$kpageflags_path\" 2\u003e/dev/null | \\\n+\tawk -v want_bit=\"$want_bit\" \\\n+\t    -v hwp_bit=\"$KPF_HWPOISON\" \\\n+\t    -v nopage_bit=\"$KPF_NOPAGE\" \\\n+\t    -v tail_bit=\"$KPF_COMPOUND_TAIL\" \\\n+\t    -v base_pfn=\"$skip_pfn\" \\\n+\t    -v pagesize=\"$pagesize\" '\n+\t# Test whether bit \"b\" is set in the 16-hex-digit value \"hex\".\n+\t# Done with substring + per-digit lookup so we never rely on awk\n+\t# bitwise operators (mawk lacks them), 64-bit FP precision or the\n+\t# gawk-only strtonum().\n+\tfunction bit_set(hex, b,    di, bi, c, v) {\n+\t\tdi = int(b / 4)\n+\t\tbi = b - di * 4\n+\t\tc = substr(hex, length(hex) - di, 1)\n+\t\tv = index(\"0123456789abcdef\", tolower(c)) - 1\n+\t\tif (bi == 0) return (v % 2) == 1\n+\t\tif (bi == 1) return int(v / 2) % 2 == 1\n+\t\tif (bi == 2) return int(v / 4) % 2 == 1\n+\t\treturn int(v / 8) % 2 == 1\n+\t}\n+\t{\n+\t\tgsub(/^[[:space:]]+/, \"\")\n+\t\th = $1\n+\t\tif (bit_set(h, want_bit) \u0026\u0026\n+\t\t    !bit_set(h, hwp_bit) \u0026\u0026\n+\t\t    !bit_set(h, nopage_bit) \u0026\u0026\n+\t\t    !bit_set(h, tail_bit)) {\n+\t\t\tpfn = base_pfn + NR - 1\n+\t\t\tprintf \"0x%x\\n\", pfn * pagesize\n+\t\t\texit 0\n+\t\t}\n+\t}\n+\t'\n+}\n+\n+# Return 0 if /proc/kpageflags bit $2 is set for PFN $1, 1 if it is\n+# clear, or 2 if the word cannot be read.  
Used to re-confirm the target\n+# page type after a non-panicking inject.\n+kpageflags_bit_set() {\n+\tlocal word\n+\n+\tword=$(od -An -tx8 -v -j \"$(($1 * 8))\" -N 8 \"$kpageflags_path\" 2\u003e/dev/null | tr -d '[:space:]')\n+\t[ -n \"$word\" ] || return 2\n+\t(( (16#$word \u003e\u003e $2) \u0026 1 ))\n+}\n+\n+# Best-effort: drop the PG_hwpoison marker set by the inject so a failed\n+# run does not leave a poisoned page behind.  hard_offline_page() injects\n+# with MF_SW_SIMULATED, so the page stays unpoisonable through the\n+# hwpoison debugfs interface (needs CONFIG_HWPOISON_INJECT + debugfs).\n+try_unpoison() {\n+\t[ -w \"$unpoison_path\" ] || return 0\n+\techo \"$1\" \u003e \"$unpoison_path\" 2\u003e/dev/null || true\n+}\n+\n+case \"$kind\" in\n+rodata)\n+\tphys_addr=$(pick_rodata_phys_addr)\n+\trecheck_bit=$KPF_RESERVED\n+\tmissing_msg='no \"Kernel rodata\" entry in /proc/iomem'\n+\t;;\n+slab)\n+\tphys_addr=$(pick_kpageflags_phys_addr \"$KPF_SLAB\")\n+\trecheck_bit=$KPF_SLAB\n+\tmissing_msg=\"no usable slab PFN found in $kpageflags_path\"\n+\t;;\n+pgtable)\n+\tphys_addr=$(pick_kpageflags_phys_addr \"$KPF_PGTABLE\")\n+\trecheck_bit=$KPF_PGTABLE\n+\tmissing_msg=\"no usable page-table PFN found in $kpageflags_path\"\n+\t;;\n+*)\n+\tksft_exit_fail \"unknown kind '$kind' (expected: rodata|slab|pgtable)\"\n+\t;;\n+esac\n+\n+if [ -z \"$phys_addr\" ]; then\n+\tksft_exit_skip \"$missing_msg\"\n+fi\n+\n+ksft_print \"enabling $sysctl_path\"\n+prior=$(cat \"$sysctl_path\")\n+echo 1 \u003e \"$sysctl_path\" || ksft_exit_fail \"failed to enable sysctl\"\n+\n+pfn=$((phys_addr / pagesize))\n+ksft_print \"injecting hwpoison at phys 0x$(printf '%x' \"$phys_addr\") (pfn 0x$(printf '%x' \"$pfn\"), kind=$kind)\"\n+ksft_print \"expecting kernel panic: 'Memory failure: \u003cpfn\u003e: unrecoverable page'\"\n+\n+# A successful run never returns from the inject -- it panics the kernel.\n+# Reaching the code below therefore means no panic fired.  
Note whether\n+# the write itself succeeded, then put the machine back: restore the\n+# sysctl and best-effort unpoison the page we just marked.\n+if echo \"$phys_addr\" \u003e \"$inject_path\"; then\n+\tverdict=\"inject returned without panic; sysctl ineffective\"\n+else\n+\tverdict=\"inject failed before reaching the panic path\"\n+fi\n+\n+echo \"$prior\" \u003e \"$sysctl_path\"\n+try_unpoison \"$pfn\"\n+\n+# The page type can change between selection and injection (e.g. a slab\n+# or page-table page is freed and reused).  Only treat a missing panic as\n+# a failure if the target PFN is still the kernel-owned type we aimed at;\n+# if it raced to another type the run is inconclusive, so skip instead.\n+kpageflags_bit_set \"$pfn\" \"$recheck_bit\"\n+case $? in\n+0)\tksft_exit_fail \"$verdict (page still $kind)\" ;;\n+1)\tksft_exit_skip \"target PFN no longer $kind; raced before inject, inconclusive\" ;;\n+*)\tksft_exit_fail \"$verdict (could not reconfirm page type via $kpageflags_path)\" ;;\n+esac\n"
}
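The selftest in the diff above selects and re-checks its target PFN through `/proc/kpageflags`, which holds one 64-bit flags word per PFN. It uses `od` plus a per-hex-digit awk lookup because mawk lacks bitwise operators. In C, the same checks reduce to a seek, a read and a shift. This sketch makes a few assumptions: the bit numbers are copied from the selftest, the file is read in native byte order, and `kpf_is_target()` / `kpageflags_bit_state()` are hypothetical names that mirror `pick_kpageflags_phys_addr()` and `kpageflags_bit_set()`.

```c
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64

#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>

/* Bit numbers copied from hwpoison-panic.sh. */
#define KPF_SLAB          7
#define KPF_COMPOUND_TAIL 16
#define KPF_HWPOISON      19
#define KPF_NOPAGE        20
#define KPF_PGTABLE       26
#define KPF_RESERVED      32

static bool kpf_bit(uint64_t flags, unsigned int bit)
{
	return (flags >> bit) & 1;
}

/*
 * Same filter as pick_kpageflags_phys_addr(): the wanted type bit is set and
 * the page is not already poisoned, not a hole and not a compound tail.
 */
static bool kpf_is_target(uint64_t flags, unsigned int want_bit)
{
	return kpf_bit(flags, want_bit) &&
	       !kpf_bit(flags, KPF_HWPOISON) &&
	       !kpf_bit(flags, KPF_NOPAGE) &&
	       !kpf_bit(flags, KPF_COMPOUND_TAIL);
}

/*
 * Same convention as kpageflags_bit_set(): 0 if the bit is set for pfn,
 * 1 if it is clear, 2 if the flags word cannot be read. Each PFN owns one
 * native-endian u64 at byte offset pfn * 8.
 */
static int kpageflags_bit_state(const char *path, uint64_t pfn, unsigned int bit)
{
	uint64_t word;
	int ret = 2;
	FILE *f = fopen(path, "rb");

	if (!f)
		return 2;
	if (fseeko(f, (off_t)(pfn * sizeof(word)), SEEK_SET) == 0 &&
	    fread(&word, sizeof(word), 1, f) == 1)
		ret = kpf_bit(word, bit) ? 0 : 1;
	fclose(f);
	return ret;
}
```

On a real system, `/proc/kpageflags` is readable only by root, so call `kpageflags_bit_state("/proc/kpageflags", pfn, KPF_SLAB)` from a privileged process. The 0/1/2 return values map directly onto the selftest's fail/skip/fail verdicts.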

gemini-3.1-pro-preview

You are an expert Linux kernel maintainer.
Your job is to review a provided patch series and determine
if it makes functional changes to the kernel that should be fuzzed.

IMPORTANT: The changes have ALREADY been applied and committed as the HEAD commit in
your workspace. Do NOT rely on your internal knowledge of the kernel. You must actively
use your code access tools to examine the actual source code and confirm any assumptions.

Return WorthFuzzing=false if the patch only contains:
- Modifications to Documentation/, Kconfig files, or code comments.
- Purely decorative changes, such as logging (e.g., pr_err, printk) or tracepoints.
- Changes to numeric constants or macros that do not functionally alter execution flow.
- Code paths that are impossible to reach in virtualized environments like GCE or QEMU,
even when utilizing software-emulated hardware (e.g., usb gadget, mac80211_hwsim).

If it modifies reachable core kernel logic, drivers, or architectures, use your code search
tools to verify the code can be executed, then return WorthFuzzing=true.

When returning WorthFuzzing=true, you MUST ALSO:
1. Extract any specific kernel functions that should be heavily fuzzed into FocusSymbols.
   Avoid listing generic hot-path functions to prevent skewed test distributions.
2. Identify any specific CONFIG_ options required to properly test this new/modified feature.
   Go and look into the Kconfig files and check for ifdefs around the code, do not make assumptions.
   Do not list too generic configs (we already have them enabled). Only list those that
   specifically cover the modified code. List them in the EnableConfigs output array,
   and DO NOT add a 'CONFIG_' prefix (e.g., return "NET_IPV4" instead of "CONFIG_NET_IPV4").
Prefer calling several tools at the same time to save round-trips.


Use set-results tool to provide results of the analysis.
It must be called exactly once before the final reply.
Ignore results of this tool.

For your convenience, here is the diff of the changes:
commit 581742259c41b2f1d7712a44b0ab008650850750
Author: syz-cluster <triage@syzkaller.com>
Date:   Fri Jun 26 23:05:19 2026 +0000

    syz-cluster: applied patch under review

diff --git a/Documentation/admin-guide/sysctl/vm.rst b/Documentation/admin-guide/sysctl/vm.rst
index b9b0c218bfb440..22cc54cac3b21d 100644
--- a/Documentation/admin-guide/sysctl/vm.rst
+++ b/Documentation/admin-guide/sysctl/vm.rst
@@ -67,6 +67,7 @@ Currently, these files are in /proc/sys/vm:
 - page-cluster
 - page_lock_unfairness
 - panic_on_oom
+- panic_on_unrecoverable_memory_failure
 - percpu_pagelist_high_fraction
 - stat_interval
 - stat_refresh
@@ -925,6 +926,85 @@ panic_on_oom=2+kdump gives you very strong tool to investigate
 why oom happens. You can get snapshot.
 
 
+panic_on_unrecoverable_memory_failure
+======================================
+
+When a hardware memory error (e.g. multi-bit ECC) hits a kernel page
+that cannot be recovered by the memory failure handler, the default
+behaviour is to ignore the error and continue operation.  This is
+dangerous because the corrupted data remains accessible to the kernel,
+risking silent data corruption or a delayed crash when the poisoned
+memory is next accessed.
+
+When enabled, this sysctl triggers a panic on memory failure events
+hitting kernel-owned pages that the handler cannot recover:
+``PageReserved`` (firmware reservations, kernel image, vDSO, zero
+page, and similar memblock-reserved regions), ``PageSlab``,
+``PageTable``, and ``PageLargeKmalloc``.  These are owned by the
+kernel and the memory failure handler cannot reliably evict their
+contents.
+
+Other unrecoverable kernel-owned populations (vmalloc allocations,
+kernel stack pages, ...) are not currently covered because the
+handler has no page-type signal that distinguishes them from a
+userspace folio temporarily off the LRU during migration or
+compaction.  Such pages still go through the standard
+MF_MSG_GET_HWPOISON path: ``PG_hwpoison`` is set on them and a
+delayed crash on the next access remains possible.  Coverage may
+grow as the handler gains stronger kernel-ownership signals.
+
+Recoverable failure paths are also intentionally left out: in-flight
+buddy allocations and other transient races with the page allocator
+can reach the same diagnostic, and panicking on them would risk
+killing the box for a page destined for userspace where the standard
+SIGBUS recovery path applies.  Pages whose state could not be
+classified at all are not covered either, since an unknown state is
+not a sound basis for a panic decision.
+
+For many environments it is preferable to panic immediately with a clean
+crash dump that captures the original error context, rather than to
+continue and face a random crash later whose cause is difficult to
+diagnose.
+
+Use cases
+---------
+
+This option is most useful in environments where unattributed crashes
+are expensive to debug or where data integrity must take precedence
+over availability:
+
+* Large fleets, where multi-bit ECC errors on kernel pages are observed
+  regularly and post-mortem analysis of an unrelated downstream crash
+  (often seconds to minutes after the original error) consumes
+  significant engineering effort.
+
+* Systems configured with kdump, where panicking at the moment of the
+  hardware error produces a vmcore that still contains the faulting
+  address, the affected page state, and the originating MCE/GHES
+  record — context that is typically lost by the time a delayed crash
+  occurs.
+
+* High-availability clusters that rely on fast, deterministic node
+  failure for failover, and prefer an immediate panic over silent data
+  corruption propagating to replicas or persistent storage.
+
+* Kernel and platform developers reproducing hwpoison issues with
+  tools such as ``mce-inject`` or error-injection debugfs interfaces,
+  where panicking on the unrecoverable path makes regressions
+  immediately visible instead of surfacing as later, unrelated
+  failures.
+
+= =====================================================================
+0 Try to continue operation (default).
+1 Panic immediately.  If the ``panic`` sysctl is also non-zero then the
+  machine will be rebooted.
+= =====================================================================
+
+Example::
+
+     echo 1 > /proc/sys/vm/panic_on_unrecoverable_memory_failure
+
+
 percpu_pagelist_high_fraction
 =============================
 
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 51508a55c4055e..611160c98c6f64 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -74,6 +74,8 @@ static int sysctl_memory_failure_recovery __read_mostly = 1;
 
 static int sysctl_enable_soft_offline __read_mostly = 1;
 
+static int sysctl_panic_on_unrecoverable_mf __read_mostly;
+
 atomic_long_t num_poisoned_pages __read_mostly = ATOMIC_LONG_INIT(0);
 
 static bool hw_memory_failure __read_mostly = false;
@@ -155,6 +157,15 @@ static const struct ctl_table memory_failure_table[] = {
 		.proc_handler	= proc_dointvec_minmax,
 		.extra1		= SYSCTL_ZERO,
 		.extra2		= SYSCTL_ONE,
+	},
+	{
+		.procname	= "panic_on_unrecoverable_memory_failure",
+		.data		= &sysctl_panic_on_unrecoverable_mf,
+		.maxlen		= sizeof(sysctl_panic_on_unrecoverable_mf),
+		.mode		= 0644,
+		.proc_handler	= proc_dointvec_minmax,
+		.extra1		= SYSCTL_ZERO,
+		.extra2		= SYSCTL_ONE,
 	}
 };
 
@@ -980,17 +991,6 @@ static bool has_extra_refcount(struct page_state *ps, struct page *p,
 	return false;
 }
 
-/*
- * Error hit kernel page.
- * Do nothing, try to be lucky and not touch this instead. For a few cases we
- * could be more sophisticated.
- */
-static int me_kernel(struct page_state *ps, struct page *p)
-{
-	unlock_page(p);
-	return MF_IGNORED;
-}
-
 /*
  * Page in unknown state. Do nothing.
  * This is a catch-all in case we fail to make sense of the page state.
@@ -1199,10 +1199,8 @@ static int me_huge_page(struct page_state *ps, struct page *p)
 #define mlock		(1UL << PG_mlocked)
 #define lru		(1UL << PG_lru)
 #define head		(1UL << PG_head)
-#define reserved	(1UL << PG_reserved)
 
 static struct page_state error_states[] = {
-	{ reserved,	reserved,	MF_MSG_KERNEL,	me_kernel },
 	/*
 	 * free pages are specially detected outside this table:
 	 * PG_buddy pages only make a small fraction of all free pages.
@@ -1234,7 +1232,6 @@ static struct page_state error_states[] = {
 #undef mlock
 #undef lru
 #undef head
-#undef reserved
 
 static void update_per_node_mf_stats(unsigned long pfn,
 				     enum mf_result result)
@@ -1269,6 +1266,15 @@ static void update_per_node_mf_stats(unsigned long pfn,
 	++mf_stats->total;
 }
 
+static bool panic_on_unrecoverable_mf(enum mf_action_page_type type,
+				      enum mf_result result)
+{
+	if (!sysctl_panic_on_unrecoverable_mf)
+		return false;
+
+	return type == MF_MSG_KERNEL && result == MF_IGNORED;
+}
+
 /*
  * "Dirty/Clean" indication is not 100% accurate due to the possibility of
  * setting PG_dirty outside page lock. See also comment above set_page_dirty().
@@ -1286,6 +1292,9 @@ static int action_result(unsigned long pfn, enum mf_action_page_type type,
 	pr_err("%#lx: recovery action for %s: %s\n",
 		pfn, action_page_types[type], action_name[result]);
 
+	if (panic_on_unrecoverable_mf(type, result))
+		panic("Memory failure: %#lx: unrecoverable page", pfn);
+
 	return (result == MF_RECOVERED || result == MF_DELAYED) ? 0 : -EBUSY;
 }
 
@@ -1339,6 +1348,36 @@ static inline bool HWPoisonHandlable(struct page *page, unsigned long flags)
 	return PageLRU(page) || is_free_buddy_page(page);
 }
 
+/*
+ * Positive identification of pages the hwpoison handler cannot recover:
+ * pages owned by kernel internals with no userspace mapping to unmap, no
+ * file mapping to invalidate, and no migration target.
+ */
+static inline bool is_kernel_owned_page(struct page *page)
+{
+	struct page *head;
+	bool kernel_owned;
+
+	/* PG_reserved is a per-page flag, never set on a compound page. */
+	if (PageReserved(page))
+		return true;
+
+	/*
+	 * Page-type bits live only on the head page, so resolve any tail
+	 * first.  The check takes no refcount; recheck the head afterwards
+	 * so a concurrent split or compound free cannot leave us trusting
+	 * a stale view.  A free->alloc->free in the same window is still
+	 * possible but closing it would require taking a reference here.
+	 */
+retry:
+	head = compound_head(page);
+	kernel_owned = PageSlab(head) || PageTable(head) ||
+		       PageLargeKmalloc(head);
+	if (head != compound_head(page))
+		goto retry;
+	return kernel_owned;
+}
+
 static int __get_hwpoison_page(struct page *page, unsigned long flags)
 {
 	struct folio *folio = page_folio(page);
@@ -1385,6 +1424,19 @@ static int get_any_page(struct page *p, unsigned long flags)
 	if (flags & MF_COUNT_INCREASED)
 		count_increased = true;
 
+	/*
+	 * Page types we know are kernel-owned and cannot be recovered.
+	 * Short-circuit before the shake_page() / retry loop, which
+	 * cannot turn any of these into something HWPoisonHandlable().
+	 * Drop the caller's reference if MF_COUNT_INCREASED took one.
+	 */
+	if (is_kernel_owned_page(p)) {
+		if (count_increased)
+			put_page(p);
+		ret = -ENOTRECOVERABLE;
+		goto out;
+	}
+
 try_again:
 	if (!count_increased) {
 		ret = __get_hwpoison_page(p, flags);
@@ -1432,7 +1484,7 @@ static int get_any_page(struct page *p, unsigned long flags)
 		ret = -EIO;
 	}
 out:
-	if (ret == -EIO)
+	if (ret == -EIO || ret == -ENOTRECOVERABLE)
 		pr_err("%#lx: unhandlable page.\n", page_to_pfn(p));
 
 	return ret;
@@ -1489,7 +1541,10 @@ static int __get_unpoison_page(struct page *page)
  *         -EIO for pages on which we can not handle memory errors,
  *         -EBUSY when get_hwpoison_page() has raced with page lifecycle
  *         operations like allocation and free,
- *         -EHWPOISON when the page is hwpoisoned and taken off from buddy.
+ *         -EHWPOISON when the page is hwpoisoned and taken off from buddy,
+ *         -ENOTRECOVERABLE for kernel-owned pages identified by
+ *         is_kernel_owned_page() (PG_reserved, slab,
+ *         page-table, large-kmalloc) that the handler cannot recover.
  */
 static int get_hwpoison_page(struct page *p, unsigned long flags)
 {
@@ -2402,7 +2457,8 @@ int memory_failure(unsigned long pfn, int flags)
 	 * that may make page_ref_freeze()/page_ref_unfreeze() mismatch.
 	 */
 	res = get_hwpoison_page(p, flags);
-	if (!res) {
+	switch (res) {
+	case 0:
 		if (is_free_buddy_page(p)) {
 			if (take_page_off_buddy(p)) {
 				page_ref_inc(p);
@@ -2421,7 +2477,19 @@ int memory_failure(unsigned long pfn, int flags)
 			res = action_result(pfn, MF_MSG_KERNEL_HIGH_ORDER, MF_IGNORED);
 		}
 		goto unlock_mutex;
-	} else if (res < 0) {
+	case 1:
+		/* Got a refcount on a handlable page. */
+		break;
+	case -ENOTRECOVERABLE:
+		/*
+		 * Stable unhandlable kernel-owned page (PG_reserved,
+		 * slab, page tables, large-kmalloc).
+		 * No recovery possible.
+		 */
+		res = action_result(pfn, MF_MSG_KERNEL, MF_IGNORED);
+		goto unlock_mutex;
+	default:
+		/* Transient lifecycle race with the page allocator. */
 		res = action_result(pfn, MF_MSG_GET_HWPOISON, MF_IGNORED);
 		goto unlock_mutex;
 	}
diff --git a/tools/testing/selftests/mm/Makefile b/tools/testing/selftests/mm/Makefile
index e6df968f0971c8..ed321ae709dacf 100644
--- a/tools/testing/selftests/mm/Makefile
+++ b/tools/testing/selftests/mm/Makefile
@@ -174,6 +174,10 @@ TEST_PROGS += ksft_userfaultfd.sh
 TEST_PROGS += ksft_vma_merge.sh
 TEST_PROGS += ksft_vmalloc.sh
 
+# Destructive: every successful run panics the kernel.  Installed and
+# kept executable, but not run from a default kselftest invocation.
+TEST_PROGS_EXTENDED += hwpoison-panic.sh
+
 TEST_FILES := test_vmalloc.sh
 TEST_FILES += test_hmm.sh
 TEST_FILES += va_high_addr_switch.sh
diff --git a/tools/testing/selftests/mm/hwpoison-panic.sh b/tools/testing/selftests/mm/hwpoison-panic.sh
new file mode 100755
index 00000000000000..aafc06e895d01d
--- /dev/null
+++ b/tools/testing/selftests/mm/hwpoison-panic.sh
@@ -0,0 +1,249 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Verify vm.panic_on_unrecoverable_memory_failure by injecting a hwpoison
+# error on a kernel-owned page and confirming the kernel panics.
+#
+# Three "kinds" of kernel-owned page can be targeted, selectable via the
+# first positional argument (default: rodata):
+#
+#   rodata  - a PG_reserved page in the kernel rodata range
+#             (sourced from /proc/iomem "Kernel rodata").  Exercises
+#             memory_failure() -> get_any_page() on a PageReserved page.
+#
+#   slab    - a slab page found via /proc/kpageflags (KPF_SLAB).
+#             Exercises memory_failure() -> get_any_page() on a non
+#             PG_reserved kernel-owned page.  This path is what catches
+#             regressions where get_any_page() collapses kernel-owned
+#             pages into a transient -EIO instead of -ENOTRECOVERABLE.
+#
+#   pgtable - a page-table page found via /proc/kpageflags (KPF_PGTABLE).
+#             Same path as slab, different page type.
+#
+# This test is DESTRUCTIVE: a successful run crashes the kernel.  It is
+# meant to be executed inside a disposable VM (e.g. virtme-ng) with a
+# serial console captured by the harness.  It is skipped unless the
+# caller opts in via RUN_DESTRUCTIVE=1.
+#
+# Test passes externally: the kernel must panic with
+#   "Memory failure: <pfn>: unrecoverable page"
+# A return from the inject means no panic fired: that is a failure,
+# unless the target PFN raced to a different page type before injection,
+# in which case the run is inconclusive and is skipped.
+#
+# Author: Breno Leitao <leitao@debian.org>
+
+set -u
+
+ksft_skip=4
+sysctl_path=/proc/sys/vm/panic_on_unrecoverable_memory_failure
+inject_path=/sys/devices/system/memory/hard_offline_page
+kpageflags_path=/proc/kpageflags
+unpoison_path=/sys/kernel/debug/hwpoison/unpoison-pfn
+
+# /proc/kpageflags bit positions (see include/uapi/linux/kernel-page-flags.h)
+KPF_SLAB=7
+KPF_COMPOUND_TAIL=16
+KPF_HWPOISON=19
+KPF_NOPAGE=20
+KPF_PGTABLE=26
+KPF_RESERVED=32
+
+pagesize=$(getconf PAGE_SIZE)
+
+kind=${1:-rodata}
+
+ksft_print() { echo "# $*"; }
+ksft_exit_skip() { ksft_print "$*"; exit "$ksft_skip"; }
+ksft_exit_fail() { echo "not ok 1 $*"; exit 1; }
+
+if [ "$(id -u)" -ne 0 ]; then
+	ksft_exit_skip "must run as root"
+fi
+
+if [ ! -w "$sysctl_path" ]; then
+	ksft_exit_skip "$sysctl_path not present (kernel without the sysctl?)"
+fi
+
+if [ ! -w "$inject_path" ]; then
+	ksft_exit_skip "$inject_path not present (no MEMORY_HOTPLUG?)"
+fi
+
+if [ "${RUN_DESTRUCTIVE:-0}" != "1" ]; then
+	ksft_exit_skip "destructive test; re-run with RUN_DESTRUCTIVE=1 inside a disposable VM"
+fi
+
+# Pick a PFN inside the kernel image rodata region of /proc/iomem.
+# This is preferred over a top-level "Reserved" entry because top-level
+# Reserved ranges are often firmware holes that have no backing struct
+# page; pfn_to_online_page() returns NULL on those and memory_failure()
+# bails out with -ENXIO before reaching the panic path.
+#
+# "Kernel rodata" is reported as a sub-resource of "System RAM" on every
+# major architecture, which guarantees:
+#   - the PFN is backed by struct page (within an online memory range);
+#   - PG_reserved is set on the page (kernel image area);
+#   - the memory is read-only, so setting PG_hwpoison on it does not
+#     corrupt writable kernel state if the panic somehow does not fire.
+#
+# /proc/iomem entries look like (indented for sub-resources):
+#     "  02500000-02ffffff : Kernel rodata"
+pick_rodata_phys_addr() {
+	awk -v pagesize="$(getconf PAGE_SIZE)" '
+	# Convert a hex string to a number without relying on the gawk-only
+	# strtonum().  mawk lacks it and would otherwise spuriously skip
+	# this test on distros that ship mawk as /usr/bin/awk.
+	function hex2num(s,   n, i, c, v) {
+		n = 0
+		for (i = 1; i <= length(s); i++) {
+			c = tolower(substr(s, i, 1))
+			v = index("0123456789abcdef", c) - 1
+			if (v < 0)
+				return -1
+			n = n * 16 + v
+		}
+		return n
+	}
+	/: Kernel rodata[[:space:]]*$/ {
+		sub(/^[[:space:]]+/, "")
+		n = split($0, a, /[- ]/)
+		start = hex2num(a[1])
+		end   = hex2num(a[2])
+		if (end <= start)
+			next
+		# Page-align upward and emit the first byte of that page.
+		pfn = int((start + pagesize - 1) / pagesize)
+		printf "0x%x\n", pfn * pagesize
+		exit 0
+	}
+	' /proc/iomem
+}
+
+# Walk /proc/kpageflags and return the phys addr of the first PFN that
+# has bit $1 set, with KPF_HWPOISON, KPF_NOPAGE and KPF_COMPOUND_TAIL
+# all clear (so we attack a real, non-tail, not-already-poisoned page).
+#
+# We skip the first 16 MiB of PFNs to step past low-memory special
+# ranges (BIOS/EFI/ACPI/etc.) that often are PG_reserved and would not
+# exhibit the slab/pgtable type we are looking for.
+pick_kpageflags_phys_addr() {
+	local want_bit=$1
+	local pagesize skip_pfn
+
+	[ -r "$kpageflags_path" ] || return
+
+	pagesize=$(getconf PAGE_SIZE)
+	skip_pfn=$(((16 * 1024 * 1024) / pagesize))
+
+	od -An -tx8 -v -w8 -j "$((skip_pfn * 8))" "$kpageflags_path" 2>/dev/null | \
+	awk -v want_bit="$want_bit" \
+	    -v hwp_bit="$KPF_HWPOISON" \
+	    -v nopage_bit="$KPF_NOPAGE" \
+	    -v tail_bit="$KPF_COMPOUND_TAIL" \
+	    -v base_pfn="$skip_pfn" \
+	    -v pagesize="$pagesize" '
+	# Test whether bit "b" is set in the 16-hex-digit value "hex".
+	# Done with substring + per-digit lookup so we never rely on awk
+	# bitwise operators (mawk lacks them), 64-bit FP precision or the
+	# gawk-only strtonum().
+	function bit_set(hex, b,    di, bi, c, v) {
+		di = int(b / 4)
+		bi = b - di * 4
+		c = substr(hex, length(hex) - di, 1)
+		v = index("0123456789abcdef", tolower(c)) - 1
+		if (bi == 0) return (v % 2) == 1
+		if (bi == 1) return int(v / 2) % 2 == 1
+		if (bi == 2) return int(v / 4) % 2 == 1
+		return int(v / 8) % 2 == 1
+	}
+	{
+		gsub(/^[[:space:]]+/, "")
+		h = $1
+		if (bit_set(h, want_bit) &&
+		    !bit_set(h, hwp_bit) &&
+		    !bit_set(h, nopage_bit) &&
+		    !bit_set(h, tail_bit)) {
+			pfn = base_pfn + NR - 1
+			printf "0x%x\n", pfn * pagesize
+			exit 0
+		}
+	}
+	'
+}
+
+# Return 0 if /proc/kpageflags bit $2 is set for PFN $1, 1 if it is
+# clear, or 2 if the word cannot be read.  Used to re-confirm the target
+# page type after a non-panicking inject.
+kpageflags_bit_set() {
+	local word
+
+	word=$(od -An -tx8 -v -j "$(($1 * 8))" -N 8 "$kpageflags_path" 2>/dev/null | tr -d '[:space:]')
+	[ -n "$word" ] || return 2
+	(( (16#$word >> $2) & 1 ))
+}
+
+# Best-effort: drop the PG_hwpoison marker set by the inject so a failed
+# run does not leave a poisoned page behind.  hard_offline_page() injects
+# with MF_SW_SIMULATED, so the page stays unpoisonable through the
+# hwpoison debugfs interface (needs CONFIG_HWPOISON_INJECT + debugfs).
+try_unpoison() {
+	[ -w "$unpoison_path" ] || return 0
+	echo "$1" > "$unpoison_path" 2>/dev/null || true
+}
+
+case "$kind" in
+rodata)
+	phys_addr=$(pick_rodata_phys_addr)
+	recheck_bit=$KPF_RESERVED
+	missing_msg='no "Kernel rodata" entry in /proc/iomem'
+	;;
+slab)
+	phys_addr=$(pick_kpageflags_phys_addr "$KPF_SLAB")
+	recheck_bit=$KPF_SLAB
+	missing_msg="no usable slab PFN found in $kpageflags_path"
+	;;
+pgtable)
+	phys_addr=$(pick_kpageflags_phys_addr "$KPF_PGTABLE")
+	recheck_bit=$KPF_PGTABLE
+	missing_msg="no usable page-table PFN found in $kpageflags_path"
+	;;
+*)
+	ksft_exit_fail "unknown kind '$kind' (expected: rodata|slab|pgtable)"
+	;;
+esac
+
+if [ -z "$phys_addr" ]; then
+	ksft_exit_skip "$missing_msg"
+fi
+
+ksft_print "enabling $sysctl_path"
+prior=$(cat "$sysctl_path")
+echo 1 > "$sysctl_path" || ksft_exit_fail "failed to enable sysctl"
+
+pfn=$((phys_addr / pagesize))
+ksft_print "injecting hwpoison at phys 0x$(printf '%x' "$phys_addr") (pfn 0x$(printf '%x' "$pfn"), kind=$kind)"
+ksft_print "expecting kernel panic: 'Memory failure: <pfn>: unrecoverable page'"
+
+# A successful run never returns from the inject -- it panics the kernel.
+# Reaching the code below therefore means no panic fired.  Note whether
+# the write itself succeeded, then put the machine back: restore the
+# sysctl and best-effort unpoison the page we just marked.
+if echo "$phys_addr" > "$inject_path"; then
+	verdict="inject returned without panic; sysctl ineffective"
+else
+	verdict="inject failed before reaching the panic path"
+fi
+
+echo "$prior" > "$sysctl_path"
+try_unpoison "$pfn"
+
+# The page type can change between selection and injection (e.g. a slab
+# or page-table page is freed and reused).  Only treat a missing panic as
+# a failure if the target PFN is still the kernel-owned type we aimed at;
+# if it raced to another type the run is inconclusive, so skip instead.
+kpageflags_bit_set "$pfn" "$recheck_bit"
+case $? in
+0)	ksft_exit_fail "$verdict (page still $kind)" ;;
+1)	ksft_exit_skip "target PFN no longer $kind; raced before inject, inconclusive" ;;
+*)	ksft_exit_fail "$verdict (could not reconfirm page type via $kpageflags_path)" ;;
+esac
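
The selftest's awk `bit_set()` decodes the flags word one hex digit at a time because mawk has no bitwise operators. In C, the same PFN filter used by `pick_kpageflags_phys_addr()` is just a shift and mask on the raw 64-bit word. The helpers below (`kpf_test()`, `kpf_usable_target()`, `kpf_read()`) are hypothetical and are not part of the patch. The bit numbers are copied from the selftest. `kpf_read()` assumes the caller already holds a file descriptor for `/proc/kpageflags`, which is an array of native-endian `u64` values indexed by PFN and readable only by root.

```c
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Bit positions as listed in the selftest (kernel-page-flags.h). */
#define KPF_SLAB           7
#define KPF_COMPOUND_TAIL 16
#define KPF_HWPOISON      19
#define KPF_NOPAGE        20
#define KPF_PGTABLE       26
#define KPF_RESERVED      32

/* Hypothetical helper: is @bit set in the 64-bit flags word? */
static bool kpf_test(uint64_t flags, unsigned int bit)
{
	return (flags >> bit) & 1;
}

/*
 * Same filter as pick_kpageflags_phys_addr(): the wanted type bit is
 * set, and the page is not already poisoned, not a hole (NOPAGE) and
 * not a compound tail.
 */
static bool kpf_usable_target(uint64_t flags, unsigned int want_bit)
{
	return kpf_test(flags, want_bit) &&
	       !kpf_test(flags, KPF_HWPOISON) &&
	       !kpf_test(flags, KPF_NOPAGE) &&
	       !kpf_test(flags, KPF_COMPOUND_TAIL);
}

/*
 * Read the flags word for @pfn from an open /proc/kpageflags fd.
 * Each PFN occupies one u64 at offset pfn * 8.
 * Returns 0 on success, -1 on a short or failed read.
 */
static int kpf_read(int fd, uint64_t pfn, uint64_t *flags)
{
	ssize_t n = pread(fd, flags, sizeof(*flags),
			  (off_t)(pfn * sizeof(*flags)));

	return n == (ssize_t)sizeof(*flags) ? 0 : -1;
}
```

A caller would start `pfn` at the selftest's 16 MiB skip point, call `kpf_read()` for each PFN, and stop at the first word for which `kpf_usable_target(flags, KPF_SLAB)` holds.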

gemini-3.1-pro-preview

							input: 8898
							output: 25
							thoughts: 524

**Analyzing the "Panic on Unrecoverable Memory Failure" Patch**

Okay, so I'm looking at this patch that introduces a new sysctl, `panic_on_unrecoverable_memory_failure`. The goal is to make the kernel panic when a memory failure occurs on kernel-owned pages that the regular error handler can't fix. The code modifies `mm/memory-failure.c` and adds a self-test, which is good. The core of it is the new `is_kernel_owned_page()` function, which decides whether a page is kernel-owned by checking `PageReserved`, `PageSlab`, `PageTable`, or `PageLargeKmalloc`. If it is, `get_any_page()` returns `-ENOTRECOVERABLE`. `memory_failure()` then handles this error code by calling `action_result()`, which has been updated to call the new `panic_on_unrecoverable_mf()` helper and panics if that returns true.

This directly touches core memory failure handling, so reachability is critical. I'm thinking this could be triggered by genuine hardware memory errors, of course, or even by injecting faults via software. `madvise(MADV_HWPOISON)` immediately comes to mind, but that usually operates on user-space pages. However, maybe `/sys/devices/system/memory/hard_offline_page` or something in debugfs could be used to inject it onto kernel pages. The self-test, in fact, does use that `/sys/devices/system/memory/hard_offline_page` method, which is a good sign.

The question then is, how do we fuzz this? Syzkaller seems like the obvious choice. It has `syz_inject_hwpoison` or something similar that could target this, and it *can* write to `/sys/devices/system/memory/hard_offline_page`. However, I need to remember that Syzkaller doesn't typically focus on hardware error injection unless specifically configured. But it does have some `madvise(MADV_HWPOISON)` support. This is good to keep in mind.

Wait, I just remembered the default: the sysctl is disabled by default! `sysctl_panic_on_unrecoverable_mf` defaults to zero, so to effectively fuzz this, we'll *have* to enable that sysctl first. Luckily, Syzkaller *can* write to `/proc/sys/vm/panic_on_unrecoverable_memory_failure`.

I should probably dive into the `mm/memory-failure.c` code directly now, and use `grepper` to check where the `MEMORY_FAILURE` config option gates any of this.
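
The chain traced in these notes ends in a two-condition check. The sketch below is a hypothetical userspace model, not kernel code. It copies the predicate of `panic_on_unrecoverable_mf()` from the diff. The trimmed enums are stand-ins for the kernel's, and `write_panic_sysctl()` is an invented helper that mimics the bounds `proc_dointvec_minmax` enforces for the new table entry (`.extra1 = SYSCTL_ZERO`, `.extra2 = SYSCTL_ONE`). It shows two things. Nothing panics while the sysctl is 0. Once the sysctl is 1, only the (`MF_MSG_KERNEL`, `MF_IGNORED`) pair panics, and the `MF_MSG_KERNEL_HIGH_ORDER` and `MF_MSG_GET_HWPOISON` reports stay non-fatal.

```c
#include <errno.h>
#include <stdbool.h>

/*
 * Userspace model only: a subset of the kernel enums named in the
 * patch.  The real definitions and values live in kernel headers.
 */
enum mf_action_page_type {
	MF_MSG_KERNEL,
	MF_MSG_KERNEL_HIGH_ORDER,
	MF_MSG_GET_HWPOISON,
};
enum mf_result { MF_IGNORED, MF_FAILED, MF_DELAYED, MF_RECOVERED };

/* Zero-initialised, so the panic path is off by default. */
static int sysctl_panic_on_unrecoverable_mf;

/*
 * Hypothetical helper modelling proc_dointvec_minmax() with
 * extra1 = 0 and extra2 = 1: out-of-range writes fail with -EINVAL
 * and leave the stored value unchanged.
 */
static int write_panic_sysctl(int val)
{
	if (val < 0 || val > 1)
		return -EINVAL;
	sysctl_panic_on_unrecoverable_mf = val;
	return 0;
}

/* Same predicate as panic_on_unrecoverable_mf() in the diff. */
static bool panic_on_unrecoverable_mf(enum mf_action_page_type type,
				      enum mf_result result)
{
	if (!sysctl_panic_on_unrecoverable_mf)
		return false;
	return type == MF_MSG_KERNEL && result == MF_IGNORED;
}
```

For example, after `write_panic_sysctl(1)`, `panic_on_unrecoverable_mf(MF_MSG_KERNEL, MF_IGNORED)` returns true. A `write_panic_sysctl(2)` fails with `-EINVAL` and leaves the value unchanged, assuming the minmax handler behaves as modeled.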

{
  "Expression": "MEMORY_FAILURE",
  "PathPrefix": "mm/"
}

{
  "Output": "mm/Kconfig=702=config DEFAULT_MMAP_MIN_ADDR\n--\nmm/Kconfig-720-\nmm/Kconfig:721:config ARCH_SUPPORTS_MEMORY_FAILURE\nmm/Kconfig-722-\tbool\nmm/Kconfig-723-\nmm/Kconfig:724:config MEMORY_FAILURE\nmm/Kconfig-725-\tdepends on MMU\nmm/Kconfig:726:\tdepends on ARCH_SUPPORTS_MEMORY_FAILURE\nmm/Kconfig-727-\tbool \"Enable recovery from hardware memory errors\"\n--\nmm/Kconfig=735=config HWPOISON_INJECT\nmm/Kconfig-736-\ttristate \"HWPoison pages injector\"\nmm/Kconfig:737:\tdepends on MEMORY_FAILURE \u0026\u0026 DEBUG_KERNEL \u0026\u0026 PROC_FS\nmm/Kconfig-738-\tselect PROC_PAGE_MONITOR\n--\nmm/Kconfig.debug=147=config PAGE_POISONING\n--\nmm/Kconfig.debug-156-\t  Note that \"poison\" here is not the same thing as the \"HWPoison\"\nmm/Kconfig.debug:157:\t  for CONFIG_MEMORY_FAILURE. This is software poisoning only.\nmm/Kconfig.debug-158-\n--\nmm/Makefile=111=obj-$(CONFIG_DMAPOOL_TEST) += dmapool_test.o\nmm/Makefile:112:obj-$(CONFIG_MEMORY_FAILURE) += memory-failure.o\nmm/Makefile-113-obj-$(CONFIG_HWPOISON_INJECT) += hwpoison-inject.o\n--\nmm/internal.h=1400=static inline bool node_reclaim_enabled(void)\n--\nmm/internal.h-1408- */\nmm/internal.h:1409:#ifdef CONFIG_MEMORY_FAILURE\nmm/internal.h-1410-int unmap_poisoned_folio(struct folio *folio, unsigned long pfn, bool must_kill);\n--\nmm/ksm.c=3155=void rmap_walk_ksm(struct folio *folio, struct rmap_walk_control *rwc)\n--\nmm/ksm.c-3223-\nmm/ksm.c:3224:#ifdef CONFIG_MEMORY_FAILURE\nmm/ksm.c-3225-/*\n--\nmm/madvise.c=1339=static int madvise_vma_behavior(struct madvise_behavior *madv_behavior)\n--\nmm/madvise.c-1443-\nmm/madvise.c:1444:#ifdef CONFIG_MEMORY_FAILURE\nmm/madvise.c-1445-/*\n--\nmm/madvise.c=1511=static bool is_memory_failure(struct madvise_behavior *madv_behavior)\n--\nmm/madvise.c-1515-\nmm/madvise.c:1516:#endif\t/* CONFIG_MEMORY_FAILURE */\nmm/madvise.c-1517-\n--\nmm/madvise.c=1519=madvise_behavior_valid(int behavior)\n--\nmm/madvise.c-1550-\tcase MADV_GUARD_REMOVE:\nmm/madvise.c:1551:#ifdef CONFIG_MEMORY_FAILURE\nmm/madvise.c-1552-\tcase MADV_SOFT_OFFLINE:\n--\nmm/memory-failure.c=2618=EXPORT_SYMBOL_GPL(memory_failure);\nmm/memory-failure.c-2619-\nmm/memory-failure.c:2620:#define MEMORY_FAILURE_FIFO_ORDER\t4\nmm/memory-failure.c:2621:#define MEMORY_FAILURE_FIFO_SIZE\t(1 \u003c\u003c MEMORY_FAILURE_FIFO_ORDER)\nmm/memory-failure.c-2622-\n--\nmm/memory-failure.c=2628=struct memory_failure_cpu {\nmm/memory-failure.c-2629-\tDECLARE_KFIFO(fifo, struct memory_failure_entry,\nmm/memory-failure.c:2630:\t\t      MEMORY_FAILURE_FIFO_SIZE);\nmm/memory-failure.c-2631-\traw_spinlock_t lock;\n--\nmm/memory-failure.c=2848=static int soft_offline_in_use_page(struct page *page)\n--\nmm/memory-failure.c-2859-\t\t.gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,\nmm/memory-failure.c:2860:\t\t.reason = MR_MEMORY_FAILURE,\nmm/memory-failure.c-2861-\t};\n--\nmm/memory-failure.c-2917-\t\tret = migrate_pages(\u0026pagelist, alloc_migration_target, NULL,\nmm/memory-failure.c:2918:\t\t\t(unsigned long)\u0026mtc, MIGRATE_SYNC, MR_MEMORY_FAILURE, NULL);\nmm/memory-failure.c-2919-\t\tif (!ret) {\n--\nmm/migrate.c=1190=static void migrate_folio_done(struct folio *src,\n--\nmm/migrate.c-1196-\nmm/migrate.c:1197:\tif (reason != MR_MEMORY_FAILURE)\nmm/migrate.c-1198-\t\t/* We release the page in page_handle_poison. 
*/\n--\nmm/page_alloc.c=7609=EXPORT_SYMBOL(is_free_buddy_page);\nmm/page_alloc.c-7610-\nmm/page_alloc.c:7611:#ifdef CONFIG_MEMORY_FAILURE\nmm/page_alloc.c-7612-static inline void add_to_free_list(struct page *page, struct zone *zone,\n--\nmm/page_vma_mapped.c=180=bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)\n--\nmm/page_vma_mapped.c-334-\nmm/page_vma_mapped.c:335:#ifdef CONFIG_MEMORY_FAILURE\nmm/page_vma_mapped.c-336-/**\n--\nmm/show_mem.c=405=void __show_mem(unsigned int filter, nodemask_t *nodemask, int max_zone_idx)\n--\nmm/show_mem.c-427-#endif\nmm/show_mem.c:428:#ifdef CONFIG_MEMORY_FAILURE\nmm/show_mem.c-429-\tprintk(\"%lu pages hwpoisoned\\n\", atomic_long_read(\u0026num_poisoned_pages));\n"
}

gemini-3.1-pro-preview

							input: 11750
							output: 34
							thoughts: 529

**Fuzzing Analysis: Unrecoverable Memory Failure Handling**

Okay, so this patch introduces a new sysctl, `panic_on_unrecoverable_memory_failure`, and modifies `memory_failure()`. My immediate thought is, "Worth fuzzing?" Absolutely, this is new functionality in the kernel that's designed to trigger a panic under specific memory failure conditions.

But wait, a panic means a crash, and a fuzzer *will* see that. The sysctl is disabled by default, which is important. If the fuzzer enables it, and then triggers an unrecoverable memory failure, boom, panic. Is the panic a bug? No, it's the *intended* behavior when the sysctl is set. So, if the fuzzer hits this, it'll report a "crash," but it's potentially a false positive from a fuzzing perspective, because we intentionally caused the kernel to panic.

Now, if the fuzzer can trigger a memory failure on a kernel page and the sysctl is enabled, it *will* panic. The question then becomes, should we fuzz this? The patch *might* uncover bugs in the memory failure handling logic itself, or it could simply trigger the intended panic.

Hold on, the prompt says to return WorthFuzzing=false if the patch only contains "code paths that are impossible to reach in virtualized environments like GCE or QEMU". Can we trigger a memory failure in QEMU? Yes, we can use `madvise(MADV_HWPOISON)` or `/sys/devices/system/memory/hard_offline_page`.

Wait, the patch *explicitly* adds a selftest that uses `/sys/devices/system/memory/hard_offline_page`. That's a clear signal. This *is* worth fuzzing, because it modifies core kernel logic and it is possible to hit it in a virtualized environment.

Now, we need to consider how the fuzzer is configured. If the sysctl is enabled, it panics; if it's disabled, it doesn't. If we enable the sysctl, the fuzzer will definitely hit that panic and report it as a crash. Is that what we *want*? Probably not. A fuzzing setup usually shouldn't be intentionally configured to panic the kernel unless it's designed to test a specific panic condition. Syzkaller setups generally avoid intentional panics, so more likely we just wouldn't enable the sysctl.

Okay, if we don't enable the sysctl, the new helper `panic_on_unrecoverable_mf()` simply returns false. But the patch also adds `is_kernel_owned_page()`, which is called unconditionally in `get_any_page()`. That sounds interesting. I should look at those changes to `get_any_page()` more closely.
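
The order of checks in `is_kernel_owned_page()` matters: `PG_reserved` is tested on the page itself, while the slab, page-table and large-kmalloc type bits are read from the compound head. The following hypothetical model uses an invented `struct mock_page` in place of `struct page`, so it can only illustrate the classification. It cannot reproduce the split or compound-free race that the real head recheck guards against, and in this single-threaded model the loop always exits after one pass.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct page: a few flag bits plus a head link. */
struct mock_page {
	bool reserved;                      /* per-page flag, read on the page itself */
	bool slab, pgtable, large_kmalloc;  /* type bits, meaningful on the head only */
	struct mock_page *head;             /* null for head or order-0 pages */
};

static struct mock_page *mock_compound_head(struct mock_page *p)
{
	return p->head ? p->head : p;
}

/*
 * Mirrors the check order of is_kernel_owned_page() in the diff:
 * PG_reserved on the page first, then the type bits on the head,
 * re-resolving the head afterwards and retrying if it changed.
 */
static bool mock_is_kernel_owned_page(struct mock_page *page)
{
	struct mock_page *head;
	bool kernel_owned;

	if (page->reserved)
		return true;

	do {
		head = mock_compound_head(page);
		kernel_owned = head->slab || head->pgtable ||
			       head->large_kmalloc;
	} while (head != mock_compound_head(page));

	return kernel_owned;
}
```

A tail page whose head is a slab page classifies as kernel-owned, as does a reserved page. An ordinary page with none of these bits does not, so in the patched `get_any_page()` it would continue into the existing `try_again` path.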

{
  "Expression": "is_kernel_owned_page",
  "PathPrefix": "mm/memory-failure.c"
}

{
  "Output": "mm/memory-failure.c=1339=static inline bool HWPoisonHandlable(struct page *page, unsigned long flags)\n--\nmm/memory-failure.c-1355- */\nmm/memory-failure.c:1356:static inline bool is_kernel_owned_page(struct page *page)\nmm/memory-failure.c-1357-{\n--\nmm/memory-failure.c=1419=static int get_any_page(struct page *p, unsigned long flags)\n--\nmm/memory-failure.c-1432-\t */\nmm/memory-failure.c:1433:\tif (is_kernel_owned_page(p)) {\nmm/memory-failure.c-1434-\t\tif (count_increased)\n--\nmm/memory-failure.c=1493=static int __get_unpoison_page(struct page *page)\n--\nmm/memory-failure.c-1545- *         -ENOTRECOVERABLE for kernel-owned pages identified by\nmm/memory-failure.c:1546: *         is_kernel_owned_page() (PG_reserved, slab,\nmm/memory-failure.c-1547- *         page-table, large-kmalloc) that the handler cannot recover.\n"
}

gemini-3.1-pro-preview

							input: 13246
							output: 33
							thoughts: 264

**My Analysis of a Recent Kernel Change**

Okay, so I've been digging into this kernel code, and here's what I'm seeing. It looks like `is_kernel_owned_page()` gets called unconditionally within `get_any_page()`. Interesting. If that function returns `true`, indicating a kernel-owned page, `get_any_page()` immediately returns `-ENOTRECOVERABLE`. Now, `memory_failure()` handles that specific error code, `-ENOTRECOVERABLE`, and does so by calling `action_result(pfn, MF_MSG_KERNEL, MF_IGNORED)`.

The crucial bit is that this *changes* how `memory_failure()` behaves with kernel-owned pages, even if the relevant sysctl is disabled. Previously, if I understand correctly, such a page would have gone through the `try_again` loop, calling `shake_page()`, and eventually produced `-EIO`. Now it short-circuits and returns `-ENOTRECOVERABLE`. So this is a *functional* change to `memory_failure()`, not just a new panic knob.
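The short-circuit described above can be pictured with a small user-space model. This is a hypothetical sketch, not kernel code: `model_get_any_page()`, `model_memory_failure()` and the `MODEL_*` values are illustrative names, the `try_again`/`shake_page()` retry loop is collapsed into a single `handlable` flag, and the pre-patch `-EIO` outcome is reported as a generic failure rather than the exact message type the kernel uses.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's message types and results. */
enum model_msg { MODEL_MSG_KERNEL, MODEL_MSG_GET_PAGE_FAILED, MODEL_MSG_NOT_MODELLED };
enum model_result { MODEL_IGNORED, MODEL_CONTINUES };

struct model_outcome {
	enum model_msg msg;
	enum model_result result;
};

/*
 * Simplified get_any_page(): the new kernel-owned check runs first and
 * short-circuits with -ENOTRECOVERABLE. Otherwise a page the handler
 * cannot process still ends in -EIO (the pre-patch path, with the retry
 * loop collapsed into the `handlable` flag).
 */
static int model_get_any_page(bool kernel_owned, bool handlable)
{
	if (kernel_owned)
		return -ENOTRECOVERABLE;
	if (!handlable)
		return -EIO;
	return 1; /* page can be handled further */
}

/* Simplified memory_failure(): dispatch on get_any_page()'s result. */
static struct model_outcome model_memory_failure(bool kernel_owned, bool handlable)
{
	struct model_outcome out;

	switch (model_get_any_page(kernel_owned, handlable)) {
	case -ENOTRECOVERABLE:
		/* New path: mirrors action_result(pfn, MF_MSG_KERNEL, MF_IGNORED). */
		out.msg = MODEL_MSG_KERNEL;
		out.result = MODEL_IGNORED;
		break;
	case -EIO:
		/* Pre-patch path for such pages, reduced to a generic failure. */
		out.msg = MODEL_MSG_GET_PAGE_FAILED;
		out.result = MODEL_IGNORED;
		break;
	default:
		/* Ordinary pages continue into the rest of the handler. */
		out.msg = MODEL_MSG_NOT_MODELLED;
		out.result = MODEL_CONTINUES;
		break;
	}
	return out;
}
```

In this model, `model_memory_failure(true, false)` yields `MODEL_MSG_KERNEL`/`MODEL_IGNORED` without consulting any sysctl, which is the behavioral change noted above.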

Okay, is this fuzzable? Absolutely. Seems like a good target. I should really focus on `memory_failure`, `get_any_page`, and `is_kernel_owned_page` when I fuzz. Wait a second, `memory_failure` is actually the main entry point to it all. I should see how reachable it is, and yup, it's accessible through `madvise(MADV_HWPOISON)` or `hard_offline_page`. Now I should just look into `madvise(MADV_HWPOISON)`.

{
  "Expression": "MADV_HWPOISON",
  "PathPrefix": "mm/madvise.c"
}

{
  "Output": "mm/madvise.c=1493=static bool is_memory_failure(struct madvise_behavior *madv_behavior)\n--\nmm/madvise.c-1495-\tswitch (madv_behavior-\u003ebehavior) {\nmm/madvise.c:1496:\tcase MADV_HWPOISON:\nmm/madvise.c-1497-\tcase MADV_SOFT_OFFLINE:\n--\nmm/madvise.c=1519=madvise_behavior_valid(int behavior)\n--\nmm/madvise.c-1552-\tcase MADV_SOFT_OFFLINE:\nmm/madvise.c:1553:\tcase MADV_HWPOISON:\nmm/madvise.c-1554-#endif\n--\nmm/madvise.c=1888=static int madvise_do_behavior(unsigned long start, size_t len_in,\n--\nmm/madvise.c-1944- *  MADV_KEEPONFORK - undo the effect of MADV_WIPEONFORK\nmm/madvise.c:1945: *  MADV_HWPOISON - trigger memory error handler as if the given memory range\nmm/madvise.c-1946- *\t\twere corrupted by unrecoverable hardware memory failure.\n"
}

gemini-3.1-pro-preview

							input: 14256
							output: 32
							thoughts: 192

**Analyzing `MADV_HWPOISON` and its Implications**

Okay, so `MADV_HWPOISON` is supported; that's good. But I need to remember that `madvise(MADV_HWPOISON)` only operates on user-mapped pages, so I can't point it at arbitrary memory. Now, what does `is_kernel_owned_page()` use to decide that a page is kernel-owned? It checks `PageReserved`, `PageSlab`, `PageTable`, and `PageLargeKmalloc`.

Can a user actually *map* a `PageReserved` page? I believe so, for example through `/dev/mem` or some drivers with special capabilities. The others, like `PageSlab` or `PageTable` pages, are usually off-limits to userland.
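A minimal sketch of the four-flag test, assuming page state can be reduced to a bitmask. The `MODEL_PG_*` bits and `model_is_kernel_owned_page()` are hypothetical; the real helper inspects `struct page` through the `PageReserved()`/`PageSlab()`-style accessors. It shows why an ordinary LRU page, the usual target of `madvise(MADV_HWPOISON)`, does not match, while a reserved page does.

```c
#include <stdbool.h>

/* Hypothetical page-state bits; the kernel reads these from struct page. */
enum model_page_bits {
	MODEL_PG_RESERVED     = 1 << 0, /* firmware/memblock reservations, zero page, ... */
	MODEL_PG_SLAB         = 1 << 1,
	MODEL_PG_TABLE        = 1 << 2,
	MODEL_PG_LARGEKMALLOC = 1 << 3,
	MODEL_PG_LRU          = 1 << 4, /* ordinary page on an LRU list */
};

#define MODEL_KERNEL_OWNED_MASK \
	(MODEL_PG_RESERVED | MODEL_PG_SLAB | MODEL_PG_TABLE | MODEL_PG_LARGEKMALLOC)

/* True if any of the four kernel-owned markers is set. */
static bool model_is_kernel_owned_page(unsigned int bits)
{
	return (bits & MODEL_KERNEL_OWNED_MASK) != 0;
}
```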

Hold on, I also need to think about `hard_offline_page`. Can it target any physical page, regardless of ownership? I think so. Syzkaller runs as root in its test VMs, so it can write to `/sys/devices/system/memory/hard_offline_page` to trigger this. The key question now is whether `hard_offline_page` is compiled in at all.

{
  "Expression": "hard_offline_page",
  "PathPrefix": "drivers/base/memory.c"
}

{
  "Output": "drivers/base/memory.c=604=static ssize_t soft_offline_page_store(struct device *dev,\n--\ndrivers/base/memory.c-619-/* Forcibly offline a page, including killing processes. */\ndrivers/base/memory.c:620:static ssize_t hard_offline_page_store(struct device *dev,\ndrivers/base/memory.c-621-\t\t\t\t       struct device_attribute *attr,\n--\ndrivers/base/memory.c=637=static DEVICE_ATTR_WO(soft_offline_page);\ndrivers/base/memory.c:638:static DEVICE_ATTR_WO(hard_offline_page);\ndrivers/base/memory.c-639-#endif\n--\ndrivers/base/memory.c=911=static struct attribute *memory_root_attrs[] = {\n--\ndrivers/base/memory.c-917-\t\u0026dev_attr_soft_offline_page.attr,\ndrivers/base/memory.c:918:\t\u0026dev_attr_hard_offline_page.attr,\ndrivers/base/memory.c-919-#endif\n"
}

gemini-3.1-pro-preview

							input: 15111
							output: 29
							thoughts: 27

**Kernel Memory Failure Investigation**

Alright, this is about kernel memory failure, so the relevant code should be in `drivers/base/memory.c`. The key question is whether this functionality is gated behind a configuration option, so I'll look for the preprocessor directive around it and check whether it is compiled under `CONFIG_MEMORY_FAILURE`.

{
  "Expression": "hard_offline_page",
  "PathPrefix": "drivers/base/"
}

{
  "Output": "drivers/base/memory.c=604=static ssize_t soft_offline_page_store(struct device *dev,\n--\ndrivers/base/memory.c-619-/* Forcibly offline a page, including killing processes. */\ndrivers/base/memory.c:620:static ssize_t hard_offline_page_store(struct device *dev,\ndrivers/base/memory.c-621-\t\t\t\t       struct device_attribute *attr,\n--\ndrivers/base/memory.c=637=static DEVICE_ATTR_WO(soft_offline_page);\ndrivers/base/memory.c:638:static DEVICE_ATTR_WO(hard_offline_page);\ndrivers/base/memory.c-639-#endif\n--\ndrivers/base/memory.c=911=static struct attribute *memory_root_attrs[] = {\n--\ndrivers/base/memory.c-917-\t\u0026dev_attr_soft_offline_page.attr,\ndrivers/base/memory.c:918:\t\u0026dev_attr_hard_offline_page.attr,\ndrivers/base/memory.c-919-#endif\n"
}

gemini-3.1-pro-preview

							input: 15612
							output: 35
							thoughts: 11

**Analyzing Conditional Compilation**

My first step is to find the `#ifdef` that guards this block. The `#endif` at line 639 closes a conditional section containing `hard_offline_page_store()`, so I need to know which config symbol opens it. That symbol determines in which kernel builds the sysfs attribute exists at all, and therefore whether the fuzzer can reach it.

{
  "File": "drivers/base/memory.c",
  "FirstLine": 610,
  "LineCount": 30
}

{
  "Contents": " 610:\t\tif (!capable(CAP_SYS_ADMIN))\n 611:\t\t\treturn -EPERM;\n 612:\t\tif (kstrtoull(buf, 0, \u0026pfn) \u003c 0)\n 613:\t\t\treturn -EINVAL;\n 614:\t\tpfn \u003e\u003e= PAGE_SHIFT;\n 615:\t\tret = soft_offline_page(pfn, 0);\n 616:\t\treturn ret == 0 ? count : ret;\n 617:\t}\n 618:\t\n 619:\t/* Forcibly offline a page, including killing processes. */\n 620:\tstatic ssize_t hard_offline_page_store(struct device *dev,\n 621:\t\t\t\t\t       struct device_attribute *attr,\n 622:\t\t\t\t\t       const char *buf, size_t count)\n 623:\t{\n 624:\t\tint ret;\n 625:\t\tu64 pfn;\n 626:\t\tif (!capable(CAP_SYS_ADMIN))\n 627:\t\t\treturn -EPERM;\n 628:\t\tif (kstrtoull(buf, 0, \u0026pfn) \u003c 0)\n 629:\t\t\treturn -EINVAL;\n 630:\t\tpfn \u003e\u003e= PAGE_SHIFT;\n 631:\t\tret = memory_failure(pfn, MF_SW_SIMULATED);\n 632:\t\tif (ret == -EOPNOTSUPP)\n 633:\t\t\tret = 0;\n 634:\t\treturn ret ? ret : count;\n 635:\t}\n 636:\t\n 637:\tstatic DEVICE_ATTR_WO(soft_offline_page);\n 638:\tstatic DEVICE_ATTR_WO(hard_offline_page);\n 639:\t#endif\n"
}
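The store handler above is worth restating precisely: it requires `CAP_SYS_ADMIN`, parses the written string as a *physical address* with `kstrtoull(buf, 0, ...)`, shifts it right by `PAGE_SHIFT` to get the PFN, and reports `-EOPNOTSUPP` from `memory_failure()` as success. A harness that writes a PFN here would therefore hit the wrong page. The user-space model below assumes 4 KiB pages (`PAGE_SHIFT == 12`) and only approximates `kstrtoull` with `strtoull`; `model_hard_offline_pfn()` and `model_store_result()` are hypothetical names.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>

/* Assumption: 4 KiB pages, i.e. PAGE_SHIFT == 12 as on x86-64. */
#define MODEL_PAGE_SHIFT 12

/*
 * Approximates the parsing in hard_offline_page_store(): the buffer is a
 * physical address in base 0 (decimal, 0x-hex or 0-octal) with at most one
 * trailing newline. Returns the PFN, or -EINVAL on a parse failure.
 */
static long long model_hard_offline_pfn(const char *buf)
{
	unsigned long long addr;
	char *end;

	if (buf[0] == '\0' || buf[0] == '-')
		return -EINVAL;
	errno = 0;
	addr = strtoull(buf, &end, 0);
	if (errno != 0 || end == buf)
		return -EINVAL;
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;
	/* Corresponds to `pfn >>= PAGE_SHIFT;` in the kernel handler. */
	return (long long)(addr >> MODEL_PAGE_SHIFT);
}

/*
 * Mirrors the handler's return convention: -EOPNOTSUPP from
 * memory_failure() counts as success, other errors propagate, and
 * success reports the whole buffer as consumed.
 */
static ssize_t model_store_result(int ret, size_t count)
{
	if (ret == -EOPNOTSUPP)
		ret = 0;
	return ret ? ret : (ssize_t)count;
}
```

With these assumptions, writing `0x12345000` reaches PFN `0x12345`, whereas writing the PFN `0x12345` itself would target PFN `0x12`.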

gemini-3.1-pro-preview

							input: 16327
							output: 35
							thoughts: 13

**Analyzing the Conditional Compilation Block**

The code I just read ends with the `#endif` but doesn't show the opening directive, so the next step is to read the lines above `soft_offline_page_store()`. I'm looking for the config symbol that guards both sysfs attributes, since it decides whether `hard_offline_page` is available in the fuzzing kernel.

{
  "File": "drivers/base/memory.c",
  "FirstLine": 590,
  "LineCount": 20
}

{
  "Contents": " 590:\tout:\n 591:\t\tunlock_device_hotplug();\n 592:\t\treturn ret;\n 593:\t}\n 594:\t\n 595:\tstatic DEVICE_ATTR_WO(probe);\n 596:\t#endif\n 597:\t\n 598:\t#ifdef CONFIG_MEMORY_FAILURE\n 599:\t/*\n 600:\t * Support for offlining pages of memory\n 601:\t */\n 602:\t\n 603:\t/* Soft offline a page */\n 604:\tstatic ssize_t soft_offline_page_store(struct device *dev,\n 605:\t\t\t\t\t       struct device_attribute *attr,\n 606:\t\t\t\t\t       const char *buf, size_t count)\n 607:\t{\n 608:\t\tint ret;\n 609:\t\tu64 pfn;\n"
}

gemini-3.1-pro-preview

							input: 16806
							output: 30
							thoughts: 317

**Fuzzing Strategy for `CONFIG_MEMORY_FAILURE` Changes**

Okay, so this patch is touching code under `CONFIG_MEMORY_FAILURE`. My initial assessment is that it's altering the behavior of `memory_failure()` and `get_any_page()`, so this is definitely worth a closer look for fuzzing potential. We need to see if it's got functional changes.

The `panic_on_unrecoverable_memory_failure` sysctl is off by default, but the fuzzer can easily flip that switch to explore all possibilities. Even if we keep it off, there's a significant change: `is_kernel_owned_page()` will alter the return value of `get_any_page()` to `-ENOTRECOVERABLE` instead of `-EIO`. That, in turn, changes how `memory_failure()` behaves; it now causes it to immediately return `action_result(pfn, MF_MSG_KERNEL, MF_IGNORED)`. This is clearly a functional change, so yes, we should definitely fuzz this.

Now, what are the key symbols to target? I'm thinking `memory_failure`, `get_any_page`, `is_kernel_owned_page`, and `panic_on_unrecoverable_mf`. But hold on, `is_kernel_owned_page` and `panic_on_unrecoverable_mf` are `static inline` or `static` functions. They are likely inlined, so they may not exist as separate symbols in the built kernel, which limits their value as focus symbols. So `memory_failure` and `get_any_page` remain the primary focus, and `action_result` is also worth keeping, since it records the outcome of each handling path.

As for configurations, `MEMORY_FAILURE` is the obvious one, but wait, `MEMORY_FAILURE` *depends* on `ARCH_SUPPORTS_MEMORY_FAILURE`. So we need to make sure we also account for `ARCH_SUPPORTS_MEMORY_FAILURE` in our testing.

{
  "Expression": "ARCH_SUPPORTS_MEMORY_FAILURE",
  "PathPrefix": "arch/"
}

{
  "Output": "arch/arm64/Kconfig=2=config ARM64\n--\narch/arm64/Kconfig-73-\tselect ARCH_SUPPORTS_HUGETLBFS\narch/arm64/Kconfig:74:\tselect ARCH_SUPPORTS_MEMORY_FAILURE\narch/arm64/Kconfig-75-\tselect ARCH_SUPPORTS_SHADOW_CALL_STACK if CC_HAVE_SHADOW_CALL_STACK\n--\narch/parisc/Kconfig=2=config PARISC\n--\narch/parisc/Kconfig-22-\tselect ARCH_SUPPORTS_HUGETLBFS if PA20\narch/parisc/Kconfig:23:\tselect ARCH_SUPPORTS_MEMORY_FAILURE\narch/parisc/Kconfig-24-\tselect ARCH_STACKWALK\n--\narch/powerpc/Kconfig=624=config PPC64_SUPPORTS_MEMORY_FAILURE\n--\narch/powerpc/Kconfig-627-\tdefault \"y\" if PPC_POWERNV\narch/powerpc/Kconfig:628:\tselect ARCH_SUPPORTS_MEMORY_FAILURE\narch/powerpc/Kconfig-629-\n--\narch/x86/Kconfig=762=config X86_SUPPORTS_MEMORY_FAILURE\n--\narch/x86/Kconfig-768-\tdepends on X86_64 || !SPARSEMEM\narch/x86/Kconfig:769:\tselect ARCH_SUPPORTS_MEMORY_FAILURE\narch/x86/Kconfig-770-\n"
}

gemini-3.1-pro-preview

							input: 17971
							output: 189
							thoughts: 1101

**Analyzing Memory Failure Handling and Related Configurations**

Okay, so I'm focusing on enabling and understanding the `MEMORY_FAILURE` mechanism. My initial thought is to identify all relevant configurations. I see that `MEMORY_FAILURE` is the primary one, and I'm also considering `HWPOISON_INJECT`. The latter seems interesting for fuzzing, possibly enabling the injection of memory failures. I know it provides `/sys/kernel/debug/hwpoison/corrupt-pfn` and `unpoison-pfn`, and the selftests use `unpoison-pfn`, which confirms its relevance.

I'm checking whether syzkaller uses `HWPOISON_INJECT`. It appears that syzkaller already exercises `madvise(MADV_HWPOISON)` and `/sys/devices/system/memory/hard_offline_page`; `HWPOISON_INJECT` adds another injection path into the same handler, so it's worth enabling for effective testing and memory error injection. Therefore, I'll include both `MEMORY_FAILURE` and `HWPOISON_INJECT` in my configuration.
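For comparison with `hard_offline_page`, the debugfs file from `HWPOISON_INJECT` takes a PFN rather than a physical address, as the name `corrupt-pfn` says. A minimal user-space sketch of writing one, assuming debugfs is mounted at `/sys/kernel/debug`, the `HWPOISON_INJECT` module is loaded, the caller is root, and the attribute parses base-0 integers (so `0x` hex is accepted). `model_format_pfn()` and `write_pfn()` are hypothetical helper names. A successful write really poisons the page, so this belongs only in a disposable VM.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Format a PFN in hex with a trailing newline, like
 * `echo 0x1234 > corrupt-pfn`. Returns the string length, or -EINVAL
 * if the buffer is too small. */
static int model_format_pfn(char *buf, size_t len, unsigned long pfn)
{
	int n = snprintf(buf, len, "%#lx\n", pfn);

	return (n < 0 || (size_t)n >= len) ? -EINVAL : n;
}

/* Write one PFN to an injection file such as
 * /sys/kernel/debug/hwpoison/corrupt-pfn. Returns 0 or a negative errno. */
static int write_pfn(const char *path, unsigned long pfn)
{
	char buf[32];
	int n = model_format_pfn(buf, sizeof(buf), pfn);
	int fd, err = 0;

	if (n < 0)
		return n;
	fd = open(path, O_WRONLY);
	if (fd < 0)
		return -errno;
	if (write(fd, buf, (size_t)n) < 0)
		err = -errno; /* capture before close() can clobber errno */
	close(fd);
	return err;
}
```

The same helper could target the companion `unpoison-pfn` file to undo an injection between test runs.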

Next, I'll examine the code changes in `mm/memory-failure.c`. I'm focusing on the functions directly involved in this process, specifically `memory_failure`, `get_any_page`, `is_kernel_owned_page`, `panic_on_unrecoverable_mf`, and `action_result`. I'm noting that `is_kernel_owned_page` and `panic_on_unrecoverable_mf` are static and potentially inlined, which might affect their symbol availability.

Now, I'm verifying the export status of the key functions. I see that `memory_failure` is exported with `EXPORT_SYMBOL_GPL(memory_failure)`, so it's directly accessible. `get_any_page` and `action_result` are static within the file, making their direct use outside this context unlikely, but I'm primarily interested in their internal logic.

From what I can tell, the patch modifies the core memory failure handling logic. It short-circuits error processing for kernel-owned pages (such as slab, page-table, or reserved pages), and it introduces a new sysctl that triggers a panic on unrecoverable errors hitting these pages. Both paths are reachable through error injection interfaces such as `/sys/devices/system/memory/hard_offline_page` or `madvise(MADV_HWPOISON)`.

To reiterate, is `HWPOISON_INJECT` actually necessary? It is a module that provides debugfs injection interfaces, which syzkaller can use, so enabling it is desirable even if not strictly required.

Looking for other dependencies, I can see that `DEBUG_KERNEL` and `PROC_FS` are dependencies for `HWPOISON_INJECT`. They're likely standard and don't require specific consideration. `MMU` is a dependency for `MEMORY_FAILURE`, which is also standard.

Based on this analysis, I'm confident in the approach and these configurations are the most important.

{
  "EnableConfigs": [
    "MEMORY_FAILURE",
    "HWPOISON_INJECT"
  ],
  "FocusSymbols": [
    "memory_failure",
    "get_any_page",
    "action_result",
    "panic_on_unrecoverable_mf",
    "is_kernel_owned_page"
  ],
  "Reasoning": "The patch modifies the core memory failure handling logic (`memory_failure`, `get_any_page`) to explicitly identify and short-circuit on kernel-owned pages (slab, page tables, reserved, large kmalloc). It also introduces a new sysctl to panic on such unrecoverable errors. This is reachable via hardware error injection interfaces (e.g., `/sys/devices/system/memory/hard_offline_page` or `madvise(MADV_HWPOISON)`).",
  "WorthFuzzing": true
}

