From: "Pratyush Yadav (Google)"

The KHO restoration machinery is not capable of dealing with preservations that span multiple NUMA nodes. kho_preserve_folio() guarantees the preservation will only span one NUMA node, since folios can't span multiple nodes. This leaves kho_preserve_pages(). Semantically, kho_preserve_pages() only deals with 0-order pages, so all preservations should be single pages. In practice, however, it combines adjacent preservations into higher orders for efficiency. This can result in a preservation spanning multiple nodes. Break the preservation up into smaller orders if that happens.

Suggested-by: Pasha Tatashin
Signed-off-by: Pratyush Yadav (Google)
---

Notes:
    Ref: https://lore.kernel.org/linux-mm/CA+CK2bDvaGmfkCPCMWM6gPcd4FfUyD6e5yWE+kNcma1vT3Jw3g@mail.gmail.com/

 kernel/liveupdate/kexec_handover.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c
index cc68a3692905..bc9bd18294ee 100644
--- a/kernel/liveupdate/kexec_handover.c
+++ b/kernel/liveupdate/kexec_handover.c
@@ -869,9 +869,17 @@ int kho_preserve_pages(struct page *page, unsigned long nr_pages)
 	}

 	while (pfn < end_pfn) {
-		const unsigned int order =
+		unsigned int order =
 			min(count_trailing_zeros(pfn), ilog2(end_pfn - pfn));

+		/*
+		 * Make sure all the pages in a single preservation are in the
+		 * same NUMA node. The restore machinery can not cope with a
+		 * preservation spanning multiple NUMA nodes.
+		 */
+		while (pfn_to_nid(pfn) != pfn_to_nid(pfn + (1UL << order) - 1))
+			order--;
+
 		err = __kho_preserve_order(track, pfn, order);
 		if (err) {
 			failed_pfn = pfn;
--
2.53.0.473.g4a7958ca14-goog


KHO currently restricts the maximum order of a restored page to the maximum order supported by the buddy allocator. While this works fine for much of the data passed across kexec, it is possible to have pages larger than MAX_PAGE_ORDER.
For one, it is possible to get a larger order when using kho_preserve_pages() if the number of pages is large enough, since it tries to combine multiple aligned 0-order preservations into one higher-order preservation. For another, upcoming hugepage support will allow gigantic hugepages to be preserved over KHO.

There is no real reason for this limit. The KHO preservation machinery can handle any page order. Remove this artificial restriction on the maximum page order.

Signed-off-by: Pratyush Yadav
Signed-off-by: Pratyush Yadav (Google)
---

Notes:
    This patch was first sent with this RFC series [0]. I am sending it separately since it is an independent patch that is useful even without hugepage preservation. No changes since the RFC.

    [0] https://lore.kernel.org/linux-mm/20251206230222.853493-1-pratyush@kernel.org/T/#u

 kernel/liveupdate/kexec_handover.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/liveupdate/kexec_handover.c b/kernel/liveupdate/kexec_handover.c
index bc9bd18294ee..1038e41ff9f9 100644
--- a/kernel/liveupdate/kexec_handover.c
+++ b/kernel/liveupdate/kexec_handover.c
@@ -253,7 +253,7 @@ static struct page *kho_restore_page(phys_addr_t phys, bool is_folio)
 	 * check also implicitly makes sure phys is order-aligned since for
 	 * non-order-aligned phys addresses, magic will never be set.
 	 */
-	if (WARN_ON_ONCE(info.magic != KHO_PAGE_MAGIC || info.order > MAX_PAGE_ORDER))
+	if (WARN_ON_ONCE(info.magic != KHO_PAGE_MAGIC))
 		return NULL;

 	nr_pages = (1 << info.order);
--
2.53.0.473.g4a7958ca14-goog
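As a side note for reviewers, the order-shrinking loop the first patch adds can be exercised in userspace. The sketch below is a hypothetical standalone model, not kernel code: fake_pfn_to_nid(), NODE_SPAN, and chunk_order() are invented stand-ins for pfn_to_nid() and the loop body in kho_preserve_pages(), and the node span is deliberately a non-power-of-two so that a naturally aligned block can straddle a node boundary.

```c
#include <assert.h>

/* Assumed fake layout: each node covers 0x1800 pfns (non-power-of-two). */
#define NODE_SPAN 0x1800UL

/* Hypothetical stand-in for the kernel's pfn_to_nid(). */
static int fake_pfn_to_nid(unsigned long pfn)
{
	return (int)(pfn / NODE_SPAN);
}

/* Stand-ins for count_trailing_zeros() and ilog2() on unsigned long. */
static unsigned int ctz_ul(unsigned long x)
{
	return (unsigned int)__builtin_ctzl(x);
}

static unsigned int ilog2_ul(unsigned long x)
{
	return 63u - (unsigned int)__builtin_clzl(x);
}

/*
 * Order of the next preservation chunk starting at pfn: capped by the
 * pfn's alignment and by end_pfn, then shrunk until the whole block
 * sits in one (fake) NUMA node -- the loop the patch adds. An order-0
 * block is a single page and can never cross a boundary, so the loop
 * always terminates.
 */
static unsigned int chunk_order(unsigned long pfn, unsigned long end_pfn)
{
	unsigned int order = ctz_ul(pfn);
	unsigned int cap = ilog2_ul(end_pfn - pfn);

	if (cap < order)
		order = cap;
	while (fake_pfn_to_nid(pfn) != fake_pfn_to_nid(pfn + (1UL << order) - 1))
		order--;
	return order;
}
```

With this layout, an order-12 block at pfn 0x1000 would span pfns 0x1000-0x1FFF and cross the node boundary at 0x1800, so the loop drops it to order 11; a block that fits entirely in one node keeps its original order.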
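To illustrate why the second patch matters, here is a rough sketch of how a gigantic hugepage exceeds the buddy limit. The PAGE_SHIFT and MAX_PAGE_ORDER values below are assumed typical x86-64 defaults (both are config-dependent), and order_for_size() is an invented helper, not a kernel function.

```c
#include <assert.h>

/* Assumed typical x86-64 values; both are configuration-dependent. */
#define PAGE_SHIFT     12   /* 4 KiB base pages */
#define MAX_PAGE_ORDER 10   /* default buddy allocator limit */

/*
 * Hypothetical helper: smallest order whose block covers `bytes` of
 * contiguous memory.
 */
static unsigned int order_for_size(unsigned long bytes)
{
	unsigned int order = 0;

	while ((1UL << (order + PAGE_SHIFT)) < bytes)
		order++;
	return order;
}
```

Under these assumptions a 1 GiB gigantic hugepage needs order 18, well above MAX_PAGE_ORDER, so the removed `info.order > MAX_PAGE_ORDER` check would have rejected it on restore.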