Apply the same batch-freeing optimization from free_contig_range() to the
frozen page path. The previous __free_contig_frozen_range() freed each
order-0 page individually via free_frozen_pages(), which is slow for the
same reason the old free_contig_range() was: each page goes to the
order-0 pcp list rather than being coalesced into higher-order blocks.

Rewrite __free_contig_frozen_range() to call free_pages_prepare() for
each order-0 page, then batch the prepared pages into the largest
possible power-of-2 aligned chunks via free_prepared_contig_range(). If
free_pages_prepare() fails (e.g. HWPoison, bad page), the page is
deliberately not freed; it must not be returned to the allocator.

I've tested CMA through debugfs. The test allocates 16384 pages per
allocation for several iterations. This gives a 3.5x improvement:

Before: 1406 usec per iteration
After:   402 usec per iteration

Before:
  70.89%  0.69%  cma  [kernel.kallsyms]  [.] free_contig_frozen_range
   |
   |--70.20%--free_contig_frozen_range
   |          |
   |          |--46.41%--__free_frozen_pages
   |          |          |
   |          |           --36.18%--free_frozen_page_commit
   |          |                      |
   |          |                       --29.63%--_raw_spin_unlock_irqrestore
   |          |
   |          |--8.76%--_raw_spin_trylock
   |          |
   |          |--7.03%--__preempt_count_dec_and_test
   |          |
   |          |--4.57%--_raw_spin_unlock
   |          |
   |          |--1.96%--__get_pfnblock_flags_mask.isra.0
   |          |
   |           --1.15%--free_frozen_page_commit
   |
    --0.69%--el0t_64_sync

After:
  23.57%  0.00%  cma  [kernel.kallsyms]  [.] free_contig_frozen_range
   |
   ---free_contig_frozen_range
      |
      |--20.45%--__free_contig_frozen_range
      |          |
      |          |--17.77%--free_pages_prepare
      |          |
      |           --0.72%--free_prepared_contig_range
      |                     |
      |                      --0.55%--__free_frozen_pages
      |
       --3.12%--free_pages_prepare

Suggested-by: Zi Yan
Signed-off-by: Muhammad Usama Anjum
---
Changes since v2:
 - Rework the loop to check for memory sections just like __free_contig_range()
 - Didn't add reviewed-by tags because of rework
---
 mm/page_alloc.c | 26 ++++++++++++++++++++++++--
 1 file changed, 24 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 250cc07e547b8..26eac35ef73bd 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -7038,8 +7038,30 @@ static int __alloc_contig_verify_gfp_mask(gfp_t gfp_mask, gfp_t *gfp_cc_mask)
 static void __free_contig_frozen_range(unsigned long pfn,
 				       unsigned long nr_pages)
 {
-	for (; nr_pages--; pfn++)
-		free_frozen_pages(pfn_to_page(pfn), 0);
+	struct page *page = pfn_to_page(pfn);
+	struct page *start = NULL;
+	unsigned long start_sec;
+	unsigned long i;
+
+	for (i = 0; i < nr_pages; i++, page++) {
+		if (!free_pages_prepare(page, 0)) {
+			if (start) {
+				free_prepared_contig_range(start, page - start);
+				start = NULL;
+			}
+		} else if (start &&
+			   memdesc_section(page->flags) != start_sec) {
+			free_prepared_contig_range(start, page - start);
+			start = page;
+			start_sec = memdesc_section(page->flags);
+		} else if (!start) {
+			start = page;
+			start_sec = memdesc_section(page->flags);
+		}
+	}
+
+	if (start)
+		free_prepared_contig_range(start, page - start);
 }
 
 /**
-- 
2.47.3