CMA areas are normally not very large, but HugeTLB CMA is an exception.
hugetlb_cma, used for 'gigantic' pages (usually 1G), can take up many
gigabytes of memory. As such, it is potentially the largest source of
'false OOM' conditions: situations where the kernel runs out of space
for unmovable allocations because it cannot allocate from CMA
pageblocks, while non-CMA memory has been tied up by other movable
allocations.

The normal use case of hugetlb_cma is a system where 1G hugetlb pages
are sometimes, but not always, needed, so they need to be created and
freed dynamically. As such, the best time to address CMA memory
imbalances is when CMA hugetlb pages are freed, making multiples of 1G
available again as buddy-managed CMA pageblocks.

That is a good time to check whether movable allocations from non-CMA
pageblocks should be moved to CMA pageblocks, to give the kernel more
breathing space. Do this by calling balance_node_cma() on either the
hugetlb CMA area for the node that just had its number of hugetlb
pages reduced, or on all hugetlb CMA areas if the reduction was not
node-specific.

To have the CMA balancing code act on the hugetlb CMA areas, set the
CMA_BALANCE flag when creating them.

Signed-off-by: Frank van der Linden
---
 mm/hugetlb.c     | 14 ++++++++------
 mm/hugetlb_cma.c | 16 ++++++++++++++++
 mm/hugetlb_cma.h |  5 +++++
 3 files changed, 29 insertions(+), 6 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index eed59cfb5d21..611655876f60 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3971,12 +3971,14 @@ static int set_max_huge_pages(struct hstate *h, unsigned long count, int nid,
 		list_add(&folio->lru, &page_list);
 	}
 
-	/* free the pages after dropping lock */
-	spin_unlock_irq(&hugetlb_lock);
-	update_and_free_pages_bulk(h, &page_list);
-	flush_free_hpage_work(h);
-	spin_lock_irq(&hugetlb_lock);
-
+	if (!list_empty(&page_list)) {
+		/* free the pages after dropping lock */
+		spin_unlock_irq(&hugetlb_lock);
+		update_and_free_pages_bulk(h, &page_list);
+		flush_free_hpage_work(h);
+		hugetlb_cma_balance(nid);
+		spin_lock_irq(&hugetlb_lock);
+	}
 	while (count < persistent_huge_pages(h)) {
 		if (!adjust_pool_surplus(h, nodes_allowed, 1))
 			break;
diff --git a/mm/hugetlb_cma.c b/mm/hugetlb_cma.c
index 71d0e9a048d4..c0396d35b5bf 100644
--- a/mm/hugetlb_cma.c
+++ b/mm/hugetlb_cma.c
@@ -276,3 +276,19 @@ bool __init hugetlb_early_cma(struct hstate *h)
 
 	return hstate_is_gigantic(h) && hugetlb_cma_only;
 }
+
+void hugetlb_cma_balance(int nid)
+{
+	int node;
+
+	if (nid != NUMA_NO_NODE) {
+		if (hugetlb_cma[nid])
+			balance_node_cma(nid, hugetlb_cma[nid]);
+	} else {
+		for_each_online_node(node) {
+			if (hugetlb_cma[node])
+				balance_node_cma(node,
+						 hugetlb_cma[node]);
+		}
+	}
+}
diff --git a/mm/hugetlb_cma.h b/mm/hugetlb_cma.h
index f7d7fb9880a2..2f2a35b56d8a 100644
--- a/mm/hugetlb_cma.h
+++ b/mm/hugetlb_cma.h
@@ -13,6 +13,7 @@ bool hugetlb_cma_exclusive_alloc(void);
 unsigned long hugetlb_cma_total_size(void);
 void hugetlb_cma_validate_params(void);
 bool hugetlb_early_cma(struct hstate *h);
+void hugetlb_cma_balance(int nid);
 #else
 static inline void hugetlb_cma_free_folio(struct folio *folio)
 {
@@ -53,5 +54,9 @@ static inline bool hugetlb_early_cma(struct hstate *h)
 {
 	return false;
 }
+
+static inline void hugetlb_cma_balance(int nid)
+{
+}
 #endif
 #endif
-- 
2.51.0.384.g4c02a37b29-goog
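
Note (illustration, not part of the patch): the CMA_BALANCE flag mentioned
in the commit message is not set anywhere in this diff; it is applied when
the per-node hugetlb CMA areas are reserved at boot. A minimal sketch of
what that opt-in might look like inside hugetlb_cma_reserve() is shown
below. The cma_set_flag() helper is hypothetical, and the flag name and
balance_node_cma() are taken from the earlier patches in this series; the
actual flag-setting interface may differ.

	/*
	 * Sketch only: after the per-node hugetlb CMA areas have been
	 * declared, mark each one so the CMA balancing code will act
	 * on it. cma_set_flag() is a hypothetical helper used here
	 * purely for illustration.
	 */
	for_each_online_node(nid) {
		if (!hugetlb_cma[nid])
			continue;
		/* Opt this area in to CMA balancing. */
		cma_set_flag(hugetlb_cma[nid], CMA_BALANCE);
	}

With the areas flagged this way, the hugetlb_cma_balance() call added in
set_max_huge_pages() above would have an effect whenever gigantic pages
are freed back to the buddy allocator.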