drain_pages_zone completely drains a zone of its pcp free pages by
repeatedly calling free_pcppages_bulk until pcp->count reaches 0. The
loop already batches these calls so that free_pcppages_bulk is never
asked to free too many pages at once. It also releases and reacquires the
lock between calls to prevent lock starvation of other processes.

However, the current batching does not prevent lock starvation. The
current implementation creates batches of
pcp->batch << CONFIG_PCP_BATCH_SCALE_MAX pages, which has been seen in Meta
workloads to be as large as 64 << 5 == 2048 pages.

CONFIG_PCP_BATCH_SCALE_MAX is a config option, and the system admin can
set it to any value from 0 to 6 when building the kernel. Even so, its
default value of 5 is still too high to be reasonable for any system.

Instead, let's create batches of pcp->batch pages. In the configuration
above, that is a more reasonable 64 pages per call to free_pcppages_bulk.
This gives other processes a chance to grab the lock and prevents
starvation. Each individual call to drain_pages_zone may take longer,
because it now takes and releases the lock more often. In exchange, we
avoid the worst-case scenario of completely starving out other
system-critical threads from acquiring the pcp lock while 2048 pages are
freed one-by-one.

Signed-off-by: Joshua Hahn
---
 mm/page_alloc.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 77e7d9a5f149..b861b647f184 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2623,8 +2623,7 @@ static void drain_pages_zone(unsigned int cpu, struct zone *zone)
 		spin_lock(&pcp->lock);
 		count = pcp->count;
 		if (count) {
-			int to_drain = min(count,
-				pcp->batch << CONFIG_PCP_BATCH_SCALE_MAX);
+			int to_drain = min(count, pcp->batch);
 
 			free_pcppages_bulk(zone, to_drain, pcp, 0);
 			count -= to_drain;
--
2.47.3
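The following minimal user-space sketch (not kernel code) illustrates the trade-off. It mirrors the loop shape of drain_pages_zone and records how long each lock hold lasts, measured in pages freed. The names struct model_pcp, struct drain_stats, model_pcp_init() and model_drain() are hypothetical. A pthread mutex stands in for the pcp spinlock, and decrementing a counter stands in for free_pcppages_bulk(). The scale argument plays the role of CONFIG_PCP_BATCH_SCALE_MAX: scale 5 models the old chunk size, and scale 0 models the patched one.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a per-CPU page list (not kernel code). */
struct model_pcp {
	pthread_mutex_t lock; /* plays the role of pcp->lock (a spinlock in the kernel) */
	int count;            /* pages currently on the list, like pcp->count */
	int batch;            /* like pcp->batch */
};

/* What one full drain looked like from the lock's point of view. */
struct drain_stats {
	int lock_holds;   /* lock acquisitions that freed at least one page */
	int max_per_hold; /* most pages freed during a single lock hold */
};

static void model_pcp_init(struct model_pcp *pcp, int count, int batch)
{
	pthread_mutex_init(&pcp->lock, NULL);
	pcp->count = count;
	pcp->batch = batch;
}

static int min_int(int a, int b)
{
	return a < b ? a : b;
}

/*
 * Same loop shape as drain_pages_zone(): lock, free at most one chunk,
 * unlock, repeat until empty. `scale` stands in for
 * CONFIG_PCP_BATCH_SCALE_MAX (5 before the patch, effectively 0 after).
 */
static struct drain_stats model_drain(struct model_pcp *pcp, int scale)
{
	struct drain_stats st = { 0, 0 };
	int chunk = pcp->batch << scale;
	int count;

	/* A zero chunk would never make progress and would loop forever. */
	assert(chunk > 0);

	do {
		pthread_mutex_lock(&pcp->lock);
		count = pcp->count;
		if (count) {
			int to_drain = min_int(count, chunk);

			/* Stand-in for free_pcppages_bulk(): just drop the pages. */
			pcp->count -= to_drain;
			count -= to_drain;

			st.lock_holds++;
			if (to_drain > st.max_per_hold)
				st.max_per_hold = to_drain;
		}
		/* Releasing here is what lets other waiters in between chunks. */
		pthread_mutex_unlock(&pcp->lock);
	} while (count);

	return st;
}
```

For example, start with model_pcp_init(&pcp, 2048, 64). Calling model_drain(&pcp, 5) empties the list in a single lock hold that frees all 2048 pages. Calling model_drain(&pcp, 0) instead needs 32 lock holds, but none of them frees more than 64 pages. The total work is the same. What changes is the longest stretch during which other threads are locked out.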