From: Barry Song

On phones, we have observed significant phone heating when running apps
with high network bandwidth. This is caused by the network stack
frequently waking kswapd for order-3 allocations. As a result, memory
reclamation becomes constantly active, even though plenty of memory is
still available for network allocations which can fall back to order-0.

Commit ce27ec60648d ("net: add high_order_alloc_disable sysctl/static
key") introduced high_order_alloc_disable for the transmit (TX) path
(skb_page_frag_refill()) to mitigate some memory reclamation issues,
allowing the TX path to fall back to order-0 immediately, while leaving
the receive (RX) path (__page_frag_cache_refill()) unaffected. Users are
generally unaware of the sysctl and cannot easily adjust it for specific
use cases. Enabling high_order_alloc_disable also completely disables
the benefit of order-3 allocations. Additionally, the sysctl does not
apply to the RX path.

An alternative approach is to disable kswapd for these frequent
allocations and provide best-effort order-3 service for both TX and RX
paths, while removing the sysctl entirely.

Cc: Jonathan Corbet
Cc: Eric Dumazet
Cc: Kuniyuki Iwashima
Cc: Paolo Abeni
Cc: Willem de Bruijn
Cc: "David S. Miller"
Cc: Jakub Kicinski
Cc: Simon Horman
Cc: Vlastimil Babka
Cc: Suren Baghdasaryan
Cc: Michal Hocko
Cc: Brendan Jackman
Cc: Johannes Weiner
Cc: Zi Yan
Cc: Yunsheng Lin
Cc: Huacai Zhou
Signed-off-by: Barry Song
---
 Documentation/admin-guide/sysctl/net.rst | 12 ------------
 include/net/sock.h                       |  1 -
 mm/page_frag_cache.c                     |  2 +-
 net/core/sock.c                          |  8 ++------
 net/core/sysctl_net_core.c               |  7 -------
 5 files changed, 3 insertions(+), 27 deletions(-)

diff --git a/Documentation/admin-guide/sysctl/net.rst b/Documentation/admin-guide/sysctl/net.rst
index 2ef50828aff1..b903bbae239c 100644
--- a/Documentation/admin-guide/sysctl/net.rst
+++ b/Documentation/admin-guide/sysctl/net.rst
@@ -415,18 +415,6 @@ GRO has decided not to coalesce, it is placed on a per-NAPI list. This
 list is then passed to the stack when the number of segments reaches the
 gro_normal_batch limit.
 
-high_order_alloc_disable
-------------------------
-
-By default the allocator for page frags tries to use high order pages (order-3
-on x86). While the default behavior gives good results in most cases, some users
-might have hit a contention in page allocations/freeing. This was especially
-true on older kernels (< 5.14) when high-order pages were not stored on per-cpu
-lists. This allows to opt-in for order-0 allocation instead but is now mostly of
-historical importance.
-
-Default: 0
-
 2. /proc/sys/net/unix - Parameters for Unix domain sockets
 ----------------------------------------------------------
 
diff --git a/include/net/sock.h b/include/net/sock.h
index 60bcb13f045c..62306c1095d5 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -3011,7 +3011,6 @@ extern __u32 sysctl_wmem_default;
 extern __u32 sysctl_rmem_default;
 
 #define SKB_FRAG_PAGE_ORDER	get_order(32768)
-DECLARE_STATIC_KEY_FALSE(net_high_order_alloc_disable_key);
 
 static inline int sk_get_wmem0(const struct sock *sk, const struct proto *proto)
 {
diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c
index d2423f30577e..dd36114dd16f 100644
--- a/mm/page_frag_cache.c
+++ b/mm/page_frag_cache.c
@@ -54,7 +54,7 @@ static struct page *__page_frag_cache_refill(struct page_frag_cache *nc,
 	gfp_t gfp = gfp_mask;
 
 #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
-	gfp_mask = (gfp_mask & ~__GFP_DIRECT_RECLAIM) | __GFP_COMP |
+	gfp_mask = (gfp_mask & ~__GFP_RECLAIM) | __GFP_COMP |
 		   __GFP_NOWARN | __GFP_NORETRY | __GFP_NOMEMALLOC;
 	page = __alloc_pages(gfp_mask, PAGE_FRAG_CACHE_MAX_ORDER,
 			     numa_mem_id(), NULL);
diff --git a/net/core/sock.c b/net/core/sock.c
index dc03d4b5909a..1fa1e9177d86 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -3085,8 +3085,6 @@ static void sk_leave_memory_pressure(struct sock *sk)
 	}
 }
 
-DEFINE_STATIC_KEY_FALSE(net_high_order_alloc_disable_key);
-
 /**
  * skb_page_frag_refill - check that a page_frag contains enough room
  * @sz: minimum size of the fragment we want to get
@@ -3110,10 +3108,8 @@ bool skb_page_frag_refill(unsigned int sz, struct page_frag *pfrag, gfp_t gfp)
 	}
 
 	pfrag->offset = 0;
-	if (SKB_FRAG_PAGE_ORDER &&
-	    !static_branch_unlikely(&net_high_order_alloc_disable_key)) {
-		/* Avoid direct reclaim but allow kswapd to wake */
-		pfrag->page = alloc_pages((gfp & ~__GFP_DIRECT_RECLAIM) |
+	if (SKB_FRAG_PAGE_ORDER) {
+		pfrag->page = alloc_pages((gfp & ~__GFP_RECLAIM) |
 					  __GFP_COMP | __GFP_NOWARN |
 					  __GFP_NORETRY,
 					  SKB_FRAG_PAGE_ORDER);
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 8cf04b57ade1..181f6532beb8 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -599,13 +599,6 @@ static struct ctl_table net_core_table[] = {
 		.extra1		= SYSCTL_ZERO,
 		.extra2		= SYSCTL_THREE,
 	},
-	{
-		.procname	= "high_order_alloc_disable",
-		.data		= &net_high_order_alloc_disable_key.key,
-		.maxlen		= sizeof(net_high_order_alloc_disable_key),
-		.mode		= 0644,
-		.proc_handler	= proc_do_static_key,
-	},
 	{
 		.procname	= "gro_normal_batch",
 		.data		= &net_hotdata.gro_normal_batch,
-- 
2.39.3 (Apple Git-146)
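
Note for readers less familiar with GFP flags: the functional change in both
hunks is the switch from clearing __GFP_DIRECT_RECLAIM to clearing
__GFP_RECLAIM, which in the kernel is the union of the direct-reclaim and
kswapd-reclaim bits. The userspace sketch below only illustrates that mask
arithmetic. The DEMO_* bit values and the helper names are hypothetical
stand-ins chosen for this example; they are not the kernel's definitions and
this is not kernel code.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int gfp_t;

/* Illustrative bit values only; the real kernel values differ. */
#define DEMO_GFP_DIRECT_RECLAIM 0x1u /* caller may reclaim synchronously */
#define DEMO_GFP_KSWAPD_RECLAIM 0x2u /* allocator may wake kswapd */
#define DEMO_GFP_RECLAIM (DEMO_GFP_DIRECT_RECLAIM | DEMO_GFP_KSWAPD_RECLAIM)
#define DEMO_GFP_OTHER 0x4u /* stands in for any unrelated flag */

static_assert((DEMO_GFP_RECLAIM & DEMO_GFP_KSWAPD_RECLAIM) != 0,
              "the combined reclaim mask must cover the kswapd bit");

/* Before the patch: only direct reclaim is masked off for the order-3 try. */
static gfp_t high_order_mask_old(gfp_t gfp)
{
	return gfp & ~DEMO_GFP_DIRECT_RECLAIM;
}

/* After the patch: direct reclaim and kswapd wakeup are both masked off. */
static gfp_t high_order_mask_new(gfp_t gfp)
{
	return gfp & ~DEMO_GFP_RECLAIM;
}

/* True if an allocation with these flags would still be allowed to wake kswapd. */
static bool may_wake_kswapd(gfp_t gfp)
{
	return (gfp & DEMO_GFP_KSWAPD_RECLAIM) != 0;
}
```

With a caller mask that has both reclaim bits set, the old helper leaves the
kswapd bit in place while the new one clears it, so a failed order-3 attempt
no longer wakes kswapd. The order-0 fallback is untouched by the diff and
still uses the caller's original flags.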