This is a followup of commit e20dfbad8aab ("net: fix napi_consume_skb()
with alien skbs").

Now that the per-cpu napi_skb_cache is populated from the TX completion
path, we can make use of this cache, especially for CPUs not used from
a driver NAPI poll (the primary user of napi_cache).

We can use the napi_skb_cache only if the current context is not hard
IRQ and hard IRQs are not disabled.

With this patch, I consistently reach 130 Mpps on my UDP TX stress test,
and SLUB spinlock contention drops to lower values.

Note there is still some SLUB contention for skb->head allocations.

I had to tune /sys/kernel/slab/skbuff_small_head/cpu_partial and
/sys/kernel/slab/skbuff_small_head/min_partial depending on the
platform taxonomy.

Signed-off-by: Eric Dumazet
---
 net/core/skbuff.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 8a0a4ca7fa5dbb2fa3044ee45bb0b9c8c3ca85ea..9feea830a4dbb61c1c661e802ed315eaeebcc809 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -666,7 +666,12 @@ struct sk_buff *__alloc_skb(unsigned int size, gfp_t gfp_mask,
 		skb = napi_skb_cache_get(true);
 		if (unlikely(!skb))
 			return NULL;
+	} else if (!in_hardirq() && !irqs_disabled()) {
+		local_bh_disable();
+		skb = napi_skb_cache_get(false);
+		local_bh_enable();
 	}
+
 	if (!skb) {
 fallback:
 		skb = kmem_cache_alloc_node(cache, gfp_mask & ~GFP_DMA, node);
-- 
2.52.0.rc1.455.g30608eb744-goog
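The context check that this patch adds to __alloc_skb() can be modeled in userspace. What follows is a minimal sketch, not kernel code. struct cpu_ctx, struct obj_cache, cache_get(), cache_put(), alloc_obj() and slab_allocs are hypothetical names. The boolean flags stand in for in_hardirq(), irqs_disabled() and local_bh_disable()/local_bh_enable(), and malloc() stands in for kmem_cache_alloc_node(). The sketch assumes the per-CPU cache is exclusive to one CPU and must only be touched with BH disabled. It also assumes that hard IRQ context and IRQs-off sections are excluded because local_bh_enable() is not meant to be called there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of one CPU's execution context. In the kernel these
 * states come from in_hardirq(), irqs_disabled() and the BH disable count. */
struct cpu_ctx {
	bool in_hardirq;
	bool irqs_disabled;
	bool bh_disabled;
};

#define CACHE_SIZE 64

/* Hypothetical per-CPU object cache standing in for napi_skb_cache. */
struct obj_cache {
	void *objs[CACHE_SIZE];
	unsigned int count;
};

/* Counts allocations that had to fall back to the (modeled) slab allocator. */
static unsigned long slab_allocs;

/* Stand-in for napi_skb_cache_get(): the caller must have BH disabled. */
static void *cache_get(struct cpu_ctx *ctx, struct obj_cache *c)
{
	assert(ctx->bh_disabled);
	if (c->count == 0)
		return NULL;
	return c->objs[--c->count];
}

/* Stand-in for the TX completion path refilling the cache. */
static bool cache_put(struct obj_cache *c, void *obj)
{
	if (c->count == CACHE_SIZE)
		return false;
	c->objs[c->count++] = obj;
	return true;
}

/* Mirrors the new branch: try the cache only outside hard IRQ context and
 * with IRQs enabled, otherwise (or on a cache miss) take the fallback. */
static void *alloc_obj(struct cpu_ctx *ctx, struct obj_cache *c, size_t size)
{
	void *obj = NULL;

	if (!ctx->in_hardirq && !ctx->irqs_disabled) {
		ctx->bh_disabled = true;   /* models local_bh_disable() */
		obj = cache_get(ctx, c);
		ctx->bh_disabled = false;  /* models local_bh_enable() */
	}
	if (!obj) {
		slab_allocs++;             /* models kmem_cache_alloc_node() */
		obj = malloc(size);
	}
	return obj;
}
```

Objects recycled on the TX completion path refill the cache through cache_put(), so a later allocation on the same CPU from process or softirq context can reuse them without touching the shared slab allocator. Allocations from hard IRQ context, or with IRQs disabled, always take the fallback path.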