Running a memcache-like workload under production(ish) load on a
300-thread AMD machine, we see ~3% of CPU time spent in
kmem_cache_free() via tcp_ack(), freeing skbs from the rtx queue.

This workload pins workers away from the softirq CPUs, so the Tx skbs
are almost always allocated on a different CPU than the one where the
ACKs arrive. Try to use the deferred skb free queue to return the skbs
to the CPU they came from. This results in a ~4% performance
improvement for the workload.

Signed-off-by: Jakub Kicinski
---
 include/net/tcp.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/tcp.h b/include/net/tcp.h
index ef0fee58fde8..e290651da508 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -332,7 +332,7 @@ static inline void tcp_wmem_free_skb(struct sock *sk, struct sk_buff *skb)
 		sk_mem_uncharge(sk, skb->truesize);
 	else
 		sk_mem_uncharge(sk, SKB_TRUESIZE(skb_end_offset(skb)));
-	__kfree_skb(skb);
+	skb_attempt_defer_free(skb);
 }
 
 void sk_forced_mem_schedule(struct sock *sk, int size);
--
2.52.0
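
For readers unfamiliar with the deferral mechanism, below is a minimal,
illustrative sketch of the idea behind skb_attempt_defer_free(): the skb
records the CPU it was allocated on (skb->alloc_cpu), and if a different
CPU ends up freeing it, the skb is queued on the allocating CPU's per-CPU
list and that CPU is nudged to free it locally from its softirq. This is
not the actual net/core/skbuff.c implementation; the softnet_data field
names and the omitted rate limiting are approximate.

#include <linux/skbuff.h>
#include <linux/netdevice.h>
#include <linux/smp.h>

/* Illustrative sketch only; see skb_attempt_defer_free() for the real code. */
static void defer_free_sketch(struct sk_buff *skb)
{
	int cpu = skb->alloc_cpu;	/* CPU that allocated this skb */
	struct softnet_data *sd;

	/* Fall back to an immediate free when deferral makes no sense:
	 * the allocating CPU is offline, or it is the current CPU anyway.
	 */
	if (!cpu_online(cpu) || cpu == raw_smp_processor_id()) {
		__kfree_skb(skb);
		return;
	}

	sd = &per_cpu(softnet_data, cpu);

	/* Queue the skb on the allocating CPU's defer list.  The real code
	 * bounds this list (net.core.skb_defer_max) and falls back to a
	 * direct free when the list is full.
	 */
	spin_lock_bh(&sd->defer_lock);
	skb->next = sd->defer_list;
	sd->defer_list = skb;
	sd->defer_count++;
	spin_unlock_bh(&sd->defer_lock);

	/* Make sure the allocating CPU eventually runs its NET_RX softirq,
	 * which flushes the defer list and frees the skbs CPU-locally.
	 * The real code rate-limits this IPI.
	 */
	smp_call_function_single_async(cpu, &sd->defer_csd);
}

The payoff is that kmem_cache_free() runs on the same CPU that did the
kmem_cache_alloc(), keeping the slab per-CPU caches hot instead of
bouncing objects between CPUs, which is where the ~3% of cycles in the
profile above was going.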