15 years after Tom Herbert added skb->ooo_okay, only TCP transport benefits from it. We can support other transports directly from skb_set_owner_w(). If no other TX packet for this socket is in a host queue (qdisc, NIC queue) there is no risk of self-inflicted reordering, we can set skb->ooo_okay. This allows netdev_pick_tx() to choose a TX queue based on XPS settings, instead of reusing the queue chosen at the time the first packet was sent for connected sockets. Tested: 500 concurrent UDP_RR connected UDP flows, host with 32 TX queues, 512 cpus, XPS setup. super_netperf 500 -t UDP_RR -H -l 1000 -- -r 100,100 -Nn & This patch saves between 10% and 20% of cycles, depending on how process scheduler migrates threads among cpus. Using following bpftrace script, we can see the effect on Qdisc/NIC tx queues being better used (less cache line misses). bpftrace -e ' k:__dev_queue_xmit { @start[cpu] = nsecs; } kr:__dev_queue_xmit { if (@start[cpu]) { $delay = nsecs - @start[cpu]; delete(@start[cpu]); @__dev_queue_xmit_ns = hist($delay); } } END { clear(@start); }' Before: @__dev_queue_xmit_ns: [128, 256) 6 | | [256, 512) 116283 | | [512, 1K) 1888205 |@@@@@@@@@@@ | [1K, 2K) 8106167 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ | [2K, 4K) 8699293 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| [4K, 8K) 2600676 |@@@@@@@@@@@@@@@ | [8K, 16K) 721688 |@@@@ | [16K, 32K) 122995 | | [32K, 64K) 10639 | | [64K, 128K) 119 | | [128K, 256K) 1 | | After: @__dev_queue_xmit_ns: [128, 256) 3 | | [256, 512) 651112 |@@ | [512, 1K) 8109938 |@@@@@@@@@@@@@@@@@@@@@@@@@@ | [1K, 2K) 16081031 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@| [2K, 4K) 2411692 |@@@@@@@ | [4K, 8K) 98994 | | [8K, 16K) 1536 | | [16K, 32K) 587 | | [32K, 64K) 2 | | Signed-off-by: Eric Dumazet Reviewed-by: Neal Cardwell --- net/core/sock.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/net/core/sock.c b/net/core/sock.c index 542cfa16ee125f6c8487237c9040695d42794087..08ae20069b6d287745800710192396f76c8781b4 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -2694,6 +2694,8 @@ void __sock_wfree(struct sk_buff *skb) void skb_set_owner_w(struct sk_buff *skb, struct sock *sk) { + int old_wmem; + skb_orphan(skb); #ifdef CONFIG_INET if (unlikely(!sk_fullsock(sk))) @@ -2707,7 +2709,15 @@ void skb_set_owner_w(struct sk_buff *skb, struct sock *sk) * is enough to guarantee sk_free() won't free this sock until * all in-flight packets are completed */ - refcount_add(skb->truesize, &sk->sk_wmem_alloc); + __refcount_add(skb->truesize, &sk->sk_wmem_alloc, &old_wmem); + + /* (old_wmem == SK_WMEM_ALLOC_BIAS) if no other TX packet for this socket + * is in a host queue (qdisc, NIC queue). + * Set skb->ooo_okay so that netdev_pick_tx() can choose a TX queue + * based on XPS for better performance. + * Otherwise clear ooo_okay to not risk Out Of Order delivery. + */ + skb->ooo_okay = (old_wmem == SK_WMEM_ALLOC_BIAS); } EXPORT_SYMBOL(skb_set_owner_w); -- 2.51.0.740.g6adb054d12-goog