From: Jason Xing

When a multi-buffer packet exceeds MAX_SKB_FRAGS and triggers -EOVERFLOW,
only the current descriptor is released from the TX ring. The remaining
continuation descriptors of the same packet stay in the ring. Since
xs->skb is set to NULL after the drop, the TX loop picks up these
leftover frags and misinterprets each one as the beginning of a new
packet, corrupting the packet stream.

Fix this by adding a drain_cont flag to struct xdp_sock. When the
overflow occurs and the dropped descriptor has XDP_PKT_CONTD set, the
flag is raised. The main TX loop in __xsk_generic_xmit() then handles
the continuation descriptors one at a time: each one gets a normal CQ
reservation (with backpressure), its address is submitted to the
completion queue, and the descriptor is released from the TX ring. When
the last fragment (the one without XDP_PKT_CONTD) has been processed,
the flag is cleared and the function returns -EOVERFLOW so that the next
call starts with a fresh budget for normal packets.

This reuses the existing CQ backpressure and budget mechanisms: if the
CQ is full, the function returns -EAGAIN and userspace drains the CQ
before retrying. No buffers are leaked and the packet stream is no
longer corrupted.

Closes: https://lore.kernel.org/all/20260425041726.85FB3C2BCB2@smtp.kernel.org/
Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path")
Signed-off-by: Jason Xing
---
 include/net/xdp_sock.h |  1 +
 net/xdp/xsk.c          | 22 ++++++++++++++++++++++
 2 files changed, 23 insertions(+)

diff --git a/include/net/xdp_sock.h b/include/net/xdp_sock.h
index 23e8861e8b25..1958d19d9925 100644
--- a/include/net/xdp_sock.h
+++ b/include/net/xdp_sock.h
@@ -80,6 +80,7 @@ struct xdp_sock {
 	 * call of __xsk_generic_xmit().
 	 */
 	struct sk_buff *skb;
+	bool drain_cont;
 
 	struct list_head map_list;
 	/* Protects map_list */
diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index 3f1e590c855d..232dd7126905 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -936,6 +936,8 @@ static struct sk_buff *xsk_build_skb(struct xdp_sock *xs,
 			xs->tx->invalid_descs++;
 		}
 		xskq_cons_release(xs->tx);
+		if (xp_mb_desc(desc))
+			xs->drain_cont = true;
 	} else {
 		/* Let application retry */
 		xsk_cq_cancel_locked(xs->pool, 1);
@@ -982,6 +984,26 @@ static int __xsk_generic_xmit(struct sock *sk)
 			goto out;
 		}
 
+		if (unlikely(xs->drain_cont)) {
+			unsigned long flags;
+			u32 idx;
+
+			spin_lock_irqsave(&xs->pool->cq_prod_lock, flags);
+			idx = xskq_get_prod(xs->pool->cq);
+			xskq_prod_write_addr(xs->pool->cq, idx, desc.addr);
+			xskq_prod_submit_n(xs->pool->cq, 1);
+			spin_unlock_irqrestore(&xs->pool->cq_prod_lock, flags);
+
+			xs->tx->invalid_descs++;
+			xskq_cons_release(xs->tx);
+			if (!xp_mb_desc(&desc)) {
+				xs->drain_cont = false;
+				err = -EOVERFLOW;
+				goto out;
+			}
+			continue;
+		}
+
 		skb = xsk_build_skb(xs, &desc);
 		if (IS_ERR(skb)) {
 			err = PTR_ERR(skb);
-- 
2.41.3
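
A minimal userspace sketch (not part of the patch, illustration only) of
the -EAGAIN retry behaviour the commit message refers to: when sendto()
reports that the CQ is full, the sender reaps completions and kicks the
kernel again. It assumes the libxdp xsk_ring_* helpers and an
already-configured AF_XDP socket; the function name kick_and_reap() is
made up here.

	#include <errno.h>
	#include <sys/socket.h>
	#include <xdp/xsk.h>		/* libxdp ring helpers */

	/* Kick the kernel TX path; on EAGAIN/EBUSY, drain the completion
	 * ring so the next __xsk_generic_xmit() call can reserve CQ space.
	 */
	static void kick_and_reap(int xsk_fd, struct xsk_ring_cons *cq)
	{
		__u32 idx, n;

		for (;;) {
			if (sendto(xsk_fd, NULL, 0, MSG_DONTWAIT, NULL, 0) >= 0)
				return;
			if (errno != EAGAIN && errno != EBUSY)
				return;		/* unrelated error */

			/* CQ backpressure: consume completed addresses so the
			 * kernel can submit further buffers on the next kick.
			 */
			n = xsk_ring_cons__peek(cq, 64, &idx);
			if (n) {
				/* xsk_ring_cons__comp_addr(cq, idx + i) entries
				 * would be returned to the umem allocator here.
				 */
				xsk_ring_cons__release(cq, n);
			}
		}
	}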