After an invalid descriptor is found in __xsk_generic_xmit(),
xskq_cons_peek_desc() returns false and the loop body is not entered.
Jason's drain fixes reclaim descriptors already attached to xs->skb and
later continuation descriptors handled through drain_cont, but the
offending descriptor that made peek fail is only released from the Tx
ring.

This loses one completion for each invalid multi-buffer packet in the
generic path. Userspace then waits forever for a descriptor that has
already been consumed by the kernel.

If the failed descriptor belongs to an already-started or
already-draining multi-buffer packet, publish its address to the
completion ring before releasing it. Standalone invalid descriptors keep
the existing behavior.

Fixes: cf24f5a5feea ("xsk: add support for AF_XDP multi-buffer on Tx path")
Signed-off-by: Maciej Fijalkowski
---
 net/xdp/xsk.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/net/xdp/xsk.c b/net/xdp/xsk.c
index c489fadc3608..43791647cf18 100644
--- a/net/xdp/xsk.c
+++ b/net/xdp/xsk.c
@@ -1125,8 +1125,22 @@ static int __xsk_generic_xmit(struct sock *sk)
 	}
 
 	if (xskq_has_descs(xs->tx)) {
+		bool reclaim_desc = xs->skb || xs->drain_cont;
+
+		if (reclaim_desc) {
+			err = xsk_cq_reserve_locked(xs->pool);
+			if (err) {
+				err = -EAGAIN;
+				goto out;
+			}
+		}
+
 		if (xs->skb)
 			xsk_drop_skb(xs->skb);
+
+		if (reclaim_desc)
+			xsk_cq_submit_addr_single_locked(xs->pool, &desc);
+
 		xskq_cons_release(xs->tx);
 		xs->drain_cont = xp_mb_desc(&desc);
 	}
-- 
2.43.0