When io_uring recv/send with MSG_WAITALL accumulates partial data through
done_io and then encounters an error or EOF, req_set_fail() sets
REQ_F_FAIL despite the CQE result being positive (done_io bytes).
io_disarm_next() then sees REQ_F_FAIL and cancels all linked operations
with -ECANCELED, even though the user-visible result indicates success.

This manifests in two code paths:

1) Direct completion: io_recv/io_send fall through to req_set_fail()
   when ret < min_ret, even if done_io > 0. The CQE shows done_io
   (positive) but REQ_F_FAIL severs the link chain.

2) io-wq fallback: after APOLL_MAX_RETRY (128) poll retries, the request
   moves to io-wq. io_recv returns IOU_RETRY from the MSG_WAITALL retry
   path, io-wq fails the request with -EAGAIN, and
   io_req_defer_failed -> io_sendrecv_fail overwrites cqe.res with
   done_io but leaves REQ_F_FAIL set.

Fix this by:
- Not calling req_set_fail() when done_io > 0 in io_recv, io_recvmsg,
  io_send, io_sendmsg, io_send_zc, io_sendmsg_zc
- Clearing REQ_F_FAIL in io_sendrecv_fail() when done_io > 0

This makes MSG_WAITALL partial completions consistent with
non-MSG_WAITALL behavior, where positive results never sever the IO_LINK
chain.

Reproducer: MSG_WAITALL recv via IO_LINK -> write on a UNIX socketpair
where the sender closes after partial data. The recv CQE shows positive
bytes but the linked write gets -ECANCELED.

Fixes: 0031275d119e ("io_uring: call req_set_fail_links() on short send[msg]()/recv[msg]() with MSG_WAITALL")
Cc: stable@vger.kernel.org
Signed-off-by: Hannes Furmans
---
 io_uring/net.c | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/io_uring/net.c b/io_uring/net.c
index 8576c6cb2236..ebe51db34af8 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -576,7 +576,8 @@ int io_sendmsg(struct io_kiocb *req, unsigned int issue_flags)
 		}
 		if (ret == -ERESTARTSYS)
 			ret = -EINTR;
-		req_set_fail(req);
+		if (!sr->done_io)
+			req_set_fail(req);
 	}
 	io_req_msg_cleanup(req, issue_flags);
 	if (ret >= 0)
@@ -688,7 +689,8 @@ int io_send(struct io_kiocb *req, unsigned int issue_flags)
 		}
 		if (ret == -ERESTARTSYS)
 			ret = -EINTR;
-		req_set_fail(req);
+		if (!sr->done_io)
+			req_set_fail(req);
 	}
 	if (ret >= 0)
 		ret += sr->done_io;
@@ -1074,7 +1076,8 @@ int io_recvmsg(struct io_kiocb *req, unsigned int issue_flags)
 		}
 		if (ret == -ERESTARTSYS)
 			ret = -EINTR;
-		req_set_fail(req);
+		if (!sr->done_io)
+			req_set_fail(req);
 	} else if ((flags & MSG_WAITALL) && (kmsg->msg.msg_flags & (MSG_TRUNC | MSG_CTRUNC))) {
 		req_set_fail(req);
 	}
@@ -1220,7 +1223,8 @@ int io_recv(struct io_kiocb *req, unsigned int issue_flags)
 		}
 		if (ret == -ERESTARTSYS)
 			ret = -EINTR;
-		req_set_fail(req);
+		if (!sr->done_io)
+			req_set_fail(req);
 	} else if ((flags & MSG_WAITALL) && (kmsg->msg.msg_flags & (MSG_TRUNC | MSG_CTRUNC))) {
 out_free:
 		req_set_fail(req);
@@ -1498,7 +1502,8 @@ int io_send_zc(struct io_kiocb *req, unsigned int issue_flags)
 		}
 		if (ret == -ERESTARTSYS)
 			ret = -EINTR;
-		req_set_fail(req);
+		if (!zc->done_io)
+			req_set_fail(req);
 	}
 
 	if (ret >= 0)
@@ -1570,7 +1575,8 @@ int io_sendmsg_zc(struct io_kiocb *req, unsigned int issue_flags)
 		}
 		if (ret == -ERESTARTSYS)
 			ret = -EINTR;
-		req_set_fail(req);
+		if (!sr->done_io)
+			req_set_fail(req);
 	}
 
 	if (ret >= 0)
@@ -1595,8 +1601,10 @@ void io_sendrecv_fail(struct io_kiocb *req)
 {
 	struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
 
-	if (sr->done_io)
+	if (sr->done_io) {
 		req->cqe.res = sr->done_io;
+		req->flags &= ~REQ_F_FAIL;
+	}
 
 	if ((req->flags & REQ_F_NEED_CLEANUP) &&
 	    (req->opcode == IORING_OP_SEND_ZC || req->opcode == IORING_OP_SENDMSG_ZC))
-- 
2.53.0