During QEMU CPR live-update (and VHOST_RESET_OWNER in general) the guest
keeps running while the host drops and later re-attaches vhost backends.
If the guest adds a buffer to the TX virtqueue (guest->host) and kicks
while the backend is temporarily NULL (between
vhost_vsock_drop_backends() and the next vhost_vsock_start()), the kick
is delivered to the vhost worker, handle_tx_kick() sees a NULL backend
and returns, and the kick signal is consumed. The buffer is left in the
ring.

On device start, vhost_vsock_start() only re-kicks the RX send worker,
never the TX VQ, so the buffer is processed only if the guest happens to
kick again. But if the guest itself is now waiting for data from the
host, it will never kick the TX VQ again, and we end up in a deadlock.

The deadlock is reproducible during an active host->guest socat data
transfer across multiple consecutive CPR live-updates.

To fix this, in vhost_vsock_start(), after kicking the RX send worker,
also queue the TX vq poll so that any buffers the guest enqueued while
we were paused get scanned.

Signed-off-by: Andrey Drobyshev
---
 drivers/vhost/vsock.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c
index bcaba36becd7..1fcfe71d18be 100644
--- a/drivers/vhost/vsock.c
+++ b/drivers/vhost/vsock.c
@@ -655,6 +655,12 @@ static int vhost_vsock_start(struct vhost_vsock *vsock)
 	 */
 	vhost_vq_work_queue(&vsock->vqs[VSOCK_VQ_RX], &vsock->send_pkt_work);
 
+	/*
+	 * Some packets might've also been queued in TX VQ. Re-scan it here,
+	 * mirroring the RX send-worker kick above.
+	 */
+	vhost_poll_queue(&vsock->vqs[VSOCK_VQ_TX].poll);
+
 	mutex_unlock(&vsock->dev.mutex);
 	return 0;
 
-- 
2.47.1
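
The lost-kick sequence described in the commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: every toy_* name is hypothetical, the "ring" is reduced to a counter, and the vhost worker is reduced to a single function call. It shows only why a kick consumed while the backend is detached leaves a buffer stranded unless start re-queues the TX poll.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these types or functions exist in the kernel. */
struct toy_vq {
	int pending;       /* buffers the guest has placed in the ring */
	bool has_backend;  /* stands in for a non-NULL vq backend */
	bool poll_queued;  /* a kick or an explicit poll-queue is outstanding */
};

/* Guest side: add one buffer to the ring, then kick. */
static void toy_guest_add_and_kick(struct toy_vq *vq)
{
	vq->pending++;
	vq->poll_queued = true;
}

/* Worker side: the notification is consumed whether or not work is done. */
static void toy_worker_run(struct toy_vq *vq)
{
	if (!vq->poll_queued)
		return;
	vq->poll_queued = false;
	if (!vq->has_backend)
		return;            /* models handle_tx_kick() seeing a NULL backend */
	vq->pending = 0;           /* drain the ring */
}

/* Start side: re-attach the backend, optionally re-scanning the TX ring. */
static void toy_start(struct toy_vq *vq, bool rescan_tx)
{
	vq->has_backend = true;
	if (rescan_tx)
		vq->poll_queued = true;  /* models queueing the TX vq poll */
}

/* Returns the number of buffers still stranded in the ring after start. */
int toy_simulate(bool rescan_tx)
{
	struct toy_vq vq = { .pending = 0, .has_backend = false,
			     .poll_queued = false };

	toy_guest_add_and_kick(&vq);   /* guest kicks while backend is detached */
	toy_worker_run(&vq);           /* kick is consumed, buffer stays */
	assert(vq.pending == 1);

	toy_start(&vq, rescan_tx);
	toy_worker_run(&vq);           /* guest does not kick again */
	return vq.pending;
}
```

In this model, toy_simulate(false) leaves one buffer in the ring because nothing wakes the worker after start, while toy_simulate(true) drains it, which is the behaviour the added vhost_poll_queue() call is meant to provide.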