From: "Denis V. Lunev" Earlier commit ("ms/vhost/vsock: Refuse the connection immediately when guest isn't ready") added a fast-fail in vhost_transport_send_pkt(). It rejects every host send with -EHOSTUNREACH until the destination calls SET_RUNNING(1). The fast-fail condition checks whether device's backends are dropped, and if they're, the guest is considered to be not ready. However, there might be other reasons for backends to be nulled. In particular, when QEMU is performing CPR (checkpoint-restore) migration, device ownership is being RESET and SET again, which leads to backends drop and reattach. If we end up connecting during this window, an AF_VSOCK client gets -EHOSTUNREACH, which is wrong. Add a cpr_paused flag set inside vhost_vsock_drop_backends() when the backend was previously live, cleared by vhost_vsock_start(). When set, vhost_transport_send_pkt() queues the skb instead of fast-failing; the existing kick of send_pkt_work in vhost_vsock_start() drains it on resume. A device that has never run keeps cpr_paused == false and the boot-time fast-fail behaviour is preserved. Pair the cpr_paused store with the backend store using an smp_wmb()/smp_rmb() pair so a concurrent sender on a weakly-ordered architecture never observes (NULL backend, !paused): Signed-off-by: Denis V. Lunev --- drivers/vhost/vsock.c | 22 +++++++++++++++++++--- 1 file changed, 19 insertions(+), 3 deletions(-) diff --git a/drivers/vhost/vsock.c b/drivers/vhost/vsock.c index e629886e5cf8..bcaba36becd7 100644 --- a/drivers/vhost/vsock.c +++ b/drivers/vhost/vsock.c @@ -61,6 +61,7 @@ struct vhost_vsock { u32 guest_cid; bool seqpacket_allow; + bool cpr_paused; /* between stop and next start */ }; static u32 vhost_transport_get_local_cid(void) @@ -311,11 +312,17 @@ vhost_transport_send_pkt(struct sk_buff *skb, struct net *net) * the mutex would be too expensive in this hot path, and we already have * all the outcomes covered: if the backend becomes NULL right after the check, * vhost_transport_do_send_pkt() will check it under the mutex anyway. + * + * Don't fast-fail if cpr_paused is set, keep queueing skbs instead. + * The kick in vhost_vsock_start() will drain them on resume. */ if (unlikely(!data_race(vhost_vq_get_backend(&vsock->vqs[VSOCK_VQ_RX])))) { - rcu_read_unlock(); - kfree_skb(skb); - return -EHOSTUNREACH; + smp_rmb(); /* pairs with smp_wmb() in start/drop_backends */ + if (!READ_ONCE(vsock->cpr_paused)) { + rcu_read_unlock(); + kfree_skb(skb); + return -EHOSTUNREACH; + } } if (virtio_vsock_skb_reply(skb)) @@ -640,6 +647,9 @@ static int vhost_vsock_start(struct vhost_vsock *vsock) mutex_unlock(&vq->mutex); } + smp_wmb(); /* pairs with smp_rmb() in send_pkt */ + WRITE_ONCE(vsock->cpr_paused, false); + /* Some packets may have been queued before the device was started, * let's kick the send worker to send them. */ @@ -671,6 +681,11 @@ static void vhost_vsock_drop_backends(struct vhost_vsock *vsock) lockdep_assert_held(&vsock->dev.mutex); + if (vhost_vq_get_backend(&vsock->vqs[VSOCK_VQ_RX])) { + WRITE_ONCE(vsock->cpr_paused, true); + smp_wmb(); /* pairs with smp_rmb() in send_pkt */ + } + for (i = 0; i < ARRAY_SIZE(vsock->vqs); i++) { vq = &vsock->vqs[i]; @@ -728,6 +743,7 @@ static int vhost_vsock_dev_open(struct inode *inode, struct file *file) vsock->guest_cid = 0; /* no CID assigned yet */ vsock->seqpacket_allow = false; + vsock->cpr_paused = false; atomic_set(&vsock->queued_replies, 0); -- 2.47.1