PCI devices prior to PCI 2.3 both use level interrupts and do not support
interrupt masking, leading to a failure when passed through to a KVM guest
on at least the ppc64 platform, which does not utilize the resample IRQFD.
This failure manifests as receiving and acknowledging a single interrupt
in the guest while leaving the host physical device VFIO IRQ pending.

Level interrupts in general require special handling due to their
inherently asynchronous nature; both the host and guest interrupt
controllers need to remain in synchronization in order to coordinate mask
and unmask operations. When lazy IRQ masking is used on DisINTx- hardware,
the following sequence occurs:

 * Level IRQ assertion on host
 * IRQ trigger within host interrupt controller, routed to VFIO driver
 * Host EOI with hardware level IRQ still asserted
 * Software mask of interrupt source by VFIO driver
 * Generation of event and IRQ trigger in KVM guest interrupt controller
 * Level IRQ deassertion on host
 * Guest EOI
 * Guest IRQ level deassertion
 * Removal of software mask by VFIO driver

Note that no actual state change occurs within the host interrupt
controller, unlike what would happen with either DisINTx+ hardware or
message interrupts. The host EOI is not fired with the hardware level IRQ
deasserted, and the level interrupt is not re-armed within the host
interrupt controller, leading to an unrecoverable stall of the device.

Work around this by disabling lazy IRQ masking for DisINTx- INTx devices.

---
 drivers/vfio/pci/vfio_pci_intrs.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c
index 123298a4dc8f..011169ca7a34 100644
--- a/drivers/vfio/pci/vfio_pci_intrs.c
+++ b/drivers/vfio/pci/vfio_pci_intrs.c
@@ -304,6 +304,9 @@ static int vfio_intx_enable(struct vfio_pci_core_device *vdev,
 
 	vdev->irq_type = VFIO_PCI_INTX_IRQ_INDEX;
 
+	if (is_intx(vdev) && !vdev->pci_2_3)
+		irq_set_status_flags(pdev->irq, IRQ_DISABLE_UNLAZY);
+
 	ret = request_irq(pdev->irq, vfio_intx_handler,
 			  irqflags, ctx->name, ctx);
 	if (ret) {
@@ -351,6 +354,8 @@ static void vfio_intx_disable(struct vfio_pci_core_device *vdev)
 	if (ctx) {
 		vfio_virqfd_disable(&ctx->unmask);
 		vfio_virqfd_disable(&ctx->mask);
+		if (!vdev->pci_2_3)
+			irq_clear_status_flags(pdev->irq, IRQ_DISABLE_UNLAZY);
 		free_irq(pdev->irq, ctx);
 		if (ctx->trigger)
 			eventfd_ctx_put(ctx->trigger);
-- 
2.39.5