PCI devices prior to PCI 2.3 both use level interrupts and do not support interrupt masking, leading to a failure when passed through to a KVM guest on at least the ppc64 platform. This failure manifests as receiving and acknowledging a single interrupt in the guest, while the device continues to assert the level interrupt indicating a need for further servicing. When lazy IRQ masking is used on DisINTx- (non-PCI 2.3) hardware, the following sequence occurs: * Level IRQ assertion on device * IRQ marked disabled in kernel * Host interrupt handler exits without clearing the interrupt on the device * Eventfd is delivered to userspace * Guest processes IRQ and clears device interrupt * Device de-asserts INTx, then re-asserts INTx while the interrupt is masked * Newly asserted interrupt acknowledged by kernel VMM without being handled * Software mask removed by VFIO driver * Device INTx still asserted, host controller does not see new edge after EOI The behavior is now platform-dependent. Some platforms (amd64) will continue to spew IRQs for as long as the INTX line remains asserted, therefore the IRQ will be handled by the host as soon as the mask is dropped. Others (ppc64) will only send the one request, and if it is not handled no further interrupts will be sent. The former behavior theoretically leaves the system vulnerable to interrupt storm, and the latter will result in the device stalling after receiving exactly one interrupt in the guest. Work around this by disabling lazy IRQ masking for DisINTx- INTx devices. Signed-off-by: Timothy Pearson --- drivers/vfio/pci/vfio_pci_intrs.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/vfio/pci/vfio_pci_intrs.c b/drivers/vfio/pci/vfio_pci_intrs.c index 123298a4dc8f..61d29f6b3730 100644 --- a/drivers/vfio/pci/vfio_pci_intrs.c +++ b/drivers/vfio/pci/vfio_pci_intrs.c @@ -304,9 +304,14 @@ static int vfio_intx_enable(struct vfio_pci_core_device *vdev, vdev->irq_type = VFIO_PCI_INTX_IRQ_INDEX; + if (!vdev->pci_2_3) + irq_set_status_flags(pdev->irq, IRQ_DISABLE_UNLAZY); + ret = request_irq(pdev->irq, vfio_intx_handler, irqflags, ctx->name, ctx); if (ret) { + if (!vdev->pci_2_3) + irq_clear_status_flags(pdev->irq, IRQ_DISABLE_UNLAZY); vdev->irq_type = VFIO_PCI_NUM_IRQS; kfree(name); vfio_irq_ctx_free(vdev, ctx, 0); @@ -352,6 +357,8 @@ static void vfio_intx_disable(struct vfio_pci_core_device *vdev) vfio_virqfd_disable(&ctx->unmask); vfio_virqfd_disable(&ctx->mask); free_irq(pdev->irq, ctx); + if (!vdev->pci_2_3) + irq_clear_status_flags(pdev->irq, IRQ_DISABLE_UNLAZY); if (ctx->trigger) eventfd_ctx_put(ctx->trigger); kfree(ctx->name); -- 2.39.5