From: Mikhail Malyshev

A passed-through PCI device's BAR is mapped into the guest through a
VM_IO/VM_PFNMAP VMA whose fault handler (e.g. vfio_pci_mmap_fault())
declines to install a PTE while the device's memory space is disabled,
such as immediately after the guest clears PCI_COMMAND.MEM. If another
vCPU accesses that BAR during the window, hva_to_pfn_remapped() fails and
the gfn resolves to an error pfn even though the memslot is still valid.
kvm_handle_error_pfn() then returns -EFAULT and KVM_RUN exits to
userspace, which typically treats this as fatal and kills the VM.

This is guest-triggerable: a guest that toggles PCI_COMMAND.MEM on an
assigned device while another vCPU touches the BAR can take down its own
VM (observed in production with an assigned Intel iGPU; the guest's
display driver clears PCI_COMMAND.MEM on one vCPU while another is
mid-MMIO to the BAR).

On bare metal an access to a BAR whose memory decoding is disabled simply
completes as an Unsupported Request: reads return all ones, writes are
dropped. KVM can present the same behaviour by treating the access as
MMIO and emulating it, which is exactly what the noslot path already does
for a gfn that has no memslot.

Distinguish the VM_IO/VM_PFNMAP fault-handler failure from other error
pfns with a new KVM_PFN_ERR_PFNMAP value (in range of KVM_PFN_ERR_MASK,
so existing is_error_pfn() checks are unaffected) and route it to
kvm_handle_noslot_fault() in the x86 TDP fault path. Genuine, non-pfnmap
faults, e.g. a vanished anonymous backing, still take the fatal -EFAULT
path, consistent with what mmu_stress_test already expects.

The MMIO mapping self-heals when the device's memory space is re-enabled
and the memslot is updated, bumping the MMIO generation.

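For illustration only, the encoding and routing described above can be
modelled in userspace. This is a minimal sketch, not kernel code: the
mask and helper definitions are assumed to mirror include/linux/kvm_host.h
and should be checked against the tree being patched, and route_pfn() and
enum fault_route are hypothetical stand-ins for the handler selection in
kvm_mmu_faultin_pfn().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t kvm_pfn_t;

/* Assumed to mirror include/linux/kvm_host.h; verify against your tree. */
#define KVM_PFN_ERR_MASK	(0x7ffULL << 52)
#define KVM_PFN_NOSLOT		(0x1ULL << 63)

#define KVM_PFN_ERR_FAULT	(KVM_PFN_ERR_MASK)
#define KVM_PFN_ERR_NEEDS_IO	(KVM_PFN_ERR_MASK + 4)
#define KVM_PFN_ERR_PFNMAP	(KVM_PFN_ERR_MASK + 5)	/* added by this patch */

/* Any pfn with one of the mask bits set is an error pfn. */
static inline bool is_error_pfn(kvm_pfn_t pfn)
{
	return !!(pfn & KVM_PFN_ERR_MASK);
}

/* The noslot marker lives in bit 63, outside KVM_PFN_ERR_MASK. */
static inline bool is_noslot_pfn(kvm_pfn_t pfn)
{
	return pfn == KVM_PFN_NOSLOT;
}

/* Hypothetical labels for the three outcomes; not kernel identifiers. */
enum fault_route {
	ROUTE_MAP_PFN,		/* install the translation */
	ROUTE_EMULATE_MMIO,	/* kvm_handle_noslot_fault() */
	ROUTE_HANDLE_ERROR_PFN,	/* kvm_handle_error_pfn() */
};

/* Simplified model of the decision made after the pfn lookup. */
static inline enum fault_route route_pfn(kvm_pfn_t pfn)
{
	if (is_error_pfn(pfn)) {
		/* New case: pfnmap fault handler declined to install a PTE. */
		if (pfn == KVM_PFN_ERR_PFNMAP)
			return ROUTE_EMULATE_MMIO;
		return ROUTE_HANDLE_ERROR_PFN;
	}
	if (is_noslot_pfn(pfn))
		return ROUTE_EMULATE_MMIO;
	return ROUTE_MAP_PFN;
}
```

Under these assumptions, adding 5 to KVM_PFN_ERR_MASK leaves the mask bits
set, so is_error_pfn() still matches the new value, while only the exact
KVM_PFN_ERR_PFNMAP comparison diverts it to the MMIO emulation path.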
Signed-off-by: Mikhail Malyshev
---
 arch/x86/kvm/mmu/mmu.c   | 16 +++++++++++++++-
 include/linux/kvm_host.h |  8 ++++++++
 virt/kvm/kvm_main.c      |  9 ++++++++-
 3 files changed, 31 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 91843e9224d04..115e2c4db5fa0 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -4759,8 +4759,22 @@ static int kvm_mmu_faultin_pfn(struct kvm_vcpu *vcpu,
 	if (ret != RET_PF_CONTINUE)
 		return ret;
 
-	if (unlikely(is_error_pfn(fault->pfn)))
+	if (unlikely(is_error_pfn(fault->pfn))) {
+		/*
+		 * A passed-through PCI BAR is backed by a VM_IO/VM_PFNMAP
+		 * mapping whose fault handler refuses to install a PTE while the
+		 * device's memory space is disabled (e.g. the guest cleared
+		 * PCI_COMMAND.MEM). The fault then fails even though the memslot
+		 * is still valid. Treat such an access as MMIO and emulate it so
+		 * the guest observes Unsupported Request semantics, matching
+		 * bare metal, instead of killing the VM with -EFAULT. Genuine,
+		 * non-pfnmap errors still take the fatal path.
+		 */
+		if (fault->pfn == KVM_PFN_ERR_PFNMAP)
+			return kvm_handle_noslot_fault(vcpu, fault, access);
+
 		return kvm_handle_error_pfn(vcpu, fault);
+	}
 
 	if (WARN_ON_ONCE(!fault->slot || is_noslot_pfn(fault->pfn)))
 		return kvm_handle_noslot_fault(vcpu, fault, access);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 4c14aee1fb063..dc5973e400721 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -99,6 +99,14 @@
 #define KVM_PFN_ERR_RO_FAULT	(KVM_PFN_ERR_MASK + 2)
 #define KVM_PFN_ERR_SIGPENDING	(KVM_PFN_ERR_MASK + 3)
 #define KVM_PFN_ERR_NEEDS_IO	(KVM_PFN_ERR_MASK + 4)
+/*
+ * Faulting in a VM_IO/VM_PFNMAP mapping failed because the owner's fault
+ * handler declined to install a PTE, e.g. a passed-through PCI BAR whose
+ * device memory is currently disabled (the guest cleared PCI_COMMAND.MEM).
+ * The memslot is valid; the access should be treated as MMIO rather than a
+ * fatal -EFAULT.
+ */
+#define KVM_PFN_ERR_PFNMAP	(KVM_PFN_ERR_MASK + 5)
 
 /*
  * error pfns indicate that the gfn is in slot but faild to
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 881f92d7a469e..f232fc2f42380 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3015,7 +3015,14 @@ kvm_pfn_t hva_to_pfn(struct kvm_follow_pfn *kfp)
 		if (r == -EAGAIN)
 			goto retry;
 		if (r < 0)
-			pfn = KVM_PFN_ERR_FAULT;
+			/*
+			 * The owner's fault handler declined to install a PTE
+			 * (e.g. a passed-through PCI BAR with device memory
+			 * disabled). Flag it distinctly so the arch fault
+			 * handler can treat the access as MMIO instead of a
+			 * fatal -EFAULT.
+			 */
+			pfn = KVM_PFN_ERR_PFNMAP;
 	} else {
 		if ((kfp->flags & FOLL_NOWAIT) &&
 		    vma_is_valid(vma, kfp->flags & FOLL_WRITE))
-- 
2.43.0