From: Rick Edgecombe

Disallow hugepage promotion in the TDP MMU for mirror roots as KVM doesn't
currently support promoting S-EPT entries due to the complexity incurred by
the TDX-Module's rules for hugepage promotion.

 - The current TDX-Module requires all 4KB leaves to be either all PENDING
   or all ACCEPTED before a successful promotion to 2MB. This requirement
   prevents successful page merging after partially converting a 2MB
   range from private to shared and then back to private, which is the
   primary scenario necessitating page promotion.

 - The TDX-Module effectively requires a break-before-make sequence (to
   satisfy its TLB flushing rules), which creates a window of time where a
   different vCPU can encounter faults on a SPTE that KVM is trying to
   promote to a hugepage. To avoid unexpected BUSY errors, KVM would need
   to FREEZE the non-leaf SPTE before replacing it with a huge SPTE.

Disable hugepage promotion for all map() operations, as supporting page
promotion when building the initial image is still non-trivial, and the
vast majority of images are ~4MB or less, i.e. the benefit of creating
hugepages during TD build time is minimal.
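As a rough illustration of the first TDX-Module rule described in the changelog, the userspace C sketch below models a 2MB range as 512 per-4KB acceptance states and reports whether the range is uniform (all PENDING or all ACCEPTED). The enum tdx_page_state, the PTES_PER_2M constant, and can_promote_to_2m() are hypothetical names chosen for illustration; they are not TDX-Module or KVM APIs, and the sketch does not model the break-before-make or FREEZE requirements at all.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a private 4KB page's state (illustrative only, not TDX-Module ABI). */
enum tdx_page_state {
	PAGE_PENDING,
	PAGE_ACCEPTED,
};

/* A 2MB range covers 512 4KB leaf entries (2MB / 4KB). */
#define PTES_PER_2M 512

/*
 * Hypothetical check: returns true only if the range has exactly 512
 * entries and every entry shares the state of the first one, i.e. the
 * range is all PENDING or all ACCEPTED. Any mix of the two fails.
 */
bool can_promote_to_2m(const enum tdx_page_state *states, size_t n)
{
	assert(states != NULL);

	if (n != PTES_PER_2M)
		return false;

	for (size_t i = 1; i < n; i++) {
		if (states[i] != states[0])
			return false;
	}
	return true;
}
```

A range that mixes the two states, for example 511 ACCEPTED entries and one PENDING entry, fails this check; per the changelog, that mixed condition is what blocks merging a 2MB range after a partial private to shared to private conversion.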
Signed-off-by: Rick Edgecombe
Co-developed-by: Yan Zhao
Signed-off-by: Yan Zhao
[sean: check root, add comment, rewrite changelog]
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu/mmu.c     |  3 ++-
 arch/x86/kvm/mmu/tdp_mmu.c | 12 +++++++++++-
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 4ecbf216d96f..45650f70eeab 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -3419,7 +3419,8 @@ void disallowed_hugepage_adjust(struct kvm_page_fault *fault, u64 spte, int cur_
 	    cur_level == fault->goal_level &&
 	    is_shadow_present_pte(spte) &&
 	    !is_large_pte(spte) &&
-	    spte_to_child_sp(spte)->nx_huge_page_disallowed) {
+	    ((spte_to_child_sp(spte)->nx_huge_page_disallowed) ||
+	     is_mirror_sp(spte_to_child_sp(spte)))) {
 		/*
 		 * A small SPTE exists for this pfn, but FNAME(fetch),
 		 * direct_map(), or kvm_tdp_mmu_map() would like to create a
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 01e3e4f4baa5..f8ebdd0c6114 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1222,7 +1222,17 @@ int kvm_tdp_mmu_map(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 	for_each_tdp_pte(iter, kvm, root, fault->gfn, fault->gfn + 1) {
 		int r;
 
-		if (fault->nx_huge_page_workaround_enabled)
+		/*
+		 * Don't replace a page table (non-leaf) SPTE with a huge SPTE
+		 * (a.k.a. hugepage promotion) if the NX hugepage workaround is
+		 * enabled, as doing so will cause significant thrashing if one
+		 * or more leaf SPTEs needs to be executable.
+		 *
+		 * Disallow hugepage promotion for mirror roots as KVM doesn't
+		 * (yet) support promoting S-EPT entries while holding mmu_lock
+		 * for read (due to complexity induced by the TDX-Module APIs).
+		 */
+		if (fault->nx_huge_page_workaround_enabled || is_mirror_sp(root))
 			disallowed_hugepage_adjust(fault, iter.old_spte, iter.level);
 
 		/*
-- 
2.53.0.rc1.217.geba53bf80e-goog