From: Yan Zhao

TDX requires guests to accept S-EPT mappings created by the host KVM. Due to
the current implementation of the TDX module, if a guest accepts a GFN at a
lower level after KVM maps it at a higher level, the TDX module will emulate
an EPT violation VMExit to KVM instead of returning a size mismatch error to
the guest. If KVM fails to perform page splitting in the VMExit handler, the
guest's accept operation will be triggered again upon re-entering the guest,
causing a repeated EPT violation VMExit.

To facilitate passing the guest's accept level information to the KVM MMU
core, and to prevent the repeated mapping of a GFN at different levels due to
different accept levels specified by different vCPUs, introduce the interface
hugepage_set_guest_inhibit(). This interface specifies, across vCPUs, that
mapping at a certain level is inhibited by the guest.

Intentionally don't provide an API to clear KVM_LPAGE_GUEST_INHIBIT_FLAG for
the time being, as detecting that it's ok to (re)install a hugepage is tricky
(and costly if KVM wants to be 100% accurate), and KVM doesn't currently
support hugepage promotion (only direct installation of hugepages) for S-EPT.
As a result, the only scenario where clearing the flag would likely allow KVM
to install a hugepage is when an entire 2MiB / 1GiB range is converted to
shared or private. But if the guest is accepting at 4KiB granularity, odds
are good the guest is using the memory for something "special" and will never
convert the entire range to shared (and/or back to private). Punt that
optimization to the future, if it's ever needed.
Link: https://lore.kernel.org/all/a6ffe23fb97e64109f512fa43e9f6405236ed40a.camel@intel.com [1]
Suggested-by: Rick Edgecombe
Suggested-by: Sean Christopherson
Signed-off-by: Yan Zhao
[sean: explain *why* the flag is never cleared]
Signed-off-by: Sean Christopherson
---
 arch/x86/kvm/mmu.h     |  4 ++++
 arch/x86/kvm/mmu/mmu.c | 21 ++++++++++++++++++---
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 830f46145692..fa6a8daf4b05 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -322,4 +322,8 @@ static inline bool kvm_is_gfn_alias(struct kvm *kvm, gfn_t gfn)
 {
 	return gfn & kvm_gfn_direct_bits(kvm);
 }
+
+void hugepage_set_guest_inhibit(struct kvm_memory_slot *slot, gfn_t gfn, int level);
+bool hugepage_test_guest_inhibit(struct kvm_memory_slot *slot, gfn_t gfn, int level);
+
 #endif
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 45650f70eeab..c2765bfc8492 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -718,12 +718,14 @@ static struct kvm_lpage_info *lpage_info_slot(gfn_t gfn,
 }
 
 /*
- * The most significant bit in disallow_lpage tracks whether or not memory
- * attributes are mixed, i.e. not identical for all gfns at the current level.
+ * The 2 most significant bits in disallow_lpage track whether or not memory
+ * attributes are mixed, i.e. not identical for all gfns at the current level,
+ * and whether or not the guest inhibits the current hugepage level at the gfn.
 * The lower order bits are used to refcount other cases where a hugepage is
 * disallowed, e.g. if KVM has shadow a page table at the gfn.
 */
 #define KVM_LPAGE_MIXED_FLAG BIT(31)
+#define KVM_LPAGE_GUEST_INHIBIT_FLAG BIT(30)
 
 static void update_gfn_disallow_lpage_count(const struct kvm_memory_slot *slot,
 					    gfn_t gfn, int count)
@@ -736,7 +738,8 @@ static void update_gfn_disallow_lpage_count(const struct kvm_memory_slot *slot,
 
 		old = linfo->disallow_lpage;
 		linfo->disallow_lpage += count;
-		WARN_ON_ONCE((old ^ linfo->disallow_lpage) & KVM_LPAGE_MIXED_FLAG);
+		WARN_ON_ONCE((old ^ linfo->disallow_lpage) &
+			     (KVM_LPAGE_MIXED_FLAG | KVM_LPAGE_GUEST_INHIBIT_FLAG));
 	}
 }
 
@@ -1648,6 +1651,18 @@ static bool __kvm_rmap_zap_gfn_range(struct kvm *kvm,
 				     start, end - 1, can_yield, true, flush);
 }
 
+bool hugepage_test_guest_inhibit(struct kvm_memory_slot *slot, gfn_t gfn, int level)
+{
+	return lpage_info_slot(gfn, slot, level)->disallow_lpage & KVM_LPAGE_GUEST_INHIBIT_FLAG;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(hugepage_test_guest_inhibit);
+
+void hugepage_set_guest_inhibit(struct kvm_memory_slot *slot, gfn_t gfn, int level)
+{
+	lpage_info_slot(gfn, slot, level)->disallow_lpage |= KVM_LPAGE_GUEST_INHIBIT_FLAG;
+}
+EXPORT_SYMBOL_FOR_KVM_INTERNAL(hugepage_set_guest_inhibit);
+
 bool kvm_unmap_gfn_range(struct kvm *kvm, struct kvm_gfn_range *range)
 {
 	bool flush = false;
-- 
2.53.0.rc1.217.geba53bf80e-goog