TDX support for using the MWAIT instruction in a guest has issues, so
disable it for now.

Background

Like VMX, TDX can allow the MWAIT instruction to be executed in a guest.
Unlike VMX, TDX cannot necessarily virtualize the MSRs that a guest might
reasonably expect to exist alongside MWAIT.

For example, in a Linux guest, the default idle driver intel_idle may
access MSR_POWER_CTL or MSR_PKG_CST_CONFIG_CONTROL. To virtualize those,
KVM would need the guest not to enable #VE reduction, which is not
something that KVM can control or even be aware of. Note, however, that
the consequent unchecked MSR access errors might be harmless.

Without #VE reduction enabled, the TDX Module injects #VE for MSRs that
it does not virtualize itself. The guest can then hypercall the host VMM
for a resolution.

With #VE reduction enabled, accessing MSRs such as the two above results
in the TDX Module injecting #GP.

Currently, a Linux guest opts for #VE reduction unconditionally if it is
available, refer to reduce_unnecessary_ve(). However, the #VE reduction
feature was not added to the TDX Module until versions 1.5.09 and 2.0.04.
Refer to https://github.com/intel/tdx-module/releases

There is also a further issue experienced by a Linux guest. Prior to
TDX Module versions 1.5.09 and 2.0.04, the Always-Running-APIC-Timer
(ARAT) feature (CPUID leaf 6: EAX bit 2) is not exposed. That results in
cpuidle disabling the timer interrupt and invoking the Tick Broadcast
framework to provide a wake-up. Currently, that falls back to the PIT
timer, which does not work for TDX, resulting in the guest becoming stuck
in the idle loop.

Conclusion

Users may expect TDX support of MWAIT in a guest to be similar to VMX
support, but KVM cannot ensure that. Consequently, KVM should not expose
the capability.

Fixes: 0186dd29a2518 ("KVM: TDX: add ioctl to initialize VM with TDX specific parameters")
Signed-off-by: Adrian Hunter
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/vmx/tdx.c          | 22 +++++++++++++++++++++-
 arch/x86/kvm/x86.c              |  8 +++++---
 3 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f7af967aa16f..9c8617217adb 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1398,6 +1398,8 @@ struct kvm_arch {
 
 	gpa_t wall_clock;
 
+	u64 unsupported_disable_exits;
+
 	bool mwait_in_guest;
 	bool hlt_in_guest;
 	bool pause_in_guest;
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 9ad460ef97b0..cdf0dc6cf068 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -132,6 +132,17 @@ static void clear_waitpkg(struct kvm_cpuid_entry2 *entry)
 	entry->ecx &= ~__feature_bit(X86_FEATURE_WAITPKG);
 }
 
+static bool has_mwait(const struct kvm_cpuid_entry2 *entry)
+{
+	return entry->function == 1 &&
+	       (entry->ecx & __feature_bit(X86_FEATURE_MWAIT));
+}
+
+static void clear_mwait(struct kvm_cpuid_entry2 *entry)
+{
+	entry->ecx &= ~__feature_bit(X86_FEATURE_MWAIT);
+}
+
 static void tdx_clear_unsupported_cpuid(struct kvm_cpuid_entry2 *entry)
 {
 	if (has_tsx(entry))
@@ -139,11 +150,15 @@ static void tdx_clear_unsupported_cpuid(struct kvm_cpuid_entry2 *entry)
 
 	if (has_waitpkg(entry))
 		clear_waitpkg(entry);
+
+	/* Also KVM_X86_DISABLE_EXITS_MWAIT is disallowed in tdx_vm_init() */
+	if (has_mwait(entry))
+		clear_mwait(entry);
 }
 
 static bool tdx_unsupported_cpuid(const struct kvm_cpuid_entry2 *entry)
 {
-	return has_tsx(entry) || has_waitpkg(entry);
+	return has_tsx(entry) || has_waitpkg(entry) || has_mwait(entry);
 }
 
 #define KVM_TDX_CPUID_NO_SUBLEAF	((__u32)-1)
@@ -615,6 +630,11 @@ int tdx_vm_init(struct kvm *kvm)
 	kvm->arch.has_protected_state = true;
 	kvm->arch.has_private_mem = true;
 	kvm->arch.disabled_quirks |= KVM_X86_QUIRK_IGNORE_GUEST_PAT;
+	/*
+	 * TDX support for using the MWAIT instruction in a guest has issues,
+	 * so disable it for now. See also tdx_clear_unsupported_cpuid().
+	 */
+	kvm->arch.unsupported_disable_exits |= KVM_X86_DISABLE_EXITS_MWAIT;
 
 	/*
 	 * Because guest TD is protected, VMM can't parse the instruction in TD.
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 93636f77c42d..bfd4f52286b8 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4575,7 +4575,7 @@ static inline bool kvm_can_mwait_in_guest(void)
 		boot_cpu_has(X86_FEATURE_ARAT);
 }
 
-static u64 kvm_get_allowed_disable_exits(void)
+static u64 kvm_get_allowed_disable_exits(struct kvm *kvm)
 {
 	u64 r = KVM_X86_DISABLE_EXITS_PAUSE;
 
@@ -4586,6 +4586,8 @@ static u64 kvm_get_allowed_disable_exits(void)
 		if (kvm_can_mwait_in_guest())
 			r |= KVM_X86_DISABLE_EXITS_MWAIT;
 	}
+	if (kvm)
+		r &= ~kvm->arch.unsupported_disable_exits;
 	return r;
 }
 
@@ -4736,7 +4738,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 		r = KVM_CLOCK_VALID_FLAGS;
 		break;
 	case KVM_CAP_X86_DISABLE_EXITS:
-		r = kvm_get_allowed_disable_exits();
+		r = kvm_get_allowed_disable_exits(kvm);
 		break;
 	case KVM_CAP_X86_SMM:
 		if (!IS_ENABLED(CONFIG_KVM_SMM))
@@ -6613,7 +6615,7 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		break;
 	case KVM_CAP_X86_DISABLE_EXITS:
 		r = -EINVAL;
-		if (cap->args[0] & ~kvm_get_allowed_disable_exits())
+		if (cap->args[0] & ~kvm_get_allowed_disable_exits(kvm))
 			break;
 
 		mutex_lock(&kvm->lock);
-- 
2.48.1
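
For readers who want to reason about the effect of the per-VM mask without
building a kernel, the following is a minimal userspace model of the logic
in the x86.c hunks above. It is only a sketch: struct model_vm,
model_allowed_disable_exits() and model_enable_cap() are hypothetical names
invented for illustration, and the host-side conditions that the real
kvm_get_allowed_disable_exits() checks are collapsed into a single
host_can_mwait boolean. The bit values are assumed to match the
KVM_X86_DISABLE_EXITS_* definitions in the KVM uapi header.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to mirror KVM_X86_DISABLE_EXITS_* from <linux/kvm.h>. */
#define DISABLE_EXITS_MWAIT   (1u << 0)
#define DISABLE_EXITS_HLT     (1u << 1)
#define DISABLE_EXITS_PAUSE   (1u << 2)
#define DISABLE_EXITS_CSTATE  (1u << 3)

/* Hypothetical stand-in for the one kvm->arch field the patch adds. */
struct model_vm {
	uint64_t unsupported_disable_exits;
};

/*
 * Model of kvm_get_allowed_disable_exits(kvm): start from what the host
 * could allow, then remove whatever this particular VM has marked as
 * unsupported. A NULL vm models the system-scope query, where no per-VM
 * mask exists.
 */
static uint64_t model_allowed_disable_exits(const struct model_vm *vm,
					    bool host_can_mwait)
{
	uint64_t r = DISABLE_EXITS_PAUSE | DISABLE_EXITS_HLT |
		     DISABLE_EXITS_CSTATE;

	if (host_can_mwait)
		r |= DISABLE_EXITS_MWAIT;
	if (vm)
		r &= ~vm->unsupported_disable_exits;
	return r;
}

/*
 * Model of the KVM_CAP_X86_DISABLE_EXITS check in the enable-cap path:
 * any requested bit outside the allowed set is rejected with -EINVAL.
 */
static int model_enable_cap(const struct model_vm *vm, uint64_t requested,
			    bool host_can_mwait)
{
	if (requested & ~model_allowed_disable_exits(vm, host_can_mwait))
		return -EINVAL;
	return 0;
}
```

In this model, a VM whose unsupported_disable_exits has the MWAIT bit set
(as tdx_vm_init() does in the patch) no longer reports the MWAIT bit and
rejects a request to enable it, while the other bits and VMs without the
mask are unaffected.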