KVM tracks when EFER.SVME is set and cleared to initialize and tear down nested state. However, it doesn't differentiate if EFER.SVME is getting toggled in L1 or L2+. If L2 clears EFER.SVME, and L1 does not intercept the EFER write, KVM exits guest mode and tears down nested state while L2 is running, executing L1 without injecting a proper #VMEXIT. According to the APM: The effect of turning off EFER.SVME while a guest is running is undefined; therefore, the VMM should always prevent guests from writing EFER. Since the behavior is architecturally undefined, KVM gets to choose what to do. Inject a triple fault into L1 as a more graceful option that running L1 with corrupted state. Co-developed-by: Sean Christopherson Signed-off-by: Sean Christopherson Signed-off-by: Yosry Ahmed --- arch/x86/kvm/svm/svm.c | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 5f0136dbdde6..ccd73a3be3f9 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -216,6 +216,17 @@ int svm_set_efer(struct kvm_vcpu *vcpu, u64 efer) if ((old_efer & EFER_SVME) != (efer & EFER_SVME)) { if (!(efer & EFER_SVME)) { + /* + * Architecturally, clearing EFER.SVME while a guest is + * running yields undefined behavior, i.e. KVM can do + * literally anything. Force the vCPU back into L1 as + * that is the safest option for KVM, but synthesize a + * triple fault (for L1!) so that KVM at least doesn't + * run random L2 code in the context of L1. + */ + if (is_guest_mode(vcpu)) + kvm_make_request(KVM_REQ_TRIPLE_FAULT, vcpu); + svm_leave_nested(vcpu); /* #GP intercept is still needed for vmware backdoor */ if (!enable_vmware_backdoor) -- 2.53.0.rc2.204.g2597b5adb4-goog