KVM currently uses the value of CR2 from vmcb02 to update vmcb12 on
nested #VMEXIT. Use the value from vcpu->arch.cr2 instead.

The value in vcpu->arch.cr2 is sync'd to vmcb02 shortly before a VMRUN
of L2, and sync'd back to vcpu->arch.cr2 shortly after. The values are
only out-of-sync in two cases: after migration, and after a #PF is
injected into L2.

After migration, the value of CR2 in vmcb02 is uninitialized (i.e.
zero), as KVM_SET_SREGS restores the CR2 value to vcpu->arch.cr2. Using
vcpu->arch.cr2 to update vmcb12 is the right thing to do.

The #PF injection case is more nuanced. It occurs if KVM injects a #PF
into L2, then exits to L1 before it actually runs L2. Although the APM
is a bit unclear about when CR2 is written during a #PF, the SDM is
clearer:

  Processors update CR2 whenever a page fault is detected. If a
  second page fault occurs while an earlier page fault is being
  delivered, the faulting linear address of the second fault will
  overwrite the contents of CR2 (replacing the previous address).
  These updates to CR2 occur even if the page fault results in a
  double fault or occurs during the delivery of a double fault.

KVM injecting the exception surely counts as the #PF being "detected".
More importantly, when an exception is injected into L2 at the time of
a synthesized #VMEXIT, KVM updates exit_int_info in vmcb12 accordingly,
such that an L1 hypervisor can re-inject the exception. If CR2 is not
written at that point, the L1 hypervisor has no way of correctly
re-injecting the #PF. Hence, vcpu->arch.cr2 is also the right value to
write to vmcb12 in this case.

Note that KVM does _not_ update vcpu->arch.cr2 when a #PF is pending
for L2, only when it is injected. The distinction is important, because
only injected exceptions are propagated to L1 through exit_int_info. It
would be incorrect to update CR2 in vmcb12 for a pending #PF, as L1
would perceive an updated CR2 value with no #PF.
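
The pending vs. injected distinction and the two out-of-sync cases can
be illustrated with a small standalone model. This is only a sketch and
not KVM code: every toy_* name and struct below is hypothetical, and
the model ignores everything except CR2 and a single #PF.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model only; all names are hypothetical, not kernel symbols. */
struct toy_vcpu {
	uint64_t arch_cr2;	/* stands in for vcpu->arch.cr2 */
	uint64_t vmcb02_cr2;	/* stands in for vmcb02->save.cr2 */
	bool pf_pending;	/* #PF queued, payload not yet delivered */
	bool pf_injected;	/* #PF injected, payload delivered to CR2 */
	uint64_t pf_payload;	/* faulting address carried by the #PF */
};

struct toy_vmcb12 {
	uint64_t cr2;
	bool exit_int_info_valid;	/* L1 is told about an injected event */
};

/* Migration restore: only the arch copy is written, vmcb02 stays zero. */
static void toy_set_sregs(struct toy_vcpu *v, uint64_t cr2)
{
	v->arch_cr2 = cr2;
}

/* Queue a #PF for L2: the payload is held back and CR2 is untouched. */
static void toy_queue_pf(struct toy_vcpu *v, uint64_t addr)
{
	v->pf_pending = true;
	v->pf_payload = addr;
}

/* Inject the pending #PF: only now is the payload written to CR2. */
static void toy_inject_pf(struct toy_vcpu *v)
{
	if (!v->pf_pending)
		return;
	v->pf_pending = false;
	v->pf_injected = true;
	v->arch_cr2 = v->pf_payload;
}

/* Nested #VMEXIT before L2 runs: fill vmcb12 from the arch copy. */
static void toy_nested_vmexit(const struct toy_vcpu *v,
			      struct toy_vmcb12 *vmcb12)
{
	vmcb12->cr2 = v->arch_cr2;
	vmcb12->exit_int_info_valid = v->pf_injected;
}
```

In this model, a #VMEXIT taken while the #PF is merely pending leaves
the CR2 value seen by L1 unchanged and reports no event, while a
#VMEXIT taken after injection reports both the event and the new CR2.
The vmcb02 copy is stale in both the migration and the injection case.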
Update the comment in kvm_deliver_exception_payload() to clarify this.

Signed-off-by: Yosry Ahmed
---
 arch/x86/kvm/svm/nested.c | 2 +-
 arch/x86/kvm/x86.c        | 7 +++++++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index de90b104a0dd5..9031746ce2db1 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1156,7 +1156,7 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 	vmcb12->save.efer = svm->vcpu.arch.efer;
 	vmcb12->save.cr0 = kvm_read_cr0(vcpu);
 	vmcb12->save.cr3 = kvm_read_cr3(vcpu);
-	vmcb12->save.cr2 = vmcb02->save.cr2;
+	vmcb12->save.cr2 = vcpu->arch.cr2;
 	vmcb12->save.cr4 = svm->vcpu.arch.cr4;
 	vmcb12->save.rflags = kvm_get_rflags(vcpu);
 	vmcb12->save.rip = kvm_rip_read(vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index db3f393192d94..1015522d0fbd7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -864,6 +864,13 @@ static void kvm_multiple_exception(struct kvm_vcpu *vcpu, unsigned int nr,
 		vcpu->arch.exception.error_code = error_code;
 		vcpu->arch.exception.has_payload = has_payload;
 		vcpu->arch.exception.payload = payload;
+		/*
+		 * Only injected exceptions are propagated to L1 in
+		 * vmcb12/vmcs12 on nested #VMEXIT. Hence, do not deliver the
+		 * exception payload for L2 until the exception is injected.
+		 * Otherwise, L1 would perceive the updated payload without a
+		 * corresponding exception.
+		 */
 		if (!is_guest_mode(vcpu))
 			kvm_deliver_exception_payload(vcpu,
 						      &vcpu->arch.exception);
-- 
2.53.0.rc2.204.g2597b5adb4-goog