KVM currently uses the value of CR2 from vmcb02 to update vmcb12 on
nested #VMEXIT. Use the value from vcpu->arch.cr2 instead.

The value in vcpu->arch.cr2 is sync'd to vmcb02 shortly before a VMRUN
of L2, and sync'd back to vcpu->arch.cr2 shortly after. The two values
are only out-of-sync in two cases: after migration, and after a #PF is
injected into L2.

After migration, the value of CR2 in vmcb02 is uninitialized (i.e.
zero), as KVM_SET_SREGS restores the CR2 value to vcpu->arch.cr2. Using
vcpu->arch.cr2 to update vmcb12 is the right thing to do.

The #PF injection case is more nuanced. It occurs if KVM injects a #PF
into L2, then exits to L1 before it actually runs L2. Although the APM
is a bit unclear about when CR2 is written during a #PF, the SDM is
clearer:

	Processors update CR2 whenever a page fault is detected. If a
	second page fault occurs while an earlier page fault is being
	delivered, the faulting linear address of the second fault will
	overwrite the contents of CR2 (replacing the previous address).
	These updates to CR2 occur even if the page fault results in a
	double fault or occurs during the delivery of a double fault.

KVM injecting the exception surely counts as the #PF being "detected".
More importantly, when an exception is injected into L2 at the time of a
synthesized #VMEXIT, KVM updates exit_int_info in vmcb12 accordingly,
such that an L1 hypervisor can re-inject the exception. If CR2 is not
written at that point, the L1 hypervisor has no way of correctly
re-injecting the #PF. Hence, vcpu->arch.cr2 is also the right value to
write to vmcb12 in this case.

Note that KVM does _not_ update vcpu->arch.cr2 when a #PF is pending for
L2, only when it is injected. The distinction is important, because only
injected exceptions are propagated to L1 through exit_int_info. It would
be incorrect to update CR2 in vmcb12 for a pending #PF, as L1 would
perceive an updated CR2 value with no #PF. Add a comment above the
kvm_deliver_exception_payload() call in kvm_multiple_exception() to
clarify this.

Signed-off-by: Yosry Ahmed <yosry.ahmed@linux.dev>
---
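Not part of the changelog: a rough userspace sketch of the two
out-of-sync cases described above, in case it helps review. It is a toy
model only. The toy_* names and struct toy_vcpu are made up for
illustration and are not KVM code; the model assumes nothing beyond what
the changelog states (sync before VMRUN, sync back after, KVM_SET_SREGS
and #PF injection writing only vcpu->arch.cr2).

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only; this is NOT the real struct kvm_vcpu or VMCB layout. */
struct toy_vcpu {
	uint64_t arch_cr2;	/* stands in for vcpu->arch.cr2 */
	uint64_t vmcb02_cr2;	/* stands in for vmcb02->save.cr2 */
};

/* Shortly before VMRUN of L2: arch state is copied into vmcb02. */
static inline void toy_pre_run(struct toy_vcpu *v)
{
	v->vmcb02_cr2 = v->arch_cr2;
}

/* Shortly after L2 stops running: vmcb02 is copied back to arch state. */
static inline void toy_post_run(struct toy_vcpu *v)
{
	v->arch_cr2 = v->vmcb02_cr2;
}

/* Restore after migration: only the arch copy is written. */
static inline void toy_set_sregs(struct toy_vcpu *v, uint64_t cr2)
{
	v->arch_cr2 = cr2;
}

/* Injecting a #PF delivers the fault address to the arch copy only. */
static inline void toy_inject_pf(struct toy_vcpu *v, uint64_t fault_addr)
{
	v->arch_cr2 = fault_addr;
}

/* CR2 saved to vmcb12 on nested #VMEXIT, before and after this patch. */
static inline uint64_t toy_vmexit_cr2_old(const struct toy_vcpu *v)
{
	return v->vmcb02_cr2;
}

static inline uint64_t toy_vmexit_cr2_new(const struct toy_vcpu *v)
{
	return v->arch_cr2;
}

/* Walk through both cases; the asserts document the expected values. */
static inline void toy_demo(void)
{
	struct toy_vcpu v = { 0, 0 };

	/* Case 1: nested #VMEXIT right after migration, before any VMRUN. */
	toy_set_sregs(&v, 0x1000);
	assert(toy_vmexit_cr2_old(&v) == 0);		/* stale zero */
	assert(toy_vmexit_cr2_new(&v) == 0x1000);

	/* A normal run of L2 brings the two copies back in sync. */
	toy_pre_run(&v);
	toy_post_run(&v);
	assert(v.arch_cr2 == v.vmcb02_cr2);

	/* Case 2: #PF injected into L2, then exit to L1 before running L2. */
	toy_inject_pf(&v, 0x2000);
	assert(toy_vmexit_cr2_old(&v) == 0x1000);	/* pre-fault value */
	assert(toy_vmexit_cr2_new(&v) == 0x2000);
}
```

Calling toy_demo() from a small main() exercises both cases. In each
one, only the arch copy holds the value L1 should observe in vmcb12.
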
 arch/x86/kvm/svm/nested.c | 2 +-
 arch/x86/kvm/x86.c        | 7 +++++++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
index de90b104a0dd5..9031746ce2db1 100644
--- a/arch/x86/kvm/svm/nested.c
+++ b/arch/x86/kvm/svm/nested.c
@@ -1156,7 +1156,7 @@ int nested_svm_vmexit(struct vcpu_svm *svm)
 	vmcb12->save.efer   = svm->vcpu.arch.efer;
 	vmcb12->save.cr0    = kvm_read_cr0(vcpu);
 	vmcb12->save.cr3    = kvm_read_cr3(vcpu);
-	vmcb12->save.cr2    = vmcb02->save.cr2;
+	vmcb12->save.cr2    = vcpu->arch.cr2;
 	vmcb12->save.cr4    = svm->vcpu.arch.cr4;
 	vmcb12->save.rflags = kvm_get_rflags(vcpu);
 	vmcb12->save.rip    = kvm_rip_read(vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index db3f393192d94..1015522d0fbd7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -864,6 +864,13 @@ static void kvm_multiple_exception(struct kvm_vcpu *vcpu, unsigned int nr,
 		vcpu->arch.exception.error_code = error_code;
 		vcpu->arch.exception.has_payload = has_payload;
 		vcpu->arch.exception.payload = payload;
+		/*
+		 * Only injected exceptions are propagated to L1 in
+		 * vmcb12/vmcs12 on nested #VMEXIT. Hence, do not deliver the
+		 * exception payload for L2 until the exception is injected.
+		 * Otherwise, L1 would perceive the updated payload without a
+		 * corresponding exception.
+		 */
 		if (!is_guest_mode(vcpu))
 			kvm_deliver_exception_payload(vcpu,
 						      &vcpu->arch.exception);
-- 
2.53.0.rc2.204.g2597b5adb4-goog