When L2 runs under nested NPT and uses PAE paging, KVM's cached PDPTRs in mmu->pdptrs[] can hold stale or wrong values after nested transitions and across migration restore, because both nested_svm_load_cr3() and svm_get_nested_state_pages() only refresh PDPTRs on the !nested_npt path. The user-visible bug is on migration restore of an L2 running with nested NPT and 32-bit PAE paging, if userspace uses KVM_SET_SREGS rather than KVM_SET_SREGS2. In that case, load_pdptrs() leaves VCPU_EXREG_PDPTR marked as available, and kvm_pdptr_read() will use a stale translation that used L1 GPAs instead of L2 nGPAs. svm_get_nested_state_pages() runs on first KVM_RUN but skips the refresh because nested_npt_enabled() is true. The CPU itself reads L2's PDPTRs correctly from memory via L1's NPT, but KVM-side walking of guest PAE page tables uses the bogus cached values. Unlike Intel's GUEST_PDPTR0..3 fields in the VMCS, SVM has no VMCB-cached PDPTR state: the in-memory PDPTEs at the current CR3 are the only source of truth, and svm_cache_reg(VCPU_EXREG_PDPTR) simply reloads them from memory via load_pdptrs(). Clearing the avail bit (and the dirty bit because !avail/dirty is invalid) to force a reload when PDPTRs as needed fixes the bug. Do the same for nested_svm_load_cr3()'s nested_npt branch, so that the invariant "PDPTRs need reloading" is handled similarly for both immediate and deferred loading. Signed-off-by: Paolo Bonzini --- arch/x86/kvm/kvm_cache_regs.h | 8 ++++++++ arch/x86/kvm/svm/nested.c | 27 ++++++++++++++++++--------- 2 files changed, 26 insertions(+), 9 deletions(-) diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h index 2ae492ad6412..6bae5db5a54e 100644 --- a/arch/x86/kvm/kvm_cache_regs.h +++ b/arch/x86/kvm/kvm_cache_regs.h @@ -77,6 +77,14 @@ static inline bool kvm_register_is_dirty(struct kvm_vcpu *vcpu, return test_bit(reg, vcpu->arch.regs_dirty); } +static inline void kvm_register_mark_for_reload(struct kvm_vcpu *vcpu, + enum kvm_reg reg) +{ + kvm_assert_register_caching_allowed(vcpu); + __clear_bit(reg, vcpu->arch.regs_avail); + __clear_bit(reg, vcpu->arch.regs_dirty); +} + static inline void kvm_register_mark_available(struct kvm_vcpu *vcpu, enum kvm_reg reg) { diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c index 3d1fd1776e19..aa5a1d8ea136 100644 --- a/arch/x86/kvm/svm/nested.c +++ b/arch/x86/kvm/svm/nested.c @@ -680,9 +680,12 @@ static int nested_svm_load_cr3(struct kvm_vcpu *vcpu, unsigned long cr3, if (CC(!kvm_vcpu_is_legal_cr3(vcpu, cr3))) return -EINVAL; - if (reload_pdptrs && !nested_npt && is_pae_paging(vcpu) && - CC(!load_pdptrs(vcpu, cr3))) - return -EINVAL; + if (reload_pdptrs && is_pae_paging(vcpu)) { + if (nested_npt) + kvm_register_mark_for_reload(vcpu, VCPU_REG_PDPTR); + else if (CC(!load_pdptrs(vcpu, cr3))) + return -EINVAL; + } vcpu->arch.cr3 = cr3; @@ -2055,15 +2058,21 @@ static bool svm_get_nested_state_pages(struct kvm_vcpu *vcpu) if (WARN_ON(!is_guest_mode(vcpu))) return true; - if (!vcpu->arch.pdptrs_from_userspace && - !nested_npt_enabled(to_svm(vcpu)) && is_pae_paging(vcpu)) + if (is_pae_paging(vcpu)) { /* - * Reload the guest's PDPTRs since after a migration - * the guest CR3 might be restored prior to setting the nested - * state which can lead to a load of wrong PDPTRs. + * After migration, CR3 may have been restored before + * KVM_SET_NESTED_STATE, so the PDPTR load into mmu->pdptrs[] + * may have treated CR3 as an L1 GPA. For nNPT, drop the + * cache so the next access reloads them with the proper + * nGPA translation. For !nNPT, reload eagerly unless userspace + * already supplied authoritative PDPTRs via KVM_SET_SREGS2. */ - if (CC(!load_pdptrs(vcpu, vcpu->arch.cr3))) + if (nested_npt_enabled(to_svm(vcpu))) + kvm_register_mark_for_reload(vcpu, VCPU_REG_PDPTR); + else if (!vcpu->arch.pdptrs_from_userspace && + CC(!load_pdptrs(vcpu, vcpu->arch.cr3))) return false; + } if (!nested_svm_merge_msrpm(vcpu)) { vcpu->run->exit_reason = KVM_EXIT_INTERNAL_ERROR; -- 2.52.0