From: Omar Sandoval We've been seeing guest VM kernel panics with "Oops: int3" coming from static branch checks. I found that the reported RIP is one instruction after where it's supposed to be, and I determined that when this happens, the RIP was injected by __svm_skip_emulated_instruction() on the host when a nested page fault was hit while vectoring an int3. Static branches use the smp_text_poke mechanism, which works by temporarily inserting an int3, then overwriting it. In the meantime, smp_text_poke_int3_handler() relies on getting an accurate RIP. This temporary int3 from smp_text_poke is triggering the exact scenario described in the fixes commit: "if the guest executes INTn (no KVM injection), an exit occurs while vectoring the INTn, and the guest modifies the code stream while the exit is being handled, KVM will compute the incorrect next_rip due to "skipping" the wrong instruction." I'm able to reproduce this almost instantly by patching the guest kernel to hammer on a static branch [1] while a drgn script [2] on the host constantly swaps out the memory containing the guest's Task State Segment. The fixes commit also suggests a workaround: "A future enhancement to make this less awful would be for KVM to detect that the decoded instruction is not the correct INTn and drop the to-be-injected soft event." This implements that workaround. [1]: https://gist.github.com/osandov/44d17c51c28c0ac998ea0334edf90b5a [2]: https://gist.github.com/osandov/10e45e45afa29b11e0c7209247afc00b Fixes: 6ef88d6e36c2 ("KVM: SVM: Re-inject INT3/INTO instead of retrying the instruction") Signed-off-by: Omar Sandoval --- Based on Linus's current tree. arch/x86/kvm/svm/svm.c | 40 ++++++++++++++++++++++++++++++++++++---- 1 file changed, 36 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c index 153c12dbf3eb..4d72c308b50b 100644 --- a/arch/x86/kvm/svm/svm.c +++ b/arch/x86/kvm/svm/svm.c @@ -271,8 +271,29 @@ static void svm_set_interrupt_shadow(struct kvm_vcpu *vcpu, int mask) } +static bool emulated_instruction_matches_vector(struct kvm_vcpu *vcpu, + unsigned int vector) +{ + switch (vector) { + case BP_VECTOR: + return vcpu->arch.emulate_ctxt->b == 0xcc || + (vcpu->arch.emulate_ctxt->b == 0xcd && + vcpu->arch.emulate_ctxt->src.val == BP_VECTOR); + case OF_VECTOR: + return vcpu->arch.emulate_ctxt->b == 0xce; + default: + return false; + } +} + +/* + * If vector != 0, then this skips the instruction only if the instruction could + * generate an interrupt with that vector. If not, then it fails, indicating + * that the instruction should be retried. + */ static int __svm_skip_emulated_instruction(struct kvm_vcpu *vcpu, - bool commit_side_effects) + bool commit_side_effects, + unsigned int vector) { struct vcpu_svm *svm = to_svm(vcpu); unsigned long old_rflags; @@ -293,8 +314,18 @@ static int __svm_skip_emulated_instruction(struct kvm_vcpu *vcpu, if (unlikely(!commit_side_effects)) old_rflags = svm->vmcb->save.rflags; - if (!kvm_emulate_instruction(vcpu, EMULTYPE_SKIP)) + if (vector == 0) { + if (!kvm_emulate_instruction(vcpu, EMULTYPE_SKIP)) + return 0; + } else if (x86_decode_emulated_instruction(vcpu, EMULTYPE_SKIP, + NULL, + 0) != EMULATION_OK || + !emulated_instruction_matches_vector(vcpu, vector) || + !kvm_emulate_instruction(vcpu, + EMULTYPE_SKIP | + EMULTYPE_NO_DECODE)) { return 0; + } if (unlikely(!commit_side_effects)) svm->vmcb->save.rflags = old_rflags; @@ -311,7 +342,7 @@ static int __svm_skip_emulated_instruction(struct kvm_vcpu *vcpu, static int svm_skip_emulated_instruction(struct kvm_vcpu *vcpu) { - return __svm_skip_emulated_instruction(vcpu, true); + return __svm_skip_emulated_instruction(vcpu, true, 0); } static int svm_update_soft_interrupt_rip(struct kvm_vcpu *vcpu) @@ -331,7 +362,8 @@ static int svm_update_soft_interrupt_rip(struct kvm_vcpu *vcpu) * in use, the skip must not commit any side effects such as clearing * the interrupt shadow or RFLAGS.RF. */ - if (!__svm_skip_emulated_instruction(vcpu, !nrips)) + if (!__svm_skip_emulated_instruction(vcpu, !nrips, + vcpu->arch.exception.vector)) return -EIO; rip = kvm_rip_read(vcpu); -- 2.51.0