Use a global asm label to get the expected IP address for nested NMI interception instead of reading a hardcoded offset from the stack. the NMI test in eventinj.c verifies that a nested NMI occurs immediately at the return address (IP register) in the IRET frame, as IRET opens the NMI window. Currently, nested_nmi_iret_isr() reads the return address using a magic offset (iret_stack[-3]), which is unclear and may break if more values are pushed to the "iret_stack". To improve readability, add a global 'ip_after_iret' label for the expected return address, push it to the IRET frame, and verify it matches the interrupted address in the nested NMI handler. Also make 'iret_stack' local to nmi_iret_isr() since it isn't accessed anywhere else. Signed-off-by: Chao Gao --- Changes in v2: - make 'iret_stack' local to nmi_iret_isr(). x86/eventinj.c | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/x86/eventinj.c b/x86/eventinj.c index 6fbb2d0f..3f28b9b5 100644 --- a/x86/eventinj.c +++ b/x86/eventinj.c @@ -126,13 +126,13 @@ static void nmi_isr(struct ex_regs *r) printf("After nested NMI to self\n"); } -unsigned long *iret_stack; +extern char ip_after_iret[]; static void nested_nmi_iret_isr(struct ex_regs *r) { printf("Nested NMI isr running rip=%lx\n", r->rip); - if (r->rip == iret_stack[-3]) + if (r->rip == (unsigned long)ip_after_iret) test_count++; } @@ -156,11 +156,11 @@ asm("do_iret:" "mov %cs, %ecx \n\t" "push"W" %"R "cx \n\t" #ifndef __x86_64__ - "push"W" $2f \n\t" + "push"W" $ip_after_iret \n\t" "cmpb $0, no_test_device\n\t" // see if need to flush #else - "leaq 2f(%rip), %rbx \n\t" + "leaq ip_after_iret(%rip), %rbx \n\t" "pushq %rbx \n\t" "mov no_test_device(%rip), %bl \n\t" @@ -170,13 +170,17 @@ asm("do_iret:" "outl %eax, $0xe4 \n\t" // flush page "1: \n\t" "iret"W" \n\t" - "2: xchg %"R "dx, %"R "sp \n\t" // point to old stack + ".global ip_after_iret \n\t" + "ip_after_iret: \n\t" + "xchg %"R "dx, %"R "sp \n\t" // point to old stack "ret\n\t" ); static void nmi_iret_isr(struct ex_regs *r) { unsigned long *s = alloc_page(); + unsigned long *iret_stack; + test_count++; printf("NMI isr running stack %p\n", s); handle_exception(2, nested_nmi_iret_isr); -- 2.47.3 Push the current stack segment (SS) and stack pointer (SP) to the IRET frame in do_iret() to ensure a valid stack after IRET execution, particularly in 64-bit mode. Currently, do_iret() crafts an IRET frame with a zeroed SP. For 32-bit guests, SP is not popped during IRET due to no privilege change. so, the stack after IRET is still valid. But for 64-bit guests, IRET unconditionally pops RSP, restoring it to zero. This can cause a nested NMI to push its interrupt frame to the topmost page (0xffffffff_fffff000), which may be not mapped and cause triple fault [1]. To fix this issue, push the current SS & SP to the IRET frame, ensuring SS & SP are restored to a valid stack in 64-bit mode. Opportunistically, drop the 'extern bool no_test_device', as fwcfg.h gets included by eventinj.c. Reported-by: Sean Christopherson Signed-off-by: Chao Gao Link: https://lore.kernel.org/kvm/aMahfvF1r39Xq6zK@intel.com/ # [1] --- Changes in v2: - push SS to the IRET frame as well - push SS and SP for x86-64 only as IRET in 64 bit mode unconditionally pops SS:SP - remove "extern bool no_test_device" x86/eventinj.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/x86/eventinj.c b/x86/eventinj.c index 3f28b9b5..e72efa16 100644 --- a/x86/eventinj.c +++ b/x86/eventinj.c @@ -138,11 +138,9 @@ static void nested_nmi_iret_isr(struct ex_regs *r) extern void do_iret(ulong phys_stack, void *virt_stack); -// Return to same privilege level won't pop SS or SP, so +// Return to same privilege level won't pop SS or SP for i386 but x86-64, so // save it in RDX while we run on the nested stack -extern bool no_test_device; - asm("do_iret:" #ifdef __x86_64__ "mov %rdi, %rax \n\t" // phys_stack @@ -152,6 +150,12 @@ asm("do_iret:" "mov 8(%esp), %edx \n\t" // virt_stack #endif "xchg %"R "dx, %"R "sp \n\t" // point to new stack +#ifdef __x86_64__ + // IRET in 64 bit mode unconditionally pops SS:xSP + "mov %ss, %ecx \n\t" + "push"W" %"R "cx \n\t" + "push"W" %"R "sp \n\t" +#endif "pushf"W" \n\t" "mov %cs, %ecx \n\t" "push"W" %"R "cx \n\t" -- 2.47.3