Use a global asm label to get the expected IP address for nested NMI interception instead of reading a hardcoded offset from the stack. the NMI test in eventinj.c verifies that a nested NMI occurs immediately at the return address (IP register) in the IRET frame, as IRET opens the NMI window. Currently, nested_nmi_iret_isr() reads the return address using a magic offset (iret_stack[-3]), which is unclear and may break if more values are pushed to the "iret_stack". To improve readability, add a global 'ip_after_iret' label for the expected return address, push it to the IRET frame, and verify it matches the interrupted address in the nested NMI handler. Signed-off-by: Chao Gao --- x86/eventinj.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/x86/eventinj.c b/x86/eventinj.c index 6fbb2d0f..ec8a5ef1 100644 --- a/x86/eventinj.c +++ b/x86/eventinj.c @@ -127,12 +127,13 @@ static void nmi_isr(struct ex_regs *r) } unsigned long *iret_stack; +extern char ip_after_iret[]; static void nested_nmi_iret_isr(struct ex_regs *r) { printf("Nested NMI isr running rip=%lx\n", r->rip); - if (r->rip == iret_stack[-3]) + if (r->rip == (unsigned long)ip_after_iret) test_count++; } @@ -156,11 +157,11 @@ asm("do_iret:" "mov %cs, %ecx \n\t" "push"W" %"R "cx \n\t" #ifndef __x86_64__ - "push"W" $2f \n\t" + "push"W" $ip_after_iret \n\t" "cmpb $0, no_test_device\n\t" // see if need to flush #else - "leaq 2f(%rip), %rbx \n\t" + "leaq ip_after_iret(%rip), %rbx \n\t" "pushq %rbx \n\t" "mov no_test_device(%rip), %bl \n\t" @@ -170,7 +171,9 @@ asm("do_iret:" "outl %eax, $0xe4 \n\t" // flush page "1: \n\t" "iret"W" \n\t" - "2: xchg %"R "dx, %"R "sp \n\t" // point to old stack + ".global ip_after_iret \n\t" + "ip_after_iret: \n\t" + "xchg %"R "dx, %"R "sp \n\t" // point to old stack "ret\n\t" ); -- 2.47.3 Push the current stack pointer (SP) to the IRET frame in do_iret() to ensure a valid stack after IRET execution, particularly in 64-bit mode. Currently, do_iret() crafts an IRET frame with a zeroed SP. For 32-bit guests, SP is not popped during IRET due to no privilege change. so, the stack after IRET is still valid. But for 64-bit guests, IRET unconditionally pops RSP, restoring it to zero. This can cause a nested NMI to push its interrupt frame to the topmost page (0xffffffff_fffff000), which may be not mapped and cause triple fault [1]. To fix this issue, push the current SP to the IRET frame, ensuring RSP is restored to a valid stack in 64-bit mode. Reported-by: Sean Christopherson Signed-off-by: Chao Gao Link: https://lore.kernel.org/kvm/aMahfvF1r39Xq6zK@intel.com/ --- x86/eventinj.c | 1 + 1 file changed, 1 insertion(+) diff --git a/x86/eventinj.c b/x86/eventinj.c index ec8a5ef1..63ebbaab 100644 --- a/x86/eventinj.c +++ b/x86/eventinj.c @@ -153,6 +153,7 @@ asm("do_iret:" "mov 8(%esp), %edx \n\t" // virt_stack #endif "xchg %"R "dx, %"R "sp \n\t" // point to new stack + "push"W" %"R "sp \n\t" "pushf"W" \n\t" "mov %cs, %ecx \n\t" "push"W" %"R "cx \n\t" -- 2.47.3