When a virtual machine uses the HV timer during suspend, the KVM timer
does not advance. Upon waking after a long period, there may be a
significant gap between target_expiration and the current time. Since
each timer expiration only advances target_expiration by one period, the
expiration handler can be invoked repeatedly to catch up.

Additionally, if the advanced target_expiration remained less than the
current time, tscdeadline could be set to a negative value. This would
cause HV timer setup to fail and fall back to the SW timer. After
switching to the SW timer, apic_timer_fn could be executed repeatedly
within a single clock interrupt handler, resulting in a hard lockup:

  NMI watchdog: Watchdog detected hard LOCKUP on cpu 45
  ...
  RIP: 0010:advance_periodic_target_expiration+0x4d/0x80 [kvm]
  ...
  RSP: 0018:ff4f88f5d98d8ef0 EFLAGS: 00000046
  RAX: fff0103f91be678e RBX: fff0103f91be678e RCX: 00843a7d9e127bcc
  RDX: 0000000000000002 RSI: 0052ca4003697505 RDI: ff440d5bfbdbd500
  RBP: ff440d5956f99200 R08: ff2ff2a42deb6a84 R09: 000000000002a6c0
  R10: 0122d794016332b3 R11: 0000000000000000 R12: ff440db1af39cfc0
  R13: ff440db1af39cfc0 R14: ffffffffc0d4a560 R15: ff440db1af39d0f8
  FS:  00007f04a6ffd700(0000) GS:ff440db1af380000(0000) knlGS:000000e38a3b8000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 000000d5651feff8 CR3: 000000684e038002 CR4: 0000000000773ee0
  PKRU: 55555554
  Call Trace:
   apic_timer_fn+0x31/0x50 [kvm]
   __hrtimer_run_queues+0x100/0x280
   hrtimer_interrupt+0x100/0x210
   ? ttwu_do_wakeup+0x19/0x160
   smp_apic_timer_interrupt+0x6a/0x130
   apic_timer_interrupt+0xf/0x20

Fix this as follows: if target_expiration is still less than the current
time after advancing by one period, set target_expiration directly to
now. This also ensures that delta is non-negative.

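
For illustration only, the catch-up behavior described above can be
sketched as a small userspace model. This is not the kernel code: struct
timer_model and the three helper functions are hypothetical names, and
all times are plain nanosecond integers with "now" held constant. The
model only contrasts advancing by one period per expiration with
advancing and then clamping to now.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace model; all times are in nanoseconds. */
struct timer_model {
	int64_t target_expiration;	/* next periodic deadline */
	int64_t period;			/* timer period, must be > 0 */
};

/*
 * Model of the old behavior: advance by exactly one period and return
 * the signed distance to the deadline. A negative result converted to
 * an unsigned 64-bit value would become a very large number.
 */
static int64_t advance_one_period(struct timer_model *t, int64_t now)
{
	assert(t->period > 0);
	t->target_expiration += t->period;
	return t->target_expiration - now;
}

/*
 * Model of the patched behavior: advance by one period, then clamp the
 * deadline to now if it is still in the past, so the returned delta is
 * never negative.
 */
static int64_t advance_clamped(struct timer_model *t, int64_t now)
{
	assert(t->period > 0);
	t->target_expiration += t->period;
	if (t->target_expiration < now)
		t->target_expiration = now;
	return t->target_expiration - now;
}

/*
 * Count how many expirations are handled before the deadline is no
 * longer in the past, with now held constant. The struct is passed by
 * value so the caller's copy is left untouched.
 */
static uint64_t expirations_to_catch_up(struct timer_model t, int64_t now,
					int clamp)
{
	uint64_t n = 0;
	int64_t delta;

	do {
		delta = clamp ? advance_clamped(&t, now)
			      : advance_one_period(&t, now);
		n++;
	} while (delta < 0);

	return n;
}
```

In this model, with a 1 ms period and a deadline that is 10 s behind
now, expirations_to_catch_up() needs 10000 iterations without clamping
and a single iteration with clamping.
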
Fixes: d8f2f498d9ed ("x86/kvm: fix LAPIC timer drift when guest uses periodic mode")
Signed-off-by: fuqiang wang
---
 arch/x86/kvm/lapic.c | 32 ++++++++++++++++++++++++--------
 1 file changed, 24 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 0ae7f913d782..307e2d6c3450 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2131,18 +2131,34 @@ static void advance_periodic_target_expiration(struct kvm_lapic *apic)
 	ktime_t delta;
 
 	/*
-	 * Synchronize both deadlines to the same time source or
-	 * differences in the periods (caused by differences in the
-	 * underlying clocks or numerical approximation errors) will
-	 * cause the two to drift apart over time as the errors
-	 * accumulate.
+	 * Use kernel time as the time source for both deadlines so that they
+	 * stay synchronized.  Computing each deadline independently will cause
+	 * the two deadlines to drift apart over time as differences in the
+	 * periods accumulate, e.g. due to differences in the underlying clocks
+	 * or numerical approximation errors.
 	 */
 	apic->lapic_timer.target_expiration =
 		ktime_add_ns(apic->lapic_timer.target_expiration,
 				apic->lapic_timer.period);
+
+	/*
+	 * When the VM is suspended, the hv timer also stops advancing. After
+	 * it is resumed, this may result in a large delta. If the
+	 * target_expiration only advances by one period each time, it will
+	 * cause KVM to frequently handle timer expirations.
+	 */
+	if (apic->lapic_timer.period > 0 &&
+	    ktime_before(apic->lapic_timer.target_expiration, now))
+		apic->lapic_timer.target_expiration = now;
+
 	delta = ktime_sub(apic->lapic_timer.target_expiration, now);
-	apic->lapic_timer.tscdeadline = kvm_read_l1_tsc(apic->vcpu, tscl) +
-		nsec_to_cycles(apic->vcpu, delta);
+	apic->lapic_timer.tscdeadline = kvm_read_l1_tsc(apic->vcpu, tscl);
+	/*
+	 * Note: delta must not be negative. Otherwise, blindly adding a
+	 * negative delta could cause the deadline to become excessively large
+	 * due to the deadline being an unsigned value.
+	 */
+	apic->lapic_timer.tscdeadline += nsec_to_cycles(apic->vcpu, delta);
 }
 
 static void start_sw_period(struct kvm_lapic *apic)
@@ -2972,7 +2988,7 @@ static enum hrtimer_restart apic_timer_fn(struct hrtimer *data)
 
 	if (lapic_is_periodic(apic)) {
 		advance_periodic_target_expiration(apic);
-		hrtimer_add_expires_ns(&ktimer->timer, ktimer->period);
+		hrtimer_set_expires(&ktimer->timer, ktimer->target_expiration);
 		return HRTIMER_RESTART;
 	} else
 		return HRTIMER_NORESTART;
-- 
2.47.0