From: Wanpeng Li

The strict IPI-aware candidate filter can find no target if IPI tracking
misses the relationship, for example for APICv-delivered IPIs, or if the
runnable set changes during the scan.

If the strict pass yields nothing, run a second relaxed pass gated only
by vcpu->preempted.

Control the fallback with the enable_relaxed_boost module parameter
(default on), so it can be disabled at runtime if it causes
over-boosting.

With the full series, PARSEC simlarge on 16-vCPU guests under host CPU
overcommit, latency reduction:

  Dedup (IPI-heavy synchronization):
    2 VMs: +8.87%
    3 VMs: +10.29%
    4 VMs: +15.60%

  VIPS (balanced sync and compute):
    2 VMs: +10.23%
    3 VMs: +6.63%
    4 VMs: +4.50%

The IPI-heavy Dedup workload benefits most, as the confirmed IPI
receiver is preferred over the generic preempted lock-holder heuristic.

Signed-off-by: Wanpeng Li
---
 virt/kvm/kvm_main.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 84cbd7a6183f..a327acb198de 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -101,6 +101,19 @@ EXPORT_SYMBOL_FOR_KVM_INTERNAL(halt_poll_ns_shrink);
 static bool __ro_after_init allow_unsafe_mappings;
 module_param(allow_unsafe_mappings, bool, 0444);
 
+/*
+ * enable_relaxed_boost - second-round safety net for kvm_vcpu_on_spin().
+ *
+ * When on (default), if the strict scan finds no eligible yield target,
+ * fall back to a relaxed scan gated only by vcpu->preempted. This
+ * preserves forward progress if IPI tracking is missed (e.g.
+ * APICv-delivered IPIs) or the runnable set changes mid-scan.
+ *
+ * Disable this at runtime if the relaxed pass causes over-boosting.
+ */
+static bool enable_relaxed_boost = true;
+module_param(enable_relaxed_boost, bool, 0644);
+
 /*
  * Ordering of locks:
  *
@@ -4037,6 +4050,8 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
 	 * they may all try to yield to the same vCPU(s). But as above, this
 	 * is all best effort due to KVM's lack of visibility into the guest.
 	 */
+retry:
+	yielded = 0;
 	start = READ_ONCE(kvm->last_boosted_vcpu) + 1;
 	for (i = 0; i < nr_vcpus; i++) {
 		idx = (start + i) % nr_vcpus;
@@ -4077,6 +4092,15 @@ void kvm_vcpu_on_spin(struct kvm_vcpu *me, bool yield_to_kernel_mode)
 		}
 	}
 
+	/*
+	 * Second, relaxed pass if enabled, the strict pass yielded nothing,
+	 * and we still have retry budget for -ESRCH paths.
+	 */
+	if (enable_relaxed_boost && first_round && yielded <= 0 && try > 0) {
+		first_round = false;
+		goto retry;
+	}
+
 	kvm_vcpu_set_in_spin_loop(me, false);
 
 	/* Ensure vcpu is not eligible during next spinloop */
-- 
2.43.0
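
Illustration only, not part of the patch: the strict-then-relaxed
control flow described above can be modelled in user space roughly as
follows. Everything here (struct model_vcpu, model_relaxed_boost,
model_pick_target) is a hypothetical stand-in, and the strict predicate
(preempted and a confirmed IPI receiver) is an assumption about the
earlier patches in the series, not something this diff shows. The model
also omits the retry budget ("try") and the actual yield.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-vCPU state the scan inspects. */
struct model_vcpu {
	bool preempted;     /* models vcpu->preempted */
	bool ipi_receiver;  /* assumed: IPI tracking confirmed a relationship */
};

/* Hypothetical knob mirroring enable_relaxed_boost (default on). */
static bool model_relaxed_boost = true;

/*
 * Return the index of the chosen yield target, or -1 if there is none.
 *
 * Pass 1 (strict):  candidate must be preempted AND a confirmed receiver.
 * Pass 2 (relaxed): candidate only needs to be preempted; this pass runs
 *                   only when pass 1 found nothing and the knob is on.
 *
 * Both passes start just after last_boosted and wrap around, skipping
 * the spinning vCPU itself (me).
 */
static int model_pick_target(const struct model_vcpu *v, size_t n,
			     size_t me, size_t last_boosted)
{
	bool strict = true;
	size_t i, idx;

	if (n == 0)
		return -1;
	assert(v != NULL);

retry:
	for (i = 0; i < n; i++) {
		idx = (last_boosted + 1 + i) % n;

		if (idx == me || !v[idx].preempted)
			continue;
		if (strict && !v[idx].ipi_receiver)
			continue;

		return (int)idx;
	}

	/* Strict pass found nothing: retry once with the relaxed filter. */
	if (model_relaxed_boost && strict) {
		strict = false;
		goto retry;
	}

	return -1;
}
```

In this model, a preempted vCPU that is also a confirmed receiver wins
even if another preempted vCPU comes earlier in scan order. The
earlier one is only picked when no confirmed receiver exists and the
fallback is enabled.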