The APIC timer runs on wall-clock time under KVM. If a vCPU is stalled
for long enough, the timer can expire before the guest reads TMCCT, or
it can tick well past the expected values, causing various false test
failures [1].

Add retry attempts with an increasing timer period (10ms, 60ms, 700ms)
if any check fails, to tolerate spurious failures caused by vCPU stalls.

[1] Failures we sometimes observe in CI:
      "FAIL: TMCCT should have a non-zero value"
      "FAIL: TMCCT should be reset to the initial-count"
      "FAIL: TMCCT should not be reset to TMICT value"
    Seen on both Intel and AMD hosts.

PS: On most runs, the test passes on the first iteration. The patch
only affects the failure path, which is slowed down by the retries and
still fails if there is a real bug. In exchange, false positives caused
by vCPU stalls go away.

PS2: The number of tries and the TMICT deltas come from analyzing vCPU
stalls on a heavily overcommitted Haswell host, with the test bouncing
between 2 sockets. Typically the 2nd iteration (60ms) is enough to get
rid of false positives.
Signed-off-by: Igor Mammedov
---
 x86/apic.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/x86/apic.c b/x86/apic.c
index d4eb8e11..27597323 100644
--- a/x86/apic.c
+++ b/x86/apic.c
@@ -575,9 +575,12 @@ static void test_apic_change_mode(void)
 {
 	const uint32_t tmict_values[] = {
 		0x999999, /* ~10ms */
+		0x3938700, /* ~60ms */
+		0x29b92700, /* ~700ms */
 	};
 	int retry, max_retries = ARRAY_SIZE(tmict_values);
 	uint32_t tmict;
+	bool fail;
 	bool tmict_reset = false, o_nonzero = false, o_reached_zero = false;
 	bool p_nonzero = false, p_not_reset = false, p_after_wrap = false;
 	bool p2o_not_reset = false, p2o_reached_zero = false, p2o_stay_zero = false;
@@ -635,6 +638,11 @@ static void test_apic_change_mode(void)
 		/* now tmcct == 0 and tmict != 0 */
 		apic_change_mode(APIC_LVT_TIMER_PERIODIC);
 		p2o_stay_zero = !apic_read(APIC_TMCCT);
+		fail = !(tmict_reset && o_nonzero && o_reached_zero &&
+			 p_nonzero && p_not_reset && p_after_wrap &&
+			 p2o_not_reset && p2o_reached_zero && p2o_stay_zero);
+		if (!fail)
+			break;
 	}
 
 	report(tmict_reset, "TMICT value reset");
-- 
2.47.3