When PERFCORE is disabled with "-cpu host,-perfctr-core", this is
reflected in the guest dmesg:

[    0.285136] Performance Events: AMD PMU driver.

However, the guest CPUID indicates that PerfMonV2 is still available:

CPU:
   Extended Performance Monitoring and Debugging (0x80000022):
      AMD performance monitoring V2         = true
      AMD LBR V2                            = false
      AMD LBR stack & PMC freezing          = false
      number of core perf ctrs              = 0x6 (6)
      number of LBR stack entries           = 0x0 (0)
      number of avail Northbridge perf ctrs = 0x0 (0)
      number of available UMC PMCs          = 0x0 (0)
      active UMCs bitmask                   = 0x0

Disable PerfMonV2 in CPUID when PERFCORE is disabled.

Suggested-by: Zhao Liu
Fixes: 209b0ac12074 ("target/i386: Add PerfMonV2 feature bit")
Signed-off-by: Dongli Zhang
Reviewed-by: Xiaoyao Li
Reviewed-by: Zhao Liu
Reviewed-by: Sandipan Das
---
Changed since v1:
  - Use feature_dependencies (suggested by Zhao Liu).
Changed since v2:
  - Nothing. Zhao and Xiaoyao may move it to x86_cpu_expand_features()
    later.

 target/i386/cpu.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 6417775786..3653f8953e 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1994,6 +1994,10 @@ static FeatureDep feature_dependencies[] = {
         .from = { FEAT_7_1_EDX,             CPUID_7_1_EDX_AVX10 },
         .to = { FEAT_24_0_EBX,              ~0ull },
     },
+    {
+        .from = { FEAT_8000_0001_ECX,       CPUID_EXT3_PERFCORE },
+        .to = { FEAT_8000_0022_EAX,         CPUID_8000_0022_EAX_PERFMON_V2 },
+    },
 };
 
 typedef struct X86RegisterInfo32 {
--
2.39.3

Currently, AMD PMU support is not determined based on CPUID; that is,
the "-pmu" option does not fully disable KVM AMD PMU virtualization.

To minimize AMD PMU features, remove PERFCORE when "-pmu" is
configured. Completely disabling AMD PMU virtualization will be
implemented via KVM_CAP_PMU_CAPABILITY in upcoming patches.

As a reminder, neither CPUID_EXT3_PERFCORE nor
CPUID_8000_0022_EAX_PERFMON_V2 is removed from env->features[] when
"-pmu" is configured. In future patches, developers should query
whether they are supported via cpu_x86_cpuid() rather than relying on
env->features[].

Suggested-by: Zhao Liu
Signed-off-by: Dongli Zhang
Reviewed-by: Zhao Liu
Reviewed-by: Sandipan Das
---
Changed since v2:
  - No need to check "kvm_enabled() && IS_AMD_CPU(env)".
Changed since v4:
  - Add Reviewed-by from Sandipan.

 target/i386/cpu.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 3653f8953e..4fcade89bc 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -8360,6 +8360,10 @@ void cpu_x86_cpuid(CPUX86State *env, uint32_t index, uint32_t count,
             !(env->hflags & HF_LMA_MASK)) {
             *edx &= ~CPUID_EXT2_SYSCALL;
         }
+
+        if (!cpu->enable_pmu) {
+            *ecx &= ~CPUID_EXT3_PERFCORE;
+        }
         break;
     case 0x80000002:
     case 0x80000003:
--
2.39.3

Although AMD PERFCORE and PerfMonV2 are removed when "-pmu" is
configured, there is no way to fully disable KVM AMD PMU
virtualization. Neither "-cpu host,-pmu" nor "-cpu EPYC" achieves
this. As a result, the following message still appears in the VM
dmesg:

[    0.263615] Performance Events: AMD PMU driver.

However, the expected output should be:

[    0.596381] Performance Events: PMU not available due to
virtualization, using software events only.
[    0.600972] NMI watchdog: Perf NMI watchdog permanently disabled

This occurs because AMD does not use any CPUID bit to indicate PMU
availability. To address this, KVM_CAP_PMU_CAPABILITY is used to set
KVM_PMU_CAP_DISABLE when "-pmu" is configured.
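
To make the VM-level knob easy to poke at from userspace, here is a
minimal, hypothetical C sketch (not part of this series) that probes
and sets KVM_PMU_CAP_DISABLE, assuming Linux v5.18+ UAPI headers; the
QEMU hunks below do the same thing via kvm_check_extension() and
kvm_vm_enable_cap():

    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    int main(void)
    {
        int kvm = open("/dev/kvm", O_RDWR);
        int vm;

        if (kvm < 0) {
            return 1;
        }

        vm = ioctl(kvm, KVM_CREATE_VM, 0);
        if (vm < 0) {
            return 1;
        }

        /* The extension check returns the mask of supported PMU caps. */
        if (ioctl(vm, KVM_CHECK_EXTENSION, KVM_CAP_PMU_CAPABILITY) &
            KVM_PMU_CAP_DISABLE) {
            struct kvm_enable_cap cap = {
                .cap = KVM_CAP_PMU_CAPABILITY,
                .args = { KVM_PMU_CAP_DISABLE },
            };

            /* Must be done before the first vCPU is created. */
            if (ioctl(vm, KVM_ENABLE_CAP, &cap) < 0) {
                perror("KVM_ENABLE_CAP");
                return 1;
            }
        }
        return 0;
    }
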
Signed-off-by: Dongli Zhang
Reviewed-by: Xiaoyao Li
Reviewed-by: Zhao Liu
Reviewed-by: Dapeng Mi
---
Changed since v1:
  - Switch back to the initial implementation with "-pmu".
    https://lore.kernel.org/all/20221119122901.2469-3-dongli.zhang@oracle.com
  - Mention that "KVM_PMU_CAP_DISABLE doesn't change the PMU behavior
    on Intel platform because current "pmu" property works as
    expected."
Changed since v2:
  - Change has_pmu_cap to pmu_cap.
  - Use (pmu_cap & KVM_PMU_CAP_DISABLE) instead of only pmu_cap in the
    if statement.
  - Add Reviewed-by from Xiaoyao and Zhao as the change is minor.
Changed since v5:
  - Re-base on top of most recent mainline QEMU.
  - To resolve conflicts, move the PMU related code before the call
    site of is_tdx_vm().
Changed since v6:
  - Add Reviewed-by from Dapeng Mi.

 target/i386/kvm/kvm.c | 31 +++++++++++++++++++++++++++++++
 1 file changed, 31 insertions(+)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 60c7981138..e5daa8c9fe 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -178,6 +178,8 @@ static int has_triple_fault_event;
 
 static bool has_msr_mcg_ext_ctl;
 
+static int pmu_cap;
+
 static struct kvm_cpuid2 *cpuid_cache;
 static struct kvm_cpuid2 *hv_cpuid_cache;
 static struct kvm_msr_list *kvm_feature_msrs;
@@ -2079,6 +2081,33 @@ full:
 
 int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
 {
+    static bool first = true;
+    int ret;
+
+    if (first) {
+        first = false;
+
+        /*
+         * Since Linux v5.18, KVM provides a VM-level capability to easily
+         * disable PMUs; however, QEMU has been providing PMU property per
+         * CPU since v1.6. In order to accommodate both, have to configure
+         * the VM-level capability here.
+         *
+         * KVM_PMU_CAP_DISABLE doesn't change the PMU
+         * behavior on Intel platform because current "pmu" property works
+         * as expected.
+         */
+        if ((pmu_cap & KVM_PMU_CAP_DISABLE) && !X86_CPU(cpu)->enable_pmu) {
+            ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
+                                    KVM_PMU_CAP_DISABLE);
+            if (ret < 0) {
+                error_setg_errno(errp, -ret,
+                                 "Failed to set KVM_PMU_CAP_DISABLE");
+                return ret;
+            }
+        }
+    }
+
     if (is_tdx_vm()) {
         return tdx_pre_create_vcpu(cpu, errp);
     }
@@ -3390,6 +3419,8 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
         }
     }
 
+    pmu_cap = kvm_check_extension(s, KVM_CAP_PMU_CAPABILITY);
+
     return 0;
 }
--
2.39.3

The initialization of 'has_architectural_pmu_version',
'num_architectural_pmu_gp_counters', and
'num_architectural_pmu_fixed_counters' is unrelated to the process of
building the CPUID. Extract this initialization out of
kvm_x86_build_cpuid().

In addition, use cpuid_find_entry() instead of cpu_x86_cpuid(),
because the CPUID array has already been filled at this stage.

Signed-off-by: Dongli Zhang
Reviewed-by: Zhao Liu
Reviewed-by: Dapeng Mi
---
Changed since v1:
  - Still extract the code, but call it for all CPUs.
Changed since v2:
  - Use cpuid_find_entry() instead of cpu_x86_cpuid().
  - Didn't add Reviewed-by from Dapeng as the change isn't minor.
Changed since v6:
  - Add Reviewed-by from Dapeng Mi.
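
As a reviewer aid, the field layout that the extracted helper decodes
from leaf 0xA is sketched below. This standalone snippet reads the
host CPUID through GCC/Clang's cpuid.h purely for illustration (the
patch itself parses the KVM-filled entries instead):

    #include <cpuid.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned int eax, ebx, ecx, edx;

        if (!__get_cpuid_count(0xa, 0, &eax, &ebx, &ecx, &edx)) {
            return 1;
        }

        printf("PMU version:    %u\n", eax & 0xff);        /* EAX[7:0]  */
        printf("GP counters:    %u\n", (eax >> 8) & 0xff); /* EAX[15:8] */
        printf("fixed counters: %u\n", edx & 0x1f);        /* EDX[4:0]  */
        return 0;
    }
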
 target/i386/kvm/kvm.c | 62 ++++++++++++++++++++++++-------------------
 1 file changed, 35 insertions(+), 27 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index e5daa8c9fe..487647271c 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -1985,33 +1985,6 @@ uint32_t kvm_x86_build_cpuid(CPUX86State *env, struct kvm_cpuid_entry2 *entries,
         }
     }
 
-    if (limit >= 0x0a) {
-        uint32_t eax, edx;
-
-        cpu_x86_cpuid(env, 0x0a, 0, &eax, &unused, &unused, &edx);
-
-        has_architectural_pmu_version = eax & 0xff;
-        if (has_architectural_pmu_version > 0) {
-            num_architectural_pmu_gp_counters = (eax & 0xff00) >> 8;
-
-            /* Shouldn't be more than 32, since that's the number of bits
-             * available in EBX to tell us _which_ counters are available.
-             * Play it safe.
-             */
-            if (num_architectural_pmu_gp_counters > MAX_GP_COUNTERS) {
-                num_architectural_pmu_gp_counters = MAX_GP_COUNTERS;
-            }
-
-            if (has_architectural_pmu_version > 1) {
-                num_architectural_pmu_fixed_counters = edx & 0x1f;
-
-                if (num_architectural_pmu_fixed_counters > MAX_FIXED_COUNTERS) {
-                    num_architectural_pmu_fixed_counters = MAX_FIXED_COUNTERS;
-                }
-            }
-        }
-    }
-
     cpu_x86_cpuid(env, 0x80000000, 0, &limit, &unused, &unused, &unused);
 
     for (i = 0x80000000; i <= limit; i++) {
@@ -2115,6 +2088,39 @@ int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
     return 0;
 }
 
+static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid)
+{
+    struct kvm_cpuid_entry2 *c;
+
+    c = cpuid_find_entry(cpuid, 0xa, 0);
+
+    if (!c) {
+        return;
+    }
+
+    has_architectural_pmu_version = c->eax & 0xff;
+    if (has_architectural_pmu_version > 0) {
+        num_architectural_pmu_gp_counters = (c->eax & 0xff00) >> 8;
+
+        /*
+         * Shouldn't be more than 32, since that's the number of bits
+         * available in EBX to tell us _which_ counters are available.
+         * Play it safe.
+         */
+        if (num_architectural_pmu_gp_counters > MAX_GP_COUNTERS) {
+            num_architectural_pmu_gp_counters = MAX_GP_COUNTERS;
+        }
+
+        if (has_architectural_pmu_version > 1) {
+            num_architectural_pmu_fixed_counters = c->edx & 0x1f;
+
+            if (num_architectural_pmu_fixed_counters > MAX_FIXED_COUNTERS) {
+                num_architectural_pmu_fixed_counters = MAX_FIXED_COUNTERS;
+            }
+        }
+    }
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
     struct {
@@ -2305,6 +2311,8 @@ int kvm_arch_init_vcpu(CPUState *cs)
     cpuid_i = kvm_x86_build_cpuid(env, cpuid_data.entries, cpuid_i);
     cpuid_data.cpuid.nent = cpuid_i;
 
+    kvm_init_pmu_info(&cpuid_data.cpuid);
+
     if (x86_cpu_family(env->cpuid_version) >= 6 &&
         (env->features[FEAT_1_EDX] & (CPUID_MCE | CPUID_MCA)) ==
         (CPUID_MCE | CPUID_MCA)) {
--
2.39.3

AMD does not have what is commonly referred to as an architectural
PMU. Therefore, rename the following variables so that they apply to
both Intel and AMD:

- has_architectural_pmu_version
- num_architectural_pmu_gp_counters
- num_architectural_pmu_fixed_counters

For Intel processors, the meaning of pmu_version remains unchanged.

For AMD processors:

- pmu_version == 1 corresponds to versions before AMD PerfMonV2.
- pmu_version == 2 corresponds to AMD PerfMonV2.

Signed-off-by: Dongli Zhang
Reviewed-by: Dapeng Mi
Reviewed-by: Zhao Liu
Reviewed-by: Sandipan Das
---
Changed since v2:
  - Change has_pmu_version to pmu_version.
  - Add Reviewed-by since the change is minor.
  - As a reminder, there are some contextual changes due to PATCH 05,
    i.e., c->edx vs. edx.
Changed since v6:
  - Add Reviewed-by from Sandipan.
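
To make the AMD mapping above concrete, here is a hypothetical
standalone sketch (not part of the patch) that derives the same 1/2
value from the host CPUID; Fn8000_0022 EAX[0] is the PerfMonV2 bit:

    #include <cpuid.h>
    #include <stdio.h>

    /* 1 = anything before PerfMonV2 (K7 or PERFCORE), 2 = PerfMonV2. */
    static unsigned int amd_pmu_version(void)
    {
        unsigned int eax, ebx, ecx, edx;

        if (__get_cpuid_count(0x80000022, 0, &eax, &ebx, &ecx, &edx) &&
            (eax & 1)) {
            return 2;
        }
        return 1;
    }

    int main(void)
    {
        printf("AMD pmu_version: %u\n", amd_pmu_version());
        return 0;
    }
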
 target/i386/kvm/kvm.c | 49 ++++++++++++++++++++++++-------------------
 1 file changed, 28 insertions(+), 21 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 487647271c..577326537e 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -166,9 +166,16 @@ static bool has_msr_perf_capabs;
 static bool has_msr_pkrs;
 static bool has_msr_hwcr;
 
-static uint32_t has_architectural_pmu_version;
-static uint32_t num_architectural_pmu_gp_counters;
-static uint32_t num_architectural_pmu_fixed_counters;
+/*
+ * For Intel processors, the meaning is the architectural PMU version
+ * number.
+ *
+ * For AMD processors: 1 corresponds to the prior versions, and 2
+ * corresponds to AMD PerfMonV2.
+ */
+static uint32_t pmu_version;
+static uint32_t num_pmu_gp_counters;
+static uint32_t num_pmu_fixed_counters;
 
 static int has_xsave2;
 static int has_xcrs;
@@ -2098,24 +2105,24 @@ static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid)
         return;
     }
 
-    has_architectural_pmu_version = c->eax & 0xff;
-    if (has_architectural_pmu_version > 0) {
-        num_architectural_pmu_gp_counters = (c->eax & 0xff00) >> 8;
+    pmu_version = c->eax & 0xff;
+    if (pmu_version > 0) {
+        num_pmu_gp_counters = (c->eax & 0xff00) >> 8;
 
         /*
          * Shouldn't be more than 32, since that's the number of bits
          * available in EBX to tell us _which_ counters are available.
          * Play it safe.
          */
-        if (num_architectural_pmu_gp_counters > MAX_GP_COUNTERS) {
-            num_architectural_pmu_gp_counters = MAX_GP_COUNTERS;
+        if (num_pmu_gp_counters > MAX_GP_COUNTERS) {
+            num_pmu_gp_counters = MAX_GP_COUNTERS;
         }
 
-        if (has_architectural_pmu_version > 1) {
-            num_architectural_pmu_fixed_counters = c->edx & 0x1f;
+        if (pmu_version > 1) {
+            num_pmu_fixed_counters = c->edx & 0x1f;
 
-            if (num_architectural_pmu_fixed_counters > MAX_FIXED_COUNTERS) {
-                num_architectural_pmu_fixed_counters = MAX_FIXED_COUNTERS;
+            if (num_pmu_fixed_counters > MAX_FIXED_COUNTERS) {
+                num_pmu_fixed_counters = MAX_FIXED_COUNTERS;
             }
         }
     }
@@ -4078,25 +4085,25 @@ static int kvm_put_msrs(X86CPU *cpu, KvmPutState level)
         kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, env->poll_control_msr);
     }
 
-    if (has_architectural_pmu_version > 0) {
-        if (has_architectural_pmu_version > 1) {
+    if (pmu_version > 0) {
+        if (pmu_version > 1) {
            /* Stop the counter. */
            kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
            kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL, 0);
         }
 
         /* Set the counter values. */
-        for (i = 0; i < num_architectural_pmu_fixed_counters; i++) {
+        for (i = 0; i < num_pmu_fixed_counters; i++) {
             kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR0 + i,
                               env->msr_fixed_counters[i]);
         }
-        for (i = 0; i < num_architectural_pmu_gp_counters; i++) {
+        for (i = 0; i < num_pmu_gp_counters; i++) {
             kvm_msr_entry_add(cpu, MSR_P6_PERFCTR0 + i,
                               env->msr_gp_counters[i]);
             kvm_msr_entry_add(cpu, MSR_P6_EVNTSEL0 + i,
                               env->msr_gp_evtsel[i]);
         }
-        if (has_architectural_pmu_version > 1) {
+        if (pmu_version > 1) {
             kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_STATUS,
                               env->msr_global_status);
             kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_OVF_CTRL,
@@ -4556,17 +4563,17 @@ static int kvm_get_msrs(X86CPU *cpu)
     if (env->features[FEAT_KVM] & CPUID_KVM_POLL_CONTROL) {
         kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, 1);
     }
-    if (has_architectural_pmu_version > 0) {
-        if (has_architectural_pmu_version > 1) {
+    if (pmu_version > 0) {
+        if (pmu_version > 1) {
             kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
             kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL, 0);
             kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_STATUS, 0);
             kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_OVF_CTRL, 0);
         }
-        for (i = 0; i < num_architectural_pmu_fixed_counters; i++) {
+        for (i = 0; i < num_pmu_fixed_counters; i++) {
             kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR0 + i, 0);
         }
-        for (i = 0; i < num_architectural_pmu_gp_counters; i++) {
+        for (i = 0; i < num_pmu_gp_counters; i++) {
             kvm_msr_entry_add(cpu, MSR_P6_PERFCTR0 + i, 0);
             kvm_msr_entry_add(cpu, MSR_P6_EVNTSEL0 + i, 0);
         }
--
2.39.3

When the PMU is enabled in QEMU, there is a chance that PMU
virtualization is completely disabled by the KVM module parameter
kvm.enable_pmu=N.

The kvm.enable_pmu parameter was introduced in Linux v5.17. Its
permission is 0444, and it does not change until a reload of the KVM
module.

Read the kvm.enable_pmu value from the module sysfs so that QEMU can
provide more information about vPMU enablement.

Signed-off-by: Dongli Zhang
Reviewed-by: Zhao Liu
Reviewed-by: Dapeng Mi
---
Changed since v2:
  - Rework the code flow following Zhao's suggestion.
  - Return error when:
    (*kvm_enable_pmu == 'N' && X86_CPU(cpu)->enable_pmu)
Changed since v3:
  - Re-split the cases into enable_pmu and !enable_pmu, following
    Zhao's suggestion.
  - Rework the commit messages.
  - Bring back global static variable 'kvm_pmu_disabled' from v2.
Changed since v4:
  - Add Reviewed-by from Zhao.
Changed since v5:
  - Rebase on top of most recent QEMU.
Changed since v6:
  - Add Reviewed-by from Dapeng Mi.

 target/i386/kvm/kvm.c | 61 +++++++++++++++++++++++++++++++------------
 1 file changed, 44 insertions(+), 17 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 577326537e..97782ce070 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -186,6 +186,10 @@ static int has_triple_fault_event;
 static bool has_msr_mcg_ext_ctl;
 
 static int pmu_cap;
+/*
+ * Read from /sys/module/kvm/parameters/enable_pmu.
+ */
+static bool kvm_pmu_disabled;
 
 static struct kvm_cpuid2 *cpuid_cache;
 static struct kvm_cpuid2 *hv_cpuid_cache;
@@ -2067,23 +2071,30 @@ int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
     if (first) {
         first = false;
 
-        /*
-         * Since Linux v5.18, KVM provides a VM-level capability to easily
-         * disable PMUs; however, QEMU has been providing PMU property per
-         * CPU since v1.6. In order to accommodate both, have to configure
-         * the VM-level capability here.
-         *
-         * KVM_PMU_CAP_DISABLE doesn't change the PMU
-         * behavior on Intel platform because current "pmu" property works
-         * as expected.
-         */
-        if ((pmu_cap & KVM_PMU_CAP_DISABLE) && !X86_CPU(cpu)->enable_pmu) {
-            ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
-                                    KVM_PMU_CAP_DISABLE);
-            if (ret < 0) {
-                error_setg_errno(errp, -ret,
-                                 "Failed to set KVM_PMU_CAP_DISABLE");
-                return ret;
+        if (X86_CPU(cpu)->enable_pmu) {
+            if (kvm_pmu_disabled) {
+                warn_report("Failed to enable PMU since "
+                            "KVM's enable_pmu parameter is disabled");
+            }
+        } else {
+            /*
+             * Since Linux v5.18, KVM provides a VM-level capability to easily
+             * disable PMUs; however, QEMU has been providing PMU property per
+             * CPU since v1.6. In order to accommodate both, have to configure
+             * the VM-level capability here.
+             *
+             * KVM_PMU_CAP_DISABLE doesn't change the PMU
+             * behavior on Intel platform because current "pmu" property works
+             * as expected.
+             */
+            if (pmu_cap & KVM_PMU_CAP_DISABLE) {
+                ret = kvm_vm_enable_cap(kvm_state, KVM_CAP_PMU_CAPABILITY, 0,
+                                        KVM_PMU_CAP_DISABLE);
+                if (ret < 0) {
+                    error_setg_errno(errp, -ret,
                                     "Failed to set KVM_PMU_CAP_DISABLE");
+                    return ret;
+                }
             }
         }
     }
@@ -3301,6 +3312,7 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
     int ret;
     struct utsname utsname;
     Error *local_err = NULL;
+    g_autofree char *kvm_enable_pmu;
 
     /*
      * Initialize confidential guest (SEV/TDX) context, if required
      */
@@ -3436,6 +3448,21 @@ int kvm_arch_init(MachineState *ms, KVMState *s)
 
     pmu_cap = kvm_check_extension(s, KVM_CAP_PMU_CAPABILITY);
 
+    /*
+     * The enable_pmu parameter was introduced in Linux v5.17; reading
+     * it gives a chance to provide more information about vPMU
+     * enablement.
+     *
+     * The kvm.enable_pmu's permission is 0444. It does not change
+     * until a reload of the KVM module.
+     */
+    if (g_file_get_contents("/sys/module/kvm/parameters/enable_pmu",
+                            &kvm_enable_pmu, NULL, NULL)) {
+        if (*kvm_enable_pmu == 'N') {
+            kvm_pmu_disabled = true;
+        }
+    }
+
     return 0;
 }
--
2.39.3

QEMU uses the kvm_get_msrs() function to save Intel PMU registers from
KVM, and kvm_put_msrs() to restore them to KVM. However, there is no
support for AMD PMU registers.

Currently, pmu_version and num_pmu_gp_counters are initialized based
on cpuid(0xa), which does not apply to AMD processors. For AMD CPUs,
prior to PerfMonV2, the number of general-purpose registers is
determined based on the CPU version.

To address this issue, we need to add support for AMD PMU registers.
Without this support, the following problems can arise:

1. If the VM is reset (e.g., via QEMU system_reset or VM kdump/kexec)
   while running "perf top", the PMU registers are not disabled
   properly.

2. Although x86_cpu_reset() resets many registers to zero,
   kvm_put_msrs() does not handle AMD PMU registers, causing some PMU
   events to remain enabled in KVM.

3. KVM's kvm_pmc_speculative_in_use() function consistently returns
   true, preventing the reclamation of these events. Consequently, the
   kvm_pmc->perf_event remains active.

4. After a reboot, the VM kernel may report the following error:

[    0.092011] Performance Events: Fam17h+ core perfctr, Broken BIOS
detected, complain to your hardware vendor.
[    0.092023] [Firmware Bug]: the BIOS has corrupted hw-PMU resources
(MSR c0010200 is 530076)

5. In the worst case, the active kvm_pmc->perf_event may inject
   unknown NMIs randomly into the VM kernel:

[...] Uhhuh. NMI received for unknown reason 30 on CPU 0.

To resolve these issues, we propose resetting AMD PMU registers during
the VM reset process.
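
To illustrate the register layout involved, here is a standalone
sketch (not part of the patch) that prints the legacy K7 addressing
next to the PERFCORE (F15H) addressing; the constants mirror the ones
the patch adds to cpu.h:

    #include <stdio.h>

    #define MSR_K7_EVNTSEL0    0xc0010000u
    #define MSR_K7_PERFCTR0    0xc0010004u
    #define MSR_F15H_PERF_CTL0 0xc0010200u
    #define MSR_F15H_PERF_CTR0 0xc0010201u

    int main(void)
    {
        int i;

        /* Legacy K7: 4 counters, separate selector/counter blocks,
         * stride 1. */
        for (i = 0; i < 4; i++) {
            printf("K7   %d: sel=0x%x ctr=0x%x\n", i,
                   MSR_K7_EVNTSEL0 + i, MSR_K7_PERFCTR0 + i);
        }

        /* PERFCORE: 6 counters, interleaved CTL/CTR pairs, stride 2. */
        for (i = 0; i < 6; i++) {
            printf("F15H %d: sel=0x%x ctr=0x%x\n", i,
                   MSR_F15H_PERF_CTL0 + i * 2, MSR_F15H_PERF_CTR0 + i * 2);
        }
        return 0;
    }
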
Signed-off-by: Dongli Zhang
Reviewed-by: Zhao Liu
Reviewed-by: Sandipan Das
Reviewed-by: Dapeng Mi
---
Changed since v1:
  - Modify "MSR_K7_EVNTSEL0 + 3" and "MSR_K7_PERFCTR0 + 3" by using
    AMD64_NUM_COUNTERS (suggested by Sandipan Das).
  - Use "AMD64_NUM_COUNTERS_CORE * 2 - 1", not "MSR_F15H_PERF_CTL0 +
    0xb" (suggested by Sandipan Das).
  - Switch back to "-pmu" instead of using a global "pmu-cap-disabled".
  - Don't initialize PMU info if kvm.enable_pmu=N.
Changed since v2:
  - Remove 'static' from host_cpuid_vendorX.
  - Change has_pmu_version to pmu_version.
  - Use object_property_get_int() to get CPU family.
  - Use cpuid_find_entry() instead of cpu_x86_cpuid().
  - Send error log when host and guest are from different vendors.
  - Move "if (!cpu->enable_pmu)" to the beginning of the function. Add
    comments to remind developers.
  - Add support for Zhaoxin. Change is_same_vendor() to
    is_host_compat_vendor().
  - Didn't add Reviewed-by from Sandipan because the change isn't
    minor.
Changed since v3:
  - Use host_cpu_vendor_fms() from Zhao's patch.
  - Checking AMD directly makes the "compat" rule clear.
  - Add comment to MAX_GP_COUNTERS.
  - Skip PMU info initialization if !kvm_pmu_disabled.
Changed since v4:
  - Add Reviewed-by from Zhao and Sandipan.
Changed since v6:
  - Add Reviewed-by from Dapeng Mi.

 target/i386/cpu.h     |  12 +++
 target/i386/kvm/kvm.c | 175 +++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 183 insertions(+), 4 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index cee1f692a1..ed4d0c375b 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -489,6 +489,14 @@ typedef enum X86Seg {
 #define MSR_CORE_PERF_GLOBAL_CTRL       0x38f
 #define MSR_CORE_PERF_GLOBAL_OVF_CTRL   0x390
 
+#define MSR_K7_EVNTSEL0                 0xc0010000
+#define MSR_K7_PERFCTR0                 0xc0010004
+#define MSR_F15H_PERF_CTL0              0xc0010200
+#define MSR_F15H_PERF_CTR0              0xc0010201
+
+#define AMD64_NUM_COUNTERS              4
+#define AMD64_NUM_COUNTERS_CORE         6
+
 #define MSR_MC0_CTL                     0x400
 #define MSR_MC0_STATUS                  0x401
 #define MSR_MC0_ADDR                    0x402
@@ -1648,6 +1656,10 @@ typedef struct {
 #endif
 
 #define MAX_FIXED_COUNTERS 3
+/*
+ * This formula is based on Intel's MSR. The current size also meets AMD's
+ * needs.
+ */
 #define MAX_GP_COUNTERS    (MSR_IA32_PERF_STATUS - MSR_P6_EVNTSEL0)
 
 #define NB_OPMASK_REGS 8
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 97782ce070..cbdd797be3 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2106,7 +2106,7 @@ int kvm_arch_pre_create_vcpu(CPUState *cpu, Error **errp)
     return 0;
 }
 
-static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid)
+static void kvm_init_pmu_info_intel(struct kvm_cpuid2 *cpuid)
 {
     struct kvm_cpuid_entry2 *c;
 
@@ -2139,6 +2139,96 @@ static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid)
     }
 }
 
+static void kvm_init_pmu_info_amd(struct kvm_cpuid2 *cpuid, X86CPU *cpu)
+{
+    struct kvm_cpuid_entry2 *c;
+    int64_t family;
+
+    family = object_property_get_int(OBJECT(cpu), "family", NULL);
+    if (family < 0) {
+        return;
+    }
+
+    if (family < 6) {
+        error_report("AMD performance-monitoring is supported from "
+                     "K7 and later");
+        return;
+    }
+
+    pmu_version = 1;
+    num_pmu_gp_counters = AMD64_NUM_COUNTERS;
+
+    c = cpuid_find_entry(cpuid, 0x80000001, 0);
+    if (!c) {
+        return;
+    }
+
+    if (!(c->ecx & CPUID_EXT3_PERFCORE)) {
+        return;
+    }
+
+    num_pmu_gp_counters = AMD64_NUM_COUNTERS_CORE;
+}
+
+static bool is_host_compat_vendor(CPUX86State *env)
+{
+    char host_vendor[CPUID_VENDOR_SZ + 1];
+
+    host_cpu_vendor_fms(host_vendor, NULL, NULL, NULL);
+
+    /*
+     * Intel and Zhaoxin are compatible.
+     */
+    if ((g_str_equal(host_vendor, CPUID_VENDOR_INTEL) ||
+         g_str_equal(host_vendor, CPUID_VENDOR_ZHAOXIN1) ||
+         g_str_equal(host_vendor, CPUID_VENDOR_ZHAOXIN2)) &&
+        (IS_INTEL_CPU(env) || IS_ZHAOXIN_CPU(env))) {
+        return true;
+    }
+
+    return g_str_equal(host_vendor, CPUID_VENDOR_AMD) &&
+           IS_AMD_CPU(env);
+}
+
+static void kvm_init_pmu_info(struct kvm_cpuid2 *cpuid, X86CPU *cpu)
+{
+    CPUX86State *env = &cpu->env;
+
+    /*
+     * The PMU virtualization is disabled by kvm.enable_pmu=N.
+     */
+    if (kvm_pmu_disabled) {
+        return;
+    }
+
+    /*
+     * If KVM_CAP_PMU_CAPABILITY is not supported, there is no way to
+     * disable the AMD PMU virtualization.
+     *
+     * Assume the user is aware of this when !cpu->enable_pmu. The AMD
+     * PMU registers are not going to be reset, even though they are
+     * still available to the guest VM.
+     */
+    if (!cpu->enable_pmu) {
+        return;
+    }
+
+    /*
+     * It is not supported to virtualize AMD PMU registers on Intel
+     * processors, nor to virtualize Intel PMU registers on AMD
+     * processors.
+     */
+    if (!is_host_compat_vendor(env)) {
+        error_report("host doesn't support requested feature: vPMU");
+        return;
+    }
+
+    if (IS_INTEL_CPU(env) || IS_ZHAOXIN_CPU(env)) {
+        kvm_init_pmu_info_intel(cpuid);
+    } else if (IS_AMD_CPU(env)) {
+        kvm_init_pmu_info_amd(cpuid, cpu);
+    }
+}
+
 int kvm_arch_init_vcpu(CPUState *cs)
 {
     struct {
@@ -2329,7 +2419,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
     cpuid_i = kvm_x86_build_cpuid(env, cpuid_data.entries, cpuid_i);
     cpuid_data.cpuid.nent = cpuid_i;
 
-    kvm_init_pmu_info(&cpuid_data.cpuid);
+    kvm_init_pmu_info(&cpuid_data.cpuid, cpu);
 
     if (x86_cpu_family(env->cpuid_version) >= 6 &&
         (env->features[FEAT_1_EDX] & (CPUID_MCE | CPUID_MCA)) ==
@@ -4112,7 +4202,7 @@ static int kvm_put_msrs(X86CPU *cpu, KvmPutState level)
         kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, env->poll_control_msr);
     }
 
-    if (pmu_version > 0) {
+    if ((IS_INTEL_CPU(env) || IS_ZHAOXIN_CPU(env)) && pmu_version > 0) {
         if (pmu_version > 1) {
             /* Stop the counter. */
             kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
@@ -4143,6 +4233,38 @@ static int kvm_put_msrs(X86CPU *cpu, KvmPutState level)
                               env->msr_global_ctrl);
         }
     }
+
+    if (IS_AMD_CPU(env) && pmu_version > 0) {
+        uint32_t sel_base = MSR_K7_EVNTSEL0;
+        uint32_t ctr_base = MSR_K7_PERFCTR0;
+        /*
+         * The address of the next selector or counter register is
+         * obtained by incrementing the address of the current selector
+         * or counter register by one.
+         */
+        uint32_t step = 1;
+
+        /*
+         * When PERFCORE is enabled, AMD PMU uses a separate set of
+         * addresses for the selector and counter registers.
+         * Additionally, the address of the next selector or counter
+         * register is determined by incrementing the address of the
+         * current register by two.
+         */
+        if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE) {
+            sel_base = MSR_F15H_PERF_CTL0;
+            ctr_base = MSR_F15H_PERF_CTR0;
+            step = 2;
+        }
+
+        for (i = 0; i < num_pmu_gp_counters; i++) {
+            kvm_msr_entry_add(cpu, ctr_base + i * step,
+                              env->msr_gp_counters[i]);
+            kvm_msr_entry_add(cpu, sel_base + i * step,
+                              env->msr_gp_evtsel[i]);
+        }
+    }
+
     /*
      * Hyper-V partition-wide MSRs: to avoid clearing them on cpu hot-add,
      * only sync them to KVM on the first cpu
@@ -4590,7 +4712,8 @@ static int kvm_get_msrs(X86CPU *cpu)
     if (env->features[FEAT_KVM] & CPUID_KVM_POLL_CONTROL) {
         kvm_msr_entry_add(cpu, MSR_KVM_POLL_CONTROL, 1);
     }
-    if (pmu_version > 0) {
+
+    if ((IS_INTEL_CPU(env) || IS_ZHAOXIN_CPU(env)) && pmu_version > 0) {
         if (pmu_version > 1) {
             kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
             kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL, 0);
@@ -4606,6 +4729,35 @@ static int kvm_get_msrs(X86CPU *cpu)
         }
     }
 
+    if (IS_AMD_CPU(env) && pmu_version > 0) {
+        uint32_t sel_base = MSR_K7_EVNTSEL0;
+        uint32_t ctr_base = MSR_K7_PERFCTR0;
+        /*
+         * The address of the next selector or counter register is
+         * obtained by incrementing the address of the current selector
+         * or counter register by one.
+         */
+        uint32_t step = 1;
+
+        /*
+         * When PERFCORE is enabled, AMD PMU uses a separate set of
+         * addresses for the selector and counter registers.
+         * Additionally, the address of the next selector or counter
+         * register is determined by incrementing the address of the
+         * current register by two.
+         */
+        if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE) {
+            sel_base = MSR_F15H_PERF_CTL0;
+            ctr_base = MSR_F15H_PERF_CTR0;
+            step = 2;
+        }
+
+        for (i = 0; i < num_pmu_gp_counters; i++) {
+            kvm_msr_entry_add(cpu, ctr_base + i * step, 0);
+            kvm_msr_entry_add(cpu, sel_base + i * step, 0);
+        }
+    }
+
     if (env->mcg_cap) {
         kvm_msr_entry_add(cpu, MSR_MCG_STATUS, 0);
         kvm_msr_entry_add(cpu, MSR_MCG_CTL, 0);
@@ -4917,6 +5069,21 @@ static int kvm_get_msrs(X86CPU *cpu)
         case MSR_P6_EVNTSEL0 ... MSR_P6_EVNTSEL0 + MAX_GP_COUNTERS - 1:
             env->msr_gp_evtsel[index - MSR_P6_EVNTSEL0] = msrs[i].data;
             break;
+        case MSR_K7_EVNTSEL0 ... MSR_K7_EVNTSEL0 + AMD64_NUM_COUNTERS - 1:
+            env->msr_gp_evtsel[index - MSR_K7_EVNTSEL0] = msrs[i].data;
+            break;
+        case MSR_K7_PERFCTR0 ... MSR_K7_PERFCTR0 + AMD64_NUM_COUNTERS - 1:
+            env->msr_gp_counters[index - MSR_K7_PERFCTR0] = msrs[i].data;
+            break;
+        case MSR_F15H_PERF_CTL0 ...
+             MSR_F15H_PERF_CTL0 + AMD64_NUM_COUNTERS_CORE * 2 - 1:
+            index = index - MSR_F15H_PERF_CTL0;
+            if (index & 0x1) {
+                env->msr_gp_counters[index] = msrs[i].data;
+            } else {
+                env->msr_gp_evtsel[index] = msrs[i].data;
+            }
+            break;
         case HV_X64_MSR_HYPERCALL:
             env->msr_hv_hypercall = msrs[i].data;
             break;
--
2.39.3

Since PerfMonV2, the AMD PMU supports additional registers. This
update adds get/put functionality for these extra registers.

Similar to the implementation in KVM:

- MSR_CORE_PERF_GLOBAL_STATUS and MSR_AMD64_PERF_CNTR_GLOBAL_STATUS
  both use env->msr_global_status.
- MSR_CORE_PERF_GLOBAL_CTRL and MSR_AMD64_PERF_CNTR_GLOBAL_CTL both
  use env->msr_global_ctrl.
- MSR_CORE_PERF_GLOBAL_OVF_CTRL and
  MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR both use
  env->msr_global_ovf_ctrl.

No changes are needed for vmstate_msr_architectural_pmu or
pmu_enable_needed().

Signed-off-by: Dongli Zhang
Reviewed-by: Zhao Liu
Reviewed-by: Sandipan Das
---
Changed since v1:
  - Use "has_pmu_version > 1", not "has_pmu_version == 2".
Changed since v2:
  - Use cpuid_find_entry() instead of cpu_x86_cpuid().
  - Change has_pmu_version to pmu_version.
  - Cap num_pmu_gp_counters with MAX_GP_COUNTERS.
Changed since v4:
  - Add Reviewed-by from Sandipan.

 target/i386/cpu.h     |  4 ++++
 target/i386/kvm/kvm.c | 48 +++++++++++++++++++++++++++++++++++--------
 2 files changed, 43 insertions(+), 9 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index ed4d0c375b..6d78f3995b 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -489,6 +489,10 @@ typedef enum X86Seg {
 #define MSR_CORE_PERF_GLOBAL_CTRL       0x38f
 #define MSR_CORE_PERF_GLOBAL_OVF_CTRL   0x390
 
+#define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS       0xc0000300
+#define MSR_AMD64_PERF_CNTR_GLOBAL_CTL          0xc0000301
+#define MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR   0xc0000302
+
 #define MSR_K7_EVNTSEL0                 0xc0010000
 #define MSR_K7_PERFCTR0                 0xc0010004
 #define MSR_F15H_PERF_CTL0              0xc0010200
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index cbdd797be3..5258023fe7 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2168,6 +2168,16 @@ static void kvm_init_pmu_info_amd(struct kvm_cpuid2 *cpuid, X86CPU *cpu)
     }
 
     num_pmu_gp_counters = AMD64_NUM_COUNTERS_CORE;
+
+    c = cpuid_find_entry(cpuid, 0x80000022, 0);
+    if (c && (c->eax & CPUID_8000_0022_EAX_PERFMON_V2)) {
+        pmu_version = 2;
+        num_pmu_gp_counters = c->ebx & 0xf;
+
+        if (num_pmu_gp_counters > MAX_GP_COUNTERS) {
+            num_pmu_gp_counters = MAX_GP_COUNTERS;
+        }
+    }
 }
@@ -4245,13 +4255,14 @@ static int kvm_put_msrs(X86CPU *cpu, KvmPutState level)
         uint32_t step = 1;
 
         /*
-         * When PERFCORE is enabled, AMD PMU uses a separate set of
-         * addresses for the selector and counter registers.
-         * Additionally, the address of the next selector or counter
-         * register is determined by incrementing the address of the
-         * current register by two.
+         * When PERFCORE or PerfMonV2 is enabled, AMD PMU uses a
+         * separate set of addresses for the selector and counter
+         * registers. Additionally, the address of the next selector or
+         * counter register is determined by incrementing the address
+         * of the current register by two.
          */
-        if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE) {
+        if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE ||
+            pmu_version > 1) {
             sel_base = MSR_F15H_PERF_CTL0;
             ctr_base = MSR_F15H_PERF_CTR0;
             step = 2;
         }
@@ -4263,6 +4274,15 @@ static int kvm_put_msrs(X86CPU *cpu, KvmPutState level)
                               env->msr_gp_evtsel[i]);
         }
+
+        if (pmu_version > 1) {
+            kvm_msr_entry_add(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_STATUS,
+                              env->msr_global_status);
+            kvm_msr_entry_add(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR,
+                              env->msr_global_ovf_ctrl);
+            kvm_msr_entry_add(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_CTL,
+                              env->msr_global_ctrl);
+        }
     }
 
     /*
@@ -4740,13 +4760,14 @@ static int kvm_get_msrs(X86CPU *cpu)
         uint32_t step = 1;
 
         /*
-         * When PERFCORE is enabled, AMD PMU uses a separate set of
-         * addresses for the selector and counter registers.
+         * When PERFCORE or PerfMonV2 is enabled, AMD PMU uses a separate
+         * set of addresses for the selector and counter registers.
          * Additionally, the address of the next selector or counter
         * register is determined by incrementing the address of the
         * current register by two.
         */
-        if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE) {
+        if (num_pmu_gp_counters == AMD64_NUM_COUNTERS_CORE ||
+            pmu_version > 1) {
             sel_base = MSR_F15H_PERF_CTL0;
             ctr_base = MSR_F15H_PERF_CTR0;
             step = 2;
         }
@@ -4756,6 +4777,12 @@ static int kvm_get_msrs(X86CPU *cpu)
             kvm_msr_entry_add(cpu, ctr_base + i * step, 0);
             kvm_msr_entry_add(cpu, sel_base + i * step, 0);
         }
+
+        if (pmu_version > 1) {
+            kvm_msr_entry_add(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_CTL, 0);
+            kvm_msr_entry_add(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_STATUS, 0);
+            kvm_msr_entry_add(cpu, MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR, 0);
+        }
     }
 
     if (env->mcg_cap) {
@@ -5052,12 +5079,15 @@ static int kvm_get_msrs(X86CPU *cpu)
             env->msr_fixed_ctr_ctrl = msrs[i].data;
             break;
         case MSR_CORE_PERF_GLOBAL_CTRL:
+        case MSR_AMD64_PERF_CNTR_GLOBAL_CTL:
             env->msr_global_ctrl = msrs[i].data;
             break;
         case MSR_CORE_PERF_GLOBAL_STATUS:
+        case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS:
             env->msr_global_status = msrs[i].data;
             break;
         case MSR_CORE_PERF_GLOBAL_OVF_CTRL:
+        case MSR_AMD64_PERF_CNTR_GLOBAL_STATUS_CLR:
             env->msr_global_ovf_ctrl = msrs[i].data;
             break;
         case MSR_CORE_PERF_FIXED_CTR0 ... MSR_CORE_PERF_FIXED_CTR0 + MAX_FIXED_COUNTERS - 1:
--
2.39.3

PMU MSRs are set by QEMU only at levels >= KVM_PUT_RESET_STATE, i.e.,
never at runtime. Therefore, updating these MSRs without stopping
events should be acceptable.

In addition, KVM creates kernel perf events with host mode excluded
(exclude_host = 1). While the events remain active, they do not
increment the counters while a QEMU vCPU is in userspace mode.

Finally, kvm_put_msrs() sets the MSRs using KVM_SET_MSRS. The x86 KVM
code processes these MSRs one by one in a loop, only saving the config
and triggering the KVM_REQ_PMU request, so it does not immediately
stop the event before updating the PMC. This behavior has held since
Linux kernel commit 68fb4757e867 ("KVM: x86/pmu: Defer
reprogram_counter() to kvm_pmu_handle_event"), that is, v6.2.

No Fixes tag is added for commit 0d89436786b0 ("kvm: migrate vPMU
state"), because this is not a bugfix.

Signed-off-by: Dongli Zhang
Reviewed-by: Zhao Liu
Reviewed-by: Dapeng Mi
---
Changed since v3:
  - Re-order the reasons in the commit message.
  - Mention KVM's commit 68fb4757e867 (v6.2).
  - Keep Zhao's review as there is no code change.
Changed since v6:
  - Add Reviewed-by from Dapeng Mi.

 target/i386/kvm/kvm.c | 9 ---------
 1 file changed, 9 deletions(-)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 5258023fe7..d0df53807f 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -4213,13 +4213,6 @@ static int kvm_put_msrs(X86CPU *cpu, KvmPutState level)
     }
 
     if ((IS_INTEL_CPU(env) || IS_ZHAOXIN_CPU(env)) && pmu_version > 0) {
-        if (pmu_version > 1) {
-            /* Stop the counter. */
-            kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL, 0);
-            kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL, 0);
-        }
-
-        /* Set the counter values. */
         for (i = 0; i < num_pmu_fixed_counters; i++) {
             kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR0 + i,
                               env->msr_fixed_counters[i]);
@@ -4235,8 +4228,6 @@ static int kvm_put_msrs(X86CPU *cpu, KvmPutState level)
                               env->msr_global_status);
             kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_OVF_CTRL,
                               env->msr_global_ovf_ctrl);
-
-        /* Now start the PMU. */
         kvm_msr_entry_add(cpu, MSR_CORE_PERF_FIXED_CTR_CTRL,
                           env->msr_fixed_ctr_ctrl);
         kvm_msr_entry_add(cpu, MSR_CORE_PERF_GLOBAL_CTRL,
                           env->msr_global_ctrl);
--
2.39.3
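
As a closing illustration of the first point above, here is a
simplified, hypothetical mirror of the gating in kvm_put_msrs(); the
KvmPutState values are re-declared here only for illustration:

    #include <stdio.h>

    typedef enum {
        KVM_PUT_RUNTIME_STATE = 1, /* periodic runtime sync */
        KVM_PUT_RESET_STATE   = 2, /* VM reset */
        KVM_PUT_FULL_STATE    = 3, /* migration, loadvm */
    } KvmPutState;

    static int put_pmu_msrs(KvmPutState level)
    {
        if (level < KVM_PUT_RESET_STATE) {
            return 0; /* the runtime path never touches PMU MSRs */
        }
        /*
         * Write the counter and selector values directly; since
         * Linux v6.2, KVM only records the config here and defers
         * reprogram_counter() to KVM_REQ_PMU handling, so no
         * explicit stop/start is needed.
         */
        return 1;
    }

    int main(void)
    {
        printf("runtime: %d, reset: %d\n",
               put_pmu_msrs(KVM_PUT_RUNTIME_STATE),
               put_pmu_msrs(KVM_PUT_RESET_STATE));
        return 0;
    }
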