From: Dave Hansen

== CR Pinning Background ==

Modern CPU hardening features like SMAP/SMEP are enabled by flipping
control register (CR) bits. Attackers find these features inconvenient
and often try to disable them. CR-pinning is a kernel hardening feature
that detects when security-sensitive control bits are flipped off,
complains about it, and then turns them back on. The CR-pinning checks
are performed in the CR manipulation helpers.

X86_CR4_FRED controls FRED enablement and is pinned.

There is a single, system-wide static key that controls CR-pinning
behavior. The static key is enabled by the boot CPU after it has
established its CR configuration. The end result is that CR-pinning is
not active while initializing the boot CPU, but it is active while
bringing up secondary CPUs.

== FRED Background ==

FRED is a new hardware entry/exit mechanism for the kernel. It is not
on by default and started out as Intel-only; AMD is adding support now.
FRED has MSRs for configuration and is enabled by the pinned
X86_CR4_FRED bit. It must not be enabled until after those MSRs are
properly initialized.

== SEV Background ==

AMD SEV-ES and SEV-SNP use #VC (VMM Communication) exceptions to handle
operations that require hypervisor assistance. These exceptions occur
during various operations, including MMIO accesses, CPUID instructions,
and certain memory accesses. Writes to the console can generate #VC.

== Problem ==

CR-pinning implicitly enables FRED on secondary CPUs at a different
point than on the boot CPU. This point is *before* the CPU has done an
explicit cr4_set_bits(X86_CR4_FRED) and before the FRED MSRs are
initialized. This means there is a window where no exceptions can be
handled. For SEV-ES/SNP and TDX guests, any console output during this
window triggers #VC or #VE exceptions that result in triple faults,
because the exception handlers rely on FRED MSRs that are not yet
configured.

== Fix ==

Defer CR-pinning enforcement during secondary CPU bringup.
This avoids any implicit CR changes during CPU bringup, ensuring that
FRED is not enabled before it is configured and able to handle a #VC
or #VE. This also aligns boot and secondary CPU bringup: CR-pinning is
now enforced only once the CPU is online.

cr4_init() is called during secondary CPU bringup, while the CPU is
still offline, so the pinning logic in cr4_init() is redundant. Remove
it and add a WARN_ON_ONCE() to catch any future violation of this
assumption.

Note: FRED is not on by default anywhere, so this is unlikely to be
causing many problems. It was only noticed because AMD started adding
FRED support and turning it on.

Fixes: 14619d912b65 ("x86/fred: FRED entry/exit and dispatch code")
Reported-by: Nikunj A Dadhania
Signed-off-by: Dave Hansen
Signed-off-by: Nikunj A Dadhania
[ Nikunj: Updated SEV background section wording ]
Reviewed-by: Sohil Mehta
Cc: stable@vger.kernel.org # 6.9+
---
 arch/x86/kernel/cpu/common.c | 23 +++++++++++++++++++----
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kernel/cpu/common.c b/arch/x86/kernel/cpu/common.c
index 1c3261cae40c..3ccc6416a11d 100644
--- a/arch/x86/kernel/cpu/common.c
+++ b/arch/x86/kernel/cpu/common.c
@@ -434,6 +434,21 @@ static const unsigned long cr4_pinned_mask = X86_CR4_SMEP | X86_CR4_SMAP | X86_C
 static DEFINE_STATIC_KEY_FALSE_RO(cr_pinning);
 static unsigned long cr4_pinned_bits __ro_after_init;

+static bool cr_pinning_enabled(void)
+{
+	if (!static_branch_likely(&cr_pinning))
+		return false;
+
+	/*
+	 * Do not enforce pinning during CPU bringup. It might
+	 * turn on features that are not set up yet, like FRED.
+	 */
+	if (!cpu_online(smp_processor_id()))
+		return false;
+
+	return true;
+}
+
 void native_write_cr0(unsigned long val)
 {
 	unsigned long bits_missing = 0;
@@ -441,7 +456,7 @@ void native_write_cr0(unsigned long val)
 set_register:
 	asm volatile("mov %0,%%cr0": "+r" (val) : : "memory");

-	if (static_branch_likely(&cr_pinning)) {
+	if (cr_pinning_enabled()) {
 		if (unlikely((val & X86_CR0_WP) != X86_CR0_WP)) {
 			bits_missing = X86_CR0_WP;
 			val |= bits_missing;
@@ -460,7 +475,7 @@ void __no_profile native_write_cr4(unsigned long val)
 set_register:
 	asm volatile("mov %0,%%cr4": "+r" (val) : : "memory");

-	if (static_branch_likely(&cr_pinning)) {
+	if (cr_pinning_enabled()) {
 		if (unlikely((val & cr4_pinned_mask) != cr4_pinned_bits)) {
 			bits_changed = (val & cr4_pinned_mask) ^ cr4_pinned_bits;
 			val = (val & ~cr4_pinned_mask) | cr4_pinned_bits;
@@ -502,8 +517,8 @@ void cr4_init(void)
 	if (boot_cpu_has(X86_FEATURE_PCID))
 		cr4 |= X86_CR4_PCIDE;
-	if (static_branch_likely(&cr_pinning))
-		cr4 = (cr4 & ~cr4_pinned_mask) | cr4_pinned_bits;
+
+	WARN_ON_ONCE(cr_pinning_enabled());

 	__write_cr4(cr4);
--
2.48.1

FRED-enabled SEV-ES and SNP guests fail to boot due to the following
issues in the early boot sequence:

 * FRED does not have a #VC exception handler in the dispatch logic
 * Early FRED #VC exceptions attempt to use uninitialized per-CPU
   GHCBs instead of boot_ghcb

Add an X86_TRAP_VC case to fred_hwexc() with a new
exc_vmm_communication() function that provides the unified entry
point FRED requires, dispatching to the existing user/kernel handlers
based on privilege level. The function is already declared via
DECLARE_IDTENTRY_VC().

Fix early GHCB access by falling back to boot_ghcb in
__sev_{get,put}_ghcb() when the per-CPU GHCBs are not yet initialized.
Fixes: 14619d912b65 ("x86/fred: FRED entry/exit and dispatch code")
Cc: stable@vger.kernel.org # 6.9+
Signed-off-by: Nikunj A Dadhania
---
 arch/x86/coco/sev/noinstr.c |  6 ++++++
 arch/x86/entry/entry_fred.c | 14 ++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/arch/x86/coco/sev/noinstr.c b/arch/x86/coco/sev/noinstr.c
index 9d94aca4a698..5afd663a1c21 100644
--- a/arch/x86/coco/sev/noinstr.c
+++ b/arch/x86/coco/sev/noinstr.c
@@ -121,6 +121,9 @@ noinstr struct ghcb *__sev_get_ghcb(struct ghcb_state *state)

 	WARN_ON(!irqs_disabled());

+	if (!sev_cfg.ghcbs_initialized)
+		return boot_ghcb;
+
 	data = this_cpu_read(runtime_data);
 	ghcb = &data->ghcb_page;

@@ -164,6 +167,9 @@ noinstr void __sev_put_ghcb(struct ghcb_state *state)

 	WARN_ON(!irqs_disabled());

+	if (!sev_cfg.ghcbs_initialized)
+		return;
+
 	data = this_cpu_read(runtime_data);
 	ghcb = &data->ghcb_page;

diff --git a/arch/x86/entry/entry_fred.c b/arch/x86/entry/entry_fred.c
index 88c757ac8ccd..fbe2d10dd737 100644
--- a/arch/x86/entry/entry_fred.c
+++ b/arch/x86/entry/entry_fred.c
@@ -177,6 +177,16 @@ static noinstr void fred_extint(struct pt_regs *regs)
 	}
 }

+#ifdef CONFIG_AMD_MEM_ENCRYPT
+noinstr void exc_vmm_communication(struct pt_regs *regs, unsigned long error_code)
+{
+	if (user_mode(regs))
+		return user_exc_vmm_communication(regs, error_code);
+	else
+		return kernel_exc_vmm_communication(regs, error_code);
+}
+#endif
+
 static noinstr void fred_hwexc(struct pt_regs *regs, unsigned long error_code)
 {
 	/* Optimize for #PF. That's the only exception which matters performance wise */
@@ -207,6 +217,10 @@ static noinstr void fred_hwexc(struct pt_regs *regs, unsigned long error_code)
 #ifdef CONFIG_X86_CET
 	case X86_TRAP_CP: return exc_control_protection(regs, error_code);
 #endif
+#ifdef CONFIG_AMD_MEM_ENCRYPT
+	case X86_TRAP_VC: return exc_vmm_communication(regs, error_code);
+#endif
+
 	default: return fred_bad_type(regs, error_code);
 	}
--
2.48.1