Currently, the BHB clearing sequence is followed by an LFENCE to prevent subsequent indirect branches from executing transiently before the BHB is cleared. However, the LFENCE barrier is unnecessary in certain cases, for example when the kernel is using the BHI_DIS_S mitigation and BHB clearing is only needed for userspace. In such cases, the LFENCE is redundant because the ring transition provides the necessary serialization.

Below is a quick recap of the BHI mitigation options:

On Alder Lake and newer:
 - BHI_DIS_S: Hardware control to mitigate BHI in ring0. This has low performance overhead.
 - Long loop: Alternatively, a longer version of the BHB clearing sequence used on older processors can be used to mitigate BHI. This is not yet implemented in Linux.

On older CPUs:
 - Short loop: Clears the BHB at kernel entry and VMexit.

On Alder Lake and newer CPUs, eIBRS isolates the indirect targets between guest and host. But when affected by the BHI variant of VMSCAPE, a guest's branch history may still influence indirect branches in userspace. This also means the big hammer IBPB could be replaced with a cheaper option that clears the BHB at exit-to-userspace after a VMexit.

In preparation for adding support for the BHB sequence (without LFENCE) on newer CPUs, move the LFENCE to the caller side, after clear_bhb_loop() is executed. This allows callers to decide whether they need the LFENCE or not. This does add a few extra bytes to the call sites, but it obviates the need for multiple variants of clear_bhb_loop().

Suggested-by: Dave Hansen
Signed-off-by: Pawan Gupta
--- arch/x86/entry/entry_64.S | 5 ++++- arch/x86/include/asm/nospec-branch.h | 4 ++-- arch/x86/net/bpf_jit_comp.c | 2 ++ 3 files changed, 8 insertions(+), 3 deletions(-) diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index ed04a968cc7d0095ab0185b2e3b5beffb7680afd..886f86790b4467347031bc27d3d761d5cc286da1 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -1528,6 +1528,9 @@ SYM_CODE_END(rewind_stack_and_make_dead) * refactored in the future if needed. The .skips are for safety, to ensure * that all RETs are in the second half of a cacheline to mitigate Indirect * Target Selection, rather than taking the slowpath via its_return_thunk. + * + * Note, callers should use a speculation barrier like LFENCE immediately after + * a call to this function to ensure BHB is cleared before indirect branches. 
*/ SYM_FUNC_START(clear_bhb_loop) ANNOTATE_NOENDBR @@ -1562,7 +1565,7 @@ SYM_FUNC_START(clear_bhb_loop) sub $1, %ecx jnz 1b .Lret2: RET -5: lfence +5: pop %rbp RET SYM_FUNC_END(clear_bhb_loop) diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h index 08ed5a2e46a5fd790bcb1b73feb6469518809c06..ec5ebf96dbb9e240f402f39efc6929ae45ec8f0b 100644 --- a/arch/x86/include/asm/nospec-branch.h +++ b/arch/x86/include/asm/nospec-branch.h @@ -329,11 +329,11 @@ #ifdef CONFIG_X86_64 .macro CLEAR_BRANCH_HISTORY - ALTERNATIVE "", "call clear_bhb_loop", X86_FEATURE_CLEAR_BHB_LOOP + ALTERNATIVE "", "call clear_bhb_loop; lfence", X86_FEATURE_CLEAR_BHB_LOOP .endm .macro CLEAR_BRANCH_HISTORY_VMEXIT - ALTERNATIVE "", "call clear_bhb_loop", X86_FEATURE_CLEAR_BHB_VMEXIT + ALTERNATIVE "", "call clear_bhb_loop; lfence", X86_FEATURE_CLEAR_BHB_VMEXIT .endm #else #define CLEAR_BRANCH_HISTORY diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c index de5083cb1d3747bba00effca3703a4f6eea80d8d..c1ec14c559119b120edfac079aeb07948e9844b8 100644 --- a/arch/x86/net/bpf_jit_comp.c +++ b/arch/x86/net/bpf_jit_comp.c @@ -1603,6 +1603,8 @@ static int emit_spectre_bhb_barrier(u8 **pprog, u8 *ip, if (emit_call(&prog, func, ip)) return -EINVAL; + /* Don't speculate past this until BHB is cleared */ + EMIT_LFENCE(); EMIT1(0x59); /* pop rcx */ EMIT1(0x58); /* pop rax */ } -- 2.34.1 In preparation to make clear_bhb_loop() work for CPUs with larger BHB, move the sequence to a macro. This will allow setting the depth of BHB-clearing easily via arguments. No functional change intended. Signed-off-by: Pawan Gupta --- arch/x86/entry/entry_64.S | 37 +++++++++++++++++++++++-------------- 1 file changed, 23 insertions(+), 14 deletions(-) diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index 886f86790b4467347031bc27d3d761d5cc286da1..a62dbc89c5e75b955ebf6d84f20d157d4bce0253 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -1499,11 +1499,6 @@ SYM_CODE_END(rewind_stack_and_make_dead) * from the branch history tracker in the Branch Predictor, therefore removing * user influence on subsequent BTB lookups. * - * It should be used on parts prior to Alder Lake. Newer parts should use the - * BHI_DIS_S hardware control instead. If a pre-Alder Lake part is being - * virtualized on newer hardware the VMM should protect against BHI attacks by - * setting BHI_DIS_S for the guests. - * * CALLs/RETs are necessary to prevent Loop Stream Detector(LSD) from engaging * and not clearing the branch history. The call tree looks like: * @@ -1532,10 +1527,7 @@ SYM_CODE_END(rewind_stack_and_make_dead) * Note, callers should use a speculation barrier like LFENCE immediately after * a call to this function to ensure BHB is cleared before indirect branches. */ -SYM_FUNC_START(clear_bhb_loop) - ANNOTATE_NOENDBR - push %rbp - mov %rsp, %rbp +.macro CLEAR_BHB_LOOP_SEQ movl $5, %ecx ANNOTATE_INTRA_FUNCTION_CALL call 1f @@ -1545,15 +1537,16 @@ SYM_FUNC_START(clear_bhb_loop) * Shift instructions so that the RET is in the upper half of the * cacheline and don't take the slowpath to its_return_thunk. */ - .skip 32 - (.Lret1 - 1f), 0xcc + .skip 32 - (.Lret1_\@ - 1f), 0xcc ANNOTATE_INTRA_FUNCTION_CALL 1: call 2f -.Lret1: RET +.Lret1_\@: + RET .align 64, 0xcc /* - * As above shift instructions for RET at .Lret2 as well. + * As above shift instructions for RET at .Lret2_\@ as well. 
* - * This should be ideally be: .skip 32 - (.Lret2 - 2f), 0xcc + * This should ideally be: .skip 32 - (.Lret2_\@ - 2f), 0xcc * but some Clang versions (e.g. 18) don't like this. */ .skip 32 - 18, 0xcc @@ -1564,8 +1557,24 @@ SYM_FUNC_START(clear_bhb_loop) jnz 3b sub $1, %ecx jnz 1b -.Lret2: RET +.Lret2_\@: + RET 5: +.endm + +/* + * This should be used on parts prior to Alder Lake. Newer parts should use the + * BHI_DIS_S hardware control instead. If a pre-Alder Lake part is being + * virtualized on newer hardware the VMM should protect against BHI attacks by + * setting BHI_DIS_S for the guests. + */ +SYM_FUNC_START(clear_bhb_loop) + ANNOTATE_NOENDBR + push %rbp + mov %rsp, %rbp + + CLEAR_BHB_LOOP_SEQ + pop %rbp RET SYM_FUNC_END(clear_bhb_loop) -- 2.34.1

The BHB clearing sequence has two nested loops that determine the depth of the BHB to be cleared. Introduce arguments to the macro so that the loop counts (and hence the depth) can be controlled from outside the macro.

Signed-off-by: Pawan Gupta
--- arch/x86/entry/entry_64.S | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index a62dbc89c5e75b955ebf6d84f20d157d4bce0253..a2891af416c874349c065160708752c41bc6ba36 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -1527,8 +1527,8 @@ SYM_CODE_END(rewind_stack_and_make_dead) * Note, callers should use a speculation barrier like LFENCE immediately after * a call to this function to ensure BHB is cleared before indirect branches. */ -.macro CLEAR_BHB_LOOP_SEQ - movl $5, %ecx +.macro CLEAR_BHB_LOOP_SEQ outer_loop_count:req, inner_loop_count:req + movl $\outer_loop_count, %ecx ANNOTATE_INTRA_FUNCTION_CALL call 1f jmp 5f @@ -1550,7 +1550,7 @@ SYM_CODE_END(rewind_stack_and_make_dead) * but some Clang versions (e.g. 18) don't like this. */ .skip 32 - 18, 0xcc -2: movl $5, %eax +2: movl $\inner_loop_count, %eax 3: jmp 4f nop 4: sub $1, %eax @@ -1573,7 +1573,7 @@ SYM_FUNC_START(clear_bhb_loop) push %rbp mov %rsp, %rbp - CLEAR_BHB_LOOP_SEQ + CLEAR_BHB_LOOP_SEQ 5, 5 pop %rbp RET -- 2.34.1

As a mitigation for BHI, clear_bhb_loop() executes branches that overwrite the Branch History Buffer (BHB). On Alder Lake and newer parts this sequence is not sufficient because it doesn't clear enough entries. This was not an issue because these CPUs have a hardware control (BHI_DIS_S) that mitigates BHI in the kernel.

The BHI variant of VMSCAPE requires isolating branch history between guests and userspace. Note that there is no equivalent hardware control for userspace. To effectively isolate branch history on newer CPUs, clear_bhb_loop() should execute a sufficient number of branches to clear a larger BHB.

Dynamically set the loop counts of clear_bhb_loop() such that it is effective on newer CPUs too. Use the hardware control enumeration X86_FEATURE_BHI_CTRL to select the appropriate loop counts.

Suggested-by: Dave Hansen
Signed-off-by: Pawan Gupta
--- arch/x86/entry/entry_64.S | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/arch/x86/entry/entry_64.S b/arch/x86/entry/entry_64.S index a2891af416c874349c065160708752c41bc6ba36..6cddd7d6dc927704357d97346fcbb34559608888 100644 --- a/arch/x86/entry/entry_64.S +++ b/arch/x86/entry/entry_64.S @@ -1563,17 +1563,20 @@ SYM_CODE_END(rewind_stack_and_make_dead) .endm /* - * This should be used on parts prior to Alder Lake. Newer parts should use the - * BHI_DIS_S hardware control instead. 
If a pre-Alder Lake part is being - * virtualized on newer hardware the VMM should protect against BHI attacks by - * setting BHI_DIS_S for the guests. + * For protecting the kernel against BHI, this should be used on parts prior to + * Alder Lake. Although the sequence works on newer parts, BHI_DIS_S hardware + * control has lower overhead, and should be used instead. If a pre-Alder Lake + * part is being virtualized on newer hardware the VMM should protect against + * BHI attacks by setting BHI_DIS_S for the guests. */ SYM_FUNC_START(clear_bhb_loop) ANNOTATE_NOENDBR push %rbp mov %rsp, %rbp - CLEAR_BHB_LOOP_SEQ 5, 5 + /* loop count differs based on CPU-gen, see Intel's BHI guidance */ + ALTERNATIVE __stringify(CLEAR_BHB_LOOP_SEQ 5, 5), \ + __stringify(CLEAR_BHB_LOOP_SEQ 12, 7), X86_FEATURE_BHI_CTRL pop %rbp RET -- 2.34.1

With the upcoming changes, x86_ibpb_exit_to_user will also be used when the BHB clearing sequence is used. Rename it to cover both cases. No functional change.

Signed-off-by: Pawan Gupta
--- arch/x86/include/asm/entry-common.h | 6 +++--- arch/x86/include/asm/nospec-branch.h | 2 +- arch/x86/kernel/cpu/bugs.c | 4 ++-- arch/x86/kvm/x86.c | 2 +- 4 files changed, 7 insertions(+), 7 deletions(-) diff --git a/arch/x86/include/asm/entry-common.h b/arch/x86/include/asm/entry-common.h index ce3eb6d5fdf9f2dba59b7bad24afbfafc8c36918..c45858db16c92fc1364fb818185fba7657840991 100644 --- a/arch/x86/include/asm/entry-common.h +++ b/arch/x86/include/asm/entry-common.h @@ -94,11 +94,11 @@ static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs, */ choose_random_kstack_offset(rdtsc()); - /* Avoid unnecessary reads of 'x86_ibpb_exit_to_user' */ + /* Avoid unnecessary reads of 'x86_predictor_flush_exit_to_user' */ if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER) && - this_cpu_read(x86_ibpb_exit_to_user)) { + this_cpu_read(x86_predictor_flush_exit_to_user)) { indirect_branch_prediction_barrier(); - this_cpu_write(x86_ibpb_exit_to_user, false); + this_cpu_write(x86_predictor_flush_exit_to_user, false); } } #define arch_exit_to_user_mode_prepare arch_exit_to_user_mode_prepare diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h index ec5ebf96dbb9e240f402f39efc6929ae45ec8f0b..df60f9cf51b84e5b75e5db70713188d2e6ad0f5d 100644 --- a/arch/x86/include/asm/nospec-branch.h +++ b/arch/x86/include/asm/nospec-branch.h @@ -531,7 +531,7 @@ void alternative_msr_write(unsigned int msr, u64 val, unsigned int feature) : "memory"); } -DECLARE_PER_CPU(bool, x86_ibpb_exit_to_user); +DECLARE_PER_CPU(bool, x86_predictor_flush_exit_to_user); static inline void indirect_branch_prediction_barrier(void) { diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index d7fa03bf51b4517c12cc68e7c441f7589a4983d1..1e9b11198db0fe2483bd17b1327bcfd44a2c1dbf 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -113,8 +113,8 @@ EXPORT_PER_CPU_SYMBOL_GPL(x86_spec_ctrl_current); * be needed to before running userspace. That IBPB will flush the branch * predictor content. 
*/ -DEFINE_PER_CPU(bool, x86_ibpb_exit_to_user); -EXPORT_PER_CPU_SYMBOL_GPL(x86_ibpb_exit_to_user); +DEFINE_PER_CPU(bool, x86_predictor_flush_exit_to_user); +EXPORT_PER_CPU_SYMBOL_GPL(x86_predictor_flush_exit_to_user); u64 x86_pred_cmd __ro_after_init = PRED_CMD_IBPB; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index c9c2aa6f4705e1ae257bf94572967a5724a940a7..60123568fba85c8a445f9220d3f4a1d11fd0eb77 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -11397,7 +11397,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) * may migrate to. */ if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER)) - this_cpu_write(x86_ibpb_exit_to_user, true); + this_cpu_write(x86_predictor_flush_exit_to_user, true); /* * Consume any pending interrupts, including the possible source of -- 2.34.1

Convert the VMSCAPE mitigation selection to a switch statement over the mitigation modes. This ensures that all mitigation modes are explicitly handled, while keeping the mitigation selection for each mode together. It also prepares for adding a BHB-clearing mitigation mode for VMSCAPE.

Signed-off-by: Pawan Gupta
--- arch/x86/kernel/cpu/bugs.c | 22 ++++++++++++++++++---- 1 file changed, 18 insertions(+), 4 deletions(-) diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index 1e9b11198db0fe2483bd17b1327bcfd44a2c1dbf..233594ede19bf971c999f4d3cc0f6f213002c16c 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -3231,17 +3231,31 @@ early_param("vmscape", vmscape_parse_cmdline); static void __init vmscape_select_mitigation(void) { - if (!boot_cpu_has_bug(X86_BUG_VMSCAPE) || - !boot_cpu_has(X86_FEATURE_IBPB)) { + if (!boot_cpu_has_bug(X86_BUG_VMSCAPE)) { vmscape_mitigation = VMSCAPE_MITIGATION_NONE; return; } - if (vmscape_mitigation == VMSCAPE_MITIGATION_AUTO) { - if (should_mitigate_vuln(X86_BUG_VMSCAPE)) + if ((vmscape_mitigation == VMSCAPE_MITIGATION_AUTO) && + !should_mitigate_vuln(X86_BUG_VMSCAPE)) + vmscape_mitigation = VMSCAPE_MITIGATION_NONE; + + switch (vmscape_mitigation) { + case VMSCAPE_MITIGATION_NONE: + break; + + case VMSCAPE_MITIGATION_IBPB_ON_VMEXIT: + case VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER: + if (!boot_cpu_has(X86_FEATURE_IBPB)) + vmscape_mitigation = VMSCAPE_MITIGATION_NONE; + break; + + case VMSCAPE_MITIGATION_AUTO: + if (boot_cpu_has(X86_FEATURE_IBPB)) vmscape_mitigation = VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER; else vmscape_mitigation = VMSCAPE_MITIGATION_NONE; + break; } } -- 2.34.1

indirect_branch_prediction_barrier() is a wrapper around write_ibpb() that also checks, via an alternative, whether the CPU supports IBPB. For VMSCAPE, the call to indirect_branch_prediction_barrier() is only reached when the CPU supports IBPB. Simply call write_ibpb() directly to avoid unnecessary alternative patching.
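For context, an editor's sketch (simplified, not the exact nospec-branch.h code) of what the wrapper adds on top of write_ibpb(), and why the direct call is equivalent on this path; the plain C feature check below stands in for what is really an ALTERNATIVE site patched at boot:

/*
 * Sketch only: the wrapper is effectively a NOP on CPUs without
 * X86_FEATURE_IBPB and becomes a call to write_ibpb() otherwise.
 */
static __always_inline void indirect_branch_prediction_barrier(void)
{
	if (cpu_feature_enabled(X86_FEATURE_IBPB))	/* really an ALTERNATIVE */
		write_ibpb();
}

/*
 * VMSCAPE exit-to-user path: the mitigation is only selected when
 * X86_FEATURE_IBPB is present, so the feature gate adds nothing here.
 */
write_ibpb();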
Suggested-by: Dave Hansen Signed-off-by: Pawan Gupta --- arch/x86/include/asm/entry-common.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/x86/include/asm/entry-common.h b/arch/x86/include/asm/entry-common.h index c45858db16c92fc1364fb818185fba7657840991..78b143673ca72642149eb2dbf3e3e31370fe6b28 100644 --- a/arch/x86/include/asm/entry-common.h +++ b/arch/x86/include/asm/entry-common.h @@ -97,7 +97,7 @@ static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs, /* Avoid unnecessary reads of 'x86_predictor_flush_exit_to_user' */ if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER) && this_cpu_read(x86_predictor_flush_exit_to_user)) { - indirect_branch_prediction_barrier(); + write_ibpb(); this_cpu_write(x86_predictor_flush_exit_to_user, false); } } -- 2.34.1 Adding more mitigation options at exit-to-userspace for VMSCAPE would usually require a series of checks to decide which mitigation to use. In this case, the mitigation is done by calling a function, which is decided at boot. So, adding more feature flags and multiple checks can be avoided by using static_call() to the mitigating function. Replace the flag-based mitigation selector with a static_call(). This also frees the existing X86_FEATURE_IBPB_EXIT_TO_USER. Suggested-by: Dave Hansen Signed-off-by: Pawan Gupta --- arch/x86/Kconfig | 1 + arch/x86/include/asm/cpufeatures.h | 2 +- arch/x86/include/asm/entry-common.h | 7 +++---- arch/x86/include/asm/nospec-branch.h | 3 +++ arch/x86/kernel/cpu/bugs.c | 5 ++++- arch/x86/kvm/x86.c | 2 +- 6 files changed, 13 insertions(+), 7 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index fa3b616af03a2d50eaf5f922bc8cd4e08a284045..066f62f15e67e85fda0f3fd66acabad9a9794ff8 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -2706,6 +2706,7 @@ config MITIGATION_TSA config MITIGATION_VMSCAPE bool "Mitigate VMSCAPE" depends on KVM + select HAVE_STATIC_CALL default y help Enable mitigation for VMSCAPE attacks. 
VMSCAPE is a hardware security diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h index 4091a776e37aaed67ca93b0a0cd23cc25dbc33d4..02871318c999f94ec8557e5fb0b8fb299960d454 100644 --- a/arch/x86/include/asm/cpufeatures.h +++ b/arch/x86/include/asm/cpufeatures.h @@ -496,7 +496,7 @@ #define X86_FEATURE_TSA_SQ_NO (21*32+11) /* AMD CPU not vulnerable to TSA-SQ */ #define X86_FEATURE_TSA_L1_NO (21*32+12) /* AMD CPU not vulnerable to TSA-L1 */ #define X86_FEATURE_CLEAR_CPU_BUF_VM (21*32+13) /* Clear CPU buffers using VERW before VMRUN */ -#define X86_FEATURE_IBPB_EXIT_TO_USER (21*32+14) /* Use IBPB on exit-to-userspace, see VMSCAPE bug */ +/* Free */ #define X86_FEATURE_ABMC (21*32+15) /* Assignable Bandwidth Monitoring Counters */ #define X86_FEATURE_MSR_IMM (21*32+16) /* MSR immediate form instructions */ diff --git a/arch/x86/include/asm/entry-common.h b/arch/x86/include/asm/entry-common.h index 78b143673ca72642149eb2dbf3e3e31370fe6b28..783e7cb50caeb6c6fc68e0a5c75ab43e75e37116 100644 --- a/arch/x86/include/asm/entry-common.h +++ b/arch/x86/include/asm/entry-common.h @@ -4,6 +4,7 @@ #include #include +#include #include #include @@ -94,10 +95,8 @@ static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs, */ choose_random_kstack_offset(rdtsc()); - /* Avoid unnecessary reads of 'x86_predictor_flush_exit_to_user' */ - if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER) && - this_cpu_read(x86_predictor_flush_exit_to_user)) { - write_ibpb(); + if (unlikely(this_cpu_read(x86_predictor_flush_exit_to_user))) { + static_call_cond(vmscape_predictor_flush)(); this_cpu_write(x86_predictor_flush_exit_to_user, false); } } diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h index df60f9cf51b84e5b75e5db70713188d2e6ad0f5d..15a2fa8f2f48a066e102263513eff9537ac1d25f 100644 --- a/arch/x86/include/asm/nospec-branch.h +++ b/arch/x86/include/asm/nospec-branch.h @@ -540,6 +540,9 @@ static inline void indirect_branch_prediction_barrier(void) :: "rax", "rcx", "rdx", "memory"); } +#include +DECLARE_STATIC_CALL(vmscape_predictor_flush, write_ibpb); + /* The Intel SPEC CTRL MSR base value cache */ extern u64 x86_spec_ctrl_base; DECLARE_PER_CPU(u64, x86_spec_ctrl_current); diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index 233594ede19bf971c999f4d3cc0f6f213002c16c..cbb3341b9a19f835738eda7226323d88b7e41e52 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -200,6 +200,9 @@ DEFINE_STATIC_KEY_FALSE(switch_mm_cond_l1d_flush); DEFINE_STATIC_KEY_FALSE(cpu_buf_vm_clear); EXPORT_SYMBOL_GPL(cpu_buf_vm_clear); +DEFINE_STATIC_CALL_NULL(vmscape_predictor_flush, write_ibpb); +EXPORT_STATIC_CALL_GPL(vmscape_predictor_flush); + #undef pr_fmt #define pr_fmt(fmt) "mitigations: " fmt @@ -3274,7 +3277,7 @@ static void __init vmscape_update_mitigation(void) static void __init vmscape_apply_mitigation(void) { if (vmscape_mitigation == VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER) - setup_force_cpu_cap(X86_FEATURE_IBPB_EXIT_TO_USER); + static_call_update(vmscape_predictor_flush, write_ibpb); } #undef pr_fmt diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 60123568fba85c8a445f9220d3f4a1d11fd0eb77..7e55ef3b3203a26be1a138c8fa838a8c5aae0125 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -11396,7 +11396,7 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu) * set for the CPU that actually ran the guest, and not the CPU that it * may migrate to. 
*/ - if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER)) + if (static_call_query(vmscape_predictor_flush)) this_cpu_write(x86_predictor_flush_exit_to_user, true); /* -- 2.34.1 IBPB mitigation for VMSCAPE is an overkill on CPUs that are only affected by the BHI variant of VMSCAPE. On such CPUs, eIBRS already provides indirect branch isolation between guest and host userspace. However, branch history from guest may also influence the indirect branches in host userspace. To mitigate the BHI aspect, use clear_bhb_loop(). Signed-off-by: Pawan Gupta --- Documentation/admin-guide/hw-vuln/vmscape.rst | 4 ++++ arch/x86/include/asm/nospec-branch.h | 2 ++ arch/x86/kernel/cpu/bugs.c | 30 ++++++++++++++++++++------- 3 files changed, 29 insertions(+), 7 deletions(-) diff --git a/Documentation/admin-guide/hw-vuln/vmscape.rst b/Documentation/admin-guide/hw-vuln/vmscape.rst index d9b9a2b6c114c05a7325e5f3c9d42129339b870b..dc63a0bac03d43d1e295de0791dd6497d101f986 100644 --- a/Documentation/admin-guide/hw-vuln/vmscape.rst +++ b/Documentation/admin-guide/hw-vuln/vmscape.rst @@ -86,6 +86,10 @@ The possible values in this file are: run a potentially malicious guest and issues an IBPB before the first exit to userspace after VM-exit. + * 'Mitigation: Clear BHB before exit to userspace': + + As above, conditional BHB clearing mitigation is enabled. + * 'Mitigation: IBPB on VMEXIT': IBPB is issued on every VM-exit. This occurs when other mitigations like diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h index 15a2fa8f2f48a066e102263513eff9537ac1d25f..1e8c26c37dbed4256b35101fb41c0e1eb6ef9272 100644 --- a/arch/x86/include/asm/nospec-branch.h +++ b/arch/x86/include/asm/nospec-branch.h @@ -388,6 +388,8 @@ extern void write_ibpb(void); #ifdef CONFIG_X86_64 extern void clear_bhb_loop(void); +#else +static inline void clear_bhb_loop(void) {} #endif extern void (*x86_return_thunk)(void); diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index cbb3341b9a19f835738eda7226323d88b7e41e52..d12c07ccf59479ecf590935607394492c988b2ff 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -109,9 +109,8 @@ DEFINE_PER_CPU(u64, x86_spec_ctrl_current); EXPORT_PER_CPU_SYMBOL_GPL(x86_spec_ctrl_current); /* - * Set when the CPU has run a potentially malicious guest. An IBPB will - * be needed to before running userspace. That IBPB will flush the branch - * predictor content. + * Set when the CPU has run a potentially malicious guest. Indicates that a + * branch predictor flush is needed before running userspace. 
*/ DEFINE_PER_CPU(bool, x86_predictor_flush_exit_to_user); EXPORT_PER_CPU_SYMBOL_GPL(x86_predictor_flush_exit_to_user); @@ -3200,13 +3199,15 @@ enum vmscape_mitigations { VMSCAPE_MITIGATION_AUTO, VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER, VMSCAPE_MITIGATION_IBPB_ON_VMEXIT, + VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER, }; static const char * const vmscape_strings[] = { - [VMSCAPE_MITIGATION_NONE] = "Vulnerable", + [VMSCAPE_MITIGATION_NONE] = "Vulnerable", /* [VMSCAPE_MITIGATION_AUTO] */ - [VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER] = "Mitigation: IBPB before exit to userspace", - [VMSCAPE_MITIGATION_IBPB_ON_VMEXIT] = "Mitigation: IBPB on VMEXIT", + [VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER] = "Mitigation: IBPB before exit to userspace", + [VMSCAPE_MITIGATION_IBPB_ON_VMEXIT] = "Mitigation: IBPB on VMEXIT", + [VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER] = "Mitigation: Clear BHB before exit to userspace", }; static enum vmscape_mitigations vmscape_mitigation __ro_after_init = @@ -3253,8 +3254,19 @@ static void __init vmscape_select_mitigation(void) vmscape_mitigation = VMSCAPE_MITIGATION_NONE; break; + case VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER: + if (!boot_cpu_has(X86_FEATURE_BHI_CTRL)) + vmscape_mitigation = VMSCAPE_MITIGATION_NONE; + break; case VMSCAPE_MITIGATION_AUTO: - if (boot_cpu_has(X86_FEATURE_IBPB)) + /* + * CPUs with BHI_CTRL (ADL and newer) can avoid the IBPB and use BHB + * clear sequence. These CPUs are only vulnerable to the BHI variant + * of the VMSCAPE attack and do not require an IBPB flush. + */ + if (boot_cpu_has(X86_FEATURE_BHI_CTRL)) + vmscape_mitigation = VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER; + else if (boot_cpu_has(X86_FEATURE_IBPB)) vmscape_mitigation = VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER; else vmscape_mitigation = VMSCAPE_MITIGATION_NONE; @@ -3278,6 +3290,9 @@ static void __init vmscape_apply_mitigation(void) { if (vmscape_mitigation == VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER) static_call_update(vmscape_predictor_flush, write_ibpb); + else if (vmscape_mitigation == VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER && + IS_ENABLED(CONFIG_X86_64)) + static_call_update(vmscape_predictor_flush, clear_bhb_loop); } #undef pr_fmt @@ -3369,6 +3384,7 @@ void cpu_bugs_smt_update(void) break; case VMSCAPE_MITIGATION_IBPB_ON_VMEXIT: case VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER: + case VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER: /* * Hypervisors can be attacked across-threads, warn for SMT when * STIBP is not already enabled system-wide. -- 2.34.1

The vmscape=force option currently defaults to the AUTO mitigation mode. This is not correct because attack-vector controls override a mitigation when it is in AUTO mode. This prevents a user from being able to force the VMSCAPE mitigation when it conflicts with the attack-vector controls. The kernel should deploy a forced mitigation irrespective of attack vectors. Instead of AUTO, use VMSCAPE_MITIGATION_ON, which wins over attack-vector controls.
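To make the precedence easier to see, below is a condensed editor's sketch of the selection logic as it looks with this patch applied, assembled from the hunks in this series (not verbatim kernel code; the function name is illustrative and the remaining cases are elided):

static void __init vmscape_select_mitigation_sketch(void)
{
	if (!boot_cpu_has_bug(X86_BUG_VMSCAPE)) {
		vmscape_mitigation = VMSCAPE_MITIGATION_NONE;
		return;
	}

	/* Attack-vector controls can only veto the AUTO default ... */
	if (vmscape_mitigation == VMSCAPE_MITIGATION_AUTO &&
	    !should_mitigate_vuln(X86_BUG_VMSCAPE))
		vmscape_mitigation = VMSCAPE_MITIGATION_NONE;

	switch (vmscape_mitigation) {
	case VMSCAPE_MITIGATION_AUTO:
	case VMSCAPE_MITIGATION_ON:	/* ... while ON always reaches the selection below */
		if (boot_cpu_has(X86_FEATURE_BHI_CTRL))
			vmscape_mitigation = VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER;
		else if (boot_cpu_has(X86_FEATURE_IBPB))
			vmscape_mitigation = VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER;
		else
			vmscape_mitigation = VMSCAPE_MITIGATION_NONE;
		break;
	default:
		/* NONE and the explicit IBPB modes: handled as in the earlier patches */
		break;
	}
}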
Signed-off-by: Pawan Gupta --- arch/x86/kernel/cpu/bugs.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index d12c07ccf59479ecf590935607394492c988b2ff..81b0db27f4094c90ebf4704c74f5e7e6b809560f 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -3197,6 +3197,7 @@ static void __init srso_apply_mitigation(void) enum vmscape_mitigations { VMSCAPE_MITIGATION_NONE, VMSCAPE_MITIGATION_AUTO, + VMSCAPE_MITIGATION_ON, VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER, VMSCAPE_MITIGATION_IBPB_ON_VMEXIT, VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER, @@ -3205,6 +3206,7 @@ enum vmscape_mitigations { static const char * const vmscape_strings[] = { [VMSCAPE_MITIGATION_NONE] = "Vulnerable", /* [VMSCAPE_MITIGATION_AUTO] */ + /* [VMSCAPE_MITIGATION_ON] */ [VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER] = "Mitigation: IBPB before exit to userspace", [VMSCAPE_MITIGATION_IBPB_ON_VMEXIT] = "Mitigation: IBPB on VMEXIT", [VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER] = "Mitigation: Clear BHB before exit to userspace", @@ -3224,7 +3226,7 @@ static int __init vmscape_parse_cmdline(char *str) vmscape_mitigation = VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER; } else if (!strcmp(str, "force")) { setup_force_cpu_bug(X86_BUG_VMSCAPE); - vmscape_mitigation = VMSCAPE_MITIGATION_AUTO; + vmscape_mitigation = VMSCAPE_MITIGATION_ON; } else { pr_err("Ignoring unknown vmscape=%s option.\n", str); } @@ -3259,6 +3261,7 @@ static void __init vmscape_select_mitigation(void) vmscape_mitigation = VMSCAPE_MITIGATION_NONE; break; case VMSCAPE_MITIGATION_AUTO: + case VMSCAPE_MITIGATION_ON: /* * CPUs with BHI_CTRL(ADL and newer) can avoid the IBPB and use BHB * clear sequence. These CPUs are only vulnerable to the BHI variant @@ -3381,6 +3384,7 @@ void cpu_bugs_smt_update(void) switch (vmscape_mitigation) { case VMSCAPE_MITIGATION_NONE: case VMSCAPE_MITIGATION_AUTO: + case VMSCAPE_MITIGATION_ON: break; case VMSCAPE_MITIGATION_IBPB_ON_VMEXIT: case VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER: -- 2.34.1 In general, individual mitigation controls can be used to override the attack vector controls. But, nothing exists to select BHB clearing mitigation for VMSCAPE. The =force option comes close, but with a side-effect of also forcibly setting the bug, hence deploying the mitigation on unaffected parts too. Add a new cmdline option vmscape=on to enable the mitigation based on the VMSCAPE variant the CPU is affected by. Signed-off-by: Pawan Gupta --- Documentation/admin-guide/hw-vuln/vmscape.rst | 4 ++++ Documentation/admin-guide/kernel-parameters.txt | 4 +++- arch/x86/kernel/cpu/bugs.c | 2 ++ 3 files changed, 9 insertions(+), 1 deletion(-) diff --git a/Documentation/admin-guide/hw-vuln/vmscape.rst b/Documentation/admin-guide/hw-vuln/vmscape.rst index dc63a0bac03d43d1e295de0791dd6497d101f986..580f288ae8bfc601ff000d6d95d711bb9084459e 100644 --- a/Documentation/admin-guide/hw-vuln/vmscape.rst +++ b/Documentation/admin-guide/hw-vuln/vmscape.rst @@ -112,3 +112,7 @@ The mitigation can be controlled via the ``vmscape=`` command line parameter: Force vulnerability detection and mitigation even on processors that are not known to be affected. + + * ``vmscape=on``: + + Choose the mitigation based on the VMSCAPE variant the CPU is affected by. 
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 6c42061ca20e581b5192b66c6f25aba38d4f8ff8..4b4711ced5e187495476b5365cd7b3df81db893b 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -8104,9 +8104,11 @@ off - disable the mitigation ibpb - use Indirect Branch Prediction Barrier - (IBPB) mitigation (default) + (IBPB) mitigation force - force vulnerability detection even on unaffected processors + on - (default) automatically select IBPB + or BHB clear mitigation based on CPU vsyscall= [X86-64,EARLY] Controls the behavior of vsyscalls (i.e. calls to diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c index 81b0db27f4094c90ebf4704c74f5e7e6b809560f..b4a21434869fcc01c40a2973f986a3f275f92ef2 100644 --- a/arch/x86/kernel/cpu/bugs.c +++ b/arch/x86/kernel/cpu/bugs.c @@ -3227,6 +3227,8 @@ static int __init vmscape_parse_cmdline(char *str) } else if (!strcmp(str, "force")) { setup_force_cpu_bug(X86_BUG_VMSCAPE); vmscape_mitigation = VMSCAPE_MITIGATION_ON; + } else if (!strcmp(str, "on")) { + vmscape_mitigation = VMSCAPE_MITIGATION_ON; } else { pr_err("Ignoring unknown vmscape=%s option.\n", str); } -- 2.34.1
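As a closing illustration (editor's note, not an additional patch), here is the runtime flow once the whole series is applied, stitched together from the hunks above; the static_call target ends up being write_ibpb() on parts that need the full IBPB and clear_bhb_loop() on BHI_CTRL-capable parts:

/* bugs.c: bind the flush routine once at boot, per the selected mitigation */
static void __init vmscape_apply_mitigation(void)
{
	if (vmscape_mitigation == VMSCAPE_MITIGATION_IBPB_EXIT_TO_USER)
		static_call_update(vmscape_predictor_flush, write_ibpb);
	else if (vmscape_mitigation == VMSCAPE_MITIGATION_BHB_CLEAR_EXIT_TO_USER &&
		 IS_ENABLED(CONFIG_X86_64))
		static_call_update(vmscape_predictor_flush, clear_bhb_loop);
}

/* kvm/x86.c, vcpu_enter_guest(): remember that a guest ran on this CPU */
if (static_call_query(vmscape_predictor_flush))
	this_cpu_write(x86_predictor_flush_exit_to_user, true);

/* entry-common.h, arch_exit_to_user_mode_prepare(): flush once, clear the flag */
if (unlikely(this_cpu_read(x86_predictor_flush_exit_to_user))) {
	static_call_cond(vmscape_predictor_flush)();
	this_cpu_write(x86_predictor_flush_exit_to_user, false);
}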