A TDX module erratum can cause TD state corruption if a module update
races with a compatibility-sensitive operation. For example, if an update
races with TD build, the TD measurement hash may be corrupted, which can
later cause attestation failure.

Handle this by requesting the TDX module to detect such races during
TDH.SYS.SHUTDOWN and reject the update when one is found. Report the
failure to userspace as -EBUSY so the update can be retried.

The downside is that module updates can be blocked indefinitely if
compatibility-sensitive operations do not quiesce. In that case, userspace
must resolve the conflict and retry the update.

Do not pre-check whether the TDX module supports this race-detection
capability. If it does not, rely on the TDX module to reject module
shutdown.

== Alternatives ==

Two alternatives were considered and rejected [1]:

a. Fail TD build when the race occurs. This would complicate KVM error
   handling and risk KVM uABI instability.

b. Allow the issue to leak through. This would make the problem harder
   to detect and recover from.

Signed-off-by: Chao Gao
Link: https://lore.kernel.org/linux-coco/aQIbM5m09G0FYTzE@google.com/ # [1]
---
v10:
 - Don't add a "dead" TDX_FEATURE0 bit [Sashiko]
 - s/BIT/BIT_ULL
---
 arch/x86/include/asm/tdx.h            |  5 +++--
 arch/x86/virt/vmx/tdx/tdx.c           | 30 ++++++++++++++++++++++++---
 drivers/virt/coco/tdx-host/tdx-host.c |  2 ++
 3 files changed, 32 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/tdx.h b/arch/x86/include/asm/tdx.h
index 5d750fe53669..282cb0e08b8e 100644
--- a/arch/x86/include/asm/tdx.h
+++ b/arch/x86/include/asm/tdx.h
@@ -29,8 +29,9 @@
 /*
  * TDX module SEAMCALL leaf function error codes
  */
-#define TDX_SUCCESS		0ULL
-#define TDX_RND_NO_ENTROPY	0x8000020300000000ULL
+#define TDX_SUCCESS			0ULL
+#define TDX_RND_NO_ENTROPY		0x8000020300000000ULL
+#define TDX_UPDATE_COMPAT_SENSITIVE	0x8000051200000000ULL
 
 /* Bit definitions of TDX_FEATURES0 metadata field */
 #define TDX_FEATURES0_NO_RBP_MOD	BIT_ULL(18)
diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 55670365a388..0c5660c9ab45 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -1274,11 +1274,14 @@ static __init int tdx_enable(void)
 }
 subsys_initcall(tdx_enable);
 
+#define TDX_SYS_SHUTDOWN_AVOID_COMPAT_SENSITIVE	BIT_ULL(16)
+
 int tdx_module_shutdown(void)
 {
 	struct tdx_sys_info_handoff handoff = {};
 	struct tdx_module_args args = {};
 	int ret, cpu;
+	u64 err;
 
 	ret = get_tdx_sys_info_handoff(&handoff);
 	WARN_ON_ONCE(ret);
@@ -1288,9 +1291,30 @@ int tdx_module_shutdown(void)
 	 * module can produce and most likely supported by newer modules.
 	 */
 	args.rcx = handoff.module_hv;
-	ret = seamcall_prerr(TDH_SYS_SHUTDOWN, &args);
-	if (ret)
-		return ret;
+
+	/*
+	 * This flag tells the TDX module to reject shutdown if it races
+	 * with a "sensitive" ongoing operation. That eliminates exposure
+	 * to a TDX erratum which can corrupt TDX guest states.
+	 *
+	 * This flag is not supported by all TDX modules and may cause
+	 * the shutdown (and subsequent update procedure) to fail.
+	 */
+	args.rcx |= TDX_SYS_SHUTDOWN_AVOID_COMPAT_SENSITIVE;
+
+	err = seamcall(TDH_SYS_SHUTDOWN, &args);
+
+	/*
+	 * The shutdown ran into a "sensitive" ongoing operation. Signal
+	 * to userspace that it can retry.
+	 */
+	if ((err & TDX_SEAMCALL_STATUS_MASK) == TDX_UPDATE_COMPAT_SENSITIVE)
+		return -EBUSY;
+
+	if (err) {
+		seamcall_err(TDH_SYS_SHUTDOWN, err, &args);
+		return -EIO;
+	}
 
 	/*
 	 * Clear global and per-CPU initialization flags so the new module
diff --git a/drivers/virt/coco/tdx-host/tdx-host.c b/drivers/virt/coco/tdx-host/tdx-host.c
index b32ab595047f..291464490fe0 100644
--- a/drivers/virt/coco/tdx-host/tdx-host.c
+++ b/drivers/virt/coco/tdx-host/tdx-host.c
@@ -145,6 +145,8 @@ static enum fw_upload_err tdx_fw_write(struct fw_upload *fwl, const u8 *data,
 	case 0:
 		*written = data_len;
 		return FW_UPLOAD_ERR_NONE;
+	case -EBUSY:
+		return FW_UPLOAD_ERR_BUSY;
 	default:
 		return FW_UPLOAD_ERR_FW_INVALID;
 	}
-- 
2.52.0
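
For readers who want to try the status decoding outside the kernel, the error mapping in tdx_module_shutdown() above can be sketched as a standalone user-space C fragment. This is an illustration only, not part of the patch. The helper name shutdown_status_to_errno() is hypothetical, and the value of TDX_SEAMCALL_STATUS_MASK is an assumption (status class in the upper 32 bits), since the patch uses the mask but does not define it. The two status codes are copied from the tdx.h hunk.

```c
#include <errno.h>
#include <stdint.h>

/* Status codes copied from the arch/x86/include/asm/tdx.h hunk. */
#define TDX_SUCCESS                 0ULL
#define TDX_UPDATE_COMPAT_SENSITIVE 0x8000051200000000ULL

/*
 * Assumption: the status class occupies the upper 32 bits and the
 * lower 32 bits carry operand details. The patch does not show this
 * definition.
 */
#define TDX_SEAMCALL_STATUS_MASK    0xFFFFFFFF00000000ULL

/*
 * Hypothetical helper, not a kernel function. It mirrors the order of
 * checks in the patch: the "compat sensitive" status becomes -EBUSY
 * (retryable), any other non-zero status becomes -EIO, and zero is
 * success.
 */
static int shutdown_status_to_errno(uint64_t err)
{
	if ((err & TDX_SEAMCALL_STATUS_MASK) == TDX_UPDATE_COMPAT_SENSITIVE)
		return -EBUSY;

	if (err != TDX_SUCCESS)
		return -EIO;

	return 0;
}
```

Under the assumed mask, the comparison ignores the low 32 bits, so a TDX_UPDATE_COMPAT_SENSITIVE status with detail bits set still maps to -EBUSY. Every other failure, including TDX_RND_NO_ENTROPY, maps to -EIO.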