From: "Kirill A. Shutemov"

The PAMT memory holds metadata for all possible TDX protected memory. Each
physical address range is covered by PAMT entries at three levels (1GB,
2MB, 4KB).

With Dynamic PAMT, the 4KB range of PAMT is allocated on demand. The kernel
supplies the TDX module with page pairs to store the 4KB entries, which
cover 2MB of host physical memory. The kernel must provide this page pair
before using pages from the range for TDX. If this is not done, SEAMCALLs
that give the pages to be protected by the TDX module will fail.

Allocate reference counters for every 2MB range to track TDX memory usage.
This can be used to handle concurrent get/put callers, in order to
accurately determine when the dynamic 4KB level of Dynamic PAMT needs to be
allocated and when it can be freed.

This allocation will currently consume 2 MB for every 1 TB of address space
from 0 to max_pfn. The allocation size will depend on how the RAM is
physically laid out. In a worst case scenario where the entire 52-bit
address space is covered this would be 8GB. Then the DPAMT refcount
allocations could hypothetically cause the savings from Dynamic PAMT to go
negative on exotic platforms with sparse, small amounts of memory.

Future changes could reduce this refcount overhead to be only allocating
refcounts for physical ranges that contain memory that TDX can use.
However, this is left for future work.

Assisted-by: Sashiko:claude-opus-4-6 GitHub Copilot:claude-opus-4-6 Sashiko:claude-opus-4-6
Signed-off-by: Kirill A. Shutemov
Co-developed-by: Rick Edgecombe
Signed-off-by: Rick Edgecombe
---
v6:
 - Remove confusing reference to allocating PAMT memory in pamt_refcounts
   comment. (Yan)
 - Rename "metadata" function names that really deal with refcounts, as
   metadata already has a different meaning in TDX.
 - Move tdx_find_pamt_refcount() to this patch to aid in reviewability

v4:
 - Log typo (Binbin)
 - round correctly when computing PAMT refcount size (Binbin)
 - Zero refcount vmalloc allocation (Note: This got replaced in
   optimization patch with a zero-ed allocation, but this showed up in
   testing with the optimization patches removed. Since it's fixed before
   this code is exercised, it's not a bisectability issue, but fix it
   anyway.)

v3:
 - Split out lazily populate optimization to next patch (Dave)
 - Add comment around pamt_refcounts (Dave)
 - Improve log
---
 arch/x86/virt/vmx/tdx/tdx.c | 54 ++++++++++++++++++++++++++++++++++++-
 1 file changed, 53 insertions(+), 1 deletion(-)

diff --git a/arch/x86/virt/vmx/tdx/tdx.c b/arch/x86/virt/vmx/tdx/tdx.c
index 9e0812d87ab06..6658a6be6697c 100644
--- a/arch/x86/virt/vmx/tdx/tdx.c
+++ b/arch/x86/virt/vmx/tdx/tdx.c
@@ -30,6 +30,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -52,6 +53,14 @@ static DEFINE_PER_CPU(bool, tdx_lp_initialized);
 
 static struct tdmr_info_list tdx_tdmr_list;
 
+/*
+ * On a machine with Dynamic PAMT, the kernel maintains a reference counter
+ * for every 2M range. The counter indicates how many users there are for
+ * the PAMT memory of the 2M range. The kernel allocates PAMT refcounts at
+ * initialization.
+ */
+static atomic_t *pamt_refcounts;
+
 /* All TDX-usable memory regions. Protected by mem_hotplug_lock. */
 static LIST_HEAD(tdx_memlist);
 
@@ -254,6 +263,43 @@ static struct syscore tdx_syscore = {
 	.ops = &tdx_syscore_ops,
 };
 
+/*
+ * Allocate PAMT reference counters for all physical memory.
+ *
+ * It consumes 2MiB for every 1TiB of physical memory.
+ */
+static int init_pamt_refcounts(void)
+{
+	size_t size = DIV_ROUND_UP(max_pfn, PTRS_PER_PTE) * sizeof(*pamt_refcounts);
+
+	if (!tdx_supports_dynamic_pamt(&tdx_sysinfo))
+		return 0;
+
+	pamt_refcounts = __vmalloc(size, GFP_KERNEL | __GFP_ZERO);
+	if (!pamt_refcounts)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static void free_pamt_refcounts(void)
+{
+	if (!tdx_supports_dynamic_pamt(&tdx_sysinfo))
+		return;
+
+	vfree(pamt_refcounts);
+	pamt_refcounts = NULL;
+}
+
+/* Find PAMT refcount for a given physical address */
+static atomic_t * __maybe_unused tdx_find_pamt_refcount(unsigned long pfn)
+{
+	/* Find which PMD a PFN is in. */
+	unsigned long index = pfn >> (PMD_SHIFT - PAGE_SHIFT);
+
+	return &pamt_refcounts[index];
+}
+
 /*
  * Add a memory region as a TDX memory block. The caller must make sure
  * all memory regions are added in address ascending order and don't
@@ -1151,10 +1197,14 @@ static __init int init_tdx_module(void)
 	 */
 	get_online_mems();
 
-	ret = build_tdx_memlist(&tdx_memlist);
+	ret = init_pamt_refcounts();
 	if (ret)
 		goto out_put_tdxmem;
 
+	ret = build_tdx_memlist(&tdx_memlist);
+	if (ret)
+		goto err_free_pamt_refcounts;
+
 	/* Allocate enough space for constructing TDMRs */
 	ret = alloc_tdmr_list(&tdx_tdmr_list, &tdx_sysinfo.tdmr);
 	if (ret)
@@ -1204,6 +1254,8 @@ static __init int init_tdx_module(void)
 	free_tdmr_list(&tdx_tdmr_list);
 err_free_tdxmem:
 	free_tdx_memlist(&tdx_memlist);
+err_free_pamt_refcounts:
+	free_pamt_refcounts();
 	goto out_put_tdxmem;
 }
 
-- 
2.54.0
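
Note on the sizing figures in the log: the "2 MB per 1 TB" and "8GB for a
52-bit address space" numbers can be cross-checked with a small user-space
sketch. This is not part of the patch. It assumes the usual x86-64 values
(PAGE_SHIFT = 12, PMD_SHIFT = 21, PTRS_PER_PTE = 512) and a 4-byte atomic_t;
the helper names refcount_bytes(), refcount_index() and
pamt_refcount_selfcheck() are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed x86-64 constants; the kernel gets these from its own headers. */
#define PAGE_SHIFT	12
#define PMD_SHIFT	21
#define PTRS_PER_PTE	512
#define DIV_ROUND_UP(n, d)	(((n) + (d) - 1) / (d))

/*
 * Bytes of refcount storage needed for PFNs in [0, max_pfn), following the
 * size computation in init_pamt_refcounts(). int32_t stands in for the
 * 4-byte atomic_t (an assumption of this sketch).
 */
static uint64_t refcount_bytes(uint64_t max_pfn)
{
	return DIV_ROUND_UP(max_pfn, (uint64_t)PTRS_PER_PTE) * sizeof(int32_t);
}

/* Index of the 2MB range containing pfn, as in tdx_find_pamt_refcount(). */
static uint64_t refcount_index(uint64_t pfn)
{
	return pfn >> (PMD_SHIFT - PAGE_SHIFT);
}

/* Check the figures quoted in the commit log. */
static void pamt_refcount_selfcheck(void)
{
	uint64_t pfns_1tb = (1ULL << 40) >> PAGE_SHIFT;
	uint64_t pfns_52bit = (1ULL << 52) >> PAGE_SHIFT;

	/* 1TB of address space needs 2MB of counters. */
	assert(refcount_bytes(pfns_1tb) == (2ULL << 20));
	/* A fully covered 52-bit address space needs 8GB of counters. */
	assert(refcount_bytes(pfns_52bit) == (8ULL << 30));
	/* PFNs 0..511 share one counter; PFN 512 starts the next 2MB range. */
	assert(refcount_index(511) == 0);
	assert(refcount_index(512) == 1);
}
```

Calling pamt_refcount_selfcheck() from a test main() passes silently if the
figures hold. The size depends only on max_pfn, not on how much RAM is
present, which is why sparse layouts pay for counters covering holes.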