Currently, pfncaches map RAM pages via kmap(), which typically returns a
kernel address derived from the direct map. However, guest_memfd instances
created with GUEST_MEMFD_FLAG_NO_DIRECT_MAP have their pages removed from
the direct map and use an AS_NO_DIRECT_MAP mapping, so kmap() cannot be
used for them. pfncaches can also be used from atomic context, where page
faults cannot be tolerated, so they cannot fall back to access via a
userspace mapping the way KVM does for other accesses to NO_DIRECT_MAP
guest_memfd.

To obtain a fault-free kernel host virtual address (KHVA), use vmap() for
NO_DIRECT_MAP pages. Since gpc_map() is the sole producer of KHVAs for
pfncaches, and only vmap() returns a vmalloc address, gpc_unmap() can
reliably pair the corresponding vunmap() by checking is_vmalloc_addr().

Although vm_map_ram() could be faster than vmap(), mixing short-lived and
long-lived vm_map_ram() mappings can lead to fragmentation, so
vm_map_ram() is recommended only for short-lived mappings. Since
pfncaches typically have a lifetime comparable to that of the VM,
vm_map_ram() is deliberately not used here.

pfncaches are not dynamically allocated; they are statically allocated on
a per-VM and per-vCPU basis. For a normal (i.e. non-Xen) VM, there is one
pfncache per vCPU. For a Xen VM, there is one per-VM pfncache and five
per-vCPU pfncaches. Given the maximum of 1024 vCPUs, a normal VM can have
up to 1024 pfncaches, consuming 4 MB of virtual address space, and a Xen
VM can have up to 5121 pfncaches, consuming approximately 20 MB.

Although the vmalloc area is limited on 32-bit systems, it should still be
large enough there, and it is typically tens of TB on 64-bit systems
(e.g. 32 TB with 4-level paging and 12800 TB with 5-level paging on
x86_64). If virtual address space exhaustion becomes a concern, migration
to an mm-local region (like forthcoming mermap?) could be considered in
the future.

Note that vmap() and vm_map_ram() only create virtual mappings to existing
pages; they do not allocate new physical pages.
Signed-off-by: Takahiro Itazuri
---
 virt/kvm/pfncache.c | 33 ++++++++++++++++++++++++++++-----
 1 file changed, 28 insertions(+), 5 deletions(-)

diff --git a/virt/kvm/pfncache.c b/virt/kvm/pfncache.c
index 100a8e2f114b..531adc4dcb11 100644
--- a/virt/kvm/pfncache.c
+++ b/virt/kvm/pfncache.c
@@ -16,6 +16,7 @@
 #include <linux/highmem.h>
 #include <linux/module.h>
 #include <linux/errno.h>
+#include <linux/vmalloc.h>

 #include "kvm_mm.h"

@@ -98,8 +99,19 @@ bool kvm_gpc_check(struct gfn_to_pfn_cache *gpc, unsigned long len)

 static void *gpc_map(kvm_pfn_t pfn)
 {
-	if (pfn_valid(pfn))
-		return kmap(pfn_to_page(pfn));
+	if (pfn_valid(pfn)) {
+		struct page *page = pfn_to_page(pfn);
+		struct page *head = compound_head(page);
+		struct address_space *mapping = READ_ONCE(head->mapping);
+
+		if (mapping && mapping_no_direct_map(mapping)) {
+			struct page *pages[] = { page };
+
+			return vmap(pages, 1, VM_MAP, PAGE_KERNEL);
+		}
+
+		return kmap(page);
+	}

 #ifdef CONFIG_HAS_IOMEM
 	return memremap(pfn_to_hpa(pfn), PAGE_SIZE, MEMREMAP_WB);
@@ -115,7 +127,15 @@ static void gpc_unmap(kvm_pfn_t pfn, void *khva)
 		return;

 	if (pfn_valid(pfn)) {
-		kunmap(pfn_to_page(pfn));
+		/*
+		 * For valid PFNs, gpc_map() returns either a kmap() address
+		 * (non-vmalloc) or a vmap() address (vmalloc).
+		 */
+		if (is_vmalloc_addr(khva))
+			vunmap(khva);
+		else
+			kunmap(pfn_to_page(pfn));
+		return;
 	}

@@ -233,8 +253,11 @@ static kvm_pfn_t gpc_to_pfn_retry(struct gfn_to_pfn_cache *gpc)

 		/*
 		 * Obtain a new kernel mapping if KVM itself will access the
-		 * pfn.  Note, kmap() and memremap() can both sleep, so this
-		 * too must be done outside of gpc->lock!
+		 * pfn.  Note, kmap(), vmap() and memremap() can all sleep, so
+		 * this too must be done outside of gpc->lock!
+		 * Note that even though gpc->lock is dropped, it's still fine
+		 * to read gpc->pfn and other fields because gpc->refresh_lock
+		 * mutex prevents them from being updated.
 		 */
 		if (new_pfn == gpc->pfn)
 			new_khva = old_khva;
--
2.50.1