For guest_memfd-only memslots (kvm_memslot_is_gmem_only() is true), the
memory provider for the virtual machine is the guest_memfd file, not the
userspace mapping. Faults are resolved using the guest_memfd page cache,
and the permissions for the secondary MMU mapping depend exclusively on
the memslot (i.e., whether the memslot is read-only). How userspace happens
to have the memory mmapped at fault time, or even whether the memory is
mapped into userspace at all, is not taken into consideration.

guest_memfd memory is not evictable, is not movable, and has no backing
storage. Once memory is allocated for an offset in the guest_memfd file,
the offset will not change, and that memory is not freed unless userspace
explicitly punches a hole in the file.

As a result, memory reclaim, page migration, page aging and dirty page
tracking for the userspace mapping serve little purpose. Despite this,
KVM's MMU notifiers still modify the secondary MMU page tables, as they do
for ordinary memslots, only for the same memory to be remapped the next
time a guest accesses it.

Make the disconnect between the user mapping and the secondary MMU page
tables explicit by ignoring the MMU notifiers for guest_memfd-only
memslots.

Signed-off-by: Alexandru Elisei
---

The only theoretical instance I was able to find where the MMU notifiers
are invoked for the userspace mapping of a guest_memfd-only memslot is
automatic NUMA balancing with a non-NULL NUMA policy for the guest_memfd
file. I wasn't able to test it in practice. My knowledge of MM is also very
limited, so there might be other cases where it happens, or I might be
wrong and today the MMU notifiers are never invoked.

Either way, when and if it happens, having memory unmapped from the
secondary MMU for a guest_memfd-only memslot is at most a performance issue
(it causes unnecessary guest faults). I wanted to start a conversation
about this because memory that stays mapped at stage 2 (unless userspace
explicitly unmaps it from the VM) is needed for an Arm feature, SPE
(Statistical Profiling Extension), that I'm working to upstream. This patch
aims to guarantee that memory won't be unmapped from the secondary MMU
behind the VMM's back, which is what can happen for non-guest_memfd
memslots.

 virt/kvm/kvm_main.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 881f92d7a469..8c4158996928 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -592,6 +592,10 @@ static __always_inline kvm_mn_ret_t kvm_handle_hva_range(struct kvm *kvm,
 			unsigned long hva_start, hva_end;
 
 			slot = container_of(node, struct kvm_memory_slot, hva_node[slots->node_idx]);
+
+			if (kvm_slot_has_gmem(slot) && kvm_memslot_is_gmem_only(slot))
+				continue;
+
 			hva_start = max_t(unsigned long, range->start, slot->userspace_addr);
 			hva_end = min_t(unsigned long, range->end,
 					slot->userspace_addr + (slot->npages << PAGE_SHIFT));

base-commit: 8cd9520d35a6c38db6567e97dd93b1f11f185dc6
-- 
2.54.0
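
For readers who want to see the effect of the added check outside the
kernel, here is a minimal userspace sketch of the loop's behaviour. It is
not the kernel code: struct sketch_memslot, SKETCH_PAGE_SHIFT and
sketch_count_handled_slots() are hypothetical names made up for this
illustration, the gmem_only flag stands in for the
kvm_slot_has_gmem() && kvm_memslot_is_gmem_only() test, and a plain array
walk replaces the interval tree lookup that kvm_handle_hva_range() uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for struct kvm_memory_slot; not the kernel type. */
struct sketch_memslot {
	unsigned long userspace_addr;	/* start of the userspace mapping */
	unsigned long npages;		/* slot size in pages */
	bool gmem_only;			/* models the guest_memfd-only test */
};

/*
 * Count the slots that an MMU notifier event for [start, end) would act
 * on, skipping guest_memfd-only slots the way the patch does.
 */
static size_t sketch_count_handled_slots(const struct sketch_memslot *slots,
					 size_t nr_slots,
					 unsigned long start,
					 unsigned long end)
{
	size_t handled = 0;

	assert(start <= end);

	for (size_t i = 0; i < nr_slots; i++) {
		const struct sketch_memslot *slot = &slots[i];
		unsigned long slot_end = slot->userspace_addr +
					 (slot->npages << SKETCH_PAGE_SHIFT);
		unsigned long hva_start, hva_end;

		/* guest_memfd is the memory provider: ignore the notifier. */
		if (slot->gmem_only)
			continue;

		/* Clamp the notifier range to the slot (max_t()/min_t()). */
		hva_start = start > slot->userspace_addr ?
			    start : slot->userspace_addr;
		hva_end = end < slot_end ? end : slot_end;

		/* Only a non-empty intersection would be handled. */
		if (hva_start < hva_end)
			handled++;
	}

	return handled;
}
```

With this model, a notifier range that covers only a guest_memfd-only slot
results in no slot being handled, so nothing would be unmapped from the
secondary MMU for it, while ordinary slots in the same range are still
processed.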