When do_anonymous_page() creates mappings for huge pages, it currently sets the access bit on all mapped PTEs (page table entries) by default. As a result, the Referenced field in /proc/pid/smaps cannot tell whether a page was actually accessed.

Introduce a new interface, set_anon_ptes(), which sets the access bit only on the PTE corresponding to the faulting address. This allows accurate tracking of page access status in /proc/pid/smaps before memory reclaim scans the folios.

During memory reclaim, folio_referenced() checks and clears the access bits of the PTEs: rmap walks all PTEs mapping a folio, and if any PTE-mapped subpage of the folio has its access bit set, the folio is retained. So setting the access bit only on the faulting PTE in do_anonymous_page() is safe, as it does not interfere with reclaim decisions.

This patch only covers architectures without a custom set_ptes() implementation (e.g. x86). ARM64 and other architectures that override set_ptes() are not yet supported.

Additionally, I have some questions regarding the contiguous page tables used for 64K huge pages on ARM64. Commit 4602e5757bcc ("arm64/mm: wire up PTE_CONT for user mappings") describes the following:

> Since a contpte block only has a single access and dirty bit, the semantic
> here changes slightly; when getting a pte (e.g. ptep_get()) that is part
> of a contpte mapping, the access and dirty information are pulled from the
> block (so all ptes in the block return the same access/dirty info).

While the ARM64 manual states:

> If hardware updates a translation table entry, and if the Contiguous bit in
> that entry is 1, then the members in a group of contiguous translation table
> entries can have different AF, AP[2], and S2AP[1] values.

Does this mean the 16 PTEs of a contpte block do not have to share the same AF on ARM64? Currently, for ARM64 huge pages with contiguous page tables enabled, the access and dirty bits of the 64K huge page are folded in software. However, I have not found out whether these access and dirty bits affect the TLB coalescing of contiguous page tables. If they do not, I think ARM64 could also set the access bit only on the PTE corresponding to the actual fault address in do_anonymous_page().
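For illustration only (not part of the patch): a rough userspace sketch of how the change to Referenced can be observed. The 2MB mapping size, the MADV_HUGEPAGE hint and the expectation that a large anonymous folio is allocated on the first fault are assumptions here; whether that actually happens depends on the (m)THP sysfs settings.

#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 2UL << 20;		/* assumed 2MB region */
	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED)
		return 1;

	/* Hint for a large folio; actual allocation depends on sysfs knobs. */
	madvise(buf, len, MADV_HUGEPAGE);

	/* Fault in a single byte so only one PTE should become young. */
	buf[0] = 1;

	printf("mapping %p-%p: check Referenced: in /proc/%d/smaps\n",
	       (void *)buf, (void *)(buf + len), getpid());
	pause();	/* keep the mapping alive while smaps is inspected */
	return 0;
}

With the patch applied, Referenced for this VMA should account only for the faulting page instead of every subpage of the large folio.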
Signed-off-by: Wenchao Hao
---
 include/linux/pgtable.h | 28 ++++++++++++++++++++++++++++
 mm/memory.c             |  2 +-
 2 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
index 652f287c1ef6..e2f3c932d672 100644
--- a/include/linux/pgtable.h
+++ b/include/linux/pgtable.h
@@ -302,6 +302,34 @@ static inline void set_ptes(struct mm_struct *mm, unsigned long addr,
 #endif
 #define set_pte_at(mm, addr, ptep, pte) set_ptes(mm, addr, ptep, pte, 1)
 
+#ifndef set_ptes
+static inline void set_anon_ptes(struct mm_struct *mm, unsigned long addr,
+		unsigned long fault_addr, pte_t *ptep, pte_t pte, unsigned int nr)
+{
+	bool young = pte_young(pte);
+
+	page_table_check_ptes_set(mm, ptep, pte, nr);
+
+	for (;;) {
+		if (young && addr == fault_addr)
+			pte = pte_mkyoung(pte);
+		else
+			pte = pte_mkold(pte);
+
+		set_pte(ptep, pte);
+		if (--nr == 0)
+			break;
+
+		addr += PAGE_SIZE;
+		ptep++;
+		pte = pte_next_pfn(pte);
+	}
+}
+#else
+#define set_anon_ptes(mm, addr, fault_addr, ptep, pte, nr) \
+	set_ptes(mm, addr, ptep, pte, nr)
+#endif
+
 #ifndef __HAVE_ARCH_PTEP_SET_ACCESS_FLAGS
 extern int ptep_set_access_flags(struct vm_area_struct *vma,
 				 unsigned long address, pte_t *ptep,
diff --git a/mm/memory.c b/mm/memory.c
index da360a6eb8a4..65c69c7116a7 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5273,7 +5273,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 setpte:
 	if (vmf_orig_pte_uffd_wp(vmf))
 		entry = pte_mkuffd_wp(entry);
-	set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr_pages);
+	set_anon_ptes(vma->vm_mm, addr, vmf->address, vmf->pte, entry, nr_pages);
 
 	/* No need to invalidate - it was non-present before */
 	update_mmu_cache_range(vmf, vma, addr, vmf->pte, nr_pages);
-- 
2.45.0
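Also for illustration (the helper below is hypothetical, not the real mm/rmap.c code): a condensed sketch of the reclaim-side reasoning from the commit message. folio_referenced() walks every PTE mapping the folio and test-and-clears the access bits, so a single young PTE is enough to keep the folio during reclaim.

static inline bool anon_folio_has_young_pte(pte_t *ptep, unsigned int nr)
{
	while (nr--) {
		/*
		 * The real reclaim path tests and clears the access bit
		 * (ptep_clear_flush_young()); this sketch only tests it.
		 */
		if (pte_young(ptep_get(ptep++)))
			return true;	/* one young subpage retains the folio */
	}
	return false;
}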