exec_folio_order() was introduced [1] to request readahead of executable
file-backed pages at an arch-preferred folio order, so that the hardware
can coalesce contiguous PTEs into fewer iTLB entries (contpte).

The current implementation uses ilog2(SZ_64K >> PAGE_SHIFT), which
requests 64K folios. This is optimal for 4K base pages (where
CONT_PTES = 16, contpte size = 64K), but suboptimal for 16K and 64K
base pages:

Page size | Before (order) | After (order) | contpte
----------|----------------|---------------|--------
4K        | 4 (64K)        | 4 (64K)       | Yes (unchanged)
16K       | 2 (64K)        | 7 (2M)        | Yes (new)
64K       | 0 (64K)        | 5 (2M)        | Yes (new)

For 16K pages, CONT_PTES = 128 and the contpte size is 2M (order 7).
For 64K pages, CONT_PTES = 32 and the contpte size is 2M (order 5).

Use ilog2(CONT_PTES) instead, which directly evaluates to contpte-aligned
order for all page sizes.

The worst-case waste is bounded to one folio (up to 2MB - 64KB) at the
end of the file, since page_cache_ra_order() reduces the folio order
near EOF to avoid allocating past i_size.

[1] https://lore.kernel.org/all/20250430145920.3748738-6-ryan.roberts@arm.com/

Signed-off-by: Usama Arif
---
 arch/arm64/include/asm/pgtable.h | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index b3e58735c49bd..a1110a33acb35 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -1600,12 +1600,11 @@ static inline void update_mmu_cache_range(struct vm_fault *vmf,
 #define arch_wants_old_prefaulted_pte	cpu_has_hw_af
 
 /*
- * Request exec memory is read into pagecache in at least 64K folios. This size
- * can be contpte-mapped when 4K base pages are in use (16 pages into 1 iTLB
- * entry), and HPA can coalesce it (4 pages into 1 TLB entry) when 16K base
- * pages are in use.
+ * Request exec memory is read into pagecache in contpte-sized folios. The
+ * contpte size is the number of contiguous PTEs that the hardware can coalesce
+ * into a single iTLB entry: 64K for 4K pages, 2M for 16K and 64K pages.
  */
-#define exec_folio_order() ilog2(SZ_64K >> PAGE_SHIFT)
+#define exec_folio_order() ilog2(CONT_PTES)
 
 static inline bool pud_sect_supported(void)
 {
-- 
2.47.3
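
The order arithmetic in the commit message can be checked outside the kernel
with a small userspace sketch. This is not kernel code: order_of() is a
hypothetical stand-in for ilog2(), the helper names are made up for
illustration, and the CONT_PTES values (16, 128, 32) are simply the ones
quoted in the commit message, passed in as plain arguments.

```c
#include <assert.h>

/* Integer log2 of a power of two; hypothetical stand-in for ilog2(). */
static unsigned int order_of(unsigned long n)
{
	unsigned int order = 0;

	while (n > 1) {
		n >>= 1;
		order++;
	}
	return order;
}

/* Old definition: ilog2(SZ_64K >> PAGE_SHIFT). */
static unsigned int exec_order_before(unsigned int page_shift)
{
	return order_of(0x10000UL >> page_shift);
}

/* New definition: ilog2(CONT_PTES), with CONT_PTES supplied by the caller. */
static unsigned int exec_order_after(unsigned long cont_ptes)
{
	return order_of(cont_ptes);
}

/* Folio size in bytes for a given order and page shift. */
static unsigned long folio_bytes(unsigned int order, unsigned int page_shift)
{
	return 1UL << (order + page_shift);
}

/* Re-derive the table from the commit message (values assumed from it). */
static void check_table(void)
{
	/* 4K pages: PAGE_SHIFT = 12, CONT_PTES = 16 */
	assert(exec_order_before(12) == 4);
	assert(exec_order_after(16) == 4);
	assert(folio_bytes(4, 12) == 64UL * 1024);

	/* 16K pages: PAGE_SHIFT = 14, CONT_PTES = 128 */
	assert(exec_order_before(14) == 2);
	assert(exec_order_after(128) == 7);
	assert(folio_bytes(7, 14) == 2UL * 1024 * 1024);

	/* 64K pages: PAGE_SHIFT = 16, CONT_PTES = 32 */
	assert(exec_order_before(16) == 0);
	assert(exec_order_after(32) == 5);
	assert(folio_bytes(5, 16) == 2UL * 1024 * 1024);
}
```

Calling check_table() from a test main() passes silently if the before/after
orders and folio sizes match the table above.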