In the current mincore_pte_range(), pte_batch_hint() may return a batch of
only one pte, which is not efficient. Call the newly added
can_pte_batch_count() instead.

In an ARM64 qemu guest with 8 CPUs and 32G memory, a simple test demo does
the following:

1. mmap 1G of anonymous memory
2. write 1G of data in 4k steps
3. mincore the mmapped 1G memory
4. measure the time consumed by mincore

Tested the following cases:
- 4k: all hugepage settings disabled.
- 64k mTHP: only the 64k hugepage setting enabled.

Before:
Case status | Consumed time (us) |
----------------------------------|
4k          | 7356               |
64k mTHP    | 3670               |

Patched:
Case status | Consumed time (us) |
----------------------------------|
4k          | 4419               |
64k mTHP    | 3061               |

The result is evident and demonstrates a significant improvement from pte
batching. While verification within a single environment may have inherent
randomness, there is a high probability of achieving positive effects.

Signed-off-by: Zhang Qilong
---
 mm/mincore.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/mm/mincore.c b/mm/mincore.c
index 8ec4719370e1..2cc5d276d1cd 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -178,18 +178,14 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 		/* We need to do cache lookup too for pte markers */
 		if (pte_none_mostly(pte))
 			__mincore_unmapped_range(addr, addr + PAGE_SIZE,
						 vma, vec);
 		else if (pte_present(pte)) {
-			unsigned int batch = pte_batch_hint(ptep, pte);
-
-			if (batch > 1) {
-				unsigned int max_nr = (end - addr) >> PAGE_SHIFT;
-
-				step = min_t(unsigned int, batch, max_nr);
-			}
+			unsigned int max_nr = (end - addr) >> PAGE_SHIFT;
+			step = can_pte_batch_count(vma, ptep, &pte,
+						   max_nr, 0);

 			for (i = 0; i < step; i++)
 				vec[i] = 1;
 		} else { /* pte is a swap entry */
 			*vec = mincore_swap(pte_to_swp_entry(pte), false);
 		}
--
2.43.0
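
For reference, below is a minimal userspace sketch of the test demo described
in the commit message (not part of the patch): it mmaps 1G of anonymous
memory, populates it with writes in 4k steps, then times a single mincore()
call over the whole range. The 1G size and 4k stride follow the description
above; the use of clock_gettime(CLOCK_MONOTONIC) and the error handling are
illustrative assumptions.

#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
	const size_t len = 1UL << 30;		/* 1G anonymous mapping */
	long page = sysconf(_SC_PAGESIZE);
	unsigned char *vec;
	char *buf;
	struct timespec t0, t1;
	long long us;
	size_t off;

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* Touch the mapping in 4k steps so the whole range is populated. */
	for (off = 0; off < len; off += 4096)
		buf[off] = 1;

	vec = malloc(len / page);
	if (!vec) {
		perror("malloc");
		return 1;
	}

	/* Time one mincore() call over the full 1G mapping. */
	clock_gettime(CLOCK_MONOTONIC, &t0);
	if (mincore(buf, len, vec)) {
		perror("mincore");
		return 1;
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	us = (t1.tv_sec - t0.tv_sec) * 1000000LL +
	     (t1.tv_nsec - t0.tv_nsec) / 1000;
	printf("mincore() took %lld us\n", us);

	free(vec);
	munmap(buf, len);
	return 0;
}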