For example, create three tasks: hot1 -> cold -> hot2. After all three
tasks are created, each allocates 128 MB of memory. The hot1 and hot2
tasks continuously access their 128 MB of memory, while the cold task
accesses its memory only briefly and then calls madvise(MADV_COLD).
However, khugepaged still scans the cold task first and only scans the
hot2 task after it has finished scanning the cold task.

So if the user has explicitly informed us via MADV_COLD/MADV_FREE that
this memory is cold or will be freed, it is appropriate for khugepaged
to simply skip it, thereby avoiding unnecessary scan and collapse
operations and reducing wasted CPU time.

Here are the performance test results:
(For Throughput, higher is better; for the other metrics, lower is better.)

Testing on x86_64 machine:

| task hot2           | without patch | with patch    | delta   |
|---------------------|---------------|---------------|---------|
| total access time   |    3.14 sec   |    2.93 sec   |  -6.69% |
| cycles per access   |    4.96       |    2.21       | -55.44% |
| Throughput          |  104.38 M/sec |  111.89 M/sec |  +7.19% |
| dTLB-load-misses    |  284814532    |   69597236    | -75.56% |

Testing on qemu-system-x86_64 -enable-kvm:

| task hot2           | without patch | with patch    | delta   |
|---------------------|---------------|---------------|---------|
| total access time   |    3.35 sec   |    2.96 sec   | -11.64% |
| cycles per access   |    7.29       |    2.07       | -71.60% |
| Throughput          |   97.67 M/sec |  110.77 M/sec | +13.41% |
| dTLB-load-misses    |  241600871    |    3216108    | -98.67% |

Signed-off-by: Vernon Yang
---
 mm/madvise.c | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index b617b1be0f53..3a48d725a3fc 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1360,11 +1360,8 @@ static int madvise_vma_behavior(struct madvise_behavior *madv_behavior)
 		return madvise_remove(madv_behavior);
 	case MADV_WILLNEED:
 		return madvise_willneed(madv_behavior);
-	case MADV_COLD:
-		return madvise_cold(madv_behavior);
 	case MADV_PAGEOUT:
 		return madvise_pageout(madv_behavior);
-	case MADV_FREE:
 	case MADV_DONTNEED:
 	case MADV_DONTNEED_LOCKED:
 		return madvise_dontneed_free(madv_behavior);
@@ -1378,6 +1375,18 @@ static int madvise_vma_behavior(struct madvise_behavior *madv_behavior)
 
 	/* The below behaviours update VMAs via madvise_update_vma(). */
 
+	case MADV_COLD:
+		error = madvise_cold(madv_behavior);
+		if (error)
+			goto out;
+		new_flags = (new_flags & ~VM_HUGEPAGE) | VM_NOHUGEPAGE;
+		break;
+	case MADV_FREE:
+		error = madvise_dontneed_free(madv_behavior);
+		if (error)
+			goto out;
+		new_flags = (new_flags & ~VM_HUGEPAGE) | VM_NOHUGEPAGE;
+		break;
 	case MADV_NORMAL:
 		new_flags = new_flags & ~VM_RAND_READ & ~VM_SEQ_READ;
 		break;
@@ -1756,7 +1765,6 @@ static enum madvise_lock_mode get_lock_mode(struct madvise_behavior *madv_behavi
 	switch (madv_behavior->behavior) {
 	case MADV_REMOVE:
 	case MADV_WILLNEED:
-	case MADV_COLD:
 	case MADV_PAGEOUT:
 	case MADV_POPULATE_READ:
 	case MADV_POPULATE_WRITE:
@@ -1766,7 +1774,6 @@ static enum madvise_lock_mode get_lock_mode(struct madvise_behavior *madv_behavi
 	case MADV_GUARD_REMOVE:
 	case MADV_DONTNEED:
 	case MADV_DONTNEED_LOCKED:
-	case MADV_FREE:
 		return MADVISE_VMA_READ_LOCK;
 	default:
 		return MADVISE_MMAP_WRITE_LOCK;
-- 
2.51.0
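
For readers who want to reproduce the "cold" task from the scenario in the commit message, the following is a minimal userspace sketch of what such a task might do: map anonymous memory, touch each page once, then hint the range as cold. It is not the author's test program. The function name cold_task_touch_and_advise() is hypothetical, and the fallback value 20 for MADV_COLD is taken from the Linux UAPI headers for toolchains whose libc headers predate Linux 5.4.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_COLD
#define MADV_COLD 20	/* Linux UAPI value; older libc headers may lack it */
#endif

/*
 * Hypothetical helper, not part of the patch: map len bytes of anonymous
 * memory, write one byte per page so the pages are actually populated,
 * then tell the kernel the range is cold.
 *
 * Returns 0 on success or a negative errno value on failure
 * (for example -EINVAL on kernels without MADV_COLD support).
 */
int cold_task_touch_and_advise(size_t len)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t step = page > 0 ? (size_t)page : 4096;
	unsigned char *buf;
	size_t off;
	int ret = 0;

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -errno;

	/* Brief access: touch every page exactly once. */
	for (off = 0; off < len; off += step)
		buf[off] = 1;

	/* Explicitly mark the whole range as cold. */
	if (madvise(buf, len, MADV_COLD) != 0)
		ret = -errno;

	munmap(buf, len);
	return ret;
}
```

Calling cold_task_touch_and_advise(128UL << 20) would approximate the 128 MB allocation used in the description. A real reproducer would keep the mapping alive after the madvise() call, rather than unmapping it, so that khugepaged still has the range available to scan.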