When zswap writeback is enabled and compression of a given page fails,
the page is swapped out to the backing swap device.  This behavior
breaks zswap's writeback LRU order, so users can experience unexpected
latency spikes.  If the page is compressed without failure but the
result is as large as PAGE_SIZE, the LRU order is kept, but the
decompression overhead for loading the page back on a later access is
unnecessary.

Keep the LRU order and avoid the unnecessary decompression overhead in
those cases by storing the original content as-is in the zpool.  The
length field of zswap_entry is set to PAGE_SIZE, so whether an entry is
saved as-is (and hence needs no decompression) is identified by
'zswap_entry->length == PAGE_SIZE'.

Because the uncompressed data is saved in the zpool, the same as the
compressed data, this introduces no change in terms of memory
management, including movability and migratability of the involved
pages.

This change does not increase the per-entry zswap metadata overhead.
But as the number of incompressible pages increases, the total zswap
metadata overhead increases proportionally.  The overhead should not be
problematic in usual cases, since the zswap metadata for a single entry
is much smaller than PAGE_SIZE, and common zswap use cases should have
a sufficient amount of compressible pages.  It can also be mitigated by
zswap writeback.

When writeback is disabled, though, the additional overhead could be
problematic.  For that case, keep the current behavior: just return the
failure and let swap_writeout() put the page back to the active LRU
list.

Knowing how many compression failures happened will be useful for
future investigations.  Add a new debugfs file, compress_fail, for the
purpose.

Tests
-----

I tested this patch using a simple self-written microbenchmark that is
available at GitHub[1].  You can reproduce the test I did by executing
run_tests.sh of the repo on your system.  Note that the repo's
documentation is not good as of this writing, so you may need to read
and use the code.

The basic test scenario is simple.  Run a test program that makes
artificial accesses to memory having artificial content under a
memory.high-set memory limit, and measure how many accesses were made
in a given time.

The test program repeatedly and randomly accesses three anonymous
memory regions.  The regions are all 500 MiB in size and are accessed
with the same probability.  Two of them are filled with simple content
that can easily be compressed, while the remaining one is filled with
content read from /dev/urandom, which is likely to fail to compress
into a size smaller than PAGE_SIZE.  A minimal sketch of this access
pattern is shown after the patch below.

Suggested-by: Takero Funaki
Signed-off-by: SeongJae Park
---
Changes from v1
(https://lore.kernel.org/20250807181616.1895-1-sj@kernel.org)
- Optimize out memcpy() per incompressible page saving, using
  k[un]map_local().
- Add a debugfs file for counting compression failures.
- Use a clear form of a ternary operation.
- Add the history of writeback disabling with a link.
- Wordsmith comments.

Changes from RFC v2
(https://lore.kernel.org/20250805002954.1496-1-sj@kernel.org)
- Fix race conditions at decompressed pages identification.
- Remove the parameter and make saving as-is the default behavior.
- Open-code main changes.
- Clarify there is no memory management change on the cover letter.
- Remove 20% pressure case from test results, since it is arguably too
  extreme and only adds confusion.
- Drop RFC tag.

Changes from RFC v1
(https://lore.kernel.org/20250730234059.4603-1-sj@kernel.org)
- Consider PAGE_SIZE compression successes as failures.
- Use zpool for storing incompressible pages.
- Test with zswap shrinker enabled.
- Wordsmith changelog and comments.
- Add documentation of save_incompressible_pages parameter.

 mm/zswap.c | 36 ++++++++++++++++++++++++++++++++++--
 1 file changed, 34 insertions(+), 2 deletions(-)

diff --git a/mm/zswap.c b/mm/zswap.c
index 3c0fd8a13718..0fb940d03268 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -60,6 +60,8 @@ static u64 zswap_written_back_pages;
 static u64 zswap_reject_reclaim_fail;
 /* Store failed due to compression algorithm failure */
 static u64 zswap_reject_compress_fail;
+/* Compression into a size smaller than PAGE_SIZE failed */
+static u64 zswap_compress_fail;
 /* Compressed page was too big for the allocator to (optimally) store */
 static u64 zswap_reject_compress_poor;
@@ ... @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
 	comp_ret = crypto_wait_req(crypto_acomp_compress(acomp_ctx->req), &acomp_ctx->wait);
 	dlen = acomp_ctx->req->dlen;
-	if (comp_ret)
-		goto unlock;
+
+	/*
+	 * If a page cannot be compressed into a size smaller than PAGE_SIZE,
+	 * save the content as is without a compression, to keep the LRU order
+	 * of writebacks. If writeback is disabled, reject the page since it
+	 * only adds metadata overhead. swap_writeout() will put the page back
+	 * to the active LRU list in the case.
+	 */
+	if (comp_ret || dlen >= PAGE_SIZE) {
+		zswap_compress_fail++;
+		if (mem_cgroup_zswap_writeback_enabled(
+					folio_memcg(page_folio(page)))) {
+			comp_ret = 0;
+			dlen = PAGE_SIZE;
+			dst = kmap_local_page(page);
+		} else {
+			comp_ret = comp_ret ? comp_ret : -EINVAL;
+			goto unlock;
+		}
+	}

 	zpool = pool->zpool;
 	gfp = GFP_NOWAIT | __GFP_NORETRY | __GFP_HIGHMEM | __GFP_MOVABLE;
@@ -990,6 +1010,8 @@ static bool zswap_compress(struct page *page, struct zswap_entry *entry,
 	entry->length = dlen;

 unlock:
+	if (dst != acomp_ctx->buffer)
+		kunmap_local(dst);
 	if (comp_ret == -ENOSPC || alloc_ret == -ENOSPC)
 		zswap_reject_compress_poor++;
 	else if (comp_ret)
@@ -1012,6 +1034,14 @@ static bool zswap_decompress(struct zswap_entry *entry, struct folio *folio)
 	acomp_ctx = acomp_ctx_get_cpu_lock(entry->pool);
 	obj = zpool_obj_read_begin(zpool, entry->handle, acomp_ctx->buffer);

+	/* zswap entries of length PAGE_SIZE are not compressed. */
+	if (entry->length == PAGE_SIZE) {
+		memcpy_to_folio(folio, 0, obj, entry->length);
+		zpool_obj_read_end(zpool, entry->handle, obj);
+		acomp_ctx_put_unlock(acomp_ctx);
+		return true;
+	}
+
 	/*
 	 * zpool_obj_read_begin() might return a kmap address of highmem when
 	 * acomp_ctx->buffer is not used. However, sg_init_one() does not
@@ -1809,6 +1839,8 @@ static int zswap_debugfs_init(void)
 			   zswap_debugfs_root, &zswap_reject_kmemcache_fail);
 	debugfs_create_u64("reject_compress_fail", 0444,
 			   zswap_debugfs_root, &zswap_reject_compress_fail);
+	debugfs_create_u64("compress_fail", 0444,
+			   zswap_debugfs_root, &zswap_compress_fail);
 	debugfs_create_u64("reject_compress_poor", 0444,
 			   zswap_debugfs_root, &zswap_reject_compress_poor);
 	debugfs_create_u64("decompress_fail", 0444,

base-commit: 44fa6646d975349f6499d1aeb0ed826680d0bb5c
-- 
2.39.5
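
For readers who want a quick feel for the test scenario above without
reading the repo at [1], below is a minimal, hypothetical sketch of the
access pattern.  This is not the actual benchmark: the region count and
sizes match the description, but the fill pattern, iteration count, and
file name are assumptions.  Run it in a cgroup whose memory.high is set
below the ~1.5 GiB working set, and compare the new compress_fail
counter (at /sys/kernel/debug/zswap/compress_fail, assuming debugfs is
mounted at /sys/kernel/debug) before and after the run.

/* zswap_incompressible_sketch.c: hypothetical reproducer sketch */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define REGION_SZ	(500UL << 20)		/* 500 MiB per region */
#define NR_REGIONS	3			/* two compressible, one not */
#define NR_ACCESSES	(100UL * 1000 * 1000)	/* arbitrary run length */

static char *alloc_region(void)
{
	char *p = mmap(NULL, REGION_SZ, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED) {
		perror("mmap");
		exit(1);
	}
	return p;
}

int main(void)
{
	char *regions[NR_REGIONS];
	unsigned long i;
	ssize_t ret;
	size_t off;
	int fd;

	/* Two regions with trivially compressible content. */
	regions[0] = alloc_region();
	regions[1] = alloc_region();
	memset(regions[0], 0xaa, REGION_SZ);
	memset(regions[1], 0xaa, REGION_SZ);

	/*
	 * One region filled from /dev/urandom.  zswap will likely fail to
	 * compress these pages into less than PAGE_SIZE and, with this
	 * patch, store them as-is instead of writing them back.
	 */
	regions[2] = alloc_region();
	fd = open("/dev/urandom", O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	for (off = 0; off < REGION_SZ; off += ret) {
		ret = read(fd, regions[2] + off, REGION_SZ - off);
		if (ret <= 0) {
			perror("read");
			return 1;
		}
	}
	close(fd);

	/* Randomly touch the three regions with equal probability. */
	srandom(42);
	for (i = 0; i < NR_ACCESSES; i++) {
		off = (size_t)random() % REGION_SZ;
		regions[random() % NR_REGIONS][off]++;
	}
	return 0;
}

Counting how many accesses complete in a fixed time, as the benchmark
in [1] does, would additionally need a timer or SIGALRM handler; that
part is omitted here for brevity.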