Btrfs currently tracks its metadata pages in the page cache, using a
fake inode (fs_info->btree_inode) with offsets corresponding to where
the metadata is stored in the filesystem's full logical address space.

A consequence of this is that when btrfs uses filemap_add_folio(), that
usage is charged to the cgroup of whichever task happens to be running
at the time. These folios don't belong to any particular user cgroup,
so I don't think it makes much sense for them to be charged in that
way. This has some negative consequences:

- A task can be holding important btrfs locks, then need to look up
  some metadata and go into reclaim, extending the time it holds those
  locks and unfairly pushing its own reclaim pain onto other cgroups.
- If that cgroup goes into reclaim, it might reclaim these folios that
  a different, non-reclaiming cgroup might need soon. This is naturally
  offset by LRU reclaim, but still.

Qu previously made a very similar proposal to charge these folios to
the root cgroup, and eventually proposed setting the behavior per
address_space instead. That makes good sense for the btrfs use case,
as the uncharged behavior should apply to all use of the address_space,
not to select allocations: if someone adds another filemap_add_folio()
call using btrfs's btree_inode, we would almost certainly want the
uncharged behavior there too.

Link: https://lore.kernel.org/linux-mm/b5fef5372ae454a7b6da4f2f75c427aeab6a07d6.1727498749.git.wqu@suse.com/
Suggested-by: Qu Wenruo
Acked-by: Shakeel Butt
Tested-by: syzbot@syzkaller.appspotmail.com
Signed-off-by: Boris Burkov
---
 include/linux/pagemap.h |  1 +
 mm/filemap.c            | 12 ++++++++----
 2 files changed, 9 insertions(+), 4 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index c9ba69e02e3e..06dc3fae8124 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -211,6 +211,7 @@ enum mapping_flags {
                            folio contents */
         AS_INACCESSIBLE = 8,    /* Do not attempt direct R/W access to the mapping */
         AS_WRITEBACK_MAY_DEADLOCK_ON_RECLAIM = 9,
+        AS_UNCHARGED = 10,      /* Do not charge usage to a cgroup */
         /* Bits 16-25 are used for FOLIO_ORDER */
         AS_FOLIO_ORDER_BITS = 5,
         AS_FOLIO_ORDER_MIN = 16,
diff --git a/mm/filemap.c b/mm/filemap.c
index e4a5a46db89b..5004a2cfa0cc 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -960,15 +960,19 @@ int filemap_add_folio(struct address_space *mapping, struct folio *folio,
 {
         void *shadow = NULL;
         int ret;
+        bool charge_mem_cgroup = !test_bit(AS_UNCHARGED, &mapping->flags);
 
-        ret = mem_cgroup_charge(folio, NULL, gfp);
-        if (ret)
-                return ret;
+        if (charge_mem_cgroup) {
+                ret = mem_cgroup_charge(folio, NULL, gfp);
+                if (ret)
+                        return ret;
+        }
 
         __folio_set_locked(folio);
         ret = __filemap_add_folio(mapping, folio, index, gfp, &shadow);
         if (unlikely(ret)) {
-                mem_cgroup_uncharge(folio);
+                if (charge_mem_cgroup)
+                        mem_cgroup_uncharge(folio);
                 __folio_clear_locked(folio);
         } else {
                 /*
-- 
2.50.1
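
For illustration only, not part of this patch: a caller-side sketch of
how a filesystem could opt a metadata mapping out of cgroup charging
with the new bit. The helper name below is hypothetical (modeled on the
existing mapping_set_unevictable()); set_bit() on mapping->flags is the
same mechanism filemap_add_folio() tests above, and the flag must be
set before any folios are added to the mapping.

        /*
         * Hypothetical helper: mark a mapping so filemap_add_folio()
         * skips mem_cgroup_charge() for folios added to it.
         */
        static inline void mapping_set_uncharged(struct address_space *mapping)
        {
                set_bit(AS_UNCHARGED, &mapping->flags);
        }

        /* e.g. during btrfs's btree inode setup, before any metadata reads: */
        mapping_set_uncharged(fs_info->btree_inode->i_mapping);

A follow-up could add such a helper next to the other mapping_set_*()
helpers in pagemap.h so callers don't open-code the bit manipulation.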