Document the distinction between mems_allowed and sysram_nodes:
mems_allowed may contain nodes from union(N_MEMORY, N_PRIVATE), while
sysram_nodes may only contain a subset of N_MEMORY nodes.

cpuset.mems.sysram is a new read-only ABI that reports the list of
N_MEMORY nodes the cpuset is allowed to use, while cpuset.mems and
cpuset.mems.effective may also contain N_PRIVATE nodes.

Signed-off-by: Gregory Price
---
 .../admin-guide/cgroup-v1/cpusets.rst         | 19 +++++++++++---
 Documentation/admin-guide/cgroup-v2.rst       | 26 +++++++++++++++++--
 Documentation/filesystems/proc.rst            |  2 +-
 3 files changed, 40 insertions(+), 7 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v1/cpusets.rst b/Documentation/admin-guide/cgroup-v1/cpusets.rst
index c7909e5ac136..6d326056f7b4 100644
--- a/Documentation/admin-guide/cgroup-v1/cpusets.rst
+++ b/Documentation/admin-guide/cgroup-v1/cpusets.rst
@@ -158,21 +158,26 @@ new system calls are added for cpusets - all support for querying and
 modifying cpusets is via this cpuset file system.

 The /proc/<pid>/status file for each task has four added lines,
-displaying the task's cpus_allowed (on which CPUs it may be scheduled)
-and mems_allowed (on which Memory Nodes it may obtain memory),
-in the two formats seen in the following example::
+displaying the task's cpus_allowed (on which CPUs it may be scheduled),
+and mems_allowed (on which SystemRAM nodes it may obtain memory),
+in the formats seen in the following example::

   Cpus_allowed:   ffffffff,ffffffff,ffffffff,ffffffff
   Cpus_allowed_list:      0-127
   Mems_allowed:   ffffffff,ffffffff
   Mems_allowed_list:      0-63

+Note that Mems_allowed only shows SystemRAM nodes (N_MEMORY), not
+Private Nodes. Private Nodes may be accessible via __GFP_THISNODE
+allocations if they appear in the task's cpuset.effective_mems.
+
 Each cpuset is represented by a directory in the cgroup file system
 containing (on top of the standard cgroup files) the following
 files describing that cpuset:

  - cpuset.cpus: list of CPUs in that cpuset
  - cpuset.mems: list of Memory Nodes in that cpuset
+ - cpuset.mems.sysram: read-only list of SystemRAM nodes (excludes Private Nodes)
  - cpuset.memory_migrate flag: if set, move pages to cpusets nodes
  - cpuset.cpu_exclusive flag: is cpu placement exclusive?
  - cpuset.mem_exclusive flag: is memory placement exclusive?
@@ -227,7 +232,9 @@ nodes with memory--using the cpuset_track_online_nodes() hook.

 The cpuset.effective_cpus and cpuset.effective_mems files are
 normally read-only copies of cpuset.cpus and cpuset.mems files
-respectively. If the cpuset cgroup filesystem is mounted with the
+respectively. The cpuset.effective_mems file may include both
+regular SystemRAM nodes (N_MEMORY) and Private Nodes (N_PRIVATE).
+If the cpuset cgroup filesystem is mounted with the
 special "cpuset_v2_mode" option, the behavior of these files will become
 similar to the corresponding files in cpuset v2. In other words, hotplug
 events will not change cpuset.cpus and cpuset.mems. Those events will
@@ -236,6 +243,10 @@ the actual cpus and memory nodes that are currently used by this cpuset.
 See Documentation/admin-guide/cgroup-v2.rst for more information about
 cpuset v2 behavior.

+The cpuset.mems.sysram file shows only the SystemRAM nodes (N_MEMORY)
+from cpuset.effective_mems, excluding any Private Nodes. This
+represents the nodes available for general memory allocation.
+

 1.4 What are exclusive cpusets ?
 --------------------------------
diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 7f5b59d95fce..6af54efb84a2 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -2530,8 +2530,11 @@ Cpuset Interface Files
 	cpuset-enabled cgroups.

 	It lists the onlined memory nodes that are actually granted to
-	this cgroup by its parent. These memory nodes are allowed to
-	be used by tasks within the current cgroup.
+	this cgroup by its parent. This includes both regular SystemRAM
+	nodes (N_MEMORY) and Private Nodes (N_PRIVATE) that provide
+	device-specific memory not intended for general consumption.
+	Tasks within this cgroup may access Private Nodes using explicit
+	__GFP_THISNODE allocations if the node is in this mask.

 	If "cpuset.mems" is empty, it shows all the memory nodes from the
 	parent cgroup that will be available to be used by this cgroup.
@@ -2541,6 +2544,25 @@ Cpuset Interface Files

 	Its value will be affected by memory nodes hotplug events.

+  cpuset.mems.sysram
+	A read-only multiple values file which exists on all
+	cpuset-enabled cgroups.
+
+	It lists the SystemRAM nodes (N_MEMORY) that are available for
+	general memory allocation by tasks within this cgroup. This is
+	a subset of "cpuset.mems.effective" that excludes Private Nodes.
+
+	Normal page allocations are restricted to nodes in this mask.
+	The kernel page allocator, slab allocator, and compaction only
+	consider SystemRAM nodes when allocating memory for tasks.
+
+	Private Nodes are excluded from this mask because their memory
+	is managed by device drivers for specific purposes (e.g., CXL
+	compressed memory, accelerator memory) and should not be used
+	for general allocations.
+
+	Its value will be affected by memory nodes hotplug events.
+
   cpuset.cpus.exclusive
 	A read-write multiple values file which exists on non-root
 	cpuset-enabled cgroups.
diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index c92e95e28047..68f3d8ffc03b 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -294,7 +294,7 @@ It's slow but very precise.
  Cpus_active_mm              mask of CPUs on which this process has an active
                              memory context
  Cpus_active_mm_list         Same as previous, but in "list format"
- Mems_allowed                mask of memory nodes allowed to this process
+ Mems_allowed                mask of SystemRAM nodes for general allocations
  Mems_allowed_list           Same as previous, but in "list format"
  voluntary_ctxt_switches     number of voluntary context switches
  nonvoluntary_ctxt_switches  number of non voluntary context switches
-- 
2.52.0
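
The relationship described above (cpuset.mems.sysram is the subset of
cpuset.mems.effective that excludes Private Nodes) can be illustrated
from userspace with a minimal sketch. It assumes both files use the plain
"list format" shown for Mems_allowed_list (for example "0-3,8"). The
helper names parse_node_list and private_nodes are hypothetical and not
part of this series, and the input strings are made-up sample contents.

```python
def parse_node_list(text):
    """Parse a list-format string such as "0-3,8" into a set of node IDs.

    Assumption: only plain IDs and "lo-hi" ranges appear, as in the
    Mems_allowed_list example; stride or group syntax is not handled.
    """
    nodes = set()
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file is treated as an empty mask
        lo, _, hi = part.partition("-")
        nodes.update(range(int(lo), int(hi or lo) + 1))
    return nodes


def private_nodes(effective, sysram):
    """Return nodes in the effective mask that are not SystemRAM nodes.

    Under the semantics documented in this patch, these would be the
    Private Nodes (N_PRIVATE) granted to the cgroup.
    """
    eff = parse_node_list(effective)
    ram = parse_node_list(sysram)
    if not ram <= eff:
        raise ValueError("sysram mask is not a subset of the effective mask")
    return eff - ram


# Hypothetical contents of cpuset.mems.effective and cpuset.mems.sysram.
print(sorted(private_nodes("0-3,8", "0-3")))
```

In practice the two strings would be read from the cgroup's
cpuset.mems.effective and cpuset.mems.sysram files; the subset check
mirrors the documented invariant rather than anything the kernel requires
userspace to verify.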