Currently, if a user enqueues a work item using schedule_delayed_work(), the
workqueue used is "system_wq" (a per-CPU workqueue), while queue_delayed_work()
uses WORK_CPU_UNBOUND (used when a CPU is not specified). The same applies to
schedule_work(), which uses system_wq, and queue_work(), which again makes use
of WORK_CPU_UNBOUND. This lack of consistency cannot be addressed without
refactoring the API.

alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt in via WQ_UNBOUND. This default is suboptimal: most
workloads benefit from unbound queues, allowing the scheduler to place worker
threads where they're needed and reducing noise when CPUs are isolated.

This patch adds the new WQ_PERCPU flag to all mm subsystem users to explicitly
request per-CPU behavior. Both flags coexist for one release cycle to allow
callers to transition their calls. Once migration is complete, WQ_UNBOUND can
be removed and unbound will become the implicit default.

With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND), any
alloc_workqueue() caller that doesn't explicitly specify WQ_UNBOUND must now
use WQ_PERCPU. All existing users have been updated accordingly.

Suggested-by: Tejun Heo
Signed-off-by: Marco Crivellari
---
 mm/backing-dev.c | 2 +-
 mm/slub.c        | 3 ++-
 mm/vmstat.c      | 3 ++-
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 7e672424f928..3b392de6367e 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -969,7 +969,7 @@ static int __init cgwb_init(void)
	 * system_percpu_wq. Put them in a separate wq and limit concurrency.
	 * There's no point in executing many of these in parallel.
	 */
-	cgwb_release_wq = alloc_workqueue("cgwb_release", 0, 1);
+	cgwb_release_wq = alloc_workqueue("cgwb_release", WQ_PERCPU, 1);
 	if (!cgwb_release_wq)
 		return -ENOMEM;
 
diff --git a/mm/slub.c b/mm/slub.c
index b46f87662e71..cac9d5d7c924 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -6364,7 +6364,8 @@ void __init kmem_cache_init(void)
 void __init kmem_cache_init_late(void)
 {
 #ifndef CONFIG_SLUB_TINY
-	flushwq = alloc_workqueue("slub_flushwq", WQ_MEM_RECLAIM, 0);
+	flushwq = alloc_workqueue("slub_flushwq", WQ_MEM_RECLAIM | WQ_PERCPU,
+				  0);
 	WARN_ON(!flushwq);
 #endif
 }
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 4c268ce39ff2..57bf76b1d9d4 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -2244,7 +2244,8 @@ void __init init_mm_internals(void)
 {
	int ret __maybe_unused;
 
-	mm_percpu_wq = alloc_workqueue("mm_percpu_wq", WQ_MEM_RECLAIM, 0);
+	mm_percpu_wq = alloc_workqueue("mm_percpu_wq",
+				       WQ_MEM_RECLAIM | WQ_PERCPU, 0);
 
 #ifdef CONFIG_SMP
	ret = cpuhp_setup_state_nocalls(CPUHP_MM_VMSTAT_DEAD, "mm/vmstat:dead",
-- 
2.51.0
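
For reference, a minimal sketch of the calling convention this series moves
toward; the queue name, init function, and initcall below are hypothetical and
not part of this patch:

#include <linux/init.h>
#include <linux/workqueue.h>

static struct workqueue_struct *example_wq;

static int __init example_init(void)
{
	/*
	 * A zero flags argument used to mean "per-CPU" implicitly. Per-CPU
	 * behavior is now requested explicitly with WQ_PERCPU, while unbound
	 * queues keep passing WQ_UNBOUND until unbound becomes the default.
	 */
	example_wq = alloc_workqueue("example_wq", WQ_PERCPU, 0);
	if (!example_wq)
		return -ENOMEM;

	return 0;
}
subsys_initcall(example_init);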