At present, the managed interrupt spreading algorithm distributes vectors across all available CPUs within a given node or system. On systems employing CPU isolation (e.g., "isolcpus=io_queue"), this behaviour defeats the primary purpose of isolation by routing hardware interrupts (such as NVMe completion queues) directly to isolated cores. Update irq_create_affinity_masks() to respect the housekeeping CPU mask. By passing the HK_TYPE_IO_QUEUE mask directly to the topological distribution function (group_mask_cpus_evenly()), we ensure that managed interrupts are kept strictly off isolated CPUs. This patch additionally addresses the architectural constraints of restricted vector distribution: 1. Vector Limits and Overrides: Updated irq_calc_affinity_vectors() to strictly bound the maximum number of allocated vectors to the weight of the housekeeping mask. This correctly overrides drivers providing a calc_sets() callback, preventing them from wasting memory on dead hardware queues that cannot be routed to isolated CPUs. 2. Multi-set Alignment and Leak Prevention: When isolation constraints result in fewer available masks than requested vectors for a given set, the remaining vector slots are padded with the housekeeping mask. This replaces the historical irq_default_affinity padding, ensuring excess managed queues do not leak interrupts onto isolated CPUs. 3. Minimum Vector Safety Net: To prevent fatal -ENOSPC device probe aborts on heavily isolated systems (where the housekeeping CPU count might be lower than a device's structural minimum), the final vector calculation is safeguarded to never drop below minvec. Queues will safely share the available housekeeping CPUs instead of failing the probe. 4. Zero Overhead: The housekeeping mask is conditionally assigned via a direct pointer, completely avoiding temporary mask allocations (e.g., alloc_cpumask_var) and bitwise operations when CPU isolation is disabled. This guarantees zero performance or memory overhead for standard configurations. Signed-off-by: Aaron Tomlin --- kernel/irq/affinity.c | 31 +++++++++++++++++++++++-------- 1 file changed, 23 insertions(+), 8 deletions(-) diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c index 78f2418a8925..dade92f8b4b3 100644 --- a/kernel/irq/affinity.c +++ b/kernel/irq/affinity.c @@ -8,6 +8,7 @@ #include #include #include +#include static void default_calc_sets(struct irq_affinity *affd, unsigned int affvecs) { @@ -25,8 +26,10 @@ static void default_calc_sets(struct irq_affinity *affd, unsigned int affvecs) struct irq_affinity_desc * irq_create_affinity_masks(unsigned int nvecs, struct irq_affinity *affd) { - unsigned int affvecs, curvec, usedvecs, i; + unsigned int affvecs, curvec, usedvecs, i, j; struct irq_affinity_desc *masks = NULL; + const struct cpumask *hk_mask = housekeeping_cpumask(HK_TYPE_IO_QUEUE); + bool hk_enabled = housekeeping_enabled(HK_TYPE_IO_QUEUE); /* * Determine the number of vectors which need interrupt affinities @@ -70,19 +73,29 @@ irq_create_affinity_masks(unsigned int nvecs, struct irq_affinity *affd) */ for (i = 0, usedvecs = 0; i < affd->nr_sets; i++) { unsigned int nr_masks, this_vecs = affd->set_size[i]; - struct cpumask *result = group_cpus_evenly(this_vecs, &nr_masks); + struct cpumask *result; + const struct cpumask *mask; + if (hk_enabled) + mask = hk_mask; + else + mask = cpu_possible_mask; + + result = group_mask_cpus_evenly(this_vecs, mask, + &nr_masks); if (!result) { kfree(masks); return NULL; } - - for (int j = 0; j < nr_masks; j++) + for (j = 0; j < nr_masks; j++) cpumask_copy(&masks[curvec + j].mask, &result[j]); + for (j = nr_masks; j < this_vecs; j++) + cpumask_copy(&masks[curvec + j].mask, mask); + kfree(result); - curvec += nr_masks; - usedvecs += nr_masks; + curvec += this_vecs; + usedvecs += this_vecs; } /* Fill out vectors at the end that don't need affinity */ @@ -115,10 +128,12 @@ unsigned int irq_calc_affinity_vectors(unsigned int minvec, unsigned int maxvec, if (resv > minvec) return 0; - if (affd->calc_sets) + if (housekeeping_enabled(HK_TYPE_IO_QUEUE)) + set_vecs = cpumask_weight(housekeeping_cpumask(HK_TYPE_IO_QUEUE)); + else if (affd->calc_sets) set_vecs = maxvec - resv; else set_vecs = cpumask_weight(cpu_possible_mask); - return resv + min(set_vecs, maxvec - resv); + return max(minvec, resv + min(set_vecs, maxvec - resv)); } -- 2.51.0