When running a PREEMPT_RT debug kernel on a 2-socket Grace arm64 system,
the following bug report was produced at bootup time.

  BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
  in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/72
  preempt_count: 1, expected: 0
  RCU nest depth: 1, expected: 1
   :
  CPU: 72 UID: 0 PID: 0 Comm: swapper/72 Tainted: G        W   6.19.0-rc4-test+ #4 PREEMPT_{RT,(full)}
  Tainted: [W]=WARN
  Call trace:
   :
   rt_spin_lock+0xe4/0x408
   rmqueue_bulk+0x48/0x1de8
   __rmqueue_pcplist+0x410/0x650
   rmqueue.constprop.0+0x6a8/0x2b50
   get_page_from_freelist+0x3c0/0xe68
   __alloc_frozen_pages_noprof+0x1dc/0x348
   alloc_pages_mpol+0xe4/0x2f8
   alloc_frozen_pages_noprof+0x124/0x190
   allocate_slab+0x2f0/0x438
   new_slab+0x4c/0x80
   ___slab_alloc+0x410/0x798
   __slab_alloc.constprop.0+0x88/0x1e0
   __kmalloc_cache_noprof+0x2dc/0x4b0
   allocate_vpe_l1_table+0x114/0x788
   its_cpu_init_lpis+0x344/0x790
   its_cpu_init+0x60/0x220
   gic_starting_cpu+0x64/0xe8
   cpuhp_invoke_callback+0x438/0x6d8
   __cpuhp_invoke_callback_range+0xd8/0x1f8
   notify_cpu_starting+0x11c/0x178
   secondary_start_kernel+0xc8/0x188
   __secondary_switched+0xc0/0xc8

This happens because allocate_vpe_l1_table() calls kzalloc() to allocate
a cpumask_t when the first CPU of the second node of the 72-CPU Grace
system is brought up. The call is made from the CPUHP_AP_IRQ_GIC_STARTING
state, inside the starting section of the CPU hotplug bringup pipeline,
where interrupts are disabled. This is an atomic context where sleeping
is not allowed, and acquiring a sleeping rt_spin_lock within kzalloc()
may lead to a system hang if there is lock contention.

A possible workaround is to use the new GFP_ATOMIC_RT gfp flag, with
which only spin_trylock() is used to attempt to acquire spinlocks in the
memory allocation path, so the allocation cannot sleep. As this memory
allocation is only needed for the first core of a new socket in early
boot, the chance of a memory allocation request collision is low.
In case it happens, direct injection of virtual interrupts from the
physical Interrupt Translation Service (ITS) into a guest Virtual
Machine (VM) will be disabled, and a warning is now printed to report it.

A longer term solution is to defer the allocation to a later stage of
the hotplug pipeline where interrupts aren't disabled.

With that change applied, booting up a debug kernel on the same 2-socket
Grace system no longer produces the bug report, and no direct injection
disable warning is printed.

Signed-off-by: Waiman Long
---
 drivers/irqchip/irq-gic-v3-its.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index 291d7668cc8d..d78057fb40df 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -2927,7 +2927,7 @@ static int allocate_vpe_l1_table(void)
 	if (val & GICR_VPROPBASER_4_1_VALID)
 		goto out;
 
-	gic_data_rdist()->vpe_table_mask = kzalloc_obj(cpumask_t, GFP_ATOMIC);
+	gic_data_rdist()->vpe_table_mask = kzalloc_obj(cpumask_t, GFP_ATOMIC_RT);
 	if (!gic_data_rdist()->vpe_table_mask)
 		return -ENOMEM;
 
@@ -3271,6 +3271,8 @@ static void its_cpu_init_lpis(void)
 		 */
 		gic_rdists->has_rvpeid = false;
 		gic_rdists->has_vlpis = false;
+		pr_warn("GICv3: CPU%d: direct injection of virtual interrupt disabled\n",
+			smp_processor_id());
 	}
 
 	/* Make sure the GIC has seen the above */
-- 
2.54.0