Fix an AA deadlock in refill_skbs() where memory allocation while holding
skb_pool->lock can trigger a recursive lock acquisition attempt.

The deadlock scenario occurs when the system is under severe memory
pressure:

1. refill_skbs() acquires skb_pool->lock (spinlock)
2. alloc_skb() is called while holding the lock
3. Memory allocator fails and calls slab_out_of_memory()
4. This triggers printk() for the OOM warning
5. The console output path calls netpoll_send_udp()
6. netpoll_send_udp() attempts to acquire the same skb_pool->lock
7. Deadlock: the lock is already held by the same CPU

Call stack:
  refill_skbs()
    spin_lock_irqsave(&skb_pool->lock)    <- lock acquired
    __alloc_skb()
      kmem_cache_alloc_node_noprof()
        slab_out_of_memory()
          printk()
            console_flush_all()
              netpoll_send_udp()
                skb_dequeue()
                  spin_lock_irqsave(&skb_pool->lock)     <- deadlock attempt

This bug was exposed by commit 248f6571fd4c51 ("netpoll: Optimize skb
refilling on critical path"), which removed refill_skbs() from the
critical path (where nested printk was being deferred), allowing nested
printk to be called from inside refill_skbs().

Refactor refill_skbs() to never allocate memory while holding
the spinlock.

Another possible solution to this problem is to protect refill_skbs()
from nested printks by calling printk_deferred_{enter,exit}() in
refill_skbs(), so that any nested pr_warn() would be deferred.

I prefer the approach taken in this patch, given I _think_ it might be a
good idea to move the alloc_skb() from GFP_ATOMIC to GFP_KERNEL in the
future, so having the alloc_skb() outside of the lock will be a necessary
step.

Signed-off-by: Breno Leitao
Fixes: 248f6571fd4c51 ("netpoll: Optimize skb refilling on critical path")
---
Changes in v3:
- Removed the "return" before the exit labels. (Simon)
- Link to v2: https://lore.kernel.org/r/20251014-fix_netpoll_aa-v2-1-dafa6a378649@debian.org

Changes in v2:
- Added a return after the successful path (Rik van Riel)
- Changed the Fixes tag to point to the commit that exposed the problem.
- Link to v1: https://lore.kernel.org/r/20251013-fix_netpoll_aa-v1-1-94a1091f92f0@debian.org
---
 net/core/netpoll.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/net/core/netpoll.c b/net/core/netpoll.c
index 60a05d3b7c249..b8729ec1daeb8 100644
--- a/net/core/netpoll.c
+++ b/net/core/netpoll.c
@@ -232,14 +232,27 @@ static void refill_skbs(struct netpoll *np)
 
 	skb_pool = &np->skb_pool;
 
-	spin_lock_irqsave(&skb_pool->lock, flags);
-	while (skb_pool->qlen < MAX_SKBS) {
+	while (1) {
+		spin_lock_irqsave(&skb_pool->lock, flags);
+		if (skb_pool->qlen >= MAX_SKBS)
+			goto unlock;
+		spin_unlock_irqrestore(&skb_pool->lock, flags);
+
 		skb = alloc_skb(MAX_SKB_SIZE, GFP_ATOMIC);
 		if (!skb)
-			break;
+			return;
 
+		spin_lock_irqsave(&skb_pool->lock, flags);
+		if (skb_pool->qlen >= MAX_SKBS)
+			/* Discard if len got increased (TOCTOU) */
+			goto discard;
 		__skb_queue_tail(skb_pool, skb);
+		spin_unlock_irqrestore(&skb_pool->lock, flags);
 	}
+
+discard:
+	dev_kfree_skb_any(skb);
+unlock:
 	spin_unlock_irqrestore(&skb_pool->lock, flags);
 }
 
---
base-commit: c5705a2a4aa35350e504b72a94b5c71c3754833c
change-id: 20251013-fix_netpoll_aa-c991ac5f2138

Best regards,
-- 
Breno Leitao
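
For readers who want to experiment with the locking pattern outside the
kernel, here is a minimal userspace sketch of the same idea: check the
queue length under the lock, drop the lock, allocate, then retake the
lock and recheck before queueing. It is not the kernel code. The names
struct buf, struct buf_pool, pool_refill(), POOL_MAX and BUF_SIZE are
hypothetical, a pthread mutex stands in for the spinlock, and malloc()
stands in for alloc_skb(). The control flow is also simplified compared
with the patch (early returns instead of goto labels).

```c
#include <pthread.h>
#include <stdlib.h>

#define POOL_MAX 32	/* hypothetical stand-in for MAX_SKBS */
#define BUF_SIZE 1500	/* hypothetical stand-in for MAX_SKB_SIZE */

struct buf {
	struct buf *next;
	char data[BUF_SIZE];
};

struct buf_pool {
	pthread_mutex_t lock;
	struct buf *head;
	unsigned int qlen;
};

/* Top up the pool; the allocator is never called with the lock held. */
void pool_refill(struct buf_pool *pool)
{
	struct buf *b;
	int full;

	for (;;) {
		/* First check: read the length under the lock, then drop it. */
		pthread_mutex_lock(&pool->lock);
		full = pool->qlen >= POOL_MAX;
		pthread_mutex_unlock(&pool->lock);
		if (full)
			return;

		/*
		 * Allocate with no lock held, so anything the allocator
		 * does on failure cannot re-enter the pool lock.
		 */
		b = malloc(sizeof(*b));
		if (!b)
			return;

		/*
		 * Second check: another thread may have filled the pool
		 * while the lock was dropped (TOCTOU), so recheck before
		 * queueing and discard the buffer if it is not needed.
		 */
		pthread_mutex_lock(&pool->lock);
		if (pool->qlen >= POOL_MAX) {
			pthread_mutex_unlock(&pool->lock);
			free(b);
			return;
		}
		b->next = pool->head;
		pool->head = b;
		pool->qlen++;
		pthread_mutex_unlock(&pool->lock);
	}
}
```

Calling pool_refill() on an empty pool initialised with
PTHREAD_MUTEX_INITIALIZER should leave qlen at POOL_MAX, and a second
call should leave it unchanged. The cost of this shape is one extra
lock/unlock pair per buffer, in exchange for never holding the lock
across an allocation.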