On PCIe-attached macb instances (BCM2712 + RP1 PCIe south bridge on Raspberry Pi 5 is the case I have in front of me), a TCOMP interrupt can be lost: the TSTART doorbell can be lost in the posted-write fabric (addressed by an earlier patch), or the descriptor TX_USED DMA write can be observed late by the driver (also addressed earlier). When that happens the TX ring stalls silently until something else kicks TSTART. Add a per-queue delayed_work that runs once per second. It detects forward progress on the TX completion path via a per-queue bool tx_stall_tail_moved that macb_tx_complete() sets when tx_tail advances and the watchdog clears on each tick. If the ring is non-empty (queue->tx_head != queue->tx_tail) and the flag is unset when the tick runs, the watchdog calls the existing macb_tx_restart() to re-assert TSTART. The bool form (rather than a tx_tail snapshot) sidesteps any concern about ring-index aliasing between ticks and is the form suggested by Phil Elwell when reviewing the same series anchored against rpi-6.18.y at raspberrypi/linux#7340 (merged 2026-05-08). No new recovery logic is introduced. macb_tx_restart() already exists in this file, is correctly locked (tx_ptr_lock, bp->lock), and verifies that the hardware's TBQP is behind the driver's head index before re-asserting TSTART. On a healthy ring it is a no-op at the hardware level. Cost on a healthy queue: one spin_lock_irqsave / spin_unlock and two field assignments per tick. The delayed_work is only scheduled between macb_open() and macb_close() and is cancelled synchronously on close. A netif_carrier_ok() gate at the top of the tick skips the stall check when there is no carrier (no completion is possible without a link), eliminating a boot-time false positive where queue->tx_head can advance from kernel-queued packets between macb_open() and link autoneg completion, while tx_tail stays unchanged because no TCOMPs have arrived yet. netdev_warn_ratelimited() is used rather than netdev_warn_once() so operators can count occurrences across the lifetime of the netdev. Link: https://github.com/cilium/cilium/issues/43198 Link: https://bugs.launchpad.net/ubuntu/+source/linux-raspi/+bug/2133877 Link: https://github.com/raspberrypi/linux/pull/7340 Signed-off-by: Lukasz Raczylo --- drivers/net/ethernet/cadence/macb.h | 10 ++++ drivers/net/ethernet/cadence/macb_main.c | 72 ++++++++++++++++++++++++ 2 files changed, 82 insertions(+) diff --git a/drivers/net/ethernet/cadence/macb.h b/drivers/net/ethernet/cadence/macb.h index ce9037f9e..75df0f75b 100644 --- a/drivers/net/ethernet/cadence/macb.h +++ b/drivers/net/ethernet/cadence/macb.h @@ -1282,6 +1282,16 @@ struct macb_queue { dma_addr_t tx_ring_dma; struct work_struct tx_error_task; bool txubr_pending; + + /* TX stall watchdog -- see macb_tx_stall_watchdog() in macb_main.c. + * tx_stall_tail_moved is set by macb_tx_complete() when tx_tail + * advances and cleared by the watchdog tick on each pass (both + * under tx_ptr_lock). Using a bool sidesteps any ring-index + * aliasing concern between ticks. + */ + struct delayed_work tx_stall_watchdog_work; + bool tx_stall_tail_moved; + struct napi_struct napi_tx; dma_addr_t rx_ring_dma; diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c index f7fa9e7ad..8245c69e1 100644 --- a/drivers/net/ethernet/cadence/macb_main.c +++ b/drivers/net/ethernet/cadence/macb_main.c @@ -1473,6 +1473,8 @@ static int macb_tx_complete(struct macb_queue *queue, int budget) packets, bytes); queue->tx_tail = tail; + if (packets) + queue->tx_stall_tail_moved = true; if (__netif_subqueue_stopped(bp->dev, queue_index) && CIRC_CNT(queue->tx_head, queue->tx_tail, bp->tx_ring_size) <= MACB_TX_WAKEUP_THRESH(bp)) @@ -2003,6 +2005,70 @@ static int macb_tx_poll(struct napi_struct *napi, int budget) return work_done; } +#define MACB_TX_STALL_INTERVAL_MS 1000 + +/* + * TX stall watchdog. + * + * Recovers from lost TCOMP interrupts on PCIe-attached macb + * instances. macb already has a recovery chain + * (txubr_pending -> macb_tx_restart()) that fires on TCOMP; if + * TCOMP itself is lost the TX ring stalls silently until something + * else kicks TSTART. This watchdog runs once per second per queue + * and calls macb_tx_restart() if the ring is non-empty and + * tx_tail has not advanced since the previous tick. + * + * Movement is tracked via the tx_stall_tail_moved boolean rather + * than a tx_tail snapshot, sidestepping any ring-index aliasing + * concern. The bool is set by macb_tx_complete() when tx_tail + * advances and cleared here on each tick; both writes are under + * tx_ptr_lock so no atomic is required. + * + * macb_tx_restart() already checks the hardware's TBQP against + * the driver's head index before re-asserting TSTART, so on a + * healthy ring this is a no-op at the hardware level. The + * watchdog only supplies the missing trigger. + */ +static void macb_tx_stall_watchdog(struct work_struct *work) +{ + struct macb_queue *queue = container_of(to_delayed_work(work), + struct macb_queue, + tx_stall_watchdog_work); + struct macb *bp = queue->bp; + unsigned int cur_tail, cur_head; + bool stalled = false; + unsigned long flags; + + if (!netif_running(bp->dev)) + return; + + /* No carrier => no completion is possible. Skip the check + * but keep the watchdog ticking for when carrier comes up. + */ + if (!netif_carrier_ok(bp->dev)) + goto reschedule; + + spin_lock_irqsave(&queue->tx_ptr_lock, flags); + cur_tail = queue->tx_tail; + cur_head = queue->tx_head; + if (cur_head != cur_tail && !queue->tx_stall_tail_moved) + stalled = true; + queue->tx_stall_tail_moved = false; + spin_unlock_irqrestore(&queue->tx_ptr_lock, flags); + + if (stalled) { + netdev_warn_ratelimited(bp->dev, + "TX stall detected on queue %u (tail=%u head=%u); re-kicking TSTART\n", + (unsigned int)(queue - bp->queues), + cur_tail, cur_head); + macb_tx_restart(queue); + } + +reschedule: + schedule_delayed_work(&queue->tx_stall_watchdog_work, + msecs_to_jiffies(MACB_TX_STALL_INTERVAL_MS)); +} + static void macb_hresp_error_task(struct work_struct *work) { struct macb *bp = from_work(bp, work, hresp_err_bh_work); @@ -3192,6 +3258,9 @@ static int macb_open(struct net_device *dev) for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) { napi_enable(&queue->napi_rx); napi_enable(&queue->napi_tx); + queue->tx_stall_tail_moved = true; + schedule_delayed_work(&queue->tx_stall_watchdog_work, + msecs_to_jiffies(MACB_TX_STALL_INTERVAL_MS)); } macb_init_hw(bp); @@ -3242,6 +3311,7 @@ static int macb_close(struct net_device *dev) for (q = 0, queue = bp->queues; q < bp->num_queues; ++q, ++queue) { napi_disable(&queue->napi_rx); napi_disable(&queue->napi_tx); + cancel_delayed_work_sync(&queue->tx_stall_watchdog_work); netdev_tx_reset_queue(netdev_get_tx_queue(dev, q)); } @@ -4804,6 +4874,8 @@ static int macb_init_dflt(struct platform_device *pdev) } INIT_WORK(&queue->tx_error_task, macb_tx_error_task); + INIT_DELAYED_WORK(&queue->tx_stall_watchdog_work, + macb_tx_stall_watchdog); q++; } -- 2.54.0