macb_tx_poll() runs with TCOMP masked, drains the TX ring, then calls
napi_complete_done() and re-enables TCOMP via IER. An existing comment
in the function notes that completions raised while TCOMP is masked do
not re-fire on IER re-enable, and mitigates this by calling
macb_tx_complete_pending(), which inspects driver-visible ring state
(descriptor->ctrl, after rmb()) and reschedules NAPI if a completion is
observable in memory.

On PCIe-attached parts (BCM2712 + RP1 PCIe south bridge on Raspberry Pi
5 is the case I have in front of me), the descriptor DMA write that
sets TX_USED may not have retired to system memory at the point
macb_tx_complete_pending() runs. The rmb() only orders the CPU's own
loads against each other; it is not sufficient to retire an in-flight
peripheral DMA write. Under that ordering the in-memory descriptor can
still read TX_USED=0 when the hardware has in fact completed the frame;
the check returns false; NAPI exits; the quirk above prevents the
re-enabled IER from re-firing; the ring goes quiescent.

Add a side-effect-free MMIO read between the IER write and the
macb_tx_complete_pending() check. The read functions as an architected
PCIe read barrier for earlier peripheral-originated DMA writes on the
same path: the read completion cannot pass those posted writes, so any
in-flight TX_USED update retires to system memory before the
descriptor read.

The register chosen is IMR (the read-only interrupt mask mirror).
Because it is not the ISR, reading it has no side effects on either
read-clear or W1C ISR silicon, and the read still flushes prior DMA
writes via the PCIe completion-ordering guarantee.

Link: https://github.com/cilium/cilium/issues/43198
Link: https://bugs.launchpad.net/ubuntu/+source/linux-raspi/+bug/2133877
Signed-off-by: Lukasz Raczylo
---
 drivers/net/ethernet/cadence/macb_main.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/cadence/macb_main.c b/drivers/net/ethernet/cadence/macb_main.c
index 6879f3458..f7fa9e7ad 100644
--- a/drivers/net/ethernet/cadence/macb_main.c
+++ b/drivers/net/ethernet/cadence/macb_main.c
@@ -1984,6 +1984,14 @@ static int macb_tx_poll(struct napi_struct *napi, int budget)
 		 * actions if an interrupt is raised just after enabling them,
 		 * but this should be harmless.
 		 */
+		/*
+		 * PCIe read barrier: flush any in-flight peripheral DMA
+		 * writes (descriptor TX_USED updates) so the subsequent
+		 * macb_tx_complete_pending() check observes them. IMR is
+		 * the read-only interrupt mask mirror; the read has no
+		 * side effects on either read-clear or W1C ISR silicon.
+		 */
+		(void)queue_readl(queue, IMR);
 		if (macb_tx_complete_pending(queue)) {
 			queue_writel(queue, IDR, MACB_BIT(TCOMP));
 			macb_queue_isr_clear(bp, queue, MACB_BIT(TCOMP));
-- 
2.54.0