Hi,

We found a race in igc_ptp_adjfine_i225() that causes "Tx timestamp
timeout" errors and eventually wedges EXTTS when a PTP grandmaster
(ptp4l with hardware timestamping) runs concurrently with PHC frequency
discipline (any GPSDO calling clock_adjtime ADJ_FREQUENCY).

Root cause: igc_ptp_adjfine_i225() writes IGC_TIMINCA without holding
any lock. Every other PTP clock operation in igc_ptp.c (adjtime,
gettime, settime) holds tmreg_lock, but adjfine does not.

When the increment rate changes while the hardware is capturing a TX
timestamp, the captured value is corrupt. The driver retries for
IGC_PTP_TX_TIMEOUT (15s), then logs the timeout and frees the skb.
Repeated occurrences eventually prevent EXTTS from delivering events.

The attached reproducer (triggers in ~17 seconds on i226): One thread
calling clock_adjtime(ADJ_FREQUENCY) at ~200k/s on the PHC, another
sending UDP packets with SO_TIMESTAMPING requesting hardware TX
timestamps at ~100k/s. A Python reproducer is at:

  https://github.com/bobvan/PePPAR-Fix/blob/main/tools/igc_tx_timeout_repro.py

At realistic rates (1 Hz adjfine from a GPSDO + ptp4l at 128 Hz sync),
the race triggers in ~30 minutes.

The attached patch holds ptp_tx_lock around the TIMINCA write and skips
the write if any TX timestamps are pending (tx_tstamp[i].skb != NULL),
returning -EBUSY. This doesn't fully close the hardware race (a new TX
capture can start between the check and the write), but at realistic
rates the residual probability gives ~25 year MTBF vs ~30 minutes
without the patch.

A complete fix would likely require either disabling TX timestamping
around TIMINCA writes (via TSYNCTXCTL), or making the timeout recovery
path more robust so a single corrupt timestamp doesn't wedge the
subsystem. We'd welcome guidance from the igc maintainers on the
preferred approach.
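
For anyone who wants to drive the adjfine side from their own script
rather than the linked reproducer, here is a minimal sketch of the two
conversions that clock_adjtime(ADJ_FREQUENCY) on a PHC needs. The helper
names are hypothetical and are not taken from the linked script, which
may be structured differently. The sketch only computes the arguments;
as far as I know the Python standard library has no clock_adjtime()
wrapper, so the call itself would still go through ctypes or similar.

```python
# Hypothetical helpers (illustrative names, not from the linked reproducer).

CLOCKFD = 3  # low bits used by the kernel's FD_TO_CLOCKID() macro


def fd_to_clockid(fd: int) -> int:
    """Map an open /dev/ptpN file descriptor to a dynamic POSIX clock id.

    Mirrors the C macro: ((~(clockid_t)(fd)) << 3) | CLOCKFD
    """
    return ((~fd) << 3) | CLOCKFD


def ppb_to_timex_freq(ppb: float) -> int:
    """Convert parts-per-billion to timex.freq units for ADJ_FREQUENCY.

    timex.freq is expressed in ppm with a 16-bit binary fraction, the
    same "scaled_ppm" value that the driver's adjfine callback receives.
    """
    return int(round(ppb * 65536 / 1000))


if __name__ == "__main__":
    # Example: a +1000 ppb (1 ppm) adjustment on file descriptor 3.
    print(fd_to_clockid(3), ppb_to_timex_freq(1000))
```

A 1 ppm step corresponds to a freq value of 65536, so the scaled_ppm
passed to igc_ptp_adjfine_i225() changes on every call even for very
small corrections from a GPSDO.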

Tested on:
- Intel i226 (TimeHAT v5 board on Raspberry Pi 5)
- Kernel 6.12.62+rpt-rpi-2712 (Raspberry Pi OS)
- Intel out-of-tree igc driver 5.4.0-7642.46
- Stock upstream igc_ptp.c (same code, same bug)

Bob

---
 drivers/net/ethernet/intel/igc/igc_ptp.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/igc/igc_ptp.c b/drivers/net/ethernet/intel/igc/igc_ptp.c
index XXXXXXX..XXXXXXX 100644
--- a/drivers/net/ethernet/intel/igc/igc_ptp.c
+++ b/drivers/net/ethernet/intel/igc/igc_ptp.c
@@ -47,8 +47,10 @@ static int igc_ptp_adjfine_i225(struct ptp_clock_info *ptp, long scaled_ppm)
 {
 	struct igc_adapter *igc = container_of(ptp, struct igc_adapter, ptp_caps);
 	struct igc_hw *hw = &igc->hw;
+	unsigned long flags;
 	int neg_adj = 0;
 	u64 rate;
 	u32 inca;
+	int i;

 	if (scaled_ppm < 0) {
 		neg_adj = 1;
@@ -63,7 +65,21 @@ static int igc_ptp_adjfine_i225(struct ptp_clock_info *ptp, long scaled_ppm)
 	if (neg_adj)
 		inca |= ISGN;

-	wr32(IGC_TIMINCA, inca);
+	/* Changing the clock increment rate while a TX timestamp is being
+	 * captured by the hardware can corrupt the timestamp, causing the
+	 * driver to report "Tx timestamp timeout" and eventually wedging
+	 * the EXTTS subsystem. Serialize with pending TX timestamps:
+	 * skip the rate change if any are in flight.
+	 */
+	spin_lock_irqsave(&igc->ptp_tx_lock, flags);
+	for (i = 0; i < IGC_MAX_TX_TSTAMP_REGS; i++) {
+		if (igc->tx_tstamp[i].skb) {
+			spin_unlock_irqrestore(&igc->ptp_tx_lock, flags);
+			return -EBUSY;
+		}
+	}
+	wr32(IGC_TIMINCA, inca);
+	spin_unlock_irqrestore(&igc->ptp_tx_lock, flags);

 	return 0;
 }
--
2.39.2