Add NULL pointer checks in ice_vsi_set_napi_queues() to prevent crashes
during resume from suspend when rings[q_idx]->q_vector is NULL.

Tested adapter:
60:00.0 Ethernet controller [0200]: Intel Corporation Ethernet Controller E810-XXV for SFP [8086:159b] (rev 02)
Subsystem: Intel Corporation Ethernet Network Adapter E810-XXV-2 [8086:4003]

SR-IOV state: the issue reproduces with SR-IOV both disabled and enabled.

Kernel version: v6.18

Steps to reproduce:
Boot the system, then suspend it, for example with systemctl suspend or
rtcwake.

Log:
<1>[ 231.443607] BUG: kernel NULL pointer dereference, address: 0000000000000040
<1>[ 231.444052] #PF: supervisor read access in kernel mode
<1>[ 231.444484] #PF: error_code(0x0000) - not-present page
<6>[ 231.444913] PGD 0 P4D 0
<4>[ 231.445342] Oops: Oops: 0000 [#1] SMP NOPTI
<4>[ 231.446635] RIP: 0010:netif_queue_set_napi+0xa/0x170
<4>[ 231.447067] Code: 31 f6 31 ff c3 cc cc cc cc 0f 1f 80 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 48 85 c9 74 0b <48> 83 79 30 00 0f 84 39 01 00 00 55 41 89 d1 49 89 f8 89 f2 48 89
<4>[ 231.447513] RSP: 0018:ffffcc780fc078c0 EFLAGS: 00010202
<4>[ 231.447961] RAX: ffff8b848ca30400 RBX: ffff8b848caf2028 RCX: 0000000000000010
<4>[ 231.448443] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8b848dbd4000
<4>[ 231.448896] RBP: ffffcc780fc078e8 R08: 0000000000000000 R09: 0000000000000000
<4>[ 231.449345] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001
<4>[ 231.449817] R13: ffff8b848dbd4000 R14: ffff8b84833390c8 R15: 0000000000000000
<4>[ 231.450265] FS: 00007c7b29e9d740(0000) GS:ffff8b8c068e2000(0000) knlGS:0000000000000000
<4>[ 231.450715] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[ 231.451179] CR2: 0000000000000040 CR3: 000000030626f004 CR4: 0000000000f72ef0
<4>[ 231.451629] PKRU: 55555554
<4>[ 231.452076] Call Trace:
<4>[ 231.452549]
<4>[ 231.452996] ? ice_vsi_set_napi_queues+0x4d/0x110 [ice]
<4>[ 231.453482] ice_resume+0xfd/0x220 [ice]
<4>[ 231.453977] ? __pfx_pci_pm_resume+0x10/0x10
<4>[ 231.454425] pci_pm_resume+0x8c/0x140
<4>[ 231.454872] ? __pfx_pci_pm_resume+0x10/0x10
<4>[ 231.455347] dpm_run_callback+0x5f/0x160
<4>[ 231.455796] ? dpm_wait_for_superior+0x107/0x170
<4>[ 231.456244] device_resume+0x177/0x270
<4>[ 231.456708] dpm_resume+0x209/0x2f0
<4>[ 231.457151] dpm_resume_end+0x15/0x30
<4>[ 231.457596] suspend_devices_and_enter+0x1da/0x2b0
<4>[ 231.458054] enter_state+0x10e/0x570

Add defensive checks for both the ring pointer and its q_vector before
dereferencing, allowing the system to resume successfully even when
q_vectors are unmapped.

Fixes: 2a5dc090b92cf ("ice: move netif_queue_set_napi to rtnl-protected sections")
Reviewed-by: Aleksandr Loktionov
Signed-off-by: Aaron Ma
---
V1 -> V2: add test device info.

 drivers/net/ethernet/intel/ice/ice_lib.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index 15621707fbf81..9d1178bde4495 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -2779,11 +2779,13 @@ void ice_vsi_set_napi_queues(struct ice_vsi *vsi)
 
 	ASSERT_RTNL();
 	ice_for_each_rxq(vsi, q_idx)
-		netif_queue_set_napi(netdev, q_idx, NETDEV_QUEUE_TYPE_RX,
+		if (vsi->rx_rings[q_idx] && vsi->rx_rings[q_idx]->q_vector)
+			netif_queue_set_napi(netdev, q_idx, NETDEV_QUEUE_TYPE_RX,
 				     &vsi->rx_rings[q_idx]->q_vector->napi);
 
 	ice_for_each_txq(vsi, q_idx)
-		netif_queue_set_napi(netdev, q_idx, NETDEV_QUEUE_TYPE_TX,
+		if (vsi->tx_rings[q_idx] && vsi->tx_rings[q_idx]->q_vector)
+			netif_queue_set_napi(netdev, q_idx, NETDEV_QUEUE_TYPE_TX,
 				     &vsi->tx_rings[q_idx]->q_vector->napi);
 	/* Also set the interrupt number for the NAPI */
 	ice_for_each_q_vector(vsi, v_idx) {
--
2.43.0

After wakeup from suspend, IRDMA initialization fails:

kernel: ice 0000:60:00.0: IRDMA hardware initialization FAILED init_state=4 status=-110
kernel: ice 0000:60:00.1: IRDMA hardware initialization FAILED init_state=4 status=-110
kernel: irdma.gen_2 ice.roce.1: probe with driver irdma.gen_2 failed with error -110
kernel: irdma.gen_2 ice.roce.2: probe with driver irdma.gen_2 failed with error -110

IRDMA times out because ice_resume() initializes it before the PF reset
that ice_resume() schedules has run. Move the ice_init_rdma() call from
ice_resume() to ice_rebuild(), so that RDMA is initialized after the
reset and rebuild. The ice_init_rdma() function already calls
ice_plug_aux_dev() internally, so it replaces the standalone
ice_plug_aux_dev() call in ice_rebuild() and ensures proper
initialization order.

Fixes: bc69ad74867db ("ice: avoid IRQ collision to fix init failure on ACPI S3 resume")
Reviewed-by: Aleksandr Loktionov
Signed-off-by: Aaron Ma
---
V1 -> V2: no changes.

 drivers/net/ethernet/intel/ice/ice_main.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index 2533876f1a2fd..c6dd04d24ac09 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -5677,11 +5677,6 @@ static int ice_resume(struct device *dev)
 	if (ret)
 		dev_err(dev, "Cannot restore interrupt scheme: %d\n", ret);
 
-	ret = ice_init_rdma(pf);
-	if (ret)
-		dev_err(dev, "Reinitialize RDMA during resume failed: %d\n",
-			ret);
-
 	clear_bit(ICE_DOWN, pf->state);
 	/* Now perform PF reset and rebuild */
 	reset_type = ICE_RESET_PFR;
@@ -7805,7 +7800,12 @@ static void ice_rebuild(struct ice_pf *pf, enum ice_reset_req reset_type)
 
 	ice_health_clear(pf);
 
-	ice_plug_aux_dev(pf);
+	/* Initialize RDMA after control queues are ready */
+	err = ice_init_rdma(pf);
+	if (err)
+		dev_err(dev, "Reinitialize RDMA after rebuild failed: %d\n",
+			err);
+
 	if (ice_is_feature_supported(pf, ICE_F_SRIOV_LAG))
 		ice_lag_rebuild(pf);
 
--
2.43.0
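
For reference, the guard added to ice_vsi_set_napi_queues() in the first patch can be illustrated with a minimal userspace sketch. Everything in it is a hypothetical stand-in: `struct napi`, `struct q_vector`, `struct ring`, `struct vsi`, `set_queue_napi()` and `link_queues()` are simplified illustrations, not the ice driver's types or the kernel's netif_queue_set_napi(). It only shows the pattern of skipping a queue whose ring, or whose ring's q_vector, is NULL before taking the address of the embedded NAPI.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the driver structures. */
struct napi { int id; };
struct q_vector { struct napi napi; };
struct ring { struct q_vector *q_vector; };

struct vsi {
	struct ring **rings;	/* per-queue ring pointers, entries may be NULL */
	size_t num_q;		/* number of entries in rings[] */
};

/* Hypothetical stand-in for netif_queue_set_napi(): record the mapping. */
static void set_queue_napi(struct napi **table, size_t q_idx,
			   struct napi *napi)
{
	table[q_idx] = napi;
}

/*
 * Link every queue that has both a ring and a q_vector to its NAPI.
 * Queues with a NULL ring or a NULL q_vector are skipped, so
 * &ring->q_vector->napi is never formed from a NULL q_vector.
 * Returns the number of queues that were linked.
 */
size_t link_queues(const struct vsi *vsi, struct napi **table)
{
	size_t linked = 0;

	for (size_t q = 0; q < vsi->num_q; q++) {
		struct ring *r = vsi->rings[q];

		if (!r || !r->q_vector)
			continue;

		set_queue_napi(table, q, &r->q_vector->napi);
		linked++;
	}

	return linked;
}
```

With one mapped ring, one NULL ring and one ring whose q_vector is NULL, only the first queue is linked and the other two table entries stay untouched.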