syzbot is reporting a net_device refcount leak in RDMA code. A debug
printk() patch showed that ib_enum_roce_netdev() is called when
allocating a GID entry but is not called when releasing it. This
suggests that something is preventing the call chain
netdevice_event_work_handler() => ib_enum_all_roce_netdevs() =>
ib_enum_roce_netdev() from running when the GID entry is released.

Commit 03db3a2d81e6 ("IB/core: Add RoCE GID table management")
introduced ib_enum_all_roce_netdevs(), but calling this function
asynchronously from WQ context is racy. Using simple atomic_t counters,
I can observe that netdevice_event() works are sometimes still pending
immediately before disable_device(), called from
__ib_unregister_device(), clears the DEVICE_REGISTERED flag. If the
pending works contain the ib_enum_roce_netdev() call that releases a
GID entry, this race can result in a net_device refcount leak.
Therefore, flush pending works immediately before clearing the
DEVICE_REGISTERED flag.

Also, commit 8fe8bacb92f2 ("IB/core: Add ordered workqueue for RoCE GID
management") was intended to ensure that netdev events are processed in
the order netdevice_event() is called, so failing to invoke the
corresponding event handler due to memory allocation failure is as bad
as processing netdev events in parallel. Therefore, add __GFP_NOFAIL
when allocating memory for a netdev event work.

Reported-by: syzbot+881d65229ca4f9ae8c84@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=881d65229ca4f9ae8c84
Fixes: 03db3a2d81e6 ("IB/core: Add RoCE GID table management")
Signed-off-by: Tetsuo Handa
---
I haven't confirmed that netdevice_event_work_handler() is called for
releasing a GID entry, but I'd like to try this patch in the linux-next
tree via my tree for testing.

 drivers/infiniband/core/core_priv.h     |  1 +
 drivers/infiniband/core/device.c        |  1 +
 drivers/infiniband/core/roce_gid_mgmt.c | 10 ++++++----
 3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/infiniband/core/core_priv.h b/drivers/infiniband/core/core_priv.h
index 05102769a918..8355020bb98a 100644
--- a/drivers/infiniband/core/core_priv.h
+++ b/drivers/infiniband/core/core_priv.h
@@ -142,6 +142,7 @@ int ib_cache_gid_del_all_netdev_gids(struct ib_device *ib_dev, u32 port,
 
 int roce_gid_mgmt_init(void);
 void roce_gid_mgmt_cleanup(void);
+void roce_flush_gid_cache_wq(void);
 
 unsigned long roce_gid_type_mask_support(struct ib_device *ib_dev, u32 port);
 
diff --git a/drivers/infiniband/core/device.c b/drivers/infiniband/core/device.c
index 13e8a1714bbd..8638583a64f2 100644
--- a/drivers/infiniband/core/device.c
+++ b/drivers/infiniband/core/device.c
@@ -1300,6 +1300,7 @@ static void disable_device(struct ib_device *device)
 
 	WARN_ON(!refcount_read(&device->refcount));
 
+	roce_flush_gid_cache_wq();
 	down_write(&devices_rwsem);
 	xa_clear_mark(&devices, device->index, DEVICE_REGISTERED);
 	up_write(&devices_rwsem);
diff --git a/drivers/infiniband/core/roce_gid_mgmt.c b/drivers/infiniband/core/roce_gid_mgmt.c
index a9f2c6b1b29e..79982d448cd2 100644
--- a/drivers/infiniband/core/roce_gid_mgmt.c
+++ b/drivers/infiniband/core/roce_gid_mgmt.c
@@ -661,10 +661,7 @@ static int netdevice_queue_work(struct netdev_event_work_cmd *cmds,
 {
 	unsigned int i;
 	struct netdev_event_work *ndev_work =
-		kmalloc(sizeof(*ndev_work), GFP_KERNEL);
-
-	if (!ndev_work)
-		return NOTIFY_DONE;
+		kmalloc(sizeof(*ndev_work), GFP_KERNEL | __GFP_NOFAIL);
 
 	memcpy(ndev_work->cmds, cmds, sizeof(ndev_work->cmds));
 	for (i = 0; i < ARRAY_SIZE(ndev_work->cmds) && ndev_work->cmds[i].cb; i++) {
@@ -948,3 +945,8 @@ void __exit roce_gid_mgmt_cleanup(void)
 	 */
 	destroy_workqueue(gid_cache_wq);
 }
+
+void roce_flush_gid_cache_wq(void)
+{
+	flush_workqueue(gid_cache_wq);
+}
-- 
2.47.3