From: Zizhi Wo When blkg_create() fails before the blkg is linked onto blkcg->blkg_list and q->blkg_list (e.g. radix_tree_insert() fails or the blkg_lookup() returns NULL), the blkg is freed asynchronously via blkg_free_workfn(). Since such a blkg was never linked, it is invisible to blkcg_deactivate_policy(), so its blkg->pd[] entries can not be cleared in it. blkg_free_workfn() then calls blkcg_policy->pd_free_fn() on them, which can race with bfq module exit (bfq_exit() -> blkcg_policy_unregister()) clearing the blkcg_policy[] slot, leading to a NULL pointer dereference: [ 72.597786] KASAN: null-ptr-deref in range [0x0000000000000048-0x000000000000004f] [ 72.598690] CPU: 35 UID: 0 PID: 458 Comm: kworker/35:1 Not tainted 7.1.0+ #33 PREEMPT(full) [ 72.599518] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.17.0-4.fc41 04/01/2014 [ 72.600342] Workqueue: events blkg_free_workfn [ 72.600991] RIP: 0010:blkg_free_workfn+0x115/0x3d0 ...... [ 72.613278] Call Trace: [ 72.613988] [ 72.614357] process_one_work+0x6b4/0xff0 [ 72.615251] ? __pfx_blkg_free_workfn+0x10/0x10 [ 72.616041] ? assign_work+0x131/0x3f0 [ 72.616962] worker_thread+0x4eb/0xd50 [ 72.617599] ? __kthread_parkme+0x8d/0x170 [ 72.618565] ? __pfx_worker_thread+0x10/0x10 [ 72.619566] ? __pfx_worker_thread+0x10/0x10 [ 72.620213] kthread+0x327/0x410 ...... Fix this by introducing blkg_free_pd() to synchronously free the pd and clear blkg->pd[] in the blkg_create() error path, while the blkcg_policy is still valid. Signed-off-by: Zizhi Wo --- block/blk-cgroup.c | 30 ++++++++++++++++++++++++------ 1 file changed, 24 insertions(+), 6 deletions(-) diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c index 6386fe413994..673886d71c26 100644 --- a/block/blk-cgroup.c +++ b/block/blk-cgroup.c @@ -109,28 +109,37 @@ static struct cgroup_subsys_state *blkcg_css(void) if (css) return css; return task_css(current, io_cgrp_id); } +static void blkg_free_pd(struct blkcg_gq *blkg) +{ + int i; + + for (i = 0; i < BLKCG_MAX_POLS; i++) { + if (blkg->pd[i]) { + blkcg_policy[i]->pd_free_fn(blkg->pd[i]); + blkg->pd[i] = NULL; + } + } +} + static void blkg_free_workfn(struct work_struct *work) { struct blkcg_gq *blkg = container_of(work, struct blkcg_gq, free_work); struct request_queue *q = blkg->q; - int i; /* * pd_free_fn() can also be called from blkcg_deactivate_policy(), * in order to make sure pd_free_fn() is called in order, the deletion * of the list blkg->q_node is delayed to here from blkg_destroy(), and * blkcg_mutex is used to synchronize blkg_free_workfn() and * blkcg_deactivate_policy(). */ mutex_lock(&q->blkcg_mutex); - for (i = 0; i < BLKCG_MAX_POLS; i++) - if (blkg->pd[i]) - blkcg_policy[i]->pd_free_fn(blkg->pd[i]); + blkg_free_pd(blkg); if (blkg->parent) blkg_put(blkg->parent); spin_lock_irq(&q->queue_lock); list_del_init(&blkg->q_node); spin_unlock_irq(&q->queue_lock); @@ -436,19 +445,28 @@ static struct blkcg_gq *blkg_create(struct blkcg *blkcg, struct gendisk *disk, spin_unlock(&blkcg->lock); if (!ret) return blkg; - /* @blkg failed fully initialized, use the usual release path */ + /* + * @blkg failed fully initialized and never linked, so its pd[] is + * invisible to blkcg_deactivate_policy(). Free pd[] synchronously + * while blkcg_policy[] is still valid, otherwise the async free path + * may call pd_free_fn() after the policy is unregistered (e.g. rmmod bfq). + * The err_free_blkg path below frees pd[] for the same reason. + */ + blkg_free_pd(blkg); percpu_ref_kill(&blkg->refcnt); return ERR_PTR(ret); err_put_css: css_put(&blkcg->css); err_free_blkg: - if (new_blkg) + if (new_blkg) { + blkg_free_pd(new_blkg); blkg_free(new_blkg); + } return ERR_PTR(ret); } /** * blkg_lookup_create - lookup blkg, try to create one if not there -- 2.52.0