When elevator_change() is called concurrently for the same queue,
elevator_change_done() runs concurrently as well. This function adds or
deletes kobjects for the debugfs entry of the queue, so the concurrent
calls corrupt the kobjects' memory and result in a process hang. The
core part of the elevator switch is protected by queue freeze and
q->elevator_lock. However, since commit 559dc11143eb ("block: move
elv_register[unregister]_queue out of elevator_lock"),
elevator_change_done() is no longer serialized, which leads to the
memory corruption and the hang.

The failures are observed when udev-worker writes to a sysfs
queue/scheduler attribute file while the blktests test case block/005
writes to the same attribute file. The failure can also be recreated by
running two processes that write to the same queue/scheduler file
concurrently. The failure has been observed since another commit,
370ac285f23a ("block: avoid cpu_hotplug_lock depedency on freeze_lock"),
which changed the behavior of queue freeze and thereby exposed the
failure.

Fix the failure by changing elv_iosched_store() to acquire
update_nr_hwq_lock as a writer instead of as a reader. This serializes
all steps of the elevator switch, including the elevator_change_done()
call.

Fixes: 559dc11143eb ("block: move elv_register[unregister]_queue out of elevator_lock")
Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
---
I observed that the blktests test case block/005 hung on specific
server hardware using a specific HDD as a block device. During the test
case run, the kernel reported KASAN null-ptr-deref and
slab-use-after-free errors. The failure happened when a sysfs
queue/scheduler attribute file was written concurrently. I reported the
failure and shared a candidate fix patch as RFC [1]. Based on the
comments and discussion on the RFC patch, I propose this v2 patch that
avoids introducing a new lock. My thanks go to Ming and Nilay for the
discussion.

Please refer to [1] for details of the failure. Also, I created a
blktests test case that recreates the hang [2], which I used to test the
fix.
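
For readers who want to try the two-writer scenario without blktests,
the following is a minimal userspace sketch, not the test case in [2].
The helper names write_scheduler() and run_concurrent_writers() are
hypothetical, and the scheduler names "none" and "mq-deadline" are
assumed to be available on the target queue. Write errors (for example
-EBUSY from the trylock) are ignored on purpose.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Write one scheduler name to the attribute file. 0 on success, -1 on error. */
static int write_scheduler(const char *path, const char *name)
{
	size_t len = strlen(name);
	ssize_t n;
	int fd = open(path, O_WRONLY);

	if (fd < 0)
		return -1;
	n = write(fd, name, len);
	close(fd);
	return n == (ssize_t)len ? 0 : -1;
}

/*
 * Fork two writer processes that switch between two scheduler names
 * (assumed names, see above) on the same file. Returns the number of
 * child processes reaped.
 */
static int run_concurrent_writers(const char *path, int iterations)
{
	static const char *const names[] = { "none", "mq-deadline" };
	int reaped = 0;

	for (int w = 0; w < 2; w++) {
		pid_t pid = fork();

		if (pid < 0)
			break;
		if (pid == 0) {
			/* Child: offset by w so the two writers disagree. */
			for (int i = 0; i < iterations; i++)
				write_scheduler(path, names[(i + w) % 2]);
			_exit(0);
		}
	}
	while (wait(NULL) > 0)
		reaped++;
	return reaped;
}
```

Calling run_concurrent_writers() from a small main() with the path of a
queue/scheduler attribute file (as root) approximates the concurrent
writes described above. Whether the hang reproduces depends on the
hardware and timing.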

* Changes from RFC v1
- Instead of adding a new mutex to struct request_queue, replace the
  reader lock on update_nr_hwq_lock with the writer lock in
  elv_iosched_store().

[1] https://lore.kernel.org/linux-block/20260611074200.474676-1-shinichiro.kawasaki@wdc.com/
[2] https://github.com/kawasaki/blktests/commit/8e80b3ccc0bbbe3f209d00eacd138d020de97fc6

 block/elevator.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/block/elevator.c b/block/elevator.c
index 3bcd37c2aa34..b03185a217ff 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -813,7 +813,7 @@ ssize_t elv_iosched_store(struct gendisk *disk, const char *buf,
 	 *   update_nr_hwq_lock -> kn->active (via del_gendisk -> kobject_del)
 	 *   kn->active -> update_nr_hwq_lock (via this sysfs write path)
 	 */
-	if (!down_read_trylock(&set->update_nr_hwq_lock)) {
+	if (!down_write_trylock(&set->update_nr_hwq_lock)) {
 		ret = -EBUSY;
 		goto out;
 	}
@@ -824,7 +824,7 @@ ssize_t elv_iosched_store(struct gendisk *disk, const char *buf,
 	} else {
 		ret = -ENOENT;
 	}
-	up_read(&set->update_nr_hwq_lock);
+	up_write(&set->update_nr_hwq_lock);
 
 out:
 	if (ctx.type)
-- 
2.54.0