Jan Kara reports that commit bc264fea0f6f subtly changed error handling behavior in iomap_iter() in the case where both iter.status and ->iomap_end() return error codes. Previously, iter.status had priority and would return to the caller regardless of the ->iomap_end() result. After the change, an ->iomap_end() error returns immediately. This had the unexpected side effect of enabling a DIO fallback to buffered write on ext4 because ->iomap_end() could return -ENOTBLK and overload an -EINVAL error from the core iomap direct I/O code. This has been fixed independently in ext4, but nonetheless the change in iomap was unintentional. Since other filesystems may use this in similar ways, restore long standing behavior and always return the value of iter.status if it happens to contain an error code. Fixes: bc264fea0f6f ("iomap: support incremental iomap_iter advances") Diagnosed-by: Jan Kara Signed-off-by: Brian Foster Reviewed-by: Christoph Hellwig Reviewed-by: Jan Kara Reviewed-by: Ritesh Harjani (IBM) --- fs/iomap/iter.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/iomap/iter.c b/fs/iomap/iter.c index cef77ca0c20b..7cc4599b9c9b 100644 --- a/fs/iomap/iter.c +++ b/fs/iomap/iter.c @@ -80,7 +80,7 @@ int iomap_iter(struct iomap_iter *iter, const struct iomap_ops *ops) iomap_length_trim(iter, iter->iter_start_pos, olen), advanced, iter->flags, &iter->iomap); - if (ret < 0 && !advanced) + if (ret < 0 && !advanced && !iter->status) return ret; } -- 2.51.0 An iomap op iteration should not be considered successful if ->iomap_end() fails. Most ->iomap_end() callbacks do not return errors, and for those that do we return the error to the caller, but this is still not sufficient in some corner cases. For example, if a DAX write to a shared iomap fails at ->iomap_end() on XFS, this means the remap of shared blocks from the COW fork to the data fork has possibly failed. In turn this means that just written data may not be accessible in the file. dax_iomap_rw() returns partial success over a returned error code and the operation has already advanced iter.pos by the time ->iomap_end() is called. This means that dax_iomap_rw() can return more bytes processed than have been completed successfully, including partial success instead of an error code if the first iteration happens to fail. To address this problem, first tweak the ->iomap_end() error handling logic to run regardless of whether the current iteration advanced the iter. Next, revert pos in the error handling path. Add a new helper to undo the changes from iomap_iter_advance(). It is static to start since the only initial user is in iomap_iter.c. Signed-off-by: Brian Foster Reviewed-by: Jan Kara --- fs/iomap/iter.c | 20 +++++++++++++++++++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/fs/iomap/iter.c b/fs/iomap/iter.c index 7cc4599b9c9b..69c993fe51fa 100644 --- a/fs/iomap/iter.c +++ b/fs/iomap/iter.c @@ -27,6 +27,22 @@ int iomap_iter_advance(struct iomap_iter *iter, u64 *count) return 0; } +/** + * iomap_iter_revert - revert the iterator position + * @iter: iteration structure + * @count: number of bytes to revert + * + * Revert the iterator position by the specified number of bytes, undoing + * the effect of a previous iomap_iter_advance() call. The count must not + * exceed the amount previously advanced in the current iter. + */ +static void iomap_iter_revert(struct iomap_iter *iter, u64 count) +{ + count = min_t(u64, iter->pos - iter->iter_start_pos, count); + iter->pos -= count; + iter->len += count; +} + static inline void iomap_iter_done(struct iomap_iter *iter) { WARN_ON_ONCE(iter->iomap.offset > iter->pos); @@ -80,8 +96,10 @@ int iomap_iter(struct iomap_iter *iter, const struct iomap_ops *ops) iomap_length_trim(iter, iter->iter_start_pos, olen), advanced, iter->flags, &iter->iomap); - if (ret < 0 && !advanced && !iter->status) + if (ret < 0 && !iter->status) { + iomap_iter_revert(iter, advanced); return ret; + } } /* detect old return semantics where this would advance */ -- 2.51.0