From: Dominique Martinet

9p suffered from corruption in mmap-heavy workloads such as compiling
with clang (e.g. a kernel build with `make LLVM=1`, failing with random
errors on mmapped headers and clang segfaults).

David Howells identified a problem in the netfs trace of such a crash,
below:

> We unlocked page ix=00005 before doing the ZERO subreq that clears
> the page tail. That shouldn't have happened since the collection
> point hasn't reached the end of the folio yet.

 netfs_collect_folio: R=00001b55 ix=00003 r=3000-4000 t=3000/5fb2
 netfs_folio: i=157f3 ix=00003-00003 read-done
 netfs_folio: i=157f3 ix=00003-00003 read-unlock
 netfs_collect_folio: R=00001b55 ix=00004 r=4000-5000 t=4000/5fb2
 netfs_folio: i=157f3 ix=00004-00004 read-done
 netfs_folio: i=157f3 ix=00004-00004 read-unlock
 netfs_collect_folio: R=00001b55 ix=00005 r=5000-5fb2 t=5000/5fb2
 netfs_folio: i=157f3 ix=00005-00005 read-done
 netfs_folio: i=157f3 ix=00005-00005 read-unlock
 ...
 netfs_collect_stream: R=00001b55[0:] cto=5fb2 frn=ffffffff
 netfs_collect_state: R=00001b55 col=5fb2 cln=6000 n=c
 netfs_collect_stream: R=00001b55[0:] cto=5fb2 frn=ffffffff
 netfs_collect_state: R=00001b55 col=5fb2 cln=6000 n=8
 ...
 netfs_sreq: R=00001b55[2] ZERO SUBMT f=000 s=5fb2 0/4e s=0 e=0
 netfs_sreq: R=00001b55[2] ZERO TERM f=102 s=5fb2 4e/4e s=5 e=0

This patch changes the end condition for collecting folios that have
been read from "is the IO over?" to "have we read up to the end of the
folio?" (collected_to >= fend), which delays the read-unlock to the
"ZERO CONSUM" step after the page tail has been zeroed.

Reported-by: Christian Schoenebeck
Link: https://lkml.kernel.org/r/2332946.iZASKD2KPV@weasel
Fixes: e2d46f2ec332 ("netfs: Change the read result collector to only use one work item")
Signed-off-by: Dominique Martinet
---
A few notes, out of order:
- I don't really understand why this actually matters -- sure, there'll
  be junk after the mmap region ends, but why does this cause a bug?
  Since the folio is unlocked, does the zero-fill operation end up
  zeroing the end of another page? Well, either way there's no doubt
  that the early unlock was a problem...
- I have no idea how to test that this doesn't break something else by
  not unlocking a page that should be unlocked, e.g. something that
  wouldn't need zero-filling (partial refresh?) -- I don't *think* we
  have anything like that, but I don't actually know, so please don't
  trust me too much here... Nothing I normally test on 9p broke, so
  it's probably not too far off though.

This has been broken for about a year, so I guess there's no great
hurry at this point, but any extra pair of eyes would be appreciated...
(Christian S, I've tested as much as I could think of, but any extra
validation would be great too if you have time!)
---
 fs/netfs/read_collect.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/netfs/read_collect.c b/fs/netfs/read_collect.c
index a95e7aadafd072edbb0f1743e0020eaf65b7d1ed..7a0ffa675fb17aa7431c0fd6cc546deb0be0be22 100644
--- a/fs/netfs/read_collect.c
+++ b/fs/netfs/read_collect.c
@@ -137,7 +137,7 @@ static void netfs_read_unlock_folios(struct netfs_io_request *rreq,
 		rreq->front_folio_order = order;
 		fsize = PAGE_SIZE << order;
 		fpos  = folio_pos(folio);
-		fend  = umin(fpos + fsize, rreq->i_size);
+		fend  = fpos + fsize;

 		trace_netfs_collect_folio(rreq, folio, fend, collected_to);

---
base-commit: b10e94717f7e295d7ff6ce08249f7e31b630ef1b
change-id: 20251220-netfs_mmap_corruption-62e187ffc28c

Best regards,
--
Dominique Martinet