tcp_ooo_rcv_mss.pkt cares about the OOO SACK state and the resulting tcpi_rcv_mss update. Its exact advertised receive-window value is incidental to that test and can legitimately move when unrelated rwnd accounting changes adjust the ACK window. Drop the hard-coded win 81 checks and keep only the ACK/SACK shape and the tcpi_rcv_mss assertion. Signed-off-by: Wesley Atwell --- tools/testing/selftests/net/packetdrill/tcp_ooo_rcv_mss.pkt | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/tools/testing/selftests/net/packetdrill/tcp_ooo_rcv_mss.pkt b/tools/testing/selftests/net/packetdrill/tcp_ooo_rcv_mss.pkt index 7e6bc5fb0c8d78f36dc3d18842ff11d938c4e41b..0b19de9f9307b3d0ee579bc9e3a2b9219a88cd8a 100644 --- a/tools/testing/selftests/net/packetdrill/tcp_ooo_rcv_mss.pkt +++ b/tools/testing/selftests/net/packetdrill/tcp_ooo_rcv_mss.pkt @@ -17,11 +17,13 @@ sysctl -q net.ipv4.tcp_rmem="4096 131072 $((32*1024*1024))"` +0 accept(3, ..., ...) = 4 +0 < . 2001:11001(9000) ack 1 win 257 - +0 > . 1:1(0) ack 1 win 81 +// This test cares about the OOO SACK state and the resulting tcpi_rcv_mss. +// Keep the ACK/SACK shape exact, but do not pin the precise advertised +// receive window here because unrelated rwnd accounting changes can adjust it. + +0 > . 1:1(0) ack 1 // check that ooo packet properly updates tcpi_rcv_mss +0 %{ assert tcpi_rcv_mss == 1000, tcpi_rcv_mss }% +0 < . 11001:21001(10000) ack 1 win 257 - +0 > . 1:1(0) ack 1 win 81 - + +0 > . 1:1(0) ack 1 -- 2.43.0 In the scaled no-shrink path, __tcp_select_window() currently rounds the raw free-space value up to the receive-window scale quantum. That can expose fresh sender-visible credit beyond the currently backed free space. Fix this without changing the meaning of the stored receive-window state. Keep tp->rcv_wnd representable in scaled units by rounding larger windows down to the scale quantum and preserving only the small non-zero case that would otherwise scale away to zero. tcp_select_window() already preserves the no-shrink guarantee from the currently offered window, so later no-shrink decisions continue to reason from a right edge the peer actually saw on the wire. This removes the larger-window quantization slack from rounding free_space up, while preserving the small non-zero case needed to avoid scaling away to zero. Signed-off-by: Wesley Atwell --- net/ipv4/tcp_output.c | 16 +++++++++++----- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c index 35c3b0ab5a0cb714155d5720fe56888f71aecced..bd3a43148a87e891bc632a47ffb5b82c475e8f6f 100644 --- a/net/ipv4/tcp_output.c +++ b/net/ipv4/tcp_output.c @@ -3375,13 +3375,19 @@ u32 __tcp_select_window(struct sock *sk) * scaled window will not line up with the MSS boundary anyway. */ if (tp->rx_opt.rcv_wscale) { - window = free_space; + u32 gran = 1U << tp->rx_opt.rcv_wscale; - /* Advertise enough space so that it won't get scaled away. - * Import case: prevent zero window announcement if - * 1< mss. + /* Keep tp->rcv_wnd representable in scaled units so later + * no-shrink decisions reason about the same right edge we + * can advertise on the wire. Preserve only a small non-zero + * offer that would otherwise get scaled away to zero. */ - window = ALIGN(window, (1 << tp->rx_opt.rcv_wscale)); + if (free_space >= gran) + window = round_down(free_space, gran); + else if (free_space > 0) + window = gran; + else + window = 0; } else { window = tp->rcv_wnd; /* Get the largest window that is a nice multiple of mss. -- 2.43.0 Add a packetdrill reproducer for the scaled no-shrink quantization case. The sequence leaves slightly more than 84 scaled units of backed credit after one skb is drained. The buggy ALIGN() path rounds that up and exposes a fresh extra unit, so the wire-visible window becomes 85. Then queue a tiny OOO skb so the next ACK re-runs the no-shrink path after a small receive-memory change without advancing rcv_nxt. With the fix in place, both ACKs keep the sender-visible window at 84. This provides fail-before/pass-after coverage for both the immediate quantization bug and the follow-on ACK transition that reuses the stored window state. Signed-off-by: Wesley Atwell --- .../packetdrill/tcp_rcv_quantization_credit.pkt | 45 ++++++++++++++++++++++ 1 file changed, 45 insertions(+) create mode 100644 tools/testing/selftests/net/packetdrill/tcp_rcv_quantization_credit.pkt diff --git a/tools/testing/selftests/net/packetdrill/tcp_rcv_quantization_credit.pkt b/tools/testing/selftests/net/packetdrill/tcp_rcv_quantization_credit.pkt new file mode 100644 index 0000000000000000000000000000000000000000..8ea96281b601f2d161cfd84967cad91cedb03151 --- /dev/null +++ b/tools/testing/selftests/net/packetdrill/tcp_rcv_quantization_credit.pkt @@ -0,0 +1,45 @@ +// SPDX-License-Identifier: GPL-2.0 + +--ip_version=ipv4 +--mss=1000 + +`./defaults.sh +sysctl -q net.ipv4.tcp_moderate_rcvbuf=0 +sysctl -q net.ipv4.tcp_shrink_window=0 +sysctl -q net.ipv4.tcp_rmem="4096 131072 $((32*1024*1024))"` + +// Exercise the scaled no-shrink path in __tcp_select_window(). +// The sequence below leaves slightly more than 84 scaled units of backed +// credit after one skb is drained. The buggy ALIGN() path rounds that up and +// exposes a fresh extra unit; the fixed path keeps the sender-visible window +// at 84. Then queue a tiny OOO skb so the next ACK re-runs the no-shrink +// path after a small receive-memory change without advancing rcv_nxt. + +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3 + +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0 + +0 bind(3, ..., ...) = 0 + +0 listen(3, 1) = 0 + + +0 < S 0:0(0) win 32792 + +0 > S. 0:0(0) ack 1 + +0 < . 1:1(0) ack 1 win 257 + + +0 accept(3, ..., ...) = 4 + + +0 < P. 1:10001(10000) ack 1 win 257 + * > . 1:1(0) ack 10001 + + +0 < P. 10001:11024(1023) ack 1 win 257 + * > . 1:1(0) ack 11024 + +// Free one skb, then force an outbound packet so the current advertised +// window is observable both on the wire and via TCP_INFO. + +0 read(4, ..., 10000) = 10000 + +0 write(4, ..., 1) = 1 + * > P. 1:2(1) ack 11024 win 84 + +0 %{ assert (tcpi_rcv_wnd >> 10) == 84, tcpi_rcv_wnd }% + +// Queue a tiny OOO skb. This should not create fresh sender-visible credit +// on the next ACK after the first post-drain window update. + +0 < P. 12024:12025(1) ack 2 win 257 + * > . 2:2(0) ack 11024 win 84 + +0 %{ assert (tcpi_rcv_wnd >> 10) == 84, tcpi_rcv_wnd }% -- 2.43.0