The limit of the small queue check is calculated from the pacing rate,
and the pacing rate is calculated from the cwnd. If the cwnd is small,
the small queue check may fail. When the small queue check fails, the
TCP layer sends fewer packets, so tcp_is_cwnd_limited() always returns
false and the cwnd has no chance to be updated. Because the cwnd stays
small, the pacing rate stays small, the limit of the small queue check
stays small, and the small queue check keeps failing.

This is a kind of deadlock: once a TCP flow gets into this situation,
its throughput is very small, clearly less than the throughput it
should have.

We set is_cwnd_limited to true when the small queue check fails, so
that the cwnd has a chance to be updated and the deadlock is broken.

The ss output below shows the issue:

skmem:(r0,rb131072,
       t7712, <------------------------------ wmem_alloc = 7712
       tb243712,f2128,w219056,o0,bl0,d0)
ts sack cubic wscale:7,10 rto:224 rtt:23.364/0.019 ato:40 mss:1448
pmtu:8500 rcvmss:536 advmss:8448
cwnd:28 <------------------------------ cwnd=28
bytes_sent:2166208 bytes_acked:2148832 bytes_received:37
segs_out:1497 segs_in:751 data_segs_out:1496 data_segs_in:1
send 13882554bps lastsnd:7 lastrcv:2992 lastack:7
pacing_rate 27764216bps <--------------------- pacing_rate=27764216bps
delivery_rate 5786688bps delivered:1485 busy:2991ms unacked:12
rcv_space:57088 rcv_ssthresh:57088 notsent:188240 minrtt:23.319
snd_wnd:57088

limit = (27764216 / 8) / 1024 = 3389 < 7712

So the small queue check fails. When this happens, the throughput is
clearly lower than in the normal situation.

With is_cwnd_limited set to true when the small queue check fails, the
cwnd can grow to a reasonable size (about 4000 in my test environment),
and the small queue check no longer fails.

Signed-off-by: Peng Yu
---
 net/ipv4/tcp_output.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index b94efb3050d2..8c70acf3a060 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2985,8 +2985,10 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 		    unlikely(tso_fragment(sk, skb, limit, mss_now, gfp)))
 			break;
 
-		if (tcp_small_queue_check(sk, skb, 0))
+		if (tcp_small_queue_check(sk, skb, 0)) {
+			is_cwnd_limited = true;
 			break;
+		}
 
 		/* Argh, we hit an empty skb(), presumably a thread
 		 * is sleeping in sendmsg()/sk_stream_wait_memory().
-- 
2.47.3