From: Geliang Tang Currently the ADD_ADDR option is retransmitted with a fixed timeout. This patch makes the retransmission timeout adaptive by using the maximum RTO among all the subflows, while still capping it at the configured maximum value (add_addr_timeout_max). This improves responsiveness when establishing new subflows. Specifically: 1. Adds mptcp_adjust_add_addr_timeout() helper to compute the adaptive timeout. 2. Uses maximum subflow RTO (icsk_rto) when available. 3. Applies exponential backoff based on retransmission count. 4. Maintains fallback to configured max timeout when no RTO data exists. This slightly changes the behaviour of the MPTCP "add_addr_timeout" sysctl knob to be used as a maximum instead of a fixed value. But this is seen as an improvement: the ADD_ADDR might be sent quicker than before to improve the overall MPTCP connection. Also, the default value is set to 2 min, which was already way too long, and caused the ADD_ADDR not to be retransmitted for connections shorter than 2 minutes. Suggested-by: Matthieu Baerts (NGI0) Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/576 Reviewed-by: Christoph Paasch Signed-off-by: Geliang Tang Reviewed-by: Matthieu Baerts (NGI0) Signed-off-by: Matthieu Baerts (NGI0) --- v2: no changes. --- Documentation/networking/mptcp-sysctl.rst | 8 +++++--- net/mptcp/pm.c | 28 ++++++++++++++++++++++++---- 2 files changed, 29 insertions(+), 7 deletions(-) diff --git a/Documentation/networking/mptcp-sysctl.rst b/Documentation/networking/mptcp-sysctl.rst index 1683c139821e3ba6d9eaa3c59330a523d29f1164..1eb6af26b4a7acdedd575a126c576210a78f0d4d 100644 --- a/Documentation/networking/mptcp-sysctl.rst +++ b/Documentation/networking/mptcp-sysctl.rst @@ -8,9 +8,11 @@ MPTCP Sysfs variables =============================== add_addr_timeout - INTEGER (seconds) - Set the timeout after which an ADD_ADDR control message will be - resent to an MPTCP peer that has not acknowledged a previous - ADD_ADDR message. + Set the maximum value of timeout after which an ADD_ADDR control message + will be resent to an MPTCP peer that has not acknowledged a previous + ADD_ADDR message. A dynamically estimated retransmission timeout based + on the estimated connection round-trip-time is used if this value is + lower than the maximum one. Do not retransmit if set to 0. diff --git a/net/mptcp/pm.c b/net/mptcp/pm.c index 136a380602cae872b76560649c924330e5f42533..204e1f61212e2be77a8476f024b59be67d04b80a 100644 --- a/net/mptcp/pm.c +++ b/net/mptcp/pm.c @@ -268,6 +268,27 @@ int mptcp_pm_mp_prio_send_ack(struct mptcp_sock *msk, return -EINVAL; } +static unsigned int mptcp_adjust_add_addr_timeout(struct mptcp_sock *msk) +{ + const struct net *net = sock_net((struct sock *)msk); + unsigned int rto = mptcp_get_add_addr_timeout(net); + struct mptcp_subflow_context *subflow; + unsigned int max = 0; + + mptcp_for_each_subflow(msk, subflow) { + struct sock *ssk = mptcp_subflow_tcp_sock(subflow); + struct inet_connection_sock *icsk = inet_csk(ssk); + + if (icsk->icsk_rto > max) + max = icsk->icsk_rto; + } + + if (max && max < rto) + rto = max; + + return rto; +} + static void mptcp_pm_add_timer(struct timer_list *timer) { struct mptcp_pm_add_entry *entry = timer_container_of(entry, timer, @@ -292,7 +313,7 @@ static void mptcp_pm_add_timer(struct timer_list *timer) goto out; } - timeout = mptcp_get_add_addr_timeout(sock_net(sk)); + timeout = mptcp_adjust_add_addr_timeout(msk); if (!timeout) goto out; @@ -307,7 +328,7 @@ static void mptcp_pm_add_timer(struct timer_list *timer) if (entry->retrans_times < ADD_ADDR_RETRANS_MAX) sk_reset_timer(sk, timer, - jiffies + timeout); + jiffies + (timeout << entry->retrans_times)); spin_unlock_bh(&msk->pm.lock); @@ -348,7 +369,6 @@ bool mptcp_pm_alloc_anno_list(struct mptcp_sock *msk, { struct mptcp_pm_add_entry *add_entry = NULL; struct sock *sk = (struct sock *)msk; - struct net *net = sock_net(sk); unsigned int timeout; lockdep_assert_held(&msk->pm.lock); @@ -374,7 +394,7 @@ bool mptcp_pm_alloc_anno_list(struct mptcp_sock *msk, timer_setup(&add_entry->add_timer, mptcp_pm_add_timer, 0); reset_timer: - timeout = mptcp_get_add_addr_timeout(net); + timeout = mptcp_adjust_add_addr_timeout(msk); if (timeout) sk_reset_timer(sk, &add_entry->add_timer, jiffies + timeout); -- 2.51.0 ADD_ADDR can be retransmitted, and with, the parent commit, these retransmissions can be sent quicker: from 2 minutes to less than one second. To avoid false positives where retransmitted ADD_ADDR causes higher counters than expected, it is required to be more tolerant. Errors are now only reported when fewer ADD_ADDRs have been sent/received, except if no ADD_ADDR are expected. Before the parent commit, the tolerance was present for each tests where the ADD_ADDR could be retransmitted in a reasonable time (1 sec). Now that all tests can have retransmitted ADD_ADDR, it is normal to apply the same tolerance for all tests. An alternative could be to disable the ADD_ADDR retransmissions by default, but that's changing the default kernel behaviour. Plus, ADD_ADDR retransmissions can be required for some tests. To avoid adding exceptions to many tests, it seems better to increase the tolerance. Later, we could add a new MIB counter to identify the ADD_ADDR retransmissions, and remove the tolerance when this counter is available. Reviewed-by: Geliang Tang Signed-off-by: Matthieu Baerts (NGI0) --- tools/testing/selftests/net/mptcp/mptcp_join.sh | 19 +++++++------------ 1 file changed, 7 insertions(+), 12 deletions(-) diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh index 2f046167a0b6cc6fb5531a033d8d95c9ea399cf9..e9e11a9e60fd5374c8a98c3b7159ccbca8053030 100755 --- a/tools/testing/selftests/net/mptcp/mptcp_join.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -358,6 +358,7 @@ reset_with_add_addr_timeout() tables="${ip6tables}" fi + # set a maximum, to avoid too long timeout with exponential backoff ip netns exec $ns1 sysctl -q net.mptcp.add_addr_timeout=1 if ! ip netns exec $ns2 $tables -A OUTPUT -p tcp \ @@ -1669,7 +1670,6 @@ chk_add_nr() local tx="" local rx="" local count - local timeout if [[ $ns_invert = "invert" ]]; then ns_tx=$ns2 @@ -1678,15 +1678,13 @@ chk_add_nr() rx=" server" fi - timeout=$(ip netns exec ${ns_tx} sysctl -n net.mptcp.add_addr_timeout) - print_check "add addr rx${rx}" count=$(mptcp_lib_get_counter ${ns_rx} "MPTcpExtAddAddr") if [ -z "$count" ]; then print_skip - # if the test configured a short timeout tolerate greater then expected - # add addrs options, due to retransmissions - elif [ "$count" != "$add_nr" ] && { [ "$timeout" -gt 1 ] || [ "$count" -lt "$add_nr" ]; }; then + # Tolerate more ADD_ADDR then expected (if any), due to retransmissions + elif [ "$count" != "$add_nr" ] && + { [ "$add_nr" -eq 0 ] || [ "$count" -lt "$add_nr" ]; }; then fail_test "got $count ADD_ADDR[s] expected $add_nr" else print_ok @@ -1774,18 +1772,15 @@ chk_add_tx_nr() { local add_tx_nr=$1 local echo_tx_nr=$2 - local timeout local count - timeout=$(ip netns exec $ns1 sysctl -n net.mptcp.add_addr_timeout) - print_check "add addr tx" count=$(mptcp_lib_get_counter ${ns1} "MPTcpExtAddAddrTx") if [ -z "$count" ]; then print_skip - # if the test configured a short timeout tolerate greater then expected - # add addrs options, due to retransmissions - elif [ "$count" != "$add_tx_nr" ] && { [ "$timeout" -gt 1 ] || [ "$count" -lt "$add_tx_nr" ]; }; then + # Tolerate more ADD_ADDR then expected (if any), due to retransmissions + elif [ "$count" != "$add_tx_nr" ] && + { [ "$add_tx_nr" -eq 0 ] || [ "$count" -lt "$add_tx_nr" ]; }; then fail_test "got $count ADD_ADDR[s] TX, expected $add_tx_nr" else print_ok -- 2.51.0 When many ADD_ADDR need to be sent, it can take some time to send each of them, and create new subflows. Some CIs seem to occasionally have issues with these tests, especially with "debug" kernels. Two subtests will now run for a slightly longer time: the last two where 3 or more ADD_ADDR are sent during the test. Reviewed-by: Geliang Tang Signed-off-by: Matthieu Baerts (NGI0) --- tools/testing/selftests/net/mptcp/mptcp_join.sh | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/net/mptcp/mptcp_join.sh b/tools/testing/selftests/net/mptcp/mptcp_join.sh index e9e11a9e60fd5374c8a98c3b7159ccbca8053030..b41cebfa1f921ce9ea6a88a908bf6d5e6027b367 100755 --- a/tools/testing/selftests/net/mptcp/mptcp_join.sh +++ b/tools/testing/selftests/net/mptcp/mptcp_join.sh @@ -2268,7 +2268,8 @@ signal_address_tests() pm_nl_add_endpoint $ns1 10.0.3.1 flags signal pm_nl_add_endpoint $ns1 10.0.4.1 flags signal pm_nl_set_limits $ns2 3 3 - run_tests $ns1 $ns2 10.0.1.1 + speed=slow \ + run_tests $ns1 $ns2 10.0.1.1 chk_join_nr 3 3 3 chk_add_nr 3 3 fi @@ -2280,7 +2281,8 @@ signal_address_tests() pm_nl_add_endpoint $ns1 10.0.3.1 flags signal pm_nl_add_endpoint $ns1 10.0.14.1 flags signal pm_nl_set_limits $ns2 3 3 - run_tests $ns1 $ns2 10.0.1.1 + speed=slow \ + run_tests $ns1 $ns2 10.0.1.1 join_syn_tx=3 \ chk_join_nr 1 1 1 chk_add_nr 3 3 -- 2.51.0