The protodown functionality allows user space to turn off the carrier of
a net device:

 # ip link add name dummy1 up type dummy
 # ip link add name macvlan1 up link dummy1 type macvlan mode bridge
 # ip link set dev macvlan1 protodown on
 $ ip -br link show dev macvlan1
 macvlan1@dummy1 DOWN 0a:5c:a3:05:c7:86

Different applications can set different protodown reasons, which
prevents an application from turning on the carrier of a net device as
long as others want it down:

 # ip link set dev macvlan1 protodown_reason 1 on
 # ip link set dev macvlan1 protodown_reason 2 on
 # ip link set dev macvlan1 protodown off
 Error: Cannot clear protodown, active reasons.
 # ip link set dev macvlan1 protodown_reason 2 off
 # ip link set dev macvlan1 protodown off
 Error: Cannot clear protodown, active reasons.
 # ip link set dev macvlan1 protodown_reason 1 off
 # ip link set dev macvlan1 protodown off
 $ ip -br link show dev macvlan1
 macvlan1@dummy1 UP 0a:5c:a3:05:c7:86

Unfortunately, this mechanism is not very useful when the carrier of a
net device can be toggled by toggling the carrier of its lower device:

 # ip link set dev macvlan1 protodown on
 $ ip -br link show dev macvlan1
 macvlan1@dummy1 DOWN 0a:5c:a3:05:c7:86
 # ip link set dev dummy1 carrier off
 # ip link set dev dummy1 carrier on
 $ ip -br link show dev macvlan1
 macvlan1@dummy1 UP 0a:5c:a3:05:c7:86

Obviously, this is not the intended behavior and it is unlikely to be
relied on by anyone. In fact, it is a problem for applications like FRR
that use protodown with macvlan on top of a bridge as part of Virtual
Router Redundancy Protocol (VRRP).

Solve this by preventing a net device configured with protodown on from
inheriting the operational state of its lower device.

Note that READ_ONCE() is not needed as RTNL is held.

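As a rough illustration of the guard described above, here is a
user-space sketch. It is not kernel code: struct model_dev, enum
model_operstate and model_transfer_operstate() are hypothetical names
invented for this example, and the dormant, testing and carrier handling
of the real function is collapsed into a single assignment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins; these are not kernel types. */
enum model_operstate { MODEL_DOWN, MODEL_LOWERLAYERDOWN, MODEL_UP };

struct model_dev {
	bool proto_down;		/* mirrors dev->proto_down */
	enum model_operstate state;	/* simplified operational state */
};

/*
 * Simplified model of netif_stacked_transfer_operstate() with the new
 * guard: an upper device with protodown on ignores lower device changes.
 */
static void model_transfer_operstate(const struct model_dev *lower,
				     struct model_dev *upper)
{
	assert(lower && upper);

	/* The guard added by the patch: do nothing while protodown is on. */
	if (upper->proto_down)
		return;

	/* Assumption: only UP vs. not-UP on the lower device is modeled. */
	if (lower->state == MODEL_UP)
		upper->state = MODEL_UP;
	else
		upper->state = MODEL_LOWERLAYERDOWN;
}
```

In this model, calling model_transfer_operstate() on an upper device
that has proto_down set leaves its state untouched, which corresponds to
the macvlan staying DOWN while the carrier of dummy1 is toggled.
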
Output with the patch:

 # ip link add name dummy1 up type dummy
 # ip link add name macvlan1 up link dummy1 type macvlan mode bridge
 # ip link set dev macvlan1 protodown on
 $ ip -br link show dev macvlan1
 macvlan1@dummy1 DOWN 0a:5c:a3:05:c7:86
 # ip link set dev dummy1 carrier off
 # ip link set dev dummy1 carrier on
 $ ip -br link show dev macvlan1
 macvlan1@dummy1 DOWN 0a:5c:a3:05:c7:86
 # ip link set dev macvlan1 protodown off
 $ ip -br link show dev macvlan1
 macvlan1@dummy1 UP 0a:5c:a3:05:c7:86

Signed-off-by: Ido Schimmel
---
 net/core/dev.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/core/dev.c b/net/core/dev.c
index 06c195906231..bfb0f297b234 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -11113,6 +11113,9 @@ EXPORT_SYMBOL(netdev_change_features);
 void netif_stacked_transfer_operstate(const struct net_device *rootdev,
 					struct net_device *dev)
 {
+	if (dev->proto_down)
+		return;
+
 	if (rootdev->operstate == IF_OPER_DORMANT)
 		netif_dormant_on(dev);
 	else
-- 
2.54.0

The protodown functionality allows user space to turn off the carrier of
a net device:

 # ip link add name dummy1 up type dummy
 # ip link add name macvlan1 up link dummy1 type macvlan mode bridge
 # ip link set dev macvlan1 protodown on
 $ ip -br link show dev macvlan1
 macvlan1@dummy1 DOWN 0a:5c:a3:05:c7:86

When protodown is turned off, the core unconditionally turns on the
carrier of the net device:

 # ip link set dev macvlan1 protodown off
 $ ip -br link show dev macvlan1
 macvlan1@dummy1 UP 0a:5c:a3:05:c7:86

This is wrong as it means that a macvlan can end up with a carrier when
its lower device does not have a carrier:

 # ip link set dev dummy1 carrier off
 $ ip -br link show dev macvlan1
 macvlan1@dummy1 LOWERLAYERDOWN 0a:5c:a3:05:c7:86
 # ip link set dev macvlan1 protodown on
 # ip link set dev macvlan1 protodown off
 $ ip -br link show dev macvlan1
 macvlan1@dummy1 UP 0a:5c:a3:05:c7:86

Solve this by resolving the linked net device and if one exists, inherit
its operational state when protodown is turned
off. Otherwise, as before, simply turn on the carrier. Set
'dev->proto_down' before calling netif_stacked_transfer_operstate() as
this function is a NOP when protodown is turned on.

Resolve the linked net device using a new helper and have it return the
device itself (in a similar fashion to dev_get_iflink()) if the device
does not implement both ndo_get_iflink() and get_link_net(). If the
latter is not implemented, it is unclear in which network namespace we
should look up the linked net device. Currently, this helper is only
used for net devices that support protodown (macvlan and vxlan) and for
both it returns the correct result.

Output with the patch:

 # ip link add name dummy1 up type dummy
 # ip link add name macvlan1 up link dummy1 type macvlan mode bridge
 # ip link set dev dummy1 carrier off
 $ ip -br link show dev macvlan1
 macvlan1@dummy1 LOWERLAYERDOWN 0a:5c:a3:05:c7:86
 # ip link set dev macvlan1 protodown on
 # ip link set dev macvlan1 protodown off
 $ ip -br link show dev macvlan1
 macvlan1@dummy1 LOWERLAYERDOWN 0a:5c:a3:05:c7:86
 # ip link set dev dummy1 carrier on
 $ ip -br link show dev macvlan1
 macvlan1@dummy1 UP 0a:5c:a3:05:c7:86
 # ip link set dev macvlan1 protodown on
 # ip link set dev macvlan1 protodown off
 $ ip -br link show dev macvlan1
 macvlan1@dummy1 UP 0a:5c:a3:05:c7:86

Signed-off-by: Ido Schimmel
---
 net/core/dev.c | 25 +++++++++++++++++++++++--
 1 file changed, 23 insertions(+), 2 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index bfb0f297b234..46f8a2efd982 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -10141,17 +10141,38 @@ bool netdev_port_same_parent_id(struct net_device *a, struct net_device *b)
 }
 EXPORT_SYMBOL(netdev_port_same_parent_id);
 
+static struct net_device *dev_get_iflink_dev(struct net_device *dev)
+{
+	struct net *net;
+
+	ASSERT_RTNL();
+
+	if (!dev->netdev_ops->ndo_get_iflink || !dev->rtnl_link_ops ||
+	    !dev->rtnl_link_ops->get_link_net)
+		return dev;
+
+	net = dev->rtnl_link_ops->get_link_net(dev);
+	return __dev_get_by_index(net, dev_get_iflink(dev));
+}
+
 int netif_change_proto_down(struct net_device *dev, bool proto_down)
 {
+	struct net_device *iflink_dev;
+
 	if (!dev->change_proto_down)
 		return -EOPNOTSUPP;
 	if (!netif_device_present(dev))
 		return -ENODEV;
+	iflink_dev = dev_get_iflink_dev(dev);
+	if (!iflink_dev)
+		return -ENODEV;
+	WRITE_ONCE(dev->proto_down, proto_down);
 	if (proto_down)
 		netif_carrier_off(dev);
-	else
+	else if (dev == iflink_dev)
 		netif_carrier_on(dev);
-	WRITE_ONCE(dev->proto_down, proto_down);
+	else
+		netif_stacked_transfer_operstate(iflink_dev, dev);
 	return 0;
 }

-- 
2.54.0

Add a selftest for the protodown mechanism. Five test cases are
included:

1. Basic protodown toggling: Verify that setting protodown on macvlan
   results in DOWN operational state and clearing it restores UP.

2. Same as the previous test case, but with vxlan.

3. Protodown reasons: Verify that protodown cannot be cleared while
   there are active protodown reasons, but can be cleared once all
   reasons are removed.

4. Operational state inheritance: Verify that toggling the lower
   device's carrier while protodown is on does not cause the macvlan to
   inherit the UP operational state.

5. Lower layer down: Verify that toggling protodown while the lower
   device has no carrier does not cause the macvlan to transition to UP
   operational state.

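The rule exercised by test case 3 can be pictured with a small
user-space model. This is a sketch under stated assumptions, not the
kernel implementation: struct model_dev and the model_* functions are
hypothetical, reasons are assumed to be bits in a 32-bit mask, and
-EBUSY is an assumed error code standing in for the "Cannot clear
protodown, active reasons" failure.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space stand-in; this is not a kernel type. */
struct model_dev {
	bool proto_down;	/* protodown currently requested */
	uint32_t reasons;	/* assumed: one bit per protodown reason */
};

/* Set or clear a single protodown reason bit. */
static void model_set_reason(struct model_dev *dev, unsigned int bit, bool on)
{
	assert(dev && bit < 32);

	if (on)
		dev->reasons |= UINT32_C(1) << bit;
	else
		dev->reasons &= ~(UINT32_C(1) << bit);
}

/* Turning protodown off is refused while any reason is still active. */
static int model_set_proto_down(struct model_dev *dev, bool on)
{
	assert(dev);

	if (!on && dev->reasons)
		return -EBUSY;	/* assumed error code, see lead-in */

	dev->proto_down = on;
	return 0;
}
```

With reasons 1 and 2 set, model_set_proto_down(dev, false) keeps
failing until both bits have been cleared, mirroring the sequence the
test case checks with a single reason bit.
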
Note that the last two test cases fail without "net: Do not inherit
operational state when protodown is on" and "net: Do not unconditionally
turn on carrier when turning off protodown":

 # ./protodown.sh
 TEST: Basic protodown on/off with macvlan                 [ OK ]
 TEST: Basic protodown on/off with vxlan                   [ OK ]
 TEST: Protodown reasons                                   [ OK ]
 TEST: Inheriting operational state with protodown         [FAIL]
         Macvlan operational state is not DOWN despite protodown
 TEST: Protodown with lower layer down                     [FAIL]
         Macvlan is not LOWERLAYERDOWN after clearing protodown

Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Ido Schimmel
---
 tools/testing/selftests/net/Makefile | 1 +
 tools/testing/selftests/net/protodown.sh | 182 +++++++++++++++++++++++
 2 files changed, 183 insertions(+)
 create mode 100755 tools/testing/selftests/net/protodown.sh

diff --git a/tools/testing/selftests/net/Makefile b/tools/testing/selftests/net/Makefile
index baa30287cf22..c6ff7b504e97 100644
--- a/tools/testing/selftests/net/Makefile
+++ b/tools/testing/selftests/net/Makefile
@@ -69,6 +69,7 @@ TEST_PROGS := \
 	nl_netdev.py \
 	nl_nlctrl.py \
 	pmtu.sh \
+	protodown.sh \
 	psock_snd.sh \
 	reuseaddr_ports_exhausted.sh \
 	reuseport_addr_any.sh \
diff --git a/tools/testing/selftests/net/protodown.sh b/tools/testing/selftests/net/protodown.sh
new file mode 100755
index 000000000000..de6ab90c521a
--- /dev/null
+++ b/tools/testing/selftests/net/protodown.sh
@@ -0,0 +1,182 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0
+#
+# Test the "protodown" mechanism. Verify basic protodown toggling, protodown
+# reasons, operational state inheritance when the lower device carrier changes,
+# and correct operational state when the lower device has no carrier.
+
+# shellcheck disable=SC1091,SC2034,SC2154,SC2317
+source lib.sh
+
+require_command jq
+
+ALL_TESTS="
+	protodown_basic_macvlan
+	protodown_basic_vxlan
+	protodown_reasons
+	protodown_inherit_operstate
+	protodown_lower_layer_down
+"
+
+operstate_get()
+{
+	local ns=$1; shift
+	local dev=$1; shift
+
+	ip -n "$ns" -j link show dev "$dev" | jq -r '.[].operstate'
+}
+
+operstate_check()
+{
+	local ns=$1; shift
+	local dev=$1; shift
+	local expected=$1; shift
+
+	local current
+	current=$(operstate_get "$ns" "$dev")
+
+	[ "$current" = "$expected" ]
+}
+
+setup_prepare()
+{
+	setup_ns NS
+	defer cleanup_all_ns
+
+	ip -n "$NS" link add name dummy0 up type dummy
+
+	ip -n "$NS" link add name macvlan0 link dummy0 up type macvlan mode bridge
+
+	ip -n "$NS" link add name vxlan0 up type vxlan id 10010 dstport 4789
+}
+
+protodown_basic()
+{
+	local dev=$1; shift
+
+	ip -n "$NS" link set dev "$dev" protodown on
+	check_err $? "Failed to set protodown on"
+
+	busywait "$BUSYWAIT_TIMEOUT" operstate_check "$NS" "$dev" DOWN
+	check_err $? "Operational state is not DOWN after setting protodown"
+
+	ip -n "$NS" link set dev "$dev" protodown off
+	check_err $? "Failed to set protodown off"
+
+	busywait "$BUSYWAIT_TIMEOUT" operstate_check "$NS" "$dev" UP
+	check_err $? "Operational state is not UP after clearing protodown"
+}
+
+protodown_basic_macvlan()
+{
+	RET=0
+
+	protodown_basic macvlan0
+
+	log_test "Basic protodown on/off with macvlan"
+}
+
+protodown_basic_vxlan()
+{
+	RET=0
+
+	protodown_basic vxlan0
+
+	log_test "Basic protodown on/off with vxlan"
+}
+
+protodown_reasons()
+{
+	RET=0
+
+	ip -n "$NS" link set dev macvlan0 protodown on
+
+	ip -n "$NS" link set dev macvlan0 protodown_reason 0 on
+	check_err $? "Failed to set protodown reason bit 0"
+
+	# Cannot clear protodown while reasons are active.
+	ip -n "$NS" link set dev macvlan0 protodown off 2>/dev/null
+	check_fail $? "Clearing protodown succeeded with active reasons"
+
+	ip -n "$NS" link set dev macvlan0 protodown_reason 0 off
+	check_err $? "Failed to clear protodown reason bit 0"
+
+	# Can clear protodown when no reasons are active.
+	ip -n "$NS" link set dev macvlan0 protodown off
+	check_err $? "Failed to clear protodown with no active reasons"
+
+	busywait "$BUSYWAIT_TIMEOUT" operstate_check "$NS" macvlan0 UP
+	check_err $? "Operational state is not UP after clearing protodown"
+
+	log_test "Protodown reasons"
+}
+
+protodown_inherit_operstate()
+{
+	RET=0
+
+	ip -n "$NS" link set dev macvlan0 protodown on
+
+	busywait "$BUSYWAIT_TIMEOUT" operstate_check "$NS" macvlan0 DOWN
+	check_err $? "Operational state is not DOWN after setting protodown"
+
+	# Toggle carrier on the lower device. The macvlan should stay DOWN
+	# because protodown is on.
+	ip -n "$NS" link set dev dummy0 carrier off
+	ip -n "$NS" link set dev dummy0 carrier on
+
+	busywait "$BUSYWAIT_TIMEOUT" operstate_check "$NS" dummy0 UP
+	check_err $? "Lower device is not UP after carrier on"
+
+	busywait "$BUSYWAIT_TIMEOUT" operstate_check "$NS" macvlan0 DOWN
+	check_err $? "Macvlan operational state is not DOWN despite protodown"
+
+	# Clear protodown and verify the macvlan comes back up.
+	ip -n "$NS" link set dev macvlan0 protodown off
+
+	busywait "$BUSYWAIT_TIMEOUT" operstate_check "$NS" macvlan0 UP
+	check_err $? "Operational state is not UP after clearing protodown"
+
+	log_test "Inheriting operational state with protodown"
+}
+
+protodown_lower_layer_down()
+{
+	RET=0
+
+	# Bring the lower device carrier down first.
+	ip -n "$NS" link set dev dummy0 carrier off
+
+	busywait "$BUSYWAIT_TIMEOUT" operstate_check "$NS" macvlan0 LOWERLAYERDOWN
+	check_err $? "Macvlan is not LOWERLAYERDOWN with lower carrier off"
+
+	# Toggle protodown on and off while lower has no carrier. The macvlan
+	# should not transition to UP.
+	ip -n "$NS" link set dev macvlan0 protodown on
+
+	busywait "$BUSYWAIT_TIMEOUT" operstate_check "$NS" macvlan0 LOWERLAYERDOWN
+	check_err $? "Macvlan is not LOWERLAYERDOWN after setting protodown"
+
+	ip -n "$NS" link set dev macvlan0 protodown off
+
+	busywait "$BUSYWAIT_TIMEOUT" operstate_check "$NS" macvlan0 LOWERLAYERDOWN
+	check_err $? "Macvlan is not LOWERLAYERDOWN after clearing protodown"
+
+	# Bring the lower device carrier up. The macvlan should transition to
+	# UP.
+	ip -n "$NS" link set dev dummy0 carrier on
+
+	busywait "$BUSYWAIT_TIMEOUT" operstate_check "$NS" dummy0 UP
+	check_err $? "Lower device is not UP after carrier on"
+
+	busywait "$BUSYWAIT_TIMEOUT" operstate_check "$NS" macvlan0 UP
+	check_err $? "Macvlan is not UP after lower device is UP"
+
+	log_test "Protodown with lower layer down"
+}
+
+trap defer_scopes_cleanup EXIT
+setup_prepare
+tests_run
+
+exit "$EXIT_STATUS"
-- 
2.54.0