For most netlink replies and notifications the expected behavior is:

 - if NSID is not reported, then the device is local to the querier.
 - if NSID is reported, then the device is remote, i.e., located in
   the provided namespace that is not the same as the querier's.

Userspace applications like ovs-vswitchd expect this behavior.  And
ip monitor uses this logic for printing out [nsid current] vs [nsid N].

But this doesn't work for link nsid in cross-namespace RTM_GETLINK
requests.  For some reason the code checks whether the queried device
and the link are in the same namespace, and not whether the querier's
namespace is the same as the link's.  So the logic becomes:

 - if NSID is not reported, then the link is in the same namespace
   as the queried device.
 - if NSID is reported, then the link is not in the same namespace
   as the queried device.

If the link is in the same namespace as the querier, the code will
allocate a self-referential nsid for the querier's namespace and
report it as a link nsid.

This is problematic because:

 1. Application doesn't expect to see nsid reported for its own
    namespace.

 2. Application can't know if this nsid is the nsid of the current
    namespace without making extra requests.  So a lot of extra logic
    is needed to understand if the link is local or not.

 3. Implicit allocation of self-referential nsid for the current
    namespace affects notifications on sockets listening on all
    namespaces, since this nsid is now reported in every notification.
    And so those notification handlers also now need extra logic to
    understand which namespace the events are coming from.

 4. A seemingly read-only RTM_GETLINK request for a different
    namespace allocates a self-referential nsid for the current
    namespace, which is a little unexpected.

Let's fix that by applying the same rule to cross-namespace requests
as for standard ones, namely:

 - Report NSID if it is different from the querier's namespace.

This changes two things:

 1. If both the device and the link are in the same namespace which
    is not the querier's namespace, the LINK_NSID will now be reported.
    This just gives more info and the user can check if the reported
    id is the same as TARGET_NSID, which is in the same message.

 2. If the link is in the same namespace as the querier, but the
    device isn't, the LINK_NSID will no longer be reported.  This is
    the main change, as previously this would mean that the link is
    in the TARGET namespace, but now it will mean that the link is in
    the SOURCE namespace.

There are no changes in logic for queries that are not cross-namespace
queries.

A search across open-source projects doesn't show any projects that
rely on the things that are being changed.  I couldn't find any project
that uses the reported LINK_NSID with cross-namespace requests.  And no
projects that use cross-namespace requests seem to even parse the
reported LINK_NSID.  Of course, that doesn't mean there are no such
applications, but the current behavior feels like a logical bug that
IMO should be fixed, otherwise it's hard to use all-nsid sockets
properly.

Note that the logic for notifications in rtmsg_ifinfo_build_skb()
remains the same, as those are formatted from the perspective of the
namespace where the event occurred, i.e., they are always "local", but
then distributed to sockets listening on all NSIDs with extra metadata
pointing out the original namespace.  Users then need to translate the
reported NSIDs to their own reference point.  RTM_GETLINK, in contrast,
always reports NSIDs from the querier's reference point, which is why
the NSID should not be reported if it is the same.

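For illustration only, the old and new checks can be modelled in plain
C as below.  This is a sketch and not kernel code: struct ns and both
helper names are hypothetical, and pointer comparison stands in for
net_eq().

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct net; only identity matters here. */
struct ns {
	int unused;
};

/*
 * Old check: LINK_NSID is reported when the link's namespace differs
 * from the queried device's namespace.  The querier is ignored.
 */
static bool old_reports_link_nsid(const struct ns *src_net,
				  const struct ns *dev_net,
				  const struct ns *link_net)
{
	(void)src_net;
	return dev_net != link_net;
}

/*
 * New check: LINK_NSID is reported when the link's namespace differs
 * from the querier's (source) namespace.  The device is ignored.
 */
static bool new_reports_link_nsid(const struct ns *src_net,
				  const struct ns *dev_net,
				  const struct ns *link_net)
{
	(void)dev_net;
	return src_net != link_net;
}
```

With two namespaces A (the querier's) and B, the helpers agree whenever
the device is in A, and differ only in the two cross-namespace cases
listed above: device and link both in B (now reported), and device in
B with the link in A (no longer reported).
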
Fixes: 79e1ad148c84 ("rtnetlink: use netnsid to query interface")
Reported-by: Matteo Perin
Signed-off-by: Ilya Maximets
---
 net/core/rtnetlink.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index df042da422ef3..0d539b8e4bf61 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1881,7 +1881,7 @@ static int rtnl_fill_link_netnsid(struct sk_buff *skb,
 	if (dev->rtnl_link_ops && dev->rtnl_link_ops->get_link_net) {
 		struct net *link_net = dev->rtnl_link_ops->get_link_net(dev);
 
-		if (!net_eq(dev_net(dev), link_net)) {
+		if (!net_eq(src_net, link_net)) {
 			int id = peernet2id_alloc(src_net, link_net, gfp);
 
 			if (nla_put_s32(skb, IFLA_LINK_NETNSID, id))
-- 
2.53.0