Willem de Bruijn wrote:
> Simon Schippers wrote:
>> Stephen Hemminger wrote:
>>> On Tue, 12 Aug 2025 00:03:48 +0200
>>> Simon Schippers wrote:
>>>
>>>> This patch is the result of our paper with the title "The NODROP
>>>> Patch: Hardening Secure Networking for Real-time Teleoperation by
>>>> Preventing Packet Drops in the Linux TUN Driver" [1]. It deals with
>>>> the tun_net_xmit function, which drops SKBs with the reason
>>>> SKB_DROP_REASON_FULL_RING whenever the tx_ring (TUN queue) is full,
>>>> resulting in reduced TCP performance and packet loss for bursty
>>>> video streams when used over VPNs.
>>>>
>>>> The abstract reads as follows:
>>>> "Throughput-critical teleoperation requires robust and low-latency
>>>> communication to ensure safety and performance. Often, these kinds
>>>> of applications are implemented in Linux-based operating systems and
>>>> transmit over virtual private networks, which ensure encryption and
>>>> ease of use by providing a dedicated tunneling interface (TUN) to
>>>> user space applications. In this work, we identified a specific
>>>> behavior in the Linux TUN driver, which results in significant
>>>> performance degradation due to the sender stack silently dropping
>>>> packets. This design issue drastically impacts real-time video
>>>> streaming, inducing up to 29 % packet loss with noticeable video
>>>> artifacts when the internal queue of the TUN driver is reduced to 25
>>>> packets to minimize latency. Furthermore, a small queue length also
>>>> drastically reduces the throughput of TCP traffic due to many
>>>> retransmissions. Instead, with our open-source NODROP Patch, we
>>>> propose generating backpressure in case of burst traffic or network
>>>> congestion. The patch effectively addresses the packet-dropping
>>>> behavior, hardening real-time video streaming and improving TCP
>>>> throughput by 36 % in high latency scenarios."
>>>>
>>>> In addition to the mentioned performance and latency improvements
>>>> for VPN applications, this patch also allows the proper usage of
>>>> qdiscs. For example, a fq_codel cannot control the queuing delay
>>>> when packets are already dropped in the TUN driver. This issue is
>>>> also described in [2].
>>>>
>>>> The performance evaluation of the paper (see Fig. 4) showed a 4%
>>>> performance hit for a single-queue TUN with the default TUN queue
>>>> size of 500 packets. However, it is important to note that with the
>>>> proposed patch no packet drop ever occurred, even with a TUN queue
>>>> size of 1 packet. The utilized validation pipeline is available
>>>> under [3].
>>>>
>>>> As the reduction of the TUN queue to a size of down to 5 packets
>>>> showed no further performance hit in the paper, a reduction of the
>>>> default TUN queue size might be desirable to accompany this patch.
>>>> A reduction would obviously reduce buffer bloat and memory
>>>> requirements.
>>>>
>>>> Implementation details:
>>>> - The netdev queue start/stop flow control is utilized.
>>>> - Compatible with multi-queue by only stopping/waking the specific
>>>>   netdevice subqueue.
>>>> - No additional locking is used.
>>>>
>>>> In the tun_net_xmit function:
>>>> - Stopping the subqueue is done when the tx_ring gets full after
>>>>   inserting the SKB into the tx_ring.
>>>> - In the unlikely case when the insertion with ptr_ring_produce
>>>>   fails, the old dropping behavior is used for this SKB.
>>>>
>>>> In the tun_ring_recv function:
>>>> - Waking the subqueue is done after consuming an SKB from the
>>>>   tx_ring when the tx_ring is empty.
>>>>   Waking the subqueue when the tx_ring has any available space,
>>>>   i.e. when it is not full, showed crashes in our testing. We are
>>>>   open to suggestions.
>>>> - When the tx_ring is configured to be small (for example, to hold
>>>>   1 SKB), queuing might be stopped in the tun_net_xmit function
>>>>   while, at the same time, ptr_ring_consume is not able to grab an
>>>>   SKB. This prevents tun_net_xmit from being called again and
>>>>   causes tun_ring_recv to wait indefinitely for an SKB in the
>>>>   blocking wait queue. Therefore, the netdev queue is woken in the
>>>>   wait queue if it has been stopped.
>>>> - Because the tun_struct is required to get the tx_queue for the
>>>>   new txq pointer, the tun_struct is passed to tun_do_read as well.
>>>>   This is likely faster than trying to get it via the tun_file
>>>>   tfile because that requires an RCU lock.
>>>>
>>>> We are open to suggestions regarding the implementation :)
>>>> Thank you for your work!
>>>>
>>>> [1] Link: https://cni.etit.tu-dortmund.de/storages/cni-etit/r/Research/Publications/2025/Gebauer_2025_VTCFall/Gebauer_VTCFall2025_AuthorsVersion.pdf
>>>> [2] Link: https://unix.stackexchange.com/questions/762935/traffic-shaping-ineffective-on-tun-device
>>>> [3] Link: https://github.com/tudo-cni/nodrop
>>>>
>>>> Co-developed-by: Tim Gebauer
>>>> Signed-off-by: Tim Gebauer
>>>> Signed-off-by: Simon Schippers
>>>
>>> I wonder if it would be possible to implement BQL in TUN/TAP?
>>>
>>> https://lwn.net/Articles/454390/
>>>
>>> BQL provides a feedback mechanism to the application when the queue
>>> fills.
>>
>> Thank you very much for your reply,
>> I also thought about BQL before and like the idea!
>
> I would start with this patch series to convert TUN to a driver that
> pauses the stack rather than drops.
>
> Please reword the commit to describe the functional change concisely.
> In general the effect of drops on TCP is well understood. You can
> link to your paper for specific details.

I will remove the paper abstract for the v3 to have a more concise
description. I will also clarify why no packets are dropped anymore.

> I still suggest stopping the ring before a packet has to be dropped.
> Note also that there is a mechanism to requeue an skb rather than
> drop, see dev_requeue_skb and NETDEV_TX_BUSY. But simply pausing
> before empty likely suffices.

As explained before in my reply to Jason, this patch does stop the
netdev queue before a packet has to be dropped. It uses a very similar
approach to the suggested virtio_net. A rough sketch of the resulting
xmit/recv flow is included at the end of this mail.

> Relevant to BQL: did your workload include particularly large packets,
> e.g., TSO? Only then does byte limits vs packet limits matter.

No, in my workload I did not use TSO/GSO. However, I think the most
important aspect is that the BQL algorithm utilizes a dynamic queue
limit. This will in most cases reduce the TUN queue size and reduce
buffer bloat. I now have an idea how to include BQL (see the second
sketch below), but first I will add TAP support in a v3. BQL could then
be added in a v4.

Thank you :)
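For reference, here is a rough sketch of the xmit/recv flow described
above. This is only an illustration of the idea, not the literal patch:
ptr_ring locking, stats and most error paths are omitted, and the
tun->tfiles / tfile->tx_ring naming simply follows the current driver.

static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
{
	struct tun_struct *tun = netdev_priv(dev);
	int txq = skb_get_queue_mapping(skb);
	struct netdev_queue *queue = netdev_get_tx_queue(dev, txq);
	struct tun_file *tfile;

	rcu_read_lock();
	tfile = rcu_dereference(tun->tfiles[txq]);

	if (unlikely(ptr_ring_produce(&tfile->tx_ring, skb))) {
		/* Should not happen once the queue is stopped in time;
		 * keep the old dropping behavior as a fallback.
		 */
		rcu_read_unlock();
		kfree_skb_reason(skb, SKB_DROP_REASON_FULL_RING);
		return NETDEV_TX_OK;
	}

	/* The ring became full with this SKB: pause the stack instead
	 * of dropping the next packet.
	 */
	if (ptr_ring_full(&tfile->tx_ring))
		netif_tx_stop_queue(queue);

	rcu_read_unlock();
	return NETDEV_TX_OK;
}

On the read side, after ptr_ring_consume() in tun_ring_recv():

	/* Wake the stack again once the ring has drained. */
	if (ptr_ring_empty(&tfile->tx_ring) &&
	    netif_tx_queue_stopped(queue))
		netif_tx_wake_queue(queue);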
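And regarding BQL, this is roughly where I currently think the hooks
could go in a later v4. Again only a sketch, under the assumption that
the stop/wake logic above stays in place and that byte accounting is
done per subqueue via the struct netdev_queue *queue from above:

	/* In tun_net_xmit(), after a successful ptr_ring_produce(): */
	netdev_tx_sent_queue(queue, skb->len);

	/* In tun_ring_recv()/tun_do_read(), once the SKB has been
	 * handed to user space:
	 */
	netdev_tx_completed_queue(queue, 1, skb->len);

	/* On ring cleanup/reset paths: */
	netdev_tx_reset_queue(queue);

With BQL enabled, netdev_tx_sent_queue() stops the subqueue once the
dynamic byte limit is exceeded and netdev_tx_completed_queue() restarts
it when enough bytes have completed, so the standing queue should
usually stay well below the 500-packet default while the full/empty
checks above still protect the ptr_ring itself.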