Willem de Bruijn wrote:
> Simon Schippers wrote:
>> Stephen Hemminger wrote:
>>> On Tue, 12 Aug 2025 00:03:48 +0200
>>> Simon Schippers wrote:
>>>
>>>> This patch is the result of our paper with the title "The NODROP
>>>> Patch: Hardening Secure Networking for Real-time Teleoperation by
>>>> Preventing Packet Drops in the Linux TUN Driver" [1]. It deals with
>>>> the tun_net_xmit function, which drops SKBs with the reason
>>>> SKB_DROP_REASON_FULL_RING whenever the tx_ring (TUN queue) is full,
>>>> resulting in reduced TCP performance and packet loss for bursty
>>>> video streams when used over VPNs.
>>>>
>>>> The abstract reads as follows:
>>>> "Throughput-critical teleoperation requires robust and low-latency
>>>> communication to ensure safety and performance. Often, these kinds
>>>> of applications are implemented in Linux-based operating systems and
>>>> transmit over virtual private networks, which ensure encryption and
>>>> ease of use by providing a dedicated tunneling interface (TUN) to
>>>> user space applications. In this work, we identified a specific
>>>> behavior in the Linux TUN driver, which results in significant
>>>> performance degradation due to the sender stack silently dropping
>>>> packets. This design issue drastically impacts real-time video
>>>> streaming, inducing up to 29 % packet loss with noticeable video
>>>> artifacts when the internal queue of the TUN driver is reduced to 25
>>>> packets to minimize latency. Furthermore, a small queue length also
>>>> drastically reduces the throughput of TCP traffic due to many
>>>> retransmissions. Instead, with our open-source NODROP Patch, we
>>>> propose generating backpressure in case of burst traffic or network
>>>> congestion. The patch effectively addresses the packet-dropping
>>>> behavior, hardening real-time video streaming and improving TCP
>>>> throughput by 36 % in high latency scenarios."
>>>>
>>>> In addition to the mentioned performance and latency improvements
>>>> for VPN applications, this patch also allows the proper usage of
>>>> qdiscs. For example, a fq_codel cannot control the queuing delay
>>>> when packets are already dropped in the TUN driver. This issue is
>>>> also described in [2].
>>>>
>>>> The performance evaluation of the paper (see Fig. 4) showed a 4%
>>>> performance hit for a single-queue TUN with the default TUN queue
>>>> size of 500 packets. However, it is important to note that with the
>>>> proposed patch no packet drop ever occurred, even with a TUN queue
>>>> size of 1 packet. The utilized validation pipeline is available
>>>> under [3].
>>>>
>>>> As the reduction of the TUN queue to a size of down to 5 packets
>>>> showed no further performance hit in the paper, a reduction of the
>>>> default TUN queue size might be desirable to accompany this patch.
>>>> A reduction would obviously reduce buffer bloat and memory
>>>> requirements.
>>>>
>>>> Implementation details:
>>>> - The netdev queue start/stop flow control is utilized.
>>>> - Compatible with multi-queue by only stopping/waking the specific
>>>>   netdevice subqueue.
>>>> - No additional locking is used.
>>>>
>>>> In the tun_net_xmit function:
>>>> - Stopping the subqueue is done when the tx_ring gets full after
>>>>   inserting the SKB into the tx_ring.
>>>> - In the unlikely case when the insertion with ptr_ring_produce
>>>>   fails, the old dropping behavior is used for this SKB.
>>>>
>>>> In the tun_ring_recv function:
>>>> - Waking the subqueue is done after consuming an SKB from the
>>>>   tx_ring when the tx_ring is empty.
>>>>   Waking the subqueue when the tx_ring has any available space,
>>>>   i.e. when it is not full, showed crashes in our testing. We are
>>>>   open to suggestions.
>>>> - When the tx_ring is configured to be small (for example, to hold
>>>>   1 SKB), queuing might be stopped in the tun_net_xmit function
>>>>   while, at the same time, ptr_ring_consume is not able to grab an
>>>>   SKB. This prevents tun_net_xmit from being called again and
>>>>   causes tun_ring_recv to wait indefinitely for an SKB in the
>>>>   blocking wait queue. Therefore, the netdev queue is woken in the
>>>>   wait queue if it has been stopped.
>>>> - Because the tun_struct is required to get the tx_queue for the
>>>>   new txq pointer, the tun_struct is passed to tun_do_read as well.
>>>>   This is likely faster than trying to get it via the tun_file
>>>>   tfile because that requires an RCU lock.
>>>>
>>>> We are open to suggestions regarding the implementation :)
>>>> Thank you for your work!
>>>>
>>>> [1] Link: https://cni.etit.tu-dortmund.de/storages/cni-etit/r/Research/Publications/2025/Gebauer_2025_VTCFall/Gebauer_VTCFall2025_AuthorsVersion.pdf
>>>> [2] Link: https://unix.stackexchange.com/questions/762935/traffic-shaping-ineffective-on-tun-device
>>>> [3] Link: https://github.com/tudo-cni/nodrop
>>>>
>>>> Co-developed-by: Tim Gebauer
>>>> Signed-off-by: Tim Gebauer
>>>> Signed-off-by: Simon Schippers
>>>
>>> I wonder if it would be possible to implement BQL in TUN/TAP?
>>>
>>> https://lwn.net/Articles/454390/
>>>
>>> BQL provides a feedback mechanism to the application when the queue
>>> fills.
>>
>> Thank you very much for your reply,
>> I also thought about BQL before and like the idea!
>
> I would start with this patch series to convert TUN to a driver that
> pauses the stack rather than drops.
>
> Please reword the commit to describe the functional change concisely.
> In general the effect of drops on TCP is well understood. You can
> link to your paper for specific details.

I will remove the paper abstract for the v3 to have a more concise
description. I will also clarify why no packets are dropped anymore.

> I still suggest stopping the ring before a packet has to be dropped.
> Note also that there is a mechanism to requeue an skb rather than
> drop, see dev_requeue_skb and NETDEV_TX_BUSY. But simply pausing
> before empty likely suffices.

As explained before in my reply to Jason, this patch does stop the
netdev queue before a packet has to be dropped. It uses a very similar
approach to the suggested virtio_net. A rough sketch of the resulting
xmit/recv flow is included at the end of this mail.

> Relevant to BQL: did your workload include particularly large packets,
> e.g., TSO? Only then does byte limits vs packet limits matter.

No, in my workload I did not use TSO/GSO. However, I think the most
important aspect is that the BQL algorithm utilizes a dynamic queue
limit. This will in most cases reduce the TUN queue size and reduce
buffer bloat. I now have an idea how to include BQL (see the second
sketch below), but first I will add TAP support in a v3. BQL could then
be added in a v4.

Thank you :)
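For reference, here is a rough sketch of the xmit/recv flow described
above. This is only an illustration of the idea, not the literal patch:
ptr_ring locking, stats and most error paths are omitted, and the
tun->tfiles / tfile->tx_ring naming simply follows the current driver.

static netdev_tx_t tun_net_xmit(struct sk_buff *skb, struct net_device *dev)
{
	struct tun_struct *tun = netdev_priv(dev);
	int txq = skb_get_queue_mapping(skb);
	struct netdev_queue *queue = netdev_get_tx_queue(dev, txq);
	struct tun_file *tfile;

	rcu_read_lock();
	tfile = rcu_dereference(tun->tfiles[txq]);

	if (unlikely(ptr_ring_produce(&tfile->tx_ring, skb))) {
		/* Should not happen once the queue is stopped in time;
		 * keep the old dropping behavior as a fallback.
		 */
		rcu_read_unlock();
		kfree_skb_reason(skb, SKB_DROP_REASON_FULL_RING);
		return NETDEV_TX_OK;
	}

	/* The ring became full with this SKB: pause the stack instead
	 * of dropping the next packet.
	 */
	if (ptr_ring_full(&tfile->tx_ring))
		netif_tx_stop_queue(queue);

	rcu_read_unlock();
	return NETDEV_TX_OK;
}

On the read side, after ptr_ring_consume() in tun_ring_recv():

	/* Wake the stack again once the ring has drained. */
	if (ptr_ring_empty(&tfile->tx_ring) &&
	    netif_tx_queue_stopped(queue))
		netif_tx_wake_queue(queue);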
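And regarding BQL, this is roughly where I currently think the hooks
could go in a later v4. Again only a sketch, under the assumption that
the stop/wake logic above stays in place and that byte accounting is
done per subqueue via the struct netdev_queue *queue from above:

	/* In tun_net_xmit(), after a successful ptr_ring_produce(): */
	netdev_tx_sent_queue(queue, skb->len);

	/* In tun_ring_recv()/tun_do_read(), once the SKB has been
	 * handed to user space:
	 */
	netdev_tx_completed_queue(queue, 1, skb->len);

	/* On ring cleanup/reset paths: */
	netdev_tx_reset_queue(queue);

With BQL enabled, netdev_tx_sent_queue() stops the subqueue once the
dynamic byte limit is exceeded and netdev_tx_completed_queue() restarts
it when enough bytes have completed, so the standing queue should
usually stay well below the 500-packet default while the full/empty
checks above still protect the ptr_ring itself.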