In particular: - Mention that grouping of chains in tables is irrelevant to the evaluation order. - Clarify that priorities only define the ordering of chains per hook. - Improve the potentially ambiguous wording “lower priority values have precedence over higher ones”, which could be mistaken to mean that rules from lower priority chains might “win” over those from higher ones (which is however only the case if they drop/reject packets). The new wording merely describes which chains are evaluated first, implicitly referring the question which verdict “wins” to the section where verdicts are described, and also should work when lower priority chains mangle packets (in which case they might actually be considered as having “precedence”). Link: https://lore.kernel.org/netfilter-devel/3c7ddca7029fa04baa2402d895f3a594a6480a3a.camel@scientia.org/T/#t Signed-off-by: Christoph Anton Mitterer --- doc/nft.txt | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/doc/nft.txt b/doc/nft.txt index 87129819..88c08618 100644 --- a/doc/nft.txt +++ b/doc/nft.txt @@ -453,8 +453,10 @@ interface specified in the *device* parameter. The *priority* parameter accepts a signed integer value or a standard priority name which specifies the order in which chains with the same *hook* value are -traversed. The ordering is ascending, i.e. lower priority values have precedence -over higher ones. +traversed (regardless of the table to which they belong). The ordering is +ascending, i.e. per hook, chains with lower priority values are evaluated before +those with higher ones. The ordering of chains with the same priority value is +undefined. With *nat* type chains, there's a lower excluding limit of -200 for *priority* values, because conntrack hooks at this priority and NAT requires it. -- 2.51.0 - Clarify that a terminating statement also prevents the execution of later statements in the same rule and give an example about that. 
- Correct that `accept` won’t terminate the evaluation of the ruleset (which is generally used for the whole set of all chains, rules, etc.) but only that of the current base chain (and any regular chains called from that). Indicate that `accept` only accepts the packet from the current base chain’s point of view. Clarify that not only chains of a later hook could still drop the packet, but also ones from the same hook if they have a higher priority. - Overhaul the description of `jump`/`goto`/`return`. `jump` only explains what the statement causes from the point of view of the new chain (that is: not, how the returning works), which includes that an implicit `return` is issued at the end of the chain. `goto` is explained in reference to `jump`. `return` describes abstractly how the return position is determined and what happens if there’s no position to return to (but not for example where an implicit `return` is issued). - Various other minor improvements/clarifications to wording. - List and explain verdict-like statements like `reject` which internally imply `accept` or `drop`. Further explain that with respect to evaluation these behave like their respectively implied verdicts. Link: https://lore.kernel.org/netfilter-devel/3c7ddca7029fa04baa2402d895f3a594a6480a3a.camel@scientia.org/T/#t Signed-off-by: Christoph Anton Mitterer --- doc/statements.txt | 84 ++++++++++++++++++++++++++++++++-------------- 1 file changed, 58 insertions(+), 26 deletions(-) diff --git a/doc/statements.txt b/doc/statements.txt index 6226713b..f812dec8 100644 --- a/doc/statements.txt +++ b/doc/statements.txt @@ -1,6 +1,7 @@ -VERDICT STATEMENT -~~~~~~~~~~~~~~~~~ -The verdict statement alters control flow in the ruleset and issues policy decisions for packets. +VERDICT STATEMENTS +~~~~~~~~~~~~~~~~~~ +The verdict statements alter control flow in the ruleset and issue policy +decisions for packets. [verse] ____ @@ -10,40 +11,71 @@ ____ 'CHAIN' := 'chain_name' | *{* 'statement' ... 
*}* ____ -*accept* and *drop* are absolute verdicts -- they terminate ruleset evaluation immediately. +*accept* and *drop* are absolute verdicts, which immediately terminate the +evaluation of the current rule, i.e. even any later statements of the current +rule won’t get executed. + +.*counter* will get executed: +------------------------------ +… counter accept +------------------------------ + +.*counter* won’t get executed: +------------------------------ +… accept counter +------------------------------ + +Further: [horizontal] -*accept*:: Terminate ruleset evaluation and accept the packet. -The packet can still be dropped later by another hook, for instance accept -in the forward hook still allows one to drop the packet later in the postrouting hook, -or another forward base chain that has a higher priority number and is evaluated -afterwards in the processing pipeline. -*drop*:: Terminate ruleset evaluation and drop the packet. -The drop occurs instantly, no further chains or hooks are evaluated. -It is not possible to accept the packet in a later chain again, as those -are not evaluated anymore for the packet. +*accept*:: Terminate the evaluation of the current base chain (and any regular +chains called from it) and accept the packet from their point of view. +The packet may however still be dropped by either another chain with a higher +priority of the same hook or any chain of a later hook. +For example, an *accept* in a chain of the *forward* hook still allows one to +*drop* (or *reject*, etc.) the packet in another *forward* hook base chain (and +any regular chains called from it) that has a higher priority number as well as +later in a chain of the *postrouting* hook. +*drop*:: Terminate ruleset evaluation and drop the packet. This occurs +instantly, no further chains of any hooks are evaluated and it is thus not +possible to again accept the packet in a higher priority or later chain, as +those are not evaluated anymore for the packet. 
+*jump* 'CHAIN':: Store the current position in the call stack of chains and + continue evaluation at the first rule of 'CHAIN'. + When the end of 'CHAIN' is reached, an implicit *return* verdict is issued. + When an absolute verdict is issued (respectively implied by a verdict-like + statement) in 'CHAIN', evaluation terminates as described above. +*goto* 'CHAIN':: Equal to *jump* except that the current position is not stored + in the call stack of chains. +*return*:: End evaluation of the current chain, pop the most recently added + position from the call stack of chains and continue evaluation after that + position. + When there’s no position to pop (which is the case when the current chain is + either the base chain or a regular chain that was reached solely via *goto* + verdicts) end evaluation of the current base chain (and any regular chains + called from it) using the base chain’s policy as implicit verdict. +*continue*:: Continue ruleset evaluation with the next rule. This + is the default behaviour in case a rule issues no verdict. *queue*:: Terminate ruleset evaluation and queue the packet to userspace. Userspace must provide a drop or accept verdict. In case of accept, processing resumes with the next base chain hook, not the rule following the queue verdict. -*continue*:: Continue ruleset evaluation with the next rule. This - is the default behaviour in case a rule issues no verdict. -*return*:: Return from the current chain and continue evaluation at the - next rule in the last chain. If issued in a base chain, it is equivalent to the - base chain policy. -*jump* 'CHAIN':: Continue evaluation at the first rule in 'CHAIN'. The current - position in the ruleset is pushed to a call stack and evaluation will continue - there when the new chain is entirely evaluated or a *return* verdict is issued. - In case an absolute verdict is issued by a rule in the chain, ruleset evaluation - terminates immediately and the specific action is taken. 
-*goto* 'CHAIN':: Similar to *jump*, but the current position is not pushed to the - call stack, meaning that after the new chain evaluation will continue at the last - chain instead of the one containing the goto statement. An alternative to specifying the name of an existing, regular chain in 'CHAIN' is to specify an anonymous chain ad-hoc. Like with anonymous sets, it can't be referenced from another rule and will be removed along with the rule containing it. +All the above applies analogously to statements that imply a verdict: +*redirect*, *dnat*, *snat* and *masquerade* eventually issue an +*accept* verdict internally. +*reject* and *synproxy* eventually issue a *drop* verdict internally. +These statements thus behave like their implied verdicts, but with side effects. + +For example, a *reject* also immediately terminates the evaluation of the +current rule, overrules any *accept* from any other chains and can itself not be +overruled, while the various NAT statements may be overruled by other *drop* +verdicts or by statements that imply one. + .Using verdict statements ------------------- # process packets from eth0 and the internal network in from_lan -- 2.51.0 Statements are elements of rules. Non-terminal statements are in particular passive with respect to their rules (and thus automatically with respect to the whole ruleset). In “Continue ruleset evaluation”, it’s not necessary to mention the ruleset as it’s obvious that the evaluation of the current chain will be continued. Signed-off-by: Christoph Anton Mitterer --- doc/nft.txt | 6 +++--- doc/statements.txt | 4 ++-- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/doc/nft.txt b/doc/nft.txt index 88c08618..a32fb10c 100644 --- a/doc/nft.txt +++ b/doc/nft.txt @@ -910,9 +910,9 @@ actions, such as logging, rejecting a packet, etc. + Statements exist in two kinds. 
Terminal statements unconditionally terminate evaluation of the current rule, non-terminal statements either only conditionally or never terminate evaluation of the current rule, in other words, -they are passive from the ruleset evaluation perspective. There can be an -arbitrary amount of non-terminal statements in a rule, but only a single -terminal statement as the final statement. +they are passive from the rule evaluation perspective. There can be an arbitrary +amount of non-terminal statements in a rule, but only a single terminal +statement as the final statement. include::statements.txt[] diff --git a/doc/statements.txt b/doc/statements.txt index f812dec8..850c32cb 100644 --- a/doc/statements.txt +++ b/doc/statements.txt @@ -54,8 +54,8 @@ those are not evaluated anymore for the packet. either the base chain or a regular chain that was reached solely via *goto* verdicts) end evaluation of the current base chain (and any regular chains called from it) using the base chain’s policy as implicit verdict. -*continue*:: Continue ruleset evaluation with the next rule. This - is the default behaviour in case a rule issues no verdict. +*continue*:: Continue evaluation with the next rule. This is the default + behaviour in case a rule issues no verdict. *queue*:: Terminate ruleset evaluation and queue the packet to userspace. Userspace must provide a drop or accept verdict. In case of accept, processing resumes with the next base chain hook, not the rule following the queue verdict. 
-- 2.51.0 Signed-off-by: Christoph Anton Mitterer --- doc/nft.txt | 102 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 102 insertions(+) diff --git a/doc/nft.txt b/doc/nft.txt index a32fb10c..20c63f98 100644 --- a/doc/nft.txt +++ b/doc/nft.txt @@ -560,6 +560,108 @@ table inet filter { nft delete rule inet filter input handle 5 ------------------------- +OVERALL EVALUATION OF THE RULESET +--------------------------------- +This is a summary of how the ruleset is evaluated. + +* Even if a packet is accepted by the ruleset (and thus by netfilter), it may + still get discarded by other means, for example Linux generally ignores + various ICMP types and there are sysctl options like + `net.ipv{4,6}.conf.*.forwarding` or `net.ipv4.conf.*.rp_filter`. +* Tables are merely a concept of nftables to structure the ruleset and not known + to netfilter itself. + They are thus irrelevant with respect to netfilter’s evaluation of the + ruleset. +* Packets traverse the network stack and at various hooks (see + <> above for lists of hooks per address family) they’re + evaluated by any base chains attached to these hooks. +* Base chains may call regular chains and regular chains may call other regular + chains (via *jump* and *goto* verdicts), in which case evaluation continues in + the called chain. + Base chains themselves cannot be called and only chains of the same table can + be called. +* For each hook, the attached chains are evaluated in order of their priorities. + Chains with lower priority values are evaluated before those with higher ones. + The order of chains with the same priority value is undefined. +* An *accept* verdict (including an implicit one via the base chain’s policy) + ends the evaluation of the current base chain (and any regular chains called + from that). + It accepts the packet only with respect to the current base chain. 
Any other + base chain (or regular chain called by such) with a higher priority of the + same hook as well as any other base chain (or regular chain called by such) of + any later hook may however still ultimately *drop* (which might also be done + via verdict-like statements that imply *drop*, like *reject*) the packet with + a corresponding verdict (with consequences as described below for *drop*). + Thus and merely from netfilter’s point of view, a packet is only ultimately + accepted if none of the chains (regardless of their tables) that are attached + to any of the respectively relevant hooks issues a *drop* verdict (be it + explicitly or implicitly by policy or via a verdict-like statement that + implies *drop*, for example *reject*), which already means that there has to + be at least one *accept* verdict (be it explicitly or implicitly by policy). + All this applies analogously to verdict-like statements that imply *accept*, + for example the NAT statements. +* A *drop* verdict (including an implicit one via the base chain’s policy) + immediately ends the evaluation of the whole ruleset and ultimately drops the + packet. + Unlike with an *accept* verdict, no further chains of any hook and regardless + of their table get evaluated and it’s therefore not possible to have a *drop* + verdict overruled. + Thus, if any base chain uses drop as its policy, the same base chain (or any + regular chain directly or indirectly called by it) must accept a packet or it + is ensured to be ultimately dropped by it. + All this applies analogously to verdict-like statements that imply *drop*, + for example *reject*. 
+* Given the semantics of *accept*/*drop* and only with respect to the ultimate + decision of whether a packet is accepted or dropped, the ordering of the + various base chains per hook via their priorities matters only insofar as + any of them modifies the packet or its meta data and that has an influence on + the verdicts issued by the chains – other than that, the ordering shouldn’t + matter (except for performance and other side effects). + It also means that short-circuiting the ultimate decision is only possible via + *drop* verdicts (respectively verdict-like statements that imply *drop*, for + example *reject*). +* A *jump* verdict causes the current position to be stored in the call stack of + chains and evaluation to continue at the beginning of the called regular + chain. + Called chains must be from the same table and cannot be base chains. + When the end of the called chain is reached, an implicit *return* verdict is + issued. + Other verdicts (respectively verdict-like statements) are processed as + described above and below. +* A *goto* verdict is equal to *jump* except that the current position is not + stored in the call stack of chains. +* A *return* verdict ends the evaluation of the current chain, pops the most + recently added position from the call stack of chains and causes evaluation to + continue after that position. + When there’s no position to pop (which is the case when the current chain is + either the base chain or a regular chain that was reached solely via *goto* + verdicts) it ends the evaluation of the current base chain (and any regular + chains called from it) using the base chain’s policy as implicit verdict. +* Examples for *jump*/*goto*/*return*: + * 'base' {*jump*}→ 'regular-1' {*jump*}→ 'regular-2' + At the end of 'regular-2' or when a *return* is issued in that, evaluation + continues after the *jump* position in 'regular-1'. 
+ At the end of 'regular-1' or when a *return* is issued in that, evaluation + continues after the *jump* position in 'base'. + * 'base' {*jump*}→ 'regular-1' {*goto*}→ 'regular-2' + At the end of 'regular-2' or when a *return* is issued in that, evaluation + continues after the *jump* position in 'base'. + * 'base' {*jump*}→ 'regular-1' {*jump*}→ 'regular-2' {*goto*}→ 'regular-3' + At the end of 'regular-3' or when a *return* is issued in that, evaluation + continues after the *jump* position in 'regular-1'. + At the end of 'regular-1' or when a *return* is issued in that, evaluation + continues after the *jump* position in 'base'. + * 'base' {*jump*}→ 'regular-1' {*goto*}→ 'regular-2' {*goto*}→ 'regular-3' + At the end of 'regular-3' or when a *return* is issued in that, evaluation + continues after the *jump* position in 'base'. +* Verdicts (that is: *accept*, *drop*, *jump*, *goto*, *return*, *continue* and + *queue*) as well as statements that imply a verdict (like *reject* or the NAT + statements) also end the evaluation of any later statements in their + respective rules (respectively cause an error when loading such rules). + For example in `… counter accept` the `counter` statement is processed, but in + `… accept counter` it is not. + This does not apply to the `comment` statement, which is always evaluated. + SETS ---- nftables offers two kinds of set concepts. Anonymous sets are sets that have no -- 2.51.0 Signed-off-by: Christoph Anton Mitterer --- doc/data-types.txt | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/doc/data-types.txt b/doc/data-types.txt index 18af266a..47a0d25a 100644 --- a/doc/data-types.txt +++ b/doc/data-types.txt @@ -26,6 +26,22 @@ integer The bitmask type (*bitmask*) is used for bitmasks. 
+In expressions the bits of a bitmask may be specified as *'bit'[,'bit']...* with +'bit' being the value of the bit or a pre-defined symbolic constant, if any (for +example *ct state*’s bit 0x1 has the symbolic constant `new`). + +Equality of a value with such a bitmask is given if the value has any of the +bitmask’s bits set (and optionally others). + +The syntax *'expression' 'value' / 'mask'* is identical to +*'expression' and 'mask' == 'value'*. +For example `tcp flags syn,ack / syn,ack,fin,rst` is the same as +`tcp flags and (syn|ack|fin|rst) == syn|ack`. + +It should further be noted that *'expression' 'bit'[,'bit']...* is not the same +as *'expression' {'bit'[,'bit']...}*. + + STRING TYPE ~~~~~~~~~~~~ [options="header"] -- 2.51.0 Currently, `nft` doesn’t call `setlocale(3)` and thus `glob(3)` uses the `C` locale. Document this as it’s possibly relevant to the ordering of included rules. This also makes the collation order “official” so any future localisation would need to adhere to that. Signed-off-by: Christoph Anton Mitterer --- doc/nft.txt | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/doc/nft.txt b/doc/nft.txt index 20c63f98..3fef1882 100644 --- a/doc/nft.txt +++ b/doc/nft.txt @@ -165,8 +165,8 @@ Include statements support the usual shell wildcard symbols (*,?,[]). Having no matches for an include statement is not an error, if wildcard symbols are used in the include statement. This allows having potentially empty include directories for statements like **include "/etc/firewall/rules/*"**. The wildcard -matches are loaded in alphabetical order. Files beginning with dot (.) are not -matched by include statements. +matches are loaded in the collation order of the C locale. Files beginning with +dot (.) are not matched by include statements. 
SYMBOLIC VARIABLES ~~~~~~~~~~~~~~~~~~ -- 2.51.0 Signed-off-by: Christoph Anton Mitterer --- doc/data-types.txt | 1 + doc/nft.txt | 10 ++++++++++ 2 files changed, 11 insertions(+) diff --git a/doc/data-types.txt b/doc/data-types.txt index 47a0d25a..dad7e31b 100644 --- a/doc/data-types.txt +++ b/doc/data-types.txt @@ -40,6 +40,7 @@ For example `tcp flags syn,ack / syn,ack,fin,rst` is the same as It should further be noted that *'expression' 'bit'[,'bit']...* is not the same as *'expression' {'bit'[,'bit']...}*. +See <> above. STRING TYPE diff --git a/doc/nft.txt b/doc/nft.txt index 3fef1882..4d1daf5c 100644 --- a/doc/nft.txt +++ b/doc/nft.txt @@ -764,6 +764,16 @@ Example: When the set contains range *1.2.3.1-1.2.3.4*, then adding element *1.2 effect. Adding *1.2.3.5* changes the existing range to cover *1.2.3.1-1.2.3.5*. Without this flag, *1.2.3.2* can not be added and *1.2.3.5* is inserted as a new entry. +Equality of a value with a set is given if the value matches exactly one value +in the set. +It shall be noted that for bitmask values this means that +*'expression' 'bit'[,'bit']...* (which yields true if *any* of the bits are set) +is not the same as *'expression' {'bit'[,'bit']...}* (which yields true if +exactly one of the bits is set). +It may however be (effectively) the same, in cases like +`ct state established,related` and `ct state {established,related}`, where these +states are mutually exclusive. + MAPS ----- [verse] -- 2.51.0
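For illustration only (not part of any patch in this series): the evaluation order documented above can be sketched with a small ruleset. The table name `demo`, the chain names `early` and `late`, and port 8080 are made up for this example. Going by the wording proposed in this series, the *accept* in `early` should only end the evaluation of that base chain, so the *drop* in `late` (same hook, higher priority value) would still discard the packet.

```
# Hypothetical ruleset; all names and the port are assumptions for this sketch.
table inet demo {
	chain early {
		# Lower priority value, so evaluated first for the input hook.
		type filter hook input priority -10; policy accept;

		# counter precedes the verdict, so it is executed;
		# accept only ends evaluation of this base chain.
		tcp dport 8080 counter accept
	}

	chain late {
		# Higher priority value, so evaluated after "early".
		type filter hook input priority 10; policy accept;

		# Still reached despite the accept above; drop is final.
		tcp dport 8080 counter drop
	}
}
```

Loading this on a test machine with `nft -f` and sending traffic to the port should, under these assumptions, increment both counters while the connection attempt fails. Swapping the two verdicts would leave the second counter untouched, since a *drop* ends evaluation for the packet.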