
Patrick McHardy's blog

Tue, 02 Sep 2008

nftables part II


A lot has happened since my last posting about nftables, so here's an update on the current status. Unfortunately I failed to reach my goal of being able to replace my iptables-based firewall by the end of August, but I'm getting close :)

New kernel modules

I've added a few new kernel modules for missing functionality:

  • A conntrack module, replacing xt_state, xt_conntrack, xt_helper and xt_connmark. It loads a specified item from struct nf_conn and related structs into a user defined register.
  • A logging module, which simply calls nf_log to pass the packet to the active logging backend.
  • A limit module, replacing xt_limit. While at it, I extended the possible range of values so we can use higher limits and don't lose as much precision. Hashlimit is still missing; I'm considering merging the two into a single module.
  • A module to wire up bridging with nftables. It consists only of a few lines of code to register a data structure containing the highest hooks value, a module reference and the family. Transport header payload matching doesn't work yet though since it also needs to initialize the offsets.
For all these modules I've also implemented their respective userspace counterparts in libnl and the nftables userspace frontend. With these modules, a large part of the existing iptables and ip6tables matches are covered. More about that below ...

Payload expressions

I've added tons of new payload descriptions. The following headers are now supported:

  • Ethernet header: fully covered
  • IP header: no options
  • ICMP header: fully covered
  • IPv6 header: fully covered
  • AH header: fully covered
  • ESP header: fully covered
  • COMP header: fully covered
  • UDP/UDP-Lite header: fully covered
  • TCP header: no options
  • DCCP header: only ports
  • SCTP header: only base header members (ports, vtag, checksum)
Effectively this means it fully covers the fixed size portion of all headers. Chained header parsing can't be expressed very efficiently using offsets and lengths, so I'll probably add special modules for it. So far it seems we need an IP-style option parsing module for IP, TCP and DCCP options, as well as an IPv6 extension header parsing module. And then there is SCTP ...
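To illustrate why options don't fit the offset/length model, here is a minimal Python sketch of an IP-style option walk (this is not nftables code; the name walk_options is made up for the example). The position of each option depends on the lengths of all options before it, so it can't be described by a fixed offset:

```python
def walk_options(data: bytes):
    """Yield (kind, payload) for each option in an IP/TCP-style option area."""
    i = 0
    while i < len(data):
        kind = data[i]
        if kind == 0:            # End of Option List: stop parsing
            return
        if kind == 1:            # No-Operation: single byte, no length field
            i += 1
            continue
        if i + 1 >= len(data):
            raise ValueError("truncated option header")
        length = data[i + 1]     # length includes the kind and length bytes
        if length < 2 or i + length > len(data):
            raise ValueError("bad option length")
        yield kind, data[i + 2:i + length]
        i += length


# TCP options: NOP, NOP, timestamp (kind 8, length 10), MSS (kind 2, length 4)
opts = bytes([1, 1, 8, 10, 0, 0, 0, 1, 0, 0, 0, 2, 2, 4, 0x05, 0xB4])
found = dict(walk_options(opts))
print(found[2].hex())
```

Here the MSS option only ends up at byte 12 because two NOPs and a timestamp option happen to precede it.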

Optimizations

Constant folding

Constant folding is now implemented; propagation of operations to the constant side is still missing.
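As a rough idea of what folding means here, a small Python sketch over a toy expression tree (the classes Const, Payload and BinOp are invented for the example and don't correspond to the real data structures). Subtrees consisting only of constants are evaluated once, dynamic parts are left alone:

```python
from dataclasses import dataclass
import operator


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Payload:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str
    left: object
    right: object


OPS = {"&": operator.and_, "|": operator.or_, "^": operator.xor,
       "<<": operator.lshift, ">>": operator.rshift}


def fold(expr):
    """Replace every all-constant binary operation by its result."""
    if isinstance(expr, BinOp):
        left, right = fold(expr.left), fold(expr.right)
        if isinstance(left, Const) and isinstance(right, Const):
            return Const(OPS[expr.op](left.value, right.value))
        return BinOp(expr.op, left, right)
    return expr


# ip daddr & (0xf0 | 0x0f): only the constant subtree is folded
folded = fold(BinOp("&", Payload("ip daddr"),
                    BinOp("|", Const(0xF0), Const(0x0F))))
print(folded)
```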

Adjacent payload load merging

I've added a first rule level optimization, adjacent payload expression merging. When payload expressions refer to consecutive header fields, we can merge the load operations:

  • when the payload expressions are contained in match statements and the match expressions are equality expressions. In that case, we can simply join all the LHS and the RHS expressions (after reordering them to match the order of the fields in the header).

    Syntax: ip saddr 192.168.0.1 daddr 192.168.0.100
           [ payload load 8b @ network header + 12 => reg 1 ] 
           [ cmp reg 1 0x0100a8c0 0x6400a8c0 ] 
       
  • when the payload expressions are part of a concatenation. This is by definition a merged expression.
During ruleset dumping, the merged expressions are expanded again to the original components.
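The merge step for equality matches can be sketched in a few lines of Python (a simplified model, not the real implementation: each match is assumed to be an (offset, length, value) tuple relative to the same header base, and merge_adjacent is a made-up name). Like the current code, it ignores maximum size and alignment:

```python
import ipaddress


def merge_adjacent(matches):
    """Merge equality matches on consecutive header fields.

    matches: list of (offset, length, value_bytes), all relative to the
    same header base.
    """
    merged = []
    for offset, length, value in sorted(matches):   # header field order
        if merged and merged[-1][0] + merged[-1][1] == offset:
            prev_off, prev_len, prev_val = merged[-1]
            merged[-1] = (prev_off, prev_len + length, prev_val + value)
        else:
            merged.append((offset, length, value))
    return merged


# ip saddr 192.168.0.1 daddr 192.168.0.100, deliberately given out of order
saddr = (12, 4, ipaddress.IPv4Address("192.168.0.1").packed)
daddr = (16, 4, ipaddress.IPv4Address("192.168.0.100").packed)
merged = merge_adjacent([daddr, saddr])
print(merged)
```

The result is a single 8 byte load at network header offset 12 compared against both addresses at once, as in the example above.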

So far the expressions are mindlessly merged together, without regard to maximum size, alignment or whatever. The size must of course be taken into account since the kernel data types are fixed. Alignment currently doesn't matter, but I'm planning to optimize small, aligned data loads in the kernel, either by overloading the evaluation function or just special-casing. Providing the ability to overload operations probably would be useful for other common case optimizations.

Value range tracking

This is not fully finished yet, but will allow two nice things when it is. The purpose is to track constraints of dynamic expressions when operations are applied to them. For instance, it's easy to see that the expression "ip daddr & 0xf" can only take the values 0x0-0xf. More generally, we can track constraints bitwise and use them to determine the possible input range for lookups in dynamic sets, which in turn lets us choose the optimal representation. It can also be used to detect ineffective matches, but since it only works on each expression separately, this covers just a subset of ineffective rule detection.
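A minimal Python sketch of bitwise tracking, assuming expressions are modelled as nested tuples (the representation and the name possible_bits are invented for the example). For every expression it computes a mask of the bits that may be non-zero, which gives an upper bound on the value:

```python
def possible_bits(expr):
    """Return a mask of the bits that may be set in the value of expr."""
    kind = expr[0]
    if kind == "payload":        # ("payload", width_in_bits): any bit may be set
        return (1 << expr[1]) - 1
    if kind == "const":          # ("const", value): exactly these bits are set
        return expr[1]
    op, left, right = expr       # (op, left_expr, right_expr)
    lmask, rmask = possible_bits(left), possible_bits(right)
    if op == "&":
        return lmask & rmask     # a bit survives only if both sides may set it
    if op in ("|", "^"):
        return lmask | rmask     # safe over-approximation
    raise ValueError(f"unsupported operation {op!r}")


# ip daddr & 0xf
mask = possible_bits(("&", ("payload", 32), ("const", 0xF)))
print(hex(mask))
```

A set lookup on this expression needs at most 16 entries, so a plain array would do instead of a hash or tree.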

Type checks and error reporting

Expressions are fully checked for type compatibility now. This includes sets, maps and concatenations.

  • Basic type mismatches:

    Syntax: rule add filter output ip daddr 22
        :1:33-34: Error: Datatype mismatch: expected IPv4 address, got numeric
        rule add filter output ip daddr 22
                               ~~~~~~~~ ^^
       
  • Set type mismatches:

    Syntax: rule add filter output ip daddr { 22, 23}
        :1:35-36: Error: Datatype mismatch: expected IPv4 address, got numeric
        rule add filter output ip daddr { 22, 23}
                               ~~~~~~~~   ^^     
       
  • Map type mismatches (only RHS shown, LHS is similar to set):

    Syntax: rule add filter output meta mark == ip daddr map { 192.168.0.1 => 10.0.0.1 }
        :1:67-74: Error: Datatype mismatch: expected numeric, got IPv4 address
        rule add filter output meta mark == ip daddr map { 192.168.0.1 => 10.0.0.1 }
                               ~~~~~~~~~                                  ^^^^^^^^  
       
  • Concatenation type mismatches:

    Syntax: rule add filter output ip saddr . daddr { 192.168.0.1 . 22 }
        :1:60-61: Error: Datatype mismatch: expected IPv4 address, got numeric
        rule add filter output ip saddr . daddr { 192.168.0.1 . 22 }
                                          ~~~~~                 ^^  
       
Type checks happen during type promotion, which needs to fully expand the expression, so as shown in the map example, type checks are even performed across multiple operations.
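How such a report can be produced from location information is sketched below in Python (purely illustrative: the Expr class and check_relational are invented names, and the datatypes are plain strings). Each expression remembers its column, so the checker can underline both sides of the mismatch:

```python
from dataclasses import dataclass


@dataclass
class Expr:
    text: str
    dtype: str
    start: int                   # 1-based column of the first character

    @property
    def end(self):
        return self.start + len(self.text) - 1


def check_relational(line, lhs, rhs, lineno=1):
    """Return an error report if the datatypes differ, else None."""
    if lhs.dtype == rhs.dtype:
        return None
    marks = [" "] * len(line)
    for expr, char in ((lhs, "~"), (rhs, "^")):
        for col in range(expr.start, expr.end + 1):
            marks[col - 1] = char
    header = (f":{lineno}:{rhs.start}-{rhs.end}: Error: Datatype mismatch: "
              f"expected {lhs.dtype}, got {rhs.dtype}")
    return "\n".join([header, line, "".join(marks).rstrip()])


line = "rule add filter output ip daddr 22"
lhs = Expr("ip daddr", "IPv4 address", line.index("ip daddr") + 1)
rhs = Expr("22", "numeric", line.rindex("22") + 1)
report = check_relational(line, lhs, rhs)
print(report)
```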

Open problems

There have been more changes, but I'll write about those another time. There are of course also a lot of open problems, mostly nothing too complicated, but also a few tricky ones.

Match expression parsing

So far, all header expressions except addresses are treated as numerical values. So you can't say "tcp flags SYN/SYN,ACK", but have to specify it numerically. This needs to be fixed of course. The main problem is that both sides of a match expression are constructed through separate productions, for example the rule for a relational expression is:

  relational_expr	: expr	relational_op	expr
 
The LHS expression might be a payload expression and the RHS a constant to compare against. When parsing the RHS, we don't have the necessary context to accept special tokens related to the LHS. The reason for specifying the grammar like this is that dynamic expressions can occur not only in matches, but also as arguments to targets or, for instance, mappings.

There are basically two possibilities to fix this that I can think of.
  • Introduce new subtypes for every numerical type that can take symbolic values, like TCP flags, IP options, realms, etc. The downside is that the number of types is potentially huge.
  • Add special parsing callbacks to every dynamic expression for the RHS and accept unquoted strings as a "to be resolved" type. The main problem with this approach is that it's pretty likely that sooner or later, keywords and arguments would clash, requiring increasing amounts of ugliness in the grammar.
Other suggestions are happily accepted because I'm not fully convinced of either possibility, though I'm leaning towards the first one. The alternative that's sounding more and more attractive is to use a hand written parser.
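To make the first possibility a bit more concrete, here is a Python sketch of a numeric subtype that carries its own symbol table (all names are hypothetical; only the TCP flag bit values are the standard ones). Once the type of the LHS is known, the RHS string can be resolved against that table. How the two sides of the "/" are interpreted is deliberately left open:

```python
# Symbol table for a hypothetical "TCP flags" subtype of the numeric type
TCP_FLAGS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04,
             "PSH": 0x08, "ACK": 0x10, "URG": 0x20}


def parse_symbolic(text, symbols):
    """Resolve a comma separated list of symbols or numbers to a bitmask."""
    value = 0
    for token in text.split(","):
        token = token.strip()
        if token.upper() in symbols:
            value |= symbols[token.upper()]
        else:
            value |= int(token, 0)   # plain numbers remain valid
    return value


def parse_flag_match(text):
    """Parse 'A/B' into a pair of bitmasks; B is None if there is no '/'."""
    left, sep, right = text.partition("/")
    return (parse_symbolic(left, TCP_FLAGS),
            parse_symbolic(right, TCP_FLAGS) if sep else None)


print(parse_flag_match("SYN/SYN,ACK"))
```

The downside mentioned above shows up immediately: every field with symbolic values needs its own table and its own type.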

Again, match expression parsing

Another problem resulting from the fact that we don't have any context once an expression is parsed is that multiple related matches are cumbersome to specify. I've actually been cheating in my examples and shown the syntax as it should be, not as it currently is. For example you can't write "tcp sport 1024: dport 22", but have to write "tcp" twice to provide the necessary context: "tcp sport 1024: tcp dport 22". The only way I see that might *possibly* work (while keeping bison) is to do some YYBACKUP look-ahead token manipulation. Not too appealing either ...

Copyright (C) 2001-2005 Patrick McHardy