TProxy general introduction

Real transparency requirements
------------------------------

  - traditional redirection 'REDIRECT' target
  - initiating connections with foreign source address (used to fake source
    address of the connection towards the server)
  - intercepting a connection destined to a foreign destination address
    (used to intercept related connections, e.g. FTP data channel)

The current implementation
--------------------------

  - Uses a separate table named 'tproxy' to perform 'REDIRECT'-like
    transparency. A target named 'TPROXY' is also provided. Reasons:
    - NAT remains useful for forwarded traffic, and the proxy and NAT
      rules are cleanly separated
    - the TPROXY target marks sessions using a bit in conntrack->flags
      (new field) to make it easier to match tproxied packets in the
      filter table.
    - we need special handling for UDP packets.

  - The other two transparency cases are implemented using a hashtable of
    sockrefs. This hashtable is manipulated from userspace using setsockopt
    calls (IP_TPROXY_* and friends). The most important parts of sockrefs
    are:
    - local IP:port
    - foreign IP:port
    - flags indicating connection mode (LISTEN or CONNECT)

    When a NEW connection is processed by the TPROXY hook, it consults the
    sockref hash, looks up an entry with the current source IP (connect from
    foreign addr), or current destination (listen on foreign addr), and if
    something is found, the appropriate NAT manip is created using
    ip_nat_setup_info().

    If no appropriate sockrefs are present, the tproxy table is traversed
    (i.e. proxy requested mappings have precedence over admin rules)

  - It runs after mangle and before nat (priority: -130).

Special handling of UDP
-----------------------

  - General assumption about the proxy:
    - the proxy receives the first packet on a Receiver port (not fully
      specified socket, e.g. remote address 0.0.0.0:0)
    - this initiates a new session
    - the new session opens a new socket (e.g.
      new local address) and fully specifies this socket by specifying
      destination address using connect()
    - when another packet is received in the same session, the stack picks
      the more specific socket (i.e. the session socket)
    - all communication happens in the second socket (races!)

  - Translated to NetFilter:
    - the first initiator packet should be DNATed to the Receiver port but
      only once, further packets will have a separate conntrack entry (as
      the DNAT destination will change when the new socket is created)
    - second and further packets will have a real conntrack assigned with
      TPROXY working similarly as in TCP

Current known usage:
--------------------

  - Zorp, transparent proxy firewall
  - patches for Squid exist
  - rumors that it works with bridging as well (proxy based bridging
    firewall, anyone?)

Current issues in TPROXY
------------------------

  - source port allocation
  - UDP support is not complete, though it is enough for Zorp to work
  - timeout updates, or infinite timeouts?

General NetFilter additions
---------------------------

  - These are general, small features that are required by TPROXY and might
    be useful for other projects as well:
    - flags field in conntrack for general bit-fields
      the tproxy match is implemented using a bit in this field
    - the flags argument to ip_nat_setup_info()
      it is currently used to indicate that no NAT helpers are to be
      invoked for this NAT mapping (to avoid proxy & nat helper
      interference)
    - sock_release callback to nf_sockopts
      this is currently used to delete entries from the sockref hash when
      a socket is closed.

Current issues in NetFilter core
--------------------------------

  - removing conntrack entries is slow
    This would be needed to assure that setting up a new tproxy socket &
    sockref will result in a successful ip_nat_setup_info() call, as
    without removing entries address collision will occur and nat setup
    will fail.

    -> possible solution: in addition to assigning sockrefs to sockets,
       also assign conntrack entries, and remove those as well when the
       socket is closed. Problem: locking, reference counting

  - ip_nat_setup_info() should check nat_info->initialized and update the
    hashes on its own:
    - additionally the check whether the given hook has already initialized
      a NAT manip could also be moved there.

               WRITE_LOCK(&ip_nat_lock);
               /* Seen it before?  This can happen for loopback, retrans,
                  or local packets.. */
    move2 ---> if (!(info->initialized & (1 << maniptype))) {
                       int in_hashes = info->initialized;
                       unsigned int ret;

                       if (ct->master
                           && master_ct(ct)->nat.info.helper
                           && master_ct(ct)->nat.info.helper->expect) {
                               ret = call_expect(master_ct(ct), pskb,
                                                 hooknum, ct, info);
                       } else {
    #ifdef CONFIG_IP_NF_NAT_LOCAL
                               /* LOCAL_IN hook doesn't have a chain!  */
                               if (hooknum == NF_IP_LOCAL_IN) {
                                       ret = NF_ACCEPT;
                               } else
    #endif
                               ret = ip_nat_rule_find(pskb, hooknum, in, out,
                                                      ct, info);
                       }

                       if (ret != NF_ACCEPT) {
                               WRITE_UNLOCK(&ip_nat_lock);
                               return ret;
                       }

          --->         if (in_hashes) {
          |                    IP_NF_ASSERT(info->bysource.conntrack);
          |                    replace_in_hashes(ct, info);
    move1              } else {
          |                    place_in_hashes(ct, info);
          --->         }
               } else
                       DEBUGP("Already setup manip %s for ct %p\n",
                              maniptype == IP_NAT_MANIP_SRC ? "SRC" : "DST",
                              ct);
               WRITE_UNLOCK(&ip_nat_lock);
               break;
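
The sockref lookup order described under "The current implementation" can be
illustrated with a small userspace model. This is only a sketch under stated
assumptions: every name in it (struct sockref, struct endpoint,
sockref_lookup, the MANIP_* values) is hypothetical and not taken from the
TPROXY patch, a linear scan stands in for the real hashtable, and the result
is returned to the caller instead of being passed to ip_nat_setup_info().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of a sockref entry. */
enum sockref_mode { SOCKREF_LISTEN, SOCKREF_CONNECT };

struct endpoint {
    uint32_t ip;    /* IPv4 address, host byte order for simplicity */
    uint16_t port;
};

struct sockref {
    struct endpoint local;    /* address the proxy socket really uses */
    struct endpoint foreign;  /* foreign address the socket stands in for */
    enum sockref_mode mode;   /* LISTEN or CONNECT */
};

/* Which NAT manipulation a NEW connection should get, if any. */
enum manip { MANIP_NONE, MANIP_SRC, MANIP_DST };

struct decision {
    enum manip manip;
    struct endpoint to;       /* address to rewrite to */
};

static int endpoint_eq(struct endpoint a, struct endpoint b)
{
    return a.ip == b.ip && a.port == b.port;
}

/*
 * Model of the lookup done for a NEW connection with the given source and
 * destination. MANIP_NONE means no sockref matched, i.e. the tproxy table
 * would be traversed next.
 */
struct decision sockref_lookup(const struct sockref *table, size_t n,
                               struct endpoint src, struct endpoint dst)
{
    struct decision d = { MANIP_NONE, { 0, 0 } };
    size_t i;

    assert(table != NULL || n == 0);

    for (i = 0; i < n; i++) {
        const struct sockref *sr = &table[i];

        /* connect from foreign addr: match on the current source */
        if (sr->mode == SOCKREF_CONNECT && endpoint_eq(sr->local, src)) {
            d.manip = MANIP_SRC;
            d.to = sr->foreign;
            return d;
        }
        /* listen on foreign addr: match on the current destination */
        if (sr->mode == SOCKREF_LISTEN && endpoint_eq(sr->foreign, dst)) {
            d.manip = MANIP_DST;
            d.to = sr->local;
            return d;
        }
    }
    return d;
}
```

In this model a CONNECT entry yields a source manip (the proxy's outgoing
connection takes on the foreign source address), a LISTEN entry yields a
destination manip (traffic to the foreign address is delivered to the proxy's
local socket), and a miss leaves the decision to the admin rules.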