TProxy general introduction

Real transparency requirements
------------------------------
- traditional redirection of connections to a local proxy (as done by the
  'REDIRECT' target)
- initiating connections with foreign source address (used to fake source
  address of the connection towards the server)
- intercepting a connection destined to a foreign destination address
  (used to intercept related connections, e.g. FTP data channel)

The current implementation
--------------------------

- Uses a separate table named 'tproxy' to perform 'REDIRECT'-like
  transparency. A target named 'TPROXY' is also provided.
  Reasons:
  - NAT remains useful for forwarded traffic, and the proxy and NAT rules
    are cleanly separated
  - the TPROXY target marks sessions using a bit in conntrack->flags (new
    field) to make it easier to match tproxied packets in the filter table.
  - we need special handling for UDP packets.

- The other two transparency cases are implemented using a hashtable of
  sockrefs. This hashtable is manipulated from userspace using setsockopt
  calls (IP_TPROXY_* and friends). The most important parts of sockrefs
  are:
    - local IP:port
    - foreign IP:port
    - flags indicating connection mode (LISTEN or CONNECT)
  When a NEW connection is processed by the TPROXY hook, it consults the
  sockref hash, looks up an entry with the current source IP (connect from
  foreign addr), or current destination (listen on foreign addr), and if
  something is found, the appropriate NAT manip is created using
  ip_nat_setup_info(). If no appropriate sockref is present, the tproxy
  table is traversed (i.e. proxy-requested mappings take precedence over
  admin rules).

- The TPROXY hook runs after mangle (priority -150) and before nat
  (destination NAT, priority -100); its own priority is -130.
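
The sockref lookup described above can be modelled in userspace roughly as
follows. This is only a sketch, not the kernel code: the structure layout,
the flag values, the hash function and every name in it (sockref_add,
sockref_find, tproxy_classify, TPROXY_SNAT and so on) are assumptions made
for illustration. It also assumes that CONNECT entries are matched on the
packet's source and LISTEN entries on the packet's destination, as the text
suggests.

```c
#include <stddef.h>
#include <stdint.h>

/* Connection mode flags; names and values are illustrative only. */
enum { SOCKREF_LISTEN = 1, SOCKREF_CONNECT = 2 };

/* Simplified sockref: the fields listed in the text plus a chain pointer. */
struct sockref {
    uint32_t laddr; uint16_t lport;  /* local IP:port of the proxy socket */
    uint32_t faddr; uint16_t fport;  /* foreign IP:port to impersonate    */
    unsigned int flags;              /* SOCKREF_LISTEN or SOCKREF_CONNECT */
    struct sockref *next;            /* next entry in the same bucket     */
};

#define SOCKREF_BUCKETS 64
struct sockref_table { struct sockref *bucket[SOCKREF_BUCKETS]; };

static unsigned int sockref_hash(uint32_t addr, uint16_t port)
{
    return (addr ^ (addr >> 16) ^ port) % SOCKREF_BUCKETS;
}

/* CONNECT entries are keyed by the local endpoint (the packet's source),
 * LISTEN entries by the foreign endpoint (the packet's destination). */
static void sockref_key(const struct sockref *sr, uint32_t *addr,
                        uint16_t *port)
{
    int is_connect = (sr->flags & SOCKREF_CONNECT) != 0;
    *addr = is_connect ? sr->laddr : sr->faddr;
    *port = is_connect ? sr->lport : sr->fport;
}

void sockref_add(struct sockref_table *t, struct sockref *sr)
{
    uint32_t addr;
    uint16_t port;
    unsigned int h;

    sockref_key(sr, &addr, &port);
    h = sockref_hash(addr, port);
    sr->next = t->bucket[h];
    t->bucket[h] = sr;
}

struct sockref *sockref_find(const struct sockref_table *t, unsigned int mode,
                             uint32_t addr, uint16_t port)
{
    struct sockref *sr;

    for (sr = t->bucket[sockref_hash(addr, port)]; sr; sr = sr->next) {
        uint32_t kaddr;
        uint16_t kport;

        sockref_key(sr, &kaddr, &kport);
        if ((sr->flags & mode) && kaddr == addr && kport == port)
            return sr;
    }
    return NULL;
}

enum tproxy_action { TPROXY_SNAT, TPROXY_DNAT, TPROXY_USE_TABLE };

/* Decide how a NEW connection is handled: sockrefs first, then the table. */
enum tproxy_action tproxy_classify(const struct sockref_table *t,
                                   uint32_t saddr, uint16_t sport,
                                   uint32_t daddr, uint16_t dport,
                                   struct sockref **match)
{
    if ((*match = sockref_find(t, SOCKREF_CONNECT, saddr, sport)) != NULL)
        return TPROXY_SNAT;      /* source becomes (*match)->faddr:fport   */
    if ((*match = sockref_find(t, SOCKREF_LISTEN, daddr, dport)) != NULL)
        return TPROXY_DNAT;      /* destination becomes (*match)->laddr:lport */
    return TPROXY_USE_TABLE;     /* fall back to the admin's tproxy rules  */
}
```

In this model a caller would turn TPROXY_SNAT or TPROXY_DNAT into the
corresponding NAT manip, and only walk the tproxy table on
TPROXY_USE_TABLE, which mirrors the precedence described above.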

Special handling of UDP
-----------------------
- General assumption about the proxy:
  - the proxy receives the first packet on a Receiver port (not fully
    specified socket, e.g. remote address 0.0.0.0:0)
  - this initiates a new session
  - the new session opens a new socket (e.g. new local address) and fully
    specifies this socket by specifying destination address using connect()
  - when another packet is received in the same session, the stack picks the
    more specific socket (ie. the session socket)
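  - all further communication happens on the second (session) socket
    (races: packets arriving before the session socket is connected still
    reach the Receiver port)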
- Translated to NetFilter:
  - the first initiator packet should be DNATed to the Receiver port but only
    once, further packets will have a separate conntrack entry (as the DNAT
    destination will change when the new socket is created)
  - second and further packets will have a real conntrack assigned with
    TPROXY working similarly as in TCP
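
The proxy-side socket pattern assumed above can be sketched with plain
POSIX sockets. This is an illustration only: the helper names udp_open and
udp_accept_session are hypothetical, the session socket here reuses the
Receiver's local address (the text also allows a new local address), and
sharing one local address between two UDP sockets through SO_REUSEADDR
relies on Linux behaviour. No TPROXY-specific socket options are used.

```c
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Open a UDP socket bound to 'local'; if 'peer' is non-NULL, connect() it
 * so the socket becomes fully specified. SO_REUSEADDR lets the session
 * socket share the Receiver's local address (Linux behaviour, assumed). */
int udp_open(const struct sockaddr_in *local, const struct sockaddr_in *peer)
{
    int one = 1;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one)) < 0 ||
        bind(fd, (const struct sockaddr *)local, sizeof(*local)) < 0 ||
        (peer != NULL &&
         connect(fd, (const struct sockaddr *)peer, sizeof(*peer)) < 0)) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Read the first packet of a session from the Receiver socket, then create
 * a session socket connected to the sender. Later packets from the same
 * sender are delivered to the session socket, because the stack prefers
 * the more specific match. Packets that arrive between recvfrom() and
 * connect() still land on the Receiver socket (the race noted above). */
int udp_accept_session(int receiver_fd, char *buf, size_t len, ssize_t *got)
{
    struct sockaddr_in peer, local;
    socklen_t peer_len = sizeof(peer), local_len = sizeof(local);

    *got = recvfrom(receiver_fd, buf, len, 0,
                    (struct sockaddr *)&peer, &peer_len);
    if (*got < 0)
        return -1;
    if (getsockname(receiver_fd, (struct sockaddr *)&local, &local_len) < 0)
        return -1;
    return udp_open(&local, &peer);
}
```

A proxy following this pattern would call udp_open() once with a NULL peer
to create the Receiver socket and udp_accept_session() for each new
session, keeping the returned descriptor for the rest of that session.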

Current known usage
-------------------
- Zorp, transparent proxy firewall
- patches for Squid exist
- rumors that it works with bridging as well (proxy based bridging firewall,
  anyone?)

Current issues in TPROXY
------------------------
- source port allocation
- UDP support is not complete, though it is enough for Zorp to work
- timeout updates, or infinite timeouts?

General NetFilter additions
---------------------------
- These are general, small features that are required by TPROXY and might be
  useful for other projects as well:
  - flags field in conntrack for general bit-fields
    the tproxy match is implemented using a bit in this field
  - the flags argument to ip_nat_setup_info()
    it is currently used to indicate that no NAT helpers are to be invoked
    for this NAT mapping (to avoid interference between the proxy and NAT
    helpers)
  - sock_release callback to nf_sockopts
    this is currently used to delete entries from the sockref hash when a
    socket is closed.


Current issues in NetFilter core
--------------------------------
- removing conntrack entries is slow

  Fast removal is needed to ensure that setting up a new tproxy socket and
  sockref results in a successful ip_nat_setup_info() call; unless stale
  entries are removed, an address collision occurs and NAT setup fails.

  -> possible solution: in addition to assigning sockrefs to sockets, also
     assign conntrack entries, and remove those as well when the socket is
     closed. Problem: locking, reference counting


- ip_nat_setup_info() should check nat_info->initialized and update the
  hashes on its own (the block marked 'move1' in the excerpt below).
- Additionally, the check whether the given hook has already initialized a
  NAT manip could also be moved there (the line marked 'move2' below):

                WRITE_LOCK(&ip_nat_lock);
                /* Seen it before?  This can happen for loopback, retrans,
                   or local packets.. */
move2 --->      if (!(info->initialized & (1 << maniptype))) {
                        int in_hashes = info->initialized;
                        unsigned int ret;

                        if (ct->master
                            && master_ct(ct)->nat.info.helper
                            && master_ct(ct)->nat.info.helper->expect) {
                                ret = call_expect(master_ct(ct), pskb,
                                                  hooknum, ct, info);
                        } else {
#ifdef CONFIG_IP_NF_NAT_LOCAL
                                /* LOCAL_IN hook doesn't have a chain!  */
                                if (hooknum == NF_IP_LOCAL_IN) {
                                        ret = NF_ACCEPT;
                                } else
#endif
                                ret = ip_nat_rule_find(pskb, hooknum, in, out,
                                                       ct, info);
                        }

                        if (ret != NF_ACCEPT) {
                                WRITE_UNLOCK(&ip_nat_lock);
                                return ret;
                        }
--->                    if (in_hashes) {
|                               IP_NF_ASSERT(info->bysource.conntrack);
|                               replace_in_hashes(ct, info);
move1                   } else {
|                               place_in_hashes(ct, info);
--->                    }
                } else
                        DEBUGP("Already setup manip %s for ct %p\n",
                               maniptype == IP_NAT_MANIP_SRC ? "SRC" : "DST",
                               ct);
                WRITE_UNLOCK(&ip_nat_lock);
                break;