Patrick McHardy's blog

Wed, 28 May 2008

iptables release status


I managed to push out an iptables release candidate last week and another one this week. There are some issues with endian-annotated types in the netfilter headers when using ancient linux/types.h versions that need to be fixed before a final release, but we'll hopefully have a final 1.4.1 release by next Monday.

Leaving for LinuxTag


I'll be leaving for my train to Berlin now. Amazingly it took me more time to decide what to put on my notebook than what to pack in my suitcase :) Hope to see you there ...

Fri, 23 May 2008

Flying outside


did not work too well ...

Geek toys


Received a bunch of things to play with this morning.

A customer wants me to develop a multipath tunneling protocol. The basic idea is to encapsulate packets, distribute them over N paths, decapsulate them on the other side and restore ordering using sequence numbers. This makes it possible to use the combined upstream and downstream bandwidth of the paths for single connections. I wrote a prototype some time ago, but it's very basic at this point and missing all the fancy features, like microflow separation, delay-aware distribution, dead path detection, etc.
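The distribute-and-reorder idea can be sketched in a few lines. This is only a minimal user-space model in Python, not the prototype's code; the names `distribute` and `Reorderer` and the round-robin distribution are assumptions made for illustration.

```python
from itertools import cycle


def distribute(packets, n_paths):
    """Tag each packet with a sequence number and spread them round-robin over n_paths."""
    paths = [[] for _ in range(n_paths)]
    for (seq, payload), path in zip(enumerate(packets), cycle(paths)):
        path.append((seq, payload))
    return paths


class Reorderer:
    """Receiver side: buffer out-of-order packets and release them in sequence order."""

    def __init__(self):
        self.next_seq = 0   # next sequence number that may be delivered
        self.pending = {}   # packets that arrived too early, keyed by sequence number

    def receive(self, seq, payload):
        """Accept one packet and return the payloads that are now deliverable in order."""
        self.pending[seq] = payload
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready
```

In this model a packet that never arrives stalls delivery forever, which is one reason something like dead path detection is needed in practice.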

I used to have two internet connections for a couple of months, but canceled the second one in March. Since there's nothing like real-life testing, I ordered a second cable connection again, which was installed this morning. So now I have two 32/2.5 Mbit connections, but only one of them is used currently. I probably won't be able to resist the urge to work on this for long :)

Additionally I received some VoIP testing equipment. Innovaphone kindly provided me with an H.323/SIP test setup consisting of 2 * 3 different telephone models, two PBXs and some test scenarios. We already support a lot of scenarios; my goal is to extend this as far as possible within reason.

The tricky part is things like call transfers crossing two NATs:

  Phone1-\                                              /------Phone3
          -[Registrar1]-[NAT1]-{  }-[NAT2]-[Registrar2]-
  Phone2-/                                              \------Phone4
 
When Phone1 calls Phone3 and the call is transferred to Phone2, we currently fail completely (for so far unknown reasons). The ideal outcome would be that NAT1 detects that the transferred call originated in the local network of Phone1 and the RTP streams are set up between Phone1 and Phone2 directly. This not only reduces latency, it also avoids having internal calls go over the Internet. For normal calls between Phone1 and Phone2 using an external registrar this already works, provided that the registrar doesn't decide to proxy the calls.

Speaking of geek toys, ThinkGeek has an incredibly fun toy..

The two helicopters are controlled using IR remote controls. They can only go up and down by controlling main rotor speed and spin around the rotor axis using the rear rotor, but have some small constant forward movement, which allows flying them quite precisely. The most important feature however is that they can shoot at each other using IR. On the first hit the helicopter spins a bit, on the second one it loses power for a short period of time, on the third hit it completely loses power and goes down. This is accompanied by shooting sounds. Unfortunately they break pretty fast, I wrecked four of them within only a few days - well actually three of them, the fourth one was last seen flying over my neighbour's garden :)

Being hooked on the fun, I got a different model, which is controllable on all axes and is supposed to be more robust.

Unfortunately it also has a lot more power, so I managed to wreck another one within hours: one gear broke, the tail bent to a 45° angle and then broke, and the flight bar also took some damage. To be fair, it really is more robust (and luckily you can also order replacement parts), I'm just not used to the two two-dimensional controls. Instead of powering down when getting too close to the walls, I tried directing it away by pushing all sticks in the desired direction, causing it to increase power and crash into the wall.

I have some replacement parts and a second helicopter, but I'll try to resist flying inside again until I can get some training in a less dangerous environment. Regardless of these problems, they are really great fun :)

Thu, 22 May 2008

LinuxTag


I'll be visiting LinuxTag in Berlin next week, probably the entire day of Thursday and Friday until sometime in the afternoon. Until a couple of years ago, I used to visit LinuxTag annually, but then the quality of the presentations declined, with a lot of the topics being along the lines of "We're company XYZ and we're using Linux, yay". This year the talks look more interesting again, and I'm looking forward to meeting with Harald and DaveM.

Anyone interested in meeting and discussing some netfilter or networking related topics, drop me an email, there should be plenty of time.

Fri, 16 May 2008

Netfilter move to git almost completed


I've completed the move of (most of) the netfilter repositories to git today. I still need to change the email notification script to make the commit emails more readable. They don't look very nice by default and I made it even worse. I've reached my limit of shell scripts I can look at for today though.

These were the last SVN repositories I was using. I'm tempted to leave a long rant about SVN, but it's probably better to simply forget about it as quickly as possible :)

Next I'll see whether I can manage to roll a release candidate for iptables. We're currently releasing too infrequently. Since we're usually merging at least one new extension or revision per kernel release, there also should be one iptables release per kernel version, so users can actually use new things. The ideal time for this would be shortly before kernel releases, since that allows us to merge userspace extensions for things targeted at the next kernel release early enough so they can be used for testing. So that's what I'll try to do in the future. Luckily we didn't merge anything requiring new userspace extensions during the last merge window, so we won't need a new release for 2.6.26.

Wed, 14 May 2008

Illustrations


Apparently my blog is too boring to read, so Elena kindly offered to illustrate it. Since this spares me from writing some actual content, I gladly accepted.

What you're seeing below is me resting in a deck chair, enjoying a Rothaus beer, exhaling some unidentified fumes and apparently being haunted by thoughts of ip_route_me_harder() :)

Wed, 07 May 2008

Summer office


The weather has been great the past days, so I set up my summer workplace :)

Working outside is really pleasant after a month of almost constantly grey sky. Below the balcony there's a small stream, and hundreds of birds sit in the trees and sing, which makes an amazing scenery.

I sent out a first batch of HIFN fixes today to avoid causing too many conflicts in the series in case something turns up during review. Caught a good time during which both Evgeniy and Herbert were responsive and it only took about an hour to get all patches reviewed, fix a minor bug and get them merged. The remaining ones are hopefully in shape by tomorrow, the descriptor accounting still needs a bit more work. Herbert also merged some patches from Loc Ho today for async hashing support, which is cool because I already started adding hashing support to the HIFN driver until I noticed the CryptoAPI doesn't support it asynchronously yet :)

Also sent out a few netfilter patches and fixed a slightly embarrassing bug in the macvlan driver. It would crash the kernel on module unload because cleanup was performed incorrectly, causing the kernel to jump to a NULL function pointer when receiving the next packet on the underlying device. I wonder why I've never noticed this.

Tue, 06 May 2008

Fighting the HIFN driver


What I initially hoped to be just a simple fix for a few arithmetic errors in the driver for the HIFN 795x crypto accelerator cards turned into a week-long struggle, accompanied by at least a hundred crashes and reboots.

The initial bug manifested itself by going into an endless loop when the CryptoAPI issued a request for less data than the full scatterlist, caused by an integer underflow while calculating the remaining amount of data to be processed. The fix was straightforward: only use the minimum of the scatterlist size and the crypto request size. While at it, I also fixed some endian bugs, missing error propagation for errors that shouldn't happen, but did because of the underflow, and some overly strict data alignment checks.
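The underflow can be illustrated with a small model. This Python sketch imitates 32-bit unsigned arithmetic with a mask; the function names are hypothetical and only mirror the description above, not the actual driver code.

```python
U32_MASK = 0xFFFFFFFF  # emulate wraparound of a 32-bit unsigned counter


def remaining_buggy(request_len, sg_len):
    """Subtract the full scatterlist element length, even if the request is shorter."""
    return (request_len - sg_len) & U32_MASK


def remaining_fixed(request_len, sg_len):
    """Only consume the minimum of the element length and the bytes still requested."""
    chunk = min(sg_len, request_len)
    return (request_len - chunk) & U32_MASK
```

For a 16-byte request on a 64-byte scatterlist element, the buggy variant wraps around to a huge remaining count instead of reaching zero, so a loop that runs until nothing remains keeps going.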

Testing looked good, no more crashes, but surprisingly the testcases of the tcrypt module using algorithms provided by HIFN randomly failed. This turned out to be caused by an incorrect return value indicating synchronous processing to the CryptoAPI, while the request was in fact processed asynchronously. So when the result was not already available when returning from the driver, testcases failed.
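Why the wrong return value breaks the test cases can be shown with a toy model. In the kernel, an asynchronous driver returns -EINPROGRESS when a request has only been queued; everything else in this Python sketch (the `FakeRequest` class, the helper names, the reversed-bytes placeholder "cipher") is made up for illustration.

```python
import errno

EINPROGRESS = errno.EINPROGRESS


class FakeRequest:
    """Hypothetical request object: input data plus a slot for the result."""

    def __init__(self, data):
        self.data = data
        self.result = None


def submit(req, pending, ret_sync_bug):
    """Queue the request for later processing, as asynchronous hardware would."""
    pending.append(req)
    # Bug: claiming synchronous completion although nothing was processed yet.
    return 0 if ret_sync_bug else -EINPROGRESS


def complete_all(pending):
    """Stand-in for the completion interrupt: finish all queued requests."""
    for req in pending:
        req.result = req.data[::-1]  # placeholder for the real cipher operation
    pending.clear()


def caller(req, pending, ret_sync_bug):
    """Like a test case: a return value of 0 is trusted to mean the result is ready."""
    ret = submit(req, pending, ret_sync_bug)
    if ret == -EINPROGRESS:
        complete_all(pending)  # a real caller would wait for the completion callback
    return req.result
```

With the buggy return value the caller reads the result before it exists; with -EINPROGRESS it waits and sees the processed data.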

After fixing the tcrypt failures, next was some real-life testing using IPsec. The first attempt resulted in an immediate crash in crypto_authenc_genivc(). This one was fixed fairly quickly: the asynchronous completion handler interpreted a pointer as the wrong structure type.

The second attempt looked more promising: no crashes, and packets went through and looked like IPsec. The remote side failed to parse them however; a closer look revealed that they were incorrectly constructed and had 16 bytes of garbage at the end. From my last attempt to fix the driver I remembered that this was most likely caused by the CBC modes not initializing their initialization vector size. Naively, I changed the driver to properly initialize the ivsize. To my surprise, attempting to add SAs using cbc(aes) now failed with -ENOENT.

Figuring out the reason took me almost an entire day. When the ivsize is already initialized, the CryptoAPI attempts to spawn a new instance of the algorithm. Algorithms are identified by name, possibly combined with modes, like cbc(aes). When spawning new algorithms, the driver name is used for the lookup however, which in the case of HIFN was "hifn-aes" for all AES modes, causing the lookup to return the ofb(aes) algorithm instead of cbc(aes). Using unique driver names for the different algorithm modes fixed this problem.
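The effect of the shared driver name can be reproduced with a toy registry. This is a simplified Python model, not the CryptoAPI's lookup code; the registration order and the unique names used in the test are assumptions.

```python
def register(registry, alg_name, driver_name):
    """Add an algorithm to the registry under its algorithm and driver names."""
    registry.append({"name": alg_name, "driver": driver_name})


def lookup_by_driver(registry, driver_name):
    """Return the algorithm name of the first entry with a matching driver name."""
    for alg in registry:
        if alg["driver"] == driver_name:
            return alg["name"]
    return None
```

If both ofb(aes) and cbc(aes) are registered as "hifn-aes", a lookup by driver name cannot tell them apart and returns whichever comes first. With one driver name per mode, the lookup is unambiguous.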

While chasing this bug, I noticed some DMA memory corruption issues in the HIFN driver. When a request contains more than a single scatterlist element, the driver programmed the hardware to perform one crypt operation per scatterlist element, but for the full request size, corrupting the memory after its tail. The fix for this was a bit more involved, since using the correct length also requires performing only a single operation for all scatterlist elements, because the source and destination descriptors don't necessarily have identical lengths. This complicates keeping track of free descriptor entries. Previously, each operation needed exactly one command, source, destination and result descriptor. With only a single operation, it needs one command and result descriptor and a varying number of source and destination descriptors. On the upside, this reduces the number of interrupts per request to exactly one instead of one per scatterlist element and gets rid of some atomic operations. Additionally tcrypt can now detect destination buffer corruption for cipher tests.
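The change in descriptor accounting can be written down as a small calculation. This Python sketch only restates the counts from the paragraph above; the dictionary layout and function names are assumptions, not the driver's data structures.

```python
def descriptors_per_element(n_elements):
    """Old scheme: one operation per scatterlist element, one descriptor of each type."""
    return {"cmd": n_elements, "src": n_elements, "dst": n_elements, "res": n_elements}


def descriptors_single_op(n_src, n_dst):
    """New scheme: one operation per request, variable source/destination counts."""
    return {"cmd": 1, "src": n_src, "dst": n_dst, "res": 1}


def can_submit(free, needed):
    """A request can only start if every descriptor type has enough free entries."""
    return all(free[kind] >= count for kind, count in needed.items())
```

Because the source and destination counts now differ per request, the check for free entries has to be done per descriptor type instead of with a single counter.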

Continuing testing with IPsec, things now looked better, packets were properly sized and the receive side worked properly. Outgoing packets were still dropped by the receiver however. Looking more closely at the packets showed that they contained what looked like a block of unencrypted data at the end. Additionally there still were some rare random crashes in the CryptoAPI. The crashes were caused by a missing check for end-of-scatterlist in one of the CryptoAPI scatterlist helpers, the unencrypted block of data by an off-by-one in the eseqiv sequence number generator. Both problems were fixed by Herbert Xu. The first victory - IPsec now worked properly using ping. TCP connections stalled after a short period however.

Half a day later, I also figured out the reason for the stalls. The HIFN driver needs to keep some context for each request since it processes them asynchronously. The driver used the global per-transform storage for this context instead of the per-request storage, corrupting existing contexts when more than one request was outstanding. Even in flood mode, ping exhibits ping-pong behaviour, waiting for a reply before sending the next request, which is why it wasn't affected by this problem. With this also fixed, IPsec seemed to be working properly, at least on the HIFN side. There still appears to be some corruption of the XFRM CB with asynchronous processing, causing outgoing tunnel mode packets to be sent without IP_DF, but that should be easily fixed.
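The context corruption is easy to reproduce in a toy model. In this Python sketch, `Transform` stands in for the per-transform storage and the returned dictionaries for requests; all names are hypothetical.

```python
class Transform:
    """Stand-in for a crypto transform shared by all requests that use it."""

    def __init__(self):
        self.shared_ctx = {}


def submit_shared(tfm, req_id, length):
    """Buggy: every request overwrites the single context stored in the transform."""
    tfm.shared_ctx["len"] = length
    return {"id": req_id, "ctx": tfm.shared_ctx}


def submit_per_request(tfm, req_id, length):
    """Fixed: each request carries its own private context."""
    return {"id": req_id, "ctx": {"len": length}}
```

With a single outstanding request, as with ping, both variants behave identically. As soon as a second request is submitted before the first completes, the shared variant clobbers the first request's state.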

Next was testing with dm-crypt, for which I actually purchased the card. Testing worked fine while debugging was enabled; without debugging it reproducibly crashed in the device mapper code. This was fairly nasty to debug since enabling debugging stopped the bug from happening. After following lots of dead ends and some suggestions from Evgeniy, I found the cause: when no descriptors are currently available, the request is queued and processed once enough descriptors are available again. The queue length is limited (in the case of HIFN to 1); when the limit is reached, the behaviour depends on the flags specified by the caller. When using CRYPTO_TFM_REQ_MAY_SLEEP, the caller goes to sleep and waits for notification from the driver when it's ready to accept more requests. When dequeuing the crypto queue, asynchronous crypto drivers need to check for backlogged clients and wake them before continuing processing. This was missing from the HIFN driver, causing it to call the dm-crypt completion handler for a request that wasn't fully initialized.

With this bug also fixed, dm-crypt survived a 24 hour stress test. I'm a bit reluctant at this point to use it for real data though, since all those bugs didn't exactly instill confidence. The patches are in an almost upstream-submittable state, just the descriptor accounting needs some minor cleanup. I hope to get this done today or tomorrow and then attend to the huge backlog in my inbox that has grown over the past week.
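The queue and backlog behaviour can be modelled roughly as follows. This is a simplified Python sketch, not the kernel's crypto queue implementation; the class, its methods and the string return values are assumptions for illustration.

```python
from collections import deque


class CryptoQueue:
    """Bounded request queue with a backlog for callers that are allowed to wait."""

    def __init__(self, max_len):
        self.max_len = max_len
        self.queue = deque()
        self.backlog = deque()

    def enqueue(self, req, may_wait):
        """Queue a request, park it in the backlog, or reject it when the queue is full."""
        if len(self.queue) < self.max_len:
            self.queue.append(req)
            return "queued"
        if may_wait:
            self.backlog.append(req)
            return "busy"  # the caller sleeps until it is notified
        return "rejected"

    def dequeue(self):
        """Return (request to process, backlogged request whose owner must be woken)."""
        if not self.queue:
            return None, None
        req = self.queue.popleft()
        woken = None
        if self.backlog:
            woken = self.backlog.popleft()
            self.queue.append(woken)  # now accepted into the regular queue
        return req, woken
```

The second element returned by `dequeue` is the part a driver must not ignore: the owner of that request is still waiting and has to be notified before processing continues.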

On the netfilter front, nothing too exciting has happened during the last two weeks. 2.6.25 appears to have gone pretty well, netfilter-wise, except for one nasty hashing regression on ARM, fixed by Philip Craig. The number of patches merged during the 2.6.26 merge window was smaller than usual, the highlights are:

  • A large amount of SIP helper fixes and improvements
  • DCCP conntrack/NAT
  • UDP-Lite NAT
  • SCTP NAT
  • Completion of network namespace support for {ip,ip6,arp}_tables

I'm particularly happy about finally managing to merge the SIP helper patches, which I had queued for almost 9 months. If you've tried using it and it didn't work, now is a good time to try again and submit bug reports :)

Overcoming laziness


I decided to give blogging another try. My last attempt failed after just one or two entries because of me being too lazy to actually write something, but since I enjoy reading other people's blogs, I hope I can keep the motivation up a bit longer this time :)

Copyright (C) 2001-2005 Patrick McHardy