Per-CPU Data

A great technique for avoiding locking, and one which is used fairly widely, is to duplicate information for each CPU. For example, if you wanted to keep a count of a common condition, you could use a spin lock and a single counter. Nice and simple.
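The simple version might look roughly like the sketch below. It is a userspace stand-in, not kernel code: a pthread mutex plays the part of the spin lock so the example can be compiled and run anywhere, and the names `event_lock', `event_count', `count_event' and `read_event_count' are made up for illustration.

```c
#include <pthread.h>

/*
 * Hypothetical names throughout. The mutex stands in for a kernel
 * spin lock; one lock protects one shared counter.
 */
static pthread_mutex_t event_lock = PTHREAD_MUTEX_INITIALIZER;
static unsigned long event_count;

/* Record one occurrence of the condition. */
static void count_event(void)
{
	pthread_mutex_lock(&event_lock);
	event_count++;
	pthread_mutex_unlock(&event_lock);
}

/* Read the counter under the same lock that writers take. */
static unsigned long read_event_count(void)
{
	unsigned long n;

	pthread_mutex_lock(&event_lock);
	n = event_count;
	pthread_mutex_unlock(&event_lock);
	return n;
}
```

Every caller, on every CPU, goes through the same lock and the same counter, which is exactly what makes it simple and what could make it a bottleneck.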

If that were too slow [it's probably not], you could instead use a counter for each CPU [don't], so that none of them needs an exclusive lock [you're wasting your time here]. To make sure the CPUs don't have to synchronize caches all the time, give each counter its own cache line by appending `__cacheline_aligned' to the declaration (include/linux/cache.h). [Can't you think of anything better to do?]

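Each CPU will still need to take a read lock to access its own counter, however. Read locks do not exclude one another, so the CPUs never wait on each other for this. That way you can take the write lock to get exclusive access to all of the counters at once, to tally them up.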