Colmux - Finding Memory Leaks, High I/O Wait Times, and Hotness on 3000-Node Clusters

Todd originally posted an entry on collectl here: Collectl - Performance Data Collector. Collectl collects real-time data from a large number of subsystems like buddyinfo, cpu, disk, inodes, infiniband, lustre, memory, network, nfs, processes, quadrics, slabs, sockets and tcp, all with one tool and in one consistent format.
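Subsystems are selected with collectl's -s switch, one letter per subsystem. As a quick illustration (the letter assignments come from the collectl documentation; run it against whatever node and interval you like):

    # report cpu (c), disk (d) and memory (m) summary stats each interval
    collectl -scdm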
Since then a lot has happened. Collectl is now part of both the Fedora and Debian distros, not to mention several others. Joe Brockmeier has written up a pretty good summary, Martin Bach has covered it in a few posts on his blog, and it's pretty well documented (I like to think) on SourceForge.
Anyhow, a while back I released a new version of collectl-utils and gave a complete face-lift to one of its utilities, colmux, a collectl multiplexor. Colmux runs collectl on multiple systems, each of which sends its output back to colmux; colmux then sorts that output on a user-specified column and reports the 'top-n' results.
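To make that concrete, here's a rough sketch of what an invocation looks like. The hostnames, collectl switches, and column number are placeholders to adapt to your own cluster, so treat this as illustrative rather than canonical; the exact flag behavior is spelled out in the colmux man page:

    # run collectl with slab stats (-sy) on three nodes and
    # sort the merged output on column 4
    colmux -addr node1,node2,node3 -command "-sy" -column 4

Since the -command string is handed to each remote collectl unchanged, anything you can ask a single collectl for, you can ask an entire cluster for.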
For example, here are the top users of slab memory from a 41-node sample: