Entries by Mark Seger (2)

Thursday
Aug252011

Colmux - Finding Memory Leaks, High I/O Wait Times, and Hotness on 3000 Node Clusters

Todd had originally posted an entry on collectl here at Collectl - Performance Data Collector. Collectl collects real-time data from a large number of subsystems like buddyinfo, cpu, disk, inodes, infiniband, lustre, memory, network, nfs, processes, quadrics, slabs, sockets and tcp, all using one tool and in one consistent format.

Since then a lot has happened.  It's now part of both Fedora and Debian distros, not to mention several others. There has also been a pretty good summary written up by Joe Brockmeier. It's also pretty well documented (I like to think) on sourceforge. There have also been a few blog postings by Martin Bach on his blog.

Anyhow, awhile back I released a new version of collectl-utils and gave a complete face-lift to one of the utilities, colmux, which is a collectl multiplexor.  This tool has the ability to run collectl on multiple systems, which in turn send all their output back to colmux.  Colmux then sorts the output on a user-specified column and reports the 'top-n' results.  

For example, here's an example of the top users of slab memory from a 41 node sample:

Click to read more ...

Friday
Oct092009

Have you collectl'd yet? If not, maybe collectl-utils will make it easier to do so

I'm not sure how many people who follow this have even tried collectl but I wanted to let you all know that I just released a set of utilities called strangely enough collectl-utils, which you can get at http://collectl-utils.sourceforge.net. One web-based utility called colplot gives you the ability to very easily plot data from multiple systems in a way that makes correlating them over time very easy.

Click to read more ...