Have you collectl'd yet? If not, maybe collectl-utils will make it easier to do so
I'm not sure how many people who follow this have even tried collectl, but I wanted to let you all know that I just released a set of utilities called, strangely enough, collectl-utils, which you can get at http://collectl-utils.sourceforge.net. One web-based utility called colplot lets you plot data from multiple systems side by side, which makes correlating them over time very easy.
Another utility called colmux lets you look at multiple systems in real time. In fact, if you go to the page that describes it in more detail you'll see a photo showing the CPU loads on 192 systems, sampled once a second, with one line per sample containing a set of data for every system. The display is so wide it takes 3 large monitors side-by-side to see it all, and even though you can't actually read the numbers you can easily see which systems are loaded and which aren't.
Anyhow, give it a look and let me know what you think.
-mark
Reader Comments (5)
Very interesting. I will definitely check that out. Do you have any experience with how much long-running collectl monitoring impacts performance?
I also picked up this tool called Visage. Might be worth a look in comparison. http://holmwood.id.au/~lindsay/2009/09/08/graphing-collectd-statistics-in-the-browser-with-visage/
Your mileage may vary, but a good starting point is about 0.2% of a single CPU, which is what I've seen on many systems when taking samples every 10 seconds for most things and every 60 seconds for slabs and processes. Another way to look at it: 0.1% of a day is about 86 seconds of CPU time, and collectl spends a little over 60 seconds a day sampling processes and a little less than that on everything else. If you have a lot of processes or devices like disks, those numbers go up; if you sample fewer things they go down. For example, if you only sample CPUs, disks, networks and memory, you're down to less than 10 seconds a day, which gets into the 0.01% range.
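If you want to see where those percentages come from, here's a quick back-of-the-envelope sketch in Python; the per-day CPU-second figures are just the examples above, not measurements:

    # Back-of-the-envelope overhead math; the per-day CPU seconds below are
    # just the example figures from the text, not measured values.
    SECONDS_PER_DAY = 24 * 60 * 60          # 86,400

    def overhead_pct(cpu_seconds_per_day):
        """Percent of one CPU consumed over a full day."""
        return 100.0 * cpu_seconds_per_day / SECONDS_PER_DAY

    print(round(overhead_pct(86.4), 3))   # 0.1   - 86 CPU-seconds/day
    print(round(overhead_pct(60), 3))     # 0.069 - process sampling alone
    print(round(overhead_pct(10), 3))     # 0.012 - only CPU/disk/network/memory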
You can read more about it at http://collectl.sourceforge.net/Performance.html and just remember, some of the largest computers in the world, running some extremely CPU-intensive workloads (their CPUs live at close to 100%), run collectl continuously because the impact is felt to be minimal and the benefit of cluster-wide visibility is deemed to be worth it.
-mark
If your UI looks that awful, don't include a picture. How is this better than yaketystats?
re: awful UI - are you talking about my UI? If so, which one, colplot? I've always prided myself on the fact that I didn't waste time with fancy colors, icons, etc but rather spent my time worrying about quality graphs.
As for Visage, it looks pretty cool, but the whole writeup is about collectd and my tool is called collectl.
However there was one comment in there that might be worth a deeper conversation, and that has to do with RRD. Long ago I tried storing collectl data in RRD and ran into a couple of problems, at least for me. The first is that I have so much data - 8640 samples/day for each of potentially hundreds of variables - that it's overwhelming.
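To put that in perspective, here's the arithmetic behind that number; the 200-variable count is just for illustration, not a real configuration:

    # Arithmetic behind the data volume; the 200-variable count is illustrative.
    SECONDS_PER_DAY = 24 * 60 * 60               # 86,400
    samples_per_day = SECONDS_PER_DAY // 10      # 8640 samples at a 10-second interval
    variables = 200                              # "potentially hundreds" of variables
    print(samples_per_day)                       # 8640
    print(samples_per_day * variables)           # 1,728,000 data points/day per host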
The other issue, and this is really more of an issue with rrd's plotting capability, is that I discovered the plots did not accurately reflect the data. Think of those 8000+ samples of mine and then think of a plot that only shows 100 points: every plotted point has to represent about 80 data points as either a min, a max or an average, and no matter which you choose, data is lost - and I can't have that. I need all 8000 points to show up on my plots, and that's exactly what gnuplot does and rrd doesn't. I hope visage faithfully plots all the data and doesn't try to massage it in any way without permission.
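To see why that bothers me, here's a toy Python sketch of that kind of consolidation; it's just the general min/max/average idea, not rrdtool's actual code:

    # Toy example: 8000 samples of a mostly-idle metric with one brief spike,
    # consolidated 80 samples per plotted point (a 100-point plot).
    samples = [1.0] * 8000
    samples[4321] = 95.0                 # a single 10-second burst

    def consolidate(data, per_point, fn):
        return [fn(data[i:i + per_point]) for i in range(0, len(data), per_point)]

    averaged = consolidate(samples, 80, lambda chunk: sum(chunk) / len(chunk))
    maxed    = consolidate(samples, 80, max)

    print(max(samples))    # 95.0  - the raw data clearly shows the burst
    print(max(averaged))   # 2.175 - averaging nearly erases it
    print(max(maxed))      # 95.0  - max keeps the peak but hides how brief it was

Either way something about the original signal is gone, which is why I'd rather have gnuplot draw every point.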
btw - if you'd like to support plotting collectl data with visage, let me know what I can do to help.
-mark