Oct 12, 2012 9:08 AM
Fix for WMI Windows drive usage stats
We love (and sometimes cuss at) Zenoss, but are working to improve the great work put into some of the ZenPacks we depend on, like the Windows WMI-related ZenPacks.
One of the key components of our monitoring system is the stats on available drive space: for alerts on low disk space, and also trend information for predictive analysis of 'drive full' issues, hints that something is wrong (a sudden jump or fall in disk space usage), and capacity planning.
Unfortunately, since moving to Zenoss 4.2.0 we have not been able to get the WMI/Windows drive usage stats to work properly at all. The Linux command and SNMP modelling work great. So we are diving in to try to get this important piece sorted out.
Attached is our initial "quick" fix for the WMI FileSystemMap implementation. This provides the following changes to the current FileSystemMap for the WMI modelling:
1) Modelling and stats collection of Used Bytes and Free Bytes now works - but ONLY for Windows NTFS drives formatted with a 4096-byte block size (the Windows default)
* the problem with the current drive space stats seems to be the block size 'guess' mechanism, since Windows does not provide an easy way to get this info. For an initial quick fix, since almost all our monitored Windows systems use the default 4096-byte allocation unit (block size), we just hard-coded the block size calculation to use 4096. We are still working on a better approach than a hard-coded block size.
2) The formatting of the drive information labels has been cleaned up (to our preferences) so it shows:
Drive letter: Volume name (Serial No.: ###...)
e.g. D: Data Drive (Serial No: C835434D5)
3) Automatically ignores any drive names of "System Reserved", which eliminates the small 100MB boot/system partition in Windows 7/2008
4) Automatically ignores any drive volumes which start with "//?/", which is how non-native or raw partitions show up (such as USB drives used for Windows backup, some inaccessible but connected SAN volumes, etc.)
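The changes above could be sketched roughly like this - a minimal illustration only, where `BLOCK_SIZE`, `used_bytes`, `format_label`, and `should_monitor` are invented names, not the actual FileSystemMap.py code:

```python
# Illustrative sketch of the patch's logic; names here are hypothetical.
BLOCK_SIZE = 4096  # change 1: hard-coded Windows default NTFS allocation unit

def used_bytes(total_blocks, free_blocks):
    """Change 1: used space computed from block counts with a fixed block size."""
    return (total_blocks - free_blocks) * BLOCK_SIZE

def format_label(letter, volume, serial):
    """Change 2: 'D: Data Drive (Serial No: C835434D5)' style label."""
    return "%s %s (Serial No: %s)" % (letter, volume, serial)

def should_monitor(volume_name, mount):
    """Changes 3 and 4: skip 'System Reserved' and raw '//?/' volumes."""
    if volume_name == "System Reserved":
        return False
    if mount.startswith("//?/"):
        return False
    return True
```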
To implement our fix, just replace the following two files in Zenoss (we recommend making a backup copy of the originals first with cp), then restart Zenoss:
FileSystemMap.py (source)
FileSystemMap.pyc (compiled Python bytecode)
The standard Zenoss folder location is:
/opt/zenoss/ZenPacks/ZenPacks.zenoss.WindowsMonitor-1.0.2-py2.7.egg/ZenPacks/zenoss/WindowsMonitor/modeler/plugins/zenoss/wmi
After copying the files, restart Zenoss with "service zenoss restart"
If you find the changes do not appear, you may need to remove stale copies of the module. To do this, navigate to the above wmi folder and delete the leftover backup file with: "rm FileSystemMap.py~"
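Putting the steps above together, the install might look like this (paths are taken from the post; the egg version in the path may differ on your system):

```shell
# Replace the modeler plugin files, keeping backups of the originals.
ZP=/opt/zenoss/ZenPacks/ZenPacks.zenoss.WindowsMonitor-1.0.2-py2.7.egg/ZenPacks/zenoss/WindowsMonitor/modeler/plugins/zenoss/wmi
cp "$ZP/FileSystemMap.py"  "$ZP/FileSystemMap.py.orig"   # back up the source
cp "$ZP/FileSystemMap.pyc" "$ZP/FileSystemMap.pyc.orig"  # back up the bytecode
cp FileSystemMap.py FileSystemMap.pyc "$ZP/"             # drop in the patched files
service zenoss restart                                   # restart Zenoss
# If the changes still don't appear, remove stale files in $ZP, e.g.:
# rm "$ZP/FileSystemMap.py~"
```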
You can also download these two files from here:
http://www.cordeos.com/downloads/zenoss/FileSystemMap.zip
We would appreciate any comments or cooperative assistance in improving this great open source project.
Going forward, you may want to consider using a zProperty query for the block size - then users can easily change it if necessary...
--
James Pulver
ZCA Member
LEPP Computer Group
Cornell University
We had considered (and already ran a few quick tests using) a zProperty for the block size. The problem we found is that the zProperty would be one setting per host. The few machines we have which are formatted with different block sizes are very specific "performance" machines where the OS (C:) drive uses the standard 4096-byte block size, but the data or temp drives are formatted with a large 16K block size. With a zProperty it would need to be the same block size for all disks in a single host - unless we made a chain of numeric zProperty settings, which would get pretty messy.
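One way to avoid a messy chain of numeric zProperties would be a single string zProperty holding per-drive overrides. A minimal sketch, assuming a hypothetical property such as zWinFsBlockSizes = "C:=4096,D:=16384" (both the property name and the format are invented here, not an existing Zenoss feature):

```python
DEFAULT_BLOCK_SIZE = 4096  # Windows default NTFS allocation unit

def parse_block_sizes(zprop):
    """Parse 'C:=4096,D:=16384' into {'C:': 4096, 'D:': 16384}."""
    sizes = {}
    for item in zprop.split(','):
        item = item.strip()
        if not item:
            continue
        drive, _, size = item.partition('=')
        if size.isdigit():
            sizes[drive.strip()] = int(size)
    return sizes

def block_size_for(drive, zprop=""):
    """Per-drive override from the zProperty, else the 4096-byte default."""
    return parse_block_sizes(zprop).get(drive, DEFAULT_BLOCK_SIZE)
```

This keeps a single per-host property while still letting the odd "performance" machine declare a different block size for its data drives.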
Ahh, yes, I was thinking it would indeed be a per-host setting. We don't usually use different settings for different disks inside a system here. As far as I know, Zenoss doesn't deal well with components that need special properties per component...
--
James Pulver
ZCA Member
LEPP Computer Group
Cornell University
Jay,
This is great stuff. Out of curiosity, have you filed a bug report with the patch attached to Zenoss Inc. using their Jira system? I'd guess they would like to include the fixes (http://jira.zenoss.com)
Jay,
I'm just starting to use Zenoss again after quite a few years, and after seeing this post I decided to try out a WMI connection to a Windows 2008 R2 host with a default and a non-default block size; disk utilization is, as you say, a rather important item to track. Zenoss 4.2 seems to do fine with both of them, and its size estimates match the OS's own readings, so I'd like to hear more about what kind of systems you were having problems with. I set my second drive to use a 16K block size.
I'd like to understand more about where this monitoring breaks down, and dig into why if it's going to cause me problems. My test is obviously a very small one, but I'm happy to see it seems to work OK for Windows 2008 R2 at least, as our environment has lots of similar hosts.
Thanks
Pete
Hi, I patched these two files with the expected success: all logical volumes were gone.
Yesterday, prior to applying ZenUp SP71, I managed to remove this quick patch to make sure I wouldn't mess up the diffs. Surprise: the bug is back! The Zenoss team apparently didn't include this fix in their SP, yet this bug was raised and fixed months before the release of SP71. Shouldn't we expect such fixes to be included in the next SP?
Eric
Our quick WMI patch was really to address our specific preference for drive display, not a bug. Many people may well want to see external, system or other drives in the monitoring. What might be nice is a new feature to interactively filter drive monitoring - but this would be a bonus, not an expected bug fix.
Pete,
In most cases the drive block size educated guess by Zenoss is correct, and the size usage is then calculated correctly regardless of the block size set. This issue only crops up when you have non-standard or non-sector-aligned drive geometry; then the calculation to get the block size gets messed up (royally). This generally only occurs if you ghost-clone, use repartitioning software, pull a virtual server from a physical machine, or use some non-standard drive geometry in a physical machine.
OK, I understand your point of view. But trying to collect perf data from a component that doesn't work may not be a bug, but it is undesirable. I wouldn't mind seeing Logical Volume stats if they were working, but since they make Zenoss throw errors because the WMI counters are not available, these should be ejected permanently, as the patch does. If you were about to find a solution, I wouldn't mind, but as I understand it, this is a WMI limitation, so why keep reporting the component when no data comes in?