Some background
I struggled for a long time with the difference between what df shows me and what Zenoss shows me.
After opening a ticket for this rare issue I now know the reason and want to share it with you.
When a Linux ext2/3 filesystem is created, a portion of the total blocks is reserved for the root user.
This is clever, since it makes it impossible for user processes to fill up the complete space.
Root processes like syslog are still able to write to the disk until the reserved space is used up as well.
If you didn't tune the ext2/3 creation properties, the reserved space is 5% of the total blocks.
A tool like df reports usage against the filesystem size minus that 5%.
This is correct, since once you reach the 100% level you are not allowed to write anymore.
But, and here it comes, when you use SNMP to query the disk usage it shows the usage including the 5%.
This means that when a filesystem is reported as 100% full by df, Zenoss will only report it as 95%.
This is technically correct, but 95% then means 100%: no disk space left.
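To make the difference concrete, here is a small sketch of the two calculations. The function name and the block counts are made up for illustration, and it assumes the default 5% reservation and ignores any other filesystem overhead.

```python
def usage_percentages(total_blocks, used_blocks, reserved_fraction=0.05):
    """Return (df-style percent, SNMP-style percent) for one filesystem."""
    # df-style: measure against the blocks a normal user can actually use
    usable_blocks = total_blocks * (1.0 - reserved_fraction)
    df_percent = 100.0 * used_blocks / usable_blocks
    # SNMP-style: measure against all blocks, reserved ones included
    snmp_percent = 100.0 * used_blocks / total_blocks
    return df_percent, snmp_percent


# Hypothetical filesystem of 1,000,000 blocks, filled up by a normal user
df_pct, snmp_pct = usage_percentages(total_blocks=1000000, used_blocks=950000)
print("df: %.1f%%  snmp: %.1f%%" % (df_pct, snmp_pct))
```

With these numbers the df-style value comes out at about 100% while the SNMP-style value stays at about 95%.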
Together with Zenoss support we created a working solution for this issue.
This is only a temporary solution, however; Zenoss is aware of the problem and progress can be found in this ticket:
http://dev.zenoss.org/trac/ticket/4378
Add a new zProperty
Let's create a new zProperty that says "this filesystem needs a correction".
The default is 1, which means no correction.
# zendmd
dmd.Devices._setProperty("zFileSystemSizeOffset", 1.0, type="float")
commit()
Create a transform rule for filesystem events
Now we need a transform rule on the filesystem events which calculates the percentage and the available space based on the zProperty.
Point your browser to the following URL: http://YOUR-ZENOSS:8080/zport/dmd/Events/Perf/Filesystem/editEventClassTransform
This brings you to an event transform rule. It will be called whenever an event comes in for /Perf/Filesystem.
This is the event class where the filesystem threshold events end up.
Now add this to the rule box:
import re

for f in device.os.filesystems():
    # Only look at the filesystem this event belongs to
    if f.name() != evt.component: continue
    # Extract the current value (used blocks) from the event message
    m = re.search(r"threshold of [^:]+: current value ([\d\.]+)", evt.message)
    if not m: continue
    usedBlocks = float(m.groups()[0])
    # Apply the correction factor (1 means no correction)
    totalBlocks = f.totalBlocks * getattr(device, "zFileSystemSizeOffset", 1)
    p = (usedBlocks / totalBlocks) * 100
    freeAmtGB = ((totalBlocks - usedBlocks) * f.blockSize) / 1073741824
    # Make a nicer summary
    evt.summary = "Filesystem threshold exceeded: %3.1f%% used (%3.2f GB free)" % (p,freeAmtGB)
    break
You may notice that I multiply totalBlocks by zFileSystemSizeOffset, so if zFileSystemSizeOffset = 1 (which is the default) there is no change in the calculation.
Only when you set it to something other than 1, like .95, will it calculate with a 5% difference.
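The effect of the offset can be tried outside Zenoss with a standalone sketch of the same arithmetic. The function name, the sample message text, the block counts and the 4096-byte block size are all made up for illustration; only the regular expression and the formulas are taken from the transform.

```python
import re


def corrected_summary(message, total_blocks, block_size, size_offset=1.0):
    """Rebuild the event summary the way the transform does (illustration only)."""
    m = re.search(r"threshold of [^:]+: current value ([\d\.]+)", message)
    if not m:
        return None
    used_blocks = float(m.group(1))
    # Shrink the total by the correction factor (1.0 means no correction)
    corrected_total = total_blocks * size_offset
    percent = (used_blocks / corrected_total) * 100
    free_gb = ((corrected_total - used_blocks) * block_size) / 1073741824.0
    return "Filesystem threshold exceeded: %3.1f%% used (%3.2f GB free)" % (
        percent, free_gb)


# Hypothetical event message; real messages may be worded differently
msg = "threshold of high disk usage exceeded: current value 930000.00"
print(corrected_summary(msg, 1000000, 4096))        # offset 1: uncorrected
print(corrected_summary(msg, 1000000, 4096, 0.95))  # offset .95: corrected
```

With the offset at 1 the summary reports 93.0% used; with .95 the same event is reported as roughly 97.9% used, which is much closer to what df would show.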
Changing the threshold level
Go to the following URL:
http://your-zenoss-server:8080/zport/dmd/Devices/Server/rrdTemplates/FileSystem
Click on the defined threshold.
Now change the Max Value to:
(here.totalBlocks * here.zFileSystemSizeOffset ) * .90
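As a quick check of what this expression evaluates to, here is the same arithmetic as a plain function. The function name and the block count are hypothetical; in Zenoss the values come from the filesystem component and the zProperty.

```python
def threshold_max_blocks(total_blocks, size_offset=1.0, fraction=0.90):
    """Number of used blocks at which the threshold fires."""
    return (total_blocks * size_offset) * fraction


# Hypothetical filesystem of 1,000,000 blocks
print("%.0f" % threshold_max_blocks(1000000))        # no correction
print("%.0f" % threshold_max_blocks(1000000, 0.95))  # 5% reserved blocks
```

Without correction the threshold sits at about 900,000 used blocks; with an offset of .95 it moves down to about 855,000, which is 90% of the space a normal user can actually fill.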
Changing the graph point
Go to the following URL:
http://your-zenoss-server:8080/zport/dmd/Devices/Server/rrdTemplates/FileSystem
Click on the graph definition: Utilization
Then choose the graph point usedBlocks.
Change the RPN to:
${here/totalBlocks},${here/zFileSystemSizeOffset},*,/,100,*
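If RPN is unfamiliar, the following sketch evaluates the expression by hand. It assumes that the usedBlocks value is pushed on the stack before the expression runs and that the ${here/...} parts have already been replaced by numbers; the tiny evaluator and the sample values are made up for illustration and only cover the four basic operators.

```python
def eval_rpn(expression, value):
    """Evaluate a comma-separated RPN expression, starting with `value` on the stack."""
    operators = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a / b,
    }
    stack = [float(value)]
    for token in expression.split(","):
        if token in operators:
            b = stack.pop()
            a = stack.pop()
            stack.append(operators[token](a, b))
        else:
            stack.append(float(token))
    return stack.pop()


# totalBlocks = 1000000 and zFileSystemSizeOffset = 0.95 already substituted
rpn = "1000000,0.95,*,/,100,*"
print("%.1f%%" % eval_rpn(rpn, 950000))  # 950,000 used blocks
```

In other words, the expression computes usedBlocks / (totalBlocks * zFileSystemSizeOffset) * 100, so a filesystem that df calls full is graphed at about 100% instead of 95%.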
What about the OS tab?
The OS tab of a device shows a table of all filesystems.
At this point I'm not able to change the template so that it also takes the 5% into account.
This means that the OS tab will still say there is 95% in use while it is really 100%.
I'm still looking for a solution here.
How does it work?
Browse to the Linux device and go to the zProperties of the device.
Now change zFileSystemSizeOffset to .95
You could also do this on the device class /Devices/Server/Linux, like I did.
Zenoss will now calculate the percentage and the available space with the 5% difference in mind.