Problem with snmp and faulty size reporting.


43996 Views 5 Replies Latest reply: Nov 24, 2009 10:39 AM by chitambira

Falk

89 posts since
Jul 27, 2007

Jul 8, 2008 3:51 AM

Hi,

When monitoring filesystems over SNMP I get faulty readings for all Linux servers.
I guess that it is a problem with my snmpd conf.

Here is what I see.

In zenoss:


 Mount   Total bytes   Used bytes   Free bytes   % Util
 /       9.4GB         8.8GB        565.5MB      94

and on the server:


# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             9.4G  8.9G   78M 100% /
tmpfs                 253M     0  253M   0% /lib/init/rw
udev                   10M   52K   10M   1% /dev
tmpfs                 253M     0  253M   0% /dev/shm

Any ideas why this is?
Is it tmpfs that takes the remaining space and isn't reported on / by snmpd?

--
Regards Folke

beanfield
161 posts since
Apr 16, 2008

1. Jul 8, 2008 1:26 PM (in response to Falk)
RE: Problem with snmp and faulty size reporting.

I get similar results. For instance:

# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/xvda3 61G 5.5G 52G 10% /

and in zenoss:

Mount Total bytes Used bytes Free bytes % Util Lock
/ 60.4GB 5.4GB 54.9GB 8

However, if I use "df" without the -h option (human readable), I get the following:

# df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/xvda3 63282372 5665584 54402184 10% /

So I have 5,665,584 KB of used space. If I take the used 1K-blocks value, 5665584, and divide by 1024 twice, I get 5.4031219482421875. So even though the used size should read 5.4G, the -h option of df rounds it up to 5.5G. To answer your question directly, I believe the difference comes from df -h rounding up instead of rounding to the nearest tenth. This is confirmed by the plain df output above and by the SNMP values below. You can see what snmp is returning by doing an snmpwalk on hrStorageTable.

snmpwalk -v2c -c COMMUNITY HOSTNAME hrStorageSize
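
To make the df rounding described above concrete, here is a minimal Python sketch. It assumes that df -h always rounds up to the next tenth, which is what the numbers in this thread suggest; the helper names are made up for illustration.

```python
import math


def kib_to_gib(kib_blocks):
    """Convert a count of 1 KiB blocks to GiB."""
    return kib_blocks / 1024 / 1024


def ceil_to_tenth(value):
    """Round up to the next tenth (assumed df -h behaviour)."""
    return math.ceil(value * 10) / 10


used_kib = 5665584  # "Used" column from the plain df output above
used_gib = kib_to_gib(used_kib)

print(used_gib)                 # exact value in GiB
print(round(used_gib, 1))       # rounded to the nearest tenth
print(ceil_to_tenth(used_gib))  # rounded up to the next tenth
```

Rounding to the nearest tenth gives 5.4, the figure zenoss shows, while rounding up gives 5.5, the figure df -h shows.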

You can get the snmp index number of the volume that you're trying to find the size on by clicking on it under File Systems under the OS tab. For example, the system above is my zenoss server and I'll query it using "localhost" for the HOSTNAME and public for the COMMUNITY. The Snmp Index is 4.

# snmpwalk -v2c -c public localhost hrStorageTable
HOST-RESOURCES-MIB::hrStorageIndex.1 = INTEGER: 1
HOST-RESOURCES-MIB::hrStorageIndex.2 = INTEGER: 2
HOST-RESOURCES-MIB::hrStorageIndex.3 = INTEGER: 3
HOST-RESOURCES-MIB::hrStorageIndex.4 = INTEGER: 4
HOST-RESOURCES-MIB::hrStorageIndex.5 = INTEGER: 5
HOST-RESOURCES-MIB::hrStorageIndex.6 = INTEGER: 6
HOST-RESOURCES-MIB::hrStorageIndex.7 = INTEGER: 7
HOST-RESOURCES-MIB::hrStorageType.1 = OID: HOST-RESOURCES-TYPES::hrStorageOther
HOST-RESOURCES-MIB::hrStorageType.2 = OID: HOST-RESOURCES-TYPES::hrStorageRam
HOST-RESOURCES-MIB::hrStorageType.3 = OID: HOST-RESOURCES-TYPES::hrStorageVirtualMemory
HOST-RESOURCES-MIB::hrStorageType.4 = OID: HOST-RESOURCES-TYPES::hrStorageFixedDisk
HOST-RESOURCES-MIB::hrStorageType.5 = OID: HOST-RESOURCES-TYPES::hrStorageFixedDisk
HOST-RESOURCES-MIB::hrStorageType.6 = OID: HOST-RESOURCES-TYPES::hrStorageFixedDisk
HOST-RESOURCES-MIB::hrStorageType.7 = OID: HOST-RESOURCES-TYPES::hrStorageFixedDisk
HOST-RESOURCES-MIB::hrStorageDescr.1 = STRING: Memory Buffers
HOST-RESOURCES-MIB::hrStorageDescr.2 = STRING: Real Memory
HOST-RESOURCES-MIB::hrStorageDescr.3 = STRING: Swap Space
HOST-RESOURCES-MIB::hrStorageDescr.4 = STRING: /
HOST-RESOURCES-MIB::hrStorageDescr.5 = STRING: /sys
HOST-RESOURCES-MIB::hrStorageDescr.6 = STRING: /sys/kernel/debug
HOST-RESOURCES-MIB::hrStorageDescr.7 = STRING: /boot
HOST-RESOURCES-MIB::hrStorageAllocationUnits.1 = INTEGER: 1024 Bytes
HOST-RESOURCES-MIB::hrStorageAllocationUnits.2 = INTEGER: 1024 Bytes
HOST-RESOURCES-MIB::hrStorageAllocationUnits.3 = INTEGER: 1024 Bytes
HOST-RESOURCES-MIB::hrStorageAllocationUnits.4 = INTEGER: 4096 Bytes
HOST-RESOURCES-MIB::hrStorageAllocationUnits.5 = INTEGER: 4096 Bytes
HOST-RESOURCES-MIB::hrStorageAllocationUnits.6 = INTEGER: 4096 Bytes
HOST-RESOURCES-MIB::hrStorageAllocationUnits.7 = INTEGER: 1024 Bytes
HOST-RESOURCES-MIB::hrStorageSize.1 = INTEGER: 1048740
HOST-RESOURCES-MIB::hrStorageSize.2 = INTEGER: 1048740
HOST-RESOURCES-MIB::hrStorageSize.3 = INTEGER: 1052248
HOST-RESOURCES-MIB::hrStorageSize.4 = INTEGER: 15820593
HOST-RESOURCES-MIB::hrStorageSize.5 = INTEGER: 0
HOST-RESOURCES-MIB::hrStorageSize.6 = INTEGER: 0
HOST-RESOURCES-MIB::hrStorageSize.7 = INTEGER: 101086
HOST-RESOURCES-MIB::hrStorageUsed.1 = INTEGER: 19500
HOST-RESOURCES-MIB::hrStorageUsed.2 = INTEGER: 1039048
HOST-RESOURCES-MIB::hrStorageUsed.3 = INTEGER: 922588
HOST-RESOURCES-MIB::hrStorageUsed.4 = INTEGER: 1416399
HOST-RESOURCES-MIB::hrStorageUsed.5 = INTEGER: 0
HOST-RESOURCES-MIB::hrStorageUsed.6 = INTEGER: 0
HOST-RESOURCES-MIB::hrStorageUsed.7 = INTEGER: 17733

I can see that hrStorageDescr.4 is "/" so I know I'm querying the correct volume. My hrStorageAllocationUnits.4 is 4096 Bytes, which is my block size. hrStorageSize.4 is 15820593, and multiplying that by 4096 I get 64801148928. That's the total size of my disk in bytes. hrStorageUsed.4 is 1416399, and multiplying that by 4096 I get 5801570304. That's the total size of used space in bytes. Note: to get free space you have to subtract the used space from the size of the disk, as there is no OID in this table for available space.

Anyway, my used bytes value is 5801570304, and if I divide this by 1024 I get 5665596, which is essentially the same as the used 1K-blocks value from df (without the -h) above, 5665584. The small difference is presumably just data written between the two readings.
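
The same arithmetic as a short Python sketch, using the values for index 4 from the snmpwalk output above. The variable names are only illustrative, and free space is derived by subtraction because hrStorageTable has no column for it.

```python
# Values for index 4 ("/") taken from the snmpwalk output above
allocation_unit = 4096    # hrStorageAllocationUnits.4, in bytes
size_units = 15820593     # hrStorageSize.4
used_units = 1416399      # hrStorageUsed.4

total_bytes = size_units * allocation_unit
used_bytes = used_units * allocation_unit
free_bytes = total_bytes - used_bytes  # derived: no OID for free space

# Convert to 1 KiB blocks to compare with plain df
total_kib = total_bytes // 1024
used_kib = used_bytes // 1024

print("total bytes:", total_bytes)
print("used bytes: ", used_bytes)
print("free bytes: ", free_bytes)
print("total 1K-blocks:", total_kib)
print("used 1K-blocks: ", used_kib)
```

The total comes out as 63282372 1K-blocks, matching the df output exactly, and the used figure is 5665596.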

Again, that's all if I understand this correctly. :)

Falk
89 posts since
Jul 27, 2007

2. Jul 14, 2008 5:42 AM (in response to beanfield)
RE: Problem with snmp and faulty size reporting.

"beanfield" wrote:

Again, that's all if I understand this correctly. :)

The thing is that the disk is full before it sends out an alert.
I have set the threshold to 98% ( :oops: ), so when the small disks show the wrong value things go bad :p

Of course I can change the critical threshold, but it would be better if the graphs were correct.

Did you change anything in your graph template so that it showed the correct value?

--
Regards Folke

beanfield
161 posts since
Apr 16, 2008

3. Jul 14, 2008 10:39 AM (in response to Falk)
RE: Problem with snmp and faulty size reporting.

"folke" wrote:

"beanfield" wrote:

Again, that's all if I understand this correctly. :)

The thing is that the disk is full before it sends out an alert.
I have set the threshold to 98% ( :oops: ), so when the small disks show the wrong value things go bad :p

Of course I can change the critical threshold, but it would be better if the graphs were correct.

Did you change anything in your graph template so that it showed the correct value?

--
Regards Folke

hmmm....when you say "the disk is full before it sends out an alert", how are you finding out that the disk is full? Are you getting errors when you try and write to the disk in your terminal...and more accurately are you getting errors in /var/log/messages to the effect "no space left on device"?

The only reason I ask is that some utilities/scripts that watch disk space actually just do a "df -h", which, as pointed out before, rounds up and can say the disk is full when it may still have some space available. For instance, logwatch does this (at least on Debian). I installed logwatch and looked at the script that calculates free space, /usr/share/logwatch/scripts/services/zz-disk_space:
if ($OSname eq "Linux") { $df_options = "-h -l -x tmpfs";

So if you're basing the disk status off of something like logwatch....it could be incorrectly reporting that the disk is full. Can you provide the following output from a server that you know has the issue where it filled the disk before zenoss alerted?

df -h
df
snmpwalk -v2c -c COMMUNITY_STRING HOSTNAME 1.3.6.1.2.1.25.2.3

Be sure and replace "COMMUNITY_STRING" and "HOSTNAME" appropriately.

Falk
89 posts since
Jul 27, 2007

4. Jul 16, 2008 6:11 AM (in response to beanfield)
RE: Problem with snmp and faulty size reporting.

"beanfield" wrote:

So if you're basing the disk status off of something like logwatch....it could be incorrectly reporting that the disk is full. Can you provide the following output from a server that you know has the issue where it filled the disk before zenoss alerted?

df -h
df
snmpwalk -v2c -c COMMUNITY_STRING HOSTNAME 1.3.6.1.2.1.25.2.3

Be sure and replace "COMMUNITY_STRING" and "HOSTNAME" appropriately.

I'm away from work for some time, but I'll try to connect to the zenoss server and see what the snmpwalk outputs.

The server broke down, with MySQL the first to fail when it tried to write to the disk.
So I cleaned up some logs and then we were up and running again.

--
Regards Folke

chitambira
711 posts since
Oct 15, 2008

5. Nov 24, 2009 10:39 AM (in response to Falk)
Re: RE: Problem with snmp and faulty size reporting.

This is a very old thread, but I thought someone might stumble upon it when searching for a similar problem.
The issue with Linux filesystems is the filesystem offset, which is not accounted for by default in zenoss, but which df does account for.
If you look at the two cases given above, you can see that total and used are both OK, but free is the culprit. The algorithm to calculate free space should be:

free = (total*offset) - used

offset is normally 0.95 for Linux filesystems, because 5% of the filesystem is reserved by default
docs/DOC-3233
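
As a rough check of this formula against beanfield's plain df figures, here is a minimal Python sketch. It assumes that offset means the fraction of the filesystem usable by non-root users (0.95 when 5% is reserved); the function name is hypothetical, and the real reserve on a given filesystem need not be exactly 5%.

```python
def free_for_users(total, used, offset=0.95):
    """Estimate the space available to non-root users.

    offset is the usable fraction of the filesystem (assumed 0.95,
    i.e. a 5% reserve); the actual reserve may differ.
    """
    return total * offset - used


# 1K-block figures from beanfield's plain df output
total_kib = 63282372
used_kib = 5665584
df_available_kib = 54402184

naive_kib = total_kib - used_kib
estimate_kib = free_for_users(total_kib, used_kib)

print(f"total - used:  {naive_kib} KiB")
print(f"with offset:   {estimate_kib:.0f} KiB")
print(f"df Available:  {df_available_kib} KiB")
```

The estimate with the offset lands within about 0.1% of df's Available column, while plain total minus used overstates it by roughly 3 GB. It would also fit the first post, where utilization computed from the raw SNMP values reads 94% while df already reports 100% for non-root users.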
