Archived community.zenoss.org
4909 Views 3 Replies Latest reply: May 21, 2012 5:25 AM by jcurry
zenewb Newbie 2 posts since May 16, 2012

May 17, 2012 1:12 AM

Zenoss interface monitoring problems

Hi,

 

I have been trying to monitor interface status for a couple of different types of devices and I feel I have hit a brick wall. I would be very grateful for the community's help in sorting out these issues:

 

Requirements:

1. Polling interface status for certain interfaces on Alcatel 7450 Switches.

2. Polling interface status for certain interface aggregates on a number of Solaris boxes.

 

We would like to receive alerts when a specific interface goes from up/up to up/down and vice versa. The document I followed to configure this is docs/DOC-2494.

 

 

Problems faced with requirement 1 above (polling interface status for certain interfaces on Alcatel switch RTEMR0000001TS):

1. On the Alcatel 7450 switch, interface 1/2/23 is marked to be monitored ('monitored=true'), but its status does not seem to get updated, and no alert is raised when its ifOperStatus goes from down to up or vice versa. The 'Interfaces' section of Zenoss also keeps showing (for example) 'Up/Down' when the interface is actually admin up and oper up, and so should be shown as Up/Up. A manual snmpwalk on OID 1.3.6.1.2.1.2.2.1.8 returns values, and an snmpget on the individual sub-OIDs returns values as well. We poll with SNMPv3 using authPriv, and the polling interval is set to 300 seconds.

[Screenshots: emr1ts ethernetcsmacd.jpg, emr1ts interfaces.jpg, emr1ts config properties.png]

 

snmpwalk:

# snmpwalk -v 3 -l authPriv -a SHA -A xxxxxx -x DES -X xxxxxxx -u SNMP 192.168.16.129 1.3.6.1.2.1.2.2.1.8 | head -n50

IF-MIB::ifOperStatus.1 = INTEGER: up(1)

IF-MIB::ifOperStatus.2 = INTEGER: up(1)

IF-MIB::ifOperStatus.3 = INTEGER: up(1)

IF-MIB::ifOperStatus.4 = INTEGER: up(1)

IF-MIB::ifOperStatus.5 = INTEGER: lowerLayerDown(7)

IF-MIB::ifOperStatus.6 = INTEGER: up(1)

IF-MIB::ifOperStatus.35684352 = INTEGER: up(1)

IF-MIB::ifOperStatus.35717120 = INTEGER: down(2)

IF-MIB::ifOperStatus.35749888 = INTEGER: up(1)

IF-MIB::ifOperStatus.35782656 = INTEGER: up(1)

IF-MIB::ifOperStatus.35815424 = INTEGER: up(1)

IF-MIB::ifOperStatus.35848192 = INTEGER: up(1)

IF-MIB::ifOperStatus.35880960 = INTEGER: up(1)

IF-MIB::ifOperStatus.35913728 = INTEGER: up(1)

IF-MIB::ifOperStatus.35946496 = INTEGER: down(2)

IF-MIB::ifOperStatus.35979264 = INTEGER: down(2)

IF-MIB::ifOperStatus.36012032 = INTEGER: down(2)

IF-MIB::ifOperStatus.36044800 = INTEGER: up(1)

IF-MIB::ifOperStatus.36077568 = INTEGER: up(1)

IF-MIB::ifOperStatus.36110336 = INTEGER: down(2)
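A quick way to see which interfaces are not up in a walk like the one above is to parse it. This is a minimal sketch, assuming net-snmp's default output format with IF-MIB loaded (exactly as shown above); the function and pattern names are hypothetical, and the sample lines are copied from the walk. The number after `ifOperStatus.` is the ifIndex of each interface.

```python
import re

# Matches net-snmp output lines such as:
#   IF-MIB::ifOperStatus.35717120 = INTEGER: down(2)
WALK_LINE = re.compile(r"^IF-MIB::ifOperStatus\.(\d+) = INTEGER: (\w+)\((\d+)\)$")


def parse_oper_status(walk_output):
    """Map each ifIndex to its (status name, numeric value)."""
    statuses = {}
    for line in walk_output.splitlines():
        match = WALK_LINE.match(line.strip())
        if match:  # skip blank lines and anything that is not an ifOperStatus row
            index, name, value = match.groups()
            statuses[int(index)] = (name, int(value))
    return statuses


sample = """
IF-MIB::ifOperStatus.5 = INTEGER: lowerLayerDown(7)
IF-MIB::ifOperStatus.35684352 = INTEGER: up(1)
IF-MIB::ifOperStatus.35717120 = INTEGER: down(2)
"""
status = parse_oper_status(sample)
not_up = sorted(i for i, (_, v) in status.items() if v != 1)
print(not_up)  # [5, 35717120]
```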

 

#snmpget -v 3 -l authPriv -a SHA -A xxxxx -x DES -X xxxxx -u SNMP 192.168.16.129 1.3.6.1.2.1.2.2.1.8|head -n50

IF-MIB::ifOperStatus = No Such Instance currently exists at this OID

(I am assuming that Zenoss takes care of fetching the individual instances, as below.)

#snmpget -v 3 -l authPriv -a SHA -A xxxxx -x DES -X xxxxx -u SNMP 192.168.16.129 1.3.6.1.2.1.2.2.1.8.1|head -n50

IF-MIB::ifOperStatus.1 = INTEGER: up(1)

#snmpget -v 3 -l authPriv -a SHA -A xxxxx -x DES -X xxxxx -u SNMP 192.168.16.129 1.3.6.1.2.1.2.2.1.8.2|head -n50

IF-MIB::ifOperStatus.2 = INTEGER: up(1)

 

> When I remodel the device, the interface status changes to Up/Up. However, we would like the status of 1/2/23 on the 'Interfaces' tab to change to Up/Up, and an alert to be generated, without having to remodel. I have also tried deleting and re-adding the device.

 

As an aside, in the zenperfsnmp logs we regularly see 'oid 1.3.6.1.2.1.2.2.1.8 is bad' errors for the switch.

2012-05-17 14:32:40,290 WARNING zen.zenperfsnmp: Error reading value for "ifOperStatus" on 192.168.16.129 (oid .1.3.6.1.2.1.2.2.1.8 is bad)

2012-05-17 14:32:40,291 WARNING zen.zenperfsnmp: Error reading value for "ifOutErrors" on 192.168.16.129 (oid .1.3.6.1.2.1.2.2.1.20 is bad)

2012-05-17 14:32:40,291 WARNING zen.zenperfsnmp: Error reading value for "ifOutUcastPackets" on 192.168.16.129 (oid .1.3.6.1.2.1.2.2.1.17 is bad)

2012-05-17 14:32:40,292 WARNING zen.zenperfsnmp: Error reading value for "ifOutOctets" on 192.168.16.129 (oid .1.3.6.1.2.1.2.2.1.16 is bad)

2012-05-17 14:32:40,292 WARNING zen.zenperfsnmp: Error reading value for "ifInErrors" on 192.168.16.129 (oid .1.3.6.1.2.1.2.2.1.14 is bad)

2012-05-17 14:32:40,292 WARNING zen.zenperfsnmp: Error reading value for "ifInUcastPackets" on 192.168.16.129 (oid .1.3.6.1.2.1.2.2.1.11 is bad)

2012-05-17 14:32:40,292 WARNING zen.zenperfsnmp: Error reading value for "ifInOctets" on 192.168.16.129 (oid .1.3.6.1.2.1.2.2.1.10 is bad)
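To tell whether these warnings hit every polling cycle or only some of them, it can help to tally them per device and datapoint over a longer stretch of the log. This is a minimal sketch that assumes every warning has exactly the shape shown above; the function and pattern names are hypothetical, and reading the actual zenperfsnmp log file is left to the caller.

```python
import re
from collections import Counter

# Matches zenperfsnmp warnings like the excerpt above, capturing
# the datapoint name, the device address and the failing OID.
BAD_OID = re.compile(
    r'Error reading value for "(?P<datapoint>[^"]+)" on (?P<device>\S+) '
    r"\(oid (?P<oid>[.\d]+) is bad\)"
)


def count_bad_oids(log_lines):
    """Count 'oid ... is bad' warnings per (device, datapoint)."""
    counts = Counter()
    for line in log_lines:
        match = BAD_OID.search(line)
        if match:
            counts[(match["device"], match["datapoint"])] += 1
    return counts


log = [
    '2012-05-17 14:32:40,290 WARNING zen.zenperfsnmp: Error reading value for '
    '"ifOperStatus" on 192.168.16.129 (oid .1.3.6.1.2.1.2.2.1.8 is bad)',
    '2012-05-17 14:32:40,292 WARNING zen.zenperfsnmp: Error reading value for '
    '"ifInOctets" on 192.168.16.129 (oid .1.3.6.1.2.1.2.2.1.10 is bad)',
]
for (device, datapoint), n in sorted(count_bad_oids(log).items()):
    print(device, datapoint, n)
```

If the counts grow by one for every polling cycle, the failures are constant rather than intermittent.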

 

Not sure what is going on here, since the throughput graphs (which depend on ifInOctets and ifOutOctets) appear to be generated just fine (as shown in the screenshots).

 

 

Problems faced with requirement 2 above (interface status for certain interface aggregates shows up as 'Unknown' on a number of Solaris boxes):

> All aggregates on the Solaris box show up as Up/Unknown, although both the admin status and the oper status are 'Up'. snmpwalk/snmpget (as above) return valid and correct values. I have tried deleting, re-adding and remodelling the device, with no luck. We see constant 'oid is bad' errors for this device as well, although the graphs seem to be generated just fine.

> What could be causing this? Why does Zenoss show the interface status as Unknown when it is actually 'Up'? As an aside, when I deleted and re-added the device, the interfaces showed up as Up/Up for a short while, after which they went into 'Up/Unknown' status, triggering the ifOperStatus warning (whereas for the switch in point 1 above, not even these alarms are generated).

Also, how could I go about creating an event transform (or other) that would suppress alerts when the interface is in 'Unknown' status and only generate an event when it is in 'Down' status?
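One possible shape for such a transform is sketched below. It is a hypothetical sketch, not a tested Zenoss transform: it assumes the threshold event carries the polled ifOperStatus value in `evt.current`, and it keeps only events whose value is down(2), so anything else (including unknown(4) or an unreadable value) is dropped. Setting `evt._action = "drop"` is the usual way a Zenoss event transform discards an event. The `Event` class and the `transform` function exist only so the sketch runs outside Zenoss; in an actual event class transform, only the body of `transform` would be used, with `evt` supplied by Zenoss.

```python
# Hypothetical stand-in for the Zenoss event object; inside a real
# transform, Zenoss provides `evt` itself.
class Event:
    def __init__(self, current):
        self.current = current   # assumed: the polled ifOperStatus value
        self._action = "status"  # placeholder default for this sketch only


IF_OPER_DOWN = 2  # IF-MIB ifOperStatus: down(2)


def transform(evt):
    """Keep only events whose polled ifOperStatus is down(2)."""
    try:
        value = int(float(evt.current))
    except (TypeError, ValueError):
        value = None  # missing or non-numeric reading: treat as not down
    if value != IF_OPER_DOWN:
        evt._action = "drop"  # discard the event
    return evt


print(transform(Event(2.0))._action)  # status
print(transform(Event(4.0))._action)  # drop
```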

 

[Screenshots: 5dv_interfaces.png, 5dv_event.png]

 

Guys, any help would be greatly appreciated. I am standing by to provide any further logs and answer any queries.

  • jcurry ZenossMaster 1,021 posts since Apr 15, 2008
    1. May 17, 2012 5:19 AM (in response to zenewb)
    Re: Zenoss interface monitoring problems

    What a well-detailed problem!  Not sure whether you are looking for "free" pointers or for "fee" come-and-fix-my-problem help???

     

    The issue with your first scenario is the difference between configuration polling and performance monitoring.  The panel that shows the details of your interfaces, including the Admin / Oper Up / Down data, is "configuration" data that is gathered by the zenmodeler daemon, typically every 12 hours or when you do a manual Model Device.  The data collected by zenmodeler is controlled by the modeler plugins applied either to a specific device or to a device class hierarchy.  The data is held in the Zope database.

     

    This particular interfaces panel is somewhat unusual in that it also has some data (the Admin / Oper stuff) that could also be viewed as performance data.

     

    Performance data is gathered by performance templates.  SNMP data is gathered by the zenperfsnmp daemon, typically every 5 minutes.  Command templates are driven by the zencommand daemon and you can control the polling interval.  Data is held in Round Robin Database (RRD) files.   In addition to polling for performance data, a template can also threshold that data and that's where you can generate an alert as per the append you referenced.  There is no connection between the interface status held in the Zope database and the template that you have setup to poll for the same ifOper / ifAdmin values in a performance template. 
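The threshold idea can be illustrated with a simplified min/max check. This is a hypothetical sketch, not Zenoss's threshold implementation: the class and the choice of min = max = 1 (so that any ifOperStatus other than up(1) breaches) are assumptions for illustration. The point is that the check runs against each freshly polled value and never consults the modeled status in the Zope database.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MinMaxThreshold:
    """Simplified model of a min/max threshold on one datapoint."""
    datapoint: str
    min_value: Optional[float] = None
    max_value: Optional[float] = None

    def violated(self, value: float) -> bool:
        # A breach is any value below the minimum or above the maximum.
        if self.min_value is not None and value < self.min_value:
            return True
        if self.max_value is not None and value > self.max_value:
            return True
        return False


# Hypothetical setting: anything other than up(1) breaches the threshold.
oper_up_only = MinMaxThreshold("ifOperStatus", min_value=1, max_value=1)

for polled in (1, 2, 7):  # up, down, lowerLayerDown
    print(polled, oper_up_only.violated(polled))
```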

     

    Hence, your interfaces panel will only show the status change on the next modeler poll - as you have found.  You can generate a TRAP from your template every 5 minutes when zenperfsnmp runs, but it won't affect the Zope database.

     

    I guess what you need is an event transform for your TRAP that modifies the value in the Zope database, based on the value that the performance template brings back.  Not sure whether you can do this but I suspect so.  Perhaps someone else could contribute that???  It would be a good addition to the append that you referenced above.
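The logic such a transform would need is sketched below with stand-in objects. Everything here is hypothetical: `Device`, `Interface` and `sync_oper_status` are not Zenoss classes or APIs, and a real transform would have to look the component up through Zenoss's own object model and make sure the change is persisted. It only shows the idea of overwriting the modeled status with the value the template just polled.

```python
from dataclasses import dataclass, field


# Hypothetical stand-ins for the modeled device and its interfaces.
@dataclass
class Interface:
    name: str
    operStatus: int  # value last written by the modeler


@dataclass
class Device:
    interfaces: dict = field(default_factory=dict)


def sync_oper_status(device, component, polled_value):
    """Overwrite the modeled status with the value the template just polled."""
    iface = device.interfaces.get(component)
    if iface is None:
        return False  # component not modeled (or renamed): leave it alone
    iface.operStatus = int(polled_value)
    return True


dev = Device({"1/2/23": Interface("1/2/23", operStatus=2)})
sync_oper_status(dev, "1/2/23", 1.0)
print(dev.interfaces["1/2/23"].operStatus)  # 1
```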

     

    If you don't get a solution, feel free to come back and we could talk about a small piece of consultancy to solve the problem.

     

    Cheers,

    Jane

  • jcurry ZenossMaster 1,021 posts since Apr 15, 2008
    3. May 21, 2012 5:25 AM (in response to zenewb)
    Re: Zenoss interface monitoring problems

    A couple of thoughts.....

     

    Does the event class /Status/IpInterface actually exist??  It does in some versions of Zenoss and not in others.  If it doesn't exist, simply create it.

     

    Your first snmpget command, the one getting 1.3.6.1.2.1.2.2.1.8, returns "No Such Instance" because an snmpget asks for one specific OID, which has to include the instance number, whereas snmpwalk simply gives a starting point and asks for everything that starts with that OID.  The way that the component performance templates work (for interfaces, filesystems, etc.) is that when the device is modelled, the instance number is gathered as part of the modelling cycle (every 12 hours).  For interfaces, this is the SNMP index number, ifIndex.  In your first snmpwalk above, you can see these numbers as the last element of the OID: 1, 2, 3, 4, 5, 6, 35684352, and so on.  This value is used to populate the snmpindex attribute of your component (an interface in this case).  When the performance template runs for a particular interface, the snmpindex is appended to the OID in the snmpget (note that templates do use snmpget, not snmpwalk).
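The relationship between the column OID, the snmpindex and the value returned can be sketched as follows. This is a minimal sketch: the enumeration is the standard IF-MIB (RFC 2863) one for ifOperStatus, while the function names are hypothetical. It reproduces what the manual snmpget calls in the question did by hand: append the ifIndex to 1.3.6.1.2.1.2.2.1.8.

```python
IF_OPER_STATUS = "1.3.6.1.2.1.2.2.1.8"  # ifOperStatus column of the IF-MIB ifTable

# ifOperStatus enumeration from IF-MIB (RFC 2863).
OPER_STATUS_NAMES = {
    1: "up", 2: "down", 3: "testing", 4: "unknown",
    5: "dormant", 6: "notPresent", 7: "lowerLayerDown",
}


def instance_oid(column_oid, snmpindex):
    """The OID an snmpget needs: the column OID plus the interface's ifIndex."""
    return f"{column_oid}.{snmpindex}"


def describe(value):
    """Translate a numeric ifOperStatus into its IF-MIB name."""
    return OPER_STATUS_NAMES.get(value, f"invalid({value})")


print(instance_oid(IF_OPER_STATUS, 35717120))  # 1.3.6.1.2.1.2.2.1.8.35717120
print(describe(7))  # lowerLayerDown
```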

     

    One thing that can really mess up Zenoss (and any system that works with SNMP this way) is if the index numbers of interfaces change.  With real interfaces, this almost never happens unless you replace a physical NIC, but with logical interfaces, some implementations do change the ifIndex.  Could this be happening??
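One way to check for this is to record the interface-name-to-ifIndex mapping at modelling time (for example from a walk of ifDescr, 1.3.6.1.2.1.2.2.1.2) and compare it with a fresh walk later. This is a minimal sketch under the assumption that the interface names themselves stay stable; the function name and the example indexes (including `aggr1`) are hypothetical.

```python
def changed_indexes(modeled, current):
    """Compare interface-name -> ifIndex maps from two points in time.

    Returns {name: (old_index, new_index)} for every interface whose index
    moved; interfaces missing from either map are reported with None.
    """
    changes = {}
    for name in modeled.keys() | current.keys():
        old, new = modeled.get(name), current.get(name)
        if old != new:
            changes[name] = (old, new)
    return changes


# Hypothetical example: an aggregate whose index moved after a reconfiguration.
at_model_time = {"1/2/23": 35717120, "aggr1": 6}
now = {"1/2/23": 35717120, "aggr1": 9}
print(changed_indexes(at_model_time, now))  # {'aggr1': (6, 9)}
```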

     

    The "oid <blah> is bad" may simply be timeouts.  Have you checked that you can snmpwalk those devices that are complaining in the log?  If it is intermittent, try increasing the zSnmpTimeout from the standard 2.5 seconds.

    If the standard performance templates for interfaces miss values occasionally, the underlying RRD subsystem will probably cope.

     

    You have posted an event of class /Status/IpInterface (so forget my question about whether it exists; it obviously does!).  Are you saying that you never see these for switches??  Your template is obviously correct if it can generate an event for some sort of device.  Have you tested with snmpwalk after taking the interface administratively down?

     

    Your Solaris one is rather more weird; I'm not sure what your "aggregate" interfaces are (it's many years since I worked with Solaris).  Could you post an snmpwalk of the ifTable for a test Solaris box?  If snmpwalk is getting the right answers, then Zenoss should get exactly the same answer, unless there is something going on with the ifIndex / snmpindex as discussed above.

     

    Cheers,

    Jane
