Interface status in Zenoss Core 2.5


115072 Views 14 Replies Latest reply: Jan 11, 2010 4:11 AM by mlist

Weetos

133 posts since
Oct 17, 2008


Nov 4, 2009 4:14 AM

Hey there

I recently upgraded from stack 2.4.5 to 2.5, and the way interfaces are monitored now is a bit confusing :

Last night at one of our locations, some tech unplugged a cable that links a switch "A" to a switch "B". "A" is monitored by Zenoss, "B" isn't. I should have received an alert saying the corresponding port went down on switch "A" (ifOperStatus.x = 2), but I haven't.

Before 2.5, an interface with AdminStatus up and OperStatus going from up to down generated an alert; now it seems that only a "/Change/Set" info event appears, generating no alert.

Zenoss apparently treated it as if someone had changed the admin status of the interface:

set attribute 'operStatus' to '2' on object 'GigabitEthernet1_0_2'

OperStatus shouldn't be mapped to /Change/Set by default. What the heck?

Since I hadn't set up SNMP trapping on this device, I understand I wouldn't get an alert in real time, but I didn't expect to receive nothing at all. In my book, OperStatus going from up to down between two polling cycles MUST generate an alert with severity set to at least "warning" or "error" by default. Then, if the OperStatus goes from down to up, this would clear the alert. Does that mean we can only use SNMP traps for interface monitoring? What if a trap gets lost?

If anybody can shed some light on this, I'd be more than happy


tech2000

23 posts since
Dec 14, 2007


1. Nov 4, 2009 10:55 AM (in response to Weetos)

Re: Interface status in Zenoss Core 2.5

I have the same problem.

However, we have 12 critical core 10G links (24 ports) to monitor, and they are the same ports on all 12 devices.

Here are a few related links in case you haven't checked them already:

This details the change that was made to the interface monitoring:
 
http://dev.zenoss.org/trac/ticket/4997
 
The following changes are pending:
 
http://dev.zenoss.org/trac/ticket/5627
http://dev.zenoss.org/trac/ticket/5628

Here is my event transform. Note that everything was fine prior to the upgrade, when I
followed one of the interface monitoring how-tos from the forum.
Prior to upgrading, all that was required for a /Net/Link transform was to change the
summary and/or message.

Core(2.5):

if hasattr(evt, 'component'):
    fs_id = device.prepId(evt.component)
    # Ports treated as critical uplinks
    critical_ports = ("Unit 1 Port 25", "Unit 1 Port 26",
                      "Unit 1 Port 27", "Unit 1 Port 28")
    for iface in device.os.interfaces():
        # Only re-grade events for the matching interface when it is admin up
        if iface.id == fs_id and iface.adminStatus == 1:
            if fs_id in critical_ports:
                evt.severity = 5  # Critical
            else:
                evt.severity = 2  # Info
if evt.summary.startswith('threshold of operational status exceeded'):
    evt.summary = "Interface operationally down"
elif evt.summary.startswith('threshold of operational status restored'):
    evt.summary = "Interface operationally up"

Core(2.4.x):

if evt.summary.startswith('threshold of operational status '):
    if evt.severity > 0:
        evt.summary = "interface operationally down"
    else:
        evt.summary = "interface operationally up"

Note: if you use the latter on 2.5, you get many alerts for interfaces that are admin up but oper down, which is a normal state for many ports (left active but unused, etc.), because for some reason Zenoss now treats the port as active even if the OperStatus is down.
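
For anyone who wants to try the 2.4.x-style summary rewrite outside Zenoss, here is a minimal sketch. FakeEvent and rewrite_summary are hypothetical stand-ins, not Zenoss objects; inside a real transform, evt is supplied by Zenoss.

```python
# Hypothetical stand-in for the Zenoss event object; it carries only the
# two fields the transform above touches.
class FakeEvent(object):
    def __init__(self, summary, severity):
        self.summary = summary
        self.severity = severity


def rewrite_summary(evt):
    """Apply the 2.4.x-style summary rewrite to a single event."""
    if evt.summary.startswith('threshold of operational status '):
        if evt.severity > 0:
            evt.summary = "interface operationally down"
        else:
            evt.summary = "interface operationally up"
    return evt


down = rewrite_summary(FakeEvent('threshold of operational status exceeded', 4))
up = rewrite_summary(FakeEvent('threshold of operational status restored', 0))
other = rewrite_summary(FakeEvent('some unrelated event', 3))
```

Events whose summary does not start with the threshold text pass through unchanged, which is why the check on the summary prefix comes first.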

I too would like a final solution for this and not some hack.

OperStatus and AdminStatus should in no way be linked unless AdminStatus is down. Why they are, I do not know.

If anyone can shed any more light on this, that would be great, possibly one of the devs.


Ryan Matte
653 posts since
Mar 26, 2009


2. Dec 10, 2009 3:50 PM (in response to Weetos)
Re: Interface status in Zenoss Core 2.5

Zenoss has never had built in polling for interfaces (though it is currently a requested feature and should be implemented some time in the future).

This is the way that it has to be done:

docs/DOC-2494

With King Crab, it is a Status icon instead of O/A. When you implement that mapping/transform it should be in /Status/IpInterface instead of /Net/Link. Create /Status/IpInterface if it's not already there. According to Chet, that should allow for the port status to change when the port is seen as being operationally down.
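
As a rough illustration of that remapping, here is a minimal sketch of transform-style logic that moves an operational-status threshold event from /Net/Link to /Status/IpInterface. FakeEvent and reclassify are hypothetical stand-ins so the snippet runs outside Zenoss, and it assumes a transform is allowed to reassign evt.eventClass; setting the event class directly on the threshold may well be the simpler route.

```python
# Hypothetical stand-in for the Zenoss event object (assumption, not Zenoss code).
class FakeEvent(object):
    def __init__(self, eventClass, component, summary):
        self.eventClass = eventClass
        self.component = component
        self.summary = summary


def reclassify(evt):
    """Move operational-status threshold events to /Status/IpInterface."""
    is_oper_threshold = evt.summary.startswith('threshold of operational status')
    if evt.eventClass == '/Net/Link' and is_oper_threshold:
        evt.eventClass = '/Status/IpInterface'
    return evt


moved = reclassify(FakeEvent('/Net/Link', 'GigabitEthernet1_0_2',
                             'threshold of operational status exceeded'))
untouched = reclassify(FakeEvent('/Net/Link', 'GigabitEthernet1_0_2',
                                 'link flapping'))
```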

Also, that /Change/Set event that you saw was as a result of a remodelling. So zenmodel must have remodeled the device as the port was down, and since you didn't have the interfaces locked on the device it marked the port as being down and set it to not monitored.

You need to implement SNMP polling for interface status. It is not currently an out-of-box feature. So you basically had no polling enabled, no traps enabled, and the device got remodeled while the interface was down and not locked. This is more user error than anything.

mlist
49 posts since
Mar 27, 2009


3. Jan 7, 2010 9:22 AM (in response to Ryan Matte)
Re: Interface status in Zenoss Core 2.5

Hi Ryan

I'm still somewhat confused about this, so I kindly ask you to read my summary and tell me whether I have understood correctly:

As I understand it, when you add a device (for example a Cisco switch), Zenoss will monitor ALL interfaces and, starting from Zenoss 2.5, it will generate RRD graphs for all interfaces (this is explained by jcurry in this post: message/44295#44295), but it won't generate alerts whenever an interface's status changes from up to down. In my opinion this default is good, because otherwise you could receive a lot of alerts (think of a 48-port switch with users' PCs attached, which are often powered off). However, there could be situations in which it is necessary to get alarms when specific interfaces on a specific device (or in a class like /network/cisco/router) go down.
In this case the solution is in the link that you specified (docs/DOC-2494), but it is not correct because the event class is not "/Net/Link" but "/Status/IpInterface".
Questions:

- Have I correctly understood how Zenoss manages the status of interfaces (up/down) and the related alerts?
- Could you fix the documentation in the link?

Please feel free to correct me and give me some tips.

Regards
Marco

Ryan Matte
653 posts since
Mar 26, 2009


4. Jan 7, 2010 10:04 AM (in response to mlist)
Re: Interface status in Zenoss Core 2.5

Not quite,

When you first model a device Zenoss will only set the interfaces that are shown as being active to monitored. It is then YOUR responsibility to go through and only set the ports that you want monitored to monitored status when commissioning the device. Interface status polling is more effective than traps in my opinion and it should be a built in option (but not necessarily enabled by default).

You also misunderstood the part about /Status/IpInterface. The document for interface polling was written a long time ago. The way interface status display is handled was changed in King Crab. To have the interface status on the OS tab display the proper status when a port goes down you HAVE to have the event come in for /Status/IpInterface, not for /Net/Link. This was explained to me by Chet himself (the author of the original article).

mlist
49 posts since
Mar 27, 2009


5. Jan 7, 2010 11:35 AM (in response to Ryan Matte)
Re: Interface status in Zenoss Core 2.5

Hi Ryan

Thanks for your clarification. I kindly ask you for a few more explanations that I'm sure will be appreciated by other users:
1) Are alerts generated whenever an interface's status changes?
As you explained, when you first model a device, Zenoss will ONLY set the interfaces that are shown as being active to monitored. This is clear, but in this case, will events be generated AUTOMATICALLY if a monitored interface goes down?
2) If you want to change this behavior (Zenoss will ONLY set the interfaces that are shown as being active to monitored), the solution is the one described in Chet's document. To sum up, this is a workaround because, as you explained, "Interface status polling is more effective than traps in my opinion and it should be a built in option (but not necessarily enabled by default)".
Correct?
3) You say: "it is then YOUR responsibility to go through and only set the ports that you want monitored to monitored status when commissioning the device".
Because this is time-consuming and, in my opinion, quite frustrating, a good solution would be to disable IP interface monitoring en masse for all devices, as described in this post (message/19580#19580), and afterwards manually re-enable monitoring for the interfaces on which you need alerts whenever they go down.
Correct?

Marco

Ryan Matte
653 posts since
Mar 26, 2009


6. Jan 7, 2010 11:42 AM (in response to mlist)
Re: Interface status in Zenoss Core 2.5

I think this is really where it gets to the point where it's based on personal opinion. I believe that in the majority of cases, people will want interfaces automatically enabled for monitoring rather than having them all disabled and then going through and enabling them. I also don't see what difference there is between the two. As I stated, Zenoss will only monitor the status of interfaces that it sees as being operationally up during modelling. It will not monitor the interfaces that it sees as operationally down. Obviously in the unlikely event that one of the ports drops off while you're configuring which ports you want monitored, yes, an alert will be generated automatically (what alerts are not generated automatically?). If you're smart though, you'd have the device in Pre-Production state until you are completely satisfied with the monitoring configuration and set Zenoss not to alert on devices unless they are in Production state. This would eliminate the possibility of having false positive alerts while you configure monitoring for the device. As long as you have proper interface labels on your devices, disabling monitoring of user ports is very easy. It is only time consuming if you don't follow proper practice of labeling ports on your devices.
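
The Pre-Production idea can be sketched as a simple filter. This is only an illustration of the logic, not Zenoss code: the numeric production states (1000 for Production, 500 for Pre-Production) and the severity cutoff are assumptions, and should_alert is a made-up helper standing in for an alerting rule.

```python
# Assumed numeric values for the production states; check your own install.
PRODUCTION = 1000
PRE_PRODUCTION = 500

SEVERITY_ERROR = 4  # assumed scale: 0 Clear ... 5 Critical


def should_alert(device_state, event_severity,
                 min_state=PRODUCTION, min_severity=SEVERITY_ERROR):
    """Hypothetical alerting rule: notify only for devices in Production
    (or higher) and for events at or above the chosen severity."""
    return device_state >= min_state and event_severity >= min_severity


# An interface-down event (severity 4) on a device still being commissioned
# produces an event in the console but no notification.
commissioning = should_alert(PRE_PRODUCTION, SEVERITY_ERROR)
in_service = should_alert(PRODUCTION, SEVERITY_ERROR)
```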

jmp242
4,060 posts since
Mar 7, 2007


7. Jan 7, 2010 11:50 AM (in response to Ryan Matte)
Re: Interface status in Zenoss Core 2.5

Ryan Matte wrote, On 1/7/2010 11:43 AM:
(what alerts are not generated automatically?).

Well, EVENTS are generated automatically, but at least for me, if you
haven't created an alerting rule to match on the event (which you might
well have done) you won't get an ALERT (that is, an e-mail)...

--
James Pulver
Information Technology Area Supervisor
LEPP Computer Group
Cornell University

Ryan Matte
653 posts since
Mar 26, 2009


8. Jan 7, 2010 11:59 AM (in response to jmp242)
Re: Interface status in Zenoss Core 2.5

Right, I understand that, he seemed to be using the term "Alert" as general definition for events, so I responded accordingly. Of course the actual alerting varies based on user configuration.

jmp242
4,060 posts since
Mar 7, 2007


9. Jan 7, 2010 12:05 PM (in response to Ryan Matte)
Re: Interface status in Zenoss Core 2.5

Yes, I figured you did. I was trying to inform (though perhaps too
subtly) that Zenoss terminology is specific, and he may be confusing
himself or others with the wrong terms ...
--
James Pulver
Information Technology Area Supervisor
LEPP Computer Group
Cornell University


mlist
49 posts since
Mar 27, 2009


10. Jan 8, 2010 3:10 AM (in response to jmp242)
Re: Interface status in Zenoss Core 2.5

Ryan, I agree with you when you say:
"I think this is really where it gets to the point where it's based on personal opinion".
Thank you for this useful clarification, it is very much appreciated. It is now quite clear to me (and maybe to other users) how Zenoss "thinks" about interface monitoring.
At this point, to have a full understanding, I'll ask you my LAST two questions. Please be patient.

1) Modelling
You rightly mentioned modelling, which in a default configuration runs every 12 hours. Suppose you add a device at 8 am and, at that time, there are 30 ports up and 18 ports down. In this case Zenoss will monitor only the 30 ports, but if at the next modelling run (8 pm) 40 ports are up, I suppose that Zenoss will notice this and monitor 40 interfaces instead of 30. Correct?

2) SNMP Performance Cycle Interval (5 minutes)
I suppose that the status of the interfaces (up or down) is checked every 5 minutes, because in the collector I have "SNMP Performance Cycle Interval" set to "300" seconds. Correct?

PS: James, I apologize for the bad terminology I used. I meant "events", not "alerts".

jcurry
1,021 posts since
Apr 15, 2008


11. Jan 8, 2010 6:27 AM (in response to mlist)
Re: Interface status in Zenoss Core 2.5

Hi Marco,
I think James and Ryan have covered just about everything. One thing that doesn't seem to have been emphasised is that with the 2.5.x design, the "whether to model an interface" decision is based on the SNMP ifAdminStatus value. If your organisation ensures that unused interfaces are configured down and this is reflected by the SNMP agent in ifAdminStatus, then there should be no problem. They won't be discovered, modelled or have SNMP data gathered. Our problem comes with lots of switches which have ifAdminStatus = Up and ifOperStatus = Down.

Did you find the TRAC ticket raised by Chet as a result of the earlier discussion? - http://dev.zenoss.org/trac/ticket/5943 .

As James and Ryan discussed, currently you can only generate events (with optional, customised alerts) based on ifOperStatus, if you set that up as an extra performance data point to be tested and thresholded every 5 minutes (by default). If you ensure that the event generated by the threshold is /Status/IpInterface rather than the default /Net/Link then the coloured icon against the interface in the main OS tab, will reflect the status of the most recent event - red for down and green for up.
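
A minimal sketch of what such a threshold amounts to, assuming a maximum-value threshold of 1 on the polled ifOperStatus (IF-MIB defines up as 1 and down as 2). The function name, the returned dictionary and the default severity are illustrative assumptions, not the Zenoss threshold implementation.

```python
IF_OPER_UP = 1  # IF-MIB ifOperStatus: up(1), down(2), testing(3), ...


def evaluate_oper_status(component, value, max_value=IF_OPER_UP,
                         event_class='/Status/IpInterface', severity=4):
    """Hypothetical threshold check: anything above max_value is a problem,
    anything at or below it produces a clear (severity 0)."""
    exceeded = value > max_value
    return {
        'component': component,
        'eventClass': event_class,
        'severity': severity if exceeded else 0,
        'summary': 'threshold of operational status '
                   + ('exceeded' if exceeded else 'restored'),
    }


down_event = evaluate_oper_status('GigabitEthernet1_0_2', 2)
clear_event = evaluate_oper_status('GigabitEthernet1_0_2', 1)
```

Because the check runs once per performance cycle, a port that goes down and comes back up between two polls would not be noticed by this approach.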

If you really don't like the 2.5 way of determining active interfaces, based on ifAdminStatus, and want to return to the 2.4 design, based on ifOperStatus, then you can return to the previous model of data collection by modifying $ZENHOME/Products/ZenModel/IpInterface.py and changing adminStatus to operStatus in the snmpIgnore method (make quite sure you have a backup of this file first, and be aware that you would have to maintain this yourself after any Zenoss upgrade). If you are largely monitoring switches with many inactive interfaces that all have ifAdminStatus = Up, then this change can save LOTS of performance polling every 5 minutes and LOTS of space for unwanted performance RRD files.
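
To make the difference between the two designs concrete, here is a small standalone sketch. It is NOT the real snmpIgnore method from IpInterface.py; ignore_interface, its qualifier argument and the "status greater than 1 means skip" rule are assumptions chosen only to show the effect on a 48-port switch.

```python
# Hypothetical stand-in for the "should this interface be skipped?" decision.
IF_STATUS_UP = 1  # IF-MIB: up(1), down(2), ...


def ignore_interface(admin_status, oper_status, qualifier='adminStatus'):
    """Return True if the interface should be left out of performance polling."""
    if qualifier == 'adminStatus':   # 2.5-style behaviour
        status = admin_status
    else:                            # 2.4-style behaviour ('operStatus')
        status = oper_status
    return status > IF_STATUS_UP


# A 48-port switch: every port admin up, only the first 30 operationally up.
ports = [(1, 1)] * 30 + [(1, 2)] * 18

polled_25 = sum(1 for a, o in ports if not ignore_interface(a, o, 'adminStatus'))
polled_24 = sum(1 for a, o in ports if not ignore_interface(a, o, 'operStatus'))
```

With these sample values the admin-status qualifier polls all 48 ports, while the oper-status qualifier polls only the 30 that have something plugged in.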

To answer your specific questions:
1) - you are correct (assuming by "Up" you mean ifAdminStatus = Up). On the first discovery poll you will discover and start monitoring 30 interfaces. 12 hours later, on the next modeler poll, another 10 are also ifAdminStatus = Up, so they are added to the configuration database and to the 5-minute SNMP performance poll list. If you have done the extra configuration to performance poll and threshold ifOperStatus then, initially, you could receive events for up to 30 interfaces; after the second modelling poll, you could potentially get events from 40 interfaces. If on the third modeler poll the remaining 8 interfaces are ifAdminStatus=Up, then you will model and poll (and potentially get events from) all 48 interfaces. The issue, potentially, is that if an interface is not detected as ifAdminStatus=Down on a subsequent 12-hour modeler poll but, say, 18 of your interfaces are now ifOperStatus=Down, then you are still gathering data every 5 minutes for those 18 down interfaces. Nothing will remove their RRD data files.

2) As you say, if you have a single collector (which is likely for Zenoss Core), then you have a single SNMP polling interval, whose default is 300 seconds, or 5 minutes.
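
The numbers in both answers can be worked through with a few lines of arithmetic. This is just a sketch using the defaults mentioned in the thread (12-hour modelling, 300-second polling); the variable names are made up, and it counts polling rounds per interface rather than individual data points.

```python
MODEL_INTERVAL_S = 12 * 60 * 60  # modeler runs every 12 hours by default
POLL_INTERVAL_S = 300            # SNMP performance cycle interval


def polls_per_model_cycle(model_interval_s=MODEL_INTERVAL_S,
                          poll_interval_s=POLL_INTERVAL_S):
    """Number of performance polls that fit between two modeler runs."""
    return model_interval_s // poll_interval_s


polls = polls_per_model_cycle()

# Interfaces with ifAdminStatus = Up found on the 1st, 2nd and 3rd modeler poll.
admin_up_per_pass = [30, 40, 48]
interface_polls = [n * polls for n in admin_up_per_pass]

# Polls spent on 18 interfaces that are admin up but operationally down.
wasted_polls = 18 * polls
```

Here polls is 144, interface_polls is [4320, 5760, 6912], and wasted_polls is 2592 per 12-hour cycle.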

Cheers,
Jane

mlist
49 posts since
Mar 27, 2009


12. Jan 8, 2010 9:28 AM (in response to jcurry)
Re: Interface status in Zenoss Core 2.5

Hi Jane,

First of all, my compliments on your clarity.
I think I have finally understood, but I kindly ask you to confirm this new summary:

Starting from Zenoss 2.5.x, when you add a network device (in this example, a 48-port switch), Zenoss will monitor all interfaces with ifAdminStatus = Up. Previously ifOperStatus was used. As explained in ticket 5943, neither of these solutions is perfect: when ifOperStatus was used as the qualifier, interfaces could go unmonitored for long periods until the modeler came through and found they were up; now that ifAdminStatus is used, a lot of users are finding their Zenoss servers overloaded with I/O due to monitoring switch ports that have nothing plugged into them. As explained by Jane, it is possible to revert this behavior by modifying $ZENHOME/Products/ZenModel/IpInterface.py and changing adminStatus to operStatus in the snmpIgnore method.
In any case, regardless of the Zenoss version and the performance graphs, events won't be generated automatically when an interface's status changes from up to down. As explained in the post "Polling Interface Status" (docs/DOC-2494), the reason is that Zenoss monitors the throughput and errors on all network interfaces by default, but it does not also monitor the up/down status of the interfaces. The article explains how to use a threshold on the ifOperStatus value combined with an event class transform to accomplish this. The only caveat is that, as explained by Ryan, if you ensure that the event generated by the threshold is /Status/IpInterface rather than the default /Net/Link, then the coloured icon against the interface in the main OS tab will reflect the status of the most recent event: red for down and green for up.

Jane, is this correct, or have I still misunderstood something?
In particular, is what I wrote about event generation correct? This is a crucial point to understand, because every company usually has core switches (on which all devices are always up and connected) and switches to which users' PCs are connected (and users usually turn off their computers at night). So, if what I wrote is correct, I think a good approach in my company would be to configure "active monitoring" on the core switches (where interfaces with important devices connected should never go down) and just leave the default on the other switches. In practice I should apply the "interface template" only to the core switches.
In this way I should receive events ONLY for critical interfaces and not whenever users power off their pc.

Cheers
Marco

jcurry
1,021 posts since
Apr 15, 2008


13. Jan 8, 2010 2:00 PM (in response to mlist)
Re: Interface status in Zenoss Core 2.5

Hi Marco,
You are absolutely correct - right up until your last suggestion - "would be to configure "active monitoring" on core switch (where interfaces with important devices connected never should be go down) and just leave the default on other switch. In practise I should apply the "interface template" only to core switch."

The slight problem with your suggestion is that one does not bind component templates (which the interface ethernetCsmacd template is) - they get bound automatically - it's not your choice. Neither can you bind component templates to a specific device - they simply don't appear in the dropdown list.

What you could do, is this.

Go to /Devices/Network/Switch and create a suborganizer, say, CoreSwitch. Back at /Devices/Network, select the Templates tab. Select the ethernetCsmacd template and use the dropdown menu to copy this template to /Devices/Network/Switch/CoreSwitch. From /Devices/Network/Switch/CoreSwitch, use the Templates tab to modify this copy of the ethernetCsmacd template to poll and threshold ifOperStatus, as discussed previously. Whatever you do, don't change the name of the template itself: it must be ethernetCsmacd, the same as the name of the component object. You may need to do this for the ethernetCsmacd_64 template too. Then move your core switch(es) from the /Network/Switch class to the /Network/Switch/CoreSwitch class.

With templates of the same name (in fact, whether they are component templates or device templates), the instance lowest down the device hierarchy applies; so your little switches will get the standard template without ifOperStatus monitoring, but your core switch(es) down in /Network/Switch/CoreSwitch will have the extra ifOperStatus monitoring with its events configured for thresholds.
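
The "lowest instance wins" rule can be pictured as a walk up the device class path. The sketch below is only a model of that lookup: resolve_template and the sample hierarchy (including which class holds the standard copy) are invented for illustration and are not Zenoss API calls.

```python
def resolve_template(device_class, template_name, templates_by_class):
    """Return the device class whose copy of template_name applies to a
    device in device_class, searching from the device's class upwards."""
    path = device_class.rstrip('/')
    while path:
        if template_name in templates_by_class.get(path, ()):
            return path
        path = path.rsplit('/', 1)[0]  # step up one level in the hierarchy
    return None


# Made-up layout: a standard copy high up, a modified copy for core switches.
templates_by_class = {
    '/Devices': set(['ethernetCsmacd']),
    '/Devices/Network/Switch/CoreSwitch': set(['ethernetCsmacd']),
}

core = resolve_template('/Devices/Network/Switch/CoreSwitch',
                        'ethernetCsmacd', templates_by_class)
access = resolve_template('/Devices/Network/Switch',
                          'ethernetCsmacd', templates_by_class)
```

In this layout a core switch picks up the copy in CoreSwitch, while an ordinary switch falls back to the standard copy further up.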

Does that make sense?? Then, I think your statement "In this way I should receive events ONLY for critical interfaces and not whenever users power off their pc.", should be true. You will still be polling for performance data from all interfaces with ifAdminStatus=Up and ifOperStatus=Down, but at least you won't be getting events about the status.

Cheers,
Jane

mlist
49 posts since
Mar 27, 2009


14. Jan 11, 2010 4:11 AM (in response to jcurry)
Re: Interface status in Zenoss Core 2.5

Jane, thank you very much for your clarification. I have understood perfectly how to proceed, and I'm sure this will be appreciated by other beginners like me. Thanks to your suggestion, I have understood how to manage templates. As soon as I have run some tests, I'll post again if something goes wrong.

I think this was a crucial point to understand! In particular, I thank you for the clarity of your explanation which, as usual (I read your white papers), is impressive.
Anyway, thanks again to Ryan too. His suggestions on the forum are always appreciated.

Cheers
Marco
