Oct 18, 2011 10:52 AM
Running processes appear as being down - what am I doing wrong?
Hi all, Zenoss newbie here. I have a question for you guys about the behavior of zenprocess on Zenoss 3.2.0.
I'm using Zenoss to monitor some Oracle Database instances. I have a list of processes to monitor for each database - for each instance, they are all of the form:
ora_pmon_<instance name>
ora_smon_<instance name>
ora_ckpt_<instance name>
ora_lgwr_<instance name>
ora_dbw0_<instance name>
and so on. I came up with regexes to match these processes, so that I can use zenprocess to monitor their status. For example, the regex for ora_pmon_<instance name> is
^[^ ]*_pmon_[^ /].*
After adding the regexes for each process, when I model one of the database servers the result is that all 5 of these processes are picked up for each of 3 database instances and show in the list of "OS Processes". All of them register as being up, which is correct because I know they are all up. But, in a few minutes' time, most of them change their status to 'down'! Only one of each type (pmon, smon, etc) remains up.
Does anyone know what could cause this sort of behavior? Does this mean that I have to make a new process entry for each type of process for each database instance? Any help would be appreciated.
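To illustrate the setup described above, here is a minimal sketch that runs the pmon regex from this post against a set of process names. The instance names (ORCL1, ORCL2, ORCL3) are hypothetical, and this only uses Python's `re` module; it does not reproduce how zenprocess itself does its matching.

```python
import re

# The pmon regex quoted in the post above.
PMON_REGEX = re.compile(r"^[^ ]*_pmon_[^ /].*")

# Hypothetical instance names, purely for illustration.
instances = ["ORCL1", "ORCL2", "ORCL3"]
kinds = ["pmon", "smon", "ckpt", "lgwr", "dbw0"]

# Build names of the form ora_<kind>_<instance name>, as listed in the post.
process_names = ["ora_%s_%s" % (kind, inst) for inst in instances for kind in kinds]

# Keep only the names the pmon regex matches.
matches = [name for name in process_names if PMON_REGEX.search(name)]
print(matches)
```

The single pmon regex matches one process per instance, so one process definition covers three running processes on the same server.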
This may be a bug. Zenoss wasn't able to reproduce, but if you can add / re-open the ticket and work with them, they may be able to work out what's causing this.
See
http://dev.zenoss.org/trac/ticket/7870
--
James Pulver
Information Technology Area Supervisor
LEPP Computer Group
Cornell University
The bug submitted in that report is very similar to my problem. The only difference is that my installation of Zenoss doesn't show any one process running more than once. They might just be related, though!
Unfortunately, my setup might not help them replicate the issue - I compiled Zenoss from source on a RHEL 6 platform (which according to the release notes for 3.2.0 is not supported yet), rather than using a standardized appliance or stack installer. Is it worth reopening the issue anyway?
sec:
I would reopen the ticket if you have time even if the problem is resolved after a custom compile. I'm interested to see if it reoccurs or not in your compiled version. Keep us posted.
Best,
--Shane W. Scott (Hackman238)
I have reopened the ticket. I'll update this thread as I find out more.
I am facing exactly the same issue. We are in the process of migrating from a production Zenoss Core v2.4.5 installation to the current v3.2.1.
We monitor Oracle Solaris 10 servers (global zones and non-global zones/containers) on both Zenoss versions with exactly the same patterns. zCountProcs is set to false on both (I also tried true). The following processes are modeled on both, but are reported as down on v3.2.1 only:
- pattern=ora_dbw - only one process (ora_dbw1) is up, whereas all others matching the pattern (ora_dbw0, 2 - n) are down.
- pattern=snmpd - reported as down, which is not true
- pattern=^sched (even though zsched processes are discovered?) - reported as down, which is not true
I think v3.2.1 does not handle OS Processes properly as soon as a pattern matches more than one process (e.g. on a global zone, all the container processes are visible as well).
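One way to check which patterns are affected by this is to count how many process names each one matches. The sketch below uses the three patterns from this post; the process list is hypothetical (made up to resemble what a global zone might show) and the matching is plain `re.search`, not zenprocess's own logic.

```python
import re

# Patterns quoted in the post above.
patterns = ["ora_dbw", "snmpd", "^sched"]

# Hypothetical process names, as a global zone might list them
# (its own processes plus those of its containers).
process_names = [
    "ora_dbw0_DB1", "ora_dbw1_DB1", "ora_dbw2_DB1",
    "snmpd", "snmpd",
    "sched", "zsched", "zsched",
]

# For each pattern, collect every process name it matches.
matches_by_pattern = {
    p: [n for n in process_names if re.search(p, n)] for p in patterns
}

# Patterns that match more than one process.
multi = sorted(p for p, names in matches_by_pattern.items() if len(names) > 1)
print(multi)
```

With this sample data, ora_dbw and snmpd each match several processes, while ^sched matches only sched (zsched does not start with "sched").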
I just found a difference in the IgnoreParameters setting between the two, maybe left over from trying several settings to fix this. Even after making it the same on both, there is no change in the alerting behaviour.
Another strange behaviour: when I globally change the name/pattern of a process, the process class on the device does not reflect the change (snmp >> snmpd, but the process class is still snmp).