3514 Views 5 Replies Latest reply: Dec 1, 2011 7:34 AM by tgdaero9
sec Newbie 3 posts since
Oct 14, 2011

Oct 18, 2011 10:52 AM

Running processes appear as being down - what am I doing wrong?

Hi all, Zenoss newbie here. I have a question for you guys about the behavior of zenprocess on Zenoss 3.2.0.

 

I'm using Zenoss to monitor some Oracle Database instances. I have a list of processes to monitor for each database - for each instance, they are all of the form:

 

ora_pmon_<instance name>

ora_smon_<instance name>

ora_ckpt_<instance name>

ora_lgwr_<instance name>

ora_dbw0_<instance name>

 

and so on. I came up with regexes to match these processes, so that I can use zenprocess to monitor their status. For example, the regex for ora_pmon_<instance name> is

 

^[^ ]*_pmon_[^ /].*
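A quick way to check the pattern outside Zenoss is a few lines of Python. This is only a sketch: the instance names below are made up for illustration, and it says nothing about how zenprocess applies the regex internally. It only shows that a single pattern like this one matches the pmon process of every instance.

```python
import re

# The pattern from the post, used unchanged.
PMON_PATTERN = re.compile(r"^[^ ]*_pmon_[^ /].*")

# Hypothetical process names; the instance names (ORCL1..ORCL3) are invented.
process_names = [
    "ora_pmon_ORCL1",
    "ora_pmon_ORCL2",
    "ora_pmon_ORCL3",
    "ora_smon_ORCL1",  # a different process type, should not match
]

# Collect every name the pattern matches.
matched = [name for name in process_names if PMON_PATTERN.search(name)]

for name in matched:
    print(name)
```

With these sample names, all three pmon entries match and the smon entry does not, so one regex covers several distinct processes on the same server.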

 

After adding the regexes for each process, when I model one of the database servers the result is that all 5 of these processes are picked up for each of 3 database instances and show in the list of "OS Processes". All of them register as being up, which is correct because I know they are all up. But, in a few minutes' time, most of them change their status to 'down'! Only one of each type (pmon, smon, etc) remains up.

 

Does anyone know what could cause this sort of behavior? Does this mean that I have to make a new process entry for each type of process for each database instance? Any help would be appreciated.

  • jmp242 ZenossMaster 4,060 posts since
    Mar 7, 2007

    This may be a bug. Zenoss wasn't able to reproduce, but if you can add / re-open the ticket and work with them, they may be able to work out what's causing this.

    See

     

    http://dev.zenoss.org/trac/ticket/7870

     

    --

    James Pulver

    Information Technology Area Supervisor

    LEPP Computer Group

    Cornell University

  • Shane Scott ZenossMaster 1,373 posts since
    Jul 6, 2009

    sec:

     

    I would reopen the ticket if you have time even if the problem is resolved after a custom compile. I'm interested to see if it reoccurs or not in your compiled version. Keep us posted.

     

    Best,
    --Shane W. Scott (Hackman238)

  • tgdaero9 Rank: White Belt 42 posts since
    Sep 23, 2008

    I am facing exactly the same issue. We are in the process of migrating from a production Zenoss Core v2.4.5 to the current v3.2.1.

     

    We monitor Oracle Solaris 10 servers (global zones and non-global zones/containers) on both Zenoss versions with exactly the same patterns. The following processes are modeled on both but alert as down on v3.2.1 only:

     

    zCountProcs is set to false on both (I also tried true).

     

    - pattern=ora_dbw - only one process (ora_dbw1) is up, whereas all others matching the pattern (ora_dbw0, 2 - n) are down.

    - pattern=snmpd - says down which is not true

    - pattern=^sched (even though zsched processes are discovered?) - snmpd - says down which is not true
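    To see which process names each of these patterns can match, here is a small Python sketch. It assumes the patterns are applied as an unanchored regex search, which is an assumption about Zenoss and not something confirmed in this thread. The process names and instance suffixes are hypothetical.

```python
import re

# Patterns quoted in the list above.
patterns = ["ora_dbw", "snmpd", "^sched"]

# Hypothetical process names; the DB1 suffix and the snmpd path are invented.
names = [
    "ora_dbw0_DB1",
    "ora_dbw1_DB1",
    "ora_dbw2_DB1",
    "/usr/sbin/snmpd",
    "sched",
    "zsched",
]

def matching_names(pattern, candidates):
    """Return the candidates in which the pattern is found anywhere (re.search)."""
    compiled = re.compile(pattern)
    return [name for name in candidates if compiled.search(name)]

# Map each pattern to the list of names it matches.
results = {pattern: matching_names(pattern, names) for pattern in patterns}

for pattern, hits in results.items():
    print(pattern, "->", hits)
```

    Under that assumption, ora_dbw matches every ora_dbw process at once, while ^sched matches sched but not zsched, because the ^ anchors the match to the start of the name.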

     

    I think v3.2.1 does not deal properly with OS processes as soon as a pattern matches more than one process (e.g. on a global zone, all the container processes are also visible).

     

    I just found a difference in IgnoreParameters, maybe due to trying several settings to fix this. Even after making it equal, there is no change in alerting behaviour.

     

    Another strange behaviour: when I globally change the name/pattern of a process, the process class on the device does not reflect this (snmp >> snmpd, but the process class is still snmp).
