Nov 22, 2011 10:45 AM
Upgrade from 3.1 to 3.2.1 and zenprocess misbehaving
After upgrading from 3.1 to 3.2.1, we are seeing strange, erratic behaviour from zenprocess.
To monitor some Oracle Databases, we have defined a Process with this regular expression pattern: ^ora_pmon_.*
Now, on some apparently random servers, zenprocess fails to detect that some processes are up and running.
If we check with snmpwalk, the process is present.
This behaviour is totally erratic: on some devices all instances are correctly detected, while on a few devices instances are reported as DOWN. I cannot find any pattern in it.
Is this a known issue in 3.2.1? Please note that this configuration worked flawlessly for months on 2.5.1 and 3.1.0.
Let me know if you need more info.
Thank you.
Regards,
Luca
Luca, I have just upgraded from 2.5.2 to 3.2.1, and while the upgrade went quite well in general, I am observing the same behaviour with the built-in http monitor: random servers are flagged with "Process not running" while the process is in fact running.
I found that restarting both the affected process (httpd) and the snmpd service on the affected machines cleared the events.
Can anyone explain how to debug this process monitoring issue? I'm going crazy! With 2.5.1 it worked without any problem for more than a year!
The Process is defined on Zenoss like this:
Pattern: ^ora_pmon_.*
Via snmpwalk I get the following (actual values masked for privacy):
$ snmpwalk -v1 -c mycommunity X.X.X.X |grep ora_pmon
HOST-RESOURCES-MIB::hrSWRunPath.14392 = STRING: "ora_pmon_inst1"
HOST-RESOURCES-MIB::hrSWRunPath.31230 = STRING: "ora_pmon_INST2"
Zenoss insists that the ora_pmon_inst1 process is not running!
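As a sanity check that the pattern itself is not the problem, the regex can be tested against the snmpwalk output outside Zenoss. This is only a rough sketch: the helper name and the line-parsing regex are made up for illustration, and it does not reproduce how zenprocess actually does its matching.

```python
import re

# The process pattern as defined in Zenoss.
PATTERN = re.compile(r"^ora_pmon_.*")

# The two lines returned by snmpwalk (values masked as above).
SNMPWALK_OUTPUT = '''\
HOST-RESOURCES-MIB::hrSWRunPath.14392 = STRING: "ora_pmon_inst1"
HOST-RESOURCES-MIB::hrSWRunPath.31230 = STRING: "ora_pmon_INST2"
'''

# Hypothetical parser for snmpwalk lines: captures the PID and the quoted path.
LINE_RE = re.compile(r'hrSWRunPath\.(\d+) = STRING: "(.*)"')


def matching_processes(walk_text, pattern=PATTERN):
    """Return {pid: path} for every hrSWRunPath entry the pattern matches."""
    matches = {}
    for line in walk_text.splitlines():
        found = LINE_RE.search(line)
        if found and pattern.search(found.group(2)):
            matches[int(found.group(1))] = found.group(2)
    return matches


result = matching_processes(SNMPWALK_OUTPUT)
for pid, path in sorted(result.items()):
    print(pid, path)
```

Both instances match here, so the regex and the SNMP data look fine on their own, which points at zenprocess rather than at the pattern.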
From zenprocess.log:
2011-11-29 14:38:36,207 INFO zen.zenprocess: Searching for possible matches for set([ora_pmon_inst1])
2011-11-29 14:38:36,210 WARNING zen.zenprocess: (myhost.mydomain.com) Process not running: ora_pmon_inst1
Using regex '^ora_pmon_.*'
All Processes have stopped since the last model occurred. Last Modification time (2011/11/29 14:06:36)
On other hosts with multiple Oracle instances, this issue does not occur.
The remote host is a RedHat 5.3 64 bit with net-snmp-5.3.2.2-7.el5_4.2.
Any hint will be greatly appreciated! I do not want to downgrade to Zenoss 2.5.1 :-)
Thank you,
Luca
This issue is being addressed in ticket: http://dev.zenoss.org/trac/ticket/7870
The ticket 7870 has been updated today with a "fixed" resolution. Yeppa!
Will the fix be released as a patch? I'm available for testing!
Thanks,
Luca
I guarantee that they are not monitoring this thread. If you want to get their attention, post in the actual trac ticket. I agree that they should have included a patch or some kind of explanation of how they resolved it.