Sep 3, 2011 11:30 AM
Zenoss 3.2 breaks updating certain graphs
Today I updated from 3.1 to 3.2. This process was quite easy and well described in the documentation.
After the update, certain templates no longer seem to work. I have problems with Apache, b_fping and the Postfix-via-SNMP solution described elsewhere on this forum.
The RRDs aren't updated anymore, even though in zencommand.log (at debug level) all seems to be okay:
-rw-r--r-- 1 zenoss zenoss 35432 Sep 3 13:27 apache_bytesPerReq.rrd
-rw-r--r-- 1 zenoss zenoss 35432 Sep 3 13:27 apache_cpuLoad.rrd
-rw-r--r-- 1 zenoss zenoss 35432 Sep 3 17:24 apache_slotDNSLookup.rrd
-rw-r--r-- 1 zenoss zenoss 35432 Sep 3 13:27 apache_slotKeepAlive.rrd
-rw-r--r-- 1 zenoss zenoss 35432 Sep 3 13:27 apache_slotLogging.rrd
-rw-r--r-- 1 zenoss zenoss 35432 Sep 3 13:27 apache_slotOpen.rrd
-rw-r--r-- 1 zenoss zenoss 35432 Sep 3 13:27 apache_slotReadingRequest.rrd
-rw-r--r-- 1 zenoss zenoss 35432 Sep 3 13:27 apache_slotSendingReply.rrd
-rw-r--r-- 1 zenoss zenoss 35432 Sep 3 13:27 apache_slotWaiting.rrd
-rw-r--r-- 1 zenoss zenoss 35432 Sep 3 13:27 apache_totalAccesses.rrd
-rw-r--r-- 1 zenoss zenoss 35432 Sep 3 13:27 apache_totalKBytes.rrd
-rw-r--r-- 1 zenoss zenoss 35432 Sep 3 13:24 fping_avg.rrd
-rw-r--r-- 1 zenoss zenoss 35432 Sep 3 13:24 fping_loss.rrd
-rw-r--r-- 1 zenoss zenoss 35432 Sep 3 13:24 fping_max.rrd
-rw-r--r-- 1 zenoss zenoss 35432 Sep 3 13:24 fping_min.rrd
-rw-r--r-- 1 zenoss zenoss 35432 Sep 3 13:24 fping_rcv.rrd
-rw-r--r-- 1 zenoss zenoss 35432 Sep 3 17:24 fping_xmt.rrd
-rw-r--r-- 1 zenoss zenoss 35432 Sep 3 17:24 laLoadInt15_laLoadInt15.rrd
-rw-r--r-- 1 zenoss zenoss 35432 Sep 3 17:24 laLoadInt1_laLoadInt1.rrd
-rw-r--r-- 1 zenoss zenoss 35432 Sep 3 17:24 laLoadInt5_laLoadInt5.rrd
-rw-r--r-- 1 zenoss zenoss 35432 Sep 3 17:24 ldap_time.rrd
-rw-r--r-- 1 zenoss zenoss 35432 Sep 3 17:24 memAvailReal_memAvailReal.rrd
-rw-r--r-- 1 zenoss zenoss 35432 Sep 3 17:24 memAvailSwap_memAvailSwap.rrd
-rw-r--r-- 1 zenoss zenoss 35432 Sep 3 13:28 memBuffer_memBuffer.rrd
-rw-r--r-- 1 zenoss zenoss 35432 Sep 3 13:28 memCached_memCached.rrd
drwxr-x--- 5 zenoss zenoss 4096 May 19 17:28 os
-rw-r--r-- 1 zenoss zenoss 35432 Sep 3 13:28 postfixBounced_postfixBounced.rrd
-rw-r--r-- 1 zenoss zenoss 35432 Sep 3 13:28 postfixQueue_postfixQueue.rrd
-rw-r--r-- 1 zenoss zenoss 35432 Sep 3 13:28 postfixReceived_postfixReceived.rrd
-rw-r--r-- 1 zenoss zenoss 35432 Sep 3 13:28 postfixRejected_postfixRejected.rrd
-rw-r--r-- 1 zenoss zenoss 35432 Sep 3 13:28 postfixRelayed_postfixRelayed.rrd
-rw-r--r-- 1 zenoss zenoss 35432 Sep 3 13:28 postfixSent_postfixSent.rrd
-rw-r--r-- 1 zenoss zenoss 35432 Sep 3 17:24 ssCpuIdle_ssCpuIdle.rrd
-rw-r--r-- 1 zenoss zenoss 35432 Sep 3 17:24 ssCpuRawWait_ssCpuRawWait.rrd
-rw-r--r-- 1 zenoss zenoss 35432 Sep 3 17:24 ssCpuSystem_ssCpuSystem.rrd
-rw-r--r-- 1 zenoss zenoss 35432 Sep 3 17:24 ssCpuUser_ssCpuUser.rrd
-rw-r--r-- 1 zenoss zenoss 35432 Sep 3 17:24 ssIORawReceived_ssIORawReceived.rrd
-rw-r--r-- 1 zenoss zenoss 35432 Sep 3 17:24 ssIORawSent_ssIORawSent.rrd
-rw-r--r-- 1 zenoss zenoss 35432 Sep 3 17:24 SSL_Check_Days.rrd
-rw-r--r-- 1 zenoss zenoss 35432 Sep 3 17:24 sysUpTime_sysUpTime.rrd
-rw-r--r-- 1 zenoss zenoss 35432 Sep 3 17:21 ZenJMX Heap Memory_committed.rrd
-rw-r--r-- 1 zenoss zenoss 35432 Sep 3 17:21 ZenJMX Heap Memory_used.rrd
-rw-r--r-- 1 zenoss zenoss 35432 Sep 3 17:21 ZenJMX Non-Heap Memory_committed.rrd
-rw-r--r-- 1 zenoss zenoss 35432 Sep 3 17:21 ZenJMX Non-Heap Memory_used.rrd
-rw-r--r-- 1 zenoss zenoss 35432 Sep 3 17:21 ZenJMX Open File Descriptors_OpenFileDescriptorCount.rrd
-rw-r--r-- 1 zenoss zenoss 35432 Sep 3 17:21 ZenJMX Thread Count_ThreadCount.rrd
As you can see, I updated Zenoss around 13:30, and since then certain RRDs aren't updated anymore.
Also memBuffer_memBuffer.rrd and memCached_memCached.rrd don't get updates.
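For anyone who wants to check this without eyeballing `ls -l`, here is a minimal sketch (not part of Zenoss) that lists `.rrd` files whose modification time is older than a cutoff. The function name `stale_rrds` is made up for illustration, and the directory to scan is whatever perf directory your install uses.

```python
import os


def stale_rrds(root, cutoff):
    """Return sorted paths of .rrd files under root last modified before cutoff.

    cutoff is a Unix timestamp (seconds since the epoch).
    """
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".rrd"):
                continue
            path = os.path.join(dirpath, name)
            if os.path.getmtime(path) < cutoff:
                stale.append(path)
    return sorted(stale)


# Example call (path is an assumption, adjust to your install):
#   import time
#   stale_rrds("/opt/zenoss/perf/Devices", time.time() - 3600)
```

Keep in mind that the modification time is only a hint; whether the graph itself still receives data has to be checked in the UI.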
I'm seeing this too on a test box that I upgraded. This has happened before:
From Googling around, it seems to be a bug in how rrdtool is compiled combined with the kernel that your distro is running. What distro are you using?
--Dennis
Zenoss is running on CentOS 5.6 x64, fully patched.
Scanning through the thread you posted, I couldn't find a solution. Is there one?
There wasn't a fix per se, but there was a workaround: add this cron job under the zenoss user:
27 2 * * 6 find /usr/local/zenoss/zenoss/perf/Devices/ -name "*.rrd" -execdir touch '{}' +
I might play around trying to see if there is a way to recompile the rrdtool and rrdupdate that zenoss uses. However, it is weird that this seems to only happen after an upgrade.
--Dennis
I also should ask. Are you still seeing the graphs updated in the UI? My file timestamps are not updating but the data is definitely being updated when I look at graphs in Zenoss.
The problem with the timestamps not getting updated is that zenperfsnmp removes files older than 30 days, so it will mistakenly remove RRDs that shouldn't be removed.
--Dennis
The problem is that the graphs aren't updating in the UI. So the file timestamps aren't the biggest problem.
First I need the graphs to update again, then I'll see what happens to the file timestamps.
Update: fixed the snmp-graphs.
On the graphs that weren't updated, there was a dot in front of the OID. I think this wasn't a problem in 3.1, but 3.2 doesn't like it.
The file timestamps are updated as well.
The strange thing is that when you test the OID from the UI, it works fine.
Now I need to fix the command-graphs, which is going to be a little harder, I'm afraid.
Amongst them Apache, of which only the slotDNSLookup is working.
Update2: for Apache and f_ping only the first result is stored. Respectively 'slotDNSLookup=0' and 'xmt'.
This seems like a bug to me.
Proof for apache:
2011-09-04 11:03:53,309 DEBUG zen.zencommand: The result of "/opt/zenoss/ZenPacks/ZenPacks.zenoss.ApacheMonitor-2.1.2-py2.6.egg/ZenPacks/zenoss/ApacheMonitor/libexec/check_apache.py -H <IP> -p 80 -u '/server-status?auto'" was "'STATUS OK|slotDNSLookup=0 totalAccesses=136 slotReadingRequest=0 totalKBytes=39 busyServers=1 slotKeepAlive=0 slotGracefullyFinishing=0 bytesPerReq=293.647 cpuLoad=.00660523 bytesPerSec=1.06366 slotLogging=0 slotSendingReply=1 slotStartingUp=0 reqPerSec=.00362222 slotWaiting=5 slotOpen=122 idleServers=5\n'"
2011-09-04 11:03:53,313 DEBUG zen.zencommand: Storing slotDNSLookup = 0.0 into Devices/<NODE>/apache_slotDNSLookup
2011-09-04 11:03:53,313 DEBUG zen.RRDUtil: /opt/zenoss/perf/Devices/<NODE>/apache_slotDNSLookup.rrd: 0.0
Proof for f_ping:
2011-09-04 11:03:55,725 DEBUG zen.zencommand: The result of "/usr/sbin/fping -c 3 -sq <IP> 2>&1 | /bin/sed 's/\([0-9]*\)\/\([0-9]*\)\/\([0-9]\)%/xmt=\1 rcv=\2 loss=\3/;s/\([0-9]*.[0-9]*\)\/\([0-9].*\)\/\([0-9].*\)/min=\1 avg=\2 max=\3/' | /bin/sed -r '/targets/,/real time/d;s/[0-9]*.[0-9]*.[0-9]*.[0-9]* : xmt\/rcv\/%loss = /PING OK|/;s/, min\/avg\/max = / /'" was "'PING OK|xmt=3 rcv=3 loss=0 min=2.73 avg=4.63 max=6.77\n\n\n'"
2011-09-04 11:03:55,726 DEBUG zen.zencommand: Storing xmt = 3.0 into Devices/<NODE>/fping_xmt
2011-09-04 11:03:55,726 DEBUG zen.RRDUtil: /opt/zenoss/perf/Devices/<NODE>/fping_xmt.rrd: 3.0
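To make the mismatch easier to spot in a long debug log, here is a rough sketch (my own helper, not a Zenoss tool) that compares the labels a command returned with the labels zencommand reports as stored. It assumes the two log line shapes shown above ("The result of ... was ..." and "Storing X = ..."); the name `audit` and the regexes are assumptions for illustration.

```python
import re

# Matches the quoted command output at the end of a "The result of" line.
RESULT_RE = re.compile(r'zen\.zencommand: The result of ".*" was "(.*)"$')
# Matches the datapoint name in a "Storing <name> = <value>" line.
STORING_RE = re.compile(r"zen\.zencommand: Storing (\S+) = ")
# Picks the label out of each label=value pair in the perf data.
LABEL_RE = re.compile(r"([^ =|']+)=")


def audit(lines):
    """Return one report per command result: returned, stored, missing labels."""
    reports = []
    current = None
    for line in lines:
        line = line.rstrip()
        match = RESULT_RE.search(line)
        if match:
            # Everything after the first '|' is the performance data.
            perfdata = match.group(1).partition("|")[2]
            current = {"returned": LABEL_RE.findall(perfdata), "stored": []}
            reports.append(current)
            continue
        match = STORING_RE.search(line)
        if match and current is not None:
            current["stored"].append(match.group(1))
    for report in reports:
        report["missing"] = [
            label for label in report["returned"] if label not in report["stored"]
        ]
    return reports
```

Feeding it the lines of zencommand.log should give a non-empty `missing` list for every command that suffers from this problem.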
I looked closer at my data and you are correct. OIDs that start with a '.' do not seem to be getting updated. However telling Zenoss to test that datapoint from the UI with or without the '.' returns data so it just seems like it's a problem with zenperfsnmp when it goes to store the data.
In particular, the Linux Device template has memBuffer and memCached OIDs that start with a '.'. I verified that, out of all of the Device datapoints, those were the only ones not being updated. Looking closer at the graphs, those datapoints were showing up as nan%, which means they were not getting updated. I removed the '.' and restarted Zenoss (I probably could have just restarted a daemon or two, but figured this would guarantee the new value got read) and the rrd is now being updated.
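If you have many templates to go through, the check itself is simple. Here is a small sketch, assuming you have already collected datasource names and OIDs into a dict by some means (how you export them from Zenoss is not covered here). The helper name `oids_with_leading_dot` and the OID values in the example are for illustration only.

```python
def oids_with_leading_dot(datasources):
    """Return {name: corrected_oid} for every OID that starts with a dot."""
    return {
        name: oid.lstrip(".")
        for name, oid in datasources.items()
        if oid.startswith(".")
    }


# Illustrative input; these values are assumptions, not read from a template.
example = {
    "memBuffer": ".1.3.6.1.4.1.2021.4.14.0",
    "laLoadInt1": "1.3.6.1.4.1.2021.10.1.5.1",
}
to_fix = oids_with_leading_dot(example)
```

Only the entries that need editing come back, each with the dot already stripped.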
I agree, this seems like a bug so I opened a ticket:
http://dev.zenoss.org/trac/ticket/7859
--Dennis
I still couldn't fix the problems with zencommand, as described under 'update2' in my previous post.
I hope someone can shed some light on that.
I'm not sure about the f_ping since I don't use it but I'm seeing the same thing with the Apache Monitor. Running:
zencommand run -v 10
gives me this output:
2011-09-04 14:22:56,550 DEBUG zen.zencommand: Storing slotDNSLookup = 0.0 into Devices/172.18.128.102/apache_slotDNSLookup
2011-09-04 14:22:56,551 DEBUG zen.RRDUtil: /opt/zenoss/perf/Devices/172.18.128.102/apache_slotDNSLookup.rrd: 0.0
2011-09-04 14:22:56,551 DEBUG zen.zencommand: RRD save result: 0.0
2011-09-04 14:22:56,551 DEBUG zen.zencommand: Next command in 9 seconds
2011-09-04 14:22:56,552 DEBUG zen.zencommand: Received exit code: 3
It seems something is failing, but I haven't been able to figure out what yet. On a 3.1 install I get:
2011-09-04 15:16:00,549 DEBUG zen.zencommand: Storing slotDNSLookup = 0.0 into Devices/172.18.128.102/apache_slotDNSLookup
2011-09-04 15:16:00,550 DEBUG zen.RRDUtil: /opt/zenoss/perf/Devices/172.18.128.102/apache_slotDNSLookup.rrd: 0.0
2011-09-04 15:16:00,550 DEBUG zen.zencommand: RRD save result: 0.0
2011-09-04 15:16:00,550 DEBUG zen.zencommand: Storing totalAccesses = 355.0 into Devices/172.18.128.102/apache_totalAccesses
2011-09-04 15:16:00,551 DEBUG zen.RRDUtil: /opt/zenoss/perf/Devices/172.18.128.102/apache_totalAccesses.rrd: 355L
2011-09-04 15:16:00,551 DEBUG zen.zencommand: RRD save result: None
2011-09-04 15:16:00,552 DEBUG zen.zencommand: Storing slotKeepAlive = 0.0 into Devices/172.18.128.102/apache_slotKeepAlive
And it does the same 3 lines for every datapoint defined in the template. I'm going to do a little more debugging and will open another ticket specific to this issue. I did a quick search and don't see any existing tickets yet for zencommand.
--Dennis
I don't get an error code 3.
My problem is that only the first returned data is saved. Therefore the LDAPServer-check and SSL-certificate-check work OK: they store only one piece of data per session.
Can someone from Zenoss confirm or deny this 'bug'?
I don't think it is a configuration error, as it has worked like this in 3.1 and only happens to templates that store more than one value per run.
See my post from Sep 4, 2011 5:23 AM
It's definitely a bug. I did a fresh install of 3.2 and added one device and bound the Apache template to the device. It only updates the first value that is returned. I opened a ticket for it.
Hopefully there will be a patch for it fairly quickly, since monitoring is essentially broken and there is no good workaround (besides breaking out the template into a bunch of separate data sources with 1 data point each).
--Dennis
Confirming that I see this too. Ubuntu 8.0.4, 3.2.0 stack install.
The Apache graph issue is related to zenoss/Products/ZenRRD/parsers/Auto.py.
If you debug the code, you'll find that someone rushed the job.
Change the Auto class to:
class Auto(CommandParser):
    def processResults(self, cmd, result):
        output = cmd.result.output
        output = output.split('\n')[0].strip()
        exitCode = cmd.result.exitCode
        severity = cmd.severity
        if output.find('|') >= 0:
            msg, values = output.split('|', 1)
        elif CacParser.search(output):
            msg, values = '', output
        else:
            msg, values = output, ''
        msg = msg.strip() or 'Cmd: %s - Code: %s - Msg: %s' % (
            cmd.command, exitCode, getExitMessage(exitCode))
        if exitCode != 0:
            if exitCode == 2:
                severity = min(severity + 1, 5)
            result.events.append(dict(device=cmd.deviceConfig.device,
                                      summary=msg,
                                      severity=severity,
                                      message=msg,
                                      performanceData=values,
                                      eventKey=cmd.eventKey,
                                      eventClass=cmd.eventClass,
                                      component=cmd.component))
        for value in values.split(' '):
            if value.find('=') > 0:
                parts = NagParser.match(value)
            else:
                parts = CacParser.match(value)
            if not parts: continue
            label = parts.group(1).replace("''", "'")
            try:
                value = float(parts.group(3))
            except:
                value = 'U'
            for dp in cmd.points:
                if dp.id == label:
                    result.values.append( (dp, value) )
                    break
UPDATE:
You can also change NagParser value to
NagParser = re.compile(r"""([^ =']+|'(.*)'+)=([-0-9.eE]+)([^; ]*;?){0,5}""")
In that case, don't change the Auto class. BTW, I think changing the class code is safer, because I don't know what NagParser is for (it might be the Nagios parser!) or what pattern it is supposed to match. I've just changed the regex so that it matches the returned values.
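If you want to see what that regex does before touching a live install, here is a standalone sketch that runs it against a shortened copy of the check_apache.py output posted earlier in this thread. Only the pattern is taken from the post above; the function `parse_perfdata` is my own simplified stand-in and is not the Zenoss parser.

```python
import re

# Pattern exactly as proposed in the post above.
NagParser = re.compile(r"""([^ =']+|'(.*)'+)=([-0-9.eE]+)([^; ]*;?){0,5}""")


def parse_perfdata(output):
    """Return [(label, value), ...] from the first line of plugin output."""
    first_line = output.split("\n")[0].strip()
    # Everything after the first '|' is treated as performance data.
    _msg, _sep, values = first_line.partition("|")
    parsed = []
    for token in values.split(" "):
        match = NagParser.match(token)
        if not match:
            continue
        try:
            value = float(match.group(3))
        except ValueError:
            value = "U"  # same "unknown" marker the class above uses
        parsed.append((match.group(1), value))
    return parsed


# Shortened sample of the check_apache.py output from earlier in the thread.
sample = ("STATUS OK|slotDNSLookup=0 totalAccesses=136 "
          "bytesPerReq=293.647 cpuLoad=.00660523 idleServers=5\n")
pairs = parse_perfdata(sample)
```

With this pattern every label=value pair in the sample is picked up, not just the first one.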
Thanks!
That does the trick... even without restarting anything.