Dec 16, 2009 6:23 PM
zenperfsnmp is deleting current RRD files
First, I'd upgrade to the latest version of the major version you're
running, in this case, to 2.5.1. Many many bugs have been fixed. You
should be able to do this without losing anything new, but backups are
always a good idea.
Are these files actually being updated, that is, do you see new graph
data? If so, I'd probably open a ticket
http://dev.zenoss.com/trac/wiki/HowToAddTicket
to at least report the bug/oddity, unless someone else has more info
than I do on this.
--
James Pulver
Information Technology Area Supervisor
LEPP Computer Group
Cornell University
RyanK wrote, On 12/16/2009 6:24 PM:
We have a very basic install of Zenoss that we are using for evaluation purposes. It's running 2.5.0 (on CentOS, installed via the stack installer, if that matters).
We have added several machines in the hope that we can begin to track performance data over time. However, I went to present some of what Zenoss could potentially do for us and found that our RRD files only went back a few weeks, despite Zenoss having been running for several months.
After digging, I found entries in zenperfsnmp.log for each RRD file:
zenperfsnmp.log.2:2009-11-29 03:22:32,554 WARNING zen.zenperfsnmp: Deleting old RRD file: /usr/local/zenoss/zenoss/perf/Devices/testserver/os/interfaces/MS TCP Loopback interface/ifOutOctets_ifOutOctets.rrd
That coincides with when the data was wiped out. Looking at that file produces:
stat "/usr/local/zenoss/zenoss/perf/Devices/testserver/os/interfaces/MS TCP Loopback interface/ifOutOctets_ifOutOctets.rrd"
File: `/usr/local/zenoss/zenoss/perf/Devices/testserver/os/interfaces/MS TCP Loopback interface/ifOutOctets_ifOutOctets.rrd'
Size: 35296 Blocks: 80 IO Block: 4096 regular file
Device: fd00h/64768d Inode: 6187281 Links: 1
Access: (0644/-rw-r--r--) Uid: ( 501/ zenoss) Gid: ( 501/ zenoss)
Access: 2009-12-16 17:13:31.000000000 -0600
Modify: 2009-11-29 03:22:33.000000000 -0600
Change: 2009-11-29 03:22:33.000000000 -0600
So, sure enough, the modification dates of the files are not being updated, so when the cleanup function within zenperfsnmp.py runs, it appears as if these files are not being modified. I tried searching, and it appears that nobody else is having this issue; I'm not really sure what the best way to deal with it is. I would expect that the RRD tool would simply write these files and the filesystem (which doesn't have nomtime set) would update the modification date, but perhaps some version of RRD is keeping the files open or something? I've tested that when I edit files they do get their modification dates updated, so I'm kind of out of ideas at this point.
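One way to confirm this symptom is to compare a file's modification time with the time of the last update recorded inside the RRD itself. The sketch below is only an illustration: the helper name `mtime_lag` is made up for this example, and it assumes GNU `stat` (as shipped on CentOS and Ubuntu). A large positive result would suggest that data is being written without the mtime being refreshed.

```shell
#!/bin/sh
# Hypothetical helper: print how many seconds a file's mtime lags behind
# a given epoch timestamp. Assumes GNU stat, which supports "-c %Y".
mtime_lag() {
    file=$1
    last_update=$2
    mtime=$(stat -c %Y "$file")
    echo $((last_update - mtime))
}

# Example use (requires rrdtool on PATH). "rrdtool last" prints the epoch
# time of the most recent update stored in the RRD:
#   f="$ZENHOME/perf/Devices/testserver/os/interfaces/MS TCP Loopback interface/ifOutOctets_ifOutOctets.rrd"
#   mtime_lag "$f" "$(rrdtool last "$f")"
```

The path is quoted throughout because the interface directories in this thread contain spaces.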
Just a wild guess... I could easily write a script that will run through and touch all the rrd files to update the modification dates, but that seems to be just a crude workaround that I'd rather avoid if possible.
Thanks for the help!
Was a fix found for this problem? I am running into the same issue. I have a basic install of Zenoss Ver. 2.5.1 running on Ubuntu server 8.10, and the RRD files are getting periodically deleted as described earlier. I would love to know what can be done to prevent this from happening.
I simply added the following into /etc/crontab:
Have you guys checked in the Event Manager section to make sure that the value for "Delete Historical Events Older Than (days)" is set to a correct value? I'm not sure whether it actually has anything to do with RRDs though; it may only affect the events stored in the MySQL database. The RRD data defaults to being stored for a year, I think.
That setting has nothing to do with the issue.
If you look at zenperfsnmp.py, you'll see that it calls FileCleanup on the perf folder for RRD files that exceed maxRrdFileAge, which is set in that module to 30 days.
All that FileCleanup does is look for files that haven't been modified in the last 30 days and delete them.
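To see which files are at risk before the cleanup runs, something like the following could be used. This is not the FileCleanup code itself, only a rough shell approximation of the same age test; the function name `list_stale_rrds` is invented for this example, and `find -mtime +N` counts whole 24-hour periods, so the cutoff may differ slightly from what Zenoss computes.

```shell
#!/bin/sh
# Hypothetical helper: list .rrd files under a directory whose mtime is
# more than N days old (approximates the age test described above).
list_stale_rrds() {
    dir=$1
    days=$2
    find "$dir" -type f -name '*.rrd' -mtime +"$days" -print
}

# Example use, with the 30-day limit mentioned in this thread:
#   list_stale_rrds "$ZENHOME/perf" 30
```

If files that are still producing graph data show up in this list, they are the ones affected by the timestamp problem.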
So, the core issue is that something within the module that writes to the RRD file is not causing the file modification date to be updated. I've looked around a bit and think that it must be something internal to the libraries used from the python code, as I don't see anything wrong with the python code.
I've installed Zenoss using the stack installer, which shouldn't have such issues, at least I wouldn't think. Other files are appropriately updating their modified timestamps.
The result of the bug is subtle, but I'm guessing that it impacts many people without them knowing it. You'll only find out that it's causing an issue after Zenoss has been installed for 30 days, and then only if you go to look for history stored in those RRD files around the time when they get wiped out (or try to go back more than 30 days).
Thanks RyanK for the response. Yes, I wound up adding an entry to cron as well - in my case it executes once a week. My entry calls a script and is not as elegant as your one-liner. It is good to know that this seems to do the trick. I was really disappointed to see a whole bunch of data wiped out. One of the biggest reasons I need these graphs is to see how things evolve over time. 30 days is not a very big window.
In case anyone else comes along and looks at this, I can confirm this problem is exactly as RyanK describes it. I dug into it quite a bit, hoping for a true fix rather than patching things periodically with the cron job. I am pretty sure the following link gets to the heart of the problem, which seems to be a kernel issue in some OS flavors. That would explain why not everyone experiences this. The root of the problem seems to be in the implementation of the mmap function in certain kernels. RRDTool makes use of mmap.
How about using 'lsof' to determine if a file is being held open by RRDTool and only delete those that are not open?
Something like (untested):
#!/bin/sh
# Find .rrd files with a modification time older than 60 days.
# find passes the paths straight to the inner shell, so names that
# contain spaces (e.g. "MS TCP Loopback interface") are handled.
find "$ZENHOME/perf" -name '*.rrd' -mtime +60 -exec sh -c '
  for f in "$@"; do
    if lsof "$f" > /dev/null 2>&1; then
      # file is open; could do a touch on it to update the timestamp
      :
    else
      # file is not open; candidate for deletion
      rm "$f"
    fi
  done
' sh {} +
Sadly, this won't help as lsof does not see the files as open.
As was referenced earlier in this thread, the issue seems to stem from how the kernel handles updating timestamps on mmap'ed files. I'm on CentOS 5, and apparently the issue that was resolved within RRD 15 months ago (it now checks to see if the kernel doesn't update the timestamps to determine if it should force an update to the modification time) has not yet made its way into CentOS.
So, the options to fix this would be:
I don't have the time to dedicate to attempting to update CentOS packages... and I'd rather not try to have local packages or extra repositories on that box since one of the objectives in choosing CentOS was to reduce administrative overhead and potential dependency conflicts.
It wouldn't be difficult to modify zenperfsnmp.py, perhaps based on a configuration setting, but I'd rather leave this up to one of the Zenoss devs, as my hands are already too full to participate in another project. I'd rather not just make local changes, as they will likely be lost whenever another upgrade gets applied.
So, that leaves the cron job workaround... It fits my needs and only took about 2 minutes to put in place.
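For anyone wanting to try the same kind of workaround, here is a minimal sketch. It is an assumption about what such a job might look like, not the exact entry used above: the function name `touch_rrds` and the schedule in the comment are made up for illustration, and the path is the perf directory quoted earlier in this thread.

```shell
#!/bin/sh
# Hypothetical helper: refresh the mtime of every .rrd file under a
# directory so the 30-day cleanup does not treat them as stale.
touch_rrds() {
    dir=$1
    find "$dir" -type f -name '*.rrd' -exec touch {} +
}

# Example /etc/crontab line (illustrative schedule: weekly, Sunday 03:00,
# run as the zenoss user):
#   0 3 * * 0 zenoss find /usr/local/zenoss/zenoss/perf -type f -name '*.rrd' -exec touch {} +
```

Note the trade-off: touching every file also keeps RRD files for interfaces or devices that no longer exist, so the built-in cleanup effectively stops removing anything.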
Hopefully by posting this some attention can be drawn to the issue and affected distros can be identified and potentially fixed. But failing that, a viable workaround is posted for anyone that may have a similar issue and find this forum thread.
Thanks for your help though! =)
Did this just start with 2.5.x? If so, did anyone think to maybe log a Trac ticket for this?
I asked Matt Ray about this and he dug up this ticket: http://dev.zenoss.org/trac/ticket/6137
There must be something that you guys have in common which is causing the timestamps not to be updated.
What distro are each of you running?
What filesystem are you using?
Yes the link which was posted earlier
http://oss.oetiker.ch/rrdtool-trac/ticket/193
seems to point in the direction that this problem is caused by a faulty implementation of mmap in certain distros.
I am using Ubuntu 8.10 and the file system is ext3.
I do have an XFS file system that I could move the rrd files to. When I get a chance I will try this and see if that helps and report back.
I'm using Ubuntu 8.04 with ext3 and I don't have this issue. Do you have any custom filesystem settings or anything? Also, http://oss.oetiker.ch/rrdtool-trac/ticket/193 is patched in the version of RRDTool used in 2.5.1.
Also, is anyone who is having the problem messing with the ext3 atime setting? Is everyone using ntp?