Sep 23, 2010 3:11 PM
Poor Infrastructure tab performance 3.0.2
I thought this was actually a bug fixed in the 3.0.2 update for Zenoss Core. Moving any number of systems from one class to another, whether 1 or 100, causes the "loading" spinning wheel to show perpetually. For fewer than 10 devices, the screen returns to normal after about 3 minutes and a "failed to move devices" alert drops down from the top. However, if you refresh the browser, the infrastructure view is correct. If you move 10 or more (I'm moving clusters of 15-20 in this case), the whole system slows almost to a stop. The Infrastructure tab stays at "loading" for well over 5 minutes before returning to the normal view with no messages. Loading device classes within the view takes a long time. Even specific device property pages are really slow until suddenly it's back to "normal" speed again, which isn't that great.
Is there anything I can do to help the system move faster? Is the MySQL database getting too large? The directory is roughly 320 MB in size, and the server has 8 Opteron cores and 4 GB of RAM, without RAID. I checked over the logs and there aren't any obvious failures or issues. It's hard to keep telling management "don't worry, it's just still loading".
I'm seeing the same issue and I'm running my install entirely out of ramdisk. Zope DB (Data.fs), RRD backend (perf) and MySQL events table are all stored in RAM. I've also run the zeopack. Nothing seems to help.
Also, this issue is not related to the browser or the JavaScript client. I have tested from multiple browsers on multiple platforms (IE and Firefox on Win7, IE and Firefox on Win XP, Konqueror and Firefox on Linux, as well as Firefox and Safari on Mac OS X).
I'll keep searching and post anything I can find here...
I would crank up the zenhub log level and watch zenhub.log to see what it's doing when this is happening.
I think I've discovered the source of the problem:
When you click on the Infrastructure tab, Zenoss brings up the ENTIRE list of your devices and tries to display them all in the main screen, by default. Even worse, the main window is required to show 5 different fields for each device, including an icon showing the number of alerts (and severity) for each device. So it's required to go grab the entire events database as well, run some calculations on it and then return all that data as well!
After doing some rough calcs, it appears my browser is required to download 80MB of data from the Zenoss server every time I hit the "Infrastructure" tab.
So really, the best (and only) resolution to this issue will be finding a way to limit the number of devices displayed by default in the main Infrastructure window.
I'm going to dig through the docs, see what I can find...
Hello, we have been testing Zenoss and the performance of the Infrastructure tab is really poor. We followed all the performance advice from here, but nothing improved our experience.
We are using a server with 6 cores and 16 GB of RAM, with close to 400 monitors.
Does anybody know whether the commercial version is better, or is it the same as this version?
Thanks
Rodrigo
There are two things you can do. One is apply this patch:
$ zenpatch 22685
The other, which will help in other places as well, is to increase your ZODB cache size (which is different from your ZEO cache size). First, go to http://SERVER:8080/zport/global_catalog/manage_catalogView and get the number of records in your global catalog. You can use this as a rough estimate for the size your object cache should be. Next, round that number up to the nearest 100,000. Finally, go to $ZENHOME/etc/zope.conf and find this:
<zodb_db main>
...
cache-size xxxx
<zeo.../>
</zodb_db>
Set that cache-size parameter to your rounded-up number (no commas, of course). Save, then restart Zope (zopectl restart). You should notice a significant improvement.
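The rounding step described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic; the function name and the example record count of 412,345 are made up here, and the real number has to come from the global catalog page on your own server.

```python
def zodb_cache_size(catalog_records, step=100000):
    """Round a global catalog record count up to the next multiple of `step`.

    `zodb_cache_size` is a hypothetical helper, not part of Zenoss.
    """
    if catalog_records < 0:
        raise ValueError("catalog_records must not be negative")
    # Integer ceiling division, then scale back up; never return less than one step.
    return max(step, ((catalog_records + step - 1) // step) * step)


# Hypothetical record count, used only to show the rounding.
print(zodb_cache_size(412345))  # 500000
```

The result is the value to put on the cache-size line inside the zodb_db section, written without commas.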
Thanks for the tips, Ian. I tried them, but still having the same issue in Infrastructure. I was hoping the patch you listed would fix the issue with Zenoss downloading your entire list of devices when you click on Infrastructure tab (and then displaying all of them in the main window). Didn't fix the issue, though.
And again, I should mention: I'm running the entire system from a 48GB RAM disk. Meaning, EVERYTHING is running from RAM. The issue is not related to disk-bound IO problems. There are no hard disks. Literally everything (including the system kernel) is running from RAM. The OS, MySQL, Zenoss, everything...
And it still takes anywhere from 3-5 minutes to display the main Infrastructure window. I will click on the tab, the page pops right up, then the "Loading. Please wait..." dialog pops up and takes forever to complete. My theory is due to the sheer volume of devices that we are monitoring. There must be a way to limit the size of the list. What if I only want to show 40 devices at a time?
Thanks Ian for the tips.
I have gone ahead and applied the patch and increased the Zope cache as well. Sadly I'm with David on this one: I have not seen any increase in performance. I downloaded the MySQL Toolbox and connected to the Zenoss MySQL database. I'm not sure if it's just the system we're using or the number of devices, but the system load and memory used are always high. In fact the system load can get over 1.2 (120% !?!) whenever I move a few systems or add a system. @Dave, maybe you can try the same and see how your MySQL DB is running? It should be noted that the DB was clean and the key efficiency was 99.98% or better the whole time. Also, Zenoss seems to maintain no fewer than 5 active connections to the DB.
I have to concede that some of the lag may be down to the hosted hardware. However, as Dave's setup shows, there are still some issues with the way Zenoss Core reads and presents the infrastructure information.
@Dave, the MySQL Toolbox lets you monitor the data rate that Zenoss is pulling from your DB. I have 150 or so devices and I don't see it pulling down 20 MB (a quarter of the 80 MB you guesstimated) when the infrastructure loads, but that may be unique to our devices/setup.
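A note on reading the load figure: if the 1.2 is a Unix load average (an assumption, since the tool reporting it is not shown), it counts runnable tasks and is usually compared against the number of cores rather than read as a percentage. A minimal sketch, assuming the 8-core server described at the top of the thread; `load_per_core` is a name invented for this example.

```python
import os


def load_per_core(load_average, cores):
    """Return the load average divided by the number of CPU cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load_average / cores


# Figures from the thread: a load of 1.2 on a server assumed to have 8 cores.
print(round(load_per_core(1.2, 8), 2))  # 0.15

# On the Zenoss host itself the live values can be read directly (Unix only).
if hasattr(os, "getloadavg"):
    one_minute = os.getloadavg()[0]
    print(load_per_core(one_minute, os.cpu_count() or 1))
```

A per-core value well below 1.0 suggests the CPUs are not saturated overall, although a single-threaded process can still be pinned on one core.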
I'm not sure of the exact number of devices we are monitoring, but it is well over 2,000 and somewhere under 5,000. When I checked the catalog size, as mentioned above, it said we had about 400,000 objects in the catalog.
One thing that helped slightly was running a zeopack and then restarting the zenoss service.
The next thing I'm trying now is running a zendmd -> reindex() and then commit(). Can't see how that would impact performance, but it's worth a shot. I'll report the results as soon as it is done....
-- see next post --
Update: YAY!! PROBLEM FIXED!!!
My Zenoss install now seems to be working properly. Loading the "Infrastructure" tab now takes 5 seconds. The first load after Zenoss has just been started up takes a bit longer, approximately 1 minute. Zeo is obviously caching the device (zope) db somehow. But once you've loaded the Infrastructure page the first time, performance is definitely vastly improved. 5 seconds is acceptable.
Here are the steps I took to improve performance. It's hard to tell which fixed the issue, as I ran this entire set of commands before testing:
All commands run as the zenoss user:
1. First, I ran an fsrecover:
[zenoss]$ fsrecover -v 0 -p -P 0 /opt/zenoss/var/Data.fs /opt/zenoss/var/Data.fs.post-fsrecover
2. Then I backed up the old Data.fs and replaced it with the new recovered Data.fs:
[zenoss]$ mv /opt/zenoss/var/Data.fs /opt/zenoss/var/Data.fs.pre-fsrecover.20100931
[zenoss]$ mv /opt/zenoss/var/Data.fs.post-fsrecover /opt/zenoss/var/Data.fs
3. Next, run fsrefs:
[zenoss]$ fsrefs -v /opt/zenoss/var/Data.fs
4. Then run zeopack:
[zenoss]$ zeopack -d 1 localhost:8100
5. Set my zeo cache settings as follows (in $ZENHOME/etc/zope.conf):
<zodb_db main>
mount-point /
# ZODB cache, in number of objects
cache-size 500000
pool-size 50
<zeoclient>
server localhost:8100
storage 1
name zeostorage
var $INSTANCE/var
# ZEO client cache, in bytes
cache-size 512MB
# Uncomment to have a persistent disk cache
#client zeo1
</zeoclient>
</zodb_db>
6. Restart Zenoss, as root user:
# service zenoss restart
7. Log in to Zenoss and click on Infrastructure. It took about a minute to load all my devices. Then click on any other tab, such as Dashboard. Let it finish loading, then click on Infrastructure again. It should take 5 seconds or less to load the page, depending on how many devices you have, etc. It appears that my problem is resolved.
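For the backup in step 2, the date suffix can be generated rather than typed by hand, which avoids suffixes that are not real calendar dates. A small sketch, assuming the same path and naming pattern as in the steps above; `backup_name` is a hypothetical helper and the date passed in the example is purely illustrative.

```python
from datetime import date


def backup_name(path, label="pre-fsrecover", day=None):
    """Build '<path>.<label>.<YYYYMMDD>' using today's date unless `day` is given."""
    day = day or date.today()
    return "{}.{}.{}".format(path, label, day.strftime("%Y%m%d"))


# Illustrative date only; omit `day` to use the current date.
print(backup_name("/opt/zenoss/var/Data.fs", day=date(2010, 9, 30)))
# /opt/zenoss/var/Data.fs.pre-fsrecover.20100930
```

The printed name can then be used as the target of the first mv command in step 2.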
Thanks Dave, I'm going to run through this this morning and see if it works for me as well. Could be a good thing to track for future Zenoss Core 3.0.x releases.
Thanks to all, my Zenoss is now working pretty fast as well!!
To help track 3.0.x enhancements, I started this wiki article: Zenoss 3.0.x Performance Enhancements. Please feel free to update it accordingly.
Thanks,
Matt Ray
Zenoss Community Manager
Huzzah, Dave, you have saved the day. What a performance increase. Thanks!