Archived community.zenoss.org
16805 Views 23 Replies Latest reply: May 16, 2013 11:48 AM by eistconn
kingpin Rank: Green Belt 109 posts since
Sep 22, 2008

Oct 21, 2011 3:00 AM

ZENTRAP keeps going down

Ever since the upgrade to Zenoss 3.2, the zentrap daemon keeps going down. It runs for a few hours, then goes down.

 

The zentrap log always looks like this:

2011-10-20 23:04:34,187 INFO zen.zentrap: 1 devices processed (0 datapoints)

2011-10-20 23:04:34,187 INFO zen.collector.scheduler: Tasks: 3 Successful_Runs: 9 Failed_Runs: 0 Missed_Runs: 0 Queued_Tasks: 0 Running_Tasks: 0

2011-10-20 23:09:34,287 INFO zen.zentrap: 1 devices processed (0 datapoints)

2011-10-20 23:09:34,287 INFO zen.collector.scheduler: Tasks: 3 Successful_Runs: 9 Failed_Runs: 0 Missed_Runs: 0 Queued_Tasks: 0 Running_Tasks: 0

2011-10-20 23:14:34,299 INFO zen.zentrap: 1 devices processed (0 datapoints)

2011-10-20 23:14:34,299 INFO zen.collector.scheduler: Tasks: 3 Successful_Runs: 9 Failed_Runs: 0 Missed_Runs: 0 Queued_Tasks: 0 Running_Tasks: 0

2011-10-20 23:19:34,307 INFO zen.zentrap: 1 devices processed (0 datapoints)

2011-10-20 23:19:34,307 INFO zen.collector.scheduler: Tasks: 3 Successful_Runs: 10 Failed_Runs: 0 Missed_Runs: 0 Queued_Tasks: 0 Running_Tasks: 0

2011-10-20 23:24:34,316 INFO zen.zentrap: 1 devices processed (0 datapoints)

2011-10-20 23:24:34,316 INFO zen.collector.scheduler: Tasks: 3 Successful_Runs: 10 Failed_Runs: 0 Missed_Runs: 0 Queued_Tasks: 0 Running_Tasks: 0

2011-10-20 23:29:34,323 INFO zen.zentrap: 1 devices processed (0 datapoints)

2011-10-20 23:29:34,324 INFO zen.collector.scheduler: Tasks: 3 Successful_Runs: 10 Failed_Runs: 0 Missed_Runs: 0 Queued_Tasks: 0 Running_Tasks: 0

2011-10-20 23:34:34,331 INFO zen.zentrap: 1 devices processed (0 datapoints)

2011-10-20 23:34:34,331 INFO zen.collector.scheduler: Tasks: 3 Successful_Runs: 10 Failed_Runs: 0 Missed_Runs: 0 Queued_Tasks: 0 Running_Tasks: 0

2011-10-20 23:39:34,339 INFO zen.zentrap: 1 devices processed (0 datapoints)

2011-10-20 23:39:34,339 INFO zen.collector.scheduler: Tasks: 3 Successful_Runs: 10 Failed_Runs: 0 Missed_Runs: 0 Queued_Tasks: 0 Running_Tasks: 0

2011-10-20 23:44:34,348 INFO zen.zentrap: 1 devices processed (0 datapoints)

2011-10-20 23:44:34,348 INFO zen.collector.scheduler: Tasks: 3 Successful_Runs: 10 Failed_Runs: 0 Missed_Runs: 0 Queued_Tasks: 0 Running_Tasks: 0

2011-10-20 23:49:34,355 INFO zen.zentrap: 1 devices processed (0 datapoints)

2011-10-20 23:49:34,355 INFO zen.collector.scheduler: Tasks: 3 Successful_Runs: 11 Failed_Runs: 0 Missed_Runs: 0 Queued_Tasks: 0 Running_Tasks: 0

2011-10-20 23:54:34,363 INFO zen.zentrap: 1 devices processed (0 datapoints)

2011-10-20 23:54:34,363 INFO zen.collector.scheduler: Tasks: 3 Successful_Runs: 11 Failed_Runs: 0 Missed_Runs: 0 Queued_Tasks: 0 Running_Tasks: 0

2011-10-20 23:59:34,371 INFO zen.zentrap: 1 devices processed (0 datapoints)

2011-10-20 23:59:34,371 INFO zen.collector.scheduler: Tasks: 3 Successful_Runs: 11 Failed_Runs: 0 Missed_Runs: 0 Queued_Tasks: 0 Running_Tasks: 0

2011-10-21 00:04:34,379 INFO zen.zentrap: 1 devices processed (0 datapoints)

2011-10-21 00:04:43,500 INFO zen.collector.scheduler: Tasks: 3 Successful_Runs: 11 Failed_Runs: 0 Missed_Runs: 0 Queued_Tasks: 0 Running_Tasks: 0

2011-10-21 00:09:43,508 INFO zen.zentrap: 1 devices processed (0 datapoints)

2011-10-21 00:09:43,508 INFO zen.collector.scheduler: Tasks: 3 Successful_Runs: 11 Failed_Runs: 0 Missed_Runs: 0 Queued_Tasks: 0 Running_Tasks: 0

2011-10-21 00:14:43,516 INFO zen.zentrap: 3 devices processed (0 datapoints)

2011-10-21 00:14:43,516 INFO zen.collector.scheduler: Tasks: 5 Successful_Runs: 9 Failed_Runs: 0 Missed_Runs: 0 Queued_Tasks: 0 Running_Tasks: 2

2011-10-21 00:19:49,263 INFO zen.zentrap: 3 devices processed (0 datapoints)

2011-10-21 00:19:49,263 INFO zen.collector.scheduler: Tasks: 5 Successful_Runs: 10 Failed_Runs: 0 Missed_Runs: 0 Queued_Tasks: 0 Running_Tasks: 1

2011-10-21 00:37:12,307 INFO zen.zentrap: 52 devices processed (0 datapoints)

2011-10-21 00:37:12,307 INFO zen.collector.scheduler: Tasks: 54 Successful_Runs: 26 Failed_Runs: 0 Missed_Runs: 0 Queued_Tasks: 0 Running_Tasks: 2

2011-10-21 01:41:38,165 INFO zen.zentrap: 229 devices processed (0 datapoints)

2011-10-21 01:41:38,167 INFO zen.collector.scheduler: Tasks: 231 Successful_Runs: 197 Failed_Runs: 0 Missed_Runs: 0 Queued_Tasks: 0 Running_Tasks: 5

 

An increase in failed runs and running tasks.

 

Does anyone know what this means and how to resolve this issue?

 

Thanks !
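
The counters in the scheduler summary lines above are easier to compare once they are pulled out of the log. The following is a minimal sketch, assuming the log keeps exactly the line format shown in this post; the function names `parse_scheduler_line` and `counter_series` are hypothetical helpers, not part of Zenoss.

```python
import re

# Matches the zen.collector.scheduler summary lines quoted in the post.
SUMMARY_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ "
    r"INFO zen\.collector\.scheduler: (?P<fields>.*)$"
)
# Each counter appears as "Name: <integer>".
FIELD_RE = re.compile(r"(\w+): (\d+)")


def parse_scheduler_line(line):
    """Return (timestamp_string, {counter: int}), or None for other lines."""
    match = SUMMARY_RE.match(line.strip())
    if match is None:
        return None
    counters = dict(
        (name, int(value))
        for name, value in FIELD_RE.findall(match.group("fields"))
    )
    return match.group("ts"), counters


def counter_series(lines, name):
    """Return [(timestamp_string, value)] for one counter, in log order."""
    series = []
    for line in lines:
        parsed = parse_scheduler_line(line)
        if parsed is not None and name in parsed[1]:
            series.append((parsed[0], parsed[1][name]))
    return series
```

Calling `counter_series(open("zentrap.log"), "Tasks")` (the file name is an assumption) lists how the task count changes over time, and the same call with `"Failed_Runs"` shows whether any run actually failed.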

  • blewa Rank: White Belt 6 posts since
    May 5, 2011
    2. Nov 10, 2011 12:23 PM (in response to kingpin)
    Re: ZENTRAP keeps going down

    I'm having similar issues. On one of my Zenoss servers, zentrap will begin consuming memory overnight, reaching 2 GB and still growing. It seems to correlate with the 12-hour remodel schedule, for whatever that's worth.

  • jcurry ZenossMaster 1,021 posts since
    Apr 15, 2008
    3. Nov 11, 2011 12:19 PM (in response to kingpin)
    Re: ZENTRAP keeps going down

    I also have the same.  But we don't have Failed Runs (any of us, I suspect).  This log is not good for the eye but the last line of your log:

     

    2011-10-21 01:41:38,167 INFO zen.collector.scheduler: Tasks: 231 Successful_Runs: 197 Failed_Runs: 0 Missed_Runs: 0 Queued_Tasks: 0 Running_Tasks: 5

     

    is saying 231 Tasks (definitely high but may be a trap storm),

    Successful_Runs: 197

    Failed Runs: 0

    ....

    We are at 3.2.1.

     

    Looking even closer at our zentrap.log, zentrap seems to die every 12 hours - regular as clockwork.

     

    We have worked around the issue by restarting zentrap from cron every hour.

     

    Could anyone else with this problem check whether they are also seeing the 12-hour cycle, and whether it coincides with zenmodeler running? (We have lost the zenmodeler logs that match up with the zentrap logs from before our cron kludge.) That's the only thing I can think of that typically runs every 12 hours.

     

    Cheers,

    Jane
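
One way to look for the cycle described above is to find the points where zentrap stops logging at its usual five-minute interval, as in the original log where the spacing stretches to 17 and then 64 minutes. This is a rough sketch, assuming every log line starts with a `YYYY-MM-DD HH:MM:SS,mmm` timestamp as in the excerpts in this thread; `find_gaps` and the 10-minute threshold are illustrative choices, not anything Zenoss provides.

```python
from datetime import datetime, timedelta

# Timestamp layout used at the start of each zentrap.log line in this thread.
TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"
TS_LENGTH = 23  # len("2011-10-21 00:37:12,307")


def parse_timestamp(line):
    """Return the leading timestamp as a datetime, or None if absent."""
    try:
        return datetime.strptime(line.strip()[:TS_LENGTH], TS_FORMAT)
    except ValueError:
        return None


def find_gaps(lines, threshold=timedelta(minutes=10)):
    """Return (earlier, later, delta) for consecutive lines spaced > threshold."""
    stamps = [ts for ts in map(parse_timestamp, lines) if ts is not None]
    gaps = []
    for earlier, later in zip(stamps, stamps[1:]):
        delta = later - earlier
        if delta > threshold:
            gaps.append((earlier, later, delta))
    return gaps
```

Comparing the start times of successive gaps against the zenmodeler schedule would show whether the stalls really recur about every 12 hours.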

  • Shane Scott ZenossMaster 1,373 posts since
    Jul 6, 2009
    4. Nov 11, 2011 12:34 PM (in response to jcurry)
    Re: ZENTRAP keeps going down

    Jane:

     

    We're seeing this problem in Zenoss v4.1 as well, but with more than just zentrap. Do you see any unjellyable errors in zenhub (in debug mode) when zentrap tries to fetch its updated config?

     

    Best,
    --Shane

  • jcurry ZenossMaster 1,021 posts since
    Apr 15, 2008
    5. Nov 14, 2011 8:07 AM (in response to Shane Scott)
    Re: ZENTRAP keeps going down

    Don't see any "jelly" around, but CPU on zentrap seems to go through the roof as it approaches the failure condition.

     

    We are restarting zentrap every hour with cron using:

     

    # Restart the zentrap Daemon every hour

    0 * * * * /usr/local/zenoss/zenoss/bin/zentrap restart >> /home/zenoss/zendisc_crontab.log 2>&1

     

    This sometimes produces the following messages in zentrap.log on daemon restart:

     

    2011-11-14 12:33:42,710 INFO zen.zentrap: 107 devices processed (0 datapoints)

    2011-11-14 12:33:42,711 INFO zen.collector.scheduler: Tasks: 108 Successful_Runs: 39 Failed_Runs: 0 Missed_Runs: 0 Queued_Tasks: 0 Running_Tasks: 4

    2011-11-14 12:39:22,655 INFO zen.zentrap: 126 devices processed (0 datapoints)

    2011-11-14 12:39:22,656 INFO zen.collector.scheduler: Tasks: 127 Successful_Runs: 65 Failed_Runs: 0 Missed_Runs: 0 Queued_Tasks: 0 Running_Tasks: 6

    2011-11-14 13:00:01,999 INFO zen.zentrap: 232 devices processed (0 datapoints)

    2011-11-14 13:00:02,000 INFO zen.collector.scheduler: Tasks: 233 Successful_Runs: 160 Failed_Runs: 0 Missed_Runs: 0 Queued_Tasks: 0 Running_Tasks: 10

    2011-11-14 13:00:05,302 INFO zen.zentrap: Deleting PID file /usr/local/zenoss/zenoss/var/zentrap-localhost.pid ...

    2011-11-14 13:00:05,302 INFO zen.zentrap: Daemon TrapDaemon shutting down

    2011-11-14 13:00:05,303 ERROR zen.zentrap: Maintenance failed: [Failure instance: Traceback (failure with no frames): <class 'twisted.internet.error.ConnectionLost'>: Connection to the other side was lost in a non-clean fashion: Connection lost.

     

    I have opened a ticket against 3.2.1 - http://dev.zenoss.com/trac/ticket/7885 .

    Cheers,

    Jane
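
As an alternative to an unconditional hourly restart, a cron job could restart zentrap only when the process named in its PID file is gone. The sketch below assumes the PID file and binary paths quoted in this post; `restart_if_dead` is a hypothetical helper. It detects only a dead process, so it would not catch the high-CPU state described above while the daemon is still running.

```python
import errno
import os
import subprocess

# Paths copied from the cron entry and log excerpt in this post;
# adjust them for your own installation.
PID_FILE = "/usr/local/zenoss/zenoss/var/zentrap-localhost.pid"
RESTART_CMD = ["/usr/local/zenoss/zenoss/bin/zentrap", "restart"]


def read_pid(pid_file):
    """Return the integer PID stored in pid_file, or None if unreadable."""
    try:
        with open(pid_file) as handle:
            return int(handle.read().strip())
    except (IOError, OSError, ValueError):
        return None


def process_alive(pid):
    """Return True if a process with this PID currently exists."""
    if pid is None or pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without signalling
    except OSError as err:
        # EPERM means the process exists but belongs to another user.
        return err.errno == errno.EPERM
    return True


def restart_if_dead(pid_file, restart_cmd, runner=subprocess.call):
    """Run restart_cmd when the daemon is not alive; return True if it ran."""
    if process_alive(read_pid(pid_file)):
        return False
    runner(restart_cmd)
    return True
```

A small wrapper script that calls `restart_if_dead(PID_FILE, RESTART_CMD)` could then be scheduled from cron in place of the plain `zentrap restart` line.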

  • Shane Scott ZenossMaster 1,373 posts since
    Jul 6, 2009
    6. Nov 15, 2011 11:58 AM (in response to jcurry)
    Re: ZENTRAP keeps going down

    Jane:

     

    Sorry. I'd meant that the unjellyable message will show up in the zenhub feeding zentrap.

     

    --Shane

  • johnnynoc Rank: Green Belt 102 posts since
    Jan 25, 2011
    7. Dec 7, 2011 11:42 AM (in response to kingpin)
    Re: ZENTRAP keeps going down

    I've experienced the same problem and it seems to be compounded by a zenmodeler process that's running.  Meaning, I have a process that kicks off every night to model all of our devices on a collector.  When this runs I see the collected zentrap events shoot through the roof.  I put in a cron job to restart zentrap hourly on Tuesday and you can see in the graphs the collected events will drop but then immediately shoot back up.

     

    [Attached image: Capture.PNG]

     

    Is anyone else seeing something similar or can suggest how the zenmodeler and zentrap events correlate?

     

    Thanks,

    John

  • jcurry ZenossMaster 1,021 posts since
    Apr 15, 2008
    8. Dec 7, 2011 11:59 AM (in response to johnnynoc)
    Re: ZENTRAP keeps going down

    Yeah - that mirrors what we are seeing. I opened ticket 7885 three weeks back, but it doesn't look like it has made any progress, even though I set it at priority 1.

     

    Any update from Zenoss would be much appreciated by a few of us now.

     

    Cheers,

    Jane

  • zenphil ZenossEmployee 6 posts since
    Jun 20, 2011
    9. Dec 8, 2011 6:06 PM (in response to jcurry)
    Re: ZENTRAP keeps going down

    We are currently working on this issue. Please hang tight!

     

    Phil Bowman

    Sr. Software Developer, Zenoss

  • andrew savvas Rank: White Belt 8 posts since
    Sep 8, 2011
    10. Feb 8, 2012 6:21 AM (in response to zenphil)
    Re: ZENTRAP keeps going down

    Hey Phil, is there any update on this issue? We've just upgraded to v3.2 and are seeing the exact same problem.

     

    Thanks

     

    Andrew

  • jcurry ZenossMaster 1,021 posts since
    Apr 15, 2008
    11. Feb 8, 2012 6:57 AM (in response to andrew savvas)
    Re: ZENTRAP keeps going down

    No update to the ticket other than a target of Core 4 alpha. 

     

    Zenoss???  Is it in the current alpha build?

     

    Cheers,

    Jane

  • Shane Scott ZenossMaster 1,373 posts since
    Jul 6, 2009
    12. Feb 8, 2012 9:59 AM (in response to jcurry)
    Re: ZENTRAP keeps going down

    jcurry:

     

    We haven't had any issues with zentrap in v4.1.1 as of yet. Should be working in v4.1.7 also.

     

    --Shane

  • blewa Rank: White Belt 6 posts since
    May 5, 2011
    13. Feb 23, 2012 1:38 PM (in response to kingpin)
    Re: ZENTRAP keeps going down

    Sad that an issue that's so pronounced and affects a Zenoss environment so much hasn't had a fix come out. I realize there are a lot of resources going into 4.x, but it's easy to see how one could have missed events because of this bug starving the host of resources.

  • mwcotton Rank: Brown Belt 563 posts since
    Apr 23, 2008
    14. Feb 23, 2012 5:09 PM (in response to blewa)
    Re: ZENTRAP keeps going down

    Agreed, I would think a key to avoiding this issue in future releases would be to understand what is causing it in the current release.
