Archived community.zenoss.org | full text search
6143 Views 8 Replies Latest reply: Oct 14, 2013 8:47 AM by Eric Gemme RSS
Troy Forsythe Newbie 5 posts since
Jan 26, 2011

Sep 5, 2012 11:39 AM

zenwinperf routinely stops and needs to be restarted

I have a new Zenoss Core 4 installation, and I've noticed that the zenwinperf daemon is routinely stopping and needs to be restarted.

 

The logs seem pretty standard and then include a few things like:

 

2012-09-05 10:32:33,300 INFO zen.maintenance: Performing periodic maintenance
2012-09-05 10:32:33,301 INFO zen.zenwinperf: Counter eventCount, value 9692
2012-09-05 10:32:33,303 INFO zen.zenwinperf: 72 devices processed (8475 datapoints)
2012-09-05 10:32:33,305 INFO zen.collector.scheduler: Tasks: 73 Successful_Runs: 177 Failed_Runs: 0 Missed_Runs: 0 Queued_Tasks: 0 Running_Tasks: 0
2012-09-05 10:34:04,145 ERROR zen.CollectorCmdBase: Unable to scan device somemachine.domain.com: NT_STATUS_HOST_UNREACHABLE
2012-09-05 10:36:57,475 WARNING zen.winperf.PerfRpc: Bad counter for device x.x.x.x: \Network Interface(BASP Virtual Adapter)\Bytes Received/sec
2012-09-05 10:36:57,475 WARNING zen.winperf.PerfRpc: Bad counter for device x.x.x.x: \Network Interface(BASP Virtual Adapter)\Bytes Sent/sec
2012-09-05 10:36:57,476 WARNING zen.winperf.PerfRpc: Bad counter for device x.x.x.x: \Network Interface(BASP Virtual Adapter)\Packets Received Errors
2012-09-05 10:36:57,476 WARNING zen.winperf.PerfRpc: Bad counter for device x.x.x.x: \Network Interface(BASP Virtual Adapter)\Packets Received/sec
2012-09-05 10:36:57,477 WARNING zen.winperf.PerfRpc: Bad counter for device x.x.x.x: \Network Interface(BASP Virtual Adapter)\Packets Outbound Errors
2012-09-05 10:36:57,477 WARNING zen.winperf.PerfRpc: Bad counter for device x.x.x.x: \Network Interface(BASP Virtual Adapter)\Packets Sent/sec
2012-09-05 10:37:33,322 INFO zen.maintenance: Performing periodic maintenance
2012-09-05 10:37:33,322 INFO zen.zenwinperf: Counter eventCount, value 9701
2012-09-05 10:37:33,323 INFO zen.zenwinperf: 72 devices processed (9021 datapoints)
2012-09-05 10:37:33,325 INFO zen.collector.scheduler: Tasks: 73 Successful_Runs: 208 Failed_Runs: 0 Missed_Runs: 0 Queued_Tasks: 0 Running_Tasks: 0
2012-09-05 10:37:57,078 ERROR zen.CollectorCmdBase: Unable to scan device othermachine.domain.com: NT_STATUS_RESOURCE_NAME_NOT_FOUND


 

 

The daemon stopped shortly after the last message shown above.

 

Any ideas as to why this would be shutting down or how to keep it running?

 

Thanks,

Troy.

  • OneLoveAmaru Rank: White Belt 73 posts since
    May 30, 2011

    Same here, and I have yet to figure it out. My only fix so far is a band-aid: a cron job that starts it again whenever it stops.

     

    Anybody know how to troubleshoot this and fix it?

  • Michael Fabricius Rank: White Belt 10 posts since
    Oct 2, 2012

    I had similar issues. What I did was set watchdog = true in the zenwinperf configuration.

  • OneLoveAmaru Rank: White Belt 73 posts since
    May 30, 2011

    I did that as a temporary fix, but the real problem was another service losing its connection to MySQL and crashing, which in turn affected this service.

  • ChristianCScott Rank: White Belt 24 posts since
    Oct 30, 2012

    I encountered this issue in the past as well. I do not recommend using cron jobs, as they are slow to react and leave gaps in data collection.


    Michael is correct: enable the watchdog built into Zenoss, and when zenwinperf goes down it will be brought back up almost instantly.

    $ grep watchdog /opt/zenoss/etc/zenwinperf.conf

    Edit zenwinperf.conf and change watchdog to true; restart zenwinperf afterwards.

     

    Also, depending on the number of devices you are collecting data from, you may want to lower maxparallel; you may be collecting too much data at once for the daemon or system to handle.

    $ grep maxparallel /opt/zenoss/etc/zenwinperf.conf
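
For anyone who wants to script the two changes described above, here is a minimal sketch. It assumes the conf file uses whitespace-separated `name value` lines with `#` for comments; check what your own zenwinperf.conf looks like before relying on it. The helper name `set_conf_option` and the maxparallel value of 10 are made up for illustration; the thread does not give a recommended value.

```shell
#!/bin/sh
# set_conf_option FILE KEY VALUE  (hypothetical helper, not part of Zenoss)
# Replaces any active "KEY ..." line in FILE with "KEY VALUE", or appends
# the line if KEY is not set. Commented-out lines are left untouched.
# Assumption: options are whitespace-separated "name value" lines.
set_conf_option() {
    file=$1
    key=$2
    value=$3
    tmp=$(mktemp) || return 1
    # Copy every line except active lines whose first field is KEY.
    awk -v k="$key" '$1 == k { next } { print }' "$file" > "$tmp" || {
        rm -f "$tmp"
        return 1
    }
    printf '%s %s\n' "$key" "$value" >> "$tmp"
    # Write back in place so the file keeps its owner and permissions.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Example usage (run as the zenoss user; back up the file first):
#   CONF=/opt/zenoss/etc/zenwinperf.conf
#   cp "$CONF" "$CONF.bak"
#   set_conf_option "$CONF" watchdog true
#   set_conf_option "$CONF" maxparallel 10   # illustrative value only
#   zenwinperf restart
```

Appending the new line rather than editing in place keeps the awk step simple; the trade-off is that the option moves to the end of the file.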

  • reighnman Rank: White Belt 60 posts since
    Apr 22, 2008

    Winperf crashes for me every time it gets a timeout from one of our remote branches:

     

    2013-02-07 00:04:22,152 ERROR zen.CollectorCmdBase: Unable to scan device 192.168.3.2: NT_STATUS_IO_TIMEOUT

     

    Or at least that is always the last line in the log every time it crashes. Is there any workaround for this? I'd like to avoid using watchdog.
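
One way to see which scan errors show up most often is to tally them by status code. This is a rough sketch that assumes the log lines look like the "Unable to scan device" entries quoted in this thread. The function name `count_scan_errors` is made up, and the log path in the usage comment is an assumption about a default install.

```shell
#!/bin/sh
# count_scan_errors LOGFILE  (hypothetical helper)
# Prints "<count> <NT_STATUS code>" for every "Unable to scan device"
# line in LOGFILE, most frequent code first.
count_scan_errors() {
    awk '/Unable to scan device/ {
             # Take the first NT_STATUS_* token found on the line.
             if (match($0, /NT_STATUS_[A-Z0-9_]+/))
                 n[substr($0, RSTART, RLENGTH)]++
         }
         END { for (code in n) print n[code], code }' "$1" |
        sort -k1,1nr -k2,2
}

# Example usage (the path is an assumption; adjust to your install):
#   count_scan_errors /opt/zenoss/log/zenwinperf.log
#   tail -n 1 /opt/zenoss/log/zenwinperf.log   # last line before a crash
```

A tally like this only shows which errors are common. It does not prove that a given error causes the crash, so it is best read alongside the last few lines logged before each stop.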

  • cjutting Newbie 2 posts since
    Aug 16, 2013

    Sorry to bring up an old thread, but I am also seeing zenwinperf crash after logging an NT_STATUS_IO_TIMEOUT error.  Was this issue ever resolved?

  • mikea730 Rank: Green Belt 131 posts since
    Sep 28, 2010

    Me too! Running 4.2.4 on RedHat 6 with ZenUp and "zenoss_core-4.2.4-SP71.zup" installed.

     

    Crashes about 5 minutes after starting.

     

    Any resolution?

     

     

    
  • Eric Gemme Rank: White Belt 30 posts since
    Aug 5, 2011

    We have this problem too. We attempted SP71, but it broke basic things such as the device issue portlet. That left me wary of using ZenUp again, given the risk of breaking something important.

     

    winperf is a requirement for us, as we monitor exclusively Windows boxes spread across two sites (28 in total). The only Linux box that exists in our shop is the one for Zenoss.

     

    It is sad that winperf worked much better in the previous 3.2 version, back when we could pick and choose our ZenPacks. At first I was happy to see all of this become a core function. Now I think it was a huge undertaking for the Zenoss Core maintainers, who need to broaden their support. I hope the Enterprise licensees don't experience these problems, as I would personally be shocked to spend megabucks on such an unstable function.

     

    I notice that no Zenoss expert has shown up in this thread so far. Please acknowledge. In another thread, someone suggested a cron job to restart winperf periodically. This is not an acceptable workaround for us.

     

    Zenoss Core is the only truly free agentless Windows solution I've found, and the least painful to deploy for a Linux thing.

     

    Hurry guys,  make us happy again!
