mfallone Rank: White Belt 4 posts since
Sep 17, 2009

Nov 20, 2012 4:14 PM

Zenoss 4.2 Core - Worklist not clearing

Greetings. I am running Zenoss Core 4.2 on CentOS 6 x64 with 16 GB of RAM and have been monitoring ~15 servers successfully for over a month. After I added more servers (~300), I started having problems adding further devices and modeling my existing ones.

 

 

- When manually modeling through the UI, I see "Zenhub has connected", but it then hangs for over a minute.

- When viewing the logs I see "2012-11-20 13:36:48,698 WARNING zen.zensyslog: No service named 'EventService': ZenHub may be disconnected" (zenoss status shows zenhub up and I can netcat to localhost:8789).

- Enabling debugging on zenhub shows that the worklist increases and holds at around 50. I do not see anything in the 'Jobs' area of the Zenoss UI.
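For anyone wanting to repeat the netcat check from a script, here is a minimal sketch in Python. It only assumes the host and port mentioned above (localhost:8789); the function name `port_is_open` is made up for illustration. Note that a successful TCP connect only shows that something is listening, which is consistent with zenhub being "up" while daemons still report that EventService is missing.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and resolution failures.
        return False


if __name__ == "__main__":
    # 8789 is the zenhub port referenced in the post.
    print(port_is_open("localhost", 8789))
```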

 

 

Is there any way to clean out the worklist?  I enabled debugging in zenjobs and see the following:

 

 

2012-11-20 13:59:25,946 ERROR celery.apps.worker:

Mediator

=================================================

  File "/opt/zenoss/lib/python2.7/threading.py", line 525, in __bootstrap

    self.__bootstrap_inner()

  File "/opt/zenoss/lib/python2.7/threading.py", line 552, in __bootstrap_inner

    self.run()

  File "/opt/zenoss/lib/python/celery/utils/threads.py", line 51, in run

    self.body()

  File "/opt/zenoss/lib/python/celery/worker/mediator.py", line 69, in body

    return

  File "/opt/zenoss/lib/python/celery/worker/buckets.py", line 142, in get

    not_empty.wait(timeout)

  File "/opt/zenoss/lib/python2.7/threading.py", line 263, in wait

    _sleep(delay)

=================================================

LOCAL VARIABLES

=================================================

{'delay': 0.03471708297729492,

'endtime': 1353437965.827212,

'gotit': False,

'remaining': -9.989738464355469e-05,

'saved_state': None,

'self': <Condition(<thread.lock object at 0x5779cf0>, 1)>,

'timeout': 1.0,

'waiter': <thread.lock object at 0x57797d0>}

 

 

 

 

Thread-7

=================================================

  File "/opt/zenoss/lib/python2.7/threading.py", line 525, in __bootstrap

    self.__bootstrap_inner()

  File "/opt/zenoss/lib/python2.7/threading.py", line 552, in __bootstrap_inner

    self.run()

  File "/opt/zenoss/lib/python/billiard/pool.py", line 274, in run

    return self.body()

  File "/opt/zenoss/lib/python/billiard/pool.py", line 499, in body

 

 

Thread-5

=================================================

  File "/opt/zenoss/lib/python2.7/threading.py", line 525, in __bootstrap

    self.__bootstrap_inner()

  File "/opt/zenoss/lib/python2.7/threading.py", line 552, in __bootstrap_inner

    self.run()

  File "/opt/zenoss/lib/python/billiard/pool.py", line 274, in run

    return self.body()

  File "/opt/zenoss/lib/python/billiard/pool.py", line 300, in body

    time.sleep(0.8)

=================================================

LOCAL VARIABLES

=================================================

{'self': <Supervisor(Thread-5, started daemon 139794026075904)>}

 

 

 

 

Thread-6

=================================================

  File "/opt/zenoss/lib/python2.7/threading.py", line 525, in __bootstrap

    self.__bootstrap_inner()

  File "/opt/zenoss/lib/python2.7/threading.py", line 552, in __bootstrap_inner

    self.run()

  File "/opt/zenoss/lib/python/billiard/pool.py", line 274, in run

    return self.body()

  File "/opt/zenoss/lib/python/billiard/pool.py", line 319, in body

    for taskseq, set_length in iter(taskqueue.get, None):

  File "/opt/zenoss/lib/python2.7/Queue.py", line 168, in get

    self.not_empty.wait()

  File "/opt/zenoss/lib/python2.7/threading.py", line 244, in wait

    waiter.acquire()

=================================================

LOCAL VARIABLES

=================================================

{'saved_state': None,

'self': <Condition(<thread.lock object at 0x5779930>, 1)>,

'timeout': None,

'waiter': <thread.lock object at 0x57797f0>}

MainThread

=================================================

  File "/opt/zenoss/Products/Jobber/zenjobs.py", line 118, in <module>

    zj.run()

  File "/opt/zenoss/Products/Jobber/zenjobs.py", line 63, in run

    return self.celery.Worker(**kwargs).run()

  File "/opt/zenoss/lib/python/celery/apps/worker.py", line 140, in run

    self.run_worker()

  File "/opt/zenoss/lib/python/celery/apps/worker.py", line 222, in run_worker

    worker.start()

  File "/opt/zenoss/lib/python/celery/worker/__init__.py", line 238, in start

    component.start()

  File "/opt/zenoss/lib/python/celery/worker/consumer.py", line 350, in start

    self.consume_messages()

  File "/opt/zenoss/lib/python/celery/worker/consumer.py", line 364, in consume_messages

    self.connection.drain_events(timeout=1)

  File "/opt/zenoss/lib/python/kombu/connection.py", line 167, in drain_events

    return self.transport.drain_events(self.connection, **kwargs)

  File "/opt/zenoss/lib/python/kombu/transport/amqplib.py", line 261, in drain_events

    return connection.drain_events(**kwargs)

  File "/opt/zenoss/lib/python/kombu/transport/amqplib.py", line 93, in drain_events

    return self.wait_multi(self.channels.values(), timeout=timeout)

  File "/opt/zenoss/lib/python/kombu/transport/amqplib.py", line 99, in wait_multi

    chanmap.keys(), allowed_methods, timeout=timeout)

  File "/opt/zenoss/lib/python/kombu/transport/amqplib.py", line 158, in _wait_multiple

    channel, method_sig, args, content = read_timeout(timeout)

  File "/opt/zenoss/lib/python/kombu/transport/amqplib.py", line 131, in read_timeout

    return self.method_reader.read_method()

  File "/opt/zenoss/lib/python/amqplib/client_0_8/method_framing.py", line 218, in read_method

    self._next_method()

  File "/opt/zenoss/lib/python/amqplib/client_0_8/method_framing.py", line 133, in _next_method

    frame_type, channel, payload = self.source.read_frame()

  File "/opt/zenoss/lib/python/amqplib/client_0_8/transport.py", line 149, in read_frame

    frame_type, channel, size = unpack('>BHI', self._read(7))

  File "/opt/zenoss/lib/python/amqplib/client_0_8/transport.py", line 261, in _read

    s = self.sock.recv(65536)

  File "/opt/zenoss/lib/python/celery/apps/worker.py", line 309, in cry_handler

    logger.error("\n" + cry())

  File "/opt/zenoss/lib/python/celery/utils/__init__.py", line 145, in cry

    traceback.print_stack(frame, file=out)

=================================================

LOCAL VARIABLES

=================================================

{'frame': <frame object at 0x7f243c003800>,

'main_thread': None,

'out': <StringIO.StringIO instance at 0x4e15638>,

'sep': '=================================================\n',

't': <TaskHandler(Thread-6, started daemon 139794015586048)>,

'thread': <_MainThread(MainThread, started 139794333112064)>,

'tid': 139794333112064,

'tmap': {139793927235328: <Mediator(Mediator, started daemon 139793927235328)>,

          139793937725184: <ResultHandler(Thread-7, started daemon 139793937725184)>,

          139794015586048: <TaskHandler(Thread-6, started daemon 139794015586048)>,

          139794026075904: <Supervisor(Thread-5, started daemon 139794026075904)>,

          139794333112064: <_MainThread(MainThread, started 139794333112064)>}}
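The dump above comes from celery's `cry()` helper (see the last frames of MainThread), which walks the stack of every live thread. Read that way, it appears to show an idle worker: the Mediator and TaskHandler threads are waiting on empty queues and MainThread is blocked in `sock.recv` waiting for AMQP messages. A rough sketch of the same technique using only the standard library is below; `dump_threads` is a hypothetical name, not celery's API, and the sketch targets Python 3 rather than the Python 2.7 bundled with Zenoss.

```python
import sys
import threading
import traceback
from io import StringIO


def dump_threads():
    """Return a text dump of the current stack of every live thread."""
    out = StringIO()
    # Map thread ids to names so the dump is readable.
    names = {t.ident: t.name for t in threading.enumerate()}
    # sys._current_frames() returns {thread_id: topmost frame} for all threads.
    for tid, frame in sys._current_frames().items():
        out.write("%s\n" % names.get(tid, "unknown-%d" % tid))
        out.write("=" * 49 + "\n")
        traceback.print_stack(frame, file=out)
        out.write("\n")
    return out.getvalue()


if __name__ == "__main__":
    print(dump_threads())
```

A thread parked in `wait()`, `sleep()` or `recv()` in such a dump is normally idle rather than stuck, so on its own this output does not point at zenjobs as the cause of the zenhub backlog.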

 

 

 

 

Debug output in zenhub.log:

2012-11-20 13:40:51,280 INFO zen: Setting logging level to DEBUG

2012-11-20 13:40:51,281 INFO zen.zenoss.protocols.amqp: error closing publisher [Errno 4] Interrupted system call

2012-11-20 13:40:51,318 DEBUG zen.Events: ===============  incoming event  ===============

2012-11-20 13:40:51,318 DEBUG zen.Events: Got a localhost zenhub heartbeat event (timeout 90 sec).

2012-11-20 13:40:51,318 DEBUG zen.zenoss.protocols.amqp: Publishing with routing key zenoss.heartbeat.localhost to exchange zenoss.heartbeats

2012-11-20 13:40:51,343 INFO zen.ZenHub: Worker (2329) reports 2012-11-20 13:26:28,145 INFO zen.pbclientfactory: Initial connect timed out after 30 seconds

2012-11-20 13:40:51,343 INFO zen.ZenHub: Worker (2331) reports 2012-11-20 13:26:28,304 INFO zen.pbclientfactory: Initial connect timed out after 30 seconds

2012-11-20 13:40:54,154 DEBUG zen.hub: adding listener for localhost:EventService

2012-11-20 13:40:54,157 DEBUG zen.hub: adding listener for localhost:ZenStatusConfig

2012-11-20 13:40:54,180 DEBUG zen.ZenHub: worklist has 1 items

2012-11-20 13:40:54,180 DEBUG zen.ZenHub: get candidate workers for sendEvents...

2012-11-20 13:40:54,180 DEBUG zen.ZenHub: candidate workers are [0, 1]

2012-11-20 13:40:54,180 DEBUG zen.ZenHub: Giving sendEvents to worker 0, (localhost:Products.ZenHub.services.EventService.sendEvents)

2012-11-20 13:40:54,181 DEBUG zen.ZenHub: worklist has 1 items

2012-11-20 13:40:54,181 DEBUG zen.ZenHub: get candidate workers for getDevicePingIssues...

2012-11-20 13:40:54,181 DEBUG zen.ZenHub: candidate workers are [1]

2012-11-20 13:40:54,181 DEBUG zen.ZenHub: Giving getDevicePingIssues to worker 1, (localhost:Products.ZenHub.services.EventService.getDevicePingIssues)

2012-11-20 13:40:54,217 DEBUG zen.ZenHub: worklist has 1 items

2012-11-20 13:40:54,218 DEBUG zen.ZenHub: all workers are busy

2012-11-20 13:40:54,232 DEBUG zen.ZenHub: worker 1, work localhost:Products.ZenHub.services.EventService.getDevicePingIssues finished in 0.0501899719238

2012-11-20 13:40:54,232 DEBUG zen.ZenHub: worklist has 1 items

2012-11-20 13:40:54,232 DEBUG zen.ZenHub: get candidate workers for getConfigProperties...

2012-11-20 13:40:54,232 DEBUG zen.ZenHub: candidate workers are [1]

2012-11-20 13:40:54,232 DEBUG zen.ZenHub: Giving getConfigProperties to worker 1, (localhost:Products.ZenHub.services.ZenStatusConfig.getConfigProperties)

2012-11-20 13:40:54,238 DEBUG zen.ZenHub: worker 1, work localhost:Products.ZenHub.services.ZenStatusConfig.getConfigProperties finished in 0.00522994995117

2012-11-20 13:40:54,274 DEBUG zen.ZenHub: worklist has 1 items

2012-11-20 13:40:54,275 DEBUG zen.ZenHub: get candidate workers for getThresholdClasses...

2012-11-20 13:40:54,275 DEBUG zen.ZenHub: candidate workers are [1]

2012-11-20 13:40:54,275 DEBUG zen.ZenHub: Giving getThresholdClasses to worker 1, (localhost:Products.ZenHub.services.ZenStatusConfig.getThresholdClasses)

2012-11-20 13:40:54,283 DEBUG zen.ZenHub: worker 1, work localhost:Products.ZenHub.services.ZenStatusConfig.getThresholdClasses finished in 0.00764012336731

2012-11-20 13:40:54,285 DEBUG zen.ZenHub: worklist has 1 items

2012-11-20 13:40:54,285 DEBUG zen.ZenHub: get candidate workers for getCollectorThresholds...

2012-11-20 13:40:54,285 DEBUG zen.ZenHub: candidate workers are [1]

2012-11-20 13:40:54,285 DEBUG zen.ZenHub: Giving getCollectorThresholds to worker 1, (localhost:Products.ZenHub.services.ZenStatusConfig.getCollectorThresholds)

2012-11-20 13:40:54,333 DEBUG zen.ZenHub: worker 1, work localhost:Products.ZenHub.services.ZenStatusConfig.getCollectorThresholds finished in 0.047210931778

2012-11-20 13:40:54,336 DEBUG zen.ZenHub: worklist has 1 items

2012-11-20 13:40:54,336 DEBUG zen.ZenHub: get candidate workers for getDeviceConfigs...

2012-11-20 13:40:54,336 DEBUG zen.ZenHub: candidate workers are [1]

2012-11-20 13:40:54,336 DEBUG zen.ZenHub: Giving getDeviceConfigs to worker 1, (localhost:Products.ZenHub.services.ZenStatusConfig.getDeviceConfigs)

2012-11-20 13:40:56,124 DEBUG zen.hub: adding listener for localhost:EventService

2012-11-20 13:40:56,127 DEBUG zen.hub: adding listener for localhost:ProcessConfig

2012-11-20 13:40:56,151 DEBUG zen.ZenHub: worklist has 1 items

2012-11-20 13:40:56,151 DEBUG zen.ZenHub: all workers are busy

2012-11-20 13:40:56,151 DEBUG zen.ZenHub: worklist has 2 items

2012-11-20 13:40:56,152 DEBUG zen.ZenHub: all workers are busy

2012-11-20 13:40:56,190 DEBUG zen.ZenHub: worklist has 3 items

2012-11-20 13:40:56,190 DEBUG zen.ZenHub: all workers are busy

2012-11-20 13:40:56,283 DEBUG zen.ZenHub: worklist has 3 items

2012-11-20 13:40:56,283 DEBUG zen.ZenHub: all workers are busy

2012-11-20 13:40:56,785 DEBUG zen.ZenHub: worker 1, work localhost:Products.ZenHub.services.ZenStatusConfig.getDeviceConfigs finished in 2.44905090332

2012-11-20 13:40:56,786 DEBUG zen.ZenHub: worklist has 3 items

2012-11-20 13:40:56,786 DEBUG zen.ZenHub: get candidate workers for getDevicePingIssues...

2012-11-20 13:40:56,786 DEBUG zen.ZenHub: candidate workers are [1]

2012-11-20 13:40:56,837 DEBUG zen.ZenHub: Giving sendEvents to worker 1, (localhost:Products.ZenHub.services.EventService.sendEvents)

2012-11-20 13:40:56,837 DEBUG zen.ZenHub: all workers are busy

2012-11-20 13:40:57,980 DEBUG zen.ZenHub: worklist has 2 items

2012-11-20 13:40:57,980 DEBUG zen.ZenHub: all workers are busy

2012-11-20 13:41:01,284 DEBUG zen.ZenHub: worklist has 2 items

2012-11-20 13:41:01,285 DEBUG zen.ZenHub: all workers are busy

2012-11-20 13:41:06,285 DEBUG zen.ZenHub: worklist has 2 items

2012-11-20 13:41:06,285 DEBUG zen.ZenHub: all workers are busy

2012-11-20 13:41:11,286 DEBUG zen.ZenHub: worklist has 2 items

2012-11-20 13:41:11,286 DEBUG zen.ZenHub: all workers are busy

2012-11-20 13:41:16,287 DEBUG zen.ZenHub: worklist has 2 items

2012-11-20 13:41:16,287 DEBUG zen.ZenHub: all workers are busy

2012-11-20 13:41:21,287 DEBUG zen.ZenHub: worklist has 2 items

2012-11-20 13:41:21,287 DEBUG zen.ZenHub: all workers are busy

2012-11-20 13:41:21,319 DEBUG zen.Events: ===============  incoming event  ===============

2012-11-20 13:41:21,319 DEBUG zen.Events: Got a localhost zenhub heartbeat event (timeout 90 sec).

2012-11-20 13:41:21,319 DEBUG zen.zenoss.protocols.amqp: Publishing with routing key zenoss.heartbeat.localhost to exchange zenoss.heartbeats

2012-11-20 13:41:22,753 DEBUG zen.hub: adding listener for localhost:EventService

2012-11-20 13:41:22,756 DEBUG zen.hub: adding listener for localhost:EventLogConfig

2012-11-20 13:41:22,780 DEBUG zen.ZenHub: worklist has 3 items

2012-11-20 13:41:22,780 DEBUG zen.ZenHub: all workers are busy

2012-11-20 13:41:22,780 DEBUG zen.ZenHub: worklist has 4 items

2012-11-20 13:41:22,780 DEBUG zen.ZenHub: all workers are busy
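To watch how the backlog grows over time instead of eyeballing the log, the "worklist has N items" lines can be pulled out with a short script. This is only a sketch: the regular expression is written against the line format shown in this excerpt, the helper names are made up, and the log path in the usage example is an assumption based on the /opt/zenoss install prefix seen in the tracebacks.

```python
import re
from datetime import datetime

# Matches e.g. "2012-11-20 13:40:54,180 DEBUG zen.ZenHub: worklist has 1 items"
WORKLIST_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3}) DEBUG zen\.ZenHub: "
    r"worklist has (\d+) items"
)


def worklist_depths(lines):
    """Yield (timestamp, depth) for every 'worklist has N items' line."""
    for line in lines:
        m = WORKLIST_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
            ts = ts.replace(microsecond=int(m.group(2)) * 1000)
            yield ts, int(m.group(3))


def summarize(lines):
    """Return sample count plus first, last and maximum depth, or None."""
    depths = [d for _, d in worklist_depths(lines)]
    if not depths:
        return None
    return {"samples": len(depths), "first": depths[0],
            "last": depths[-1], "max": max(depths)}


if __name__ == "__main__":
    # Assumed location; adjust to wherever zenhub.log lives on your system.
    with open("/opt/zenoss/log/zenhub.log") as fh:
        print(summarize(fh))
```

A depth that keeps rising while every sample is followed by "all workers are busy", as in the excerpt above, suggests the workers are not finishing calls as fast as the collectors submit them.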

 

Thanks,

/mike

 

EDIT: I changed the RAM threshold in the RabbitMQ config and adjusted the disk space as per: message/68974#68974
