Jun 28, 2012 5:25 AM
Zenoss 4.1.70-1554, network change, zenhub won't start, rabbitmq config?
Hi,
Whilst messing about with my Zenoss 4 test server (CentOS 6 x64, Zenoss 4.1.70-1554) I switched from DHCP to a static IP (the same address in both cases), turned on NIC bonding, and put a proper hostname entry in /etc/hosts. Unfortunately this combination has stopped zenhub from working, and I think it may be RabbitMQ related. I've followed the steps in the release notes about what to do on a hostname change (below), but no joy:
export VHOST="/zenoss"
export USER="zenoss"
export PASS="zenoss"
rabbitmqctl stop_app
rabbitmqctl reset
rabbitmqctl start_app
rabbitmqctl add_vhost "$VHOST"
rabbitmqctl add_user "$USER" "$PASS"
rabbitmqctl set_permissions -p "$VHOST" "$USER" '.*' '.*' '.*'
So I turned on debugging for zenhub, and that says:
2012-06-28 10:09:11,430 INFO zen.HubService.RenderConfig: Starting graph retrieval listener on port 8090
2012-06-28 10:09:11,437 DEBUG zen.Events: =============== incoming event ===============
2012-06-28 10:09:11,441 DEBUG zen.queuepublisher: About to publish this event to the raw event queue:uuid: "ecc5ec52-c100-11e1-b75d-005056810132"
created_time: 1340874551437
event_class: "/App/Start"
actor {
element_type_id: DEVICE
element_identifier: "localhost"
element_sub_type_id: COMPONENT
element_sub_identifier: "zenhub"
}
summary: "zenhub started"
severity: SEVERITY_CLEAR
event_key: ""
, with this routing key: zenoss.zenevent.app.start
2012-06-28 10:09:11,443 INFO zen.zenoss.protocols.amqp: amqp connection was closed [Errno 111] Connection refused
2012-06-28 10:09:11,444 INFO zen.zenoss.protocols.amqp: amqp connection was closed [Errno 111] Connection refused
2012-06-28 10:09:11,444 ERROR zen.ZenHub: Unable to send an event
Traceback (most recent call last):
File "/opt/zenoss/Products/ZenHub/zenhub.py", line 488, in sendEvent
self.zem.sendEvent(Event(**kw))
File "/opt/zenoss/Products/ZenEvents/MySqlSendEvent.py", line 63, in sendEvent
event = self._publishEvent(event)
File "/opt/zenoss/Products/ZenEvents/MySqlSendEvent.py", line 85, in _publishEvent
publisher.publish(event)
File "/opt/zenoss/Products/ZenMessaging/queuemessaging/publisher.py", line 285, in publish
self._publish("$RawZenEvents", routing_key, event, mandatory=mandatory, immediate=immediate)
File "/opt/zenoss/Products/ZenMessaging/queuemessaging/publisher.py", line 304, in _publish
mandatory, immediate)
File "/opt/zenoss/Products/ZenMessaging/queuemessaging/publisher.py", line 378, in publish
headers=headers, declareExchange=declareExchange)
File "/opt/zenoss/lib/python/zenoss/protocols/amqp.py", line 140, in publish
raise Exception("Could not publish message. Connection may be down")
Exception: Could not publish message. Connection may be down
2012-06-28 10:09:15,388 DEBUG zen.ZenHub: Registered 2 invalidation filters.
2012-06-28 10:09:15,460 DEBUG zen.thresholds: Updating threshold ('high event queue', ('localhost collector', ''))
2012-06-28 10:09:15,461 DEBUG zen.thresholds: Updating threshold ('zenmodeler cycle time', ('localhost collector', ''))
2012-06-28 10:09:15,466 DEBUG zen.ZenHub: Starting /opt/zenoss/bin/zenhubworker run -C /opt/zenoss/var/zenhub/localhost_worker.conf
2012-06-28 10:09:15,469 DEBUG zen.ZenHub: Starting /opt/zenoss/bin/zenhubworker run -C /opt/zenoss/var/zenhub/localhost_worker.conf
2012-06-28 10:09:15,671 DEBUG zen.Events: =============== incoming event ===============
2012-06-28 10:09:15,672 DEBUG zen.Events: Got a localhost zenhub heartbeat event (timeout 90 sec).
2012-06-28 10:09:15,674 INFO zen.zenoss.protocols.amqp: amqp connection was closed [Errno 111] Connection refused
2012-06-28 10:09:15,675 INFO zen.zenoss.protocols.amqp: amqp connection was closed [Errno 111] Connection refused
I think the rabbitmq setup may be a bit unhappy at the change. Does anyone have an idea of how to sort this out? Thanks in advance!
Cheers,
J
I should add that the rabbitmqctl commands don't work as they can't connect to anything...
jzen04 tmp $ rabbitmqctl stop_app
Stopping node rabbit@jzen04 ...
Error: unable to connect to node rabbit@jzen04: nodedown
diagnostics:
- nodes and their ports on jzen04: [{rabbitmqctl4947,51175}]
- current node: rabbitmqctl4947@jzen04
- current node home dir: /var/lib/rabbitmq
- current node cookie hash: WqOwg2zowWUM9EIuFpstbQ==
I've put the network settings back to what they were before and it all seems to work again.
I'd still be very interested to know which settings would need to be changed if the networking of the box (and the hostname) was changed.
I am experiencing the same issue. Additionally, here is what I have tried:
[root@newname ~]# service rabbitmq-server start
Starting rabbitmq-server: FAILED - check /var/log/rabbitmq/startup_{log, _err}
rabbitmq-server.
[root@newname ~]# cat /var/log/rabbitmq/startup_log
Activating RabbitMQ plugins ...
********************************************************************************
*WARNING* Undefined function ssl:ssl_accept/3
*WARNING* Undefined function unicode:characters_to_binary/3
********************************************************************************
0 plugins activated:
ERROR: epmd error for host "newname": timeout (timed out establishing tcp connection)
Stupid reason in my case: I did not yet have newname in DNS. I temporarily added it to /etc/hosts and then rabbitmq-server started without issue. Zenhub then seemed to stay running.
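For anyone hitting the same epmd timeout: by default the node name is rabbit@ plus the short hostname, so that short name has to resolve on the box itself. A rough sketch of a check, assuming a Linux host with getent available; default_node_name and node_host_resolves are helper names made up for this example, not part of RabbitMQ:

```shell
# Default RabbitMQ node name: "rabbit@" plus the short (unqualified) hostname.
default_node_name() {
    printf 'rabbit@%s\n' "$(uname -n | cut -d. -f1)"
}

# Hypothetical helper: succeed only if the host part of a node name
# (everything after the "@") resolves via /etc/hosts or DNS.
node_host_resolves() {
    getent hosts "${1#*@}" >/dev/null 2>&1
}
```

If node_host_resolves "$(default_node_name)" fails, adding the name to /etc/hosts or DNS as described above is the likely fix.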
To prevent RabbitMQ from changing its node name when the hostname changes, you should create a rabbitmq-env.conf file and specify NODENAME (as documented below). This will be used as the name of the mnesia database used by RabbitMQ.
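A minimal sketch of such a file, assuming the standard location /etc/rabbitmq/rabbitmq-env.conf. The value rabbit@localhost is only an example; whatever host part is chosen has to resolve on the box. Because the node name also names the mnesia database, switching to a new NODENAME on an existing install probably means the Zenoss vhost and user have to be recreated afterwards.

```shell
# /etc/rabbitmq/rabbitmq-env.conf
# Sourced by the RabbitMQ startup scripts; variables in this file are
# written without the RABBITMQ_ prefix.
# Example value only: choose a host part that keeps resolving after a rename.
NODENAME=rabbit@localhost
```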
DOH. For some strange reason I'd put the NODENAME setting in rabbitmq-conf and not rabbitmq-env.conf as per the documentation. I don't think I'll be making that mistake again.
Once I'd moved (and checked) the config file and run the procedure to recreate the queue(s) then Zenoss restarted fine.
I spoke too soon; I'm still having issues. jshardlow, what is the procedure to recreate the queues in rabbitmq?
The rabbitmqctl commands listed in the first message in this thread will recreate the vhost, user, and set the proper credentials for the vhost.
The catch is that in order to recreate the queues, you have to have a running rabbitmq server. Which can be a bit of a chicken/egg situation. If you can run rabbitmqctl status and not get any error messages, then you should be OK.
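One way around the chicken/egg problem is to poll rabbitmqctl status until it succeeds and only then run the recreate steps. A sketch under these assumptions: wait_for_broker and recreate_zenoss_vhost are hypothetical helper names, the rabbitmqctl commands and the /zenoss, zenoss, zenoss defaults are the ones from the first message, and the retry count and delay are arbitrary.

```shell
# Poll `rabbitmqctl status` until it succeeds.
# $1 = number of attempts (default 10), $2 = seconds between attempts (default 3).
wait_for_broker() {
    tries=${1:-10}
    delay=${2:-3}
    while [ "$tries" -gt 0 ]; do
        if rabbitmqctl status >/dev/null 2>&1; then
            return 0
        fi
        tries=$((tries - 1))
        if [ "$tries" -gt 0 ]; then
            sleep "$delay"
        fi
    done
    return 1
}

# Reset the broker and recreate the vhost, user and permissions.
# Stops at the first command that fails.
recreate_zenoss_vhost() {
    vhost=${1:-/zenoss}
    user=${2:-zenoss}
    pass=${3:-zenoss}
    rabbitmqctl stop_app || return 1
    rabbitmqctl reset || return 1
    rabbitmqctl start_app || return 1
    rabbitmqctl add_vhost "$vhost" || return 1
    rabbitmqctl add_user "$user" "$pass" || return 1
    rabbitmqctl set_permissions -p "$vhost" "$user" '.*' '.*' '.*'
}
```

Typical use would be: wait_for_broker 10 3 && recreate_zenoss_vhost. Note that rabbitmqctl reset wipes the node's existing configuration, so this is only appropriate when the broker is dedicated to Zenoss.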
New to Zenoss, testing the 4.2 VM and seeing this issue; you're all over it. If I could get clarity on the correlation between rabbitmq-server and zenhub, I may be able to put the comments above together and resolve it. At this point I know rabbitmq is running (valid PID and node running), but zenhub will not start.
Maybe it's the ORDER in which my changes are being made (or my lack of understanding of the zenhub<->rabbitmq connection).
Goal -> Change to a valid hostname and dns server/resolution
I'm changing one thing at a time (starting with a working 4.2 VM with 2 added hosts + static IP):
1) Created /etc/rabbitmq/rabbitmq-env.conf (contains ONLY the variables below)
HOSTNAME=localhost
NODENAME=rabbit@localhost
--- have NOT changed localhost.localdomain yet, (sysconfig/network file or hosts etc...)
--- Restart the test host VM
Check that rabbitmq started (snippet of rabbitmqctl status below)
-------- SNIPPET ----------
Status of node rabbit@localhost ...
[{pid,3550},
{running_applications,[{rabbit,"RabbitMQ","2.8.4"},
{os_mon,"CPO CXC 138 46","2.1.8"},
{sasl,"SASL CXC 138 11","2.1.5.4"},
{mnesia,"MNESIA CXC 138 12","4.4.7"},
{stdlib,"ERTS CXC 138 10","1.15.5"},
{kernel,"ERTS CXC 138 10","2.12.5"}]},
{os,{unix,linux}},
{erlang_version,"Erlang (BEAM) ............
-------- SNIPPET ----------
Log into the GUI and check the Daemons
zenhub is the only one not started
If I remove the file /etc/rabbitmq/rabbitmq-env.conf and reboot, everything comes up fine, so....
Are there config changes needed in the zenhub config, OR is my /etc/rabbitmq/rabbitmq-env.conf incomplete, OR do I need to recreate the rabbitmq queues in a certain order, based on the future hostname??
It's been a while since I had this problem. But if you've got a running system (pre-name change) and you can do a rabbitmqctl status OK, then I think the order is something like:
Stop Zenoss
Put new hostname in rabbitmq-env.conf
Reconfigure rabbitmq
Rename box (reboot to be on the safe side?)
Start Zenoss
As far as I know, you don't have to make any changes to the zenhub config.
This solution does not work. I too am getting the exact same symptoms and problems. And I have yet to find a working solution.
I was able to get rabbitmq-server to start after adding my host name to the hosts file at the end of the 127.0.0.1 line.
http://stackoverflow.com/questions/8633882/rabbitmq-on-ubuntu-10-04-server
The following worked for me on Fedora 16,
su
vim /etc/hosts
127.0.0.1 localhost.localdomain localhost YOUR-HOSTNAME
::1 localhost6.localdomain6 localhost6
service rabbitmq-server start
For example, suppose your hostname is 67714.
su
vim /etc/hosts
127.0.0.1 localhost.localdomain localhost 67714
::1 localhost6.localdomain6 localhost6
service rabbitmq-server start
Hope this helps.
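To confirm the edit took, it can help to check that the hostname really appears as a name or alias in the hosts file. This is only a sketch; hosts_file_has_name is a helper name invented here, and it looks at the file contents only, not at DNS.

```shell
# Succeed if the given name appears as a hostname or alias (any field
# after the address) on a non-comment line of a hosts-format file.
# $1 = path to the hosts file, $2 = hostname to look for.
hosts_file_has_name() {
    awk -v name="$2" '
        { sub(/#.*/, "") }    # strip trailing comments
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$1"
}
```

For example, hosts_file_has_name /etc/hosts "$(uname -n)" should succeed once the hostname has been added to the 127.0.0.1 line.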