How to configure RabbitMQ using Active/Passive High Availability architecture

I'm trying to set up a cluster of RabbitMQ servers to get highly available queues using an active/passive server architecture. I'm following these guides:

  • http://www.rabbitmq.com/clustering.html
  • http://www.rabbitmq.com/ha.html
  • http://karlgrz.com/rabbitmq-highly-available-queues-and-clustering-using-amazon-ec2/
My requirement for high availability is simple: I have two nodes (CentOS 6.4) with RabbitMQ (v3.2) and Erlang R15B03. Node1 must be the "active" node, responding to all requests, and Node2 must be the "passive" node that has all the queues and messages replicated from Node1.

To do that, I have configured the following:

  • Node1 with RabbitMQ working fine in non-cluster mode
  • Node2 with RabbitMQ working fine in non-cluster mode
    Next, I created a cluster between both nodes by joining Node2 to Node1 (guide 1). After that I configured a policy to mirror the queues (guide 2), replicating all the queues and messages among all the nodes in the cluster. This works: I can connect to either node and publish or consume messages while both nodes are available.

    The problem occurs when I have a queue "queueA" that was created on Node1 (the master for queueA). When Node1 is stopped, I can't connect to queueA on Node2 to produce or consume messages; Node2 throws an error saying that Node1 is not accessible. (I think that queueA is not replicated to Node2, so Node2 can't be promoted to master of queueA.)

    The error is:

    {"The AMQP operation was interrupted: AMQP close-reason, initiated by Peer, code=404, text="NOT_FOUND - home node 'rabbit@node1' of durable queue 'queueA' in vhost 'app01' is down or inaccessible", classId=50, methodId=10, cause="}

    The sequence of steps used is:

    Node1:

    1. rabbitmq-server -detached
    2. rabbitmqctl start_app
    

    Node2:

    3. Copy .erlang.cookie from Node1 to Node2
    4. rabbitmq-server -detached
    

    Join the cluster (Node2):

    5. rabbitmqctl stop_app
    6. rabbitmqctl join_cluster rabbit@node1
    7. rabbitmqctl start_app
    

    Configure Queue mirroring policy:

    8. rabbitmqctl set_policy ha-all "" '{"ha-mode":"all","ha-sync-mode":"automatic"}'
    

    Note: The pattern used for queue names is "" (all queues).

    When I run 'rabbitmqctl list_policies' and 'rabbitmqctl cluster_status', everything looks OK.

    Why can't Node2 respond when Node1 is unavailable? Is there something wrong with this setup?


    You haven't specified the virtual host (app01) in your set_policy call, thus the policy will only apply to the default virtual host (/). This command line should work:

    rabbitmqctl set_policy -p app01 ha-all "" '{"ha-mode":"all","ha-sync-mode":"automatic"}'
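
The fix above can be wrapped in a small script that also lists the policies of the same vhost afterwards, so you can see whether the policy landed where the queue lives. This is only a sketch: apply_ha_policy is a hypothetical helper name, and the RABBITMQCTL variable is an assumption added so that the commands can be printed (dry run) instead of executed.

```shell
#!/bin/sh
# Sketch: apply the mirroring policy to one vhost, then list that vhost's policies.
# RABBITMQCTL may be set to "echo" for a dry run that only prints the arguments.
RABBITMQCTL="${RABBITMQCTL:-rabbitmqctl}"

# apply_ha_policy is a hypothetical helper name; $1 is the vhost (e.g. app01).
apply_ha_policy() {
  # Same policy as in the question, but scoped to the given vhost with -p.
  $RABBITMQCTL set_policy -p "$1" ha-all "" \
    '{"ha-mode":"all","ha-sync-mode":"automatic"}'
  # Policies are per vhost, so list them for the same vhost to confirm.
  $RABBITMQCTL list_policies -p "$1"
}
```

Calling `apply_ha_policy app01` on any cluster node should be enough, since policies are stored cluster-wide. Running 'rabbitmqctl list_policies' without -p only shows the default vhost (/), which is probably why everything looked fine in the question.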
    

    In the web management console, is queueA listed as Node1 +1?
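
If you don't have the management console at hand, roughly the same information should be available from the command line. A sketch, assuming the list_queues info items documented for RabbitMQ 3.x (policy, pid, slave_pids, synchronised_slave_pids); show_mirrors is a hypothetical helper name, and RABBITMQCTL is an assumption that allows a dry run.

```shell
#!/bin/sh
# Sketch: show where each queue in a vhost lives and which mirrors it has.
# RABBITMQCTL may be set to "echo" for a dry run that only prints the arguments.
RABBITMQCTL="${RABBITMQCTL:-rabbitmqctl}"

# show_mirrors is a hypothetical helper name; $1 is the vhost (e.g. app01).
# policy                  = name of the policy applied to the queue, if any
# pid                     = process of the queue master (its node is the home node)
# slave_pids              = processes of the current mirrors
# synchronised_slave_pids = mirrors that are fully in sync with the master
show_mirrors() {
  $RABBITMQCTL list_queues -p "$1" name policy pid slave_pids synchronised_slave_pids
}
```

For example, `show_mirrors app01`. If queueA shows no policy and no mirror pids, it is not mirrored, and Node2 has nothing to promote when Node1 goes down.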

    It sounds like there might be an issue with your setup. I've got a set of Vagrant boxes that are pre-configured to work in a cluster; it might be worth trying those to help identify issues in your setup.


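    Make sure that your queue is not exclusive and, if it is durable, that it is actually mirrored: a durable queue that is not mirrored becomes unavailable when its home node goes down.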

    From the documentation (https://www.rabbitmq.com/ha.html):

    Exclusive queues will be deleted when the connection that declared them is closed. For this reason, it is not useful for an exclusive queue to be mirrored (or durable for that matter) since when the node hosting it goes down, the connection will close and the queue will need to be deleted anyway.

    For this reason, exclusive queues are never mirrored (even if they match a policy stating that they should be). They are also never durable (even if declared as such).

    From your error message:

    {"The AMQP operation was interrupted: AMQP close-reason, initiated by Peer, code=404, text="NOT_FOUND - home node 'rabbit@node1' of durable queue 'queueA' in vhost 'app01' is down or inaccessible", classId=50, methodId=10, cause="}

    It looks like you created a durable queue.
