I'm testing the master-slave functionality of Amazon MQ, and I would like to trigger a failover under load to verify that all messages sent are received by my subscribers.
I can't see any option in the GUI or CLI to trigger a failover scenario. I have tried rebooting the broker, but this affects both nodes. Predictably, this caused my test to fail, as the standby broker was not available when my clients tried to reconnect.
There is some comfort in the fact that my clients did immediately try to reconnect to the standby broker, but I still can't be sure that all messages would get through if a real failover occurred.
Is this possible at all?
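For what it's worth, on the client side this kind of test typically relies on the ActiveMQ failover transport, listing both brokers' endpoints so the client hops to the standby automatically. A minimal sketch with the ActiveMQ JMS client (the endpoint URLs, credentials, and queue name are placeholders, not real Amazon MQ values):

```java
import javax.jms.*;
import org.apache.activemq.ActiveMQConnectionFactory;

public class FailoverClient {
    public static void main(String[] args) throws JMSException {
        // Placeholder endpoints: Amazon MQ exposes one URL per broker instance;
        // the failover transport reconnects to whichever one is currently active.
        String url = "failover:(ssl://b-example-1.mq.us-east-1.amazonaws.com:61617,"
                   + "ssl://b-example-2.mq.us-east-1.amazonaws.com:61617)";

        ConnectionFactory factory = new ActiveMQConnectionFactory("user", "password", url);
        Connection conn = factory.createConnection();
        conn.start();

        Session session = conn.createSession(false, Session.AUTO_ACKNOWLEDGE);
        MessageConsumer consumer = session.createConsumer(session.createQueue("TEST.QUEUE"));
        consumer.setMessageListener(m -> System.out.println("received " + m));
    }
}
```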
I'm evaluating technologies for our cluster. Pulsar looks good, but its usage looks more like a queueing system. Of course, a queueing system is good to have, but I have a specific requirement: broadcasting.
We would like to use one machine to generate the data and publish it to a Pulsar topic. Then we use a group of servers, forming a replica set; each server consumes the message flow on that topic and serves clients via WebSocket.
This is different from a Shared subscription, because each server needs to receive all of the messages, not just a fraction of them.
I came across this post: https://kafkaesque.io/subscriptions-multiple-groups-of-consumers-on-pulsar-topic/ , which explains how to do this: each server creates a new Exclusive subscription, say with a UUID as its subscription name, and through that unique Exclusive subscription it gets the full message flow of the topic.
But since our server replica set is dynamic, whenever some of the servers restart they will create new UUID subscriptions, which will leave many orphan subscriptions on the topic and would eventually become a maintenance headache.
Does anyone have experience setting up a broadcast use case with Pulsar?
Actually, I found that the "Reader Interface" is exactly for this kind of use case:
https://pulsar.apache.org/docs/en/concepts-clients/#reader-interface
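For reference, a minimal sketch of the Reader interface with the Pulsar Java client (the service URL and topic name are placeholders): a reader attaches to the topic at an explicit position without creating a durable subscription, so restarting servers leave no orphaned state behind on the broker.

```java
import org.apache.pulsar.client.api.*;

public class BroadcastReader {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650")   // placeholder URL
                .build();

        // No subscription state is stored on the broker for a reader,
        // so restarts leave no orphans to clean up.
        Reader<byte[]> reader = client.newReader()
                .topic("broadcast-topic")                // placeholder topic
                .startMessageId(MessageId.latest)        // or earliest, or a saved position
                .create();

        while (true) {
            Message<byte[]> msg = reader.readNext();
            // fan the payload out to the WebSocket clients here
            System.out.println("got " + msg.getMessageId());
        }
    }
}
```

Note the trade-off: a reader does not track acknowledgements for you, so if you need resume-after-crash semantics you must persist the last MessageId yourself.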
Using an exclusive subscription for each consumer is the only way to ensure that each of your consumers receives ALL of the messages on the topic, and Pulsar handles multiple subscriptions quite well.
The issue, it seems, is the server-restart case, and I don't think that simply connecting with a new UUID subscription is the right approach (leaving aside the orphaned subscriptions). You really want the server to reuse its previous subscription after it restarts. This is because each subscription keeps track of the last message on the topic that it has processed and acknowledged, so if you reconnect with the same subscription UUID, you pick up exactly where you left off before the server crashed. If you connect with a new UUID, you will start processing messages produced from that point in time forward, and all messages produced during the restart period will be "lost".
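To make that concrete, a hedged sketch with the Pulsar Java client (the topic name is a placeholder, and stableName stands in for whatever identifier you persist across restarts):

```java
import org.apache.pulsar.client.api.*;

public class ResumingConsumer {
    public static Consumer<byte[]> resume(PulsarClient client, String stableName)
            throws PulsarClientException {
        // Reconnecting with the SAME subscription name resumes from the last
        // acknowledged message; a fresh name would start a brand-new cursor
        // and skip everything produced while the server was down.
        return client.newConsumer()
                .topic("broadcast-topic")                 // placeholder topic
                .subscriptionName(stableName)
                .subscriptionType(SubscriptionType.Exclusive)
                .subscribe();
    }
}
```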
Therefore, you will need a mechanism to share these UUIDs across server failures and hand them back to restarting servers. One approach would be something similar to ZooKeeper leader election, in which each server is granted an exclusive lease that expires periodically. The server must then periodically refresh the lease to retain it. If a server crashes, it fails to refresh the lease on its UUID, and the restarting server is granted the lease when it attempts to reconnect.
See https://curator.apache.org/curator-recipes/leader-election.html for a better explanation of the pattern.
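As a rough illustration of that lease pattern with Apache Curator (the ZooKeeper ensemble, lease paths, and the fixed pool of subscription names are all assumptions): the locks are backed by ephemeral ZooKeeper nodes, so a crashed server's lease is released automatically when its session expires, and the replacement server can claim the same subscription name.

```java
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessSemaphoreMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;
import java.util.concurrent.TimeUnit;

public class SubscriptionLease {
    public static String acquireSubscriptionName() throws Exception {
        CuratorFramework zk = CuratorFrameworkFactory.newClient(
                "zk1:2181", new ExponentialBackoffRetry(1000, 3)); // placeholder ensemble
        zk.start();

        // Hypothetical fixed pool of stable subscription names shared by the replica set.
        String[] pool = {"broadcast-sub-0", "broadcast-sub-1", "broadcast-sub-2"};
        for (String name : pool) {
            InterProcessSemaphoreMutex lease =
                    new InterProcessSemaphoreMutex(zk, "/leases/" + name);
            // Non-blocking attempt; the lock is held via an ephemeral node,
            // so it disappears if this server's ZooKeeper session dies.
            if (lease.acquire(0, TimeUnit.SECONDS)) {
                return name; // subscribe to Pulsar with this name
            }
        }
        throw new IllegalStateException("no free subscription name in the pool");
    }
}
```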
I have my network running with 3 machines, each one with:
1 orderer
1 peer
1 CA
1 Node.js client
They are deployed on AWS and a load balancer correctly distributes the requests to the 3 different clients.
The clients are not using the discovery service; it is disabled.
Client1 only contacts orderer1, peer1, and ca1, and likewise for the others.
I want to test Hyperledger's high availability, so while I am inserting data I shut down a machine, say machine1, and the others should continue execution.
What happens instead is that while the machine is down, the network stops executing. The clients do not move at all (they do not crash, they just stall).
When I bring the machine up again, I see errors coming in, but execution then continues.
It seems as though calls to machine1 are suspended and resume as soon as the machine is back up.
What I want is that if machine1 goes down, the requests to it are rejected and machines 2 and 3 continue the execution.
How can I achieve this?
[EDIT] Additional information: I have added some logging to the client, in particular in my endpoint for creating transactions, like this:
console.log('Starting Creation')
await contract.submitTransaction(example)
console.log('Creation done')
res.sendStatus(200)
Let me also mention that these lines are wrapped in an error handler, so that if any error occurs, I catch it.
But I get no error; I just see the first log printed, and submitTransaction then runs for a very long time without ever receiving an answer.
It seems like it tries to deliver the request to the orderer, but the orderer is not online.
When I bring down an orderer with docker service scale orderer1=0 (I am using services with Docker Swarm), the leader orderer's logs show that the node went offline. Also, if I bring the orderer up again, a new election starts.
This seems correct; in fact, the problem only happens when I shut down the whole machine, closing the connections ungracefully.
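One thing worth checking, purely as an assumption based on those symptoms: when a machine dies without closing its TCP connections, the client's gRPC channels can sit on half-open sockets until OS-level timeouts fire, which would explain the suspended calls. Enabling gRPC keepalives in the client's connection profile should make it notice the dead peer and fail pending requests quickly instead. A hypothetical excerpt (hostnames and timings are placeholders):

```yaml
# Keepalive pings let the gRPC channel detect a dead peer instead of
# waiting for the OS-level TCP timeout.
peers:
  peer1.example.com:
    url: grpcs://peer1.example.com:7051
    grpcOptions:
      grpc.keepalive_time_ms: 10000          # ping every 10 s
      grpc.keepalive_timeout_ms: 5000        # fail if no ack within 5 s
      grpc.keepalive_permit_without_calls: 1
```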
Problem
We are developing an Azure Service Bus based Cloud Service, but after 24 hours the queue clients seem to get closed automatically.
Can someone confirm this behavior, or give advice on how to fix it?
At the moment we close the clients after 24 hours manually and recreate them to avoid this effect, but this can't be the only solution.
Sessions dropping intermittently is a normal occurrence. The AMQP protocol and the client-side stack for it are newer and generally more resilient to this. The only reason not to use AMQP is if you are using transactions. Also, unless you have a good reason to run your own receive loop, use OnMessage.
You get an OperationCanceledException when the link fails for any reason, and any in-flight requests fail with this exception. However, this is transient, so you should be able to reuse the same QueueClient to issue receives, and those should (eventually) succeed as the client recovers. OnMessage will hide all of that from you.
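The answer above refers to the .NET client's OnMessage pump. As an illustration of the same pattern with the Java Service Bus client, where registerMessageHandler plays that role, here is a hedged sketch (the connection string and queue name are placeholders): the handler-based pump owns the receive loop and recovers the AMQP link after transient failures, instead of you recreating the clients every 24 hours.

```java
import com.microsoft.azure.servicebus.*;
import com.microsoft.azure.servicebus.primitives.ConnectionStringBuilder;
import java.time.Duration;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;

public class MessagePump {
    public static void main(String[] args) throws Exception {
        QueueClient client = new QueueClient(
                new ConnectionStringBuilder(
                        "Endpoint=sb://...;SharedAccessKeyName=...;SharedAccessKey=...",
                        "myqueue"), // placeholder connection string and queue
                ReceiveMode.PEEKLOCK);

        client.registerMessageHandler(new IMessageHandler() {
            @Override
            public CompletableFuture<Void> onMessageAsync(IMessage message) {
                System.out.println("received " + message.getMessageId());
                return CompletableFuture.completedFuture(null);
            }

            @Override
            public void notifyException(Throwable e, ExceptionPhase phase) {
                // Transient link failures surface here while the pump keeps running.
                System.err.println(phase + ": " + e.getMessage());
            }
        }, new MessageHandlerOptions(1, true, Duration.ofMinutes(5)),
           Executors.newSingleThreadExecutor());
    }
}
```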
I receive messages from a RabbitMQ broker.
How do I issue an ack or nack, inside a Spark action (like foreach/foreachPartition), to retry message processing at a later time or just discard it?
I can't just pass along the deliveryTag, connect to RabbitMQ inside an action, and send the ack, since the deliveryTag is bound to a particular channel.
Spark tasks typically run on remote nodes, so all objects that a task interacts with should be either private to the task or shared variables. RabbitMQ connection objects (any sort of connection, actually) established on the driver node will not be carried to remote nodes. Therefore, in order to send acks and nacks to RabbitMQ, you need to do it outside of tasks, unless you are running everything on the driver node.
In short, try to find a way to signal message consumption failures back to the driver node and have the driver node send all acks and nacks.
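A minimal sketch of that pattern (the process() function and the (tag, payload) pairing are hypothetical): the per-message work runs on the executors, but only (tag, success) flags travel back, so the single driver-side channel that owns the delivery tags does all the acking.

```java
import com.rabbitmq.client.Channel;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;
import java.util.List;

public class DriverSideAcks {
    // Hypothetical per-message work; runs on the executors.
    static boolean process(byte[] body) {
        return body.length > 0;
    }

    // 'messages' pairs each payload with the delivery tag it arrived with.
    // The channel never leaves the driver; only (tag, success) flags do.
    static void processAndAck(JavaSparkContext sc, Channel channel,
                              List<Tuple2<Long, byte[]>> messages) throws Exception {
        JavaRDD<Tuple2<Long, byte[]>> rdd = sc.parallelize(messages);

        List<Tuple2<Long, Boolean>> outcomes =
                rdd.map(m -> new Tuple2<>(m._1(), process(m._2()))).collect();

        // Back on the driver, on the same channel the tags belong to.
        for (Tuple2<Long, Boolean> o : outcomes) {
            if (o._2()) {
                channel.basicAck(o._1(), false);
            } else {
                channel.basicNack(o._1(), false, true); // requeue for a later retry
            }
        }
    }
}
```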
For example, if there's a network outage and your producer loses its connection to your RabbitMQ broker, how can you prevent messages that need to be queued from being black-holed? One idea I have is to write all messages to a local DB, remove them once they're acked, and periodically resend the remainder after some time period, but that only works if the connection factory is set up for publisher confirms.
I'm just generating messages from my test application to simulate event logging; I'm essentially trying to create a durable producer. Is there also a way to detect when you can reconnect to RabbitMQ? I see there's a ConnectionListener interface, but it seems you cannot send messages from within the ConnectionListener to flush an internal queue.
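For the local-buffer idea, a rough sketch with the plain RabbitMQ Java client (the queue name is a placeholder, and the in-memory map stands in for your local DB): publisher confirms tell you which buffered messages are safe to delete, and whatever remains in the store is a candidate for the periodic resend.

```java
import com.rabbitmq.client.*;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ConfirmedPublisher {
    public static void main(String[] args) throws Exception {
        Connection conn = new ConnectionFactory().newConnection();
        Channel channel = conn.createChannel();
        channel.confirmSelect(); // enable publisher confirms on this channel

        // Stand-in for the local DB: publish sequence number -> pending payload.
        ConcurrentMap<Long, byte[]> localStore = new ConcurrentHashMap<>();

        channel.addConfirmListener(new ConfirmListener() {
            @Override
            public void handleAck(long seqNo, boolean multiple) {
                // The broker has the message(s); safe to delete from the store.
                if (multiple) localStore.keySet().removeIf(k -> k <= seqNo);
                else localStore.remove(seqNo);
            }

            @Override
            public void handleNack(long seqNo, boolean multiple) {
                // Keep the entries; the periodic job will resend them later.
            }
        });

        byte[] body = "event".getBytes();
        localStore.put(channel.getNextPublishSeqNo(), body); // buffer BEFORE publishing
        channel.basicPublish("", "events", null, body);      // placeholder queue
    }
}
```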
If you have a SimpleMessageListenerContainer (perhaps listening to a dummy queue), it will keep trying to reconnect (and fire the connection listener when successful). Or you can have a simple looper that calls createConnection() on the connection factory from time to time (it won't create a new connection each time, just return the single shared connection, if open); this will also fire the listener when a new connection is made.
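A hedged sketch of that second option with Spring AMQP (the host and the timing are placeholders): the periodic createConnection() call is cheap because CachingConnectionFactory hands back its single shared connection when one is open, and the listener's onCreate callback is the natural place to flush the internal buffer.

```java
import org.springframework.amqp.rabbit.connection.CachingConnectionFactory;
import org.springframework.amqp.rabbit.connection.Connection;
import org.springframework.amqp.rabbit.connection.ConnectionListener;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ReconnectLooper {
    public static void main(String[] args) {
        CachingConnectionFactory cf = new CachingConnectionFactory("localhost"); // placeholder host

        cf.addConnectionListener(new ConnectionListener() {
            @Override
            public void onCreate(Connection connection) {
                // Fired whenever a new connection is established:
                // resend the locally buffered messages here.
            }

            @Override
            public void onClose(Connection connection) {
            }
        });

        // Simple looper: while the broker is down createConnection() throws;
        // once it is back, the call succeeds and onCreate() fires.
        Executors.newSingleThreadScheduledExecutor().scheduleWithFixedDelay(() -> {
            try {
                cf.createConnection();
            } catch (Exception e) {
                // broker still unreachable; try again on the next tick
            }
        }, 0, 5, TimeUnit.SECONDS);
    }
}
```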
You can use transactions instead of publisher confirms - but they're much slower due to the handshake. It depends on what your performance requirements are.
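For completeness, the transactional variant with the plain RabbitMQ Java client (the queue name is a placeholder): txCommit() blocks for a broker round trip on every commit, which is the handshake cost mentioned above.

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class TxPublisher {
    public static void main(String[] args) throws Exception {
        Connection conn = new ConnectionFactory().newConnection();
        Channel channel = conn.createChannel();

        channel.txSelect(); // put the channel into transactional mode
        channel.basicPublish("", "events", null, "event".getBytes());
        channel.txCommit(); // blocks until the broker accepts; one round trip per commit
    }
}
```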