I am trying to set up a collection of Docker containers to emulate a cluster election process, using etcd to store the current leader's hostname with a TTL of 5 minutes. Inside every container I run a looped shell script that sleeps for 5 minutes plus a random 1-9 seconds and then tries to create the leader key; the attempt fails if a value already exists, otherwise it sets a new value and that container becomes the new leader.
My immediate concern is: are etcdctl connections to the etcd host atomic? What happens if, by chance, two containers connect at the same time? I would also appreciate suggestions for a more elegant way to implement the whole scenario.
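To make the loop concrete, here is roughly what each container does every cycle, expressed as the underlying etcd v2 HTTP call instead of etcdctl (a minimal sketch in Java; the http://etcd:2379 endpoint, the /leader key and the 300-second TTL stand in for my actual values):

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class LeaderAttempt {
        public static void main(String[] args) throws Exception {
            String hostname = System.getenv().getOrDefault("HOSTNAME", "container-1");
            HttpClient http = HttpClient.newHttpClient();

            // prevExist=false asks etcd to create the key only if it does not exist yet;
            // the TTL (in seconds) makes the leadership claim expire on its own.
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://etcd:2379/v2/keys/leader?prevExist=false"))
                    .header("Content-Type", "application/x-www-form-urlencoded")
                    .PUT(HttpRequest.BodyPublishers.ofString("value=" + hostname + "&ttl=300"))
                    .build();

            HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
            // 2xx: the key was created and this container is now the leader;
            // 412 Precondition Failed: another container already holds the key.
            boolean becameLeader = response.statusCode() / 100 == 2;
            System.out.println(becameLeader ? "became leader" : "leader already set: " + response.body());
        }
    }

The prevExist=false condition is the part I am relying on to reject a second writer.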
I have set up Redis on three separate instances and configured them so that one instance is the master and the other two are replicas of it. I have used Sentinel to make sure the setup is highly available. I have a Node.js application which needs to use Redis. How do I achieve read/write splitting in my application, given that if my Redis master goes down one of my read replicas becomes the master and writes need to go to it?
As far as I know, ioredis is the only Node Redis client that supports Sentinel.
"ioredis guarantees that the node you connected to is always a master even after a failover. When a failover happens, instead of trying to reconnect to the failed node (which will be demoted to slave when it's available again), ioredis will ask sentinels for the new master node and connect to it. All commands sent during the failover are queued and will be executed when the new connection is established so that none of the commands will be lost."
I've been looking into using a local entry listener instead of a normal entry listener so that an event is only processed by a single listener.
I've found various posts on this topic such as this, this, this, this and this. It seems that a local entry listener is indeed the way to go for handling an event only once in a multi node cluster.
However, I'm not sure how such a local entry listener would function under failure conditions. For instance, what happens to an evicted event if the node that is the master for that entry is unavailable? Will the backup pick it up in time? Or could the event be missed because Hazelcast needs some time to figure out that the master is down and a new master should be elected? Is this different between the older AP system and the new CP subsystem?
We've refrained from using a local entry listener. Instead, we now use Hazelcast's scheduled executor service to schedule a named task; that way we can respond correctly to changes in the cluster. Hazelcast does seem to have a preferred member on which a task is executed, but that isn't an issue for us.
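A rough sketch of that approach, assuming Hazelcast 3.8 or later (the executor name, task name and schedule below are made up for illustration):

    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.scheduledexecutor.DuplicateTaskException;
    import com.hazelcast.scheduledexecutor.IScheduledExecutorService;
    import com.hazelcast.scheduledexecutor.NamedTask;
    import java.io.Serializable;
    import java.util.concurrent.TimeUnit;

    public class ScheduledNamedTask {
        // The task carries a fixed name; Hazelcast allows only one scheduled task per
        // name per executor, so every member can try to schedule it on startup and
        // only one schedule actually ends up in the cluster.
        static class CleanupTask implements Runnable, NamedTask, Serializable {
            @Override public String getName() { return "cleanup-task"; }
            @Override public void run() {
                // the work that must run once per period, whichever member runs it
            }
        }

        public static void main(String[] args) {
            HazelcastInstance hz = Hazelcast.newHazelcastInstance();
            IScheduledExecutorService scheduler = hz.getScheduledExecutorService("maintenance");
            try {
                scheduler.scheduleAtFixedRate(new CleanupTask(), 0, 5, TimeUnit.MINUTES);
            } catch (DuplicateTaskException ignored) {
                // another member scheduled it first - nothing to do
            }
        }
    }

Because the name is fixed, the cluster ends up with a single periodic task no matter how many members attempt the scheduling.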
From Hazelcast docs:
Note that entries in a distributed map are partitioned across the cluster members; each member owns and manages some portion of the entries. Owned entries are called local entries. This listener will be listening for the events of local entries. Let's say your cluster has member1 and member2. On member2 you added a local listener and from member1 you call map.put(key2, value2). If key2 is owned by member2 then the local listener will be notified for the add/update event. Also note that entries can migrate to other nodes for load balancing and/or membership change.
The key part is: "Also note that entries can migrate to other nodes for load balancing and/or membership change."
I guess that if an original partition owner fails, some other node will become the new owner of those entries (or of part of them, depending on the cluster state after repartitioning), and then the new owner will run the local entry listener.
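For completeness, registering such a member-local listener is a one-liner on the member itself; a minimal sketch, assuming Hazelcast 4/5 package names and a made-up map name:

    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.map.IMap;
    import com.hazelcast.map.listener.EntryEvictedListener;

    public class LocalEvictionListener {
        public static void main(String[] args) {
            HazelcastInstance hz = Hazelcast.newHazelcastInstance();
            IMap<String, String> sessions = hz.getMap("sessions");

            // Fires only for entries whose partition is owned by *this* member, so each
            // eviction is handled by exactly one node - until a migration or failover
            // moves the entry to another owner, which is exactly the caveat above.
            sessions.addLocalEntryListener((EntryEvictedListener<String, String>) event ->
                    System.out.println("evicted locally: " + event.getKey()));
        }
    }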
I am using Celery with Redis as both the message broker and the result backend. I am unclear about task expiration and about KEY expiration in Redis.
Just to be clear: when the Celery client initiates a task it generates a unique task_id, stores it in Redis as a KEY such as "celery-task-meta-64h4378-875ug43-6523g" (just an example) for each task, and puts the task in the message queue. Celery workers then check the message queue and execute the tasks based on the number of workers in place. Once a worker finishes a task and marks it as SUCCESS/FAILURE, it won't change it back to PENDING or any other state.
The Celery docs say that the expiration time counts from when the task is "published", but I couldn't find any info on what "publishing" actually means.
I know that Celery stores the task as a Redis KEY with a default expiration of 1 day (86400 seconds). In my case, after the task is created and stored in Redis as a KEY, it takes more time for the worker to execute the task and update the result for that task, whether Success or Failure.
Question #1: Regarding the Redis key expiration time: does the 1-day default that Celery sets count from the moment the KEY is created, or from when the worker updates the result on that key (i.e. key created in Redis -> worker starts the task -> worker finishes and updates the task (key in Redis))?
My only concern is this: Celery creates a new task, a worker starts executing it and takes more than one day to finish (the worst case, if more and more tasks are created), and in the meantime the KEY expires in Redis. What do I do in such cases?
A quick solution is to increase the Redis key expiration time to more than one day :)
Question #2: Is switching to RabbitMQ instead of Redis a good choice in the above scenario? In that case we could store the result in a persistent DB and wouldn't have to worry about the expiration time, or about Redis's in-memory store filling up.
Please correct me if I am wrong in my understanding of any of the points above. Any feedback/help on this will be greatly appreciated :)
Question #1: The expiry time that you referenced in the link counts from the time that the call to apply_async or delay is made.
Question #2: Either one is a good choice. Redis is slightly less reliable but much easier to configure than RabbitMQ. That said, using RabbitMQ as your broker is by far the most popular choice that most devs make. YMMV.
I had the following deployment of Sentinel: 3 Redis instances on different servers, and 3 Sentinels, one on each of those servers.
Then I realized that the current master does not have much memory, so I stopped the Sentinel and the Redis instance on that particular server and set up both on a new machine. So I still have the same deployment: 3 Redis instances and 3 Sentinels.
The issue is that the Sentinels now report the master as down, because they still think the master is the server I removed. What should I do to tell Sentinel that it no longer needs to consider that server?
From the docs about Redis Sentinel, under the chapter Adding or removing Sentinels:
Removing a Sentinel is a bit more complex: Sentinels never forget already seen Sentinels, even if they are not reachable for a long time, since we don't want to dynamically change the majority needed to authorize a failover and the creation of a new configuration number. So in order to remove a Sentinel the following steps should be performed in absence of network partitions:
Stop the Sentinel process of the Sentinel you want to remove.
Send a SENTINEL RESET * command to all the other Sentinel instances (instead of * you can use the exact master name if you want to reset just a single master). One after the other, waiting at least 30 seconds between instances.
Check that all the Sentinels agree about the number of Sentinels currently active, by inspecting the output of SENTINEL MASTER mastername of every Sentinel.
Further:
Removing the old master or unreachable slaves.
Sentinels never forget about slaves of a given master, even when they are unreachable for a long time. This is useful, because Sentinels should be able to correctly reconfigure a returning slave after a network partition or a failure event.
Moreover, after a failover, the failed over master is virtually added as a slave of the new master, this way it will be reconfigured to replicate with the new master as soon as it will be available again.
However sometimes you want to remove a slave (that may be the old master) forever from the list of slaves monitored by Sentinels.
In order to do this, you need to send a SENTINEL RESET mastername command to all the Sentinels: they'll refresh the list of slaves within the next 10 seconds, only adding the ones listed as correctly replicating from the current master INFO output.
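If you would rather script those steps than type them into redis-cli, the same procedure looks roughly like this with Jedis (the sentinel addresses and port below are placeholders for the three sentinels you kept):

    import java.util.Arrays;
    import java.util.List;
    import redis.clients.jedis.Jedis;

    public class SentinelResetAll {
        public static void main(String[] args) throws InterruptedException {
            // Placeholder addresses - the three sentinels that are still part of the setup.
            List<String> sentinels = Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.3");

            for (String host : sentinels) {
                try (Jedis sentinel = new Jedis(host, 26379)) {
                    // SENTINEL RESET * makes this sentinel drop its memory of removed
                    // sentinels/slaves and rediscover the current topology.
                    sentinel.sentinelReset("*");
                }
                // The docs ask for at least 30 seconds between instances.
                Thread.sleep(30_000);
            }
        }
    }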
I might be misunderstanding something here, as it's not clear to me how I should connect to a Cassandra cluster. I have a Cassandra 1.2.1 cluster of 5 nodes managed by Priam, on AWS. I would like to use Astyanax to connect to this cluster with code similar to the code below:
    conPool = new ConnectionPoolConfigurationImpl(getConecPoolName())
            .setMaxConnsPerHost(CONNECTION_POOL_SIZE_PER_HOST)
            .setSeeds(MY_IP_SEEDS)
            .setMaxOperationsPerConnection(100); // 10000
What should I use as MY_IP_SEEDS? Should I use the IPs of all my nodes, separated by commas? Or should I use the IP of just one machine (the seed machine)? If I use the IP of just one machine, I am worried about overloading it with too many requests.
I know Priam has the "get_seeds" REST API (https://github.com/Netflix/Priam/wiki/REST-API) that returns a list of IPs for each node, and I also know there is one seed per RAC. However, I am not sure what would happen if the seed node goes down... I would need to connect to other nodes when making new connections, right?
Seed nodes are only for finding the way into the cluster on node startup - no overload problems.
Of course one of the nodes must be reachable and up in the cluster to get the new one up and running.
So the best way is to update the seed list from Priam before starting the node. Priam should be behind an automatically updated DNS entry.
If you are after the highest availability, you should regularly fetch the current list of seeds from Priam and store it in a mirrored fashion, just as you store your Puppet or Chef config, so that you can get nodes up even when Priam isn't reachable.
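On the client side, a typical Astyanax setup that plugs in the conPool from the question and relies on the seeds only for the initial contact would look roughly like this (a sketch against Astyanax 1.56.x; the cluster and keyspace names are placeholders):

    AstyanaxContext<Keyspace> context = new AstyanaxContext.Builder()
            .forCluster("MyCluster")
            .forKeyspace("MyKeyspace")
            .withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
                    // RING_DESCRIBE: use the seeds only for first contact, then discover
                    // every node in the ring and spread requests across all of them
                    .setDiscoveryType(NodeDiscoveryType.RING_DESCRIBE))
            .withConnectionPoolConfiguration(conPool)
            .withConnectionPoolMonitor(new CountingConnectionPoolMonitor())
            .buildKeyspace(ThriftFamilyFactory.getInstance());

    context.start();
    Keyspace keyspace = context.getClient();

With ring discovery on, MY_IP_SEEDS should only need to contain a couple of reachable nodes (for example the per-RAC seeds from Priam's get_seeds), and losing one of them after startup shouldn't matter as long as the rest of the ring is known.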