keep pcs resources always running on all hosts - linux

Is there a way to configure resources with the pcs command so that they always stay up on all configured nodes? I'm asking because I observed the following behaviour in my 2-node setup.
For example, a two-node setup with two resources, a floating IP address and rsyslog:
node1          node2
VIP            -
rsyslog (on)   rsyslog (off)
The rsyslog resource runs only on the active node, which holds the VIP. The passive node shuts down the rsyslog process and waits until the active node fails before doing a fail-over. As soon as that happens, it starts the resource's process on the second node.
But I want the process to run on both nodes at the same time, even though one is declared passive.
For some reason my pacemaker/corosync cluster turns the resource off on node2. I want it to stay on on both nodes, as long as there is no reason for a failure.

I understand you want to run the resource on both nodes and the virtual IP resource on one node.
Have you tried cloning your resource?
By cloning your resource and keeping the VIP as a primitive resource, you can run your resource on all nodes and the virtual IP on one node at a time.
I hope this helps.
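As a minimal sketch, assuming rsyslog is managed via systemd and using placeholder resource names and an illustrative IP address (none of these are taken from your configuration):

```shell
# Create the rsyslog resource, then clone it so a copy runs on every node:
pcs resource create rsyslog systemd:rsyslog op monitor interval=30s
pcs resource clone rsyslog

# Keep the VIP as an ordinary (primitive) resource, so it runs on one node
# at a time and fails over when that node goes down:
pcs resource create VIP ocf:heartbeat:IPaddr2 ip=192.168.0.100 cidr_netmask=24 op monitor interval=30s
```

After cloning, `pcs status` should show `rsyslog-clone` started on both nodes while the VIP stays on one.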

Related

Azure Kubernetes Failover with PersistentVolumes

I am currently testing how Azure Kubernetes handles failover for StatefulSets. I simulated a network partition by running sudo iptables -A INPUT -j DROP on one of my nodes, not perfect but good enough to test some things.
1). How can I reuse disks that are mounted to a failed node? Is there a way to manually release the disk and make it available to the rescheduled pod? After a force delete it takes forever for the resources to be released, sometimes over an hour.
2). If I delete a node from the cluster all the resources are released after a certain amount of time. The problem is that in the Azure dashboard it still displays my cluster as using 3 nodes even if I have deleted one. Is there a way to manually add the deleted node back in or do I need to rebuild the cluster each time?
3). I most definitely do not want to use ReadWriteMany.
Basically what I want is for my StatefulSet pods to terminate and have the associated disks detach and then reschedule on a new node in the event of a network partition or a node failure. I know the pods will terminate in the event of a recovery from a network partition but I want control over the process myself or at least have it happen sooner.
Yes, just detach the disks manually from the portal (or PowerShell/CLI/API/etc.)
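For example, with the Azure CLI (the resource group, VM, and disk names below are placeholders, not taken from your cluster):

```shell
# Inspect which data disks are still attached to the failed node's VM:
az vm show -g myResourceGroup -n myFailedNode --query "storageProfile.dataDisks"

# Detach the disk so it can be attached to the node where the pod
# was rescheduled:
az vm disk detach -g myResourceGroup --vm-name myFailedNode -n myPvcDisk
```

Once detached, the attach on the new node should proceed much faster than waiting for the automatic release.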
This is not supported; you should not do this. Scaling/upgrading might fix it, but it might not.
Okay, don't.

How to undo kubectl delete node

I have a k8s cluster on Azure created with acs-engine. It has 4 Windows agent nodes.
Recently 2 of the nodes went into a not-ready state and remained there for over a day. In an attempt to correct the situation I did a "kubectl delete node" command on both of the not-ready nodes, thinking that they would simply be restarted in the same way that a pod that is part of a deployment is restarted.
No such luck. The nodes no longer appear in the "kubectl get nodes" list. The virtual machines that are backing the nodes are still there and still running. I tried restarting the VMs thinking that this might cause them to self register, but no luck.
How do I get the nodes back into the k8s cluster? Otherwise, how do I recover from this situation? Worst case, I can simply throw away the entire cluster and recreate it, but I would really like to just fix what I have.
You can delete the virtual machines and rerun your acs-engine template; that should bring the nodes back (although I didn't test your exact scenario). Or you could simply create a new cluster — it doesn't take a lot of time, since you just need to run your template.
There is no way of recovering from deletion of an object in k8s. Pretty sure they are purged from etcd as soon as you delete them.
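Assuming the standard acs-engine workflow, the redeploy might look roughly like this (the cluster definition file, resource group, and output paths are placeholders):

```shell
# Regenerate the ARM templates from your cluster definition:
acs-engine generate kubernetes.json

# Redeploy the generated template into the same resource group,
# which recreates the agent VMs and registers them with the cluster:
az group deployment create \
  -g myResourceGroup \
  --template-file _output/mycluster/azuredeploy.json \
  --parameters _output/mycluster/azuredeploy.parameters.json
```

Afterwards, `kubectl get nodes` should list the recreated agents again.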

Does Kubernetes reschedule pods to other minions if the minions join later?

I was running a test with initially a minion node and a master node. I created 5 pods on the cluster and later on 2 minion nodes joined the cluster.
The problem I faced was that all the pods stayed scheduled on the master and the original minion node. They were not rescheduled onto the new nodes to spread the load, so my new minion nodes just sat idle and didn't do any processing.
Is there anything specially to be run to make this happen ?
Not really. The scheduler is called whenever something needs to be scheduled, so unless you deploy new replicas of the pod, the scheduler won't be bothered again.
Whenever you want to schedule something, like creating a Deployment or a Pod, the scheduler looks at the available resources to place the Pods where it thinks is best. Next time you schedule something, it will take into account the new minions added to the cluster. Or if your pods are created via a Deployment object, you could try deleting one Pod, so the ReplicationController will create a new Pod and the scheduler may choose one of the new minions.
The documentation also recommends creating a Service before creating a Deployment, so the scheduler will spread the pods better among the minions.
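As a sketch of the delete-and-reschedule approach (the pod name below is a placeholder; this assumes the pods are managed by a controller that recreates them):

```shell
# See which node each pod is currently running on:
kubectl get pods -o wide

# Delete one pod; its controller creates a replacement, and the
# scheduler runs again, now taking the new minions into account:
kubectl delete pod mypod-1234
```

Repeating this for a few pods gradually rebalances them; there is no automatic rebalancing of already-running pods when nodes join.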

StatefulSet: pods stuck in unknown state

I'm experimenting with Cassandra and Redis on Kubernetes, using the examples for v1.5.1.
With a Cassandra StatefulSet, if I shutdown a node without draining or deleting it via kubectl, that node's Pod stays around forever (at least over a week, anyway), without being moved to another node.
With Redis, even though the pod sticks around like with Cassandra, the sentinel service starts a new pod, so the number of functional pods is always maintained.
Is there a way to automatically move the Cassandra pod to another node, if a node goes down? Or do I have to drain or delete the node manually?
Please refer to the documentation here.
Kubernetes (versions 1.5 or newer) will not delete Pods just because a Node is unreachable. The Pods running on an unreachable Node enter the 'Terminating' or 'Unknown' state after a timeout. Pods may also enter these states when the user attempts graceful deletion of a Pod on an unreachable Node. The only ways in which a Pod in such a state can be removed from the apiserver are as follows:
The Node object is deleted (either by you, or by the Node Controller).
The kubelet on the unresponsive Node starts responding, kills the Pod, and removes the entry from the apiserver.
Force deletion of the Pod by the user.
This was a behavioral change introduced in kubernetes 1.5, which allows StatefulSet to prioritize safety.
There is no way to differentiate between the following cases:
The instance being shut down without the Node object being deleted.
A network partition being introduced between the Node in question and the kubernetes-master.
Both of these cases are seen by the Kubernetes master as the kubelet on a Node being unresponsive. In the second case, if we were to quickly create a replacement pod on a different Node, we might violate the at-most-one semantics guaranteed by StatefulSet and have multiple pods with the same identity running on different nodes. At worst, this could even lead to split brain and data loss when running stateful applications.
On most cloud providers, when an instance is deleted, Kubernetes can figure out that the Node is also deleted, and hence let the StatefulSet pod be recreated elsewhere.
However, if you're running on-prem, this may not happen. It is recommended that you delete the Node object from Kubernetes as you power it down, or have a reconciliation loop keeping the Kubernetes idea of Nodes in sync with the actual nodes available.
Some more context is in the github issue.
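The removal options above correspond to commands like these (the node and pod names are placeholders; force deletion should be used with care for StatefulSets, for the split-brain reasons described above):

```shell
# Option 1: delete the Node object for the downed machine, which lets
# the StatefulSet controller recreate the pod elsewhere:
kubectl delete node cassandra-node-1

# Option 3: force-delete the stuck pod. Risky if the old node comes
# back, since two pods with the same identity could then exist:
kubectl delete pod cassandra-0 --grace-period=0 --force
```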

DataStax community AMI installation doesn't join other nodes

I kicked off a 6-node cluster as per the documentation at http://docs.datastax.com/en/cassandra/2.1/cassandra/install/installAMILaunch.html. All worked OK. It's meant to be a 6-node cluster; I can see the 6 nodes running on the EC2 dashboard, and I can see OpsCenter working on node 0. But the nodes are not seeing each other... I don't have access to OpsCenter via browser, but I can ssh to each node and verify Cassandra is working.
What do I need to do so that they join the cluster? Note they are all in the same VPC and subnet, in the same IP range, with the same cluster name. All were launched using the AMI specified in the document.
Any help will be much appreciated.
Hope your listen_address is configured. Add the "auto_bootstrap": false attribute to each node and restart each node. Check the logs too; they would be of great help.
In my situation, setting the broadcast address to the public IP caused a similar issue. Make the broadcast address your private IP, or just leave it untouched. If a public broadcast address is a must-have, have your architect modify the firewall rules.
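A sketch of pointing both addresses at a node's private IP, assuming cassandra.yaml lives at /etc/cassandra/cassandra.yaml and both settings are present and uncommented (the IP and the path are placeholders; run on each node with its own address):

```shell
# This node's private IP (placeholder; use each node's own address):
PRIVATE_IP=10.0.0.5

# Point listen_address and broadcast_address at the private IP:
sudo sed -i "s/^listen_address:.*/listen_address: $PRIVATE_IP/" /etc/cassandra/cassandra.yaml
sudo sed -i "s/^broadcast_address:.*/broadcast_address: $PRIVATE_IP/" /etc/cassandra/cassandra.yaml

# Restart Cassandra so the node gossips with the right addresses:
sudo service cassandra restart
```

After restarting all nodes, `nodetool status` on any node should show all six in the UN (up/normal) state.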
