How do I change system time (not timezone) for all containers deployed on Azure Kubernetes cluster?
Can this be changed from inside container / pods? I guess it should be changeable from host machine. How to do that?
I don't believe this is possible.
Time comes from the underlying kernel and that is not something that you will be able to adjust from code that runs in a pod.
Even if you could, I suspect it would cause a whole heap of trouble; the pod time and api-server time would be inconsistent and that won't end well!
Related
I'm running Azure AKS Cluster 1.15.11 with prometheus-operator 8.15.6 installed as a helm chart and I'm seeing some different metrics displayed by Kubernetes Dashboard compared to the ones provided by prometheus Grafana.
An application pod which is being monitored has three containers in it. Kubernetes-dashboard shows that the memory consumption for this pod is ~250MB, standard prometheus-operator dashboard is displaying almost exactly double value for the memory consumption ~500MB.
At first we thought that there might be some misconfiguration on our monitoring setup. Since prometheus-operator is installed as standard helm chart, Daemon Set for node exporter ensures that every node has exactly one exporter deployed so duplicate exporters shouldn't be the reason. However, after migrating our cluster to different node pools I've noticed that when our application is running on user node pool instead of system node pool metrics does match exactly on both tools. I know that system node pool is running CoreDNS and tunnelfront but I assume these are running as separate components also I'm aware that overall it's not the best choice to run infrastructure and applications in the same node pool.
However, I'm still wondering why running application under system node pool causes metrics by prometheus to be doubled?
I ran into a similar problem (aks v1.14.6, prometheus-operator v0.38.1) where all my values were multiplied by a factor of 3. Turns out you have to remember to remove the extra endpoints called prometheus-operator-kubelet that are created in the kube-system-namespace during install before you remove / reinstall prometheus-operator since Prometheus aggregates the metric types collected for each endpoint.
Log in to the Prometheus-pod and check the status page. There should be as many endpoints as there are nodes in the cluster, otherwise you may have a surplus of endpoints:
I am currently testing how Azure Kubernetes handles failover for StatefulSets. I simulated a network partition by running sudo iptables -A INPUT -j DROP on one of my nodes, not perfect but good enough to test some things.
1). How can I reuse disks that are mounted to a failed node? Is there a way to manually release the disk and make it available to the rescheduled pod? It takes forever for the resources to be released after doing a force delete, sometimes this takes over an hour.
2). If I delete a node from the cluster all the resources are released after a certain amount of time. The problem is that in the Azure dashboard it still displays my cluster as using 3 nodes even if I have deleted one. Is there a way to manually add the deleted node back in or do I need to rebuild the cluster each time?
3). I most definitely do not want to use ReadWriteMany.
Basically what I want is for my StatefulSet pods to terminate and have the associated disks detach and then reschedule on a new node in the event of a network partition or a node failure. I know the pods will terminate in the event of a recovery from a network partition but I want control over the process myself or at least have it happen sooner.
Yes, just detach the disks manually from the portal (or powershell\cli\api\etc)
This is not supported, you should not do this. Scaling\Upgrading might fix this, but it might not
Okay, dont.
I am experiencing a very complicated issue with Kubernetes in my production environments losing all their Agent Nodes, they change from Ready to NotReady, all the pods change from Running to NodeLost state. I have discovered that Kubernetes is making intensive usage of disks:
My cluster is deployed using acs-engine 0.17.0 (and I tested previous versions too and the same happened).
On the other hand, we decided to deploy the Standard_DS2_VX VM series which contains Premium disks and we incresed the IOPS to 2000 (It was previously under 500 IOPS) and same thing happened. I am going to try with a higher number now.
Any help on this will be appreaciated.
It was a microservice exhauting resources and then Kubernetes just halt the nodes. We have worked on establishing resources/limits based so we can avoid the entire cluster disruption.
I have a Kubernetes cluster. Provisioned with kops, running on CoreOS workers. From time to time I see a significant load spikes, that correlate with I/O spikes reported in Prometheus from node_disk_io_time_ms metric. The thing is, I seem to be unable to use any metric to pinpoint where this I/O workload actually originates from. Metrics like container_fs_* seem to be useless as I always get zero values for actual containers, and any data only for whole node.
Any hints on how can I approach the issue of locating what is to be blamed for I/O load in kube cluster / coreos node very welcome
If you are using nginx ingress you can configure it with
enable-vts-status: "true"
This will give you a bunch of prometheus metrics for each pod that has on ingress. The metric names start with nginx_upstream_
In case it is the cronjob creating the spikes, install node-exporter daemonset and check the metrics container_fs_
I'm experimenting with Cassandra and Redis on Kubernetes, using the examples for v1.5.1.
With a Cassandra StatefulSet, if I shutdown a node without draining or deleting it via kubectl, that node's Pod stays around forever (at least over a week, anyway), without being moved to another node.
With Redis, even though the pod sticks around like with Cassandra, the sentinel service starts a new pod, so the number of functional pods is always maintained.
Is there a way to automatically move the Cassandra pod to another node, if a node goes down? Or do I have to drain or delete the node manually?
Please refer to the documentation here.
Kubernetes (versions 1.5 or newer) will not delete Pods just because a
Node is unreachable. The Pods running on an unreachable Node enter the
‘Terminating’ or ‘Unknown’ state after a timeout. Pods may also enter
these states when the user attempts graceful deletion of a Pod on an
unreachable Node. The only ways in which a Pod in such a state can be
removed from the apiserver are as follows:
The Node object is deleted (either by you, or by the Node Controller).
The kubelet on the unresponsive Node starts responding,
kills the Pod and removes the entry from the apiserver.
Force deletion of the Pod by the user.
This was a behavioral change introduced in kubernetes 1.5, which allows StatefulSet to prioritize safety.
There is no way to differentiate between the following cases:
The instance being shut down without the Node object being deleted.
A network partition is introduced between the Node in question and the kubernetes-master.
Both these cases are seen as the kubelet on a Node being unresponsive by the Kubernetes master. If in the second case, we were to quickly create a replacement pod on a different Node, we may violate the at-most-one semantics guaranteed by StatefulSet, and have multiple pods with the same identity running on different nodes. At worst, this could even lead to split brain and data loss when running Stateful applications.
On most cloud providers, when an instance is deleted, Kubernetes can figure out that the Node is also deleted, and hence let the StatefulSet pod be recreated elsewhere.
However, if you're running on-prem, this may not happen. It is recommended that you delete the Node object from kubernetes as you power it down, or have a reconciliation loop keeping the Kubernetes idea of Nodes in sync with the the actual nodes available.
Some more context is in the github issue.