I have a Kubernetes cluster set up with 3 worker nodes and a master node. I am using Docker images to generate pods and altering the running containers. I have been working on one image and have altered it heavily to make the servers inside it work. The problem I am facing is:
There is no space left on the device.
On further investigation, I found that the Docker container has a 10G size limit, which I would now definitely like to change. How can I change it without losing all my changes in the container and without having to store the changes as a separate image?
Changing the limit without a restart is impossible.
To prevent data loss during the container restart, you can use Volumes and store data there, instead of the root image.
P.S. It is not possible to mount a Volume dynamically to a container without a restart.
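As a minimal sketch of the volume approach (the claim name `data-pvc`, image name, and mount path `/data` are all placeholders): a PersistentVolumeClaim mounted into the pod keeps the data outside the container's writable layer, so it is not subject to the container's size limit and survives container restarts:

```yaml
# Hypothetical example: claim persistent storage and mount it into the pod.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 20Gi            # sized above the 10G container limit
---
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
  - name: app
    image: my-modified-image   # assumption: your image name
    volumeMounts:
    - name: data
      mountPath: /data         # write server data here, not to the root filesystem
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: data-pvc
```

Applying this with `kubectl apply -f` and pointing the servers at the mount path means only the data under `/data` needs to persist; the rest of the image can stay immutable.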
Related
I have run a docker container locally and it stores data in a file (currently no volume is mounted). I stored some data using the API. After that I failed the container using process.exit(1) and started the container again. The previously stored data in the container survives (as expected). But when I do the same thing in Kubernetes (minikube), the data is lost.
Posting this as a community wiki for better visibility, feel free to edit and expand it.
As described in the comments, Kubernetes replaces failed containers with new (identical) ones, which explains why the container's filesystem is clean.
Also, as mentioned, containers should be stateless. There are several options for running different kinds of applications and taking care of their data:
Run a stateless application using a Deployment
Run a stateful application either as a single instance or as a replicated set
Run automated tasks with a CronJob
Useful links:
Kubernetes workloads
Pod lifecycle
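For the file-backed case described above, a sketch of the stateful option using a StatefulSet with a `volumeClaimTemplates` entry (all names and the mount path are placeholders): each replica gets its own PersistentVolumeClaim, so data written under the mount path survives container and pod restarts:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: api
spec:
  serviceName: api
  replicas: 1
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
      - name: api
        image: my-api:1.0          # placeholder image
        volumeMounts:
        - name: data
          mountPath: /var/lib/api  # store the data file here
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi
```

With this layout, `process.exit(1)` restarts the container but the claim (and the file inside it) is reattached to the replacement.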
I have a container image prefetched on the Azure Batch account pool, but I notice that the pool doesn't automatically refresh the container when a new image is uploaded to the container registry, even though I have asked Batch to cache the latest container image (:latest).
Is there any good way to do this (other than deleting and recreating the pool)? The pool machines are currently managed by batch; I don't have access to it. Will refreshing the pool details prefetch the new container?
Container images are loaded onto the machine at the time of the request; e.g., if specified at the pool level, they are loaded when the node starts up. If specified as part of the task, the image is pulled as part of the implicit docker run when the task executes. Subsequent task executions behave like a local docker run and do not re-download the image if it is found locally, exactly as you would expect when running locally.
There are a few workarounds:
Delete or resize down the pool to zero nodes and scale back up to pick up new images.
Have a rolling pool strategy where you spin up a new pool with the same image reference and autoscale/drain down the old pool and redirect work to the new pool. As an aside, this is generally a good strategy to have regardless of containers or not to pick up latest Azure Batch Node Agent versions.
Create a job preparation task that explicitly issues a docker pull of the image, so that tasks within the job referencing the container image use a freshly pulled copy on the first run of the job on a node. Note that you'd have to delete and recreate the job every time you update your container.
Create a script that remotes into machines and executes an image refresh command across all nodes.
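The first workaround can be sketched with the Azure CLI (the pool ID and node counts are placeholders, and exact flag names may differ across CLI versions):

```shell
# Workaround 1: resize the pool down to zero nodes, then back up,
# so the replacement nodes pull the current :latest image at startup.
az batch pool resize --pool-id mypool --target-dedicated-nodes 0
# ...wait for the resize to complete, then scale back up...
az batch pool resize --pool-id mypool --target-dedicated-nodes 4
```

This avoids deleting the pool itself, but does interrupt any work running on the existing nodes, so drain or let tasks finish first.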
I am currently testing how Azure Kubernetes handles failover for StatefulSets. I simulated a network partition by running sudo iptables -A INPUT -j DROP on one of my nodes, not perfect but good enough to test some things.
1). How can I reuse disks that are mounted to a failed node? Is there a way to manually release the disk and make it available to the rescheduled pod? It takes forever for the resources to be released after doing a force delete, sometimes this takes over an hour.
2). If I delete a node from the cluster all the resources are released after a certain amount of time. The problem is that in the Azure dashboard it still displays my cluster as using 3 nodes even if I have deleted one. Is there a way to manually add the deleted node back in or do I need to rebuild the cluster each time?
3). I most definitely do not want to use ReadWriteMany.
Basically what I want is for my StatefulSet pods to terminate and have the associated disks detach and then reschedule on a new node in the event of a network partition or a node failure. I know the pods will terminate in the event of a recovery from a network partition but I want control over the process myself or at least have it happen sooner.
Yes, just detach the disks manually from the portal (or PowerShell/CLI/API/etc.).
This is not supported; you should not do this. Scaling/upgrading might fix it, but it might not.
Okay, don't.
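For point 1, the manual detach can be done with the Azure CLI, for example (the resource group, VM, and disk names below are placeholders; use the actual names from your node resource group):

```shell
# Detach the data disk from the failed node's VM so the rescheduled
# pod on another node can attach it, instead of waiting for Azure
# to release it on its own.
az vm disk detach \
  --resource-group myResourceGroup \
  --vm-name aks-nodepool1-12345678-1 \
  --name my-pvc-disk
```

Once the detach completes, the pending pod's volume attach should succeed on the new node.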
How do I change system time (not timezone) for all containers deployed on Azure Kubernetes cluster?
Can this be changed from inside a container / pod? I guess it should be changeable from the host machine. How do I do that?
I don't believe this is possible.
Time comes from the underlying kernel and that is not something that you will be able to adjust from code that runs in a pod.
Even if you could, I suspect it would cause a whole heap of trouble; the pod time and api-server time would be inconsistent and that won't end well!
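You can see this directly from inside a pod: containers run without the CAP_SYS_TIME capability by default, so the kernel rejects any attempt to set the clock (the pod name below is a placeholder, and the exact error text varies by distribution):

```shell
# Setting the system time from inside an unprivileged container fails:
kubectl exec my-pod -- date -s "2020-01-01 00:00:00"
# date: cannot set date: Operation not permitted
```

If the goal is to make an application *believe* a different time, that has to be solved at the application level rather than by changing the node clock.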
I have a k8s cluster on Azure created with acs-engine. It has 4 Windows agent nodes.
Recently 2 of the nodes went into a not-ready state and remained there for over a day. In an attempt to correct the situation I did a "kubectl delete node" command on both of the not-ready nodes, thinking that they would simply be restarted in the same way that a pod that is part of a deployment is restarted.
No such luck. The nodes no longer appear in the "kubectl get nodes" list. The virtual machines backing the nodes are still there and still running. I tried restarting the VMs, thinking that this might cause them to self-register, but no luck.
How do I get the nodes back as part of the k8s cluster? Otherwise, how do I recover from this situation? Worse case I can simply throw away the entire cluster and recreate it, but I really would like to simply fix what I have.
You can delete the virtual machines and rerun your acs-engine template; that should bring the nodes back (although I didn't test your exact scenario). Or you could simply create a new cluster, which doesn't take a lot of time, since you just need to run your template.
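A sketch of the redeploy approach with the Azure CLI (the resource group, VM name, and output directory are placeholders; acs-engine writes the generated ARM template under its `_output` directory):

```shell
# Delete the orphaned agent VM, then re-run the generated ARM template
# to recreate the agent nodes and let them register with the cluster.
az vm delete -g myResourceGroup -n k8s-windowspool-12345678-0 --yes
az group deployment create \
  --resource-group myResourceGroup \
  --template-file _output/mycluster/azuredeploy.json \
  --parameters @_output/mycluster/azuredeploy.parameters.json
```

The redeployment is incremental by default, so existing resources in the group are left in place and only the missing VMs are recreated.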
There is no way of recovering from the deletion of an object in k8s. Pretty sure they are purged from etcd as soon as you delete them.