I have an Ubuntu machine that runs a Kubernetes cluster.
I constantly get "disk pressure" issues in various pods in that cluster.
To combat this issue, I've attached a volume/disk to that machine, formatted it, and mounted it in /media/whatever.
Unfortunately, it seems that the Kubernetes cluster is not utilizing the new disk space from the mounted volume.
My question is: how do I get the Kubernetes cluster to utilize the new volume?
I don't mean attaching volumes to individual pods; I mean allowing Kubernetes itself to use any available disk space freely.
I am aware that this question is a bit general and arises from a big gap in overall understanding of Kubernetes,
but I hope that you will still be kind enough to help me.
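Worth knowing: kubelet decides DiskPressure from the filesystems that back its own data directory and the container runtime's image store (by default /var/lib/kubelet plus /var/lib/docker or /var/lib/containerd, assuming an unmodified install), not from extra mounts such as /media/whatever. A minimal Python sketch, under those default-path assumptions, to check whether the new mount is even on a filesystem kubelet watches:

```python
import os
import shutil

# Paths assumed for a default install; adjust to your node's actual
# kubelet root dir and container runtime data dir.
KUBELET_DIR = "/var/lib/kubelet"
RUNTIME_DIR = "/var/lib/docker"   # or /var/lib/containerd
NEW_MOUNT = "/media/whatever"

def describe(path):
    """Print which device backs a path and how much space is free on it."""
    device = os.stat(path).st_dev
    free_gib = shutil.disk_usage(path).free / 2**30
    print(f"{path}: device={device} free={free_gib:.1f} GiB")

for p in (KUBELET_DIR, RUNTIME_DIR, NEW_MOUNT):
    if os.path.exists(p):
        describe(p)
    else:
        print(f"{p}: not present on this node")
```

If the device IDs differ, the extra space on /media/whatever is invisible to kubelet's nodefs/imagefs accounting; the usual fix is to move or bind-mount those data directories onto the new disk (or point the kubelet/runtime configuration at it) rather than just mounting the space somewhere else.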
While reading the AWS EFS documentation, a thought came to mind: is it a good idea to mount a shared EFS file system on the EMR/Hadoop VMs during bootstrapping and use it as the local disk for Spark jobs?
It could take advantage of the performance of EFS.
Maybe it could also reduce the time spent on data transfer?
Since all the VMs share the same EFS file system, is it possible to "tell" my Spark job: hey, all the data you need to shuffle is already accessible to the target VMs, here is the path... (over to you, the experts)? I think each Spark executor runs in its own private YARN application directory, so maybe the executors on other VMs can't access it? If it is possible, it seems it could save a lot of time during the Spark shuffle.
Correct me if I am wrong; I'd like to hear your opinions.
Thanks.
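If anyone wants to experiment with this, the knob for where Spark writes shuffle and spill files is spark.local.dir (on YARN it is normally overridden by the NodeManager's yarn.nodemanager.local-dirs, so on EMR you would have to repoint those instead). A minimal PySpark sketch, assuming a hypothetical EFS mount at /mnt/efs and a non-YARN run; whether shuffle over EFS is actually faster is exactly the open question here:

```python
from pyspark.sql import SparkSession

# Hypothetical EFS mount point shared by all the EMR/Hadoop VMs.
EFS_SCRATCH = "/mnt/efs/spark-local"

spark = (
    SparkSession.builder
    .appName("efs-shuffle-experiment")
    # spark.local.dir controls where shuffle and spill files are written;
    # on YARN this is ignored in favor of yarn.nodemanager.local-dirs.
    .config("spark.local.dir", EFS_SCRATCH)
    .getOrCreate()
)

# A tiny job that forces a shuffle, just to generate shuffle files.
df = spark.range(0, 10_000_000)
df.groupBy((df.id % 100).alias("bucket")).count().show(5)

spark.stop()
```

As far as I know, stock Spark still fetches shuffle blocks from the executor that wrote them (through the block transfer / shuffle service) even when the underlying directory is shared, so this mostly changes where the files live rather than eliminating the transfer, and EFS's NFS latency may well make it slower than instance-local disk.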
I have an image of size ~6 GB, which is successfully built by Jenkins and pushed to ACR. When Kubernetes tries to pull it, I get the ImagePullBackOff error. The pull is retried 3 times automatically, but during the third attempt other pods were evicted with the messages "DiskPressure True" and "kubelet has disk pressure". The node has sufficient disk and sufficient memory, so I am not sure why DiskPressure is True. I don't have much experience in CI; would someone help me fix this issue?
The configuration is as follows,
AKS Cluster
Kubernetes Version: 1.11.4
The cluster has 2 nodes, each with a 4-core processor, 16 GB of RAM, and 30 GB of disk space.
Please let me know if further details are required.
In this case the error was due to insufficient space on the nodes. Increasing the disk space fixed the error.
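To confirm which node is reporting disk pressure and how much ephemeral storage it thinks it has, one option is a small script with the official Kubernetes Python client (the kubernetes package; assumes a working kubeconfig):

```python
from kubernetes import client, config

# Load credentials from ~/.kube/config (use load_incluster_config() inside a pod).
config.load_kube_config()
v1 = client.CoreV1Api()

for node in v1.list_node().items:
    name = node.metadata.name
    allocatable = node.status.allocatable or {}
    print(f"{name}: allocatable ephemeral-storage = {allocatable.get('ephemeral-storage')}")
    for cond in node.status.conditions or []:
        if cond.type == "DiskPressure":
            print(f"  DiskPressure={cond.status} ({cond.message})")
```

On a 30 GB OS disk, a ~6 GB image plus its extracted layers, the other system images, and logs can cross kubelet's default eviction thresholds (roughly 10-15% free space) fairly quickly, which matches the answer above.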
I am experiencing a very complicated issue with Kubernetes in my production environments: they lose all their agent nodes, which change from Ready to NotReady, and all the pods change from Running to NodeLost. I have discovered that Kubernetes is making intensive use of the disks.
My cluster is deployed using acs-engine 0.17.0 (I tested previous versions too, and the same thing happened).
We also decided to deploy the Standard_DS2_VX VM series, which supports Premium disks, and we increased the IOPS to 2000 (it was previously under 500 IOPS), but the same thing happened. I am going to try a higher number now.
Any help on this will be appreciated.
It was a microservice exhausting resources, and then Kubernetes just halted the nodes. We have worked on establishing resource requests/limits so we can avoid disrupting the entire cluster.
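For anyone hitting the same thing, this is roughly what requests/limits look like when set through the Python client; the values and image name below are made up, and the same settings normally live in the Deployment YAML:

```python
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

# Illustrative values only; size them from the microservice's real usage.
resources = client.V1ResourceRequirements(
    requests={"cpu": "250m", "memory": "256Mi", "ephemeral-storage": "1Gi"},
    limits={"cpu": "500m", "memory": "512Mi", "ephemeral-storage": "2Gi"},
)

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="bounded-service"),
    spec=client.V1PodSpec(
        containers=[
            client.V1Container(
                name="app",
                image="myregistry.azurecr.io/my-service:latest",  # hypothetical image
                resources=resources,
            )
        ]
    ),
)

v1.create_namespaced_pod(namespace="default", body=pod)
```

The ephemeral-storage limit is what lets kubelet evict just the runaway container instead of letting the whole node slide into disk pressure.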
Trying to get the best performance from my application with as little setup as possible.
I'm struggling to find a consensus online of whether it would be better to use the Node cluster module in a Docker container, or to use a cluster of Docker instances instead.
OPINION: Node cluster first, then Docker cluster
OPINION: Don't use Node cluster in a Docker instance
It depends on what "best performance" means. What is the bottleneck in your case? CPU? RAM? Network? Disk I/O?
Advantages of a Node cluster:
All communication is in memory.
Disadvantage:
The solution doesn't scale beyond one host. If the host is overloaded, then so is your service.
Advantages of a Docker cluster:
High availability.
More network bandwidth and more resources, as you have more hosts.
Assuming you run your software as a service in Docker anyway, I can't see the issue with "as little setup as possible". Use both if it makes sense.
I'm currently trying to diagnose issues with a Postgres database that appears IO bound. The CPU is spending most of its time in iowait, but vmstat -d persistently shows 0 current outstanding ops for all mounted volumes. The volumes in question are EBS mounts. Does anyone know if the outstanding-ops stats for EBS mounts are just broken, or have any idea what's going on here?
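Side note: one way to cross-check vmstat is to sample iowait and the per-disk counters with psutil and see whether the device is actually moving data while the CPU sits in iowait. A rough sketch (Linux-specific fields; the device names, e.g. xvdf or nvme1n1, depend on the instance):

```python
import psutil

before = psutil.disk_io_counters(perdisk=True)
cpu = psutil.cpu_times_percent(interval=5)   # blocks for a 5-second sample
after = psutil.disk_io_counters(perdisk=True)

print(f"iowait over sample: {cpu.iowait:.1f}%")
for dev, b in before.items():
    a = after[dev]
    read_mib = (a.read_bytes - b.read_bytes) / 2**20
    write_mib = (a.write_bytes - b.write_bytes) / 2**20
    # busy_time (ms the device spent doing I/O, Linux only): high busy_time
    # with modest throughput suggests a saturated or throttled volume.
    busy_ms = a.busy_time - b.busy_time
    print(f"{dev}: read {read_mib:.1f} MiB, write {write_mib:.1f} MiB, busy {busy_ms} ms")
```

High iowait with modest throughput and near-constant busy_time points at the volume itself (IOPS/throughput caps or contention) rather than at Postgres.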
There is an excellent chance that you are suffering from noisy neighbors that saturate shared physical infrastructure.
To diagnose this, I would:
Create an EBS snapshot of your instance and your attached EBS mounts (you do have an EBS-backed instance?)
Fire up a new copy of your server in a different availability zone
Retest
Note that firing up a new instance in the same availability zone could cause you to end up on the same hardware that is suffering from noisy neighbors.
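If you want to script the snapshot-and-copy part of that recipe, here is a rough boto3 sketch; the region, volume ID, target AZ, and volume type are placeholders, and launching the replacement instance from your AMI is left out:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")      # placeholder region

VOLUME_ID = "vol-0123456789abcdef0"   # placeholder: the EBS volume behind Postgres
TARGET_AZ = "us-east-1b"              # an AZ different from the current instance

# 1. Snapshot the existing volume and wait for it to finish.
snap = ec2.create_snapshot(VolumeId=VOLUME_ID, Description="noisy-neighbor test copy")
ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])

# 2. Create a new volume from the snapshot in the other AZ, then attach it to
#    the replacement instance you launch there and rerun the workload.
vol = ec2.create_volume(
    SnapshotId=snap["SnapshotId"],
    AvailabilityZone=TARGET_AZ,
    VolumeType="gp2",                 # assumption; match your original volume type
)
print("new volume:", vol["VolumeId"])
```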