Docker containers freezing - node.js

I'm currently trying to deploy a Node.js app in Docker containers. I need to deploy 30 of them, but at some point they start to behave strangely: some of them freeze.
I am currently running Docker for Windows version 18.03.0-ce, build 0520e24302. My computer specs (CPU and memory):
i5-4670K
24 GB of RAM
My Docker default machine resource allocation is the following:
Allocated RAM: 10 GB
Allocated vCPUs: 4
My Node application runs on Alpine 3.8 and Node.js 11.4 and mostly makes HTTP requests every 2-3 seconds.
When I deploy 20 containers everything runs like a charm: my application does its job and I can see activity on every one of my containers through the logs and the activity stats.
The problem comes when I try to deploy more than 20 containers: some of the previously deployed containers stop their activity (0% CPU usage, logs frozen). When everything is deployed (30 containers), Docker starts to block the activity of some of them and later unblocks them in order to block others (the blocking/unblocking looks random, almost sequential). I waited to see what happened, and the result is that some of the containers are able to resume their activity while others are stuck forever (still running, but no more activity).
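For reference, this is roughly how the symptom can be observed (a minimal sketch using standard Docker CLI commands; the frozen containers sit at 0% CPU while still reporting a running state):
# per-container CPU/memory snapshot; frozen containers show 0% CPU
docker stats --no-stream --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"
# the frozen ones are still "running" and were not OOM-killed
docker inspect --format '{{.Name}} {{.State.Status}} OOMKilled={{.State.OOMKilled}}' $(docker ps -q)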
It's important to note that I applied the following resource restrictions to each of my containers:
MemoryReservation: 160 MB
Memory soft limit: 160 MB
NanoCPUs: 250000000 (0.25 CPUs)
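These are the Docker API field names; expressed as docker run flags, the limits above would look roughly like this sketch (the container and image names are placeholders):
# --memory-reservation is the memory soft limit; --cpus 0.25 corresponds to NanoCPUs = 250000000
docker run -d --name node-app-01 \
  --memory-reservation 160m \
  --cpus 0.25 \
  my-node-app:latest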
I had to increase my Docker default machine resource allocation and decrease each container's resource allocation because Docker was using almost 100% of my CPU; maybe I made a mistake in my configuration. I tried to tweak those values, but without success: I still have some containers freezing.
I'm kind of lost right now.
Any help would be appreciated, even a little one. Thank you in advance!

Related

Docker doesn't kill containers on OOM

I made two containers that both malloc in a loop until the server runs out of memory, on a remote server running Debian 9 with swap enabled (4 GB RAM, 1 GB swap). When running a single one (the host doesn't have any other services running, pretty much only dockerd), it gets killed in a minute or so, and everything is fine. Running 2 or 3 at the same time causes the server to lock up, making SSH unresponsive. Why don't these containers (I suppose they have really high OOM scores) get killed by the OOM killer?
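A possible workaround sketch (not an explanation of the kernel behaviour): give each container an explicit hard memory limit and keep it off swap, so the OOM killer acts at the container level before the host starts thrashing. The container and image names below are placeholders:
# --memory sets a hard limit; --memory-swap equal to --memory prevents the container from using swap
docker run -d --name memhog-1 \
  --memory 512m \
  --memory-swap 512m \
  memhog-image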

Is there a way to set the available resources of a docker container system using the docker container limit?

I am currently working on a Kubernetes cluster, which uses Docker.
This cluster allows me to launch jobs. For each job, I specify a memory request and a memory limit.
The memory limit is used by Kubernetes to fill the --memory option of the docker run command when creating the container. If the container exceeds this limit, it is killed for OOM reasons.
Now, if I go inside a container, I am a little surprised to see that the available system memory is not the one from the --memory option but the one from the Docker machine (the Kubernetes node).
I am surprised because a system with wrong information about its available resources will not behave correctly.
Take, for example, the page cache used by IO operations. If you write to disk, pages are cached in RAM before being written. To do this, the system evaluates how many pages can be cached using the sysctl vm.dirty_ratio (20% by default) and the memory size of the system. But how can this work if the container's view of the system memory size is wrong?
I verified it:
I ran a program with a lot of IO operations (os.write, decompression, ...) in a container limited to 10Gi of RAM, on a 180Gi node. The container gets killed because it reaches the 10Gi memory limit. This OOM is caused by the wrong evaluation of dirty_ratio * the system memory.
This is terrible.
So, my question is the following:
Is there a way to set the available resources of a docker container system using the docker container limit?
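For what it's worth, the limit is visible from inside the container, just not through the usual tools; a minimal sketch (assuming cgroup v1, the path differs under cgroup v2):
# free reports the node's memory, not the container limit
free -m
# the limit actually enforced by Docker/Kubernetes lives in the memory cgroup
cat /sys/fs/cgroup/memory/memory.limit_in_bytes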

Buffer/cache exhaustion with Spark standalone inside a Docker container

I have a very weird memory issue (which is what a lot of people will most likely say ;-)) with Spark running in standalone mode inside a Docker container. Our setup is as follows: we have a Docker container in which we have a Spring Boot application that runs Spark in standalone mode. This Spring Boot app also contains a few scheduled tasks (managed by Spring). These tasks trigger Spark jobs. The Spark jobs scrape a SQL database, shuffle the data a bit and then write the results to a different SQL table (writing the results doesn't go through Spark). Our current data set is very small (the table contains a few million rows).
The problem is that the Docker host (a CentOS VM) that runs the Docker container crashes after a while because the memory gets exhausted. I have currently limited the Spark memory usage to 512M (I have set both executor and driver memory), and in the Spark UI I can see that the largest job only takes about 10 MB of memory. I know that Spark runs best if it has 8 GB of memory or more available. I have tried that as well, but the results are the same.
After digging a bit further I noticed that Spark eats up all the buffer/cache memory on the machine. After clearing this manually by forcing Linux to drop caches (echo 2 > /proc/sys/vm/drop_caches, which clears the dentries and inodes), the cache usage drops considerably, but if I don't keep doing this regularly the cache usage slowly keeps going up until all memory is used in buffer/cache.
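For completeness, this is roughly what the manual clearing looks like (run as root; per the kernel documentation, echo 1 drops the page cache, 2 drops dentries and inodes, 3 drops both):
# watch the buffer/cache column grow
free -m
# clear dentries and inodes (what I do today)
sync; echo 2 > /proc/sys/vm/drop_caches
# or drop the page cache as well
sync; echo 3 > /proc/sys/vm/drop_caches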
Does anyone have an idea what I might be doing wrong / what is going on here?
Big thanks in advance for any help!

Container crashes after one hour due to OOM

I'm running Spark using Docker on DC/OS. I submit the Spark jobs with the following memory configuration:
Driver: 2 GB
Executor: 2 GB
Number of executors: 3
The spark-submit works fine, but after 1 hour the Docker container (the worker container) crashes due to OOM (exit code 137), even though my Spark logs show that 1 GB+ of memory is available.
The strange thing is that the same jar that runs in the container runs normally for 20+ hours in standalone mode.
Is this normal behaviour for the Spark containers, or is there something I'm doing wrong? Or is there any extra configuration I need to use for the Docker container?
Thanks
It looks like I have a similar issue. Have you looked at the cache/buffer memory usage on the OS?
Using the command below you can get some info on the type of memory usage on the OS:
free -h
In my case the buffer/cache kept growing until there was no more memory available in the container. The VM was a CentOS machine running on AWS, and it crashed entirely when this happened.
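You could also check whether Docker itself recorded the exit code 137 as an OOM kill rather than an external SIGKILL (a sketch; the container name is a placeholder):
docker inspect --format 'OOMKilled={{.State.OOMKilled}} ExitCode={{.State.ExitCode}}' spark-worker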
Is your Spark job calling a REST endpoint? If yes, try closing the connections.

ELK stack performance tuning

I am new to the ELK stack. I just installed it to give it a test drive for our production systems' log management and started pushing logs (IIS & Event) from 10 Windows VMs using nxlog.
After the installation, I am receiving around 25K hits per 15 minutes according to my Kibana dashboard. The size of /var/lib/elasticsearch/ has grown to around 15 GB in just 4 days.
I am facing serious performance issues: the Elasticsearch process is eating up all my CPU and around 90% of my memory.
The Elasticsearch service got stuck previously, and /etc/init.d/elasticsearch stop/start/restart wasn't even working. The process kept running even after I tried to kill it with the kill command. A system reboot also brought the machine back to the same condition. I just deleted all the indices with a curl command and now I am able to restart Elasticsearch.
I am using a standard A3 Azure instance (7 GB RAM, 4 cores) for this ELK setup.
Please guide me to tune my ELK stack to achieve good performance. Thanks.
You are using 7 GB of RAM, so your JVM heap size for Elasticsearch should be less than 3.5 GB.
For more information you can read about Elasticsearch heap sizing.
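As an illustration (a sketch, not version-specific advice): depending on your Elasticsearch version, the heap is set either in jvm.options or through the ES_HEAP_SIZE environment variable read by the init script, for example:
# Elasticsearch 5.x and later: edit /etc/elasticsearch/jvm.options
#   -Xms3g
#   -Xmx3g
# Elasticsearch 1.x/2.x packages: the init script reads ES_HEAP_SIZE,
# typically from /etc/default/elasticsearch or /etc/sysconfig/elasticsearch
echo 'ES_HEAP_SIZE=3g' | sudo tee -a /etc/default/elasticsearch
sudo /etc/init.d/elasticsearch restart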
