Docker doesn't kill containers on OOM - Linux

I made two containers that each malloc in a loop until the server runs out of memory, on a remote server running Debian 9 with swap enabled (4 GB RAM, 1 GB swap). When running a single one (the host doesn't have any other services running, pretty much only dockerd), it gets killed in a minute or so and everything is fine. Running two or three at the same time causes the server to lock up, making SSH unresponsive. Why don't these containers (which I assume have really high OOM scores) get killed by the OOM killer?
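
A hedged sketch of how one might contain this: giving each container an explicit hard memory cap makes the kernel OOM-kill the process inside that container's cgroup long before the host exhausts RAM plus swap and starts thrashing. The image name and limit values below are only illustrative, not taken from the question:

    # hypothetical image; --memory-swap equal to --memory means no extra swap for the container
    docker run -d --memory=512m --memory-swap=512m --oom-score-adj=500 mem-hog:latest

Without --memory, each container can consume all host memory; with several allocators racing, the box spends its time swapping and SSH becomes unresponsive before the global OOM killer gets around to the right victims, which matches the lock-up described above.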

Related

Resident (RES) memory bloat in 1 of 15 Sidekiq workers

I have a Rails app running on an Ubuntu EC2 instance.
It has 15 Sidekiq workers, and it appears that one of those Sidekiq processes always ends up consuming a lot of memory, even after it runs out of jobs to run.
After a fresh Sidekiq restart, the Sidekiq processes each use approx. 500 MB of resident memory. Over the course of a day, one of them climbs to about 2 GB of resident memory, even if I then go and clear the Sidekiq queue.
This eventually results in the machine running out of memory, and restarting Sidekiq resolves the issue.
Is it normal for 1 of the Sidekiq processes to climb higher than the rest? If not, how can I figure out what's causing this?
Thanks
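
A hedged starting point for the "how can I figure out what's causing this?" part: sampling the resident size of each Sidekiq process over time shows when the outlier's memory jumps (the command assumes the workers show up as "sidekiq" in the process list, which is an assumption, not something stated in the question):

    # RSS (in KB) and command line for every Sidekiq process, largest first
    ps -eo pid,rss,args --sort=-rss | grep '[s]idekiq'

Correlating the times at which the outlier's RSS grows with the Sidekiq job log usually narrows it down to a specific job class; note also that MRI rarely returns freed heap to the OS, which is why RSS doesn't shrink even after the queue is cleared.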

Node web app running in Fargate crashes under load with memory and CPU relatively untaxed

We are running a Koa web app in 5 Fargate containers. They are pretty straightforward CRUD/REST APIs with Koa over MongoDB Atlas. We started doing capacity testing and noticed that the Node servers started to slow down significantly with plenty of headroom left on CPU (sitting at 30%), memory (sitting at or below 20%), and Mongo (still returning in < 10 ms).
To further test this, we removed the Mongo operations and just hammered our health-check endpoints. We did see a lot of throughput, but significant degradation occurred at 25% CPU and Node actually crashed at 40% CPU.
Our Fargate tasks (containers) are CPU: 2048 (2 vCPUs) and memory: 4096 (4 GB).
We raised our ulimit nofile to 64000 and also set --max-old-space-size to 3.5 GB. This didn't result in a significant difference.
We also don't see significant latency in our load balancer.
My expectation is that CPU or memory would climb much higher before the system began experiencing issues.
Any ideas where a bottleneck might exist?
The main issue here was that we were running containers with 2 CPUs. Since Node effectively uses only 1 CPU, there was always a certain amount of the CPU allocation that was never used, and the ancillary overhead never got the container to 100%. So Node would be overwhelmed on its 1 CPU while the other was basically idle, and as a result our autoscaling alarms never got triggered.
So we adjusted to 1-CPU containers with more horizontal scale-out (i.e., more instances).
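
A sketch of the task-size change described in the answer, assuming an ECS/Fargate task definition (the family, image, and memory values are illustrative): one vCPU per task, with capacity added by raising the service's desired count rather than by giving each task more cores.

    {
      "family": "koa-api",
      "cpu": "1024",
      "memory": "2048",
      "requiresCompatibilities": ["FARGATE"],
      "networkMode": "awsvpc",
      "containerDefinitions": [
        { "name": "koa-api", "image": "example/koa-api:latest", "essential": true }
      ]
    }

The alternative would have been keeping 2 vCPUs and forking one Node worker per core with the cluster module, but matching the task size to Node's single event loop is simpler and keeps CPU-based autoscaling alarms meaningful.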

Docker containers freezing

I'm currently trying to deploy a Node.js app in Docker containers. I need to deploy 30 of them, but at some point they start to behave strangely: some of them freeze.
I am currently running Docker for Windows 18.03.0-ce, build 0520e24302. My computer specs (CPU and memory):
i5 4670K
24 GB of RAM
My Docker default machine resource allocation is the following:
Allocated RAM: 10 GB
Allocated vCPUs: 4
My Node application is running on Alpine 3.8 and Node.js 11.4 and mostly does HTTP requests every 2-3 seconds.
When I deploy 20 containers, everything runs like a charm: my application does its job and I can see activity on every one of my containers through the logs and activity stats.
The problem comes when I try to deploy more than 20 containers: I notice that some of the previously deployed containers stop their activity (0% CPU usage, logs frozen). When everything is deployed (30 containers), Docker starts to block the activity of some of them and unblocks them at some point to block some others (the blocking/unblocking seems random). It appears to be sequential. I tried waiting to see what happened, and the result is that some of the containers are able to continue their activity while some others are stuck forever (still running but with no more activity).
It's important to note that I used the following resource restrictions on each of my containers:
MemoryReservation: 160 MB
Memory soft limit: 160 MB
NanoCPUs: 250000000 (0.25 CPUs)
I had to increase my Docker default machine resource allocation and decrease each container's resource allocation because Docker was using almost 100% of my CPU, so maybe I made a mistake in my configuration. I tried to tweak those values, but with no success: I still have some containers freezing.
I'm kind of lost right now.
Any help would be appreciated, even a little; thank you in advance!
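
For reference, the restrictions above correspond roughly to these docker run flags (values copied from the list; the image name is a placeholder), and docker stats is the quickest way to see whether a "frozen" container is actually being CPU-throttled. Note that 30 containers at 0.25 CPU each ask for 7.5 vCPUs against the 4 allocated to the Docker machine, so heavy CPU oversubscription on top of a 160 MB soft limit could plausibly explain containers stalling under load:

    # soft memory limit plus a CPU quota equivalent to MemoryReservation / NanoCPUs above
    docker run -d --memory-reservation=160m --cpus=0.25 my-node-app:latest
    # live per-container CPU/memory usage, to spot throttled or idle containers
    docker stats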

Gremlin-Server takes too much memory and hangs

I'm using gremlin-server (v3.02) with Titan over HBase, with the default configuration settings. The server has 8 GB of memory and 4 cores.
After a few hours of work, the server stops responding to query requests.
It must be said that the request intensity on the server is NOT high, pretty much low to medium (a few requests per hour, maybe less than that).
When checking the last Gremlin Server log messages, I see they are about an HBase session timeout and retries to reconnect to HBase.
The server CPU and memory are 90-100% at this point.
JDK 1.8.0_45-b14 64-bit on Red Hat.
Using jstat -gc I can see all its time is spent in GC, and old gen is at 100%.
I have set -Xmx8g, but virtual memory in htop goes up to 12 GB; from a few tests with different -Xmx values I see that virtual memory always ends up at about "-Xmx + 4g".
jmap -histo gives me about 2 GB of [B (byte[]), with a gig for CacheRelation and a gig for CacheVertex.
After restarting the gremlin-server, everything is back to normal and works again.
Any ideas?
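
A hedged note on the numbers above: -Xmx only caps the Java heap, so a virtual size around "-Xmx + 4g" is expected once Metaspace, thread stacks, the code cache, and direct/NIO buffers (which HBase clients use heavily) are added on top. To see what fills old gen before the server wedges, GC logging plus a live heap dump is the usual route; the log and dump paths below are placeholders:

    # JDK 8 GC logging flags, added to the Gremlin Server JVM options
    -Xmx8g -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/gremlin-gc.log
    # heap dump of live objects for offline analysis, taken while old gen is near 100%
    jmap -dump:live,format=b,file=/tmp/gremlin.hprof <gremlin-server-pid>

Given the histogram already shows byte[] plus CacheRelation/CacheVertex dominating, Titan's element cache sizing is the obvious configuration to compare against the 8 GB heap.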

Potential memory leak in SUSE

I have a SUSE server running Tomcat with my web application (which has threads running in the background to update a database).
The server has 4 GB of RAM, and Tomcat is configured to use a maximum of 1 GB.
After running for a few days, the free command shows that the system has only 300 MB of free memory. Tomcat uses only 400 MB, and no other process seems to use an unreasonable amount of memory.
Adding up the memory usage of all processes (returned by the ps aux command) shows only 2 GB in use.
Is there any way to identify whether there is a leak at the system level?
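
A hedged way to approach the "leak at system level" question: on Linux, free memory that "disappears" over days is very often just the page cache and kernel slab caches, which never show up in per-process ps output. Comparing these views separates reclaimable cache from a genuine kernel-side leak (the grep pattern is just one convenient selection of /proc/meminfo fields):

    # on older procps the "-/+ buffers/cache" line shows memory actually available
    free -m
    # how much of "used" memory is page cache and kernel slab
    egrep 'MemFree|Buffers|^Cached|Slab|SReclaimable|SUnreclaim' /proc/meminfo
    # largest kernel slab consumers, if Slab itself is large
    slabtop -o

If the ~2 GB of process RSS plus Cached plus Slab roughly accounts for the 3.7 GB in use, nothing is leaking and the memory will be reclaimed under pressure; if a large chunk stays unexplained or SUnreclaim keeps growing, that points at a kernel or driver leak rather than Tomcat.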
