Unable to locate the memory hog on an OpenVZ container - Linux

I have a very odd issue on one of my OpenVZ containers. The memory usage reported by top, htop, free and the OpenVZ tools seems to be ~4GB out of the allocated 10GB.
When I list the processes by memory usage or use the ps_mem.py script, I only get ~800MB of memory usage. Similarly, when I browse the process list in htop, I find myself unable to pinpoint the memory-hogging offender.
There is definitely a process leaking RAM in my container, but even when it hits critical levels and I stop everything in that container (except for ssh, init and shells), I cannot reclaim the RAM. Only restarting the container helps; otherwise the OOM killer eventually starts kicking in inside the container.
I was under the assumption that a leaky process releases all of its RAM when killed, and that you can observe its misbehavior via top or similar tools.
If anyone has ever experienced behavior like this, I would be grateful for any hints. The container is running icinga2 (which I suspect of leaking RAM), although most of the time the monitoring process sits idle, as it manages to execute all its scheduled checks in a more than timely manner - so I'd expect the RAM usage to drop at those times. It doesn't, though.

I had a similar issue in the past, and in the end it was solved by the hosting company where I had my OpenVZ container. I think the best approach would be to open a support ticket with your hoster, explain the problem to them and ask them to investigate. Maybe they are using an outdated kernel version, or they made changes on the server that have an impact on your OpenVZ container.
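Before (or while) opening that ticket, it may also be worth narrowing down where the unaccounted memory lives: compare the sum of per-process RSS against what free reports, and check allocations that belong to no process (tmpfs mounts, kernel/slab memory). A rough sketch of those checks, assuming a fairly standard OpenVZ guest:
# Sum of resident memory across all processes, in MB -- compare against 'free'
ps -e -o rss= | awk '{sum += $1} END {printf "%.0f MB\n", sum/1024}'
# tmpfs mounts count as "used" RAM but show up under no process
df -h -t tmpfs
# Container-level accounting as OpenVZ sees it (physpages, privvmpages, failcnt); may require root
cat /proc/user_beancounters
If the per-process total stays near ~800MB while overall usage keeps climbing, the memory is likely sitting in tmpfs, kernel allocations or the beancounter accounting itself - exactly the kind of detail worth including in the ticket.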

Related

Unable to Run Multiple Node Child Processes without Choking on DigitalOcean

I've been struggling to run multiple instances of Puppeteer on DigitalOcean for quite some time with little luck. I'm able to run ~5 concurrently using tools like puppeteer-cluster, but for some reason the whole thing just chokes with little helpful messaging. So, I switched to spawning ~5 child processes without any additional library -- just Puppeteer itself. Same issue. Chokes with no helpful errors.
I'm able to run all of these jobs just fine locally, but after I deploy, I hit these walls. So, my hunch is that it's a resource/performance issue, but I can't say for sure.
I'm running a droplet with 1GB of RAM and 3 vCPUs on DigitalOcean.
Basically, I'm just looking for ways to start troubleshooting something like this. Is there a way I can know for sure that I'm hitting resource walls? I've tried pm2 and the DO dashboard graphs, but I feel like those are all leaving a lot of information out, or else I'm missing something else altogether.
Author of puppeteer-cluster here. You are right, 1 GB of memory is likely not enough for running 5 browser windows (or tabs) in addition to your operating system and maybe even other background tasks.
Here is a list of resources you should check:
Memory: Use a tool like htop to check your memory usage while your application is running.
CPU: Again, you can use htop for that; 3 vCPUs should be more than enough for 5 windows.
Disk space: Use a tool like df to check if there is enough space on the disk. I know of multiple cases in which there was not enough space on the disk (like some old kernels filling the disk), and Chrome needs at least some space to run.
Network throughput: Rarely the problem, but sometimes the network just does not have the bandwidth to support many open browsers. Use a tool like nload to check the network throughput.
To use htop or nload, you start your script in the background (node script.js &) or use a terminal multiplexer (like tmux). Resource problems should then be easy to spot.
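For example, a minimal monitoring session could look like the following ('script.js' is just the placeholder name from above, and nload may need to be installed first):
# Start the Puppeteer script in the background so the terminal stays usable
node script.js &
# Watch memory and CPU per process while the jobs run
htop
# Check the remaining resources from the list above
df -h      # disk space
nload      # network throughput (e.g. sudo apt-get install nload on Ubuntu/Debian)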
Most probably you're running out of memory; 5 Puppeteer processes are a lot for a 1GB VM.
You can run
grep -i 'killed process' /var/log/messages
to confirm that the OOM killer terminated your processes.
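The log path varies by distribution; Debian/Ubuntu systems write kernel messages to /var/log/syslog rather than /var/log/messages, and the kernel ring buffer works regardless of the syslog setup:
# Debian/Ubuntu equivalent of the command above
grep -i 'killed process' /var/log/syslog
# Works on any Linux: search the kernel ring buffer directly
dmesg | grep -iE 'killed process|out of memory'
# On systemd-based systems, the journal keeps the same kernel messages
journalctl -k | grep -i 'killed process'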

How does Linux manage VM allocation per process? OOM crash

I'm currently prototyping a very lightweight TCP server based on a custom protocol. It's written in C++ and uses Boost Asio for cross-platform sockets. When I monitor the process on Windows it eats less than 3MB of memory and barely grows with many concurrent connections (I tested up to 8).
I built the same server for Linux and put it on a 128MB + 64MB swap VPS for testing. It runs fine and my tests are successful, but the process gets killed in the middle of the night by the kernel. I checked the logs and it was out of memory (the OOM score was 0).
I highly doubt my process has memory leaks. I checked my server logs and only 1 person had connected to it the previous night, which should not result in OOM. The process sleeps for the majority of the time, as it only does processing when Boost's async handler wakes up the main thread to process a packet.
What I did notice is that the default VM allocation for the process is a whopping 89MB (using the top command), and as soon as I make a connection it doubles to about 151MB. My VPS has about 100MB of free RAM and all 64MB of swap while running the server, so the only thing I could think of is that the process tried to allocate more virtual memory, went over the ~164MB remaining and beyond the physical limit, and triggered the OOM killer.
I've since used the ulimit command to limit the VM allocation to 30MB and it seems to be working fine, but I'll have to wait a while to see if it actually helps the issue.
My question is how does Linux determine how much VM to allocate for a process? Is there a compiler/linker setting I can use to reduce the default VM reservation? Is my reasoning correct or are there other reasons for the OOM?
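Two things are worth checking when reasoning about this: how much of that 89MB is merely reserved address space (VSZ) versus resident memory (RSS), and what the ulimit experiment actually caps. A quick sketch, where 'tcpserver' stands in for the real binary name (hypothetical):
# Compare virtual size (VSZ) and resident set size (RSS) for the server
# ('tcpserver' is a placeholder for the actual binary name)
ps -o pid,vsz,rss,comm -C tcpserver
# Break down where the virtual memory goes (thread stacks, malloc arenas, mappings)
pmap -x $(pidof tcpserver)
# Cap the virtual address space before starting the server, mirroring the
# ulimit experiment from the question (the value is in kB, so 30720 = 30MB)
ulimit -v 30720
./tcpserver
A large VSZ with a small RSS usually just reflects reservations such as per-thread stacks and allocator arenas; by itself it should not trigger the OOM killer, which reacts to actual memory pressure rather than reserved address space.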

Running Ubuntu with nothing installed uses 500 out of 512MB - which process should I kill?

Running Ubuntu Linux 14.04 on a DigitalOcean server which gives me 512MB of RAM. Surprisingly, when trying to run Activator for a Play app, I came to realize that almost all the memory was used. Using the 'htop' command I get this output. Which process should I kill? (I am using 2 ssh connections, one to monitor and the other one to do stuff.)
I could also assign swap memory, but that would affect performance. I thought 512MB should be more than enough to run a Play server. I mean, seriously, we put a man on the moon with reaaaaly much less.
Linux makes as much use of memory as it can, but that doesn't mean that it's not available for your applications. It will use memory to cache certain things (such as files) and memory for buffers.
In your screenshot you'll see the memory usage bar is made of different coloured sections:
Green is memory in use
Blue is buffer
Yellow is cache
So generally any applications you run that require more memory will allocate it out of the memory used to cache data.
Having swap space is generally a good idea - it won't affect performance unless the kernel starts swapping heavily, but that's generally better than the alternative, which is your applications crashing with an out-of-memory error.
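To see how much of that "used" memory is really reclaimable cache, free spells it out; a quick check (newer procps versions show an 'available' column, older ones such as the one on 14.04 show a '-/+ buffers/cache' line instead):
# Buffers/cache are reclaimable; 'available' (or the -/+ buffers/cache line)
# estimates what applications can still allocate without swapping
free -m
# The same numbers straight from the kernel
grep -E 'MemTotal|MemFree|MemAvailable|Buffers|^Cached' /proc/meminfo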

Do we need to disable swap for Riak?

I just found in the Riak documentation that swap can make the server unresponsive, so it has to be disabled. It also says that the Riak node should be allowed to be killed by the kernel if it uses too much RAM, and that if swap is completely disabled, Riak will simply exit. I am confused: do we have to disable swap or not?
http://docs.basho.com/riak/latest/cookbooks/Linux-Performance-Tuning/
Swap Space
Due to the heavily I/O-focused profile of Riak, swap usage
can result in the entire server becoming unresponsive. Disable swap or
otherwise implement a solution for ensuring Riak's process pages are
not swapped.
Basho recommends that the Riak node be allowed to be killed by the
kernel if it uses too much RAM. If swap is completely disabled, Riak
will simply exit when it is unable to allocate more RAM and leave a
crash dump (named erl_crash.dump) in the /var/log/riak directory which
can be used for forensics (by Basho Client Services Engineers if you
are a customer).
So no, you don't have to ... but if you don't and you use up all your available RAM, the machine is likely to become unresponsive.
That's going to be the case with any (unbounded) application that performs heavy I/O and could exhaust your system's memory. Typically you would have monitoring on the machine that warns you when memory usage goes past a threshold.
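If you do decide to follow Basho's recommendation, a minimal sketch of disabling swap (or, less drastically, discouraging it) might look like this; adjust the fstab edit to your own layout:
# Turn off all swap devices immediately (lasts until reboot)
sudo swapoff -a
# Make it permanent by commenting out swap entries in /etc/fstab
sudo sed -i.bak '/\sswap\s/ s/^/#/' /etc/fstab
# Less drastic alternative: keep swap but tell the kernel to avoid it
sudo sysctl vm.swappiness=1
echo 'vm.swappiness = 1' | sudo tee /etc/sysctl.d/99-swappiness.conf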

Node.js memory leak causes system to lose available memory, even after node restart?

According to nodetime, my memory leak is persisting even through node application restarts. Check out the following "OS - Free Memory" graph; notice how the memory decreases steadily (despite the node app restarting dozens and dozens of times) until I restart the whole server:
How is this possible? Am I fundamentally misunderstanding something? I don't understand how a memory leak in one process could survive and continue to affect the OS...
Machine Info:
Amazon EC2 (m1.large) running CentOS
A memory leak in one process (that is actually killed) can't do this.
Are you using 3rd-party systems to provide shared state? For example, a database, or something like Redis for sessions? In that case, restarting your Node process will just lead to reconnecting to the same shared state and continuing whatever leak was started initially.
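A quick way to confirm where the host's memory is actually going between app restarts is to rank processes by resident memory and, if Redis is in the picture, ask it how much it is holding (this assumes Redis on its default local port):
# Processes ranked by resident memory -- does the lost "free memory"
# line up with a process other than node?
ps aux --sort=-rss | head -n 10
# If Redis holds the shared state, check how much memory it has accumulated
redis-cli info memory | grep used_memory_human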
