CouchDB Memory Leak - Performance Issue

CouchDB Memory Leak - Performance Issue - memory-leaks

I am experiencing performance issue with CouchDB.
Use Case: I am working on a process that retrieves the data from RDBMS and process them into JSON document and POST them to the CouchDB.
I am trying to POST around half a million documents, most of them in batches (_bulk_doc) of 10,000 and have tried with batch of 5,000, 15,000, and 20,000.
Whole process takes around 90-100 minutes.
During the life of the process, Memory Consumption by CouchDB keeps on growing and memory is not released when CouchDB has finished working.
So if the memory consumption by CouchDB was 60% at the time process finishes, memory consumption will remain 60% and not reducing.
Subsequently, when the process starts running again. memory consumption is Maxed out and CouchDB restarts itself. This restart fails the process that I am running. Looking at the Syslogs , I see Out Of Memory Error by the CouchDB process and killing statement.
The CouchDb process that has the issue is the "beam.smp" of Erlang.
At this point, I have tried upgrading the memory of the server to see if this resolves the issue, unfortunately, the issue persists. Memory Leak is there and Usage keeps on growing until CouchDB restarts/crashed.
I also have tried running garbage collection from Erlang command (erlang:garbage_collect().) line but it didn't do anything.
At this point, I am out of ideas and not sure what is going on here. Any input/suggestion is highly appreciated!
Env:
Platform: Linux (Red Hat release 6.4 (Santiago))
CouchDB: 1.3 and have tried with 1.5 as well
RAM: Tried with 2G, 4G, and 8G
CPU: 2 cores
Process:/usr/lib64/erlang/erts-5.8.5/bin/beam.smp -Bd -K true -A 4 -- -root /usr/lib64/erlang

What kind of JSON parser are you using? Do you know that in Erlang atoms are not garbage collected? So if your parser is creating new atoms, they will stay forever, eventually consuming all your memory.

Related

When does Node garbage collect?

I have a NodeJS server running on a small VM with 256MB of RAM and I notice the memory usage keeps growing as the server receives new requests. I read that an issue on small environments is that Node doesn't know about the memory constraints and therefore doesn't try to garbage collect until much later (so for instance, maybe it would only want to start garbage collecting once it reaches 512MB of used RAM), is it really the case?
I also tried using various flags such as --max-old-space-size but didn't see much change so I'm not sure if I have an actual memory leak or if Node just doesn't GC as soon as possible?

This might not be a complete answer, but it's coming from experience and might provide some pointers. Memory leak in NodeJS is one of the most challenging bugs that most developers could ever face.
But before we talk about memory leak, to answer your question - unless you explicitly configure --max-old-space-size, there are default memory limits that would take over. Since certain phases of Garbage collection in node are expensive (and sometimes blocking) steps, depending upon how much memory is available to it, it would delay (e.g. mark-sweep collection) some of the expensive GC cycles. I have seen that in a Machine with 16 GB of memory it would easily let the memory go as high as 800 MB before significant Garbage Collections would happen. But I am sure that doesn't make ~800 MB any special limit. It would really depend on how much available memory it has and what kind of application are you running. E.g. it is totally possible that if you have some complex computations, caches (e.g. big DB Connection Pools) or buggy logging libraries - they would themselves always take high memory.
If you are monitoring your NodeJs's memory footprint - sometime after the the server starts-up, everything starts to warm up (express loads all the modules and create some startup objects, caches warm up and all of your high memory consuming modules became active), it might appear as if there is a memory leak because the memory would keep climbing, sometimes as high as ~1 gb. Then you would see that it stabilizes (this limit used to be lesser in <v8 versions).
But sometimes there are actual memory leaks (which might be hard to spot if there is no specific pattern to it).
In your case, 256 MB seems to be meeting just the minimum RAM requirements for nodejs and might not really be enough. Before you start getting anxious of memory leak, you might want to pump it up to 1.5 GB and then monitor everything.
Some good resources on NodeJS's memory model and memory leak.
Node.js Under the Hood
Memory Leaks in NodeJS
Can garbage collection happen while the main thread is
busy?
Understanding and Debugging Memory Leaks in Your Node.js Applications
Some debugging tools to help spot the memory leaks
Node inspector |
Chrome
llnode
gcore

Mono 4.2.2 garbage collection really slow/leaking on Linux with multiple threads?

I have an app that processes 3+GB of data into 300MB of data. Run each independent dataset sequentially on the main thread, its memory usage tops out at about 3.5GB and it works fine.
If I run each dataset concurrently on 10 threads, I see the following:
Virtual memory usage climbs steadily until allocations fail and it crashes. I can see GC is trying to run in the stack trace)
CPU utilization is 1000% for periods, then goes down to 100% for minutes, and then cycles back up. The app is easily 10x slower when run with multiple threads, even though they are completely independent.
This is mono 4.2.2 build for Linux with large heap support, running on 128GB RAM with 40 logical processors. I am running mono-sgen and have tried all the custom GC settings I could think of (concurrent mark-sweep, max heap size, etc).
These problems do not happen on Windows. If I rewrite code to do significant object pooling, I get farther in the dataset before running OOM, but the fate is the same. I have verified that I have no memory leaks using multiple tools and good-old printf-debugging.
My best theory is that lots of allocations across lots of threads are a weak case for the GC, and most of that wall-clock time is spent with my work threads suspended.
Does anyone have any experience with this? Is there a way I can help the GC get out of that 100% rut it gets stuck in, and to not run out of memory?

Docker not reporting memory usage correctly?

Through some longevity testing with docker (docker 1.5 and 1.6 with no memory limit) on (centos 7 / rhel 7) and observing the systemd-cgtop stats for the running containers, I noticed what appeared to be very high memory use. Typically the particular application running in a non-containerized state only utilizes around 200-300Meg of memory. Over a 3 day period I ended up seeing systemd-cgtop reporting that my container was up to 13G of memory used. While I am not an expert Linux admin by any means, I started digging in to this which pointed me to the following articles:
https://unix.stackexchange.com/questions/34795/correctly-determining-memory-usage-in-linux
http://corlewsolutions.com/articles/article-6-understanding-the-free-command-in-ubuntu-and-linux
So basically what I am understanding is to determine the actual free memory within the system unit would be to look at the -/+ buffers/cache: within "free -m" and not the top line, as I also noticed that the top line within "free -m" would constantly increase with memory used and constantly show a decreased amount of free memory just like what I am observing with my container through systemd-cgtop. If I observe the -/+ buffers/cache: line I will see the actual stable amounts of memory being used / free. Also, if I observe the actual process within top on the host, I can see the process itself is only ever using less then 1% of memory (0.8% of 32G).
I am a bit confused as to whats going on here. If I set a memory limit of 500-1000M for a container (I believe it would turn out to be twice as much due to the swap) would my process eventually stop when I reach my memory limit, even though the process itself is not using anywhere near that much memory? If anybody out there has any feedback on the former, that would be great. Thanks!

I used docker in CentOS 7 for a while, and got the same confused by these. Checking the github issues link, it looks like docker stats in this release is kind of mislead.
https://github.com/docker/docker/issues/10824
so I just ignored memory usage getting from docker stats.

A year since you asked, but adding an answer here for anyone else interested. If you set a memory limit, I think it would not be killed unless it fails to reclaim unused memory. the cgroups metrics and consequently docker stats shows page cache+RES. You could look at the cgroups detailed metrics to see the breakup
I had a similar issue and when I tested with a memory limit, I saw that the container is not killed. Rather the memory is reclaimed and reused.

Erlang garbage collection

I need your help in investigation of issue with Erlang memory consumption. How typical, isn't it?
We have two different deployment schemes.
In first scheme we running many identical nodes on small virtual machines (in Amazon AWS),
one node per machine. Each machine has 4Gb of RAM.
In another deployment scheme we running this nodes on big baremetal machines (with 64 Gb of RAM), with many nodes per machine. In this deployment nodes are isolated in docker containers (with memory limit set to 4 Gb).
I've noticed, that heap of processes in dockerized nodes are hogging up to 3 times much more RAM, than heaps in non-dockerized nodes with identical load. I suspect that garbage collection in non-dockerized nodes is more aggressive.
Unfortunately, I don't have any garbage collection statistics, but I would like to obtain it ASAP.
To give more information, I should say that we are using HiPE R17.1 on Ubuntu 14.04 with stock kernel. In both schemes we are running 8 schedulers per node, and using default fullsweep_after flag.
My blind suggestion is that Erlang default garbage collection relies (somehow) on /proc/meminfo (which is not actual in dockerized environment).
I am not C-guy and not familiar with emulator internals, so could someone point me to places in Erlang sources that are responsible for garbage collection and some emulator options which I can use to tweak this behavior?

Unfortunately VMs often try to be smarter with memory management than necessary and that not always plays nicely with the Erlang memory management model. Erlang tends to allocate and release a large number of small chunks of memory, which is very different to normal applications, which usually allocate and release a small number of big chunks of memory.
One of those technologies is Transparent Huge Pages (THP), which some OSes enable by default and which causes Erlang nodes running in such VMs to grow (until they crash).
https://access.redhat.com/solutions/46111
https://www.digitalocean.com/company/blog/transparent-huge-pages-and-alternative-memory-allocators/
https://docs.mongodb.org/manual/tutorial/transparent-huge-pages/
So, ensuring THP is switched off is first thing you can check.
The other is trying to tweak the memory options used when starting the Erlang VM itself, for example see this post:
Erlang: discrepancy of memory usage figures
Resulting options that worked for us:
-MBas aobf -MBlmbcs 512 -MEas aobf -MElmbcs 512
Some more theory about memory allocators:
http://www.erlang-factory.com/static/upload/media/139454517145429lukaslarsson.pdf
And more detailed description of memory allocator flags:
http://erlang.org/doc/man/erts_alloc.html

First thing to know, is that garbage collection i Erlang is process based. Each process is GC in their own time, and independently from each other. So garbage collection in your system is only dependent on data in your processes, not operating system itself.
That said, there could be some differencess between memory consumption from Eralang point of view, and System point of view. That why comparing erlang:memory to what your system is saying is always a good idea (it could show you some binary leaks, or other memory problems).
If you would like to understand little more about Erlang internals I would recommend those two talks:
https://www.youtube.com/watch?v=QbzH0L_0pxI
https://www.youtube.com/watch?v=YuPaX11vZyI
And from little better debugging of your memory management I could reccomend starting with http://ferd.github.io/recon/

linux CPU cache slowdown

We're getting overnight lockups on our embedded (Arm) linux product but are having trouble pinning it down. It usually takes 12-16 hours from power on for the problem to manifest itself. I've installed sysstat so I can run sar logging, and I've got a bunch of data, but I'm having trouble interpreting the results.
The targets only have 512Mb RAM (we have other models which have 1Gb, but they see this issue much less often), and have no disk swap files to avoid wearing the eMMCs.
Some kind of paging / virtual memory event is initiating the problem. In the sar logs, pgpin/s, pgnscand/s and pgsteal/s, and majflt/s all increase steadily before snowballing to crazy levels. This puts the CPU up correspondingly high levels (30-60 on dual core Arm chips). At the same time, the frmpg/s values go very negative, whilst campg/s go highly positive. The upshot is that the system is trying to allocate a large amount of cache pages all at once. I don't understand why this would be.
The target then essentially locks up until it's rebooted or someone kills the main GUI process or it crashes and is restarted (We have a monolithic GUI application that runs all the time and generally does all the serious work on the product). The network shuts down, telnet blocks forever, as do /proc filesystem queries and things that rely on it like top. The memory allocation profile of the main application in this test is dominated by reading data in from file and caching it as textures in video memory (shared with main RAM) in an LRU using OpenGL ES 2.0. Most of the time it'll be accessing a single file (they are about 50Mb in size), but I guess it could be triggered by having to suddenly use a new file and trying to cache all 50Mb of it all in one go. I haven't done the test (putting more logging in) to correlate this event with these system effects yet.
The odd thing is that the actual free and cached RAM levels don't show an obvious lack of memory (I have seen oom-killer swoop in the kill the main application with >100Mb free and 40Mb cache RAM). The main application's memory usage seems reasonably well-behaved with a VmRSS value that seems pretty stable. Valgrind hasn't found any progressive leaks that would happen during operation.
The behaviour seems like that of a system frantically swapping out to disk and making everything run dog slow as a result, but I don't know if this is a known effect in a free<->cache RAM exchange system.
My problem is superficially similar to question: linux high kernel cpu usage on memory initialization but that issue seemed driven by disk swap file management. However, dirty page flushing does seem plausible for my issue.
I haven't tried playing with the various vm files under /proc/sys/vm yet. vfs_cache_pressure and possibly swappiness would seem good candidates for some tuning, but I'd like some insight into good values to try here. vfs_cache_pressure seems ill-defined as to what the difference between setting it to 200 as opposed to 10000 would be quantitatively.
The other interesting fact is that it is a progressive problem. It might take 12 hours for the effect to happen the first time. If the main app is killed and restarted, it seems to happen every 3 hours after that fact. A full cache purge might push this back out, though.
Here's a link to the log data with two files, sar1.log, which is the complete output of sar -A, and overview.log, a extract of free / cache mem, CPU load, MainGuiApp memory stats, and the -B and -R sar outputs for the interesting period between midnight and 3:40am:
https://drive.google.com/folderview?id=0B615EGF3fosPZ2kwUDlURk1XNFE&usp=sharing
So, to sum up, what's my best plan here? Tune vm to tend to recycle pages more often to make it less bursty? Are my assumptions about what's happening even valid given the log data? Is there a cleverer way of dealing with this memory usage model?
Thanks for your help.
Update 5th June 2013:
I've tried the brute force approach and put a script on which echoes 3 to drop_caches every hour. This seems to be maintaining the steady state of the system right now, and the sar -B stats stay on the flat portion, with very few major faults and 0.0 pgscand/s. However, I don't understand why keeping the cache RAM very low mitigates a problem where the kernel is trying to add the universe to cache RAM.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string