I have a Node process that I use to add key-values to an object. When I get to about 9.88 million keys added, the process appears to hang. I assumed an out-of-memory issue, so I turned on --trace_gc and also put a check in the code that adds the keys:
const { heapTotal, heapUsed } = process.memoryUsage()
if ((heapUsed / heapTotal) > 0.99) {
  throw new Error('Too much memory')
}
That condition was never met, and the error was never thrown. As far as the --trace_gc output goes, my last Scavenge log line was:
[21544:0x104000000] 2153122 ms: Scavenge 830.0 (889.8) -> 814.3 (889.8) MB, 1.0 / 0.0 ms allocation failure
Mark-sweep, however, continues logging this:
[21544:0x104000000] 3472253 ms: Mark-sweep 1261.7 (1326.9) -> 813.4 (878.8) MB, 92.3 / 0.1 ms (+ 1880.1 ms in 986 steps since start of marking, biggest step 5.6 ms, walltime since start of marking 12649 ms) finalize incremental marking via task GC in old space requested
Is this output consistent with memory issues?
I should note that having to add this many keys to the object is an edge case; normally the number is more likely in the thousands. In addition, the keys are added during a streaming process, so I don't know at the outset how many will need to be added. So besides trying to figure out what the specific problem is, I'm also looking for a way to determine that the problem is likely to occur before the process hangs.
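One idea I'm considering is checking against V8's actual heap size limit instead of heapTotal (which itself grows as the heap grows). A minimal sketch of that check, with an arbitrary 0.9 threshold, would be something like:

const v8 = require('v8')

function assertHeapHeadroom (threshold = 0.9) {
  // heap_size_limit is the hard cap V8 was started with (it reflects --max-old-space-size),
  // so used_heap_size / heap_size_limit keeps rising as the process nears OOM,
  // unlike heapUsed / heapTotal, which stays roughly flat while the heap expands
  const { used_heap_size, heap_size_limit } = v8.getHeapStatistics()
  if (used_heap_size / heap_size_limit > threshold) {
    throw new Error(`Heap nearly exhausted: ${(used_heap_size / 1048576).toFixed(1)} MB used of ${(heap_size_limit / 1048576).toFixed(1)} MB limit`)
  }
}

I don't know yet whether that would fire early enough to be useful here.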
Related
I sometimes see this log in the GC logs when using the --trace_gc option of NodeJS:
1502684 ms: Mark-sweep 26.7 (29.2) -> 18.8 (28.2) MB, 7.7 / 0.4 ms (+ 1.0 ms in 10 steps since start of marking, biggest step 0.2 ms, walltime since start of marking 28 ms) (average mu = 1.000, current mu = 1.000) finalize incremental marking via task; GC in old space requested
Can I conclude that the GC was only partial based on the fact that it prints "GC in old space requested"? Does this mean old space wasn't garbage collected?
I also can't make sense of the numbers 26.7 (29.2) -> 18.8 (28.2) MB. It looks like memory was reduced from 29.2 MB to 18.8 MB, but then I don't understand why that is followed by (28.2).
I have an Autopilot cluster, which should increase CPU/memory on demand, but I am still getting the following error.
I have not defined any limits/resources in the deployment. I am relying on Google to handle that automatically.
It's a Node application that reads large CSV files (300-400 MB), parses them, and inserts the rows into a MySQL DB (using TypeORM).
It seems it works with smaller files. The files are read one by one.
In this case there are over 1,200 files (not all of them are 300-400 MB, but quite a few are).
It seems it does not work the way I thought it would...
Is this a sign that there is something wrong with the JS code, or do I just need to increase the memory manually?
<--- Last few GCs --->
[1:0x7f9fde9bf330] 4338991 ms: Scavenge (reduce) 955.5 (1037.2) -> 955.5 (1037.2) MB, 7.1 / 0.0 ms (average mu = 0.295, current mu = 0.279) allocation failure
[1:0x7f9fde9bf330] 4339001 ms: Scavenge (reduce) 956.9 (1037.7) -> 956.9 (1038.0) MB, 7.6 / 0.0 ms (average mu = 0.295, current mu = 0.279) allocation failure
[1:0x7f9fde9bf330] 4339011 ms: Scavenge (reduce) 958.1 (1038.0) -> 958.1 (1038.5) MB, 7.7 / 0.0 ms (average mu = 0.295, current mu = 0.279) allocation failure
<--- JS stacktrace --->
FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
"I have not defined any limits/resources in the deployment. I am relying on Google to handle that automatically."
I don't think that's true. Copy-pasting from the docs:
"Autopilot relies on what you specify in your deployment configuration to provision resources. If you do not specify resource requests for any container in the Pod, Autopilot applies default values. These defaults are designed to give the containers in your Pods an average amount of resources, which are suitable for many smaller workloads."
And this: "Important: Google recommends that you explicitly set your resource requests for each container to meet your application requirements, as these default values might not be sufficient, or optimal."
So it's likely that the default resource requests and limits are too low for your application. You should set them to values high enough that you don't run out of memory.
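For illustration, explicit requests and limits on the container might look roughly like the fragment below; the container name, image, and sizes are placeholders you'd adjust to your workload. Note that Node also caps its own heap independently of the container limit, so raising --max-old-space-size may be needed as well.

# Fragment of the Deployment manifest, under spec.template.spec.containers
# (names, image, and sizes are placeholders)
containers:
  - name: csv-importer
    image: my-app:latest
    resources:
      requests:
        cpu: "1"
        memory: 4Gi
      limits:
        memory: 4Gi
    # let V8 use most of the requested memory
    command: ["node", "--max-old-space-size=3584", "index.js"]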
NOTE: I found the root cause in application code that uses Hazelcast: a job starts executing after 15 min and retrieves almost the entire data set, so the issue is NOT in Hazelcast. I'm leaving the question here in case anyone sees the same side effect or the same kind of code mistake.
What can cause heavy traffic between 2 Hazelcast nodes (v3.12.12, also tried 4.1.1)?
The cluster holds maps with a lot of data; no new entries are added or removed within that time, only map values are updated.
Java 11, memory usage 1.5 GB out of 12 GB, no full GCs identified.
According to JFR, the high IO comes from:
com.hazelcast.internal.networking.nio.NioThread.processTaskQueue()
Below is a chart of network IO: 15 min after start, traffic jumps from 15 to 60 MB. From the application's perspective nothing changed after these 15 min.
This smells like garbage collection; you are most likely running into long GC pauses. Check your GC logs, which you can enable using verbose GC settings on all members. If there are back-to-back GCs, then you should do various things:
increase the heap size
tune your GC; I'd look into G1 (with -XX:MaxGCPauseMillis set to a reasonable number) and/or ZGC (see the example command line below).
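For example, verbose GC logging and G1 with a pause-time goal can be enabled on each member roughly like this on Java 11; the heap size, pause goal, and jar name below are placeholders, not recommendations:

# unified GC logging plus G1 with a pause-time goal (all values are examples)
java -Xms8g -Xmx8g \
     -XX:+UseG1GC -XX:MaxGCPauseMillis=200 \
     -Xlog:gc*:file=gc.log:time,uptime,level,tags \
     -jar hazelcast-member.jar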
It seems that on my server the mark-sweep operation takes seconds, in a stop-the-world way:
Oct 17 08:26:27 s3 u[30843]: [30843:0x26671a0] 63025059 ms: Mark-sweep 2492.7 (3285.6) -> 2317.6 (2945.0) MB, 84.9 / 0.1 ms (+ 3223.4 ms in 3877 steps since start of marking, biggest step 731.7 ms, walltime since start of marking 3315 ms) finalize incremental marking via task GC in old space requested
Oct 17 08:26:27 s3 u[30843]: Execution blocked for 3273 ms
.
Oct 17 08:28:15 s3 u[30843]: [30843:0x26671a0] 63133051 ms: Mark-sweep 2499.8 (3298.4) -> 2313.4 (2947.1) MB, 160.2 / 0.1 ms (+ 3691.7 ms in 3679 steps since start of marking, biggest step 1073.4 ms, walltime since start of marking 3859 ms) finalize incremental marking via task GC in old space requested
Oct 17 08:28:15 s3 u[30843]: Execution blocked for 3791 ms
.
This behavior is similar to that described in https://github.com/nodejs/help/issues/947; it seems to be somewhat related to memory consumption and worsens over time.
The problem existed in Node 7, but was only barely noticeable. Now, with Node 8.12, it reaches a 5-second block within 24 h.
I suspected it might have something to do with my unorthodox way of storing data in big objects, and tried splitting one 2 GB object into smaller objects, cutting it down to 1 GB, but there was no obvious benefit. These objects are very simple, just big.
The questions are:
Is there a V8 option that would make mark-sweep mitigate this problem? Less often, smaller steps, skipping some optimizations, anything?
I could not find any decent documentation about the V8 options. Is there any? (A note on listing them follows below.)
How can I help mark-sweep do its task more efficiently?
Are there common cases to avoid where mark-sweep could struggle?
Any desperate hacks I could try?
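For reference, node can at least print V8's own flag listing, though most entries are undocumented; whether any of the GC-related flags actually helps here is exactly what I'm unsure about (server.js below is just a placeholder):

# dump every V8 flag the installed node binary understands
node --v8-options | less
# the two most commonly used knobs, for completeness
node --max_old_space_size=4096 --trace_gc server.js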
I'm building a Node.js app using Express 4 + Sequelize + a PostgreSQL database.
I'm using Node v8.11.3.
I wrote a script to load data into my database from a JSON file. I tested the script with a sample of ~30 entities to seed. It works perfectly.
In reality, I have around 100,000 entities to load from the complete JSON file. My script reads the JSON file and tries to populate the database asynchronously (i.e. 100,000 inserts at the same time).
The result, after some minutes, is:
<--- Last few GCs --->
[10488:0000018619050A20] 134711 ms: Mark-sweep 1391.6 (1599.7) -> 1391.6 (1599.7) MB, 1082.3 / 0.0 ms allocation failure GC in old space requested
[10488:0000018619050A20] 136039 ms: Mark-sweep 1391.6 (1599.7) -> 1391.5 (1543.7) MB, 1326.9 / 0.0 ms last resort GC in old space requested
[10488:0000018619050A20] 137351 ms: Mark-sweep 1391.5 (1543.7) -> 1391.5 (1520.2) MB, 1311.5 / 0.0 ms last resort GC in old space requested
<--- JS stacktrace --->
==== JS stack trace =========================================
Security context: 0000034170025879 <JSObject>
1: split(this=00000165BEC5DB99 <Very long string[1636]>)
2: attachExtraTrace [D:\Code\backend-lymo\node_modules\bluebird\js\release\debuggability.js:~775] [pc=0000021115C5728E](this=0000003CA90FF711 <CapturedTrace map = 0000033AD0FE9FB1>,error=000001D3EC5EFD59 <Error map = 00000275F61BA071>)
3: _attachExtraTrace(aka longStackTracesAttachExtraTrace) [D:\Code\backend-lymo\node_module...
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
1: node_module_register
2: v8::internal::FatalProcessOutOfMemory
3: v8::internal::FatalProcessOutOfMemory
4: v8::internal::Factory::NewFixedArray
5: v8::internal::HashTable<v8::internal::SeededNumberDictionary,v8::internal::SeededNumberDictionaryShape>::IsKey
6: v8::internal::HashTable<v8::internal::SeededNumberDictionary,v8::internal::SeededNumberDictionaryShape>::IsKey
7: v8::internal::StringTable::LookupString
8: v8::internal::StringTable::LookupString
9: v8::internal::RegExpImpl::Exec
10: v8::internal::interpreter::BytecodeArrayRandomIterator::UpdateOffsetFromIndex
11: 0000021115A043C1
In the end, some entities were created, but the process clearly crashed.
I understand that this error is due to memory.
My question is: why doesn't Node take the time to manage everything without overshooting memory? Is there a "queue" to limit such explosions?
I identified some workarounds :
Segment the seed into several JSON files
Use more memory via the --max_old_space_size=8192 option
Proceed sequentially (using sync calls)
but none of these solutions is satisfying to me. It makes me worried about the future of my app, which is supposed to handle sometimes-long operations in production.
What do you think about it ?
Node.js just does what you tell it. If you go into some big loop and start up a lot of database operations, then that's exactly what node.js attempts to do. If you start so many operations that you consume too many resources (memory, database resources, files, whatever), then you will run into trouble. Node.js does not manage that for you. It has to be your code that manages how many operations you keep in flight at the same time.
On the other hand, node.js is particularly good at having a bunch of asynchronous operations in flight at the same time and you will generally get better end-to-end performance if you do code it to have more than one operation going at a time. How many you want to have in flight at the same time depends entirely upon the specific code and exactly what the asynchronous operation is doing. If it's a database operation, then it will likely depend upon the database and how many simultaneous requests it does best with.
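As one illustration of the pattern (a minimal sketch only, not tied to your actual code), you could insert in fixed-size batches so that only a limited number of inserts is ever in flight; insertEntity below is a stand-in for whatever Sequelize call you are making:

// process `entities` in chunks so at most BATCH_SIZE inserts run at once
const BATCH_SIZE = 100

async function seed (entities, insertEntity) {
  for (let i = 0; i < entities.length; i += BATCH_SIZE) {
    const batch = entities.slice(i, i + BATCH_SIZE)
    // wait for the whole batch before starting the next one
    await Promise.all(batch.map(entity => insertEntity(entity)))
  }
}

A batch size in the low hundreds is usually a reasonable starting point, but the right number depends on your database and connection pool.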
Here are some references that give you ideas for ways to control how many operations are going at once, including some code examples:
Make several requests to an API that can only handle 20 request a minute
Promise.all consumes all my RAM
Javascript - how to control how many promises access network in parallel
Fire off 1,000,000 requests 100 at a time
Nodejs: Async request with a list of URL
Loop through an api get request with variable URL
Choose proper async method for batch processing for max requests/sec
If you showed your code, we could advise more specifically which technique might fit best for your situation.
Use async.eachOfLimit to run at most X operations at the same time:
var async = require("async");

var myBigArray = [];   // the entities to insert
var X = 10;            // at most 10 operations at the same time

async.eachOfLimit(myBigArray, X, function (element, index, callback) {
    // insert one element, then tell async it is done (or that it failed)
    MyCollection.insert(element, function (err) {
        return callback(err);
    });
}, function (err) {
    // all inserts finished, or one of them failed
    if (err) {
        // handle the error
    } else {
        // success
    }
});