Jena tdbloader and tdbloader2 memory leak? - memory-leaks

I was trying to use tdbloader2 to load a Freebase dump, but I got an exception: "java.lang.OutOfMemoryError: Java heap space".
I increased JVM_ARGS to -Xmx60G and still got the same exception (my machine has 64G).
I switched to tdbloader and used top to monitor memory consumption; memory usage increased dramatically to 15G in less than half an hour.
======
More info:
The dump is in RDF format; I split it into multiple .ttl files, each about 700M and about 90G in total.
I used
tdbloader2 --loc kg x*.ttl
I modified the tdbloader2 script, changing the line
JVM_ARGS=${JVM_ARGS:--Xmx1024M}
to be
JVM_ARGS=${JVM_ARGS:--Xmx60G}
I don't have the exact error output now. But I remember it failed when it tried to create a new HashMap.
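Since the script reads JVM_ARGS with ${JVM_ARGS:--Xmx1024M}, the default is only applied when the variable is unset, so the heap can also be raised from the environment without editing the script. A minimal sketch (the 60G value and file names simply mirror the question):

    # one-off: give the loader a bigger heap for this invocation only
    JVM_ARGS="-Xmx60G" tdbloader2 --loc kg x*.ttl

    # or export it for the whole shell session
    export JVM_ARGS="-Xmx60G"
    tdbloader2 --loc kg x*.ttl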

Related

Nodejs process consumes more memory than actual memory dump size

I've just noticed that a pm2 process consumes a ton of memory, but when I took a heap snapshot to investigate, the snapshot was about 20x smaller.
This is what was inspected:
Then I read some articles covering how to debug heap snapshots, and all of them were in vitro experiments. I doubt anyone codes like this. All of the heap snapshots I took looked healthy, just like any other process with low memory consumption. Does Node.js produce something like a runtime cache or function calculation results in the form of a weak map that is detached from the heap snapshot?
Is there any way to restrict Node.js memory usage?
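For the last question, one common approach (a sketch, not specific to this app; app.js and the limits are illustrative) is to cap V8's old-generation size with --max-old-space-size and, since pm2 is already in use, let pm2 restart the process when it crosses a memory ceiling:

    # limit V8's old space to roughly 2 GB for a plain node process
    node --max-old-space-size=2048 app.js

    # with pm2: pass the flag through to node and restart on a memory ceiling
    pm2 start app.js --node-args="--max-old-space-size=2048" --max-memory-restart 1500M

Note that --max-old-space-size only bounds the V8 heap; memory held outside the heap (Buffers, native allocations) is not covered by it, which is also why a heap snapshot can look much smaller than the process RSS.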

Filebeat - Failed to publish events caused by: read tcp x.x.x.x:36196->x.x.x.x:5045: i/o timeout

Hi, I'm running into a problem while sending logs via Filebeat to Logstash.
In short: I can't see logs in Kibana, and when tailing the Filebeat log I see a lot of these:
ERROR logstash/async.go:235 Failed to publish events caused by: read tcp x.x.x.x:36246->y.y.y.y:5045: i/o timeout (where y.y.y.y is the Logstash address and 5045 is the open Beats port)
More details:
I have ~60 machines with filebeat 6.1.1 installed and one logstash machine with logstash 6.2.3 installed.
Some Filebeats successfully send their logs while others throw the error mentioned above.
The non-erroring Filebeats are sending old logs - in the Logstash debug logs I can see that some log timestamps are 2 or 3 days old.
Logstash memory usage is 35% and CPU usage is near 75% at peaks.
In the netstat -tupn output on the Filebeat machines I can see the established connections from Filebeat to Logstash.
Can someone help me find the problem?
It looks like a Logstash performance issue. CPU usage is probably too high, and memory could be higher. Increase the minimum (Xms) and maximum (Xmx) heap allocation size to the total amount of memory in the host minus 1 GB (leave 1 GB for the OS), and set them equal (Xms = Xmx).
You can also run another Logstash instance, balance the Filebeat output across the two, and see what happens.
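A sketch of both suggestions (host names, port, and heap size are illustrative; the heap is set in Logstash's config/jvm.options file):

    # config/jvm.options on the Logstash machine: pin min and max heap to the same value
    -Xms7g
    -Xmx7g

    # filebeat.yml: list both Logstash instances and balance between them
    output.logstash:
      hosts: ["logstash-a:5045", "logstash-b:5045"]
      loadbalance: true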
More things to consider:
Performance Checklist
Check the performance of input sources and output destinations:
Logstash is only as fast as the services it connects to. Logstash can only consume and produce data as fast as its input and output destinations can!
Check system statistics:
CPU
Note whether the CPU is being heavily used. On Linux/Unix, you can run top -H to see process statistics broken out by thread, as well as total CPU statistics.
If CPU usage is high, skip forward to the section about checking the JVM heap and then read the section about tuning Logstash worker settings.
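For example, to get a per-thread view of just the Logstash JVM (the pgrep pattern is an assumption about how the process shows up in the process list):

    # per-thread CPU statistics for the Logstash process
    top -H -p "$(pgrep -f logstash | head -n 1)"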
Memory
Be aware of the fact that Logstash runs on the Java VM. This means that Logstash will always use the maximum amount of memory you allocate to it.
Look for other applications that use large amounts of memory and may be causing Logstash to swap to disk. This can happen if the total memory used by applications exceeds physical memory.
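On Linux, a quick check for memory pressure and active swapping might look like this:

    free -h        # total vs. used memory and swap
    vmstat 1 5     # non-zero si/so columns mean the system is actively swapping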
I/O Utilization
Monitor disk I/O to check for disk saturation.
Disk saturation can happen if you’re using Logstash plugins (such as the file output) that may saturate your storage.
Disk saturation can also happen if you’re encountering a lot of errors that force Logstash to generate large error logs.
On Linux, you can use iostat, dstat, or something similar to monitor disk I/O.
Monitor network I/O for network saturation.
Network saturation can happen if you’re using inputs/outputs that perform a lot of network operations.
On Linux, you can use a tool like dstat or iftop to monitor your network.
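For example (the sampling interval and interface name are illustrative):

    iostat -x 5    # extended per-device statistics; a consistently high %util points to disk saturation
    dstat -dn 5    # combined disk and network throughput
    iftop -i eth0  # live per-connection network usage on interface eth0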
Check the JVM heap:
Often times CPU utilization can go through the roof if the heap size is too low, resulting in the JVM constantly garbage collecting.
A quick way to check for this issue is to double the heap size and see if performance improves. Do not increase the heap size past the amount of physical memory. Leave at least 1GB free for the OS and other processes.
You can make more accurate measurements of the JVM heap by using either the jmap command-line utility distributed with Java or by using VisualVM. For more info, see Profiling the Heap.
Always make sure to set the minimum (Xms) and maximum (Xmx) heap allocation size to the same value to prevent the heap from resizing at runtime, which is a very costly process.
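A few command-line ways to take those measurements (pid is a placeholder; jmap -heap applies to older JDKs):

    jmap -heap <pid>           # heap configuration and current usage (JDK 8 and earlier)
    jmap -histo:live <pid>     # histogram of live objects; note that :live forces a full GC first
    jstat -gcutil <pid> 1000   # GC and heap utilization sampled every second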
Tune Logstash worker settings:
Begin by scaling up the number of pipeline workers by using the -w flag. This will increase the number of threads available for filters and outputs. It is safe to scale this up to a multiple of CPU cores, if need be, as the threads can become idle on I/O.
You may also tune the output batch size. For many outputs, such as the Elasticsearch output, this setting corresponds to the size of the I/O operations sent to the destination.
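On the command line these correspond to the -w and -b flags (values and the config path are illustrative; the same settings exist as pipeline.workers and pipeline.batch.size in logstash.yml):

    # 8 pipeline workers, 250 events per batch
    bin/logstash -w 8 -b 250 -f /etc/logstash/conf.d/pipeline.conf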
More info in the Logstash performance troubleshooting documentation.

node.js RSS memory grows over time despite fairly consistent heap sizes

I've got a node.js application where the RSS memory usage seems to keep growing despite the heapUsed/heapTotal staying relatively constant.
Here's a graph of the three memory measurements taken over a week (from process.memoryUsage()):
You may note that there's a somewhat cyclical pattern - this corresponds with the application's activity throughout each day.
There actually does seem to be a slight growth in the heap, although it's nowhere near that of the RSS growth. So I've been taking heap dumps every now and then (using node-heapdump), and using Chrome's heap compare feature to find leaks.
One such comparison might look like the following (sorted by size delta in descending order):
What actually shows up does depend on when the snapshot was taken (eg sometimes more Buffer objects are allocated etc) - here I've tried to take a sample which demonstrates the issue best.
The first thing to note is that the sizes on the left side (203MB vs 345MB) are much higher than the heap sizes shown in the graph. Secondly, the size deltas clearly don't add up to the 142MB difference. In fact, sorting by size delta in ascending order, many objects have been deallocated, which means that the heap should be smaller!
Does anyone have any idea on:
why is this the case? (RSS constantly growing with stable heap size)
how can I stop this from happening, short of restarting the server every now and then?
Other details:
Node version: 0.10.28
OS: Ubuntu 12.04, 64-bit
Update: list of modules being used:
async v0.2.6
log4js v0.6.2
mysql v2.0.0-alpha7
nodemailer v0.4.4
node-time v0.9.2 (for timezone info, not to be confused with nodetime)
sockjs v0.3.8
underscore v1.4.4
usage v0.3.9 (for CPU stats, not used for memory usage)
webkit-devtools-agent v0.2.3 (loaded but not activated)
heapdump v0.2.0 is loaded when a dump is made.
Thanks for reading.
The difference you see between RSS usage and heap usage is Buffers.
"A Buffer is similar to an array of integers but corresponds to a raw memory allocation outside the V8 heap"
https://nodejs.org/api/buffer.html#buffer_buffer
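A quick way to see the effect on a modern Node.js (on the 0.10 line from the question, new Buffer(size) would be used instead of Buffer.alloc): allocating Buffers raises rss while heapUsed barely moves.

    # allocate ~100 MB of Buffers, then compare rss with heapUsed
    node -e 'const bufs = []; for (let i = 0; i < 100; i++) bufs.push(Buffer.alloc(1024 * 1024)); const m = process.memoryUsage(); console.log("rss:", m.rss, "heapUsed:", m.heapUsed);'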

Virtual Memory Statistics per process

I am working on a very weird memory leak issue, which has resulted in the following problem.
I have a process running on my system whose virtual memory size increases after a certain operation is performed. To confirm that the issue is not a memory leak, I want to get statistics for the number of free and used pages held by the process while it is running.
I am aware of the vmstat command, which gives these statistics for the entire system, but for my confirmation I need a per-process equivalent of vmstat.
Does anyone have an idea how this can be done?
The /proc/PID/smaps file will give you exhaustive information on all regions of virtual memory held by the given process.
If you're coding in C/C++, a dynamic analysis tool like Valgrind could be useful: http://valgrind.org/
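For example (PID is a placeholder):

    # resident memory summed over all mappings of the process, in kB
    awk '/^Rss:/ { sum += $2 } END { print sum " kB" }' /proc/<PID>/smaps

    # quick per-process summary fields
    grep -E '^Vm(Size|RSS|Swap)' /proc/<PID>/status

    # per-mapping breakdown including RSS and dirty pages
    pmap -x <PID>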

generate heap dump reduces dramatically after performing manual GC

This is my first post on the Stack Overflow forum. We have recently been experiencing some Java OOME issues, and using jvisualvm, YourKit, and Eclipse MAT we have been able to identify and fix some of them...
One behavior observed during analysis is that when we create a heap dump manually using JConsole or jvisualvm, the used heap size in the JVM drops dramatically (from 1.3 GB to 200 MB) after generating the heap dump.
Can someone please advise on this behavior? It is a boon in disguise, since whenever I see the used heap size is >1.5 GB, I perform a manual GC and the system goes back to lower used heap size numbers, resulting in no JVM restarts.
Let me know if any additional details are needed.
Thanks,
Guru
When you use JConsole to create the dump file, there are two parameters: the first is the file name to generate (complete path), and the second (true by default) indicates whether you want to perform a GC before taking the dump. Set it to false if you don't want a full GC before dumping.
This is an old question but I found it while asking a new question of my own, so I figured I'd answer it.
When you generate a heap dump, the JVM performs a System.gc() operation before it generates the dump, which collects unreferenced objects and effectively reduces your heap utilization. I am actually looking for a way to disable that system GC so I can inspect the garbage objects that are churning in my JVM.
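For reference, the standard JDK tools let you choose whether the dump is restricted to live objects; leaving that option off keeps unreachable (garbage) objects in the dump (pid and file names are placeholders):

    jmap -dump:live,format=b,file=live.hprof <pid>   # forces a full GC, dumps only reachable objects
    jmap -dump:format=b,file=all.hprof <pid>         # without :live, unreachable objects stay in the dump

    # on recent JDKs, jcmd offers the same choice
    jcmd <pid> GC.heap_dump /tmp/live.hprof          # live objects only (the default)
    jcmd <pid> GC.heap_dump -all /tmp/all.hprof      # include unreachable objects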
