Is there a good way to check line-by-line memory allocation in Julia while using a Jupyter notebook? Via the @time macro I can see that my code is spending about 20% of its time in garbage collection, and I'd like to see if this can be reduced.
On a related note, are there general tips to use for reducing garbage-collection time?
I am doing memory profiling using YourKit, and to simplify matters for a Spark application I am running the app in DirectRunner mode. The machine I am testing on has 32 cores. The captured snapshot looks like:
The "direct-runner-worker" has 32 threads and it seems like I was under the false assumption that direct runner occupies just one thread. My question is - shouldn't there be a limit on the number of parallelization threads? In the snapshot a thread occupies between 250 and 350 MB and this will inevitably blow up.
Another question: I am not sure whether I should follow http://spark.apache.org/developer-tools.html#profiling in my case. The documentation seems to be for an application running on a Spark cluster, but since I am using DirectRunner (for debugging purposes), maybe what I am doing is good enough. Does anyone have experience with this?
Any pointers are appreciated! :)
PS: my mind is boggled by the creation of 215 million objects but that should go down with the thread count. However, ~6 million objects per thread seem like a lot.
Currently working on optimizing a library for speed. I've already reduced execution time drastically, using V8 CPU and Memory Profiling through Webstorm. This was achieved mainly by changing the core method from recursive to iterative.
Now the self time distribution breaks down as
I'm assuming the first entry "node" is timing internal functions calls, which is great. The other entries also make sense. I'm new to Nodejs profiling, but 31.6% for GC seems high, so I've decided to investigate.
I've now created a heap dump through Webstorm, but unfortunately that doesn't give me much information.
These seem to be system internal memory references mainly. Stepping through the core iteration code logic again, there also don't seem to be a lot of places where memory is explicitly allocated (using this as a reference).
Question
Can the GC overhead be reduced?
Is this amount of allocation just expected here?
Is it possible to get better memory profiling information?
Setup Instructions
In case someone wants to try debugging this, I'm including setup instructions.
Download or clone object-scan and run
yarn install --frozen-lockfile
yarn run test-simple --verbose
Now create a file test.js in the project root containing this content and run node --trace_gc test.js or run it through Webstorm for advanced profiling.
In JavaScript, and in V8 (Node.js) in particular, the time spent on garbage collection depends on the amount of data stored in the heap, but that is only one of many factors.
In V8 there are two main types of GC: minor (scavenge) and major (mark-sweep/mark-compact). With --trace-gc enabled, the console shows which GC types run during your tests. Depending on the workload, either type can take more time than the other, so before optimizing you should determine which one dominates.
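Besides reading the --trace-gc output, one way to see which GC type dominates is to aggregate GC events from inside the process. The sketch below assumes a Node.js version whose perf_hooks module reports 'gc' entries. summarizeGc and runWithGcStats are hypothetical helper names, and the 100 ms wait is only a heuristic for letting pending entries arrive.

```javascript
const { PerformanceObserver, constants } = require('perf_hooks');

// Human-readable labels for the GC kinds that perf_hooks reports.
const KIND_NAMES = {
  [constants.NODE_PERFORMANCE_GC_MINOR]: 'minor (scavenge)',
  [constants.NODE_PERFORMANCE_GC_MAJOR]: 'major (mark-sweep/mark-compact)',
  [constants.NODE_PERFORMANCE_GC_INCREMENTAL]: 'incremental marking',
  [constants.NODE_PERFORMANCE_GC_WEAKCB]: 'weak callbacks',
};

// Hypothetical helper: sums count and duration (ms) per GC kind.
function summarizeGc(entries) {
  const totals = {};
  for (const entry of entries) {
    // Newer Node versions expose the kind on entry.detail, older ones on the entry itself.
    const kind = entry.detail ? entry.detail.kind : entry.kind;
    const name = KIND_NAMES[kind] || `unknown (${kind})`;
    const bucket = totals[name] || (totals[name] = { count: 0, ms: 0 });
    bucket.count += 1;
    bucket.ms += entry.duration;
  }
  return totals;
}

// Hypothetical helper: runs a synchronous workload and resolves with the GC summary.
function runWithGcStats(workload) {
  const collected = [];
  const observer = new PerformanceObserver((list) => {
    collected.push(...list.getEntries());
  });
  observer.observe({ entryTypes: ['gc'] });
  workload();
  // GC entries are delivered asynchronously, so wait briefly before reading them.
  return new Promise((resolve) => {
    setTimeout(() => {
      observer.disconnect();
      resolve(summarizeGc(collected));
    }, 100);
  });
}
```

Calling runWithGcStats(() => yourTest()).then(console.log), with yourTest standing in for the code under test, prints one line per GC kind with its count and total duration.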
There are not many options for optimizing major GC, because its cost depends heavily on the amount of data that stays in memory for a "long" time (here, "long" means that the object survived a scavenge GC). Such data is stored in the heap's so-called "old space". Major GC works on this space: it has to traverse that memory, mark the objects that are still reachable, and then reclaim everything left unmarked.
In your case the test data you load ends up in old space, so it affects major GC for the whole test. Major GC will not free much, because your test object is still in use, but it still spends time scanning the entire old space. You could prevent V8 from doing that by launching node with GC-specific flags such as --nouse-idle-notification --expose-gc --gc_interval=100500 (where 100500 is a number of allocations; pick a value high enough that GC does not run before the whole test has finished), which lets you trigger garbage collection manually. Test your code this way to see how much major GC affects it, and repeat the tests with different amounts of input data. If the impact is quite high, try refactoring your code to minimize long-lived variables, closures, etc.
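The manual-GC experiment can be sketched as follows, assuming node is started with --expose-gc so that global.gc is available. timeFullGc, buildRetainedData and measureRetainedSizes are hypothetical names, and the generated objects are only a stand-in for the real test input.

```javascript
// Hypothetical helper: duration of one forced GC in milliseconds,
// or null when node was started without --expose-gc.
function timeFullGc() {
  if (typeof global.gc !== 'function') return null;
  const start = process.hrtime.bigint();
  global.gc();
  return Number(process.hrtime.bigint() - start) / 1e6;
}

// Stand-in for the test input: `count` small objects that stay referenced.
function buildRetainedData(count) {
  const data = [];
  for (let i = 0; i < count; i += 1) {
    data.push({ id: i, payload: [i, String(i)] });
  }
  return data;
}

// Times a forced GC while differently sized data sets are still alive.
function measureRetainedSizes(sizes) {
  return sizes.map((count) => {
    const retained = buildRetainedData(count);
    timeFullGc(); // warm-up collection, so the next one sees data that already survived a GC
    const ms = timeFullGc();
    return { count: retained.length, ms };
  });
}

console.log(measureRetainedSizes([1e4, 1e5]));
```

If the reported time grows noticeably with the amount of retained data, that is a hint that long-lived data is what makes major GC expensive in the test.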
If you find that major GC has little impact on performance, then scavenge GC is taking most of the time. Unlike major GC, it operates on the heap's so-called "new space", where all new objects are stored. Objects that survive a scavenge are moved to old space. New space is much smaller than old space (you can control its size with --max_semi_space_size; note: new space size = 2 * semi space size), and the more new objects and variables you allocate, the more scavenge GC runs will happen. If this GC hurts performance too much, consider refactoring your code to make fewer new allocations. But reusing variables can also slow things down: those objects will move to old space and may cause the problem described for major GC above.
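As an illustration of "fewer new allocations", here are two ways to collect leaf paths from a nested object. Both functions are hypothetical and are not taken from object-scan. The first copies the path array at every step, while the second mutates one shared stack.

```javascript
// Allocation-heavy: every step copies the current path into a new array.
function collectLeafPathsCopying(node, path = [], out = []) {
  if (node === null || typeof node !== 'object') {
    out.push(path.join('.'));
    return out;
  }
  for (const key of Object.keys(node)) {
    collectLeafPathsCopying(node[key], path.concat(key), out);
  }
  return out;
}

// Allocation-light: one shared path stack is mutated with push/pop.
function collectLeafPathsShared(root) {
  const out = [];
  const path = [];
  const visit = (node) => {
    if (node === null || typeof node !== 'object') {
      out.push(path.join('.'));
      return;
    }
    for (const key of Object.keys(node)) {
      path.push(key);
      visit(node[key]);
      path.pop();
    }
  };
  visit(root);
  return out;
}

const sample = { a: { b: 1, c: [2, 3] }, d: 4 };
console.log(collectLeafPathsCopying(sample));
console.log(collectLeafPathsShared(sample));
```

Both versions return the same paths. Whether the second is actually faster has to be measured, since the shared stack is exactly the kind of reused, longer-lived object that the caveat above warns about.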
Also, V8's GC does not always run on the same thread as your program. It does some of its work in the background, but I don't know what Webstorm shows in your case. If it just counts the total time spent in GC, then GC may not have that much impact.
You may find more details on v8 GC in this blog post.
TL;DR:
Can the GC overhead be reduced?
Yes, but first you should discover what should be optimized by following steps above.
Is this amount of allocation just expected here?
That can only be determined by comparing different approaches. There is no absolute number that separates a "good" amount from a "bad" one, because it depends on lots of factors, including the amount of input data.
Is it possible to get better memory profiling information?
You may find some good tools here, but in general you can use Chrome DevTools, which provides a bit more detail than Webstorm does.
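For a quick look at how the heap is split between new space and old space without any external tool, Node's built-in v8 module can be queried directly. This is a minimal sketch: heapSpaceUsage and writeSnapshot are hypothetical wrapper names, and the snapshot file written by v8.writeHeapSnapshot can be loaded in the Memory tab of Chrome DevTools.

```javascript
const v8 = require('v8');

// Hypothetical helper: used and reserved size (in MB) per V8 heap space.
function heapSpaceUsage() {
  const usage = {};
  for (const space of v8.getHeapSpaceStatistics()) {
    usage[space.space_name] = {
      usedMB: space.space_used_size / (1024 * 1024),
      sizeMB: space.space_size / (1024 * 1024),
    };
  }
  return usage;
}

// Hypothetical wrapper: writes a .heapsnapshot file and returns its name.
// Taking a snapshot blocks the process and can be slow for large heaps.
function writeSnapshot(filename) {
  return v8.writeHeapSnapshot(filename);
}

console.log(heapSpaceUsage());
```

Printing heapSpaceUsage() before and after the code under test shows roughly how much data ended up in old_space versus new_space.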
I'm happily running Pygmo2 to solve an 18-parameter problem using self-adaptive differential evolution.
Everything runs fine but at a high cost: Pygmo hugely overallocates memory, requesting about 170G while actually using about 10G.
I'm running on a shared cluster with a total of 500G, so I can't run multiple instances at the same time without affecting the server performance for other users. As it takes 2-3 hours to complete one run, this is somewhat limiting for exploratory analysis and objective function optimization.
I looked at the documentation, other SO questions, and git threads, but I have to say I didn't find much about memory usage.
So, my questions are:
Is this memory-greedy behaviour normal for problems with multiple parameters? Or is it due to how the objective function is coded? (I could post the code, but it is a 600-line piece describing a thermodynamic biochemical equilibrium, so I would rather not clog the post unless necessary.)
If this overallocation is normal, what function does it have?
Is there a way to limit the memory pygmo allocates?
Tips/tricks/experiences/suggestions?
A few details about the setup:
pygmo 2.8
18-parameter problem
archipelago with 4 islands
population of 40 parents (interesting statement about lack of performance increase exploding the number of parents regardless of the number of parameters here http://www1.icsi.berkeley.edu/~storn/code.html)
Thanks!
This is my first post on the Stack Overflow forum. We have recently been experiencing some Java OOME issues, and using jvisualvm, YourKit and Eclipse MAT we were able to identify and fix some of them...
One behavior observed during the analysis is that when we create a heap dump manually using jconsole or jvisualvm, the used heap size in the JVM drops dramatically (from 1.3 GB to 200 MB) after the heap dump is generated.
Can someone please advise on this behavior? It is a blessing in disguise, since whenever I see that the used heap size is >1.5GB, I perform a manual GC and the system is back to lower used heap size numbers, resulting in no JVM restarts.
let me know for any additional details
thanks
Guru
When you use JConsole to create the dump file, there are two parameters: the first is the name of the file to generate (complete path) and the second (true by default) indicates whether to perform a GC before taking the dump. Set it to false if you don't want a full GC before dumping.
This is an old question but I found it while asking a new question of my own, so I figured I'd answer it.
When you generate a heap dump, the JVM performs a System.gc() operation before it generates the heap dump, which is collecting non-referenced objects and effectively reducing your heap utilization. I am actually looking for a way to disable that System GC so I can inspect the garbage objects that are churning in my JVM.
I've set up a script in 3ds max to render a bunch of animations into frames. To do this, I open up a file with all of the materials, load an animation (as a bip) onto the figure, then render.
We were seeing a problem where eventually the script would fail because it was unable to open the next file-- max had consumed all of the system memory. Closing max, of course, freed the memory, and we were able to continue with the script.
I checked the heapfree variable, hoping to see a memory leak within my own (maxscript) code-- but the amount of free space was the same after every animation.
Then it must be 3ds max that is consuming all of that memory. Nothing in max needs to be saved from animation to animation-- is there some way to get max to free that memory? (I've tried resetMaxFile() and manually deleting all of the objects in the scene.) Are there any known sets of operations that cause max to grow out of control?
Have you tried to add this at the end of your loop:
gc()
It does a garbage collect and frees up some space.
However I suspect the bip part to be leaky.
The first line of questioning needs to be, do you have any locally-created plugins loaded? Could they be leaking memory?
I haven't worked with 3dsmax since version 5 but I don't remember any particular memory leaks that were problematic. However, I seem to recall (from others' experiences) that batch operations needed to restart MAX from time to time just to keep things sane. E.g. break up your batch job into smaller sets of work and call them sequentially. However, the stuff we were doing in MAX5 didn't need such kludges. YMMV of course. ;)
Autodesk has the Autodesk Developer Network, also; that's a great resource and not too much cash if your company is serious about its use of 3DS.