Node.js high CPU usage in syscall

I have a Node.js app with excessive CPU usage.
I used the --prof option to profile the cause.
The profiler output indicates:
JavaScript uses 20% of all ticks
C++ uses 67% of all ticks
GC uses 8% of all ticks
The [C++] section shows syscall at 39%
Diving into the C++ entry:
32.9% v8::internal::Runtime_SetProperty(int, v8::internal::Object**,v8::internal::Isolate*)
and 6.4% in handleApiCall.
I can't paste the whole log here, but how can I understand and identify the root cause of the CPU usage from it?
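A useful way to make such a log actionable is to compare it against a known pattern. The snippet below is an illustration only, not code from the app in question, and repro.js is just a placeholder file name: writing computed (dynamic) property keys to many objects in a hot loop is one pattern that tends to make v8::internal::Runtime_SetProperty dominate a --prof log, because V8 falls back to the generic runtime path instead of an optimised inline cache.

// Hypothetical reproduction (assumed pattern, not the question's code):
// storing values under computed keys on many fresh objects pushes V8 into
// the generic SetProperty runtime path rather than an inline cache.
'use strict';

const objects = [];
for (let i = 0; i < 1e6; i++) {
  const o = {};
  o['key_' + (i % 1000)] = i;   // computed key defeats hidden-class reuse
  objects.push(o);
}
console.log('created', objects.length, 'objects');

Profiling this with node --prof repro.js followed by node --prof-process isolate-*.log gives a [Bottom up (heavy) profile] section; comparing the callers listed under Runtime_SetProperty there with the equivalent section of your own log is usually the quickest way to map the C++ ticks back to a specific piece of JavaScript.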

Related

htop shows CPU usage per core over 100%?

I'm using htop to monitor the CPU usage of my task. However, the CPU% value sometimes exceeds 100%, which really confuses me.
Some blogs explain that this is because I'm using a multi-core machine (this is true). If there are 8 (logical) cores, the maximum value of CPU% would be 800%. CPU% over 100% means that my task is occupying more than one core.
But my question is: there is a column named CPU in the htop window that shows the ID of the core my task is running on. So how can the usage of this single core exceed 100%?
This is the screenshot. You can see the 84th core's usage is 375%!
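For what it's worth, the first part (a whole process showing more than 100% CPU) is easy to reproduce; the script below is an illustration only and has nothing to do with the task in the screenshot. A single Node.js process (12 or later) that spins several worker threads will show a CPU% in htop that is the sum of its busy threads, well above 100%.

// Illustration only: one process whose htop CPU% exceeds 100% because
// several worker threads are busy at the same time (requires Node.js 12+).
const { Worker, isMainThread } = require('worker_threads');

if (isMainThread) {
  for (let i = 0; i < 4; i++) {
    new Worker(__filename);   // each worker busy-loops on its own core
  }
} else {
  while (true) {}             // burn CPU inside this worker thread
}

That only explains the process-wide number; the per-core CPU column in the screenshot is what the question above is really about.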

Node.js profiling: is epoll_pwait affecting performance?

I am developing a Node.js application that receives MIDI input and sends MIDI output.
In order to measure and improve the performance of the application, I extracted a CPU usage profile while using it, following this guide.
This is an excerpt of the data obtained:
[Summary]:
ticks total nonlib name
495 1.7% 2.0% JavaScript
24379 85.3% 96.9% C++
50 0.2% 0.2% GC
3430 12.0% Shared libraries
272 1.0% Unaccounted
Now, the part that I find suspicious is this:
ticks parent name
24080 84.3% epoll_pwait
Apparently a big percentage of the ticks belongs to the same function.
According to this documentation:
Events are received from the event queue (e.g. kernel) via the event
provider (e.g. epoll_wait)
So, from my point of view, the event-loop thread uses that function to poll for events while idle. That would mean a high percentage of ticks in epoll_pwait indicates that the event-loop thread is rarely blocked, which would be good for performance.
Using the top command I can see that the CPU usage of the application is low (approx. 3%).
The question is: are the epoll_pwait calls affecting performance? If so, can I improve this somehow?
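One way to sanity-check the "mostly idle" interpretation from inside the application is to watch the event-loop delay directly. The sketch below uses the built-in perf_hooks.monitorEventLoopDelay() (available in Node.js 11.10 or later); the 5-second reporting interval is an arbitrary choice. If the delay stays in the low milliseconds while epoll_pwait dominates the ticks, the profile is just showing an idle loop waiting for MIDI events rather than a performance problem.

// Sketch: measure how long the event loop is blocked between turns.
// monitorEventLoopDelay() is part of the built-in perf_hooks module
// (Node.js >= 11.10); histogram values are reported in nanoseconds.
const { monitorEventLoopDelay } = require('perf_hooks');

const histogram = monitorEventLoopDelay({ resolution: 20 });
histogram.enable();

setInterval(() => {
  const ms = (ns) => (ns / 1e6).toFixed(2);
  console.log(`loop delay: mean=${ms(histogram.mean)}ms ` +
              `p99=${ms(histogram.percentile(99))}ms max=${ms(histogram.max)}ms`);
  histogram.reset();
}, 5000);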

Tracking memory leaks in Node.js - V8 profiler vs htop

Recently we discovered that our Node.js app most probably has a memory leak (memory consumption shown in htop keeps growing). We've managed to isolate a small amount of code into a separate script that still causes the leak and are now trying to hunt it down. However, we're having some trouble analysing and understanding the test results gathered by htop and this V8 profiler: http://github.com/c4milo/node-webkit-agent
Right after the script starts, htop shows the following memory consumption:
http://imageshack.us/a/img844/3151/onqk.png
Then the app runs for 5 minutes while I take heap snapshots every 30 seconds. After 5 minutes the results are as follows:
Heap snapshots sizes:
http://imageshack.us/a/img843/1046/3f7x.png
And results from htop after 5 mins:
http://imageshack.us/a/img33/5339/2nb.png
So if I'm reading this right, the V8 profiler shows that there is no serious memory leak, but htop shows that memory consumption grew from 12 MB to 56 MB! Can anyone tell where this difference comes from? And why, even at the beginning of the test, does htop show 12 MB versus the 4 MB shown by the profiler?
htop author here. You're reading the htop numbers right. I don't know about the V8 profiler, but on the issue of "12 MB vs 4 MB" at the start, it's most likely that V8 accounts only for your JS data, while htop accounts for the entire resident memory usage of the process, including C libraries used by V8 itself, etc.
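A quick way to see that distinction from inside the process, rather than by comparing two tools, is to log process.memoryUsage() periodically; the 30-second interval below simply mirrors the snapshot cadence in the question. On reasonably recent Node.js versions, rss is roughly what htop's RES column reports, while heapTotal and heapUsed are the V8-managed heap that heap snapshots look at; the gap between them is native memory (Buffers, addons, V8 and libuv internals).

// Sketch: compare V8's view of memory with the process-wide view htop sees.
setInterval(() => {
  const mb = (bytes) => (bytes / 1024 / 1024).toFixed(1) + ' MB';
  const m = process.memoryUsage();
  console.log(`rss=${mb(m.rss)} heapTotal=${mb(m.heapTotal)} ` +
              `heapUsed=${mb(m.heapUsed)} external=${mb(m.external)}`);
}, 30000);

A steadily growing rss with a flat heapUsed points at a native-side leak, which heap snapshots alone will not show.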

Profiler results are confusing

Here is a picture of the CPU monitoring provided by the VisualVM profiler. I'm confused because I can't understand what the percentages mean. As you can see, the number for CPU is 24.1, but 24.1 percent of what? Overall CPU time? The same question for GC, which shows 21.8. What is 100% in both cases? Please clarify this data.
CPU usage is the total CPU usage of your system. The GC activity shows how much of the CPU is spent by the GC threads.
In your example, it looks like the GC was executing and thus contributing to the majority of the total CPU usage in the system. After the GC finished, there was no CPU used.
You should check whether this observation is consistent with your GC logs; the logs will show you the activity around 19:50.

How to measure lock contention?

I'm reading http://lse.sourceforge.net/locking/dcache/dcache_lock.html, in which the spinlock time for each function is measured:
SPINLOCKS HOLD WAIT
UTIL CON MEAN( MAX ) MEAN( MAX )(% CPU) TOTAL NOWAIT SPIN RJECT NAME
5.3% 16.5% 0.6us(2787us) 5.0us(3094us)(0.89%) 15069563 83.5% 16.5% 0% dcache_lock
0.01% 10.9% 0.2us( 7.5us) 5.3us( 116us)(0.00%) 119448 89.1% 10.9% 0% d_alloc+0x128
0.04% 14.2% 0.3us( 42us) 6.3us( 925us)(0.02%) 233290 85.8% 14.2% 0% d_delete+0x10
0.00% 3.5% 0.2us( 3.1us) 5.6us( 41us)(0.00%) 5050 96.5% 3.5% 0% d_delete+0x94
I'd like to know where these statistics come from. I tried oprofile, but it seems oprofile cannot measure hold and wait times for a specific lock. And valgrind's drd slows applications down too much, which makes the results less accurate and also consumes too much time. mutrace looks good, but as the name suggests, I'm afraid it can only trace mutexes.
So is there any other tool, or how to use the tools I mentioned above, to get lock contention statistics?
Thanks for your reply.
I finally found the performance measuring tool used in the article; it requires patching the kernel.
The introduction page can be found at http://oss.sgi.com/projects/lockmeter/, and the latest kernel patch corresponds to kernel version 2.6.16, which you can download here.
One way to tell is to just get it running, pause it, and take a random stackshot of all the threads. Then do it again, several times. The fraction of stack samples that terminate in locking code is, roughly, the percentage of time you are after. It will also tell you which locations the locking is performed in. If you're after accuracy, take more samples. This works in any language or operating system.
