Node.js 100% CPU usage - epoll_wait

I'm trying to track down why my Node.js app suddenly uses 100% CPU. The app has around 50 concurrent connections and is running on an EC2 micro instance.
Below is the output of: strace -c node server.js
^C% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 87.32    0.924373           8    111657           epoll_wait
  6.85    0.072558           3     22762           pread
  2.55    0.026965           0    146179           write
  0.92    0.009733           0    108434         1 futex
  0.44    0.004661           0     82010         7 read
  0.44    0.004608           0    223317           clock_gettime
  0.31    0.003244           0    172467           gettimeofday
  0.31    0.003241          35        93           brk
  0.20    0.002075           0     75233         3 epoll_ctl
  0.19    0.002052           0     23850     11925 accept4
  0.19    0.001997           0     12302           close
  0.19    0.001973           7       295           mmap
  0.06    0.000617           4       143           munmap
And here is the output of: node-tick-processor
[Top down (heavy) profile]:
Note: callees occupying less than 0.1% are not shown.
inclusive self name
ticks total ticks total
669160 97.4% 669160 97.4% /lib/x86_64-linux-gnu/libc-2.15.so
4834 0.7% 28 0.0% LazyCompile: *Readable.push _stream_readable.js:116
4750 0.7% 10 0.0% LazyCompile: *emitReadable _stream_readable.js:392
4737 0.7% 19 0.0% LazyCompile: *emitReadable_ _stream_readable.js:407
1751 0.3% 7 0.0% LazyCompile: ~EventEmitter.emit events.js:53
1081 0.2% 2 0.0% LazyCompile: ~<anonymous> _stream_readable.js:741
1045 0.2% 1 0.0% LazyCompile: ~EventEmitter.emit events.js:53
960 0.1% 1 0.0% LazyCompile: *<anonymous> /home/ubuntu/node/node_modules/redis/index.js:101
948 0.1% 11 0.0% LazyCompile: RedisClient.on_data /home/ubuntu/node/node_modules/redis/index.js:541
This is my first time debugging a node app. Are there any conclusions that can be drawn from the above debug output? Where could the error be?
Edit
My node version: v0.10.25
Edit 2
After updating Node to v0.10.33, here is the output:
^C% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 91.81    1.894522           8    225505        45 epoll_wait
  3.58    0.073830           1     51193           pread
  1.59    0.032874           0    235054         2 write
  0.98    0.020144           0   1101789           clock_gettime
  0.71    0.014658           0    192494         1 futex
  0.57    0.011764           0    166704        21 read

This seems like a Node.js v0.10.25 bug with the event loop; look here.
Note, from this GitHub pull request:
If the same file description is open in two different processes, then
closing the file descriptor is not sufficient to deregister it from
the epoll instance (as described in epoll(7)), resulting in spurious
events that cause the event loop to spin repeatedly. So always
explicitly deregister it.
So as a solution, you can try updating your OS or updating Node.js.
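
For context, "explicitly deregister" at the epoll level means issuing EPOLL_CTL_DEL before close(). Below is a minimal C sketch of that pattern; it is my own illustration of the epoll(7)/epoll_ctl(2) API, not libuv's actual fix.

/* Illustrative only: explicitly remove an fd from an epoll instance before
 * closing it. If another process still holds the same open file description,
 * close() alone does not remove the registration, and stale events can keep
 * waking the event loop. */
#include <sys/epoll.h>
#include <unistd.h>

void close_watched_fd(int epfd, int fd) {
    /* Pre-2.6.9 kernels require a non-NULL (dummy) event pointer here;
     * on modern kernels NULL is fine. Errors such as ENOENT can be ignored. */
    epoll_ctl(epfd, EPOLL_CTL_DEL, fd, NULL);
    close(fd);
}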

Related

Top Command: How come CPU% in process is higher than in overall CPU Usage Percentage

top - 19:42:24 up 68 days, 19:49, 6 users, load average: 439.72, 540.53, 631.13
Tasks: 354 total, 3 running, 350 sleeping, 0 stopped, 1 zombie
Cpu(s): 21.5%us, 46.8%sy, 0.0%ni, 17.4%id, 0.0%wa, 0.1%hi, 14.2%si, 0.0%st
Mem: 65973304k total, 50278472k used, 15694832k free, 28749456k buffers
Swap: 19455996k total, 93436k used, 19362560k free, 14769728k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
4425 ladm 20 0 63.6g 211m 1020 S **425.7** 0.3 433898:26 zzz
28749 isdm 20 0 167g 679m 7928 S 223.7 1.1 2526:40 xxx
28682 iadm 20 0 167g 1.1g 7928 S 212.8 1.8 2509:08 ccc
28834 iladm 20 0 11.8g 377m 7968 S 136.3 0.6 850:25.78 vvv
7776 root 20 0 237m 139m 11m S 3.3 0.2 658:24.58 bbbb
45 root 20 0 0 0 0 R 1.1 0.0 1313:36 nnnn/10
1313 isom 20 0 103m 712 504 S 1.1 0.0 0:00.20 mmmm.sh
4240 ladm 20 0 338m 18m 576 S 1.1 0.0 558:21.33 memcached
32341 root 20 0 15172 1440 916 R 1.1 0.0 0:00.04 top
The machine in question is using 100% of the cores available.
In the situation presented, the PC or server has more than one core, so a single process can use more than one of them. That's why one process can show 425.7%: it is using more than four cores' worth of CPU time to do its job.
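
For illustration, here is my own minimal C sketch (unrelated to the processes in the listing above) that makes a single process use several cores. Run it and watch it in top; with four threads the one process shows roughly 400% CPU.

/* busy.c - make a single process use several cores, so that top reports it
 * at roughly N x 100% CPU (e.g. ~400% with 4 threads).
 * Build: gcc -O2 -pthread busy.c -o busy
 * Run:   ./busy 4    (watch it in top, then kill it) */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void *spin(void *arg) {
    (void)arg;
    volatile unsigned long counter = 0;  /* volatile: keep the loop from being optimized away */
    for (;;)
        counter++;
    return NULL;
}

int main(int argc, char **argv) {
    int nthreads = (argc > 1) ? atoi(argv[1]) : 4;

    for (int i = 0; i < nthreads; i++) {
        pthread_t tid;
        if (pthread_create(&tid, NULL, spin, NULL) != 0) {
            perror("pthread_create");
            return 1;
        }
    }
    pause();  /* the threads keep burning CPU until the process is killed */
    return 0;
}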

Issues with Scaling horizontally with Cassandra NoSQL

I am trying to configure and benchmark my AWS EC2 instances for a Cassandra cluster using the DataStax Community Edition. I'm working with one cluster so far, and I'm having issues with horizontal scaling.
I'm running the cassandra-stress tool to stress the nodes, and I'm not seeing horizontal scaling. The command is run from an EC2 instance on the same network as the nodes, but not on a node itself (i.e. I'm not using one of the nodes to launch the command).
I ran the following:
cassandra-stress write n=1000000 cl=one -mode native cql3 -schema keyspace="keyspace1" -pop seq=1..1000000 -node ip1,ip2
I started with 2 nodes, then 3, and then 6. But the numbers don't show what Cassandra is supposed to do: adding more nodes to a cluster should speed up reads/writes.
Results:
                            2 Nodes   3 Nodes   3 Nodes   6 Nodes   6 Nodes   6 Nodes   6 Nodes
                                 1M        1M        2M        1M        2M        6M       10M
op rate                        6858      6049      6804      7711      7257      7531      8081
partition rate                 6858      6049      6804      7711      7257      7531      8081
row rate                       6858      6049      6804      7711      7257      7531      8081
latency mean                   29.1        33      29.3      25.9      27.5      26.5      24.7
latency median                 24.9      32.1        24      22.6      23.1      21.8      21.5
latency 95th percentile        57.9      73.3        62        50      56.2      52.1      40.2
latency 99th percentile          76      92.2      77.4      65.3      69.1      61.8      46.4
latency 99.9th percentile        87     103.4      83.5      76.2      75.7      64.9      48.1
latency max                   561.1     587.1      1075     503.1     521.7    1662.3     590.3
total gc count                    0         0         0         0         0         0         0
total gc mb                       0         0         0         0         0         0         0
total gc time (s)                 0         0         0         0         0         0         0
avg gc time (ms)                NaN       NaN       NaN       NaN       NaN       NaN       NaN
stdev gc time (ms)                0         0         0         0         0         0         0
Total operation time        0:02:25   0:02:45   0:04:53   0:02:09   0:04:35   0:13:16   0:20:37
Each run used the default keyspace1 that was provided.
I've tested 3 nodes with 1M and 2M iterations, and 6 nodes with 1M, 2M, 6M, and 10M. As I increase the iterations, the op rate only increases marginally.
Am I doing something wrong, or do I have Cassandra backwards? Right now RF = 1, as I don't want to add latency for replication. I just want to see horizontal scaling in the long term, and I'm not seeing it.
Help?

CPU Higher than expected in Node running in docker

I have a Vagrant machine running at 33% CPU on my Mac (10.9.5) when nothing is supposed to be happening. The VM is managed by Kitematic. Looking inside one of the containers, I see two Node (v0.12.2) processes running at 3-4% CPU each.
root@49ab3ab54901:/usr/src# top -bc
top - 03:11:59 up 8:31, 0 users, load average: 0.13, 0.18, 0.22
Tasks: 7 total, 1 running, 6 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.2 us, 0.7 sy, 0.0 ni, 99.1 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem: 2051824 total, 1942836 used, 108988 free, 74572 buffers
KiB Swap: 1466848 total, 18924 used, 1447924 free. 326644 cached Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1 root 20 0 4332 672 656 S 0.0 0.0 0:00.10 /bin/sh -c node -e "require('./seed/seeder.js').seed().then(function (resp) { console.log('successfully seeded!'); pro+
15 root 20 0 737320 81008 13452 S 0.0 3.9 0:32.57 node /usr/local/bin/nodemon app/api.js
33 root 20 0 4332 740 652 S 0.0 0.0 0:00.00 sh -c node app/api.js
34 root 20 0 865080 68952 14244 S 0.0 3.4 0:01.70 node app/api.js
83 root 20 0 20272 3288 2776 S 0.0 0.2 0:00.11 bash
18563 root 20 0 20248 3152 2840 S 0.0 0.2 0:00.11 bash
18575 root 20 0 21808 2308 2040 R 0.0 0.1 0:00.00 top -bc
I went on and ran node --prof and processed the log with node-tick-processor. It looks like 99.3% of the CPU time is spent in syscall:
(for full output see http://pastebin.com/6qgFuFWK )
root@d6d78487e1ec:/usr/src# node-tick-processor isolate-0x26c0180-v8.log
...
Statistical profiling result from isolate-0x26c0180-v8.log, (130664 ticks, 0 unaccounted, 0 excluded).
...
[C++]:
ticks total nonlib name
129736 99.3% 99.3% syscall
160 0.1% 0.1% node::ContextifyScript::New(v8::FunctionCallbackInfo<v8::Value> const&)
124 0.1% 0.1% __write
73 0.1% 0.1% __xstat
18 0.0% 0.0% v8::internal::Heap::AllocateFixedArray(int, v8::internal::PretenureFlag)
18 0.0% 0.0% node::Stat(v8::FunctionCallbackInfo<v8::Value> const&)
17 0.0% 0.0% __lxstat
16 0.0% 0.0% node::Read(v8::FunctionCallbackInfo<v8::Value> const&)
...
1 0.0% 0.0% __fxstat
1 0.0% 0.0% _IO_default_xsputn
[GC]:
ticks total nonlib name
22 0.0%
[Bottom up (heavy) profile]:
Note: percentage shows a share of a particular caller in the total
amount of its parent calls.
Callers occupying less than 2.0% are not shown.
ticks parent name
129736 99.3% syscall
[Top down (heavy) profile]:
Note: callees occupying less than 0.1% are not shown.
inclusive self name
ticks total ticks total
129736 99.3% 129736 99.3% syscall
865 0.7% 0 0.0% Function: ~<anonymous> node.js:27:10
864 0.7% 0 0.0% LazyCompile: ~startup node.js:30:19
851 0.7% 0 0.0% LazyCompile: ~Module.runMain module.js:499:26
799 0.6% 0 0.0% LazyCompile: Module._load module.js:273:24
795 0.6% 0 0.0% LazyCompile: ~Module.load module.js:345:33
794 0.6% 0 0.0% LazyCompile: ~Module._extensions..js module.js:476:37
792 0.6% 0 0.0% LazyCompile: ~Module._compile module.js:378:37
791 0.6% 0 0.0% Function: ~<anonymous> /usr/src/app/api.js:1:11
791 0.6% 0 0.0% LazyCompile: ~require module.js:383:19
791 0.6% 0 0.0% LazyCompile: ~Module.require module.js:362:36
791 0.6% 0 0.0% LazyCompile: Module._load module.js:273:24
788 0.6% 0 0.0% LazyCompile: ~Module.load module.js:345:33
786 0.6% 0 0.0% LazyCompile: ~Module._extensions..js module.js:476:37
783 0.6% 0 0.0% LazyCompile: ~Module._compile module.js:378:37
644 0.5% 0 0.0% Function: ~<anonymous> /usr/src/app/api.authentication.js:1:11
627 0.5% 0 0.0%
...
strace showed nothing abnormal:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 54.51    0.001681          76        22           clone
 17.28    0.000533           4       132           epoll_ctl
 16.80    0.000518          24        22           wait4
  6.39    0.000197           2       110        66 stat
  5.03    0.000155           1       176           close
  0.00    0.000000           0       176           read
  0.00    0.000000           0        88           write
  0.00    0.000000           0        44           rt_sigaction
  0.00    0.000000           0        88           rt_sigprocmask
  0.00    0.000000           0        22           rt_sigreturn
  0.00    0.000000           0        66           ioctl
  0.00    0.000000           0        66           socketpair
  0.00    0.000000           0        88           epoll_wait
  0.00    0.000000           0        22           pipe2
------ ----------- ----------- --------- --------- ----------------
100.00    0.003084                  1122        66 total
And the other node process:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  0.00    0.000000           0        14           epoll_wait
------ ----------- ----------- --------- --------- ----------------
100.00    0.000000                    14           total
Am I missing something?
I wonder if it is VirtualBox's or Docker's layers consuming the 4%. When you have a few containers, each with 2 processes running at 4%, it adds up quickly.

Cryptic Node Tick Result -- CPU Taking 100%

So I've run into a slight snag with an application that I am writing. The application is a simple relay server using Socket.IO and Node. Everything works great, but when it comes under heavy load, the process spikes to 100% CPU and stays there.
I ran node-tick on it, and this is the result:
Statistical profiling result from v8.log, (83336 ticks, 0 unaccounted, 0 excluded).
[Shared libraries]:
ticks total nonlib name
83220 99.9% 0.0% /lib/x86_64-linux-gnu/libc-2.19.so
97 0.1% 0.0% /usr/bin/nodejs
9 0.0% 0.0% /lib/x86_64-linux-gnu/libpthread-2.19.so
[JavaScript]:
ticks total nonlib name
1 0.0% 10.0% Stub: FastNewClosureStub
1 0.0% 10.0% Stub: CallConstructStub_Recording
1 0.0% 10.0% LazyCompile: ~stringify native json.js:308
1 0.0% 10.0% LazyCompile: ~hash /opt/connect/node/node_modules/sticky-session/lib/sticky-session.js:4
1 0.0% 10.0% LazyCompile: ~exports.dirname path.js:415
1 0.0% 10.0% LazyCompile: ~buildFn /opt/connect/node/node_modules/socket.io-redis/node_modules/msgpack-js/node_modules/bops/read.js:5
1 0.0% 10.0% LazyCompile: ~addListener events.js:126
1 0.0% 10.0% LazyCompile: ~ToObject native runtime.js:567
1 0.0% 10.0% LazyCompile: *setTime native date.js:482
1 0.0% 10.0% LazyCompile: *DefineOneShotAccessor native messages.js:767
[C++]:
ticks total nonlib name
[GC]:
ticks total nonlib name
3 0.0%
[Bottom up (heavy) profile]:
Note: percentage shows a share of a particular caller in the total
amount of its parent calls.
Callers occupying less than 2.0% are not shown.
ticks parent name
83220 99.9% /lib/x86_64-linux-gnu/libc-2.19.so
[Top down (heavy) profile]:
Note: callees occupying less than 0.1% are not shown.
inclusive self name
ticks total ticks total
83192 99.8% 83192 99.8% /lib/x86_64-linux-gnu/libc-2.19.so
114 0.1% 0 0.0% Function: ~<anonymous> node.js:27
113 0.1% 0 0.0% LazyCompile: ~startup node.js:30
103 0.1% 0 0.0% LazyCompile: ~Module.runMain module.js:495
98 0.1% 0 0.0% LazyCompile: Module._load module.js:275
90 0.1% 0 0.0% LazyCompile: ~Module.load module.js:346
89 0.1% 0 0.0% LazyCompile: ~Module._extensions..js module.js:472
88 0.1% 0 0.0% LazyCompile: ~Module._compile module.js:374
88 0.1% 0 0.0% Function: ~<anonymous> /opt/connect/node/lib/connect.js:1
Only one library seems to be the problem, but I can't figure out what the heck it is doing. My hunch is the JSON processing I'm doing, but I can't be sure.
Has anyone run into this problem before?

Lowest latency notification method between processes under Linux

I'm looking for the lowest-latency IPC mechanism that lets one process sleep and another process wake it.
Some possible methods so far:
Writing a byte to a pipe and reading it from it.
Writing a byte to a socket and reading it from it.
Sending a signal (kill) and waiting for it (sigwait)
Using sem_post/sem_wait
Any other better ideas?
Any solution that is Linux specific is fine as well.
Generally... There is almost no difference between the OS methods.
Setup:
Two processes pinned (CPU affinity) to two different CPUs.
One process sleeps (nanosleep) for N microseconds, takes the current time, and then notifies the other process.
The other process wakes, takes the current time, and compares it to the first process's timestamp.
Average, standard deviation, median and 95th percentile are calculated over 1K samples, after a warm-up of 100 notifications (a rough sketch of the semaphore variant is given below).
OS: Linux 2.6.35 x86_64
CPU: Intel i5 M460
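
As a reference, here is a rough reconstruction in C of the semaphore variant of this harness. It is my own sketch built from the description above, not the exact code used for these numbers; the file name, sample count, anonymous shared mapping, and mean-only output are my choices.

/* ipc_sem_latency.c - sketch of the sem_post/sem_wait wake-up latency test.
 * Build: gcc -O2 ipc_sem_latency.c -o ipc_sem_latency -pthread -lrt */
#include <semaphore.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define SAMPLES  1000
#define SLEEP_NS (100 * 1000)        /* 100 us between notifications; vary as in the tables */

struct shared {
    sem_t sem;
    struct timespec sent;            /* timestamp taken just before sem_post() */
};

static int64_t diff_ns(struct timespec a, struct timespec b) {
    return (int64_t)(b.tv_sec - a.tv_sec) * 1000000000LL + (b.tv_nsec - a.tv_nsec);
}

int main(void) {
    /* Process-shared state lives in an anonymous shared mapping. */
    struct shared *sh = mmap(NULL, sizeof *sh, PROT_READ | PROT_WRITE,
                             MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    sem_init(&sh->sem, /*pshared=*/1, 0);

    if (fork() == 0) {               /* notifier: sleep, timestamp, post */
        struct timespec delay = { 0, SLEEP_NS };
        for (int i = 0; i < SAMPLES; i++) {
            nanosleep(&delay, NULL);
            clock_gettime(CLOCK_MONOTONIC, &sh->sent);
            sem_post(&sh->sem);
        }
        _exit(0);
    }

    /* Waiter: wake on each post and measure the wake-up latency. Pin each
     * process to its own core (taskset/sched_setaffinity) and add a warm-up
     * phase to reproduce the setup described above. */
    int64_t total = 0;
    for (int i = 0; i < SAMPLES; i++) {
        sem_wait(&sh->sem);
        struct timespec woke;
        clock_gettime(CLOCK_MONOTONIC, &woke);
        total += diff_ns(sh->sent, woke);
    }
    printf("mean wake-up latency: %.2f us\n", total / 1000.0 / SAMPLES);
    wait(NULL);
    return 0;
}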
Results:
Semaphore (sem_wait/sem_post - kernel - futex):
sleep us         mean  median     %95
       1   4.98 ±18.7    3.78    5.04
      10   4.14 ±14.8    3.54    4.00
     100  20.60 ±29.4   22.96   26.96
    1000  49.42 ±37.6   30.62   78.75
   10000  63.20 ±22.0   68.38   84.38

Signal (kill/sigwait)
sleep us         mean  median     %95
       1    4.69 ±3.8    4.21    5.39
      10   5.91 ±14.8    4.19    7.45
     100  23.90 ±17.7   23.41   35.90
    1000  47.38 ±28.0   35.27   81.16
   10000  60.80 ±19.9   68.50   82.36

Pipe (pipe + write/read)
sleep us         mean  median     %95
       1    3.75 ±5.9    3.46    4.45
      10    4.42 ±3.5    3.84    5.18
     100  23.32 ±25.6   24.17   38.05
    1000  51.17 ±35.3   46.34   74.75
   10000  64.69 ±31.0   67.95   86.80

Socket (socketpair + write/read)
sleep us         mean  median     %95
       1    6.07 ±3.2    5.55    6.78
      10    7.00 ±7.1    5.51    8.50
     100  27.57 ±14.1   28.39   50.86
    1000  56.75 ±25.7   50.82   88.74
   10000  73.89 ±16.8   77.54   88.46

As a reference busy waiting:
sleep us         mean  median     %95
       1    0.17 ±0.5    0.13    0.23
      10    0.15 ±0.3    0.13    0.19
     100    0.17 ±0.3    0.16    0.21
    1000    0.22 ±0.1    0.18    0.35
   10000    0.38 ±0.3    0.30    0.78
Using the same code provided by @Artyom, but on more modern hardware.
CPU: i9-9900K, with C/S/P-states disabled and the scaling policy set to performance, which keeps the cores running at their maximum frequency (~5 GHz).
OS: PREEMPT_RT-patched Linux with kernel 5.0.21, which provides better real-time behavior.
CPU affinity: the two processes run on two isolated cores, kept away from irrelevant processes and interrupts.
Results:
Semaphore (sem_wait/sem_post - kernel - futex):
sleep us        mean  minimum  median     %99
       1   1.75 ±0.1     1.60    1.74    1.82
      10   1.76 ±0.0     1.61    1.75    1.83
     100   2.12 ±0.3     1.59    2.24    2.42
    1000   2.46 ±0.3     1.75    2.47    2.56
   10000   2.45 ±0.1     2.11    2.44    2.54

Signal (kill/sigwait)
sleep us        mean  minimum  median     %99
       1   2.15 ±0.2     2.00    2.13    2.22
      10   2.12 ±0.2     1.93    2.11    2.19
     100   2.56 ±0.3     2.00    2.67    2.88
    1000   2.90 ±0.3     2.17    2.90    3.01
   10000   2.94 ±0.5     2.66    2.89    3.03

Pipe (pipe + write/read)
sleep us        mean  minimum  median     %99
       1   2.05 ±0.2     1.88    2.03    2.15
      10   2.06 ±0.3     1.89    2.04    2.17
     100   2.54 ±0.4     1.88    2.63    2.87
    1000   2.98 ±0.3     2.27    2.98    3.09
   10000   2.98 ±0.3     2.69    2.96    3.07

Socket (socketpair + write/read)
sleep us        mean  minimum  median     %99
       1   3.11 ±0.4     2.85    3.09    3.22
      10   3.14 ±0.1     2.92    3.14    3.25
     100   3.66 ±0.5     2.92    3.74    4.01
    1000   4.03 ±0.4     3.28    4.03    4.17
   10000   3.99 ±0.4     3.64    3.96    4.10

As a reference busy waiting:
sleep us        mean  minimum  median     %99
       1   0.07 ±0.1     0.06    0.07    0.07
      10   0.07 ±0.1     0.06    0.07    0.07
     100   0.07 ±0.0     0.06    0.07    0.08
    1000   0.09 ±0.1     0.07    0.08    0.09
   10000   0.09 ±0.1     0.07    0.09    0.09
