Can't explain this Node clustering behavior - node.js

I'm learning about threads and how they interact with Node's native cluster module. I saw some behavior I can't explain that I'd like some help understanding.
My code:
process.env.UV_THREADPOOL_SIZE = 1;
const cluster = require('cluster');

if (cluster.isMaster) {
  cluster.fork();
} else {
  const crypto = require('crypto');
  const express = require('express');
  const app = express();

  app.get('/', (req, res) => {
    crypto.pbkdf2('a', 'b', 100000, 512, 'sha512', () => {
      res.send('Hi there');
    });
  });

  app.listen(3000);
}
I benchmarked this code with one request using Apache Bench.
ab -c 1 -n 1 localhost:3000/ yielded these connection times:
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 605 605 0.0 605 605
Waiting: 605 605 0.0 605 605
Total: 605 605 0.0 605 605
So far so good. I then ran ab -c 2 -n 2 localhost:3000/ (doubling the number of calls from the benchmark). I expected the total time to double since I limited the libuv thread pool to one thread per child process and I only started one child process. But nothing really changed. Here are those results.
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.1 0 0
Processing: 608 610 3.2 612 612
Waiting: 607 610 3.2 612 612
Total: 608 610 3.3 612 612
For extra info, when I further increase the number of calls with ab -c 3 -n 3 localhost:3000/, I start to see a slow down.
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.0 0 0
Processing: 599 814 352.5 922 1221
Waiting: 599 814 352.5 922 1221
Total: 599 815 352.5 922 1221
I'm running all this on a quad-core Mac using Node v14.13.1.
tl;dr: how did my benchmark not use up all my threads? I forked one child process with one thread in its libuv pool, so the single call in my benchmark should have been all it could handle without slowing down. And yet the second test (the one that doubled the number of calls) took the same amount of time as the benchmark.


Python : Memory consumption accumulating in a while loop

A confession first: I'm a noob programmer here doing occasional scripting. I've been trying to understand the memory consumption of this simple piece of code but can't figure it out. I tried searching the answered questions but couldn't find anything. I'm fetching some JSON data using a REST API, and the piece of code below ends up consuming a lot of RAM. I checked the Windows Task Manager and the memory consumption increases incrementally with each iteration of the loop. I'm overwriting the same variable for each API call, so I think the previous response variable should be overwritten.
while Flag == True:
urlpart= 'data/device/statistics/approutestatsstatistics?scrollId='+varScrollId
response = json.loads(obj1.get_request(urlpart))
lstDataList = lstDataList + response['data']
Flag = response['pageInfo']['hasMoreData']
varScrollId = response['pageInfo']['scrollId']
count += 1
print("Fetched {} records out of {}".format(len(lstDataList), recordCount))
print('Size of List is now {}'.format(str(sys.getsizeof(lstDataList))))
return lstDataList
I tried to profile memory usage using memory_profiler; here's what it shows:
92 119.348 MiB 0.000 MiB count = 0
93 806.938 MiB 0.000 MiB while Flag == True:
94 806.938 MiB 0.000 MiB urlpart= 'data/device/statistics/approutestatsstatistics?scrollId='+varScrollId
95 807.559 MiB 30.293 MiB response = json.loads(obj1.get_request(urlpart))
96 806.859 MiB 0.000 MiB print('Size of response within the loop is {}'.format(sys.getsizeof(response)))
97 806.938 MiB 1.070 MiB lstDataList = lstDataList + response['data']
98 806.938 MiB 0.000 MiB Flag = response['pageInfo']['hasMoreData']
99 806.938 MiB 0.000 MiB varScrollId = response['pageInfo']['scrollId']
100 806.938 MiB 0.000 MiB count += 1
101 806.938 MiB 0.000 MiB print("Fetched {} records out of {}".format(len(lstDataList), recordCount))
102 806.938 MiB 0.000 MiB print('Size of List is now {}'.format(str(sys.getsizeof(lstDataList))))
103 return lstDataList
obj1 is an object of Cisco's rest_api_lib class. Link to code here
In fact, the program ends up consuming ~1.6 GiB of RAM. The data I'm fetching has roughly 570K records. The API limits the records to 10K at a time, so the loop runs ~56 times. Line 95 of the code consumes ~30 MiB of RAM per the memory_profiler output. It's as if each iteration consumes 30 MiB, ending up with ~1.6 GiB, so in the same ballpark. I'm unable to figure out why the memory consumption keeps accumulating across the loop.
Thanks.
I would suspect it is the line lstDataList = lstDataList + response['data'].
This is accumulating response['data'] over time. Also, your indentation seems off; should it be:
while Flag == True:
    urlpart = 'data/device/statistics/approutestatsstatistics?scrollId=' + varScrollId
    response = json.loads(obj1.get_request(urlpart))
    lstDataList = lstDataList + response['data']
    Flag = response['pageInfo']['hasMoreData']
    varScrollId = response['pageInfo']['scrollId']
    count += 1
    print("Fetched {} records out of {}".format(len(lstDataList), recordCount))
    print('Size of List is now {}'.format(str(sys.getsizeof(lstDataList))))
return lstDataList
As far as I can tell, lstDataList will keep growing with each request, leading to the memory increase. Hope that helps, Happy Friday!
it's as if each iteration consumes 30M
That is exactly what is happening. You need to free memory that you don't need, for example once you have extracted the data from response. You can delete it like so:
del response
more on del
more on garbage collection

Node.js - spawn is cutting off the results

I'm creating a node program to return the output of the linux top command. It's working fine; the only issue is that the command name is cut off: instead of the full command name like /usr/local/libexec/netdata/plugins.d/apps.plugin 1 it returns /usr/local+
My code
const topparser = require("topparser")
const spawn = require('child_process').spawn

let proc = null
let startTime = 0

exports.start = function (pid_limit, callback) {
  startTime = new Date().getTime()
  proc = spawn('top', ['-c', '-b', '-d', '3'])
  console.log("started process, pid: " + proc.pid)

  let top_data = ""
  proc.stdout.on('data', function (data) {
    console.log('stdout: ' + data);
  })
  proc.on('close', function (code) {
    console.log('child process exited with code ' + code);
  });
}//start

exports.stop = function () {
  console.log("stopped process...")
  if (proc) { proc.kill('SIGINT') }// SIGHUP -linux ,SIGINT -windows
}//stop
The results
14861 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [kworker/1+
14864 root 20 0 0 0 0 S 0.0 0.0 0:00.02 [kworker/0+
15120 root 39 19 102488 3344 2656 S 0.0 0.1 0:00.09 /usr/bin/m+
16904 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [kworker/0+
19031 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [kworker/u+
21500 root 20 0 0 0 0 Z 0.0 0.0 0:00.00 [dsc] <def+
22571 root 20 0 0 0 0 S 0.0 0.0 0:00.00 [kworker/0+
Any way to fix it?
Best regards
From a top manpage:
In Batch mode, when used without an argument top will format output using the COLUMNS= and LINES=
environment variables, if set. Otherwise, width will be fixed at the maximum 512 columns. With an
argument, output width can be decreased or increased (up to 512) but the number of rows is
considered unlimited.
Add '-w', '512' to the arguments.
Since you work with node, you can query netdata running on localhost for this.
Example:
http://london.my-netdata.io/api/v1/data?chart=apps.cpu&after=-1&options=ms
For localhost netdata:
http://localhost:19999/api/v1/data?chart=apps.cpu&after=-1&options=ms
You can also get systemd services:
http://london.my-netdata.io/api/v1/data?chart=services.cpu&after=-1&options=ms
If you are not planning to update the screen per second, you can instruct netdata to return the average of a longer duration:
http://london.my-netdata.io/api/v1/data?chart=apps.cpu&after=-5&points=1&group=average&options=ms
The above returns the average of the last 5 seconds.
Finally, you can get the latest values of all the metrics netdata monitors with this:
http://london.my-netdata.io/api/v1/allmetrics?format=json
For completeness, netdata can export all the metrics in BASH format for shell scripts. Check this: https://github.com/firehol/netdata/wiki/receiving-netdata-metrics-from-shell-scripts

Nodejs delay/interrupt in for loop

I want to write a logger (please, no comments about why, or "use ...").
But I am confused by the nodejs (event?) loop/forEach.
As example:
for (var i = 0; i < 100; i++) {
    process.stdout.write(Date.now().toString() + "\n", "utf8");
}
output as: 1466021578453, 1466021578453, 1466021578469, 1466021578469
Questions: Where does the 16 ms delay come from, and how can I prevent it?
EDIT: Windows 7, x64; (Delay on Ubuntu 15, max 2ms)
sudo ltrace -o outlog node myTest.js
This is likely more than you want. The call that Date.now() translates into on my machine is clock_gettime. You want to look at the stuff between subsequent calls to clock_gettime. You're also writing to STDOUT; each time you do that there is overhead. You can run the whole process under ltrace to see what's happening and get a summary with -c.
For me, it runs in 3 ms when not running it under ltrace.
% time seconds usecs/call calls function
------ ----------- ----------- --------- --------------------
28.45 6.629315 209 31690 memcpy
26.69 6.219529 217 28544 memcmp
16.78 3.910686 217 17990 free
9.73 2.266705 214 10590 malloc
2.92 0.679971 220 3083 _Znam
2.86 0.666421 216 3082 _ZdaPv
2.55 0.593798 206 2880 _ZdlPv
2.16 0.502644 211 2378 _Znwm
1.09 0.255114 213 1196 strlen
0.69 0.161741 215 750 pthread_getspecific
0.67 0.155609 209 744 memmove
0.57 0.133857 212 631 _ZNSo6sentryC1ERSo
0.57 0.133344 226 589 pthread_mutex_lock
0.52 0.121342 206 589 pthread_mutex_unlock
0.46 0.106343 207 512 clock_gettime
0.40 0.093022 204 454 memset
0.39 0.089857 216 416 _ZNSt9basic_iosIcSt11char_traitsIcEE4initEPSt15basic_streambufIcS1_E
0.22 0.050741 195 259 strcmp
0.20 0.047454 228 208 _ZNSt8ios_baseC2Ev
0.20 0.047236 227 208 floor
0.19 0.044603 214 208 _ZNSt6localeC1Ev
0.19 0.044536 212 210 _ZNSs4_Rep10_M_destroyERKSaIcE
0.19 0.044200 212 208 _ZNSt8ios_baseD2Ev
I'm not sure why there are 31,690 memcpys and 28,544 memcmps in there. That seems a bit excessive, but perhaps that's just the JIT start-up cost. As for the runtime cost, you can see there are 512 calls to clock_gettime. No idea why there are that many calls either, but you can see 106 ms lost in clock_gettime. Good luck with it.
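An aside of my own (an assumption, not from the answer above): a ~16 ms step on Windows is consistent with the coarse OS timer resolution behind Date.now(). process.hrtime() is monotonic with nanosecond resolution, so sampling it in the same tight loop can show whether the gaps come from the clock or from the loop itself:

```javascript
// Sketch: sample a high-resolution clock in a tight loop and measure the
// largest gap between consecutive samples.
const times = [];
for (let i = 0; i < 100; i++) {
  const [sec, nsec] = process.hrtime();
  times.push(sec * 1e9 + nsec);
}

// Largest gap between consecutive samples, in milliseconds:
let maxGapMs = 0;
for (let i = 1; i < times.length; i++) {
  maxGapMs = Math.max(maxGapMs, (times[i] - times[i - 1]) / 1e6);
}
console.log('max gap between samples:', maxGapMs.toFixed(3), 'ms');
```

If the hrtime gaps stay tiny while Date.now() still jumps in 16 ms steps, the delay is the clock's resolution, not the loop.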

Running multiple processes doesn't scale

There are two C++ processes, one thread in each process. The thread handles network traffic (Diameter) from 32 incoming TCP connections, parses it and forwards split messages via 32 outgoing TCP connections. Let's call this C++ process a DiameterFE.
If only one DiameterFE process is running, it can handle 70 000 messages/sec.
If two DiameterFE processes are running, they can handle 35 000 messages/sec each, so the same 70 000 messages/sec in total.
Why don't they scale? What is the bottleneck?
Details:
There are 32 Clients (seagull) and 32 servers (seagull) for each Diameter Front End process, running on separate hosts.
A dedicated host is given to these two processes: 2 × E5-2670 @ 2.60 GHz CPUs × 8 cores/socket × 2 HW threads/core = 32 threads in total.
10 GBit/sec network.
Average Diameter message size is 700 bytes.
It looks like only the Cpu0 handles network traffic - 58.7%si. Do I have to explicitly configure different network queues to different CPUs?
The first process (PID=7615) takes 89.0 % CPU, it is running on Cpu0.
The second process (PID=59349) takes 70.8 % CPU, it is running on Cpu8.
On the other hand, Cpu0 is loaded at 95.2% = 9.7%us + 26.8%sy + 58.7%si,
whereas Cpu8 is loaded at only 70.3% = 14.8%us + 55.5%sy.
It looks like the Cpu0 is doing the work also for the second process. There is very high softirq and only on the Cpu0 = 58.7%. Why?
Here is the top output with key "1" pressed:
top - 15:31:55 up 3 days, 9:28, 5 users, load average: 0.08, 0.20, 0.47
Tasks: 973 total, 3 running, 970 sleeping, 0 stopped, 0 zombie
Cpu0 : 9.7%us, 26.8%sy, 0.0%ni, 4.8%id, 0.0%wa, 0.0%hi, 58.7%si, 0.0%st
...
Cpu8 : 14.8%us, 55.5%sy, 0.0%ni, 29.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
...
Cpu31 : 0.0%us, 0.0%sy, 0.0%ni,100.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 396762772k total, 5471576k used, 391291196k free, 354920k buffers
Swap: 1048568k total, 0k used, 1048568k free, 2164532k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7615 test1 20 0 18720 2120 1388 R 89.0 0.0 52:35.76 diameterfe
59349 test1 20 0 18712 2112 1388 R 70.8 0.0 121:02.37 diameterfe
610 root 20 0 36080 1364 1112 S 2.6 0.0 126:45.58 plymouthd
3064 root 20 0 10960 788 432 S 0.3 0.0 2:13.35 irqbalance
16891 root 20 0 15700 2076 1004 R 0.3 0.0 0:01.09 top
1 root 20 0 19364 1540 1232 S 0.0 0.0 0:05.20 init
...
The fix for this issue was to upgrade the kernel to 2.6.32-431.20.3.el6.x86_64.
After that, network interrupts and message queues were distributed among different CPUs.

nodejs response speed and nginx

I just started testing nodejs and wanted some help understanding the following behavior:
Example #1:
var http = require('http');

http.createServer(function (req, res) {
    res.writeHeader(200, {'Content-Type': 'text/plain'});
    res.end('foo');
}).listen(1001, '0.0.0.0');
Example #2:
var http = require('http');

http.createServer(function (req, res) {
    res.writeHeader(200, {'Content-Type': 'text/plain'});
    res.write('foo');
    res.end('bar');
}).listen(1001, '0.0.0.0');
When testing response time in Chrome:
example #1 - 6-10ms
example #2 - 200-220ms
But if I test both examples through nginx proxy_pass:
server {
    listen 1011;
    location / {
        proxy_pass http://127.0.0.1:1001;
    }
}
I get this:
example #1 - 4-8ms
example #2 - 4-8ms
I am not an expert on either nodejs or nginx; can someone explain this?
nodejs - v.0.8.1
nginx - v.1.2.2
update:
Thanks to Hippo, I ran tests with ab on my server with and without nginx
and got the opposite results.
I also added proxy_cache off to the nginx config:
server {
    listen 1011;
    location / {
        proxy_pass http://127.0.0.1:1001;
        proxy_cache off;
    }
}
example #1 direct:
ab -n 1000 -c 50 http://127.0.0.1:1001/
Server Software:
Server Hostname: 127.0.0.1
Server Port: 1001
Document Path: /
Document Length: 65 bytes
Concurrency Level: 50
Time taken for tests: 1.018 seconds
Complete requests: 1000
Failed requests: 0
Write errors: 0
Total transferred: 166000 bytes
HTML transferred: 65000 bytes
Requests per second: 981.96 [#/sec] (mean)
Time per request: 50.919 [ms] (mean)
Time per request: 1.018 [ms] (mean, across all concurrent requests)
Transfer rate: 159.18 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.6 0 3
Processing: 0 50 44.9 19 183
Waiting: 0 49 44.8 17 183
Total: 1 50 44.7 19 183
example #1 nginx:
ab -n 1000 -c 50 http://127.0.0.1:1011/
Server Software: nginx/1.2.2
Server Hostname: 127.0.0.1
Server Port: 1011
Document Path: /
Document Length: 65 bytes
Concurrency Level: 50
Time taken for tests: 1.609 seconds
Complete requests: 1000
Failed requests: 0
Write errors: 0
Total transferred: 187000 bytes
HTML transferred: 65000 bytes
Requests per second: 621.40 [#/sec] (mean)
Time per request: 80.463 [ms] (mean)
Time per request: 1.609 [ms] (mean, across all concurrent requests)
Transfer rate: 113.48 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.6 0 3
Processing: 2 77 44.9 96 288
Waiting: 2 77 44.8 96 288
Total: 3 78 44.7 96 288
example #2 direct:
ab -n 1000 -c 50 http://127.0.0.1:1001/
Server Software:
Server Hostname: 127.0.0.1
Server Port: 1001
Document Path: /
Document Length: 76 bytes
Concurrency Level: 50
Time taken for tests: 1.257 seconds
Complete requests: 1000
Failed requests: 0
Write errors: 0
Total transferred: 177000 bytes
HTML transferred: 76000 bytes
Requests per second: 795.47 [#/sec] (mean)
Time per request: 62.856 [ms] (mean)
Time per request: 1.257 [ms] (mean, across all concurrent requests)
Transfer rate: 137.50 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.3 0 2
Processing: 0 60 47.8 88 193
Waiting: 0 60 47.8 87 193
Total: 0 61 47.7 88 193
example #2 nginx:
ab -n 1000 -c 50 http://127.0.0.1:1011/
Server Software: nginx/1.2.2
Server Hostname: 127.0.0.1
Server Port: 1011
Document Path: /
Document Length: 76 bytes
Concurrency Level: 50
Time taken for tests: 1.754 seconds
Complete requests: 1000
Failed requests: 0
Write errors: 0
Total transferred: 198000 bytes
HTML transferred: 76000 bytes
Requests per second: 570.03 [#/sec] (mean)
Time per request: 87.715 [ms] (mean)
Time per request: 1.754 [ms] (mean, across all concurrent requests)
Transfer rate: 110.22 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.4 0 2
Processing: 1 87 42.1 98 222
Waiting: 1 86 42.3 98 222
Total: 1 87 42.0 98 222
Now the results look more logical, but there is still a strange delay when calling res.write().
I guess it was (it sure looks like) a stupid question, but I still get a huge difference in response time in the browser with this server configuration (CentOS 6) on this particular server (a VPS).
On my home computer (Ubuntu 12), with older versions, testing from localhost, everything works fine.
Peeking into http.js reveals that case #1 has special handling in nodejs itself, some kind of shortcut optimization, I guess:
var hot = this._headerSent === false &&
          typeof(data) === 'string' &&
          data.length > 0 &&
          this.output.length === 0 &&
          this.connection &&
          this.connection.writable &&
          this.connection._httpMessage === this;

if (hot) {
  // Hot path. They're doing
  //   res.writeHead();
  //   res.end(blah);
  // HACKY.
  if (this.chunkedEncoding) {
    var l = Buffer.byteLength(data, encoding).toString(16);
    ret = this.connection.write(this._header + l + CRLF +
                                data + '\r\n0\r\n' +
                                this._trailer + '\r\n', encoding);
  } else {
    ret = this.connection.write(this._header + data, encoding);
  }
  this._headerSent = true;
} else if (data) {
  // Normal body write.
  ret = this.write(data, encoding);
}

if (!hot) {
  if (this.chunkedEncoding) {
    ret = this._send('0\r\n' + this._trailer + '\r\n'); // Last chunk.
  } else {
    // Force a flush, HACK.
    ret = this._send('');
  }
}

this.finished = true;
I took your example files and used ab (Apache Bench) as a proper tool for benchmarking HTTP server performance:
Example 1:
Concurrency Level: 50
Time taken for tests: 0.221 seconds
Complete requests: 1000
Failed requests: 0
Write errors: 0
Total transferred: 104000 bytes
HTML transferred: 3000 bytes
Requests per second: 4525.50 [#/sec] (mean)
Time per request: 11.049 [ms] (mean)
Time per request: 0.221 [ms] (mean, across all concurrent requests)
Transfer rate: 459.62 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.7 0 4
Processing: 1 11 6.4 10 32
Waiting: 1 11 6.4 10 32
Total: 1 11 6.7 10 33
Example 2:
Concurrency Level: 50
Time taken for tests: 0.256 seconds
Complete requests: 1000
Failed requests: 0
Write errors: 0
Total transferred: 107000 bytes
HTML transferred: 6000 bytes
Requests per second: 3905.27 [#/sec] (mean)
Time per request: 12.803 [ms] (mean)
Time per request: 0.256 [ms] (mean, across all concurrent requests)
Transfer rate: 408.07 [Kbytes/sec] received
Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 0 0.6 0 4
Processing: 1 12 7.0 12 34
Waiting: 1 12 6.9 12 34
Total: 1 12 7.1 12 34
Note:
The second example is as fast as the first one. The small differences are probably caused by the additional function call in the code and by the fact that the document size is larger than with the first one.
