Node v0.10.4
Here's the simple loop that results in ever-increasing memory usage:
function redx(){
    setTimeout(function(){ redx() }, 1000);
    console.log('loop');
}
redx();
What am I doing wrong?
EDIT
OK, I just tried the suggestion to reference the timeout object in the function's scope, and it seems that garbage collection does kick in after about 40 seconds. Here are abbreviated logs from top:
3941 root 20 0 32944 7284 4084 S 4.587 3.406 0:01.32 node
3941 root 20 0 32944 7460 4084 S 2.948 3.489 0:01.59 node
3941 root 20 0 32944 7516 4084 S 2.948 3.515 0:01.68 node
3941 root 20 0 33968 8400 4112 S 2.948 3.928 0:02.15 node
3941 root 20 0 33968 8920 4112 S 3.275 4.171 0:02.98 node
3941 root 20 0 33968 8964 4112 S 2.948 4.192 0:03.07 node
3941 root 20 0 33968 9212 4112 S 2.953 4.308 0:03.16 node
3941 root 20 0 33968 9212 4112 S 2.953 4.308 0:03.25 node
3941 root 20 0 33968 9212 4112 S 3.276 4.308 0:03.35 node
3941 root 20 0 33968 9212 4112 S 2.950 4.308 0:03.44 node
No idea why, but apparently if you reference the timeout object in the scope of the function, Node.js will garbage collect it correctly.
function redx(){
    var t = setTimeout(function(){ redx() }, 50);
    console.log('hi');
}
redx();
Actually, I think it might be just the way the V8 garbage collector works.
On my system, the node heap tends to increase up to 48 MB and then stabilize, so I think that if you keep your program running long enough, memory consumption will eventually level off.
You can get information about when and how the GC kicks in by launching node with one of the V8 command-line options: the --trace_gc flag.
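For example, assuming the loop above is saved in a file called redx.js (the file name here is just for illustration):
node --trace_gc redx.js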
In your first tries with Redis, you were systematically connecting to and disconnecting from Redis on each call. This tends to generate garbage. You are supposed to open a connection once and use it many times. Nevertheless, even when I do this, memory consumption tends to stabilize. Here is the evolution of memory consumption on this example with Redis:
// something close to your initial function (when Redis was still in the picture)
function redx(){
    var client = redis.createClient();
    client.get("tally", function(err, reply) {
        client.quit();
    });
    setTimeout(function(){ redx() }, 50);
}
Here, the stabilization around 60 MB is quite obvious.
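For comparison, a version that opens the Redis connection once and reuses it on every call might look like the sketch below (it keeps the same node_redis calls used above; the top-level client variable is my own addition):
var redis = require("redis");
var client = redis.createClient(); // opened once, reused by every call

function redx(){
    client.get("tally", function(err, reply) {
        // no client.quit() here: the single connection stays open
    });
    setTimeout(function(){ redx() }, 50);
}
redx();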
Related
I have run out of disk space on my cluster machine. Doing
du -h --max-depth=20 | sort -hr > size_of_folders.txt
reveals that there is a directory called ./miniconda3/pkgs that is taking up 22 GB.
Looking at the subfolders in this directory, I don't know what most of them are. For example, here are the top 50 entries:
804M ./qt-main-5.15.4-ha5833f6_2
769M ./qt-5.12.9-h1304e3e_6
629M ./qt-5.9.7-h5867ecd_1
629M ./cache
619M ./qt-5.9.7-h0c104cb_3
482M ./mkl-2020.2-256
481M ./mkl-2020.2-256/lib
360M ./qt-main-5.15.4-ha5833f6_2/include
359M ./qt-main-5.15.4-ha5833f6_2/include/qt
343M ./qt-5.12.9-h1304e3e_6/include/qt
343M ./qt-5.12.9-h1304e3e_6/include
321M ./qt-5.9.7-h5867ecd_1/include
321M ./qt-5.9.7-h0c104cb_3/include
320M ./qt-5.9.7-h5867ecd_1/include/qt
320M ./qt-5.9.7-h0c104cb_3/include/qt
307M ./libdb-6.2.32-h9c3ff4c_0
299M ./libdb-6.2.32-h9c3ff4c_0/docs
244M ./scipy-1.7.3-py310hea5193d_0
243M ./scipy-1.7.3-py310hea5193d_0/lib/python3.10
243M ./scipy-1.7.3-py310hea5193d_0/lib
242M ./scipy-1.7.3-py310hea5193d_0/lib/python3.10/site-packages/scipy
242M ./scipy-1.7.3-py310hea5193d_0/lib/python3.10/site-packages
221M ./pandas-1.3.4-py310hb5077e9_1
220M ./pandas-1.3.4-py310hb5077e9_1/lib/python3.10/site-packages
220M ./pandas-1.3.4-py310hb5077e9_1/lib/python3.10
220M ./pandas-1.3.4-py310hb5077e9_1/lib
219M ./python-3.8.3-cpython_he5300dc_0
219M ./pandas-1.3.4-py310hb5077e9_1/lib/python3.10/site-packages/pandas
199M ./python-3.8.3-cpython_he5300dc_0/lib
196M ./python-3.8.3-hcff3b4d_0
175M ./python-3.8.3-hcff3b4d_0/lib
172M ./pandas-1.4.1-py310hb5077e9_0
171M ./pandas-1.4.1-py310hb5077e9_0/lib/python3.10/site-packages
171M ./pandas-1.4.1-py310hb5077e9_0/lib/python3.10
171M ./pandas-1.4.1-py310hb5077e9_0/lib
170M ./pandas-1.4.3-py310h769672d_0
170M ./pandas-1.4.2-py310h769672d_2
170M ./pandas-1.4.1-py310hb5077e9_0/lib/python3.10/site-packages/pandas
169M ./pandas-1.4.3-py310h769672d_0/lib/python3.10/site-packages
169M ./pandas-1.4.3-py310h769672d_0/lib/python3.10
169M ./pandas-1.4.3-py310h769672d_0/lib
169M ./pandas-1.4.2-py310h769672d_2/lib/python3.10/site-packages
169M ./pandas-1.4.2-py310h769672d_2/lib/python3.10
169M ./pandas-1.4.2-py310h769672d_2/lib
168M ./perl-5.32.1-1_h7f98852_perl5
168M ./pandas-1.4.3-py310h769672d_0/lib/python3.10/site-packages/pandas
168M ./pandas-1.4.2-py310h769672d_2/lib/python3.10/site-packages/pandas
166M ./statsmodels-0.13.1-py310h96516ba_0
165M ./statsmodels-0.13.1-py310h96516ba_0/lib/python3.10/site-packages
165M ./statsmodels-0.13.1-py310h96516ba_0/lib/python3.10
Can someone explain whether these are safe to delete? The recognisable packages, such as pandas and scipy, have strange suffixes, and I'm not sure whether they can be removed.
I presume I can delete the cache but I'm not sure about the rest.
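For what it's worth, conda ships a cleanup command aimed at exactly this pkgs directory; something along the lines of the following should remove cached tarballs and extracted packages that no environment still references (the first invocation with --dry-run only previews what would be deleted, assuming your conda version supports that flag):
conda clean --dry-run --packages --tarballs
conda clean --packages --tarballs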
I'm running the following httperf command, wanting to hit the API 500 times at a rate of 500 requests per second.
./httperf --server <server IP> --port <server port> --uri <api uri> --num-conns 500 --rate 500 --ssl --add-header "<cookie values>"
The test is taking far longer than the expected one second to complete, more like 60 seconds, with a request/connection rate of around 8.2 req/s.
Output below:
Total: connections 500 requests 500 replies 500 test-duration 61.102 s
Connection rate: 8.2 conn/s (122.2 ms/conn, <=500 concurrent connections)
Connection time [ms]: min 60011.9 avg 60523.1 max 61029.1 median 60526.5 stddev 290.7
Connection time [ms]: connect 8.1
Connection length [replies/conn]: 1.000
Request rate: 8.2 req/s (122.2 ms/req)
Request size [B]: 3106.0
Reply rate [replies/s]: min 0.0 avg 0.0 max 0.0 stddev 0.0 (12 samples)
Reply time [ms]: response 4.3 transfer 60510.6
Reply size [B]: header 178.0 content 12910.0 footer 2.0 (total 13090.0)
Reply status: 1xx=0 2xx=500 3xx=0 4xx=0 5xx=0
CPU time [s]: user 28.27 system 32.59 (user 46.3% system 53.3% total 99.6%)
Net I/O: 129.4 KB/s (1.1*10^6 bps)
Errors: total 0 client-timo 0 socket-timo 0 connrefused 0 connreset 0
Errors: fd-unavail 0 addrunavail 0 ftab-full 0 other 0
So does this mean that the requests are really only getting sent at around 8 per second, or are they technically sent to the server, with the server queueing them up?
In my local cluster (4 Raspberry Pis) I am trying to configure an RGW gateway. Unfortunately, the service disappears automatically after 2 minutes.
[ceph_deploy.rgw][INFO ] The Ceph Object Gateway (RGW) is now running on host OSD1 and default port 7480
cephuser@admin:~/mycluster $ ceph -s
  cluster:
    id:     745d44c2-86dd-4b2f-9c9c-ab50160ea353
    health: HEALTH_WARN
            too few PGs per OSD (24 < min 30)

  services:
    mon: 1 daemons, quorum admin
    mgr: admin(active)
    osd: 4 osds: 4 up, 4 in
    rgw: 1 daemon active

  data:
    pools:   4 pools, 32 pgs
    objects: 80 objects, 1.09KiB
    usage:   4.01GiB used, 93.6GiB / 97.6GiB avail
    pgs:     32 active+clean

  io:
    client: 5.83KiB/s rd, 0B/s wr, 7op/s rd, 1op/s wr
After one minute the service (rgw: 1 daemon active) is no longer visible:
cephuser@admin:~/mycluster $ ceph -s
  cluster:
    id:     745d44c2-86dd-4b2f-9c9c-ab50160ea353
    health: HEALTH_WARN
            too few PGs per OSD (24 < min 30)

  services:
    mon: 1 daemons, quorum admin
    mgr: admin(active)
    osd: 4 osds: 4 up, 4 in

  data:
    pools:   4 pools, 32 pgs
    objects: 80 objects, 1.09KiB
    usage:   4.01GiB used, 93.6GiB / 97.6GiB avail
    pgs:     32 active+clean
Many thanks for the help
Solution:
On the gateway node, open the Ceph configuration file in the /etc/ceph/ directory.
Find an RGW client section similar to the example:
[client.rgw.gateway-node1]
host = gateway-node1
keyring = /var/lib/ceph/radosgw/ceph-rgw.gateway-node1/keyring
log file = /var/log/ceph/ceph-rgw-gateway-node1.log
rgw frontends = civetweb port=192.168.178.50:8080 num_threads=100
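After changing the configuration, the gateway daemon normally needs to be restarted for the new settings to take effect. On a systemd-based host that is usually something along these lines (the instance name after rgw. depends on your own client section, so adjust it accordingly):
sudo systemctl restart ceph-radosgw@rgw.gateway-node1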
https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/3/html-single/object_gateway_guide_for_red_hat_enterprise_linux/index
While testing a simple node server (written with Hapi.js):
'use strict';

var Hapi = require("hapi");
var count = 0;

const server = Hapi.server({
    port: 3000,
    host: 'localhost'
});

server.route({
    method: 'GET',
    path: '/test',
    handler: (request, h) => {
        count++;
        console.log(count);
        return count;
    }
});

const init = async () => {
    await server.start();
};

process.on('unhandledRejection', (err) => {
    process.exit(1);
});

init();
Start the server:
node ./server.js
Run the Apache ab tool:
/usr/bin/ab -n 200 -c 30 localhost:3000/test
Env details:
OS: CentOS release 6.9
Node: v10.14.1
Hapi.js: 17.8.1
I found unexpected results in the case of multiple concurrent requests (-c 30): the request handler function is called more times than the number of requests to be performed (-n 200).
Ab output example:
Benchmarking localhost (be patient)
Server Software:
Server Hostname: localhost
Server Port: 3000
Document Path: /test
Document Length: 29 bytes
Concurrency Level: 30
Time taken for tests: 0.137 seconds
Complete requests: 200
Failed requests: 0
Write errors: 0
Total transferred: 36081 bytes
HTML transferred: 6119 bytes
Requests per second: 1459.44 [#/sec] (mean)
Time per request: 20.556 [ms] (mean)
Time per request: 0.685 [ms] (mean, across all concurrent requests)
Transfer rate: 257.12 [Kbytes/sec] received
Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.1      0       0
Processing:    15   17   1.5     16      20
Waiting:        2    9   3.9      9      18
Total:         15   17   1.5     16      21
Percentage of the requests served within a certain time (ms)
50% 16
66% 16
75% 17
80% 18
90% 20
95% 20
98% 21
99% 21
100% 21 (longest request)
And the node server prints out 211 log lines. Across various tests the mismatch varies, but it is always present:
-n 1000 -c 1 -> 1000 logs
-n 1000 -c 2 -> ~1000 logs
-n 1000 -c 10 -> ~1001 logs
-n 1000 -c 70 -> ~1008 logs
-n 1000 -c 1000 -> ~1020 logs
It seems that as concurrency increases, the mismatch increases.
I couldn't figure out whether the ab tool performs more HTTP requests or the node server responds more times than necessary.
Could you please help?
It's very strange, and I don't get the same results as you on my machine. I would be very surprised if it were ab that was issuing different numbers of actual requests.
Things I would try:
Write a simple server using Express rather than Hapi (a minimal sketch follows below). If the issue still occurs, you at least know it's not a problem with Hapi.
Intercept the network calls using Fiddler:
ab -X localhost:8888 -n 100 -c 30 http://127.0.0.1:3000/test will use the Fiddler proxy, which will then let you see the actual calls across the network interface.
Wireshark if you need more power and you're feeling brave (I'd only use it if Fiddler has let you down).
If after all of these you are still seeing the issue, then it has been narrowed down to Node itself; I'm not sure what else it could be. Try using Node 8 rather than Node 10.
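For the Express suggestion in the first point, a minimal equivalent of the Hapi server above might look like this sketch (assuming express is installed; untested against your exact setup):
'use strict';

const express = require('express');

const app = express();
let count = 0;

// same counting handler as the Hapi route; String() avoids sending a bare number
app.get('/test', (req, res) => {
    count++;
    console.log(count);
    res.send(String(count));
});

app.listen(3000, 'localhost');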
Using the Fiddler proxy, I found that the ab tool issues more requests than the number specified (example: -n 200).
By running a series of consecutive tests:
# 11 consecutive times
/usr/bin/ab -n 200 -c 30 -X localhost:8888 http://localhost:3000/test
Both the proxy and the node server report a total of 2209 requests. It looks like ab is less imprecise with the proxy in the middle, but it is still imprecise.
In general, and more importantly, I never found mismatches between the requests passed through the proxy and the requests received by the node server.
Thanks!
I'm building a web crawler in Node.js using the npm crawler package. My program currently creates 5 child processes, each of which instantiates a new Crawler that crawls a list of URLs provided by the parent.
When it runs for about 15-20 minutes, it slows to a halt, and the STATE column in the output of the top command reads stuck for all the children (see below).
I have little knowledge of the top command and the columns it provides, but I want to know: is there a way to find out what is causing the processes to slow down by looking at the output of top? I realize that it is probably my code that has a bug in it, but I want to know where I should start debugging: memory leak, caching issue, not enough children, too many children, etc.
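In case it helps narrow things down, one thing I could try is having each child log its own memory usage periodically via process.memoryUsage(); a rough sketch (not part of my crawler code yet):
// in each child process: print RSS and heap usage every 30 seconds
setInterval(function () {
    var mem = process.memoryUsage();
    console.log(process.pid, 'rss:', mem.rss, 'heapUsed:', mem.heapUsed);
}, 30000);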
Below is the entire output of top
PID COMMAND %CPU TIME #TH #WQ #PORT #MREG MEM RPRVT PURG CMPRS VPRVT VSIZE PGRP PPID STATE UID FAULTS COW MSGSENT MSGRECV SYSBSD SYSMACH
11615 node 2.0 17:16.43 8 0 42 2519 94M- 94M- 0B 1347M+ 1538M 4150M 11610 11610 stuck 541697072 14789409+ 218 168 21 6481040 63691
11614 node 2.0 16:57.66 8 0 42 2448 47M- 47M- 0B 1360M+ 1498M- 4123M 11610 11610 stuck 541697072 14956093+ 217 151 21 5707766 64937
11613 node 4.4 17:17.37 8 0 44 2415 100M+ 100M+ 0B 1292M- 1485M 4114M 11610 11610 sleeping 541697072 14896418+ 215 181 22 6881669+ 66098+
11612 node 10.3 17:37.81 8 0 42 2478 24M+ 24M+ 0B 1400M- 1512M 4129M 11610 11610 stuck 541697072 14386703+ 215 171 21 7083645+ 65551
11611 node 2.0 17:09.52 8 0 42 2424 68M- 68M- 0B 1321M+ 1483M 4111M 11610 11610 sleeping 541697072 14504735+ 220 168 21 6355162 63701
11610 node 0.0 00:04.63 8 0 42 208 4096B 0B 0B 126M 227M 3107M 11610 11446 sleeping 541697072 45184 410 52 21 36376 6939
Here are the dependencies:
├── colors@0.6.2
├── crawler@0.2.6
├── log-symbols@1.0.0
├── robots@0.9.4
└── sitemapper@0.0.1
Sitemapper is one I wrote myself, which could be a source of bugs.