Why are fs.readFile benchmark times so different from crypto.pbkdf2? - node.js

I'm trying to understand how Node.js uses the libuv threadpool. It turns out some modules, like fs and crypto, use this threadpool, so I tried this:
const crypto = require("crypto")
const start = Date.now()
for (let i = 0; i < 5; i++) {
  crypto.pbkdf2('a', 'b', 100000, 512, 'sha512', () => {
    console.log(`${i + 1}: ${Date.now() - start}`)
  })
}
The results were:
1: 1178
2: 1249
3: 1337
4: 1344
5: 2278
And that was expected. The default threadpool size is 4, so 4 hashes will finish almost at the same time and the fifth will wait.
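A quick way to check this explanation is to raise the pool size and rerun the same loop. A minimal sketch, assuming UV_THREADPOOL_SIZE is read before the pool is first used (on some setups it is safer to set it in the shell before starting node):
// Assumption: bumping the pool to 5 should let all five hashes finish together.
process.env.UV_THREADPOOL_SIZE = 5; // must run before any threadpool work is queued

const crypto = require("crypto")
const start = Date.now()
for (let i = 0; i < 5; i++) {
  crypto.pbkdf2('a', 'b', 100000, 512, 'sha512', () => {
    console.log(`${i + 1}: ${Date.now() - start}`)
  })
}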
But when I do the same thing with fs.readFile():
const start = Date.now()
const fs = require('fs')
for (let i = 0; i < 5; i++) {
  fs.readFile('./test.txt', () => {
    console.log(`${i + 1} : ${Date.now() - start}`)
  })
}
I get this:
1 : 7
2 : 20
3 : 20
4 : 20
5 : 21
There is always one read that finishes first, and the others finish around the same time. I tried reading the file just twice and the same thing happens: one finishes first and the second right after. I also tried reading it 10 times, same thing. Why is this happening?
Edit:
test.txt is just 5 bytes
I'm using a quad-core laptop running Windows 10

Related

Time first call function in node

I have the following code:
let startTime;
let stopTime;
const arr = [1, 2, 3, 8, 5, 0, 110, 4, 4, 16, 3, 8, 7, 56, 1, 2, 3, 8, 5, 0, 110, 16, 3, 8, 7, 56];
const sum = 63;

const durationTime = (start, stop, desc) => {
  let duration = (stop - start);
  console.info('Duration' + ((desc !== undefined) ? '(' + desc + ')' : '') + ': ' + duration + 'ms');
};

const findPair = (arr, sum) => {
  let result = [];
  const filterArr = arr.filter((number) => {
    return number <= sum;
  });
  filterArr.forEach((valueFirst, index) => {
    for (let i = index + 1; i < filterArr.length; i++) {
      const valueSecond = filterArr[i];
      const checkSum = valueFirst + valueSecond;
      if (sum === checkSum) {
        result.push([valueFirst, valueSecond]);
      }
    }
  });
  //console.info(result);
};

for (let i = 0; i < 5; i++) {
  startTime = new Date();
  findPair(arr, sum);
  stopTime = new Date();
  durationTime(startTime, stopTime, i); // pass the loop index so the output matches Duration(0), Duration(1), ...
}
When I run this locally on Node.js (v8.9.3), the result in the console is:
Duration(0): 4ms
Duration(1): 0ms
Duration(2): 0ms
Duration(3): 0ms
Duration(4): 0ms
My Question: Why does the first call of 'findPair' take 4ms and other calls only 0ms?
When the loop runs the first time, the JavaScript engine (Google's V8) interprets the code, compiles it and executes it. However, code that runs more often will have its compiled code optimized and cached, so subsequent runs of that code can be faster. Code inside loops is a good example of code that runs often.
Unless you fiddle with prototypes or other things that could invalidate that cached code, it'll keep running the cached code, which is a lot faster than interpreting the code every time it runs.
There are a lot of smart things V8 does to make your code run faster; if you are interested in this stuff I'd highly recommend reading the sources for my answer:
Dynamic Memory and V8 with JavaScript
How JavaScript works: inside the V8 engine + 5 tips on how to write optimized code
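One practical consequence for benchmarks: if you only care about the steady-state timing, warm the function up before you measure. A minimal sketch, reusing arr, sum, findPair and durationTime from your code:
// Throwaway warm-up runs so V8 gets a chance to optimize findPair first
for (let i = 0; i < 10; i++) {
  findPair(arr, sum);
}

// Measure only after the warm-up
startTime = new Date();
findPair(arr, sum);
stopTime = new Date();
durationTime(startTime, stopTime, 'warmed up');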
Before beginning, a better way to measure time is:
for (let i = 0; i < 5; i++) {
  console.time('timer ' + i);
  findPair(arr, sum);
  console.timeEnd('timer ' + i);
}
...
The first function call is slower, probably because V8 dynamically creates hidden classes behind the scenes and (initially) has to set them up for the function's objects.
More information on https://github.com/v8/v8/wiki/Design%20Elements#fast-property-access
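A tiny illustration of what the hidden-class mechanism rewards (this example is mine, not from the linked page): objects that receive their properties in the same order share a hidden class, and changing the shape later forces V8 to create a new one:
function Point(x, y) {
  this.x = x; // every Point built this way shares one hidden class
  this.y = y;
}

const a = new Point(1, 2);
const b = new Point(3, 4); // same hidden class as `a`

b.z = 5; // `b` now transitions to a different hidden class,
         // so code that handled both `a` and `b` may be deoptimized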

Node.js Calling functions as quickly as possible without going over some limit

I have multiple functions that call different API endpoints, and I need to call them as quickly as possible without going over some limit (20 calls per second, for example). My current solution is to add a delay and call the function once every 50 milliseconds for that example, but I would like to call them as quickly as possible, not just space the calls out evenly at the rate limit.
function-rate-limit solved a similar problem for me. It spreads calls to your function out over time without dropping any. It still allows instantaneous calls to your function until the rate limit is reached, so under normal circumstances it introduces no latency.
Example from function-rate-limit docs:
var rateLimit = require('function-rate-limit');

// limit to 2 executions per 1000ms
var start = Date.now()
var fn = rateLimit(2, 1000, function (x) {
  console.log('%s ms - %s', Date.now() - start, x);
});

for (var y = 0; y < 10; y++) {
  fn(y);
}
results in:
10 ms - 0
11 ms - 1
1004 ms - 2
1012 ms - 3
2008 ms - 4
2013 ms - 5
3010 ms - 6
3014 ms - 7
4017 ms - 8
4017 ms - 9
You can try using queue from async. Be careful when doing this; it essentially behaves like a while(true) in other languages:
const async = require('async');

const concurrent = 10; // At most 10 concurrent ops
const tasks = Array(concurrent).fill().map((e, i) => i);

let pushBack; // let's create a ref to a lambda function

const myAsyncFunction = (task) => {
  // TODO: Swap with the actual implementation
  return Promise.resolve(task);
};

const q = async.queue((task, cb) => {
  myAsyncFunction(task)
    .then((result) => {
      pushBack(task);
      cb(null, result);
    })
    .catch((err) => cb(err, null));
}, tasks.length);

pushBack = (task) => q.push(task);

q.push(tasks);
What's happening here? We are saying "hey, run X tasks in parallel", and after each task completes we put it back in the queue, which is the equivalent of saying "run X tasks in parallel forever".
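If you'd rather not add a dependency at all, the same burst-then-wait behaviour can be sketched by hand with a sliding one-second window. This is only a rough sketch; callApi below is a hypothetical stand-in for your real endpoint call:
const limit = 20;       // max calls per window
const windowMs = 1000;  // window length in ms
const queue = [];       // calls waiting for a free slot
let timestamps = [];    // start times of recent calls

function schedule(fn) {
  queue.push(fn);
  drain();
}

function drain() {
  const now = Date.now();
  timestamps = timestamps.filter((t) => now - t < windowMs); // drop calls older than the window
  while (queue.length && timestamps.length < limit) {
    timestamps.push(Date.now());
    queue.shift()(); // fire immediately while we are under the limit
  }
  if (queue.length) {
    // try again when the oldest call in the window expires
    setTimeout(drain, windowMs - (Date.now() - timestamps[0]));
  }
}

// usage: schedule(() => callApi('/some/endpoint'));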

Socket.io and node.js, can't understand memory usage

I currently use socket.io v1.4.2 and node.js v0.10.29 on my server.
I'm trying to track down a memory leak in my app. I'm not sure, but I think socket.io is part of the problem.
So here is the server code (demo example):
var server = require('http').createServer();
var io = require('socket.io')(server);

io.on("connection", function (socket) {
  socket.on('disconnect', function (data) { /* Do nothing */ });
});
Step 1: Memory: 58 MB
Step 2: I create A LOT of clients (~10000). Memory: 300 MB
Step 3: I close all clients and wait for the GC to do its work
Step 4: I look at my memory: 100 MB :'(
Step 5: Same as steps 2 and 3
Step 6: Memory: 160 MB...
And so on; the memory keeps growing.
I presumed the GC was being lazy, so I retried with the following code:
setInterval(function () {
  global.gc();
}, 30000);
And I started my app.js with:
node --expose-gc app.js
But I had the same result.
Finally I tried:
var server = require('http').createServer();
var io = require('socket.io')(server);

clients = {};

io.on("connection", function (socket) {
  clients[socket.id] = socket;
  socket.on('disconnect', function (data) {
    delete clients[socket.id];
  });
});
And I had the same result.
How can I free this memory?
EDIT
I created snapshots directly from my main source.
I installed the heapdump module with the following command:
npm install heapdump
Then I wrote this in my code:
heapdump = require ('heapdump');
setInterval (function () { heapdump.writeSnapshot (); }, 30000);
This takes a heap dump of the program every 30 seconds and saves it in the current directory.
I read the heap dumps with the 'Profiles' panel of Chrome DevTools.
So the issue is probably socket.io, because I found many strings that are never released, which I emit over the socket.
Perhaps I'm not writing the emit the right way?
I do this:
var data1 = [1, 2, 3];
var data2 = [4, 5, 6];
var data3 = [7, 8, 9];
socket.emit ('b', data1, data2, data3);
data1 = [];
data2 = [];
data3 = [];
And my snapshot says that the program keeps the following string in memory, millions of times: "b [1, 2, 3] [4, 5, 6] [7, 8, 9]".
What am I supposed to do?
I also made another (perhaps stupid?) test:
var t1 = new Date();
...
var t2 = new Date();
var data1 = [1, 2, 3];
var data2 = [4, 5, 6];
var data3 = [7, 8, 9];
socket.emit('b', data1, data2, data3);
data1 = [];
data2 = [];
data3 = [];
console.log("LAG: " + (t2 - t1)); // parentheses needed so the subtraction happens before the string concatenation
t1 = new Date();
I had this result :
LAG: 1
LAG: 1
...
LAG: 13
LAG: 2
LAG: 26
LAG: 3
...
LAG: 100
LAG: 10
LAG: 1
LAG: 1
LAG: 120
...
keeps growing
EDIT 2 :
This is my entire test code :
/* Make a snapshot every 30s in the current directory */
heapdump = require('heapdump');
setInterval(function () { heapdump.writeSnapshot(); }, 30000);

/* Create server */
var server = require('http').createServer();
var io = require('socket.io')(server);

var t1 = new Date();
clients = {};

io.on("connection", function (socket) {
  clients[socket.id] = socket;
  socket.on('disconnect', function (data) {
    delete clients[socket.id];
  });
});

setInterval(function () {
  var t2 = new Date();
  for (c in clients) {
    var data1 = [1, 2, 3];
    var data2 = [4, 5, 6];
    var data3 = [7, 8, 9];
    clients[c].emit('b', data1, data2, data3);
    data1 = [];
    data2 = [];
    data3 = [];
  }
  console.log("LAG: " + (t2 - t1)); // parentheses so the subtraction happens before the string concatenation
  t1 = new Date();
}, 100);
I'm not including the client code, because I assume that if the problem is in the client, it's a security issue: it would be an easy way to saturate the server's RAM, a kind of improved DDoS. I just hope the problem is not in the client.
Edit based on the server code you included
On your server:
c.emit('b', data1, data2, data3);
should be changed to:
clients[c].emit('b', data1, data2, data3);
c.emit() was probably throwing an exception because c is the socket.id string and strings don't have a .emit() method.
Original answer
What you need to determine is whether the growth is memory that is actually allocated within the node.js heap, or free memory that has simply not been returned to the operating system and is available for future reuse within node.js. Measuring the memory used by the node.js process is useful for seeing what it has taken from the system, and that should not keep going up forever, but it doesn't tell you what is really going on inside.
FYI, as long as your node.js app has a few free cycles, you shouldn't ever have to manually call the GC. It will do that itself.
The usual way to measure what is being used within the node.js heap is to take a heap snapshot, run your Steps 1-4, take a heap snapshot, run those steps again, take another heap snapshot, diff the snapshots and see what memory in the node.js heap is actually different between the two states.
That will show you what is actually in use within node.js that has changed.
Here's an article on taking heap snapshots and reading them in the debugger: https://strongloop.com/strongblog/how-to-heap-snapshots/
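If you just want a rough first look before diffing snapshots, logging process.memoryUsage() at each of your steps already separates "taken from the OS" (rss) from "actually live in the V8 heap" (heapUsed). A minimal sketch, not a substitute for the snapshot diff:
function logMemory(label) {
  const m = process.memoryUsage();
  const mb = (n) => (n / 1024 / 1024).toFixed(1) + ' MB';
  // rss: memory the OS has given the process
  // heapTotal: memory V8 has reserved; heapUsed: memory held by live objects
  console.log(label + ' rss=' + mb(m.rss) + ' heapTotal=' + mb(m.heapTotal) + ' heapUsed=' + mb(m.heapUsed));
}

logMemory('step 1: before clients');
// ... connect the ~10000 clients, then:
logMemory('step 2: clients connected');
// ... disconnect them all and give the GC some time, then:
logMemory('step 4: after disconnect');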

Fast file copy with progress information in Node.js?

Is there any way to copy large files with Node.js quickly and with progress information?
Solution 1: fs.createReadStream().pipe(...) = useless, up to 5 times slower than native cp
See: Fastest way to copy file in node.js; progress information is possible (with the npm package 'progress-stream'):
fs = require('fs');
fs.createReadStream('test.log').pipe(fs.createWriteStream('newLog.log'));
The only problem with that approach is that it easily takes 5 times longer compared to "cp source dest". See also the appendix below for the full test code.
Solution 2: rsync --info=progress2 = as slow as solution 1 = useless
Solution 3: My last resort: write a native module for node.js, using "CoreUtils" (the Linux sources for cp and others) or other functions, as shown in Fast file copy with progress
Does anyone know anything better than solution 3? I'd like to avoid native code, but it seems the best fit.
Thanks! Any package recommendations or hints (I tried all the fs* functions) are welcome!
Appendix:
test code, using pipe and progress:
var path = require('path');
var progress = require('progress-stream');
var fs = require('fs');

var _source = path.resolve('../inc/big.avi'); // 1.5GB
var _target = '/tmp/a.avi';

var stat = fs.statSync(_source);
var str = progress({
  length: stat.size,
  time: 100
});

str.on('progress', function (progress) {
  console.log(progress.percentage);
});

function copyFile(source, target, cb) {
  var cbCalled = false;

  var rd = fs.createReadStream(source);
  rd.on("error", function (err) {
    done(err);
  });

  var wr = fs.createWriteStream(target);
  wr.on("error", function (err) {
    done(err);
  });
  wr.on("close", function (ex) {
    done();
  });

  rd.pipe(str).pipe(wr);

  function done(err) {
    if (!cbCalled) {
      console.log('done');
      cb && cb(err);
      cbCalled = true;
    }
  }
}

copyFile(_source, _target);
Update: a fast C version (with detailed progress!) is implemented here: https://github.com/MidnightCommander/mc/blob/master/src/filemanager/file.c#L1480. Seems the best place to start from :-)
One aspect that may slow down the process is console.log. Take a look at this code:
const fs = require('fs');

const sourceFile = 'large.exe'
const destFile = 'large_copy.exe'

console.time('copying')
fs.stat(sourceFile, function (err, stat) {
  const filesize = stat.size
  let bytesCopied = 0

  const readStream = fs.createReadStream(sourceFile)

  readStream.on('data', function (buffer) {
    bytesCopied += buffer.length
    let porcentage = ((bytesCopied / filesize) * 100).toFixed(2)
    console.log(porcentage + '%') // run once with this and later with this line commented
  })
  readStream.on('end', function () {
    console.timeEnd('copying')
  })

  readStream.pipe(fs.createWriteStream(destFile));
})
Here are the execution times copying a 400 MB file:
with console.log: 692.950ms
without console.log: 382.540ms
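If you still want progress output without paying for a console.log on every chunk, one option is to log only when the integer percentage changes. A small variation on the 'data' handler above (same filesize and bytesCopied variables; this replaces the original handler rather than adding a second one):
let lastLogged = -1;
readStream.on('data', function (buffer) {
  bytesCopied += buffer.length
  const pct = Math.floor((bytesCopied / filesize) * 100)
  if (pct !== lastLogged) { // at most ~100 log lines per copy
    lastLogged = pct
    console.log(pct + '%')
  }
})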
cpy and cp-file both support progress reporting
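For example, with cp-file (based on my memory of an older README; the API has changed across major versions, so treat the exact shape of the progress object as an assumption and check the current docs, and the file paths are placeholders), progress reporting looked roughly like this:
const cpFile = require('cp-file');

cpFile('source/big.avi', 'destination/big.avi')
  .on('progress', (progress) => {
    // in the versions I used, progress.percent was a value between 0 and 1
    console.log(Math.round(progress.percent * 100) + '%');
  })
  .then(() => {
    console.log('copy finished');
  });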
I have the same issue. I want to copy large files as fast as possible and want progress information. I created a test utility that tests the different copy methods:
https://www.npmjs.com/package/copy-speed-test
You can run it simply with:
npx copy-speed-test --source someFile.zip --destination someNonExistentFolder
It does a native copy using child_process.exec(), a copy using fs.copyFile, and it uses createReadStream with a variety of buffer sizes (you can change the buffer sizes by passing them on the command line; run npx copy-speed-test -h for more info).
Some things I learnt:
fs.copyFile is just as fast as native
you can get quite inconsistent results on all these methods, particularly when copying from and to the same disc and with SSDs
if using a large buffer then createReadStream is nearly as good as the other methods
if you use a very large buffer then the progress is not very accurate.
The last point is because the progress is based on the read stream, not the write stream. If you copy a 1.5GB file and your buffer is 1GB, the progress immediately jumps to 66%, then jumps to 100%, and you then have to wait while the write stream finishes writing. I don't think you can display the progress of the write stream.
If you have the same issue, I'd recommend running these tests with file sizes similar to what you'll be dealing with and across similar media. My end use case is copying a file from an SD card plugged into a Raspberry Pi across a network to a NAS, so that's the scenario I ran the tests for.
I hope someone other than me finds it useful!
I solved a similar problem (using Node v8 or v10) by changing the buffer size. I think the default buffer size is around 16 kB, which fills and empties quickly but requires a full cycle around the event loop for each operation. I changed the buffer to 1 MB, and writing a 2GB image fell from around 30 minutes to 5, which sounds similar to what you are seeing. My image was also being decompressed on the fly, which possibly exacerbated the problem. Documentation on stream buffering has been in the manual since at least Node v6: https://nodejs.org/api/stream.html#stream_buffering
Here are the key code components you can use:
let gzSize = 1;  // do not initialize divisors to 0
let offset = 0;  // bytes read so far
let outStream;   // assigned once the target is open

const hwm = { highWaterMark: 1024 * 1024 };
const inStream = fs.createReadStream( filepath, hwm );

// Capture the filesize for showing percentages
inStream.on( 'open', function fileOpen( fdin ) {
  inStream.pause(); // wait for fstat before starting
  fs.fstat( fdin, function( err, stats ) {
    gzSize = stats.size;
    // openTargetDevice does a complicated fopen() for the output.
    // This could simply be inStream.resume()
    openTargetDevice( gzSize, targetDeviceOpened );
  });
});

inStream.on( 'data', function shaData( data ) {
  const bytesRead = data.length;
  offset += bytesRead;
  console.log( `Read ${offset} of ${gzSize} bytes, ${Math.floor( offset * 100 / gzSize )}% ...` );
  // Write to the output file, etc.
});

// Once the target is open, I convert the fd to a stream and resume the input.
// For the purpose of example, note only that the output has the same buffer size.
function targetDeviceOpened( error, fd, device ) {
  if( error ) return exitOnError( error );
  const writeOpts = Object.assign( { fd }, hwm );
  outStream = fs.createWriteStream( undefined, writeOpts );
  outStream.on( 'open', function fileOpen( fdin ) {
    // In a simpler structure, this is in the fstat() callback.
    inStream.resume(); // we have the _input_ size, resume read
  });
  // [...]
}
I have not made any attempt to optimize these further; the result is similar to what I get on the commandline using 'dd' which is my benchmark.
I left in converting a file descriptor to a stream and the pause/resume logic so you can see how these might be useful in more complicated situations than the simple fs.statSync() in your original post. Otherwise, this is simply adding the highWaterMark option to Tulio's answer.
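Putting the two together, a minimal version of Tulio's snippet with a bigger buffer would look something like this (the 1 MB value is just an assumption to tune for your hardware; file names are taken from his example):
const fs = require('fs');

const hwm = { highWaterMark: 1024 * 1024 }; // 1 MB buffer instead of the small default
const readStream = fs.createReadStream('large.exe', hwm);
const writeStream = fs.createWriteStream('large_copy.exe', hwm);

const filesize = fs.statSync('large.exe').size;
let bytesCopied = 0;

readStream.on('data', (buffer) => {
  bytesCopied += buffer.length;
  console.log(((bytesCopied / filesize) * 100).toFixed(2) + '%');
});

readStream.pipe(writeStream);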
Here is what I'm trying to use now; it copies one file with progress:
String.prototype.toHHMMSS = function () {
  var sec_num = parseInt(this, 10); // don't forget the second param
  var hours = Math.floor(sec_num / 3600);
  var minutes = Math.floor((sec_num - (hours * 3600)) / 60);
  var seconds = sec_num - (hours * 3600) - (minutes * 60);

  if (hours < 10) { hours = "0" + hours; }
  if (minutes < 10) { minutes = "0" + minutes; }
  if (seconds < 10) { seconds = "0" + seconds; }
  return hours + ':' + minutes + ':' + seconds;
}

var purefile = "20200811140938_0002.MP4";
var filename = "/sourceDir" + purefile;
var output = "/destinationDir" + purefile;

var progress = require('progress-stream');
var fs = require('fs');

const convertBytes = function (bytes) {
  const sizes = ["Bytes", "KB", "MB", "GB", "TB"]
  if (bytes == 0) {
    return "n/a"
  }
  const i = parseInt(Math.floor(Math.log(bytes) / Math.log(1024)))
  if (i == 0) {
    return bytes + " " + sizes[i]
  }
  return (bytes / Math.pow(1024, i)).toFixed(1) + " " + sizes[i]
}

var copiedFileSize = fs.statSync(filename).size;

var str = progress({
  length: copiedFileSize, // length(integer) - If you already know the length of the stream, then you can set it. Defaults to 0.
  time: 200,              // time(integer) - Sets how often progress events are emitted in ms. If omitted then the default is to do so every time a chunk is received.
  speed: 1,               // speed(integer) - Sets how long the speedometer needs to calculate the speed. Defaults to 5 sec.
  // drain: true          // drain(boolean) - In case you don't want to include a readstream after progress-stream, set to true to drain automatically. Defaults to false.
  // transferred: false   // transferred(integer) - If you want to set the size of previously downloaded data. Useful for a resumed download.
});

/*
{
  percentage: 9.05,
  transferred: 949624,
  length: 10485760,
  remaining: 9536136,
  eta: 42,
  runtime: 3,
  delta: 295396,
  speed: 949624
}
*/

str.on('progress', function (progress) {
  console.log(progress.percentage + '%');
  console.log('elapsed: ' + progress.runtime.toString().toHHMMSS() + 's / remaining: ' + progress.eta.toString().toHHMMSS() + 's');
  console.log(convertBytes(progress.speed) + "/s" + ' ' + progress.speed);
});

//const hwm = { highWaterMark: 1024 * 1024 };

var hrstart = process.hrtime(); // measure the copy time
var rs = fs.createReadStream(filename)
  .pipe(str)
  .pipe(fs.createWriteStream(output, { emitClose: true }).on("close", () => {
    var hrend = process.hrtime(hrstart);
    var timeInSeconds = (hrend[0] * 1000000000 + hrend[1]) / 1000000000; // renamed: this value is in seconds
    var finalSpeed = convertBytes(copiedFileSize / timeInSeconds);
    console.log('Done: file copy: ' + finalSpeed + "/s");
    console.info('Execution time (hr): %ds %dms', hrend[0], hrend[1] / 1000000);
  }));
Refer to https://www.npmjs.com/package/fsprogress.
With that package, you can track progress while you are copying or moving files. The progress tracking is event- and method-call-based, so it's very convenient to use.
You can provide options to control a lot of things, e.g. the total number of files for a concurrent operation, or the chunk size to read from a file at a time.
It was tested with single files up to 17GB and with directories up to a size I don't really remember, but it was pretty large. It is also safe to use for large file(s).
So go ahead and have a look at it to see whether it matches your expectations or is what you are looking for :D

Clustering Loop Confusion

My basic setup using the cluster module is (I have 6 cores):
var cluster = require('cluster');

if (cluster.isMaster) {
  var numCPUs = require('os').cpus().length;
  for (var i = 0; i < numCPUs; i++) {
    cluster.fork();
  }
} else {
  // Code here
  console.time("Time: ");
  var obj = { 'abcdef': 1, 'qqq': 13, '19': [1, 2, 3, 4] };
  for (var i = 0; i < 500000; i++) {
    JSON.parse(JSON.stringify(obj));
  }
  console.timeEnd("Time: ");
}
If I run that test, it will output:
But... if I run that same exact test inside the cluster.isMaster block, it will output:
1) Why is my code being executed multiple times instead of once?
2) Since I have 6 cpu cores helping me run that test, shouldn't it run that code only once but perform the operation faster?
You're forking os.cpus().length separate processes. So if os.cpus().length === 6, then you should see 6 separate outputs (which is the case from the output you've posted).
No, that's not how it works. Each process would be scheduled on a separate core. It's not about running it faster, but being able to do more processing in parallel.
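If the goal were to make the single benchmark finish sooner, the work itself would have to be split across the workers. A rough, purely illustrative sketch of that idea, dividing the 500000 iterations evenly (whether the forking overhead eats the gain depends on the workload):
var cluster = require('cluster');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  console.time("Time: ");
  var finished = 0;
  for (var i = 0; i < numCPUs; i++) {
    cluster.fork().on('exit', function () {
      // stop the timer once every worker has done its share
      if (++finished === numCPUs) {
        console.timeEnd("Time: ");
      }
    });
  }
} else {
  var obj = { 'abcdef': 1, 'qqq': 13, '19': [1, 2, 3, 4] };
  var share = Math.ceil(500000 / numCPUs); // each worker does a slice of the loop
  for (var i = 0; i < share; i++) {
    JSON.parse(JSON.stringify(obj));
  }
  process.exit(0);
}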
