Is http.ServerResponse.write() blocking? - node.js

Is it possible to write non-blocking response.write? I've written a simple test to see if other clients can connect while one downloads a file:
var connect = require('connect');
var longString = 'a';
for (var i = 0; i < 29; i++) { // 512 MiB
longString += longString;
}
console.log(longString.length)
function download(request, response) {
response.setHeader("Content-Length", longString.length);
response.setHeader("Content-Type", "application/force-download");
response.setHeader("Content-Disposition", 'attachment; filename="file"');
response.write(longString);
response.end();
}
var app = connect().use(download);
connect.createServer(app).listen(80);
And it seems like write is blocking!
Am I doing something wrong?
Update So, it doesn't block and it blocks in the same time. It doesn't block in the sense that two files can be downloaded simultaneously. And it blocks in the sense that creating a buffer is a long operation.

Any processing done strictly in JavaScript will block. response.write(), at least as of v0.8, is no exception to this:
The first time response.write() is called, it will send the buffered header information and the first body to the client. The second time response.write() is called, Node assumes you're going to be streaming data, and sends that separately. That is, the response is buffered up to the first chunk of body.
Returns true if the entire data was flushed successfully to the kernel buffer. Returns false if all or part of the data was queued in user memory. 'drain' will be emitted when the buffer is again free.
What may save some time is to convert longString to Buffer before attempting to write() it, since the conversion will occur anyways:
var longString = 'a';
for (...) { ... }
longString = new Buffer(longString);
But, it would probably be better to stream the various chunks of longString rather than all-at-once (Note: Streams are changing in v0.10):
var longString = 'a',
chunkCount = Math.pow(2, 29),
bufferSize = Buffer.byteLength(longString),
longBuffer = new Buffer(longString);
function download(request, response) {
var current = 0;
response.setHeader("Content-Length", bufferSize * chunkCount);
response.setHeader("Content-Type", "application/force-download");
response.setHeader("Content-Disposition", 'attachment; filename="file"');
function writeChunk() {
if (current < chunkCount) {
current++;
if (response.write(longBuffer)) {
process.nextTick(writeChunk);
} else {
response.once('drain', writeChunk);
}
} else {
response.end();
}
}
writeChunk();
}
And, if the eventual goal is to stream a file from disk, this can be even easier with fs.createReadStream() and stream.pipe():
function download(request, response) {
// response.setHeader(...)
// ...
fs.createReadStream('./file-on-disk').pipe(response);
}

Nope, it does not block, I tried one from IE and other from firefox. I did IE first but still could download file from firefox first.
I tried for 1 MB (i < 20) it works the same just faster.
You should know that whatever longString you create requires memory allocation. Try to do it for i < 30 (on windows 7) and it will throw FATAL ERROR: JS Allocation failed - process out of memory.
It takes time for memory allocation/copying nothing else. Since it is a huge file, the response is time taking and your download looks like blocking. Try it yourself for smaller values (i < 20 or something)

Related

Writing large amounts of streamed data frequently with writestream

I'm trying to write a live websocket feed line-by-line to a file - I think for this I should be using a writeable stream.
My problem here is that the data received is in the region of 10 lines per second, which quickly fills the buffer.
I understand when using streams from sources you control, you would normally add some sort of backpressure logic here, but what should I do if I do not control the source? Should I be batching up the writes and writing, say 500 lines at a time, instead of per line, or should I be using some other way to save this data?
I'm wondering how big are the lines? 10 lines per second sounds trivial to stream to a disk unless the lines are gigantic or the disk really slow. Ultimately, if you have no ability to apply backpressure logic, the source can overwhelm you if they go fast or your storage goes slow and you'd have to decide how much you can reasonably buffer and eventually just drop some of the data if you get behind.
But, you should be able to write a lot of data. On a my regular hard disk (using the generic stream code below with no additional buffering) I can do sequential writes of 100,000,000 bytes at a speed of 55 MBytes/sec:
So, if you have 10 lines per second coming in, as long as the lines were below 10,000,000 bytes each, my hard drive could keep up.
Here's the code I used to test it:
const fs = require('fs');
const { Bench } = require('../../Github/measure');
const { addCommas } = require("../../Github/str-utils");
const lineData = Buffer.from("012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678\n", 'utf-8');
let stream = fs.createWriteStream("D:\\Temp\\temp.txt");
stream.on('open', function() {
let linesRemaining = 1_000_000;
let b = new Bench();
let bytes = 0;
function write() {
do {
linesRemaining--;
let readyMore;
bytes += lineData.length;
if (linesRemaining === 0) {
readyForMore = stream.write(lineData, done);
} else {
readyForMore = stream.write(lineData);
}
} while (linesRemaining > 0 && readyForMore);
if (linesRemaining > 0) {
stream.once('drain', write);
}
}
function done() {
b.markEnd();
console.log(`Time to write ${addCommas(bytes)} bytes: ${b.formatSec(3)}`);
console.log(`bytes/sec = ${addCommas((bytes/b.sec).toFixed(0))}`);
console.log(`MB/sec = ${addCommas(((bytes/(1024 * 1024))/b.sec).toFixed(1))}`);
stream.end();
}
b.markBegin();
write();
});
Theoretically, it is more efficient for your disk to do fewer writes that are larger, than tons of small writes. In practice, because of the way the writeStream works, as soon as an inefficient write gets slow, the next write will get buffered and it kind of self corrects. If you were really trying to minimize the load on the disk, you would buffer writes until you had at least something like 4k to write. The issue is that each write has potentially allocate some bytes to the file (which involves writing to a table on the disk), then seek to where the bytes should be written on the disk, then write the bytes. Fewer and larger writes that are larger (up to some limit that depends upon internal implementation) will reduce the number of times it has to do the file allocation overhead.
So, I ran a test. I modified the above code (shown below) to buffer into 4k chunks and write them out in 4k chunks. The write through increased from 55 MBytes/sec to 284.2 MBytes/sec.
So, the theory holds true that you will write faster if you buffer into larger chunks.
But, even the simpler, non-buffered version may be plenty fast.
Here's the test code for the buffered version:
const fs = require('fs');
const { Bench } = require('../../Github/measure');
const { addCommas } = require("../../Github/str-utils");
const lineData = Buffer.from("012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678\n", 'utf-8');
let stream = fs.createWriteStream("D:\\Temp\\temp.txt");
stream.on('open', function() {
let linesRemaining = 1_000_000;
let b = new Bench();
let bytes = 0;
let cache = [];
let cacheTotal = 0;
const maxBuffered = 4 * 1024;
stream.myWrite = function(data, callback) {
if (callback) {
cache.push(data);
return stream.write(Buffer.concat(cache), callback);
} else {
cache.push(data);
cacheTotal += data.length;
if (cacheTotal >= maxBuffered) {
let ready = stream.write(Buffer.concat(cache));
cache.length = 0;
cacheTotal = 0;
return ready;
} else {
return true;
}
}
}
function write() {
do {
linesRemaining--;
let readyMore;
bytes += lineData.length;
if (linesRemaining === 0) {
readyForMore = stream.myWrite(lineData, done);
} else {
readyForMore = stream.myWrite(lineData);
}
} while (linesRemaining > 0 && readyForMore);
if (linesRemaining > 0) {
stream.once('drain', write);
}
}
function done() {
b.markEnd();
console.log(`Time to write ${addCommas(bytes)} bytes: ${b.formatSec(3)}`);
console.log(`bytes/sec = ${addCommas((bytes/b.sec).toFixed(0))}`);
console.log(`MB/sec = ${addCommas(((bytes/(1024 * 1024))/b.sec).toFixed(1))}`);
stream.end();
}
b.markBegin();
write();
});
This code uses a couple of my local libraries for measuring the time and formatting the output. If you want to run this yourself, you can substitute your own logic for those.

How to write incrementally to a text file and flush output

My Node.js program - which is an ordinary command line program that by and large doesn't do anything operationally unusual, nothing system-specific or asynchronous or anything like that - needs to write messages to a file from time to time, and then it will be interrupted with ^C and it needs the contents of the file to still be there.
I've tried using fs.createWriteStream but that just ends up with a 0-byte file. (The file does contain text if the program ends by running off the end of the main file, but that's not the scenario I have.)
I've tried using winston but that ends up not creating the file at all. (The file does contain text if the program ends by running off the end of the main file, but that's not the scenario I have.)
And fs.writeFile works perfectly when you have all the text you want to write up front, but doesn't seem to support appending a line at a time.
What is the recommended way to do this?
Edit: specific code I've tried:
var fs = require('fs')
var log = fs.createWriteStream('test.log')
for (var i = 0; i < 1000000; i++) {
console.log(i)
log.write(i + '\n')
}
Run for a few seconds, hit ^C, leaves a 0-byte file.
Turns out Node provides a lower level file I/O API that seems to work fine!
var fs = require('fs')
var log = fs.openSync('test.log', 'w')
for (var i = 0; i < 100000; i++) {
console.log(i)
fs.writeSync(log, i + '\n')
}
NodeJS doesn't work in the traditional way. It uses a single thread, so by running a large loop and doing I/O inside, you aren't giving it a chance (i.e. releasing the CPU) to do other async operations for eg: flushing memory buffer to actual file.
The logic must be - do one write, then pass your function (which invokes the write) as a callback to process.nextTick or as callback to the write stream's drain event (if buffer was full during last write).
Here's a quick and dirty version which does what you need. Notice that there are no long-running loops or other CPU blockage, instead I schedule my subsequent writes for future and return quickly, momentarily freeing up the CPU for other things.
var fs = require('fs')
var log = fs.createWriteStream('test.log');
var i = 0;
function my_write() {
if (i++ < 1000000)
{
var res = log.write("" + i + "\r\n");
if (!res) {
log.on('drain',my_write);
} else {
process.nextTick(my_write);
}
console.log("Done" + i + " " + res + "\r\n");
}
}
my_write();
This function might also be helpful.
/**
* Write `data` to a `stream`. if the buffer is full will block
* until it's flushed and ready to be written again.
* [see](https://nodejs.org/api/stream.html#stream_writable_write_chunk_encoding_callback)
*/
export function write(data, stream) {
return new Promise((resolve, reject) => {
if (stream.write(data)) {
process.nextTick(resolve);
} else {
stream.once("drain", () => {
stream.off("error", reject);
resolve();
});
stream.once("error", reject);
}
});
}
You are writing into file using for loop which is bad but that's other case. First of all createWriteStream doesn't close the file automatically you should call close.
If you call close immediately after for loop it will close without writing because it's async.
For more info read here: https://nodejs.org/api/fs.html#fs_fs_createwritestream_path_options
Problem is async function inside for loop.

How to read a file by setting correct offset and position and write to the response in Nodejs with manual buffering?

I want to read a file in 64byte interval. And I also do not want to use any functionality which interanlly implements buffering. I wanted to do buffering manually. So I started using fs.read(). I tried hard but I really don't know how to set position which tells where to read from in the file and offset in the buffer to start writing at.So I found few resources and started implementing by my own. But what I did seems enterly wrong. Please find my code below.
app.get('/manualBufferAnother', function (req, res, next) {
var filePath = path.join(__dirname, 'Koala.jpg');
console.log("FilePath is: "+filePath);
var fileName = path.basename(filePath);
var mimeType = mime.lookup(filePath);
var stat = fs.statSync(filePath);
res.writeHead(200, {
"Content-Type": mimeType,
"Content-Disposition" : "attachment; filename=" + fileName,
'connection': 'keep-alive',
"Content-Length": stat.size,
"Transfer-Encoding": "chunked"
});
fs.open(filePath, 'r', function(err, fd) {
var completeBufferSize = stat.size;
var offset = 0; //is the offset in the buffer to start writing at
var length = 511; //is an integer specifying the number of bytes to read
var position = 0; //is an integer specifying where to begin reading
from in the file. If position is null, data will be read from the current file position
var buffer = new Buffer(completeBufferSize);
buf(res,fd,offset,position,length,buffer,stat);
});
});
var buf = function(res,fd,offset,position,length,buffer,stat) {
if(position+buffer.length < length) {
fs.read(fd,buffer,offset,length,position,function(error,bytesRead,bufferr {
res.write(bufferr.slice(0,bytesRead));
console.log("Bytes Read: "+bytesRead);
position=position+bufferr.length;
buf(res,fd,offset,position,length,bufferr,stat);
})
} else {
fs.read(fd,buffer,offset,length,position,function(error,bytesRead,bufferr) {
console.log("Bytes Read in else: "+bytesRead);
res.end(bufferr.slice(0,bytesRead));
fs.close(fd)
})
}
}
I know this code is doing so much wrong thing. But I don't know the right way.
Should I use any loop for setting and storing position and offset values?
Will be really helpful if you provide me good reference?
Here is an example:
res.writeHead(...);
var SIZE = 64; // 64 byte intervals
fs.open(filepath, 'r', function(err, fd) {
fs.fstat(fd, function(err, stats) {
var bufferSize = stats.size;
var buffer = new Buffer(bufferSize),
var bytesRead = 0;
while (bytesRead < bufferSize) {
var size = Math.min(SIZE, bufferSize - bytesRead);
var read = fs.readSync(fd, buffer, bytesRead, size, bytesRead);
bytesRead += read;
}
res.write(buffer);
});
});
Should I use any loop for setting and storing position and offset values?
Yes you can but be careful. In Node.js, most file system functions are asynchronous (non-blocking). As you probably realised, putting an asynchronous function in a loop is going to cause problems. You can tell if a function is asynchonous by
looking at the Node.js documentation and checking if there is a
callback parameter. So using read inside a loop is bad. We can instead use readSync. This is the synchronous (blocking) and is similar to C's read() function (which is also blocking).
I really don't know how to set position which tells where to read from the file and offset in the buffer to start writing at.
The arguments of the readSync function control both where to read from in the file and where in the destination buffer to write to.
// /----- where to start writing at in `buffer`
fs.readSync(fd, buffer, offset, length, position)
// \------- where to read from in the
// file given by `fd`
Note: The above style is idiomatic for C, but in Javascript it is considered poor practice -- the code will not scale well. In general you don't want to use synchronous functions ever because they block the single thread of execution that Javascript uses (aka "blocking the event loop").
From Express.js:
Synchronous functions and methods tie up the executing process until they return. A single call to a synchronous function might return in a few microseconds or milliseconds, however in high-traffic websites, these calls add up and reduce the performance of the app. Avoid their use in production. Although Node and many modules provide synchronous and asynchronous versions of their functions, always use the asynchronous version in production.
Using .pipe() and Streams (asynchronous style) is generally the way to go if you want the most idiomatic and performant code. Sorry to say this: no official sources / popular websites will describe file operations using synchronous functions via C-style blocking and buffering because it is bad practice in Node.js.

How do I apply back-pressure to node streams?

While attempting to experiment with Node.JS streams I ran into an interesting conundrum. When the input (Readable) stream pushes more data then the destination (Writable) cares about I was unable to apply back-pressure correctly.
The two methods I attempted was to return false from the Writable.prototype._write and to retain a reference to the Readable so I can call Readable.pause() from the Writable. Neither solution helped much which I'll explain.
In my exercise (which you can view the full source as a Gist) I have three streams:
Readable - PasscodeGenerator
util.inherits(PasscodeGenerator, stream.Readable);
function PasscodeGenerator(prefix) {
stream.Readable.call(this, {objectMode: true});
this.count = 0;
this.prefix = prefix || '';
}
PasscodeGenerator.prototype._read = function() {
var passcode = '' + this.prefix + this.count;
if (!this.push({passcode: passcode})) {
this.pause();
this.once('drain', this.resume.bind(this));
}
this.count++;
};
I thought that the return code from this.push() was enough to self pause and wait for the drain event to resume.
Transform - Hasher
util.inherits(Hasher, stream.Transform);
function Hasher(hashType) {
stream.Transform.call(this, {objectMode: true});
this.hashType = hashType;
}
Hasher.prototype._transform = function(sample, encoding, next) {
var hash = crypto.createHash(this.hashType);
hash.setEncoding('hex');
hash.write(sample.passcode);
hash.end();
sample.hash = hash.read();
this.push(sample);
next();
};
Simply add the hash of the passcode to the object.
Writable - SampleConsumer
util.inherits(SampleConsumer, stream.Writable);
function SampleConsumer(max) {
stream.Writable.call(this, {objectMode: true});
this.max = (max != null) ? max : 10;
this.count = 0;
}
SampleConsumer.prototype._write = function(sample, encoding, next) {
this.count++;
console.log('Hash %d (%s): %s', this.count, sample.passcode, sample.hash);
if (this.count < this.max) {
next();
} else {
return false;
}
};
Here I want to consume the data as fast as possible until I reach my max number of samples and then end the stream. I tried using this.end() instead of return false but that caused the dreaded write called after end problem. Returning false does stop everything if the sample size is small but when it is large I get an out of memory error:
FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory
Aborted (core dumped)
According to this SO answer in theory the Write stream would return false causing the streams to buffer until the buffers were full (16 by default for objectMode) and eventually the Readable would call it's this.pause() method. But 16 + 16 + 16 = 48; that's 48 objects in buffer till things fill up and the system is clogged. Actually less because there is no cloning involved so the objects passed between them is the same reference. Would that not mean only 16 objects in memory till the high water mark halts everything?
Lastly I realize I could have the Writable reference the Readable to call it's pause method using closures. However, this solution means the Writable stream knows to much about another object. I'd have to pass in a reference:
var foo = new PasscodeGenerator('foobar');
foo
.pipe(new Hasher('md5'))
.pipe(new SampleConsumer(samples, foo));
And this feels out of norm for how streams would work. I thought back-pressure was enough to cause a Writable to stop a Readable from pushing data and prevent out of memory errors.
An analogous example would be the Unix head command. Implementing that in Node I would assume that the destination could end and not just ignore causing the source to keep pushing data even if the destination has enough data to satisfy the beginning portion of the file.
How do I idiomatically construct custom streams such that when the destination is ready to end the source stream doesn't attempt to push more data?
This is a known issue with how _read() is called internally. Since your _read() is always pushing synchronously/immediately, the internal stream implementation can get into a loop in the right conditions. _read() implementations are generally expected to do some sort of async I/O (e.g. reading from disk or network).
The workaround for this (as noted in the link above) is to make your _read() asynchronous at least some of the time. You could also just make it async every time it's called with:
PasscodeGenerator.prototype._read = function(n) {
var passcode = '' + this.prefix + this.count;
var self = this;
// `setImmediate()` delays the push until the beginning
// of the next tick of the event loop
setImmediate(function() {
self.push({passcode: passcode});
});
this.count++;
};

NodeJS: How to write a file parser using readStream?

I have a file in a binary format:
The format is as follows:
[4 - header bytes] [8 bytes - int64 - how many bytes to read following] [variable num of bytes (size of the int64) - read the actual information]
And then it repeats, so I must first read the first 12 bytes to determine how many more bytes I need to read.
I have tried:
var readStream = fs.createReadStream('/path/to/file.bin');
readStream.on('data', function(chunk) { ... })
The problem I have is that chunk always comes back in chunks of 65536 bytes at a time whereas I need to be more specific on the number of bytes that I am reading.
I have always tried readStream.on('readable', function() { readStream.read(4) })
But it is also not very flexible, because it seems to turn asynchronous code into synchronous code because, I have to put the 'reading' in a while loop
Or maybe readStream is not appropriate in this case and I should use this instead? fs.read(fd, buffer, offset, length, position, callback)
Here's what I'd recommend as an abstract handler of a readStream to process abstract data like you're describing:
var pending = new Buffer(9999999);
var cursor = 0;
stream.on('data', function(d) {
d.copy(pending, cursor);
cursor += d.length;
var test = attemptToParse(pending.slice(0, cursor));
while (test !== false) {
// test is a valid blob of data
processTheThing(test);
var rawSize = test.raw.length; // How many bytes of data did the blob actually take up?
pending.copy(pending.copy, 0, rawSize, cursor); // Copy the data after the valid blob to the beginning of the pending buffer
cursor -= rawSize;
test = attemptToParse(pending.slice(0, cursor)); // Is there more than one valid blob of data in this chunk? Keep processing if so
}
});
For your use-case, ensure the initialized size of the pending Buffer is large enough to hold the largest possible valid blob of data you'll be parsing (you mention an int64; that max size plus the header size) plus one extra 65536 bytes in case the blob boundary happens just on the edge of a stream chunk.
My method requires a attemptToParse() method that takes a buffer and tries to parse the data out of it. It should return false if the length of the buffer is too short (data hasn't come in enough yet). If it is a valid object, it should return some parsed object that has a way to show the raw bytes it took up (.raw property in my example). Then you do any processing you need to do with the blob (processTheThing()), trim out that valid blob of data, shift the pending Buffer to just be the remainder and keep going. That way, you don't have a constantly growing pending buffer, or some array of "finished" blobs. Maybe process on the receiving end of processTheThing() is keeping an array of the blobs in memory, maybe it's writing them to a database, but in this example, that's abstracted away so this code just deals with how to handle the stream data.
Add the chunk to a Buffer, and then parse the data from there. Being aware not to go beyond the end of the buffer (if your data is large). I'm using my tablet right now so can't add any example source code. Maybe somebody else can?
Ok, mini source, very skeletal.
var chunks = [];
var bytesRead= 0;
stream.on('data', function(chunk) {
chunks.push(chunk);
bytesRead += chunk.length;
// look at bytesRead...
var buffer = Buffer.concat(chunks);
chunks = [buffer]; // trick for next event
// --> or, if memory is an issue, remove completed data from the beginning of chunks
// work with the buffer here...
}

Resources