NodeJS ReadStream not reading bufferSize bytes at a time

I have code where the NodeJS server reads a file and streams it to the response. It looks like:
var fStream = fs.createReadStream(filePath, {'bufferSize': 128 * 1024});
fStream.pipe(response);
The issue is, Node reads the file exactly 40960 bytes at a time. However, my app would be much more efficient (due to reasons not applicable to this question) if it reads 131072 (128 * 1024) bytes at a time.
Is there a way to force Node to read 128 * 1024 bytes at a time?

The accepted answer is wrong. You can force Node to read (128*1024) bytes at a time using the highWaterMark option.
var fStream = fs.createReadStream('/foo/bar', { highWaterMark: 128 * 1024 });
The documentation specifically states that 'the amount of data potentially buffered depends on the highWaterMark option passed into the stream's constructor. For normal streams, the highWaterMark option specifies a total number of bytes. For streams operating in object mode, the highWaterMark specifies a total number of objects.'
Also, see this: the default buffer size is 64 KiB.
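As a quick sanity check (just a sketch; the file path is a placeholder), you can log the size of each chunk the stream emits:
const fs = require('fs');
const fStream = fs.createReadStream('/foo/bar', { highWaterMark: 128 * 1024 });
fStream.on('data', (chunk) => {
  // Every chunk except possibly the last should be 131072 bytes.
  console.log(chunk.length);
});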

I'm new here, so bear with me....
I found this in node's sources:
var toRead = Math.min(pool.length - pool.used, ~~this.bufferSize);
and:
var kPoolSize = 40 * 1024;
So it seems that the buffer size is limited to 40 KB, no matter what you provide. You could try to change the value in the source and rebuild Node. That's probably not a very maintainable solution though...

Related

In 2018 a Tech Lead at Google said they were working to "support buffers way beyond 4GiB" in V8 on 64 bit systems. Did that happen?

In 2018 a Tech Lead at Google said they were working to "support buffers way beyond 4GiB" in V8 on 64 bit systems. Did that happen?
Trying to load a large file into a buffer like:
const fileBuffer = fs.readFileSync(csvPath);
in Node v12.16.1 and getting the error:
RangeError [ERR_FS_FILE_TOO_LARGE]: File size (3461193224) is greater than possible Buffer: 2147483647 bytes.
and in Node v14.12.0 (latest) and getting the error:
RangeError [ERR_FS_FILE_TOO_LARGE]: File size (3461193224) is greater than 2 GB
Which looks to me like a limit set due to 32-bit integers being used for addressing the buffers. But I don't understand why this would be a limitation on 64-bit systems... Yes, I realize I can use streams or read from the file at a specific offset, but I have massive amounts of memory lying around, and I'm limited to 2147483647 bytes because Node is limited to 32-bit addressing?
Surely having a high-frequency, random-access data set fully loaded into a single buffer, rather than streamed, has performance benefits. The code involved in directing each request to the right buffer in a multi-buffer alternative is going to cost something, regardless of how small...
I can use the --max-old-space-size=16000 flag to increase the maximum memory used by Node, but I suspect this is a hard limit based on the architecture of V8. However, I still have to ask, since the tech lead at Google did claim they were increasing the maximum buffer size past 4GiB: is there any way in 2020 to have a buffer beyond 2147483647 bytes in Node.js?
Edit: relevant tracker on the topic from Google, where apparently they have been working on fixing this since at least last year: https://bugs.chromium.org/p/v8/issues/detail?id=4153
Did that happen?
Yes, V8 supports very large (many gigabytes) ArrayBuffers nowadays.
Is there any way to have a buffer beyond 2147483647 bytes in Node.js?
Yes:
$ node
Welcome to Node.js v14.12.0.
Type ".help" for more information.
> let b = Buffer.alloc(3461193224)
undefined
> b.length
3461193224
That said, it appears that fs.readFile and its variants have their own limit: https://github.com/nodejs/node/blob/master/lib/internal/fs/promises.js#L5
I have no idea what it would take to lift that. I suggest you file an issue on Node's bug tracker.
FWIW, Buffer has yet another limit:
> let buffer = require("buffer")
undefined
> buffer.kMaxLength
4294967295
And again, that's Node's decision, not V8's.
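As a workaround for that fs limit, you can issue the smaller reads yourself (as the asker alluded to) and still end up with one big Buffer. A minimal sketch, where readLargeFileSync is a hypothetical helper, assuming a 64-bit build of Node 14+ so that Buffer.alloc accepts sizes above 2^31:
const fs = require('fs');

function readLargeFileSync(path) {
  const fd = fs.openSync(path, 'r');
  try {
    const size = fs.fstatSync(fd).size;
    const buffer = Buffer.alloc(size);  // fine above 2 GB, per the REPL session above
    const chunk = 1024 * 1024 * 1024;   // 1 GiB per read, safely under the 2 GB I/O limit
    let offset = 0;
    while (offset < size) {
      // Read the next slice of the file directly into the big buffer.
      offset += fs.readSync(fd, buffer, offset, Math.min(chunk, size - offset), offset);
    }
    return buffer;
  } finally {
    fs.closeSync(fd);
  }
}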

What is the internal buffer size for "fs.createReadStream"

What is the internal buffer size for "fs.createReadStream"?
From the stream's documentation, it is mentioned that fs.createReadStream uses an internal buffer that is accessible via readable._readableState.buffer.
From the stream's documentation, it is mentioned that the default highWaterMark is 64 KB.
But readable._readableState.buffer.length gets 1, not 8 (8 × 8 KB = 64 KB). Why is that?
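Note that _readableState is internal and undocumented, so none of this is a stable API, but for poking around you can compare the chunk count with the byte count (the buffer there is a list of chunks, so its length counts chunks, not bytes). A rough sketch, with a placeholder file name:
const fs = require('fs');
const rs = fs.createReadStream('some-file');
rs.once('readable', () => {
  console.log(rs._readableState.highWaterMark); // 65536 bytes by default
  console.log(rs._readableState.buffer.length); // number of buffered chunks, e.g. 1
  console.log(rs._readableState.length);        // total bytes buffered, e.g. 65536
});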
References:
https://nodejs.org/docs/latest-v6.x/api/stream.html#stream_buffering
https://nodejs.org/api/fs.html#fs_fs_createreadstream_path_options

Linux Lazarus: Wrong FileSize Reported by TFileStream

I am trying to read my file through TFileStream to send it over the network, and I have noticed something odd. I am not sure why.
My actual file size is 44.7 KB, but when TFileStream reads the same file it tells me that the file size is 45228 bytes, or 45.2 KB. Why is that? Is there a way to fix that?
var
  fs: TFileStream;
begin
  fs := TFileStream.Create('myfile.dat', fmOpenRead or fmShareDenyWrite);
  ShowMessage(IntToStr(fs.Size)); // Size is the stream's length in bytes
end;
One possibility is that whatever you are using to report the file size is measuring the size using kibibytes (1024 bytes) rather than kilobytes (1000 bytes).
Divide 45228 by 1024 to get 44.2 KiB. This still doesn't match exactly, but I would not be surprised if there was a transcription error in your question. There is at least one, where you wrote FileSize rather than Size (now corrected by a question edit), so my guess is that some of the other specific details are incorrect.
Other than that I think it very likely that the problem is in your other method of obtaining the file size. TFileStream.Size can be trusted to give an accurate value. If that doesn't tally with some other measure then that other measure is probably wrong.
On Linux you can use the stat command to get a definitive report of the file size. I would expect that to yield the same value as TFileStream.Size.

Is there a way to limit the maximum size of a websocket message?

I have an application in which I wish to limit the maximum size of a message sent across the wire by a connected client. Since the theoretical maximum of a message in Node.js is about 1.9 GB, I never want my application to allocate that big a chunk of memory if some malicious client tries to send an over-sized packet.
How can I limit the incoming message size, to say, 1024 bytes?
To anyone looking for an answer to this question in the future: use the maxPayload option in the server configuration to limit the message size before it is read by Node (which is almost always what you want):
const WebSocket = require('ws'); // the "ws" package

const wss = new WebSocket.Server({
  clientTracking: true,
  maxPayload: 128 * 1024, // 128 KB
  path: "/learn",
  //....
});
Each character is 1 to 3 bytes in length when UTF-8 encoded, so you can technically just substring the string to get the desired length:
var limited = fullString.substring(0, 1024);
Or you can use a package like utf8-binary-cutter.
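If you need to cap the encoded size rather than the character count, here is a small sketch (capMessage is a hypothetical helper) that trims until the UTF-8 byte length fits:
function capMessage(str, maxBytes) {
  let s = str.substring(0, maxBytes); // at most maxBytes UTF-16 code units
  while (Buffer.byteLength(s, 'utf8') > maxBytes) {
    s = s.substring(0, s.length - 1); // drop characters until the encoded size fits
  }
  return s;
}
console.log(capMessage('a'.repeat(2000), 1024).length); // 1024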

Optimal buffer size with Node.js?

I have a situation where I need to take a stream and chunk it up into Buffers. I plan to write an object transform stream which takes regular input data, and outputs Buffer objects (where the buffers are all the same size). That is, if my chunker transform is configured at 8KB, and 4KB is written to it, it will wait until an additional 4KB is written before outputting an 8KB Buffer instance.
I can choose the size of the buffer, as long as it is in the ballpark of 8KB to 32KB. Is there an optimal size to pick? The reason I'm curious is that the Node.js documentation speaks of using SlowBuffer to back a Buffer, and allocating a minimum of 8KB:
In order to avoid the overhead of allocating many C++ Buffer objects for small blocks of memory in the lifetime of a server, Node allocates memory in 8Kb (8192 byte) chunks. If a buffer is smaller than this size, then it will be backed by a parent SlowBuffer object. If it is larger than this, then Node will allocate a SlowBuffer slab for it directly.
Does this imply that 8KB is an efficient size, and that if I used 12KB, there would be two 8KB SlowBuffers allocated? Or does it just mean that the smallest efficient size is 8KB? What about simply using multiples of 8KB? Or, does it not matter at all?
Basically it's saying that if your Buffer is less than 8KB, it'll try to fit it into a pre-allocated 8KB chunk of memory. It'll keep putting Buffers into that 8KB chunk until one doesn't fit, then it'll allocate a new 8KB chunk. If the Buffer is larger than 8KB, it'll get its own memory allocation.
You can actually see what's happening by looking at the node source for buffer here:
if (this.length <= (Buffer.poolSize >>> 1) && this.length > 0) {
  if (this.length > poolSize - poolOffset)
    createPool();
  this.parent = sliceOnto(allocPool,
                          this,
                          poolOffset,
                          poolOffset + this.length);
  poolOffset += this.length;
} else {
  alloc(this, this.length);
}
Looking at that, it actually looks like it'll only put the Buffer into a pre-allocated chunk if it's less than or equal to 4KB (Buffer.poolSize >>> 1, which is 4096 when Buffer.poolSize = 8 * 1024).
As for an optimum size to pick in your situation, I think it depends on what you end up using it for. But, in general, if you want a chunk less than or equal to 8KB, I'd pick something less than or equal to 4KB that will evenly fit into that 8KB pre-allocation (4KB, 2KB, 1KB, etc.). Otherwise, chunk sizes greater than 8KB shouldn't make too much of a difference.
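For what it's worth, in modern Node the same pooling lives behind Buffer.allocUnsafe (SlowBuffer is deprecated). A quick check in a fresh process, assuming the default Buffer.poolSize of 8KB:
const a = Buffer.allocUnsafe(1024);
const b = Buffer.allocUnsafe(1024);
console.log(a.buffer === b.buffer);   // usually true: both sliced from the shared pool slab
const big = Buffer.allocUnsafe(16 * 1024);
console.log(big.buffer === a.buffer); // false: larger than poolSize >>> 1, own allocation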
