V8/Node.js increase max allowed String length - node.js

AFAIK V8 has a known hard limit on the length of allowed strings. Trying to parse >500MB strings pops the error:
RangeError: Invalid string length
Using V8 flags to increase the heap size doesn't make any difference:
$ node --max_old_space_size=5000 process-large-string.js
I know that I should be using streams instead. However, is there any way to increase the maximum allowed string length anyway?
Update: Paul Irish's answer below indicates V8 upped it to 1GB - but it's still not user-configurable.

In summer 2017, V8 increased the maximum size of strings from ~256MB to ~1GB: specifically, from 2^28 - 16 to 2^30 - 25 characters on 64-bit platforms (see the V8 tracking ticket).
This change landed in:
V8: 6.2.100
Chromium: 62.0.3167.0
Node.js: 9.0.0
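
You can observe the limit empirically. Here is a rough sketch that doubles a string until V8 gives up; it is cheap on memory because V8 represents the concatenations as rope-like cons strings, and the exact ceiling it reports depends on your V8/Node version:

let s = 'a';
try {
  while (true) {
    s = s + s; // length doubles each iteration
  }
} catch (e) {
  console.log(e.message); // "Invalid string length"
  console.log(s.length);  // the largest doubled length that still fit
}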

Sorry, no, there is no way to increase the maximum allowed String length.
It is hard-coded in the source, and a lot of code implicitly relies on it, so while allowing larger strings is known to be on people's wishlist, it is going to be a lot of work and won't happen in the near future.
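Since the limit itself can't be raised, the usual workaround is the streaming approach the question already mentions: process the data in chunks so no single string ever has to hold the whole payload. A minimal sketch (the file name is hypothetical):

const fs = require('fs');

// Process a large file chunk by chunk instead of building one giant string.
let bytes = 0;
fs.createReadStream('large-input.json')
  .on('data', (chunk) => {
    bytes += chunk.length; // handle each chunk incrementally here
  })
  .on('end', () => {
    console.log(`read ${bytes} bytes without materializing one big string`);
  });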

Related

What is the max size of a string type in Terraform

I'm trying to locate a definitive answer to: "What is the max size of a Terraform value of type 'string'?"
I've been searching and googling and can't seem to find it defined anywhere. Does anyone have a reference they could point me to?
TIA,
Bill W
The length of strings in Terraform is constrained in two main ways:
Terraform internally tracks the length of the string, which is stored as an integer type which has a limited range.
Strings need to exist in system memory as a consecutive sequence of bytes.
The first of these is directly answerable: Terraform tracks the length of a string using an integer type large enough to represent all pointers on the host platform. From a practical standpoint, that means a 64-bit integer when you're using a 64-bit build, and a 32-bit integer when you're using a 32-bit build.
That means that there's a hard upper limit imposed by the maximum value of that integer. Terraform is internally tracking the length of the UTF-8 representation of the string in bytes, and so this upper limit is measured in bytes rather than in characters:
32-bit systems: 4,294,967,295 bytes
64-bit systems: 18,446,744,073,709,551,615 bytes
Terraform stores strings in memory using Unicode NFC normal form, UTF-8 encoded, and so the number of characters will vary depending on how many bytes each character takes up in the UTF-8 encoding form. ASCII characters take only one byte, but other characters can require up to four bytes.
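To illustrate the byte-versus-character distinction, here's a small Node.js sketch (the same UTF-8 rules apply to Terraform's internal representation):

// Byte counts diverge from character counts once you leave ASCII.
console.log(Buffer.byteLength('hello')); // 5 -- ASCII, 1 byte per character
console.log(Buffer.byteLength('héllo')); // 6 -- 'é' takes 2 bytes in UTF-8
console.log(Buffer.byteLength('😀'));    // 4 -- this emoji takes 4 bytes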
A string of the maximum representable length would take up the entire address space of the Terraform process, which is impossible (there needs to be room for the Terraform application code, libraries, and kernel space too!), and so in practice the available memory on your system is the more relevant limit. That limit varies based on various characteristics of the system where you're running Terraform, and so isn't answerable in a general sense.

In 2018 a Tech Lead at Google said they were working to "support buffers way beyond 4GiB" in V8 on 64 bit systems. Did that happen?

Trying to load a large file into a buffer like:
const fs = require('fs');
const fileBuffer = fs.readFileSync(csvPath); // csvPath points at a ~3.4 GB CSV file
in Node v12.16.1 and getting the error:
RangeError [ERR_FS_FILE_TOO_LARGE]: File size (3461193224) is greater than possible Buffer: 2147483647 bytes.
and in Node v14.12.0 (latest) and getting the error:
RangeError [ERR_FS_FILE_TOO_LARGE]: File size (3461193224) is greater than 2 GB
This looks to me like a limit set due to 32-bit integers being used for addressing of the buffers. But I don't understand why this would be a limitation on 64-bit systems... Yes, I realize I can use streams or read from the file at a specific position, but I have massive amounts of memory lying around, and I'm limited to 2147483647 bytes because Node is limited to 32-bit addressing?
Surely having a high-frequency random-access data set fully loaded into a buffer, rather than streamed, has performance benefits. The code involved in directing each request to the right chunk of a multiple-buffer alternative structure is going to cost something, regardless of how small...
I can use the --max-old-space-size=16000 flag to increase the maximum memory used by Node, but I suspect this is a hard limit based on the architecture of V8. However, I still have to ask, since the Tech Lead at Google did claim they were increasing the maximum buffer size past 4GiB: is there any way in 2020 to have a buffer beyond 2147483647 bytes in Node.js?
Edit: here is the relevant tracker on the topic from Google, where they have apparently been working on this since at least the previous year: https://bugs.chromium.org/p/v8/issues/detail?id=4153
Did that happen?
Yes, V8 supports very large (many gigabytes) ArrayBuffers nowadays.
Is there any way to have a buffer beyond 2147483647 bytes in Node.js?
Yes:
$ node
Welcome to Node.js v14.12.0.
Type ".help" for more information.
> let b = Buffer.alloc(3461193224)
undefined
> b.length
3461193224
That said, it appears that fs.promises.readFile has its own limit: https://github.com/nodejs/node/blob/master/lib/internal/fs/promises.js#L5
I have no idea what it would take to lift that. I suggest you file an issue on Node's bug tracker.
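In the meantime, one way around that fs cap is to allocate the large Buffer yourself and fill it with repeated bounded reads. A sketch, assuming a 64-bit build, enough free memory, and a file that fits under buffer.constants.MAX_LENGTH (the path is hypothetical):

const fs = require('fs');

// Read a whole file into one large Buffer in chunks, sidestepping the
// 2 GB cap that fs.readFile / fs.readFileSync impose on a single call.
function readWholeFile(path) {
  const fd = fs.openSync(path, 'r');
  const size = fs.fstatSync(fd).size;
  const buf = Buffer.alloc(size); // allocating beyond 2 GB works (see above)
  let offset = 0;
  while (offset < size) {
    // Cap each syscall at 1 GiB so the length fits comfortably in 32 bits.
    const length = Math.min(size - offset, 2 ** 30);
    const bytesRead = fs.readSync(fd, buf, offset, length, offset);
    if (bytesRead === 0) break; // unexpected end of file
    offset += bytesRead;
  }
  fs.closeSync(fd);
  return buf;
}

const fileBuffer = readWholeFile('huge.csv');
console.log(fileBuffer.length);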
FWIW, Buffer has yet another limit:
> let buffer = require("buffer")
undefined
> buffer.kMaxLength
4294967295
And again, that's Node's decision, not V8's.

How to write 64-bit BigInt to Buffer?

Is it possible to write 64-bit BigInts into a Buffer in Node.js (10.7+) yet?
Or do I still have to do it in two operations?
let time = 1234567890123456789n; // example 64-bit value
let buf = Buffer.allocUnsafe(16);
buf.writeUInt32BE(Number(time >> 32n), 0);        // high 32 bits
buf.writeUInt32BE(Number(time & 4294967295n), 4); // low 32 bits
I can't find anything promising in the docs, but there are other barely documented methods such as BigInt.asUintN, so I thought I'd ask.
I was just faced with a similar problem (needing to build and write 64-bit IDs consisting of a 41-bit timestamp, 13-bit node ID, and a 10-bit counter). The largest single value I was able to write to a buffer was 48-bit using buf.writeIntLE(). So I ended up building up / writing the high 48 bits, and low 16 bits independently. If there's a better way to do it, I'm not aware of it.
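For reference, here's a sketch of that high/low split for an unsigned 64-bit value (buf.writeUIntBE() and friends top out at 6 bytes, i.e. 48 bits, per call):

const id = 0x0123456789ABCDEFn; // example unsigned 64-bit BigInt
const buf = Buffer.allocUnsafe(8);
// High 48 bits, written as a 6-byte big-endian unsigned integer.
buf.writeUIntBE(Number(id >> 16n), 0, 6);
// Low 16 bits, masked off and written as a 2-byte unsigned integer.
buf.writeUIntBE(Number(id & 0xFFFFn), 6, 2);
console.log(buf.toString('hex')); // '0123456789abcdef'

For what it's worth, Node.js 12.0.0 later added buf.writeBigUInt64BE() and buf.writeBigUInt64LE(), which handle this in a single call.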
Did you already try this package?
https://github.com/substack/node-bigint#tobufferopts

How can I get maximum Buffer size in Node.js

In Node.js 0.12.x the maximum size of a Buffer was limited by allocatable memory, whose size could be obtained with:
require('smalloc').kMaxLength;
The actual value of kMaxLength was hardcoded in old versions of V8 and was equal to 0x3fffffff.
The problem is there is no smalloc module in io.js >= 3.x (including Node.js 4.x). It was mentioned that the Buffer implementation was rewritten on top of V8 4.4.x.
So, my question is: is there a way to get the maximum size of the Buffer (and/or allocatable memory) in io.js >= 3.x ?
This file "calculated" (https://github.com/v8/v8-git-mirror/blob/4.4.63/src/objects.h) also has fixed constant for the external array.
4642 // Maximal acceptable length for an external array.
4643 static const int kMaxLength = 0x3fffffff;
EDIT:
It looks like you can use
require('buffer').kMaxLength;
That change landed in io.js 3.0 and is still present in 4.0:
b625ab4242 - buffer: fix usage of kMaxLength (Trevor Norris) #2003

What's the maximum size of a Node.js Buffer

I got a fatal error reading a file that was too big to fit in a buffer.
FATAL ERROR: v8::Object::SetIndexedPropertiesToExternalArrayData() length exceeds max acceptable value
Or,
RangeError: "size" argument must not be larger than 2147483647
at Function.Buffer.allocUnsafe (buffer.js:209:3)
If I try to allocate a 1GB Buffer I get the same fatal error:
var oneGigInBytes = 1073741824;
var my1GBuffer = new Buffer(oneGigInBytes); //Crash
What is the maximum size of a Node.js Buffer class instance?
The maximum length of a typed array in V8 is currently set to kSmiMaxValue, which depending on the platform is either:
1 GB minus 1 byte on 32-bit platforms
2 GB minus 1 byte on 64-bit platforms
Relevant constant in the code is v8::internal::JSTypedArray::kMaxLength (source).
The V8 team is working on increasing this even further on 64-bit platforms, where ArrayBuffer objects can currently be up to Number.MAX_SAFE_INTEGER (2^53 - 1) bytes. See bug 4153.
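
You can see the gap between the raw ArrayBuffer limit and Node's Buffer cap directly; a sketch, assuming a recent Node.js on a 64-bit machine with enough free memory:

// A plain ArrayBuffer can exceed the historical ~2 GB typed-array cap.
const ab = new ArrayBuffer(5 * 2 ** 30); // 5 GiB
console.log(ab.byteLength); // 5368709120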
This is now documented as part of Node's Buffer API; the maximum size is buffer.constants.MAX_LENGTH.
buffer.constants.MAX_LENGTH <integer> The largest size allowed for a single Buffer instance.
On 32-bit architectures, this value is (2^30)-1 (~1GB).
On 64-bit architectures, this value is (2^31)-1 (~2GB).
This value is also available as buffer.kMaxLength.
So you can figure out how big it is by doing
> (require('buffer').constants.MAX_LENGTH + 1) / 2**30
2
Seems like the current max buffer size is 2147483647 bytes, a.k.a. ~2.147 GB.
Source: https://stackoverflow.com/a/44994896/3973137 (and my own code)
The actual maximum size of a buffer changes across platforms and versions of Node.js. To find out the limit in bytes on a given platform, run this:
import buffer from "buffer";
console.log(buffer.constants.MAX_LENGTH);
