What's the difference between Buffer.allocUnsafe() and Buffer.allocUnsafeSlow() in NodeJS?

I've read the docs on the subject.
They say:
When using Buffer.allocUnsafe() to allocate new Buffer instances,
allocations under 4KB are sliced from a single pre-allocated Buffer.
This allows applications to avoid the garbage collection overhead of
creating many individually allocated Buffer instances. This approach
improves both performance and memory usage by eliminating the need to
track and clean up as many individual ArrayBuffer objects.
However, in the case where a developer may need to retain a small
chunk of memory from a pool for an indeterminate amount of time, it
may be appropriate to create an un-pooled Buffer instance using
Buffer.allocUnsafeSlow() and then copying out the relevant bits.
Also I've found this explanation:
Buffer.allocUnsafeSlow() is different from the Buffer.allocUnsafe() method. With allocUnsafe(), if the requested size is less than 4 KB, the buffer is automatically sliced out of a pre-allocated pool, i.e. it does not allocate a new buffer. This saves memory by not allocating many small Buffer instances. But if a developer needs to hold on to a small chunk of memory for an indeterminate amount of time, the allocUnsafeSlow() method can be used instead.
I still find this hard to understand. Could you explain it in more detail, with examples of both, please?
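To make the pooled vs. un-pooled distinction concrete, here is a minimal sketch (it assumes Node's default 8 KB pool, under which allocations below 4 KB are pooled; the exact offsets will vary between runs):

const a = Buffer.allocUnsafe(16);     // sliced from the shared internal pool
const b = Buffer.allocUnsafe(16);     // usually sliced from the same pool
const c = Buffer.allocUnsafeSlow(16); // always gets its own ArrayBuffer

// Pooled buffers share one underlying ArrayBuffer at different offsets.
console.log(a.buffer === b.buffer);      // true (while both fit in the current pool)
console.log(a.byteOffset, b.byteOffset); // e.g. 0 16

// The un-pooled buffer owns its memory outright.
console.log(c.buffer === a.buffer);      // false

// Why this matters: a small pooled slice that stays alive keeps the whole
// pool's ArrayBuffer from being garbage collected. To retain a small chunk
// for an indeterminate time, copy it into an un-pooled buffer and let the
// pooled slice go:
const retained = Buffer.allocUnsafeSlow(16);
a.copy(retained);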

Related

Is there an auto-enlarging shared memory in Linux?

I'm sharing many memory blocks between two processes in Linux: one process receives data from the network, and the other uses the data in those blocks (buffers).
Some buffers are small (hundreds of bytes) and some are very large (2–4 GB), but they all have the same structure (an array of a struct) and are processed by the same algorithm.
I cannot allocate all buffers at their maximum size from the start, because that would far exceed the system's total memory.
I have to periodically check each one and re-allocate it as needed.
The problem is that I have to make the client process re-mmap a block after the server process enlarges (re-allocates) the buffer under the same name.
Regarding performance: is there a way I can allocate a buffer with some kind of "lazy allocation"? That is, it would occupy very little physical memory at first but always sit at the same virtual address, even as I continuously write data and it comes to occupy more and more physical memory, so the client would never need to re-mmap the buffer and could always access the data at a fixed virtual address in its own address space.
If so, I wouldn't need to do a lot of IPC/locking/synchronization work.
If so, should I set up a very large swap space?

External memory vs. heap memory in node.js - any tradeoffs one way or the other?

In node.js, what is the difference between memory allocated in the node.js heap vs. memory reported as "external" in process.memoryUsage()? Are there advantages or disadvantages to one over the other? If you were going to have a particularly large object (gigabytes), does it matter which one is used? The doc only says this:
external refers to the memory usage of C++ objects bound to JavaScript
objects managed by V8.
It doesn't tell you what the consequences or tradeoffs might be of making allocations that use this "external" memory instead of heap memory.
I'm examining the tradeoffs between using a large Array (a regular JavaScript Array object) vs. a large Uint32Array object. Apparently, an Array is always allocated in the heap, but a Uint32Array is always allocated from "external" memory (not in the heap).
Here's the background. I have an object that needs a fairly large array of 32-bit integers as part of what it does. This array can be as much as a billion units long. I realized that I could cut the storage occupied by the array in half if I switched to a Uint32Array instead of a regular Array, because a regular Array stores numbers as double-precision floats (64 bits each), whereas a Uint32Array stores 32-bit values.
In a test jig, I measured that the Uint32Array does indeed use half the total storage for the same number of units (32 bits per unit instead of 64). But, looking at the results of process.memoryUsage(), I noticed that the Uint32Array storage shows up in the "external" bucket, whereas the Array shows up in the "heapUsed" bucket.
To add a bit more context: because a Uint32Array is not resizable, when I do need to change its size I have to allocate a new Uint32Array and then copy the data over from the original (which I suppose could create more opportunities for memory fragmentation; see the sketch below). With an Array, you can just change the size and the JS engine takes care of whatever has to happen internally to adjust to the new size.
What are the advantages/disadvantages of making a large storage allocation that's "external" vs. in the "heap"? Any reason to care?
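To illustrate the grow-by-copy pattern and where the bytes land, here is a minimal sketch (the sizes are arbitrary, and the exact deltas reported by process.memoryUsage() will vary by Node version and platform):

function grow(u32, newLength) {
  // Typed arrays are fixed-size, so growing means allocate + copy.
  const bigger = new Uint32Array(newLength);
  bigger.set(u32); // copy the old contents into the new backing store
  return bigger;
}

const before = process.memoryUsage();
let arr = new Uint32Array(10_000_000); // ~40 MB backing store
arr = grow(arr, 20_000_000);           // ~80 MB; the old store becomes garbage
const after = process.memoryUsage();

// The ArrayBuffer backing stores are counted outside the V8 heap,
// so the growth shows up in "external" rather than "heapUsed".
console.log('heapUsed delta (MB):', (after.heapUsed - before.heapUsed) / 2 ** 20);
console.log('external delta (MB):', (after.external - before.external) / 2 ** 20);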

What is the case of using Buffer.allocUnsafe() and Buffer.alloc()?

I am confused about when to use Buffer.allocUnsafe() vs. Buffer.alloc(). I know that Buffer.allocUnsafe() creates a buffer that may contain old data from previous allocations, but why would I need such a thing if Buffer.alloc() creates a buffer with zero-filled data?
In Node.js a Buffer is an abstraction over raw memory, so if you allocate one in an unsafe way there is a real risk of finding leftover data, even fragments of source code, in the buffer instance. Try running console.log(Buffer.allocUnsafe(10000).toString('utf-8')) and you will very likely see remnants of previously used memory in your stdout.
Allocation is a synchronous operation, and we know that single-threaded Node.js doesn't cope well with synchronous work. Unsafe allocation is much faster than safe allocation, because the buffer sanitization (zero-filling) step takes time. Safe allocation is, well, safe, but there is a performance trade-off.
I'd suggest sticking to safe allocation first; if you end up with low performance, you can then think about ways to use unsafe allocation without exposing private data. Just keep in mind that the allocUnsafe method has the word "unsafe" in it for a reason. For example, if you are going for a compliance certification like PCI DSS, I'm fairly sure the QSA will notice it and have a lot of questions.
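For instance, here is a minimal sketch of the overwrite-before-use pattern that makes allocUnsafe() acceptable (the fill values are arbitrary, for illustration only):

// Safe by default: every byte is zeroed before you can read it.
const safe = Buffer.alloc(1024);

// Faster, but the contents are whatever happened to be in that memory.
const unsafe = Buffer.allocUnsafe(1024);

// allocUnsafe() is only OK if you overwrite every byte before reading
// or exposing the buffer, e.g. by filling it completely...
unsafe.fill(0);

// ...or by writing over its whole length from a known source:
const word = Buffer.allocUnsafe(4);
word.writeUInt32BE(0xdeadbeef, 0); // all 4 bytes are now defined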
Buffer.alloc(size, fill, encoding) -> returns a new initialized Buffer
of the specified size. This method is slower than Buffer.allocUnsafe(size) but guarantees that newly created Buffer instances never contain old data that is potentially sensitive.
Buffer.allocUnsafe(size) -> the Buffer is uninitialized, the allocated
segment of memory might contain old data that is potentially
sensitive. Using a Buffer created by Buffer.allocUnsafe() without completely overwriting the memory can allow this old data to be leaked when the Buffer memory is read.
Note: While there are clear performance advantages to using Buffer.allocUnsafe(), extra care must be taken in order to avoid introducing security vulnerabilities into an application.
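A quick illustration of the two calls (the allocUnsafe() output is unpredictable by design):

console.log(Buffer.alloc(8));       // <Buffer 00 00 00 00 00 00 00 00>
console.log(Buffer.alloc(8, 'a'));  // <Buffer 61 61 61 61 61 61 61 61>
console.log(Buffer.allocUnsafe(8)); // whatever bytes were left in that memory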

1GB Vector, will Vector.Unboxed give trouble, will Vector.Storable give trouble?

We need to store a large block (1 GB) of contiguous bytes in memory for long periods of time (weeks to months), and are trying to choose a Vector/Array library. I have two concerns that I can't find the answer to.
Vector.Unboxed seems to store the underlying bytes on the Haskell heap, where the GC can move them around at will. Periodically moving 1 GB of data is something I would like to avoid.
Vector.Storable solves this problem by storing the underlying bytes on the C heap. But everything I've read seems to indicate that this is really only to be used for communicating with other languages (primarily C). Is there some reason I should avoid using Vector.Storable for internal Haskell usage?
I'm open to a third option if it makes sense!
My first thought was the mmap package, which allows you to "memory-map" a file into memory, using the virtual memory system to manage paging. I don't know if this is appropriate for your use case (in particular, I don't know whether you're loading or computing this 1 GB of data), but it may be worth looking at.
In particular, I think this prevents the GC from moving the data around (since it's not on the Haskell heap, it's managed by the OS virtual memory subsystem). On the other hand, this interface handles only raw bytes; you couldn't have, say, an array of Customer objects or something.

Linux async (io_submit) write v/s normal (buffered) write

Since writes are immediate anyway (copy to kernel buffer and return), what's the advantage of using io_submit for writes?
In fact, it (aio/io_submit) seems worse since you have to allocate the write buffers on the heap and can't use stack-based buffers.
My question is only about writes, not reads.
EDIT: I am talking about relatively small writes (a few KB at most), not MB or GB, so the buffer copy should not be a big problem.
Copying a buffer into the kernel is not necessarily instantaneous.
First the kernel needs to find a free page. If there is none (which is fairly likely under heavy disk-write pressure), it has to decide to evict one. If it decides to evict a dirty page (instead of, for instance, evicting one of your process's pages), it will have to actually write that page out before it can reuse it.
There's a related issue in Linux when saturating a slow drive with writes: the page cache fills up with dirty pages backed by the slow drive. Whenever the kernel needs a page, for any reason, it can take a long time to acquire one, and the whole system freezes as a result.
The size of each individual write is less relevant than the overall write pressure on the system. If you have a million small writes already queued up, yours may be the one that has to block.
Whether the allocation lives on the stack or the heap is also less relevant. If you want efficient allocation of blocks to write, you can use a dedicated pool allocator (backed by the heap) and not pay for the general-purpose heap allocator.
aio_write() gets around this by not copying the buffer into the kernel at all; it may even be DMA'd straight out of your buffer (given the alignment requirements), which means you're likely to save a copy as well.
