Just wondering what's optimal here, as no suggestions are made in the NDK docs. I'm fairly sure that although bufSize is of type size_t, specifying SIZE_MAX might not be a wise choice!
If your file is small (that's subjective, but for me small is under 512 KB), you can do:
AAsset* file = AAssetManager_open(assetManager, "your/file.ext", AASSET_MODE_BUFFER);
size_t fileLength = AAsset_getLength(file);
char* fileContent = new char[fileLength];
AAsset_read(file, fileContent, fileLength);
AAsset_close(file);  // don't forget to release the asset when done
This actually works for much larger files too, until the memory allocation fails.
If you plan on loading huge files, I would read in chunks of 512 KB, but again, that's subjective; there is no hard limit (until the memory allocation fails). See the sketch below.
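For illustration, here is what such a chunked read could look like. This is a minimal sketch; the helper name readAssetChunked and the 512 KiB chunk size are my own choices, not anything mandated by the NDK:

#include <android/asset_manager.h>
#include <vector>

// Read an asset in fixed-size chunks instead of one huge AAsset_read call.
std::vector<char> readAssetChunked(AAssetManager* mgr, const char* path) {
    constexpr size_t kChunk = 512 * 1024;  // 512 KiB, an arbitrary choice
    std::vector<char> content;

    AAsset* file = AAssetManager_open(mgr, path, AASSET_MODE_STREAMING);
    if (!file) return content;  // asset not found

    std::vector<char> chunk(kChunk);
    int n;
    while ((n = AAsset_read(file, chunk.data(), chunk.size())) > 0)
        content.insert(content.end(), chunk.begin(), chunk.begin() + n);

    AAsset_close(file);
    return content;  // n < 0 would indicate a read error
}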
In 2018 a Tech Lead at Google said they were working to "support buffers way beyond 4GiB" in V8 on 64-bit systems. Did that happen?
Trying to load a large file into a buffer like:
const fileBuffer = fs.readFileSync(csvPath);
in Node v12.16.1 and getting the error:
RangeError [ERR_FS_FILE_TOO_LARGE]: File size (3461193224) is greater than possible Buffer: 2147483647 bytes.
and in Node v14.12.0 (latest) and getting the error:
RangeError [ERR_FS_FILE_TOO_LARGE]: File size (3461193224) is greater than 2 GB
This looks to me like a limit due to 32-bit integers being used for buffer addressing. But I don't understand why this would be a limitation on 64-bit systems... Yes, I realize I can use streams or read from the file at a specific offset, but I have massive amounts of memory lying around, and I'm limited to 2147483647 bytes because Node is limited to 32-bit addressing?
Surely having a high-frequency random-access data set fully loaded into a buffer, rather than streamed, has performance benefits. The code involved in directing requests to some multi-buffer alternative structure is going to cost something, however small...
I can use the --max-old-space-size=16000 flag to increase the maximum memory used by Node, but I suspect this is a hard limit based on the architecture of V8. However, I still have to ask, since the tech lead at Google did claim they were increasing the maximum buffer size past 4 GiB: is there any way in 2020 to have a buffer beyond 2147483647 bytes in Node.js?
Edit: relevant tracker on the topic from Google, where apparently they had been working on fixing this since at least the previous year: https://bugs.chromium.org/p/v8/issues/detail?id=4153
Did that happen?
Yes, V8 supports very large (many gigabytes) ArrayBuffers nowadays.
Is there any way to have a buffer beyond 2147483647 bytes in Node.js?
Yes:
$ node
Welcome to Node.js v14.12.0.
Type ".help" for more information.
> let b = Buffer.alloc(3461193224)
undefined
> b.length
3461193224
That said, it appears that fs.promises.readFile has its own limit: https://github.com/nodejs/node/blob/master/lib/internal/fs/promises.js#L5
I have no idea what it would take to lift that. I suggest you file an issue on Node's bug tracker.
FWIW, Buffer has yet another limit:
> let buffer = require("buffer")
undefined
> buffer.kMaxLength
4294967295
And again, that's Node's decision, not V8's.
I have a rather large file (32 GB) which is an image of an SD card, created using dd.
I suspected that the file is empty (i.e. filled with the null byte \x00) starting from a certain point.
I checked this using python in the following way (where f is an open file handle with the cursor at the last position I could find data at):
for i in xrange(512):
    if set(f.read(64*1048576)) != set(['\x00']):
        print i
        break
This worked well (in fact it revealed some data at the very end of the image), but took >9 minutes.
Has anyone got a better way to do this? There must be a much faster way, I'm sure, but I cannot think of one.
Looking at a guide about memory buffers in Python here, I suspected that the comparison itself was the issue. In most non-typed languages, memory copies are not very obvious despite being a killer for performance.
In this case, as Oded R. established, creating a buffer with readinto and comparing the result with a previously prepared nul-filled one is much more efficient.
size = 512
data = bytearray(size)  # reusable read buffer
cmp = bytearray(size)   # reference buffer; a new bytearray starts out all nul bytes
And when reading:
f = open(FILENAME, 'rb')
f.readinto(data)   # fills the preallocated buffer in place; afterwards data == cmp is a fast all-zero check
Two things need to be taken into account:
The sizes of the compared buffers must be equal, and comparing bigger buffers should be faster up to a point (I would expect memory fragmentation to be the main limit).
The last chunk may be smaller than the buffer size; reading the file into the prepared buffer keeps the trailing zeroes where we want them.
Here the comparison of the two buffers is quick, there are no attempts to cast the bytes to strings (which we don't need), and since we reuse the same memory all the time, the garbage collector won't have much work to do either... :)
Assume we have a file of FILE_SIZE bytes, and:
FILE_SIZE <= min(page_size, physical_block_size);
file size never changes (i.e. truncate() or append write() are never performed);
file is modified only by completely overwriting its contents using:
pwrite(fd, buf, FILE_SIZE, 0);
Is it guaranteed on ext4 that:
Such writes are atomic with respect to concurrent reads?
Such writes are transactional with respect to a system crash?
(i.e., after a crash the file's contents are entirely from some previous write and we'll never see a partial write or an empty file)
Is the second true:
with data=ordered?
with data=journal or alternatively with journaling enabled for a single file?
(using ioctl(fd, EXT4_IOC_SETFLAGS, EXT4_JOURNAL_DATA_FL))
when physical_block_size < FILE_SIZE <= page_size?
I've found a related question which links to a discussion from 2011. However:
I didn't find an explicit answer to my question 2.
I wonder, if the above is true, is it documented somewhere?
From my experiment, it was not atomic.
Basically, my experiment was to have two processes, one writer and one reader. The writer writes to a file in a loop and the reader reads from the file.
Writer Process:
char buf[][18] = {
    "xxxxxxxxxxxxxxxx",   /* 16 characters + NUL padding in an 18-byte slot */
    "yyyyyyyyyyyyyyyy"
};
int i = 0;
while (1) {
    pwrite(fd, buf[i], 18, 0);   /* fd: shared file, opened beforehand; always overwrite at offset 0 */
    i = (i + 1) % 2;
}
Reader Process:
char readbuf[18];
while (1) {
    pread(fd, readbuf, 18, 0);
    /* check whether readbuf matches buf[0] or buf[1] exactly */
}
After a while of running both processes, I could see that readbuf was either xxxxxxxxxxxxxxxxyy or yyyyyyyyyyyyyyyyxx.
So it definitively shows that the 18-byte writes are not atomic; in my case, only 16-byte writes were always atomic.
The answer was: POSIX doesn't mandate atomicity for writes/reads except for pipes. The 16-byte atomicity that I saw is kernel-specific and may change in the future.
Details of the answer in the actual post:
write(2)/read(2) atomicity between processes in linux
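Given that, if you need the crash-transactional behavior from question 2 without relying on filesystem-specific guarantees, the usual approach is to write a temporary file and rename(2) it over the original, since rename within one filesystem is atomic on POSIX. A minimal sketch; the function name and error handling are my own:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Replace `path` with new contents so readers see either the old or the
 * new version, never a mix. `tmp` must be on the same filesystem as `path`. */
int atomic_replace(const char* path, const char* tmp,
                   const void* buf, size_t len) {
    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd == -1) return -1;
    if (write(fd, buf, len) != (ssize_t)len) { close(fd); return -1; }
    if (fsync(fd) == -1) { close(fd); return -1; }  /* flush data before the rename */
    close(fd);
    return rename(tmp, path);  /* atomic within a single filesystem */
}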
I am familiar with filesystem theory in general, not with the implementation of ext4, so take this as an educated guess.
Yes, I believe single-sector reads and writes will be atomic, because:
The link you provided quotes: "Currently concurrent reads/writes are atomic only wrt individual pages, however are not on the system call."
Disk sector (512 byte) writes are atomic according to Stephen Tweedie. In a private email conversation with him, he acknowledged that this guarantee is only as good as the hardware.
Ext filesystems overwrite data in place, with no copy-on-write and no reallocation.
There is some effort to implement inline data: a very small file's data can fit in the inode itself. If you only need to store a few bytes, that may have an impact.
I'm not sure about one page, but it would make little sense in full journaling mode to send less than a page to the journal before committing.
I have an old application which uses CString throughout the code.
The maximum size of a string written to a CString is 8 or 9 characters, but I noticed that it allocates more (at least 128 bytes per CString).
Is there a way to limit the size of the CString buffer, for example to 64 bytes?
Thanks in advance,
No.
In detail:
The CString implementation is internal. You can find the code in CSimpleStringT::PrepareWrite2 and in the Reallocate function of the string manager.
PrepareWrite2 allocates the buffer. If there was no buffer before, it requests the exact size. If an existing buffer has to grow, the new size is newLength*1.5.
The request is then passed to the Reallocate function of the string manager, which finally hands the size to the CRT function realloc.
Keep in mind that the memory manager itself decides again what block size is "effective" and might change the size again.
So as far as I can see (in VS 2013/VS 2010), you have no way to change the block size. The job is ultimately done by realloc, and even that function passes its request on to HeapAlloc...
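That said, if the goal is merely to see or influence what gets allocated, CSimpleStringT::Preallocate lets you request a capacity up front, and GetAllocLength reports what was actually granted. A small sketch (the sizes here are just examples, and the heap may still round the request up):

#include <atlstr.h>
#include <tchar.h>

CString s;
s.Preallocate(16);      // ask for a small buffer before assigning
s = _T("12345678");     // 8 characters, as in the question
// GetAllocLength reports the capacity the string manager actually chose.
_tprintf(_T("len=%d alloc=%d\n"), s.GetLength(), s.GetAllocLength());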
Assume I want to cache certain calculations, but having the cache synced out to disk would incur an I/O penalty that would more than defeat the whole purpose of caching.
This means I need to be able to find out how much physical RAM is left (including cached memory, assuming I can push that out, and allowing for some slack should buffering increase). I have looked into /proc/meminfo and know how to read it. I am just not sure how to combine the numbers to get what I want. Code is not necessary; once I know what I need, I can write it myself.
I will not have root on the box it needs to run on, but the box should be reasonably quiet otherwise: no large amounts of disk I/O, no other processes claiming a lot of memory in bursts. The OS is a fairly recent Linux with overcommit turned on. This will need to work without triggering the OOM killer, obviously.
The numbers don't need to be exact down to the megabyte. I assume it'll be roughly in the 1 to 7 GiB range depending on the box, but getting within about 100 MB would be great.
It'd definitely be preferable if the estimate were to err on the smallish side.
Unices have the standard sysconf() function (OpenGroups man page, Linux man page).
Using this function, you can get the amount of available physical memory:

#include <unistd.h>

unsigned long long ps = sysconf(_SC_PAGESIZE);      // page size in bytes
unsigned long long pn = sysconf(_SC_AVPHYS_PAGES);  // currently free pages; excludes reclaimable page cache, so it errs on the small side
unsigned long long availMem = ps * pn;
As an alternative to the answer of H2CO3, you can read from /proc/meminfo.
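For the asker's purpose, the MemAvailable field (present since Linux 3.14) is the kernel's own estimate of memory available without swapping, and it already accounts for reclaimable page cache. A minimal sketch; the function name is my own:

#include <stdio.h>

// Returns the kernel's estimate of available memory in bytes, or 0 on failure.
unsigned long long availableBytes() {
    FILE* f = fopen("/proc/meminfo", "r");
    if (!f) return 0;
    char line[256];
    unsigned long long kib = 0;
    while (fgets(line, sizeof line, f)) {
        if (sscanf(line, "MemAvailable: %llu kB", &kib) == 1)
            break;  // value is reported in kibibytes
    }
    fclose(f);
    return kib * 1024;
}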
For me, statfs worked well.
#include <sys/vfs.h>

struct statfs buf;
unsigned long long available_mem;

if (statfs("/", &buf) == -1)
    available_mem = 0;
else
    available_mem = (unsigned long long)buf.f_bsize * buf.f_bfree;  /* free bytes on the root filesystem */