How can DWORD ever hold values more than 32 bits?

How can DWORD ever hold values more than 32 bits? - lotus-notes

The NSFDbSpaceUsage function in the Lotus Notes C API is defined as:
STATUS LNPUBLIC NSFDbSpaceUsage(
DBHANDLE hDB,
DWORD far *retAllocatedBytes,
DWORD far *retFreeByes);
This function returns the number of bytes allocated and the number of bytes free in a specified database.
Reading SO and elsewhere, I understand that DWORD is associated with unsigned long, which is (usually) 32 bits. What puzzles me is how the above function will ever return the size of a Domino database that is more than 2^32 bytes in size. And in fact, my sample application never returns anything greater than 2,147,483,647 (2^31) for some of my larger databases. An NSF file in Domino can grow to 64 GB, so why would IBM use a DWORD to report the number of bytes allocated, when a DWORD can't represent more than 4,294,967,296 (2^32) bytes?
What am I missing?

I'm guessing this API method was created back when 4 GBs was larger than anyone could ever imagine ;)
According to the comments here, this method is limited to only 4 GBs and there is another method to use: NSFDbSpaceUsageScaled

Related

Should we store long strings on stack in assembly?

The general way to store strings in NASM is to use db like in msg: db 'hello world', 0xA. I think this stores the string in the bss section. So the string will occupy the storage for the duration of the entire program. Instead, if we store it in the stack, it will be alive only during the local frame. For small strings (less than 8 bytes), this can be done using mov dword [rsp], 'foo'. But for longer strings, the string has to be split and be stored using multiple instructions. So this would increase the executable size (I thought so).
So now, which is better in large programs with multiple strings? Are any of the assumptions I made above, wrong?

mov dword [rsp] 'foo' assembles to C70424666F6F00, it takes 7 bytes to encode 4 payload characters.
In comparison with standard static way DB 'foo',0 the definition of string as immediate operand in code section increases the code size by 75 %.
Your dynamic method may be profitable only if you could eliminate the .rodata or .data section entirely (which is seldom the case of large programs). Each additional section takes more space in executable program than its netto contents because of its file-alignment (in PE format it is 512 bytes).
And even when your program has no other static data in data sections beside long strings, you could spare more space with static declaration in .text (code) section.

But for longer strings string has to be split and be stored using multiple instructions. So this would increase the executable size (I thought so).
Yep, and in almost all cases, the number of bytes used by those instructions will exceed the number of bytes that would be needed to just store the string in memory normally. (The instruction includes all the bytes of the immediate, with a few exceptions like zero- and sign-extension, as well as additional bytes to encode the opcode, destination address, etc). And code, of course, also occupies (virtual) memory for the entire duration of the program. There's no free lunch.
As such, you should just assemble the strings directly into memory with db as you have done. But try to arrange for them to go in a read-only section such as .text or .rodata depending on what your system supports. A modern operating system will then be able to discard those pages from physical memory if there is memory pressure, knowing that they can be reloaded from the executable if needed again. This is entirely transparent to your program. If there are many strings that will be needed at the same times, you can optimize a little by trying to arrange for them to be adjacent in your program's memory (defining them all together in one asm file should suffice).
You could also design something where at runtime, you read your strings from an external file into dynamically allocated memory, then free it when done with them. This was a common technique for programs on ancient operating systems with limited physical memory and no virtual memory support (think MS-DOS).

The string data has to be somewhere. Existing as immediates in your machine code takes space in the .text section of your program, normally linked into the same program segment as .rodata where you'd keep string literals. Running instructions to store strings to the stack means the data has to come into I-cache, then go out into D-cache.
But for long redundant strings, code to generate them in memory may be smaller than the string itself. This comes down to the Kolmogorov complexity; minimum size of a program that can output (or generate in an array) the given data. That program could be a gzip or zstd decompressor with some input constant data, could be good for a very large string or set of strings, much larger than the decompression code. Or for a specific case, have a look at code-golf questions like The alphabet in programming languages where my answer is 9 bytes of x86 machine code (including a ret) which inefficiently stores 1 byte at a time, incrementing a pointer in a loop, to produce the 26-byte string (without a terminating 0). Slow but compact.
Pushing / Storing from immediates
For just 4 bytes (not including the 0 terminator) you'd use push 'foo' = 5 bytes = 80% efficiency. On x86-64, that's a qword push (sign-extending the imm32 to 64-bit), so we get 4 bytes of zeros for free.
After that, getting the pointer into a register with mov rdi, rsp (3 bytes) is 4 bytes smaller than lea rdi, [rel msg_foo] (7 bytes), so it's an overall win for total size (unless padding for function alignment bumps it up or hides it). But anyway, comparing against the best option instead of the worst might be a good idea for your answer.
Compilers will sometimes use immediate data to init a local struct or array that has to be on the stack anyway (i.e. they have to pass a non-const pointer to another function); their threshold for switching to copying from .rodata (with movdqa xmm load/store) is higher than 8 bytes. But when you just want to print the string, you just need to pass a pointer to .rodata without copying it to the stack at all, so the threshold is much lower for it to be worth it to use stack space. Maybe 8 bytes, maybe 16, maybe only 4, depending on I-cache vs. D-cache pressure in your program.
For short but not tiny strings
mov rcx, imm64 + push rcx is 10+1 = 11 total bytes for 8 bytes of payload = 73% efficiency, vs. 4/7 = 57%. (At an offset from RSP it would be even worse, but to save code size you could use RBP+imm8 instead of RSP+imm8, but that's still 4 bytes per 7. You could mix push sign_extended_imm32 with mov dword [rsp+4], imm32 but that's also bad.)
This does have overhead that scales with string size, so it quickly becomes more size-efficient to just copy from .rodata, e.g. with an XMM loop, call memcpy, or even rep movsb if you're optimizing for size.
Or of course just using the string in .rodata if possible, if you don't need to make a copy you can modify on the stack.
Shellcode is a common use-case for techniques like this. You need a single "flat" sequence of bytes, usually not containing any 00 bytes which would terminate a C string.
You can have some data past the end of the actual machine code part, and mov store some zeros into that and generate pointers to it, but that's somewhat cumbersome. And you need a position-independent way to get pointers into registers. call/pop works, if you jump forward and call backward so the rel32 doesn't involve any 00 bytes. Same for RIP-relative lea.
It's often just as easy to push a zeroed register, or an imm8 or imm32, and get some zeros into memory along with ASCII data that way. Generating it a bit at a time makes it easy to mov rsi, rsp or whatever, instead of needing position-independent addressing relative to RIP.

How many actual bytes of memory node.js Buffer uses internally to store 1 logical byte of data?

Node.js documentations states that:
A Buffer is similar to an array of integers but corresponds to a raw memory allocation outside the V8 heap.
Am I right that all integers are represented as 64-bit floats internally in javascript?
Does it mean that storing 1 byte in Node.js Buffer actually takes 8 bytes of memory?
Thanks

Buffers are simply an array of bytes, so the length of the buffer is essentially the number of bytes that the Buffer will occupy.
For instance, the new Buffer(size) constructor is documented as "Allocates a new buffer of size octets." Here octets clearly identifies the cells as single-byte values. Similarly buf[index] states "Get and set the octet at index. The values refer to individual bytes, so the legal range is between 0x00 and 0xFF hex or 0 and 255.".
While a buffer is absolutely an array of bytes, you may interact with it as integers or other types using the buf.read* class of functions available on the buffer object. Each of these has a specific number of bytes that are affected by the operations.
For more specifics on the internals, Node just passes the length through to smalloc which just uses malloc as you'd expect to allocate the specified number of bytes.

Does __bread() always return PAGE_SIZE number of bytes?

The Linux kernel API has a __bread method:
__bread(struct block_device *bdev, sector_t block, unsigned size)
which returns a buffer_head pointer whose data field contains size worth of data.However, I noticed that reading beyond size bytes still gave me valid data up to PAGE_SIZE number of bytes. This got me wondering if I can presume the buffer_head returned by a *__bread* always contains valid data worth PAGE_SIZE bytes even if the size argument passed to it is lesser.
Or maybe it was just a coincidence.

The __bread perform a read IO from given block interface, but depending on the backing store, you get different results.
For harddrives, the block device will fetch data in sector sizes. Usually this is either 512 bytes or 4K. If 512 bytes, and you ask for 256 bytes, you'll be able to access the last parts of the sector. Thus, you may fetch up to the sector size. However, it is not always true. With memory backed devices, you may only access the 256 bytes, as it is not served up by the block layer, but by the VSL.
In short, no. You should not rely on this feature, as it depends on which block device is backing the storage and may also change with block layer implementation.

Find DWORD Size

I am new in vc++... I have one doubt in vc++. what is the size of GetTickCount() function.The return type of GetTickCount() is DWORD. Please anyone Answer for my question.
Thanks in Advance

The size of a function means the number of bytes occupied by the code that belongs to the function. You can find this out using a debugger like Windbg. But this is not useful information in most cases. To get the size of a data type, you can use the sizeof operator. Since the return type of GetTickCount is DWORD (4 bytes), you can either do sizeof(DWORD) or sizeof(GetTickCount()) to get its size. There is also a function by the name GetTickCount64 which returns ULONGLONG which is a 64-bit unsigned value (8 bytes).

GetTickCount() returns a DWORD which is 4 bytes. The function itself can be represented using its start address (function pointer) which will have size equal to size of void* which is 4 bytes on 32-bit systems and 8 bytes on 64-bit systems. Finding the size of code that the function occupies can be problematic and is rarely needed.

Memory allocation in VC++

I am working with VC++ 2005
I have overloaded the new and delete operators.
All is fine.
My question is related to some of the magic VC++ is adding to the memory allocation.
When I use the C++ call:
data = new _T [size];
The return (for example) from the global memory allocation is 071f2ea0 but data is set to 071f2ea4
When overloaded delete [] is called the 071f2ea0 address is passed in.
Another note is when using something like:
data = new _T;
both data an the return from the global memory allocation are the same.
I am pretty sure Microsoft is adding something at the head of the memory allocation to use for book keeping. My question is, does anyone know of the rules Microsoft is using.
I want to pass in the value of "data" into some memory testing routines so I need to get back to the original memory reference from the global allocation call.
I could assume the 4 byte are an index but I wanted to make sure. I could easily be a flag plus offset, or count or and index into some other table, or just an alignment to cache line of the CPU. I need to find out for sure. I have not been able to find any references to outline the details.
I also think on one of my other runs that the offset was 6 bytes not 4

The 4 bytes most likely contains the total number of objects in the allocation so delete [] will be able to loop over all objects in the array calling their destructor..
To get back the original address, you could keep a lookup-table keyed on address / 16, which stores the base address and length. This will enable you to find the original allocation. You need to ensure that your allocation+4 doesn't cross a 16-byte boundary, however.
EDIT:
I went ahead and wrote a test program that creates 50 objects with a destructor via new, and calls delete []. The destructor just calls printf, so it won't be optimized away.
#include <stdio.h>
class MySimpleClass
{
public:
~MySimpleClass() {printf("Hi\n");}
};
int main()
{
MySimpleClass* arr = new MySimpleClass[50];
delete [] arr;
return 0;
}
The partial disassembly is below, cleaned up to be a bit more legible. As you can see, VC++ is storing array count in the initial 4 byes.
; Allocation
mov ecx, 36h ; Size of allocation
call scratch!operator new
test rax,rax ; Don't write 4 bytes if NULL.
je scratch!main+0x25
mov dword ptr [rax],32h ; Store 50 in first 4 bytes
add rax,4 ; Increment pointer by 4
; Free
lea rdi,[rax-4] ; Grab previous 4 bytes of allocation
mov ebx,dword ptr [rdi] ; Store in loop counter
jmp StartLoop ; Jump to beginning of loop
Loop:
lea rcx,[scratch!`string' (00000000`ffe11170)] ; 1st param to printf
call qword ptr [scratch!_imp_printf; Destructor
StartLoop:
sub ebx,1 ; Decrement loop counter
jns Loop ; Loop while not negative
This book keeping is distinct from the book keeping that malloc or HeapAlloc do. Those allocators don't care about objects and arrays. They only see blobs of memory with a total size. VC++ can't query the heap manager for the total size of the allocation because that would mean that the heap manager would be bound to allocate a block exactly the size that you requested. The heap manager shouldn't have this limitation - if you ask for 240 bytes to allocate for 20 12 byte objects, it should be free to return a 256 byte block that it has immediately available.

The 4 bytes offset is for the number of elements. When delete[] is invoked it is necessary to know the exact number of elements to be able to call the destructors for exactly the necessary number of objects.
Since the memory allocator could have returned a bigger block than necessary for storing all the objects the only sure way to know the number of elements is to store it in the beginning of the block.

For sure, memory allocation does need some bookkeeping info stored together with the actual memory.
Next to that, heap allocated blocks will also be 'decorated' with some magic values, used to easily detect buffer overruns, double deletion, ... Take a look at this CodeGuru site for more info. If you want to know the last of heap debugging, take a look at the msdn documentation.

The finial answer is:
When you do a malloc (which new uses under the hoods) allocates more memory than needed for the system to manage memory. This debug information is not what I am interested. What I am interested is the difference between using the return from malloc on a C++ array allocation.
What I have been able to determine is sometimes the C++ adds/uses 4 additional byte to keep count of the objects being allocated. The twist is these 4 bytes are only added if the objects being allocated require being destructed.
So given:
void* _cdecl operator new(size_t size)
{
void *ptr = malloc(size);
return(ptr);
}
for the case of:
object * data = new object [size]
data will be ptr plus 4 bytes (assuming object required a destructor)
while:
char *data = new char [size]
data will equal ptr because no destructor is required.
Again, I am not interested in the memory tracking malloc adds to manage memory.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string