Is there any limit in setting zram device disksize on Linux?

I'm trying to create a zram device on my target device. My target cannot allocate memory if the zram disksize is above 100GB, but it works with a disksize of 50GB or less.
Is there any limit on the zram device disksize on Linux? My target device only has 2GB of RAM.

I guess you can give a number up to UINT64_MAX - 4095 = 18446744073709547520 on a 64-bit platform.
https://github.com/torvalds/linux/blob/master/drivers/block/zram/zram_drv.h#L101
https://github.com/torvalds/linux/blob/master/drivers/block/zram/zram_drv.c#L1506
https://github.com/torvalds/linux/blob/master/drivers/block/zram/zram_drv.c#L901
Here is what happens:
... disksize_store(...) {
    u64 disksize;
    ...
    // memparse() returns unsigned long long, so at least UINT64_MAX can be given here.
    disksize = memparse(...);
    // PAGE_ALIGN, with PAGE_SIZE = 1 << 12
    disksize = PAGE_ALIGN(disksize);
    // which expands to
    //   (((disksize) + ((PAGE_SIZE) - 1)) & (~((typeof(disksize))(PAGE_SIZE) - 1)))
    // = (disksize + ((1 << 12) - 1)) & (~((1 << 12) - 1))
    // = (disksize + 4095) & 0xfffffffffffff000
    //    ^^^^^^^^^^^^^^^ this addition can overflow,
    // so the maximum value that does not overflow is UINT64_MAX - 4095;
    // anything larger wraps around and the macro returns 0.
    ...
    if (!zram_meta_alloc(..., disksize)) {
        ...
        return ...;
    }
    ...
    zram->disksize = disksize;
    ...
}
Now let's look into zram_meta_alloc:
... zram_meta_alloc(..., disksize) {
    ...
    num_pages = disksize >> PAGE_SHIFT;
    // max num_pages = 0xfffffffffffff = UINT64_MAX >> PAGE_SHIFT
    ... = vzalloc(num_pages * sizeof(*zram->table));
    //            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ this multiplication can overflow
    ...
}
vzalloc() takes an unsigned long argument, and ULONG_MAX should equal UINT64_MAX on a 64-bit platform. sizeof(*zram->table) is sizeof(unsigned long) + sizeof(unsigned long) [+ sizeof(ktime_t), optional] + padding (see the table entry definition in zram_drv.h). On a 64-bit platform, where sizeof(unsigned long) == 8, that comes to 8 + 8 [+ 8] = 16 or 24 bytes before padding. In any case, the maximum num_pages is UINT64_MAX >> 12 (just under 2^52). For the 64-bit multiplication to overflow, sizeof(*zram->table) would therefore have to exceed 2^PAGE_SHIFT = 4096 bytes. That shouldn't happen, unless the compiler decides to add over 4000 bytes of padding to the table entry struct.
So the maximum disksize is UINT64_MAX - 4095. If you give a disksize of UINT64_MAX - x, where 0 <= x < 4095, then the PAGE_ALIGN macro effectively sets the disksize to 0. This should probably be brought up with a kernel developer, who could modify the PAGE_ALIGN macro to support such numbers.
Six days before this answer was written, a commit added a call to array_size() to the vzalloc() calls to protect against this overflow.

There is no limit but there is an overhead.
"Note that zram uses about 0.1% of the size of the disk when not in use so a huge zram is wasteful."
https://www.kernel.org/doc/Documentation/blockdev/zram.txt
Also, disksize is a virtual size: how much memory the device actually consumes depends purely on the input data and on the compression ratio the chosen algorithm achieves. disksize is the maximum uncompressed size and sets the general disk parameters.
The only real control over memory use is mem_limit, which caps the compressed size plus the disk and zram overheads.
The compression ratio depends heavily on the algorithm chosen (see /proc/crypto): zlib and zstd compress far better but are far slower. It also depends heavily on the input: on text, zlib and zstd can achieve more than double the ratio of lzo and lz4.
If the input is already compressed, any algorithm may achieve little to no compression, and without a mem_limit zram could take a lot of precious memory from the system.
mem_limit is the maximum you are prepared to let zram take from the system. A disksize larger than mem_limit multiplied by the expected compression ratio is likely a waste.
That extra capacity will never be used, but it still counts toward the 0.1% overhead of creating the empty device.
Maybe try https://github.com/StuartIanNaylor/zram-config

Related

NtQueryObject returns wrong insufficient required size via WOW64, why?

I am using the NT native API NtQueryObject()/ZwQueryObject() from user mode (and I am aware of the risks in general and I have written kernel mode drivers for Windows in the past in my professional capacity).
Generally when one uses the typical "query information" function (of which there are a few) the protocol is first to ask with a too small buffer to retrieve the required size with STATUS_INFO_LENGTH_MISMATCH, then allocate a buffer of said size and query again -- this time using the buffer and previously returned size.
In order to get the list of object types (67 on my build) on the system I am doing just that:
ULONG Size = 0;
NTSTATUS Status = NtQueryObject(NULL, ObjectTypesInformation, &Size, sizeof(Size), &Size);
And in Size I get 8280 (WOW64) and 8968 (x64). I then proceed to allocate the buffer with calloc() and query again:
ULONG Size2 = 0;
BYTE* Buf = (BYTE*)::calloc(1, Size);
Status = NtQueryObject(NULL, ObjectTypesInformation, Buf, Size, &Size2);
NB: ObjectTypesInformation is 3. It isn't declared in winternl.h, but Nebbett (as ObjectAllTypesInformation) and others describe it. Since I am not querying for a particular object's traits but the system-wide list of object types, I pass NULL for the object handle.
Curiously on WOW64, i.e. 32-bit, the value in Size2 upon return from the second query is 16 Bytes (= 8296) bigger than the previously returned required size.
As far as alignment is concerned, I'd expect at most 8 Bytes for this sort of thing. Indeed, neither 8280 nor 8296 is on a 16 Byte alignment boundary, only on an 8 Byte one.
Certainly I can add some slack space on top of the returned required size (e.g. ALIGN_UP to the next 32 Byte alignment boundary), but this seems highly irregular to be honest. And I'd rather want to understand what's going on than to implement a workaround that breaks, because I miss something crucial.
The practical issue for the code is that in Debug configurations it reports a corrupted heap somewhere upon freeing Buf. This suggests that NtQueryObject() was indeed writing these extra 16 Bytes beyond the buffer I provided.
Question: Any idea why it is doing that?
As usual for NT native API the sources of information are scarce. The x64 version of the exact same code returns the exact number of bytes required. So my thinking here is that WOW64 is the issue. A somewhat cursory look into wow64.dll with IDA didn't reveal any immediate points for suspicion regarding what goes wrong in translating the results to 32-bit here.
PS: Windows 10 (10.0.19043, ntdll.dll "timestamp" 77755782)
PPS: this may be related: https://wj32.org/wp/2012/11/30/obquerytypeinfo-and-ntqueryobject-buffer-overrun-in-windows-8/ I tested it by checking that OBJECT_TYPE_INFORMATION::TypeName.Length + sizeof(WCHAR) == OBJECT_TYPE_INFORMATION::TypeName.MaximumLength in all returned items, and that was the case.
The only part of ObjectTypesInformation that's public is the first field defined in winternl.h header in the Windows SDK:
typedef struct __PUBLIC_OBJECT_TYPE_INFORMATION {
UNICODE_STRING TypeName;
ULONG Reserved [22]; // reserved for internal use
} PUBLIC_OBJECT_TYPE_INFORMATION, *PPUBLIC_OBJECT_TYPE_INFORMATION;
For x86 this is 96 bytes, and for x64 this is 104 bytes (assuming you have the right packing mode enabled). The difference is the pointer in UNICODE_STRING which changes the alignment in x64.
Any additional memory space should be related to the TypeName buffer.
UNICODE_STRING accounts for 8 bytes of the difference between 8280 and 8296. The function uses the sizeof(ULONG_PTR) for alignment of the returned string plus an extra WCHAR, so that could easily account for the remaining 8 bytes.
AFAIK, the public use of NtQueryObject is supposed to be limited to kernel mode. That means it always matches the OS native bitness, since x86 code can't run as kernel on an x64 native OS. So this is probably just a quirk of using the NT functions via the WOW64 thunk.
Alright, I think I figured out the issue with the help of WinDbg and a thorough look at wow64.dll using IDA.
NB: the wow64.dll I have has the same build number, but differs slightly in data only (checksum, security directory entry, pieces from version resources). The code is identical, which was to be expected, given deterministic builds and how they affect the PE timestamp.
There's an internal function called whNtQueryObject_SpecialQueryCase (according to PDBs), which covers the ObjectTypesInformation class queries.
For the above wow64.dll I used the following points of interest in WinDbg, from a 32 bit program which calls NtQueryObject(NULL, ObjectTypesInformation, ...) (the program itself is irrelevant, though):
0:000> .load wow64exts
0:000> bp wow64!whNtQueryObject_SpecialQueryCase+B0E0
0:000> bp wow64!whNtQueryObject_SpecialQueryCase+B14E
0:000> bp wow64!whNtQueryObject_SpecialQueryCase+B1A7
0:000> bp wow64!whNtQueryObject_SpecialQueryCase+B24A
0:000> bp wow64!whNtQueryObject_SpecialQueryCase+B252
Explanation of the above points of interest:
+B0E0: computing length required for 64 bit query, based on passed length for 32 bit
+B14E: call to NtQueryObject()
+B1A7: loop body for copying 64 to 32 bit buffer contents, after successful NtQueryObject() call
+B24A: computing written length by subtracting current (last + 1) entry from base buffer address
+B252: downsizing returned (64 bit) required length to 32 bit
The logic of this function in regards to just ObjectTypesInformation is roughly as follows:
Common steps
Take the ObjectInformationLength (32 bit query!) argument and size it up to fit the 64 bit info
Align the retrieved size up to the next 16 byte boundary
If necessary, allocate the resulting amount from the PEB::ProcessHeap and store it in TLS slot 3; otherwise, use what is already there as scratch space
Call NtQueryObject() passing the buffer and length from the two previous steps
The length passed to NtQueryObject() is the one from step 1, not the one aligned to a 16 byte boundary. There seems to be some sort of header to this scratch space, so perhaps that's where the 16 byte alignment comes from?
Case 1: buffer size too small (here: 4), just querying required length
The up-sized length in this case equals 4, which is too small and consequently NtQueryObject() returns STATUS_INFO_LENGTH_MISMATCH. Required size is reported as 8968.
Down-size from the 64 bit required length to 32 bit and end up 16 bytes too short
Return the status from NtQueryObject() and the down-sized required length from the previous step
Case 2: buffer size supposedly (!) sufficient
Copy OBJECT_TYPES_INFORMATION::NumberOfTypes from queried buffer to 32 bit one
Step to the first entry (OBJECT_TYPE_INFORMATION) of source (64 bit) and target (32 bit) buffer, 8 and 4 byte aligned respectively
For each entry up to OBJECT_TYPES_INFORMATION::NumberOfTypes:
Copy UNICODE_STRING::Length and UNICODE_STRING::MaximumLength for TypeName member
memcpy() UNICODE_STRING::Length bytes from source to target UNICODE_STRING::Buffer (target entry + sizeof(OBJECT_TYPE_INFORMATION32))
Add terminating zero (WCHAR) past the memcpy'd string
Copy the individual members past the TypeName from 64 to 32 bit struct
Compute pointer of next entry by aligning UNICODE_STRING::MaximumLength up to an 8 byte boundary (i.e. the ULONG_PTR alignment mentioned in the other answer) + sizeof(OBJECT_TYPE_INFORMATION64) (already 8 byte aligned!)
The next target entry (32 bit) gets 4 byte aligned instead
At the end compute required (32 bit) length by subtracting the value we arrived at for the "next" entry (i.e. one past the last) from the base address of the buffer passed by the WOW64 program (32 bit) to NtQueryObject()
In my debugged scenario these were: 0x008ce050 - 0x008cbfe8 = 0x00002068 (= 8296), which is 16 bytes larger than the buffer length we were told during case 1 (8280)!
The issue
That crucial last step differs between merely querying and actually getting the buffer filled. There is no further bounds checking in that loop I described for case 2.
And this means it will just overrun the passed buffer and return a written length bigger than the buffer length passed to it.
Possible solutions and workarounds
I'll have to approach this mathematically after some sleep. The workaround is obviously to top up the required length returned from case 1 in order to avoid the buffer overrun. The easiest method is to apply my up_size_from_32bit() from the example below to the returned required size. This way you allocate enough for the 64 bit buffer while querying the 32 bit one, so the copy loop should never overrun.
However, the fix in wow64.dll is a little more involved, I guess. While adding bounds checking to the loop would help avert the overrun, it would mean that the caller would have to query for the required size twice, because the first time around it lies to us.
Which means the query-only case (1) would have to allocate that internal buffer after querying the required length for 64 bit, then get it filled and then walk the entries (just like the copy loop), skipping over the last entry to compute the required length the same as it is now done after the copy loop.
Example program demonstrating the "static" computation by wow64.dll
Build for x64, just the way wow64.dll was!
#define WIN32_LEAN_AND_MEAN
#include <Windows.h>
#include <cstdio>
typedef struct
{
ULONG JustPretending[24];
} OBJECT_TYPE_INFORMATION32;
typedef struct
{
ULONG JustPretending[26];
} OBJECT_TYPE_INFORMATION64;
constexpr ULONG size_delta_3264 = sizeof(OBJECT_TYPE_INFORMATION64) - sizeof(OBJECT_TYPE_INFORMATION32);
constexpr ULONG down_size_to_32bit(ULONG len)
{
return len - size_delta_3264 * ((len - 4) / sizeof(OBJECT_TYPE_INFORMATION64));
}
constexpr ULONG up_size_from_32bit(ULONG len)
{
return len + size_delta_3264 * ((len - 4) / sizeof(OBJECT_TYPE_INFORMATION32));
}
// Trying to mimic the wdm.h macro
constexpr size_t align_up_by(size_t address, size_t alignment)
{
return (address + (alignment - 1)) & ~(alignment - 1);
}
constexpr auto u32 = 8280UL;
constexpr auto u64 = 8968UL;
constexpr auto from_64 = down_size_to_32bit(u64);
constexpr auto from_32 = up_size_from_32bit(u32);
constexpr auto from_32_16_byte_aligned = (ULONG)align_up_by(from_32, 16);
int wmain()
{
wprintf(L"32 to 64 bit: %lu -> %lu -(16-byte-align)-> %lu\n", u32, from_32, from_32_16_byte_aligned);
wprintf(L"64 to 32 bit: %lu -> %lu\n", u64, from_64);
return 0;
}
static_assert(sizeof(OBJECT_TYPE_INFORMATION32) == 96, "Size for 32 bit struct does not match.");
static_assert(sizeof(OBJECT_TYPE_INFORMATION64) == 104, "Size for 64 bit struct does not match.");
static_assert(u32 == from_64, "Must match (from 64 to 32 bit)");
static_assert(u64 == from_32, "Must match (from 32 to 64 bit)");
static_assert(from_32_16_byte_aligned % 16 == 0, "16 byte alignment failed");
static_assert(from_32_16_byte_aligned > from_32, "We're aligning up");
This does not mimic the computation that happens in case 2, though.

Different memory allocations in linux and windows?

I have a tree (T*Tree: binary tree with many elements in the node) implemented in C++.
I want to insert around 5,000,000 integer values in it (let's say from 1 to 5,000,000). The tree size should be around 8 * 5,000,000 bytes, or 41MB, in memory (according to my implementation, which is reasonable).
When I display the size of the tree (in my program, by calculating the size of every node), it is 41MB as expected. However, when I checked in Windows 32-bit >> "Task Manager", I found the memory taken is 732MB!!
I checked that there is no extra malloc in my code. Even after I freed the tree by traversing from node to node and deleting them (and the keys inside), the size in "Task Manager" only drops to 513MB!!
After that I compiled the same code on Ubuntu Linux 32-bit (a virtual machine on another PC) and ran the program. Again, the tree size in my program is 41MB as expected, but "System Monitor" shows 230MB. When I free the tree nodes in my program, the memory in "System Monitor" stays at 230MB.
On both Windows and Linux, if I free and reinitialize the tree and insert 5,000,000 integer values again, the memory usage doubles. It is as if the previous space was not freed but used somewhere (which I am not able to find).
The question:
1) Why is there such a huge memory difference between Windows and Linux, although the code and input data are the same?
2) Why doesn't freeing the tree nodes reduce the memory to some reasonable value like 10MB?
code: https://drive.google.com/open?id=0ByKaCojxzNa9dEt6cEJNeDI4eXc
below are some snippets:
struct Keylist {
unsigned int k;
struct Keylist *next_ptr;
};
typedef struct Keylist Keylist;
struct TstarTreeNode {
//Binary Node specific
struct TstarTreeNode *left;
struct TstarTreeNode *right;
//Bool rightVisitedDuringInsert;
//AVL Node specific
int height;
//T Node specific
int length; //length of keys array for easy locating
struct Keylist *keys; //later you deal with it like a one-dimensional array
int max; //max key
int min; //min key
//T* Node specific
struct TstarTreeNode *successor;
};
typedef struct TstarTreeNode TstarTreeNode;
/*****************************************************************************
* *
* Define a structure for binary trees. *
* *
*****************************************************************************/
struct TstarTree {
int size; //number of elements (not number of nodes) in a tree
int MinCount; //Min Count of elements in a Node
int MaxCount; //Max Count of elements in a Node
TstarTreeNode *root;
//Provide functions for comparing elements and destroying elements
int (*compare)(int key1, int key2); // -1 smaller, 0 equal, 1 bigger
int (*inRange)(int key, int min, int max); // -1 smaller, 0 in range, 1 bigger
};
typedef struct TstarTree TstarTree;
The tree's insert function uses dynamic allocation, i.e. malloc.
Update
Based on what "John Zwinck" pointed out (thanks John), I now know two things:
1) The huge memory usage on Windows was caused by the compile options in Visual Studio, which I think enabled debugging and a lot of extra things. When I compiled on Windows using Cygwin without those options, i.e. "gcc main.c tstarTree.c -o main", I got the same result as on Linux. The size in Windows >> "Task Manager" is now 230MB.
2) If the OS is 64-bit, then let's see how the size is calculated (as John said, and as I modified):
5 million unsigned int k. 20 MB.
5 million 4-byte pads (after k to align next_ptr). 20 MB.
5 million 8-byte next_ptr. 40 MB.
5 million times the overhead of malloc(). I think for a 64-bit OS it is 32 bytes each (according to the link John provided), so 160 MB.
N TstarTreeNodes, each of which is 48 bytes in the full code.
N times the overhead of malloc() (I think, 32 bytes each).
N is the number of nodes. The resulting tree is a balanced complete tree of height 16, so I assume the number of nodes is 2^17-1. The last two items then come to 6.3MB (i.e. 2^17 * 48) + 4.2MB (i.e. 2^17 * 32) = about 10MB.
So the total is 20+20+40+160+10 = 250MB, which is reasonable and close to 230MB.
However, I have 32-bit Windows/Linux, so it will be (I think):
5 million unsigned int k. 20 MB.
5 million 4-byte next_ptr. 20 MB.
5 million times the overhead of malloc(). I think for 32bit OS it is 16 bytes each. so 80 MB.
N TstarTreeNodes, each of which is 32 bytes in the full code.
N times the overhead of malloc() (I think, 16 bytes each).
N is the number of nodes. The resulting tree is a balanced complete tree of height 16, so I assume the number of nodes is 2^17-1. The last two items then come to 4.2MB (i.e. 2^17 * 32) + 2.1MB (i.e. 2^17 * 16) = about 6MB.
So the total is 20+20+80+6 = 126MB, which is quite far from the 230MB I see in "Task Manager" (if you know why, please tell me).
The remaining important question is: why isn't the tree's memory released when I free all the nodes and keys in the tree using this code:
void freekeys(struct Keylist ** keys){
if ((*keys) == NULL)
{
return;
}
freekeys(&(*keys)->next_ptr);
(*keys)->next_ptr = NULL;
free((*keys));
(*keys) = NULL;
}
void freeTree(struct TstarTreeNode ** tree){
if ((*tree) == NULL)
{
return;
}
freeTree(&(*tree)->left);
freeTree(&(*tree)->right);
freekeys(&(*tree)->keys);
(*tree)->keys = NULL;
(*tree)->left = NULL;
(*tree)->right = NULL;
(*tree)->successor = NULL;
free((*tree));
(*tree) = NULL;
}
and in main():
TstarTree * tree;
...
freeTree(&tree->root);
free(tree);
Note:
The tree works perfectly (insert, update, delete, lookup, display...), but when I free the tree, the memory size does not change.
You say your data takes:
8 * 5,000,000 byte or 41MB in memory
But that is not correct. Looking at your code there are two main structures:
struct Keylist {
unsigned int k;
struct Keylist *next_ptr;
};
struct TstarTreeNode {
struct TstarTreeNode *left, *right;
struct Keylist *keys;
struct TstarTreeNode *successor;
// other fields omitted
};
Let's say we have 5 million integers to store, as in your example. What will we need?
5 million unsigned int k. 20 MB.
5 million 4-byte pads (after k to align next_ptr). 20 MB.
5 million 8-byte next_ptr. 40 MB.
5 million times the overhead of malloc(). Likely 16 bytes each. 80 MB.
N TstarTreeNodes, each of which is 48 bytes in the full code.
N times the overhead of malloc() (again, 16 bytes each).
If N is 500,000 (for example, I don't know the real value but you do), those last two items add up to 32 MB. That brings the total to at least 192 MB as a bare minimum. Therefore, seeing 230 MB of memory usage in Linux is not surprising.
Some systems, especially when optimization is not fully enabled at build time, will add more bookkeeping and debugging information to each block allocated with malloc(). Are you building with optimization fully enabled?
One way you can save a lot of overhead is to stop using Keylist and just store the integers in plain arrays (created with malloc(), but only one per TstarTreeNode).

Can the logical erase block size of an MTD device be increased?

The minimum erase block size for jffs2 (mtd-utils version 1.5.0, mkfs.jffs2 revision 1.60) seems to be 8KiB:
Erase size 0x1000 too small. Increasing to 8KiB minimum
However I am running Linux 3.10 with an at25df321a,
m25p80 spi32766.0: at25df321a (4096 Kbytes),
and the erase block size is only 4KiB:
mtd5
Name: spi32766.0
Type: nor
Eraseblock size: 4096 bytes, 4.0 KiB
Amount of eraseblocks: 1024 (4194304 bytes, 4.0 MiB)
Minimum input/output unit size: 1 byte
Sub-page size: 1 byte
Character device major/minor: 90:10
Bad blocks are allowed: false
Device is writable: true
Is there a way to make the mtd system treat multiple erase blocks as one? Maybe some ioctl or module parameter?
If I flash a jffs2 image with larger erase block size, I get lots of kernel error messages, missing files and sometimes panic.
workaround
I found that flash_erase --jffs2 results in a working filesystem in spite of the 4KiB erase block size. So I hacked the mkfs.jffs2.c file, and the resulting image seems to work fine. I'll give it some testing.
diff -rupN orig/mkfs.jffs2.c new/mkfs.jffs2.c
--- orig/mkfs.jffs2.c 2014-10-20 15:43:31.751696500 +0200
+++ new/mkfs.jffs2.c 2014-10-20 15:43:12.623431400 +0200
@@ -1659,11 +1659,11 @@ int main(int argc, char **argv)
}
erase_block_size *= units;
- /* If it's less than 8KiB, they're not allowed */
- if (erase_block_size < 0x2000) {
- fprintf(stderr, "Erase size 0x%x too small. Increasing to 8KiB minimum\n",
+ /* If it's less than 4KiB, they're not allowed */
+ if (erase_block_size < 0x1000) {
+ fprintf(stderr, "Erase size 0x%x too small. Increasing to 4KiB minimum\n",
erase_block_size);
- erase_block_size = 0x2000;
+ erase_block_size = 0x1000;
}
break;
}
http://lists.infradead.org/pipermail/linux-mtd/2010-September/031876.html
JFFS2 should be able to fit at least one node to eraseblock. The
maximum node size is 4KiB+few bytes. This is why the minimum
eraseblocks size is 8KiB.
But in practice, even 8KiB is bad because you end up wasting a
lot of space at the end of eraseblocks.
You should join several eraseblocks into one virtual eraseblock of 64 or
128 KiB and use it - this will be more optimal.
Some drivers have already implemented this. I know of the MTD_SPI_NOR_USE_4K_SECTORS Linux configuration option. It has to be set to "n" to enable large erase sectors of size 0x00010000 (64 KiB).

Need to do 64 bit multiplication on a machine with 32 bit longs

I'm working on a small embedded system that has 32-bit long ints. For one calculation I need to output the time since 1970 in ms. I can get the time as a 32-bit unsigned long count of seconds since 1970. How can I represent this as a 64-bit number of ms if my biggest int is only 32 bits? I'm sure Stack Overflow will have a cunning answer! I am using Dynamic C, which is close to standard C. I have some sample code from another system which has a 64-bit long long data type:
long long T = (long long)(SampleTime * 1000.0 + 0.5);
data.TimeLower = (unsigned int)(T & 0xffffffff);
data.TimeUpper = (unsigned short)((T >> 32) & 0xffff);
Since you are only multiplying by 1000 (seconds -> millis), you can do it with two 16-bit multiplies, one add and a bit of bit fiddling. I have used your putative data type to store the result below:
uint32_t time32 = (uint32_t)time(NULL); // seconds since 1970
uint32_t t1 = (time32 & 0xffff) * 1000;
uint32_t t2 = ((time32 >> 16) * 1000) + (t1 >> 16);
data.TimeLower = (uint32_t) ((t2 & 0xffff) << 16) | (t1 & 0xffff);
data.TimeUpper = (uint32_t) (t2 >> 16);
The standard approach, assuming you have a 16x16->32 multiply available, would be to split both numbers into 16-bit high and low parts, compute four partial products, and add the results. If you don't have a 16x16->32 primitive which is faster than a 32x32->32 primitive, though, I'm not sure what the best approach would be. I would think that a 32x32->32 multiply should be more useful than a 16x16->32, but I can't think how one would use it.
Personally, I wish there were a standard primitive to return the top half of a NxN multiply (32x32, certainly; also 16x16 for smaller machines and 64x64 for larger ones).
It might be helpful if you were more specific about what kinds of calculations you need to do. 64-bit multiplication implemented with 32-bit operations is quite slow, and you may have the additional overhead of 64-bit division (to convert back to seconds and milliseconds), which is even slower.
Without knowing more about what exactly you need to do, it seems to me that it would be more efficient to use a struct, containing a 32-bit unsigned int for the number of seconds and a 16-bit int for the number of milliseconds (the "remainder"). (Or use a 32-bit int for the milliseconds if 64-bit alignment is more important than saving a couple of bytes.)

Understanding /proc/sys/vm/lowmem_reserve_ratio

I am not able to understand the meaning of the variable "lowmem_reserve_ratio" from the explanation in Documentation/sysctl/vm.txt.
I have also tried to google it, but all the explanations I found are exactly the same as the one in vm.txt.
It would be really helpful if somebody could explain it or point to a link about it.
Here is the original explanation:
The lowmem_reserve_ratio is an array. You can see them by reading this file.
-
% cat /proc/sys/vm/lowmem_reserve_ratio
256 256 32
-
Note: # of this elements is one fewer than number of zones. Because the highest
zone's value is not necessary for following calculation.
But, these values are not used directly. The kernel calculates # of protection
pages for each zones from them. These are shown as array of protection pages
in /proc/zoneinfo like followings. (This is an example of x86-64 box).
Each zone has an array of protection pages like this.
-
Node 0, zone DMA
pages free 1355
min 3
low 3
high 4
:
:
numa_other 0
protection: (0, 2004, 2004, 2004)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
pagesets
cpu: 0 pcp: 0
:
-
These protections are added to score to judge whether this zone should be used
for page allocation or should be reclaimed.
In this example, if normal pages (index=2) are required to this DMA zone and
watermark[WMARK_HIGH] is used for watermark, the kernel judges this zone should
not be used because pages_free(1355) is smaller than watermark + protection[2]
(4 + 2004 = 2008). If this protection value is 0, this zone would be used for
normal page requirement. If requirement is DMA zone(index=0), protection[0]
(=0) is used.
zone[i]'s protection[j] is calculated by following expression.
(i < j):
zone[i]->protection[j]
= (total sums of present_pages from zone[i+1] to zone[j] on the node)
/ lowmem_reserve_ratio[i];
(i = j):
(should not be protected. = 0;
(i > j):
(not necessary, but looks 0)
The default values of lowmem_reserve_ratio[i] are
256 (if zone[i] means DMA or DMA32 zone)
32 (others).
As above expression, they are reciprocal number of ratio.
256 means 1/256. # of protection pages becomes about "0.39%" of total present
pages of higher zones on the node.
If you would like to protect more pages, smaller values are effective.
The minimum value is 1 (1/1 -> 100%).
Having the same problem as you, I googled (a lot) and stumbled upon this page, which might (or might not) be more understandable than the kernel doc.
(I'm not quoting it here because it would be unreadable.)
I found the wording in that document really confusing too. Looking at the source in mm/page_alloc.c helped to clear it up, so let me try my hand at a more straightforward explanation:
As is said in the page you quoted, these numbers "are reciprocal number of ratio". Worded differently: these numbers are divisors. So when calculating the reserve pages for a given zone in a node, you take the sum of pages in that node in zones higher than that one, divide it by the provided divisor, and that's how many pages you're reserving for that zone.
Example: let's assume a 1 GiB node with 768 MiB in zone Normal and 256 MiB in zone HighMem (assume no zone DMA). Let's assume the default highmem reserve "ratio" (divisor) of 32. And let's assume the typical 4 KiB page size. Now we can calculate the reserve area for zone Normal:
Sum of "higher" zones than zone Normal (just HighMem): 256 MiB * (1024 KiB / 1 MiB) * (1 page / 4 KiB) = 65536 pages
Area reserved in zone Normal for this node: 65536 pages / 32 = 2048 pages = 8 MiB.
The concept stays the same when you add more zones and nodes. Just remember that the reserved size is in pages: you never reserve a fraction of a page.
I find that the kernel source code explains it very well and clearly:
/*
* setup_per_zone_lowmem_reserve - called whenever
* sysctl_lowmem_reserve_ratio changes. Ensures that each zone
* has a correct pages reserved value, so an adequate number of
* pages are left in the zone after a successful __alloc_pages().
*/
static void setup_per_zone_lowmem_reserve(void)
{
struct pglist_data *pgdat;
enum zone_type j, idx;
for_each_online_pgdat(pgdat) {
for (j = 0; j < MAX_NR_ZONES; j++) {
struct zone *zone = pgdat->node_zones + j;
unsigned long managed_pages = zone->managed_pages;
zone->lowmem_reserve[j] = 0;
idx = j;
while (idx) {
struct zone *lower_zone;
idx--;
if (sysctl_lowmem_reserve_ratio[idx] < 1)
sysctl_lowmem_reserve_ratio[idx] = 1;
lower_zone = pgdat->node_zones + idx;
lower_zone->lowmem_reserve[j] = managed_pages /
sysctl_lowmem_reserve_ratio[idx];
managed_pages += lower_zone->managed_pages;
}
}
}
/* update totalreserve_pages */
calculate_totalreserve_pages();
}
It even includes a worked example:
/*
* results with 256, 32 in the lowmem_reserve sysctl:
* 1G machine -> (16M dma, 800M-16M normal, 1G-800M high)
* 1G machine -> (16M dma, 784M normal, 224M high)
* NORMAL allocation will leave 784M/256 of ram reserved in the ZONE_DMA
* HIGHMEM allocation will leave 224M/32 of ram reserved in ZONE_NORMAL
* HIGHMEM allocation will leave (224M+784M)/256 of ram reserved in ZONE_DMA
*
* TBD: should special case ZONE_DMA32 machines here - in those we normally
* don't need any ZONE_NORMAL reservation
*/
int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1] = {
#ifdef CONFIG_ZONE_DMA
256,
#endif
#ifdef CONFIG_ZONE_DMA32
256,
#endif
#ifdef CONFIG_HIGHMEM
32,
#endif
32,
};
In short, the expressions look like this:
zone[1]->lowmem_reserve[2] = zone[2]->managed_pages / sysctl_lowmem_reserve_ratio[1]
zone[0]->lowmem_reserve[2] = (zone[1]->managed_pages + zone[2]->managed_pages) / sysctl_lowmem_reserve_ratio[0]
