Any way to reserve but not commit memory in Linux?

Windows has VirtualAlloc, which allows you to reserve a contiguous region of address space, but not actually use any physical memory. Later when you want to use it (or part of it) you call VirtualAlloc again to commit the region of previously reserved pages.
This is actually really useful, but I eventually want to port my application to Linux, so I don't want to use it if I can't port it later. Does Linux have a way to do this?
EDIT - Use Case
I'm thinking of allocating 4 GB or so of virtual address space, but only committing it 64 KB at a time. This would give me a zero-copy way to grow an array up to 4 GB, which is important because the typical double-the-size-and-copy approach introduces seemingly random, unacceptable latency for very large arrays.

mmap a special file like /dev/zero (or use MAP_ANONYMOUS) with PROT_NONE; later use mprotect to commit.
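A minimal sketch of that approach, using the 4 GB / 64 KB numbers from the question (sizes and error handling are illustrative; on Linux the physical pages are still only allocated when touched, mprotect just makes the reserved range usable):

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    #define RESERVE_SIZE (4ULL << 30)   /* reserve 4 GB of address space */
    #define COMMIT_CHUNK (64 * 1024)    /* commit 64 KB at a time */

    int main(void)
    {
        /* Reserve: address space only, no readable/writable pages yet. */
        void *base = mmap(NULL, RESERVE_SIZE, PROT_NONE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (base == MAP_FAILED) { perror("mmap"); return 1; }

        /* Commit the first chunk by making it readable/writable. */
        if (mprotect(base, COMMIT_CHUNK, PROT_READ | PROT_WRITE) != 0) {
            perror("mprotect");
            return 1;
        }
        memset(base, 0, COMMIT_CHUNK);  /* now safe to touch */

        munmap(base, RESERVE_SIZE);
        return 0;
    }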

You can turn this functionality on system-wide by enabling kernel overcommit. This is usually the default setting on many distributions.
Here is the explanation: http://www.mjmwired.net/kernel/Documentation/vm/overcommit-accounting
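If you want this behaviour per mapping rather than system-wide, a related hedged sketch: MAP_NORESERVE asks the kernel not to reserve swap/commit charge for a mapping up front; how strictly that is honoured depends on the vm.overcommit_memory setting described at the link above.

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = 1ULL << 30;   /* 1 GB, illustrative */
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        ((char *)p)[0] = 1;        /* physical pages appear only as they are touched */
        munmap(p, len);
        return 0;
    }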

The Linux equivalent of VirtualAlloc() is mmap(), which provides the same behaviour. However, as a commenter points out, reserving contiguous memory is also the behaviour of malloc(), as long as the memory is not initialized (by calloc() or by user code).
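A small hypothetical demo of that malloc() point (not from the answer itself): the resident-set size reported in /proc/self/statm only grows once the allocation is actually touched.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Resident set size of the current process, in pages. */
    static long resident_pages(void)
    {
        long size = 0, resident = 0;
        FILE *f = fopen("/proc/self/statm", "r");
        if (f && fscanf(f, "%ld %ld", &size, &resident) != 2)
            resident = -1;
        if (f)
            fclose(f);
        return resident;
    }

    int main(void)
    {
        size_t len = 512UL * 1024 * 1024;   /* 512 MB, illustrative */
        char *p = malloc(len);
        if (!p) return 1;
        printf("after malloc: %ld resident pages\n", resident_pages());
        memset(p, 1, len);                  /* touching the memory commits it */
        printf("after memset: %ld resident pages\n", resident_pages());
        free(p);
        return 0;
    }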

"seemingly random unacceptable latency
for very large arrays
You could also consider mlock() or mmap() + MAP_LOCKED to mitigate the impact of paging. Many CPUs support huge (aka large) pages, pages larger than 4kb. These larger pages can mitigate the impact of the TLB on streaming reads/writes.
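A minimal sketch of the mlock() route (the 16 MB size is illustrative; large locked regions usually need a raised RLIMIT_MEMLOCK or CAP_IPC_LOCK):

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void)
    {
        size_t len = 16UL * 1024 * 1024;   /* 16 MB, illustrative */
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        if (mlock(p, len) != 0)            /* or pass MAP_LOCKED to mmap directly */
            perror("mlock");               /* typically RLIMIT_MEMLOCK is too low */

        memset(p, 0, len);                 /* pages are now resident and locked */
        munlock(p, len);
        munmap(p, len);
        return 0;
    }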

Related

Large physically contiguous memory area

For my M.Sc. thesis, I have to reverse-engineer the hash function Intel uses inside its CPUs to spread data among Last Level Cache slices in Sandy Bridge and newer generations. To this aim, I am developing an application in Linux, which needs a physically contiguous memory area in order to make my tests. The idea is to read data from this area, so that they are cached, probe if older data have been evicted (through delay measures or LLC miss counters) in order to find colliding memory addresses and finally discover the hash function by comparing these colliding addresses.
The same procedure has already been used in Windows by a researcher, and proved to work.
To do this, I need to allocate an area that must be large (64 MB or more) and fully cachable, so without DMA-friendly options in TLB. How can I perform this allocation?
To have a full control over the allocation (i.e., for it to be really physically contiguous), my idea was to write a Linux module, export a device and mmap() it from userspace, but I do not know how to allocate so much contiguous memory inside the kernel.
I heard about the Linux Contiguous Memory Allocator (CMA), but I don't know how it works.
Applications don't see physical memory; a process has some address space in virtual memory. Read about the MMU (what is contiguous in virtual space might not really be physically contiguous, and vice versa).
You might perhaps want to lock some memory using mlock(2).
But your application will be scheduled, and other processes (or scheduled tasks) will dirty your CPU cache. See also sched_setaffinity(2).
(And even kernel code might perhaps be preempted.)
This page on Kernel Newbies has some ideas about memory allocation, but the maximum for get_free_pages looks like 8 MiB. (Perhaps that's a compile-time constraint?)
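For illustration, a kernel-side sketch (not a complete module, and hedged: the exact limit depends on MAX_ORDER and the page size) of grabbing a physically contiguous block with alloc_pages(); anything much larger than a few MiB needs CMA or a boot-time reservation instead.

    #include <linux/errno.h>
    #include <linux/gfp.h>
    #include <linux/io.h>
    #include <linux/kernel.h>
    #include <linux/mm.h>

    static struct page *contig_pages;
    static const unsigned int contig_order = 10;  /* 2^10 pages = 4 MiB with 4 KiB pages */

    static int contig_alloc(void)
    {
        contig_pages = alloc_pages(GFP_KERNEL, contig_order);
        if (!contig_pages)
            return -ENOMEM;
        pr_info("contiguous block at phys 0x%llx\n",
                (unsigned long long)page_to_phys(contig_pages));
        return 0;
    }

    static void contig_free(void)
    {
        if (contig_pages)
            __free_pages(contig_pages, contig_order);
    }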
Since this would be all-custom, you could explore the mem= boot parameter of the linux kernel. This will limit the amount of memory used, and you can party all over the remaining memory without anyone knowing. Heck, if you boot up a busybox system, you could probably do mem=32M, but even mem=256M should work if you're not booting a GUI.
You will also want to look into the Offline Scheduler (and here). It "unplugs" the CPU from Linux so you can have full control over ALL code running on it. (Some parts of this are already in the mainline kernel, and maybe all of it is.)

Can I allocate one large and guaranteed contiguous range of physical memory (100 MB)?

Can I allocate one large, guaranteed contiguous range of physical memory (100 MB, consecutive without breaks) on Linux, and if I can, then how can I do this?
I need to map this contiguous block of memory through the PCI-Express BAR from one CPU (CPU1) to another (CPU2) located behind a PCIe Non-Transparent Bridge.
You don't allocate physical memory in user applications (physical memory only makes sense inside the kernel).
I don't understand if you are coding a kernel module or some Linux application (e.g. a numerical finite-element code).
Inside applications, you can allocate virtual memory with e.g. mmap(2) (and then you can allocate a big contiguous segment of address space).
I guess that some GPU cards give access to a large amount of GPU memory through mmap, so I believe it is possible to do what you want.
You might be interested in the numa(7) man page. Probably the numa(3) library will give you what you want. Did you also consider Open MPI? See also msync(2) and mlock(2).
From user space there is no guarantee; it depends on your luck.
If you compile your driver into the kernel, you can use mmap and allocate the required amount of memory.
If it is needed as storage or for some other use not specific to a driver, then you should set the memmap parameter on the boot command line.
e.g. memmap=200M$1700M
This reserves 200 MB of memory starting at the 1700 MB physical address.
Later it can be used as a filesystem as well ;)
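A userspace sketch of how such a boot-reserved range can then be reached, by mmap()ing /dev/mem at the reserved physical offset (needs root, and a kernel whose CONFIG_STRICT_DEVMEM settings allow access to that range; the addresses match the memmap example above):

    #define _FILE_OFFSET_BITS 64
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        off_t  phys = 1700ULL * 1024 * 1024;   /* start of the reserved range */
        size_t len  = 200UL  * 1024 * 1024;    /* its size */

        int fd = open("/dev/mem", O_RDWR | O_SYNC);
        if (fd < 0) { perror("open /dev/mem"); return 1; }

        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, phys);
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        ((volatile char *)p)[0] = 0x42;        /* the range is ours to use */
        munmap(p, len);
        close(fd);
        return 0;
    }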

How can I allocate memory in Linux that meets paging and cacheability requirements?

I want to allocate space for a large array that will be write-only until the very end of the program. For that reason, I don't care if it's cached.
I also want to access this very frequently, so I don't want to have to do a page walk more than once. For that reason I want it to be allocated in a large page (e.g. 4 MB).
So how can I...
...request the memory to be either uncacheable or write-through?
...request the memory to be placed in a large page?
I am working in Linux.
Disabling caching sounds like it would make your writes slower if it forces a write all the way through to the RAM. I'm not sure I'd attempt that at all.
To actually use large pages, I suggest following HugeTLB - Large Page Support in the Linux Kernel. It contains an example of how you can use large pages via a shared memory segment.
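As a hedged alternative to the shared-memory route in that document, roughly the same thing can be done with mmap(MAP_HUGETLB), provided huge pages have been reserved first (e.g. via /proc/sys/vm/nr_hugepages):

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    #define LEN (4UL * 1024 * 1024)   /* two 2 MB huge pages on x86-64 */

    int main(void)
    {
        void *p = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (p == MAP_FAILED) { perror("mmap(MAP_HUGETLB)"); return 1; }
        memset(p, 0, LEN);            /* backed by huge pages, fewer TLB misses */
        munmap(p, LEN);
        return 0;
    }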
With transparent hugepages, simply allocating a 4M-aligned buffer will work. Use aligned_alloc or posix_memalign to get a pointer you can free. (Note that aligned_alloc is required to fail if the buffer size isn't a multiple of the alignment. /facepalm).
Depending on your setting for /sys/kernel/mm/transparent_hugepage/defrag, you may need to use madvise(MADV_HUGEPAGE) on the buffer to strongly encourage the kernel to use hugepages.
Also note that x86-64 uses 2M hugepages. x86-32 uses 4M hugepages. Aligning to 4M is fine if you want the easy solution for both.
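Putting those points together, a minimal sketch (the buffer size is illustrative, and whether the madvise hint is needed depends on the THP defrag setting mentioned above):

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>

    #define ALIGN (2UL * 1024 * 1024)            /* 2 MB hugepage size on x86-64 */

    int main(void)
    {
        size_t len = 64UL * 1024 * 1024;         /* multiple of ALIGN, as required */
        void *p = aligned_alloc(ALIGN, len);     /* or posix_memalign */
        if (!p) { perror("aligned_alloc"); return 1; }

        if (madvise(p, len, MADV_HUGEPAGE) != 0) /* a hint, not a guarantee */
            perror("madvise(MADV_HUGEPAGE)");

        memset(p, 0, len);                       /* fault the pages in */
        free(p);
        return 0;
    }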
request the memory to be either uncacheable or write-through?
AFAIK, you can't easily do that through normal Linux APIs. NT (non-temporal) stores work on normal write-back memory, so use those instead. (They override the memory type and are weakly-ordered and cache-bypassing.)
But if you're not writing full cache-lines at a time, you definitely want cached writes. Especially if there's any spatial or temporal locality, but even if not then letting the store buffer do its job (hiding the latency of cache-miss stores) is a good thing.
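For illustration, a hedged x86-only sketch of such NT stores using SSE2 intrinsics, writing full 64-byte cache lines at a time:

    #include <emmintrin.h>   /* SSE2 intrinsics */
    #include <stdint.h>
    #include <stdlib.h>

    /* dst must be 16-byte aligned and n a multiple of 64 (one cache line). */
    static void fill_nt(void *dst, uint64_t value, size_t n)
    {
        __m128i v = _mm_set1_epi64x((long long)value);
        char *p = dst;
        for (size_t i = 0; i < n; i += 64) {          /* one full cache line per iteration */
            _mm_stream_si128((__m128i *)(p + i),      v);
            _mm_stream_si128((__m128i *)(p + i + 16), v);
            _mm_stream_si128((__m128i *)(p + i + 32), v);
            _mm_stream_si128((__m128i *)(p + i + 48), v);
        }
        _mm_sfence();   /* order the weakly-ordered streaming stores */
    }

    int main(void)
    {
        size_t n = 64UL * 1024 * 1024;
        void *buf = aligned_alloc(64, n);
        if (buf) { fill_nt(buf, 0x0102030405060708ULL, n); free(buf); }
        return 0;
    }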

Calculating % memory used on Linux

Linux noob question:
If I have 500MB of RAM, and 500MB of swap space, can the OS and processes then use 1GB of memory?
In other words, is the total amount of memory available to programs and the OS the total of the physical memory size and swap size?
I'm trying to figure out which SNMP counters to query, but need to understand how Linux uses virtual memory a little better first.
Thanks
Actually, it IS essentially correct, but your "virtual" memory does NOT reside beside your "physical memory" (as Matthew Scharley stated).
Your "virtual memory" is an abstraction layer covering both "physical" (as in RAM) and "swap" (as in hard-disk, which is of course as much physical as RAM is) memory.
Virtual memory is in essence an abstraction layer. Your program always addresses a "virtual" address, which your OS translates to an address in RAM or on disk (which needs to be loaded into RAM first) depending on where the data resides. So your program never has to worry about lack of memory.
Nothing is ever quite so simple anymore...
Memory pages are lazily allocated. A process can malloc() a large quantity of memory and never use it. So on your 500MB_RAM + 500MB_SWAP system, I could -- at least in theory -- allocate 2 gig of memory off the heap and things will run merrily along until I try to use too much of that memory. (At which point whatever process couldn't acquire more memory pages gets nuked. Hopefully it's my process. But not always.)
Individual processes may be limited to 4 gig as a hard address limitation on 32-bit systems. Even when you have more than 4 gig of RAM on the machine and you're using that bizarre segmented 36-bit atrocity from hell addressing scheme, individual processes are still limited to only 4 gigs. Some of that 4 gigs has to go for shared libraries and program code. So yer down to 2-3 gigs of stack+heap as an ADDRESSING limitation.
You can mmap files in, effectively giving you more memory. It basically acts as extra swap. I.e. Rather than loading a program's binary code data into memory and then swapping it out to the swapfile, the file is just mmapped. As needed, pages are swapped into RAM directly from the file.
You can get into some interesting stuff with sparse data and mmapped sparse files. I've seen X-windows claim enormous memory usage when in fact it was only using up a tiny bit.
BTW: "free" might help you. As might "cat /proc/meminfo" or the Vm lines in /proc/$PID/status. (Especially VmData and VmStk.) Or perhaps "ps up $PID"
Although mostly it's true, it's not entirely correct. For a particular process, the environment you run it in may limit the memory available to your process. Check the output of ulimit -v as well.
Yes, this is essentially correct. The actual numbers might be (very) marginally lower, but for all intents and purposes, if you have x physical memory and y virtual memory (swap in linux), then you have x + y memory available to the operating system and any programs running underneath the OS.

Is there a limit for the number of sk_buffs in the kernel?

I need to steal some SKBs in my NetFilter hook, and retain them for some time.
Is there a limit in the kernel about how many SKBs can I use at a time?
What are the consequences of having some 100,000 or even more SKBs held in my kernel module?
I could avoid copying my packets two times if I can hold many, many SKBs.
Regards,
Denes
If you have the memory, no problem. The limit is kernel data space, which on 32-bit x86 machines is normally limited to 1 GB (see http://kerneltrap.org/node/2450). Realize that each skb consumes the sk_buff data structure as well as the memory it references. You could also use ipqueue to do the processing in user space (with more memory available).
The link above is dead; the last known version of the page was cached elsewhere.
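For context, a hedged kernel-side sketch (hook signature as in recent kernels; registration and module boilerplate omitted) of stealing packets in a NetFilter hook and holding the skbs on a private queue instead of copying them:

    #include <linux/init.h>
    #include <linux/netfilter.h>
    #include <linux/skbuff.h>

    static struct sk_buff_head held_skbs;   /* the stolen packets */

    static unsigned int steal_hook(void *priv, struct sk_buff *skb,
                                   const struct nf_hook_state *state)
    {
        skb_queue_tail(&held_skbs, skb);    /* keep the skb; no copy made */
        return NF_STOLEN;                   /* tell netfilter we now own it */
    }

    static int __init steal_init(void)
    {
        skb_queue_head_init(&held_skbs);
        /* nf_register_net_hook() with an nf_hook_ops pointing at steal_hook
           would go here. */
        return 0;
    }

    static void __exit steal_exit(void)
    {
        struct sk_buff *skb;

        while ((skb = skb_dequeue(&held_skbs)) != NULL)
            kfree_skb(skb);                 /* or re-inject, depending on the use case */
    }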
