What does the Linux /proc/meminfo "Mapped" topic mean?

What does the Linux /proc/meminfo "Mapped" topic mean? I have seen several one-liners that tell me it is the "Total size of memory in kilobytes that is mapped by devices or libraries with mmap." But I have now spent almost twenty hours searching the 2.6.30.5 kernel source code trying to confirm this statement, and I have been unable to do so -- indeed I see some things which seem to conflict with it.
The "Mapped" count is held in global_page_state[NR_FILE_MAPPED]. The comment near the declaration of NR_FILE_MAPPED says: "Pagecache pages mapped into pagetables. Only modified from process context."
Aren't all of the pages referred to by meminfo's "Cached" topic file-backed? Doesn't that mean that all these pages must be "Mapped"? I've looked at a few dozen meminfo listings, from several different architectures, and always the "Mapped" value is much smaller than the "Cached" value.
At any given time most of memory is filled with executable images and shared libraries. Looking at /proc/pid/smaps, I see that all of these are mapped into VMAs. Are all of these mapped into memory using mmap()? If so, why is "Mapped" so small? If they aren't mapped into memory using mmap(), how do they get mapped? Calls to handle_mm_fault, which is called by get_user_pages and various architecture-dependent page-fault handlers, increment the "Mapped" count, and they seem to do so for any page associated with a VMA.
I've looked at the mmap() functions of a bunch of drivers. Many of these call vm_insert_page or remap_vmalloc_range to establish their mappings, and these functions do increment the "Mapped" count. But a good many other drivers seem to call remap_pfn_range, which, as far as I can tell, doesn't increment the "Mapped" count.

It's the other way around. Everything in Mapped is also in Cached - Mapped is pagecache data that's been mapped into process virtual memory space. Most pages in Cached are not mapped by processes.
The same page can be mapped into many different pagetables - it'll only count in Mapped once, though. So if you have 100 processes running, each with 2MB mapped from /lib/i686/cmov/libc-2.7.so, that'll still only add 2MB to Mapped.
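Both numbers are easy to check directly; as a minimal sketch, the following C program just prints the two fields from /proc/meminfo, and on a typical system Mapped comes out far smaller than Cached:

    /* Print the Cached and Mapped fields from /proc/meminfo. */
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        char line[256];
        FILE *f = fopen("/proc/meminfo", "r");
        if (!f) {
            perror("/proc/meminfo");
            return 1;
        }
        while (fgets(line, sizeof line, f)) {
            if (strncmp(line, "Cached:", 7) == 0 ||
                strncmp(line, "Mapped:", 7) == 0)
                fputs(line, stdout);   /* values are reported in kB */
        }
        fclose(f);
        return 0;
    }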

I think the intention is that it counts the number of pages that are mapped from files. In my copy of the source (2.6.31), it is incremented in page_add_file_rmap, and decremented in page_remove_rmap if the page to be removed is not anonymously mapped. page_add_file_rmap is, for example, invoked in __do_fault, again in case the mapping is not anonymous.
So it looks all consistent to me...

Related

Why does dereferencing pointer from mmap cause memory usage reported by top to increase?

I am calling mmap() with MAP_SHARED and PROT_READ to access a file which is about 25 GB in size. I have noticed that advancing the returned pointer has no effect on %MEM in top for the application, but once I start dereferencing the pointer at different locations, memory wildly increases and caps at 55%. That value goes back down to 0.2% once munmap is called.
I don't know if I should trust that 55% value top reports. It doesn't seem like it is actually using 8 GB of the available 16. Should I be worried?
When you first map the file, all it does is reserve address space; it doesn't necessarily read anything from the file if you don't pass MAP_POPULATE (the OS might do a little prefetch, but it's not required to, and often doesn't until you begin reading or writing).
When you read from a given page of memory for the first time, this triggers a page fault. This isn't the "invalid page fault" most people think of when they hear the name; it's one of the following (a sketch for counting both kinds follows the list):
A minor fault - The data is already loaded in the kernel, but the userspace mapping for that address to the loaded data needs to be established (fast)
A major fault - The data is not loaded at all, and the kernel needs to allocate a page for the data, populate it from the disk (slow), then perform the same mapping to userspace as in the minor fault case
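One way to count both kinds from userspace is getrusage(2), which reports them as ru_minflt and ru_majflt; a minimal sketch that maps a file given on the command line and touches every page:

    /* Map a file, touch every page once, and report how many minor and
     * major page faults the process took while doing so. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/resource.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        if (argc != 2) {
            fprintf(stderr, "usage: %s <file>\n", argv[0]);
            return 1;
        }
        int fd = open(argv[1], O_RDONLY);
        struct stat st;
        if (fd < 0 || fstat(fd, &st) < 0) {
            perror(argv[1]);
            return 1;
        }
        char *p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        struct rusage before, after;
        getrusage(RUSAGE_SELF, &before);

        long pagesize = sysconf(_SC_PAGESIZE);
        volatile char sum = 0;
        for (off_t off = 0; off < st.st_size; off += pagesize)
            sum += p[off];                 /* first touch faults the page in */

        getrusage(RUSAGE_SELF, &after);
        printf("minor faults: %ld, major faults: %ld\n",
               after.ru_minflt - before.ru_minflt,
               after.ru_majflt - before.ru_majflt);

        munmap(p, st.st_size);
        close(fd);
        return 0;
    }

Run it twice in a row on the same file and the second run should show mostly minor faults, because the data is already sitting in the page cache.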
The behavior you're seeing is likely due to the mapped file being too large to fit in memory alongside everything else that wants to stay resident, so:
When first mapped, the initial pages aren't already mapped to the process (some of them might be in the kernel cache, but they're not charged to the process unless they're linked to the process's address space by minor page faults)
You read from the file, causing minor and major faults until you fill main RAM
Once you fill main RAM, faulting in a new page typically leads to one of the older pages being dropped (you're not using all the pages as much as the OS and other processes are using theirs, so the low activity pages, especially ones that can be dropped for free rather than written to the page/swap file, are ideal pages to discard), so your memory usage steadies (for every page read in, you drop another)
When you munmap, the accounting against your process is dropped. Many of the pages are likely still in the kernel cache, but unless they're remapped and accessed again soon, they're likely first on the chopping block to discard if something else requests memory
And as commenters noted, accounting for shared memory-mapped files gets weird: every process is "charged" for the memory, but it's all reported as shared even if no other process maps it. So it's not practical to distinguish "shared because it's MAP_SHARED and backed by kernel cache, but no one else has it mapped, so it's effectively uniquely owned by this process" from "shared because N processes are mapping the same data, reporting shared_amount * N usage cumulatively, but actually only consuming shared_amount memory in total (plus a trivial amount to maintain the per-process page tables for each mapping)". There's no reason to be worried if the tallies don't line up.
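If you'd rather not trust top's percentages, mincore(2) reports which pages of a mapping are currently resident. A rough sketch along the lines of the scenario above (read-only, MAP_SHARED), using a file path passed on the command line:

    /* Map a file read-only and report how many of its pages are resident
     * in RAM before and after reading them. */
    #define _DEFAULT_SOURCE            /* for mincore() with glibc */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    static size_t resident_pages(void *addr, size_t len, long pagesize)
    {
        size_t pages = (len + pagesize - 1) / pagesize, n = 0;
        unsigned char *vec = malloc(pages);
        if (!vec || mincore(addr, len, vec) != 0) {
            perror("mincore");
            free(vec);
            return 0;
        }
        for (size_t i = 0; i < pages; i++)
            n += vec[i] & 1;               /* low bit set = page is resident */
        free(vec);
        return n;
    }

    int main(int argc, char **argv)
    {
        if (argc != 2) {
            fprintf(stderr, "usage: %s <file>\n", argv[0]);
            return 1;
        }
        int fd = open(argv[1], O_RDONLY);
        struct stat st;
        if (fd < 0 || fstat(fd, &st) < 0) {
            perror(argv[1]);
            return 1;
        }
        long pagesize = sysconf(_SC_PAGESIZE);
        char *p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        printf("resident right after mmap: %zu pages\n",
               resident_pages(p, st.st_size, pagesize));

        volatile char sum = 0;
        for (off_t off = 0; off < st.st_size; off += pagesize)
            sum += p[off];                 /* fault the pages in */

        printf("resident after touching  : %zu pages\n",
               resident_pages(p, st.st_size, pagesize));

        munmap(p, st.st_size);
        close(fd);
        return 0;
    }

For a file larger than RAM you'll see the second number plateau well below the total page count, for exactly the reclaim reasons described above.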

Virtually contiguous vs. physically contiguous memory

Is virtually contiguous memory also always physically contiguous? If not, how is virtually contiguous memory allocated and memory-mapped over physically non-contiguous RAM blocks? A detailed answer is appreciated.
Short answer: You need not care (unless you're a kernel/driver developer). It is all the same to you.
Longer answer: On the contrary, virtually contiguous memory is usually not physically contiguous, except in very small amounts, by coincidence, or shortly after the machine has booted. It doesn't need to be, however.
The only way of allocating larger amounts of physically contiguous RAM is by using large pages (since the memory within one page must be contiguous). It is, however, usually a pointless endeavor: your process cannot observe any difference between memory that merely looks contiguous and memory that really is, and large pages come with disadvantages of their own.
Memory mapping over physically non-contiguous RAM works in no particularly "special" way; it follows the same mechanism that all memory management follows.
The OS divides virtual memory into "pages" and creates page table entries for your process. When you access memory at some location, either the corresponding page table entry does not exist at all, or it exists and points to a real page in RAM, or it exists but does not point to a real page in RAM.
If the page exists in RAM, nothing happens at all1. Otherwise a fault is generated and some operating system code is run. If it turns out the page doesn't exist at all (or does not have the correct access rights), your process is killed with a segmentation fault.
Otherwise, the OS chooses an arbitrary page that isn't used (or it swaps out the one it thinks is the least important), and loads the data from disk into that page. In the case of a memory mapping, the data comes from the mapped file; otherwise it comes from swap (and for newly allocated memory, the zero page is copied). The OS then returns control to your process. You never know this happened.
If you access another location in a "contiguous" (or so you think!) memory area which lies in a different page, the exact same procedure runs.
1 In reality, it is a little more complicated, since a page may exist in RAM but not exist "officially", being part of a list of pages that are to be recycled or such. But this gets too complicated.
No, it doesn't have to be. Any page of virtual memory can be mapped to an arbitrary physical page, so adjacent pages of your virtual memory can point to non-adjacent physical pages. This mapping is maintained by the OS and used by the CPU's MMU.
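You can observe this yourself through /proc/self/pagemap, which holds one 64-bit entry per virtual page (bit 63 = present, bits 0-54 = page frame number). A sketch that prints the physical frame behind a few adjacent virtual pages; note that recent kernels only reveal real PFNs to privileged (CAP_SYS_ADMIN) readers, so run it as root or expect zeros:

    /* Translate a few adjacent virtual pages to physical frame numbers via
     * /proc/self/pagemap. Adjacent virtual pages usually end up on
     * non-adjacent physical frames. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        long pagesize = sysconf(_SC_PAGESIZE);
        size_t npages = 4;
        unsigned char *buf = malloc(npages * pagesize);
        if (!buf)
            return 1;
        memset(buf, 1, npages * pagesize);     /* touch so the pages exist */

        int fd = open("/proc/self/pagemap", O_RDONLY);
        if (fd < 0) {
            perror("/proc/self/pagemap");
            return 1;
        }
        for (size_t i = 0; i < npages; i++) {
            uintptr_t vaddr = (uintptr_t)buf + i * pagesize;
            uint64_t entry;
            off_t offset = (off_t)(vaddr / pagesize) * sizeof entry;
            if (pread(fd, &entry, sizeof entry, offset) != sizeof entry) {
                perror("pread");
                return 1;
            }
            printf("vaddr 0x%lx  present=%d  pfn=0x%llx\n",
                   (unsigned long)vaddr,
                   (int)((entry >> 63) & 1),                          /* bit 63    */
                   (unsigned long long)(entry & ((1ULL << 55) - 1))); /* bits 0-54 */
        }
        close(fd);
        return 0;
    }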

mmap: will the mapped file be loaded into memory immediately?

From the manual, I just know that mmap() maps a file to a virtual address space, so the file can be randomly accessed. But it is unclear to me whether the mapped file is loaded into memory immediately. I guess that the kernel manages the mapped memory in pages and loads them on demand; if I only do a few reads and writes, only a few pages are loaded. Is that correct?
No, yes, maybe. It depends.
Calling mmap generally only means that to your application, the mapped file's contents are mapped to its address space as if the file was loaded there. Or, as if the file really existed in memory, as if they were one and the same (which includes changes being written back to disk, assuming you have write access).
No more, no less. It has no notion of loading something, nor does the application know what this means.
An application does not truly have knowledge of any such thing as memory, although the virtual memory system makes it appear like that. The memory that an application can "see" (and access) may or may not correspond to actual physical memory, and this can in principle change at any time, without prior warning, and without an obvious reason (obvious to your application).
Other than possibly experiencing a small delay due to a page fault, an application is (in principle) entirely unaware of any such thing happening and has little or no control over it1.
Applications will, generally, load pages from mapped files (including the main executable!) on demand, as a consequence of encountering a fault. However, an operating system will usually try to speculatively prefetch data to optimize performance.
In practice, calling mmap will immediately begin to (asynchronously) prefetch pages from the beginning of the mapping, up to a certain implementation-specified size. Which means, in principle, for small files the answer would be "yes", and for larger files it would be "no".
However, mmap does not block to wait for completion of the readahead, which means that you have no guarantee that any of the file is in RAM immediately after mmap returns (not that you have that guarantee at any time anyway!). Insofar, the answer is "maybe".
Under Linux, last time I looked, the default prefetch size was 31 blocks (~127k) -- but this may have changed, plus it's a tuneable parameter. As pages near or at the end of the prefetched area are touched, more pages are being prefetched asynchronously.
If you have given madvise the MADV_RANDOM hint, prefetching is "less likely to happen"; under Linux, this completely disables prefetch.
On the other hand, giving the MADV_SEQUENTIAL hint will asynchronously prefetch "more aggressively" beginning from the beginning of the mapping (and may discard accessed pages quicker). Under Linux, "more aggressively" means twice the normal amount.
Giving the MADV_WILLNEED hint suggests (but does not guarantee) that all pages in the given range are loaded as soon as possible (since you're saying you're going to access them). The OS may ignore this, but under Linux it is treated as an order rather than a hint, up to the process's maximum RSS limit and an implementation-specified limit (if I remember correctly, half the amount of physical RAM).
Note that MADV_DONTNEED is arguably implemented wrongly under Linux. The hint is not interpreted in the way specified by POSIX (i.e. that you're OK with pages being paged out for the moment), but rather as meaning that you want to discard them. That makes no big difference for read-only mapped pages (other than a small delay, which you said would be OK), but it sure does matter for everything else.
In particular, using MADV_DONTNEED thinking Linux will release unneeded pages after the OS has written them lazily to disk is not how things work! You must explicitly sync, or prepare for a surprise.
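For reference, a minimal sketch of handing the kernel such hints; all of them are advisory, as described above, and the file path is just a command-line argument:

    /* Map a file and give the kernel advisory hints about the access
     * pattern; none of these calls guarantee anything. */
    #define _DEFAULT_SOURCE            /* for madvise() with glibc */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        if (argc != 2) {
            fprintf(stderr, "usage: %s <file>\n", argv[0]);
            return 1;
        }
        int fd = open(argv[1], O_RDONLY);
        struct stat st;
        if (fd < 0 || fstat(fd, &st) < 0) {
            perror(argv[1]);
            return 1;
        }
        void *p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        /* We intend to read front to back: prefetch more aggressively. */
        if (madvise(p, st.st_size, MADV_SEQUENTIAL) != 0)
            perror("madvise(MADV_SEQUENTIAL)");

        /* We will need this range soon: start reading it in now. */
        if (madvise(p, st.st_size, MADV_WILLNEED) != 0)
            perror("madvise(MADV_WILLNEED)");

        /* ... read from p ... */

        munmap(p, st.st_size);
        close(fd);
        return 0;
    }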
If you have called readahead on the file descriptor prior to calling mmap (or have read or written the file previously), the file's contents will in practice indeed be in RAM immediately.
This is, however, only an implementation detail (unified virtual memory system), and subject to memory pressure on the system.
Calling mlock will -- assuming it succeeds2 -- immediately load the requested pages into RAM. It blocks until all pages are physically present, and you have the guarantee that the pages will stay in RAM until you unlock them.
1 There exists functionality to query (mincore) whether any or all of the pages in a particular range are actually present at the very moment, functionality to hint the OS about what you would like to see happening without any hard guarantees (madvise), and finally functionality to force a limited subset of pages to be present in memory (mlock) for privileged processes.
2 It might not, both for lack of privileges and for exceeding quotas or the amount of physical RAM present.
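A sketch of the mlock route for a file mapping; it may well fail for unprivileged processes, typically because the range exceeds RLIMIT_MEMLOCK:

    /* Pin a file mapping in RAM: mlock blocks until every page in the
     * range is resident and keeps it resident until munlock/munmap. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        if (argc != 2) {
            fprintf(stderr, "usage: %s <file>\n", argv[0]);
            return 1;
        }
        int fd = open(argv[1], O_RDONLY);
        struct stat st;
        if (fd < 0 || fstat(fd, &st) < 0) {
            perror(argv[1]);
            return 1;
        }
        char *p = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        if (mlock(p, st.st_size) != 0) {
            perror("mlock");    /* typically ENOMEM (RLIMIT_MEMLOCK) or EPERM */
        } else {
            /* ... the whole mapping is now guaranteed to be in RAM ... */
            munlock(p, st.st_size);
        }
        munmap(p, st.st_size);
        close(fd);
        return 0;
    }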
Yes, mmap creates a mapping. It does not normally read the entire content of whatever you have mapped into memory. If you wish to do that, you can use the mlock/mlockall system calls to force the kernel to read into RAM the content of the mapping, if applicable.
By default, mmap() only configures the mapping and returns (fast).
Linux (at least) has the option MAP_POPULATE (see 'man mmap') that does exactly what your question is about.
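A sketch of that flag in use; MAP_POPULATE is Linux-specific, and the kernel is still free to drop the pages again later under memory pressure:

    /* Ask the kernel to pre-fault the whole mapping up front. */
    #define _DEFAULT_SOURCE            /* exposes MAP_POPULATE with glibc */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        if (argc != 2) {
            fprintf(stderr, "usage: %s <file>\n", argv[0]);
            return 1;
        }
        int fd = open(argv[1], O_RDONLY);
        struct stat st;
        if (fd < 0 || fstat(fd, &st) < 0) {
            perror(argv[1]);
            return 1;
        }
        /* For a file mapping, MAP_POPULATE pre-faults the page tables and
         * triggers read-ahead on the file, so later accesses are much less
         * likely to block on disk. */
        void *p = mmap(NULL, st.st_size, PROT_READ,
                       MAP_SHARED | MAP_POPULATE, fd, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }
        /* ... use the mapping ... */
        munmap(p, st.st_size);
        close(fd);
        return 0;
    }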
Yes. The whole point of mmap is that it manages memory more efficiently than just slurping everything into memory.
Of course, any given implementation may in some situations decide that it's more efficient to read in the whole file in one go, but that should be transparent to the program calling mmap.

Calculating memory of a Process using Proc file system

I am writing a small process monitor script in Perl by reading values from the proc file system. Right now I am able to fetch the number of threads, the process state, and the number of bytes read and written using the /proc/[pid]/status and /proc/[pid]/io files. Now I want to calculate the memory usage of a process. After searching, I came to know that memory usage is present in /proc/[pid]/statm. But I still can't figure out which fields from that file are needed to calculate the memory usage. Can anyone help me with this? Thanks in advance.
You likely want resident or size; a sketch for converting them to bytes follows the excerpt. From the kernel.org documentation:
size     total program size
         (this is the whole program, including stuff never swapped in)
resident resident set size
         (stuff in RAM at the current moment; this does not include pages swapped out)
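Both fields are reported in pages, so multiply by the page size. A minimal sketch in C (the same arithmetic carries over to Perl):

    /* Read the first two fields of /proc/<pid>/statm (total program size
     * and resident set size, both counted in pages) and print them in kB. */
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        const char *pid = (argc > 1) ? argv[1] : "self";
        char path[64];
        snprintf(path, sizeof path, "/proc/%s/statm", pid);

        FILE *f = fopen(path, "r");
        if (!f) {
            perror(path);
            return 1;
        }
        unsigned long size_pages, resident_pages;
        if (fscanf(f, "%lu %lu", &size_pages, &resident_pages) != 2) {
            fputs("unexpected statm format\n", stderr);
            return 1;
        }
        fclose(f);

        long pagesize = sysconf(_SC_PAGESIZE);
        printf("size: %lu kB, resident: %lu kB\n",
               size_pages * pagesize / 1024,
               resident_pages * pagesize / 1024);
        return 0;
    }

With a PID as its argument it reads that process's statm; with no argument it reports on itself.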
It is extremely difficult to know what the "memory usage" of a process is. VM size and RSS are known, measurable values.
But what you probably want is something else. In practice, "VM size" seems too high and RSS often seems too low.
The main problems are:
Multiple processes can share the same pages. You can add up the RSS of all running processes, and end up with much more than the physical memory of your machine (this is before kernel data structures are counted)
Private pages belonging to the process can be swapped out. Or they might not be initialised yet. Do they count?
How exactly do you count memory-mapped file pages? Dirty ones? Clean ones? MAP_SHARED or MAP_PRIVATE ones?
So you really need to think about what counts as "memory usage".
It seems to me that logically:
Private pages which are not shared with any other processes (NB: private pages can STILL be copy-on-write!) must count even if swapped out
Shared pages should count divided by the number of processes they're shared by e.g. a page shared by two processes counts half
File-backed pages which are resident can count in the same way
File-backed non-resident pages can be ignored
If the same page is mapped more than once into the address-space of the same process, it can be ignored the 2nd and subsequent time. This means that if proc 1 has page X mapped twice, and proc 2 has page X mapped once, they are both "charged" half a page.
I don't know of any utility which does this. It seems nontrivial though, and involves (at least) reading /proc/pid/pagemap and possibly some other /proc interfaces, some of which are root-only.
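The Pss ("proportional set size") lines in /proc/[pid]/smaps come close to the accounting described above: private pages are counted in full and shared pages are divided by the number of processes mapping them (swapped-out pages and a few of the other subtleties are still not covered). A sketch that simply sums them:

    /* Sum the Pss: lines in /proc/<pid>/smaps. Each resident page is
     * charged to the process divided by the number of processes sharing it. */
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        const char *pid = (argc > 1) ? argv[1] : "self";
        char path[64], line[256];
        snprintf(path, sizeof path, "/proc/%s/smaps", pid);

        FILE *f = fopen(path, "r");
        if (!f) {
            perror(path);
            return 1;
        }
        unsigned long total_kb = 0, kb;
        while (fgets(line, sizeof line, f)) {
            if (sscanf(line, "Pss: %lu kB", &kb) == 1)  /* skips Pss_Anon: etc. */
                total_kb += kb;
        }
        fclose(f);

        printf("Pss total: %lu kB\n", total_kb);
        return 0;
    }

Reading another user's smaps generally requires appropriate permissions (or root).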
Another (less simple, but more precise) possibility would be to parse the /proc/123/maps file, perhaps by using the pmap utility. It gives you information about the "virtual memory" (i.e. the address space of the process).

Find out how many pages of memory a process uses on linux

I need to find out how many pages of memory a process allocates.
Each page is 4096 bytes, and I'm having some problems locating the correct value for the process's memory usage. When I look in gnome-system-monitor there are a few values to choose from under the memory map.
Thanks.
The point of this is to divide the memory usage by the page count and verify the page size.
It's hard to figure out the exact amount of memory allocated: there are pages shared with other processes (read-only parts of libraries), never-used memory allocated by brk and anonymous mmap, mmapped files that are never completely fetched from disk because the program only touches a small part of them, swapped-out pages, dirty pages yet to be written to disk, and so on.
If you want to deal with all this complexity and figure out the true count of pages, the detailed information is available in /proc/<pid>/smaps, and there are tools, like mem_usage.py or smem.pl (easily googlable), to turn it into a more-or-less usable summary.
This would be the "Resident Set Size", assuming your process doesn't use swap.
Note that a process may allocate far more memory ("Virtual Memory Size"), but as long as it doesn't write to that memory, it is not backed by physical memory, be it RAM or disk.
Some system tools, like top, display a huge value for "swap" for each process. This is of course completely wrong: the value is simply the difference between VMS and RSS, which mostly consists of those allocated but unused memory pages.
