What is the main difference between file-backed mapping and anonymous mapping?
How do we choose between file-backed mapping and anonymous mapping when we need IPC between processes?
What are the advantages and disadvantages of each?
The mmap() system call allows you to create either a file-backed mapping or an anonymous mapping.
    void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);
File-backed mapping - In Linux there exists a file, /dev/zero, which is an infinite source of 0 bytes. You just open this file and pass its descriptor to the mmap() call with the appropriate flag: MAP_SHARED if you want the memory to be shared with other processes, or MAP_PRIVATE if you don't want sharing.
Example:

    ...
    if ((fd = open("/dev/zero", O_RDWR)) < 0)
        printf("open error");
    if ((area = mmap(0, SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0)) == MAP_FAILED) {
        printf("Error in memory mapping");
        exit(1);
    }
    close(fd);   // close the file because memory is mapped
    // create child process
    ...
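Below is a minimal, self-contained sketch of the same technique; the one-page SIZE and the message written by the child are illustrative assumptions, not part of the elided code above:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/mman.h>
    #include <sys/wait.h>

    #define SIZE 4096   /* illustrative: one page of shared memory */

    int main(void) {
        int fd;
        char *area;

        if ((fd = open("/dev/zero", O_RDWR)) < 0) {
            perror("open");
            exit(1);
        }
        if ((area = mmap(0, SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0)) == MAP_FAILED) {
            perror("mmap");
            exit(1);
        }
        close(fd);                        /* the mapping remains valid after close */

        if (fork() == 0) {                /* child writes into the shared region */
            strcpy(area, "hello from child");
            _exit(0);
        }
        wait(NULL);                       /* parent waits, then sees the child's write */
        printf("parent reads: %s\n", area);
        munmap(area, SIZE);
        return 0;
    }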
Quoting the man page of mmap():
The contents of a file mapping (as opposed to an anonymous mapping;
see MAP_ANONYMOUS below), are initialized using length bytes starting
at offset offset in the file (or other object) referred to by the file
descriptor fd. offset must be a multiple of the page size as returned
by sysconf(_SC_PAGE_SIZE).
In our case, it has been initialized with zeroes (0s).
Quoting the text from the book Advanced Programming in the UNIX Environment by W. Richard Stevens and Stephen A. Rago, 2nd Edition:
The advantage of using /dev/zero in the manner that we've shown is
that an actual file need not exist before we call mmap to create the
mapped region. Mapping /dev/zero automatically creates a mapped region
of the specified size. The disadvantage of this technique is that it
works only between related processes. With related processes, however,
it is probably simpler and more efficient to use threads (Chapters 11
and 12). Note that regardless of which technique is used, we still
need to synchronize access to the shared data
After the call to mmap() succeeds, we create a child process, which will be able to see the writes to the mapped region (as we specified the MAP_SHARED flag).
Anonymous mapping - The same thing that we did above can be done using an anonymous mapping. For an anonymous mapping, we specify the MAP_ANON flag to mmap() and pass -1 as the file descriptor.
The resulting region is anonymous (it is not associated with a pathname through a file descriptor) and can be shared with descendant processes.
The advantage is that we don't need any file for mapping the memory; the overhead of opening and closing a file is also avoided.
    if ((area = mmap(0, SIZE, PROT_READ | PROT_WRITE, MAP_ANON | MAP_SHARED, -1, 0)) == MAP_FAILED)
        printf("Error in anonymous memory mapping");
So, both the /dev/zero file-backed mapping and the anonymous mapping shown here work only between related processes.
If you need this between unrelated processes, you will probably need to create named shared memory using shm_open() and then pass the returned file descriptor to mmap().
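For the unrelated-process case, a sketch with POSIX shared memory might look like this; the object name /my_shm and the size are assumptions for illustration (on older glibc you may need to link with -lrt):

    #include <stdio.h>
    #include <stdlib.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/mman.h>

    #define SHM_NAME "/my_shm"   /* hypothetical name both processes agree on */
    #define SIZE 4096

    int main(void) {
        int fd;
        char *area;

        /* Any process that knows the name can open the same object. */
        if ((fd = shm_open(SHM_NAME, O_CREAT | O_RDWR, 0600)) < 0) {
            perror("shm_open");
            exit(1);
        }
        if (ftruncate(fd, SIZE) < 0) {    /* set the size of the shared object */
            perror("ftruncate");
            exit(1);
        }
        if ((area = mmap(0, SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0)) == MAP_FAILED) {
            perror("mmap");
            exit(1);
        }
        close(fd);
        /* ... read/write area; every process that mmaps the same name sees the same memory ... */
        munmap(area, SIZE);
        /* shm_unlink(SHM_NAME); removes the name once nobody needs it any more */
        return 0;
    }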
I need to write incoming DMA buffers to an HD partition as a raw device (/dev/sda1) on embedded Linux (2.6.37) as fast as possible. The buffers are aligned as required and are all 512 KB long. The process may continue for a very long time and fill, for example, as much as 256 GB of data.
I need to use the memory-mapped file technique (O_DIRECT is not applicable), but I can't work out exactly how to do this.
So, in pseudo code, "normal" writing:
    fd = open("/dev/sda1", O_WRONLY);
    while (1) {
        p = GetVirtualPointerToNewBuffer();
        if (InputStopped())
            break;
        write(fd, p, BLOCK512KB);
    }
Now, I would be very thankful for a similar pseudo/real code example of how to use the memory-mapped technique for this writing.
UPDATE2:
Thanks to kestasx, the latest working test code looks like the following:
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/mman.h>

    #define KB    1024
    #define TSIZE (64*KB)

    void *TBuf;

    int main(int argc, char **argv) {
        int fdi = open("input.dat", O_RDONLY);
        //int fdo = open("/dev/sdb2", O_RDWR);
        int fdo = open("output.dat", O_RDWR);
        int i, offs = 0;
        void *addr;

        i = posix_memalign(&TBuf, TSIZE, TSIZE);
        if ((fdo < 1) || (fdi < 1)) {
            printf("Error in files\n");
            return -1;
        }
        while (1) {
            addr = mmap(TBuf, TSIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fdo, offs);
            if (addr == MAP_FAILED) {
                printf("Error MMAP=%d, %s\n", errno, strerror(errno));
                return -1;
            }
            i = read(fdi, TBuf, TSIZE);
            if (i != TSIZE) {
                printf("End of data\n");
                return 0;
            }
            i = munmap(addr, TSIZE);
            offs += TSIZE;
            sleep(1);
        }
    }
UPDATE3:
1. To precisely imitate how the DMA works, I need to move the read() call before mmap(), because when the DMA finishes it provides me with the address where it has put the data. So, in pseudo code:
    while (1) {
        read(fdi, TBuf, TSIZE);
        addr = mmap(TBuf, TSIZE, PROT_READ | PROT_WRITE, MAP_FIXED | MAP_SHARED, fdo, offs);
        munmap(addr, TSIZE);
        offs += TSIZE;
    }
This variant fails after(!) the first loop iteration - read() reports Bad address on TBuf.
Without understanding exactly what I was doing, I substituted msync() for munmap(). This worked perfectly.
So, the question here - why did unmapping addr affect TBuf?
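For reference, the variant that worked (with msync() substituted for munmap(), exactly as described) looks roughly like this sketch:

    while (1) {
        read(fdi, TBuf, TSIZE);
        addr = mmap(TBuf, TSIZE, PROT_READ | PROT_WRITE, MAP_FIXED | MAP_SHARED, fdo, offs);
        msync(addr, TSIZE, MS_SYNC);   /* flush the mapped region instead of unmapping it */
        offs += TSIZE;
    }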
2. With the previous example working, I moved to the real system with the DMA. The loop is the same, except that instead of the read() call there is a call which waits for a DMA buffer to be ready and provides its virtual address.
There are no errors, the code runs, BUT nothing is recorded(!).
My thought was that Linux does not see that the area was updated and therefore does not sync a thing.
To test this, I eliminated the read() call from the working example - and indeed, nothing was recorded then either.
So, the question here - how can I tell Linux that the mapped region contains new data: please flush it!
Thanks a lot!!!
If I understand correctly, it makes sense if you mmap() a file (I'm not sure whether you can mmap() a raw partition/block device) and the data is written via DMA directly into this memory region.
For this to work you need to be able to control p (where the new buffer is placed) or the address where the file is mapped. If you can't, you'll have to copy the memory contents (and will lose some of the benefits of mmap).
So the pseudo code would be:
truncate("data.bin", 256GB);
fd = open( "data.bin", O_RDWR );
p = GetVirtualPointerToNewBuffer();
adr = mmap( p, 1GB, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset_in_file );
startDMA();
waitDMAfinish();
munmap( adr, 1GB );
This is only a first step and I'm not completely sure it will work with DMA (I have no such experience).
I assume it is a 32-bit system, but even then a 1 GB mapped file size may be too big (if your RAM is smaller, you'll be swapping).
If this setup works, the next step would be a loop that maps regions of the file at different offsets and unmaps the already filled ones (see the sketch below).
Most likely you'll need to align addr to a 4 KB boundary.
When you unmap a region, its data is synced to disk, so you'll need some testing to select an appropriate mapped-region size (while the next region is being filled by DMA, there must be enough time to unmap/write the previous one).
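A sketch of that loop, still in the same pseudo-code style (REGION is an assumed window size to be tuned, and startDMA()/waitDMAfinish() are the placeholder calls from above):

    fd = open("data.bin", O_RDWR);
    offset = 0;
    while (!InputStopped()) {
        p = GetVirtualPointerToNewBuffer();   /* where the DMA will deposit data */
        adr = mmap(p, REGION, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);
        startDMA();
        waitDMAfinish();
        munmap(adr, REGION);                  /* unmapping flushes this window's dirty pages */
        offset += REGION;
    }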
UPDATE:
What exactly happens when you fill an mmap'ed region via DMA I simply don't know (I'm not sure how exactly dirty pages are detected: what is done by hardware, and what must be done by software).
UPDATE2: To the best of my knowledge:
DMA works the following way:
the CPU arranges the DMA transfer (the address in RAM where the transferred data is to be written);
the DMA controller does the actual work, while the CPU can do its own work in parallel;
once the DMA transfer is complete, the DMA controller signals the CPU via an IRQ line (interrupt), so the CPU can handle the result.
This seems simple as long as virtual memory is not involved: DMA should work independently of the running process (the actual VM table in use by the CPU). Yet there should be some mechanism to invalidate the CPU cache for the physical RAM pages modified by DMA (I don't know whether the CPU needs to do something or it is done automatically by hardware).
mmap() works the following way:
after a successful call to mmap(), the file on disk is attached to the process memory range (most likely some data structure is filled in the OS kernel to hold this info);
I/O (reading or writing) in the mmap'ed range triggers a page fault, which is handled by the kernel loading the appropriate blocks from the attached file;
writes to the mmap'ed range are handled by hardware (I don't know how exactly: maybe writes to previously unmodified pages trigger some fault, which is handled by the kernel marking these pages dirty; or maybe this marking is done entirely in hardware and the info is available to the kernel when it needs to flush modified pages to disk);
modified (dirty) pages are written to disk by the OS (as it sees appropriate) or can be forced via msync() or munmap().
In theory it should be possible to do DMA transfers into an mmap'ed range, but you need to find out how exactly pages are marked dirty (whether you need to do something to inform the kernel which pages need to be written to disk).
UPDATE3:
Even if the pages modified by DMA are not marked dirty, you should be able to trigger the marking by rewriting (reading and then writing back the same value) at least one value in each page (most likely every 4 KB) transferred, as in the sketch below. Just make sure this rewriting is not removed (optimised out) by the compiler.
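A sketch of such a rewrite loop, using the addr/TSIZE names from the question's code (the volatile pointer is there to keep the compiler from optimising the rewrite away):

    volatile char *vp = (volatile char *)addr;
    long page = sysconf(_SC_PAGE_SIZE);      /* usually 4 KB */

    for (size_t off = 0; off < TSIZE; off += page) {
        char c = vp[off];                    /* read what the DMA put there ...   */
        vp[off] = c;                         /* ... and write the same value back */
    }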
UPDATE4:
It seems a file opened O_WRONLY can't be mmap'ed (see the question comments; my experiments confirm this too). This is a logical consequence of the mmap() workings described above. The same is confirmed here (with reference to the POSIX standard requirement to ensure the file is readable regardless of the mapping protection flags).
Unless there is some way around it, this actually means that by using mmap() you can't avoid reading the results file (an unnecessary step in your case).
Regarding DMA transfers into the mapped range, I think it will be a requirement to ensure the mapped pages are preallocated before the DMA starts (so there is real memory assigned to both the DMA and the mapped region). On Linux there is the MAP_POPULATE mmap flag, but from the manual it seems to work with MAP_PRIVATE mappings only (changes are not written to disk), so most likely it is unsuitable. Likely you'll have to trigger the page faults manually by accessing each mapped page. This should trigger reading of the results file.
If you still wish to use mmap and DMA together but avoid reading the results file, you'll have to modify the kernel internals to allow mmap to use O_WRONLY files (for example by zero-filling the triggered pages instead of reading them from disk).
I need to reserve 256-512 MB of contiguous physical memory and have access to this memory from user space.
I decided to use CMA for memory reserving.
Here are the steps of my idea that must be performed:
Reservation of the required amount of memory by CMA during system boot.
Parsing of the CMA patch output, which looks like, for example, "CMA: reserved 256 MiB at 27400000", and saving two parameters: size of the CMA area = 256*1024*1024 bytes and physical address of the CMA area = 0x27400000.
Mapping of the CMA area through the /dev/mem file with offset = 0x27400000 using mmap(). (Of course, CONFIG_STRICT_DEVMEM is disabled.)
This would let me read data directly from physical memory from user space.
But the following code produces a segmentation fault (here size = 1 MB):
    int file;
    void *start;
    file = open("/dev/mem", O_RDWR | O_SYNC);
    if ((start = mmap(0, 1024*1024, PROT_READ | PROT_WRITE, MAP_SHARED, file, 0x27400000)) == MAP_FAILED) {
        perror("mmap");
    }
    for (int offs = 0; offs < 50; offs++) {
        cout << ((char *)start)[offs];
    }
Output of this code: "mmap: Invalid argument".
When I changed the offset from 0x27400000 to 0, this code worked fine and the program displayed garbage. It also works for a lot of the offsets I looked up in /proc/iomem.
According to the information from /proc/iomem, the physical address of the CMA area (0x27400000 on my system) is always situated in System RAM.
Does anyone have any ideas how to mmap the CMA area through /dev/mem? What am I doing wrong?
Thanks a lot for any help!
The solution to this problem was suggested to me by Jeff Haran on the kernelnewbies mailing list.
It was necessary to disable CONFIG_X86_PAT in .config, and mmap() started to work!
If CONFIG_X86_PAT is configured you will have problems mapping memory to user space. It basically implements the same restrictions as CONFIG_STRICT_DEVMEM.
Jeff Haran
Now I can mmap /dev/mem on any physical address I want.
But it is necessary to be careful:
Word of caution. CONFIG_X86_PAT was likely introduced for a reason. There may be some performance penalties for turning this off, though in my testing so far turning it off doesn't seem to break anything.
Jeff Haran
I am using Fedora 17 running on board with integrated graphics.
Given that I am able to manipulate content of physical memory, how can I find out the physical memory offsets that I could write to in order to display something on the screen?
I tried to look up the 0xB8000 and 0xB0000 offsets but they contain all 0xff.
Is there a specific pattern that starts video buffer in memory?
Is there any good source of information about this topic?
The root cause of my problem was that Linux is not using the legacy video mode, so the memory at 0xB8000 is restricted (in my case read-only). However, issuing an interrupt can switch to other modes:
INT 10 - VIDEO - SET VIDEO MODE
AH = 00h
AL = desired video mode (see #00010)
Found on: http://www.delorie.com/djgpp/doc/rbinter/id/74/0.html
Living like it's 1989
    #include <stdio.h>
    #include <stdlib.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/mman.h>
    #include <linux/fb.h>

    #define DEV_MEM "/dev/fb0"

    /* Screen parameters (probably obtained via ioctl() and /sys). */
    #define YRES 240
    #define XRES 320
    #define BYTES_PER_PIXEL (sizeof(unsigned short)) /* 16-bit pixels. */
    #define MAP_SIZE (XRES * YRES * BYTES_PER_PIXEL)

    unsigned short *map_lbase;
    int fd;

    if ((fd = open(DEV_MEM, O_RDWR | O_SYNC)) == -1) {
        fprintf(stderr, "cannot open %s - are you root?\n", DEV_MEM);
        exit(1);
    }
    // Map that page.
    map_lbase = (unsigned short *)mmap(NULL, MAP_SIZE,
                                       PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if ((long)map_lbase == -1) {
        perror("cannot mmap");
        exit(1);
    }
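Once the mapping succeeds, drawing is just storing pixel values into that memory. For example, assuming the 16-bit pixel format defined above (0xF800 is red in RGB565):

    /* Fill the whole screen with one colour. */
    for (unsigned int i = 0; i < XRES * YRES; i++)
        map_lbase[i] = 0xF800;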
Humons - Framebuffer API doc, Framebuffer Doc dir.
Smart Humons - Internals, Deferred I/O doc or how to emulate memory mapped video.
You cannot use 0xB8000 and 0xB0000 directly as those are physical addresses. I assume you are in user space and not writing a kernel driver. Under Linux, we normally have the MMU enabled; in other words, we have virtual memory. Not all processes/users have access to video memory. However, if you are allowed, you can mmap a framebuffer device into your address space. It is best to let the kernel give you an address and not request a specific one.
Or see how professionals do it.
Man: mmap
Edit: If you aren't root, you can still use Unix permissions on /dev/fb0 (or whichever device) to give a group permission to read/write, or use some sort of login process that gives a user on the current tty permission.
Maybe you could start around here:
http://www.tldp.org/HOWTO/Framebuffer-HOWTO/
But modern video graphics are in no way as simple as "finding where in memory the VRAM is" and writing there.
What are RSS and VSZ in Linux memory management? In a multithreaded environment, how can both of these be managed and tracked?
RSS is the Resident Set Size and is used to show how much memory is allocated to that process and is in RAM. It does not include memory that is swapped out. It does include memory from shared libraries as long as the pages from those libraries are actually in memory. It does include all stack and heap memory.
VSZ is the Virtual Memory Size. It includes all memory that the process can access, including memory that is swapped out, memory that is allocated, but not used, and memory that is from shared libraries.
So if process A has a 500K binary and is linked to 2500K of shared libraries, has 200K of stack/heap allocations of which 100K is actually in memory (rest is swapped or unused), and it has only actually loaded 1000K of the shared libraries and 400K of its own binary then:
RSS: 400K + 1000K + 100K = 1500K
VSZ: 500K + 2500K + 200K = 3200K
Since part of the memory is shared, many processes may use it, so if you add up all of the RSS values you can easily end up with more space than your system has.
The memory that is allocated also may not be in RSS until it is actually used by the program. So if your program allocated a bunch of memory up front, then uses it over time, you could see RSS going up and VSZ staying the same.
There is also PSS (proportional set size). This is a newer measure which tracks the shared memory as a proportion used by the current process. So if there were two processes using the same shared library from before:
PSS: 400K + (1000K/2) + 100K = 400K + 500K + 100K = 1000K
Threads all share the same address space, so the RSS, VSZ and PSS for each thread are identical to those of all the other threads in the process. Use ps or top to view this information on Linux/Unix.
There is way more to it than this, to learn more check the following references:
http://manpages.ubuntu.com/manpages/en/man1/ps.1.html
https://web.archive.org/web/20120520221529/http://emilics.com/blog/article/mconsumption.html
Also see:
A way to determine a process's "real" memory usage, i.e. private dirty RSS?
RSS is Resident Set Size (physically resident memory - this is currently occupying space in the machine's physical memory), and VSZ is Virtual Memory Size (address space allocated - this has addresses allocated in the process's memory map, but there isn't necessarily any actual memory behind it all right now).
Note that in these days of commonplace virtual machines, physical memory from the machine's view point may not really be actual physical memory.
Minimal runnable example
For this to make sense, you have to understand the basics of paging: How does x86 paging work? and in particular that the OS can allocate virtual memory via page tables / its internal memory book keeping (VSZ virtual memory) before it actually has a backing storage on RAM or disk (RSS resident memory).
Now to observe this in action, let's create a program that:
allocates more RAM than our physical memory with mmap
writes one byte on each page to ensure that each of those pages goes from virtual only memory (VSZ) to actually used memory (RSS)
checks the memory usage of the process with one of the methods mentioned at: Memory usage of current process in C
main.c
    #define _GNU_SOURCE
    #include <assert.h>
    #include <inttypes.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    typedef struct {
        unsigned long size, resident, share, text, lib, data, dt;
    } ProcStatm;

    /* https://stackoverflow.com/questions/1558402/memory-usage-of-current-process-in-c/7212248#7212248 */
    void ProcStat_init(ProcStatm *result) {
        const char *statm_path = "/proc/self/statm";
        FILE *f = fopen(statm_path, "r");
        if (!f) {
            perror(statm_path);
            abort();
        }
        if (7 != fscanf(
            f,
            "%lu %lu %lu %lu %lu %lu %lu",
            &(result->size),
            &(result->resident),
            &(result->share),
            &(result->text),
            &(result->lib),
            &(result->data),
            &(result->dt)
        )) {
            perror(statm_path);
            abort();
        }
        fclose(f);
    }

    int main(int argc, char **argv) {
        ProcStatm proc_statm;
        char *base, *p;
        char system_cmd[1024];
        long page_size;
        size_t i, nbytes, print_interval, bytes_since_last_print;
        int snprintf_return;

        /* Decide how many bytes to allocate. */
        if (argc < 2) {
            nbytes = 0x10000;
        } else {
            nbytes = strtoull(argv[1], NULL, 0);
        }
        if (argc < 3) {
            print_interval = 0x1000;
        } else {
            print_interval = strtoull(argv[2], NULL, 0);
        }
        page_size = sysconf(_SC_PAGESIZE);

        /* Allocate the memory. */
        base = mmap(
            NULL,
            nbytes,
            PROT_READ | PROT_WRITE,
            MAP_SHARED | MAP_ANONYMOUS,
            -1,
            0
        );
        if (base == MAP_FAILED) {
            perror("mmap");
            exit(EXIT_FAILURE);
        }

        /* Write to all the allocated pages. */
        i = 0;
        p = base;
        bytes_since_last_print = 0;
        /* Produce the ps command that lists only our VSZ and RSS. */
        snprintf_return = snprintf(
            system_cmd,
            sizeof(system_cmd),
            "ps -o pid,vsz,rss | awk '{if (NR == 1 || $1 == \"%ju\") print}'",
            (uintmax_t)getpid()
        );
        assert(snprintf_return >= 0);
        assert((size_t)snprintf_return < sizeof(system_cmd));
        bytes_since_last_print = print_interval;
        do {
            /* Modify a byte in the page. */
            *p = i;
            p += page_size;
            bytes_since_last_print += page_size;
            /* Print process memory usage every print_interval bytes.
             * We count memory using a few techniques from:
             * https://stackoverflow.com/questions/1558402/memory-usage-of-current-process-in-c */
            if (bytes_since_last_print > print_interval) {
                bytes_since_last_print -= print_interval;
                printf("extra_memory_committed %lu KiB\n", (i * page_size) / 1024);
                ProcStat_init(&proc_statm);
                /* Check /proc/self/statm */
                printf(
                    "/proc/self/statm size resident %lu %lu KiB\n",
                    (proc_statm.size * page_size) / 1024,
                    (proc_statm.resident * page_size) / 1024
                );
                /* Check ps. */
                puts(system_cmd);
                system(system_cmd);
                puts("");
            }
            i++;
        } while (p < base + nbytes);

        /* Cleanup. */
        munmap(base, nbytes);
        return EXIT_SUCCESS;
    }
GitHub upstream.
Compile and run:
gcc -ggdb3 -O0 -std=c99 -Wall -Wextra -pedantic -o main.out main.c
echo 1 | sudo tee /proc/sys/vm/overcommit_memory
sudo dmesg -c
./main.out 0x1000000000 0x200000000
echo $?
sudo dmesg
where:
0x1000000000 == 64GiB: 2x my computer's physical RAM of 32GiB
0x200000000 == 8GiB: print the memory every 8GiB, so we should get 4 prints before the crash at around 32GiB
echo 1 | sudo tee /proc/sys/vm/overcommit_memory: required for Linux to allow us to make a mmap call larger than physical RAM: Maximum memory which malloc can allocate
Program output:
extra_memory_committed 0 KiB
/proc/self/statm size resident 67111332 768 KiB
ps -o pid,vsz,rss | awk '{if (NR == 1 || $1 == "29827") print}'
PID VSZ RSS
29827 67111332 1648
extra_memory_committed 8388608 KiB
/proc/self/statm size resident 67111332 8390244 KiB
ps -o pid,vsz,rss | awk '{if (NR == 1 || $1 == "29827") print}'
PID VSZ RSS
29827 67111332 8390256
extra_memory_committed 16777216 KiB
/proc/self/statm size resident 67111332 16778852 KiB
ps -o pid,vsz,rss | awk '{if (NR == 1 || $1 == "29827") print}'
PID VSZ RSS
29827 67111332 16778864
extra_memory_committed 25165824 KiB
/proc/self/statm size resident 67111332 25167460 KiB
ps -o pid,vsz,rss | awk '{if (NR == 1 || $1 == "29827") print}'
PID VSZ RSS
29827 67111332 25167472
Killed
Exit status:
137
which by the 128 + signal number rule means we got signal number 9, which man 7 signal says is SIGKILL, which is sent by the Linux out-of-memory killer.
Output interpretation:
VSZ virtual memory remains constant at 67111332 KiB == 0x40009A4 KiB ~= 64 GiB (ps values are in KiB) after the mmap.
RSS "real memory usage" increases lazily only as we touch the pages. For example:
on the first print, we have extra_memory_committed 0, which means we haven't yet touched any pages. RSS is a small 1648 KiB which has been allocated for normal program startup like text area, globals, etc.
on the second print, we have written to 8388608 KiB == 8 GiB worth of pages. As a result, RSS increased by exactly 8 GiB to 8390256 KiB == 8388608 KiB + 1648 KiB
RSS continues to increase in 8GiB increments. The last print shows about 24 GiB of memory, and before 32 GiB could be printed, the OOM killer killed the process
See also: https://unix.stackexchange.com/questions/35129/need-explanation-on-resident-set-size-virtual-size
OOM killer logs
Our dmesg commands have shown the OOM killer logs.
An exact interpretation of those has been asked at Understanding the Linux oom-killer's logs, but let's have a quick look here.
https://serverfault.com/questions/548736/how-to-read-oom-killer-syslog-messages
The very first line of the log was:
[ 7283.479087] mongod invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
So we see that, interestingly, it was the MongoDB daemon that always runs on my laptop in the background that first triggered the OOM killer, presumably when the poor thing was trying to allocate some memory.
However, the OOM killer does not necessarily kill the one who awoke it.
After the invocation, the kernel prints a table of processes including the oom_score:
[ 7283.479292] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[ 7283.479303] [ 496] 0 496 16126 6 172032 484 0 systemd-journal
[ 7283.479306] [ 505] 0 505 1309 0 45056 52 0 blkmapd
[ 7283.479309] [ 513] 0 513 19757 0 57344 55 0 lvmetad
[ 7283.479312] [ 516] 0 516 4681 1 61440 444 -1000 systemd-udevd
and further ahead we see that our own little main.out actually got killed on the previous invocation:
[ 7283.479871] Out of memory: Kill process 15665 (main.out) score 865 or sacrifice child
[ 7283.479879] Killed process 15665 (main.out) total-vm:67111332kB, anon-rss:92kB, file-rss:4kB, shmem-rss:30080832kB
[ 7283.479951] oom_reaper: reaped process 15665 (main.out), now anon-rss:0kB, file-rss:0kB, shmem-rss:30080832kB
This log mentions the score 865 which that process had, presumably the highest (worst) OOM killer score as mentioned at: https://unix.stackexchange.com/questions/153585/how-does-the-oom-killer-decide-which-process-to-kill-first
Also interestingly, everything apparently happened so fast that before the freed memory was accounted for, the OOM killer was awoken again by the DeadlineMonitor process:
[ 7283.481043] DeadlineMonitor invoked oom-killer: gfp_mask=0x6200ca(GFP_HIGHUSER_MOVABLE), order=0, oom_score_adj=0
and this time it killed some Chromium process, which is usually my computer's normal memory hog:
[ 7283.481773] Out of memory: Kill process 11786 (chromium-browse) score 306 or sacrifice child
[ 7283.481833] Killed process 11786 (chromium-browse) total-vm:1813576kB, anon-rss:208804kB, file-rss:0kB, shmem-rss:8380kB
[ 7283.497847] oom_reaper: reaped process 11786 (chromium-browse), now anon-rss:0kB, file-rss:0kB, shmem-rss:8044kB
Tested in Ubuntu 19.04, Linux kernel 5.0.0.
Linux kernel docs
https://github.com/torvalds/linux/blob/v5.17/Documentation/filesystems/proc.rst has some points. The term "VSZ" is not used there but "RSS" is, and there's nothing too enlightening (surprise?!)
Instead of VSZ, the kernel seems to use the term VmSize, which appears e.g. on /proc/$PID/status.
Some quotes of interest:
The first of these lines shows the same information as is displayed for the mapping in /proc/PID/maps. Following lines show the size of the mapping (size); the size of each page allocated when backing a VMA (KernelPageSize), which is usually the same as the size in the page table entries; the page size used by the MMU when backing a VMA (in most cases, the same as KernelPageSize); the amount of the mapping that is currently resident in RAM (RSS); the process' proportional share of this mapping (PSS); and the number of clean and dirty shared and private pages in the mapping.
The "proportional set size" (PSS) of a process is the count of pages it has in memory, where each page is divided by the number of processes sharing it. So if a process has 1000 pages all to itself, and 1000 shared with one other process, its PSS will be 1500.
Note that even a page which is part of a MAP_SHARED mapping, but has only a single pte mapped, i.e. is currently used by only one process, is accounted as private and not as shared.
So we can guess a few more things:
shared libraries used by a single process appear in RSS; if more than one process has them, then not
PSS was mentioned by jmh, and has a more proportional approach between "I'm the only process that holds the shared library" and "there are N process holding the shared library, so each one holds memory/N on average"
VSZ - Virtual Set Size
The Virtual Set Size is the memory size assigned to a process (program) during its initial execution. The Virtual Set Size memory is simply a number describing how much memory a process has available for its execution.
RSS - Resident Set Size (kinda RAM)
As opposed to VSZ (Virtual Set Size), RSS is the memory currently used by a process. It is the actual number, in kilobytes, of how much RAM the current process is using.
Source
I think much has already been said about RSS vs VSZ. From an administrator/programmer/user perspective, when I design/code applications I am more concerned about RSS (resident memory): as you keep pulling in more and more (heap) variables, you will see this value shooting up. Try a simple program that allocates malloc-based space in a loop and fills data into that malloc'd space, like the sketch below; RSS keeps moving up.
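A minimal sketch of such a test (the sizes and the sleep are arbitrary; the memset is what actually touches the pages, so RSS grows, while the malloc alone mostly grows VSZ):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void) {
        for (int i = 1; i <= 64; i++) {
            char *p = malloc(1024 * 1024);      /* VSZ grows here                   */
            if (p == NULL)
                break;
            memset(p, 0xAA, 1024 * 1024);       /* RSS grows once pages are touched */
            printf("allocated %d MiB, watch ps/top\n", i);
            sleep(1);                           /* time to observe RSS elsewhere    */
        }
        return 0;
    }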
As far as VSZ is concerned, it is more about the virtual memory mapping that Linux does, one of the core features derived from conventional operating system concepts. VSZ management is done by the virtual memory management of the kernel; for more info on VSZ, see Robert Love's description of mm_struct and vm_area_struct, which are part of the basic task_struct data structure in the kernel.
To summarize #jmh's excellent answer:
In #linux, a process's memory comprises:
its own binary
its shared libs
its stack and heap
Due to paging, not all of those are always fully in memory; only the useful, most recently used parts (pages) are. Other parts are paged out (or swapped out) to make room for other processes.
The table below, taken from #jmh's answer, shows an example of what Resident and Virtual memory are for a specific process.
+-------------+-------------------------+------------------------+
| portion | actually in memory | total (allocated) size |
|-------------+-------------------------+------------------------|
| binary | 400K | 500K |
| shared libs | 1000K | 2500K |
| stack+heap | 100K | 200K |
|-------------+-------------------------+------------------------|
| | RSS (Resident Set Size) | VSZ (Virtual Set Size) |
|-------------+-------------------------+------------------------|
| | 1500K | 3200K |
+-------------+-------------------------+------------------------+
To summarize: resident memory is what is actually in physical memory right now, and virtual size is the total memory that would be needed to load all components.
Of course, numbers don't add up, because libraries are shared between multiple processes and their memory is counted for each process separately, even if they are loaded only once.
They are not managed, but measured and possibly limited (see the getrlimit system call, documented in getrlimit(2)).
RSS means resident set size (the part of your virtual address space sitting in RAM).
You can query the virtual address space of process 1234 using proc(5) with cat /proc/1234/maps and its status (including memory consumption) through cat /proc/1234/status.
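As a small illustration of the getrlimit() call mentioned above, this queries the address-space limit, which is what can cap a process's virtual size:

    #include <stdio.h>
    #include <sys/resource.h>

    int main(void) {
        struct rlimit rl;

        if (getrlimit(RLIMIT_AS, &rl) == 0)   /* limit on the virtual address space */
            printf("soft limit: %llu, hard limit: %llu (RLIM_INFINITY means unlimited)\n",
                   (unsigned long long)rl.rlim_cur,
                   (unsigned long long)rl.rlim_max);
        return 0;
    }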