Why is my process's resident memory higher than its virtual memory? - linux

'top' logs of my Linux process show that its resident memory is around 6 times its virtual memory. I have researched a lot but couldn't find any reason for such behavior. Ordinarily VIRT is always higher than RES due to the Linux kernel's memory management. Top output is below -
13743 root 20 0 15.234g 0.010t 4372 R 13.4 4.0 7:43.41 q

Not quite.
The g suffix indicates Gibibyte(s), and t indicates Tebibyte(s).
Let's do the conversion of 0.010t to g (GiB):
zsh% print $((0.010 * 1024))g
10.24g
And 10.24g < 15.234g, so your assumption is not correct, i.e. top is showing the correct values for virtual set size (VSZ) and resident set size (RSS) -- just in different units (you'd need to take a peek at the source to see why it chooses them).

Related

Write high bandwidth real-time data to SSD in Linux

I have a real-time process that receives 16 kB of data every 200 µs (about 80 MB/s sustained) for about 1 hr. I need to store this data.
I have a 240 GB SSD on a SATA III channel and I thought I could use it as a plain storage device without any filesystem on it. I am running a 5.4.0-109-generic kernel with 8 GB of RAM.
Here is what I have done so far:
I set up a 1 GB shared memory region, shm, where I write the data, and I use a semaphore to tell a logger process when data is available.
In the logger process:
I open the SSD:
fd = open("/dev/sdb",O_WRONLY|O_LARGEFILE);
I wait for the data to be available in the shm and then I write a chunk, writing_size, of it to the SSD:
written_size = write(fd,local_buffer,writing_size);
I checked and written_size is always equal to writing_size;
and I perform an fsync() after every N write cycles:
if(written_cycles > N)
ret = fsync(fd);
I checked and fsync never returns -1.
I set the I/O scheduler of /dev/sdb to noop and I experimented with different values of writing_size and N. The final values I came up with are writing_size = 64 kB and N = 16.
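Putting the pieces above together, here is a minimal sketch of the logger loop (the shared-memory copy, the semaphore name, and the error handling are placeholders, not my actual code; link with -pthread for sem_open):

#define _GNU_SOURCE              /* for O_LARGEFILE */
#include <fcntl.h>
#include <semaphore.h>
#include <unistd.h>

#define WRITING_SIZE (64 * 1024) /* 64 kB chunks, as tuned above */
#define N 16                     /* fsync every 16 write cycles */

int main(void)
{
    static char local_buffer[WRITING_SIZE];
    sem_t *data_ready = sem_open("/logger_sem", 0); /* hypothetical name */
    int fd = open("/dev/sdb", O_WRONLY | O_LARGEFILE);
    int written_cycles = 0;

    for (;;) {
        sem_wait(data_ready);  /* block until the producer signals data in shm */
        /* ... copy WRITING_SIZE bytes from the shared memory into local_buffer ... */
        ssize_t written_size = write(fd, local_buffer, WRITING_SIZE);
        if (written_size != WRITING_SIZE)
            break;             /* never happens in my runs */
        if (++written_cycles > N) {
            fsync(fd);         /* flush dirty pages to the device; never returns -1 here */
            written_cycles = 0;
        }
    }
    return 0;
}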
The behavior that I am seeing is this:
the whole process works very well until about 17 GB have been written. At that point the logger process is put into "uninterruptible sleep" (D state) quite often, for 1 or 2 seconds at a time. This is still fine, since the shared memory buffer takes ~13 s to fill up. When the data written reaches 20 GB, the logger process is put to sleep for much longer, until the sleeps reach 13 s and I start to lose data. The threshold at which the process starts being put to sleep is quite repeatable, 16 - 17 GB, but the maximum amount of data I can save before losing any is random.
This is the best I can achieve so far with my method and the writing_size and N tuning mentioned previously.
I tried setting the logger process's nice value to -20, with no improvement.
It also looks like the noop I/O scheduler does not support ionice, so I tried the CFQ scheduler with the maximum ionice priority, but performance was even worse.
I suspect the logger process is being put to sleep waiting for I/O, but I do not understand why it happens only after a certain number of bytes have been written. iotop shows that the I/O bandwidth of the logger process is stable at around 85 MB/s.
I welcome any suggestions.
PS: I did try to mmap the SSD and do memcpy instead of write()+fsync(), but mmap was slower and the results were worse.

How is memory used value derived in check_snmp_mem.pl?

I was configuring icinga2 to get memory-used information from a Linux client using the script at check_snmp_mem.pl. Any idea how the memory used is derived in this script?
Here is free command output
# free
total used free shared buff/cache available
Mem: 500016 59160 89564 3036 351292 408972
Swap: 1048572 4092 1044480
whereas the performance data shown in the icinga dashboard is
Label Value Max Warning Critical
ram_used 137,700.00 500,016.00 470,015.00 490,016.00
swap_used 4,092.00 1,048,572.00 524,286.00 838,858.00
Looking through the source code, it mentions ram_used for example in this line:
$n_output .= " | ram_used=" . ($$resultat{$nets_ram_total}-$$resultat{$nets_ram_free}-$$resultat{$nets_ram_cache}).";";
This strongly suggests that ram_used is calculated as the total RAM minus the free RAM minus the RAM used for cache. These values are retrieved via the following SNMP OIDs:
my $nets_ram_free = "1.3.6.1.4.1.2021.4.6.0"; # Real memory free
my $nets_ram_total = "1.3.6.1.4.1.2021.4.5.0"; # Real memory total
my $nets_ram_cache = "1.3.6.1.4.1.2021.4.15.0"; # Real memory cached
I don't know exactly how these correlate to the output of free. The difference between Icinga's ram_used (137,700) and free's free column (89,564) is 48,136, so maybe you can find that number somewhere.
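For what it's worth, plugging free's own numbers into the same formula gives 500016 - 89564 - 351292 = 59160, which matches free's used column exactly but not the 137,700.00 that Icinga shows, so the SNMP agent is evidently reporting different free and/or cached figures than free does.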

Linux `top` command: how much process memory is physically stored in swap space?

Let's say I run my program on a 64-bit Linux machine with 64 GB of RAM. In my very small C program, immediately after the start I do
void *p = sbrk(1024ull * 1024 * 1024 * 120);
thus moving my data segment break forward by 120 GB.
After the above sbrk call, the top entry for my process shows RES at some low value, VIRT at 120g, and SWAP at 120g.
After this, I write something into the first 90 GB of the above region:
memset(p, 0xAB, 1024ull * 1024 * 1024 * 90);
This causes some changes in the top entry for my process: VIRT expectedly remains at 120g, RES becomes almost 64g, SWAP drops to around 56g.
The common Swap stats in the header of top's output show that swap file usage increases, which is expected since my program will have to push about 26 GB of memory pages into the swap file.
So, according to the above observations, the SWAP column simply reports my process's non-resident address space, regardless of whether that address space has been "materialized", i.e. regardless of whether I have already written something into that region of virtual memory.
But is there any way to figure out how much of that SWAP size has actually been "materialized" and backed by something stored in the swap file? I.e., is there any way to make top display that 26 GB value for my process?
The behavior depends on the version of procps you are using. For instance, in version 3.0.5 the SWAP value equals:
task->size - task->resident
and that is exactly what you are encountering. The top(1) man page says:
VIRT = SWAP + RES
procps-ng, however, reads /proc/pid/status and sets SWAP correctly:
https://gitlab.com/procps-ng/procps/blob/master/proc/readproc.c#L383
So you can either update procps or look at /proc/pid/status directly.
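In particular, the per-process swap usage is exposed as the VmSwap field of /proc/<pid>/status (since Linux 2.6.34). A minimal sketch of reading it for the current process (the field name is real; the rest is illustrative):

#include <stdio.h>
#include <string.h>

int main(void)
{
    /* Read /proc/self/status; substitute e.g. "/proc/1234/status"
       to inspect another process. */
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];

    if (!f)
        return 1;
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, "VmSwap:", 7) == 0) {
            fputs(line, stdout); /* e.g. "VmSwap: 27262976 kB" for ~26 GB */
            break;
        }
    }
    fclose(f);
    return 0;
}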

Stress-ng: RAM testing commands

Stress-ng: Can we test RAM using stress-ng? What are the commands used to test RAM on a MIPS 32 device?
There are many memory based stressors in stress-ng:
stress-ng --class memory?
class 'memory' stressors: atomic bsearch context full heapsort hsearch
lockbus lsearch malloc matrix membarrier memcpy memfd memrate memthrash
mergesort mincore null numa oom-pipe pipe qsort radixsort remap
resources rmap stack stackmmap str stream tlb-shootdown tmpfs tsearch
vm vm-rw wcs zero zlib
One can also use the VM-based stressors:
stress-ng --class vm?
class 'vm' stressors: bigheap brk madvise malloc mlock mmap mmapfork mmapmany
mremap msync shm shm-sysv stack stackmmap tmpfs userfaultfd vm vm-rw
vm-splice
I suggest looking at the vm stressor first as this contains a large range of stressor methods that exercise memory patterns and can possibly find broken memory:
-m N, --vm N
start N workers continuously calling mmap(2)/munmap(2) and
writing to the allocated memory. Note that this can cause systems
to trip the kernel OOM killer on Linux systems if enough
physical memory and swap are not available.
--vm-bytes N
mmap N bytes per vm worker, the default is 256MB. One can spec‐
ify the size as % of total available memory or in units of
Bytes, KBytes, MBytes and GBytes using the suffix b, k, m or g.
--vm-ops N
stop vm workers after N bogo operations.
--vm-hang N
sleep N seconds before unmapping memory, the default is zero
seconds. Specifying 0 will do an infinite wait.
--vm-keep
do not continually unmap and map memory, just keep on re-writing
to it.
--vm-locked
Lock the pages of the mapped region into memory using mmap
MAP_LOCKED (since Linux 2.5.37). This is similar to locking
memory as described in mlock(2).
--vm-madvise advice
Specify the madvise 'advice' option used on the memory mapped
regions used in the vm stressor. Non-linux systems will only
have the 'normal' madvise advice, linux systems support 'dont‐
need', 'hugepage', 'mergeable' , 'nohugepage', 'normal', 'ran‐
dom', 'sequential', 'unmergeable' and 'willneed' advice. If this
option is not used then the default is to pick random madvise
advice for each mmap call. See madvise(2) for more details.
--vm-method m
specify a vm stress method. By default, all the stress methods
are exercised sequentially, however one can specify just one
method to be used if required. Each of the vm workers has 3
phases:
1. Initialised. The anonymously memory mapped region is set to a
known pattern.
2. Exercised. Memory is modified in a known predictable way.
Some vm workers alter memory sequentially, some use small or
large strides to step along memory.
3. Checked. The modified memory is checked to see if it matches
the expected result.
The vm methods containing 'prime' in their name have a stride of
the largest prime less than 2^64, allowing them to thoroughly
step through memory and touch all locations just once, while
also avoiding touching adjacent memory cells. This
strategy exercises the cache and page non-locality.
Since the memory being exercised is virtually mapped then there
is no guarantee of touching page addresses in any particular
physical order. These workers should not be used to test that
all the system's memory is working correctly either, use tools
such as memtest86 instead.
The vm stress methods are intended to exercise memory in ways to
possibly find memory issues and to try to force thermal errors.
Available vm stress methods are described as follows:
Method Description
all iterate over all the vm stress methods
as listed below.
flip sequentially work through memory 8
times, each time just one bit in memory
flipped (inverted). This will effec‐
tively invert each byte in 8 passes.
galpat-0 galloping pattern zeros. This sets all
bits to 0 and flips just 1 in 4096 bits
to 1. It then checks to see if the 1s
are pulled down to 0 by their neighbours
or if the neighbours have been pulled up
to 1.
galpat-1 galloping pattern ones. This sets all
bits to 1 and flips just 1 in 4096 bits
to 0. It then checks to see if the 0s
are pulled up to 1 by their neighbours
or if the neighbours have been pulled
down to 0.
gray fill the memory with sequential gray
codes (these only change 1 bit at a time
between adjacent bytes) and then check
if they are set correctly.
incdec work sequentially through memory twice,
the first pass increments each byte by a
specific value and the second pass
decrements each byte back to the origi‐
nal start value. The increment/decrement
value changes on each invocation of the
stressor.
inc-nybble initialise memory to a set value (that
changes on each invocation of the stres‐
sor) and then sequentially work through
each byte incrementing the bottom 4 bits
by 1 and the top 4 bits by 15.
rand-set sequentially work through memory in 64
bit chunks setting bytes in the chunk to
the same 8 bit random value. The random
value changes on each chunk. Check that
the values have not changed.
rand-sum sequentially set all memory to random
values and then summate the number of
bits that have changed from the original
set values.
read64 sequentially read memory using 32 x 64
bit reads per bogo loop. Each loop
equates to one bogo operation. This
exercises raw memory reads.
ror fill memory with a random pattern and
then sequentially rotate 64 bits of mem‐
ory right by one bit, then check the
final load/rotate/stored values.
swap fill memory in 64 byte chunks with ran‐
dom patterns. Then swap each 64 byte chunk
with a randomly chosen chunk. Finally,
reverse the swap to put the chunks back
to their original place and check if the
data is correct. This exercises adjacent
and random memory load/stores.
move-inv sequentially fill memory 64 bits of mem‐
ory at a time with random values, and
then check if the memory is set cor‐
rectly. Next, sequentially invert each
64 bit pattern and again check if the
memory is set as expected.
modulo-x fill memory over 23 iterations. Each
iteration starts one byte further along
from the start of the memory and steps
along in 23 byte strides. In each
stride, the first byte is set to a ran‐
dom pattern and all other bytes are set
to the inverse. Then it checks to see if
the first byte contains the expected
random pattern. This exercises cache
store/reads as well as seeing if neigh‐
bouring cells influence each other.
prime-0 iterate 8 times by stepping through mem‐
ory in very large prime strides clearing
just one bit at a time in every byte.
Then check to see if all bits are set to
zero.
prime-1 iterate 8 times by stepping through mem‐
ory in very large prime strides setting
just one bit at a time in every byte.
Then check to see if all bits are set to
one.
prime-gray-0 first step through memory in very large
prime strides clearing just one bit
(based on a gray code) in every byte.
Next, repeat this but clear the other 7
bits. Then check to see if all bits are
set to zero.
prime-gray-1 first step through memory in very large
prime strides setting just one bit (based
on a gray code) in every byte. Next,
repeat this but set the other 7 bits.
Then check to see if all bits are set to
one.
rowhammer try to force memory corruption using the
rowhammer memory stressor. This fetches
two 32 bit integers from memory and
forces a cache flush on the two
addresses multiple times. This has been
known to force bit flipping on some
hardware, especially with lower fre‐
quency memory refresh cycles.
walk-0d for each byte in memory, walk through
each data line setting them to low (and
the others are set high) and check that
the written value is as expected. This
checks if any data lines are stuck.
walk-1d for each byte in memory, walk through
each data line setting them to high (and
the others are set low) and check that
the written value is as expected. This
checks if any data lines are stuck.
walk-0a in the given memory mapping, work
through a range of specially chosen
addresses working through address lines
to see if any address lines are stuck
low. This works best with physical mem‐
ory addressing, however, exercising
these virtual addresses has some value
too.
walk-1a in the given memory mapping, work
through a range of specially chosen
addresses working through address lines
to see if any address lines are stuck
high. This works best with physical mem‐
ory addressing, however, exercising
these virtual addresses has some value
too.
write64 sequentially write memory using 32 x 64
bit writes per bogo loop. Each loop
equates to one bogo operation. This
exercises raw memory writes. Note that
memory writes are not checked at the end
of each test iteration.
zero-one set all memory bits to zero and then
check if any bits are not zero. Next,
set all the memory bits to one and check
if any bits are not one.
--vm-populate
populate (prefault) page tables for the memory mappings; this
can stress swapping. Only available on systems that support
MAP_POPULATE (since Linux 2.5.46).
So, to run 1 vm stressor that uses 75% of memory, exercising all the vm stress methods with verification enabled, for 10 minutes in verbose mode, use:
stress-ng --vm 1 --vm-bytes 75% --vm-method all --verify -t 10m -v

Manual Virtual Address Translation

I've looked at a few different articles related to this already but none of them explain the solution in a way that I can understand and replicate. I need to know how to translate a virtual address to a physical address in memory based on the following:
A simple virtual memory system has 32KB physical memory with 16-bit virtual address, of which 12 bits are used as offset. The following is the current content of the page table of one of the processes:
So basically I think the page size of this virtual memory system is 1024KB. I need the procedure to find the corresponding PA of VA B2A0. If you can give me the procedure I can go from there; you don't have to give me the final solution :)
Thanks in advance guys. Also, if you know of an article that does this already and I've just missed it, feel free to just link me to that.
Cheers.
32 KB is 2^15 bytes,
so every physical address has 15 bits: the lower 12 are used as the offset, and the higher 3 as the page frame number.
What virtual page does 0xb2a0 reside in? To determine this, we take the bits of the address above the low 12. The size of a page is 2^12, that is 4096 or 0x1000, so it is virtual page number 0xb = 11 (the floor of 0xb2a0 / 0x1000). The offset inside the page is 0xb2a0 modulo 0x1000, which is 0x2a0.
Then use the table to translate virtual page number 11 to a physical page frame. The virtual page is present (1), and it corresponds to the physical frame with the high bits 111, that is 111 followed by twelve 0s in binary => 0x7000 - the address of the start of the physical frame.
Our physical address lies at offset 0x2a0 within that frame, so the sought physical address is 0x7000 + 0x2a0 = 0x72a0.
Please follow this flow and make sure each step is clear to you. If you have questions, read Wikipedia first, and if something is still not clear, ask :)
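Here is the same lookup as a minimal C sketch, assuming a toy 16-entry page table filled in only with the entry we know from the question (entry 11 -> frame 0b111, present); everything else is illustrative:

#include <stdio.h>
#include <stdint.h>

#define OFFSET_BITS 12
#define PAGE_SIZE (1u << OFFSET_BITS) /* 4096 bytes */

struct pte { int present; uint32_t frame; };

int main(void)
{
    /* Toy page table: only entry 11 is taken from the question
       (present, physical frame 0b111); the rest default to "not present". */
    struct pte page_table[16] = { [11] = { .present = 1, .frame = 0x7 } };

    uint32_t va = 0xb2a0;
    uint32_t vpn = va >> OFFSET_BITS;       /* virtual page number: 0xb = 11 */
    uint32_t offset = va & (PAGE_SIZE - 1); /* offset within page: 0x2a0 */

    if (page_table[vpn].present) {
        uint32_t pa = (page_table[vpn].frame << OFFSET_BITS) | offset;
        printf("VA 0x%x -> PA 0x%x\n", va, pa); /* prints VA 0xb2a0 -> PA 0x72a0 */
    } else {
        puts("page fault");
    }
    return 0;
}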
I was doing my examination review and couldn't find a solid answer to this same question. I have consolidated what I learnt, and I hope what I have summed up here helps those like me. :)
I find the explanation in the answer above a little hard to understand for my little brain.
I think this link below gives a better overview than Wikipedia's explanation:
http://williams.comp.ncat.edu/addrtrans.htm
This youtube video also offers an excellent guide in explaining the process of virtual address translation:
https://www.youtube.com/watch?v=6neHHkI0Z0o
Back to the question ->>>
The first question is - what is the 'page size' of this virtual memory system?
based on the definition here - https://en.wikipedia.org/wiki/Page_(computer_memory)
I was initially confused between 'pages' and 'page size' but I kinda figured it out now. The page count determines the number of pages available (like in a book), and the page size is like the paper size (A4, A5, A6) of the pages in the book!
As such, since the virtual and physical memory offsets are the same and map directly, we can determine the page size from the offset size. If the offset size is given as 12 bits, then 2^12 = 4,096 bytes, a.k.a. 4 KB.
Side question for curious minds, how many virtual memory pages are there?
- 16 bits of virtual address space minus 12 bits of offset = 4 bits
- which equals 2^4 = 16 pages available (thus the table we see!)
Another side question for other curious minds, how many PHYSICAL memory pages are there?
- 32 KB of physical memory = 32 x 1024 bytes = 32,768 bytes
- log(32768) / log(2) = 15 bits, which means 2^15 bytes of total physical memory
- minus the offset of 12 bits that we already know...
- 15 bits (total physical memory) minus 12 bits (offset) = 3 bits for the physical frame number
Going to the next question: what is the corresponding physical address of virtual address 0xb2a0 (given in hex notation)?
Dmytro Sirenko's answer above explains it quite well; I will help to rephrase it here.
We need to remember that our virtual address is 16-bit, and that it now contains the value b2a0 (ignoring the 0x).
My short-cut (please correct me if I am wrong): since the page number : offset split is 4 : 12 bits, i.e. 1 : 3 hex digits...
b | 2 a 0
^
page number | offset
Converting hex value b to decimal gives 11.
I look into the table, and I find Page Frame = 111 at table entry number 11.
111 is written in binary, and it identifies the physical memory frame.
Remember, we are looking at a 15-bit physical memory address space; as such, we can determine that:
1 1 1 | 0 0 0 0 0 0 0 0 0 0 0 0
Address | offset
As the offset maps directly from virtual memory to physical memory, we carry the value (2a0) straight into the physical address. We can't place it in the diagram right away because it is in hexadecimal while the address space above is written in binary.
Considering that I am going to be tested in an examination and won't be allowed to bring in a calculator... I will work in reverse and answer in hexadecimal instead. :)
Converting 111 from binary to decimal gives 7 (I go by 001 = 1, 010 = 2, 100 = 4, 101 = 5, 110 = 6, 111 = 7).
Converting that to hex: 7 (dec) = 7 (hex).
As such, the corresponding Physical Memory location of this virtual memory address is.... (loud drums and curtain open....)
7 2 a 0
which is written as 0x72a0.
