Updating dirty pages in swap in Linux

From what I have read:

Swap space has no file system.
The disk has a filesystem. Whenever a file on disk is modified, its modified content is written to a new disk block (not to the original block) and the associated data structures are updated.
Dirty pages are written back to swap before they are paged out (for various reasons).
The question is: are dirty pages written back to their original page slots, or are they written to new page slots? If they are written to new slots, what is the procedure?

Let me try to answer the questions you raise in generic terms.
First of all, the page partition is called a swap partition in Unix for historical reasons. In the old days before virtual memory, entire processes were swapped out; now processes are paged out.
For performance reasons, the operating system wants to do paging in complete blocks. A page generally maps to one or more disk blocks. On most non-Unix systems, the page file is a contiguous file, and paging is done using virtual block I/O to the page file (and to the executable file or libraries).
The traditional Unix (inode) file system was a quick-and-dirty design: there is no way to create a contiguous file. The only way to write contiguous data is to use an entire disk or disk partition, so Unix databases and page files have traditionally been disk partitions (Mac OS uses a different system). Instead of doing virtual block I/O to a page file, the system does logical (or physical) I/O to the disk.
When a process allocates virtual memory, page file space is normally reserved up front. Thus the page file location for a page frame stays the same. If that were not the case, a process might need to page out and have no available location in the page file.
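To make the fixed-slot idea concrete, here is a minimal user-space sketch (not kernel code; all names are hypothetical) of a bitmap slot allocator: a slot is reserved once, when the virtual memory is allocated, and every later write-out of that page reuses the same slot index.

```c
#include <stdio.h>
#include <string.h>

#define NSLOTS 1024                      /* hypothetical number of swap slots */

static unsigned char slot_used[NSLOTS];  /* 1 = slot reserved */

/* Reserve a slot once, when virtual memory is allocated. */
static int reserve_slot(void)
{
    for (int i = 0; i < NSLOTS; i++) {
        if (!slot_used[i]) {
            slot_used[i] = 1;
            return i;                    /* this index never changes */
        }
    }
    return -1;                           /* swap space exhausted */
}

/* Every page-out of the same page goes to the same slot. */
static void page_out(int slot, const void *page_data)
{
    printf("writing page to swap slot %d\n", slot);
    (void)page_data;                     /* real code would do block I/O here */
}

int main(void)
{
    int slot = reserve_slot();           /* reserved up front, as described above */
    char page[4096];
    memset(page, 0, sizeof page);

    page_out(slot, page);                /* first eviction */
    page_out(slot, page);                /* later evictions reuse the same slot */
    return 0;
}
```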

Related

Swap space and dirty pages

I can't understand the utility of the dirty bit, which should be useful during page replacement to mark dirty pages.
Swap space is a portion of the disk where the OS puts pages that don't fit in primary memory. So why shouldn't a non-dirty page be written to disk?
Take for example a page swapped out from memory to disk. Now imagine that it is first moved back into primary memory and then moved to disk again.
When it is moved to primary memory, I don't think the disk retains a copy of it.
So even if this page does not get dirty in primary memory, why should it not be rewritten to disk when it is freed from primary memory again?
When the page is swapped back into memory (loaded into RAM from disk), the bits in the swap file are not invalidated or erased; they still contain the same values that were written out when the page was swapped from RAM to disk. So at the point when it is swapped from disk to RAM, the copies of the page in RAM and on disk are identical.
If no writes are performed, the RAM and disk (swap) versions of the page remain identical. If the kernel then decides to swap this page out of RAM again, there is no need to write it to disk (swap), because the correct contents of the page are already there: the page frame can simply be freed and used for some other purpose. But if a write has been performed, the versions of the page in RAM and on disk differ; in this case the dirty bit is set, indicating that the page must be written to disk before the frame can be reused.
Processors that use a dirty bit set that bit whenever a write is made to a page.
If the bit is clear, the page has not been changed. If the operating system needs to page out that page, it knows that it does not have to write a page with a clear dirty bit back to the paging file.
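As a rough illustration of that decision, here is a small user-space sketch in C (hypothetical structures, not actual kernel code): a clean page that already has a copy in swap is freed without any I/O, while a dirty page must be written out first.

```c
#include <stdbool.h>
#include <stdio.h>

struct page {
    bool dirty;        /* set by hardware/kernel on any write to the page */
    bool in_swap;      /* a valid copy already exists in the swap area */
};

/* Decide whether evicting this page needs disk I/O. */
static void evict(struct page *p)
{
    if (p->dirty || !p->in_swap) {
        printf("write page to swap, then free the frame\n");
        p->in_swap = true;
        p->dirty = false;
    } else {
        /* Clean and already on disk: just free the frame, no I/O. */
        printf("free the frame without writing\n");
    }
}

int main(void)
{
    struct page p = { .dirty = false, .in_swap = true };
    evict(&p);          /* no write needed */
    p.dirty = true;     /* a store to the page sets the dirty bit */
    evict(&p);          /* must be written before the frame is reused */
    return 0;
}
```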

When and how is mmap'ed memory swapped in and out?

In my understanding, mmap'ing a file that fits into RAM will be like having the file in memory.
Say we have 16G of RAM, and we first mmap a 10G file that we use for a while. This should be fairly efficient in terms of access. If we then mmap a second 10G file, will that cause the first one to be swapped out? Or parts of it? If so, when will this happen: at the mmap call, or on accessing the memory area of the newly mapped file?
And if we want to access the memory of the pointer for the first file again, will that swap the file back in? Say we alternate reading between memory corresponding to the first file and the second; will that lead to disastrous performance?
Lastly, if any of this is true, would it be better to mmap several smaller files?
As has been discussed, your file will be accessed in pages; on x86_64 (and IA32) architectures, a page is typically 4096 bytes. So, very little if any of the file will be loaded at mmap time. The first time you access some page in either file, then the kernel will generate a page fault and load some of your file. The kernel may prefetch pages, so more than one page may be loaded. Whether it does this depends on your access pattern.
In general, your performance should be good if your working set fits in memory. That is, if you're only regularly accessing 3G of file across the two files, then as long as you have 3G of RAM available to your process, things should generally be fine.
On a 64-bit system there's no reason to split the files, and you'll be fine if the parts you need tend to fit in RAM.
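If you want to observe which pages of a mapping are currently resident, Linux offers mincore(2). A minimal sketch (error handling trimmed; the file path is a placeholder):

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/path/to/file", O_RDONLY);   /* placeholder path */
    struct stat st;
    fstat(fd, &st);

    char *map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);

    volatile char c = map[0];                   /* fault in the first page */
    (void)c;

    long page = sysconf(_SC_PAGESIZE);
    size_t npages = (st.st_size + page - 1) / page;
    unsigned char vec[npages];                  /* one byte per page */
    mincore(map, st.st_size, vec);

    /* Bit 0 of each byte says whether that page is resident in RAM. */
    printf("page 0 resident: %d\n", vec[0] & 1);

    munmap(map, st.st_size);
    close(fd);
    return 0;
}
```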
Note that if you mmap an existing file, swap space is not required to read that file: when an object is backed by a file on the filesystem, the kernel can page from that file rather than from swap. However, if you specify MAP_PRIVATE in your call to mmap, swap space may be required to hold changed pages, since private modifications are never written back to the underlying file.
Your question does not have a definitive answer, as swapping in and out is handled by your kernel, and each kernel will have a different implementation (and Linux itself offers different profiles depending on your usage: RT, desktop, server…).
Generally speaking, though, whatever you load into memory is loaded in pages, so your mmap'ed file is loaded (and evicted) page by page across the levels of the memory hierarchy (the CPU caches, RAM, and swap).
So if you load two 10GB files into memory, parts of both will be split between RAM and swap, and the kernel will try to keep in RAM the pages you're likely to use now while guessing what you'll load next.
What this means is that if you do truly random access to a few bytes of data in both files alternately, you should expect awful performance; if you access contiguous chunks sequentially from both files alternately, you should expect decent performance.
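If you know your access pattern in advance, you can tell the kernel with madvise(2) so that its read-ahead matches what you actually do. A minimal sketch (the file path is a placeholder):

```c
#define _DEFAULT_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    int fd = open("/path/to/big-file", O_RDONLY);   /* placeholder */
    struct stat st;
    fstat(fd, &st);

    char *map = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);

    /* Sequential scan: encourage aggressive read-ahead. */
    madvise(map, st.st_size, MADV_SEQUENTIAL);

    /* For random access, use MADV_RANDOM instead, so faults
     * fetch only the pages you actually touch:
     * madvise(map, st.st_size, MADV_RANDOM); */

    long sum = 0;
    for (off_t i = 0; i < st.st_size; i++)
        sum += map[i];                              /* sequential pass */

    munmap(map, st.st_size);
    close(fd);
    return (int)(sum & 0x7f);       /* keep the loop from being optimized out */
}
```

MADV_SEQUENTIAL asks for aggressive read-ahead; MADV_RANDOM disables read-ahead so the kernel fetches only the pages you touch.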
You can read more about paging and virtual-memory theory here:
https://0xax.gitbooks.io/linux-insides/content/Theory/Paging.html
https://en.wikipedia.org/wiki/Paging

Linux memory management (caching)

I'm having a hard time telling the difference between the different cache areas in the OS. I'd love a brief explanation of the disk, buffer, swap, and page caches. Where do they reside? What are the main differences between them?
From what I understand, the page cache is the part of main memory that stores pages brought in from an I/O device.
Are the buffer cache and the disk cache the same? Do they "live" on the I/O device?
Many thanks!!
In Linux, the two caches were historically distinct: files were in the page cache, disk blocks were in the buffer cache. Given that most files are represented by a filesystem on a disk, data was represented twice, once in each of the caches. Many Unix systems follow a similar pattern. The two caches were later unified, so file data is now cached only once, in the page cache.
The buffer cache remains, however, as the kernel still needs to perform block I/O in terms of blocks, not pages. Since most blocks represent file data, most of the buffer cache is represented by the page cache. But a small amount of block data isn't file backed (metadata and raw block I/O, for example) and thus is solely represented by the buffer cache.
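You can see the two caches side by side in /proc/meminfo: the Buffers field counts the buffer cache and Cached counts the page cache. A tiny reader in C:

```c
#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("/proc/meminfo", "r");
    if (!f)
        return 1;

    char line[128];
    while (fgets(line, sizeof line, f)) {
        /* Buffers: buffer cache; Cached: page cache (file data). */
        if (!strncmp(line, "Buffers:", 8) || !strncmp(line, "Cached:", 7))
            fputs(line, stdout);
    }
    fclose(f);
    return 0;
}
```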
Disk cache/Buffer cache
This cache caches disk blocks to optimize block I/O.
It is memory used for faster access to the disk; it may be embedded in the disk itself (the drive's onboard cache) or be a portion of main memory set aside for the purpose.
Swap cache/Page cache
This cache caches pages of files to optimize file I/O
The swap cache is a list of page table entries, one per swapped-out page. Each entry describes which swap file the page is held in, together with its location in that swap file, so that when the page has to be brought back in, its location in the swap file is already known.
Both caches reside in main memory; only the backing swap file itself is on disk.

File Caching between processes

I'm interested in knowing: under Windows and Linux, does file caching work between processes? If process A reads the whole file, and a new process B wants to read parts of it (or all of it), would it make sense to assume the file is already in memory? Or does the caching happen only per file object in each process?
Both Windows and Linux cache file data in system memory, separate from processes. You can't make any assumptions on how much of the file, if any, is still in cache at any given time, however.
At a high level, the operating system maintains a cache of fixed-size pages (normally 4 KB on Linux, 256 KB on Windows). Each page contains part of a file. When your process does a read, the operating system searches the cache for pages with the data you requested. If it can't find all of the data you requested, it reads the required pages into the cache from disk, possibly overwriting other existing pages.
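A simple way to convince yourself the cache is system-wide is to read the same file from two separate runs of a program and compare timings; the second run's read is typically served from the page cache even though it is a brand-new process. A rough sketch (illustrative only; the path is a placeholder):

```c
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Read the whole file and return elapsed seconds. */
static double read_file(const char *path)
{
    char buf[1 << 16];
    struct timespec t0, t1;

    int fd = open(path, O_RDONLY);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    while (read(fd, buf, sizeof buf) > 0)
        ;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    close(fd);

    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void)
{
    const char *path = "/path/to/file";              /* placeholder */
    printf("cold read: %.3fs\n", read_file(path));   /* may hit disk */
    printf("warm read: %.3fs\n", read_file(path));   /* served from page cache */
    return 0;
}
```

Run it twice from the shell: on the second invocation even the "cold" read is warm, because the cache belongs to the kernel, not to any process.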

Write a cached page before it is reclaimed

I am stuck on the following question.
I am working on a hybrid storage system that uses an SSD as a cache layer for a hard disk. To this end, data read from the hard disk should be written to the SSD to speed up subsequent reads of that data. Since Linux caches data read from disk in the page cache, the writing of data to the SSD can be delayed; however, the pages caching the data may be freed, and accessing freed pages is not recommended.
Here is the question: I have struct page pointers pointing to the pages to be written to the SSD. Is there any way to determine whether the page a pointer refers to is still valid (by valid I mean that the cached page can be safely written to the SSD)? What will happen if a freed page is accessed via the pointer? Is the data of a freed page the same as it was before freeing?
Are you using the cleancache module? You should only get valid pages from it, and they remain valid until your callback function finishes.
Isn't this a cleancache/frontswap reimplementation? (https://www.kernel.org/doc/Documentation/vm/cleancache.txt)
The benefit of the existing cleancache code is that it calls your code just before the kernel frees a page, while the page still resides in RAM; when there is no space left in RAM for it, the kernel calls your code to back it up in tmem (transient memory).
Searching I also found an existing project that seems to do exactly this: http://bcache.evilpiepirate.org/:
Bcache is a Linux kernel block layer cache. It allows one or more fast
disk drives such as flash-based solid state drives (SSDs) to act as a
cache for one or more slower hard disk drives.
Bcache patches for the Linux kernel allow one to use SSDs to cache
other block devices. It's analogous to L2Arc for ZFS, but Bcache also
does writeback caching (besides just write through caching), and it's
filesystem agnostic. It's designed to be switched on with a minimum of
effort, and to work well without configuration on any setup. By
default it won't cache sequential IO, just the random reads and writes
that SSDs excel at. It's meant to be suitable for desktops, servers,
high end storage arrays, and perhaps even embedded.
What you are trying to achieve looks like the following:
Before the page is evicted from the pagecache, you want to cache it. This, in concept, is called a victim cache; you can look for papers on the topic.
What you need is a way to "pin" the pages targeted for eviction for the duration of the I/O; after the I/O completes, you can free the pagecache page (a sketch of this follows below).
But this will delay the eviction, which may be needed during memory pressure to create more free pages.
So one possible solution is to start your caching algorithm a bit before pagecache eviction starts.
A second possible solution is to set aside a pool of free pages and exchange the page being evicted from the page cache with a page from the free pool, caching the evicted page in the background. But you then need to synchronize with file block deletes, etc.
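In kernel code, the usual way to keep a struct page alive across your I/O is to take a reference with get_page() before submitting the write and drop it with put_page() afterwards. A schematic fragment (not a complete driver; ssd_write_page() is a hypothetical function standing in for your block I/O):

```c
#include <linux/mm.h>

/* Hypothetical helper that does the actual block I/O to the SSD. */
extern int ssd_write_page(struct page *page);

/* Copy one pagecache page to the SSD cache layer. Holding a
 * reference guarantees the page is not freed (and its frame not
 * reused) while the I/O is in flight. */
static int cache_page_to_ssd(struct page *page)
{
    int err;

    get_page(page);             /* pin: elevate the page's refcount */
    err = ssd_write_page(page); /* hypothetical write to the SSD */
    put_page(page);             /* unpin: the kernel may reclaim it again */

    return err;
}
```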
