Variable page size in Linux

I was going through the paging concept in Linux, where most of the time I found that the page size is fixed (the default value is 4 KB).
When I searched further on the internet, I found that IBM's POWER architecture supports variable-size paging:
https://www-304.ibm.com/support/knowledgecenter/ssw_aix_71/com.ibm.aix.performance/variable_large_page.htm
The POWER7® processor supports mixing 4 KB, 64 KB, and 16 MB page sizes within a single segment.
One thing I did not get: is this variable page size handled at run time? That is, can pages of different sizes be allocated on demand at any time, and if so, how is that possible?
If this is not the right platform, please move this question to the right one.

Linux uses a fixed memory page size, which is 4 KB by default on most architectures. Since this leads to a huge number of page table entries for the MMU to manage, Linux (including RHEL) also supports transparent huge pages. This feature can be enabled at boot or at run time and allows 2 MB pages (1 GB "gigantic" pages are additionally available through hugetlbfs on capable hardware). Be aware that the kernel performs defragmentation (compaction) to build huge pages, which can degrade performance; this can be switched off by writing 'never' to /sys/kernel/mm/transparent_hugepage/defrag.
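As a quick check of how transparent huge pages are configured on a given machine, here is a minimal sketch (assuming the standard sysfs control files exposed by THP-capable kernels; the active setting is shown in brackets):

    #include <stdio.h>

    /* Print one THP control file, e.g. "always madvise [never]". */
    static void show(const char *path)
    {
        char buf[256];
        FILE *f = fopen(path, "r");
        if (!f) {
            printf("%s: not available (no THP support?)\n", path);
            return;
        }
        if (fgets(buf, sizeof buf, f))
            printf("%s: %s", path, buf);
        fclose(f);
    }

    int main(void)
    {
        show("/sys/kernel/mm/transparent_hugepage/enabled");
        show("/sys/kernel/mm/transparent_hugepage/defrag");
        return 0;
    }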


Linux huge page value in sysctl.conf

Why do we configure the huge page value in Linux?
When do we configure the huge page value, and how do we calculate it?
Usually the huge page value is configured when large memory pages need to be allocated contiguously (in a sequence) in RAM.
The link below has an example which explains when and how:
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/5/html/Tuning_and_Optimizing_Red_Hat_Enterprise_Linux_for_Oracle_9i_and_10g_Databases/sect-Oracle_9i_and_10g_Tuning_Guide-Large_Memory_Optimization_Big_Pages_and_Huge_Pages-Sizing_Big_Pages_and_Huge_Pages.html
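As a rough illustration of the sizing arithmetic, the sketch below reads the configured huge page size from /proc/meminfo and computes the vm.nr_hugepages value needed to back a given allocation (the 16 GB figure is a made-up example, not taken from the link above):

    #include <stdio.h>

    int main(void)
    {
        /* Hypothetical example: back a 16 GB SGA with huge pages. */
        unsigned long long need_kb = 16ULL * 1024 * 1024; /* 16 GB in KB */
        unsigned long hp_kb = 0;
        char line[128];
        FILE *f = fopen("/proc/meminfo", "r");

        if (!f) { perror("/proc/meminfo"); return 1; }
        while (fgets(line, sizeof line, f))
            if (sscanf(line, "Hugepagesize: %lu kB", &hp_kb) == 1)
                break;
        fclose(f);
        if (hp_kb == 0) { fprintf(stderr, "no huge page support\n"); return 1; }

        /* Round up; this is the value to set as vm.nr_hugepages. */
        printf("Hugepagesize: %lu kB -> nr_hugepages = %llu\n",
               hp_kb, (need_kb + hp_kb - 1) / hp_kb);
        return 0;
    }

With a 2 MB huge page size this prints nr_hugepages = 8192, matching 16 GB / 2 MB.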
When you need huge pages:
When applications require large blocks of memory for processing.
The translation lookaside buffer (TLB) is a caching mechanism for memory address translations, used for quicker memory access. During memory management, mapping entries are placed into the TLB so that memory can be accessed quickly whenever required. (To learn more about the TLB, see https://en.wikipedia.org/wiki/Translation_lookaside_buffer.)
The TLB has a fixed number of slots, so it is a scarce resource. When an application requires large blocks of memory, using huge pages means fewer TLB entries are needed, so the TLB is used much more effectively.
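To make the TLB argument concrete, here is a toy "TLB reach" calculation (the 1536-entry TLB is an assumed figure for illustration; real TLB geometries vary by CPU):

    #include <stdio.h>

    int main(void)
    {
        const unsigned long entries = 1536;            /* assumed TLB slots */
        const unsigned long small = 4UL * 1024;        /* 4 KB pages        */
        const unsigned long huge  = 2UL * 1024 * 1024; /* 2 MB huge pages   */

        /* Reach: how much memory is addressable without a TLB miss. */
        printf("4 KB pages: %lu MB reach\n", entries * small / (1024 * 1024));
        printf("2 MB pages: %lu MB reach\n", entries * huge  / (1024 * 1024));
        return 0;
    }

With 4 KB pages the same TLB covers only 6 MB of memory; with 2 MB pages it covers 3 GB.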
If you want more in-depth information on huge pages and the TLB, please walk through the kernel documentation below, though it goes quite deep.
https://www.kernel.org/doc/Documentation/vm/hugetlbpage.txt

On Linux, why is our PageTables usage so big (60 GB)? Does each process get a copy?

Using an AWS r3.8xlarge with 240 GB of RAM and 2 Oracle instances, 90 GB each. No HugePages are set up, and Transparent Huge Pages are enabled.
On one of the instances, looking at /proc/meminfo, we see:
PageTables: 60709140 kB
Why would the page table usage be so high? What causes page tables to grow to that point? Does each process get its own copy, as noted on various web sites (that doesn't seem possible)?
In Oracle there is one dedicated session process per connection, and each process maps the SGA (shared memory segments). So take the SGA size, divide it by 4096 B (the page size), and multiply by the number of concurrent DB connections.
That gives the number of mapped SGA pages, and each process needs its own page table entries for them. This number can be huge.
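A back-of-the-envelope sketch of that arithmetic (8 bytes per page table entry on x86_64; the 90 GB SGA matches the question, while the 350 connections is an assumed figure chosen to show how the observed 60 GB arises):

    #include <stdio.h>

    int main(void)
    {
        const unsigned long long sga = 90ULL << 30;  /* 90 GB SGA             */
        const unsigned long page  = 4096;            /* 4 KB pages (no THP)   */
        const unsigned long pte   = 8;               /* bytes per PTE, x86_64 */
        const unsigned long conns = 350;             /* assumed session count */

        /* Each process maps the whole SGA with its own page tables. */
        unsigned long long per_proc = sga / page * pte;
        printf("page tables per process: %llu MB\n", per_proc >> 20);
        printf("total for %lu connections: %llu GB\n",
               conns, per_proc * conns >> 30);
        return 0;
    }

This prints about 180 MB of page tables per process, or roughly 61 GB across 350 connections.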
Note: Oracle recommends using HugePages and disabling Transparent HugePages. This makes the SGA non-swappable and makes Linux memory management much simpler.

Changing memory page size

I was reading that the number of virtual memory pages is equal to the number of physical memory frames, and that the size of frames and pages is equal; for example, on my 32-bit system the page size is 4096 bytes.
Well, I was wondering: is there any way to change the page size or the frame size?
I am using Linux. I have searched a lot, and what I found is that we can change the page size, or in fact increase it, by shifting to huge pages. Is there any other way to change (increase or decrease) or set the page size to one of our choice?
(Not coding anything, just a general question.)
In practice it is (nearly) impossible to "change" the memory page size, since the page size is known and determined by the MMU hardware, and the operating system takes that into account. However, notice that some Linux systems (and hardware!) have hugetlbpage support, and Linux mmap(2) may accept MAP_HUGETLB (but your code should handle the case of processors or kernels without huge page support, e.g. by calling mmap again without MAP_HUGETLB when the first mmap with MAP_HUGETLB has failed).
From what I read, on some Linux systems you can use hugetlbpage with various sizes. But the sysadmin can restrict these (or some kernels disable them), so your code should always be prepared for an mmap with MAP_HUGETLB to fail.
Even with those "huge pages", the page size is not arbitrary. Use sysconf(_SC_PAGE_SIZE) on POSIX systems to get the standard page size (it is usually 4 Kbytes). See also sysconf(3).
AFAIK, even on systems with the hugetlbpage feature, mmap can be called without MAP_HUGETLB, and the page size (as reported by sysconf(_SC_PAGE_SIZE)) is still 4 Kbytes. Perhaps some recent kernels with some unusual configurations use huge pages everywhere, and IIRC some kernels might be configured with a 1 Mbyte page (I am not sure about that and I might be wrong)...
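A minimal sketch of the fallback pattern described above (the 8 MB length is an arbitrary example; note that MAP_HUGETLB typically fails unless huge pages have been reserved, e.g. via vm.nr_hugepages):

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        size_t len = 8 * 1024 * 1024; /* arbitrary example: 8 MB */

        printf("base page size: %ld bytes\n", sysconf(_SC_PAGE_SIZE));

        /* Try huge pages first; fall back to normal pages on failure. */
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
        if (p == MAP_FAILED) {
            perror("mmap(MAP_HUGETLB)"); /* e.g. no huge pages reserved */
            p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        }
        if (p == MAP_FAILED) { perror("mmap"); return 1; }

        munmap(p, len);
        return 0;
    }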

Linux virtual/physical page sizes

Is the page size constant? To be more specific: getconf PAGE_SIZE gives 4096, fair enough. But can this change during a program's runtime, or is it constant for the entire lifetime of a process? That is, is it possible for a process to have 1024 AND 2048 AND 4096-byte page sizes? Let's just talk about virtual page sizes for now. But going further, is it possible for a virtual page to span a physical page of greater size?
It is possible for a process to use more than one page size. On newer kernels this may even happen without notice; see Andrea Arcangeli's transparent huge pages.
Other than that, you can request memory with a different (usually larger) page size via hugetlbfs.
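To see which huge page sizes a given kernel offers, a small sketch (assuming the standard /sys/kernel/mm/hugepages layout, which has one per-size subdirectory) can simply list them:

    #include <stdio.h>
    #include <dirent.h>

    int main(void)
    {
        /* The kernel exposes one hugepages-<size>kB directory per size. */
        DIR *d = opendir("/sys/kernel/mm/hugepages");
        struct dirent *e;

        if (!d) { perror("/sys/kernel/mm/hugepages"); return 1; }
        while ((e = readdir(d)) != NULL)
            if (e->d_name[0] != '.')
                printf("%s\n", e->d_name); /* e.g. hugepages-2048kB */
        closedir(d);
        return 0;
    }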
The main reason for having big pages is performance: the TLB in the processor is very limited in size, and fewer but bigger pages mean more hits.

What are the exact conditions based on which Linux swaps process(s) memory from RAM to a swap file?

My server has 8 GB of RAM and 8 GB configured for the swap file. I have memory-intensive apps running. These apps have peak loads during which we see swap usage increase; approximately 1 GB of swap is used.
I have another server with 4 GB of RAM and 8 GB of swap, with similar memory-intensive apps running on it. But here swap usage is negligible, around 100 MB.
I was wondering what the exact conditions are, or a rough formula, based on which Linux will swap out a process's memory from RAM to the swap file.
I know it's based on the swappiness factor. What else is it based on? The swap file size? Any pointers to Linux kernel documentation/source code explaining this would be great.
I've seen a lot of people posting subjective explanations of what this does. Here is hopefully a fuller answer.
In the split LRU (post-2.6.28 Linux), swappiness is a multiplier used to modify the fraction that is calculated when determining the pressure built up in both LRUs.
So, for example, on a system with no free memory left, the value of the existing memory you have is measured based on how much memory is listed as 'Active' and how often pages are promoted to the active list after falling into the inactive list.
An LRU with many promotions/demotions of pages between active and inactive is in heavy use.
Typically, file-backed memory is cheaper and safer to evict when you're running out of memory, and it is automatically given a modifier of 200 (this makes file-backed memory 200 times more "worthless" than swap-backed memory, which has a value of 0, when this fraction is multiplied in).
What swappiness does is modify this value by deducting the swappiness number you gave (default 60) from the file modifier and adding it as a multiplier to anon memory. Thus the default swappiness weights file memory at 140 (200 - 60) and anonymous memory at 60 (0 + 60), so anonymous memory is valued more than twice as highly. Thus, on a typical Linux system that has used up all its memory, page cache would have to be considerably more active than anonymous memory before anonymous memory is swapped out in favour of page cache.
If you set swappiness to 100, anon memory gets a modifier of 100 and file memory a modifier of 100 as well (200 - 100), leaving both LRUs equally weighted. Thus, on a file-heavy system that wants page cache, if the anon memory is not as active as the page cache, anon memory will be swapped to disk to make room for extra page cache.
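A toy calculation mirroring the weighting described above (this just restates the arithmetic; the real kernel logic in mm/vmscan.c folds these priorities into its scan-balancing fraction):

    #include <stdio.h>

    int main(void)
    {
        for (int swappiness = 0; swappiness <= 100; swappiness += 20) {
            int anon_prio = swappiness;        /* scan pressure on anon LRU */
            int file_prio = 200 - swappiness;  /* scan pressure on file LRU */
            printf("swappiness=%3d -> anon weight %3d, file weight %3d\n",
                   swappiness, anon_prio, file_prio);
        }
        return 0;
    }

At the default of 60 this prints weights of 60 (anon) versus 140 (file): the file LRU is scanned, and thus reclaimed, considerably harder than the anon LRU.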
Linux (or any other OS) divides memory into pages (typically 4 KB). Each page represents a chunk of memory, and usage information is maintained for each one: whether it is free or in use (part of some process), whether it has been accessed recently, what kind of data it contains (process data, executable code, etc.), who owns it, and so on. These pages can also be broadly divided into two categories: filesystem pages, i.e. the page cache (in which all data read from or written to your filesystem resides), and pages belonging to processes.
When the system is running low on memory, the kernel starts swapping out pages based on their usage. Using a list of pages sorted by recency of access is a common way of determining which pages can be swapped out (the Linux kernel has such a list too).
During swapping, the Linux kernel needs to decide what to trade off when evicting pages from memory and sending them to swap. If it swaps filesystem pages too aggressively, more reads are required from the filesystem to bring those pages back when they are needed. However, if it swaps out process pages too aggressively, it can hurt interactivity: when the user returns to a swapped-out process, its pages have to be read back from disk. See a nice discussion on this here.
By setting swappiness = 0, you are telling the Linux kernel not to swap out pages belonging to processes. By setting swappiness = 100 instead, you tell the kernel to swap out process pages more aggressively. To tune your system, try changing the swappiness parameter in steps of 10, monitoring performance and the pages being swapped in/out at each setting using the "vmstat" command. Keep the setting that gives you the best results, and remember to do this testing during peak usage hours. :)
For database applications, swappiness = 0 is generally recommended. (Even then, test different settings on your systems to arrive at a good value.)
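For completeness, a minimal sketch of reading and adjusting the knob programmatically (the usual route is sysctl vm.swappiness=N or /etc/sysctl.conf; writing /proc/sys/vm/swappiness directly, as here, requires root and does not persist across reboots):

    #include <stdio.h>

    int main(void)
    {
        FILE *f = fopen("/proc/sys/vm/swappiness", "r+");
        int cur;

        if (!f) { perror("/proc/sys/vm/swappiness"); return 1; }
        if (fscanf(f, "%d", &cur) == 1)
            printf("current swappiness: %d\n", cur);

        rewind(f);
        fprintf(f, "10\n"); /* example: a low value for a DB host */
        fclose(f);
        return 0;
    }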
References:
http://www.linuxvox.com/2009/10/what-is-the-linux-kernel-parameter-vm-swappiness/
http://www.pythian.com/news/1913/
