How to change the Linux kernel page size on x86?

It is well known that the page size is 4 KB on x86. If we have 64 GB of RAM, that means 16M page entries, which causes too many TLB misses. On x86 we can enable PAE to access more than 4 GB of memory. (And the page size could then be 2 MB per page?)
Hugetlbfs lets us use huge pages for a performance benefit (e.g. fewer TLB misses), but it has a lot of limitations:
We must use a shared-memory interface to write to hugetlbfs
Not all processes can use it
Reserving the memory may fail
So, if we can change the page size to 2M or 4M, then we can get the performance benefit.
I tried several ways to change it, but all failed:
Compiling the kernel with CONFIG_HUGETLBFS: failed
Compiling the kernel with CONFIG_TRANSPARENT_HUGEPAGE and CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS: failed
Could somebody help me?


Find exact physical memory usage in Ubuntu/Linux

(I'm new to Linux.)
Say I have 1300 MB of memory on an Ubuntu machine. The OS and other default programs consume 300 MB, and 1000 MB is free for my own applications.
I installed my application and configured it to use 700 MB of memory when it starts.
However, I couldn't verify its actual memory usage, even after disabling swap space.
In top, the "VIRT" value is huge, while "RES", "SHR", and "%MEM" show very small values.
It is difficult to find the actual physical memory usage, unlike "Resource Monitor" on Windows, which would say my application is using 700 MB of memory.
Is there any way to find actual physical memory usage in Ubuntu/Linux?
TL;DR - Virtual memory is complicated.
The best measure of a Linux process's current usage of physical memory is RES.
The RES value is the sum of all of the process's pages that are currently resident in physical memory. It includes resident code pages and resident data pages. It also includes shared pages (SHR) that are currently RAM-resident, though these pages cannot be exclusively ascribed to >>this<< process.
The VIRT value is the sum of all notionally allocated pages for the process: both pages that are currently RAM-resident and pages that are currently swapped out to disk.
See https://stackoverflow.com/a/56351211/1184752 for another explanation.
Note that RES is giving you (roughly) instantaneous RAM usage. That is what you asked about ...
The "actual" memory usage over time is more complicated, because the OS's virtual memory subsystem is typically swapping pages in and out according to demand. So, for example, some of your application's pages may not have been accessed recently, and the OS may then swap them out (to swap space) to free up RAM for other pages required by your application ... or something else.
The VIRT value, while actually representing virtual address space, is a good approximation of total (virtual) memory usage. However, it may be an over-estimate:
Some pages in a process's address space are shared between multiple processes. This includes read-only code segments, pages shared between parent and child processes between vfork and exec, and shared memory segments created using mmap.
Some pages may be set to have illegal access (e.g. for stack red-zones) and may not be backed by either RAM or swap device pages.
Some pages of the address space in certain states may not have been committed to either RAM or disk yet ... depending on how the virtual memory system is implemented. (Consider the case where a process requests a huge memory segment and neither reads from it nor writes to it. It is possible that the virtual memory implementation will not allocate RAM pages until the first read or write to a page. And if the system uses lazy swap reservation, swap pages may not be committed either. But beware that you can get into trouble with lazy swap reservation.)
VIRT can also be an under-estimate, because the OS usually reserves swap space for all pages ... whether they are currently swapped in or out. So if you count the RAM and swap versions of a given page as separate units of storage, VIRT usually underestimates the total storage used.
Finally, if your real goal is to limit your application to using at most 700 MB (of virtual address space), then you can use ulimit -v ... to do this. If the application tries to request memory beyond its limit, the request fails.

mprotect(addr, size, PROT_NONE) for guard pages and its memory consumption

I allocated some memory using memalign, and I set the last page as a guard page using mprotect(addr, size, PROT_NONE), so this page is inaccessible.
Does the inaccessible page consume physical memory? In my opinion, the kernel can offline the physical pages safely, right?
I also tried madvise(MADV_SOFT_OFFLINE) to manually offline the physical memory but the function always fails.
Can anybody tell me the internal behavior of kernel with mprotect(PROT_NONE), and how to offline the physical memory to save physical memory consumption?
Linux applications use virtual memory; only the kernel manages physical RAM. Application code doesn't see physical RAM.
A segment protected with mprotect & PROT_NONE won't consume any RAM.
You should allocate your segment with mmap(2) (maybe you want MAP_NORESERVE). Mixing memalign with mprotect will likely break libc invariants.
Read the madvise(2) man page carefully. MADV_SOFT_OFFLINE may require a specially configured kernel.

linux kernel and user address spaces

In a 4 GB RAM system running Linux, 3 GB is given to user space and 1 GB to the kernel. Does that mean that even if the kernel is using only 50 MB and user space is running low, user space cannot use the kernel's space? If not, why not? Why can't Linux map those pages to user space?
The 3/1 separation refers to VIRTUAL memory. The virtual memory, however, is sparse. Meaning that even though there is "on paper" 1 GB, in practice a LOT less than that is used. Whenever possible, the "virtual" memory is backed by physical pages (meaning, if your virtual memory footprint is 50MB, then you're using 50 MB of physical memory), up until the point where there is no more physical memory, in which case you either A) spill over to swap or B) the system encounters a low memory condition and frees memory the hard way - by killing processes.
It gets more complicated. Virtual memory is not really used (committed) until it is actually used. This means that when you allocate memory, you get an "IOU" or "promise" of memory, but the memory only gets consumed when you actually use it, as in writing some value to it. Overall, however, you are correct in that there is segregation - at the hardware level - between kernel and user mode. In other words, of the 4 GB addressable (assuming 32-bit), the top 1 GB, even though it is in your address space, is not accessible to you, and in practice belongs to the kernel. (The limit of 4 GB stems from 32-bit pointers - for 64 bits, it's effectively 48 bits, which means 256 TB: 128 TB user, 128 TB kernel, btw.) Further, this 1 GB of your space that is the kernel's is identical in other processes, too. So it doesn't matter which process you are in: when you "call the kernel" (i.e. make a system call), you end up in the top 1 GB, which is shared between all processes.
Again, the key point is that the 1 GB isn't REALLY used in full. The actual memory footprint of the kernel is a lot smaller - in the tens of MB. It's just that, theoretically, the kernel can use UP TO 1 GB, but that assumes it can be backed either by RAM or (rarely) swap. You can look at /proc/meminfo. As for the answer about changing the 3/1 split - it actually CAN be changed (in Windows it's as easy as a kernel command-line option in boot.ini; in Linux it requires recompilation).
The 3GB/1GB split in process space is fixed at runtime. There is no way to change it on a running system, regardless of how much RAM is actually in use; it is a compile-time choice.

When would a process cause paging in linux

Today I received a few alerts about swapping activity of 3000 KB/sec. This Linux box has very few processes running and 32 GB of RAM in total. When I logged in and ran free, I did not see anything suspicious: the ratio of free memory to used in the (buffers/cache) row was high enough (25 GB free to 5 GB used).
So I am wondering: what are the main causes of paging on a Linux system?
How does swappiness impact paging?
How long does a page stay in physical RAM before it is swapped out? What controls this behavior on Linux?
Is it possible that even with adequate free physical RAM, a process's memory access pattern - data spread over many pages - causes paging?
For example, consider a 5 GB array that the program accesses in a loop, slowly enough that pages which have not been used recently get swapped out - even though there could be 20 GB of physical RAM available.
UPDATE:
The Linux distribution is RHEL 6.3, kernel version 2.6.32-279.el6.x86_64.

Nginx killed for “Out of memory”?

My nginx has 8 processes, each of which takes about 150 MB of memory.
From time to time my nginx gets killed with this (from dmesg):
21228 total pagecache pages
50 pages in swap cache
Swap cache stats: add 85, delete 35, find 63/64
Free swap = 2031300kB
Total swap = 2031608kB
3407856 pages RAM
3180034 pages HighMem
290515 pages reserved
36448 pages shared
491788 pages non-shared
Out of memory: kill process 16373 (nginx) score 5013 or a child
I googled it, and it turns out that Low Memory ran out, so the oom-killer began doing its job...
Here is my questions:
I have 16 GB of memory, but Low Memory is just 800 MB (free -lm). How can I use the rest of it? Google tells me I can use a patched kernel, kernel-hugemem, but that is only for CentOS 4; mine is CentOS 5.2, so...
Is 150 MB too much for one nginx process to use? Have you never run into such a problem?
The Low memory confuses me: a 32-bit system can use more than 3 GB of memory without PAE, so what is the Low/High memory split for? Is it not a kernel bug?
I notice you have a lot of free swap, which should prevent the OOM killer from activating but for some reason isn't. This question from serverfault indicates that OOM with unused swap means the failed allocation was in kernel mode, caused by a driver that wanted a lot of memory. That might also explain why the free HighMem wasn't good enough to satisfy the request (kernel mode things can request memory from specific regions). You should look a bit farther back in the dmesg to see if there are any clues, like a backtrace.
As for the third part of your question: there is a distinction between HighMem and LowMem when using PAE, because PAE extends the physical address space to 36 bits while the virtual address space is still 32 bits. Access to the full potential 64 GB of physical addresses can't be done as quickly as access to a fixed 4 GB subset, so the kernel divides the memory up and tries to keep the most performance-critical things in the "low" area.
