How can I detect my RAM free and total space in Python? - linux

So, the title describes almost all the necessary to answer me. Just one more thing: please, just reply about libraries installed with Python by default, as the app which I'm developing is part of the Ubuntu App Showdown.
Running Python 2.7, Ubuntu 12.04.

You are asking for a number that is nearly impossible to calculate and has very little value.
Any Linux system that is running for an amount of time will have hardly any 'free' ram available. Just cat /proc/meminfo - the MemFree entry is usually in order of just a few megabytes.
So, where did that memory go?
The kernel caches all disk access, for starters.
That's usually visible in the Cached entry. Disk cache will be pruned when you require more memory, so you could add that number to MemFree .
But, if an application allocates (malloc() in C) 2 gigabytes on a system with exactly 2 gigabytes of RAM, that usually will just be granted: you get a valid pointer back.
However, none of the RAM is actually reserved for your application - that only happens when your application starts touching memory pages - each touched page will be allocated.
The maximum size you can ask for is available as CommitLimit.
But the application code itself might not be in RAM either - binary file and libraries are mmapp()ed, so again only pages that are touched are loaded into RAM.
If you run a tool like top - you get all kinds of memory info per process, including VIRT, RES and SHR.
VIRT is for 'virtual' - all memory pages that the app would need if it would claim all pages it has asked for.
RES is 'resident' - the amount of memory actually used
SHR is 'shared' - the amount of pages that are shared with other applications, like e.g. libraries that are loaded in multiple applications.
So, what is the value of knowing how much memory is available?
You can start an application that could require significantly more RAM than your system has, and yet it runs...
You might even be able to run the application twice or thrice - code pages are shared anyway...
Note: the above answer cuts quite a few corners, the real mechanisms are significantly more complex. And I haven't even started bringing swap space into the story.
But this will do for you, I hope...

Related

Bypassing 4KB block size limitation on block layer/device

We are developing an ssd-type storage hardware device that can take read/write request for big block size >4KB at a time (even in MBs size).
My understanding is that linux and its filesystem will "chop down" files into 4KB block size that will be passed to block device driver, which will need to physically fill the block with data from the device (ex., for write)
I am also aware the kernel page size has a role in this limitation as it is set at 4KB.
For experiment, I want to find out if there is a way to actually increase this block size, so that we will save some time (instead of doing multiple 4KB writes, we can do it with bigger block size).
Is there any FS or any existing project that I can take a look for this?
If not, what is needed to do this experiment - what parts of linux needs to be modified?
Trying to find out the level of difficulties and resource needed. Or, if it is even impossible to do so and/or any reason why we do not even need to do so. Any comment is appreciated.
Thanks.
The 4k limitation is due to the page cache. The main issue is that if you have a 4k page size, but a 32k block size, what happens if the file is only 2000 bytes long, so you only allocate a 4k page to cover the first 4k of the block. Now someone seeks to offset 20000, and writes a single byte. Now suppose the system is under a lot of memory pressure, and the 4k page for the first 2000 bytes, which is clean, gets pushed out of memory. How do you track which parts of the 32k block contain valid data, and what happens when the system needs to write out the dirty page at offset 20000?
Also, let's assume that the system is under a huge amount of memory pressure, we need to write out that last page; what if there isn't enough memory available to instantiante the other 28k of the 32k block, so we can do the read-modify-write cycle just to update that one dirty 4k page at offset 20000?
These problems can all be solved, but it would require a lot of surgery in the VM layer. The VM layer would need to know that for this file system, pages need to be instantiated in chunks of 8 pages at a time, and if that there is memory pressure to push out a particular page, you need write out all of the 8 pages at the same time if it is dirty, and then drop all 8 pages from the page cache at the same time. All of this implies that you want to track page usage and page dirty not at the 4k page level, but at the compound 32k page/"block" level. It basically will involve changes to almost every single part of the VM subsystem, from the page cleaner, to the page fault handler, the page scanner, the writeback algorithms, etc., etc., etc.
Also consider that even if you did hire a Linux VM expert to do this work, (which the HDD vendors would deeply love you for, since they also want to be able to deploy HDD's with a 32k or 64k physical sector size), it will be 5-7 years before such a modified VM layer would make its appearance in a Red Hat Enterprise Linux kernel, or the equivalent enterprise or LTS kernel for SuSE or Ubuntu. So if you are working at a startup who is hoping to sell your SSD product into the enterprise market --- you might as well give up now with this approach. It's just not going to work before you run out of money.
Now, if you happen to be working for a large Cloud company who is making their own hardware (ala Facebook, Amazon, Google, etc.) maybe you could go down this particular path, since they don't use enterprise kernels that add new features at a glacial pace --- but for that reason, they want to stick relatively close to the upstream kernel to minimize their maintenance cost.
If you do work for one of these large cloud companies, I'd strongly recommend that you contact other companies who are in this same space, and maybe you could collaborate with them to see if together you could do this kind of development work and together try to get this kind of change upstream. It really, really is not a trivial change, though --- especially since the upstream linux kernel developers will demand that this not negatively impact performance in the common case, which will not be involving > 4k block devices any time in the near future. And if you work at a Facebook, Google, Amazon, etc., this is not the sort of change that you would want to maintain as a private change to your kernel, but something that you would want to get upstream, since other wise it would be such a massive, invasive change that supporting it as an out-of-tree patch would be huge headache.
Although I've never written a device driver for Linux, I find it very unlikely that this is a real limitation of the driver interface. I guess it's possible that you would want to break I/O into scatter-gather lists where each entry in the list is one page long (to improve memory allocation performance and decrease memory fragmentation), but most device types can handle those directly nowadays, and I don't think anything in the driver interface actually requires it. In fact, the simplest way that requests are issued to block devices (described on page 13 -- marked as page 476 -- of that text) looks like it receives:
a sector start number
a number of sectors to transfer (no limit is mentioned, let alone a limit of 8 512B sectors)
a pointer to write the data into / read the data from (not a scatter-gather list for this simple case, I guess)
whether this is a read versus a write
I suspect that if you're seeing exclusively 4K accesses it's probably a result of the caller not requesting more than 4K at a time -- if the filesystem you're running on top of your device only issues 4K reads, or whatever is using the filesystem only accesses one block at a time, there is nothing your device driver can do to change that on its own!
Using one block at a time is common for random access patterns like database read workloads, but database log or FS journal writes or large serial file reads on a traditional (not copy-on-write) filesystem would issue large I/Os more like what you're expecting. If you want to try issuing large reads against your device directly to see if it's possible through whatever driver you have now, you could use dd if=/dev/rdiskN of=/dev/null bs=N to see if increasing the bs parameter from 4K to 1M shows a significant throughput increase.

Erlang garbage collection

I need your help in investigation of issue with Erlang memory consumption. How typical, isn't it?
We have two different deployment schemes.
In first scheme we running many identical nodes on small virtual machines (in Amazon AWS),
one node per machine. Each machine has 4Gb of RAM.
In another deployment scheme we running this nodes on big baremetal machines (with 64 Gb of RAM), with many nodes per machine. In this deployment nodes are isolated in docker containers (with memory limit set to 4 Gb).
I've noticed, that heap of processes in dockerized nodes are hogging up to 3 times much more RAM, than heaps in non-dockerized nodes with identical load. I suspect that garbage collection in non-dockerized nodes is more aggressive.
Unfortunately, I don't have any garbage collection statistics, but I would like to obtain it ASAP.
To give more information, I should say that we are using HiPE R17.1 on Ubuntu 14.04 with stock kernel. In both schemes we are running 8 schedulers per node, and using default fullsweep_after flag.
My blind suggestion is that Erlang default garbage collection relies (somehow) on /proc/meminfo (which is not actual in dockerized environment).
I am not C-guy and not familiar with emulator internals, so could someone point me to places in Erlang sources that are responsible for garbage collection and some emulator options which I can use to tweak this behavior?
Unfortunately VMs often try to be smarter with memory management than necessary and that not always plays nicely with the Erlang memory management model. Erlang tends to allocate and release a large number of small chunks of memory, which is very different to normal applications, which usually allocate and release a small number of big chunks of memory.
One of those technologies is Transparent Huge Pages (THP), which some OSes enable by default and which causes Erlang nodes running in such VMs to grow (until they crash).
https://access.redhat.com/solutions/46111
https://www.digitalocean.com/company/blog/transparent-huge-pages-and-alternative-memory-allocators/
https://docs.mongodb.org/manual/tutorial/transparent-huge-pages/
So, ensuring THP is switched off is first thing you can check.
The other is trying to tweak the memory options used when starting the Erlang VM itself, for example see this post:
Erlang: discrepancy of memory usage figures
Resulting options that worked for us:
-MBas aobf -MBlmbcs 512 -MEas aobf -MElmbcs 512
Some more theory about memory allocators:
http://www.erlang-factory.com/static/upload/media/139454517145429lukaslarsson.pdf
And more detailed description of memory allocator flags:
http://erlang.org/doc/man/erts_alloc.html
First thing to know, is that garbage collection i Erlang is process based. Each process is GC in their own time, and independently from each other. So garbage collection in your system is only dependent on data in your processes, not operating system itself.
That said, there could be some differencess between memory consumption from Eralang point of view, and System point of view. That why comparing erlang:memory to what your system is saying is always a good idea (it could show you some binary leaks, or other memory problems).
If you would like to understand little more about Erlang internals I would recommend those two talks:
https://www.youtube.com/watch?v=QbzH0L_0pxI
https://www.youtube.com/watch?v=YuPaX11vZyI
And from little better debugging of your memory management I could reccomend starting with http://ferd.github.io/recon/

does linux load program-pages on demand?

I wrote a program daemon that starts some sort of self-healing procedure when the hard drive controller of a computer crashes. This program already works fine, but I have concerns that the program (about 18KB compiled file size) may not be fully loaded into RAM by the operating system and that - when I'm really unlucky - some program pages have to be loaded from the disk exactly when the program has to come active and disk accesses are no longer possible.
After all, most of the time the program stays in an endless loop checking if everything is okay and 95% of the program code isn't used. So, I think, the Kernel may optimize the RAM usage by removing unused program pages from RAM.
So, my question: does Linux load and keep all program code pages into memory, making it unnecessary to access the hard disk again to run the program code itself, once the program has started?
Technical details: Linux Kernel 2.6.36+, about 1 GB of RAM, Debian 5, no swap space active
I already learned that I can prevent swapping by calling mlockall(MCL_CURRENT | MCL_FUTURE);, but wondering if I really need to update my machines.
No, the program code pages are memory mapped into the address space of the process, not so differently than any other mmap(), so that if you don't access these pages in a long time, they can eventually be removed from RAM. To avoid it, just use the mlockall() call.
From mlockall manual
mlockall() locks all pages mapped into the address space of the calling process.
This includes the pages of the code, data and stack segment, as well as shared
libraries, user space kernel data, shared memory, and memory-mapped files. All
mapped pages are guaranteed to be resident in RAM when the call returns success‐
fully; the pages are guaranteed to stay in RAM until later unlocked.
So, if locked, pages will be here. However, modifying mounted hard disk partition is always great risk, regardless of any kind of locks.

Calculating % memory used on Linux

Linux noob question:
If I have 500MB of RAM, and 500MB of swap space, can the OS and processes then use 1GB of memory?
In other words, is the total amount of memory available to programs and the OS the total of the physical memory size and swap size?
I'm trying to figure out which SNMP counters to query, but need to understand how Linux uses virtual memory a little better first.
Thanks
Actually, it IS essentially correct, but your "virtual" memory does NOT reside beside your "physical memory" (as Matthew Scharley stated).
Your "virtual memory" is an abstraction layer covering both "physical" (as in RAM) and "swap" (as in hard-disk, which is of course as much physical as RAM is) memory.
Virtual memory is in essention an abstraction layer. Your program always addresses a "virtual" address, which your OS translates to an address in RAM or on disk (which needs to be loaded to RAM first) depending on where the data resides. So your program never has to worry about lack of memory.
Nothing is ever quite so simple anymore...
Memory pages are lazily allocated. A process can malloc() a large quantity of memory and never use it. So on your 500MB_RAM + 500MB_SWAP system, I could -- at least in theory -- allocate 2 gig of memory off the heap and things will run merrily along until I try to use too much of that memory. (At which point whatever process couldn't acquire more memory pages gets nuked. Hopefully it's my process. But not always.)
Individual processes may be limited to 4 gig as a hard address limitation on 32-bit systems. Even when you have more than 4 gig of RAM on the machine and you're using that bizarre segmented 36-bit atrocity from hell addressing scheme, individual processes are still limited to only 4 gigs. Some of that 4 gigs has to go for shared libraries and program code. So yer down to 2-3 gigs of stack+heap as an ADDRESSING limitation.
You can mmap files in, effectively giving you more memory. It basically acts as extra swap. I.e. Rather than loading a program's binary code data into memory and then swapping it out to the swapfile, the file is just mmapped. As needed, pages are swapped into RAM directly from the file.
You can get into some interesting stuff with sparse data and mmapped sparse files. I've seen X-windows claim enormous memory usage when in fact it was only using up a tiny bit.
BTW: "free" might help you. As might "cat /proc/meminfo" or the Vm lines in /proc/$PID/status. (Especially VmData and VmStk.) Or perhaps "ps up $PID"
Although mostly it's true, it's not entirely correct. For a particular process, the environment you run it in may limit the memory available to your process. Check the output of ulimit -v as well.
Yes, this is essentially correct. The actual numbers might be (very) marginally lower, but for all intents and purposes, if you have x physical memory and y virtual memory (swap in linux), then you have x + y memory available to the operating system and any programs running underneath the OS.

What are the exact conditions based on which Linux swaps process(s) memory from RAM to a swap file?

My server has 8Gigs of RAM and 8Gigs configured for swap file. I have memory intensive apps running. These apps have peak loads during which we find swap usage increase. Approximately 1 GIG of swap is used.
I have another server with 4Gigs of RAM and 8 Gigs of swap and similar memory intensive apps running on it. But here swap usage is very negligible. Around 100 MB.
I was wondering what are the exact conditions or a rough formula based on which Linux will do a swapout of a process memory in RAM to the swap file.
I know its based on swapiness factor. What else is it based on? Swap file size? Any pointers to Linux kernel documentation/source code explaining this will be great.
I've seen a lot of people posting subjective explanations of what this does. Here is hopefully a more full answer.
In the split LRU on post 2.6.28 Linux swappiness is a multiplier used to arbitrarily modify the fraction that is calculated determining the pressure built up in both LRUs.
So, for example on a system with no free memory left - the value of the existing memory you have is measured based off of the rate of how much memory is listed as 'Active' and the rate of how often pages are promoted to active after falling into the inactive list.
An LRU with many promotions/demotions of pages between active and inactive is in a lot of use.
Typically file backed storage is cheaper and safer to evict when your running out of memory and automatically is given a modifier of 200 (this makes file backed memory 200 times more worthless than swap backed memory (Which has a value of 0) when it multiplies this fraction.
What swappiness does is modify this value by deducting the swappiness number you gave (default 60) to file memory and adding the swappiness value you gave as a multiplier to anon memory. Thus the default swappiness leaves you with anonymous memory being 80 times more valuable than file memory (200-60 for file, 0+60 for anon). Thus, on a typical linux system that has used up all its memory, page cache would have to be 80 TIMES more active than anonymous memory for anonymous memory to be swapped out in favour of page cache.
If you set swappiness to 100 this gives anon a modifier of 100 and file memory a modifier of 100 (200 - 100) leaving both LRUs equally weighted. Thus on a file heavy system that wants page cache providing the anon memory is not as active as page cache then anon memory will be swapped to disk to make space for extra page cache.
Linux (or any other OS) divides memory up into pages (typically 4Kb). Each of these pages represent a chunk of memory. Usage information for these pages is maintained, which basically contains info about whether the page is free or is in use (part of some process), whether it has been accessed recently, what kind of data it contains (process data, executable code etc.), owner of the page, etc. These pages can also be broadly divided into two categories - filesystem pages or the page cache (in which all data read/written to your filesystem resides) and pages belonging to processes.
When the system is running low on memory, the kernel starts swapping out pages based on their usage. Using a list of pages sorted w.r.t recency of access is common for determining which pages can be swapped out (linux kernel has such a list too).
During swapping, Linux kernel needs to decide what to trade-off when nuking pages in memory and sending them to swap. If it swaps filesystem pages too aggressively, more reads are required from the filesystem to read those pages back when they are needed. However, if it swaps out process pages more aggressively it can hurt interactivity, because when the user tries to use the swapped out processes, they will have to be read back from the disk. See a nice discussion here on this.
By setting swappiness = 0, you are telling the linux kernel not to swap out pages belonging to processes. When setting swappiness = 100 instead, you tell the kernel to swap out pages belonging to processes more aggressively. To tune your system, try changing the swappiness parameter in steps of 10, monitoring performance and pages being swapped in/out at each setting using the "vmstat" command. Keep the setting that gives you the best results. Remember to do this testing during peak usage hours. :)
For database applications, swappiness = 0 is generally recommended. (Even then, test different settings on your systems to arrive to a good value).
References:
http://www.linuxvox.com/2009/10/what-is-the-linux-kernel-parameter-vm-swappiness/
http://www.pythian.com/news/1913/

Resources