Is there a simple way to measure total memory consumption? - rust

I want to report the total memory usage of an Actix Web application from within itself.
Because this is only shown at the bottom of a single page and used very rarely and it does not need to be that accurate (e.g. it is enough if it measures just the heap), I don't want to dedicate too much code to it or use an external crate.
Is there a simple one-liner to measure how much total memory my program is using without using an external dependency or is that not possible?

Is there a simple one-liner in Rust
No.
The best way to get the total memory consumption of a process is via the OS's facilities. Which, of course, depends on what OS you are running on.
There's crates like sysinfo that abstract it all away though and implement cross-platform process metrics.

Related

How To implement swapfile cross operating system

There are use cases where I can't have a lot of ram, and sometimes due to docker based services doesn't always provide more than 512mb/1gb of ram, or if I run multiple rust based gui apps and if each take 100mb of ram normally, how can I implement a swapfile/ virtual ram to exceed allotted ram? Also os level swapfiles don't let users choose which app can use real ram and which swapfile, so it can become a problem too. I want to use swapfile as much as possible, and not even real ram, if possible. Users and hosting services provide with lot of storage usually (more than 10gb normally) so it would be a good way to use the available storage too!
If swapfile or anything like that aren't possible, I would like to know if there is any difference in speed and cpu consumption between "cache data in ram" apps and "cache data in file and read it when required" apps. If the latter is slow normally and not as efficient as swapfiles, I would like to know the possible ways how os manages to make swapfiles that efficient than apps.
An application does not control whether the memory they allocate is allocated on real RAM, on a swap partition, or else. You just ask for memory, and the OS is responsible for finding available memory to give to you.
Besides that, note that using swap (sometimes called swapping) is extremely bad performance-wise. How much depends a lot on your hardware, but it's about three orders of magnitude. This is even amplified if you are interacting with a user: a program that is fetching some resources will not be too bothered if it has to wait one minute to get them instead of a few milliseconds because the system is under heavy load, but a user will generally not be that patient.
Also note that, when swapping, the OS does not chose which application gets the faster RAM and which ones get the swap memory at random. It will try to determine which application should be prioritized, by how much, etc. based on how it was configured (at least for the Linux kernel), so in reality it's the user who, in the end, decides which applications get the most RAM (ahead of time, of course: they are not prompted each time the kernel has to make that decision with a little pop-up...).
Finally, modern OS allow several applications to allocate memory that overlap, as long as each application is not fully using the memory it asked for (which is kind of usual), allowing you to run applications that in theory require more RAM that you actually have.
This was on the OS part: now to the application part. Usually, when you write a program (whose purpose is not specifically RAM-related), you should not really care for memory consumption (up to a certain point), especially in Rust. Not only that is usually handled by the OS in case you used a little bit too much memory, but when it's possible, most people prefer to trade a little more memory usage (even a lot more) for better CPU performance, because RAM is a lot cheaper than CPU.
There are exceptions, of course, in which the memory consumption is so high that you can't really afford not paying attention. In these cases, either you let the user deal with this problem (ie. this application is known to consume a lot of memory because there are no other ways to do this, so if you want to use it, just have a lot of memory), as often video games do, or you rethink your application to reduce the memory usage trading it for some CPU efficiency, as for example is done when you are handling graphs so huge you couldn't even store them on all the hard disks of the world (in which case your application has to be smart enough to be able to work on small parts of the graph at the time), or finally you are working with a big resource but which can be stored on the hard disk, so you just write it on a file and access it chunks-by-chunks, as some database managers do.

How to monitor cpu and memory usage itself in Haskell program

In Haskell program, how to monitor CPU and memory usage by the program itself?
In my program, there could be idle time because of user does not make any request.
At that time, I want to evaluate statistics to trigger GC.
However, I want to record statistics not by static methods but lazy method.
So, I want to design the program could monitor itself to find the relative idle timing to evaluate.
I've searched web and Hackage about {monitoring,cpu,memory} libraries, but I could find only total CPU and memory usages, not a program itself.
Are there any monitoring libraries exist that I skipped?
Or do I have to make something myself?
If I have to make it myself, what should I study for? It should works with Windows, Linux and OS X.
Update
currentBytesUsed in GHC.Stats gives memory usages. Thanks danidiaz.
I'm still finding that monitoring CPU usages.
P.S. I do not want to profile it. I have done it many times already....
I don't know if profiling is enough for you, or if you need live monitoring, but there is a lot of profiling possibilities described in the haskell docs https://downloads.haskell.org/~ghc/latest/docs/html/users_guide/profiling.html
There is also remote monitoring
https://hackage.haskell.org/package/ekg

Tool to identify app's data/code most susceptible to memory performance

Context:
-- embedded platform running Linux with some static RAM which is declared about 3 times faster then the rest of RAM (dynamic). The amount of this fast memory is 512kB and the official name is eSRAM. (Details not important for this post: Galileo board, information on eSRAM and relevant kernel API: https://communities.intel.com/servlet/JiveServlet/previewBody/22488-102-1-26046/Quark_SWDevManLx_330235_001.pdf)
-- eSRAM can be used by an application with some support from the kernel---a simple driver that allocates kernel memory on its behalf, overlays the memory with eSRAM (this is done in physical space) and mmaps it to app's virtual memory space. This was tested and confirmed to work as expected.
Problem:
Identify which sections of app's data (and possibly code) to map into eSRAM to achieve optimum performance gain. A suitable analysis tool is required.
After some search I'm not sure if any existing tool is actually suited to this task. Currently my best bet is to develop a specialized Valgrind tool. But maybe there is already something in the ecosystem to start with. Any advice/information is welcome even if, for instance, a tool is kind of partially suited etc.
P.S.
Full analysis should probably take a lot of factors into account, like:
-- memory access patterns (cache performance)
-- changes over time (one could consider eSRAM paging)
...
I have taken a look at Valgrind Cachegrind. It can collect data about data cache reades and data cache writes. And cg_annotate can report Line-by-line Counts for you program. Can it be useful for you to find variables in your program that cause most operations with data cache and in this way to identify data that can benefit most from moving to quick memory? http://valgrind.org/docs/manual/cg-manual.html#cg-manual.line-by-line
Probably, you are interested in D cache reads (Dr) and D cache writes (Dw), or even (Dr+Dw). In that way you can find a place in your code which does most (Dr+Dw) and try to move this place in your quick memory.

measure cycles spent in accessing remote cache

How to measure cycles spent in accessing shared remote cache say L3. I need to get this cache access information both system-wide and for per-thread. Is there any specific tool/hardware requirements. Or can I use any formula to get an approximate value of cycles spent over a time interval
To get the average latencies (when a single thread is running) to various caches present on your machine, you can use memory profiler tools such as RMMA for windows (http://cpu.rightmark.org/products/rmma.shtml) and Lmbench for linux.
You can also write your own benchmarks based on the ideas used by these tools.
See the answers posted on this StackOverflow question:
measuring latencies of memory
Or Google for how the Lmbench benchmark works.
If you want to find exact latencies for particular memory access patterns, you will need to use a simulator. This way you can trace a memory access as it flows through the memory system. However simulators will not model all the effects that are present in a modern processor or memory system.
If you want to learn how multiple threads affect the average latency to L3, I think the best bet would be to write your own benchmark.

Preallocate memory for a program in Linux before it gets started

I have a program that repeatedly solves large systems of linear equations using cholesky decomposition. Characterising is that I sometimes need to store the complete factorisation which can exceed about 20 GB of memory. The factorisation happens inside a library that I call. Furthermore, this matrix and the resulting factorisation changes quite frequently and as such the memory requirements as well.
I am not the only person to use this compute-node. Therefore, is there a way to start the program under Linux and preallocate free memory for the process?
Something like: $: prealloc -m 25G ./program
I'll stick my neck out and say that I don't think that there is such a way under Linux. I think that the philosophy of Linux (and every other multi-tasking o/s I've used or heard of) is to provide the programmer (and the program) with the illusion that they have the whole of the computer's memory to play with and to make it very difficult indeed for a programmer to interfere with the o/s.
Instead, I think that you should plan to modify your program to grab the memory it will (or may) require when it starts up, that is, do the memory management yourself in whatever your chosen language is. How easy this might be for you, considering calls to a library, I don't know.
I've never heard of such a way. Usually it would be bad for other users on the node if one program went ahead and hogged all available memory. It's not good practice.
But opinions aside, I would probably write my program in such a way that it acts like a small environment that is able to make multiple runs of the routine in question without ending. It would allocate lots of memory on startup, then wait for user commands (through a minimal shell) and make the runs requested with the allocated memory pool. It would hold on to the pool until the user requests termination.
Of course this requires you to have an interactive session on the node, which you may not have.

Resources