Finding out memory footprint size - linux

I would like to be able to restart a service when it is using too much memory (this is related to a bug in a third-party library).
I have used this to limit the amount of memory that can be requested:
resource.setrlimit(resource.RLIMIT_AS, (128*1024*1024, 128*1024*1024))
But the third-party library gets stuck in a busy loop of failing allocations and re-requesting memory. So I want to be able to poll the current memory size of the process from a thread.
The language I'm using is Python, but a solution in any programming language can be translated into Python code, provided it's viable and sensible on Linux.

Monit is a service you can run to monitor external processes. All you need to do is dump your pid to a file for monit to read. People often use it to monitor their web server. One of the tests monit can run is for total memory usage: you can set a value, and if your process uses too much memory it will be restarted. Here's an example monit config:
check process yourProgram
  with pidfile "/var/run/YOUR.pid"
  start program = "/path/to/PROG.py"
  stop program = "/script/to/kill/prog/kill_script.sh"
  if totalmem is greater than 60.0 MB then restart
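Writing the pidfile that the config above points at is a one-liner from Python; a minimal sketch (the path must match the monit config):

import os

# Write our pid where monit expects to find it (path from the config above).
with open("/var/run/YOUR.pid", "w") as f:
    f.write(str(os.getpid()))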

This is the code that I came up with. It seems to work properly and avoids too much string parsing. The variable names I unpack come from the proc(5) man page, and this is probably a better way of extracting the OS information than string-parsing /proc/self/status.
import os
import signal
import time

def get_vsize():
    # Field names come from the proc(5) man page; vsize, the 23rd field
    # of /proc/self/stat, is the virtual memory size in bytes.
    # Caveat: a plain split() misparses comm if the process name contains spaces.
    parts = open('/proc/self/stat').read().split()
    (pid, comm, state, ppid, pgrp, session, tty, tpgid, flags, minflt, cminflt,
     majflt, cmajflt, utime, stime, cutime, cstime, counter, priority, timeout,
     itrealvalue, starttime, vsize, rss, rlim, startcode, endcode, startstack,
     kstkesp, kstkeip, signal, blocked, sigignore, sigcatch, wchan,
     ) = parts[:35]
    return int(vsize)

def memory_watcher():
    while True:
        time.sleep(120)
        if get_vsize() > 120*1024*1024:
            # pid 0 sends SIGTERM to every process in our process group.
            os.kill(0, signal.SIGTERM)
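To poll from a background thread, as the question asks, a minimal sketch using the functions above:

import threading

# Run the watcher in a daemon thread so it exits with the main program.
threading.Thread(target=memory_watcher, daemon=True).start()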

You can read the current memory usage from the /proc filesystem.
The path is /proc/[pid]/status. In that status virtual file you can see the current VmRSS (resident set size).
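A minimal sketch of reading it that way (VmRSS is reported in kB; the helper name is mine):

def get_rss_kb(pid='self'):
    # VmRSS is reported in kB in /proc/[pid]/status.
    with open('/proc/%s/status' % pid) as f:
        for line in f:
            if line.startswith('VmRSS:'):
                return int(line.split()[1])
    return None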

Related

Limit Number of Core Dumps by Process Name

QUESTION Is there an easy, established and accepted way to limit the number of core dumps for a given process on Linux?
WHAT I WANT My ideal solution would be a one-line command to set the per-application limit of x core dumps for all applications. Alternatively, I would be happy with a method to set the limit for each application individually.
WHAT I DON'T WANT I know I can already set a limit for the size of the core dumps using ulimit. I don't want to limit the size of the dumps, just the number of them. I also know I could modify the apport script to get any functionality I desire, but I would like to avoid this if there is a less intrusive solution.
MOTIVATION I am working on a system that is sensitive to excessive disk usage. If a given application cores, I want to keep the core file so that I can debug the problem. If it cores again, which is highly likely since several applications are restarted by a watcher if they die, I don't want to keep the core file because it is unlikely to contain new information and it will just take up disk space.
A process can dump core only once; then it is killed. I presume you meant programs, as in the rest of the question.
There is nothing of the sort in stock kernels, but things like grsecurity at least used to offer a related feature, to hamper brute-forcing against ASLR.
What do you need this for?
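For what it's worth, a less intrusive route than modifying apport could be a kernel.core_pattern pipe handler; here is a rough sketch (the handler path, the destination directory, and the one-core-per-program policy are all mine, not an established tool):

#!/usr/bin/env python3
# Hypothetical handler, installed with something like:
#   sysctl kernel.core_pattern='|/usr/local/bin/core-limit.py %e'
# The kernel pipes the core image to our stdin; %e is the executable name.
import os
import shutil
import sys

CORE_DIR = '/var/crash'  # assumed destination directory

exe = sys.argv[1]
path = os.path.join(CORE_DIR, 'core.%s' % exe)
if os.path.exists(path):
    # Already have a core for this program: discard the new one.
    shutil.copyfileobj(sys.stdin.buffer, open(os.devnull, 'wb'))
else:
    with open(path, 'wb') as out:
        shutil.copyfileobj(sys.stdin.buffer, out)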

Status of linux process when core dumping

Let's say I have a process that will generate a huge core file if it crashes somehow (e.g. mysql). I want to know the status of the process while it is dumping core. Does it stay the same as before, or does it change to zombie?
My real-life problem is this:
I have a monitor that checks the status of a process. Once it notices that the process has crashed (by monitoring its status), it does something. I want to make sure the monitor acts only after the core dump has finished. That's why I want to know the process status during core dumping.
If your monitor starts the processes with fork, it should be able to get SIGCHLD signals and then call waitpid(2). AFAIK waitpid will tell you when the core dumping has finished (and won't return successfully before that).
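A minimal sketch of that pattern in Python (assuming the monitor is the parent of the watched process):

import os
import signal

def on_sigchld(signum, frame):
    # waitpid() only reaps the child once it is fully dead, i.e. after
    # any core dump has been written out.
    pid, status = os.waitpid(-1, os.WNOHANG)
    if pid and os.WCOREDUMP(status):
        print('child %d dumped core; safe to act now' % pid)

signal.signal(signal.SIGCHLD, on_sigchld)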
Read also core(5)
Perhaps using inotify(7) facilities on the directory containing the core dump might help.
And systemd might be relevant too (I don't know the details)
BTW, while dumping core, I believe that the process status (as reported through proc(5) in the 3rd field of /proc/$PID/stat) is
D Waiting in uninterruptible disk sleep
So if you are concerned about a long core dump time you could, for example, loop every half-second to fopen, fscanf, then fclose that /proc/$PID/stat pseudo-file until the status is no longer D.
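That polling loop might look like this sketch (Python rather than fopen/fscanf):

import time

def wait_while_dumping(pid, interval=0.5):
    # Poll the state field of /proc/$PID/stat until it leaves 'D'.
    while True:
        with open('/proc/%d/stat' % pid) as f:
            data = f.read()
        # comm is in parentheses and may contain spaces; the state is
        # the first field after the closing parenthesis.
        state = data.rsplit(')', 1)[1].split()[0]
        if state != 'D':
            return state
        time.sleep(interval)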
Finally, core dumps are usually quick these days on Linux with a good file system like Ext4 or BTRFS (unless you run on a supercomputer with a terabyte of RAM), because I believe that, if you have sufficient RAM, the core dump file stays in the page cache. Core dumps lasting half an hour were common in the previous century on the supercomputers (Cray) of that time.
Of course you could also stat(2) the core file.
See also http://www.linuxatemyram.com/

How to dump the heap of running C++ process to a file under Linux?

I've got a program that is running on a headless/embedded Linux box, and under certain circumstances that program seems to be using up quite a bit more memory (as reported by top, etc.) than I would expect it to use.
Since the fault condition is difficult to reproduce outside of the actual working environment, and since the embedded box doesn't have niceties like valgrind or gdb installed, what I'd like to do is simply write out the process's heap-memory to a file, which I could then transfer to my development machine and look through at my leisure, to see if I can tell from the contents of the file what kind of data it is that is taking up the bulk of the heap. If I'm lucky there might be a smoking gun like a repeating string or magic-number that comes up a lot, that points me to the place in my code that is either leaking or perhaps just growing a data structure without bounds.
Is there a good way to do this? The only way I can think of would be to force the process to crash and then collect a core dump, but since the fault condition is rare it would be preferable if I could collect the information without crashing the process as a side effect.
You can read the entire memory space of the process via /proc/[pid]/mem; you can read /proc/[pid]/maps to see what is where in the memory space (so you can find the bounds of the heap and read just that). You can attempt to read the data while the process is running (in which case it might be changing while you are reading it), or you can stop the process with a SIGSTOP signal and later resume it with SIGCONT.
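A rough sketch of that approach (it assumes you have ptrace rights over the target, e.g. running as root; note the [heap] mapping covers only the main heap, not memory obtained via mmap):

import os
import signal

def dump_heap(pid, outfile):
    # Pause the process so the heap isn't changing while we read it.
    os.kill(pid, signal.SIGSTOP)
    try:
        with open('/proc/%d/maps' % pid) as maps, \
             open('/proc/%d/mem' % pid, 'rb') as mem, \
             open(outfile, 'wb') as out:
            for line in maps:
                if line.rstrip().endswith('[heap]'):
                    start, end = (int(x, 16) for x in line.split()[0].split('-'))
                    mem.seek(start)
                    out.write(mem.read(end - start))
    finally:
        os.kill(pid, signal.SIGCONT)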

How to monitor a process in Linux CPU, Memory and time

How can I benchmark a process in Linux? I need something like "top" and "time" put together for a particular process name (it is a multiprocess program, so many PIDs will be given).
Moreover, I would like to have a plot over time of memory and CPU usage for these processes, not just final numbers.
Any ideas?
I typically throw together a simple script for this type of work.
Take a look at the kernel documentation for the proc filesystem (Google 'linux proc.txt').
The first line of /proc/stat (Section 1.8 in proc.txt) will give you cumulative cpu usage stats (i.e. user, nice, system, idle, ...). For each process, the file /proc/$PID/stat (Table 1-4 in proc.txt) will provide you with both process-specific cpu usage stats and memory usage stats (see rss).
If you google a bit you'll find plenty of detailed info on these files, and pointers to libraries / apps / code snippets that can help you obtain / derive the values you need. With that in mind, I'll focus on the high-level strategy.
For CPU stats, use your favorite scripting language to create an executable that takes a set of process ids for monitoring. At a fixed interval (ex: 1 second) poll / calculate the cumulative totals for each process and the system as a whole. During each poll interval, write all results on a single line to stdout.
For memory stats, write a similar script, but simply log the per-process memory usage. Memory is a bit easier as we directly obtain the instantaneous values.
Run these scripts for the duration of your test, passing the set of process ids that you'd like to monitor and redirecting the output to a log file.
./logcpu $(pidof foo) $(pidof bar) > cpustats
./logmem $(pidof foo) $(pidof bar) > memstats
Import the contents of these files into a spreadsheet (for certain applications this is as easy as copy / paste). For CPU, you are after instantaneous values but have cumulative values, so you'll need to do some minor spreadsheet work to derive these values (it's just the delta 't(x + 1) - t(x)'). Of course you could have your cpu logger write the delta, but you'll be spending a bit more time up front on the script.
Finally, use your spreadsheet to generate a nice plot.
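For concreteness, here is a sketch of what the logmem script above might look like (logcpu would be analogous, sampling the cumulative utime and stime fields from the same file):

#!/usr/bin/env python
# Sketch of 'logmem': one line per second, one rss column (in pages)
# per monitored pid, read from /proc/$PID/stat.
import sys
import time

pids = sys.argv[1:]

def rss_pages(pid):
    with open('/proc/%s/stat' % pid) as f:
        # rss is the 24th field; split after ')' to survive spaces in comm.
        return f.read().rsplit(')', 1)[1].split()[21]

while True:
    print(' '.join(rss_pages(pid) for pid in pids))
    sys.stdout.flush()
    time.sleep(1)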
The following are tools for monitoring a Linux system:
1. System commands like top, free -m, vmstat, iostat, iotop, sar, netstat, etc. Nothing comes near these Linux utilities when you are debugging a problem. These commands give you a clear picture of what is going on inside your server.
2. SeaLion: An agent executes all the commands mentioned in #1 (also user-defined ones), and the output of these commands is available in a nice web interface. This tool comes in handy when you are debugging across hundreds of servers, as installation is simple. And it's free.
3. Nagios: The mother of all monitoring/alerting tools. It is very customizable, but also quite difficult to set up for beginners. There is a set of tools called Nagios plugins that covers pretty much all the important Linux metrics.
4. Munin
5. Server Density: A cloud-based paid service that collects important Linux metrics and gives users the ability to write their own plugins.
6. New Relic: Another well-known hosted monitoring service.
7. Zabbix

Can I tell Linux not to swap out a particular process's memory?

Is there a way to tell Linux that it shouldn't swap out a particular process's memory to disk?
It's a Java app, so ideally I'm hoping for a way to do this from the command line.
I'm aware that you can set the global swappiness to 0, but is this wise?
You can do this via the mlockall(2) system call under Linux; this will work for the whole process, but do read about the argument you need to pass.
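A sketch of calling it from Python via ctypes (the flag values are the Linux ones from <sys/mman.h>):

import ctypes
import ctypes.util
import os

MCL_CURRENT = 1  # lock all pages currently mapped
MCL_FUTURE = 2   # lock all pages mapped from now on

libc = ctypes.CDLL(ctypes.util.find_library('c'), use_errno=True)
if libc.mlockall(MCL_CURRENT | MCL_FUTURE) != 0:
    err = ctypes.get_errno()
    raise OSError(err, os.strerror(err))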
Do you really need to pull the whole thing in-core? If it's a Java app, you would presumably lock the whole JVM in-core. I don't know of a command-line method for doing this. You could write a trivial wrapper that calls fork, mlockall, then exec, but note that per mlockall(2) memory locks are released during execve(2), so the lock really has to be established from within the target process.
You might also look to see if one of the access pattern notifications in madvise(2) meets your needs. Advising the VM subsystem about a better paging strategy might work out better if it's applicable for you.
Note that a long time ago, under SunOS, there was a mechanism similar to madvise called vadvise(2).
If you wish to change the swappiness for a single process, add it to a cgroup and set the value for that cgroup:
https://unix.stackexchange.com/questions/10214/per-process-swapiness-for-linux#10227
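A sketch of the cgroup route (this assumes cgroup v1 with the memory controller mounted at the conventional path; the group name 'noswap' is made up):

import os

def set_process_swappiness(pid, value=0):
    # Create a memory cgroup, set its swappiness, then move the pid in.
    cg = '/sys/fs/cgroup/memory/noswap'
    os.makedirs(cg, exist_ok=True)
    with open(os.path.join(cg, 'memory.swappiness'), 'w') as f:
        f.write(str(value))
    with open(os.path.join(cg, 'tasks'), 'w') as f:
        f.write(str(pid))  # moving the pid into the group applies the setting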
There is a class of applications that you never want to swap. One such class is a database. Databases use memory as caches and buffers for their disk areas, and it makes absolutely no sense for these ever to be put to swap. The particular memory may hold some relevant data that is not needed for a week, until one day a client asks for it. Without the caching/swapping, the database would simply find the relevant record on disk, which would be quite fast; but with swapping, your service might suddenly take a long time to respond.
mysqld includes code to use the OS memlock call. On Linux, since at least 2.6.9, this call will work for non-root processes that have the CAP_IPC_LOCK capability [1]. When using memlock(), the process must still work within the bounds of the LimitMEMLOCK limit [2]. One of the (few) good things about systemd is that you can grant the mysqld process these capabilities without requiring a special program. It can also set the rlimits as you'd expect with ulimit. Here is an override file for mysqld that does the requisite steps, including a few others that you might need for a process such as a database:
[Service]
# Prevent mysql from swapping
CapabilityBoundingSet=CAP_IPC_LOCK
# Let mysqld lock all memory to core (don't swap)
LimitMEMLOCK=-1
# do not kill this process if low on memory
OOMScoreAdjust=-900
# Use higher io scheduling
IOSchedulingClass=realtime
Type=simple
ExecStart=
ExecStart=/usr/sbin/mysqld --memlock $MYSQLD_OPTS
Note: the standard community MySQL currently ships with Type=forking and adds --daemonize to the options on the ExecStart line. This is inherently less stable than the above method.
UPDATE: I am not 100% happy with this solution. After several days of runtime, I noticed the process still had enormous amounts of swap! Examining /proc/XXXX/smaps, I noted the following:
The largest contributor of swap is a stack segment! 437 MB and fluctuating. This presents obvious performance issues. It also indicates a stack-based memory leak.
There are zero Locked pages. This indicates the memlock option in MySQL (or Linux) is broken. In this case it wouldn't matter much, because MySQL can't memlock the stack anyway.
You can do that with the mlock family of syscalls. I'm not sure, however, whether you can do it for a different process.
As superuser you can 'nice' it to the highest priority level, -20, and hope that's enough to keep it from being swapped out. It usually is. Positive numbers lower scheduling priority. Normal users cannot nice upwards (to negative numbers).
Except in extremely unusual circumstances, asking this question means that You're Doing It Wrong(tm).
Seriously, if Linux wants to swap and you're trying to keep your process in memory then you're putting an unreasonable demand on the OS. If your app is that important then 1) buy more memory, 2) remove other apps/daemons from the machine, or dedicate a machine to your app, and/or 3) invest in a really fast disk subsystem. These steps are reasonable for an important app. If you can't justify them, then you probably can't justify wiring memory and starving other processes either.
Why do you want to do this?
If you are trying to increase the performance of this app, then you are probably on the wrong track. The OS will swap out a process to free memory for the disk cache, even if there is free RAM; the kernel knows best (actually, the smart guys who wrote the scheduler know best).
If you have a process that needs responsiveness (it's swapped out while not in use and you need it to restart quickly), then nicing it to a high priority, mlock, or a real-time kernel might help.
