Reading living process memory without interrupting it - linux

I would like to explore the memory of a living process, and when I do so, the process must not get disturbed - so attaching gdb to the process (which would stop it) is not an option.
Therefore I would like to get this info from /proc/kcore (if you know of another way to do this please let me know).
So I made a little experiment. I created a file called TEST with only "EXTRATESTEXTRA" inside.
Then I opened it with less
$ less TEST
I got the PID of this process with
$ ps aux | grep TEST
user 7785 0.0 0.0 17944 992 pts/8 S+ 16:15 0:00 less TEST
user 7798 0.0 0.0 13584 904 pts/9 S+ 16:16 0:00 grep TEST
And then I used this script to dump each writable memory region to a separate file:
#!/bin/bash
grep rw-p /proc/$1/maps | sed -n 's/^\([0-9a-f]*\)-\([0-9a-f]*\) .*$/\1 \2/p' | while read start stop; do gdb --batch --pid $1 -ex "dump memory $1-$start-$stop.dump 0x$start 0x$stop"; done
(I found it on this site https://serverfault.com/questions/173999/dump-a-linux-processs-memory-to-file)
$ sudo ./dump_all_pid_memory.sh 7785
After this, I looked for "TRATESTEX" in all dumped files :
$ grep -a -o -e '...TRATESTEX...' ./*.dump
./7785-00624000-00628000.dump:HEXTRATESTEXTRA
./7785-00b8f000-00bb0000.dump:EXTRATESTEXTRA
./7785-00b8f000-00bb0000.dump:EXTRATESTEXTRA
So I concluded that there must be an occurrence of this string somewhere between 0x00624000 and 0x00628000.
Therefore I converted the offsets into decimal numbers (0x624000 = 6438912, and 0x628000 - 0x624000 = 0x4000 = 16384) and used dd to get the memory from /proc/kcore :
$ sudo dd if="/proc/kcore" of="./y.txt" skip="6438912" count="16384" bs=1
To my surprise, the file y.txt was full of zeros (I didn't find the string I was looking for in it).
As a bonus surprise, I ran a similar test at the same time with a different test file (both less processes were running simultaneously) and found that the other test string I was using should be found at the same location (the dumping and grepping gave the same offsets).
So there must be something I don't understand clearly.
Isn't /proc/pid/maps supposed to show the offsets of the memory? (That is, if it said "XXX" is at offset 0x10, another program could not be using the same offset, am I right? This is the source of my second surprise.)
How can I read /proc/kcore to get the memory that belongs to a process whose PID I know?

If you have root access and are on a Linux system, you can use the following script (adapted from Gilles' excellent unix.stackexchange.com answer and from the answer originally given in the question above, which contained SyntaxErrors and was not very pythonic):
#!/usr/bin/env python
import re
import sys

def print_memory_of_pid(pid, only_writable=True):
    """
    Run as root, take an integer PID and return the contents of memory to STDOUT
    """
    memory_permissions = 'rw' if only_writable else 'r-'
    sys.stderr.write("PID = %d" % pid)
    with open("/proc/%d/maps" % pid, 'r') as maps_file:
        with open("/proc/%d/mem" % pid, 'r', 0) as mem_file:
            for line in maps_file.readlines():  # for each mapped region
                m = re.match(r'([0-9A-Fa-f]+)-([0-9A-Fa-f]+) ([-r][-w])', line)
                if m.group(3) == memory_permissions:
                    sys.stderr.write("\nOK : \n" + line + "\n")
                    start = int(m.group(1), 16)
                    if start > 0xFFFFFFFFFFFF:  # skip the vsyscall region
                        continue
                    end = int(m.group(2), 16)
                    sys.stderr.write("start = " + str(start) + "\n")
                    mem_file.seek(start)  # seek to region start
                    chunk = mem_file.read(end - start)  # read region contents
                    print chunk,  # dump contents to standard output
                else:
                    sys.stderr.write("\nPASS : \n" + line + "\n")

if __name__ == '__main__':  # Execute this code when run from the commandline.
    try:
        assert len(sys.argv) == 2, "Provide exactly 1 PID (process ID)"
        pid = int(sys.argv[1])
        print_memory_of_pid(pid)
    except (AssertionError, ValueError) as e:
        print "Please provide 1 PID as a commandline argument."
        print "You entered: %s" % ' '.join(sys.argv)
        raise e
If you save this as write_mem.py, you can run it (with python2.6 or 2.7, or with python2.5 if you add from __future__ import with_statement) as:
sudo python write_mem.py 1234 > pid1234_memory_dump
to dump pid1234 memory to the file pid1234_memory_dump.
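If you are on Python 3, a rough equivalent of the same approach might look like the sketch below (the function name is mine, not part of the original answer; it reads /proc/pid/maps and /proc/pid/mem exactly as described above, and silently skips regions that cannot be read):

```python
import re

def dump_writable_memory(pid, out_stream):
    """Dump all rw- mapped regions of `pid` to a binary stream.
    Run as root for other users' processes; works unprivileged on your own."""
    region = re.compile(r'([0-9A-Fa-f]+)-([0-9A-Fa-f]+) ([-r][-w])')
    with open("/proc/%d/maps" % pid) as maps_file, \
         open("/proc/%d/mem" % pid, "rb", 0) as mem_file:
        for line in maps_file:
            m = region.match(line)
            if not m or m.group(3) != "rw":
                continue
            start, end = int(m.group(1), 16), int(m.group(2), 16)
            if start > 0xFFFFFFFFFFFF:  # skip the vsyscall page
                continue
            try:
                mem_file.seek(start)              # seek to region start
                out_stream.write(mem_file.read(end - start))
            except OSError:
                pass  # region unreadable or unmapped in the meantime; skip it
```

Calling dump_writable_memory(1234, open("pid1234_memory_dump", "wb")) as root should produce the same kind of dump.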

For process 1234 you can get its memory map by reading sequentially /proc/1234/maps (a textual pseudo-file) and read the virtual memory by e.g. read(2)-ing or mmap(2)-ing appropriate segments of the /proc/1234/mem sparse pseudo-file.
However, I believe you cannot avoid some kind of synchronization (perhaps with ptrace(2), as gdb does), since the process 1234 can (and does) alter its address space at any time (with mmap & related syscalls).
The situation is different if the monitored process 1234 is not arbitrary, but if you could improve it to communicate somehow with the monitoring process.
I'm not sure I understand why you are asking this. Also, gdb is able to watch some memory location without stopping the process.

Since version 3.2 of the kernel, you can use the process_vm_readv system call to read process memory without interruption.
ssize_t process_vm_readv(pid_t pid,
                         const struct iovec *local_iov,
                         unsigned long liovcnt,
                         const struct iovec *remote_iov,
                         unsigned long riovcnt,
                         unsigned long flags);
These system calls transfer data between the address space of the
calling process ("the local process") and the process identified by
pid ("the remote process"). The data moves directly between the
address spaces of the two processes, without passing through kernel
space.
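For instance, a minimal ctypes sketch in Python (the wrapper function and error handling are my own illustration; the syscall and the iovec layout are as documented above):

```python
import ctypes
import os

libc = ctypes.CDLL("libc.so.6", use_errno=True)

class IOVec(ctypes.Structure):
    # struct iovec from <sys/uio.h>
    _fields_ = [("iov_base", ctypes.c_void_p),
                ("iov_len", ctypes.c_size_t)]

libc.process_vm_readv.restype = ctypes.c_ssize_t

def read_process_memory(pid, address, size):
    """Read `size` bytes at virtual `address` in process `pid` without
    stopping it (needs the same permissions as ptrace-attaching would)."""
    buf = ctypes.create_string_buffer(size)
    local = IOVec(ctypes.cast(buf, ctypes.c_void_p), size)
    remote = IOVec(ctypes.c_void_p(address), size)
    nread = libc.process_vm_readv(pid, ctypes.byref(local), 1,
                                  ctypes.byref(remote), 1, 0)
    if nread < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return buf.raw[:nread]
```

Reading your own process (pid = os.getpid()) needs no extra privileges; reading another user's process requires the usual ptrace permissions.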

You'll have to use /proc/pid/mem to read process memory. I wouldn't recommend trying to read /proc/kcore or any of the kernel memory interfaces, which is time consuming.

If you just want to get the value of a global variable or a specified address, you can use my tool gvardump instead of reading the entire memory.
gvardump will parse the variable address and print its value nicely without causing process interruption.
For example:
root@ubuntu:/home/u/trace_test# ./gvardump.py 53670 -a 1 '*g_ss[0].sss[0].ps'
*((g_ss[0]).sss[0]).ps = {
    .a = 6,
    .sss = {
        {
            .bbb = 0,
            .ps = 0x563ca42a2020,
            .bs = {
                .m = 0,
            },
        },
        // the other 9 elements are omitted
    },
...

and when I do so, the process must not get disturbed - so attaching gdb to the process (which would stop it) is not an option.
I have modified gdb to avoid attaching.
With this modified gdb, you can run gdb -m <PID> to explore the memory without stopping the process.

I achieved this by issuing the command below:
[root@stage1 ~]# echo "Memory usage for PID [MySql]:"; for mem in {Private,Rss,Shared,Swap,Pss}; do grep $mem /proc/$(ps aux | grep mysql | awk '{print $2}' | head -n 1)/smaps | awk -v mem_type="$mem" '{i=i+$2} END {print mem_type,"memory usage:"i}'; done
Result Output
Memory usage for PID [MySql]:
Private memory usage:204
Rss memory usage:1264
Shared memory usage:1060
Swap memory usage:0
Pss memory usage:423

Related

Is there a way to consume all of the file descriptors of a running proc?

It is a negative test to check whether the program can handle FD exhaustion, and I am not able to modify the code. I need to check how the program behaves when it hits the limit; at the very least it must not be allowed to crash.
I want to consume all the FDs of a running process. I tried the following commands to reduce the soft limit and hard limit of the process to simulate this; however, I want to really consume the opened FDs up to the original limit. Is there any other way to achieve this?
# ls /proc/169607/fd|wc -l
503
# prlimit --pid 169607 --nofile=503:503
I also have some follow-up questions to confirm. I tried attaching to the process in gdb and invoking open to consume FDs; it works as shown below:
# this way works
(gdb) p open("/dev/null")
$11 = 80
# shell does not work
(gdb) shell for i in `seq 1 10`; do echo $i>"/dev/null"; done
$1 = 5
(gdb) define lopen
Type commands for definition of "lopen".
End with a line saying just "end".
>set $total = $arg0
>set $i = 0
> while($i<$total)
> set $i = $i + 1
> p open("/dev/null")
> end
>end
(gdb) lopen 10 # this works in loop
I found I could still open files even after the soft and hard limits were reduced to the currently consumed amount. Why is that?
E.g.,
# ls /proc/169607/fd|wc -l
503
# prlimit --pid 169607 --nofile=503:503
# In gdb I could still open files, and I found the fd keep increasing
(gdb) p open("/dev/null")

Linux: check if file descriptor is available for reading

Consider the following example, emulating a command which gives output after 10 seconds:
exec 5< <(sleep 10; pwd)
In Solaris, if I check the file descriptor earlier than 10 seconds, I can see that it has a size of 0 and this tells me that it hasn't been populated with data yet. I can simply check every second until the file test condition is met (different from 0) and then pull the data:
while true; do
if [[ -s /proc/$$/fd/5 ]]; then
variable=$(cat <&5)
break
fi
sleep 1
done
But in Linux I can't do this (RedHat, Debian etc). All file descriptors appear with a size of 64 bytes no matter if they hold data or not. For various commands that will take a variable amount of time to dump their output, I will not know when I should read the file descriptor. No, I don't want to just wait for cat <&5 to finish, I need to know when I should perform the cat in the first place. Because I am using this mechanism to issue simultaneous commands and assign their output to corresponding file descriptors. As mentioned already, this works great in Solaris.
Here is the skeleton of an idea :
#!/bin/bash
exec 5< <(sleep 4; pwd)
while true
do
if
read -t 0 -u 5 dummy
then
echo Data available
cat <&5
break
else
echo No data
fi
sleep 1
done
From the Bash reference manual :
If timeout is 0, read returns immediately, without trying to read any data. The exit status is 0 if input is available on the specified file descriptor, non-zero otherwise.
The idea is to use read with -t 0 (to have zero timeout) and -u 5 (read from file descriptor 5) to instantly check for data availability.
Of course this is just a toy loop to demonstrate the concept.
The solution given by User Fred using only bash builtins works fine, but is a tiny bit non-optimal due to polling for the state of a file descriptor. If calling another interpreter (for example Python) is not a no-go, a non-polling version is possible:
#! /bin/bash
(
sleep 4
echo "This is the data coming now"
echo "More data"
) | (
python3 -c 'import select;select.select([0],[],[])'
echo "Data is now available and can be processed"
# Replace with more sophisticated real-world processing, of course:
cat
)
The single line python3 -c 'import select;select.select([0],[],[])' waits until STDIN has data ready. It uses the standard select(2) system call, for which I have not found a direct shell equivalent or wrapper.

How do I find the current process running on a particular PBS job

I am trying to write a script to provide diagnostics on processes. I have submitted a script to a job scheduling server using qsub. I can easily find the node that the job gets sent to. But I would like to be able to find what process is currently being run.
ie. I have a list of different commands in the submitted script, how can I find the current one that is running, and the arguments passed to it?
example of commands in script
matlab -nodesktop -nosplash -r "display('here'),quit"
python runsomethings.py
I would like to see whether the nodes is currently executing the first or second line.
When you submit a job, pbs_server passes your task to pbs_mom. The pbs_mom process/daemon actually executes your script on the execution node. It
"creates a new session as identical user."
This means invoking a shell; you specify which shell at the top of the script with a shebang (e.g. #!/bin/bash).
It's clear that pbs_mom stores the process (shell) PID somewhere, both to kill the job and to monitor whether the job (the shell process) has finished.
UPD. Based on @Dmitri Chubarov's comment: pbs_mom stores the subshell PID internally in memory after calling fork(), and in the .TK file under the torque installation directory: /var/spool/torque/mom_priv/jobs on my system.
Dumping the file's internals in decimal mode (<job_number> and <queue_name> should be your own values):
$ hexdump -d /var/spool/torque/mom_priv/jobs/<job_number>.<queue_name>.TK
revealed that in my torque implementation the PID is stored at position
00000890 + offset 4*2 = 00000898 (the hex offset of the first byte of the PID in the .TK file) and has a length of 2 bytes.
For example, for shell PID=27110 I have:
0000890 00001 00000 00001 00000 27110 00000 00000 00000
Let's recover PID from .TK file:
$ hexdump -s 2200 -n 2 -d /var/spool/torque/mom_priv/jobs/<job_number>.<queue_name>.TK | tr -s ' ' | cut -s -d' ' -f 2
27110
This way you've found subshell PID.
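The same 2-byte extraction can be sketched in Python (the helper name is mine; the 0x898 = 2200 offset is the one found above and may differ between torque versions, so verify your own .TK layout with hexdump first):

```python
import struct

def pid_from_tk(path, offset=0x898):
    """Read the 2-byte little-endian (x86) PID field from a torque .TK
    job file. 0x898 (= 2200 decimal) is the offset observed above."""
    with open(path, "rb") as f:
        f.seek(offset)
        return struct.unpack("<H", f.read(2))[0]
```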
Now, monitor the process list on the execution node and find the names of the child processes (the getcpid function is a slightly modified version of one posted earlier on SO):
function getcpid() {
cpids=`pgrep -P $1|xargs`
for cpid in $cpids;
do
ps -p "$cpid" -o comm=
getcpid $cpid
done
}
At last,
getcpid <your_PID>
gives you the child processes' names (note, there will be some garbage lines, like task numbers). This way you will finally know, what command is currently running on the execution node.
Of course, for each task monitored, you should obtain the PID and process name on the execution node after doing
ssh <your node>
You can automatically retrieve node name(s) in <node/proc+node/proc+...> format (process it further to obtain bare node names):
qstat -n <job number> | awk '{print $NF}' | grep <pattern_for_your_node_names>
Note:
The PID method is reliable and, as I believe, optimal.
Searching by name is worse: it gives an unambiguous result only if you invoke different commands in your scripts and no other user executes the same software on the node.
ssh <your node>
ps aux | grep matlab
You will know if matlab runs.
A simple and elegant way to do it is to print to a log file:
ARGS=" $A $B $test "
echo "running MATLAB now with args: $ARGS" >> $LOGFILE
matlab -nodesktop -nosplash -r "display('here'),quit"
PYARGS="$X $Y"
echo "running Python now with args: $PYARGS" >> $LOGFILE
python runsomethings.py
And monitor the output of $LOGFILE using tail -f $LOGFILE

linux: redirect stdout after the process started [duplicate]

I have some scripts that ought to have stopped running but hang around forever. Is there some way I can figure out what they're writing to STDOUT and STDERR in a readable way?
I tried, for example, to do:
$ tail -f /proc/(pid)/fd/1
but that doesn't really work. It was a long shot anyway.
Any other ideas?
strace on its own is quite verbose and unreadable for seeing this.
Note: I am only interested in their output, not in anything else. I'm capable of figuring out the other things on my own; this question is only focused on getting access to stdout and stderr of the running process after starting it.
Since I'm not allowed to edit Jauco's answer, I'll give the full answer that worked for me. (Russell's page relies on the un-guaranteed behaviour that, if you close file descriptor 1 for STDOUT, the next creat call will open FD 1.)
So, run a simple endless script like this:
import time
while True:
print 'test'
time.sleep(1)
Save it to test.py, run with
$ python test.py
Get the PID:
$ ps auxw | grep test.py
Now, attach gdb:
$ gdb -p (pid)
and do the fd magic:
(gdb) call creat("/tmp/stdout", 0600)
$1 = 3
(gdb) call dup2(3, 1)
$2 = 1
Now you can tail /tmp/stdout and see the output that used to go to STDOUT.
There are several newer utilities that wrap up the "gdb method" and add some extra touches. The one I use now is called "reptyr" ("Re-PTY-er"). In addition to grabbing STDERR/STDOUT, it will actually change the controlling terminal of a process (even if it wasn't previously attached to a terminal).
The best use of this is to start up a screen session, and use it to reattach a running process to the terminal within screen so you can safely detach from it and come back later.
It's packaged on popular distros (Ex: 'apt-get install reptyr').
http://onethingwell.org/post/2924103615/reptyr
GDB method seems better, but you can do this with strace, too:
$ strace -p <PID> -e write=1 -s 1024 -o file
Via the man page for strace:
-e write=set
Perform a full hexadecimal and ASCII dump of all the
data written to file descriptors listed in the spec-
ified set. For example, to see all output activity
on file descriptors 3 and 5 use -e write=3,5. Note
that this is independent from the normal tracing of
the write(2) system call which is controlled by the
option -e trace=write.
This prints out somewhat more than you need (the hexadecimal part), but you can sed that out easily.
I'm not sure if it will work for you, but I read a page a while back describing a method that uses gdb
I used strace and de-coded the hex output to clear text:
PID=some_process_id
sudo strace -f -e trace=write -e verbose=none -e write=1,2 -q -p $PID -o "| grep '^ |' | cut -c11-60 | sed -e 's/ //g' | xxd -r -p"
I combined this command from other answers.
strace outputs a lot less with just -ewrite (and not the =1 suffix). And it's a bit simpler than the GDB method, IMO.
I used it to see the progress of an existing MythTV encoding job (sudo because I don't own the encoding process):
$ ps -aef | grep -i handbrake
mythtv 25089 25085 99 16:01 ? 00:53:43 /usr/bin/HandBrakeCLI -i /var/lib/mythtv/recordings/1061_20111230122900.mpg -o /var/lib/mythtv/recordings/1061_20111230122900.mp4 -e x264 -b 1500 -E faac -B 256 -R 48 -w 720
jward 25293 20229 0 16:30 pts/1 00:00:00 grep --color=auto -i handbr
$ sudo strace -ewrite -p 25089
Process 25089 attached - interrupt to quit
write(1, "\rEncoding: task 1 of 1, 70.75 % "..., 73) = 73
write(1, "\rEncoding: task 1 of 1, 70.76 % "..., 73) = 73
write(1, "\rEncoding: task 1 of 1, 70.77 % "..., 73) = 73
write(1, "\rEncoding: task 1 of 1, 70.78 % "..., 73) = 73^C
You can use reredirect (https://github.com/jerome-pouiller/reredirect/).
Type
reredirect -m FILE PID
and outputs (standard and error) will be written in FILE.
reredirect README also explains how to restore original state of process, how to redirect to another command or to redirect only stdout or stderr.
You don't state your operating system, but I'm going to take a stab and say "Linux".
Seeing what is being written to stderr and stdout is probably not going to help. If it is useful, you could use tee(1) before you start the script to take a copy of stderr and stdout.
You can use ps(1) to look for wchan. This tells you what the process is waiting for. If you look at the strace output, you can ignore the bulk of the output and identify the last (blocked) system call. If it is an operation on a file handle, you can go backwards in the output and identify the underlying object (file, socket, pipe, etc.) From there the answer is likely to be clear.
You can also send the process a signal that causes it to dump core, and then use the debugger and the core file to get a stack trace.

