My problem is specific to running SPEC CPU2006 (a benchmark suite).
After installing the benchmark, I can invoke a command called "specinvoke" in a terminal to run a specific benchmark. I have another script, part of which looks like the following:
cd (specific benchmark directory)
specinvoke &
pid=$!
My goal is to get the PID of the running task. However, with the code above, what I get is the PID of the "specinvoke" command itself; the real running task has a different PID.
However, running specinvoke -n prints the actual command that specinvoke would run to stdout. For example, for one benchmark it looks like this:
# specinvoke r6392
# Invoked as: specinvoke -n
# timer ticks over every 1000 ns
# Use another -n on the command line to see chdir commands and env dump
# Starting run for copy #0
../run_base_ref_gcc43-64bit.0000/milc_base.gcc43-64bit < su3imp.in > su3imp.out 2>> su3imp.err
Internally it runs a binary. The command differs from benchmark to benchmark (depending on the benchmark directory it is invoked from). And because "specinvoke" is an installed program and not just a script, I cannot use "source specinvoke".
So is there any clue? Is there a way to run the command directly in the same shell (so it has the same PID), or should I dump the output of specinvoke -n and run the dumped command myself?
You can still do something like:
cd (specific benchmark directory)
specinvoke &
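pid=$(pgrep -f milc_base.gcc43-64bit)
(The -f flag matters here: without it, pgrep matches only the process name, which Linux truncates to 15 characters, so a longer pattern like this one would never match.)
If there are several invocations of the milc_base.gcc43-64bit binary, you can still use
pid=$(pgrep -n -f milc_base.gcc43-64bit)
where, according to the man page: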
-n
Select only the newest (most recently started) of the matching
processes
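One caveat with calling pgrep right after "specinvoke &" is that the benchmark binary may not have been started yet. A minimal sketch of a polling helper follows; the function name wait_for_pid and the retry count are hypothetical, and it assumes bash, procps pgrep, and a sleep that accepts fractional seconds.

```shell
#!/bin/bash
# wait_for_pid PATTERN [TRIES]
# Polls until a process whose full command line matches PATTERN exists,
# then prints the PID of the newest match. Gives up after TRIES attempts
# (default 50, roughly 5 seconds). Hypothetical helper, not part of SPEC.
wait_for_pid() {
    local pattern=$1 tries=${2:-50} pid
    while (( tries-- > 0 )); do
        # -f matches against the full command line, -n picks the newest match
        if pid=$(pgrep -n -f "$pattern"); then
            echo "$pid"
            return 0
        fi
        sleep 0.1
    done
    return 1
}
```

Usage would look like: specinvoke & followed by pid=$(wait_for_pid 'milc_base.gcc43-64bit'). The pattern is a regular expression, so anchor or escape it if other command lines could match as well.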
When the process is a direct child of specinvoke:
pgrep -P $! -f milc_base.gcc43-64bit
When it is not a direct child, you can get the PID from pstree:
pstree -p $! | grep -o 'milc_base.gcc43-64bit(.*)'
Output from the above (the PID is in parentheses): milc_base.gcc43-64bit(9837)
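The other idea raised in the question, dumping the output of specinvoke -n and running it yourself, could be sketched as below. This assumes the dump contains exactly one non-comment command line, as in the sample above, and that you run it from a directory where its relative paths resolve; it also bypasses whatever timing and setup specinvoke itself performs. The names extract_command, start_dumped and DUMPED_PID are hypothetical.

```shell
#!/bin/bash
# Print the first non-comment, non-empty line of `specinvoke -n` style
# output read from stdin.
extract_command() {
    grep -v -e '^#' -e '^[[:space:]]*$' | head -n 1
}

# Run a command line (redirections included) in the background.
# `exec` replaces the background subshell with the program itself,
# so $! is the PID of the real task rather than of a wrapper shell.
start_dumped() {
    eval "exec $1" &
    DUMPED_PID=$!
}

# Intended use (not executed here):
#   cmd=$(specinvoke -n | extract_command)
#   start_dumped "$cmd"
#   echo "benchmark PID: $DUMPED_PID"
```

Because the line is passed through eval, only use this on output you trust.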
I was trying to get pid of process I ran with setsid and which ought to run in background like this:
test.sh:
#!/bin/bash
setsid nohup ./my_program &
echo $!
If I run ./test.sh, it prints the PID of the my_program process, which is exactly what I need. But if I run these commands one by one in my shell, like this:
$ setsid nohup ./my_program &
$ echo $!
it gives me the PID of the setsid command (or maybe something else, but almost every time it is the PID of my_program minus one).
What is happening here? Why do the results of commands I run in the terminal myself differ from the results of the test.sh script?
By the way, do you know an easy way to get the PID of a process that I started with setsid and that needs to run in the background?
Repost of comments above as an answer:
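This is because setsid only forks the current process if it is the process group leader. In an interactive shell with job control, each background job gets its own process group, so setsid is the group leader and has to fork; inside a script, job control is off, the background process stays in the script's process group, and setsid can create the session without forking. A detailed explanation can be found here.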
To get the pid of a process executed via setsid, the approaches given here may be tried.
setsid will call fork to ensure that it creates a new process group as well as a new session, hence the resulting PID will not match the PID of setsid. The cleanest workaround would be for my_program to store its PID in a file.
When you later want to send a kill to my_program, you should check that the PID actually belongs to a program named my_program, via the /proc file system or by calling the ps command with some code around it. (This is a very common method used by many daemons.)
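If my_program cannot be changed to write its own PID file, a small wrapper can do it. The following is a sketch under that assumption; start_detached and kill_if_matches are hypothetical names, and the check relies on the process name reported by ps, which Linux truncates to 15 characters.

```shell
#!/bin/bash
# start_detached PIDFILE COMMAND [ARGS...]
# Starts COMMAND in a new session and records its PID in PIDFILE.
# The inner sh writes its own PID ($$) and then execs COMMAND, so the
# recorded PID is correct whether or not setsid had to fork.
start_detached() {
    local pidfile=$1
    shift
    setsid nohup sh -c 'echo "$$" > "$1"; shift; exec "$@"' sh "$pidfile" "$@" \
        >/dev/null 2>&1 &
}

# kill_if_matches PIDFILE EXPECTED_NAME
# Sends SIGTERM only if the recorded PID still belongs to a process
# whose command name equals EXPECTED_NAME.
kill_if_matches() {
    local pidfile=$1 expected=$2 pid comm
    pid=$(cat "$pidfile") || return 1
    comm=$(ps -p "$pid" -o comm=) || return 1
    [ "$comm" = "$expected" ] || return 1
    kill "$pid"
}
```

For example: start_detached /tmp/my_program.pid ./my_program, and later kill_if_matches /tmp/my_program.pid my_program. The path /tmp/my_program.pid is only an illustration.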
I want to monitor all running processes using strace, and when a process ends, the output of strace should be written to a file.
How do I find the PID of every running process? I also want to include the process name in the output file.
$ sudo strace -p 1725 -o firefox_trace.txt
$ tail -f firefox_trace.txt
1725 would be the PID of the process you want to monitor (you can find the PID with "ps -C firefox-bin", for Firefox in this example),
and firefox_trace.txt would be the output file.
The way to go would be to find the PID of every running process and run the command above for each of them, writing to an output file.
Considering the doc,
-p pid
Attach to the process with the process ID pid and begin tracing. The
trace may be terminated at any time by a keyboard interrupt signal (
CTRL -C). strace will respond by detaching itself from the traced
process(es) leaving it (them) to continue running. Multiple -p options
can be used to attach to up to 32 processes in addition to command
(which is optional if at least one -p option is given).
Use -o to store the output in a file, or 2>&1 to redirect standard error to standard output, so you can filter it (grep) or redirect it into a file (> file).
To monitor a process by name rather than by PID, you can use the pgrep command (as written, this works when exactly one process matches), e.g.
strace -p $(pgrep command) -o file.out
where command is your name of process (e.g. php, Chrome, etc.).
To learn more about parameters, check man strace.
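When several processes match, one strace per PID with its own output file is one way to keep the traces apart and get the process name into the file name. The sketch below assumes bash, procps, and sufficient privileges to attach (often root); trace_file_name and trace_by_name are hypothetical helper names.

```shell
#!/bin/bash
# trace_file_name PID
# Builds an output file name of the form <process-name>_<pid>.strace.
trace_file_name() {
    local pid=$1 name
    name=$(ps -p "$pid" -o comm=) || return 1
    # Replace any slash in the name so the result is a plain file name
    printf '%s_%s.strace\n' "${name//\//_}" "$pid"
}

# trace_by_name PATTERN
# Attaches a separate strace to every process matched by pgrep PATTERN.
# Each strace exits on its own when the traced process ends, leaving
# the complete trace in its output file.
trace_by_name() {
    local pattern=$1 pid out
    for pid in $(pgrep "$pattern"); do
        out=$(trace_file_name "$pid") || continue
        strace -p "$pid" -o "$out" &
    done
}
```

For example, trace_by_name firefox would produce files such as firefox_1725.strace, one per matching process.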
I am trying to write a script to provide diagnostics on processes. I have submitted a script to a job scheduling server using qsub. I can easily find the node that the job gets sent to, but I would like to be able to find out which process is currently being run.
I.e., I have a list of different commands in the submitted script; how can I find the one that is currently running, and the arguments passed to it?
example of commands in script
matlab -nodesktop -nosplash -r "display('here'),quit"
python runsomethings.py
I would like to see whether the node is currently executing the first or the second line.
When you submit a job, pbs_server passes your task to pbs_mom. The pbs_mom process/daemon actually executes your script on the execution node. It
"creates a new session as identical user."
This means invoking a shell. You choose the shell with the shebang at the top of the script (e.g. #!/bin/bash).
Clearly, pbs_mom must store the PID of that process (the shell) somewhere in order to kill the job and to monitor whether the job (the shell process) has finished.
UPD. Based on @Dmitri Chubarov's comment: pbs_mom stores the subshell PID internally in memory after calling fork(), and in the .TK file, which is under the torque installation directory: /var/spool/torque/mom_priv/jobs on my system.
Dumping the file contents in decimal mode (<job_number>, <queue_name> should be your own values):
$ hexdump -d /var/spool/torque/mom_priv/jobs/<job_number>.<queue_name>.TK
shows that, in my torque implementation, the PID is stored at position
00000890 + offset 4*2 = 00000898 (the hex offset of the first byte of the PID in the .TK file, i.e. 2200 in decimal) and has a length of 2 bytes.
For example, for shell PID=27110 I have:
0000890 00001 00000 00001 00000 27110 00000 00000 00000
Let's recover PID from .TK file:
$ hexdump -s 2200 -n 2 -d /var/spool/torque/mom_priv/jobs/<job_number>.<queue_name>.TK | tr -s ' ' | cut -s -d' ' -f 2
27110
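This way you have found the subshell PID.
Now, monitor the process list on the execution node and find the names of the child processes (the getcpid function is a slightly modified version of one posted earlier on SO):
function getcpid() {
    local cpids cpid
    # Direct children of the given PID, as a space-separated list
    cpids=$(pgrep -P "$1" | xargs)
    for cpid in $cpids; do
        # Print the child's command name, then recurse into its children
        ps -p "$cpid" -o comm=
        getcpid "$cpid"
    done
}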
Finally,
getcpid <your_PID>
gives you the names of the child processes (note that there will be some garbage lines, such as task numbers). This way you finally know which command is currently running on the execution node.
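Since the question also asks for the arguments passed to the running command, a variant that prints the full command line instead of just the name may be useful. This is a sketch along the same lines as getcpid; the name list_descendants is hypothetical.

```shell
#!/bin/bash
# list_descendants PID
# Recursively prints "PID full-command-line" for every descendant of PID.
list_descendants() {
    local parent=$1 child
    for child in $(pgrep -P "$parent"); do
        # args= prints the command with its arguments, without a header
        ps -p "$child" -o pid= -o args=
        list_descendants "$child"
    done
}
```

Called as list_descendants <your_PID>, it would show, for instance, the matlab or python line together with its options.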
Of course, for each task monitored, you have to obtain the PID and process name on the execution node itself, after doing
ssh <your node>
You can automatically retrieve the node name(s) in <node/proc+node/proc+...> format (process it further to obtain bare node names):
qstat -n <job number> | awk '{print $NF}' | grep <pattern_for_your_node_names>
Note:
The PID method is reliable and, I believe, optimal.
Searching by name is worse: it gives you an unambiguous result only if you invoke different commands in your scripts and no other user runs the same software on the node.
ssh <your node>
ps aux | grep matlab
You will know if matlab runs.
A simple and elegant way to do it is to print to a log file:
ARGS=" $A $B $test "
echo "running MATLAB now with args: $ARGS" >> "$LOGFILE"
matlab -nodesktop -nosplash -r "display('here'),quit"
PYARGS="$X $Y"
echo "running Python now with args: $PYARGS" >> "$LOGFILE"
python runsomethings.py
Then monitor the output of $LOGFILE using tail -f $LOGFILE
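To avoid repeating the echo lines (and mismatches between what is logged and what is run), the logging can be folded into a small wrapper. A minimal sketch, assuming bash; run_logged and the default log file name job_progress.log are hypothetical.

```shell
#!/bin/bash
# Log file to append to; the default name is only an illustration.
LOGFILE=${LOGFILE:-job_progress.log}

# run_logged COMMAND [ARGS...]
# Appends a timestamped line with the exact command to $LOGFILE,
# then runs the command and returns its exit status.
run_logged() {
    printf '%s running: %s\n' "$(date '+%F %T')" "$*" >> "$LOGFILE"
    "$@"
}

# Intended use inside the submitted script (not executed here):
#   run_logged matlab -nodesktop -nosplash -r "display('here'),quit"
#   run_logged python runsomethings.py
```

The last line of the log then always names the command that is currently running.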
Using jobs I know the process is running.
bash-4.2$ jobs
[1]+ Running test.sh &
I wanted it to be set NOHUP so that it won't be killed when I exit. I used
disown
and
bash-4.2$ jobs
shows nothing. I'm not sure if the process is set NOHUP or not. I'm curious about this because after I read the manual it says
disown -h
should be used to set NOHUP.
Edit
I don't think the link Find the Process run by nohup command helps. The question is different than that one.
I'm gonna restate my problem. I run a program without nohup, and later I wanted it to be set NOHUP so that it won't be killed when I exit the system. So I used disown, but later I found the manual says I should have used disown -h to set NOHUP. I want to check if my process is set NOHUP or not successfully. If not, what can I do to set it to be NOHUP?
UPDATE
I know two ways that may be helpful:
1) When a process is started with nohup and its output is not redirected, nohup writes the output to nohup.out in the current directory (or in $HOME if the current directory is not writable). So you can look for that file with find -name nohup.out -cmin -2, which shows whether nohup.out has changed within the last 2 minutes.
If it is changing, you know that something is running under nohup; after that you can check which process has the file open with lsof and continue from there.
2) If you log out as that user, go to a tty, and then run ps aux | grep <user> or ps aux | grep ?, you can see what is still running detached: because there is no pts, the TTY column shows ? instead.
A useful command (prints the TTY column):
ps aux | grep <program> | awk -F" " '{print $7}'
I hope this helps.
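A more direct check on Linux is the signal mask in /proc. A process started with nohup has SIGHUP set to "ignored", which appears as the lowest bit of the SigIgn field in /proc/<pid>/status. Note that disown works differently: it only tells the shell not to forward SIGHUP to the job (plain disown also removes it from the jobs list, which is why jobs shows nothing), so a disowned process will not show this bit. The sketch below assumes Linux and bash; ignores_sighup is a hypothetical name.

```shell
#!/bin/bash
# ignores_sighup PID
# Returns 0 if the process ignores SIGHUP (signal 1), 1 if it does not,
# and 2 if the status file cannot be read. Linux-specific.
ignores_sighup() {
    local pid=$1 mask last
    mask=$(awk '/^SigIgn:/ {print $2}' "/proc/$pid/status" 2>/dev/null) || return 2
    [ -n "$mask" ] || return 2
    # SigIgn is a hex bitmask; signal N is bit N-1, so SIGHUP is the
    # lowest bit of the last hex digit.
    last=${mask: -1}
    (( 0x$last & 1 ))
}
```

For example: ignores_sighup 12345 && echo "SIGHUP is ignored", where 12345 stands for the PID of your process.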
I have a basic script that outputs various status messages. e.g.
~$ ./myscript.sh
0 of 100
1 of 100
2 of 100
...
I wanted to wrap this in a parent script, in order to run a sequence of child-scripts and send an email upon overall completion, e.g. topscript.sh
#!/bin/bash
START=$(date +%s)
/usr/local/bin/myscript.sh
/usr/local/bin/otherscript.sh
/usr/local/bin/anotherscript.sh
RET=$?
END=$(date +%s)
echo -e "Subject:Task Complete\nBegan on $START and finished at $END and exited with status $RET.\n" | sendmail -v group@mydomain.com
I'm running this like:
~$ topscript.sh >/var/log/topscript.log 2>&1
However, when I run tail -f /var/log/topscript.log to inspect the log I see nothing, even though running top shows myscript.sh is currently being executed, and therefore, presumably outputting status messages.
Why isn't the stdout/stderr from the child scripts being captured in the parent's log? How do I fix this?
EDIT: I'm also running these on a remote machine, connected via ssh using pseudo-tty allocation, e.g. ssh -t user@host. Could the pseudo-tty be interfering?
I just tried the following: I have three files t1.sh, t2.sh, and t3.sh, all with the following content:
#!/bin/bash
for((i=0;i<10;i++)) ; do
echo $i of 9
sleep 1
done
And a script called myscript.sh with the following content:
#!/bin/bash
./t1.sh
./t2.sh
./t3.sh
echo "All Done"
When I run ./myscript.sh > topscript.log 2>&1 and then in another terminal run tail -f topscript.log I see the lines being output just fine in the log file.
Perhaps the things being run in your subscripts use a large output buffer? I know when I've run python scripts before, it has a pretty big output buffer so you don't see any output for a while. Do you actually see the entire output in the email that gets sent out at the end of topscript.sh? Is it just that while the processes run you're not seeing the output?
try
unbuffer topscript.sh >/var/log/topscript.log 2>&1
Note that unbuffer is not always available as a standard binary on older Unix platforms, so you may need to find and install the package that provides it.
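If unbuffer is not available, stdbuf from GNU coreutils is another option for programs that use the C library's default stdio buffering. The wrapper below is a sketch with a hypothetical name, run_line_buffered; it falls back to running the command unchanged when stdbuf is missing. Programs that manage their own buffering are not affected by stdbuf; for Python, the -u flag or the PYTHONUNBUFFERED environment variable serves the same purpose.

```shell
#!/bin/bash
# run_line_buffered COMMAND [ARGS...]
# Runs COMMAND with line-buffered stdout and stderr when stdbuf exists,
# otherwise runs it as is.
run_line_buffered() {
    if command -v stdbuf >/dev/null 2>&1; then
        stdbuf -oL -eL "$@"
    else
        "$@"
    fi
}

# Intended use inside topscript.sh (not executed here):
#   run_line_buffered /usr/local/bin/myscript.sh
```

With this in place, tail -f on the log should show lines as they are produced rather than when a buffer fills.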
I hope this helps.