running script through ssh fails when locally same script succeeds - linux

I'm seeing some very strange behavior. I have found what appears to be a workaround, but I am hoping that someone can explain WHY it happens.
High-level summary of what I'm doing: I'd like a shell script that stops my process. It should be robust enough to kill one or more instances of the process I'm grepping for, and it should not fail if NO process is running (meaning I want a return code of 0, not an empty argument list passed to the kill command).
What I'm seeing is that the script behaves differently when invoked by passing a command through ssh than when that same script is executed locally. What is very strange is that by adding a seemingly arbitrary command to my ssh command, I'm able to get my script to execute properly and I DON'T KNOW WHY!
The stop script (the echo statements are there to help me debug; they are not part of the real script):
echo "Stopping myProcess"
echo "-->`ps aux | grep myProcess | grep -v grep`"
pid=`ps -ef | grep myProcess | grep -v grep | awk '{ print $2 }'`
echo "Here: ${pid}"
if [[ ! -z $pid ]]; then
echo "Here2"
kill -9 $pid
else
echo "Here3"
echo "not stopping anything - no myProcess process running."
fi
echo "Here4"
exit 0
Result of running the script locally when NO process is running:
Stopping myProcess
-->
Here:
Here3
not stopping anything - no myProcess running.
Here4
Result of executing the script from a different machine through the following command:
Command:
ssh eak0703@myServer 'source ${HOME}/.bash_profile;cd /usr/local/myProcess/bin/;./stop-myProcess'
Result:
Stopping myProcess
--> eak0703 2099 0.0 0.0 10728 1500 ? Ss 17:08 0:00 bash -c source ${HOME}/.bash_profile;cd /usr/local/myProcess/bin/;./stop-myProcess
eak0703 2100 0.0 0.0 10740 992 ? S 17:08 0:00 bash -c source ${HOME}/.bash_profile;cd /usr/local/myProcess/bin/;./stop-myProcess
eak0703 2101 0.0 0.0 10740 668 ? S 17:08 0:00 bash -c source ${HOME}/.bash_profile;cd /usr/local/myProcess/bin/;./stop-myProcess
Here: 2099
2100
2105
Here2
Notice: for some reason I can't explain, there appear to be 3 invocations of my command. I also know that this command doesn't terminate with an exit code of 0. I assume this is because by the time kill -9 is invoked, the process IDs picked up by grep are gone.
Now - here's the SAME ssh command with an extra "date | grep crap" thrown in:
Command:
ssh eak0703@myServer 'source ${HOME}/.bash_profile;cd /usr/local/myProcess/bin/;date | grep crap;./stop-myProcess'
Result:
Stopping myProcess
-->
Here:
Here3
not stopping anything - no myProcess running.
Here4
Putting "date | grep crap" fixes things. It appears that the magic is in the "|" (pipe) operator. So I am actually able to make this work with "anycommand | anyothercommand".
I can make it work - but how can I justify randomly leaving such a nugget in a bash script??? No one will ever know why this is there. Not even me! If anyone has encountered this please help!

Parsing ps to find a process is fragile and error prone. Your example is a nice illustration why:
An unrelated process (the bash process started by ssh) contains the process name as part of the command line, and is accidentally picked up by your ps parser.
When you add "date | grep crap", the command line of that bash process also contains the word "grep", so your own grep -v grep happens to filter it out.
Instead, just use pgrep or pkill. These tools list/kill processes based on the executable name and are therefore far more robust than parsing ps.
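As a rough sketch of what that could look like for the stop script in the question (the function name stop_by_name is made up for illustration, and it assumes the process name is exactly the string you pass in):

```shell
#!/bin/bash
# Stop every process whose name is exactly $1.
# Returns 0 both when processes were signalled and when none was running.
stop_by_name() {
    local name=$1
    # -x matches the whole process name; pkill sends SIGTERM by default
    pkill -x "$name"
    case $? in
        0) echo "Stopping $name" ;;
        1) echo "not stopping anything - no $name process running." ;;
        *) echo "pkill failed for $name" >&2; return 1 ;;
    esac
    return 0
}
```

The -x matters here: without it the pattern myProcess would also match a script named stop-myProcess. Keep in mind that pkill compares against the process name, which Linux truncates to 15 characters, so very long names need a different approach.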

Related

Can I wait for the termination of a process that is not a child of the current shell?

I have a script that has to kill, a certain number of times, a resource managed by a high-availability middleware. It basically checks whether the resource is running and then kills it; I need the timestamp of when the process is really killed. So I have written this code:
#!/bin/bash
echo "$(date +"%T,%N") :New measures Run" > /home/hassan/logs/measures.log
for i in {1..50}
do
echo "Iteration: $i"
PID=`ps -ef | grep "/home/hassan/Desktop/pcmAppBin pacemaker_app/MainController"|grep -v "grep" | awk {'print$2'}`
if [ -n "$PID" ]; then
echo "$(date +"%T,%N") :Killing $PID" >> /home/hassan/logs/measures.log
ps -ef | grep "/home/hassan/Desktop/pcmAppBin pacemaker_app/MainController"|grep -v "grep" | awk {'print "kill -9 " $2'} | sh
wait $PID
else
PID=`ps -ef | grep "/home/hassan/Desktop/pcmAppBin pacemaker_app/MainController"|grep -v "grep" | awk {'print$2'}`
until [ -n "$PID" ]; do
sleep 2
PID=`ps -ef | grep "/home/hassan/Desktop/pcmAppBin pacemaker_app/MainController"|grep -v "grep" | awk {'print$2'}`
done
fi
done
But with my wait command I get the following error message: wait: pid xxxx is not a child of this shell
I assume that you started the child processes from bash and then started this script to wait for them. The problem is that those processes are not children of the bash running the script, but children of its parent!
If you want to run a script inside the current bash, you should start it with . (the dot command).
An example: you start vim and then stop it by pressing ^Z (later you can use fg to get back to vim). Then you can get the list of jobs by using the jobs command.
$ jobs
[1]+ Stopped vim myfile
Then you can create a script called test.sh containing just one command, jobs. Add execute permission (e.g. chmod 700 test.sh), then start it:
$ cat test.sh
jobs
~/dev/fi [3:1]$ ./test.sh
~/dev/fi [3:1]$ . ./test.sh
[1]+ Stopped vim myfile
As the first version creates a new bash session, no jobs are listed. But with . the script runs in the current bash session, which has exactly one child process (namely vim). So launch the script above using . so that no child bash is created.
Be aware that defining variables or changing directory (and a lot more) will affect your environment! E.g. PID will be visible in the calling bash!
Comments:
Do not use ...|grep ...|grep -v ... |awk ... pipe snakes! Use ...|awk ... instead!
On most Linux systems you can use something like ps -o pid= -C pcmAppBin to get just the pid, so the whole pipe can be avoided.
To call an external program from awk you could try the built-in system("mycmd") function.
I hope this helps a bit!
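If the process really is not a child of the shell doing the waiting, one common workaround is to poll with kill -0, which sends no signal and only reports whether the PID still exists. A minimal sketch (the function name wait_for_exit and the default 30-second limit are my own choices, not part of the question):

```shell
#!/bin/bash
# Block until the process with the given PID no longer exists.
# Unlike the wait builtin, this does not require the process to be a child of this shell.
# Usage: wait_for_exit PID [MAX_SECONDS]   (returns 1 if the limit is reached)
wait_for_exit() {
    local pid=$1 max_seconds=${2:-30} polls=0
    while kill -0 "$pid" 2>/dev/null; do
        if (( polls >= max_seconds * 10 )); then
            return 1
        fi
        sleep 0.1
        polls=$((polls + 1))
    done
    return 0
}
```

In the measuring loop this could replace wait $PID: kill the process, call wait_for_exit "$PID", and only then write the timestamp. The resolution is limited by the 0.1 s polling interval. With GNU coreutils, tail --pid="$PID" -f /dev/null is another way to block until a PID disappears.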

Bash script optimization for waiting for a particular string in log files

I am using a bash script that calls multiple processes which have to start up in a particular order, and certain actions have to be completed (they then print out certain messages to the logs) before the next one can be started. The bash script has the following code which works really well for most cases:
tail -Fn +1 "$log_file" | while read line; do
if echo "$line" | grep -qEi "$search_text"; then
echo "[INFO] $process_name process started up successfully"
pkill -9 -P $$ tail
return 0
elif echo "$line" | grep -qEi '^error\b'; then
echo "[INFO] ERROR or Exception is thrown listed below. $process_name process startup aborted"
echo " ($line) "
echo "[INFO] Please check $process_name process log file=$log_file for problems"
pkill -9 -P $$ tail
return 1
fi
done
However, when we set the processes to print logging in DEBUG mode, they print so much logging that this script cannot keep up, and it takes about 15 minutes after the process is complete for the bash script to catch up. Is there a way of optimizing this, like changing 'while read line' to 'while read 100 lines', or something like that?
How about not forking up to two grep processes per log line?
tail -Fn +1 "$log_file" | grep -Ei "$search_text|^error\b" | while read line; do
That way one long-running grep process does the preprocessing, if you will.
Edit: As noted in the comments, it is safer to add --line-buffered to the grep invocation.
Some tips relevant for this script:
Checking that the service is doing its job is a much better check for daemon startup than looking at the log output
You can use grep ... <<<"$line" to execute fewer echos.
You can use tail -f | grep -q ... to avoid the while loop by stopping as soon as there's a matching line.
If you can avoid -i on grep it might be significantly faster to process the input.
Thou shalt not kill -9.
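To make the "one grep for the whole stream" idea concrete, here is a sketch that reads log lines on stdin. The function name await_startup is hypothetical, and it uses grep -m1 (stop at the first match) rather than --line-buffered plus a while loop, so treat it as one possible variation and not the answer's exact method:

```shell
#!/bin/bash
# Read log lines on stdin and stop at the first one that matters.
# Returns 0 if it matches the success pattern, 1 if it is an error line
# (which is printed), and 2 if the input ended without either.
await_startup() {
    local search_text=$1 line
    # One grep filters the whole stream; -m1 makes it exit at the first hit
    line=$(grep -m1 -Ei -- "${search_text}|^error\b") || return 2
    # At most one extra grep runs, on the single line that got through
    if grep -qEi -- "$search_text" <<<"$line"; then
        return 0
    fi
    printf '%s\n' "$line"
    return 1
}
```

It would be fed with something like tail -Fn +1 "$log_file" | await_startup "$search_text". Note that tail only notices that its reader is gone on its next write, so on a quiet log the pipeline can linger; the question's approach of killing tail explicitly (ideally without -9) still applies.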

How to get watch to run a bash script with quotes

I'm trying to have a lightweight memory profiler for the matlab jobs that are run on my machine. There is either one or zero matlab job instance, but its process id changes frequently (since it is actually called by another script).
So here is the bash script that I put together to log memory usage:
#!/bin/bash
pid=`ps aux | grep '[M]ATLAB' | awk '{print $2}'`
if [[ -n $pid ]]
then
\grep VmSize /proc/$pid/status
else
echo "no pid"
fi
When I run this script in bash like this:
./script.sh
it works fine, giving me the following result:
VmSize: 1289004 kB
which is exactly what I want.
Now, I want to run this periodically. So I run it with watch, like this:
watch ./script.sh
But in this case I only receive:
no pid
Please note that I know the matlab job is still running, because I can see it with the same pid in top, and besides, I know each matlab job takes several hours to finish.
I'm pretty sure that something is wrong with the quotes I have when setting pid. I just can't figure out how to fix it. Does anyone know what I'm doing wrong?
PS.
In the man page of watch, it says that commands are executed by sh -c. I did run my script like sh -c ./script and it works just fine, but watch doesn't.
Why don't you use a loop with sleep command instead?
For example:
#!/bin/bash
while true
do
# Look up the pid on every pass, since it changes from one MATLAB job to the next
pid=`ps aux | grep '[M]ATLAB' | awk '{print $2}'`
if [[ -n $pid ]]
then
\grep VmSize /proc/$pid/status
else
echo "no pid"
fi
sleep 10
done
Here the script sleeps (waits) for 10 seconds between checks. You can set the interval you need by changing the sleep argument. For example, to make the script sleep for an hour use sleep 1h.
To exit the script press Ctrl-C.
This
pid=`ps aux | grep '[M]ATLAB' | awk '{print $2}'`
could be changed to:
pid=$(pidof MATLAB)
I have no idea why it's not working in watch but you could use a cron job and make the script log to a file like so:
#!/bin/bash
pid=$(pidof MATLAB) # Just to follow previously given advice :)
if [[ -n $pid ]]
then
echo "$(date): $(\grep VmSize /proc/$pid/status)" >> logfile
else
echo "$(date): no pid" >> logfile
fi
Note that >> creates logfile if it does not exist yet, so there is no need to create it with touch first.
You might try just running ps command in watch. I have had issues in the past with watch chopping lines and such when they get too long.
It can be fixed by making the terminal you are running the command from wider or changing the column like this (may need to adjust the 160 to your liking):
export COLUMNS=160;
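If truncated ps output is indeed what breaks the script under watch, another option is to avoid parsing ps altogether. The sketch below reads /proc for a PID you pass in; the function name vmsize_of_pid is made up, and the usage line assumes the process name is exactly MATLAB, which I can't verify from here:

```shell
#!/bin/bash
# Print the VmSize line for one PID, or "no pid" if that process does not exist.
vmsize_of_pid() {
    local pid=$1
    if [[ -n $pid && -r /proc/$pid/status ]]; then
        grep VmSize "/proc/$pid/status"
    else
        echo "no pid"
        return 1
    fi
}

# Hypothetical usage, assuming the process name is exactly MATLAB:
#   vmsize_of_pid "$(pgrep -x -o MATLAB)"
```

pgrep does not depend on the terminal width, so it should behave the same under watch as in an interactive shell. The -o picks the oldest match in case there is more than one.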

Why are commands executed in backquotes giving me different results when run in a script?

I have a script that I mean to be run from cron that ensures that a daemon that I wrote is working. The contents of the script file are similar to the following:
daemon_pid=`ps -A | grep -c fsdaemon`
echo "daemon_pid: " $daemon_pid
if [ $daemon_pid -eq 0 ]; then
echo "restarting fsdaemon"
/etc/init.d/fsdaemon start
fi
When I execute this script from the command prompt, the line that echoes the value of $daemon_pid is reporting a value of 2. This value is two regardless of whether my daemon is running or not. If, however, I execute the command with back quotes and then examine the $daemon_pid variable, the value of $daemon_pid is now one. I have also tried single stepping through the script using bashdb and, when I examine the variables using that tool, they are what they should be.
My question therefore is: why is there a difference in the behaviour between when the script is executed by the shell versus when the commands in the script are executed manually? I'm sure that there is something very fundamental that I am missing.
You're very likely encountering the grep as part of the 'answer' from ps.
To fully understand what is happening, drop the -c option to see what data is actually being returned by ps -A | grep fsdaemon.
To solve the issue, some systems have a p(rocess)grep (pgrep). That will work, or you can use
ps -A | grep -v grep | grep -c fsdaemon
which is a common idiom, but it comes at the expense of another process.
The cleanest solution is,
ps -A | grep -c '[f]sdaemon'
The regular expression syntax should work with all greps, on all systems.
I hope this helps.
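For anyone wondering why the bracket trick works: when ps shows full command lines (ps -ef, ps aux), the grep process appears as grep -c [f]sdaemon, and that text does not contain the string fsdaemon that the pattern matches. The listings below are made up to show the effect without depending on what is actually running:

```shell
#!/bin/bash
# Count the lines of a (made-up) process listing that match a pattern.
count_matches() {
    grep -c -- "$1" <<<"$2"
}

# Made-up listings in ps -ef style: the daemon plus the grep that is looking for it.
plain_listing=$'root 101 1 0 10:00 ? 00:00:01 /usr/bin/fsdaemon\nroot 202 150 0 10:05 pts/0 00:00:00 grep -c fsdaemon'
bracket_listing=$'root 101 1 0 10:00 ? 00:00:01 /usr/bin/fsdaemon\nroot 202 150 0 10:05 pts/0 00:00:00 grep -c [f]sdaemon'

count_matches 'fsdaemon' "$plain_listing"        # prints 2: the daemon and grep itself
count_matches '[f]sdaemon' "$bracket_listing"    # prints 1: only the daemon
```

This does nothing about other processes whose command line happens to contain the name, such as a check script called check_fsdaemon.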
The problem is that grep itself shows up... Try running this command with anything after grep -c:
eple:~ erik$ ps -a | grep -c asdfladsf
1
eple:~ erik$ ps -a | grep -c gooblygoolbygookeydookey
1
eple:~ erik$
What does ps -a | grep fsdaemon return? Just look at the processes actually listed... :)
Since this is Linux, why not try the pgrep? This saves you a pipe, and you don't end up with grep reporting back the daemon script itself running.
Any process whose arguments include that name will add to the count: grep, and your script.
Using ps to look for a process isn't really reliable; you should use a lock file.
As several people have pointed out already, your process count is inflated because ps | grep detects (1) the script itself and (2) the subprocess created by the backquotes, which inherits the name of the main script. So an easy solution is to change the name of the script to something that doesn't include the name you're looking for. But you can do better.
The "best-practice" solution that I would suggest is to use the facilities provided by your operating system. It's not uncommon for an init script to create a PID file as part of the process of starting your daemon; in other words, instead of just running the daemon itself, you use a wrapper script that starts the daemon and then writes the process ID to a file somewhere. If start-stop-daemon exists on your system (and I think it's fairly common these days), you can use that like so:
start-stop-daemon --start --quiet --background \
--make-pidfile --pidfile /var/run/fsdaemon.pid -- /usr/bin/fsdaemon
(obviously replace the path /usr/bin/fsdaemon as appropriate) to start it, and then
start-stop-daemon --stop --quiet --pidfile /var/run/fsdaemon.pid
to stop it. start-stop-daemon has other options that might be useful to you, which you can investigate by reading the man page.
If you don't have access to start-stop-daemon, you can write a wrapper script to do basically the same thing, something like this to start:
echo "$$" > /var/run/fsdaemon.pid
exec /usr/bin/fsdaemon
and this to stop:
kill $(< /var/run/fsdaemon.pid)
rm /var/run/fsdaemon.pid
(this is pretty crude, of course, but it should normally work).
Anyway, once you have the setup to generate a PID file, whether by using start-stop-daemon or not, you can update your check script to this:
daemon_pid=`ps --no-headers --pid $(< /var/run/fsdaemon.pid) | wc -l`
if [ $daemon_pid -eq 0 ]; then
echo "restarting fsdaemon"
/etc/init.d/fsdaemon restart
fi
(one would think there would be a concise command to check whether a given PID is running, but I don't know it).
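One candidate for that concise check is kill -0, which sends no signal and only reports whether the PID could be signalled. A sketch, with a made-up function name (pidfile_is_running) and the PID file path passed in as an argument:

```shell
#!/bin/bash
# Return 0 if the PID stored in the given PID file belongs to a live process.
pidfile_is_running() {
    local pidfile=$1 pid
    [ -r "$pidfile" ] || return 1
    pid=$(<"$pidfile")
    # Refuse anything that is not a plain number (empty or corrupted file)
    [[ $pid =~ ^[0-9]+$ ]] || return 1
    # kill -0 fails if the process is gone, or if we are not allowed to signal it
    kill -0 "$pid" 2>/dev/null
}
```

The check script could then read if ! pidfile_is_running /var/run/fsdaemon.pid; then ... restart ... fi. Two caveats: run it as a user that is allowed to signal the daemon, and remember that a stale PID file can point at an unrelated process that has since reused the number.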
If you don't want to (or can't) create a PID file, I would at least suggest pgrep instead of ps | grep, since pgrep will search directly for a process by name and won't find anything that just happens to include the same string.
daemon_pid=`pgrep -x -c fsdaemon`
if [ $daemon_pid -eq 0 ]; then
echo "restarting fsdaemon"
/etc/init.d/fsdaemon restart
fi
The -x means "match exactly", and -c works as with grep.
By the way, it seems a bit misleading to name your variable daemon_pid when it is actually a count.

Shell script to get the process ID on Linux [duplicate]

I want to write a shell script (.sh file) to get a given process id. What I'm trying to do here is once I get the process ID, I want to kill that process. I'm running on Ubuntu (Linux).
I was able to do it with a command like
ps -aux|grep ruby
kill -9 <pid>
but I'm not sure how to do it through a shell script.
Using grep on the results of ps is a bad idea in a script, since some proportion of the time it will also match the grep process you've just invoked. The command pgrep avoids this problem, so if you need to know the process ID, that's a better option. (Note that, of course, there may be many processes matched.)
However, in your example, you could just use the similar command pkill to kill all matching processes:
pkill ruby
Incidentally, you should be aware that using -9 is overkill (ho ho) in almost every case. There's some useful advice about that in the text of the "Useless Use of kill -9" form letter:
No no no. Don't use kill -9.
It doesn't give the process a chance to cleanly:
shut down socket connections
clean up temp files
inform its children that it is going away
reset its terminal characteristics
and so on and so on and so on.
Generally, send 15, and wait a second or two, and if that doesn't
work, send 2, and if that doesn't work, send 1. If that doesn't,
REMOVE THE BINARY because the program is badly behaved!
Don't use kill -9. Don't bring out the combine harvester just to tidy
up the flower pot.
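The escalation described in the letter could be sketched like this. The function name terminate_gracefully and the default two-second grace period are my own choices; the signal order (TERM, then INT, then HUP, i.e. 15, 2, 1) follows the letter, and the function deliberately stops short of -9:

```shell
#!/bin/bash
# Ask a process to exit, escalating through signals only if it ignores the polite request.
# Usage: terminate_gracefully PID [SECONDS_PER_SIGNAL]
terminate_gracefully() {
    local pid=$1 grace=${2:-2} sig i
    for sig in TERM INT HUP; do
        # kill fails when the process no longer exists (or is not ours to signal)
        kill -s "$sig" "$pid" 2>/dev/null || return 0
        # Poll in 0.1 s steps until the grace period for this signal runs out
        for ((i = 0; i < grace * 10; i++)); do
            kill -0 "$pid" 2>/dev/null || return 0
            sleep 0.1
        done
    done
    return 1   # still alive after TERM, INT and HUP
}
```

A return value of 1 tells the caller that the process ignored all three signals, at which point it is a judgment call whether to reach for the combine harvester.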
If you are going to use ps and grep then you should do it this way:
ps aux|grep r[u]by
Those square brackets will cause grep to skip the line for the grep command itself. So to use this in a script do:
output=`ps aux|grep r\[u\]by`
set -- $output
pid=$2
kill $pid
sleep 2
kill -9 $pid >/dev/null 2>&1
The backticks allow you to capture the output of a command in a shell variable. The set -- parses the ps output into words, and $2 is the second word on the line, which happens to be the pid. Then you send a TERM signal, wait a couple of seconds for ruby to shut itself down, then kill it mercilessly if it still exists, but throw away any output because most of the time kill -9 will complain that the process is already dead.
I know that I have used this without the backslashes before the square brackets but just now I checked it on Ubuntu 12 and it seems to require them. This probably has something to do with bash's many options and the default config on different Linux distros. Hopefully the [ and ] will work anywhere but I no longer have access to the servers where I know that it worked without backslash so I cannot be sure.
One comment suggests grep -v, and that is what I used to do, but when I learned of the [] variant, I decided it was better to spawn one fewer process in the pipeline.
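A likely explanation for the backslash puzzle, offered as a guess about that machine and not a diagnosis: an unquoted r[u]by is a glob pattern, so if the current directory contains a file named ruby, the shell replaces the pattern with ruby before grep ever sees it. The demo below uses a throwaway directory to show the difference:

```shell
#!/bin/bash
# Show how an unquoted r[u]by is rewritten by the shell when a file named "ruby" exists.
demo_dir=$(mktemp -d)
touch "$demo_dir/ruby"
unquoted=$(cd "$demo_dir" && echo r[u]by)     # glob matches the file: ruby
quoted=$(cd "$demo_dir" && echo 'r[u]by')     # quotes keep the pattern: r[u]by
rm -r "$demo_dir"
echo "unquoted: $unquoted"
echo "quoted:   $quoted"
```

Quoting the pattern, as in ps aux | grep 'r[u]by', behaves the same in every directory and makes the backslashes unnecessary.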
For a start, there is no need to do a ps -aux | grep... The command pidof is far better to use. And you should almost never use kill -9.
To get the output from a command in bash, use something like
pid=$(pidof ruby)
or use pkill directly.
The -v option is very important. It can exclude the grep command itself,
e.g.
ps -w | grep sshd | grep -v grep | awk '{print $1}' to get sshd id
This works in Cygwin but it should be effective in Linux as well.
ps -W | awk '/ruby/,NF=1' | xargs kill -f
or
ps -W | awk '$0~z,NF=1' z=ruby | xargs kill -f
Bash Pitfalls
You can use the command killall:
$ killall ruby
It's pretty simple.
Simply run any program like this: gedit & x=$! and x will hold the PID of that process.
Then do this: kill -9 $x
To kill the process in shell
# [s]ervername keeps grep from matching its own command line
getprocess=`ps -ef|grep '[s]ervername'`
#echo $getprocess
set -- $getprocess
pid=$2
#echo $pid
kill -9 $pid
If you already know the process then this will be useful:
PID=`ps -eaf | grep <process> | grep -v grep | awk '{print $2}'`
if [[ "" != "$PID" ]]; then
echo "killing $PID"
kill -9 $PID
fi

Resources