pkill returns 255 in combination with another command via remote ssh - linux

When I try to execute pkill on a remote host in combination with another command, it always returns 255, even though both commands succeed.
Examples
ssh <remoteHost> 'pkill -f xyz' # returns 0 (rightly so when xyz is a process)
ssh <remoteHost> 'source /etc/profile' # returns 0 (rightly so)
But when I run the combination command:
ssh <remoteHost> 'source /etc/profile; pkill -f xyz' # returns 255 - why?
There seems to be something specific to pkill in combination with another command, because the following returns zero even though it is also a combination:
ssh <remoteHost> 'source /etc/profile; ls' # returns 0
Assume that xyz is running at all times when we try to kill it.
I do not understand this behavior. Why does the third command return 255?

The documentation for the pkill -f option says:
-f
The pattern is normally only matched against the process name. When -f is set, the full command line is used.
So pkill -f xyz will kill any process with "xyz" anywhere on its command line.
When you run ssh <remoteHost> 'source /etc/profile; pkill -f xyz', the remote ssh server will run the equivalent of this on your behalf:
$SHELL -c 'source /etc/profile; pkill -f xyz'
The resulting shell instance is a process with "xyz" in its command line. My guess is that pkill is killing it, and ssh is reporting the killed session as exit code 255, like this:
$ ssh localhost 'kill $$'
$ echo $?
255
It doesn't happen when you just run ssh <remoteHost> 'pkill -f xyz', because some shells like bash will optimize for this case. Instead of running pkill as a subprocess, the shell instance will replace itself with the pkill process. So by the time pkill runs, the shell process with "xyz" on its command line is gone.
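You can see the forking behaviour locally too. A rough sketch, assuming bash and that nothing else on the system has "xyz" on its command line:
bash -c 'pkill -f xyz'; echo $?                  # bash execs pkill; nothing matches, so pkill exits 1
bash -c 'pkill -f xyz; echo survived'; echo $?   # bash forks pkill, which kills its parent shell; "survived" never prints and the status is typically 143 (128+SIGTERM)
Over ssh, that killed remote shell is what shows up as 255.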
You can probably work around this by running pkill like this:
ssh <remoteHost> 'source /etc/profile; exec pkill -f xyz'
If that doesn't work, you can specify the pkill pattern in such a way that it doesn't match the pattern itself. For example:
ssh <remoteHost> 'source /etc/profile; exec pkill -f "[x]yz"'
The pattern [x]yz matches the text "xyz", so pkill will kill processes where the text "xyz" appears. But the pattern doesn't match itself, so pkill won't kill processes where the pattern appears.
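If you want to see what a pattern will hit before actually killing anything, pgrep accepts the same matching options as pkill, so a dry run along these lines may help:
ssh <remoteHost> 'source /etc/profile; pgrep -af "[x]yz"'   # lists PID and full command line of each match, without killing anything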

Related

Bash script iterate over PID's and kill items

I am trying to kill all occurrences of a process, but the iteration actually stops after the first item. What is wrong here?
#!/usr/bin/env bash
SUPERVISORCLS=($(pidof supervisorctl))
for i in "${SUPERVISORCLS[@]}"
do
echo $i
exec sudo kill -9 ${i}
done
Before this I tried something like the following as a restart script, but that script was never fully executed either; only one if block was ever executed:
ERROR0=$(sudo supervisord -c /etc/supervisor/supervisord.conf 2>&1)
if [ "$ERROR0" ];then
exec sudo pkill supervisord
exec sudo supervisord -c /etc/supervisor/supervisord.conf
echo restarted supervisord
fi
ERROR1=$(sudo supervisord -c /etc/supervisor/supervisord.conf 2>&1)
if [ "$ERROR1" ];then
exec sudo pkill -9 supervisorctl
exec sudo supervisorctl -c /etc/supervisor/supervisord.conf
echo restarted supervisorctl
fi
exec replaces your process with the executable given as its argument, so no statement after an exec will ever run: your process no longer exists. In the first example your process is no longer your script but kill; in the second it is pkill.
To fix it, just remove exec from all those lines. It is not needed: when executing a script, the shell already runs the commands on every line; you don't have to tell it to do so.
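For example, the first loop with exec removed would look roughly like this (a sketch; the rest of the script is unchanged):
#!/usr/bin/env bash
SUPERVISORCLS=($(pidof supervisorctl))
for i in "${SUPERVISORCLS[@]}"
do
  echo "$i"
  sudo kill -9 "$i"    # no exec, so the loop goes on to the next PID
done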

How to pgrep over ssh, or use pgrep as part of a larger bash command?

I'd like to run pgrep to find the PID of some process. It works great, except when run as part of a larger bash command, because pgrep will also match its parent shell/bash process, which includes the match expression as part of its command line.
pgrep sensibly excludes its own PID from the results, but, less sensibly, doesn't seem to have an option to exclude its parent process(es).
Has anyone come across this and found a good workaround?
Update.
pgrep -lf java || true
works fine, but
bash -c "(pgrep -lf java || true)"
echo 'bash -c "(pgrep -lf java || true)"' | ssh <host>
both also identify the parent bash process.
I'm using pgrep as part of a much larger system, which is why the extra madness.
I ran into this issue using python's os.system(command) which executes the command in a subshell.
pgrep does not match itself, but it does match its parent shell, which includes pgrep's arguments.
I found a solution:
pgrep -f the-arguments-here[^\[]
The trailing [^\[] ensures that the pattern does not match where the text is immediately followed by [ (the first character of the regex itself), and thus excludes the parent shell.
Example:
$ sh -c "pgrep -af the-arguments-here"
12345 actual-process with the-arguments-here
23456 sh -c pgrep -af the-arguments-here
vs:
$ sh -c "pgrep -af the-arguments-here[^\[]"
12345 actual-process with the-arguments-here
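Applied to the ssh case from the question, the same trick would look something like this (a sketch; the exact behaviour depends on how the quoting reaches the remote shell):
bash -c 'pgrep -lf "java[^\[]" || true'
ssh <host> 'pgrep -lf "java[^\[]" || true'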
I'm still not seeing why you need the bash -c part. You should be able to do
ssh <host> pgrep -lf java || true
which would actually run true on the local machine, but you could do
ssh <host> "pgrep -lf java || true"
if you needed true to run on the remote side. Again, this assumes your shell accepts that syntax (i.e., is bash).
You're already running everything on the other side of the ssh in a bash shell, so I don't think you need to explicitly invoke bash again. If your default shell is something else, you may want to consider either changing it or scripting in the appropriate default shell.

Bash: Using SSH to start a long-running remote command and collect its PID

When I do the following, I have to press CTRL-C afterwards or the shell acts weird: the left/right arrow keys, for example, don't move the cursor correctly and the text gets messed up.
# read -r pid < <(ssh 10.10.10.46 'sleep 50 & echo $!') ; echo $pid
2135
# Killed by signal 2.
^C
#
I need this for a script, so I'd like to know why the CTRL-C is needed and whether it is possible to work around it.
Update
It looks like it opens an extra Bash shell, and that is the one that needs to be exited.
The command I am actually interested in is
read -r pid < <(ssh 10.10.10.46 "mbuffer -4 -v 0 -q -I 8023 > /tmp/mtest & echo $!"); echo $pid
Try this instead:
read -r pid \
< <(ssh 10.10.10.46 'nohup mbuffer >/tmp/mtest </dev/null 2>/tmp/mtest.err & echo $!')
Three important changes:
Use of nohup (you could also get a similar effect with the bash built-in disown)
Redirection of stdin (from /dev/null) and of stdout and stderr (to files), preventing the remote process from holding handles that connect, eventually, to your terminal.
Use of single quotes for the remote command (with double-quotes, expansions happen before ssh is started, so the $! you get is the PID of the most recently started local background process).
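The quoting point is easy to demonstrate on its own (a rough sketch; sleep 100 just stands in for any local background job, and 10.10.10.46 is the remote host from the question):
sleep 100 &                    # start a local background job so $! has a value
ssh 10.10.10.46 "echo $!"      # double quotes: $! expands locally and prints the PID of the local sleep
ssh 10.10.10.46 'echo $!'      # single quotes: $! expands remotely (empty here, since that shell has no background job yet)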

How to run a pkill when invoking a shell to execute a string of commands?

To automate a system administration task, I wrote down the following line of shell code:
bash -c 'pkill -TERM -f java; true'
The problem is that pkill kills the bash immediately after the pkill command executes, and therefore subsequent commands do not have a chance to execute.
Apart from splitting them into two lines:
bash -c 'pkill -TERM -f java'
bash -c 'true'
Is there any other workaround?
If you want to kill all java processes, simply drop the -f:
bash -c 'pkill -TERM java; true'
If you really also want to kill non-java processes like mplayer "jungle_gremlins_of_java.avi", the typical "solution" is to rewrite the command so that the pattern doesn't match itself:
bash -c 'pkill -TERM -f "[j]ava"; true'
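As a quick sanity check before killing anything, pgrep accepts the same pattern syntax, so something like the following (a minimal sketch) lists what would receive the signal:
bash -c 'pgrep -a -f "[j]ava"; true'   # prints PID and full command line of every process the pkill above would hit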

How do I know if a bash script is running with nohup?

I have a script that processes records in some files; it usually takes 1-2 hours. While it's running, it prints its progress as the number of records processed.
Now, what I want to do is: when it's running under nohup, I don't want it to print the progress; it should print progress only when it is run manually.
My question is how do I know if a bash script is running with nohup?
Suppose the command is nohup myscript.sh &. In the script, how do I tell that it was started with nohup? I tried to use $0, but it gives myscript.sh.
Checking for file redirections is not robust, since nohup can be (and often is) used in scripts where stdin, stdout and/or stderr are already explicitly redirected.
Aside from these redirections, the only thing nohup does is ignore the SIGHUP signal (thanks to Blrfl for the link.)
So, really, what we're asking for is a way to detect whether SIGHUP is being ignored. On Linux, the signal ignore mask is exposed in /proc/$PID/status, in the least significant bit of the SigIgn hex string.
Provided we know the pid of the bash script we want to check, we can use egrep. Here I see if the current shell is ignoring SIGHUP (i.e. is "nohuppy"):
$ egrep -q "SigIgn:\s.{15}[13579bdf]" /proc/$$/status && echo nohuppy || echo normal
normal
$ nohup bash -c 'egrep -q "SigIgn:\s.{15}[13579bdf]" /proc/$$/status && echo nohuppy || echo normal'; cat nohup.out
nohup: ignoring input and appending output to `nohup.out'
nohuppy
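Applied back to the original question, you could wrap that check in a small helper and skip the progress output when SIGHUP is ignored. A sketch, assuming Linux's /proc and bash; is_nohup, processed and total are made-up names, not anything from the script above:
is_nohup() {
  # hypothetical helper: SIGHUP is signal 1, i.e. bit 0 of SigIgn, so the last hex digit of the mask must be odd
  egrep -q "SigIgn:\s.{15}[13579bdf]" "/proc/$$/status"
}
if ! is_nohup; then
  echo "processed $processed of $total records"   # $processed/$total stand in for the script's own counters
fi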
You could check if STDOUT is associated with a terminal:
[ -t 1 ]
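A quick check along the lines of the earlier demonstration (nohup redirects stdout to nohup.out, so the test fails under it):
[ -t 1 ] && echo terminal || echo "not a terminal"
nohup bash -c '[ -t 1 ] && echo terminal || echo "not a terminal"'; cat nohup.out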
You can either check if the parent pid is 1:
if [ $PPID -eq 1 ] ; then
echo "Parent pid=1 (runing via nohup)"
else
echo "Parent pid<>1 (NOT running via nohup)"
fi
or if your script ignores the SIGHUP signal (see https://stackoverflow.com/a/35638712/1011025):
if egrep -q "SigIgn:\s.{15}[13579bdf]" /proc/$$/status ; then
echo "Ignores SIGHUP (runing via nohup)"
else
echo "Doesn't ignore SIGHUP (NOT running via nohup)"
fi
One way, though not really portable, would be to do a readlink on /proc/$$/fd/1 and test whether it ends with nohup.out.
Assuming you are on the pts0 terminal (not really relevant, just to be able to show the result):
#!/bin/bash
if [[ $(readlink /proc/$$/fd/1) =~ nohup.out$ ]]; then
echo "Running under hup" >> /dev/pts/0
fi
But the traditional approach to such problems is to test if the output is a terminal:
[ -t 1 ]
Thank you guys. Checking STDOUT is a good idea. I just found another way to do it, which is to test the tty.
Run tty -s and check its return code: if it is 0, the script is running on a terminal; if it is 1, it is running with nohup.
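A quick demonstration of that return code, in the same style as above (this relies on nohup detaching stdin from the terminal, which recent GNU nohup does):
tty -s; echo $?                                   # 0 when run from a terminal
nohup bash -c 'tty -s; echo $?'; cat nohup.out    # 1 under nohup, since stdin is no longer a terminal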
