Why does "pgrep -f bash" emit two numbers instead of one? - linux

When I run this script in shell:
printf "Current bash PID is `pgrep -f bash`\n"
using this command:
$ bash script.sh
I get back this output:
Current bash PID is 5430
24390
Every time I run it, I get a different number:
Current bash PID is 5430
24415
Where is the second line coming from?

When you use backticks (or the more modern $(...) syntax for command substitution), you create a subshell. That's a fork()ed-off, independent copy of the shell process which has its own PID, so pgrep finds two separate copies of the shell. (Moreover, pgrep may also find copies of bash running elsewhere on the system that are completely unrelated to the script at hand.)
If you want to find the PID of the current copy of bash, you can just look it up directly (printf is better practice than echo when contents can contain backslashes or if the behavior of echo -n or the nonstandard bash extension echo -e is needed, but neither of those things is the case here, so echo is fine):
echo "Current bash PID is $$"
Note that even when executed in a subshell, $$ expands to the PID of the parent shell. With bash 4.0 or newer, you can use $BASHPID to look up the current PID even in a subshell.
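To see the difference between the two variables, here is a minimal sketch, assuming bash 4.0 or newer; the variable names are only illustrative:

```shell
#!/usr/bin/env bash
# $$ always names the main shell; $BASHPID names whichever process expands it.
main_pid=$$
dollar_in_sub=$(echo "$$")          # command substitution: still the main shell's PID
bashpid_in_sub=$(echo "$BASHPID")   # PID of the forked subshell itself

echo "main shell:              $main_pid"
echo "\$\$ in a subshell:        $dollar_in_sub"
echo "\$BASHPID in a subshell:   $bashpid_in_sub"
```

The first two numbers should agree, while the third should differ, because the command substitution runs in its own process.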
See the related question Bash - Two processes for one script

Related

How to pgrep over ssh, or use pgrep as a larger bash command?

I'd like to run pgrep to find the ID of some process. It works great, except when run as part of a larger bash command, because pgrep will also match its parent shell/bash process, which includes the match expression as part of its command line.
pgrep sensibly excludes its own PID from the results but, less sensibly, doesn't seem to have an option to exclude its parent process(es).
Has anyone come across this and found a good workaround?
Update.
pgrep -lf java || true
works fine, but
bash -c "(pgrep -lf java || true)"
echo 'bash -c "(pgrep -lf java || true)"' | ssh <host>
also identify the parent bash process.
I'm using pgrep as part of a much larger system, which is why the extra madness.
I ran into this issue using python's os.system(command) which executes the command in a subshell.
pgrep does not match itself, but it does match its parent shell, whose command line includes pgrep's arguments.
I found a solution:
pgrep -f the-arguments-here[^\[]
The [^\[] regex assures that it does not match the [ (the beginning of the regex itself) and thus excludes the parent shell.
Example:
$ sh -c "pgrep -af the-arguments-here"
12345 actual-process with the-arguments-here
23456 sh -c pgrep -af the-arguments-here
vs:
$ sh -c "pgrep -af the-arguments-here[^\[]"
12345 actual-process with the-arguments-here
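Why the trailing bracket expression works can be checked without touching any live processes. The following is only a sketch: it feeds two made-up command lines, modelled on the example output, to grep -E instead of pgrep, on the assumption that both handle this pattern the same way. Note that the pattern also requires at least one character after the search text, so the sample target has an extra (hypothetical) --flag argument.

```shell
#!/bin/sh
# The regex as the matcher receives it once the shell has removed the backslash.
pattern='the-arguments-here[^[]'

# Made-up command lines modelled on the example output.
target='actual-process with the-arguments-here --flag'
parent='sh -c pgrep -af the-arguments-here[^\[]'

matches() {
    # Succeeds if the command line in $1 matches the pattern.
    printf '%s\n' "$1" | grep -Eq -- "$pattern"
}

matches "$target" && echo "target: matched"
matches "$parent" || echo "parent shell: not matched"
```

In the parent's command line the search text is followed directly by [, which is exactly the character the bracket expression refuses to match.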
I'm still not seeing why you need the bash -c part. You should be able to do
ssh <host> pgrep -lf java || true
which would actually run true on the local machine, but you could do
ssh <host> "pgrep -lf java || true"
if you needed true to be on the remote side. Again, assuming your shell accepts that syntax (i.e., is bash)
You're already running everything on the other side of the ssh connection in a bash shell, so I don't think you need to explicitly invoke bash again. If your default shell on that host is something else, you may want to consider either changing it or scripting in that default shell.

Linux: start a script after another has finished

I read an answer to this problem on Stack Overflow, but I am so new to writing shell scripts that I did something wrong. The following are my scripts:
testscript:
#!/bin/csh -f
pid=$(ps -opid= -C csh testscript1)
while [ -d /proc/$pid ] ; do
sleep 1
done && csh testscript2
exit
testscript1:
#!/bin/csh -f
/usr/bin/firefox
exit
testscript2:
#!/bin/csh -f
echo Done
exit
The purpose is for testscript to call testscript1 first; once testscript1 has finished (which means the firefox started by testscript1 has been closed), testscript should call testscript2. However, I got this result after running testscript:
$ csh testscript
Illegal variable name.
Please help me with this issue. Thanks ahead.
I believe this line is not CSH:
pid=$(ps -opid= -C csh testscript1)
In general in csh you define variables like this:
set pid=...
I am not sure what the $() syntax is; perhaps backticks would work as a replacement:
set pid=`ps -opid= -C csh testscript1`
Perhaps you didn't notice that the scripts you found were written for bash, not csh, but
you're trying to process them with the csh interpreter.
It looks like you've misunderstood what the original code was trying to do -- it was
intended to monitor an already-existing process, by looking up its process id using the process name.
You seem to be trying to start the first process from inside the ps command. But
in that case, there's no need for you to do anything so complicated -- all you need
is:
#!/bin/csh
csh testscript1
csh testscript2
Unless you go out of your way to run one of the scripts in the background,
the second script will not run until the first script is finished.
Although this has nothing to do with your problem, csh is more oriented toward
interactive use; for script writing, it's considered a poor choice, so you might be
better off learning bash instead.
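For the case the original bash snippet was apparently written for, waiting on a process that is already running, a minimal sketch in POSIX sh follows. It assumes a Linux-style /proc filesystem, and the function name wait_for_exit is made up for illustration:

```shell
#!/bin/sh
# Block until the process with PID $1 has gone away (needs Linux /proc).
wait_for_exit() {
    while [ -d "/proc/$1" ]; do
        sleep 1
    done
}

# Hypothetical usage: look the PID up by name, wait, then run the next script.
# pid=$(pgrep -o -x firefox) && wait_for_exit "$pid" && sh testscript2
```

Polling like this is only needed when the first process was started elsewhere; when the script starts it itself, running the two commands one after the other is enough.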
Try the script below. It checks for testscript1's PID and, if it is not found, executes testscript2:
sp=$(ps -ef | grep testscript1 | grep -v grep | awk '{print $2}')
/bin/ls -l /proc/ | grep $sp > /dev/null 2>&1 && sleep 0 || /bin/csh testscript2

How do I know if a bash script is running with nohup?

I have a script to process records in some files, it usually takes 1-2 hours. When it's running, it prints a progress of number of records processed.
Now, what I want to do is: when it's running with nohup, I don't want it to print the progress; it should print progress only when it run manually.
My question is how do I know if a bash script is running with nohup?
Suppose the command is nohup myscript.sh &. In the script, how do I get the nohup from command line? I tried to use $0, but it gives myscript.sh.
Checking for file redirections is not robust, since nohup can be (and often is) used in scripts where stdin, stdout and/or stderr are already explicitly redirected.
Aside from these redirections, the only thing nohup does is ignore the SIGHUP signal (thanks to Blrfl for the link.)
So what we're really asking for is a way to detect whether SIGHUP is being ignored. On Linux, the signal ignore mask is exposed in /proc/$PID/status as the SigIgn hex string; SIGHUP is signal 1, so it corresponds to the least-significant bit.
Provided we know the pid of the bash script we want to check, we can use egrep. Here I see if the current shell is ignoring SIGHUP (i.e. is "nohuppy"):
$ egrep -q "SigIgn:\s.{15}[13579bdf]" /proc/$$/status && echo nohuppy || echo normal
normal
$ nohup bash -c 'egrep -q "SigIgn:\s.{15}[13579bdf]" /proc/$$/status && echo nohuppy || echo normal'; cat nohup.out
nohup: ignoring input and appending output to `nohup.out'
nohuppy
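The same test can be written without a regular expression by looking only at the last hex digit of the mask. This is a sketch rather than a drop-in tool: the function names are invented here, and it assumes the Linux /proc/PID/status layout described above.

```shell
#!/bin/sh
# Succeed if the hex mask in $1 has bit 0 set, i.e. SIGHUP (signal 1) is ignored.
mask_ignores_sighup() {
    case $1 in
        *[13579bdfBDF]) return 0 ;;   # odd last digit: lowest bit is set
        *) return 1 ;;
    esac
}

# Succeed if the process with PID $1 currently ignores SIGHUP.
pid_ignores_sighup() {
    mask=$(sed -n 's/^SigIgn:[[:space:]]*//p' "/proc/$1/status" 2>/dev/null)
    [ -n "$mask" ] && mask_ignores_sighup "$mask"
}

# Inside a script: report what was found for the current shell.
if pid_ignores_sighup "$$"; then
    echo "SIGHUP ignored (probably started via nohup)"
else
    echo "SIGHUP not ignored, or the mask could not be read"
fi
```

Keeping the mask check separate from the /proc lookup makes it possible to try the logic on literal masks such as 0000000000000001.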
You could check if STDOUT is associated with a terminal:
[ -t 1 ]
You can either check if the parent pid is 1:
if [ $PPID -eq 1 ] ; then
echo "Parent pid=1 (running via nohup)"
else
echo "Parent pid<>1 (NOT running via nohup)"
fi
or if your script ignores the SIGHUP signal (see https://stackoverflow.com/a/35638712/1011025):
if egrep -q "SigIgn:\s.{15}[13579bdf]" /proc/$$/status ; then
echo "Ignores SIGHUP (running via nohup)"
else
echo "Doesn't ignore SIGHUP (NOT running via nohup)"
fi
One way, but not really portable would be to do a readlink on /proc/$$/fd/1 and test if it ends with nohup.out.
Assuming you are on the pts0 terminal (not really relevant, just to be able to show the result):
#!/bin/bash
if [[ $(readlink /proc/$$/fd/1) =~ nohup.out$ ]]; then
echo "Running under nohup" >> /dev/pts/0
fi
But the traditional approach to such problems is to test if the output is a terminal:
[ -t 1 ]
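Applied to the original question, that test could gate the progress messages. A minimal sketch, with a made-up helper name (progress) and made-up message text:

```shell
#!/bin/sh
# Print a progress message only when stdout is a terminal.
progress() {
    if [ -t 1 ]; then
        printf '%s\n' "$*"
    fi
}

# Shown when run by hand; silent when stdout goes to nohup.out, a file, or a pipe.
progress "processed 1000 records"
```

This keys off where the output goes rather than off nohup itself, so it also stays quiet when the script is run from cron or with its output redirected.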
Thank you, everyone. Checking STDOUT is a good idea. I just found another way to do it, which is to test the tty.
Run tty -s and check its return code. If it's 0, the script is running on a terminal; if it's 1, it's running with nohup.

Why are commands executed in backquotes giving me different results when run in a script?

I have a script that I mean to be run from cron that ensures that a daemon that I wrote is working. The contents of the script file are similar to the following:
daemon_pid=`ps -A | grep -c fsdaemon`
echo "daemon_pid: " $daemon_pid
if [ $daemon_pid -eq 0 ]; then
echo "restarting fsdaemon"
/etc/init.d/fsdaemon start
fi
When I execute this script from the command prompt, the line that echoes the value of $daemon_pid is reporting a value of 2. This value is two regardless of whether my daemon is running or not. If, however, I execute the command with back quotes and then examine the $daemon_pid variable, the value of $daemon_pid is now one. I have also tried single stepping through the script using bashdb and, when I examine the variables using that tool, they are what they should be.
My question therefore is: why is there a difference in the behaviour between when the script is executed by the shell versus when the commands in the script are executed manually? I'm sure that there is something very fundamental that I am missing.
You're very likely encountering the grep as part of the 'answer' from ps.
To help fully understand what is happening, turn off the -c option, to see what data is being returned from just ps -A | grep fsdaemon.
To solve the issue, some systems have a p(rocess)grep (pgrep). That will work, OR
ps -A | grep -v grep | grep -c fsdaemon
Is a common idiom you will see, but at the expense of another process.
The cleanest solution is,
ps -A | grep -c '[f]sdaemon'
The regular expression syntax should work with all greps, on all systems.
I hope this helps.
The problem is that grep itself shows up... Try running this command with anything after grep -c:
eple:~ erik$ ps -a | grep -c asdfladsf
1
eple:~ erik$ ps -a | grep -c gooblygoolbygookeydookey
1
eple:~ erik$
What does ps -a | grep fsdaemon return? Just look at the processes actually listed... :)
Since this is Linux, why not try pgrep? This saves you a pipe, and you don't end up with grep reporting back the daemon script itself running.
Any process with arguments including that name will add to the count: grep, and your script.
Using ps to look for a process isn't really reliable; you should use a lock file.
As several people have pointed out already, your process count is inflated because ps | grep detects (1) the script itself and (2) the subprocess created by the backquotes, which inherits the name of the main script. So an easy solution is to change the name of the script to something that doesn't include the name you're looking for. But you can do better.
The "best-practice" solution that I would suggest is to use the facilities provided by your operating system. It's not uncommon for an init script to create a PID file as part of the process of starting your daemon; in other words, instead of just running the daemon itself, you use a wrapper script that starts the daemon and then writes the process ID to a file somewhere. If start-stop-daemon exists on your system (and I think it's fairly common these days), you can use that like so:
start-stop-daemon --start --quiet --background \
--make-pidfile --pidfile /var/run/fsdaemon.pid -- /usr/bin/fsdaemon
(obviously replace the path /usr/bin/fsdaemon as appropriate) to start it, and then
start-stop-daemon --stop --quiet --pidfile /var/run/fsdaemon.pid
to stop it. start-stop-daemon has other options that might be useful to you, which you can investigate by reading the man page.
If you don't have access to start-stop-daemon, you can write a wrapper script to do basically the same thing, something like this to start:
echo "$$" > /var/run/fsdaemon.pid
exec /usr/bin/fsdaemon
and this to stop:
kill $(< /var/run/fsdaemon.pid)
rm /var/run/fsdaemon.pid
(this is pretty crude, of course, but it should normally work).
Anyway, once you have the setup to generate a PID file, whether by using start-stop-daemon or not, you can update your check script to this:
daemon_pid=`ps --no-headers --pid $(< /var/run/fsdaemon.pid) | wc -l`
if [ $daemon_pid -eq 0 ]; then
echo "restarting fsdaemon"
/etc/init.d/fsdaemon restart
fi
(one would think there would be a concise command to check whether a given PID is running, but I don't know it).
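One common idiom for that check is kill -0, which sends no signal and only reports whether the PID could be signalled. A sketch, assuming the PID file layout used above; the function name is invented, and note that kill -0 also fails for a live process that you lack permission to signal:

```shell
#!/bin/sh
# Succeed if the PID file $1 names a process that exists and that we may signal.
pidfile_alive() {
    [ -r "$1" ] || return 1
    pid=$(cat "$1")
    case $pid in
        ''|*[!0-9]*) return 1 ;;   # empty or non-numeric content
    esac
    kill -0 "$pid" 2>/dev/null
}

# Hypothetical use in the check script:
# pidfile_alive /var/run/fsdaemon.pid || /etc/init.d/fsdaemon restart
```

A stale PID file can still point at an unrelated process that happened to reuse the number, so this is a liveness hint rather than proof that the daemon is healthy.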
If you don't want to (or can't) create a PID file, I would at least suggest pgrep instead of ps | grep, since pgrep will search directly for a process by name and won't find anything that just happens to include the same string.
daemon_pid=`pgrep -x -c fsdaemon`
if [ $daemon_pid -eq 0 ]; then
echo "restarting fsdaemon"
/etc/init.d/fsdaemon restart
fi
The -x means "match exactly", and -c works as with grep.
By the way, it seems a bit misleading to name your variable daemon_pid when it is actually a count.

Will () construct always start a subshell?

Current shell is
$ echo $$
23173
Note the parent of ps is current shell
$ ( ps -o pid,ppid,cmd )
PID PPID CMD
8952 23173 ps -o pid,ppid,cmd
23173 23169 bash
But here, the parent of ps is the subshell (bash)
$ ( echo hello ; ps -o pid,ppid,cmd )
hello
PID PPID CMD
8953 23173 bash
8954 8953 ps -o pid,ppid,cmd
23173 23169 bash
Is bash doing optimizations? How come an extra echo made the difference and spawned a subshell in the 3rd case?
Yes, what you're seeing is an optimization. Technically, the (…) construct always starts a subshell, by definition. Most of the time, the subshell runs in a separate subprocess. This ensures that everything done in the subshell stays in the subshell. If bash can guarantee this isolation property, it's free to use any implementation technique it likes.
In the fragment ( ps -o pid,ppid,cmd ), it's obvious that nothing can influence the parent shell, so there's an optimization in bash that makes it not fork a separate process for the subshell. The fragment ( echo hello ; ps -o pid,ppid,cmd ) is too complex for the optimizer to recognize that no subshell is needed.
If you experiment with ksh, you'll notice that its optimizer is more aggressive. For example, it doesn't fork a subprocess for ( echo hello ; ps -o pid,ppid,cmd ) either.
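The isolation property itself is easy to observe, whatever the shell does internally. A small sketch with illustrative variable names:

```shell
#!/bin/sh
x=outer
start_dir=$PWD

# Assignments and cd inside ( ... ) must not leak into the calling shell.
( x=inner; cd / )

echo "x is still: $x"
[ "$PWD" = "$start_dir" ] && echo "working directory unchanged"
```

Whether or not a separate process was forked for the parentheses, the calling shell must end up with the same x and the same working directory.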
A subshell consisting of a single simple command instead of a list or pipeline of more than one command could be implemented by simply "execing" the command, i.e. replacing the subshell with the process for the command called. If the subshell is more complex, a simple exec is not possible and the subshell must stay around to manage the command sequence.
From your diagnostics it's impossible to tell the difference between a bash optimization where a subshell consisting of a simple command is optimized to a "direct" fork and exec of the called command or a fork of a subshell followed by an exec of the command called. This isn't surprising as the difference is (almost?) entirely academic.

Resources