What does ps actually return? (Different value depending on how it is called)

I have a script containing this snippet:
#!/bin/bash
set +e
if [ -O "myprog.pid" ]; then
    PID=`/bin/cat myprog.pid`
    if /bin/ps -p ${PID}; then
        echo "Already running" >> myprog.log
        exit 0
    else
        echo "Old pidfile found" >> myprog.log
    fi
else
    echo "No pidfile found" >> myprog.log
fi
echo $$ > myprog.pid
This file is called by a watchdog script, callmyprog, which looks like this:
#!/bin/bash
myprog &
The problem seems to lie with if /bin/ps -p ${PID}. It manifests itself like this: if I manually run myprog while it is already running, I get the message "Already running", as I should. The same thing happens when I manually run the script callmyprog. But when the watchdog runs it, I instead get "Old pidfile found".
I have checked the output from ps and in all cases it finds the process. When I'm calling myprog manually - either directly or through callmyprog, I get the return code 0, but when the watchdog calls it I get the return code 1. I have added debug printouts to the above snippets to print basically everything, but I really cannot see what the problem is. In all cases it looks something like this in the log when the ps command is run from the script:
$ ps -p 1
  PID TTY          TIME CMD
    1 ?        01:06:36 systemd
The only difference is that the return value is different. I checked the exit code with code like this:
/bin/ps -p ${PID}
echo $? >> myprog.log
What could possibly be the cause here? Why does the return code vary depending on how I call the script? I tried to download the source code for ps, but it was too complicated for me to understand.
I was able to "solve" the problem with an ugly hack: I piped ps -p $PID | wc -l and checked that the number of lines was at least 2. But that feels like an ugly hack, and I really want to understand what the problem is here.
Answer to comment below:
The original script contains absolute paths so it's not a directory problem. There is no alias for ps. which ps yields /bin/ps. The scripts are run as root, so I cannot see how it can be a permission problem.
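As an alternative to counting output lines, the exit status of ps can be tested directly while its output is discarded. A minimal sketch (not the original script; is_running is a made-up helper name):

```shell
#!/bin/bash
# Sketch: test liveness via the exit status of ps and discard its output,
# instead of piping through wc -l.
is_running() {
    ps -p "$1" > /dev/null 2>&1   # exit 0 iff a process with this PID exists
}

if is_running "$$"; then
    echo "process $$ is alive"
fi
```

kill -0 "$PID" 2>/dev/null behaves similarly: it sends no signal, only asks the kernel whether the PID exists, though it also fails for processes you are not permitted to signal.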

Related

snmpd on Beaglebone/Debian, reading file with source

I have installed snmpd on a Beaglebone Black with Debian and everything works perfectly so far, except one thing.
I have configured snmpd.conf for a pass-through
pass .1.3.6.1.4.1.45919 /bin/sh /usr/local/bin/snmp-20
Then snmp-20 is a bash script that looks like this
#!/bin/bash
if [ "$1" = "-n" ]
then
    exit 0
fi

. /root/snmp.cfg

# sSerialNumber
if [ "$2" = ".1.3.6.1.4.1.45919.1.120.5" ]
then
    echo 1.3.6.1.4.1.45919.1.120.5
    echo string
    echo $serial
    exit 0
fi
In snmp.cfg it looks like this
serial=12345
I feel this is all pretty straight forward. Now when I run /bin/sh /usr/local/bin/snmp-20 or just /usr/local/bin/snmp-20 I get the expected output.
When I do the snmpget -c public -v2c localhost 1.3.6.1.4.1.45919.1.120.5 it returns with "No such instance currently exists ..."
However, when I comment out the . /root/snmp.cfg line, the snmpget call is successful, so the calling parameters all work correctly.
It seems the script exits when the source /root/snmp.cfg command is called, but only when called by snmpget, not when called from the prompt.
Any idea would be appreciated.
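One plausible cause: snmpd typically runs the pass script as a non-root user, which cannot read files under /root, so the . (source) command aborts the script when invoked via snmpget but not from a root prompt. A hedged sketch that makes the failure visible instead of silent (load_cfg and the error message are my own names, not part of the original script):

```shell
#!/bin/bash
# Sketch: guard the sourced config so an unreadable file is reported
# rather than silently killing the pass script.
load_cfg() {
    if [ -r "$1" ]; then
        . "$1"                             # source the config into this shell
    else
        echo "snmp-20: cannot read $1" >&2
        return 1
    fi
}

load_cfg /root/snmp.cfg || echo "config not loaded" >&2
```

If the permission theory holds, moving snmp.cfg somewhere readable by the snmpd user (for example under /etc/snmp) would be the usual fix.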

Getting "No such file or directory" even after using > /dev/null 2>&1 at the end of the command

I have written a script that checks the PID of a JVM. If no PID exists for that specific JVM, the script gives exit status 2 and terminates. To check the PID, I cat a specific file which contains it. When there is no PID, or no file containing a PID, it throws "No such file or directory" even after I redirect the output using > /dev/null 2>&1 at the end of the command.
I don't want the output displayed on the screen. If someone could help me on that would be really helpful.
EDIT this is the code snippet:
ONLINE=$(grep -ir "$HOSTNAME-"$VAR" is currently online" /home/logs/$GF_OWNER/$HOSTNAME-"$VAR"/$HOSTNAME-"$VAR".log | wc -l)
OFFLINE=0
PID=$(cat /home/logs/$GF_OWNER/$HOSTNAME-"$VAR"/gemstart.pid)
I think you should test whether your file exists before using cat/ls/whatever on it. For that, use the if/test structure:
if [ -e "$PATH_TO_THE_FILE" ]; then
    # processing here
fi
This way you won't have unnecessary output.
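Putting the two fixes together, command substitution plus an existence check, a sketch might look like this; read_pid is a made-up helper name and the pidfile path is shortened from the question:

```shell
#!/bin/bash
# Sketch: read a PID from a file, or fail with status 2 if the file is
# missing, without printing "No such file or directory".
read_pid() {
    if [ -f "$1" ]; then
        cat "$1"
    else
        return 2                  # matches the asker's chosen exit status
    fi
}

PID=$(read_pid gemstart.pid) || echo "no pidfile" >&2
```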

Symbols in startup.sh of Apache Tomcat

I've been trying to make my own 'Daemon' java thread.
I couldn't quite get what I wanted, so I got curious about how Tomcat stays alive even
after I disconnect the ssh connection.
So I decided to poke around the Tomcat source files, to see if I could find 'the magic'.
In startup.sh there are some weird-looking things that I tried to look up on the Internet, without luck.
in startup.sh
# resolve links - $0 may be a softlink
PRG="$0"
while [ -h "$PRG" ] ; do
    ls=`ls -ld "$PRG"`
    link=`expr "$ls" : '.*-> \(.*\)$'`
    if expr "$link" : '/.*' > /dev/null; then
        PRG="$link"
    else
        PRG=`dirname "$PRG"`/"$link"
    fi
done

PRGDIR=`dirname "$PRG"`
EXECUTABLE=catalina.sh

# Check that target executable exists
if $os400; then
    # -x will only work on the os400 if the files are:
    # 1. owned by the user
    # 2. owned by the PRIMARY group of the user
    # this will not work if the user belongs in secondary groups
    eval
else
    if [ ! -x "$PRGDIR"/"$EXECUTABLE" ]; then
        echo "Cannot find $PRGDIR/$EXECUTABLE"
        echo "The file is absent or does not have execute permission"
        echo "This file is needed to run this program"
        exit 1
    fi
fi

exec "$PRGDIR"/"$EXECUTABLE" start "$@"
What is '$0'?
What is '$#'?
What do they do?
EDIT
Perhaps this really doesn't have much to do with the OQ but I just wanted to share what I've found.
After analysing the source code of Apache Tomcat, I figured it out. I'm not sure if this is
how Tomcat actually runs.
What I wanted was something like a daemon process.
First you need a launcher written in java. From within the Launcher, make a process and exec("java yourDaemonToBe");
Hope this helps.
$0 is the name of the shell script as it was invoked. $# is the number of command-line arguments, and "$@" expands to the arguments themselves; the exec line is meant to pass them all along, and in the official startup.sh it reads exec "$PRGDIR"/"$EXECUTABLE" start "$@".
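A small demo makes the difference between $0, $#, and "$@" concrete; here set -- simulates two command-line arguments:

```shell
#!/bin/bash
# Demo: $0 is the script name, $# the argument count,
# "$@" the arguments themselves, with word boundaries preserved.
set -- "first arg" second        # simulate two command-line arguments
echo "script: $0"
echo "count:  $#"
for arg in "$@"; do
    echo "arg:    $arg"
done
```

Note that "first arg" stays one argument in the loop; that is why "$@" (quoted) is preferred over $* for forwarding arguments.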

Grab output from a script that was run within another

I know the title is a bit confusing but here's my situation
#!/bin/bash
for node in `cat path/to/node.list`
do
    echo "Checking node: $node"
    ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no me@$node "nohup scriptXYZ.sh"
done
Basically scriptXYZ has an output, something like Node bla is Up or Node bla is Down. I want to do something that amounts to the following pseudocode:
if (output_from_scriptXYZ == "Node bla is Up")
    do this
else
    do that
I've been trying to find a way to do this online but I couldn't find something that does this. Then again, I might not know what I'm searching for.
Also as a bonus: Is there any way to tell if the scriptXYZ has an error while it ran? Something like a "file does not exist" -- not the script itself but something the script tried to do and thus resulting in an error.
First, is it possible to have scriptXYZ.sh exit with 0 if the node is up, and non-zero otherwise? Then you can simply do the following, instead of capturing the output. The script's standard output and standard error will be connected to your local terminal, so you will see them just as if you had run it locally.
#!/bin/bash
while read -r node; do
    echo "Checking node: $node"
    if ssh -o UserKnownHostsFile=/dev/null \
           -o StrictHostKeyChecking=no me@$node scriptXYZ.sh; then
        do something
    else
        do something else
    fi
done < path/to/node.list
It doesn't really make sense to run your script with nohup, since you need to stay connected to have the results sent back to the local host.
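If you do need the text of the output rather than an exit status, you can capture it with command substitution and branch on it. This sketch isolates the pseudocode's if/else in a function (classify is a made-up name); in the real loop you would feed it the captured ssh output, e.g. output=$(ssh ... "me@$node" scriptXYZ.sh 2>&1); classify "$output". The 2>&1 also captures remote error messages such as "file does not exist", which answers the bonus question:

```shell
#!/bin/bash
# Sketch: branch on the text a remote script printed.
classify() {
    case "$1" in
        *"is Up"*) echo "do this" ;;   # e.g. "Node bla is Up"
        *)         echo "do that" ;;   # "Node bla is Down", or an error
    esac
}

classify "Node bla is Up"    # prints "do this"
```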

Redirecting Output of Bash Child Scripts

I have a basic script that outputs various status messages. e.g.
~$ ./myscript.sh
0 of 100
1 of 100
2 of 100
...
I wanted to wrap this in a parent script, in order to run a sequence of child-scripts and send an email upon overall completion, e.g. topscript.sh
#!/bin/bash
START=$(date +%s)
/usr/local/bin/myscript.sh
/usr/local/bin/otherscript.sh
/usr/local/bin/anotherscript.sh
RET=$?
END=$(date +%s)
echo -e "Subject:Task Complete\nBegan on $START and finished at $END and exited with status $RET.\n" | sendmail -v group@mydomain.com
I'm running this like:
~$ topscript.sh >/var/log/topscript.log 2>&1
However, when I run tail -f /var/log/topscript.log to inspect the log I see nothing, even though running top shows myscript.sh is currently being executed, and therefore, presumably outputting status messages.
Why isn't the stdout/stderr from the child scripts being captured in the parent's log? How do I fix this?
EDIT: I'm also running these on a remote machine, connected via ssh using pseudo-tty allocation, e.g. ssh -t user@host. Could the pseudo-tty be interfering?
I just tried the following: I have three files t1.sh, t2.sh, and t3.sh, all with the following content:
#!/bin/bash
for ((i=0; i<10; i++)); do
    echo $i of 9
    sleep 1
done
And a script called myscript.sh with the following content:
#!/bin/bash
./t1.sh
./t2.sh
./t3.sh
echo "All Done"
When I run ./myscript.sh > topscript.log 2>&1 and then in another terminal run tail -f topscript.log I see the lines being output just fine in the log file.
Perhaps the things being run in your subscripts use a large output buffer? I know that when I've run Python scripts before, they have a pretty big output buffer, so you don't see any output for a while. Do you actually see the entire output in the email that gets sent at the end of topscript.sh? Is it just that you don't see the output while the processes run?
try
unbuffer topscript.sh >/var/log/topscript.log 2>&1
Note that unbuffer is not always available as a standard binary on older Unix platforms and may require installing a package (it is shipped with expect) to provide it.
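If unbuffer is unavailable, GNU coreutils' stdbuf can force line buffering instead. This is a sketch assuming the children use stdio buffering; in the question's setup the invocation would be stdbuf -oL -eL topscript.sh > /var/log/topscript.log 2>&1:

```shell
# Hypothetical alternative to unbuffer: -oL/-eL line-buffer stdout/stderr,
# so each progress line reaches the log as soon as it is printed.
stdbuf -oL -eL sh -c 'echo 0 of 100; echo 1 of 100'
```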
I hope this helps.