Redirect all output in a bash script when using set -x, capture pid and all output - linux

I'm modifying an old script, and for some reason it uses a subshell; I'm not sure if the subshell is what's tripping me up. What I really want is to start a service and capture all of STDOUT and STDERR to a file, as well as its PID. I also want some debug information in the log file. Consider the script below (startFoo.sh):
#!/bin/bash
VARIABLE=$(something_dynamic)
echo "Some output"
(
# Enable debugging
set -x
foo -argument1=bar \
-argument2=$VARIABLE
# Disable debugging
set +x
) > /tmp/foo_service.log 2>&1 &
OUTER_PID=$!
echo $OUTER_PID > foo.pid
This seems to work in that I'm capturing most of the output to the log as well as the PID, but for some reason not all of the output is redirected. When I run the script, I see this in my terminal:
[me@home ~]$ sudo startFoo.sh
Some output
[me@home ~]$ + foo -argument1=bar -argument2=value
How can I squash the output in my prompt that says [me@home ~]$ + foo...?
Note: this question may be related to another question (redirect all output in a bash script when using set -x), but my specific usage is different.
UPDATE: My script now looks like this, but something is still not quite right:
#!/bin/bash
VARIABLE=$(something_dynamic)
echo "Some output"
(
# Enable debugging
set -x
foo -argument1=bar \
-argument2=$VARIABLE
# Disable debugging
set +x
) > /tmp/foo_service.log 2>&1 &
PID=$!
echo $PID > foo.pid
However, when I do this, the PID file contains the PID for startFoo.sh, not the actual invocation of foo which is what I really want to capture and be able to kill. Ideally I could kill both startFoo.sh and foo with one PID, but I'm not sure how to do that. How is this normally handled?
UPDATE: The solution (thanks to a conversation with @konsolebox) is below:
#!/bin/bash
VARIABLE=$(something_dynamic)
echo "Some output"
{
# Enable debugging
set -x
foo -argument1=bar \
-argument2="$VARIABLE" &
PID=$!
echo $PID > foo.pid
# Disable debugging
set +x
} > /tmp/foo_service.log 2>&1
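For completeness, a stop counterpart to the script above (a sketch, not from the original conversation; stopFoo.sh is a hypothetical name, and sleep stands in for foo so the snippet is self-contained):

```shell
# Hypothetical stopFoo.sh companion: stop the service using the PID file
# that startFoo.sh writes. Here sleep 60 stands in for the real foo.
sleep 60 &                  # stand-in for the backgrounded foo
echo $! > foo.pid           # same PID file the start script writes

# Later, to stop the service:
kill "$(cat foo.pid)" 2>/dev/null
rm -f foo.pid
```

Since the solution records the PID of foo itself (not of a wrapping subshell), killing that one PID stops the actual service.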

Change 2>&1 > /tmp/foo_service.log to > /tmp/foo_service.log 2>&1.
You should first redirect fd 1 to the file, then let fd 2 duplicate it. In the former, you first redirect fd 2 to fd 1's current target, which copies the default stdout, not the file's descriptor, which is only opened after it.
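A quick way to see the difference (the log paths here are just illustrative):

```shell
# Redirections are processed left to right, so order matters.

# Wrong order: 2>&1 duplicates fd 1 while it still points at wherever stdout
# was going (normally the terminal), so only stdout reaches the file.
{ echo out; echo err >&2; } 2>&1 > /tmp/demo_wrong.log

# Right order: fd 1 is pointed at the file first, then fd 2 duplicates it,
# so both streams land in the file.
{ echo out; echo err >&2; } > /tmp/demo_right.log 2>&1
```

After running this, demo_wrong.log contains only "out", while demo_right.log contains both "out" and "err".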

Related

Running a process with the TTY detached

I'd like to run a Linux console command from a terminal while preventing it from accessing the TTY by itself (which will often happen, for example, when the console command tries to request a password from the user - this should just fail). The closest I get to a solution is using this wrapper:
temp=`mktemp -d`
echo "$@" > $temp/run.sh
mkfifo $temp/out $temp/err
setsid sh -c "sh $temp/run.sh > $temp/out 2> $temp/err" &
cat $temp/err 1>&2 &
cat $temp/out
rm -f $temp/out $temp/err $temp/run.sh
rmdir $temp
This runs the command as expected without TTY access, but passing the stdout/stderr output through the FIFO pipes does not work for some reason. I end up with no output at all even though the process wrote to stdout or stderr.
Any ideas?
Well, thank you all for having a look. Turns out that the script already contained a working approach. It just contained a typo which caused it to fail. I corrected it in the question so it may serve for future reference.
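A minimal sketch of the same idea, assuming util-linux setsid (the -w flag waits for the command to exit) and using plain files instead of FIFOs; /tmp/detached.out is an illustrative path:

```shell
# setsid starts the command in a new session with no controlling terminal,
# so any attempt to talk to the TTY fails; stdin is closed via /dev/null.
setsid -w sh -c 'tty; echo hello' < /dev/null > /tmp/detached.out 2>&1
cat /tmp/detached.out
```

The captured output shows "not a tty" from the tty probe followed by "hello", confirming the command ran detached while its output was still collected.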

What does ps actually return? (Different value depending on how it is called)

I have a script containing this snippet:
#!/bin/bash
set +e
if [ -O "myprog.pid" ]; then
PID=`/bin/cat myprog.pid`
if /bin/ps -p ${PID}; then
echo "Already running" >> myprog.log
exit 0
else
echo "Old pidfile found" >> myprog.log
fi
else
echo "No pidfile found" >> myprog.log
fi
echo $$ > myprog.pid
This file is called by a watchdog script, callmyprog, which looks like this:
#!/bin/bash
myprog &
It seems to be a problem with if /bin/ps -p ${PID}. The problem manifests itself in this way. If I manually call myprog when it is running I get the message "Already running" as it should. Same thing happens when I manually run the script callmyprog. But when the watchdog runs it, I instead get "Old pidfile found".
I have checked the output from ps and in all cases it finds the process. When I'm calling myprog manually - either directly or through callmyprog, I get the return code 0, but when the watchdog calls it I get the return code 1. I have added debug printouts to the above snippets to print basically everything, but I really cannot see what the problem is. In all cases it looks something like this in the log when the ps command is run from the script:
$ ps -p 1
  PID TTY          TIME CMD
    1 ?        01:06:36 systemd
The only difference is that the return value is different. I checked the exit code with code like this:
/bin/ps -p ${PID}
echo $? >> myprog.log
What could possibly be the cause here? Why does the return code vary depending on how I call the script? I tried to read the source code for ps, but it was too complicated for me to understand.
I was able to "solve" the problem with an ugly hack: I piped ps -p $PID | wc -l and checked that the number of lines was at least 2. But that feels like an ugly hack, and I really want to understand what the problem is here.
Answer to comment below:
The original script contains absolute paths so it's not a directory problem. There is no alias for ps. which ps yields /bin/ps. The scripts are run as root, so I cannot see how it can be a permission problem.
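A sketch of an alternative liveness check that sidesteps ps entirely (not from the original script): kill -0 sends no signal, it only tests whether the PID exists and can be signalled, and its exit status is what the if examines.

```shell
# Check whether a process is alive without parsing ps output.
PID=$$   # our own PID here, so the demo is guaranteed a live process
if kill -0 "$PID" 2>/dev/null; then
    echo "running"
else
    echo "not running"
fi
```

This avoids any dependence on ps output formatting, which can vary with locale, terminal width, and how the parent invokes the script.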

bash: what to do when stdout does not exist

In a very simplified scenario, I have a script that looks like this:
mv test _test
sleep 10
echo $1
mv _test test
and if I execute it with:
ssh localhost "test.sh foo"
the test file will have an underscore in the name as long as the script is running, and when the script is finished, it will send foo back. The script SHOULD keep running even if you terminate the ssh command by pressing Ctrl+C or if you lose the connection to the server, but it doesn't (the file is not renamed back to "test"). So, I tried the following:
nohup ssh localhost "test.sh foo"
and it makes ssh immune to Ctrl+C, but a flaky connection to the server still causes trouble. After some debugging, it turns out that the script WILL actually reach the end IF THERE IS NO ECHO IN IT. And when you think about it, it makes sense: when the connection is dropped, there is no more stdout (the ssh socket) to echo to, so the echo fails, silently.
I can, of course, echo to a file and then get the file, but I would prefer something smarter, along the lines of test tty && echo $1 (but tty invoked like this always returns false). Any suggestions are greatly appreciated.
The following command does what you want:
ssh -t user@host 'nohup ~/test.sh foo > nohup.out 2>&1 & p1=$!; tail -f ~/nohup.out & wait $p1'
... test.sh is located in the users home directory
Explanation:
1.) "ssh -t user@host" ... pretty clear ... starts the remote session
2.) "nohup ~/test.sh foo > nohup.out 2>&1 &" ... starts the test.sh script with nohup in the background
3.) "p1=$!;" ... stores the PID of the previous background command in p1
4.) "tail -f ~/nohup.out &" ... tails nohup.out in the background to show the output of test.sh
5.) "wait $p1" ... waits for the test.sh process (whose PID is stored in p1) to finish
The above command works even if you interrupt it with ctrl+c.
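Steps 3 and 5 can be seen in isolation with a stand-in command: $! captures the PID of the most recent background job, and wait blocks until that job exits, returning its exit status.

```shell
# Background a short job, record its PID, and wait for it.
sleep 1 &
p1=$!
wait "$p1"
echo "exit status: $?"
```

Because wait reports the background job's exit status, the outer ssh command above also propagates test.sh's success or failure.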
you can use ...
ssh -t localhost "test.sh foo"
... to force a tty allocation
As st0ne suggested, tail fails when stdout disappears, but unlike cat and echo it does not cause the script to terminate. So there is no need for nohup, redirecting stdout to a temporary file, etc.; just plain and simple:
mv test _test
sleep 10
echo $1 | tail
mv _test test
and execute it with:
ssh localhost "test.sh foo"
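Another way to make the original script survive a vanished stdout, without piping through tail (a sketch, not from the answers above): ignore SIGPIPE and discard the write error, so a dead ssh socket cannot kill the script.

```shell
# trap '' PIPE stops a broken ssh socket from killing the script via SIGPIPE,
# and 2>/dev/null || : swallows the write error from echo.
trap '' PIPE
touch test                    # stand-in file so the mv below succeeds
mv test _test
echo "$1" 2>/dev/null || :
mv _test test
echo "script reached the end" >&2
```

With this guard, the second mv always runs, so the file is renamed back even when the connection is gone.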

Why this Debian-Linux autostart netcat script won't autostart?

I placed a link to my script in rc.local to autostart it on Debian Linux boot. It starts and then stops at the while loop. It's a netcat script that listens permanently on port 4001.
echo "Start"
while read -r line
do
#some stuff to do
done < <(nc -l -p 4001)
When I start this script as root with ./myscript it works 100% correctly. Does nc (netcat) need root-level access, or is something else the problem?
EDIT:
rc.local
#!/bin/sh -e
#
# rc.local
#
# This script is executed at the end of each multiuser runlevel.
# Make sure that the script will "exit 0" on success or any other
# value on error.
#
# In order to enable or disable this script just change the execution
# bits.
#
# By default this script does nothing.
/etc/samba/SQLScripts
exit 0
rc.local starts my script "SQLScripts"
SQLScripts
#! /bin/sh
# The following part always gets executed.
echo "Starting SQL Scripts" >> /var/log/SQLScriptsStart
/etc/samba/PLCCheck >> /var/log/PLCCheck &
"SQLScripts" starts "PLCCheck" (for example only one)
PLCCheck
#!/bin/bash
echo "before SLEEP" >> /var/log/PLCCheck
sleep 5
echo "after SLEEP" >> /var/log/PLCCheck
echo "vor While" >> /var/log/PLCCheck
while read -r line
do
echo "in While" >> /var/log/PLCCheck
done < <(netcat -u -l -p 6001)
In an rc script you have root-level access by default. What does "it stops at the while loop" mean? Does it quit after a while? I guess you need to run your loop in the background in order to achieve the behaviour usual in autostart scripts:
echo "Starting"
( while read -r line
do
#some stuff to do
done < <(nc -l -p 4001) ) &
echo "Started with pid $( jobs -p )"
I tested approximately the same thing yesterday, and I discovered that you can work around this and run your netcat script from a cron task:
(every minute, but you can adjust that as you want.)
* * * * * /home/kali/script-netcat.sh // working for me
@reboot /home/kali/script-netcat.sh // this is blocked by the system
As far as I can tell, by default Debian (and maybe other Linux distros) blocks any script that tries to run a netcat command at boot.
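Another option is to detach the listener fully in rc.local itself, so boot can proceed while the script keeps running (a sketch reusing the paths from the question; whether this fits your setup depends on why the loop stalls):

```shell
# Hypothetical rc.local entry: nohup survives the parent exiting, </dev/null
# removes any dependence on a terminal, and & lets rc.local return.
nohup /etc/samba/PLCCheck >> /var/log/PLCCheck 2>&1 < /dev/null &
```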

Redirecting Output of Bash Child Scripts

I have a basic script that outputs various status messages. e.g.
~$ ./myscript.sh
0 of 100
1 of 100
2 of 100
...
I wanted to wrap this in a parent script, in order to run a sequence of child-scripts and send an email upon overall completion, e.g. topscript.sh
#!/bin/bash
START=$(date +%s)
/usr/local/bin/myscript.sh
/usr/local/bin/otherscript.sh
/usr/local/bin/anotherscript.sh
RET=$?
END=$(date +%s)
echo -e "Subject:Task Complete\nBegan on $START and finished at $END and exited with status $RET.\n" | sendmail -v group@mydomain.com
I'm running this like:
~$ topscript.sh >/var/log/topscript.log 2>&1
However, when I run tail -f /var/log/topscript.log to inspect the log I see nothing, even though running top shows myscript.sh is currently being executed, and therefore, presumably outputting status messages.
Why isn't the stdout/stderr from the child scripts being captured in the parent's log? How do I fix this?
EDIT: I'm also running these on a remote machine, connected via ssh using pseudo-tty allocation, e.g. ssh -t user#host. Could the pseudo-tty be interfering?
I just tried the following: I have three files t1.sh, t2.sh, and t3.sh, all with the following content:
#!/bin/bash
for((i=0;i<10;i++)) ; do
echo $i of 9
sleep 1
done
And a script called myscript.sh with the following content:
#!/bin/bash
./t1.sh
./t2.sh
./t3.sh
echo "All Done"
When I run ./myscript.sh > topscript.log 2>&1 and then in another terminal run tail -f topscript.log I see the lines being output just fine in the log file.
Perhaps the things being run in your subscripts use a large output buffer? I know when I've run python scripts before, it has a pretty big output buffer so you don't see any output for a while. Do you actually see the entire output in the email that gets sent out at the end of topscript.sh? Is it just that while the processes run you're not seeing the output?
try
unbuffer topscript.sh >/var/log/topscript.log 2>&1
Note that unbuffer is not always available as a std binary in old-style Unix platforms and may require a search and installation for a package to support it.
I hope this helps.
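Where unbuffer (from the expect package) is unavailable, GNU coreutils stdbuf is a common alternative; a sketch, with sh -c standing in for topscript.sh:

```shell
# stdbuf -oL forces the child's stdout into line-buffered mode, so lines
# reach the log as they are produced rather than when a buffer fills.
stdbuf -oL sh -c 'echo 0 of 100; echo 1 of 100' > /tmp/topscript.log 2>&1
cat /tmp/topscript.log
```

Like unbuffer, this only helps when stdio buffering is the culprit; stdbuf has no effect on programs that manage their own output buffers.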