Showing progress of scp during cloud-init process

Showing progress of scp during cloud-init process - linux

When I spin up a new server I use a cloud-init script to automate the process. I follow progress of this process with:
$ tail -f /var/log/cloud-init-output.log
When it comes to fetching a backup file using scp and piping it through tar -zx I tried the following in the cloud-init script:
$ scp myuser#123.45.6.789:path/to/file.tar.gz /dev/stdout | tar -zx
Whilst this command works, the handy progress indications that scp outputs do not appear in the tail -f output... i.e. I do not see progress like:
file.tar.gz 57% 52MB 25.2MB/s 00:02
I have also tried bash process substitution like so:
$ scp myuser#123.45.6.789:path/to/file.tar.gz >(tar -zx)
And still the progress indications do not appear in the tail -f output.
How do I preserve the progress indications in the tail -f output? Particularly to see progress when fetching the larger backup files, it would be really handy to preserve these indications.
Note that when I run both of the above in a bash script directly (with set -x at the top), progress does show for the 'bash process substitution' variant but not the 'piping' variant... but when tailing the cloud-init logs, progress shows for neither variant.
It looks like cloud-init just sends both stdout and stderr from all cloud-init stages to /var/log/cloud-init-output.log (see here).
So we can re-create the cloud-init process (for the purposes of this question) with the following call:
$ scp myuser#123.45.6.789:path/to/file.tar.gz >(tar -zx) >output.log 2>&1
and we then follow this log file separately with:
$ tail -f output.log
Strange thing is... no progress status ever appears in this output.log file. And yet progress status is definitely being sent... the only change in the following is to direct to /dev/stdout rather than output.log:
$ scp myuser#123.45.6.789:path/to/file.tar.gz >(tar -zx) >/dev/stdout 2>&1
file.tar.gz 57% 52MB 25.2MB/s 00:02
So why do we see progress status on /dev/stdout but not in output.log when we direct stdout to output.log?

Like some other tools, scp checks if its output is going to a TTY (by use of isatty), disabling the progress meter output if it is not (you can find similar cases e.g., ls --color=auto emits color codes only when output goes to a terminal).
You can trick it into thinking it is outputting to a TTY by running it under script (as shown in this answer), or any other tool (e.g expect) that runs a program with its output connected to a PTY.

Related

Taking sequentially output of multi ssh with bash scripting

I'm wrote a bash script and I don't have a chance to download pssh. However, I need to run multiple ssh commands in parallel. There is no problem so far, but when I run ssh commands, I want to see the outputs sequentially from the remote machine. I mean one ssh has multiple outputs and they get mixed up because more than one ssh is running.
#!/bin/bash
pid_list=""
while read -r list
do
ssh user#$list 'commands'&
c_pid=$!
pid_list="$pid_list $c_pid"
done < list.txt
for pid in $pid_list
do
wait $pid
done
What should I add to the code to take the output unmixed?

The most obvious way to me would be to write the outputs in a file and cat the files at the end:
#!/bin/bash
me=$$
pid_list=""
while read -r list
do
ssh user#$list 'hostname; sleep $((RANDOM%5)); hostname ; sleep $((RANDOM%5))' > /tmp/log-${me}-$list &
c_pid=$!
pid_list="$pid_list $c_pid"
done < list.txt
for pid in $pid_list
do
wait $pid
done
cat /tmp/log-${me}-*
rm /tmp/log-${me}-* 2> /dev/null
I didn't handle stderr because that wasn't in your question. Nor did I address the order of output because that isn't specified either. Nor is whether the output should appear as each host finishes. If you want those aspects covered, please improve your question.

Running a process with the TTY detached

I'd like to run a linux console command from a terminal, preventing it from accessing the TTY by itself (which will, for example, happen often when the console command tries to request a password from the user - this should just fail). The closest I get to a solution is using this wrapper:
temp=`mktemp -d`
echo "$#" > $temp/run.sh
mkfifo $temp/out $temp/err
setsid sh -c "sh $temp/run.sh > $temp/out 2> $temp/err" &
cat $temp/err 1>&2 &
cat $temp/out
rm -f $temp/out $temp/err $temp/run.sh
rmdir $temp
This runs the command as expected without TTY access, but passing the stdout/stderr output through the FIFO pipes does not work for some reason. I end up with no output at all even though the process wrote to stdout or stderr.
Any ideas?

Well, thank you all for having a look. Turns out that the script already contained a working approach. It just contained a typo which caused it to fail. I corrected it in the question so it may serve for future reference.

Why doesn't tcpdump run in background?

I logged in a virtual machine via ssh and I tried to run a script in background, the script is shown below:
#!/bin/bash
APP_NAME=`basename $0`
CFG_FILE=$1
. $CFG_FILE #just some variables
CMD=$2
PID_FILE="$PIDS_DIR/$APP_NAME.pid"
CUR_LOG_DIR=$LOGS_RUNNING
echo $$ > $PID_FILE
#Main script code
#This script shall be called using the following syntax
# $ nohup script_name output_dir &
TIMESTAMP=`date +"%Y%m%d%H%M%S"`
CAP_INTERFACE="eth0"
/usr/sbin/tcpdump -nei $CAP_INTERFACE -s 65535 -w file_result
rm $PID_FILE
The result should be tcpdump running in background, redirecting the command result to file_result.
The script is called with:
nohup $SCRIPT_NAME $CFG_FILE start &
And It is stopped calling the STOP_SCRIPT:
##STOP_SCRIPT
PID_FILE="$PIDS_DIR/$APP_NAME.pid"
if [ -f $PID_FILE ]
then
PID=`cat $PID_FILE`
# send SIGTERM to kill all children of $PID
pkill -TERM -P $PID
fi
When I check the file_result, after running the stop script, It is empty.
What is happening? How can I solve it?
I found this link: https://it.toolbox.com/question/launching-tcpdump-processes-in-background-using-ssh-060614
The author seems to have faced a similar issue. They debate about race conditions, but I didn't understand completely.

I'm not sure what you're trying to accomplish by having the startup script itself continue to run, but here's an approach that I think accomplishes what you're trying to do, namely start tcpdump and have it continue to run immune to hangups via nohup. I've simplified things a bit for illustrative purposes - feel free to add any variables back as you see fit, such as the nohup.out output directory, TIMESTAMP, etc.
Script #1: tcpdump_start.sh
#!/bin/sh
rm -f nohup.out
nohup /usr/sbin/tcpdump -ni eth0 -s 65535 -w file_result.pcap &
# Write tcpdump's PID to a file
echo $! > /var/run/tcpdump.pid
Script #2: tcpdump_stop.sh
#!/bin/sh
if [ -f /var/run/tcpdump.pid ]
then
kill `cat /var/run/tcpdump.pid`
echo tcpdump `cat /var/run/tcpdump.pid` killed.
rm -f /var/run/tcpdump.pid
else
echo tcpdump not running.
fi
To start tcpdump, just run tcpdump_start.sh.
To stop the tcpdump instance started with tcpdump_start.sh, just run tcpdump_stop.sh.
The captured packets will be written to the file_result.pcap file, and yes, it's a pcap file, not a text file, so it helps to name it with the proper file extension. The tcpdump statistics will be written to the nohup.out file when tcpdump is terminated.

I too had faced problems when running tcpdump over an SSH session.
In my case, I was running
sudo nohup tcpdump -w {pcap_dump_file} {filter} > /dev/null 2>&1 &
Where, running this command over Paramiko SSH session as a background process was the problem.
To get around this, I used screen utility of Linux.
screen is an easy to use tool for long-running of processes as a service.

Might be an old post, but this is also relevant. I couldn;t understand why no file was being created only to realise that the file might not be created until a certain amount of data had been captured.
https://github.com/the-tcpdump-group/tcpdump/issues/485

Run a script in the same shell(bash)

My problem is specific to the running of SPECCPU2006(a benchmark suite).
After I installed the benchmark, I can invoke a command called "specinvoke" in terminal to run a specific benchmark. I have another script, where part of the codes are like following:
cd (specific benchmark directory)
specinvoke &
pid=$!
My goal is to get the PID of the running task. However, by doing what is shown above, what I got is the PID for the "specinvoke" shell command and the real running task will have another PID.
However, by running specinvoke -n ,the real code running in the specinvoke shell will be output to the stdout. For example, for one benchmark,it's like this:
# specinvoke r6392
# Invoked as: specinvoke -n
# timer ticks over every 1000 ns
# Use another -n on the command line to see chdir commands and env dump
# Starting run for copy #0
../run_base_ref_gcc43-64bit.0000/milc_base.gcc43-64bit < su3imp.in > su3imp.out 2>> su3imp.err
Inside it it's running a binary.The code will be different from benchmark to benchmark(by invoking under different benchmark directory). And because "specinvoke" is installed and not just a script, I can not use "source specinvoke".
So is there any clue? Is there any way to directly invoke the shell command in the same shell(have same PID) or maybe I should dump the specinvoke -n and run the dumped materials?

You can still do something like:
cd (specific benchmark directory)
specinvoke &
pid=$(pgrep milc_base.gcc43-64bit)
If there are several invocation of the milc_base.gcc43-64bit binary, you can still use
pid=$(pgrep -n milc_base.gcc43-64bit)
Which according to the man page:
-n
Select only the newest (most recently started) of the matching
processes

when the process is a direct child of the subshell:
ps -o pid= -C=milc_base.gcc43-64bit --ppid $!
when not a direct child, you could get the info from pstree:
pstree -p $! | grep -o 'milc_base.gcc43-64bit(.*)'
output from above (PID is in brackets): milc_base.gcc43-64bit(9837)

Redirecting Output of Bash Child Scripts

I have a basic script that outputs various status messages. e.g.
~$ ./myscript.sh
0 of 100
1 of 100
2 of 100
...
I wanted to wrap this in a parent script, in order to run a sequence of child-scripts and send an email upon overall completion, e.g. topscript.sh
#!/bin/bash
START=$(date +%s)
/usr/local/bin/myscript.sh
/usr/local/bin/otherscript.sh
/usr/local/bin/anotherscript.sh
RET=$?
END=$(date +%s)
echo -e "Subject:Task Complete\nBegan on $START and finished at $END and exited with status $RET.\n" | sendmail -v group#mydomain.com
I'm running this like:
~$ topscript.sh >/var/log/topscript.log 2>&1
However, when I run tail -f /var/log/topscript.log to inspect the log I see nothing, even though running top shows myscript.sh is currently being executed, and therefore, presumably outputting status messages.
Why isn't the stdout/stderr from the child scripts being captured in the parent's log? How do I fix this?
EDIT: I'm also running these on a remote machine, connected via ssh using pseudo-tty allocation, e.g. ssh -t user#host. Could the pseudo-tty be interfering?

I just tried your the following: I have three files t1.sh, t2.sh, and t3.sh all with the following content:
#!/bin/bash
for((i=0;i<10;i++)) ; do
echo $i of 9
sleep 1
done
And a script called myscript.sh with the following content:
#!/bin/bash
./t1.sh
./t2.sh
./t3.sh
echo "All Done"
When I run ./myscript.sh > topscript.log 2>&1 and then in another terminal run tail -f topscript.log I see the lines being output just fine in the log file.
Perhaps the things being run in your subscripts use a large output buffer? I know when I've run python scripts before, it has a pretty big output buffer so you don't see any output for a while. Do you actually see the entire output in the email that gets sent out at the end of topscript.sh? Is it just that while the processes run you're not seeing the output?

try
unbuffer topscript.sh >/var/log/topscript.log 2>&1
Note that unbuffer is not always available as a std binary in old-style Unix platforms and may require a search and installation for a package to support it.
I hope this helps.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string