broken pipe with remote rsync between two servers - linux

I am trying to transfer a large dataset (768 GB) from one remote machine to another using bash on Ubuntu 16.04. The problem is that I use rsync and the transfer runs for a few hours, then quits when the connection inevitably gets interrupted. Suppose I'm on machine A and the remote servers are machines B and C (all machines running Ubuntu 16.04). I ssh to machine B and use this command:
nohup rsync -P -r -e ssh /path/to/files/on/machine_B user@machine_C:directory &
Note that I have authorized keys set up, so no password is required between machines B and C.
A few hours later I get the following in the nohup file:
sending incremental filelist
file_1.bam
90,310,583,648 100% 36.44MB/s 0:39:23 (xfr#4, to-chk=5/10)
file_2.bam
79,976,321,885 100% 93.25MB/s 0:13:37 (xfr#3, to-chk=6/10)
file_3.bam
88,958,959,616 88% 12.50MB/s 0:15:28 rsync error: unexplained error (code 129) at rsync.c(632) [sender=3.1.1]
rsync: [sender] write error: Broken pipe (32)
I used nohup because I thought it would keep running even if there was a hangup. I have not tried sh -c, and I have not tried running the command from machine A, because at this point whatever I try would be guesswork; ideas would be appreciated.

For those who are interested, I also tried running the following script with the nohup command on machine B.
script:
chomp( my @files = `ls /path/to/files/on/machineB/*` );
foreach ( @files ) { system("scp $_ user\@machineC:destination/"); }
I still got truncated files.
At the moment the following command appears to be working:
nohup rsync -P --append -r -e ssh /path/to/files/on/machine_B user@machine_C:directory &
You just have to check the nohup file for a broken pipe error and re-enter the command if necessary.
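If you would rather not watch the nohup file by hand, a rough sketch like the following (it assumes the default nohup.out log and reuses the exact command above) automates the check-and-restart step; it only restarts when rsync is no longer running:
if ! pgrep -x rsync >/dev/null && grep -q "Broken pipe" nohup.out; then
    nohup rsync -P --append -r -e ssh /path/to/files/on/machine_B user@machine_C:directory &
fi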

I had the same problem and solved it in multiple steps:
First, I made sure that I ran all commands inside tmux sessions. This adds a layer of safety on top of nohup, as it keeps the session alive even if the connection drops: https://en.wikipedia.org/wiki/Tmux
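For example (the session name is arbitrary), you can start the transfer inside a named session on machine B, detach, and reattach later even if your own connection to B drops:
tmux new-session -s transfer    # run the rsync command inside this session
# detach with Ctrl-b d; reattach later with:
tmux attach -t transfer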
I combined the rsync command with a while loop to ensure that the copy is attempted an infinite number of times even if the pipe breaks:
while ! rsync <your_source> <your_destination>; do echo "Rsync failed. Retrying ..."; done
This approach is brute force, and it will work as long as, on each attempt, rsync manages to copy at least a few more files. Eventually, even with wasteful repeats and multiple failures, all the files will be copied and the command above will exit gracefully.

Related

calling SSH Sudo Commands in a bash script

I am trying to run the command while looping through a series of server addresses.
while read server
do
    ssh -t $server "sudo md5sum $fileName >> hashes"
done < serverNamesFile
within a script in bash, but I keep getting this error:
sudo: sorry, you must have a tty to run sudo
If I run the same commands on the command line, though, they work perfectly fine.
Can someone tell me why this keeps happening?
You probably have
Defaults requiretty
in your /etc/sudoers file.
As the option's name suggests, that will cause sudo to require a tty.
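A quick way to confirm and work around it is sketched below; the user and file names are placeholders, and any sudoers change should go through visudo:
sudo grep -n requiretty /etc/sudoers                            # confirm the option is set
ssh -tt someuser@someserver "sudo md5sum /some/file >> hashes"  # force a tty from the ssh side
# or relax the option for one user by adding this line via visudo:
#   Defaults:someuser !requiretty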
I solved my problem. Apparently, calling ssh inside a loop in a script means its standard input is not a terminal, which is what triggers the TTY error.
A better practice is to create a script that takes the address of the server you want to SSH into and pass the commands in that way. You can still loop through a series of files or commands by calling ssh each time, using this command:
while read stuff
do
    ssh -qtt $serverName "command"
done < $fileStuff

SSH, run process and then ignore the output

I have a command that will SSH and run a script after SSH'ing. The script runs a binary file.
Once the script is done, I can type any key and my local terminal goes back to its normal state. However, since the process is still running in the machine I SSH'ed into, any time it logs to stdout I see it in my local terminal.
How can I ignore this output without monkey patching it on my local machine by passing it to /dev/null? I want to keep the output inside the machine I am SSH'ing to and I want to just leave the SSH altogether after the process starts. I can pass it to /dev/null in the machine, however.
This is an example of what I'm running:
cat ./sh/script.sh | ssh -i ~/.aws/example.pem ec2-user@11.111.11.111
The contents of script.sh looks something like this:
# Some stuff...
# Run binary file
./bin/binary &
Solved it with ./bin/binary &>/dev/null &
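If you would rather keep the output on the remote machine instead of discarding it, the same trick works with a log file (the path is just an example):
# inside script.sh, on the remote machine
./bin/binary > ./binary.log 2>&1 &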
Copy the script to the remote machine and then run it remotely. The following commands are executed on your local machine.
$ scp -i /path/to/sshkey /some/script.sh user@remote_machine:/path/to/some/script.sh
# Run the script in the background on the remote machine and pipe the output to a logfile. This will also exit from the SSH session right away.
$ ssh -i /path/to/sshkey \
user@remote_machine "/path/to/some/script.sh &> /path/to/some/logfile &"
Note, logfile will be created on the remote machine.
# View the log file while the process is executing
$ ssh -i /path/to/sshkey user@remote_machine "tail -f /path/to/some/logfile"

how to send different commands to multiple hosts to run programs in Linux

I am an R user. I always run programs on multiple computers on campus. For example, if I need to run 10 different programs, I need to open PuTTY 10 times to log into 10 different computers (their OS is Linux) and submit one program to each of them. Is there a way to log into 10 different computers and send them commands at the same time? I use the following command to submit a program:
nohup Rscript L_1_cc.R > L_1_sh.txt
nohup Rscript L_2_cc.R > L_2_sh.txt
nohup Rscript L_3_cc.R > L_3_sh.txt
First set up ssh so that you can log in without entering a password (google for that if you don't know how). Then write a script that uses ssh to run the command on each remote host. Below is an example.
#!/bin/bash
host_list="host1 host2 host3 host4 host5 host6 host7 host8 host9 host10"
for h in $host_list
do
    case $h in
        host1)
            ssh $h "nohup Rscript L_1_cc.R > L_1_sh.txt"
            ;;
        host2)
            ssh $h "nohup Rscript L_2_cc.R > L_2_sh.txt"
            ;;
    esac
done
This is a very simplistic example. You can do much better than this (for example, you can put the ".R" and the ".txt" file names into a variable and use that rather than explicitly listing every option in the case).
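A rough sketch of that simplification, assuming the script number simply follows the host's position in the list:
#!/bin/bash
host_list="host1 host2 host3 host4 host5 host6 host7 host8 host9 host10"
n=1
for h in $host_list
do
    ssh $h "nohup Rscript L_${n}_cc.R > L_${n}_sh.txt"
    n=$((n+1))
done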
Based on your topic tags I am assuming you are using ssh to log into the remote machines. Hopefully the machine you are using is *nix based, so you can use the following script. If you are on Windows, consider Cygwin.
First, read this article to set up public key authentication on each remote target: http://www.cyberciti.biz/tips/ssh-public-key-based-authentication-how-to.html
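In practice the setup boils down to two commands, run on the machine that will be doing the ssh'ing (the key type and target name are just examples):
ssh-keygen -t rsa                 # generate a key pair if you do not have one yet
ssh-copy-id user@remote_target    # copy the public key to each target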
This will prevent ssh from prompting you to input a password each time you log into every target. You can then script the command execution on each target with something like the following:
#!/bin/bash
# kill script if we throw an error code during execution
set -e
# define hosts
hosts=( 127.0.0.1 127.0.0.1 127.0.0.1 )
# define associated user names for each host
users=( joe bob steve )
# counter to track iteration for correct user name
j=0
# iterate through each host and ssh into each with user@host combo
for i in ${hosts[*]}
do
    # modify ssh command string as necessary to get your script to execute properly
    # you could even add commands to transfer the file into which you seem to be dumping your results
    ssh ${users[$j]}@$i 'nohup Rscript L_1_cc.R > L_1_sh.txt'
    let "j=j+1"
done
# exit no error
exit 0
If you set up the public key authentication, you should just have to execute your script to make every remote host do their thing. You could even look into loading the users/hosts data from file to avoid having to hard code that information into the arrays.
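For example, with a hosts file containing one "user host" pair per line (the file name and format are assumptions), the hard-coded arrays could be replaced by something like:
# hosts.txt contains lines such as:  joe 127.0.0.1
while read -r user host
do
    # -n keeps ssh from swallowing the rest of hosts.txt on stdin
    ssh -n ${user}@${host} 'nohup Rscript L_1_cc.R > L_1_sh.txt'
done < hosts.txt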

Using `rsync` and `ssh` in perl causes unexplained timeout?

I'm running a two-stage process on a large number of files.
CODE:
$server_sleep = 1;
$ssh_check = 'ssh '.$destination_user.'@'.$destination_hostname.' "test -e '.$destination_path.$file_filename.'.txt && echo 1 || echo 0"';
while (`$ssh_check` ne "1\n") { # for some reason, the backticks return the 1 with a newline
    $upload_command = "/usr/bin/rsync -qogt --timeout=".$server_sleep." --partial --partial-dir=".$destination_path."partials ".$file_path."/".$file_filename.".txt ".$destination_user.'@'.$destination_hostname.":".$destination_path;
    sleep $server_sleep; # to avoid hammering the server (for the rsync)
    $upload_result = `$upload_command 2>&1`;
    $file_errorReturn = "FAIL" if $?;
    if (defined($file_errorReturn)) {
        # log an error. there is code to do this, but I have omitted it.
    }
    sleep $server_sleep; # to avoid hammering the server (for the ssh check)
    $server_sleep++; # increase the timeout if failures continue
}
BEHAVIOUR:
For the first few files this works fine (which should take care of your first few questions about keys, access, permissions, typos, etc.), and then at some point I get this error back:
ssh: connect to host remote_server.com port 22: Connection timed out
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: unexplained error (code 255) at io.c(600) [sender=3.0.6]
I get this regardless of whether I have specified -e ssh in the command, so I assume there is a default somewhere to ssh (which is fine). I have also tried the rsync section with scp, and it resulted in a similar connection timed out error:
ssh: connect to host remote_server.com port 22: Connection timed out
lost connection
QUESTIONS YOU MAY HAVE
1) Because the first few files work, the path is clear for this to work (i.e. there should be no problems with typos, permissions, etc.). My debug code outputs the actual command that is tried, and that command works fine on the command line (even for the files that fail in the script).
2) I have tried adding -vvvvv to both ssh and rsync, but I don't know how to get them to output more error information WITHIN my script. All I ever get is the above errors, and when I run the command on the command line I get no errors (even after I added "2>&1" and ">> log.txt" to the end of both commands). It is certainly possible that I'm not collecting all the logs I should be, so your help there would be appreciated as well.
3) I am just a regular user on both the local and remote machines.
local: rsync version 3.0.6 protocol version 30
remote: rsync version 3.0.9 protocol version 30
The path to ssh and rsync is the same on both.
4) In response to the excellent question (from qwrrty) in the comment (thanks!):
It is not terribly consistent. The files are numbered and they were being run in the following order: 4, 5, 3, 2, 1. It WAS failing on 1. Then I removed 3; it still failed on 1. When I put 3 back in, it started to fail on 2.
The files are all small (5 MB max), so the transfer is basically instant (the machines are close to each other both physically and on the network).
Please let me know if you need more detail. Thanks in advance for any advice you can offer.
Can you try to minimize extra ssh sessions and subprocesses by doing it all in a single rsync? Something like this?
open(RSYNC, '| /usr/bin/rsync -qogt'
    . ' --files-from=-'
    . ' --ignore-existing'
    . " --timeout=${server_sleep}"
    . " --partial --partial-dir=${destination_path}partials"
    . " ${destination_user}\@${destination_hostname}:${destination_path}")
    or die "cannot start rsync: $!";
for my $f (@big_list_of_files) {
    print RSYNC $f, "\n";
}
close RSYNC;
rsync has quite a lot of smarts built into it around how to transfer and synchronize large numbers of files at a time, and in my experience it usually works best to let it do as much of the work as possible.
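The same idea works from the shell as well; here is a sketch with placeholder paths that feeds the whole batch of file names to a single rsync process:
rsync -qogt --ignore-existing --partial --partial-dir=/destination/path/partials \
    --files-from=filelist.txt /local/source/dir user@remote_host:/destination/path/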

LDAP - SSH script across multiple VM's

So I'm ssh'ing into a router that has several VMs. It is set up using LDAP so that each VM has the same files, settings, etc. However, they have different cores allocated and different libraries and packages installed. Instead of logging into each VM individually and running the command, I want to automate it by putting the script in .bashrc.
So what I have so far:
export LD_LIBRARY_PATH=/lhome/username
# .so files are in ~/ to avoid permission denied problems
output=$(cat /proc/cpuinfo | grep "^cpu cores" | uniq | tail -c 2)
current=server_name
if [[ `hostname -s` != $current ]]; then
    ssh $current
fi
/path/to/program --hostname $(echo $(hostname -s)) --threads $((output*2))
Each VM, upon logging in, will execute this script, so I have to check if the current VM has the hostname to avoid an SSH loop. The idea is to run the program, then exit back out to the origin to resume the script. The problem is of course that the process will die upon logging out.
It's been suggested to me to use TMUX on an array of the hostnames, but I would have no idea on how to approach this.
You could install clusterSSH, set up a list of hostnames, and execute things from the terminal windows it opens. You can use screen/tmux/nohup to allow started processes to keep running even after logout.
Yet, if you still want to play around with scripting, you may install tmux, and use:
while read host; do
    scp "script_to_run_remotely" ${host}:~/
    # -n stops ssh from reading (and eating) the rest of hostlist on stdin
    ssh -n ${host} tmux new-session -d '~/script_to_run_remotely'\; detach
done < hostlist
Note: hostlist should be a list of hostnames, one per line.
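Later you can check on any host whether its detached session is still running, and reattach to watch it (the hostname is a placeholder):
ssh somehost tmux ls          # list tmux sessions on that host
ssh -t somehost tmux attach   # reattach interactively; detach again with Ctrl-b d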
