Delay before the file transfer on Linux or Solaris - linux

I would like to put delay before file transfer on Linux or Solaris.
As far as i know scp commands has no delay option.
scp file_name root#dest_ip:/dest_path
So, how can i put delay (it can change between 250ms and 500ms)?

You can run a command at a specific time.
See eg here
at 1200
scp file_name root#dest_ip:/dest_path
However, there may be better approaches - for example, if you are waiting for a file to be ready.

Related

How to copy multiple files simultaneously using scp

I would like to copy multiple files simultaneously to speed up my process I currently used the follow
scp -r root#xxx.xxx.xx.xx:/var/www/example/example.example.com .
but it only copies one file at a time. I have a 100 Mbps fibre so I have the bandwidth available to really copy a lot at the same time, please help.
You can use background task with wait command.
Wait command ensures that all the background tasks are completed before processing next line. i.e echo will be executed after scp for all three nodes are completed.
#!/bin/bash
scp -i anuruddha.pem myfile1.tar centos#192.168.30.79:/tmp &
scp -i anuruddha.pem myfile2.tar centos#192.168.30.80:/tmp &
scp -i anuruddha.pem myfile.tar centos#192.168.30.81:/tmp &
wait
echo "SCP completed"
SSH is able to do so-called "multiplexing" - more connections over one (to one server). It can be one way to afford what you want. Look up keywords like "ControlMaster"
Second way is using more connections, then send every job at background:
for file in file1 file2 file3 ; do
scp $file server:/tmp/ &
done
But, this is answer to your question - "How to copy multiple files simultaneously". For speed up, you can use weaker encryption (rc4 etc) and also don't forget, that the bottleneck can be your hard drive - because SCP don't implicitly limit transfer speed.
Last thing is using rsync - in some cases, it can be lot faster than scp...
I am not sure if this helps you, but I generally archive (compression is not required. just archiving is sufficient) file at the source, download it, extract them. This will speed up the process significantly.
Before archiving it took > 8 hours to download 1GB
After archiving it took < 8 minutes to do the same
You can use parallel-scp (AKA pscp): http://manpages.ubuntu.com/manpages/natty/man1/parallel-scp.1.html
With this tool, you can copy a file (or files) to multiple hosts simultaneously.
Regards,
100mbit Ethernet is pretty slow, actually. You can expect 8 MiB/s in theory. In practice, you usually get between 4-6 MiB/s at best.
That said, you won't see a speed increase if you run multiple sessions in parallel. You can try it yourself, simply run two parallel SCP sessions copying two large files. My guess is that you won't see a noticeable speedup. The reasons for this are:
The slowest component on the network path between the two computers determines the max. speed.
Other people might be accessing example.com at the same time, reducing the bandwidth that it can give you
100mbit Ethernet requires pretty big gaps between two consecutive network packets. GBit Ethernet is much better in this regard.
Solutions:
Compress the data before sending it over the wire
Use a tool like rsync (which uses SSH under the hood) to copy on the files which have changed since the last time you ran the command.
Creating a lot of small files takes a lot of time. Try to create an archive of all the files on the remote side and send that as a single archive.
The last suggestion can be done like this:
ssh root#xxx "cd /var/www/example ; tar cf - example.example.com" > example.com.tar
or with compression:
ssh root#xxx "cd /var/www/example ; tar czf - example.example.com" > example.com.tar.gz
Note: bzip2 compresses better but slower. That's why I use gzip (z) for tasks like this.
If you specify multiple files scp will download them sequentially:
scp -r root#xxx.xxx.xx.xx:/var/www/example/file1 root#xxx.xxx.xx.xx:/var/www/example/file2 .
Alternatively, if you want the files to be downloaded in parallel, then use multiple invocations of scp, putting each in the background.
#! /usr/bin/env bash
scp root#xxx.xxx.xx.xx:/var/www/example/file1 . &
scp root#xxx.xxx.xx.xx:/var/www/example/file2 . &

When executing "tar" on a directory with over a billion files, the process stayed in D status

I was doing some experiments to learn more about Linux process states.
So, there's a directory(named big_dir) with over a billion files in it(the directory has many sub-directories recursively), and then I run tar -cv big_dir | ssh anotherServer "tar -xv -C big_dir" and found out via executing top that, the tar process stays in D status. Meanwhile, the tar command keeps outputting the paths of the files.
I know that, the process was in D status because it was doing disk I/O, but why didn't its status keep switching between D and R? Printing the file names under the directory must have used some CPU computation, isn't it? Otherwise how could the find command know that it should print something?
If I run dd if=/dev/zero of=/dev/null, then the dd process status kept in R status from the top output. But why wasn't it in D status? Wasn't it doing I/O all the time?
/dev/zero and /dev/null are pseudo-devices. So there's no physical device behind them.
If I do
dd if=/dev/zero of=/tmp/zeroes
then top does show me dd in the D status. However it does spend a lot of it's time in R (in CPU time). top will simply sample the process table and consequently you may need to watch it for some time in order to see transient states.
I suspect for your tar example above that the amount of time outputting to stdout is negligible compared to the disk time. Note also that outputting to stdout will also involve the windowing system writing and whilst it's doing that the process will be sleeping. e.g. I'm running yes right now, and the majority of the work is being performed by my X server. The yes process is sleeping for most of the time I'm watching it (via top)
I'm sure your tar process SOMETIMES goes to R, but it's probably for a very short period of time, because it doesn't do that much - particularly since you are sending the data through a network. Unless that's a 10Gb/s network card [and everything else to "anotherServer" is really working at 1GB/s], this will be the slowest part of the chain. ssh itself will take a little bit of overhead as it encrypts the data.
It probably takes tar a few microseconds to ask for some data from the disk, and a few milliseconds for the disk to move its head and read the actual data. So you have about 0.1% of the time in "R", the rest is in "D".

Time taken by `less` command to show output

I have a script that produces a lot of output. The script pauses for a few seconds at point T.
Now I am using the less command to analyze the output of the script.
So I execute ./script | less. I leave it running for sufficient time so that the script would have finished executing.
Now I go through the output of the less command by pressing Pg Down key. Surprisingly while scrolling at the point T of the output I notice the pause of few seconds again.
The script does not expect any input and would have definitely completed by the time I start analyzing the output of less.
Can someone explain how the pause of few seconds is noticable in the output of less when the script would have finished executing?
Your script is communicating with less via a pipe. Pipe is an in-memory stream of bytes that connects two endpoints: your script and the less program, the former writing output to it, the latter reading from it.
As pipes are in-memory, it would be not pleasant if they grew arbitrarily large. So, by default, there's a limit of data that can be inside the pipe (written, but not yet read) at any given moment. By default it's 64k on Linux. If the pipe is full, and your script tries to write to it, the write blocks. So your script isn't actually working, it stopped at some point when doing a write() call.
How to overcome this? Adjusting defaults is a bad option; what is used instead is allocating a buffer in the reader, so that it reads into the buffer, freeing the pipe and thus letting the writing program work, but shows to you (or handles) only a part of the output. less has such a buffer, and, by default, expands it automatically, However, it doesn't fill it in the background, it only fills it as you read the input.
So what would solve your problem is reading the file until the end (like you would normally press G), and then going back to the beginning (like you would normally press g). The thing is that you may specify these commands via command line like this:
./script | less +Gg
You should note, however, that you will have to wait until the whole script's output loads into memory, so you won't be able to view it at once. less is insufficiently sophisticated for that. But if that's what you really need (browsing the beginning of the output while the ./script is still computing its end), you might want to use a temporary file:
./script >x & less x ; rm x
The pipe is full at the OS level, so script blocks until less consumes some of it.
Flow control. Your script is effectively being paused while less is paging.
If you want to make sure that your command completes before you use less interactively, invoke less as less +G and it will read to the end of the input, you can then return to the start by typing 1G into less.
For some background information there's also a nice article by Alexander Sandler called "How less processes its input"!
http://www.alexonlinux.com/how-less-processes-its-input
Can I externally enforce line buffering on the script?
Is there an off the shelf pseudo tty utility I could use?
You may try to use the script command to turn on line-buffering output mode.
script -q /dev/null ./script | less # FreeBSD, Mac OS X
script -c "./script" /dev/null | less # Linux
For more alternatives in this respect please see: Turn off buffering in pipe.

Linux: uploading unfinished files - with file size check (scp/rsync) [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Want to improve this question? Update the question so it's on-topic for Stack Overflow.
Closed 11 years ago.
Improve this question
I typically end up in the following situation: I have, say, a 650 MB MPEG-2 .avi video file from a camera. Then, I use ffmpeg2theora to convert it into Theora .ogv video file, say some 150 MB in size. Finally, I want to upload this .ogv file to an ssh server.
Let's say, the ffmpeg2theora encoding process takes some 15 minutes on my PC. On the other hand, the upload goes on with a speed of about 60 KB/s, which takes some 45 minutes (for the 150MB .ogv). So: if I first encode, and wait for the encoding process to finish - and then upload, it would take approximately
15 min + 45 min = 1 hr
to complete the operation.
So, I thought it would be better if I could somehow start the upload, in parallel with the encoding operation; then, in principle - as the uploading process is slower (in terms of transferred bytes/sec) than the encoding one (in terms of generated bytes/sec) - the uploading process would always "trail behind" the encoding one, and so the whole operation (enc+upl) would complete in just 45 minutes (that is, just the time of the upload process +/- some minutes depending on actual upload speed situation on wire).
My first idea was to pipe the output of ffmpeg2theora to tee (so as to keep a local copy of the .ogv) and then, pipe the output further to ssh - as in:
./ffmpeg2theora-0.27.linux32.bin -v 8 -a 3 -o /dev/stdout MVI.AVI | tee MVI.ogv | ssh user#ssh.server.com "cat > ~/myvids/MVI.ogv"
While this command does, indeed, function - one can easily see in the running log in the terminal from ffmpeg2theora, that in this case, ffmpeg2theora calculates a predicted time of completion to be 1 hour; that is, there seems to be no benefit in terms of smaller completion time for both enc+upl. (While it is possible that this is due to network congestion, and me getting less of a network speed at the time - it seems to me, that ffmpeg2theora has to wait for an acknowledgment for each little chunk of data it sends through the pipe, and that ACK finally has to come from ssh... Otherwise, ffmpeg2theora would not have been able to provide a completion time estimate. Then again, maybe the estimate is wrong, while the operation would indeed complete in 45 mins - dunno, never had patience to wait and time the process; I just get pissed at 1hr as estimate, and hit Ctrl-C ;) ...)
My second attempt was to run the encoding process in one terminal window, i.e.:
./ffmpeg2theora-0.27.linux32.bin -v 8 -a 3 MVI.AVI # MVI.ogv is auto name for output
..., and the uploading process, using scp, in another terminal window (thereby 'forcing' 'parallelization'):
scp MVI.ogv user#ssh.server.com:~/myvids/
The problem here is: let's say, at the time when scp starts, ffmpeg2theora has already encoded 5 MB of the output .ogv file. At this time, scp sees this 5 MB as the entire file size, and starts uploading - and it exits when it encounters the 5 MB mark; while in the meantime, ffmpeg2theora may have produced additional 15 MB, making the .ogv file 20 MB in total size at the time scp has exited (finishing the transfer of the first 5 MB).
Then I learned (joen.dk ยป Tip: scp Resume) that rsync supports 'resume' of partially completed uploads, as in:
rsync --partial --progress myFile remoteMachine:dirToPutIn/
..., so I tried using rsync instead of scp - but it seems to behave exactly the same as scp in terms of file size, that is: it will only transfer up to the file size read at the beginning of the process, and then it will exit.
So, my question to the community is: Is there a way to parallelize the encoding and uploading process, so as to gain the decrease in total processing time?
I'm guessing there could be several ways, as in:
A command line option (that I haven't seen) that forces scp/rsync to continuously check the file size - if the file is open for writing by another process (then I could simply run the upload in another terminal window)
A bash script; say running rsync --partial in a while loop, that runs as long as the .ogv file is open for writing by another process (I don't actually like this solution, since I can hear the harddisk scanning for the resume point, every time I run rsync --partial - which, I guess, cannot be good; if I know that the same file is being written to at the same time)
A different tool (other than scp/rsync) that does support upload of a "currently generated"/"unfinished" file (the assumption being it can handle only growing files; it would exit if it encounters that the local file is suddenly less in size than the bytes already transferred)
... but it could also be, that I'm overlooking something - and 1hr is as good as it gets (in other words, it is maybe logically impossible to achieve 45 min total time - even if trying to parallelize) :)
Well, I look forward to comments that would, hopefully, clarify this for me ;)
Thanks in advance,
Cheers!
May be you can try sshfs (http://fuse.sourceforge.net/sshfs.html).This being a file system should have some optimization though I am not very sure.

Syncing two files when one is still being written to

I have an application (video stream capture) which constantly writes its data to a single file. Application typically runs for several hours, creating ~1 gigabyte file. Soon (in a matter of several seconds) after it quits, I'd like to have 2 copies of file it was writing - let's say, one in /mnt/disk1, another in /mnt/disk2 (the latter is an USB flash drive with FAT32 filesystem).
I don't really like an idea of modifying the application to write 2 copies simulatenously, so I though of:
Application starts and begins to write the file (let's call it /mnt/disk1/file.mkv)
Some utility starts, copies what's already there in /mnt/disk1/file.mkv to /mnt/disk2/file.mkv
After getting initial sync state, it continues to follow a written file in a manner like tail -f does, copying everything it gets from /mnt/disk1/file.mkv to /mnt/disk2/file.mkv
Several hours pass
Application quits, we stop our syncing utility
Afterwards, we run a quick rsync /mnt/disk1/file.mkv /mnt/disk2/file.mkv just to make sure they're the same. In case if they're the same, it should just run a quick check and quit fairly soon.
What is the best approach for syncing 2 files, preferably using simple Linux shell-available utilities? May be I could use some clever trick with FUSE / md device / tee / tail -f?
Solution
The best possible solution for my case seems to be
mencoder ... -o >(
tee /mnt/disk1/file.mkv |
tee /mnt/disk2/file.mkv |
mplayer -
)
This one uses bash/zsh-specific magic named "process substitution" thus eliminating the need to make named pipes manually using mkfifo, and displays what's being encoded, as a bonus :)
Hmmm... the file is not usable while it's being written, so why don't you "trick" your program into writing through a pipe/fifo and use a 2nd, very simple program, to create 2 copies?
This way, you have your two copies as soon as the original process ends.
Read the manual page on tee(1).

Resources