Understand the rsync transfer rate in its output [closed] - linux

I transferred a large file (>60GB) using rsync, but I got confused when calculating the actual transfer rate. The output is:
dbdump.sql
69840316437 100% 7.75MB/s 2:23:09 (xfer#1, to-check=0/1)
sent 30 bytes received 17317620159 bytes 2015199.88 bytes/sec
total size is 69840316437 speedup is 4.03
The rate displayed on the second line is 7.75MB/s, but the rate I calculate from the second-to-last line is about 2MB/s. However, if you divide the total size by the total time, 69840316437/(2x3600+23x60+9) = 8131367 bytes/sec, which is about 8MB/s.
Which one is the actual mean transfer rate?
Thanks

The 7.75MB/s is just the transfer speed reported for the last block of the transfer; the statistics are updated about once a second.
It would also appear that you have sparse-file handling enabled, because while the file is 69GB in size, only 17GB were transferred. Either that, or you had partially transferred the file in the past and this run just finished it up, or maybe it had been fully transferred before and this run only sent the blocks that changed. The reported speedup is <full size> / <transferred size>, which is about 69 / 17 = 4.03 in this case - meaning rsync managed to fully replicate a 69GB file in the time it took to actually transfer a 17GB file.
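To see how the three figures relate, here is a minimal bash sketch using the numbers from the output above (the variable names are mine, not anything rsync prints):

total_size=69840316437               # "total size is ..." line
received=17317620159                 # "received ... bytes" line
elapsed=$((2*3600 + 23*60 + 9))      # 2:23:09 -> 8589 seconds

# bytes that actually crossed the wire per second (~2 MB/s, the "bytes/sec" figure)
echo "wire rate:      $((received / elapsed)) bytes/sec"
# effective rate for the whole file per second (~8 MB/s)
echo "effective rate: $((total_size / elapsed)) bytes/sec"
# speedup reported by rsync (~4.03)
echo "speedup:        $(echo "scale=2; $total_size / $received" | bc)"

The wire rate comes out marginally above rsync's own 2015199.88 bytes/sec only because of how the elapsed time is measured and rounded.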

Related

Why dd can't handle sparse files in shell scripts? [closed]

I have the following sparse file that I want to flash to an SD card:
647M -rw------- 1 root root 4.2G Sep 21 16:53 make_sd_card.sh.xNws4e
As you can see, it takes ~647M on disk for an apparent size of 4.2G.
If I flash it directly with dd, in my shell, it's really fast, ~6s:
$ time (sudo /bin/dd if=make_sd_card.sh.xNws4e of=/dev/mmcblkp0 conv=sparse; sync)
8601600+0 records in
8601600+0 records out
4404019200 bytes (4.4 GB, 4.1 GiB) copied, 6.20815 s, 709 MB/s
real 0m6.284s
user 0m1.920s
sys 0m4.336s
But when I run the very same command inside a shell script, it behaves as if it were copying all the zeroes and takes a long time (~2m10):
$ time sudo ./plop.sh ./make_sd_card.sh.xNws4e
+ dd if=./make_sd_card.sh.xNws4e of=/dev/mmcblk0 conv=sparse
8601600+0 records in
8601600+0 records out
4404019200 bytes (4.4 GB, 4.1 GiB) copied, 127.984 s, 34.4 MB/s
+ sync
real 2m9.885s
user 0m3.520s
sys 0m15.560s
If I watch the Dirty counter in /proc/meminfo, I can see that it climbs much higher when dd-ing from a shell script than directly from the shell.
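For reference, one way to watch those counters live while dd runs (my own addition, not from the question):

watch -n 1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'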
My shell is bash and, for the record, the script is:
#!/bin/bash
set -xeu
dd if=$1 of=/dev/mmcblk0 conv=sparse bs=512
sync
[EDIT] I'm resurrecting this topic because a developer I work with has found these commands: bmap_create and bmap_copy, which seem to do exactly what I was clumsily trying to achieve with dd.
In Debian, they are part of the bmap-tools package.
With it, flashing a 4.1GB sparse SD image with a real size of 674MB takes 1m2s, versus 6m26s with dd or cp.
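For reference, a minimal sketch of that workflow, assuming the Debian bmap-tools package, which ships these as subcommands of the bmaptool command (the image and device names are the ones from the question; double-check the device before writing):

bmaptool create make_sd_card.sh.xNws4e > image.bmap                       # record which blocks are actually mapped
sudo bmaptool copy --bmap image.bmap make_sd_card.sh.xNws4e /dev/mmcblk0  # copy only the mapped blocks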
The difference you originally observed is caused by a typo in the non-scripted invocation, which did not actually write to your memory card. There is no difference in dd behavior between scripted and interactive invocation.
Keep in mind what a sparse file is: It's a file on a filesystem that's able to store metadata tracking which blocks have values at all, and thus for which zero blocks have never been allocated any storage on disk whatsoever.
This concept -- of a sparse file -- is specific to files. You can't have a sparse block device.
The distinction between your two lines of code is that one of them (the fast one) has a typo (mmcblkp0 instead of mmcblk0), so it's referring to a block device name that doesn't exist. Thus, it creates a file. Files can be sparse. Thus, it creates a sparse file. Creating a sparse file is fast.
The other one, without the typo, writes to the block device. Block devices can't be sparse. Thus, it always takes the full execution time to run.
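To guard against exactly this kind of typo in a script, here is a minimal sketch (my own addition, using the device name and dd invocation from the question) that refuses to run unless the target really is a block device:

target=/dev/mmcblk0
if [ ! -b "$target" ]; then
    echo "refusing to write: $target is not a block device (a typo here would just create a regular, possibly sparse, file in /dev)" >&2
    exit 1
fi
dd if="$1" of="$target" conv=sparse bs=512
sync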

Class 4 SDHC vs Class 10 SDHC cards [closed]

I have been working on some programs that require data, a few MB in size, to be written to SDHC cards - SanDisk Class 4 and SanDisk Class 10 16 GB SDHC cards in particular.
The results I have observed seem strange: the write speeds of the Class 4 cards beat those of the Class 10 cards.
Commands used:
I used the dd command to write the data, something like:
dd if=file_10mb.img of=/dev/sdc conv=fsync bs=4096 count=2560
I measured the write speeds with:
iostat /dev/sdc 1 -m -t
A few figures:
Writing a 100MB file:
On the Class 10 card: 53 secs -> avg. write speed = 2.03 MB_wrtn/sec
On the Class 4 card: 31 secs -> avg. write speed = 2.62 MB_wrtn/sec
Writing a 10MB file:
On the Class 10 card: 5.7 secs -> max./min. write speed = 1.85 / 1.15 MB_wrtn/sec
On the Class 4 card: 4 secs -> max./min. write speed = 2.56 / 1.15 MB_wrtn/sec
I expected exactly the opposite results, since Class 10 cards should outperform Class 4 cards.
I've tested this on two different cards to rule out wrong readings from aged cards; the cards are fairly new anyway.
Can anyone explain this strange behaviour? Thanks in advance.
A brief search on the internet led me to this page: https://www.raspberrypi.org/forums/viewtopic.php?t=11258&p=123670
which talks about "erase blocks", the size of an "erase" operation. An erase block is generally bigger than a sector, which is the minimum size for a write operation. Some examples given on that page:
16 GB SanDisk Extreme Pro: erase block size of 4 MB.
8 GB Transcend SDHC 150x: erase block size of 4 MB.
2 GB Transcend SD 150x: erase block size of 8 kB.
Now, the conv=fsync option you pass to dd means that the data and metadata are physically flushed to the card rather than left in the page cache (conv=fsync issues a single sync before dd finishes; oflag=sync would sync after every output block), which could involve rewriting part of the FAT, or some other blocks if no FAT is used.
On a classic spinning magnetic disk, forcing out many small writes would mean the head travels a lot; on a flash memory there is no head, but an erase operation is very costly. Moreover, flash memories have internal wear-levelling algorithms, so it becomes very difficult to know what really goes on underneath, inside the memory card.
The conclusion is that, as noted in a comment, a 4K block size can be too small, and forced syncing slows things down and can be very problematic. Get rid of the fsync option and run the tests again with different block sizes.
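For example, a minimal bash sketch of that experiment (the device and file names are taken from the question; this overwrites /dev/sdc, so double-check the device first):

for bs in 4K 64K 1M 4M; do
    echo "=== block size $bs ==="
    # write without fsync, then sync once so the timing reflects the card, not the page cache
    time ( sudo dd if=file_10mb.img of=/dev/sdc bs="$bs" 2>/dev/null && sync )
done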
In reality, every card probably has its own preferred set of parameters. One way Class 10 cards can be made faster is to use a big erase block: the time to erase a block is more or less independent of its size, so a really big erase block effectively improves speed by erasing more data in the same time. But if blocks are erased too often, speed drops instead.
The final answer, by inference, is that your set of parameters seems better suited to a Class 4 card than to a Class 10 card. In my opinion, your parameters are not well suited to anything, but nobody can be perfectly sure: flash memory cards are intricate. For example, I often record TV transmissions on my TV decoder; there are periods when things go smoothly and periods when they don't. Four months ago the decoder was often complaining about "slow writing speed", with horrible results. For the last couple of months everything has been fine; I touched nothing, and the USB flash memory is the same. Probably it has entered another phase of its life...

Comprehensive list of rsync error codes [closed]

I'm writing a script that does daily snapshots of users' home directories. First I do a dry run using:
rsync -azvrn --out-format="%M %f" source/dir dest/dir
and then the actual rsync operation (by removing the -n option).
I'm trying to parse the output of the dry run. Specifically, I'm interested in learning the exact cause of the rsync error (if one occurred). Does anyone know of:
The most common rsync errors and their codes?
A link to a comprehensive rsync error code page?
Most importantly, rsync (at least on CentOS 5) does not return an error code; rather, it displays the errors internally and returns 0, like this:
sending incremental file list
rsync: link_stat "/data/users/gary/testdi" failed: No such file or directory (2)
sent 18 bytes received 12 bytes 60.00 bytes/sec
total size is 0 speedup is 0.00 (DRY RUN)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1039) [sender=3.0.6]
Has anyone had to parse rsync errors, and do you have a suggestion on how to store the rsync return state(s)? I believe that, when transferring multiple files, errors may be raised on a per-file basis and collected at the end, as shown on the last line of the output above.
Per the rsync "man" page, here are the error codes it could return and what they mean. If you're scripting it in bash, you could look at $?
0 Success
1 Syntax or usage error
2 Protocol incompatibility
3 Errors selecting input/output files, dirs
4 Requested action not supported: an attempt was made to manipulate 64-bit
files on a platform that cannot support them; or an option was specified
that is supported by the client and not by the server.
5 Error starting client-server protocol
6 Daemon unable to append to log-file
10 Error in socket I/O
11 Error in file I/O
12 Error in rsync protocol data stream
13 Errors with program diagnostics
14 Error in IPC code
20 Received SIGUSR1 or SIGINT
21 Some error returned by waitpid()
22 Error allocating core memory buffers
23 Partial transfer due to error
24 Partial transfer due to vanished source files
25 The --max-delete limit stopped deletions
30 Timeout in data send/receive
35 Timeout waiting for daemon connection
I've never seen a comprehensive "most common errors" list but I'm betting error code 1 would be at the top.
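As an illustration, a minimal bash sketch of capturing and acting on the exit status (paths and messages are placeholders, not from the question):

rsync -az --out-format="%M %f" source/dir dest/dir
status=$?
case "$status" in
    0)  echo "rsync finished cleanly" ;;
    23) echo "partial transfer: some files/attrs were not transferred" >&2 ;;
    24) echo "partial transfer: some source files vanished during the run" >&2 ;;
    *)  echo "rsync failed with exit code $status" >&2 ;;
esac
exit "$status"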

In SSHD Configuration what does "MaxStartups 10:30:60" mean? [closed]

The problem: some SFTP connections are failing in a customer environment, but when I test against the same server with my sample code, no connection fails. Possibly many parallel SFTP connections are being started at the same time in the customer environment.
I want to know what MaxStartups 10:30:60 means. I only understand the 10, which is the maximum number of unauthenticated SSH connections allowed - meaning that if 12 SSH connection requests arrive at the same time, 2 fail and 10 succeed.
What do the 30 and 60 mean?
What is the meaning of MaxStartups 10:30:60?
10: the number of unauthenticated connections at which sshd starts dropping new ones
30: the percentage chance of dropping a new connection once that threshold is reached (the chance increases linearly above 10)
60: the number of unauthenticated connections at which every new connection is dropped
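Put another way, this is how the line reads in sshd_config, with the three fields annotated (the values are the ones from the question):

# MaxStartups start:rate:full
#   start = 10 -> begin randomly refusing new unauthenticated connections
#   rate  = 30 -> refuse 30% of them at that point, rising linearly towards 100%
#   full  = 60 -> refuse all new unauthenticated connections at this count
MaxStartups 10:30:60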

Linux: uploading unfinished files - with file size check (scp/rsync) [closed]

I typically end up in the following situation: I have, say, a 650 MB MPEG-2 .avi video file from a camera. Then, I use ffmpeg2theora to convert it into Theora .ogv video file, say some 150 MB in size. Finally, I want to upload this .ogv file to an ssh server.
Let's say, the ffmpeg2theora encoding process takes some 15 minutes on my PC. On the other hand, the upload goes on with a speed of about 60 KB/s, which takes some 45 minutes (for the 150MB .ogv). So: if I first encode, and wait for the encoding process to finish - and then upload, it would take approximately
15 min + 45 min = 1 hr
to complete the operation.
So, I thought it would be better if I could somehow start the upload, in parallel with the encoding operation; then, in principle - as the uploading process is slower (in terms of transferred bytes/sec) than the encoding one (in terms of generated bytes/sec) - the uploading process would always "trail behind" the encoding one, and so the whole operation (enc+upl) would complete in just 45 minutes (that is, just the time of the upload process +/- some minutes depending on actual upload speed situation on wire).
My first idea was to pipe the output of ffmpeg2theora to tee (so as to keep a local copy of the .ogv) and then, pipe the output further to ssh - as in:
./ffmpeg2theora-0.27.linux32.bin -v 8 -a 3 -o /dev/stdout MVI.AVI | tee MVI.ogv | ssh user@ssh.server.com "cat > ~/myvids/MVI.ogv"
While this command does indeed work, one can easily see from ffmpeg2theora's running log in the terminal that, in this case, it predicts a completion time of 1 hour; that is, there seems to be no benefit in terms of a smaller completion time for enc+upl. (While it is possible that this is due to network congestion, and me getting less network speed at the time, it seems to me that ffmpeg2theora has to wait for an acknowledgment for each little chunk of data it sends through the pipe, and that ACK finally has to come from ssh... otherwise ffmpeg2theora would not be able to provide a completion-time estimate. Then again, maybe the estimate is wrong and the operation would indeed complete in 45 mins - dunno, I never had the patience to wait and time the process; I just get pissed at the 1hr estimate and hit Ctrl-C ;) ...)
My second attempt was to run the encoding process in one terminal window, i.e.:
./ffmpeg2theora-0.27.linux32.bin -v 8 -a 3 MVI.AVI # MVI.ogv is auto name for output
..., and the uploading process, using scp, in another terminal window (thereby 'forcing' 'parallelization'):
scp MVI.ogv user@ssh.server.com:~/myvids/
The problem here is: let's say, at the time when scp starts, ffmpeg2theora has already encoded 5 MB of the output .ogv file. At this time, scp sees this 5 MB as the entire file size, and starts uploading - and it exits when it encounters the 5 MB mark; while in the meantime, ffmpeg2theora may have produced additional 15 MB, making the .ogv file 20 MB in total size at the time scp has exited (finishing the transfer of the first 5 MB).
Then I learned (joen.dk » Tip: scp Resume) that rsync supports 'resume' of partially completed uploads, as in:
rsync --partial --progress myFile remoteMachine:dirToPutIn/
..., so I tried using rsync instead of scp - but it seems to behave exactly the same as scp in terms of file size, that is: it will only transfer up to the file size read at the beginning of the process, and then it will exit.
So, my question to the community is: Is there a way to parallelize the encoding and uploading process, so as to gain the decrease in total processing time?
I'm guessing there could be several ways, as in:
A command line option (that I haven't seen) that forces scp/rsync to continuously check the file size - if the file is open for writing by another process (then I could simply run the upload in another terminal window)
A bash script; say running rsync --partial in a while loop, that runs as long as the .ogv file is open for writing by another process (I don't actually like this solution, since I can hear the harddisk scanning for the resume point, every time I run rsync --partial - which, I guess, cannot be good; if I know that the same file is being written to at the same time)
A different tool (other than scp/rsync) that does support uploading a "currently generated"/"unfinished" file (the assumption being that it only has to handle growing files; it could exit if it found the local file suddenly smaller than the number of bytes already transferred)
... but it could also be, that I'm overlooking something - and 1hr is as good as it gets (in other words, it is maybe logically impossible to achieve 45 min total time - even if trying to parallelize) :)
Well, I look forward to comments that would, hopefully, clarify this for me ;)
Thanks in advance,
Cheers!
Maybe you can try sshfs (http://fuse.sourceforge.net/sshfs.html). Being a file system, it should have some optimizations for this, though I am not very sure.
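For illustration, a minimal sketch of the sshfs idea (the host, user and remote directory are the placeholders from the question; note that, unlike the tee pipeline above, this keeps only the remote copy unless you also write one locally):

mkdir -p ~/remote-myvids
sshfs user@ssh.server.com:myvids ~/remote-myvids
./ffmpeg2theora-0.27.linux32.bin -v 8 -a 3 -o ~/remote-myvids/MVI.ogv MVI.AVI   # the encoder writes straight to the server
fusermount -u ~/remote-myvids   # unmount when finished

Whether the encoder tolerates the slower writes over the network, and whether this actually beats encode-then-upload, is something you would have to test.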
