How to grep download speed from wget output? - linux

I need to download several files with wget and measure download speed.
e.g. I download with
wget -O /dev/null http://ftp.bit.nl/pub/OpenBSD/4.7/i386/floppy47.fs http://ftp.bit.nl/pub/OpenBSD/4.7/i386/floppyB47.fs
and the output is
--2010-10-11 18:56:00-- http://ftp.bit.nl/pub/OpenBSD/4.7/i386/floppy47.fs
Resolving ftp.bit.nl... 213.136.12.213, 2001:7b8:3:37:20e:cff:fe4d:69ac
Connecting to ftp.bit.nl|213.136.12.213|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1474560 (1.4M) [text/plain]
Saving to: `/dev/null'
100%[==============================================================>] 1,474,560 481K/s in 3.0s
2010-10-11 18:56:03 (481 KB/s) - `/dev/null' saved [1474560/1474560]
--2010-10-11 18:56:03-- http://ftp.bit.nl/pub/OpenBSD/4.7/i386/floppyB47.fs
Reusing existing connection to ftp.bit.nl:80.
HTTP request sent, awaiting response... 200 OK
Length: 1474560 (1.4M) [text/plain]
Saving to: `/dev/null'
100%[==============================================================>] 1,474,560 499K/s in 2.9s
2010-10-11 18:56:06 (499 KB/s) - `/dev/null' saved [1474560/1474560]
FINISHED --2010-10-11 18:56:06--
Downloaded: 2 files, 2.8M in 5.9s (490 KB/s)
I need to grep the total download speed, that is, the string 490 KB/s.
How do I do this?
P.S. This may need to account for the case where only one file is downloaded, so there won't be a final line starting with FINISHED.

Update: a grep-style version using sed:
wget ... 2>&1 | sed -n '$s/.*(\(.*\)).*/\1/p'
The $ address limits the substitution to the last line of wget's output, which carries the overall speed when several files are downloaded and the per-file speed when there is only one, so the missing FINISHED line mentioned in the P.S. is handled either way.
Old version:
I thought it would be easier to divide the file size by the download time after the download. ;-)
(/usr/bin/time -p wget ... 2>&1 >/dev/null; ls -l newfile) | \
awk '
NR==1 {t=$2};
NR==4 {printf("rate=%f bytes/second\n", $5/t)}
'
The first awk line stores the elapsed real time from the "real xx.xx" line in variable t. The second awk line divides the file size (column 5 of ls -l) by that time and prints the result as the rate.

This worked for me, using your wget -O /dev/null <resource>
The regex I used was \([0-9.]\+ [KM]B/s\)
But note I had to redirect stderr onto stdout so the command was:
wget -O /dev/null http://example.com/index.html 2>&1 | grep '\([0-9.]\+ [KM]B/s\)'
This allows things like 923 KB/s and 1.4 MB/s
grep just finds matches. To get the value(s) you can use sed instead:
wget -O /dev/null http://example.com/index.html 2>&1 |
sed -e 's|^.*(\([0-9.]\+ [KM]B/s\)).*$|\1|'

This works when only 1 file is being downloaded.
I started using sed to get the speed from wget, but I found it irritating so I switched to grep.
This is my command:
wget ... 2>&1 | grep -o "[0-9.]\+ [KM]*B/s"
The -o option makes grep print only the matching part. The pattern matches one or more digits or decimal points, then a space, then an optional K or M before B/s.
That will return 423 KB/s (for example).
To grep for just the units, use grep -o "[KM]*B/s"; for just the number, use grep -o "[0123456789]\+".

For example, to get the speed in Mbit per second, add --report-speed=bits to wget and change the grep pattern slightly:
wget -O /dev/null --report-speed=bits http://www.ovh.net/files/10Mb.dat 2>&1 | grep -o "[0-9.,]\+ [KM]*[Bb]/s"
Output:
1,51 Mb/s

Why can't you just do this:
perl -ne '/^Downloaded.*?\((.*?)\)/ and print "$1\n"'
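For context, a usage sketch (url1 and url2 are placeholders; wget writes its report to stderr, and the Downloaded: summary line only appears when more than one file is retrieved):
wget -O /dev/null url1 url2 2>&1 | perl -ne '/^Downloaded.*?\((.*?)\)/ and print "$1\n"'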

Here's a suggestion: you can make use of wget's --limit-rate=amount option. For example, --limit-rate=400k will limit the retrieval rate to 400 KB/s. Then it's easier for you to calculate the total speed, which saves you time and mental anguish trying to regex it.
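For instance (illustrative only; the URL is a placeholder):
wget --limit-rate=400k -O /dev/null http://example.com/file.bin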

Related

How do I print only the second row of the output in a bash script

I wrote a simple script:
snmpwalk -v2c -c Aruba.58601 192.168.4.9 1.3.6.1.4.1.14823.2.3.3.1.2.1.1.11 | wc -l
And my output is:
Bad operator (INTEGER): At line 73 in /usr/share/snmp/mibs/ietf/SNMPv2-PDU
47
Problem is:
I need to receive only that second line (only that number) in my output. How can I do it?
The best would be to fix/replace the broken MIB file. As a quick hack you can discard SNMP tool errors like so:
snmpwalk -v2c -c community 192.168.4.9 1.3.6.1.4.1.14823.2.3.3.1.2.1.1.11 2>/dev/null | wc -l
Also take care not to post potentially sensitive SNMP community values.
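If you just want the number regardless of how many error lines precede it, another option (a sketch, not from the original answers) is to merge both streams and keep only the last line; wc prints its count only after the walk finishes, so the count always arrives last:
{ snmpwalk -v2c -c community 192.168.4.9 1.3.6.1.4.1.14823.2.3.3.1.2.1.1.11 | wc -l; } 2>&1 | tail -n 1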

How to redirect xz's normal stdout when do tar | xz?

I need to use a compressor like xz to compress huge tar archives.
I am fully aware of previous questions like
Create a tar.xz in one command
and
Utilizing multi core for tar+gzip/bzip compression/decompression
From them, I have found that this command line mostly works:
tar -cvf - paths_to_archive | xz -1 -T0 -v > OUTPUT_FILE.tar.xz
I use the pipe solution because I absolutely must be able to pass options to xz. In particular, xz is very CPU intensive, so I must use -T0 to use all available cores. This is why I am not using other possibilities, like tar's --use-compress-program, or -J options.
Unfortunately, I really want to capture all of tar's and xz's log output (i.e. non-archive output) into a log file. In the example above, that log output is always generated by those -v options.
With the command line above, that log output is now printed on my terminal.
So, the problem is that when you use pipes to connect tar and xz as above, you cannot end the command line with something like
>Log_File 2>&1
because of that earlier
> OUTPUT_FILE.tar.xz
Is there a solution?
I tried wrapping in a subshell like this
(tar -cvf - paths_to_archive | xz -1 -T0 -v > OUTPUT_FILE.tar.xz) >Log_File 2>&1
but that did not work.
The normal stdout of tar is the tarball, and the normal stdout of xz is the compressed file. Neither of these is a log that you want to capture. All logging other than the output files themselves is written exclusively to stderr by both processes.
Consequently, you need only redirect stderr, and must not redirect stdout unless you want your output file mixed up with your logging.
{ tar -cvf - paths_to_archive | xz -1 -T0 -v > OUTPUT_FILE.tar.xz; } 2>Log_File
By the way -- if you're curious about why xz -v prints more content when its output goes to the TTY, the answer is in this line of message.c: The progress_automatic flag (telling xz to set a timer to trigger a SIGALRM -- which it treats as an indication that status should be printed -- every second) is only set when isatty(STDERR_FILENO) is true. Thus, after stderr has been redirected to a file, xz no longer prints this output at all; the problem is not that it isn't correctly redirected, but that it no longer exists.
You can, however, send SIGALRM to xz every second from your own code, if you're really so inclined:
{
xz -1 -T0 -v > OUTPUT_FILE.tar.xz < <(tar -cvf - paths_to_archive) & xz_pid=$!
while sleep 1; do
kill -ALRM "$xz_pid" || break
done
wait "$xz_pid"
} 2>Log_File
(Code that avoids rounding up the time needed for xz to execute to the nearest second is possible, but left as an exercise to the reader).
First, -cvf - can be replaced by cv.
But the normal stdout output of tar cvf - is the tar file, which is piped into xz. I'm not sure I completely understand; maybe one of these:
tar cv paths | xz -1 -T0 > OUTPUT.tar.xz 2> LOG.stderr
or
tar cv paths 2> LOG.stderr | xz -1 -T0 > OUTPUT.tar.xz
or
tar cv paths 2> LOG.tar.stderr | xz -1 -T0 > OUTPUT.tar.xz 2> LOG.xz.stderr
I'm not sure whether -T0 is implemented yet; which version of xz do you use? (Maybe https://github.com/vasi/pixz is worth a closer look.) The pv program, installed with sudo apt-get install pv on some systems, is better at showing progress for pipes than xz -v: it will tell you the progress as a percentage with an ETA:
size=$(du -bc paths | tail -n 1 | awk '{print $1}')
tar c paths 2> LOG.stderr | pv -s$size | xz -1 -T0 > OUTPUT.tar.xz

Grep files in between wget recursive downloads

I am trying to recursively download several files using wget -m, and I intend to grep all of the downloaded files to find specific text. Currently, I can wait for wget to fully complete, and then run grep. However, the wget process is time consuming as there are many files and instead I would like to show progress by grep-ing each file as it downloads and printing to stdout, all before the next file downloads.
Example:
download file1
grep file1 >> output.txt
download file2
grep file2 >> output.txt
...
Thanks for any advice on how this could be achieved.
As c4f4t0r pointed out,
wget -m -O - <websites> | grep --color 'pattern'
using grep's color function to highlight the patterns can be helpful, especially when dealing with bulky output in the terminal.
EDIT:
Below is a command line you can use. It creates a file called file, saves wget's output messages to it, and then tails that file, using awk to find any line containing "saved", extract the filename from it, and run grep for the pattern on that file:
wget -m websites &> file & tail -f -n1 file | awk -F "'|\`" '/saved/{system("grep --colour pattern " $2)}'
Based on Xorg's solution I was able to achieve my desired effect with some minor adjustments:
wget -m -O file.txt http://google.com 2> /dev/null & sleep 1 && tail -f -n1 file.txt | grep pattern
This will print out all lines that contain pattern to stdout, and wget itself will produce no output visible from the terminal. The sleep is included because otherwise file.txt would not be created by the time the tail command executed.
As a note, this command will miss any results that wget downloads within the first second.
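If you want to avoid that gap, one tweak (a sketch, not from the original answer) is to create the file up front and have tail start from the first line instead of the end, so nothing written before tail attaches is lost:
touch file.txt
wget -m -O file.txt http://google.com 2> /dev/null &
tail -F -n +1 file.txt | grep pattern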

Ambiguous Redirection on shell script

I was trying to create a little shell script that allowed me to check the transfer progress when copying large files from my laptop's hdd to an external drive.
From the command line this is a simple feat using pv and simple redirection, although the line is rather long and you must know the file size (which is why I wanted the script):
console: du filename (to get the exact file size)
console: cat filename | pv -s FILE_SIZE -e -r -p > dest_path/filename
In my shell script I added egrep "[0-9]{1,}" -o to strip the filename and keep just the size digits from the output of du; the rest should be straightforward.
#!/bin/bash
du $1 | egrep "[0-9]{1,}" -o
sudo cat $1 | pv -s $? -e -r -p > $2/$1
The problem is that when I try to copy file12345.mp3 using this, I get an ambiguous redirection error, because egrep also picks up the 12345 from the filename, but I just want the size.
This means the output of the first line is actually:
FILE_SIZE
12345
which breaks the script.
How should I modify this script so it parses only the leading digits, up to the first space?
Thanks in advance.
If I understand you correctly:
To retain only the filesize from the du command output:
du $1 | awk '{print $1}'
(assuming the 1st field is the size of the file)
Add double quotes to your redirection to avoid the error:
sudo cat $1 | pv -s $? -e -r -p > "$2/$1"
This quoting is done since your $2 contains spaces.
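Note that pv -s $? still will not do what you want: $? expands to the previous command's exit status, not its output. A minimal corrected sketch (assuming GNU du, since pv -s expects a byte count, hence -b):
#!/bin/bash
# size of the source file in bytes (pv -s expects bytes)
size=$(du -b "$1" | awk '{print $1}')
# copy with progress bar, rate and ETA; quote both paths in case they contain spaces
sudo cat "$1" | pv -s "$size" -e -r -p > "$2/$1"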

CURL Progress Bar: How to pipe and extract numbers only using grep?

This is what I have so far:
[my1#graf home]$ curl -# -o f1.flv 'http://osr.com/f1.flv' | grep -o '*[0-9]*'
####################################################################### 100.0%
I wish to use grep and only extract the percentage from that progress bar that CURL outputs.
I think my regex is not correct, and I am also not sure whether this grep will cope with the percentage being continuously updated.
What I am trying to do is basically get CURL only to give me the percentage number as the output and nothing else.
Thank you for any help.
With curl 7.36.0 (should also work for other versions) you can extract the percentage in the following way:
curl ... 2>&1 -# | stdbuf -oL tr '\r' '\n' | grep -o '[0-9]*\.[0-9]'
Here ... stands for options/filenames. This outputs a sequence of percentage numbers.
curl uses carriage returns (\r) in its progress output, so you need tr to first transform them into \n, because grep is line oriented. You also need to adjust output buffering with stdbuf to get the percentage numbers immediately after curl prints them.
You can't get the progress info like that through grep; it doesn't make sense.
curl writes the progress bar to stderr, so you have to redirect to stdout before you can grep it:
$ curl -# -o f1.flv 'http://osr.com/f1.flv' 2>&1 | grep 1 | less
results in:
^M 0.0
%^M######################################################################## 100.
0%^M######################################################################## 100
.0%^M######################################################################## 10
0.0%
Are you expecting a continual stream of numbers that you are redirecting somewhere else? Or do you expect to grab the numbers at a single point?
If it's the former, this sort of half-assedly works on a small file:
$ curl -# -o f1.flv 'http://osr.com/f1.flv' 2>&1 | sed 's/#//g' -
100.0% 0.0%
But it's useless on a large file. The output doesn't print until the download is finished, probably because curl seems to be sending ^H's to the terminal. There might be a better way to sed it, but I wouldn't hold my breath.
$ curl -# -o l.tbz 'ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2009/06/2009-06-02-05-mozilla-1.9.1/firefox-3.5pre.en-US.linux-x86_64.tar.bz2' 2>&1 | sed 's/#//g' -
100.0%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Try this:
curl source -o dest -# 2> tmp&
grep -o ".....%" tmp | tail -n1
You need to use .* not * in your regex.
grep -o '.*[0-9].*'
That will catch all text though, so maybe try:
grep -oE '[0-9.]+' (-E so that + means "one or more", -o to print only the match)
