CURL Progress Bar: How to pipe and extract numbers only using grep? - linux

This is what I have so far:
[my1#graf home]$ curl -# -o f1.flv 'http://osr.com/f1.flv' | grep -o '*[0-9]*'
####################################################################### 100.0%
I wish to use grep and only extract the percentage from that progress bar that CURL outputs.
I think my regex is not correct, and I am also not sure whether this grep will cope with the percentage being continuously updated.
What I am trying to do is basically get CURL only to give me the percentage number as the output and nothing else.
Thank you for any help.

With curl 7.36.0 (should also work for other versions) you can extract the percentage in the following way:
curl ... 2>&1 -# | stdbuf -oL tr '\r' '\n' | grep -o '[0-9]*\.[0-9]'
Here ... stands for options/filenames. This outputs a sequence of percentage numbers.
Curl uses carriage returns (\r) in its output, so you need tr to transform them into \n first, because grep is line-oriented. You also need to adjust the output buffering with stdbuf to get the percentage numbers immediately after curl outputs them.
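For example, a minimal sketch that hands each percentage to a handler as it arrives (the URL, filename, and the printf handler are placeholders):
curl -# -o f1.flv 'http://osr.com/f1.flv' 2>&1 \
  | stdbuf -oL tr '\r' '\n' \
  | grep --line-buffered -o '[0-9]*\.[0-9]' \
  | while read -r pct; do
      printf 'progress: %s%%\n' "$pct"   # replace with your own handler
    done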

You can't get the progress info like that through grep; it doesn't make sense.
curl writes the progress bar to stderr, so you have to redirect to stdout before you can grep it:
$ curl -# -o f1.flv 'http://osr.com/f1.flv' 2>&1 | grep 1 | less
results in:
^M 0.0
%^M######################################################################## 100.
0%^M######################################################################## 100
.0%^M######################################################################## 10
0.0%
Are you expecting a continual stream of numbers that you are redirecting somewhere else? Or do you expect to grab the numbers at a single point?
If it's the former, this sort of half-assedly works on a small file:
$ curl -# -o f1.flv 'http://osr.com/f1.flv' 2>&1 | sed 's/#//g' -
100.0% 0.0%
But it's useless on a large file. The output doesn't print until the download is finished, probably because curl seems to be sending ^H's to the terminal. There might be a better way to sed it, but I wouldn't hold my breath.
$ curl -# -o l.tbz 'ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2009/06/2009-06-02-05-mozilla-1.9.1/firefox-3.5pre.en-US.linux-x86_64.tar.bz2' 2>&1 | sed 's/#//g' -
100.0%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Try this:
curl source -o dest -# 2> tmp&
grep -o ".....%" tmp | tail -n1

You need to use .* not * in your regex.
grep -o '.*[0-9].*'
That will catch all the surrounding text too, though, so maybe try Perl syntax instead (plain grep treats + literally, so -P is needed for it, and -o prints only the match):
grep -oP '[0-9]+'
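For example, with GNU grep, extending the pattern grabs the decimal form of the percentage:
$ printf '#### 42.0%%\n' | grep -oP '[0-9]+\.[0-9]'
42.0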

Related

Problems with tail -f and awk? [duplicate]

Is it possible to use grep on a continuous stream?
What I mean is sort of a tail -f <file> command, but with grep on the output in order to keep only the lines that interest me.
I've tried tail -f <file> | grep pattern but it seems that grep can only be executed once tail finishes, that is to say never.
Turn on grep's line buffering mode when using BSD grep (FreeBSD, Mac OS X etc.)
tail -f file | grep --line-buffered my_pattern
It looks like a while ago --line-buffered didn't matter for GNU grep (used on pretty much any Linux) as it flushed by default (YMMV for other Unix-likes such as SmartOS, AIX or QNX). However, as of November 2020, --line-buffered is needed (at least with GNU grep 3.5 in openSUSE, but it seems generally needed based on comments below).
I use the tail -f <file> | grep <pattern> all the time.
It will wait till grep flushes, not till it finishes (I'm using Ubuntu).
I think that your problem is that grep uses some output buffering. Try
tail -f file | stdbuf -o0 grep my_pattern
This sets grep's output buffering mode to unbuffered.
If you want to find matches in the entire file (not just the tail), and you want it to sit and wait for any new matches, this works nicely:
tail -c +0 -f <file> | grep --line-buffered <pattern>
The -c +0 flag says that the output should start 0 bytes (-c) from the beginning (+) of the file.
In most cases, you can tail -f /var/log/some.log |grep foo and it will work just fine.
If you need to use multiple greps on a running log file and you find that you get no output, you may need to stick the --line-buffered switch into your middle grep(s), like so:
tail -f /var/log/some.log | grep --line-buffered foo | grep bar
You may consider this answer an enhancement. Usually I am using
tail -F <fileName> | grep --line-buffered <pattern> -A 3 -B 5
-F is better in case of file rotation (-f will not follow properly once the file has been rotated).
-A and -B are useful for getting the lines just before and after each pattern occurrence; these blocks appear between dashed-line separators.
But for me, I prefer doing the following:
tail -F <file> | less
This is very useful if you want to search inside streamed logs, going back and forward to look deeply.
Didn't see anyone offer my usual go-to for this:
less +F <file>
ctrl + c
/<search term>
<enter>
shift + f
I prefer this, because you can use ctrl + c to stop and navigate through the file whenever, and then just hit shift + f to return to the live, streaming search.
sed would be a better choice (stream editor)
tail -n0 -f <file> | sed -n '/search string/p'
and then if you wanted the tail command to exit once you found a particular string:
tail --pid=$(($BASHPID+1)) -n0 -f <file> | sed -n '/search string/{p; q}'
Obviously a bashism: $BASHPID will be the process id of the tail command. The sed command is next after tail in the pipe, so the sed process id will be $BASHPID+1.
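If you only need the first match, a hedged alternative with GNU grep is -m1, which makes grep exit after one matching line; note that tail itself only notices the broken pipe and exits the next time it writes:
tail -n0 -f <file> | grep --line-buffered -m1 'search string'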
Yes, this will actually work just fine. Grep and most Unix commands operate on streams one line at a time. Each line that comes out of tail will be analyzed and passed on if it matches.
This one command works for me (Suse):
mail-srv:/var/log # tail -f /var/log/mail.info |grep --line-buffered LOGIN >> logins_to_mail
collecting logins to mail service
Coming somewhat late to this question, and considering this kind of work an important part of any monitoring job, here is my (not so short) answer...
Following logs using bash
1. Command tail
This command is a little more powerful than the already-published answers suggest.
Difference between follow option tail -f and tail -F, from manpage:
-f, --follow[={name|descriptor}]
output appended data as the file grows;
...
-F same as --follow=name --retry
...
--retry
keep trying to open a file if it is inaccessible
This means: by using -F instead of -f, tail will re-open the file(s) when they are removed (on log rotation, for example).
This is useful for watching log files over many days.
The ability to follow more than one file simultaneously
I've already used:
tail -F /var/www/clients/client*/web*/log/{error,access}.log /var/log/{mail,auth}.log \
/var/log/apache2/{,ssl_,other_vhosts_}access.log \
/var/log/pure-ftpd/transfer.log
for following events through hundreds of files... (consider the rest of this answer to see how to make that readable... ;)
Using the -n switch (don't use -c, which counts bytes, not lines!). By default tail will show the last 10 lines. This can be tuned:
tail -n 0 -F file
This will follow the file, but only new lines will be printed.
tail -n +0 -F file
This will print the whole file before following its growth.
2. Buffer issues when piping:
If you plan to filter the output, consider buffering! See the -u option for sed, --line-buffered for grep, or the stdbuf command:
tail -F /some/files | sed -une '/Regular Expression/p'
is (besides being a lot more efficient than using grep) a lot more responsive than it would be without the -u switch on the sed command.
tail -F /some/files |
sed -une '/Regular Expression/p' |
stdbuf -i0 -o0 tee /some/resultfile
3. Recent journaling system
On recent systems, instead of tail -f /var/log/syslog you would run journalctl -xf, in much the same way...
journalctl -axf | sed -une '/Regular Expression/p'
But read the man page: this tool was built for log analysis!
4. Integrating this in a bash script
Colored output of two files (or more)
Here is a sample script that watches many files, coloring output from the first file differently from the others:
#!/bin/bash
tail -F "$#" |
sed -une "
/^==> /{h;};
//!{
G;
s/^\\(.*\\)\\n==>.*${1//\//\\\/}.*<==/\\o33[47m\\1\\o33[0m/;
s/^\\(.*\\)\\n==> .* <==/\\o33[47;31m\\1\\o33[0m/;
p;}"
This works fine on my host, running:
sudo ./myColoredTail /var/log/{kern.,sys}log
Interactive script
You may be watching logs for reacting on events?
Here is a little script that plays a sound when a USB device appears or disappears, but the same script could send mail, or do any other interaction, like powering on the coffee machine...
#!/bin/bash
exec {tailF}< <(tail -F /var/log/kern.log)   # follow the log on a new fd
tailPid=$!                                   # PID of the process substitution
while :; do
    read -rsn 1 -t .3 keyboard               # poll the keyboard for 0.3s
    [ "${keyboard,}" = "q" ] && break        # q or Q quits
    if read -ru $tailF -t 0 _ ; then         # is a log line waiting on the fd?
        read -ru $tailF line
        case $line in
            *New\ USB\ device\ found* ) play /some/sound.ogg ;;
            *USB\ disconnect* ) play /some/othersound.ogg ;;
        esac
        printf "\r%s\e[K" "$line"            # show it, clearing to end of line
    fi
done
echo
exec {tailF}<&-                              # close the descriptor
kill $tailPid
You can quit by pressing the Q key.
you certainly won't succeed with
tail -f /var/log/foo.log |grep --line-buffered string2search
when you use "colortail" as an alias for tail, eg. in bash
alias tail='colortail -n 30'
you can check by
type tail
if this outputs something like
tail is an alias of colortail -n 30.
then you have your culprit :)
Solution:
remove the alias with
unalias tail
ensure that you're using the 'real' tail binary by this command
type tail
which should output something like:
tail is /usr/bin/tail
and then you can run your command
tail -f foo.log |grep --line-buffered something
Good luck.
Use awk (another great utility) instead of grep when you don't have the line-buffered option! It will continuously stream your data from tail.
This is how you would use grep:
tail -f <file> | grep pattern
This is how you would use awk
tail -f <file> | awk '/pattern/{print $0}'
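One caveat: awk can have the same problem one step later, because its own output may be block-buffered when it goes into another pipe rather than the terminal. A hedged sketch, with fflush() forcing a flush after each match (your-next-command stands in for whatever consumes the matches):
tail -f <file> | awk '/pattern/ { print; fflush() }' | your-next-command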

Grep files in between wget recursive downloads

I am trying to recursively download several files using wget -m, and I intend to grep all of the downloaded files to find specific text. Currently, I can wait for wget to fully complete, and then run grep. However, the wget process is time consuming as there are many files and instead I would like to show progress by grep-ing each file as it downloads and printing to stdout, all before the next file downloads.
Example:
download file1
grep file1 >> output.txt
download file2
grep file2 >> output.txt
...
Thanks for any advice on how this could be achieved.
As c4f4t0r pointed out
wget -m -O - <websites> | grep --color 'pattern'
using grep's color function to highlight the patterns may seem helpful especially when dealing with bulky data output to terminal.
EDIT:
Below is a command line you can use. It creates a file called file and saves wget's output messages to it. Afterwards it tails the message file, using awk to find any line containing "saved" and extract the filename, then greps for the pattern in that file.
wget -m websites &> file & tail -f -n1 file|awk -F "\'|\`" '/saved/{system( ("grep --colour pattern ") $2)}'
Based on Xorg's solution I was able to achieve my desired effect with some minor adjustments:
wget -m -O file.txt http://google.com 2> /dev/null & sleep 1 && tail -f -n1 file.txt | grep pattern
This will print out all lines that contain pattern to stdout, and wget itself will produce no output visible from the terminal. The sleep is included because otherwise file.txt would not be created by the time the tail command executed.
As a note, this command will miss any results that wget downloads within the first second.
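A hedged variant of the same idea that needs no sleep and misses nothing: create file.txt up front and read it from the very beginning (tail -c +0, as in the streaming-grep answers above):
touch file.txt
wget -m -O file.txt http://google.com 2> /dev/null &
tail -c +0 -f file.txt | grep --line-buffered pattern
As with the original command, tail keeps following until you interrupt it.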

How do I grep multiple lines (output from another command) at the same time?

I have a Linux driver running in the background that is able to return the current system data/stats. I view the data by running a console utility (let's call it dump-data) in a console. All data is dumped every time I run dump-data. The output of the utility is like below
Output:
- A=reading1
- B=reading2
- C=reading3
- D=reading4
- E=reading5
...
- variableX=readingX
...
The list of readings returned by the utility can be really long. Depending on the scenario, certain readings would be useful while everything else would be useless.
I need a way to grep only the useful readings, whose names might have nothing in common (via a bash script). I.e. sometimes I'll need to collect A,D,E; and other times I'll need C,D,E.
I'm attempting to graph the readings over time to look for trends, so I can't run something like this:
# forgive my pseudocode
Loop
dump-data | grep A
dump-data | grep D
dump-data | grep E
End Loop
to collect A,D,E, as that would actually give me readings from 3 separate calls of dump-data, which would not be accurate.
If you want to save all result of grep in the same file, you can just join all expressions in one:
grep -E 'expr1|expr2|expr3'
But if you want to have results (for expr1, expr2 and expr3) in separate files, things are getting more interesting.
You can do this using tee >(command).
For example, here I process the same pipe with three different commands:
$ echo abc | tee >(sed s/a/_a_/ > file1) | tee >(sed s/b/_b_/ > file2) | sed s/c/_c_/ > file3
$ grep "" file[123]
file1:_a_bc
file2:a_b_c
file3:ab_c_
But the command seems to be too complex.
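Applied to the dump-data output format above, though, a minimal sketch (the A.log, D.log, and E.log names are made up):
dump-data | tee >(grep '^- A=' > A.log) >(grep '^- D=' > D.log) | grep '^- E=' > E.log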
I would rather save the dump-data results to a file and then grep it.
TEMP=$(mktemp /tmp/dump-data-XXXXXXXX)
dump-data > ${TEMP}
grep A ${TEMP}
grep B ${TEMP}
grep C ${TEMP}
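Since the readings are meant to be graphed over time, a hedged sketch of a full polling loop built on the same idea (readings.log and the 60-second interval are arbitrary choices):
while :; do
    TEMP=$(mktemp /tmp/dump-data-XXXXXXXX)
    dump-data > "${TEMP}"
    grep -E 'A|D|E' "${TEMP}" >> readings.log   # one consistent snapshot per iteration
    rm -f "${TEMP}"
    sleep 60
done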
You can use dump-data | grep -E "A|D|E". Note the -E option of grep. Alternatively you could use egrep without the -E option.
you can simply use:
dump-data | grep -E 'A|D|E'
awk '/MY PATTERN/{print > "matches-"FILENAME;}' myfile{1,3}
This writes each matching line into a separate per-input file (matches-myfile1, matches-myfile3). Thanks to Guru at Stack Exchange.

Linux: how to use tee in a piped command

time curl http://www.google.com | tee | wc | gzip > google.gz
Why doesn't this command work? It creates the file, and times the operation, but does not print the number of lines, words, and characters (wc).
time curl http://www.google.com | tee | wc
This will print the words, characters, and lines, but obviously the tee portion is pointless.
Is it because I'm sending the word count of the url to google.gz?
I have to use tee, gzip, time, and curl to download the Google web page to a gzipped file, print the word count, and show how long it took.
It is an assignment, so I'm not looking for someone to do it for me. I'm just having a problem in that I can't tee to a utility, and I can't tee and gzip at the same time.
Maybe there is a way to use gzip with curl?
Well, wc outputs the number of characters, words and lines, but then you send it to gzip which compresses it. Eventually, compressed information ends up in google.gz. If you decompress the file, e.g. with
gunzip google.gz
you'll see the three numbers.
Also, normally when one uses tee, they specify a file where the tee'ed data is supposed to be stored.
time curl http://www.google.com | tee /dev/tty | gzip > google.gz
I'm going to guess that something like this is what you want:
time curl http://www.google.com | tee /tmp/z | gzip > google.gz; wc /tmp/z; rm /tmp/z
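If your shell is bash, a hedged alternative that avoids the temporary file is to tee into a process substitution, so wc prints its counts to the terminal while gzip gets the same bytes:
time curl http://www.google.com | tee >(wc) | gzip > google.gz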

How to grep download speed from wget output?

I need to download several files with wget and measure download speed.
e.g. I download with
wget -O /dev/null http://ftp.bit.nl/pub/OpenBSD/4.7/i386/floppy47.fs http://ftp.bit.nl/pub/OpenBSD/4.7/i386/floppyB47.fs
and the output is
--2010-10-11 18:56:00-- http://ftp.bit.nl/pub/OpenBSD/4.7/i386/floppy47.fs
Resolving ftp.bit.nl... 213.136.12.213, 2001:7b8:3:37:20e:cff:fe4d:69ac
Connecting to ftp.bit.nl|213.136.12.213|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1474560 (1.4M) [text/plain]
Saving to: `/dev/null'
100%[==============================================================>] 1,474,560 481K/s in 3.0s
2010-10-11 18:56:03 (481 KB/s) - `/dev/null' saved [1474560/1474560]
--2010-10-11 18:56:03-- http://ftp.bit.nl/pub/OpenBSD/4.7/i386/floppyB47.fs
Reusing existing connection to ftp.bit.nl:80.
HTTP request sent, awaiting response... 200 OK
Length: 1474560 (1.4M) [text/plain]
Saving to: `/dev/null'
100%[==============================================================>] 1,474,560 499K/s in 2.9s
2010-10-11 18:56:06 (499 KB/s) - `/dev/null' saved [1474560/1474560]
FINISHED --2010-10-11 18:56:06--
Downloaded: 2 files, 2.8M in 5.9s (490 KB/s)
I need to grep the total download speed, that is, the string 490 KB/s.
How do I do this?
P.S. We may need to account for the case where we actually download only one file, so there won't be a final line starting with FINISHED.
Update, a grep-style version using sed:
wget ... 2>&1 | sed -n '$,$s/.*(\(.*\)).*/\1/p'
Old version:
I thought it's easier to divide the file size by the download time after the download. ;-)
(/usr/bin/time -p wget ... 2>&1 >/dev/null; ls -l newfile) | \
awk '
NR==1 {t=$2};
NR==4 {printf("rate=%f bytes/second\n", $5/t)}
'
The first awk line stores the elapsed real time from the "real xx.xx" line in variable t. The second awk line divides the file size (column 5 of ls -l) by the time and outputs this as the rate.
This worked for me, using your wget -O /dev/null <resource>
The regex I used was \([0-9.]\+ [KM]B/s\)
But note I had to redirect stderr onto stdout so the command was:
wget -O /dev/null http://example.com/index.html 2>&1 | grep '\([0-9.]\+ [KM]B/s\)'
This allows things like 923 KB/s and 1.4 MB/s
grep just finds matches. To get the value(s) you can use sed instead:
wget -O /dev/null http://example.com/index.html 2>&1 |
sed -e 's|^.*(\([0-9.]\+ [KM]B/s\)).*$|\1|'
This works when only 1 file is being downloaded.
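A hedged variant that handles both cases by taking the last parenthesised rate in the output, which is the per-file rate for a single download and the FINISHED total when there are several:
wget -O /dev/null ... 2>&1 | grep -o '([0-9.,]\+ [KM]B/s)' | tail -n1 | tr -d '()'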
I started using sed to get the speed from wget, but I found it irritating so I switched to grep.
This is my command:
wget ... 2>&1 | grep -o "[0-9.]\+ [KM]*B/s"
The -o option means it only returns the matching part. The pattern matches one or more digits or dots, then a space, then an optional K or M before B/s.
That will return 423 KB/s (for example).
To grep for just the units, use grep -o "[KM]*B/s", and for just the number use grep -o "[0-9.]\+".
For example, get speed in MBit per second (by adding --report-speed=bits for wget, and small change grep pattern):
wget -O /dev/null --report-speed=bits http://www.ovh.net/files/10Mb.dat 2>&1 | grep -o "[0-9.,]\+ [KM]*[Bb]/s"
output:
1,51 Mb/s
Why can't you just do this:
perl -ne "/^Downloaded.*?\((.*?)\)/; print $1"
Here's a suggestion: make use of wget's --limit-rate=amount option. For example,
--limit-rate=400k will limit the retrieval rate to 400 KB/s. Then it's easier for you to
calculate the total speed. Saves you time and mental anguish trying to regex it.
