Curl "write out" value of specific header - linux

I am currently writing a bash script and I'm using curl. What I want to do is get one specific header of a response.
Basically I want this command to work:
curl -I -w "%{etag}" "server/some/resource"
Unfortunately it seems as if the -w, --write-out option only supports a fixed set of variables and cannot print an arbitrary header from the response. Do I need to parse the curl output myself to get the ETag value, or is there a way to make curl print the value of a specific header?
Obviously something like
curl -sSI "server/some/resource" | grep 'ETag:' | sed -r 's/.*"(.*)".*/\1/'
does the trick, but it would be nicer to have curl filter the header.

The variables specified for -w are not directly connected to the HTTP response headers.
So it looks like you have to parse the output on your own:
curl -I "server/some/resource" | grep -Fi etag

You can print a specific header with a single sed or awk command, but keep in mind that HTTP headers use CRLF line endings, so the trailing carriage return has to be stripped first.
curl -sI stackoverflow.com | tr -d '\r' | sed -En 's/^Content-Type: (.*)/\1/p'
With awk you can set FS=": " so that values containing spaces are not cut off:
awk 'BEGIN {FS=": "}/^Content-Type/{print $2}'
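If your curl is recent enough (7.84.0 or later), -w also understands a %header{name} variable, so something like curl -sI -o /dev/null -w '%header{etag}' "server/some/resource" may answer the question directly. Otherwise, a small helper keeps the CRLF stripping and a case-insensitive match in one place. This is only a sketch, and get_header is just a name introduced here:
get_header() {
    # usage: get_header URL HeaderName, e.g. get_header "server/some/resource" ETag
    # -sI does a silent HEAD request; tr strips the CR of the CRLF line endings
    curl -sI "$1" | tr -d '\r' | awk -v name="$2" '
        BEGIN { name = tolower(name) ":" }
        tolower($1) == name { $1 = ""; sub(/^ /, ""); print }'
}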

The other answers use the -I option and parse the output. It's worth noting that -I changes the HTTP method to HEAD. (The long opt version of -I is --head). Depending on the field you're after and the behaviour of the web server, this may be a distinction without a difference. Headers like Content-Length may be different between HEAD and GET. Use the -X option to force the desired HTTP method and still only see the headers as the response.
curl -sI http://ifconfig.co/json | awk -v FS=": " '/^Content-Length/{print $2}'
18
curl -X GET -sI http://ifconfig.co/json | awk -v FS=": " '/^Content-Length/{print $2}'
302
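One more option, if it helps: instead of overriding the method string, let curl perform a plain GET, throw the body away with -o /dev/null, and dump only the received headers to stdout with -D - (--dump-header). A sketch along the lines of the examples above:
curl -s -o /dev/null -D - http://ifconfig.co/json | awk -v FS=": " '/^Content-Length/{print $2}'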

Related

Download latest version of Slack automatically

I have a script that downloads Slack with the wget command. Since the script runs every time a computer is configured, I always need to download the latest version of Slack.
I work on Debian 9.
This is what I'm doing right now:
wget https://downloads.slack-edge.com/linux_releases/slack-desktop-3.3.7-amd64.deb
and I tried this:
curl -s https://slack.com/intl/es/release-notes/linux | grep "<h2>Slack" | head -1 | sed 's/[<h2>/]//g' | sed 's/[a-z A-Z]//g' | sed "s/ //g"
which returns: 3.3.7
I want to plug that into: wget https://downloads.slack-edge.com/linux_releases/slack-desktop-$curl-amd64.deb
but it's not working.
Do you know why this doesn't work?
Your script produces a long string with a lot of leading whitespace.
bash$ curl -s https://slack.com/intl/es/release-notes/linux |
> grep "<h2>Slack" | head -1 |
> sed 's/[<h2>/]//g' | sed 's/[a-z A-Z]//g' | sed "s/ //g"
3.3.7
You want the string without spaces, and the fugly long pipeline can be simplified significantly.
bash$ curl -s https://slack.com/intl/es/release-notes/linux |
> sed -n "/^.*<h2>Slack /{;s///;s/[^0-9.].*//p;q;}"
3.3.7
Notice also that the character class [<h2>/] doesn't mean at all what you think. It matches a single character which is < or h or 2 or > or / regardless of context. So for example, if the current version number were to contain the digit 2, you would zap that too.
Scraping like this is very brittle, though. I notice that if I change the /es/ in the URL to /en/ I get no output at all. Perhaps you can find a better way to obtain the newest version (using apt should allow you to install the newest version without any scripting on your side).
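To actually splice the version into your wget URL you need command substitution; on its own, $curl is just an (presumably unset) variable. A minimal sketch, assuming the release-notes page keeps its current format, reusing the sed command above:
ver=$(curl -s https://slack.com/intl/es/release-notes/linux |
      sed -n "/^.*<h2>Slack /{;s///;s/[^0-9.].*//p;q;}")
wget "https://downloads.slack-edge.com/linux_releases/slack-desktop-${ver}-amd64.deb"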
echo wget "https://downloads.slack-edge.com/linux_releases/slack-desktop-$(curl -s "https://slack.com/intl/es/release-notes/linux" | xmllint --html --xpath '//h2' - 2>/dev/null | head -n1 | sed 's/<h2>//;s#</h2>##;s/Slack //')-amd64.deb"
will output:
wget https://downloads.slack-edge.com/linux_releases/slack-desktop-3.3.7-amd64.deb
I used xmllint to parse the HTML and extract the first part between <h2> tags. Then some cleanup with sed and I get the newest version.
Edit:
Noticing that you could just grep the <h2> from the site to get the version, you can get it with just:
curl -s "https://slack.com/intl/es/release-notes/linux" | grep -m1 "<h2>" | cut -d' ' -f2 | cut -d'<' -f1

Split Header and Content in curl

curl -L -i google.com
I want to split the HEADER and CONTENT from the response into two variables.
curl -I google.com
curl -L google.com
I can't use these two because I'm going to use it with 10000+ links.
Both the header and the content can contain three or more blank lines, so splitting on blank lines won't work every time.
I found the answer: -s keeps curl quiet, -L follows redirects, and -D h dumps the received headers into the file h while the body goes to stdout, so:
b=$(curl -LsD h google.com)
h=$(<h)
echo "$h$b"
This code works too. The awk script inspects the first line after every blank line: if it looks like an HTTP status line, the block that follows is written to the file header, otherwise to body (so redirect chains that produce several header blocks are handled):
curl -sLi google.com |
awk -v bl=1 'bl{bl=0; h=($0 ~ /HTTP\/1/)} /^\r?$/{bl=1} {print $0>(h?"header":"body")}'
header=$(<header)
body=$(<body)

Piped Arguments: echo "value1 value2" | command $1 $2

I have used sed to grab two interesting values. Now I want to send those two values as parameters to curl. I have been able to pipe sed output to curl with a single argument using xargs; however, I am unable to use two arguments for one command.
echo "value1" "value2" | curl --data 'valA=$1&valB=$2' http://example.com
I am stuck in both theory and practice. I wasn't planning to use bash scripting.
[ I am running tshark, piping output to sed, and hoping to pipe that output to curl so as to record data in a remote DB. ]
As long as you're not expecting to use valA and valB in future commands, you can use read to store the whitespace-delimited output from your existing commands:
$ echo foo bar | { read var1 var2 ; echo $var1 $var2 ; }
foo bar
Which means you can do:
echo "value1" "value2" | { read a b ; curl --data 'valA=$a&valB=$b' http://example.com ; }
Assuming you can get sed to output something tab-separated (foo\tbar):
tshark | sed ... | parallel --colsep '\t' -q curl --data 'valA={1}&valB={2}' http://example.com
You can find more about GNU Parallel at:
http://www.gnu.org/s/parallel/
Watch the intro video on
http://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
Walk through the tutorial (man parallel_tutorial). Your command line will love you for it.
Backticks sound like the best bet, if you can't run the sed twice (e.g. you're doing this on a live tshark stream):
curl --data $(printf "valA=%s&valB=%s" `tshark --whatever | sed -e whatever`)

wget spider returns all URLs twice -- where is the bug?

I was looking for a script to create a URL list for a sitemap and found this one:
wget --spider --force-html -r -l1 http://sld.tld 2>&1 \
| grep '^--' | awk '{ print $3 }' \
| grep -v '\.\(css\|js\|png\|gif\|jpg\|ico\|txt\)$' \
> urllist.txt
The result is:
http://sld.tld/
http://sld.tld/
http://sld.tld/home.html
http://sld.tld/home.html
http://sld.tld/news.html
http://sld.tld/news.html
...
Every URL entry is saved twice. How should the script be changed to fix this?
If you look at the output of wget when you use the --spider flag, it'll look something like:
Spider mode enabled. Check if remote file exists.
--2013-04-12 22:01:03-- http://www.google.com/intl/en/about/products/
Connecting to www.google.com|173.194.75.103|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Remote file exists and could contain links to other resources -- retrieving.
--2013-04-12 22:01:03-- http://www.google.com/intl/en/about/products/
Reusing existing connection to www.google.com:80.
HTTP request sent, awaiting response... 200 OK
It checks if the link is there (thus prints out a --), then it has to download it to look for additional links (thus the second --). This is why it shows up (at least twice) when you use --spider.
Compare that to without --spider:
Location: http://www.google.com/intl/en/about/products/ [following]
--2013-04-12 22:00:49-- http://www.google.com/intl/en/about/products/
Reusing existing connection to www.google.com:80.
So you only get one line that starts with --.
You can remove the --spider flag but you could still get duplicates. If you really don't want any duplicates, add a | sort | uniq to your command:
wget --spider --force-html -r -l1 http://sld.tld 2>&1 \
| grep '^--' | awk '{ print $3 }' \
| grep -v '\.\(css\|js\|png\|gif\|jpg\|ico\|txt\)$' \
| sort | uniq > urllist.txt
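Note that sort | uniq changes the order of the crawl; if you would rather keep the original order while deduplicating, a common awk idiom (a sketch, not part of the original answer) is:
wget --spider --force-html -r -l1 http://sld.tld 2>&1 \
| grep '^--' | awk '{ print $3 }' \
| grep -v '\.\(css\|js\|png\|gif\|jpg\|ico\|txt\)$' \
| awk '!seen[$0]++' > urllist.txt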

CURL Progress Bar: How to pipe and extract numbers only using grep?

This is what I have so far:
[my1#graf home]$ curl -# -o f1.flv 'http://osr.com/f1.flv' | grep -o '*[0-9]*'
####################################################################### 100.0%
I wish to use grep and only extract the percentage from that progress bar that CURL outputs.
I think my regex is not correct, and I am also not sure whether this grep will cope with the percentage being continuously updated.
What I am trying to do is basically get CURL only to give me the percentage number as the output and nothing else.
Thank you for any help.
With curl 7.36.0 (should also work for other versions) you can extract the percentage in the following way:
curl ... 2>&1 -# | stdbuf -oL tr '\r' '\n' | grep -o '[0-9]*\.[0-9]'
Here ... stands for options/filenames. This outputs a sequence of percentage numbers.
Curl uses carriage returns \r in its output, so you need tr to transform them first into \n because grep is line oriented. You also need to modify output buffer settings with stdbuf to get the percentage numbers immediately after curl outputs them.
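As a hedged usage sketch, the stream of numbers can then be consumed line by line (adding --line-buffered to grep so matches are not held back when grep writes into a pipe):
curl -# -o f1.flv 'http://osr.com/f1.flv' 2>&1 |
stdbuf -oL tr '\r' '\n' |
grep --line-buffered -o '[0-9]*\.[0-9]' |
while read -r pct; do
    printf 'progress: %s%%\n' "$pct"
done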
You can't get the progress info like that through grep; it doesn't make sense.
curl writes the progress bar to stderr, so you have to redirect to stdout before you can grep it:
$ curl -# -o f1.flv 'http://osr.com/f1.flv' 2>&1 | grep 1 | less
results in:
^M 0.0
%^M######################################################################## 100.
0%^M######################################################################## 100
.0%^M######################################################################## 10
0.0%
Are you expecting a continual stream of numbers that you are redirecting somewhere else? Or do you expect to grab the numbers at a single point?
If it's the former, this sort of half-assedly works on a small file:
$ curl -# -o f1.flv 'http://osr.com/f1.flv' 2>&1 | sed 's/#//g' -
100.0% 0.0%
But it's useless on a large file. The output doesn't print until the download is finished, probably because curl seems to be sending ^H's to the terminal. There might be a better way to sed it, but I wouldn't hold my breath.
$ curl -# -o l.tbz 'ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2009/06/2009-06-02-05-mozilla-1.9.1/firefox-3.5pre.en-US.linux-x86_64.tar.bz2' 2>&1 | sed 's/#//g' -
100.0%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Try this:
curl source -o dest -# 2> tmp&
grep -o ".....%" tmp | tail -n1
You need to use .* not * in your regex.
grep -o '.*[0-9].*'
That will catch all text though, so maybe try:
grep -Eo '[0-9]+'
