Wait until curl command has finished - linux

I'm using curl to grab a list of subscribers. Once this has been downloaded the rest of my script will process the file.
How could I make the script wait until the file has been downloaded and error if it failed?
curl "http://mydomain/api/v1/subscribers" -u
'user:pass' | json_pp >>
new.json
Thanks

As noted in the comments, curl will not return until the request has completed (or failed). I suspect you are looking for a way to identify errors in the curl step, which are currently getting lost. Consider the following:
If you just need the error status, you can use the bash pipefail option (set -o pipefail). This will allow you to check for a failure in curl:
set -o pipefail
if curl ... | json_pp >> new.json ; then
    # All good
else
    # Something wrong.
fi
Also, you might want to save the "raw" response before trying to pretty-print it, either by using a temporary file or by using tee:
set -o pipefail
if curl ... | tee raw.json | json_pp >> new.json ; then
    # All good
else
    # Something wrong - look into raw.json
fi

Related

The script sometimes doesn't run after wget

The script sometimes doesn't run after wget. Perhaps it is necessary to wait for the completion of wget?
#!/usr/bin/env bash
set -Eeuo pipefail
# Installing tor-browser
echo -en "\033[1;33m Installing tor-browser... \033[0m \n"
URL='https://tor.eff.org/download/' # Official mirror https://www.torproject.org/download/, may be blocked
LINK=$(wget -qO- $URL | grep -oP -m 1 'href="\K/dist.+?ALL.tar.xz')
URL='https://tor.eff.org'${LINK}
curl --location $URL | tar xJ --extract --verbose --preserve-permissions
sudo mv tor-browser /opt
sudo chown -R $USER /opt/tor-browser
cd /opt/tor-browser
./start-tor-browser.desktop --register-app
There are pitfalls associated with set -e (aka set -o errexit). See BashFAQ/105 (Why doesn't set -e (or set -o errexit, or trap ERR) do what I expected?).
If you decide to use set -e despite the problems then it's a very good idea to set up an ERR trap to show what has happened, and use set -E (aka set -o errtrace) so it fires in functions and subshells etc. A basic ERR trap can be set up with
trap 'echo "ERROR: ERR trap: line $LINENO" >&2' ERR
This will prevent the classic set -e problem: the program stops suddenly, at an unknown place, and for no obvious reason.
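If you also want to see which command failed, the trap can include the standard bash variable BASH_COMMAND, for example:
trap 'echo "ERROR: line $LINENO: command \"$BASH_COMMAND\" failed" >&2' ERR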
Under set -e, the script stops on any error.
set -Eeuo pipefail
#     ^ the "e" (errexit)
Maybe the site is sometimes unavailable, or the fetched page doesn't match the expression grep is searching for.
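One way to make such failures visible, instead of relying on set -e alone, is to check the result of the extraction step explicitly. A minimal sketch, assuming the same URL and grep pattern as the script above:
LINK=$(wget -qO- "$URL" | grep -oP -m 1 'href="\K/dist.+?ALL.tar.xz') || true
if [ -z "$LINK" ]; then
    echo "ERROR: could not fetch $URL or no download link matched" >&2
    exit 1
fi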
You are doing
wget -qO- $URL
According to the wget man page:
-q
--quiet
Turn off Wget's output.
This is counterproductive for finding the cause of the malfunction. By default wget is verbose and writes information to stderr; if you wish to keep that output, you can redirect stderr to a file. Consider the following simple example:
wget -O - http://www.example.com 2>>wget_out.txt
This downloads the Example Domain page and writes its content to standard output (-), whilst stderr is appended to a file named wget_out.txt. So if you run that command, say, 3 times, you will have information from all 3 runs in wget_out.txt.

Execute a command each time curl outputs a newline

I'm trying to use curl to infinitely stream messages from a remote server.
I periodically receive notifications like this:
$ curl -s https://my-server.com/status
Backup in progress...
Backup completed.
Disk space is about to fill up.
I want send_notify to be executed each time curl outputs a new line.
I created the script below, but it did not work.
notify_pipe() {
    read -r msg
    notify-send "$msg"
}
# Does nothing, just a blinking cursor.
stdbuf -oL curl -s https://my-server.com/status | notify_pipe
Please help me understand why this is not working and how to make it work.
I'm a beginner, I would appreciate any help! (and please forgive my poor English)
You need to read the output lines in a loop:
stdbuf -oL curl -s https://my-server.com/status |
while read line ; do
    notify-send Status "$line"
done
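The function in the question runs read only once, so at most the first line is handled before the function returns and the rest of the stream is discarded; the loop keeps reading until curl closes the pipe. A slightly more defensive variant (assuming the same hypothetical status URL) uses IFS= and -r so leading whitespace and backslashes in messages are preserved, and skips empty lines:
stdbuf -oL curl -s https://my-server.com/status |
while IFS= read -r line; do
    [ -n "$line" ] && notify-send Status "$line"
done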

Perl/curl: How to get Status Code and Response Body

I am trying to write a simple Perl script that calls an API; if the status code is 2xx, it should do something with the response, while if it is 4xx or 5xx it should do something else.
The issue I am encountering is that I can either get the response code alone (using a custom write-out formatter and sending the output somewhere else) or get the whole response including the headers.
my $curlResponseCode = `curl -s -o /dev/null -w "%{http_code}" ....`;
Will give me the status code only.
my $curlResponse = `curl -si ...`;
Will give me the entire header plus the response.
My question is how can I obtain the response body from the server and the http status code in a neat format that allows me to separate them into two separate variables.
Unfortunately I cannot use LWP or any other separate libraries.
Thanks in advance.
-Spencer
I came up with this solution:
URL="http://google.com"
# store the whole response with the status code appended at the end
HTTP_RESPONSE=$(curl --silent --write-out "HTTPSTATUS:%{http_code}" -X POST $URL)
# extract the body
HTTP_BODY=$(echo $HTTP_RESPONSE | sed -e 's/HTTPSTATUS\:.*//g')
# extract the status
HTTP_STATUS=$(echo $HTTP_RESPONSE | tr -d '\n' | sed -e 's/.*HTTPSTATUS://')
# print the body
echo "$HTTP_BODY"
# example using the status
if [ ! $HTTP_STATUS -eq 200 ]; then
echo "Error [HTTP status: $HTTP_STATUS]"
exit 1
fi
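A variant of the same idea that avoids echo/sed (and the word splitting that comes with the unquoted $HTTP_RESPONSE) is to split on the HTTPSTATUS marker with bash parameter expansion; this is only a sketch, assuming the same marker and variable names as above and that the body itself never contains "HTTPSTATUS:":
HTTP_RESPONSE=$(curl --silent --write-out "HTTPSTATUS:%{http_code}" -X POST "$URL")
HTTP_BODY="${HTTP_RESPONSE%HTTPSTATUS:*}"     # everything before the marker
HTTP_STATUS="${HTTP_RESPONSE##*HTTPSTATUS:}"  # everything after the marker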
...Will give me the entire header plus the response.
...in a neat format that allows me to separate them into two separate variables.
Since header and body are simply delimited by an empty line you can split the content on this line:
my ($head,$body) = split( m{\r?\n\r?\n}, `curl -si http://example.com `,2 );
And to get the status code from the header
my ($code) = $head =~m{\A\S+ (\d+)};
You might also combine this into a single expression with a regexp, although this might be harder to understand:
my ($code,$body) = `curl -si http://example.com`
=~m{\A\S+ (\d+) .*?\r?\n\r?\n(.*)}s;
Pretty fundamentally - you're capturing output from a system command. It is far and away better to do this by using the library built for it - LWP.
Failing that though - curl -v will produce status code and content, and you'll have to parse it.
You might also find this thread on SuperUser useful:
https://superuser.com/questions/272265/getting-curl-to-output-http-status-code
Specifically
#creates a new file descriptor 3 that redirects to 1 (STDOUT)
exec 3>&1
# Run curl in a separate command, capturing output of -w "%{http_code}" into HTTP_STATUS
# and sending the content to this command's STDOUT with -o >(cat >&3)
HTTP_STATUS=$(curl -w "%{http_code}" -o >(cat >&3) 'http://example.com')
(That isn't Perl, but you can probably use something similar; at the very least, run with -w and capture your content to a temp file.)
Haven't figured out a "pure" Perl solution, but I drafted this snippet to check the HTTP response code of a page via curl:
#!/usr/bin/perl
use v5.30;
use warnings;
use diagnostics;
our $url = "";
my $username = "";
my $password = "";
=begin url_check
Exit if HTTP response code not 200.
=cut
sub url_check {
    print "Checking URL status code...\n";
    my $status_code =
        `curl --max-time 2.5 --user ${username}:${password} --output /dev/null --silent --head --write-out '%{http_code}\n' $url`;
    chomp $status_code;
    if ($status_code != 200) {
        print "URL not accessible. Exiting. \n";
        exit;
    } else {
        print "URL accessible. Continuing... \n";
    }
}
url_check();
The verbose use of curl more or less documents itself. My example allows you to pass credentials to a page, but that can be removed as needed.

Parallel download using Curl command line utility

I want to download some pages from a website, and I did it successfully using curl, but I was wondering whether curl can download multiple pages at a time, just like most download managers do; that would speed things up a little. Is it possible to do this with the curl command line utility?
The current command I am using is
curl 'http://www...../?page=[1-10]' 2>&1 > 1.html
Here I am downloading pages from 1 to 10 and storing them in a file named 1.html.
Also, is it possible for curl to write the output of each URL to a separate file, say URL.html, where URL is the actual URL of the page being processed?
My answer is a bit late, but I believe all of the existing answers fall just a little short. The way I do things like this is with xargs, which is capable of running a specified number of commands in subprocesses.
The one-liner I would use is, simply:
$ seq 1 10 | xargs -n1 -P2 bash -c 'i=$0; url="http://example.com/?page${i}.html"; curl -O -s $url'
This warrants some explanation. The use of -n 1 instructs xargs to process a single input argument at a time. In this example, the numbers 1 ... 10 are each processed separately. And -P 2 tells xargs to keep 2 subprocesses running all the time, each one handling a single argument, until all of the input arguments have been processed.
You can think of this as MapReduce in the shell. Or perhaps just the Map phase. Regardless, it's an effective way to get a lot of work done while ensuring that you don't fork bomb your machine. It's possible to do something similar in a for loop in a shell, but you end up doing the process management yourself, which starts to seem pretty pointless once you realize how insanely great this use of xargs is.
Update: I suspect that my example with xargs could be improved (at least on Mac OS X and BSD with the -J flag). With GNU Parallel, the command is a bit less unwieldy as well:
parallel --jobs 2 curl -O -s http://example.com/?page{}.html ::: {1..10}
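If you also want each page stored under its own number (the question asks for per-URL output files), the {} placeholder can be reused for the -o argument; the pageN.html filenames here are just an assumption:
parallel --jobs 2 curl -s -o page{}.html http://example.com/?page{}.html ::: {1..10}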
Well, curl is just a simple UNIX process. You can have as many of these curl processes running in parallel and sending their outputs to different files.
curl can use the filename part of the URL to generate the local file. Just use the -O option (man curl for details).
You could use something like the following
urls="http://example.com/?page1.html http://example.com?page2.html" # add more URLs here
for url in $urls; do
# run the curl job in the background so we can start another job
# and disable the progress bar (-s)
echo "fetching $url"
curl $url -O -s &
done
wait #wait for all background jobs to terminate
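A drawback of that loop is that it launches every download at once. A sketch that caps the number of concurrent jobs, assuming bash 4.3+ for wait -n:
max_jobs=4
for url in $urls; do
    while [ "$(jobs -rp | wc -l)" -ge "$max_jobs" ]; do
        wait -n    # wait for any one background job to finish
    done
    curl -s -O "$url" &
done
wait    # wait for the remaining jobs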
As of 7.66.0, the curl utility finally has built-in support for parallel downloads of multiple URLs within a single non-blocking process, which should be much faster and more resource-efficient compared to xargs and background spawning, in most cases:
curl -Z 'http://httpbin.org/anything/[1-9].{txt,html}' -o '#1.#2'
This will download 18 links in parallel and write them out to 18 different files, also in parallel. The official announcement of this feature from Daniel Stenberg is here: https://daniel.haxx.se/blog/2019/07/22/curl-goez-parallel/
For launching parallel commands, why not use the venerable make command line utility? It supports parallel execution and dependency tracking and whatnot.
How? In the directory where you are downloading the files, create a new file called Makefile with the following contents:
# which page numbers to fetch
numbers := $(shell seq 1 10)
# default target which depends on files 1.html .. 10.html
# (patsubst replaces % with %.html for each number)
all: $(patsubst %,%.html,$(numbers))
# the rule which tells how to generate a %.html dependency
# $@ is the target filename e.g. 1.html
%.html:
	curl -C - 'http://www...../?page='$(patsubst %.html,%,$@) -o $@.tmp
	mv $@.tmp $@
NOTE The last two lines should start with a TAB character (instead of 8 spaces) or make will not accept the file.
Now you just run:
make -k -j 5
The curl command I used will store the output in 1.html.tmp and only if the curl command succeeds then it will be renamed to 1.html (by the mv command on the next line). Thus if some download should fail, you can just re-run the same make command and it will resume/retry downloading the files that failed to download during the first time. Once all files have been successfully downloaded, make will report that there is nothing more to be done, so there is no harm in running it one extra time to be "safe".
(The -k switch tells make to keep downloading the rest of the files even if one single download should fail.)
Curl can also accelerate a download of a file by splitting it into parts:
$ man curl |grep -A2 '\--range'
-r/--range <range>
(HTTP/FTP/SFTP/FILE) Retrieve a byte range (i.e a partial
document) from a HTTP/1.1, FTP or SFTP server or a local FILE.
Here is a script that will automatically launch curl with the desired number of concurrent processes: https://github.com/axelabs/splitcurl
Starting from 7.68.0, curl can fetch several URLs in parallel. This example will fetch the URLs from the urls.txt file with 3 parallel connections:
curl --parallel --parallel-immediate --parallel-max 3 --config urls.txt
urls.txt:
url = "example1.com"
output = "example1.html"
url = "example2.com"
output = "example2.html"
url = "example3.com"
output = "example3.html"
url = "example4.com"
output = "example4.html"
url = "example5.com"
output = "example5.html"
curl and wget cannot download a single file in parallel chunks, but there are alternatives:
aria2 (written in C++, available in Deb and Cygwin repo's)
aria2c -x 5 <url>
axel (written in C, available in Deb repo)
axel -n 5 <url>
wget2 (written in C, available in Deb repo)
wget2 --max-threads=5 <url>
lftp (written in C++, available in Deb repo)
lftp -n 5 <url>
hget (written in Go)
hget -n 5 <url>
pget (written in Go)
pget -p 5 <url>
Running a limited number of processes is easy if your system has commands like pidof or pgrep which, given a process name, return the pids (the count of the pids tells how many are running).
Something like this:
#!/bin/sh
max=4

# count how many curl processes are currently running
running_curl() {
    set -- $(pidof curl)
    echo $#
}

while [ $# -gt 0 ]; do
    # throttle: wait until fewer than $max curl processes are running
    while [ $(running_curl) -ge $max ] ; do
        sleep 1
    done
    curl "$1" --create-dirs -o "${1##*://}" &
    shift
done
to call like this:
script.sh $(for i in `seq 1 10`; do printf "http://example/%s.html " "$i"; done)
The curl line of the script is untested.
I came up with a solution based on fmt and xargs. The idea is to specify multiple URLs inside braces, like http://example.com/page{1,2,3}.html, and run them in parallel with xargs. The following would start downloading in 3 processes:
seq 1 50 | fmt -w40 | tr ' ' ',' \
| awk -v url="http://example.com/" '{print url "page{" $1 "}.html"}' \
| xargs -P3 -n1 curl -o
So 4 lines of URLs are generated and sent to xargs:
curl -o http://example.com/page{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16}.html
curl -o http://example.com/page{17,18,19,20,21,22,23,24,25,26,27,28,29}.html
curl -o http://example.com/page{30,31,32,33,34,35,36,37,38,39,40,41,42}.html
curl -o http://example.com/page{43,44,45,46,47,48,49,50}.html
Bash 3 or above lets you populate an array with multiple values as it expands sequence expressions:
$ urls=( "" http://example.com?page={1..4} )
$ unset urls[0]
Note the [0] value, which was provided as shorthand to make the indices line up with page numbers, since bash arrays autonumber starting at zero. This strategy obviously might not always work. Anyway, you can unset it in this example.
Now you have an array, and you can verify the contents with declare -p:
$ declare -p urls
declare -a urls=([1]="http://example.com?page=1" [2]="http://example.com?page=2" [3]="http://example.com?page=3" [4]="http://example.com?page=4")
Now that you have a list of URLs in an array, expand the array into a curl command line:
$ curl $(for i in ${!urls[@]}; do echo "-o $i.html ${urls[$i]}"; done)
The curl command can take multiple URLs and fetch all of them, recycling the existing connection (HTTP/1.1) to a common server, but it needs the -o option before each one in order to download and save each target. Note that characters within some URLs may need to be escaped to avoid interacting with your shell.
I am not sure about curl, but you can do that using wget.
wget \
--recursive \
--no-clobber \
--page-requisites \
--html-extension \
--convert-links \
--restrict-file-names=windows \
--domains website.org \
--no-parent \
www.website.org/tutorials/html/

How do I pipe or redirect the output of curl -v?

For some reason the output always gets printed to the terminal, regardless of whether I redirect it via 2> or > or |. Is there a way to get around this? Why is this happening?
Add the -s (silent) option to remove the progress meter, then redirect stderr to stdout to get the verbose output on the same fd as the response body:
curl -vs google.com 2>&1 | less
Your URL probably has ampersands in it. I had this problem, too, and I realized that my URL was full of ampersands (from CGI variables being passed) and so everything was getting sent to background in a weird way and thus not redirecting properly. If you put quotes around the URL it will fix it.
The answer above didn't work for me; what eventually did was this syntax:
curl https://${URL} &> /dev/stdout | tee -a ${LOG}
tee puts the output on the screen, but also appends it to my log.
If you need the output in a file you can use a redirect:
curl https://vi.stackexchange.com/ -vs >curl-output.txt 2>&1
Please be sure not to flip >curl-output.txt and 2>&1: redirections are processed left to right, so 2>&1 >curl-output.txt points stderr at the terminal (the current stdout) before stdout is redirected to the file.
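To make the ordering concrete (example.com is only a placeholder):
curl -vs https://example.com/ > curl-output.txt 2>&1   # both streams end up in the file
curl -vs https://example.com/ 2>&1 > curl-output.txt   # verbose output still hits the terminal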
Just my 2 cents.
The below command should do the trick, as answered earlier
curl -vs google.com 2>&1
However, if you need to get the output to a file,
curl -vs google.com > out.txt 2>&1
should work.
I found the same thing: curl by itself would print to STDOUT, but could not be piped into another program.
At first, I thought I had solved it by using xargs to echo the output first:
curl -s ... <url> | xargs -0 echo | ...
But then, as pointed out in the comments, it also works without the xargs part, so -s (silent mode) is the key to preventing extraneous progress output to STDOUT:
curl -s ... <url> | perl -ne 'print $1 if /<sometag>([^<]+)/'
The above example grabs the simple <sometag> content (containing no embedded tags) from the XML output of the curl statement.
The following worked for me:
Put your curl statement in a script named abc.sh
Now run:
sh abc.sh 1>stdout_output 2>stderr_output
You will get your curl's results in stdout_output and the progress info in stderr_output.
This simple example shows how to capture curl output, and use it in a bash script
test.sh
function main
{
    \curl -vs 'http://google.com' 2>&1
    # note: add -o /tmp/ignore.png if you want to ignore binary output, by saving it to a file.
}
# capture output of curl to a variable
OUT=$(main)
# search output for something using grep.
echo
echo "$OUT" | grep 302
echo
echo "$OUT" | grep title
Solution = curl -vs google.com 2>&1 | less
BUT, if you redirect the output to a file and it still shows up on the screen, the URL response probably contains a newline character \n, which messed up your shell.
To avoid this, put everything in a variable:
result=$(curl -v . . . . )
