analyse return value of wget command - linux

I would like to analyse the return value of the wget command.
I tried these:
GET=$(wget ftp://user:user#192.168.1.110/conf.txt
echo $GET
GET=`wget ftp://user:user#192.168.1.110/conf.txt`
echo $GET
but I don't get the return value when I display the GET variable.
How do I get the return value of wget?

Your question is a little ambiguous. If you're asking "What is the exit code of the 'wget' process?", that is accessible in the $? special variable:
[~/tmp]$ wget www.google.foo
--2013-11-01 08:33:52-- http://www.google.foo/
Resolving www.google.foo... failed: nodename nor servname provided, or not known.
wget: unable to resolve host address ‘www.google.foo’
[~/tmp]$ echo $?
4
If you're asking for the standard output of the 'wget' command, then what you're doing is going to give you that, although you have a typo in your first line (add a closing parenthesis after "conf.txt"). The problem is that wget doesn't write anything to stdout by default. The progress bars and messages you see when you run wget interactively actually go to stderr, which you can see by redirecting stderr to stdout using the shell redirection 2>&1:
[~/tmp]$ GET=`wget www.google.com 2>&1`
[~/tmp]$ echo $GET
--2013-11-01 08:36:23-- http://www.google.com/ Resolving www.google.com... 74.125.28.104, 74.125.28.99, 74.125.28.103, ... Connecting to www.google.com|74.125.28.104|:80... connected. HTTP request sent, awaiting response... 200 OK Length: 18637 (18K) [text/html] Saving to: ‘index.html’ 0K .......... ........ 100% 2.72M=0.007s 2013-11-01 08:36:23 (2.72 MB/s) - ‘index.html’ saved [18637/18637]
If you're asking for the contents of the resource that wget received, then you need to instruct wget to send its output to stdout instead of a file. Depending on your flavor of wget, it's likely an option like -O or --output-document, and you can construct your command line as: wget -O - <url>. By convention the single dash (-) represents stdin and stdout in command line options, so you're telling wget to send its file to stdout.
[~/tmp]$ GET=`wget -O - www.google.com`
--2013-11-01 08:37:31-- http://www.google.com/
Resolving www.google.com... 74.125.28.104, 74.125.28.99, 74.125.28.103, ...
Connecting to www.google.com|74.125.28.104|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 18621 (18K) [text/html]
Saving to: ‘STDOUT’
100%[=======================================>] 18,621 98.5KB/s in 0.2s
2013-11-01 08:37:32 (98.5 KB/s) - written to stdout [18621/18621]
[~/tmp]$ echo $GET
<!doctype html><html itemscope="" itemtype="http://schema.org/WebPage"><head>
<snip lots of content>

You can get the exit code with
echo $?
after executing the command. But if you want to react to whether the download worked or not, you can use if:
if wget -q www.google.com
then
echo "works"
else
echo "doesn't work"
fi
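If you need to distinguish why wget failed, recent GNU wget documents specific exit codes (0 success, 4 network failure, 6 authentication failure, 8 server error response, among others). A minimal sketch along those lines, using $url as a placeholder for the question's FTP URL (BusyBox or very old builds may simply return 1 for most failures):
wget -q "$url"
case $? in
    0) echo "download succeeded" ;;
    4) echo "network failure (DNS, connection refused, ...)" ;;
    6) echo "username/password authentication failure" ;;
    8) echo "server issued an error response (e.g. 404)" ;;
    *) echo "wget failed with another error" ;;
esac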

Related

Get/fetch a file with a bash script using /dev/tcp over https without using curl, wget, etc

I am trying to read/fetch this file:
https://blockchain-office.com/file.txt with a bash script over /dev/tcp, without using curl, wget, etc.
I found this example:
exec 3<>/dev/tcp/www.google.com/80
echo -e "GET / HTTP/1.1\r\nhost: http://www.google.com\r\nConnection: close\r\n\r\n" >&3
cat <&3
I changed this to fit my needs:
exec 3<>/dev/tcp/www.blockchain-office.com/80
echo -e "GET / HTTP/1.1\r\nhost: http://www.blockchain-office.com\r\nConnection: close\r\n\r\n" >&3
cat <&3
When I try to run it, I receive:
400 Bad Request
Your browser sent a request that this server could not understand
I think this is because strict SSL/HTTPS-only access is enforced.
So I changed it to:
exec 3<>/dev/tcp/www.blockchain-office.com/443
echo -e "GET / HTTP/1.1\r\nhost: https://www.blockchain-office.com\r\nConnection: close\r\n\r\n" >&3
cat <&3
When I try to run it, I receive:
400 Bad Request
Your browser sent a request that this server could not understand.
Reason: You're speaking plain HTTP to an SSL-enabled server port.
Instead use the HTTPS scheme to access this URL, please.
So I can't even get a normal connection, let alone fetch the file!
None of these posts fit my case; it looks like SSL/TLS is the problem, and only http/80 works if I don't use curl, wget, lynx, openssl, etc.:
how to download a file using just bash and nothing else (no curl, wget, perl, etc.)
Using /dev/tcp instead of wget
How to get a response from any URL?
Read file over HTTP in Shell
I need a solution to get/read/fetch a plain txt file from a domain over HTTPS using only /dev/tcp and no other tools like curl or wget, and to output it in my terminal or save it in a variable. Is that possible, and how? Or is there another solution over the terminal with the standard terminal utilities?
You can use openssl s_client to perform the equivalent operation but delegate the SSL part:
#!/bin/sh
host='blockchain-office.com'
port=443
path='/file.txt'
crlf="$(printf '\r\n_')"
crlf="${crlf%?}"
{
    printf '%s\r\n' \
        "GET ${path} HTTP/1.1" \
        "host: ${host}" \
        'Connection: close' \
        ''
} |
openssl s_client -quiet -connect "${host}:${port}" 2>/dev/null | {
    # Skip headers by reading up until encountering a blank line
    while IFS="${crlf}" read -r line && [ -n "$line" ]; do :; done
    # Output the raw body content
    cat
}
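If the script above were saved as, say, fetch_file.sh (the name is only an illustration), the "save it in a variable" part of the question comes down to ordinary command substitution:
content=$(sh ./fetch_file.sh)   # hypothetical filename for the script above
printf '%s\n' "$content"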
Instead of cat to output the raw body, you may want to check some headers like Content-Type and Content-Transfer-Encoding, perhaps even navigate and handle recursive MIME chunks, and then decode the raw content into something usable.
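As a sketch of that idea, the header-skipping loop in the script above could remember one header on the way past; this reuses the crlf variable from that script and assumes the server sends a single Content-Type header:
# Capture Content-Type while skipping the remaining headers
content_type=''
while IFS="${crlf}" read -r line && [ -n "$line" ]; do
    case "$line" in
        [Cc]ontent-[Tt]ype:*)
            content_type="${line#*:}"        # value after the colon
            content_type="${content_type# }" # trim one leading space
            ;;
    esac
done
printf 'Content-Type: %s\n' "$content_type" >&2
# Then output the raw body as before
cat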
After all the comments and research, the answer is no: we can't get/fetch files over HTTPS using only standard shell facilities like /dev/tcp, because we can't handle SSL/TLS without handling the complete handshake.
It is only possible over http/80.
I don't think bash's /dev/tcp supports SSL/TLS.
If you use /dev/tcp for an HTTP/HTTPS connection you have to manage the complete exchange yourself, including SSL/TLS, HTTP headers, chunking and more. Or you use curl/wget, which manage it for you.
Then the shell is the wrong tool, because it is not capable of performing any of the SSL handshake without using external resources/commands. Feel free to use whatever you want and can from what I show you here, as the cleanest and most portable POSIX-shell implementation of a minimal HTTP session through SSL. And then maybe it is time to consider alternative options (not using HTTPS, or using languages with built-in or standard library SSL support).
We will use curl, wget and openssl in separate Docker containers now.
I think there are still some requirements to settle in the future before deciding whether we keep only one of them or all of them.
We will use the script from @Léa Gris in a Docker container too.

How to get http status code and content separately using curl in linux

I have to fetch some data using the curl Linux utility. There are two cases: either the request is successful or it is not. I want to save the output to a file if the request is successful; if the request fails for some reason, then only the error code should be saved to a log file. I have searched a lot on the web but could not find an exact solution, which is why I have posted a new question on curl.
One option is to get the response code with -w, so you could do something like:
code=$(curl -s -o file -w '%{response_code}' http://example.com/)
if test "$code" != "200"; then
echo $code >> response-log
else
echo "wohoo 'file' is fine"
fi
curl -I -s -L <Your URL here> | grep "HTTP/1.1"
curl + grep is your friend; you can then extract the status code from that line for your needs.
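To get just the numeric code out of that pipeline, one more filter is enough. A sketch, with $url as a placeholder; it assumes the server answers with an HTTP/1.1 status line as in the command above (an HTTP/2 server prints HTTP/2 instead, so you may need to loosen the grep pattern), and tail -n 1 keeps only the final response when -L follows redirects:
status=$(curl -I -s -L "$url" | grep "HTTP/1.1" | tail -n 1 | awk '{print $2}')
echo "$status"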

Checking connectivity with tftp server and accessibility of file there

I have the BusyBox v1.23.2 multi-call binary with a simple tftp client.
I need to check connectivity with a tftp server and the accessibility of a file there.
For ftp it may look like this:
if wget -q -s $url; then
echo "found"
fi
Is there a reliable solution for tftp?
p.s. I can't try to download the file (it's too big).
Update: I solved the problem by adding a hack to the BusyBox source code, which allows me to implement a scenario like this:
"No.","Source","Destination","Info"
"1","192.168.0.8","192.168.0.6","Read Request, File: some_folder/file.txt, Transfer type: octet, blksize\\000=4096\\000, tsize\\000=0\\000"
"2","192.168.0.6","192.168.0.8","Option Acknowledgement, blksize\\000=4096\\000, tsize\\000=10094\\000"
"3","192.168.0.8","192.168.0.6","Error Code, Code: Not defined, Message: Connection checking"
I guess this will work for you.
$ wget --spider http://henning.makholm.net/
Spider mode enabled. Check if remote file exists.
--2011-08-08 19:39:48-- http://henning.makholm.net/
Resolving henning.makholm.net (henning.makholm.net)... 85.81.19.235
Connecting to henning.makholm.net (henning.makholm.net)|85.81.19.235|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 9535 (9.3K) [text/html] <-------------------------
Remote file exists and could contain further links,
but recursion is disabled -- not retrieving.
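Wrapped in a conditional, the spider check looks much like the FTP snippet in the question. This is a sketch for a full GNU wget and an HTTP or FTP URL; it only verifies that the remote file exists and may not help for the BusyBox tftp case:
if wget -q --spider "$url"; then
    echo "found"
else
    echo "not found or unreachable"
fi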

Auto-download involving password and specific online clicking

I want to use cron to do a daily download of portfolio info with 2 added complications:
It needs a password
I want to get the format I can get, when on the site myself, by clicking on "Download to a Spreadsheet".
If I use:
wget -U Chromium --user='e-address' --password='pass' \
https://www.google.com/finance/portfolio > "file_"`date +"%d-%m-%Y"`+.csv
I get the response:
=========================================================================
--2013-10-20 12:16:13-- https://www.google.com/finance/portfolio
Resolving www.google.com (www.google.com)... 74.125.195.105, 74.125.195.103, 74.125.195.99, ...
Connecting to www.google.com (www.google.com)|74.125.195.105|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘portfolio’
[ <=> ] 16,718 --.-K/s in 0.04s
2013-10-20 12:16:13 (431 KB/s) - ‘portfolio’ saved [16718]
==========================================================================
It saves to a file called "portfolio" rather than where I asked it to ("file_"`date +"%d-%m-%Y"`+.csv).
When I look at "portfolio" in the browser it says I need to sign in to my account, i.e. no notice is taken of the user and password information I've included.
If I add to the web address the string I get by hovering over the "Download to a Spreadsheet" link:
wget -U Chromium --user='e-address' --password='pass' \
https://www.google.com/finance/portfolio?... > "file_"`date +"%d-%m-%Y"`+.csv
I get:
[1] 5175
[2] 5176
[3] 5177
[4] 5178
--2013-10-20 12:44:56-- https://www.google.com/finance/portfolio?pid=1
Resolving www.google.com (www.google.com)... [2] Done output=csv
[3]- Done action=view
[4]+ Done pview=pview
hg21#hg21-sda2:~$ 74.125.195.106, 74.125.195.103, 74.125.195.104, ...
Connecting to www.google.com (www.google.com)|74.125.195.106|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [text/html]
Saving to: ‘portfolio?pid=1’
[ <=> ] 16,768 --.-K/s in 0.05s
2013-10-20 12:44:56 (357 KB/s) - ‘portfolio?pid=1.1’ saved [16768]
and at this point it hangs. The file it writes at this point (‘portfolio?pid=1’) is the same as the 'portfolio' file with the previously used wget.
If I then put in my password it continues:
pass: command not found
[1]+ Done wget -U Chromium --user="e-address" --password='pass' https://www.google.com/finance/portfolio?pid=1
[1]+ Done wget -U Chromium --user="e-address" --password='pass' https://www.google.com/finance/portfolio?pid=1
Any help much appreciated.
There are a couple of issues here:
1) wget is not saving to the correct filename
Use the -O option instead of > shell redirection.
Change > file_`date +"%d-%m-%Y"`.csv to -O file_`date +"%d-%m-%Y"`.csv
Tip: If you use date +"%Y-%m-%d", your files will naturally sort chronologically.
This is essentially a duplicate of wget command to download a file and save as a different filename
See also man wget for options.
2) wget is spawning multiple processes and "hanging"
You have &s in your URL which are being interpreted by the shell instead of being included in the argument passed to wget. You need to wrap the URL in quotation marks.
https://finance.google.com/?...&...&...
becomes
"https://finance.google.com/?...&...&..."

Cron output to nothing

I've noticed that my cron jobs are creating index.html files on my server. The command I'm using is wget http://www.example.com 2>&1. I've also tried including --reject "index.html*".
How can I prevent the command from creating index.html files?
--2013-07-21 16:03:01-- http://www.example.com
Resolving example.com... 192.0.43.10
Connecting to www.example.com|192.0.43.10|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 0 [text/html]
Saving to: `index.html.9'
0K 0.00 =0s
2013-07-21 16:03:03 (0.00 B/s) - `index.html.9' saved [0/0]
Normally, the whole point of running wget is to create an output file. A URL like http://www.example.com typically resolves to http://www.example.com/index.html, so by creating index.html, the wget command is just doing its job.
If you want to run wget and discard the downloaded file, you can use:
wget -q -O /dev/null http://www.example.com
The -q option suppresses wget's log messages; -O /dev/null discards the downloaded file itself. (A lowercase -o /dev/null would instead send wget's log output to /dev/null.)
If you want to be sure that anything wget writes to stdout or stderr is discarded:
wget -q -O /dev/null http://www.example.com >/dev/null 2>&1
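If curl happens to be installed, a similarly silent trigger that never writes a file is (offered only as an alternative, not something the question requires):
curl -s -o /dev/null http://www.example.com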
In a comment, you say that you're using the wget command to "trigger items on your cron controller" using CodeIgniter. I'm not familiar with CodeIgniter, but downloading and discarding an HTML file seems inefficient. I suspect (and hope) that there's a cleaner way to do whatever you're trying to do.
