How to handle special characters in a wget download link? - linux

I have a link like this:
wget --user=user_nm --http-password=pass123 https://site.domain.com/Folder/Folder/page.php?link=/Folder/Folder/Csv.Stock.php\&namefile=STOCK.Stock.csv
But while the password authorization is fine, wget still cannot process the link. Why?

The safest way to handle a link copied from e.g. a browser is to wrap the whole link string in single quotes (') . That way the shell will not try to break it up, and you do not have to escape each special character by hand:
wget --user=user_nm --http-password=pass123 'https://site.domain.com/Folder/Folder/page.php?link=/Folder/Folder/Csv.Stock.php&namefile=STOCK.Stock.csv'
Or, for a real example:
wget --user-agent=firefox 'https://www.google.com/search?q=bash+shell+singl+quote&ie=utf-8&oe=utf-8&aq=t&rls=org.mageia:en-US:official&client=firefox-a#q=bash+single+quote&rls=org.mageia:en-US:official'
Keep in mind that server-side restrictions might make using wget like this quite hard. Google, for example, forbids certain user agent strings, hence the --user-agent option above. Other servers use cookies to maintain session information and simply feeding a link to wget will not work. YMMV.
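If you would rather not quote the whole URL, the equivalent fix is to backslash-escape every character the shell might otherwise interpret, here the ? as well as the &; single-quoting is usually less error-prone:
# Same request as above, with the shell's special characters escaped individually.
wget --user=user_nm --http-password=pass123 https://site.domain.com/Folder/Folder/page.php\?link=/Folder/Folder/Csv.Stock.php\&namefile=STOCK.Stock.csv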

Related

Is there a Linux command line utility for getting random data to work with from the web?

I am a Linux newbie and I often find myself working with a bunch of random data.
For example: I would like to work on a sample text file to try out some regular expressions or read some data into gnuplot from some sample data in a csv file or something.
I normally do this by copying and pasting passages from the internet, but I was wondering if there is some combination of commands that would allow me to do this without having to leave the terminal. I was thinking about using something like the curl command, but I don't know exactly how it works...
To my knowledge there are websites that host content. I would simply like to access them and store that content on my computer.
In conclusion, and as a concrete example, how would I copy and paste a random passage off the internet from a website and store it in a file on my system using only the command line? Maybe you can point me in the right direction. Thanks.
You could redirect the output of a curl command into a file e.g.
curl https://run.mocky.io/v3/5f03b1ef-783f-439d-b8c5-bc5ad906cb14 > data-output
Note that I've mocked the data in Mocky, which is a nice website for quickly mocking up an API.
I normally use "Project Gutenberg" which has 60,000+ books freely downloadable online.
So, if I want the full text of "Peter Pan and Wendy" by J.M. Barrie, I'd do:
curl "http://www.gutenberg.org/files/16/16-0.txt" > PeterPan.txt
If you look at the page for that book, you can see how to get it as HTML, plain text, ePUB or UTF-8.
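To turn that into the "random passage stored in a file" workflow from the question, one rough sketch (assuming bash and the Peter Pan text above) is to download the book once and cut a random slice out of it:
# Fetch the plain-text book, then save 10 lines starting at a random offset.
curl -s "http://www.gutenberg.org/files/16/16-0.txt" -o PeterPan.txt
start=$(( RANDOM % 1000 + 1 ))   # arbitrary upper bound; adjust to the file's length
sed -n "${start},$(( start + 9 ))p" PeterPan.txt > sample.txt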

Linux tool to bulk change a website from http:// to https://

I need to change an entire PHP-based website from http:// to https://. The SSL certificate has already been installed and shows validity.
Now, the website has many many subdirectories, shops, newsletters etc., but stems from one major directory.
Is there a tool or method that lets me do this under Linux recursively, i.e. covering all the various sub-directories in the search, and automatically exchange http:// for https://? Is there a way not only to do the exchange but also to save the changed files automatically?
Maybe a stupid question, but I'd appreciate your help a lot, to save me from going through every single PHP file in every single directory.
The sed command has an in-place option which can be helpful in executing your change. For example
sed -i 's/original/new/g' file.txt
In your case, something like this may work:
sed -i 's/http:\/\//https:\/\//g' ./*.php
I would recommend making a backup before you try this, since sed's -i option behaves differently on some systems (BSD/macOS sed, for instance, requires a suffix argument after -i).
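Since the question asks for all sub-directories, one way to make the change recursive (a sketch, with /path/to/site standing in for your web root) is to let find hand the files to sed; using | as the sed delimiter also avoids escaping the slashes:
# Rewrite http:// to https:// in every .php file under the site root, in place.
find /path/to/site -type f -name '*.php' -exec sed -i 's|http://|https://|g' {} +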

Trying to extract field from browser page

I'm trying to extract one field from an online form in Firefox to my local Ubuntu 12.04 PC and Mac OS 19.7.4.
I can manually save the page locally as a text document and then search for the text with a Unix script, but this seems rather cumbersome, and I need it to be automated. Is there a more efficient method?
My background is on Macs, but the company is trialling Linux PCs, so please be tolerant of my relative Ubuntu ignorance.
If you mean to program something, try one of these:
the WWW::Mechanize library, which has Python and Perl bindings;
one of the mouse-scripting engines available on Linux (e.g. Actionaz);
a test automation tool that drives Firefox (Selenium).
You can do it with a simple Bash script.
Take a look at some useful stuff like:
wget
sed
grep
and then nothing will be cumbersome and everything can be automated.
If you want to go with the method that you mentioned, you can use curl to automate the saving of the form. Your Bash script would then look something like this:
curl http://locationofonlineform.com -o tempfile
valueOfField=$(grep patternToFindField tempfile)
# Do stuff
echo $valueOfField
If you want to get rid of the temporary file, you can directly feed the result of curl into the grep command.
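For example, reusing the placeholder URL and pattern from above:
# No temporary file: pipe curl's output straight into grep.
valueOfField=$(curl -s http://locationofonlineform.com | grep patternToFindField)
echo "$valueOfField"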

Using ' in Shellscript (wget)

I'm trying to get wget to work with a POST request and a special password. It contains ' and looks like this:
wget --save-cookie cookie.txt --post-data "user=Abraham&password=--my'precious!" http://localhost/login.php
But when I use the tick with wget, I get strange errors. Does anybody know how to get it to work?
The backtick in your request is a straightforward issue, although you may have a second one lurking in there.
The word you are looking for is 'escape': the backtick has a special meaning on the command line and you need to escape it so that it is not interpreted as such. In the bash shell (the typical Linux console) the escape character is \; if you put that in front of the backtick, it will no longer be interpreted.
The second potential issue is with the way you are using wget - are you certain that is the request you are meant to send? Are you trying to authenticate with the server using a web form or with Basic, Digest or some other form of HTTP authentication?
If this is the manner in which you should be authenticating, then you will also need to percent-encode the --post-data value, as wget will not do this for you.
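A rough sketch of both steps, using the hypothetical credentials from the question (python3 is just one convenient way to percent-encode, and note that wget's flag is spelled --save-cookies):
# In a script (non-interactive shell) the ' and ! stay literal inside double quotes.
password="--my'precious!"
# Percent-encode the value so characters like ' and ! survive the POST body.
encoded=$(python3 -c 'import urllib.parse,sys; print(urllib.parse.quote(sys.argv[1]))' "$password")
wget --save-cookies cookie.txt --post-data "user=Abraham&password=${encoded}" http://localhost/login.php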

Download a file with machine-readable progress output

I need a (Linux) program that can download from an HTTP (or optionally FTP) source, and output its progress to the terminal in a machine-readable form.
What I mean by this is I would like it to NOT use a progress bar, but output progress as a percentage (or other number), one line at a time.
As far as I know, both wget and curl don't support this.
Use wget. The percentage is already there.
PS: Also, this isn't strictly programming related.
Try using curl with Pipe Viewer, pv (http://www.ivarch.com/programs/quickref/pv.shtml).
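A sketch of that combination (the URL is a placeholder; pv's -n flag prints a bare integer percentage per line, but it needs the expected size via -s to compute it):
# Ask the server for the Content-Length so pv can turn bytes into a percentage.
url="http://example.com/big.iso"
size=$(curl -sI "$url" | awk 'tolower($1)=="content-length:" {print $2+0}')
# pv writes one percentage per line to stderr; redirect that wherever the consumer reads it.
curl -s "$url" | pv -n -s "$size" > big.iso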
Presumably you want another script or application to read the progress and do something with it, yes? If this is the case, then I'd suggest using libcurl in that application/script to do the downloading. You'll be able to easily process the progress and do whatever you want with it. This is far easier than trying to parse output from wget or curl.
The progress bar from curl and wget can be parsed: just ignore the bar itself and extract the % done, time left, data downloaded, and whatever other metrics you want. The bar is overwritten using special control characters, so when it is parsed by another application you will see many \r's and \b's.
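As a concrete sketch (the URL is a placeholder): when wget's output goes to a pipe rather than a terminal, it falls back to its dot-style progress, which already prints one line per block with a percentage that is easy to pull out:
# Keep only the percentage figures from wget's dot-style progress lines.
wget -O out.bin "http://example.com/big.iso" 2>&1 | grep --line-buffered -o '[0-9]\+%'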
