Download a file with machine-readable progress output - linux

I need a (Linux) program that can download from an HTTP (or, optionally, FTP) source and output its progress to the terminal in a machine-readable form.
What I mean by this is that I would like it NOT to use a progress bar, but to output progress as a percentage (or other number), one line at a time.
As far as I know, neither wget nor curl supports this.

Use wget. The percentage is already there.
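For example, a rough sketch that should print one percentage per line (the URL and output name are placeholders; wget writes its progress to stderr, and its dot-style progress ends each update line with a percentage):
wget --progress=dot:mega -O output.bin "http://example.com/file" 2>&1 \
  | grep --line-buffered -o '[0-9]\+%'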
P.S. Also, this isn't strictly programming-related.

Try curl with Pipe Viewer, pv (http://www.ivarch.com/programs/quickref/pv.shtml).
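For example, a rough sketch (the URL and output file name are placeholders; pv's -n flag makes it print a bare percentage on stderr, one line per update, and -s tells it the expected size so it can compute that percentage):
url="http://example.com/file"        # placeholder URL
size=$(curl -sI "$url" | awk 'tolower($1) == "content-length:" {print $2}' | tr -d '\r')
curl -s "$url" | pv -n -s "$size" > output.bin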

Presumably you want another script or application to read the progress and do something with it, yes? If this is the case, then I'd suggest using libcurl in that application/script to do the downloading. You'll be able to easily process the progress and do whatever you want with it. This is far easier than trying to parse output from wget or curl.
The progress output from curl and wget can also be parsed; just ignore the bar itself and extract the percentage done, time left, data downloaded, and whatever other metrics you want. The bar is redrawn in place using special control characters, so when it is read by another application you will see many \r's and \b's.
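For example, one way to turn that \r-separated stream into one line per update (the URL and output name are placeholders; depending on buffering you may need to wrap the pipeline stages in stdbuf):
curl -# -o output.bin "http://example.com/file" 2>&1 \
  | tr '\r' '\n' \
  | grep --line-buffered -o '[0-9.]\+%'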

Related

Is os.system() the best way to wget a group of files within a Python script?

I'd like to download a bunch of password-protected files hosted at a URL into a directory, from within a Python script. The vision is that I'd one day be able to use joblib or something to download each file in parallel, but for now I'm just focusing on the wget command.
Right now, I can download a single file using:
import os
os.system("wget --user myUser --password myPassword --no-parent -nH --recursive -A gz,pdf,bam,vcf,csv,txt,zip,html https://url/to/file")
However, there are some issues with this. For example, there isn't a record of how the download is proceeding; I only know it is working because I can see the file appear in my directory.
Does anyone have suggestions for how I can improve this, especially in light of the fact that I'd one day like to download many files in parallel, and then go back to see which ones failed?
Thanks for your help!
There are some good libraries to download files via HTTP natively in Python, rather than launching external programs. A very popular one which is powerful yet easy to use is called Requests: https://requests.readthedocs.io/en/master/
You'll have to implement certain features like --recursive yourself if you need them (though your example is confusing, because you use --recursive but say you're downloading one file). See, for example, "recursive image download with requests".
If you need a progress bar, you can use another library called tqdm in conjunction with Requests. See "Python progress bar and downloads".
If the files you're downloading are large, here is an answer I wrote showing how to get the best performance (as fast as wget): https://stackoverflow.com/a/39217788/4323

Is there a Linux command line utility for getting random data to work with from the web?

I am a Linux newbie and I often find myself working with a bunch of random data.
For example, I would like to work on a sample text file to try out some regular expressions, or read some sample data from a CSV file into gnuplot, or something like that.
I normally do this by copying and pasting passages from the internet, but I was wondering if there is some combination of commands that would let me do this without having to leave the terminal. I was thinking of using something like the curl command, but I don't know exactly how it works...
To my knowledge there are websites that host content. I would simply like to access them and store that content on my computer.
In conclusion, and as a concrete example: how would I copy a random passage from a website and store it in a file on my system using only the command line? Maybe you can point me in the right direction. Thanks.
You could redirect the output of a curl command into a file e.g.
curl https://run.mocky.io/v3/5f03b1ef-783f-439d-b8c5-bc5ad906cb14 > data-output
Note that I've mocked some data in Mocky, which is a nice website for quickly mocking an API.
I normally use Project Gutenberg, which has 60,000+ books freely downloadable online.
So, if I want the full text of "Peter Pan and Wendy" by J.M. Barrie, I'd do:
curl "http://www.gutenberg.org/files/16/16-0.txt" > PeterPan.txt
If you look at the page for that book, you can see how to get it as HTML, plain text, ePUB or UTF-8.

Trying to extract field from browser page

I'm trying to extract one field from an online form in Firefox, onto my local machine (an Ubuntu 12.04 PC, and Mac OS 19.7.4).
I can manually save the page locally as a text document and then search for the text with a Unix script, but this seems rather cumbersome, and I need it to be automated. Is there a more efficient method?
My background is on Macs, but the company is trialling Linux PCs, so please be tolerant of my Ubuntu ignorance.
If you mean to program something, try:
the WWW::Mechanize library, which has Python and Perl bindings;
one of the several mouse-scripting engines on Linux (e.g. Actionaz);
a test-automation tool that works with Firefox (Selenium).
You can do it with a simple bash script.
Take a look at some useful tools like:
wget
sed
grep
and then nothing will be cumbersome and everything can be automated.
If you want to go with the method that you mentioned, you can use curl to automate the saving of the form. Your BASH script would then look something like this:
curl http://locationofonlineform.com -o tempfile
valueOfField=$(grep patternToFindField tempfile)
# Do stuff with the extracted value
echo "$valueOfField"
If you want to get rid of the temporary file, you can directly feed the result of curl into the grep command.
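For example (using the same placeholder URL and pattern as above):
valueOfField=$(curl -s http://locationofonlineform.com | grep patternToFindField)
echo "$valueOfField"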

Is there a Linux command to replicate/replace the "banner" command?

I am writing a script on Red Hat Linux (I forget the version) that needs a header, but the banner command is not there for me to use and I won't be able to get it installed. I read via Google that it may well have been deprecated.
So is there a new version of the command that produces similar results, or a way I can replicate the command, or even just temporarily change the script output so that characters are a different size?
I've tried looking at stty, but we don't access the machine via xterm; we log in directly via PuTTY.
In its simplest form, 'banner' is less than a few pages of code (e.g. this one). Perhaps you could just compile and run it from your home directory?
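For example, assuming you have saved that source as banner.c in your home directory (the file name and message here are just illustrative):
cc -o ~/banner ~/banner.c      # compile it once
~/banner "HELLO"               # prints HELLO as a large text banner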
Use some web site, for example http://patorjk.com/software/taag/.
If you need it frequently, you can create a script to scrape the result.
BTW, stty has nothing to do with your problem; I don't know why you mentioned it.

how can I extract text contents from GUI apps in linux?

I want to extract text content from GUI apps. Here are two examples:
example 1:
Suppose I open Firefox and enter the URL www.google.com.
How can I extract the string "www.google.com" from Firefox using my own app?
example 2:
Open the calculator (gcalctool), then input 1+1.
How can I extract the string "1+1" from the calculator, from my own program?
In brief, what I want to find out is whether there is a way to extract the text content from any widget of a GUI application.
Thanks
I don't think there's a generic way to do this, at least not a very elegant one.
Some inelegant ideas:
You might be able to modify the X window system or even some toolkit framework to extract what is being displayed in specific window elements as text.
You could take a screenshot and use an OCR library to convert the pixels back into text for the interesting areas.
You could recompile the apps of interest to add some kind of mechanism for asking them questions.
You could use something like xtest to inject events highlighting the region of interest and copying it to the clipboard.
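A very rough sketch of that last idea, assuming xdotool and xclip are installed (the window name "Mozilla Firefox" and the short sleep are only illustrative):
# Find the window, select all of its text, copy it, then read the clipboard.
win=$(xdotool search --name "Mozilla Firefox" | head -n 1)
xdotool windowactivate --sync "$win"
xdotool key --window "$win" ctrl+a ctrl+c
sleep 0.5                                  # give the app time to fill the clipboard
xclip -selection clipboard -o > captured.txt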
I believe Firefox and gcalctool are only examples, and you just want to know in general how to pass the output of one application to another application.
There are many ways to do that on Linux, like:
piping
application1 | application2
BTW, here is the Firefox command line manual, if you want to start Firefox on Ubuntu with a URL, e.g.:
firefox "$url"
where $url is a variable whose value can be www.mozilla.org
That sounds difficult. Supposing you're running X11, you can very easily grab a window picture (see "man xwd"); however, there is no easy way to get to the text unless it is selected and therefore copied to the clipboard.
Alternatively, if you only want to capture user input, this is quite easy to do, too, by activating the X11 record extension: put this in your /etc/X11/xorg.conf:
Section "Module"
Load "record"
#Load other modules you need ...
EndSection
It may prove difficult to use, too; see the example code in "Xorg/X11 record extension fails".
