Is os.system() the best way to wget a group of files within a Python script? - python-3.x

I'd like to download a bunch of files hosted and password-protected at a url onto a directory within a Python script. The vision is that I'd one day be able to use joblib or something to download each file in parallel, but for now, I'm just focusing on the wget command.
Right now, I can download a single file using:
import os
os.system("wget --user myUser --password myPassword --no-parent -nH --recursive -A gz,pdf,bam,vcf,csv,txt,zip,html https://url/to/file")
However, there are some issues with this - for example, there isn't a record of how the download is proceeding; I only know it is working because I can see the file appear in my directory.
Does anyone have suggestions for how I can improve this, especially in light of the fact that I'd one day like to download many files in parallel, and then go back to see which ones failed?
Thanks for your help!

There are some good libraries to download files via HTTP natively in Python, rather than launching external programs. A very popular one which is powerful yet easy to use is called Requests: https://requests.readthedocs.io/en/master/
You'll have to implement certain features like --recursive yourself if you need them (though your example is confusing, because you use --recursive but say you're downloading a single file). See, for example, recursive image download with requests.
If you need a progress bar, you can use another library called tqdm in conjunction with Requests. See Python progress bar and downloads.
If the files you're downloading are large, here is an answer I wrote showing how to get the best performance (as fast as wget): https://stackoverflow.com/a/39217788/4323 .
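As a rough sketch of how those pieces could fit together (the output filename below is a placeholder, and this assumes requests and tqdm are installed):
import requests
from tqdm import tqdm

def download(url, username, password, out_path):
    """Stream one file to disk with a progress bar; return True on success."""
    try:
        with requests.get(url, auth=(username, password), stream=True, timeout=30) as r:
            r.raise_for_status()
            total = int(r.headers.get("content-length", 0))
            with open(out_path, "wb") as f, tqdm(total=total, unit="B", unit_scale=True, desc=out_path) as bar:
                for chunk in r.iter_content(chunk_size=1024 * 1024):
                    f.write(chunk)
                    bar.update(len(chunk))
        return True
    except requests.RequestException as exc:
        print(f"FAILED {url}: {exc}")
        return False

# Later you could map download() over a list of URLs (e.g. with joblib or
# concurrent.futures) and collect which ones returned False.
ok = download("https://url/to/file", "myUser", "myPassword", "file.gz")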

Related

How do I run a .py script?

I just started learning Python last week to automate some stuff I do (thanks to automatetheboringstuff.com). Assume I know nothing about programming. The only thing I know is HTML and CSS.
I created a simple automation workflow already, and I want to improve not the code (maybe in the future, because it's not yet finished) but how I maintain my setup/program on two laptops -- both Macs running High Sierra.
I have a .py file that contains my automated workflow. I don't know where to place it. It currently resides in my Dropbox so I can use it on laptop1 and laptop2.
I also created a virtualenv for each machine and did the requirements.txt thing as well (just to prep for the future). The directory on both machines is username/python/project_name.
I read in some posts that these files and other resources can exist anywhere, whether inside each virtualenv or not, and that it's just a preference. I also read that the virtualenv itself isn't recommended to be placed inside apps like Dropbox (that's why I separated it on each laptop).
I switch between both laptops frequently. The environment which contains the packages doesn't really concern me that much when switching. It's the other files that are bothering me. For example, there's an image I need that has to be available on both laptops, so my solution is to have a Resources folder inside Dropbox as well. It currently looks like this:
Dropbox
    Projects
        Project 1 files (images, etc.)
        Project 2 files (images, etc.)
    Workflows (this would contain my completed .py files)
I read some stuff about virtualenvwrapper, but haven't looked at it yet. Maybe in the future when I do have more projects to manage, because right now it's just this one.
Lastly, I noticed that every time I open up Terminal and activate my virtualenv, the working directory is Users/username.
How can I set it to default to Dropbox/Projects/project_name? I always have to set it using chdir(). That way, when I do have multiple projects (and virtualenvs), I don't have to worry about where the files load/save.
Finally, how do I run the .py script? If I open IDLE, open the .py file there, and use F5, it runs properly. But as far as I know, that doesn't look into the virtualenv I set up. Is that correct?
I tried right-clicking the .py file, then Open With > Python Launcher, and I'm getting an error saying there are no modules found. It seems it's not loading the right virtualenv, so there must be something wrong with the file I made.
Then I read about the #! you place at the beginning of .py files, but I don't understand it. Can someone explain that further? Is that why my file isn't loading properly?
Thanks for helping out!
You can run .py scripts from the command line using:
python test.py
That tells the terminal to run test.py in the Python interpreter and send the output to your terminal, just like when you run it in IDLE. If your .py script is not in your current directory and you don't want to change directories, you can access it using its absolute path:
python /Users/username/Dropbox/Workflows/test.py
As long as you have already activated your virtualenv, it should run your script using only the libraries you have added to your virtualenv. Also, once your virtualenv is activated, you can move around directories using "cd" and it will bring your virtualenv with you.
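If you're ever unsure which interpreter a script is actually running under, one quick check (just a sketch) is to print it from inside the script:
import sys
# When a virtualenv is active, sys.executable points at the python inside
# that virtualenv (e.g. .../project_name/bin/python) rather than the system
# Python, and sys.prefix points at the virtualenv directory.
print(sys.executable)
print(sys.prefix)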

Linux bash to compare two files, but the second file must be found

I have a batch job that integrates an XML file from time to time, though it can happen daily. After the file is integrated, it is placed in a folder like /archives/YYMMDD (the current day). The problem is when the same file gets integrated twice. So I need a script that verifies the file (it's possible with the diff command, but that risks becoming a bottleneck), but I can't work out how to supply the location of the second file to compare against.
P.S. I can't install on the server anything.
Thanks in advance.
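One way to check for duplicates without diffing every pair is to compare checksums. A rough Python sketch (the /archives layout and the *.xml pattern below are assumptions):
import hashlib, pathlib, sys

def digest(path):
    """Return the SHA-256 of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()

new_file = pathlib.Path(sys.argv[1])          # the file about to be integrated
archive_root = pathlib.Path("/archives")      # assumed layout: /archives/YYMMDD/...
new_digest = digest(new_file)
dupes = [p for p in archive_root.glob("*/*.xml") if digest(p) == new_digest]
if dupes:
    print("already integrated:", *dupes)
    sys.exit(1)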

AutoIt unzipping files

I've been searching all day for a solution to unzip a file with AutoIt Script. I would like to unzip a file called full.zip to a folder.
This is my last place to turn, since I can't find a solution on my own. I have found many solutions made by others (AutoIt3 files containing functions), but the code has issues I don't understand, and I'm unable to copy them here because I'm using a screen reader and the code doesn't seem to format properly. This is why I can not copy code here.
Does anyone know of a method, tutorial or resource that I can use to unzip a file with AutoIt?
Thanks for any help,
josh.
There are a lot of solutions people have coded. A few examples are the 7zip UDF, Zip.au3, and the zipfldr UDF. If those are not working for you, it is most likely because of small changes to AutoIt, which usually just means the #includes have been restructured.
I usually just keep 7za.exe around (7-Zip's standalone executable; 7-Zip can be downloaded from www.7-zip.org, and after installing you can copy 7za.exe from its program directory).
Then it becomes as simple as a call to RunWait to create the archive:
RunWait("7za.exe a MyNewArchive.zip file1.ext file2.ext ...")
And then to unzip:
RunWait('7za.exe x MyArchive.zip -o"Path\To\MyOutputFolder"')
The 7-zip FAQ also mentions that you can use this exe in your own applications (including commercial ones) provided you mention it in the documentation and provide a link. That means you are ok to use FileInstall(...) to include 7za.exe in the compiled .exe.

Trying to extract field from browser page

I'm trying to extract one field from an online form in Firefox to my local Ubuntu 12.04 PC and Mac OS X 10.7.4 machine.
I can manually save the page locally as a text document and then search for the text using a Unix script, but this seems rather cumbersome and I need it to be automated. Is there another, more efficient method?
My background is on Macs, but the company is trialling Linux PCs, so please be tolerant of my Ubuntu ignorance.
If you mean to program something, try:
the WWW::Mechanize library, which has Python and Perl bindings,
one of several mouse-scripting engines on Linux (e.g. Actionaz), or
a test automation tool which works with Firefox (Selenium; a short sketch follows).
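For instance, a minimal sketch with Selenium's Python bindings (the URL and element id are placeholders, and it assumes Firefox plus its WebDriver are installed):
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()                        # assumes Firefox and its driver are installed
driver.get("http://example.com/online-form")        # placeholder URL
field = driver.find_element(By.ID, "target-field")  # placeholder element id
print(field.get_attribute("value") or field.text)
driver.quit()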
You can also do it with a simple Bash script.
Take a look at some useful tools like:
wget
sed
grep
and then nothing will be cumbersome and everything can be automated.
If you want to go with the method that you mentioned, you can use curl to automate the saving of the form. Your BASH script would then look something like this:
curl http://locationofonlineform.com -o tempfile
valueOfField=$(grep patternToFindField tempfile)
# Do stuff
echo $valueOfField
If you want to get rid of the temporary file, you can directly feed the result of curl into the grep command.

Capturing all the data that has changed during a Linux install

I am trying to figure out which files were changed when I run an app install via make install. I can look at the script, but that calls other scripts and may or may not touch other files, etc. How can I do this programmatically?
Implementation: http://asic-linux.com.mx/~izto/checkinstall/
Several ways come to mind. First, use some sort of LD_PRELOAD trick to track all files opened. The second approach is to compare the filesystem before and after.
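As a rough sketch of the second approach in Python (the install prefix below is an assumption, and this only notices files that are new or modified under that root):
import os, subprocess

def snapshot(root):
    """Map every file path under root to its (mtime, size)."""
    state = {}
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # vanished or unreadable
            state[path] = (st.st_mtime, st.st_size)
    return state

before = snapshot("/usr/local")                   # assumed install prefix
subprocess.run(["make", "install"], check=True)   # run the install between the two snapshots
after = snapshot("/usr/local")
changed = sorted(p for p, meta in after.items() if before.get(p) != meta)
print("\n".join(changed))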
If your kernel supports it, you can use inotify (a handy interface is inotify-tools) and watch your home directory, if the package was configured with --prefix=/home/myusername.
I've noticed that checkinstall (using installwatch via LD_PRELOAD) does not always catch everything; the last time I used it, it did not catch empty directories that were created for spooling, which caused the subsequently generated .debs to break.
Note: don't use inotify if you are installing to /; in that case you have to use installwatch or just read all of the makefiles and install scripts closely.
