Download XML file from the server with Python 3

I am trying to download an XML file from a public data bank:
http://api.worldbank.org/v2/en/indicator/SP.POP.TOTL?downloadformat=xml
I tried to do it with requests:
import requests
response = requests.get(url)
response.encoding = 'utf-8' #or response.apparent_encoding
print(response.content)
and wget
import wget
wget.download(url, './my.xml')
But both approaches produce a mess instead of a correct file (it looks like broken encoding, but I cannot fix it).
If I download the file via a web browser, I get a correct UTF-8 XML file.
What am I doing wrong in the code?
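For what it's worth, the World Bank API returns a ZIP archive rather than bare XML when `downloadformat=xml` is requested, so the "mess" is most likely raw ZIP bytes (they start with the magic characters "PK"). A minimal offline sketch of the unzip step, using a synthetic in-memory archive in place of the live response (the member name and content here are invented):

```python
import io
import zipfile

# Build a tiny ZIP in memory to stand in for the server response;
# the real body from the World Bank endpoint has the same shape.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as archive:
    archive.writestr("API_SP.POP.TOTL.xml", "<Root>population data</Root>")
body = buf.getvalue()

# A ZIP archive always starts with the magic bytes "PK" -- this quick
# check tells zipped data apart from plain XML.
assert body[:2] == b"PK"

# Unzip entirely in memory and pull out the XML member(s).
with zipfile.ZipFile(io.BytesIO(body)) as archive:
    xml_names = [n for n in archive.namelist() if n.endswith(".xml")]
    xml_text = archive.read(xml_names[0]).decode("utf-8")

print(xml_text)
```

With the real response, replace `body` with `response.content`; the same two-byte check distinguishes a ZIP body from plain XML before you try to parse it.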

Related

Download file from website directly into Linux directory - Python

If I manually click the button, the browser starts downloading a CSV file (2GB) onto my computer, but I want to automate this.
This is the download link:
https://data.cityofnewyork.us/api/views/bnx9-e6tj/rows.csv?accessType=DOWNLOAD
The issue: when I use either library (requests or pandas), it just hangs, and I have no idea whether the file is being downloaded or not.
My goal is to:
Know if the file is being downloaded and
Have the CSV downloaded to a specified directory, e.g.
~/mydirectory
Can someone provide the code to do this?
Try this...
import requests

URL = "https://data.cityofnewyork.us/api/views/bnx9-e6tj/rows.csv?accessType=DOWNLOAD"
response = requests.get(URL)
with open("/mydirectory/downloaded_file.csv", "wb") as out_file:
    out_file.write(response.content)
print('Download Complete')
Or you could do it this way and have a progress bar ...
import wget
wget.download('https://data.cityofnewyork.us/api/views/bnx9-e6tj/rows.csv?accessType=DOWNLOAD')
The output will look like this:
11% [........ ] 73728 / 633847
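One caveat for a 2GB file: `response.content` holds the entire body in memory before anything is written to disk, which is likely why it appears to hang. A streamed download writes chunk by chunk and can report progress from the Content-Length header. This is a sketch, not tested against the NYC endpoint:

```python
import requests

def download(url, dest, chunk_size=1 << 20):
    """Stream url to dest in 1MB chunks, printing progress as bytes arrive."""
    with requests.get(url, stream=True) as response:
        response.raise_for_status()
        total = int(response.headers.get("Content-Length", 0))
        written = 0
        with open(dest, "wb") as f:
            for chunk in response.iter_content(chunk_size=chunk_size):
                f.write(chunk)
                written += len(chunk)
                if total:
                    print(f"\r{written * 100 // total}% "
                          f"({written}/{total} bytes)", end="")
    print()
    return written

# Example usage (commented out -- this is a 2GB transfer):
# import os
# download("https://data.cityofnewyork.us/api/views/bnx9-e6tj/rows.csv?accessType=DOWNLOAD",
#          os.path.expanduser("~/mydirectory/rows.csv"))
```

The progress line answers the "is it downloading?" question directly, and memory use stays bounded by the chunk size regardless of file size.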

Python configparser remote file in Gitlab

I have a requirement to refactor a K8s Python app so that it gets some configuration from a remote Gitlab project, because for various reasons we want to decouple application settings from our pipeline/deployment environment.
In my functional testing, this works:
import configparser
config = configparser.ConfigParser()
config_file = "config.ini" # local file for testing
config.read(config_file)
['config.ini']
However, when I attempt to read the configuration from a remote file (our requirement), this DOES NOT work:
import requests
import os
token = os.environ.get('GITLAB_TOKEN')
headers = {'PRIVATE-TOKEN': token}
params = {'ref': 'master'}
response = requests.get('https://path/to/corp/gitlab/file/raw', params=params,
                        headers=headers)
config = configparser.ConfigParser()
configfile = response.content.decode('utf-8')
print(configfile) # this is good!
config.read(configfile) # this fails to load the contents into configparser
[]
I get an empty list. I can print the contents of the configfile object from the requests.get call, and the ini data looks good, but config.read() seems unable to load it as an object in memory; it only seems to work when reading a file from disk. Writing the contents of the requests.get call to a local .ini file would defeat the whole purpose of using the remote configuration repo.
Is there a good way to read that configuration file from the remote and have configparser access it at container runtime?
I got this working with:
config.read_string(configfile)
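The distinction behind that fix: config.read() expects a filename (or list of filenames) and silently skips anything it cannot open, returning only the list of files it did read, while read_string() parses ini data already held in memory. A self-contained illustration (the section and keys are invented):

```python
import configparser

ini_text = """\
[database]
host = db.example.internal
port = 5432
"""

config = configparser.ConfigParser()

# read() treats its argument as a *filename*; a string of ini data is
# not a path to an existing file, so it is silently skipped and an
# empty list comes back -- exactly the symptom in the question.
assert config.read(ini_text) == []

# read_string() parses ini data that is already in memory.
config.read_string(ini_text)
assert config["database"]["port"] == "5432"
```

So the response body from requests never needs to touch disk: decode it and hand it straight to read_string().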

Base64 encoded file says "GZIP", but decoding it in Python outputs corrupt HTML

I'm having trouble reading data from files that come from an old backup (Windows system).
Here is an example of what the content looks like:
GZIP
-}_HTML>
<H AD>
<META HTTP-EQUV="Conten-Type" CO!TENT="tex/html; chrset=wind&ws-1252">
It's almost proper HTML... but some characters are corrupted.
In Base64, it looks like this:
R1pJUAwAAAAKAAAALX0AAF9IVE1MPg0KPEggQUQ+DQo8TUVUQSBIVFRQLUVRVQ5WPSJDb250ZW4ZLVR5cGUiIENPIVRFTlQ9InRleBwvaHRtbDsgY2gTcnNldD13aW5kJndzLTEyNTIi
Since it says "GZIP" at the top, I tried decompressing it with gzip in Python.
import zlib
import base64
s = "R1pJUAwAAAAKAAAALX0AAF9IVE1MPg0KPEggQUQ+DQo8TUVUQSBIVFRQLUVRVQ5WPSJDb250ZW4ZLVR5cGUiIENPIVRFTlQ9InRleBwvaHRtbDsgY2gTcnNldD13aW5kJndzLTEyNTIi"
s = base64.b64decode(s.encode('Latin1'))
zlib.decompress(s, 31)
But I'm getting the error:
zlib.error: Error -3 while decompressing data: incorrect header check
Same with this code:
import gzip
s = gzip.decompress(s)
s = str(s,'utf-8')
print(s)
gzip.BadGzipFile: Not a gzipped file (b'GZ')
Any idea how I can recover data from this file?
It is neither gzip nor any other sort of compression, despite the word "GZIP" at the top. It is exactly what you see: raw, partially corrupted data.
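Decoding the Base64 confirms this: the stream begins with the literal ASCII text "GZIP", not the gzip magic bytes 0x1f 0x8b that zlib and gzip look for, and the mangled HTML follows uncompressed:

```python
import base64

s = ("R1pJUAwAAAAKAAAALX0AAF9IVE1MPg0KPEggQUQ+DQo8TUVUQSBIVFRQLUVRVQ5WPSJDb250"
     "ZW4ZLVR5cGUiIENPIVRFTlQ9InRleBwvaHRtbDsgY2gTcnNldD13aW5kJndzLTEyNTIi")
raw = base64.b64decode(s)

# A real gzip stream starts with 0x1f 0x8b; this one does not.
assert raw[:2] != b"\x1f\x8b"

# Instead the file starts with the ASCII text "GZIP" followed by a few
# length-like fields, then the (corrupted) HTML itself, uncompressed.
assert raw[:4] == b"GZIP"
assert b"_HTML>" in raw
print(raw[12:].decode("latin1"))
```

So there is nothing to decompress; recovering the data means repairing the scattered corrupted bytes in the HTML, not decoding it.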

How to download zip file from a Hyperlink in python

There is a website at
https://www.fda.gov/drugs/drug-approvals-and-databases/drugsfda-data-files
with one downloadable file, "Drugs@FDA Download File (ZIP - 3.2MB)", as a hyperlink in the page content.
I have tried the code below:
import urllib.request
import gzip

url = 'https://www.fda.gov/media/89850/download'
with urllib.request.urlopen(url) as response:
    with gzip.GzipFile(fileobj=response) as uncompressed:
        file_header = uncompressed.read()
But I am getting the error: Not a gzipped file.
You can use the requests library to get the data from the URL, then write the contents to a file:
import requests

data = requests.get('https://www.fda.gov/media/89850/download')
with open('my_zip_file.zip', 'wb') as my_zip_file:
    my_zip_file.write(data.content)
This will create the file in the current working directory; you can of course name the file anything.
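As a side note, gzip and ZIP are different formats, which is why gzip.GzipFile rejected this file in the first place. The zipfile module handles .zip archives and can read one straight from memory once downloaded; here is a sketch with a small synthetic archive standing in for the real 3.2MB download (the member name is invented):

```python
import io
import zipfile

# Stand-in for the downloaded bytes (data.content in the answer above):
# a ZIP archive holding one text member.
archive_bytes = io.BytesIO()
with zipfile.ZipFile(archive_bytes, "w") as zf:
    zf.writestr("drugs.txt", "ApplNo\tProductNo\n")

# Open the archive from memory -- no temporary file needed -- and list
# and read its members.
with zipfile.ZipFile(io.BytesIO(archive_bytes.getvalue())) as zf:
    names = zf.namelist()
    text = zf.read("drugs.txt").decode("utf-8")

print(names)
```

With the real download, wrap `data.content` in `io.BytesIO` the same way, or call `zf.extractall()` to unpack everything to disk.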

Download files with Python - "unknown url type"

I need to download a list of RTF files locally with Python3.
I tried with urllib
import urllib
url = "www.calhr.ca.gov/Documents/wfp-recruitment-flyer-bachelor-degree-jobs.rtf"
urllib.request.urlopen(url)
but I get a ValueError:
ValueError: unknown url type: 'www.calhr.ca.gov/Documents/wfp-recruitment-flyer-bachelor-degree-jobs.rtf'
How to deal with this kind of file format?
Try adding http:// in front of the URL:
import urllib
url = "http://www.calhr.ca.gov/Documents/wfp-recruitment-flyer-bachelor-degree-jobs.rtf"
urllib.request.urlopen(url)
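If the RTF URLs arrive as a list and some lack a scheme, the check can be automated with urllib.parse instead of editing each one by hand (`ensure_scheme` is a made-up helper name, not a library function):

```python
from urllib.parse import urlparse

def ensure_scheme(url, default="http"):
    """Prefix a scheme when the URL lacks one, so urlopen accepts it."""
    if urlparse(url).scheme == "":
        return f"{default}://{url}"
    return url

print(ensure_scheme(
    "www.calhr.ca.gov/Documents/wfp-recruitment-flyer-bachelor-degree-jobs.rtf"))
```

URLs that already carry a scheme pass through unchanged, so the helper is safe to apply to the whole list before calling urlopen.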
