I have to concatenate a hardcoded path (a string) to a URL, and the result should itself be a URL:
url (which doesn't end with "/") + "/path/to/file/" = new_url
I tried concatenating with a URL join and also with simple string concatenation, but the result is not a URL that can be reached (not that the URL address is invalid).
mirror_url = "http://amazonlinux.us-east-2.amazonaws.com/2/core/latest/x86_64/mirror.list"
response = requests.get(mirror_url)
contents_in_url = response.content
## returns a URL like the one shown below, but it cannot be
## concatenated with another string to give a URL that can be requested.
'http://amazonlinux.us-east-2.amazonaws.com/2/core/2.0/x86_64/8cf736cd3252ada92b21e91b8c2a324d05b12ad6ca293a14a6ab7a82326aec43'
path_to_add_to_url = "/repodata/primary.sqlite.gz"
final_url = contents_in_url + path_to_add_to_url
Desired result (without omitting any part of the path to the file):
final_url = "http://amazonlinux.us-east-2.amazonaws.com/2/core/2.0/x86_64/8cf736cd3252ada92b21e91b8c2a324d05b12ad6ca293a14a6ab7a82326aec43/repodata/primary.sqlite.gz"
You need to read the contents of the first response through the response.text attribute (a str), not response.content (bytes), and strip the surrounding whitespace (e.g. a trailing newline):
import requests
mirror_url = "http://amazonlinux.us-east-2.amazonaws.com/2/core/latest/x86_64/mirror.list"
response = requests.get(mirror_url)
contents_in_url = response.text.strip()
path_to_add_to_url = "/repodata/primary.sqlite.gz"
response = requests.get(contents_in_url + path_to_add_to_url)
with open('primary.sqlite.gz', 'wb') as f_out:
    f_out.write(response.content)
print('Downloading done.')
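The same string handling can be checked without any network call. A minimal sketch, assuming the first non-empty line of mirror.list is the base URL (an assumption; `build_repo_url` is a hypothetical helper name). It also shows why a URL join can lose part of the path: `urllib.parse.urljoin` treats a suffix that starts with `/` as an absolute path and replaces the whole path of the base, and it drops the last segment of a base that does not end with `/`.

```python
from urllib.parse import urljoin

def build_repo_url(mirror_list_body, suffix="/repodata/primary.sqlite.gz"):
    """Hypothetical helper: turn a mirror.list body (bytes or str) into a file URL."""
    # response.content is bytes; decode it so it can be combined with a str
    if isinstance(mirror_list_body, bytes):
        mirror_list_body = mirror_list_body.decode("utf-8")
    # Assumption: the first non-empty line is the mirror base URL
    base = next(line.strip() for line in mirror_list_body.splitlines() if line.strip())
    # Join with exactly one "/" between the base and the suffix
    return base.rstrip("/") + "/" + suffix.lstrip("/")

body = b"http://amazonlinux.us-east-2.amazonaws.com/2/core/2.0/x86_64/8cf736cd3252ada92b21e91b8c2a324d05b12ad6ca293a14a6ab7a82326aec43\n"
print(build_repo_url(body))
# urljoin with a leading "/" replaces the entire path of the base URL
print(urljoin(body.decode().strip(), "/repodata/primary.sqlite.gz"))
```

If you prefer `urljoin`, give it a base that ends with `/` and a suffix without the leading `/`.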
I've recently been working on a mini project with Python 3 and tkinter that sanitises URLs and IP addresses. I've hit a roadblock with my function that I cannot work out. What I am trying to achieve is:
Has a user entered a URL such as http://www.google.com or https://www.google.com and if so, sanitise as:
hxxp[:]//www[.]google[.]com or hxxps[:]//www[.]google[.]com
Has a user entered an IP address such as 192.168.1.1 or http://192.168.1.1 and sanitise as:
192[.]168[.]1[.]1 or hxxp[:]//192[.]168[.]1[.]1
Has a user entered already sanitised input? Is there unsanitised input along with it? If so, just sanitise the unsanitised input and print them to the results output Textbox.
I have included a screenshot of what is currently happening to my normal input, after input is sanitised and how I want to handle the above issues.
Also: is the .strip() in the outputTextbox.insert line redundant?
I appreciate any help and recommendations!
def printOut():
    outputTextbox.delete("1.0", "end")
    url = inputTextbox.get("1.0", "end-1c")
    if len(url) == 0:
        tk.messagebox.showerror("Error", "Please enter content to sanitise")
    if "hxxp" and "[:]" and "[.]" in url or "hxxps" and "[:]" and "[.]" in url:
        outputTextbox.insert("1.0", url, "\n".strip())
        pass
    elif "http" and ":" and "." in url:
        url = url.replace("http", "hxxp")
        url = url.replace(":", "[:]")
        url = url.replace(".", "[.]")
        outputTextbox.insert("1.0", url, "\n".strip())
    elif "https" and ":" and "." in url:
        url = url.replace("https", "hxxps")
        url = url.replace(":", "[:]")
        url = url.replace(".", "[.]")
        outputTextbox.insert("1.0", url, "\n".strip())
    elif "http" and ":" and range(0, 10) and "." in url or range(0, 10) and "." in url:
        url = url.replace("http", "hxxp")
        url = url.replace(".", "[.]")
        outputTextbox.insert("1.0", url, "\n".strip())
An expression like A and B and C in url is evaluated as A and B and (C in url), not as "A, B and C are all found in url". Because non-empty string literals are truthy, it reduces to just C in url.
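A quick check of that precedence rule, using a made-up sample input that contains "[.]" but neither "hxxp" nor "[:]": the chained `and` still comes out True, while `all(...)` tests each token separately.

```python
url = "www[.]google[.]com"  # hypothetical sample input: no "hxxp", no "[:]"
# Parsed as: "hxxp" and "[:]" and ("[.]" in url)
# The first two operands are non-empty strings (truthy), so only the last test decides.
naive = "hxxp" and "[:]" and "[.]" in url
# What was intended: every token must appear in url
intended = all(token in url for token in ("hxxp", "[:]", "[.]"))
print(naive, intended)  # True False
```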
You can use re (regex module) to achieve what you want:
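import re
from tkinter import messagebox
def printOut():
    outputTextbox.delete("1.0", "end")
    url = inputTextbox.get("1.0", "end-1c")
    if len(url) == 0:
        messagebox.showerror("Error", "Please enter content to sanitise")
        return
    # "http" -> "hxxp" (also turns "https" into "hxxps"); existing "hxxp" is untouched
    result = url.replace("http", "hxxp")
    # Bracket every ":" and "." that is not already bracketed (lookarounds do not consume characters)
    result = re.sub(r"(?<!\[):(?!\])", "[:]", result)
    result = re.sub(r"(?<!\[)\.(?!\])", "[.]", result)
    outputTextbox.insert("end", result + "\n")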
There may be a better regex for that.
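To test the substitutions without a Tk window, they can be pulled out into a plain function (`sanitise` is a hypothetical name). The negative lookbehind `(?<!\[)` and lookahead `(?!\])` check the neighbouring characters without consuming them, so adjacent separators such as the dots in 192.168.1.1 are all bracketed, and already-sanitised parts are left alone.

```python
import re

def sanitise(text):
    """Defang URLs and IP addresses; parts that are already bracketed are kept as-is."""
    text = text.replace("http", "hxxp")             # also covers https -> hxxps
    text = re.sub(r"(?<!\[):(?!\])", "[:]", text)   # bare ":" -> "[:]"
    text = re.sub(r"(?<!\[)\.(?!\])", "[.]", text)  # bare "." -> "[.]"
    return text

for sample in ("http://www.google.com", "192.168.1.1", "hxxp[:]//a[.]com http://b.com"):
    print(sanitise(sample))
```

Mixed input is handled in one pass: only the unsanitised URL in the last sample changes.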
I have a piece of Python 3 code that fetches a webpage every 10 seconds which gives back some JSON information:
s = requests.Session()
while True:
    r = s.get(currenturl)
    data = r.json()
    datetime = data['Timestamp']['DateTime']
    value = data['PV']
    print(str(datetime) + ": " + str(value) + "W")
    time.sleep(10)
The output of this code is:
2020-10-13T13:26:53: 888W
2020-10-13T13:26:53: 888W
2020-10-13T13:26:53: 888W
2020-10-13T13:26:53: 888W
As you can see, the DateTime does not change with every iteration. When I refresh the page manually in my browser it does get updated every time.
I have tried adding Cache-Control: max-age=0 to the headers of my request, but that does not resolve the issue.
Even when I explicitly set everything to None at the end of each iteration, the same issue remains:
while True:
    r = s.get(currenturl, headers={'Cache-Control': 'no-cache'})
    data = r.json()
    datetime = data['Timestamp']['DateTime']
    value = data['PV']
    print(str(datetime) + ": " + str(value) + "W")
    time.sleep(10)
    counter += 1
    r = None
    data = None
    datetime = None
    value = None
How can I "force" a refresh of the page with requests.get()?
It turns out this particular website doesn't return fresh data unless the request comes from its parent URL. I had to include the original parent URL as the Referer header:
r = s.get(currenturl, headers={'Referer': 'https://originalurl.com/example'})
Now it works as expected:
2020-10-13T15:32:27: 889W
2020-10-13T15:32:37: 889W
2020-10-13T15:32:47: 884W
2020-10-13T15:32:57: 884W
2020-10-13T15:33:07: 894W
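Since every request in the loop needs the header, a `requests.Session` can carry it as a default header instead of repeating it in each `get()` call. A sketch, where `DATA_URL` and `PARENT_URL` are hypothetical placeholders for the JSON endpoint and its parent page, and the `Timestamp`/`PV` keys come from the question's JSON:

```python
import time
import requests

# Hypothetical placeholders: the JSON endpoint and the page that normally requests it
DATA_URL = "https://originalurl.com/example/data"
PARENT_URL = "https://originalurl.com/example"

def make_session(parent_url):
    """Session whose default headers are merged into every request it sends."""
    s = requests.Session()
    s.headers.update({"Referer": parent_url})
    return s

def format_sample(data):
    """Render one JSON sample in the same form as the output above."""
    return f"{data['Timestamp']['DateTime']}: {data['PV']}W"

def poll(session, url, interval=10):
    """Fetch and print one sample every `interval` seconds, until interrupted."""
    while True:
        r = session.get(url, timeout=10)
        r.raise_for_status()
        print(format_sample(r.json()))
        time.sleep(interval)
```

Start it with `poll(make_session(PARENT_URL), DATA_URL)`; like the original `while True` loop, it runs until interrupted.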
How do I generate a new access token from a refresh token in Python when using the Google Fit API?
Update: I have found my answer:
from urllib2 import Request, urlopen, URLError
import json
import mimetools
BOUNDARY = mimetools.choose_boundary()
CRLF = '\r\n'
def EncodeMultiPart(fields, files, file_type='application/xml'):
    """Encodes list of parameters and files for HTTP multipart format.
    Args:
      fields: list of tuples containing name and value of parameters.
      files: list of tuples containing param name, filename, and file contents.
      file_type: string if file type different than application/xml.
    Returns:
      A string to be sent as data for the HTTP post request.
    """
    lines = []
    for (key, value) in fields:
        lines.append('--' + BOUNDARY)
        lines.append('Content-Disposition: form-data; name="%s"' % key)
        lines.append('')  # blank line
        lines.append(value)
    for (key, filename, value) in files:
        lines.append('--' + BOUNDARY)
        lines.append(
            'Content-Disposition: form-data; name="%s"; filename="%s"'
            % (key, filename))
        lines.append('Content-Type: %s' % file_type)
        lines.append('')  # blank line
        lines.append(value)
    lines.append('--' + BOUNDARY + '--')
    lines.append('')  # blank line
    return CRLF.join(lines)
def refresh_token():
    url = "https://oauth2.googleapis.com/token"
    headers = [
        ("grant_type", "refresh_token"),
        ("client_id", "xxxxxx"),
        ("client_secret", "xxxxxx"),
        ("refresh_token", "xxxxx"),
    ]
    files = []
    edata = EncodeMultiPart(headers, files, file_type='text/plain')
    #print(EncodeMultiPart(headers, files, file_type='text/plain'))
    headers = {}
    request = Request(url, headers=headers)
    request.add_data(edata)
    request.add_header('Content-Length', str(len(edata)))
    request.add_header('Content-Type', 'multipart/form-data;boundary=%s' % BOUNDARY)
    response = urlopen(request).read()
    print(response)
refresh_token()
#response = json.decode(response)
#print(refresh_token())
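The code above is Python 2 (`urllib2` and `mimetools` do not exist in Python 3). A minimal Python 3 sketch using only the standard library, sending the parameters form-encoded (`application/x-www-form-urlencoded`), which is the body encoding the OAuth 2.0 spec (RFC 6749) defines for token requests. The client ID, secret and refresh token are placeholders you supply; `build_refresh_request` and `refresh_access_token` are hypothetical names.

```python
import json
import urllib.parse
import urllib.request

TOKEN_URL = "https://oauth2.googleapis.com/token"

def build_refresh_request(client_id, client_secret, refresh_token):
    """Build (but do not send) the POST that exchanges a refresh token for an access token."""
    body = urllib.parse.urlencode({
        "grant_type": "refresh_token",
        "client_id": client_id,
        "client_secret": client_secret,
        "refresh_token": refresh_token,
    }).encode("ascii")
    return urllib.request.Request(
        TOKEN_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )

def refresh_access_token(client_id, client_secret, refresh_token):
    """Send the request and return the access_token field of the JSON response."""
    req = build_refresh_request(client_id, client_secret, refresh_token)
    with urllib.request.urlopen(req, timeout=30) as resp:
        payload = json.load(resp)
    return payload["access_token"]
```

Splitting request construction from sending makes the body easy to inspect before any credentials go over the wire.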
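import http.client
import urllib.parse
def unshorten_url(url):
    parsed = urllib.parse.urlparse(url)
    # Pick the connection class that matches the URL scheme
    if parsed.scheme == "https":
        h = http.client.HTTPSConnection(parsed.netloc)
    else:
        h = http.client.HTTPConnection(parsed.netloc)
    resource = parsed.path or "/"
    if parsed.query != "":
        resource += "?" + parsed.query
    h.request('HEAD', resource)
    response = h.getresponse()
    # Integer division: in Python 3, 301 / 100 == 3.01, which never equals 3
    if response.status // 100 == 3 and response.getheader('Location'):
        # Resolve a relative Location against the current URL, then follow chains of short URLs
        return unshorten_url(urllib.parse.urljoin(url, response.getheader('Location')))
    else:
        return url
unshorten_url("http://data.europa.eu/esco/occupation/00030d09-2b3a-4efd-87cc-c4ea39d27c34")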
Input:
http://data.europa.eu/esco/occupation/00030d09-2b3a-4efd-87cc-c4ea39d27c34 (yes, the same URL is returned)
Output URL that I need after unshortening: https://ec.europa.eu/esco/portal/occupation?uri=http%3A%2F%2Fdata.europa.eu%2Fesco%2Foccupation%2F00030d09-2b3a-4efd-87cc-c4ea39d27c34&conceptLanguage=en&full=true#&uri=http://data.europa.eu/esco/occupation/00030d09-2b3a-4efd-87cc-c4ea39d27c34
As you can see, I have two URLs: the short URL, which is my input, and the full URL. To get the required output URL, I identified a pattern across a set of URLs of the same kind, and wrote this code, which produces the required output:
my_url = "http://data.europa.eu/esco/occupation/00030d09-2b3a-4efd-87cc-c4ea39d27c34"
a = "https://ec.europa.eu/esco/portal/occupation?uri=http%3A%2F%2Fdata.europa.eu%2Fesco%2Foccupation%2F"
b = my_url.split("/")[-1]
URL = a + b + "&conceptLanguage=en&full=true#&uri=" + my_url
The required full URL is then stored in URL:
URL = "https://ec.europa.eu/esco/portal/occupation?uri=http%3A%2F%2Fdata.europa.eu%2Fesco%2Foccupation%2F00030d09-2b3a-4efd-87cc-c4ea39d27c34&conceptLanguage=en&full=true#&uri=http://data.europa.eu/esco/occupation/00030d09-2b3a-4efd-87cc-c4ea39d27c34"
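Instead of hardcoding the percent-encoded prefix, the query string can be generated from the concept URI with `urllib.parse.urlencode`, which percent-encodes `:` and `/` inside values. A sketch, assuming every occupation URL follows the same pattern as the one above; `esco_portal_url` is a hypothetical name.

```python
from urllib.parse import urlencode

PORTAL = "https://ec.europa.eu/esco/portal/occupation"

def esco_portal_url(concept_uri, language="en"):
    """Hypothetical helper: build the full portal URL for an ESCO concept URI."""
    # urlencode percent-encodes ':' and '/' in the value (http%3A%2F%2F...)
    query = urlencode({"uri": concept_uri, "conceptLanguage": language, "full": "true"})
    # The fragment repeats the raw URI, matching the pattern of the required URL
    return PORTAL + "?" + query + "#&uri=" + concept_uri

print(esco_portal_url("http://data.europa.eu/esco/occupation/00030d09-2b3a-4efd-87cc-c4ea39d27c34"))
```

This avoids relying on `split("/")[-1]` and keeps the encoded and raw copies of the URI consistent.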
I am trying to upload a blob (pdf) file from my laptop to a container in Azure storage account. I found it to be working but with one glitch.
I am calculating the file size using:
f_info = os.stat(file_path)
file_size = (f_info.st_size) # returns - 19337
Then I insert this value in below canonicalized header:
ch = "PUT\n\n\n"+str(file_size)+"\n\napplication/pdf\n\n\n\n\n\n\nx-ms-blob-type:BlockBlob" + "\nx-ms-date:" + date + "\nx-ms-version:" + version + "\n"
and send the PUT request to the Put Blob API. However, it returns an error saying, "Authentication failed because the server used the string below to calculate the signature":
'PUT\n\n\n19497\n\napplication/pdf\n\n\n\n\n\n\nx-ms-blob-type:BlockBlob\nx-ms-date:[date]\nx-ms-version:[API version]
Looking at this string, it is obvious that authentication failed because the file size Azure calculated is a different value! I don't understand how it's calculating this file size.
FYI: If I replace 19337 with 19497 in canonicalized string and re run. It works!
Any suggestion on where I am making mistakes?
Below is the code:
import base64
import datetime
import hashlib
import hmac
import os
import requests
storage_AccountName = '<storage account name>'
storage_ContainerName = "<container_name>"
storageKey = '<key>'
fd = "C:\\<path>\\<to>\\<file_to_upload>.pdf"
URI = 'https://' + storage_AccountName + '.blob.core.windows.net/<storage_ContainerName>/<blob_file_name.pdf>'
version = '2017-07-29'
date = datetime.datetime.utcnow().strftime("%a, %d %b %Y %H:%M:%S GMT")
if os.path.isfile(fd):
    file_info = os.stat(fd)
    file_size = (file_info.st_size)
    ch = "PUT\n\n\n"+str(file_size)+"\n\napplication/pdf\n\n\n\n\n\n\nx-ms-blob-type:BlockBlob" + "\nx-ms-date:" + date + "\nx-ms-version:" + version + "\n"
    cr = "/<storage_AccountName>/<storage_Containername>/<blob_file_name.pdf>"
    canonicalizedString = ch + cr
    storage_account_key = base64.b64decode(storageKey)
    byte_canonicalizedString = canonicalizedString.encode('utf-8')
    signature = base64.b64encode(hmac.new(key=storage_account_key, msg=byte_canonicalizedString, digestmod=hashlib.sha256).digest())
    header = {
        'x-ms-blob-type': "BlockBlob",
        'x-ms-date': date,
        'x-ms-version': version,
        'Authorization': 'SharedKey ' + storage_AccountName + ':' + signature.decode('utf-8'),
        #'Content-Length': str(19497),  # works
        'Content-Length': str(file_size),  # doesn't work
        'Content-Type': "application/pdf"}
    files = {'file': open(fd, 'rb')}
    result = requests.put(url = URI, headers = header, files = files)
    print(result.content)
As mentioned in the comments, the reason for the content-length mismatch is that, instead of uploading the raw file, you're uploading a multipart/form-data body (this is what requests builds when you pass files=). That body wraps the file contents in boundary lines and part headers, which increases the content length.
Please change the following lines of code:
files = {'file': open(fd, 'rb')}
result = requests.put(url = URI, headers = header, files = files)
to something like:
with open(fd, 'rb') as data:
    result = requests.put(url = URI, headers = header, data = data)
And now you're only uploading the file contents.
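The size difference can be seen without contacting Azure by preparing (not sending) both variants of the request with `requests.Request(...).prepare()` and comparing their Content-Length headers. A sketch with a placeholder URL and stand-in bytes whose length matches the 19337 from the question; the exact multipart overhead depends on the boundary and part headers, so only the comparison is meaningful.

```python
import requests

# Placeholder URL: the requests are only prepared locally, never sent
URL = "https://example.invalid/container/blob.pdf"

def prepared_content_length(**body):
    """Build (but do not send) a PUT request and return its Content-Length."""
    prepared = requests.Request("PUT", URL, **body).prepare()
    return int(prepared.headers["Content-Length"])

payload = b"%PDF-1.4\n" + b"0" * 19328  # 19337 stand-in bytes, not a real PDF

raw_length = prepared_content_length(data=payload)                   # body is the bytes themselves
multipart_length = prepared_content_length(files={"file": payload})  # bytes wrapped in a multipart form
print(len(payload), raw_length, multipart_length)
```

With `data=`, the body length equals the file size, which is what the signed string must contain.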