python requests.get() does not refresh the page - python-3.x

I have a piece of Python 3 code that fetches a webpage every 10 seconds which gives back some JSON information:
import time
import requests

s = requests.Session()
while True:
    r = s.get(currenturl)
    data = r.json()
    datetime = data['Timestamp']['DateTime']
    value = data['PV']
    print(str(datetime) + ": " + str(value) + "W")
    time.sleep(10)
The output of this code is:
2020-10-13T13:26:53: 888W
2020-10-13T13:26:53: 888W
2020-10-13T13:26:53: 888W
2020-10-13T13:26:53: 888W
As you can see, the DateTime does not change with every iteration. When I refresh the page manually in my browser it does get updated every time.
I have tried adding
Cache-Control: max-age=0
to the headers of my request, but that does not resolve the issue.
Even when explicitly setting everything to None at the end of the loop, the same issue remains:
while True:
    r = s.get(currenturl, headers={'Cache-Control': 'no-cache'})
    data = r.json()
    datetime = data['Timestamp']['DateTime']
    value = data['PV']
    print(str(datetime) + ": " + str(value) + "W")
    time.sleep(10)
    counter += 1
    r = None
    data = None
    datetime = None
    value = None
How can I "force" a refresh of the page with requests.get()?

It turns out this particular website doesn't serve refreshed data on its own unless the request comes from its parent URL.
r = s.get(currenturl, headers={'Referer': 'https://originalurl.com/example'})
I had to include the original parent URL as the Referer header. Now it works as expected:
2020-10-13T15:32:27: 889W
2020-10-13T15:32:37: 889W
2020-10-13T15:32:47: 884W
2020-10-13T15:32:57: 884W
2020-10-13T15:33:07: 894W
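For reference, here is a minimal sketch of the complete polling loop with the Referer header set on the session (currenturl and the Referer value below are placeholders for the real URLs):

import time
import requests

currenturl = "https://example.com/data.json"                        # placeholder endpoint
s = requests.Session()
s.headers.update({'Referer': 'https://originalurl.com/example'})    # parent page URL

while True:
    data = s.get(currenturl).json()
    print(str(data['Timestamp']['DateTime']) + ": " + str(data['PV']) + "W")
    time.sleep(10)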

Related

How can I unshorten a URL in python 3

import http.client
import urllib.parse

def unshorten_url(url):
    parsed = urllib.parse.urlparse(url)
    h = http.client.HTTPConnection(parsed.netloc)
    resource = parsed.path
    if parsed.query != "":
        resource += "?" + parsed.query
    h.request('HEAD', resource)
    response = h.getresponse()
    if response.status // 100 == 3 and response.getheader('Location'):
        return unshorten_url(response.getheader('Location'))  # changed to process chains of short urls
    else:
        return url

unshorten_url("http://data.europa.eu/esco/occupation/00030d09-2b3a-4efd-87cc-c4ea39d27c34")
The input is:
http://data.europa.eu/esco/occupation/00030d09-2b3a-4efd-87cc-c4ea39d27c34 (yes, the same URL is returned by the code above).
The output URL I need after unshortening is: https://ec.europa.eu/esco/portal/occupation?uri=http%3A%2F%2Fdata.europa.eu%2Fesco%2Foccupation%2F00030d09-2b3a-4efd-87cc-c4ea39d27c34&conceptLanguage=en&full=true#&uri=http://data.europa.eu/esco/occupation/00030d09-2b3a-4efd-87cc-c4ea39d27c34
As you can see, I have two URLs: the short URL, which is my input, and the full URL. To build the required output URL, I identified a pattern from a set of URLs of the same kind and wrote the following code, which produces the required output.
my_url = "http://data.europa.eu/esco/occupation/00030d09-2b3a-4efd-87cc-c4ea39d27c34"
a = "https://ec.europa.eu/esco/portal/occupation?uri=http%3A%2F%2Fdata.europa.eu%2Fesco%2Foccupation%2F"
b = my_url.split("/")[-1]
URL = a + b + "&conceptLanguage=en&full=true#&uri=" + my_url
URL now holds the required full URL:
URL = "https://ec.europa.eu/esco/portal/occupation?uri=http%3A%2F%2Fdata.europa.eu%2Fesco%2Foccupation%2F00030d09-2b3a-4efd-87cc-c4ea39d27c34&conceptLanguage=en&full=true#&uri=http://data.europa.eu/esco/occupation/00030d09-2b3a-4efd-87cc-c4ea39d27c34"

Getting the number of old issues and a table (login and number of commits) of a repository's most active members

I cannot get the above information using the GitHub API. Reading the documentation did not help much, and I still don't fully understand how to work with dates. Here is an example of my code for getting open issues:
import requests
import json
from datetime import datetime

username = '\'
password = '\'
another_page = True
opened = 0
closed = 0
api_oldest = 'https://api.github.com/repos/grpc/grpc/issues?per_page=5&q=sort=created:>`date -v-14d "+%Y-%m-%d"`&order=asc'
api_issue = 'https://api.github.com/repos/grpc/grpc/issues?page=1&per_page=5000'
api_pulls = 'https://api.github.com/repos/grpc/grpc/pulls?page=1'
datetime.now()
while another_page:
    r = requests.get(api_issue, auth=(username, password))
    #json_response = json.loads(r.text)
    #results.append(json_response)
    if 'next' in r.links:
        api_issue = r.links['next']['url']
        if item['state'] == 'open':
            opened += 1
        else:
            closed += 1
    else:
        another_page = False
        datetime.now()
print(opened)
There are a few issues with your code. For example, what does item represent? Your code can be modified as follows to iterate over the results and get the number of open issues.
import requests

username = '/'
password = '/'
another_page = True
opened = 0
closed = 0
api_issue = "https://api.github.com/repos/grpc/grpc/issues?page=1&per_page=5000"
while another_page:
    r = requests.get(api_issue, auth=(username, password))
    json_response = r.json()
    #results.append(json_response)
    for item in json_response:
        if item['state'] == 'open':
            opened += 1
        else:
            closed += 1
    if 'next' in r.links:
        api_issue = r.links['next']['url']
    else:
        another_page = False
print(opened)
If you want issues that were created in the last 14 days, you could make the API request using a URL of the following form (note that the backticks here are shell-style date substitution, so in Python you need to build the date string yourself, as sketched below):
api_oldest = "https://api.github.com/repos/grpc/grpc/issues?q=sort=created:>`date -d '14 days ago'`&order=asc"
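A minimal sketch of building that date in Python with datetime instead of relying on shell substitution (the query string itself is simply copied from the URL above):

from datetime import datetime, timedelta

since = (datetime.utcnow() - timedelta(days=14)).strftime("%Y-%m-%d")
api_oldest = ("https://api.github.com/repos/grpc/grpc/issues"
              "?q=sort=created:>" + since + "&order=asc")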

Selective download and extraction of data (CAB)

So I have a specific need to download and extract a CAB file, but each CAB file is huge (> 200 MB), and I want to selectively download files from the CAB since the rest of the data is useless to me.
What I have done so far:
Request 1% of the file from the server, then get the headers and parse them.
Get the file list and their offsets according to this CAB documentation link.
Send a GET request to the server with the Range header set to the file offset and the offset + size.
I am able to get the response, but it is "unreadable" because it is compressed (LZX:21, according to 7-Zip).
I am unable to decompress it using zlib; it throws "invalid header".
I also could not quite understand or trace the CFFOLDER or CFDATA structures as shown in the example, because the example is uncompressed.
import os
import struct
import requests

totalByteArray = b''
eofiles = 0

def GetCabMetaData(stream):
    global eofiles
    cabMetaData = {}
    try:
        cabMetaData["CabFormat"] = stream[0:4].decode('ANSI')
        cabMetaData["CabSize"] = struct.unpack("<L", stream[8:12])[0]
        cabMetaData["FilesOffset"] = struct.unpack("<L", stream[16:20])[0]
        cabMetaData["NoOfFolders"] = struct.unpack("<H", stream[26:28])[0]
        cabMetaData["NoOfFiles"] = struct.unpack("<H", stream[28:30])[0]
        # skip 30,32,34,35
        cabMetaData["Files"] = {}
        cabMetaData["Folders"] = {}
        baseOffset = cabMetaData["FilesOffset"]
        internalOffset = 0
        for i in range(0, cabMetaData["NoOfFiles"]):
            fileDetails = {}
            fileDetails["Size"] = struct.unpack("<L", stream[baseOffset+internalOffset:][:4])[0]
            fileDetails["UnpackedStartOffset"] = struct.unpack("<L", stream[baseOffset+internalOffset+4:][:4])[0]
            fileDetails["FolderIndex"] = struct.unpack("<H", stream[baseOffset+internalOffset+8:][:2])[0]
            fileDetails["Date"] = struct.unpack("<H", stream[baseOffset+internalOffset+10:][:2])[0]
            fileDetails["Time"] = struct.unpack("<H", stream[baseOffset+internalOffset+12:][:2])[0]
            fileDetails["Attrib"] = struct.unpack("<H", stream[baseOffset+internalOffset+14:][:2])[0]
            fileName = ''
            for j in range(0, len(stream)):
                if chr(stream[baseOffset+internalOffset+16+j]) != '\x00':
                    fileName += chr(stream[baseOffset+internalOffset+16+j])
                else:
                    break
            internalOffset += 16 + j + 1
            cabMetaData["Files"][fileName] = fileDetails.copy()
        eofiles = baseOffset + internalOffset
    except Exception as e:
        print(e)
    print(cabMetaData["CabSize"])
    return cabMetaData

def GetFileSize(url):
    resp = requests.head(url)
    return int(resp.headers["Content-Length"])

def GetCABHeader(url):
    global totalByteArray
    size = GetFileSize(url)
    newSize = "bytes=0-" + str(int(0.01*size))
    totalByteArray = b''
    cabHeader = requests.get(url, headers={"Range": newSize}, stream=True)
    for chunk in cabHeader.iter_content(chunk_size=1024):
        totalByteArray += chunk

def DownloadInfFile(baseUrl, InfFileData, InfFileName):
    global totalByteArray, eofiles
    if not os.path.exists("infs"):
        os.mkdir("infs")
    baseCabName = baseUrl[baseUrl.rfind("/"):]
    baseCabName = baseCabName.replace(".", "_")
    if not os.path.exists("infs\\" + baseCabName):
        os.mkdir("infs\\" + baseCabName)
    fileBytes = b''
    newRange = "bytes=" + str(eofiles + InfFileData["UnpackedStartOffset"]) + "-" + str(eofiles + InfFileData["UnpackedStartOffset"] + InfFileData["Size"])
    data = requests.get(baseUrl, headers={"Range": newRange}, stream=True)
    with open("infs\\" + baseCabName + "\\" + InfFileName, "wb") as f:
        for chunk in data.iter_content(chunk_size=1024):
            fileBytes += chunk
        f.write(fileBytes)
        f.flush()
    print("Saved File " + InfFileName)

def main(url):
    GetCABHeader(url)
    cabMetaData = GetCabMetaData(totalByteArray)
    for fileName, data in cabMetaData["Files"].items():
        if fileName.endswith(".txt"):
            DownloadInfFile(url, data, fileName)

main("http://path-to-some-cabinet.cab")
All the file details are correct; I have verified them.
Any guidance would be appreciated. Am I doing it wrong? Is there another way, perhaps?
P.S.: I have already looked into this post.
First, the data in the CAB is raw deflate, not zlib-wrapped deflate, so you need to ask zlib's inflate() to decode raw deflate by passing a negative windowBits value on initialization.
Second, the CAB format does not use standard deflate exactly: the 32K sliding-window dictionary carries over from one block to the next. You need to use inflateSetDictionary() to set the dictionary at the start of each block using the last 32K decompressed from the previous block.
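In Python's zlib module the rough equivalent of those two points is a raw-deflate decompress object created with negative wbits, reseeded for each block via the zdict parameter. A minimal sketch, assuming data_blocks is a list holding the compressed payload of each consecutive block (a hypothetical input, not taken from the code above):

import zlib

def decompress_blocks(data_blocks):
    output = b''
    dictionary = b''
    for block in data_blocks:
        # wbits=-15 selects raw deflate (no zlib header); zdict pre-loads the
        # 32K sliding window with the tail of the previously decompressed data.
        d = zlib.decompressobj(-15, zdict=dictionary)
        output += d.decompress(block)
        dictionary = output[-32768:]   # carry the last 32K into the next block
    return output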

Lambda/boto3/python loop

This code acts as an early-warning system for ADFS failures and works fine when run locally. The problem is that when I run it in Lambda, it loops non-stop.
In short:
lambda_handler() runs pagecheck().
pagecheck() produces the info needed, then passes two lists (msgdet_list, error_list) and an int (error_count) to notification().
notification() collates and prints the output. The output is two key variables (notificationheader and notificationbody).
I've commented out the SNS piece, which would usually email the info, and am using print() instead to send the info to the CloudWatch logs until I can get the loop sorted.
If I run this locally, it produces a clean single output. In Lambda, the function will loop until it times out. It's almost like every time the lists are updated, they're passed to the notification() function and it runs again. I can limit the function time, but I would rather fix the code!
Cheers,
tac
# This python/boto3/lambda script sends a request to an Office 365 landing page, parses the returned details to confirm a
# successful redirect to the organisation's ADFS homepage, authenticates that the homepage is correct, raises any errors,
# and sends a consolidated report to an AWS SNS topic.
# Run once to produce pageserver and htmlchar values for the global variables.

# Import required modules
import boto3
import urllib.request
from urllib.request import Request, urlopen
from datetime import datetime
import time
import re
import sys

# Global variables to be set
url = "https://outlook.com/CONTOSSO.com"
adfslink = "https://sts.CONTOSSO.com/adfs/ls/?client-request-id="

# Input after first run
pageserver = "Microsoft-HTTPAPI/2.0 Microsoft-HTTPAPI/2.0"
htmlchar = 18600

# Input AWS SNS ARN
snsarn = 'arn:aws:sns:ap-southeast-2:XXXXXXXXXXXXX:Daily_Check_Notifications_CONTOSSO'
sns = boto3.client('sns')
def pagecheck():
    # Present the request to the webpage as if coming from a user in a browser
    user_agent = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)'
    values = {'name': 'user'}
    headers = {'User-Agent': user_agent}
    data = urllib.parse.urlencode(values)
    data = data.encode('ascii')

    # "Null" the Message Detail and Error lists
    msgdet_list = []
    error_list = []

    request = Request(url)
    req = urllib.request.Request(url, data, headers)
    response = urlopen(request)
    with urllib.request.urlopen(request) as response:

        # Get the URL. This gets the real URL.
        acturl = response.geturl()
        msgdet_list.append("\nThe Actual URL is:")
        msgdet_list.append(str(acturl))
        if adfslink not in acturl:
            error_list.append(str("Redirect Fail"))

        # Get the HTTP response code
        httpcode = response.code
        msgdet_list.append("\nThe HTTP code is: ")
        msgdet_list.append(str(httpcode))
        if httpcode//200 != 1:
            error_list.append(str("No HTTP 2XX Code"))

        # Get the headers as a dictionary-like object
        headers = response.info()
        msgdet_list.append("\nThe Headers are:")
        msgdet_list.append(str(headers))
        if response.info() == "":
            error_list.append(str("Header Error"))

        # Get the date of the request and compare to UTC (DD MMM YYYY HH MM)
        date = response.info()['date']
        msgdet_list.append("The Date is: ")
        msgdet_list.append(str(date))
        returndate = str(date.split( )[1:5])
        returndate = re.sub(r'[^\w\s]', '', returndate)
        returndate = returndate[:-2]
        currentdate = datetime.utcnow()
        currentdate = currentdate.strftime("%d %b %Y %H%M")
        if returndate != currentdate:
            date_error = ("Date Error. Returned Date: ", returndate, "Expected Date: ", currentdate, "Times in UTC (DD MMM YYYY HH MM)")
            date_error = str(date_error)
            date_error = re.sub(r'[^\w\s]', '', date_error)
            error_list.append(str(date_error))

        # Get the server
        headerserver = response.info()['server']
        msgdet_list.append("\nThe Server is: ")
        msgdet_list.append(str(headerserver))
        if pageserver not in headerserver:
            error_list.append(str("Server Error"))

        # Get all HTML data and confirm no major change to content size by character length (global var: htmlchar).
        html = response.read()
        htmllength = len(html)
        msgdet_list.append("\nHTML Length is: ")
        msgdet_list.append(str(htmllength))
        msgdet_list.append("\nThe Full HTML is: ")
        msgdet_list.append(str(html))
        msgdet_list.append("\n")
        if htmllength // htmlchar != 1:
            error_list.append(str("Page HTML Error - incorrect # of characters"))
        if adfslink not in str(acturl):
            error_list.append(str("ADFS Link Error"))

    error_list.append("\n")
    error_count = len(error_list)
    if error_count == 1:
        error_list.insert(0, 'No Errors Found.')
    elif error_count == 2:
        error_list.insert(0, 'Error Found:')
    else:
        error_list.insert(0, 'Multiple Errors Found:')

    # Pass completed results and data to the notification() function
    notification(msgdet_list, error_list, error_count)
# Use AWS SNS to create a notification email with the additional data generated
def notification(msgdet_list, error_list, errors):
    datacheck = str("\n".join(msgdet_list))
    errorcheck = str("\n".join(error_list))
    notificationbody = str(errorcheck + datacheck)
    if errors > 1:
        result = 'FAILED!'
    else:
        result = 'passed.'
    notificationheader = ('The daily ADFS check has been marked as ' + result + ' ' + str(errors) + ' ' + str(error_list))
    if result != 'passed.':
#        message = sns.publish(
#            TopicArn = snsarn,
#            Subject = notificationheader,
#            Message = notificationbody
#        )
        # Output result to CloudWatch logstream
        print('Response: ' + notificationheader)
    else:
        print('passed')
    sys.exit()
# Trigger the Lambda handler
def lambda_handler(event, context):
    aws_account_ids = [context.invoked_function_arn.split(":")[4]]
    pagecheck()
    return "Successful"
    sys.exit()
Your CloudWatch logs contain the following error message:
Process exited before completing request
This is caused by invoking sys.exit() in your code. Locally, your Python interpreter simply terminates when it encounters such a sys.exit().
AWS Lambda, on the other hand, expects a Python function to just return and treats sys.exit() as an error. As your function was probably invoked asynchronously, AWS Lambda retries executing it twice.
To solve your problem, you can replace the occurrences of sys.exit() with return, or better still, just remove the sys.exit() calls, as there are already implicit returns in the places where you use sys.exit().
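A minimal sketch of what the handler looks like with the sys.exit() calls dropped (pagecheck() is the function defined in the question above):

def lambda_handler(event, context):
    # pagecheck() runs the checks and prints/publishes the report; with the
    # sys.exit() calls removed it simply returns when finished.
    pagecheck()
    return "Successful"   # a normal return, which is what Lambda expects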

How to Infinitely Update Labels with while loop tkinter

Hi, I have some simple code here: https://pastebin.com/97uuqKQD
I would simply like to add something like the following to the button function, so that while the root window is open, the datetime and exchange rates are constantly re-fetched from the xe website and displayed.
amount = '1'

def continuousUpdate():
    while amount == '0':
        results()

def results():
    # Get xe data and put it into the labels, etc. here
    pass

btnConvert = tk.Button(root, text="Get Exchange Rates", command=continuousUpdate).place(x=5, y=102)
Once I input the two exchange rates and they are displayed on their respective labels, I would like the program to continuously grab the data from xe over and over again, like this code here, which runs in IPython with no problems:
import requests
from bs4 import BeautifulSoup
from datetime import datetime

amount = '1'
while amount != '0':
    t = datetime.utcnow()
    url1 = "http://www.xe.com/currencyconverter/convert/" + "?Amount=" + amount + "&From=" + cur1 + "&To=" + cur2
    url2 = "http://www.xe.com/currencyconverter/convert/" + "?Amount=" + amount + "&From=" + cur2 + "&To=" + cur1
    #url = "http://www.xe.com/currencycharts/" + "?from=" + cur1 + "&to=" + cur2
    html_code1 = requests.get(url1).text
    html_code2 = requests.get(url2).text
    soup1 = BeautifulSoup(html_code1, 'html.parser')
    soup2 = BeautifulSoup(html_code2, 'html.parser')
    i = i + 1
    rate1 = soup1.find('span', {'class', 'uccResultAmount'})
    rate2 = soup2.find('span', {'class', 'uccResultAmount'})
    print('#', i, t, '\n', cur1, '-', cur2, rate1.contents[0], cur2, '-', cur1, rate2.contents[0], '\n')
I thought I would just be able to put the entire results function into a while-loop function and then simply call that function, but no luck. Any help would be appreciated.
Put this at the end of your results function, without the while loop:
after_id = root.after(milliseconds, results)
This way it just keeps re-running itself after the delay you specified. And this code will cancel it:
root.after_cancel(after_id)
Answering your other question in the comments:
To make a cancel button, make sure after_id is global. Also, if the time you specify is very low (a very fast recall), it might restart again before you can cancel it. So, to be safe, it's better to make a global boolean and only call .after inside an if boolean == True. You can set that boolean to False whenever you hit the cancel button, and to True whenever you hit the start button. Here's how you can do it:
# button_call defaults to True, so if you click on the button
# this will be True (no need to pass the variable through). You can use this
# to set your restart boolean to True.
def func(button_call=True):
    global restart
    global after_id
    if button_call:
        restart = True
    if restart:
        after_id = root.after(ms, lambda: func(button_call=False))
        # This way you know that func was called by after and not the button
Now you can put this in your cancel button function:
def cancel():
    global restart
    global after_id
    root.after_cancel(after_id)
    restart = False
Let me know if this works (haven't tested it myself).
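For completeness, a minimal, self-contained sketch of the pattern (the label text and the 10-second interval are illustrative placeholders, not the asker's actual xe-scraping code):

import tkinter as tk
from datetime import datetime

root = tk.Tk()
label = tk.Label(root, text="waiting...")
label.pack()
after_id = None

def results():
    global after_id
    # Fetch/compute the new values here; this example just shows the current UTC time.
    label.config(text=datetime.utcnow().strftime("%Y-%m-%d %H:%M:%S"))
    after_id = root.after(10000, results)  # re-schedule itself every 10 seconds

def cancel():
    if after_id is not None:
        root.after_cancel(after_id)

tk.Button(root, text="Get Exchange Rates", command=results).pack()
tk.Button(root, text="Stop", command=cancel).pack()
root.mainloop()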
