Python Requests Refresh - python-3.x

I'm trying to use Python's requests library to log in to a website. The code is pretty simple, and you can get the gist of requests just by reading its website. However, I want to check via the URL whether I've logged in successfully. The problem is that when I send the POST request and store the response in the variable p, print(p.url) shows the same URL whether or not the HTML has changed. Is there any way to refresh the page or update the URL to whatever it's currently set to?
(I can add a line for checking the URL against itself later, but for now I just want to get the correct URL.)
#!/usr/bin/env python3
import requests
payload = {'login': 'USERNAME',
           'password': 'PASSWORD'}
with requests.Session() as s:
    p = s.post('WEBSITE', data=payload)
    # print(p.text)
    print(p.url)

Using python-requests may not be as complex as you think. It automatically follows the redirect after your POST (or session.get()).
Here, session.post() returns a Response object:
r = s.post('website', data=payload)
so r.url is the current URL you are looking for, i.e. the URL after any redirects have been followed.
If you still want to refresh the current page, just use:
s.get(r.url)
To verify whether you have logged in successfully, one approach is to log in with your browser first and note what the resulting page looks like.
Then, based on the title or content of the page returned (i.e. the content in r.text), you can judge whether you have made it.
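A minimal sketch of that check, assuming the site shows some text only to logged-in users. The function name check_login and the success_marker argument (e.g. "Log out") are hypothetical; choose a marker by looking at the logged-in page in your browser.

```python
def check_login(session, login_url, payload, success_marker):
    """POST the login form and report where the session ended up.

    `session` is expected to be a requests.Session. `success_marker` is a
    hypothetical string (e.g. "Log out") that should only appear on pages
    served to logged-in users.
    """
    # requests follows redirects after a POST by default,
    # so r.url is the final URL, not necessarily the one posted to
    r = session.post(login_url, data=payload)
    # r.history holds the intermediate redirect responses, oldest first
    redirect_chain = [hop.url for hop in r.history]
    logged_in = success_marker in r.text
    return r.url, redirect_chain, logged_in
```

It would be called as check_login(s, 'WEBSITE', payload, 'Log out') inside the question's with requests.Session() as s: block; an empty redirect chain together with a missing marker usually means the form was rejected and the login page was served again.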
BTW, python-requests is a great library, enjoy it.

Related

Python requests module GET method: handling pagination token in params containing %

I am trying to handle an API response with pagination. The first page provides a pagination token to reach the next one, but when I try to feed this back into the params parameter of the requests.get method it seems to slightly encode the token in the wrong way.
My attempt to retrieve the next page (using the response output of the first requests.get method):
# Initial request
response = requests.get(url=url, headers=headers, params=params)
params.update({"paginationToken": response.json()["paginationToken"]})
# Next page
response = requests.get(url=url, headers=headers, params=params)
This fails with status 500: Internal Server Error and message Padding is invalid and cannot be removed.
An example pagination token:
gyuqfh%2bqyNrV9SI1%2bXulE6MXxJgb1VmOu68eH4YZ6dWUgRItb7yJPnO9bcEXdwg6gnYStBuiFhuMxILSB2gpZCLb2UjRE0pp9RkDdIP226M%3d
The url attribute of response seems to show a slightly different token if you look carefully, especially around the '%' signs:
https://www.wikiart.org/en/Api/2/DictionariesByGroup?group=1&paginationToken=gyuqfh%252bqyNrV9SI1%252bXulE6MXxJgb1VmOu68eH4YZ6dWUgRItb7yJPnO9bcEXdwg6gnYStBuiFhuMxILSB2gpZCLb2UjRE0pp9RkDdIP226M%253d
For example, the pagination token and url end differently: 226M%3d and 226M%253d. When I manually copy the first part of the url and add in the correct pagination token it does retrieve the information in a browser.
Am I missing some kind of encoding I should apply to the requests.get parameters before feeding them back into a new request?
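You are right that it is some form of encoding: percent-encoding, to be precise, which is commonly used in URLs. The token the API returns is already percent-encoded (%2b is '+' and %3d is '='), and requests percent-encodes the params again, so each '%' becomes '%25'. It is easy to decode: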
from urllib.parse import unquote
pagination_token="gyuqfh%252bqyNrV9SI1%252bXulE6MXxJgb1VmOu68eH4YZ6dWUgRItb7yJPnO9bcEXdwg6gnYStBuiFhuMxILSB2gpZCLb2UjRE0pp9RkDdIP226M%253d"
pagination_token = unquote(pagination_token)
print(pagination_token)
Outputs:
gyuqfh%2bqyNrV9SI1%2bXulE6MXxJgb1VmOu68eH4YZ6dWUgRItb7yJPnO9bcEXdwg6gnYStBuiFhuMxILSB2gpZCLb2UjRE0pp9RkDdIP226M%3d
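Applied to the request itself, the fix is to decode the token once before putting it into params, so that requests' own encoding produces a single layer. A small sketch, assuming (as the example suggests) that the API always returns the token percent-encoded; with_token is a hypothetical helper name. It uses urllib.parse.urlencode to show the query string, since that is what requests uses to serialise a params dict.

```python
from urllib.parse import unquote, urlencode

# Example token from the question, as returned by the API (already percent-encoded)
raw_token = "gyuqfh%2bqyNrV9SI1%2bXulE6MXxJgb1VmOu68eH4YZ6dWUgRItb7yJPnO9bcEXdwg6gnYStBuiFhuMxILSB2gpZCLb2UjRE0pp9RkDdIP226M%3d"

def with_token(params, token):
    """Return a copy of params carrying the pagination token decoded once."""
    new_params = dict(params)
    new_params["paginationToken"] = unquote(token)  # %2b -> '+', %3d -> '='
    return new_params

# Feeding the raw token back in double-encodes it (%2b becomes %252b)
double_encoded = urlencode({"group": 1, "paginationToken": raw_token})
# Decoding first gives one layer of encoding (%2B, %3D; hex case does not matter)
single_encoded = urlencode(with_token({"group": 1}, raw_token))
print(double_encoded)
print(single_encoded)
```

In the question's loop this becomes params.update({"paginationToken": unquote(response.json()["paginationToken"])}).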
But I expect that is only half your problem. Use a requests session object (https://requests.readthedocs.io/en/master/user/advanced/#session-objects) to make the requests, as there is most likely a cookie that has to be sent along with the pagination token. I cannot tell for sure, as the website is currently down.
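Combining both suggestions, a sketch of a pagination loop that reuses one session and decodes the token once per page. The stopping rule (stop when paginationToken is missing or empty) and the max_pages guard are assumptions about this API, not documented behaviour; iter_pages is a hypothetical name.

```python
from urllib.parse import unquote

def iter_pages(session, url, params, headers=None, max_pages=100):
    """Yield the JSON of each page, following paginationToken until it is absent.

    `session` is expected to be a requests.Session, so cookies set by earlier
    responses are sent with later requests.
    """
    params = dict(params)  # don't mutate the caller's dict
    for _ in range(max_pages):
        response = session.get(url, headers=headers, params=params)
        response.raise_for_status()
        page = response.json()
        yield page
        token = page.get("paginationToken")
        if not token:
            break
        # Decode once so requests' own encoding doesn't turn '%' into '%25'
        params["paginationToken"] = unquote(token)
```

Usage would be along the lines of: with requests.Session() as s: for page in iter_pages(s, url, params, headers=headers): ...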

How to scrape image/file from web page in Python?

I'm trying to use Python 3.7.4 to back up pictures from a blog site, e.g.
http://s2.sinaimg.cn/mw690/001H6t4Fzy7zgC0WLXb01&690
If I enter the above address in the Firefox address bar, the file is shown correctly.
If I use the following code to download the picture, the server always redirects to a default picture:
from requests import get  # just to try different methods
from urllib.request import urlopen
from urllib.parse import urlsplit, urlunsplit, quote
# hard-coded address is randomly selected for debug purpose.
origPict = 'http://s2.sinaimg.cn/mw690/001H6t4Fzy7zgC0WLXb01&690'
p = urlsplit(origPict)
newP = quote(p.path)
origPict = urlunsplit([p.scheme, p.netloc, newP, p.query, p.fragment])
try:
    #url_file = urlopen(origPict)
    #u = url_file.geturl()
    url_file = get(origPict)
    u = url_file.url
    if u != origPict:
        raise Exception('Failed to get picture ' + origPict)
    ...
Any clue why requests.get or urllib.request.urlopen doesn't like '&' in the URL?
Update: Thanks to Artur's comments, I realize the question is not about the request itself but about the site's protection mechanism: JavaScript, cookies, or something else in the web page feeds information back to the server so it can judge whether the request comes from a scraper. So the question now becomes how to scrape an image from a web page, which is more complex than simply downloading an image from a URL.
It's not about the & symbol but about redirection. Try passing allow_redirects=False to get(); it should be okay.
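To see whether (and where) the server redirects, a minimal sketch that asks requests not to follow redirects and inspects the raw response instead. probe_redirect is a hypothetical helper, and any headers you try with it (such as a Referer) are guesses about what the protection mechanism checks, not something confirmed for this site.

```python
def probe_redirect(session, url, headers=None):
    """Fetch url without following redirects and report what the server did.

    `session` is expected to be a requests.Session (requests.get behaves the
    same way). Returns (status_code, location); location is None unless the
    server answered with a redirect.
    """
    r = session.get(url, headers=headers, allow_redirects=False)
    # Response.is_redirect is True for a redirect status that carries a Location header
    location = r.headers.get("Location") if r.is_redirect else None
    return r.status_code, location
```

For example, compare probe_redirect(requests.Session(), origPict) with a call that also passes headers={'Referer': <URL of the blog page that embeds the picture>}; if only the second returns 200 with no location, the site is probably checking where the request came from.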

Sending data in GET request Python

I know that using GET for this is not advisable, but I am not in control of how this server works and have very little experience with requests.
I'm looking to send a dictionary via a GET request and was told that the server has been set up to accept this, but I'm not sure how that works. I have tried using:
import requests
r = requests.get('www.url.com', data = 'foo:bar')
but this leaves the webpage unaltered, any ideas?
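To send a request body with a GET request, one option is to send a POST and override the method with the X-HTTP-Method-Override header (this only works if the server honours that header), e.g.:
import requests
request_header = {
    'X-HTTP-Method-Override': 'GET'
}
response = requests.post(request_uri, request_body, headers=request_header)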
Use requests like this to pass the data in the data field of the request:
import json
requests.get(url, headers=head, data=json.dumps({"user_id": 436186}))
It seems that you are using the wrong parameters for the GET request; see the documentation for requests.get().
You should use params instead of data.
You are also missing the http:// in the URL.
The following should work:
import requests
r = requests.get('http://www.url.com', params = {'foo': 'bar'})
print(r.content)
The actual request URL can be inspected via r.request.url; it should look like this:
http://www.url.com/?foo=bar
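The difference between params and data can be checked without sending anything, by building the requests with requests.Request(...).prepare() and inspecting the result. A small sketch; the URL is the placeholder from the question.

```python
import requests

# Build (but do not send) two GET requests that differ only in where the dict goes
with_params = requests.Request("GET", "http://www.url.com", params={"foo": "bar"}).prepare()
with_data = requests.Request("GET", "http://www.url.com", data={"foo": "bar"}).prepare()

# params ends up in the query string and the request has no body
print(with_params.url, with_params.body)
# data leaves the URL alone and becomes a form-encoded body, which many servers ignore on GET
print(with_data.url, with_data.body)
```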
If you're not sure about how the server works, you should send a POST request, like so:
import requests
data = {'name': 'value'}
requests.post('http://www.example.com', data=data)
If you absolutely need to send data with a GET request, make sure the data is in a dictionary and pass it with the params keyword instead.
You may find the requests documentation helpful.

Response URL different from initial browser URL

I'm getting a different URL from the one displayed when I try it in a browser.
Facebook's docs say that a Login Request should have a format like this, so using requests and urllib.parse I tried getting the response URL:
import requests, facebook, logging
# REQUIRED AUTHENTICATION PARAMS
APP_ID = '1976346389294466'
APP_SECRET = '*************************'
REDIRECT_URI = 'https://www.facebook.com/connect/login_success.html/'
logging.basicConfig(level=logging.DEBUG)
perms = ['manage_pages', 'publish_pages']
fb_login_url = facebook.auth_url(app_id=APP_ID, canvas_url=REDIRECT_URI, perms=perms)
logging.debug("-----LOGIN URL:" + fb_login_url)
response = requests.get(fb_login_url, params={'response_type': 'token'}, allow_redirects=True)
try:
    response.raise_for_status()
except Exception as exc:  # don't shadow the built-in exec
    print("There was a problem: %s" % exc)
response = requests.get(response.url)
logging.debug("-----Response URL: " + response.url)
I'm expecting a return URL in the format of
https://www.facebook.com/connect/login_success.html#access_token=ACCESS_TOKEN...
However, I only get the correct response when I use a browser; in my program the response returns a URL in an entirely different format:
https://www.facebook.com/login.php?skip_api_login=1&api_key=xxxxxxxxx&signed_next=1&next=https%3A%2F%2Fwww.facebook.com%2Fv2.11%2Fdialog%2Foauth%3Fredirect_uri%3Dhttps%253A%252F%252Fwww.facebook.com%252Fconnect%252Flogin_success.html%252F%26scope%3Dmanage_pages%252Cpublish_pages%26response_type%3Dtoken%26client_id%xxxxxxxxxxx%26ret%3Dlogin%26logger_id%xxxxxxxxxxxxxxx&cancel_url=https%3A%2F%2Fwww.facebook.com%2Fconnect%2Flogin_success.html%2F%3Ferror%3Daccess_denied%26error_code%3D200%26error_description%3DPermissions%2Berror%26error_reason%3Duser_denied%23_%3D_&display=page&locale=en_US&logger_id=xxx-xxxxx-xxxxx-xxxxxxxx
When I GET the last redirect URL from response.history, the response returns a URL pointing to itself, so I'm not sure how to capture the initial value of the URL the way the browser does.
The thing is, I'm not looking for anything else from the response besides the URL itself.
Additional notes:
- In the browser, after getting the response URL, I think JavaScript also changes the URL to blank after a brief moment for security reasons.
- When I enter the wrongly formatted URL in the browser, it redirects to the right value. Is something handling the response differently when I use the browser? If so, how do I grab the right URL?
Simply put:
When I enter fb_login_url in the browser, I get
https://www.facebook.com/connect/login_success.html#access_token=ACCESS_TOKEN...
which is what I want, but when I do it in the app with requests,
either with requests.get(fb_login_url).url
or (because of a 303) something like
for r in response.history:
    requests.get(r.url).url
I get the wrong URL, which is
https://www.facebook.com/login.php?skip_api_login=1&api_key=xxxxxxxxx&signed_next=1&n....
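One detail worth knowing when comparing the two URLs: the access_token in the expected URL sits in the fragment (after #), which browsers keep on the client side and never send to the server. Once you do have a URL of that form (for example copied from the browser's address bar), urllib.parse can pull the token out. A minimal sketch; token_from_redirect and the sample token value ABC123 are hypothetical, and this does not perform the login step itself.

```python
from urllib.parse import urlsplit, parse_qs

def token_from_redirect(url):
    """Return the access_token from a URL fragment like '#access_token=...', or None."""
    # The token is in the fragment, not the query string, so parse .fragment
    fragment = urlsplit(url).fragment
    values = parse_qs(fragment).get("access_token")
    return values[0] if values else None

url = "https://www.facebook.com/connect/login_success.html#access_token=ABC123&expires_in=3600"
print(token_from_redirect(url))
```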

How to make multiple API calls from multiple pages in single URL

So the title is a little confusing, I guess.
I have a script I've been writing that displays some random data and other non-essentials when I open my shell. I'm using grequests to make my API calls since I'm using more than one URL. For my weather data, I use WeatherUnderground's API since it offers active alerts. The alerts and conditions data are on separate pages. What I can't figure out is how to insert the appropriate page name into the URL when grequests is making the requests. Here is the code that I have:
URLS = ['http://api.wunderground.com/api/'+api_id+'/conditions/q/autoip.json',
        'http://www.ourmanna.com/verses/api/get/?format=json',
        'http://quotes.rest/qod.json',
        'http://httpbin.org/ip']
requests = (grequests.get(url) for url in URLS)
responses = grequests.map(requests)
data = [response.json() for response in responses]
#json parsing from here
In the URL 'http://api.wunderground.com/api/'+api_id+'/conditions/q/autoip.json' I need to make an API request to conditions and alerts to retrieve the data I need. How do I do this without rewriting a fourth URLS string?
I've tried
pages = ['conditions', 'alerts']
URL = ['http://api.wunderground.com/api/'+api_id+([p for p in pages])/q/autoip.json']
but, as I'm sure some of you more seasoned programmers know, it threw an exception. So how can I iterate through these pages, or will I have to write out both complete URLs?
Thanks!
OK, I was actually able to figure out how to call each individual page within the grequests object by using a simple for loop. Here is the code that I used to produce the expected results:
import grequests
pages = ['conditions', 'alerts']
api_id = 'myapikeyhere'
for p in pages:
    URLS = ['http://api.wunderground.com/api/'+api_id+'/'+p+'/q/autoip.json',
            'http://www.ourmanna.com/verses/api/get/?format=json',
            'http://quotes.rest/qod.json',
            'http://httpbin.org/ip']
    #create grequest object and retrieve results
    requests = (grequests.get(url) for url in URLS)
    responses = grequests.map(requests)
    data = [response.json() for response in responses]
    #json parsing from here
I'm still not sure why I couldn't figure this out before.
Documentation for the grequests library here
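The loop above re-requests the three other URLs once per page. A sketch that instead builds a single URL list up front, so grequests can fetch everything in one batch; build_urls is a hypothetical helper, and the URLs are the ones from the question.

```python
def build_urls(api_id, pages):
    """Return one WeatherUnderground URL per page, followed by the other APIs."""
    weather = [f"http://api.wunderground.com/api/{api_id}/{page}/q/autoip.json"
               for page in pages]
    others = ["http://www.ourmanna.com/verses/api/get/?format=json",
              "http://quotes.rest/qod.json",
              "http://httpbin.org/ip"]
    return weather + others

urls = build_urls("myapikeyhere", ["conditions", "alerts"])
for u in urls:
    print(u)
```

The list then goes through grequests as before, e.g. responses = grequests.map(grequests.get(u) for u in urls). grequests.map returns responses in request order (with None for failed requests), so the first len(pages) entries are the weather pages.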
