Response URL different from initial browser URL - python-3.x

I'm getting a different URL from the one displayed when I try it in a browser.
Facebook's docs say that a Login Request should have a format like this, so, using requests and urllib.parse, I tried getting the response URL:
import logging

import facebook
import requests

# REQUIRED AUTHENTICATION PARAMS
APP_ID = '1976346389294466'
APP_SECRET = '*************************'
REDIRECT_URI = 'https://www.facebook.com/connect/login_success.html/'

logging.basicConfig(level=logging.DEBUG)
perms = ['manage_pages', 'publish_pages']
fb_login_url = facebook.auth_url(app_id=APP_ID, canvas_url=REDIRECT_URI, perms=perms)
logging.debug("-----LOGIN URL:" + fb_login_url)

response = requests.get(fb_login_url, params={'response_type': 'token'}, allow_redirects=True)
try:
    response.raise_for_status()
except requests.HTTPError as exc:  # avoid shadowing the built-in exec()
    print("There was a problem: %s" % exc)

response = requests.get(response.url)
logging.debug("-----Response URL: " + response.url)
I'm expecting the return URL to have the format:
https://www.facebook.com/connect/login_success.html#access_token=ACCESS_TOKEN...
However, I only get the correct response in a browser; in my program, the response returns a URL in an entirely different format:
https://www.facebook.com/login.php?skip_api_login=1&api_key=xxxxxxxxx&signed_next=1&next=https%3A%2F%2Fwww.facebook.com%2Fv2.11%2Fdialog%2Foauth%3Fredirect_uri%3Dhttps%253A%252F%252Fwww.facebook.com%252Fconnect%252Flogin_success.html%252F%26scope%3Dmanage_pages%252Cpublish_pages%26response_type%3Dtoken%26client_id%xxxxxxxxxxx%26ret%3Dlogin%26logger_id%xxxxxxxxxxxxxxx&cancel_url=https%3A%2F%2Fwww.facebook.com%2Fconnect%2Flogin_success.html%2F%3Ferror%3Daccess_denied%26error_code%3D200%26error_description%3DPermissions%2Berror%26error_reason%3Duser_denied%23_%3D_&display=page&locale=en_US&logger_id=xxx-xxxxx-xxxxx-xxxxxxxx
When I GET the last redirect URL from response.history, the response returns a URL pointing to itself, so I'm not sure how to capture the URL the way the browser does.
The thing is, I'm not looking for anything from the response besides the URL itself.
Additional notes:
- In the browser, after getting the response URL, I think JavaScript also changes the URL to blank after a brief moment, for security reasons.
- When I enter the wrongly formatted URL in the browser, it redirects to the right one. Is something handling the response differently when I use the browser? If so, how do I grab the right URL?
Simply put:
When I enter fb_login_url in the browser, I get
https://www.facebook.com/connect/login_success.html#access_token=ACCESS_TOKEN...
which is what I want, but when I do it in the app with requests, either with
requests.get(fb_login_url).url
or (because of a 303) something like
for r in response.history:
    requests.get(r.url).url
I get the wrong URL, which is
https://www.facebook.com/login.php?skip_api_login=1&api_key=xxxxxxxxx&signed_next=1&n....
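A small stdlib sketch for inspecting the two URLs above. The login.php URL carries the original OAuth dialog URL in its `next` parameter, so it is Facebook's login page wrapped around the dialog. One plausible reason (an assumption; the thread does not confirm it) is that the browser already has a logged-in Facebook session through its cookies, while a fresh `requests` call has none. Also note that the token in the expected URL sits in the fragment after `#`, which is never sent to the server, so it can only be read from the URL string itself. The helper names and the shortened URLs (including the token `ABC123`) are made up for illustration.

```python
from urllib.parse import parse_qs, urlsplit


def unwrap_login_redirect(url):
    """Hypothetical helper: return the login page's `next` parameter, or None."""
    values = parse_qs(urlsplit(url).query).get("next")
    return values[0] if values else None


def token_from_fragment(url):
    """Hypothetical helper: read access_token from the part after '#', or None."""
    values = parse_qs(urlsplit(url).fragment).get("access_token")
    return values[0] if values else None


# Shortened placeholder versions of the URLs from the question.
login_url = ("https://www.facebook.com/login.php?skip_api_login=1"
             "&next=https%3A%2F%2Fwww.facebook.com%2Fv2.11%2Fdialog%2Foauth"
             "%3Fresponse_type%3Dtoken&display=page")
success_url = ("https://www.facebook.com/connect/login_success.html"
               "#access_token=ABC123&expires_in=3600")

print(unwrap_login_redirect(login_url))
# https://www.facebook.com/v2.11/dialog/oauth?response_type=token
print(token_from_fragment(success_url))
# ABC123
```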

Related

Python requests module GET method: handling pagination token in params containing %

I am trying to handle an API response with pagination. The first page provides a pagination token to reach the next one, but when I feed this token back into the params argument of requests.get, it seems to get encoded slightly wrong.
My attempt to retrieve the next page (using the response of the first requests.get call):
# Initial request
response = requests.get(url=url, headers=headers, params=params)
params.update({"paginationToken": response.json()["paginationToken"]})
# Next page
response = requests.get(url=url, headers=headers, params=params)
This fails with status 500: Internal Server Error and message Padding is invalid and cannot be removed.
An example pagination token:
gyuqfh%2bqyNrV9SI1%2bXulE6MXxJgb1VmOu68eH4YZ6dWUgRItb7yJPnO9bcEXdwg6gnYStBuiFhuMxILSB2gpZCLb2UjRE0pp9RkDdIP226M%3d
The url attribute of response seems to show a slightly different token if you look carefully, especially around the '%' signs:
https://www.wikiart.org/en/Api/2/DictionariesByGroup?group=1&paginationToken=gyuqfh%252bqyNrV9SI1%252bXulE6MXxJgb1VmOu68eH4YZ6dWUgRItb7yJPnO9bcEXdwg6gnYStBuiFhuMxILSB2gpZCLb2UjRE0pp9RkDdIP226M%253d
For example, the pagination token and the URL end differently: 226M%3d versus 226M%253d. When I manually copy the first part of the URL and append the correct pagination token, the browser does retrieve the information.
Am I missing some kind of encoding I should apply to the requests.get parameters before feeding them back into a new request?
You are right, it is some form of encoding: percent-encoding, to be precise. It is commonly used to encode URLs, and it is easy to decode:
from urllib.parse import unquote
pagination_token="gyuqfh%252bqyNrV9SI1%252bXulE6MXxJgb1VmOu68eH4YZ6dWUgRItb7yJPnO9bcEXdwg6gnYStBuiFhuMxILSB2gpZCLb2UjRE0pp9RkDdIP226M%253d"
pagination_token = unquote(pagination_token)
print(pagination_token)
Outputs:
gyuqfh%2bqyNrV9SI1%2bXulE6MXxJgb1VmOu68eH4YZ6dWUgRItb7yJPnO9bcEXdwg6gnYStBuiFhuMxILSB2gpZCLb2UjRE0pp9RkDdIP226M%3d
But I expect that is only half your problem. Use a requests Session object (https://requests.readthedocs.io/en/master/user/advanced/#session-objects) to make the requests, as there is most likely a cookie that is sent with the request and used in conjunction with the pagination token. I cannot tell for sure, as the website is currently down.
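A stdlib sketch of why the token gets mangled, assuming requests encodes params much like `urllib.parse.urlencode` does: the token arrives already percent-encoded, so encoding it a second time turns every `%` into `%25`, which is exactly the `226M%253d` seen in the question. Decoding it once with `unquote` before putting it back into params leaves a single layer of encoding.

```python
from urllib.parse import unquote, urlencode

# Token exactly as returned by the first page (already percent-encoded).
token = ("gyuqfh%2bqyNrV9SI1%2bXulE6MXxJgb1VmOu68eH4YZ6dWUgRItb7yJPnO9"
         "bcEXdwg6gnYStBuiFhuMxILSB2gpZCLb2UjRE0pp9RkDdIP226M%3d")

# Encoding the raw token again double-encodes it: '%2b' becomes '%252b'.
double_encoded = urlencode({"paginationToken": token})

# Decoding first ('%2b' -> '+', '%3d' -> '=') leaves one layer of encoding.
single_encoded = urlencode({"paginationToken": unquote(token)})

print(double_encoded)
print(single_encoded)
```

Applied to the code in the question, that would be something like `params.update({"paginationToken": unquote(response.json()["paginationToken"])})`, ideally together with the Session object suggested above.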

HERE Maps URL not being decoded

I'm trying to query the HERE Maps API with JavaScript to calculate a route with waypoints of type "passThrough". The actual URL produced (with the API key removed) is:
https://router.hereapi.com/v8/routes?xnlp=CL_JSMv3.1.21.3&apikey={API_KEY_HERE}&routingMode=fast&transportMode=truck&origin=25.900672%2C-80.253709&destination=40.213615%2C-97.188347&unit=imperial&truck=%5Bobject%20Object%5D&return=polyline%2CtravelSummary&via=40.052839%2C-87.410475!passThrough%3Dtrue
This query returns an error response, even though I'm following the documentation. Here is the problem I found:
if I paste this URL into the browser and replace the "%3D" after "passThrough" with a literal "=", the API returns the expected response. I should clarify that the URL above works with curl -X GET. So I really think the HERE Maps API is not decoding the URL, even though they say special characters have to be encoded.
Any clue on this?
Am I wrong?
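The thread has no answer, but the encoding difference the asker describes is easy to reproduce. This sketch uses Python's stdlib, like the other examples here, rather than the JavaScript from the question, and it assumes (based only on the asker's browser test) that the API wants `!` and `=` left literal inside the `via` value. Separately, `truck=%5Bobject%20Object%5D` in the URL decodes to `[object Object]`, which suggests a JavaScript object was turned into a string instead of being serialized into parameters.

```python
from urllib.parse import unquote, urlencode

via = "40.052839,-87.410475!passThrough=true"

# Default encoding escapes ',', '!' and '=' inside the value.
default_encoded = urlencode({"via": via})
print(default_encoded)
# via=40.052839%2C-87.410475%21passThrough%3Dtrue

# Leaving '!' and '=' unescaped reproduces the URL that worked in the browser.
literal_encoded = urlencode({"via": via}, safe="!=")
print(literal_encoded)
# via=40.052839%2C-87.410475!passThrough=true

# The truck parameter from the question decodes to a stringified JS object.
print(unquote("%5Bobject%20Object%5D"))
# [object Object]
```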

How to scrape image/file from web page in Python?

I'm trying to use Python 3.7.4 to back up pictures from a blog site, e.g.
http://s2.sinaimg.cn/mw690/001H6t4Fzy7zgC0WLXb01&690
If I input the above address in Firefox address bar, the file is shown correctly.
If I use the following code to download the picture, the server always redirects to a default picture:
from requests import get  # just to try different methods
from urllib.request import urlopen
from urllib.parse import urlsplit, urlunsplit, quote

# hard-coded address is randomly selected for debug purpose.
origPict = 'http://s2.sinaimg.cn/mw690/001H6t4Fzy7zgC0WLXb01&690'
p = urlsplit(origPict)
newP = quote(p.path)
origPict = urlunsplit([p.scheme, p.netloc, newP, p.query, p.fragment])
try:
    # url_file = urlopen(origPict)
    # u = url_file.geturl()
    url_file = get(origPict)
    u = url_file.url
    if u != origPict:
        raise Exception('Failed to get picture ' + origPict)
    ...
Any clue why requests.get or urllib.request.urlopen doesn't like '&' in the URL?
Update: thanks to Artur's comments, I realize the question is not about the request itself but about the site's protection mechanism: JavaScript, cookies, or something else in the web page feeds something back to the server that lets it judge whether the request comes from a scraper. So the question now becomes how to scrape an image from a web page, which is more complex than simply downloading the image from its URL.
It's not about the & symbol but about redirection. Try adding the parameter allow_redirects=False to get; it should be okay.
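One detail in the code above is worth checking regardless of the anti-scraping explanation: `quote(p.path)` leaves only `/` (plus letters, digits and `_.-~`) unescaped by default, so the `&` in the path becomes `%26`. The URL actually requested is therefore not the one typed into Firefox, and the `u != origPict` check compares against that re-quoted URL. Whether this server treats `%26` and `&` differently is not established in the thread; the sketch below only shows the URL difference, and `requested_url` is a hypothetical helper.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def requested_url(url, safe="/"):
    """Hypothetical helper: re-quote the path the way the question's code does."""
    p = urlsplit(url)
    return urlunsplit((p.scheme, p.netloc, quote(p.path, safe=safe), p.query, p.fragment))


orig = "http://s2.sinaimg.cn/mw690/001H6t4Fzy7zgC0WLXb01&690"

# Default safe='/' turns '&' into '%26'.
print(requested_url(orig))
# http://s2.sinaimg.cn/mw690/001H6t4Fzy7zgC0WLXb01%26690

# Marking '&' as safe keeps the URL exactly as typed into the browser.
print(requested_url(orig, safe="/&"))
# http://s2.sinaimg.cn/mw690/001H6t4Fzy7zgC0WLXb01&690
```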

Sending data in GET request Python

I know that using GET for this is not advisable; however, I am not in control of how this server works and have very little experience with requests.
I'm looking to send a dictionary via a GET request and was told that the server had been set up to accept this, but I'm not sure how that works. I have tried using
import requests
r = request.get('www.url.com', data = 'foo:bar')
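but this leaves the webpage unaltered. Any ideas?
To send a request body with a GET request, one approach is to send a POST and ask the server to treat it as a GET via the X-HTTP-Method-Override header (this only works if the server honors that header), e.g.:
import requests

request_header = {
    'X-HTTP-Method-Override': 'GET'
}
response = requests.post(request_uri, request_body, headers=request_header)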
Use requests like this, passing the data in the data field of the request:
import json
import requests

requests.get(url, headers=head, data=json.dumps({"user_id": 436186}))
It seems that you are using the wrong parameters for the GET request. The doc for requests.get() is here.
You should use params instead of data as the parameter.
You are also missing the http:// in the URL.
The following should work:
import requests

r = requests.get('http://www.url.com', params={'foo': 'bar'})
print(r.content)
The actual request URL can be inspected via r.request.url; it should look like this:
http://www.url.com/?foo=bar
If you're not sure how the server works, you should send a POST request, like so:
import requests

data = {'name': 'value'}
requests.post('http://www.example.com', data=data)
If you absolutely need to send data with a GET request, put the data in a dictionary and pass it with the params keyword instead.
You may find the requests documentation helpful.
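To see what passing a dictionary via params amounts to, here is a stdlib-only sketch that builds the query string by hand. As far as I know it mirrors what requests does with params, including adding a `/` when the path is empty, but `build_get_url` is a hypothetical helper and not part of requests.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_get_url(base_url, params):
    """Hypothetical helper: append a dict of params to a URL as a query string."""
    parts = urlsplit(base_url)
    # doseq=True turns list values into repeated keys, e.g. tag=a&tag=b.
    query = urlencode(params, doseq=True)
    if parts.query:
        # Keep any query string already present on the base URL.
        query = parts.query + "&" + query
    # An empty path is normalized to '/'.
    path = parts.path or "/"
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


print(build_get_url("http://www.url.com", {"foo": "bar"}))
# http://www.url.com/?foo=bar
```

The data argument, by contrast, goes into the request body, which a server may simply ignore on a GET; that would be consistent with the page staying unaltered in the question.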

Python Requests Refresh

I'm trying to use Python's requests library to log in to a website. It's pretty simple code, and you can get the gist of requests just by visiting its website. However, I want to check via the URL whether I'm successfully logged in. The problem is that when I make the POST request and assign the response to the variable p, I'm still given the same URL when I call print(p.url), whether the HTML has changed or not. Is there any way to refresh the page or update the URL to whatever it's currently set to?
(I can add a line for checking the url against itself later, but for now I just want to get the correct url)
#!/usr/bin/env python3
import requests

payload = {'login': 'USERNAME',
           'password': 'PASSWORD'}
with requests.Session() as s:
    p = s.post('WEBSITE', data=payload)
    # print(p.text)
    print(p.url)
Using python-requests may not be as complex as you think. It automatically handles redirects for your POST (or session.get()).
Here, the session.post() method returns a response object:
r = s.post('website', data=payload)
which means r.url is the current URL you are looking for.
If you still want to refresh current page, just use:
s.get(r.url)
To verify whether you have logged in successfully, one approach is to log in with your browser first and see what the resulting page looks like.
Based on the title or content of the page returned (i.e., the content in r.text), you can then judge whether you made it.
BTW, python-requests is a great library, enjoy it.
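A minimal sketch of the URL-based check the question asks about. It assumes, hypothetically, that a failed login leaves you on a page whose path starts with `/login` while a successful one redirects elsewhere; the real site may behave differently, in which case checking r.text as suggested above is the safer route. It only needs the final URL, i.e. `p.url` after requests has followed any redirects.

```python
from urllib.parse import urlsplit


def looks_logged_in(final_url, login_path="/login"):
    """Hypothetical heuristic: any final URL outside the login page counts as success."""
    return not urlsplit(final_url).path.startswith(login_path)


print(looks_logged_in("https://example.com/dashboard"))
# True
print(looks_logged_in("https://example.com/login?error=1"))
# False
```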
