I'm trying to download some files from behind an SSO (Single Sign-On) site. It appears to use SAML authentication, and that's where I'm stuck. Once authenticated, I'll be able to perform API requests that return JSON, so there's no need to interpret/scrape HTML.
I'm not really sure how to deal with that in MechanicalSoup (and I'm relatively unfamiliar with web programming in general); help would be much appreciated.
Here's what I've got so far:
import mechanicalsoup
from getpass import getpass
import json

login_url = ...
verbose = True

br = mechanicalsoup.StatefulBrowser()
response = br.open(login_url)
if verbose: print(response)

# Provide the username + password.
br.select_form('form[id="loginForm"]')
br.get_current_form().print_summary()  # just to see what's there; print_summary() already prints, so no wrapping print() is needed
br['UserName'] = input('Email: ')
br['Password'] = getpass()
response = br.submit_selected().text
if verbose: print(response)
At this point I get a page telling me JavaScript is disabled and that I must click submit to continue. So I do:
br.select_form()  # no selector needed: pick the page's first (and presumably only) form
response = br.submit_selected().text
if verbose: print(response)
That's where I get a complaint about state information being lost.
Output:
<h2>State information lost</h2>
State information lost, and no way to restart the request<h3>Suggestions for resolving this problem:</h3><ul><li>Go back to the previous page and try again.</li><li>Close the web browser, and try again.</li></ul><h3>This error may be caused by:</h3><ul><li>Using the back and forward buttons in the web browser.</li><li>Opened the web browser with tabs saved from the previous session.</li><li>Cookies may be disabled in the web browser.</li></ul>
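(As a debugging aid, the hidden fields that the intermediate page wants to POST, typically SAMLResponse and RelayState in a SimpleSAMLphp setup like the one this error page suggests, can be inspected before submitting. A minimal sketch using the BeautifulSoup tag that MechanicalSoup exposes on the selected form:)

form = br.select_form()
# Form.form is the underlying BeautifulSoup <form> tag.
for field in form.form.find_all('input'):
    print(field.get('name'), field.get('type'))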
The only hits I've found on scraping behind SAML logins are all going with a selenium approach (and sometimes dropping down to requests).
Is this possible with mechanicalsoup?
My situation turned out to require JavaScript for login, so my original question about getting past SAML auth didn't reflect the true environment. This question therefore hasn't truly been answered.
Thanks to @Daniel Hemberger for helping me figure that out in the comments.
In this situation MechanicalSoup is not the right tool (because of the JavaScript requirement), and I ended up using Selenium to get through authentication and then switching to requests.
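For anyone landing here, the handoff looks roughly like this. A minimal sketch, with a hypothetical login page and element IDs standing in for the real SSO form:

import requests
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Firefox()
driver.get('https://example.com/login')  # hypothetical SSO login page
driver.find_element(By.ID, 'UserName').send_keys('me@example.com')
driver.find_element(By.ID, 'Password').send_keys('hunter2')
driver.find_element(By.ID, 'submitButton').click()
# ... wait here until the JavaScript redirects settle ...

# Copy the authenticated cookies from the browser into a requests session.
session = requests.Session()
for cookie in driver.get_cookies():
    session.cookies.set(cookie['name'], cookie['value'])
driver.quit()

# From here on, plain API calls returning JSON work as usual.
data = session.get('https://example.com/api/some-endpoint').json()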
I'm working on authorizing a new Twitter standalone app.
For this, I'm using a Python script from GitHub:
from birdy.twitter import UserClient

key = "..."
secret = "..."

client = UserClient(key, secret)
token = client.get_signin_token()
access_key = token.oauth_token
access_secret = token.oauth_token_secret
print(token.auth_url)

pin = input("PIN: ")
client = UserClient(key, secret, access_key, access_secret)
access_token = client.get_access_token(pin)  # capture the final token instead of discarding it
print(access_token)
So this gets me an authorization URL, but if I paste the URL into the browser, I get stuck at 'Redirecting you back to the application. This may take a few moments.' forever.
Thinking it could be a problem with the callback URL, I tried every solution on this Stack Overflow page, such as localhost, localhost.me, 127.0.0.1, 127.0.0.1:xxxx (where xxxx is a port number), tlocalhost.com, and so on. None of them worked; those callbacks even failed to take me to the 'Redirecting you back to the application. This may take a few moments.' page.
So far, only the www.twittersdk:// callback I found on GitHub took me to that page.
I also tried applying the solution from here, but it didn't help.
I can't figure out what I'm doing wrong or how to fix it.
Can anyone help me with this?
Many thanks.
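One thing that often matters for PIN-based authorization (a hedged aside, not verified against this app's settings): Twitter's OAuth 1.0a flow for desktop apps uses the special out-of-band callback value oob, and birdy's token helpers accept a callback URL. A sketch of that variant:

client = UserClient(key, secret)
# 'oob' requests the PIN-based (out-of-band) flow; oauth/authorize is the
# endpoint that displays a PIN, which birdy exposes as get_authorize_token.
token = client.get_authorize_token('oob')
print(token.auth_url)
pin = input("PIN: ")
client = UserClient(key, secret, token.oauth_token, token.oauth_token_secret)
access_token = client.get_access_token(pin)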
Good day. I'm writing a Python program that requests some posts from my Facebook page. To do so, Facebook offers a tool called the "Graph API Explorer". Using something similar to a GET request, I can get anything I want (provided I have access and a valid token). My own approach to the Graph API Explorer is to generate the URLs myself. After generating a URL, I use pycurl to get a JSON object from Facebook that contains all of my data.
When I use pycurl, I get the following error:
pycurl.error: (3, 'Illegal characters found in URL')
but when I print said URL and paste it into a browser, I get a valid response.
URL: https://graph.facebook.com/v7.0/me?fields=posts%7Bmessage%2Cfrom%7D&access_token=<and my access token which is valid>
My code looks like this:
import pycurl
import certifi
from io import BytesIO

def get_posts_curl(nodes=['posts'], fields=[['message', 'from']], token_file='Facebook/token.txt'):
    curl = pycurl.Curl()
    response = BytesIO()
    token = get_token_from_file(token_file)

    # Constructing the request.
    url = parse_facebook_url_request(nodes, fields, token)
    url = convert_to_curl(url)
    print("---URL---: " + url)

    # cURL session and settings.
    curl.setopt(curl.CAINFO, certifi.where())
    curl.setopt(curl.URL, url)
    curl.setopt(curl.WRITEDATA, response)
    curl.perform()
    curl.close()

    return response.getvalue().decode('utf-8')
The error pops up at curl.perform()
Some info that might be relevant:
Everything was working fine a while ago. After transferring my program from my workstation (running Windows 10) to my server (Ubuntu 18.04), it still worked, and I set the project aside. The error only started popping up now, even though I haven't touched the project in a while.
It seems that the token is causing the issue. I've tried about 100 tokens; some cause the problem and some don't. Also, a fix that solved it all was using urllib.parse.unquote:
from urllib.parse import unquote
...
url = unquote(url)  # decode percent-escapes before handing the URL to pycurl
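If anyone else hits this, it's also worth ruling out stray whitespace in the token file: pycurl's error 3 is raised for URLs containing control characters, and a file saved with a trailing newline is enough to trigger it. A hedged sketch of a hardened loader (the helper name mirrors the one in the code above):

def get_token_from_file(token_file):
    # strip() drops a trailing newline, which pycurl would reject as an illegal character
    with open(token_file) as f:
        return f.read().strip()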
I'm trying to get data from fbo.gov, a government website where they post contracts that vendors can bid on. They have a document describing ways of accessing information on the site through SOAP requests, which is what I'm trying to do. Although all of the examples in that document are in PHP, I'm making my requests in Python, because I've never done anything with PHP before.
To make the SOAP requests in Python, I'm using zeep.
Right now, I can successfully authenticate myself through HTTP, but no matter what method I try to call, I always get the same error: This user has an inactive agency.
Here is the code I'm using to send the request:
from requests import Session
from requests.auth import HTTPBasicAuth
import zeep
from zeep.transports import Transport

test = "https://fbo-test.symplicity.com"
prod = "https://fbo.gov"

# Authenticate with HTTP Basic auth and build the SOAP client from the WSDL.
session = Session()
session.auth = HTTPBasicAuth("sample_username", "sample_password")
client = zeep.Client(f"{test}/ws/fbo_api.php?wsdl", transport=Transport(session=session))

dictionary = {"notice_type": "PRESOL"}
print(client.service.getList(data=dictionary))
I realize this is a long shot, but what could be causing this error? I can't find anything even remotely related to the error anywhere on the internet.
Per the Federal Service Desk:
The FBO API is only available for government user accounts.
Some of the FBO data is available at: ftp://ftp.fbo.gov
Currently, FBO is in the process of moving to SAM, and will have a public API once the move is complete. The new API is under development, with the latest specification at: https://open.gsa.gov/api/get-opportunities-public-api/
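Once that API is live, calling it should be an ordinary HTTPS request. A hedged sketch against the v2 search endpoint in the linked specification (the API key is a hypothetical placeholder, and the parameter names and date format are as documented there):

import requests

resp = requests.get(
    'https://api.sam.gov/opportunities/v2/search',
    params={
        'api_key': 'YOUR_SAM_GOV_API_KEY',  # hypothetical placeholder
        'postedFrom': '01/01/2020',  # MM/dd/yyyy, per the spec
        'postedTo': '03/01/2020',
        'limit': 10,
    },
)
print(resp.json())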
FBO.gov was retired as of 11/12/2019, along with the ftp.fbo.gov bulk download. Use the following instead:
https://open.gsa.gov/api/sam-entity-extracts-api/
I want to use the Twitter API to retrieve poll votes using Tweepy. Are there any methods to implement this?
I tried doing api.update_status("poll_1"[1112227775552223333]), but it's not working.
No. The poll API is private and is not available to anyone apart from Twitter's own apps.
There are no plans to open this to the public - https://twittercommunity.com/t/poll-support/78235
The other answer is incorrect. There is a way; it's just that the JSON data you get back is quite unwieldy (I have plans to build a tweepy add-on to make this easier). The way to get it with Python 3.x is through the requests library. Don't worry though, it only takes a few lines of code. As you can see in this link, we can indeed get back Twitter poll data. Here's a code snippet:
import requests
from requests_oauthlib import OAuth1  # OAuth1 lives in requests_oauthlib, not in requests itself

# Credentials provided by a Twitter dev account.
auth = OAuth1(my_key, my_secret_key, my_access_token, my_secret_access_token)

tweet_ids_string = "12345678,123456790,09886654333"

# For more expansions, see the link above.
data_url = f'https://api.twitter.com/2/tweets?ids={tweet_ids_string}&expansions=attachments.poll_ids&poll.fields=duration_minutes,end_datetime,options,voting_status'

response = requests.get(data_url, auth=auth)  # the original referenced get_data_url, which was undefined
response_json = response.json()  # then you can get all sorts of data from here
Beware though: the Twitter polls API isn't lightning fast, so there may be several seconds of delay between a poll being voted on and the JSON returned by the request changing.
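Note that the expanded poll objects come back under includes rather than inside each tweet. A short sketch of reading the vote counts out of the response above, assuming the standard v2 payload layout:

# Expanded polls land under 'includes', not inside each tweet object.
for poll in response_json.get('includes', {}).get('polls', []):
    print(poll['id'], poll['voting_status'])
    for option in poll['options']:
        print(f"  {option['label']}: {option['votes']} votes")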
I'm integrating Google's login with a Flask site using Flask-OAuth.
Everything is working fine. I can authorise the login and get a token back etc. without any difficulties. But when I use Flask-OAuth's get method to request the logged-in user's email address, I get an error saying:
TypeError: Unicode-objects must be encoded before hashing
I'm using Python3 and this has the smell of a Python version issue but I can't figure out what I'd need to change.
The code I'm using is this:
def get_additional_data(self):
    access_token = session.get('oauth_token')
    headers = {'Authorization': 'OAuth ' + access_token[0]}
    return self.service.get(
        'https://www.googleapis.com/oauth2/v1/userinfo', None,
        headers=headers)
I'm not sure what I could encode in that request. Even if I don't pass the headers, I get the same error (rather than an invalid request or something like that).
I've run 2to3 on oauth2/__init__.py, and the tweaks it suggests are very minor and shouldn't prevent the code from running in Python 3. Also, everything else OAuth2-related is working.
The bad news is that the solution to this problem is switching to Flask-OAuthlib.
The good news is that it required very few changes from Flask-OAuth to get it working.
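For anyone making the same switch, the Flask-OAuthlib setup is close to a drop-in replacement. A minimal sketch of the Google remote app (the client credentials are hypothetical placeholders; the endpoints are Google's standard OAuth2 URLs):

from flask import Flask, session
from flask_oauthlib.client import OAuth

app = Flask(__name__)
app.secret_key = 'replace-me'  # hypothetical placeholder
oauth = OAuth(app)

google = oauth.remote_app(
    'google',
    consumer_key='YOUR_CLIENT_ID',         # hypothetical placeholder
    consumer_secret='YOUR_CLIENT_SECRET',  # hypothetical placeholder
    request_token_params={'scope': 'email'},
    base_url='https://www.googleapis.com/oauth2/v1/',
    request_token_url=None,  # OAuth2: no request-token step
    access_token_method='POST',
    access_token_url='https://accounts.google.com/o/oauth2/token',
    authorize_url='https://accounts.google.com/o/oauth2/auth',
)

@google.tokengetter
def get_google_oauth_token():
    return session.get('google_token')

# The userinfo call from the question then becomes simply:
# me = google.get('userinfo')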