Python: using requests to log in to a webpage - python-3.x

I'm trying to log in to a website via Python to print the account info, so I don't have to keep logging in to multiple accounts manually.
In the tutorial I followed, the author just had a login and password, but this site also sends extra form data:
[screenshot: website form data]
Do the _wp* attributes change on each login?
The code I use:
import requests

mffloginurl = 'https://myforexfunds.com/login-signup/'
mffsecureurl = 'https://myforexfunds.com/account-2'

payload = {
    'log': '*****@gmail.com',
    'pwd': '*****',
    'brandnestor_action': 'login',
    '_wpnonce': '9d1753c0b6',
    '_wp_http_referer': '/login-signup/'
}
r = requests.post(mffloginurl, data=payload)
print(r.text)
Using the correct details, of course, but it doesn't log in. I tried both with and without the extra WordPress fields, and it still just goes to the sign-in page.
[screenshot: python output]
(The site addresses and login details shown here are not the real ones.)

Yeah, the nonce will change with every new visit to the page.
I would use requests.Session() so that it automatically stores session cookies and all that good stuff.
Do a session.get('some_login_page.com'), parse the response content with BeautifulSoup to retrieve the nonce, then add that into the payload of your POST request when you log in.
A very quick and dirty example:
import requests
from bs4 import BeautifulSoup as bs

email = 'test@email.com'
password = 'password1234'
url = 'https://myforexfunds.com/account-2/'

# Start a session
with requests.session() as session:
    # Send a GET request to the login page
    r = session.get(url)
    # Check if the request was successful
    if r.status_code != 200:
        print("Get Request Failed")
    # Parse the HTML content of the page
    soup = bs(r.content, 'lxml')
    # Extract the value of the nonce from the HTML
    nonce = soup.find(id='woocommerce-login-nonce')['value']
    # Set up the login form data
    params = {
        "username": email,
        "password": password,
        "woocommerce-login-nonce": nonce,
        "_wp_http_referer": "/account-2/",
        "login": "Log+in"
    }
    # Send a POST request with the login form data (data=, so the fields
    # go in the request body rather than the query string)
    r = session.post(url, data=params)
    # Check if the request was successful
    if r.status_code != 200:
        print("Login Failed")

Related

Empty token with Python Requests, but multiple tokens seen in chrome dev tools

I'm trying to use requests to log in to a site, navigate to a page, and scrape some data. This question is about the first step (getting in).
I cannot fetch the token from the site:
import requests

URL = 'https://coderbyte.com/sl'

with requests.Session() as s:
    response = s.get(URL)
    print([response.cookies])
Result is empty:
[<RequestsCookieJar[]>]
This makes sense given the response I'm seeing in Chrome's dev tools. After I log in with my username and password, I see four tokens, three of them deleted, but one valid:
[screenshot: Chrome dev tools cookies]
How can I fetch the valid token?
You can use the POST method on the login URL first (to get past the login) and then fetch the token. For example:
url = "url-goes-here"
url_login = "login-url-goes-here"
with requests.Session() as s:
# get the link first
s.get(url)
payload = json.dumps({
"email" : "your-email",
"password" : "your-password"
})
headers = {
'Content-Type': 'application/json'
}
response = s.post(url=url_login, data=payload, headers=headers)
print(response.text)
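As a side note, requests can also serialize the body and set the Content-Type header for you via the json= parameter, so an equivalent one-liner sketch is:

    response = s.post(url_login, json={"email": "your-email", "password": "your-password"})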
Based on your question, I assume that if the site only needs a username and password to log in, you can use HTTPBasicAuth(), which is provided by the requests package.
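A minimal sketch of that approach (the URL is a placeholder, and this only works if the endpoint really uses HTTP Basic auth):

import requests
from requests.auth import HTTPBasicAuth

# credentials go in the Authorization header, not in a form body
response = requests.get('https://example.com/protected',
                        auth=HTTPBasicAuth('your-username', 'your-password'))
print(response.status_code)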

Python authorization

I need to write a script that logs in to the personal account page of my Internet provider and retrieves the current balance.
At the moment I am stuck at the authorization step. I found and edited this script for myself:
import requests
url = 'https://bill.tomtel.ru/login.html'
USERNAME, PASSWORD = 'mylogin', 'mypass'
resp = requests.get(url, auth=(USERNAME, PASSWORD))
r = requests.post(url)
print(r.content)
But this does not get past the authorization...
Through a browser I can follow this link and end up on a page like this:
https://bill.tomtel.ru/fastcom/!w3_p_main.showform?FORMNAME=QFRAME&CONFIG=CONTRACT&SID=BLABLABLA&NLS=WR
I can get through the authorization in a browser on both links, so why can't I do it from the script?
Please help with this.
Your browser probably has a session token/cookie stored, and that is why you can access the page through the browser. There are a couple of issues here:
It looks like you need to log in to the site first, through a POST method, not a GET. The GET is what loads the page, but once you submit the form it performs a POST request.
Actually, using requests to log in to a site is not as easy as it looks. You usually have to find the URL the form posts to (examine the developer toolbar to see it), and you often have to pass information in addition to your username/password, such as a CSRF token or a cookie.
I would suggest using a browser automator for this, such as Selenium WebDriver. It makes logging in to a site much simpler than crafting raw HTTP requests, because it emulates a browser. It's much simpler and faster!
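A rough Selenium sketch (the field names USERNAME and PASSWORD are assumptions taken from the form data shown further down, not verified against the live page):

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://bill.tomtel.ru/login.html')
# fill in the login form; field names are assumed, check them in the dev tools
driver.find_element(By.NAME, 'USERNAME').send_keys('mylogin')
driver.find_element(By.NAME, 'PASSWORD').send_keys('mypass')
driver.find_element(By.NAME, 'PASSWORD').submit()
# the browser now holds the session, so protected pages are reachable
print(driver.page_source)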
Another thing to note: auth=(USERNAME, PASSWORD) is not the form's username/password; it performs HTTP Basic authentication, which is a different mechanism from submitting a login form.
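To illustrate the difference (the example.com URLs are placeholders):

import requests

# HTTP Basic auth: credentials travel in the Authorization header
requests.get('https://example.com/protected', auth=('user', 'pass'))

# Form login: credentials travel in the POST body, using the form's own field names
requests.post('https://example.com/login', data={'USERNAME': 'user', 'PASSWORD': 'pass'})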
Here is the URL and the required form data to log in:
[screenshot: login request URL and form data]
I think you should try this:
import requests

url = 'https://bill.tomtel.ru/signin.php'
USERNAME = input('Enter your username: ')
PASSWORD = input('Enter your password: ')

d = {
    'USERNAME': USERNAME,
    'PASSWORD': PASSWORD,
    'FORMNAME': 'QFRAME'
}

session = requests.Session()
resp = session.post(url, data=d).text
if '<TITLE>' not in resp:
    print('Incorrect username or password!')
    quit()

print('Logging in ... ')
# The response contains a JavaScript redirect; extract the target URL from it
red = None
for line in resp.split('\n'):
    if 'location' in line:
        red = 'https://bill.tomtel.ru/fastcom/!w3_p_main.showform%s' % line.replace(' if (P>0) self.location.replace("', '').replace('");', '')
if not red:
    print('An error has occurred')
    quit()

print('Redirecting to %s' % red)
page = session.get(red).text
print('')
print(' MAIN PAGE')
print(page)

Scraping from site that requires login, how to access the contents?

So I am trying to scrape a website that requires a login. I have used requests and submitted my login details, but when I try to extract the data, I am not getting the page I am looking for.
USERNAME = "test#gmail.com"
PASSWORD = "test"
#MIDDLEWARE_TOKEN = "TESTTOKEN"
LOGIN_URL = "https://vrdistribution.com.au/auth/login/process"
VR_URL = "https://vrdistribution.com.au/categories/tabletop-gaming?page=1"
def main():
session_requests = requests.session()
# Get login csrf token
result = session_requests.get(LOGIN_URL)
tree = html.fromstring(result.text)
authenticity_token = list(set(tree.xpath("//input[#name='_token']/#value")))
# Create payload
payload = {
"email": USERNAME,
"password": PASSWORD,
"csrfmiddlewaretoken": authenticity_token
}
# Perform login
result = session_requests.post(LOGIN_URL, data = payload, headers = dict(referer = LOGIN_URL))
#Scrape
result = session_requests.get(VR_URL, headers =dict(referer=VR_URL))
response = requests.get(VR_URL)
soup = BeautifulSoup(response.text, 'lxml')
print(soup)
The output is not the same content as the VR_URL (https://vrdistribution.com.au/categories/tabletop-gaming?page=1) I had specified: when I inspect the page I want to scrape and compare it to the output of the soup object, they are completely different.
Is there a way for me to access and scrape the contents of the VR_URL?
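One detail worth checking: the final requests.get(VR_URL) starts a fresh, unauthenticated request and throws away the logged-in session. A hedged sketch of the likely fix is to parse the session's own response instead:

    # reuse the logged-in session's response rather than a fresh requests.get()
    result = session_requests.get(VR_URL, headers=dict(referer=VR_URL))
    soup = BeautifulSoup(result.text, 'lxml')
    print(soup)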

Get cookie using aiohttp

I am trying to get cookies from the browser using aiohttp. From the docs and googling I have only found articles about setting cookies in aiohttp.
In Flask I would get the cookie as simply as
cookie = request.cookies.get('name_of_cookie')
# do something with cookie
Is there a simple way to fetch the cookie from browser using aiohttp?
Is there a simple way to fetch the cookie from the browser using aiohttp?
Not sure whether this counts as simple, but there is a way:
import asyncio
import aiohttp

async def main():
    urls = [
        'http://httpbin.org/cookies/set?test=ok',
    ]
    for url in urls:
        async with aiohttp.ClientSession(cookie_jar=aiohttp.CookieJar()) as s:
            async with s.get(url) as r:
                print('JSON:', await r.json())
            cookies = s.cookie_jar.filter_cookies('http://httpbin.org')
            for key, cookie in cookies.items():
                print('Key: "%s", Value: "%s"' % (cookie.key, cookie.value))

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
The program generates the following output:
JSON: {'cookies': {'test': 'ok'}}
Key: "test", Value: "ok"
Example adapted from https://aiohttp.readthedocs.io/en/stable/client_advanced.html#custom-cookies and https://docs.aiohttp.org/en/stable/client_advanced.html#cookie-jar
Now, if you want to make a request using a previously set cookie:

import asyncio
import aiohttp

url = 'http://example.com'

async def main():
    async with aiohttp.ClientSession(cookie_jar=aiohttp.CookieJar()) as s:
        # Filter for the cookie, saving its value into a variable
        cookie_value = None
        cookies = s.cookie_jar.filter_cookies('http://example.com')
        for key, cookie in cookies.items():
            if key == 'test':
                cookie_value = cookie.value
        # Use the cookie value however you want, e.g. sending a request
        # carrying the cookie in a header instead:
        headers = {"Authorization": f"Basic {cookie_value}"}
        async with s.get(url, headers=headers) as r:
            print(await r.json())

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
For testing URLs whose host part is an IP address, use aiohttp.ClientSession(cookie_jar=aiohttp.CookieJar(unsafe=True)), according to https://github.com/aio-libs/aiohttp/issues/1183#issuecomment-247788489
Yes, the cookies are stored in request.cookies as a dict, just like in Flask, so request.cookies.get('name_of_cookie') works the same.
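For example, inside a server-side handler (a minimal sketch; the cookie name and fallback are placeholders):

from aiohttp import web

async def handler(request):
    # read a cookie from the incoming request, with a fallback default
    name = request.cookies.get('name_of_cookie', 'not set')
    return web.Response(text='cookie: %s' % name)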
In the examples section of the aiohttp repository there is a file, web_cookies.py, that shows how to retrieve, set, and delete a cookie. Here's the section from that script that reads the cookies and returns them to the template as a preformatted string:
from pprint import pformat
from aiohttp import web

tmpl = '''\
<html>
    <body>
        Login<br/>
        Logout<br/>
        <pre>{}</pre>
    </body>
</html>'''

async def root(request):
    resp = web.Response(content_type='text/html')
    resp.text = tmpl.format(pformat(request.cookies))
    return resp
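To wire that handler up and try it (a minimal sketch; the route and port are arbitrary choices, not from the original script):

app = web.Application()
app.router.add_get('/', root)
web.run_app(app, port=8080)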
You can get the cookie's value, domain, path, etc. without having to loop through all the cookies.
s.cookie_jar._cookies
gives you all the cookies in a defaultdict with the domains as keys and their respective cookies as values. aiohttp uses SimpleCookie
So, to get the value of a cookie:
s.cookie_jar._cookies.get("httpbin.org")["cookie_name"].value
And for the domain and path:
s.cookie_jar._cookies.get("httpbin.org")["cookie_name"]["domain"]
s.cookie_jar._cookies.get("httpbin.org")["cookie_name"]["path"]
More info can be found here: https://docs.python.org/3/library/http.cookies.html
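Since _cookies is a private attribute and its layout can change between aiohttp versions, here is a sketch of the public alternative for reading just a value:

# filter_cookies() is the public API; it returns a SimpleCookie of morsels
cookies = s.cookie_jar.filter_cookies('https://httpbin.org')
morsel = cookies.get('cookie_name')
if morsel is not None:
    print(morsel.value)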

requests.Session POST not allowing login

I am attempting to scrape some data from an internal GUI for a modem in the field. Here is the code I am using:
## import the requests python library
import requests

## defines the login URL (the POST target)
post_loginURL = 'http://100.255.255.255/cgi-bin/login.cgi'
## defines the request URL to scrape from
requestURL = 'http://100.255.255.255/cgi-bin/stat_eth0.cgi'

## define the login data parameters
payload = {
    'NAME': 'username',
    'PWD': 'password',
    'CMD': '1'  ## hidden value being passed in the post request
}

## define the content type as form-encoded
headers = {
    'Content-Type': 'application/x-www-form-urlencoded'
}

## creates the session object
with requests.Session() as session:
    ## sends the post request, headers and login data to the server
    post = session.post(post_loginURL, data=payload, headers=headers)
    r = session.get(requestURL)
    print(r.text)
Here is the HTML Error message I receive in the IDLE console:
<em><u><p align="right">ERROR: You must log in</u></em>
Here are the details from the Chrome dev tools:
[screenshot: Chrome dev tools request details]
I am unsure if I am just missing a crucial step, or if this is an issue with syntax, etc. Any constructive advice is greatly appreciated.
Thanks in advance!
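One way to narrow this down (a debugging sketch to drop inside the with block, right after the POST; nothing here is specific to this modem):

    print(post.status_code)                # did the server accept the POST?
    print(post.headers.get('Set-Cookie'))  # did it try to set a session cookie?
    print(session.cookies.get_dict())      # what does the session now hold?
    print(post.text[:500])                 # does the body hint at a login failure?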
