Login Request with CSRF Token in HTML HTTP Response using Python - python-3.x

I need to create a login request using Python, but the roadblock is that when I load the URL with a GET request, the server sets a specific CSRF token value in the HTML response, and that same CSRF token is validated in the login request; without that CSRF token value, I won't be able to log in to the web application.
Let's say I load the URL with a GET request and print the value of the CSRF token; a new token is generated with each refresh.
Using the following Script 1, the value of the CSRF Token is retrieved:
import requests
from bs4 import BeautifulSoup
response = requests.get("https://myloginpage.com", verify=False)
print(response.status_code)
soup = BeautifulSoup(response.text, "html.parser")
csrf_token = soup.find("input", {"name": "csrf_token"})["value"]
print(csrf_token)
And the login request is given below, where the CSRF token value is the one set by the server in the response to the GET request above (i.e. the value retrieved by Script 1).
POST /auth/login HTTP/1.1
Host: myloginpage.com
Cookie: cookiesession1=0A1256CFAXDBEO4P8WONHSGEAPSP5B4F; csrf_token=<token value of csrf>; ci_session=r90o6pi8not5barjnhhtnu8atobmgbae
Content-Length: 131
Sec-Ch-Ua: "Not?A_Brand";v="8", "Chromium";v="108"
Accept: */*
Content-Type: application/x-www-form-urlencoded; charset=UTF-8
X-Requested-With: XMLHttpRequest
Sec-Ch-Ua-Mobile: ?0
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.5359.72 Safari/537.36
Sec-Ch-Ua-Platform: "Windows"
Origin: https://myloginpage.com
Sec-Fetch-Site: same-origin
Sec-Fetch-Mode: cors
Sec-Fetch-Dest: empty
Referer: https://myloginpage.com
Accept-Encoding: gzip, deflate
Accept-Language: en-US,en;q=0.9
Connection: close
user_id=0&csrf_token=<token value of csrf>&iusername=<username>&ipassword=<password>&is_ajax=1&form=login
Hence, I created a script as given below:
import requests
from bs4 import BeautifulSoup
headers = {
    'Accept': '*/*',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.5359.72 Safari/537.36',
    'Referer': 'https://myloginpage.com/',
}
response = requests.get("https://myloginpage.com", headers=headers, verify=False)
print(response.status_code)
soup = BeautifulSoup(response.text, "html.parser")
csrf_token = soup.find("input", {"name": "csrf_token"})["value"]
print(csrf_token)
login = 'https://myloginpage.com/auth/login'
payload = {
    'user_id': '<user_id>',
    'csrf_token': csrf_token,
    'iusername': '<username>',
    'ipassword': '<password>',
    'is_ajax': '1',
    'form': 'login'
}
response = requests.post(login, data=payload, verify=False)
print(response.status_code)
But the login is still unsuccessful, returning a 403 HTTP response code.
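A likely cause (this is an assumption, since the server-side check isn't shown) is that the CSRF token is tied to the session cookies issued with the GET response (cookiesession1, csrf_token, ci_session in the captured request), and the separate requests.post call does not carry those cookies. A minimal sketch using requests.Session, which keeps the cookies between the GET and the POST, could look like this; field names and URLs mirror the captured request, and the credentials are placeholders:
import requests
from bs4 import BeautifulSoup

session = requests.Session()
# Reuse the browser-like headers for every request in this session.
session.headers.update({
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.5359.72 Safari/537.36',
    'X-Requested-With': 'XMLHttpRequest',
    'Referer': 'https://myloginpage.com/',
    'Origin': 'https://myloginpage.com',
})

# GET the login page; the session stores the cookies the server sets here.
response = session.get("https://myloginpage.com", verify=False)
soup = BeautifulSoup(response.text, "html.parser")
csrf_token = soup.find("input", {"name": "csrf_token"})["value"]

# POST the login form with the same session, so the cookies and the token match.
payload = {
    'user_id': '0',
    'csrf_token': csrf_token,
    'iusername': '<username>',
    'ipassword': '<password>',
    'is_ajax': '1',
    'form': 'login',
}
response = session.post("https://myloginpage.com/auth/login", data=payload, verify=False)
print(response.status_code)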

Related

socket can't detect anything in a praw implicit authentication url

I'm opening a socket server that hosts the Reddit confirmation page, so that you can log in to the application without compromising the bot, using a single string called an access token that is in the URL pointing at the socket server. What I want to do is take that access token from the URL and display it on the socket server page, but the server can't detect any variables inside the URL. Here is my code if you need it. It works when implicit is not part of the URL generation, but as soon as I add implicit=True (which I need to, since I plan on this being an app) it no longer works.
What I want (roughly; this is for the non-implicit case): GET /?state=state&code=code HTTP/1.1.
What I get: GET / HTTP/1.1.
That is what happens when I do data = client.recv(1024).decode("utf-8").
It returns proper data when the URL is generated with url = reddit.auth.url(scopes, state),
but when I generate the URL with url = reddit.auth.url(scopes, state, implicit=True),
the data comes back as GET / HTTP/1.1.
I need the access token to be recognized after the /.
I am using praw 7.1.0
GET / HTTP/1.1
Host: localhost:8080
Connection: keep-alive
Cache-Control: max-age=0
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) QtWebEngine/5.15.2 Chrome/83.0.4103.122 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Language: en-US,en;q=0.9
DNT: 1
Sec-Fetch-Site: cross-site
Sec-Fetch-Mode: navigate
Sec-Fetch-User: ?1
Sec-Fetch-Dest: document
Accept-Encoding: gzip, deflate, br
Traceback (most recent call last):
  File "path/prawTest.py", line 81, in <module>
    sys.exit(main())
  File "path/prawTest.py", line 61, in main
    params = {
  File "path/prawTest.py", line 62, in <dictcomp>
    key: value for (key, value) in [token.split("=") for token in param_tokens]
ValueError: not enough values to unpack (expected 2, got 1)
It expects to see variables after the /.
Without implicit=True I get the following:
GET /?state=50470&code=code HTTP/1.1
Host: localhost:8080
Connection: keep-alive
Cache-Control: max-age=0
Upgrade-Insecure-Requests: 1
User-Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) QtWebEngine/5.15.2 Chrome/83.0.4103.122 Safari/537.36
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
Accept-Language: en-US,en;q=0.9
DNT: 1
Sec-Fetch-Site: cross-site
Sec-Fetch-Mode: navigate
Sec-Fetch-User: ?1
Sec-Fetch-Dest: document
Accept-Encoding: gzip, deflate, br
The implicit=True url looks like this:
http://localhost:8080/#access_token=token&token_type=bearer&state=39771&expires_in=3600&scope=%2A
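The key detail here (a general URL fact, not specific to PRAW) is that everything after the # is a fragment, and browsers never send the fragment to the server, which is why the socket only ever sees GET / HTTP/1.1 for the implicit flow. A common workaround is to serve a small page whose JavaScript forwards the fragment back as a query string; the following is a minimal standard-library sketch, assuming the redirect URI is http://localhost:8080:
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

# Page served on the first hit: the fragment is only visible to the browser,
# so this JavaScript re-submits it to the server as a query string.
FORWARD_PAGE = b"""<html><body><script>
if (window.location.hash) {
    window.location.replace("/token?" + window.location.hash.substring(1));
}
</script></body></html>"""

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        parsed = urlparse(self.path)
        if parsed.path == "/token":
            params = parse_qs(parsed.query)  # access_token, state, expires_in, ...
            print("received params:", params)
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"Token received, you can close this tab.")
        else:
            # First request: the fragment never appears in self.path, so serve the forwarder.
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write(FORWARD_PAGE)

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), Handler).serve_forever()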

Why am I being detected as a robot when I am replicating the exact request a browser makes?

This is the website: "https://www.interlinecenter.com/". This website makes a request to "http://cs.cruisebase.com/cs/forms/hotdeals.aspx?skin=605&nc=y" to load HTML content in an iframe. I am making the exact same request with the same headers the browser sends, but I am not getting the same content.
Here is the code I am using:
import requests
from lxml import html  # provides html.fromstring, used below

url = 'http://cs.cruisebase.com/cs/forms/hotdeals.aspx?skin=605&nc=y'
header = {
'Host': 'cs.cruisebase.com',
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
'Accept-Language': 'en-US,en;q=0.5',
'Accept-Encoding': 'gzip, deflate, br',
'Referer': 'https://www.interlinecenter.com/',
'Connection': 'keep-alive',
'Cookie': 'visid_incap_312345=yt2dprI6SuGoy44xQsnF36dOwV0AAAAAQUIPAAAAAAAqm0pG5WAWOGjtyY8GOrLv; __utma=15704100.1052110012.1572947038.1574192877.1575447075.6; __utmz=15704100.1575447075.6.6.utmcsr=interlinecenter.com|utmccn=(referral)|utmcmd=referral|utmcct=/; ASP.NET_SessionId=pzd3a0l5kso41hhbqf3jiqlg; nlbi_312345=/7dzbSeGvDjg2/oY/eQfhwAAAACv806Zf3m7TsjHAou/y177; incap_ses_1219_312345=tMxeGkIPugj4d1gaasLqECHE5l0AAAAAg1IvjaYhEfuSIYLXtc2f/w==; LastVisitedClient=605; AWSELB=85D5DF550634E967F245F317B00A8C32EB84DA2B6B927E6D5CCB7C26C3821788BFC50D95449A1BA0B0AFD152140A70F5EA06CBB8492B21E10EC083351D7EBC4C68F086862A; incap_ses_500_312345=6PJ9FxwJ3gh0vta6kVvwBthz510AAAAAvUZPdshu8GVWM2sbkoUXmg==; __utmb=15704100.2.10.1575447075; __utmc=15704100; __utmt_tt=1',
'Upgrade-Insecure-Requests': '1',
'Cache-Control': 'max-age=0'
}
response = requests.get(url, timeout=10, headers=header)
byte_data = response.content
source_code = html.fromstring(byte_data)
print(response)
print(byte_data)
This is the response I am getting:
<Response [200]>
<html style="height:100%"><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"><meta name="format-detection" content="telephone=no"><meta name="viewport" content="initial-scale=1.0"><meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"></head><body style="margin:0px;height:100%"><iframe id="main-iframe" src="/_Incapsula_Resource?SWUDNSAI=9&xinfo=10-99927380-0%200NNN%20RT%281575456049298%202%29%20q%280%20-1%20-1%200%29%20r%281%20-1%29%20B12%284%2c316%2c0%29%20U2&incident_id=500000240101726326-477561257670738314&edet=12&cinfo=04000000&rpinfo=0" frameborder=0 width="100%" height="100%" marginheight="0px" marginwidth="0px">Request unsuccessful. Incapsula incident ID: 500000240101726326-477561257670738314</iframe></body></html>
I need to extract/scrape data at "https://cs.cruisebase.com/cs/forms/hotdeals.aspx?skin=605&nc=y".
Note: I don't want to use the Selenium webdriver to get the data. Any help will be much appreciated, thanks!
Did you try getting the headers by loading the target URL directly?
I sent a GET request to https://cs.cruisebase.com/cs/forms/hotdeals.aspx?skin=605&nc=y with the following headers, and I was able to get the complete response.
headers = {
'Accept':'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
'Accept-Encoding':'gzip, deflate',
'Accept-Language':'en-GB,en;q=0.9,en-US;q=0.8,hi;q=0.7,la;q=0.6',
'Cache-Control':'no-cache',
'Connection':'keep-alive',
'Cookie':'ENTER COOKIES',
'DNT':'1',
'Host':'cs.cruisebase.com',
'Pragma':'no-cache',
'Upgrade-Insecure-Requests':'1',
'User-Agent':'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36',
}
I have left the Cookie field blank; you will have to enter cookies, otherwise the page won't load. You can get the cookies from Chrome.
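For reference, a sketch of how those headers might be plugged into a request; the cookie value is a placeholder to copy from your browser, and the remaining headers from the dict above can be merged in the same way:
import requests

url = 'https://cs.cruisebase.com/cs/forms/hotdeals.aspx?skin=605&nc=y'
headers = {
    'Host': 'cs.cruisebase.com',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36',
    # ...the remaining headers from the answer above...
    'Cookie': 'PASTE COOKIES COPIED FROM YOUR BROWSER HERE',
}
response = requests.get(url, headers=headers, timeout=10)
print(response.status_code)
print(response.text[:500])  # check that this is real page content, not the Incapsula iframe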

How to send HTTP/2 pseudo-headers in Python 3

Basically I tried sending HTTP/2 headers with hyper for Python:
https://hyper.readthedocs.io/en/latest/
https://github.com/python-hyper/hyper
Mounting HTTP20Adapter in my requests session didn't work as expected.
First, let me explain that "from tls_version import MyAdapter", which is used later in the main code, refers to this tls_version.py file:
from requests.adapters import HTTPAdapter
from urllib3.poolmanager import PoolManager
import ssl

class MyAdapter(HTTPAdapter):
    def init_poolmanager(self, connections, maxsize, block=False):
        self.poolmanager = PoolManager(num_pools=connections,
                                       maxsize=maxsize,
                                       block=block,
                                       ssl_version=ssl.PROTOCOL_TLSv1_2)
It just forces TLS 1.2, nothing more.
The main code is below; basically I'm trying to send a GET call with HTTP/2 pseudo-headers, mounting the hyper adapter in a requests session and controlling the header order with collections.OrderedDict:
import requests
from tls_version import MyAdapter
import json
import collections
from userdata import UserData
from hyper.contrib import HTTP20Adapter

headers2 = [('Upgrade-Insecure-Requests', '1'),
            ('User-Agent', 'Mozilla/5.0 (Linux; Android 5.1.1; google Pixel 2 Build/LMY47I; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/74.0.3729.136 Mobile Safari/537.36'),
            ('Accept', 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3'),
            ('Accept-Encoding', 'gzip, deflate'),
            ('Accept-Language', 'es-ES,es;q=0.9,en-US;q=0.8,en;q=0.7'),
            ]

class post_calls():
    def start(self, headers_pass, body_pass, params, url, method):
        proxies = {
            'http': ip,
            'https': ip
        }
        body = str(body_pass)
        # send the request
        session = requests.session()
        session.mount('https://', MyAdapter())
        session.headers = collections.OrderedDict(headers_pass)
        if method == 'get':
            q = 'https://' + server + '.' + host
            q = q.replace('.www.', '.')
            session.mount('https://', HTTP20Adapter())
            print('q=' + q)
            response = session.get(url, proxies=proxies, params=params, verify=charlesproxy)

def login_world2(sid):
    a = post_calls()
    q = 'https://' + server + '.' + host + '/login.php?mobile&sid=' + sid + '&2'
    q = q.replace('.www.', '.')
    params = {}
    url = q
    body = '0'
    login = a.start(headers2, body, params, url, 'get')
    return login

if __name__ == "__main__":
    login_get = login_world2(sid)
    print(login_get)
These are the headers this script sends:
:method: GET
:scheme: https
:authority: server.url.com
:path: /login.php?mobile&sid=577f0967545d6acec94716d265dd4867fa4db4a446326ecde7486a97feede14702f4911438f4a4cd097677f0dd962786ef14b3f16f1184ee63a506155d522f53&2
upgrade-insecure-requests: 1
user-agent: Mozilla/5.0 (Linux; Android 5.1.1; google Pixel 2 Build/LMY47I; wv) AppleWebKit/537.36 (KHTML
user-agent: like Gecko) Version/4.0 Chrome/74.0.3729.136 Mobile Safari/537.36
accept: text/html
accept: application/xhtml+xml
accept: application/xml;q=0.9
accept: image/webp
accept: image/apng
accept: */*;q=0.8
accept: application/signed-exchange;v=b3
accept-encoding: gzip
accept-encoding: deflate
accept-language: es-ES
accept-language: es;q=0.9
accept-language: en-US;q=0.8
accept-language: en;q=0.7
And this is what I need to send, because if I send them the way the script above does, the server rejects my GET requests:
:method: GET
:authority: server.url.com
:scheme: https
:path: /login.php?mobile&sid=2ea530a62cb63af6c14be116b7df86ad85cd77c9a11aa3c881b3a460e6c14fbd1fd8b79bd66c9782073705cdff25e890e65b5aeb852fde24c2d54a6e4ee49890&2
upgrade-insecure-requests: 1
user-agent: Mozilla/5.0 (Linux; Android 5.1.1; google Pixel 2 Build/LMY47I; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/74.0.3729.136 Mobile Safari/537.36
accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3
accept-encoding: gzip, deflate
accept-language: es-ES,es;q=0.9,en-US;q=0.8,en;q=0.7
It seems that for each "," I put in the headers dict, hyper creates a new header instead of sending them as one value. Without hyper it works normally, but I need to send those HTTP/2 pseudo-headers, and without hyper or some other alternative I can't; requests doesn't have support for it.
:method: GET
:scheme: https
:authority: server.url.com
:path: /login.php?mobile&sid=577f0967545d6acec94716d265dd4867fa4db4a446326ecde7486a97feede14702f4911438f4a4cd097677f0dd962786ef14b3f16f1184ee63a506155d522f53&2
Set the headers in the HTTP20Adapter instead of in session and it should work.
adapter = HTTP20Adapter(headers=headers)
session.mount(prefix='https://', adapter=adapter)

Python3 requests posts correctly but gets nothing (but by browser it is OK)

When I visit 'https://baike.baidu.com/wikitag/taglist?tagId=75953' in Chrome, through Fiddler I find that the browser sends a POST request to 'https://baike.baidu.com//wikitag/api/getlemmas'.
So I'm trying to send a POST request with form data to the URL 'https://baike.baidu.com//wikitag/api/getlemmas' and get the JSON data from its response.
I got all the headers and form data through Fiddler and tried to send the same POST request from Python 3 using the requests package.
But even when I send the POST request with the same headers and form data, I get a response (status 200) with an empty body.
The same request sent with Postman works fine, but with Python 3 it fails anyway.
# -*- coding:UTF-8 -*-
import requests

def disease_json():
    host = 'https://baike.baidu.com'
    target = host + '/wikitag/api/getlemmas'
    cookies = {
        'BAIDUID': 'EEE35ACB030447144E615B191397065B:FG=1;PSTM=1523192637;BIDUPSID=B34DD366905D15BB907C1667346970AE;Hm_lvt_55b574651fcae74b0a9f1cf9c8d7c93a=1522304864,1522305101,1523192946,1523253565;PSINO=2;H_PS_PSSID=1990_1438_26082_21 125_22074;BDORZ=B490B5EBF6F3CD402E515D22BCDA1598'
    }
    headers = {
        'Accept': 'application/json, text/javascript, */*; q=0.01',
        'Accept-Encoding': 'gzip, deflate, br',
        'Accept-Language': 'zh-CN,zh;q=0.8',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',
        'X-Requested-With': 'XMLHttpRequest',
        'Content-Length': '91',
        'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
        'Referer': 'https://baike.baidu.com/wikitag/taglist?tagId=75953',
        'Origin': 'https://baike.baidu.com',
        'Connection': 'keep-alive',
        'Host': 'baike.baidu.com',
    }
    forms = {
        'limit': '24',
        'timeout': '3000',
        'filterTags': '[]',
        'tagID': '75953',
        'fromLemma': 'false',
        'contentLength': '40',
        'page': '0',
    }
    req = requests.post(url=target, data=forms, verify=False, headers=headers)
    print(req.text)
    """
    html = json.loads(req.text)
    for each in html['lemmaList']:
        print('lemmaCroppedTitle:', each['lemmaCroppedTitle'])
    print(req.text)
    """

def main():
    disease_json()

if __name__ == '__main__':
    main()
Following is the correct request sent by the browser (shown as a screenshot in the original post).
I modified the content-type and your request payload, and added the method encode_multipart_data to transform the payload so that it is consistent with multipart/form-data:
import sys
import requests

def encode_multipart_data(fields):
    boundary = '------WebKitFormBoundary7MA4YWxkTrZu0gW'
    CRLF = '\r\n'
    L = []
    for key, value in fields.items():
        L.append(boundary)
        L.append('Content-Disposition: form-data; name="%s"\r\n' % key)
        L.append(value)
    L.append(boundary + "--")
    body = CRLF.join(L)
    return body

def disease_json():
    host = 'https://baike.baidu.com'
    target = host + '/wikitag/api/getlemmas'
    headers = {
        'Accept': 'application/json, text/javascript, */*; q=0.01',
        'Accept-Encoding': 'gzip, deflate, br',
        'Accept-Language': 'zh-CN,zh;q=0.8',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',
        'X-Requested-With': 'XMLHttpRequest',
        # changed content-type
        'content-type': "multipart/form-data; boundary=----WebKitFormBoundary7MA4YWxkTrZu0gW",
        'Referer': 'https://baike.baidu.com/wikitag/taglist?tagId=75953',
        'Origin': 'https://baike.baidu.com',
        'Connection': 'keep-alive',
        'Host': 'baike.baidu.com'
    }
    forms = {
        'limit': '24',
        'timeout': '3000',
        'filterTags': '[]',
        'tagId': '75953',
        'fromLemma': 'false',
        'contentLength': '40',
        'page': '0',
    }
    payload = encode_multipart_data(forms)
    resp = requests.post(url=target, data=payload, headers=headers)
    print(resp.text)

if __name__ == '__main__':
    disease_json()
This approach can also solve the problem.
import requests
import http.cookiejar
import json

url = "https://baike.baidu.com/wikitag/api/getlemmas"
payload = "limit=24&timeout=3000&filterTags=%5B%5D&tagId=75953&fromLemma=false&contentLength=40&page=0"
headers = {
    'Content-Type': "application/x-www-form-urlencoded",
    'User-Agent': "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36"
}

def get_cookies():
    session = requests.Session()
    session.cookies = http.cookiejar.LWPCookieJar("cookie")
    response = session.post(url, headers=headers, data=payload, allow_redirects=False, verify=False)
    session.cookies.save(ignore_discard=True, ignore_expires=True)
    return response

def disease_json(times=-1):
    times += 1
    response = get_cookies()
    if response.status_code == 302:
        session = requests.session()
        session.cookies = http.cookiejar.LWPCookieJar(filename='cookie')
        session.cookies.load(ignore_discard=True)
        url = response.headers['Location']
        response = session.post(url, headers=headers, data=payload, allow_redirects=False)
    json_data = response.text
    print(json.loads(json_data))
    print(times)

if __name__ == '__main__':
    disease_json()

LinkedIn HTTP Error 999 - Request denied

I am writing a simple script to get a public profile that is visible without login on LinkedIn.
Below is my code to get the page for beautifulsoup. I am using public proxies as well.
import urllib.request, urllib.error
from bs4 import BeautifulSoup
url = "https://www.linkedin.com/company/amazon"
proxy = urllib.request.ProxyHandler({'https': proxy, })
opener = urllib.request.build_opener(proxy)
urllib.request.install_opener(opener)
hdr = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3218.0 Safari/537.36',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
'Accept-Encoding': 'gzip, deflate, br',
'Accept-Language': 'en-US,en;q=0.9,hi;q=0.8',
'Connection': 'keep-alive'}
req = urllib.request.Request(url, headers=hdr)
page = urllib.request.urlopen(req, timeout=20)
soup = BeautifulSoup(page.read(), "lxml")
But it is raising an "HTTP Error 999 - Request Denied" error. This is only for testing purposes until I get access via the partnership program.
What am I doing wrong? Please help.
You did not do anything wrong; LinkedIn blacklists cloud server IP addresses to prevent "stealing" of their data. It's a questionable practice, but this is how it is.
