Getting 403 response with python's requests module - python-3.x

import requests

headers = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    "Accept-Encoding": "gzip, deflate",
    "Accept-Language": "en-GB,en-US;q=0.9,en;q=0.8",
    "Dnt": "1",
    "Host": "httpbin.org",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:83.0) Gecko/20100101 Firefox/83.0",
}
url = "https://app.shkolo.bg/dashboard"
res = requests.get(url, headers=headers)
print(res)
This throws a 403 response.
Any idea why?
I've only just started using the requests module, so I can't provide much more information.

By removing all headers except the User-Agent, I managed to get a reply. (Note that the original headers hard-code "Host": "httpbin.org", which does not match the host actually being requested, app.shkolo.bg; a mismatched Host header alone is enough to make many servers refuse a request.)
import requests
url = "https://app.shkolo.bg/dashboard"
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:83.0) Gecko/20100101 Firefox/83.0'}
result = requests.get(url, headers=headers)
print(result.content.decode())
# Beginning of the output
<!DOCTYPE HTML>
<html lang="en-US">
<head>
<meta charset="UTF-8" />
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
Note that this answer is essentially the same as the one in this related post.
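If you want to pinpoint which of the original headers triggers the 403, a quick elimination loop helps. This is only a debugging sketch: the header values below are copied from the question, and repeated requests may themselves get rate-limited.
import requests

url = "https://app.shkolo.bg/dashboard"
base_headers = {
    "Host": "httpbin.org",  # copied from the question; this mismatched value is the likely culprit
    "Dnt": "1",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:83.0) Gecko/20100101 Firefox/83.0",
}

# Drop one header at a time and watch how the status code changes.
for name in list(base_headers):
    trimmed = {k: v for k, v in base_headers.items() if k != name}
    status = requests.get(url, headers=trimmed).status_code
    print(f"without {name}: {status}")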

Related

Can I use post method in requests lib on this Binance site?

Here is the site:
https://www.binance.com/ru/futures-activity/leaderboard?type=myProfile&tradeType=PERPETUAL&encryptedUid=E921F42DCD4D9F6ECC0DFCE3BAB1D11A
I have been scraping the trader's positions with Selenium, but today I realized that I could use a POST request instead. In the browser's developer tools I can see the request in the Network tab and its response preview (screenshots not reproduced here).
I have no experience with the POST method in requests; I tried the following, but it doesn't work.
import requests
hd = {'accept':"*/*",'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.67 Safari/537.36'}
ses = requests.Session()
c = ses.post('https://www.binance.com/bapi/futures/v1/public/future/leaderboard/getOtherPosition',headers=hd)
print(c.text)
Output is:
{"code":"000002","message":"illegal parameter","messageDetail":null,"data":null,"success":false}
Can someone help me do this, please? Is it possible?
It works as a POST request. The "illegal parameter" error means the endpoint expects a JSON body, so send encryptedUid and tradeType as the payload:
import requests

url = 'https://www.binance.com/bapi/futures/v1/public/future/leaderboard/getOtherPerformance'
headers = {
    "content-type": "application/json",
    "x-trace-id": "4c3d6fce-a2d8-421e-9d5b-e0c12bd2c7c0",
    "x-ui-request-trace": "4c3d6fce-a2d8-421e-9d5b-e0c12bd2c7c0"
}
payload = {"encryptedUid": "E921F42DCD4D9F6ECC0DFCE3BAB1D11A", "tradeType": "PERPETUAL"}

req = requests.post(url, headers=headers, json=payload).json()
# print(req)
for item in req['data']:
    roi = item['value']
    print(roi)
Output:
-0.023215
-91841.251668
0.109495
390421.996614
-0.063094
-266413.73955621
0.099181
641189.24407088
0.072079
265977.556474
-0.09197
-400692.52138279
-0.069988
-469016.33171481
0.0445
292594.20440128
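The bare numbers alternate between ROI fractions and PNL amounts. In responses I have seen, the items in data also carry label fields; the names periodType and statisticsType below are an assumption from one observed payload, so verify them against your own output:
for item in req['data']:
    # 'periodType' / 'statisticsType' are assumed field names; check your own response
    print(item.get('periodType'), item.get('statisticsType'), item['value'])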
I used curlconverter, and it helped me a lot! Here is the working code:
import requests

headers = {
    'authority': 'www.binance.com',
    'accept': '*/*',
    'accept-language': 'ru-RU,ru;q=0.9,en-US;q=0.8,en;q=0.7,ja;q=0.6,ko;q=0.5,zh-CN;q=0.4,zh;q=0.3',
    'bnc-uuid': '0202c537-8c2b-463a-bdef-33761d21986a',
    'clienttype': 'web',
    'csrftoken': 'd41d8cd98f00b204e9800998ecf8427e',
    'device-info': 'eyJzY3JlZW5fcmVzb2x1dGlvbiI6IjE5MjAsMTA4MCIsImF2YWlsYWJsZV9zY3JlZW5fcmVzb2x1dGlvbiI6IjE5MjAsMTA0MCIsInN5c3RlbV92ZXJzaW9uIjoiV2luZG93cyAxMCIsImJyYW5kX21vZGVsIjoidW5rbm93biIsInN5c3RlbV9sYW5nIjoicnUtUlUiLCJ0aW1lem9uZSI6IkdNVCszIiwidGltZXpvbmVPZmZzZXQiOi0xODAsInVzZXJfYWdlbnQiOiJNb3ppbGxhLzUuMCAoV2luZG93cyBOVCAxMC4wOyBXaW42NDsgeDY0KSBBcHBsZVdlYktpdC81MzcuMzYgKEtIVE1MLCBsaWtlIEdlY2tvKSBDaHJvbWUvMTAxLjAuNDk1MS42NyBTYWZhcmkvNTM3LjM2IiwibGlzdF9wbHVnaW4iOiJQREYgVmlld2VyLENocm9tZSBQREYgVmlld2VyLENocm9taXVtIFBERiBWaWV3ZXIsTWljcm9zb2Z0IEVkZ2UgUERGIFZpZXdlcixXZWJLaXQgYnVpbHQtaW4gUERGIiwiY2FudmFzX2NvZGUiOiI1ZjhkZDMyNCIsIndlYmdsX3ZlbmRvciI6Ikdvb2dsZSBJbmMuIChJbnRlbCkiLCJ3ZWJnbF9yZW5kZXJlciI6IkFOR0xFIChJbnRlbCwgSW50ZWwoUikgVUhEIEdyYXBoaWNzIDYyMCBEaXJlY3QzRDExIHZzXzVfMCBwc181XzAsIEQzRDExKSIsImF1ZGlvIjoiMTI0LjA0MzQ3NTI3NTE2MDc0IiwicGxhdGZvcm0iOiJXaW4zMiIsIndlYl90aW1lem9uZSI6IkV1cm9wZS9Nb3Njb3ciLCJkZXZpY2VfbmFtZSI6IkNocm9tZSBWMTAxLjAuNDk1MS42NyAoV2luZG93cykiLCJmaW5nZXJwcmludCI6IjE5YWFhZGNmMDI5ZTY1MzU3N2Q5OGYwMmE0NDE4Nzc5IiwiZGV2aWNlX2lkIjoiIiwicmVsYXRlZF9kZXZpY2VfaWRzIjoiMTY1MjY4OTg2NTQwMGdQNDg1VEtmWnVCeUhONDNCc2oifQ==',
    'fvideo-id': '3214483f88c0abbba34e5ecf5edbeeca1e56e405',
    'lang': 'ru',
    'origin': 'https://www.binance.com',
    'referer': 'https://www.binance.com/ru/futures-activity/leaderboard?type=myProfile&tradeType=PERPETUAL&encryptedUid=E921F42DCD4D9F6ECC0DFCE3BAB1D11A',
    'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="101", "Google Chrome";v="101"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
    'sec-fetch-dest': 'empty',
    'sec-fetch-mode': 'cors',
    'sec-fetch-site': 'same-origin',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/101.0.4951.67 Safari/537.36',
    'x-trace-id': 'e9d5223c-5d71-4834-8563-c253a1fc3ae8',
    'x-ui-request-trace': 'e9d5223c-5d71-4834-8563-c253a1fc3ae8',
}
json_data = {
    'encryptedUid': 'E921F42DCD4D9F6ECC0DFCE3BAB1D11A',
    'tradeType': 'PERPETUAL',
}
response = requests.post('https://www.binance.com/bapi/futures/v1/public/future/leaderboard/getOtherPosition', headers=headers, json=json_data)
print(response.text)
So the output now is:
{"code":"000000","message":null,"messageDetail":null,
"data":{
"otherPositionRetList":[{"symbol":"ETHUSDT","entryPrice":1985.926527932,"markPrice":2013.57606795,"pnl":41926.93300012,"roe":0.05492624,"updateTime":[2022,5,22,15,35,39,358000000],"amount":1516.370,"updateTimeStamp":1653233739358,"yellow":true,"tradeBefore":false},{"symbol":"KSMUSDT","entryPrice":80.36574159583,"markPrice":79.46000000,"pnl":-1118.13799285,"roe":-0.01128900,"updateTime":[2022,5,16,11,0,5,608000000],"amount":1234.5,"updateTimeStamp":1652698805608,"yellow":false,"tradeBefore":false},{"symbol":"IMXUSDT","entryPrice":0.9969444089129,"markPrice":0.97390429,"pnl":-13861.75961996,"roe":-0.02365747,"updateTime":[2022,5,22,15,57,3,329000000],"amount":601636,"updateTimeStamp":1653235023329,"yellow":true,"tradeBefore":false},{"symbol":"MANAUSDT","entryPrice":1.110770201096,"markPrice":1.09640000,"pnl":-6462.14960820,"roe":-0.05242685,"updateTime":[2022,5,21,16,6,2,291000000],"amount":449691,"updateTimeStamp":1653149162291,"yellow":false,"tradeBefore":false},{"symbol":"EOSUSDT","entryPrice":1.341744945184,"markPrice":1.35400000,"pnl":-4572.78323455,"roe":-0.09051004,"updateTime":[2022,5,22,11,47,48,542000000],"amount":-373134.3,"updateTimeStamp":1653220068542,"yellow":true,"tradeBefore":false},{"symbol":"BTCUSDT","entryPrice":29174.44207538,"markPrice":30015.10000000,"pnl":-173841.33354801,"roe":-0.47613317,"updateTime":[2022,5,21,15,13,0,252000000],"amount":-206.792,"updateTimeStamp":1653145980252,"yellow":false,"tradeBefore":false},{"symbol":"DYDXUSDT","entryPrice":2.21378804417,"markPrice":2.11967778,"pnl":-48142.71521969,"roe":-0.08879676,"updateTime":[2022,5,18,16,40,18,654000000],"amount":511556.5,"updateTimeStamp":1652892018654,"yellow":false,"tradeBefore":false}],"updateTime":[2022,5,16,11,0,5,608000000],"updateTimeStamp":1652698805608},"success":true}

Scraping values from View Source using Requests Python 3

The code below works fine, but when I change the URL to another site it doesn't work.
import requests
import re
url = "https://www.autotrader.ca/a/ram/1500/hamilton/ontario/19_12052335_/?showcpo=ShowCpo&ncse=no&ursrc=pl&urp=2&urm=8&sprx=-2"
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36'}
response = requests.get(url, headers=headers)
phone_number = re.findall(r'"phoneNumber":"([\d-]+)"', response.text)
print(phone_number)
['905-870-7127']
The code below doesn't work; it gives the output []. Please tell me what I am doing wrong.
import requests
import re

urls = "https://www.kijijiautos.ca/vip/22686710/", "https://www.kijijiautos.ca/vip/22686710/"
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36'}
for url in urls:
    response = requests.get(url, headers=headers)
    number = re.findall(r'"number":"([\d-]+)"', response.text)
    print(number)
[]
I think you are not getting an HTTP 200 OK status in the response, which is why you are unable to get the expected output. To get an HTTP 200 OK status, I changed the headers after inspecting the browser's HTTP requests.
Please try this:
import requests
import re
headers = {
    'authority': 'www.kijijiautos.ca',
    'sec-ch-ua': '"Chromium";v="94", "Google Chrome";v="94", ";Not A Brand";v="99"',
    'pragma': 'no-cache',
    'accept-language': 'en-CA',
    'sec-ch-ua-mobile': '?0',
    'user-agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36',
    'content-type': 'application/json',
    'accept': 'application/json',
    'cache-control': 'no-cache',
    'x-client-id': 'c89e7ff8-1d5a-4c2b-a095-c08dc08ccd3b',
    'x-client': 'ca.move.web.app',
    'sec-ch-ua-platform': '"Linux"',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-mode': 'cors',
    'sec-fetch-dest': 'empty',
    'referer': 'https://www.kijijiautos.ca/cars/hyundai/sonata/used/',
    'cookie': 'mvcid=c89e7ff8-1d5a-4c2b-a095-c08dc08ccd3b; locale=en-CA; trty=e; _gcl_au=1.1.1363596757.1633936124; _ga=GA1.2.1193080228.1633936126; _gid=GA1.2.71842091.1633936126; AAMC_kijiji_0=REGION%7C3; aam_uuid=43389576784435124231935699643302941454; _fbp=fb.1.1633936286669.1508597061; __gads=ID=bb71a6fc168c1c33:T=1633936286:S=ALNI_MZk3lgy-9xgSGLPnfrkBET60uS6fA; GCLB=COyIgrWs-PWPsQE; lux_uid=163402080128473094; cto_bundle=zxCnjF95NFglMkZrTG5EZ2dzNHFSdjJ6QSUyQkJvM1BUbk5WTkpjTms0aWdZb3RxZUR3Qk1nd1BmcSUyQjZaZVFUdFdSelpua3pKQjFhTFk0U2ViTHVZbVg5ODVBNGJkZ2NqUGg1cHZJN3V0MWlwRkQwb1htcm5nNDlqJTJGUUN3bmt6ZFkzU1J0bjMzMyUyRkt5aGVqWTJ5RVJCa2ZJQUwxcFJnJTNEJTNE; _td=7f855061-c320-4570-b2d2-73c94bd22b13; rbzid=54THgSkyCRKwhVBqy+iHmjb1RG+uE6uH1XjpsXIazO5nO45GtpIXHGYii/PbJcdG3ahjIgKaBrjh0Yx2J6YCOLHEv3QYL559oz3jQaVrssH2/1Ui9buvIpuCwBOGG2xXGWW2qvcU5807PGsdubQDUvLkxmy4sor+4EzCI1OoUHMOG2asQwsgChqwzJixVvrE21E/NJdRfDLlejb5WeGEgU4B3dOYH95yYf5h+7fxV6H/XLhqbNa8e41DM3scfyeYWeqWCWmOH2VWZ7i3oQ0OXW1SkobLy0D6G+V9J5QMxb0=; rbzsessionid=ca53a07d3404ca93b3f8bc879291dc83; _uetsid=131a47702a6211ecba407d9ff6588dde; _uetvid=131b13602a6211ecacd0b56b0815e9b2',
}
response = requests.get('https://www.kijijiautos.ca/consumer/svc/a/22686710', headers=headers)
if response.status_code == 200:
    # print(response.text)
    numbers = re.findall(r'"number":"\+\d+"', response.text)  # a plus sign followed by one or more digits
    print(numbers[0])
else:
    print('status code is ', response.status_code)
Output:
# "number":"+17169905088"

How to prevent BeautifulSoup from extracting the information as strange symbols?

I am trying to extract some information with BeautifulSoup; however, what it extracts comes out as very strange symbols. When I open the page directly in the browser everything looks fine, and the page has the tag <meta charset="utf-8">.
My code is:
import requests
import bs4

HEADERS = {
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36',
    'referrer': 'https://google.com',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate, br',
    'Accept-Language': 'en-US,en;q=0.9',
    'Pragma': 'no-cache',
}
urls = 'https://www.jcchouinard.com/web-scraping-with-python-and-requests-html/'
r = requests.get(urls, headers=HEADERS)
soup = bs4.BeautifulSoup(r.text, "html.parser")
print(soup)
Nevertheless, the result I get is this:
J{$%X Àà’8}±ŸÅ
I guess it's something to do with the encoding, but I don't understand why, since the page is UTF-8.
It is worth clarifying that this only happens with some pages; with others I manage to extract the information without problems.
Edit: updated with a sample url.
Edit2: added the headers dictionary, which is the one that generates the problem.
The problem is the Accept-Encoding HTTP header: you have br in it, which advertises the Brotli compression method. The requests module cannot decode Brotli out of the box, so you end up printing the raw compressed bytes. Remove br and the server will respond with a compression method requests can handle.
import requests
from bs4 import BeautifulSoup

HEADERS = {
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Safari/537.36',
    'referrer': 'https://google.com',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate',  # <-- 'br' removed
    'Accept-Language': 'en-US,en;q=0.9',
    'Pragma': 'no-cache',
}
urls = 'https://www.jcchouinard.com/web-scraping-with-python-and-requests-html/'
r = requests.get(urls, headers=HEADERS)
soup = BeautifulSoup(r.text, "html.parser")
print(soup)
Prints:
<!DOCTYPE html>
<html lang="fr-FR">
<head><style>img.lazy{min-height:1px}</style><link as="script" href="https://www.jcchouinard.com/wp-content/plugins/w3-total-cache/pub/js/lazyload.min.js?x73818" rel="preload"/>
...and so on.
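Alternatively, instead of dropping br you can let requests decode Brotli: urllib3 handles br transparently once the brotli (or brotlicffi) package is installed. A minimal sketch, assuming one of those packages is available:
# pip install brotli    (or: pip install brotlicffi)
import requests

r = requests.get('https://www.jcchouinard.com/web-scraping-with-python-and-requests-html/',
                 headers={'Accept-Encoding': 'gzip, deflate, br'})
print(r.headers.get('Content-Encoding'))  # 'br' if the server chose Brotli
print(r.text[:80])                        # decoded text, no garbled bytes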
import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:79.0) Gecko/20100101 Firefox/79.0'
}

def main(url):
    r = requests.get(url, headers=headers)
    soup = BeautifulSoup(r.content, 'html.parser')
    print(soup.prettify())

main("https://www.jcchouinard.com/web-scraping-with-python-and-requests-html/")
You just have to add headers. Passing r.content (raw bytes) instead of r.text also lets BeautifulSoup detect the document's encoding itself rather than relying on requests' guess.

Why am I being detected as a robot when I am replicating the exact request a browser makes?

The website "https://www.interlinecenter.com/" makes a request to "http://cs.cruisebase.com/cs/forms/hotdeals.aspx?skin=605&nc=y" to load HTML content in an iframe. I am making exactly the same request with the same headers the browser sends, but I am not getting the same content.
Here is the code I am using:
import requests
from lxml import html

url = 'http://cs.cruisebase.com/cs/forms/hotdeals.aspx?skin=605&nc=y'
header = {
    'Host': 'cs.cruisebase.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.5',
    'Accept-Encoding': 'gzip, deflate, br',
    'Referer': 'https://www.interlinecenter.com/',
    'Connection': 'keep-alive',
    'Cookie': 'visid_incap_312345=yt2dprI6SuGoy44xQsnF36dOwV0AAAAAQUIPAAAAAAAqm0pG5WAWOGjtyY8GOrLv; __utma=15704100.1052110012.1572947038.1574192877.1575447075.6; __utmz=15704100.1575447075.6.6.utmcsr=interlinecenter.com|utmccn=(referral)|utmcmd=referral|utmcct=/; ASP.NET_SessionId=pzd3a0l5kso41hhbqf3jiqlg; nlbi_312345=/7dzbSeGvDjg2/oY/eQfhwAAAACv806Zf3m7TsjHAou/y177; incap_ses_1219_312345=tMxeGkIPugj4d1gaasLqECHE5l0AAAAAg1IvjaYhEfuSIYLXtc2f/w==; LastVisitedClient=605; AWSELB=85D5DF550634E967F245F317B00A8C32EB84DA2B6B927E6D5CCB7C26C3821788BFC50D95449A1BA0B0AFD152140A70F5EA06CBB8492B21E10EC083351D7EBC4C68F086862A; incap_ses_500_312345=6PJ9FxwJ3gh0vta6kVvwBthz510AAAAAvUZPdshu8GVWM2sbkoUXmg==; __utmb=15704100.2.10.1575447075; __utmc=15704100; __utmt_tt=1',
    'Upgrade-Insecure-Requests': '1',
    'Cache-Control': 'max-age=0'
}
response = requests.get(url, timeout=10, headers=header)
byte_data = response.content
source_code = html.fromstring(byte_data)
print(response)
print(byte_data)
This is the response I am getting:
<Response [200]>
<html style="height:100%"><head><META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"><meta name="format-detection" content="telephone=no"><meta name="viewport" content="initial-scale=1.0"><meta http-equiv="X-UA-Compatible" content="IE=edge,chrome=1"></head><body style="margin:0px;height:100%"><iframe id="main-iframe" src="/_Incapsula_Resource?SWUDNSAI=9&xinfo=10-99927380-0%200NNN%20RT%281575456049298%202%29%20q%280%20-1%20-1%200%29%20r%281%20-1%29%20B12%284%2c316%2c0%29%20U2&incident_id=500000240101726326-477561257670738314&edet=12&cinfo=04000000&rpinfo=0" frameborder=0 width="100%" height="100%" marginheight="0px" marginwidth="0px">Request unsuccessful. Incapsula incident ID: 500000240101726326-477561257670738314</iframe></body></html>
I need to extract/scrape the data at "https://cs.cruisebase.com/cs/forms/hotdeals.aspx?skin=605&nc=y".
Note: I don't want to use the Selenium WebDriver to get the data. Any help will be much appreciated, thanks!
Did you try getting the headers by loading the target URL directly?
I sent a GET request to https://cs.cruisebase.com/cs/forms/hotdeals.aspx?skin=605&nc=y with the following headers, and I was able to get the complete response.
headers = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'en-GB,en;q=0.9,en-US;q=0.8,hi;q=0.7,la;q=0.6',
    'Cache-Control': 'no-cache',
    'Connection': 'keep-alive',
    'Cookie': 'ENTER COOKIES',
    'DNT': '1',
    'Host': 'cs.cruisebase.com',
    'Pragma': 'no-cache',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36',
}
I have left the Cookie field as a placeholder; you will have to enter your own cookies, otherwise the page won't load. You can copy them from Chrome's developer tools.
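For completeness, a minimal request using this headers dict might look like the following (the Cookie value above is still a placeholder you must fill in yourself):
import requests

url = 'https://cs.cruisebase.com/cs/forms/hotdeals.aspx?skin=605&nc=y'
response = requests.get(url, headers=headers, timeout=10)
print(response.status_code)
print(response.text[:300])  # check this is the real page, not the Incapsula block page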

Dynamic data from this website (nseindia.com) not importing in Python pandas

Website Link : https://www.nseindia.com/products/content/equities/equities/eq_security.htm
import requests
import pandas as pd

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:71.0) Gecko/20100101 Firefox/71.0",
    "Accept": "*/*",
    "Accept-Language": "en-US,en;q=0.5",
    "X-Requested-With": "XMLHttpRequest"
}
r = requests.get('https://www.nseindia.com/products/dynaContent/common/productsSymbolMapping.jsp?symbol=TCS&segmentLink=3&symbolCount=2&series=ALL&dateRange=+&fromDate=13-12-2019&toDate=18-12-2019&dataType=PRICEVOLUMEDELIVERABLE', headers=headers)
df = pd.read_html(r.text)
print(df)
I have tried this code, but it gives me the error ValueError: No tables found.
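A debugging step worth trying (a sketch, not a confirmed fix): print the status code and the beginning of the body before calling read_html, since ValueError: No tables found usually means the server returned an error or block page instead of the expected HTML. nseindia.com is also known to reject clients that lack the cookies set during a normal visit, so warming up a Session on the homepage first sometimes helps:
import requests
import pandas as pd

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:71.0) Gecko/20100101 Firefox/71.0",
    "Accept": "*/*",
    "Accept-Language": "en-US,en;q=0.5",
    "X-Requested-With": "XMLHttpRequest",
}

session = requests.Session()
session.headers.update(headers)
session.get('https://www.nseindia.com')  # pick up the cookies NSE expects

r = session.get('https://www.nseindia.com/products/dynaContent/common/productsSymbolMapping.jsp'
                '?symbol=TCS&segmentLink=3&symbolCount=2&series=ALL&dateRange=+'
                '&fromDate=13-12-2019&toDate=18-12-2019&dataType=PRICEVOLUMEDELIVERABLE')
print(r.status_code, r.text[:200])  # confirm there is actually a table before parsing

df = pd.read_html(r.text)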
