Forbidden (403) when requesting a page via python3

I want to request this page via Python 3.
This is the code:
import requests
bizportal_company_url = "https://www.bizportal.co.il/realestates/quote/generalview/373019"
page = requests.get(bizportal_company_url)
and I get:
<Response [403]>
When I add verify=False to the get command, I get:
InsecureRequestWarning: Unverified HTTPS request is being made. Adding certificate verification is strongly advised. See: https://urllib3.readthedocs.io/en/latest/advanced-usage.html#ssl-warnings
<Response [403]>
How can I fix it? When I access the URL in a browser there is no password or anything.

You might be missing a few things in your code.
Try this:
import requests
headers = {
    "user-agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"
}
# Visit the site root first so the session picks up any cookies it sets
session = requests.Session()
session.get("https://www.bizportal.co.il", headers=headers)
url = "https://www.bizportal.co.il/realestates/quote/generalview/373019"
print(session.get(url, headers=headers).status_code)
This should print:
200
This means the request was successful.
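If you want to reuse this pattern, it can be wrapped in a small helper. This is a minimal sketch, assuming the warm-up request to the site root is what makes the difference; the fetch function name and the raise_for_status error handling are my additions, not part of the original answer:
import requests
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36"
def fetch(url, warmup="https://www.bizportal.co.il"):
    # Fetch a page with a browser-like User-Agent, visiting the site
    # root first so any cookies it sets are reused on the real request.
    session = requests.Session()
    session.headers["user-agent"] = BROWSER_UA
    session.get(warmup)        # pick up cookies, if any
    resp = session.get(url)
    resp.raise_for_status()    # raise on 403/404/etc.
    return resp
page = fetch("https://www.bizportal.co.il/realestates/quote/generalview/373019")
print(page.status_code)        # expected: 200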

Related

Python: Can't extract tbody information from website

I want to extract all the links from this website: https://pflegefinder.bkk-dachverband.de/pflegeheime/searchresult.php?required=1&statistics=1&searchdata%5BmaxDistance%5D=0&searchdata%5BcareType%5D=inpatientCare#/tab/general
The information I want is stored in the tbody (page code).
Every time I try to extract the data I get no result.
from bs4 import BeautifulSoup
from requests_html import HTMLSession
url = "https://pflegefinder.bkk-dachverband.de/pflegeheime/searchresult.php?required=1&statistics=1&searchdata%5BmaxDistance%5D=0&searchdata%5BcareType%5D=inpatientCare#complex-searchresult"
session = HTMLSession()
r = session.get(url)
r.html.render()  # execute the page's JavaScript before parsing
soup = BeautifulSoup(r.html.html, 'html.parser')
print(r.html.search("Details"))
Thank you for your help!
The site uses a backend API to deliver the info. If you look at your browser's Developer Tools > Network > Fetch/XHR and refresh the page, you'll see the data load as JSON from a request with a URL similar to the one you posted.
You can scrape that data like this; it returns JSON, which is easy enough to parse:
import requests
headers = {
    'Referer': 'https://pflegefinder.bkk-dachverband.de/pflegeheime/searchresult.php?required=1&statistics=1&searchdata%5BmaxDistance%5D=0&searchdata%5BcareType%5D=inpatientCare',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36'
}
for page in range(2):
    # the API serves 20 results per request, so step the offset by 20
    url = f'https://pflegefinder.bkk-dachverband.de/api/nursing-homes?required=1&statistics=1&maxDistance=0&careType=inpatientCare&limit=20&offset={page*20}'
    resp = requests.get(url, headers=headers).json()
    print(resp)
The API checks that you send a "Referer" header; otherwise you get a 400 response.
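If you want to pull individual entries out of that JSON, something like the sketch below should work. Note that the key names "items" and "name" are assumptions on my part; the answer doesn't show the response schema, so inspect the actual payload in DevTools and adjust the keys:
import requests
headers = {
    'Referer': 'https://pflegefinder.bkk-dachverband.de/pflegeheime/searchresult.php?required=1&statistics=1&searchdata%5BmaxDistance%5D=0&searchdata%5BcareType%5D=inpatientCare',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36'
}
url = 'https://pflegefinder.bkk-dachverband.de/api/nursing-homes?required=1&statistics=1&maxDistance=0&careType=inpatientCare&limit=20&offset=0'
data = requests.get(url, headers=headers).json()
# "items" and "name" are guesses at the schema; if the response is a
# plain list rather than an object, iterate over data directly instead.
for entry in data.get("items", []):
    print(entry.get("name"))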

Getting response 403 when trying to crawl; user agent doesn't work in Python 3

I'm trying to crawl this website and get the message:
"You don't have permission to access"
Is there a way to bypass this? I already tried user agents and urlopen.
Here is my code:
import requests
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen
url = 'https://www.oref.org.il/12481-he/Pakar.aspx'
header = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/56.0.2924.76 Safari/537.36'}
res = requests.get(url, headers=header)
soup = BeautifulSoup(res.content, 'html.parser')
print(res)
output:
<Response [403]>
I also tried this:
req = Request(url, headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36'})
webpage = urlopen(req).read()
output:
HTTP Error 403: Forbidden
I'm still blocked and get response 403. Can anyone help?
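One thing worth trying, in line with the other answers in this thread, is sending a fuller set of browser headers rather than only a User-Agent. This is a sketch, not a confirmed fix; the site may use bot detection that plain requests cannot bypass, and the extra header values here are my assumptions:
import requests
url = 'https://www.oref.org.il/12481-he/Pakar.aspx'
# A fuller browser-like header set; some servers reject requests
# that carry nothing but a User-Agent. Values here are assumptions.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/95.0.4638.69 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'he-IL,he;q=0.9,en;q=0.8',
    'Referer': 'https://www.oref.org.il/',
}
res = requests.get(url, headers=headers)
print(res.status_code)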

How to access the Medium.com final URL from link.medium.com, using Axios npm

Accessing https://link.medium.com/C1hxgiphAcb in a browser redirects to https://medium.com/javascript-in-plain-english/add-size-limit-to-github-actions-551c8fe9e7d7
From the backend, I am trying to figure out the final URL, given the shortened URL.
I am using Axios package at the backend (nodeJS).
var mediumRequest = await Axios.get('https://link.medium.com/C1hxgiphAcb')
console.log(mediumRequest.request.res.responseUrl)
>> 'https://rsci.app.link/C1hxgiphAcb?_p=c21634dc9a016ceeeb1d90f4e8'
But that is not the actual final URL.
Am I missing something?
This did the job: adding a User-Agent header that represents a browser.
const url = 'https://link.medium.com/C1hxgiphAcb';
const headers = { 'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36' };
const x = await Axios.get(url, { headers: headers });
console.log(x.request.res.responseUrl);
Output = https://medium.com/javascript-in-plain-english/add-size-limit-to-github-actions-551c8fe9e7d7
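For comparison, the same technique in Python's requests library looks like this; requests follows redirects by default, and the final URL after all hops is exposed as response.url:
import requests
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.116 Safari/537.36'}
resp = requests.get('https://link.medium.com/C1hxgiphAcb', headers=headers)
print(resp.url)  # final URL after all redirects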

Python requests not receiving response cookies

I am sending a GET request to this URL (a mobile user-agent is needed). When I send this request on my phone or in Postman, it returns a cookie called oidc.sid, but when I do this in Python requests, it does not return any cookies.
Here is my requests code:
headers = {
    "user-agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 12_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) CriOS/70.0.3538.75 Mobile/15E148 Safari/605.1",
}
get_resp = requests.get("https://www.uniqlo.com/ca/auth/v1/login", headers=headers)
Any help would be appreciated. Thank you
It is easy to understand why you see this: get_resp is the last response after all redirects. The website sets the cookie in the first response, so you cannot see any cookies on get_resp. You only need to set allow_redirects=False and your problem is solved:
import requests
headers = {
    "User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 12_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) CriOS/70.0.3538.75 Mobile/15E148 Safari/605.1",
}
# Stop at the first response instead of following the redirect chain,
# so the Set-Cookie header from that response is visible here
get_resp = requests.get("https://www.uniqlo.com/ca/auth/v1/login", headers=headers, allow_redirects=False)
print(get_resp.cookies)
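If you do want to follow the redirects as well, an alternative is a requests.Session: the session's cookie jar keeps cookies set at every hop of the redirect chain, not just on the final response. A minimal sketch:
import requests
headers = {
    "User-Agent": "Mozilla/5.0 (iPhone; CPU iPhone OS 12_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) CriOS/70.0.3538.75 Mobile/15E148 Safari/605.1",
}
session = requests.Session()
resp = session.get("https://www.uniqlo.com/ca/auth/v1/login", headers=headers)
# Cookies set anywhere along the redirect chain accumulate in the
# session jar, which should include oidc.sid from the first response.
print(session.cookies.get_dict())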

Hangs on opening a URL with urllib (python3)

I'm trying to open a URL with Python 3:
import urllib.request
fp = urllib.request.urlopen("http://lebed.com/")
mybytes = fp.read()
mystr = mybytes.decode("utf8")
fp.close()
print(mystr)
But it hangs on the second line.
What's the reason for this problem, and how can I fix it?
I suppose the reason is that the site does not allow robot visits. You need to fake a browser visit by sending browser headers along with your request:
import urllib.request
url = "http://lebed.com/"
req = urllib.request.Request(
    url,
    data=None,
    headers={
        # the default Python-urllib User-Agent is what the server stalls on
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
    }
)
f = urllib.request.urlopen(req)
print(f.read().decode("utf8"))
Tried this one on my system and it works.
Agree with Arpit Solanki. Shown below is the output for a failed request vs. a successful one.
Failed
GET / HTTP/1.1
Accept-Encoding: identity
Host: www.lebed.com
Connection: close
User-Agent: Python-urllib/3.5
Success
GET / HTTP/1.1
Accept-Encoding: identity
Host: www.lebed.com
Connection: close
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36
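If you want to capture these request headers yourself, urllib's handlers accept a debuglevel argument that prints the raw request and response on the wire; a minimal sketch:
import urllib.request
# debuglevel=1 makes the handlers print the outgoing request headers,
# which is how traces like the Failed/Success ones above can be captured
opener = urllib.request.build_opener(
    urllib.request.HTTPHandler(debuglevel=1),
    urllib.request.HTTPSHandler(debuglevel=1),
)
req = urllib.request.Request(
    "http://lebed.com/",
    headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'},
)
opener.open(req)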
