urlopen(url) 403 Forbidden error - http-status-code-403

I'm using python to open an URL with the following code and sometimes I get this error:
from urllib import urlopen
url = "http://www.gutenberg.org/files/2554/2554.txt"
raw = urlopen(url).read()
error:'\n\n403 Forbidden\n\nForbidden\nYou don\'t have permission to access /files/2554/2554.txt\non this server.\n\nApache Server at www.gutenberg.org Port 80\n\n'
What is this?
Thank you

This is the web page blocking Python access as it is making requests with the header 'User-Agent'.
To get around this, download the 'urllib2' module and use this code:
req = urllib2.Request(url, headers ={'User-Agent':'Chrome'})
raw = urllib2.urlopen(req).read()
You are know accessing the site with the header 'Chrome' and should no longer be forbidden (I tried it myself and it worked).
Hope this helps.

Related

When I use pycurl to execute a curl command, I get error 3 "illegal characters found in URL" but when pasting said URL in Chome, it can be resolved

Good day. I'm writing a python program that requests some posts from my Facebook page. To do so, Facebook offers a tool that they call "Graph API Explorer". Using something similar to a GET request, I can get anything that I want (granted that I have access and a valid token). I've come up with my own solution for the Graph API Explorer and that is generating my URLs. After generating a URL, I use pycurl to get a JSON object from Facebook that contains all of my data.
When I use pycurl, I get the following error:
pycurl.error: (3, 'Illegal characters found in URL')
but when printing said URL and pasting it to a browser, I got a valid response.
URL: https://graph.facebook.com/v7.0/me?fields=posts%7Bmessage%2Cfrom%7D&access_token=<and my access token which is valid>
my code looks like this:
def get_posts_curl(nodes=['posts'], fields=[['message', 'from']], token_file='Facebook/token.txt'):
curl = pycurl.Curl()
response = BytesIO()
token = get_token_from_file(token_file)
# constructing request.
url = parse_facebook_url_request(nodes, fields, token)
url = convert_to_curl(url)
print("---URL---: " + url)
# curl session and settings.
curl.setopt(curl.CAINFO, certifi.where())
curl.setopt(curl.URL, url)
curl.setopt(curl.WRITEDATA, response)
curl.perform()
curl.close()
return response.getvalue().decode('utf-8')
The error pops up at curl.perform()
Some info that might be relevant:
All was working great a while ago. After transferring my program from my workstation (that is running Windows 10) to my server (Ubuntu 18.04 Server) still, all was working fine and I placed that project to the side. Only now that error pops up and I haven't touched the project in a while.
It seems that the token is causing the issue. I've tried about 100 tokens and some cause the problem and some don't. Also, a fix that solved it all was using urllib3.unquote
from urllib.parse import unquote
...
url = unquote(url)

Unable to access data from .env file in python3

I am trying to write automation scripts for get request.
As part of the request i need to send secure key in headers. I am giving following details in test
url = "some url"
headers = {"key":"12345"}
response = requests.post(url, json_request, headers=headers)
At the moment i'm exposing header details. I am trying to get them saved to environ.env file and import them into config file. But i have no luck.
environ.env file
test_headers = {"key":"12345"}
config file
import os
headers = {"headers": os.environ.get("test_headers")}
Unable to get the code working. Headers are not recognised while executing code.
Any help would be greatly appreciated.
Thanks,
Got this to be working
environ.env file
KEY:"12345"
config file
import os
headers = {"headers": os.environ.get("KEY")}

PRIVACY Card API Authorization

I have recently been working with API's but I am stuck on one thing and it's been holding me back for a few days.
I am trying to work with Privacy's API and I do not understand the Authentication/Authorization process. When I enter the url in a browser I get the error "message": "Please provide API key in Authorization header", even when I use the correct format of Authorization. I also get an error when I make a request in Python. The format I'm using for the url is https://api.privacy.com/v1/card "Authorization: api-key:".
If someone could explain how to work this or simply give an example of how I would make a request through Python3. The API information is in the link below.
Thank you in advance.
https://developer.privacy.com/docs
This is the code I am using in Python. After I run this I receive a 401 status code.
import requests
headers={'Authorization': 'api-key:200e6036-6894-xxxx-xxxx-xxxx'}
url = 'https://api.privacy.com/v1/card'
r = requests.get(url)
print("Status code:", r.status_code)
You need to add the authentication header to the get call. It isn't enough to include it in a header variable. You need to provide those headers to requests
import requests
response = requests.get('https://api.privacy.com/v1/card', headers={'Authorization': 'api-key 65a9566c-XXXXXXXXXXXX'})
print(response.json())

Get Access to Sharepoint site using sharepy

I am trying to connect to a sharepoint site, however I get the AttributeError: 'NoneType' object has no attribute 'text'.
my code is as follows
import sharepy
s = sharepy.connect("https://example.sharepoint.com/",\
username='username', password='password')
Any idea why I get this error? I am allowed in the site, but not on the https://example.sharepoint.com/ server url, and also not allowed through python. I have tried several other ways in, using urllib.request and also requests. There my issue lies with having the HTTPError: Forbidden. The code run there is:
from sharepoint import SharePointSite, basic_auth_opener
server_url = 'https://example.sharepoint.com/'
opener = basic_auth_opener(server_url, "username",
"password")
site = SharePointSite(server_url, opener)
for sp_lists in site.lists:
print(sp_lists.id)
Seems like I have an issue with permission on the server or could it be something different?
sharepy shows this error if SharePoint returns an error message during the authentication. It is an error in the library because it doesn't know how to handle all error codes.
In your case the site url is wrong - "https://example.sharepoint.com" instead of "https://example.sharepoint.com/".
it might possible that you are missing header or other, please follow below link for more ideas on HTTP Error 403: Forbidden
https://mediatemple.net/community/products/dv/204644980/why-am-i-seeing-a-403-forbidden-error-message

Request Returns Response 447

I'm trying to scrape a website using requests and BeautifulSoup. When i run the code to obtain the tags of the webbpage the soup object is blank. I printed out the request object to see whether the request was successful, and it was not. The printed result shows response 447. I cant find what 447 means as a HTTP Status Code. Does anyone know how I can successfully connect and scrape the site?
Code:
r = requests.get('https://foobar)
soup = BeautifulSoup(r.text, 'html.parser')
print(soup.get_text())
Output:
''
When I print request object:
print(r)
Output:
<Response [447]>
Most likely your activity is acknowledged by the site so it's blocking your access,you can fix this problem by including headers in your request to site.
import bs4
import requests
session=requests.session()
headers={"User-Agent":"Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0","Accept":"text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"}
req=session.get(url,headers=headers)
soup=bs4.BeautifulSoup(req.text)
Sounds like they have browser detection software and they don't like your browser. (meaning they don't like your lack of a browser)
While 447 is not a standard error status for http, it is occasionally used in smtp as too many requests.
Without knowing what particular website you are looking at, it's not likely anyone will be able to give you more information. Chances are you just need to add headers.

Resources