Python: Failing to request HTML from an HTTPS site - python-3.x

I am trying to request the HTML data from the website shown below, but it raises the following error:
'Connection aborted.', OSError("(54, 'ECONNRESET')"
I have also tried adding a certificate, but that raises a different error:
Error: [('x509 certificate routines', 'X509_load_cert_crl_file', 'no certificate or crl found')]
The certificate was exported from Chrome.
Python Code:
import requests
from bs4 import BeautifulSoup

url = 'https://www.openrice.com/zh/hongkong/restaurants/type/%E5%BF%AB%E9%A4%90%E5%BA%97?page=1'
html = requests.get(url, verify=False)  # fails with 'Connection aborted.' / ECONNRESET
#html = requests.get(url, verify="/Users/xxx/Documents/Python/Go Daddy Root Certificate Authority - G2.cer")  # fails with the X509 error
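A note on the certificate error: requests expects the file passed to verify to be a PEM-encoded CA bundle, while browser exports are often in the binary DER format, and a DER file produces exactly "no certificate or crl found". A minimal conversion sketch using only the standard library (the file names are placeholders, and this assumes the .cer really is DER-encoded):

import ssl

der_path = "Go Daddy Root Certificate Authority - G2.cer"  # assumed DER export from Chrome
pem_path = "godaddy_root.pem"  # placeholder output file name

with open(der_path, "rb") as f:
    der_data = f.read()

# convert the binary DER certificate into the PEM text format requests expects
with open(pem_path, "w") as f:
    f.write(ssl.DER_cert_to_PEM_cert(der_data))

# then: requests.get(url, verify=pem_path)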

Can you try this?
I couldn't reproduce your environment exactly, but when I tried to access the site from my own PC it also failed, so I added a user-agent to the headers and then it worked fine. I don't know whether it will work on your PC, though.
import requests

url = 'https://www.openrice.com/zh/hongkong/restaurants/type/%E5%BF%AB%E9%A4%90%E5%BA%97?page=1'
headers = {
    # pretend to be a regular Chrome browser
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'
}
html = requests.get(url, headers=headers)
print(html.text)
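Since the question also imports BeautifulSoup, here is a minimal parsing sketch to go with that fix (the 'h2' tag is only an example, not OpenRice's real markup; inspect the page for the elements you actually need):

from bs4 import BeautifulSoup

soup = BeautifulSoup(html.text, 'html.parser')
# iterate over an example tag; replace with the page's real structure
for heading in soup.find_all('h2'):
    print(heading.get_text(strip=True))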

Related

Excel M query: Permission issue

I am trying to access investing.com from Excel 2016 using Power Query, and I am getting a permission error. Is there a way to bypass it? It worked until yesterday, but now I keep getting this error even though I pass the cookie and header information in the URL. The error and code are below.
Error: [Permission Error] The credentials provided for the web source are invalid. Please update the credentials through a refresh or in the Data Source settings dialog to continue.
Any guidance is much appreciated.
let
    Source = () => Web.Contents(
        "https://tvc6.investing.com/de339cb1628af015d591805dfffe9b13/1663439222/1/1/8/quotes?symbols=NSE%20%3ANSEI",
        [Headers = [
            #"accept-encoding" = "gzip, deflate",
            #"cookie" = "jLeqXJArEhTZVUQpwiP1gSPVlhn19b3fsB7uRcSUbyg-1666154846-0-AR2gelOI/3ZaH0SKRFuxhsgOGr/SOXpnbQ7e90mYM2iDmpTf8XcvvIB/FvGtgkB50nccnbWB1rFawcO37vJpSBU=",
            #"accept-language" = "en-US,en;q=0.9",
            #"user-agent" = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36 Edg/106.0.1370.47"
        ]]
    )
in
    Source
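One way to check whether the cookie has simply expired is to replay the same request outside Excel. A minimal Python sketch with the headers copied from the M query above (an error status here would point at stale credentials):

import requests

url = ('https://tvc6.investing.com/de339cb1628af015d591805dfffe9b13/'
       '1663439222/1/1/8/quotes?symbols=NSE%20%3ANSEI')
headers = {
    'accept-encoding': 'gzip, deflate',
    'cookie': '...',  # paste the cookie value from the M query above
    'accept-language': 'en-US,en;q=0.9',
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36 Edg/106.0.1370.47',
}
resp = requests.get(url, headers=headers)
print(resp.status_code)  # 401/403 suggests the cookie/credentials expired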

Python requests same header like chrome but different response (show Cloudflare)

I'm trying to send a GET request with the requests library, but I don't know why I get a different response from this URL (udemy.com). Could the problem be the certificate, the cipher, or the protocol?
import requests

headers = {
    'Connection': 'keep-alive',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.82 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Accept-Language': 'en-US,en;q=0.9',
    'Accept-Encoding': 'gzip, deflate',
}
req1 = requests.get('https://www.udemy.com/join/signup-popup/?displayType=ajax&display_type=popup&returnUrlAfterLogin=https&showSkipButton=1', headers=headers)
print(req1.text)
Output:
<Response [403]>
Cloudflare can detect pretty much every script, but if you drive the browser itself with Selenium there should be no problem.
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.firefox.options import Options

browser = None
try:
    options = Options()
    options.headless = True  # run the browser in the background
    browser = webdriver.Firefox(options=options)  # requires geckodriver: https://github.com/mozilla/geckodriver/releases
    browser.get('https://www.udemy.com/join/signup-popup/?displayType=ajax&display_type=popup&returnUrlAfterLogin=https&showSkipButton=1')
    print(browser.find_element(By.ID, "auth-to-udemy-title").text)  # prints "Sign Up and Start Learning!"
finally:
    if browser:
        browser.close()  # avoid leaking the browser process
The Selenium package (pip install selenium) and, for Firefox, the geckodriver binary are required.
The site likely does not want scripts to access it, and is probably using some sort of bot detection to block them. Trying to work around this would be unethical, and possibly illegal. The ethical way to proceed is to ask the site owner for permission to use your script, and have them give you some sort of bypass token for that purpose.

Web Scraping from Oddschecker using BeautifulSoup

I was previously able to scrape data from https://www.oddschecker.com/ using BeautifulSoup; however, now all I get is the following 403 status:
import requests
import bs4
result = requests.get("https://www.oddschecker.com/")
result.text
Output:
<html>\r\n<head><title>403 Forbidden</title></head>\r\n<body bgcolor="white">\r\n<center><h1>403 Forbidden</h1></center>\r\n<hr><center>nginx</center>\r\n</body>\r\n</html>\r\n
I want to know whether this is the same for all users of this website, or if there is a way to get around it (via another web-scraping package or other code) and access the actual data visible on the site.
Just add a user agent. The site detects whether you're a bot (for example, a client that doesn't run JavaScript).
import requests

url = 'https://www.oddschecker.com/'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
result = requests.get(url, headers=headers)
print(result.text)
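Since the question already imports bs4, a short continuation that parses the response (just a sanity check; the real selectors depend on the page):

import bs4

soup = bs4.BeautifulSoup(result.text, 'html.parser')
print(soup.title.get_text())  # real page HTML instead of the 403 page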
You can also use selenium.
from selenium import webdriver

driver = webdriver.Firefox()  # or webdriver.Chrome(); a matching driver binary is required
driver.get("https://www.oddschecker.com/")
print(driver.page_source)
driver.quit()  # shut down the browser when done

Large File Upload to SharePoint Throwing Error (System.Net.WebException: The request was aborted: The request was canceled.)

I am trying to upload a large file (2 GB) to SharePoint and it throws the error:
System.Net.WebException: The request was aborted: The request was canceled.
Can anyone let me know what I can modify to overcome this? Here is my code below. The code runs in an SSIS script task.
WebClient client = new WebClient();
client.Headers.Add(HttpRequestHeader.UserAgent, "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2490.33 Safari/537.36");
// Add FedAuth cookie to the header (Office 365)
if (!String.IsNullOrEmpty(SPToken))
    client.Headers.Add(HttpRequestHeader.Cookie, SPToken);
client.Credentials = new NetworkCredential(userName, password);
You can use WebDAV; see https://msdn.microsoft.com/en-us/library/cc250097.aspx
You can also upload manually from Internet Explorer (library -> Open in Explorer -> upload) to use the WebDAV option.
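As a rough illustration of the WebDAV route, the upload is essentially an HTTP PUT that streams the file to its destination path. A minimal Python sketch (the URL, cookie value, and file name are all placeholders, and a large upload still needs a generous timeout):

import requests

# Placeholder destination path inside the document library
url = 'https://tenant.sharepoint.com/sites/team/Shared%20Documents/bigfile.bin'
headers = {'Cookie': 'FedAuth=...'}  # same FedAuth token the question's code uses

with open('bigfile.bin', 'rb') as f:
    # passing the file object streams the body instead of buffering 2 GB in memory
    resp = requests.put(url, data=f, headers=headers, timeout=3600)
print(resp.status_code)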

python 3: 403 Forbidden error when using requests

HTTP Error 403: Forbidden is generated by either one of the following two commands:
requests.get('http://www.allareacodes.com')
urllib.request.urlopen('http://www.allareacodes.com')
However, I am able to browse this website in Chrome and view its source. Besides, wget in my Cygwin is also capable of grabbing the HTML source.
Does anyone know how to grab the source of this website using Python packages alone?
You have errors in your code for requests. It should be:
import requests
r = requests.get('http://www.allareacodes.com')
print(r.text)
In your case, however, the website has a "noindex" rule that stops scripts from getting the raw HTML data. As a solution, simply fake your headers so that the website thinks you're an actual user.
Example:
import requests

r = requests.get('http://www.allareacodes.com', headers={
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"
})
print(r.text)
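The urllib.request call from the question can be fixed the same way; a minimal sketch reusing the same user-agent string:

import urllib.request

req = urllib.request.Request(
    'http://www.allareacodes.com',
    headers={"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.106 Safari/537.36"}
)
with urllib.request.urlopen(req) as resp:
    print(resp.read().decode('utf-8', errors='replace'))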
