Python requests code worked yesterday but now returns TooManyRedirects: Exceeded 30 redirects - python-3.x

I am trying to get data from a site with requests, using this simple code (running on Google Colab):
import requests, json
def GetAllStocks():
    url = 'https://iboard.ssi.com.vn/dchart/api/1.1/defaultAllStocks'
    res = requests.get(url)
    return json.loads(res.text)
This worked well until this morning, and I cannot figure out why it now returns the error "TooManyRedirects: Exceeded 30 redirects."
I can still get the data by opening the URL directly in Google Chrome in Incognito mode, so I do not think this is caused by cookies. I tried passing the full set of headers, but it still does not work. I also tried passing allow_redirects=False, and the returned status_code is 302.
I am not sure if there is anything I could try as this is so strange to me.
Any guidance is much appreciated. Thank you very much!

You need to send a User-Agent header to mimic regular browser behaviour.
import requests, json, random
def GetAllStocks():
    user_agents = [
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36",
        "Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:101.0) Gecko/20100101 Firefox/101.0",
        "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:77.0) Gecko/20190101 Firefox/77.0",
        "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:77.0) Gecko/20100101 Firefox/77.0",
    ]
    headers = {
        "User-Agent": random.choice(user_agents),
        "Accept": "application/json",
    }
    url = "https://iboard.ssi.com.vn/dchart/api/1.1/defaultAllStocks"
    res = requests.get(url, headers=headers)
    return json.loads(res.text)
data = GetAllStocks()
print(data)
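If the error persists even with a browser-like User-Agent, it can help to see where the redirects actually lead. The following is a minimal sketch and not part of the original answer: trace_redirects is a hypothetical helper, and session is assumed to be a requests.Session (or any object with a compatible get method). It follows the redirects one hop at a time with allow_redirects=False, as tried in the question, and stops as soon as a URL repeats.

```python
from urllib.parse import urljoin


def trace_redirects(session, url, headers=None, max_hops=10):
    """Follow redirects one hop at a time and return the list of URLs visited.

    `session` is assumed to behave like requests.Session. The walk stops at the
    first non-redirect response, at a repeated URL (a loop), or after max_hops.
    """
    visited = [url]
    for _ in range(max_hops):
        res = session.get(url, headers=headers, allow_redirects=False)
        location = res.headers.get("Location")
        # Anything other than a 3xx response with a Location header ends the chain.
        if not (300 <= res.status_code < 400) or not location:
            break
        url = urljoin(url, location)  # the Location header may be relative
        if url in visited:
            visited.append(url)  # record the repeated URL so the loop is visible
            break
        visited.append(url)
    return visited
```

For example, trace_redirects(requests.Session(), url, headers=headers) with the url and headers from the code above would return the chain of URLs; a repeated entry at the end indicates a redirect loop.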

Related

Python selenium can't run with chrome v93

Yesterday I could run my code successfully with Chrome v92.0.4515.107, but it stopped working after Chrome auto-updated to v93 today. Here is part of the code:
class CNVD(object):
    def __init__(self):
        self.options=webdriver.ChromeOptions()
        self.options.add_experimental_option("detach", True)
        self.options.add_experimental_option("excludeSwitches", ['enable-automation', 'enable-logging'])
        self.driver = webdriver.Chrome(chrome_options=self.options)
        # self.driver.maximize_window()
    def login(self):
        # Set the headers; without them the session becomes invalid
        headers = {
            'Host':'www.cnvd.org.cn',
            'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36',
            'Accept':'*/*',
            'Accept-Language':'zh-CN,zh;q=0.8,zh-TW;q=0.7,zh-HK;q=0.5,en-US;q=0.3,en;q=0.2',
            'Accept-Encoding':'gzip, deflate',
            'Content-Type':'application/x-www-form-urlencoded; charset=UTF-8',
            'X-Requested-With':'XMLHttpRequest',
            'Origin':'https://www.cnvd.org.cn',
            'Referer':'https://www.cnvd.org.cn/user/login'
        }
        data = 'password=xxxx'
        response = session.post(url="https://www.cnvd.org.cn/user/doLogin/loginForm",data=data,headers=headers)
        response.encoding='utf-8'
        self.driver.get("https://www.cnvd.org.cn")
        self.driver.add_cookie({'name':'__jsl_clearance_s','value':session.cookies.get_dict()['__jsl_clearance_s']})
        self.driver.add_cookie({'name':'JSESSIONID','value':session.cookies.get_dict()['JSESSIONID']})
        self.driver.add_cookie({'name':'__jsluid_s','value':session.cookies.get_dict()['__jsluid_s']})
        self.driver.get("https://www.cnvd.org.cn/user/doLogin/loginForm")
I'm sure it ran with Chrome v92.0.4515.107.
Could somebody help me, please?

If your Chrome has updated automatically, the following two steps will hopefully solve your problem.
1. Download the matching version of ChromeDriver, which in your case is ChromeDriver 93.0.4577.15.
2. Update the following line of your code so that the Chrome version in the User-Agent matches the updated browser:
'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/92.0.4515.107 Safari/537.36'
You can obtain the current User-Agent from the Network tab of Chrome's Inspect tool.
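Instead of copying the User-Agent by hand after every Chrome update, it could be read from the running browser. This is a minimal sketch, assuming driver is the Selenium WebDriver created in the question's code; the helper names browser_user_agent and headers_with_browser_ua are hypothetical.

```python
def browser_user_agent(driver):
    """Ask the browser controlled by Selenium for its own User-Agent string."""
    return driver.execute_script("return navigator.userAgent")


def headers_with_browser_ua(driver, base_headers):
    """Return a copy of base_headers whose User-Agent matches the real browser."""
    headers = dict(base_headers)  # copy, so the caller's dict is left untouched
    headers["User-Agent"] = browser_user_agent(driver)
    return headers
```

In the login method above, this could be used as headers = headers_with_browser_ua(self.driver, headers) before calling session.post, so the header keeps matching whichever Chrome version is installed.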

python3 - request - cookie -

How can I get the cookie of the last page?
My code is here:
import requests
headerMain = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36 OPR/68.0.3618.125"}
istekMain = requests.get("https://www.example.org/", headers=headerMain)
cookie = istekMain.cookies.get_dict()
istekLazim = {"display_type":"popup"}
istekLogin = requests.get("https://www.example.org/", headers=headerMain, params=istekLazim, cookies=cookie)
print(istekLogin.text)
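If "last page" means the page reached after redirects, note that in requests the cookies attribute of a response holds only the cookies set by that final response; cookies set by intermediate redirect responses are found on the entries of response.history. The following is a minimal sketch under that reading of the question (the helper name collect_cookies is hypothetical):

```python
def collect_cookies(response):
    """Merge the cookies of every redirect hop and of the final response."""
    cookies = {}
    # response.history lists the intermediate redirect responses, oldest first,
    # so later responses overwrite earlier values for the same cookie name.
    for hop in list(response.history) + [response]:
        cookies.update(hop.cookies.get_dict())
    return cookies
```

With the code above, collect_cookies(istekMain) would take the place of istekMain.cookies.get_dict(). Alternatively, sending both requests through a single requests.Session() keeps the cookies between requests automatically.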

Expand short urls in python using requests library

I have a large number of short URLs and I want to expand them. I found somewhere online (I missed the source) the following code:
import requests
short_url = "t.co/NHBbLlfCaa"
r = requests.get(short_url)
if r.status_code == 200:
    print("Actual url:%s" % r.url)
It works perfectly, but I get this error when I hit the same server many times:
urllib3.exceptions.MaxRetryError:
HTTPConnectionPool(host='www.fatlossadvice.pw', port=80): Max retries
exceeded with url:
/TIPS/KILLED-THAT-TREADMILL-WORKOUT-WORD-TO-TIMMY-GACQUIN.ASP (Caused
by NewConnectionError(': Failed to establish a new connection: [Errno
11004] getaddrinfo failed',))
I tried many solutions, like the ones suggested here: Max retries exceeded with URL in requests, but nothing worked.
I was thinking about another solution, which is to pass a user agent in the request and change it randomly each time (using a large number of user agents):
user_agent_list = [
    'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:25.0) Gecko/20100101 Firefox/25.0',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:25.0) Gecko/20100101 Firefox/25.0',
    'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:24.0) Gecko/20100101 Firefox/24.0',
    'Mozilla/5.0 (Windows NT 6.2; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/32.0.1667.0 Safari/537.36',
]
r = requests.get(short_url, headers={'User-Agent': user_agent_list[np.random.randint(0, len(user_agent_list))]})
if r.status_code == 200:
    print("Actual url:%s" % r.url)
My problem is that r.url always returns the short URL instead of the long (expanded) one.
What am I missing?

You can prevent the error by passing allow_redirects=False to requests.get(), which stops requests from following the redirect to a page that doesn't exist (and thus from raising the error). You then have to examine the Location header sent by the server yourself (in the code below, replace XXXX with https:// and remove the spaces):
import requests
short_url = ["XXXX t.co /namDL4YHYu",
             'XXXX t.co /MjvmV',
             'XXXX t.co /JSjtxfaxRJ',
             'XXXX t.co /xxGSANSE8K',
             'XXXX t.co /ZRhf5gWNQg']
for url in short_url:
    r = requests.get(url, allow_redirects=False)
    try:
        print(url, r.headers['location'])
    except KeyError:
        print(url, "Page doesn't exist!")
Prints:
XXXX t.co/namDL4YHYu http://gottimechillinaround.tumblr.com/post/133931725110/tip-672
XXXX t.co/MjvmV Page doesn't exist!
XXXX t.co/JSjtxfaxRJ http://www.youtube.com/watch?v=rE693eNyyss
XXXX t.co/xxGSANSE8K http://www.losefattips.pw/Tips/My-stretch-before-and-after-my-workout-is-just-as-important-to-me-as-my-workout.asp
XXXX t.co/ZRhf5gWNQg http://www.youtube.com/watch?v=3OK1P9GzDPM
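The same idea can be written as a reusable function. This is a minimal sketch, assuming session is a requests.Session; ensure_scheme and expand_short_url are hypothetical helper names. It also prepends a scheme when the short URL has none, since requests rejects URLs without one.

```python
def ensure_scheme(url, default="https"):
    """Prepend a scheme when the URL has none (requests raises MissingSchema otherwise)."""
    return url if "://" in url else f"{default}://{url}"


def expand_short_url(session, short_url, timeout=10):
    """Return the redirect target of a short URL, or None if it does not redirect.

    Only the shortener itself is contacted; the target host is never requested,
    so an unreachable target cannot raise a connection error here.
    """
    res = session.get(ensure_scheme(short_url), allow_redirects=False, timeout=timeout)
    return res.headers.get("Location")
```

Called as expand_short_url(requests.Session(), "t.co/JSjtxfaxRJ"), it would return the value of the Location header, or None when the short link does not redirect anywhere.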

Login using python requests doesn't work for pythonanywhere.com

I am trying to log in to the site pythonanywhere.com:
import requests
url='https://www.pythonanywhere.com/login'
s = requests.session()
values = {
    'auth-username': 'username',
    'auth-password': 'password'}
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}
u = s.post(url, data=values, headers=headers)
But I am getting a <Response [403]> with the message "CSRF verification failed". How do I log in to that site?

You need to GET the login page first, so that you receive the csrftoken and sessionid cookies. Also remember to set Referer=https://www.pythonanywhere.com/login/
import requests
url='https://www.pythonanywhere.com/login'
s = requests.session()
s.get(url)
values = {
    'auth-username': 'username',
    'auth-password': 'password',
    "csrfmiddlewaretoken" : s.cookies.get("csrftoken"),
    "login_view-current_step" : "auth"
}
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/69.0.3497.92 Safari/537.36',
           'Referer': 'https://www.pythonanywhere.com/login/'}
u = s.post(url, data=values, headers=headers)
print(u.content)
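The steps in this answer can be wrapped in a function that fails loudly when the token is missing, instead of silently posting None. This is a minimal sketch, assuming session is a requests.Session and that the form field names are the ones used above; login_with_csrf is a hypothetical name.

```python
def login_with_csrf(session, login_url, username, password):
    """Fetch the login page for its CSRF cookie, then post the login form."""
    # 1. GET the login page so the server sets the csrftoken cookie on the session.
    session.get(login_url)
    token = session.cookies.get("csrftoken")
    if token is None:
        raise RuntimeError("the login page did not set a csrftoken cookie")
    # 2. Send the token back in the form, together with a matching Referer header.
    values = {
        "auth-username": username,
        "auth-password": password,
        "csrfmiddlewaretoken": token,
        "login_view-current_step": "auth",
    }
    return session.post(login_url, data=values, headers={"Referer": login_url})
```

A call such as login_with_csrf(requests.session(), url, 'username', 'password') would then return the response to the POST, whose status code and content can be checked as in the code above.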

Extracting data tables from website

I want to extract a data table from a website. Pandas read_html gives an HTTP 403 error. Is there any other module with which I can extract the data in Python?
Here is the website: https://pakstockexchange.com/stock2/index_new.php?section=research&page=show_price_table_new&symbol=ABOT

Mask your session as if you were using a browser:
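from io import StringIO
import pandas as pd
import requests
url = "https://pakstockexchange.com/stock2/index_new.php?section=research&page=show_price_table_new&symbol=ABOT"
header = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.75 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest"
}
r = requests.get(url, headers=header)
# Wrap the HTML in StringIO; recent pandas versions deprecate passing a raw HTML string
dfs = pd.read_html(StringIO(r.text))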
