I am a newbie to Python. I am trying to extract data from APIs. I tried extracting the data locally using Postman and it works, but when I use Python requests I get a connection aborted error. Can someone please help me understand this issue?
Below is the code that I have tried:
import requests

url = "https://abcd/smart_general_codes?category=BANK"
payload = {}
headers = {
    'TenantId': 'IN0XXX',
    'Accept-Language': 'en_us',
    'Transfer-Encoding': 'chunked',
    'fileType': 'json',
    'Authorization': 'Basic XXXXXXXXXX'
}
response = requests.get(url, headers=headers, data=payload, verify=False)
print(response.status_code)
print(response.text)
Code 2:
import http.client

conn = http.client.HTTPSConnection("main.com")
payload = ''
headers = {
    'powerpayTenantId': 'IN0XXX',
    'Accept-Language': 'en_us',
    'Transfer-Encoding': 'chunked',
    'fileType': 'json',
    'Authorization': 'Basic XXXXXXXXXX'
}
# Note the leading slash: http.client expects the full request path
conn.request("GET", "/abcd/smart_general_codes?category=BANK", payload, headers)
res = conn.getresponse()
data = res.read()
print(data.decode("utf-8"))
Both the http.client and requests versions throw the error below:
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "login_2.py", line 20, in <module>
response = requests.get(url, headers=headers, data=payload, verify=False)
File "/usr/lib/python3/dist-packages/requests/api.py", line 72, in get
return request('get', url, params=params, **kwargs)
File "/usr/lib/python3/dist-packages/requests/api.py", line 58, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/lib/python3/dist-packages/requests/sessions.py", line 520, in request
resp = self.send(prep, **send_kwargs)
File "/usr/lib/python3/dist-packages/requests/sessions.py", line 630, in send
r = adapter.send(request, **kwargs)
File "/usr/lib/python3/dist-packages/requests/adapters.py", line 490, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))
I have solved the issue. In Postman the Accept-Language header was displayed as en_us, but updating it to en_US worked.
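For reference, a minimal sketch of the working request, keeping the same placeholder URL and credentials as above (I have also dropped the Transfer-Encoding header, since a GET without a body should not declare chunked encoding):
import requests

url = "https://abcd/smart_general_codes?category=BANK"
headers = {
    'TenantId': 'IN0XXX',
    'Accept-Language': 'en_US',  # en_us made the server drop the connection
    'fileType': 'json',
    'Authorization': 'Basic XXXXXXXXXX'
}
response = requests.get(url, headers=headers)
print(response.status_code)
print(response.text)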
Try adding a fake User-Agent (such as Chrome's) and, if needed, your cookies to the headers, like this:
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36',
    'cookie': '<Your cookie>'
}
By the way, you can get your cookies by typing document.cookie in your browser's developer console.
Please remove data=payload from the request and try again; a GET should not normally send a body.
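That is, keeping the same url and headers as in the question:
response = requests.get(url, headers=headers, verify=False)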
I have added a user agent to the headers. The following is my code and the error:
from urllib.request import Request, urlopen
import json
from bs4 import BeautifulSoup
import time

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1)'}
domain = Request("http://online-courses.club/baugasm-series-8-design-abstract-textures-and-poster-with-acrylic-paint-photoshop-and-cinema-4d/",
                 data=bytes(json.dumps(headers), encoding="utf-8"))
response = urlopen(domain)
I also tried a different version; note the change in the domain variable:
from urllib.request import Request, urlopen
import json
from bs4 import BeautifulSoup
import time

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1)'}
domain = Request("http://online-courses.club/baugasm-series-8-design-abstract-textures-and-poster-with-acrylic-paint-photoshop-and-cinema-4d/", headers)
response = urlopen(domain)
Neither of these versions works.
The error:
line 9, in <module>
response =urlopen(domain)
File "C:\Users\ABC\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "C:\Users\ABC\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 531, in open
response = meth(req, response)
File "C:\Users\ABC\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 641, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Users\ABC\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 569, in error
return self._call_chain(*args)
File "C:\Users\ABC\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 503, in _call_chain
result = func(*args)
File "C:\Users\ABC\AppData\Local\Programs\Python\Python37\lib\urllib\request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
Use .add_header() to set a correct User-Agent. In both of your attempts the headers dict ended up in the data argument of Request (it is the second positional parameter), so no User-Agent header was ever sent and the request was actually turned into a POST.
For example:
from urllib.request import Request, urlopen

domain = Request("http://online-courses.club/baugasm-series-8-design-abstract-textures-and-poster-with-acrylic-paint-photoshop-and-cinema-4d/")
domain.add_header('User-Agent', 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0')
response = urlopen(domain)
print(response.read())
Prints:
b'<!DOCTYPE html>\r\n<html lang="en-US" prefix="og: http://ogp.me/ns#">\r\n<head itemscope="itemscope" itemtype="http://schema.org/WebSite">\r\n\t<meta charset="UTF-8" />
... and so on.
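Equivalently, you can pass the dict through the headers= keyword argument of Request, which avoids the positional-argument trap; a minimal sketch:
from urllib.request import Request, urlopen

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1)'}
domain = Request("http://online-courses.club/baugasm-series-8-design-abstract-textures-and-poster-with-acrylic-paint-photoshop-and-cinema-4d/", headers=headers)
response = urlopen(domain)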
I am writing an asynchronous scraper for RSS feeds, and sometimes the following error occurs with some sites, for example:
In [1]: import requests_async as requests
In [2]: headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.79 Safari/537.36'}
In [3]: r = await requests.get('https://albumorientedpodcast.com/category/album-oriented/feed/', headers=headers)
Here is the full traceback of this error:
Traceback (most recent call last):
File "rss_parser.py", line 55, in rss_downloader
response = await requests.get(rss, headers=headers)
File "C:\Python3\lib\site-packages\requests_async\api.py", line 11, in get
return await request("get", url, params=params, **kwargs)
File "C:\Python3\lib\site-packages\requests_async\api.py", line 6, in request
return await session.request(method=method, url=url, **kwargs)
File "C:\Python3\lib\site-packages\requests_async\sessions.py", line 79, in request
resp = await self.send(prep, **send_kwargs)
File "C:\Python3\lib\site-packages\requests_async\sessions.py", line 157, in send
async for resp in self.resolve_redirects(r, request, **kwargs):
File "C:\Python3\lib\site-packages\requests_async\sessions.py", line 290, in resolve_redirects
resp = await self.send(
File "C:\Python3\lib\site-packages\requests_async\sessions.py", line 136, in send
r = await adapter.send(request, **kwargs)
File "C:\Python3\lib\site-packages\requests_async\adapters.py", line 48, in send
response = await self.pool.request(
File "C:\Python3\lib\site-packages\http3\interfaces.py", line 49, in request
return await self.send(request, verify=verify, cert=cert, timeout=timeout)
File "C:\Python3\lib\site-packages\http3\dispatch\connection_pool.py", line 130, in send
raise exc
File "C:\Python3\lib\site-packages\http3\dispatch\connection_pool.py", line 120, in send
response = await connection.send(
File "C:\Python3\lib\site-packages\http3\dispatch\connection.py", line 56, in send
response = await self.h2_connection.send(request, timeout=timeout)
File "C:\Python3\lib\site-packages\http3\dispatch\http2.py", line 52, in send
status_code, headers = await self.receive_response(stream_id, timeout)
File "C:\Python3\lib\site-packages\http3\dispatch\http2.py", line 126, in receive_response
event = await self.receive_event(stream_id, timeout)
File "C:\Python3\lib\site-packages\http3\dispatch\http2.py", line 159, in receive_event
events = self.h2_state.receive_data(data)
File "C:\Python3\lib\site-packages\h2\connection.py", line 1463, in receive_data
events.extend(self._receive_frame(frame))
File "C:\Python3\lib\site-packages\h2\connection.py", line 1486, in _receive_frame
frames, events = self._frame_dispatch_table[frame.__class__](frame)
File "C:\Python3\lib\site-packages\h2\connection.py", line 1560, in _receive_headers_frame
frames, stream_events = stream.receive_headers(
File "C:\Python3\lib\site-packages\h2\stream.py", line 1055, in receive_headers
events[0].headers = self._process_received_headers(
File "C:\Python3\lib\site-packages\h2\stream.py", line 1298, in _process_received_headers
return list(headers)
File "C:\Python3\lib\site-packages\h2\utilities.py", line 335, in _reject_pseudo_header_fields
for header in headers:
File "C:\Python3\lib\site-packages\h2\utilities.py", line 291, in _reject_connection_header
for header in headers:
File "C:\Python3\lib\site-packages\h2\utilities.py", line 275, in _reject_te
for header in headers:
File "C:\Python3\lib\site-packages\h2\utilities.py", line 264, in _reject_surrounding_whitespace
raise ProtocolError(
h2.exceptions.ProtocolError: Received header value surrounded by whitespace b'3.vie _dca '
At the same time, the same site loads normally through the regular requests library:
In [1]: import requests
In [2]: headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.79 Safari/537.36'}
In [3]: r = requests.get('https://albumorientedpodcast.com/category/album-oriented/feed/', headers=headers)
In [4]: r
Out[4]: <Response [200]>
I tried to find at least some information on this error, but came up empty. Can someone tell me what I can do to avoid this error and load the site normally?
requests-async has been archived, but its GitHub page contains a link to its successor, httpx. httpx has a similar syntax and is actively maintained, so consider trying it: many bugs may have been fixed there. The traceback itself comes from the h2 library rejecting a response header value surrounded by whitespace, which HTTP/2 forbids; plain requests succeeds because it speaks HTTP/1.1, which is more tolerant of such malformed headers.
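For example, a rough httpx equivalent of the snippet above; note that httpx.AsyncClient speaks HTTP/1.1 by default (HTTP/2 is opt-in via http2=True), which also sidesteps the strict h2 header validation:
import asyncio
import httpx

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.79 Safari/537.36'}

async def fetch(url):
    # The client defaults to HTTP/1.1; pass http2=True to enable HTTP/2
    async with httpx.AsyncClient(headers=headers) as client:
        return await client.get(url)

r = asyncio.run(fetch('https://albumorientedpodcast.com/category/album-oriented/feed/'))
print(r.status_code)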
I tried to execute the following code but invariably get a "TooManyRedirects" error. What am I doing incorrectly?
My code:
import requests, json

Address = '100 W Grant Street'
City = 'Orlando'
State = 'FL'
url = 'https://tools.usps.com/tools/app/ziplookup/zipByAddress'
data = {'company': '', 'address1': Address, 'address2': '', 'city': City, 'state': State, 'zip': ''}
raw = requests.post(url, data=data)
Here's the massive error message I get:
Traceback (most recent call last):
File "<pyshell#1347>", line 1, in <module>
raw = requests.post(url, data=data)
File "C:\Users\Karun\AppData\Local\Programs\Python\Python36\lib\site-packages\requests\api.py", line 112, in post
return request('post', url, data=data, json=json, **kwargs)
File "C:\Users\Karun\AppData\Local\Programs\Python\Python36\lib\site-packages\requests\api.py", line 58, in request
return session.request(method=method, url=url, **kwargs)
File "C:\Users\Karun\AppData\Local\Programs\Python\Python36\lib\site-packages\requests\sessions.py", line 508, in request
resp = self.send(prep, **send_kwargs)
File "C:\Users\Karun\AppData\Local\Programs\Python\Python36\lib\site-packages\requests\sessions.py", line 640, in send
history = [resp for resp in gen] if allow_redirects else []
File "C:\Users\Karun\AppData\Local\Programs\Python\Python36\lib\site-packages\requests\sessions.py", line 640, in <listcomp>
history = [resp for resp in gen] if allow_redirects else []
File "C:\Users\Karun\AppData\Local\Programs\Python\Python36\lib\site-packages\requests\sessions.py", line 140, in resolve_redirects
raise TooManyRedirects('Exceeded %s redirects.' % self.max_redirects, response=resp)
requests.exceptions.TooManyRedirects: Exceeded 30 redirects.
This particular URL, for some reason, also wants a User-Agent header included in the requests.post call. With that, I get an appropriate response. So here's the new code:
import requests

s = requests.Session()
url = 'https://tools.usps.com/tools/app/ziplookup/zipByAddress'
payload = {'companyName': '', 'address1': '10570 Main St', 'address2': '', 'city': 'Fairfax', 'state': 'VA', 'zip': ''}
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'}
r = s.post(url, data=payload, headers=headers)
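If the lookup succeeds, the response can then be inspected; assuming the endpoint returns JSON (not confirmed above), something like:
print(r.status_code)
print(r.json())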
I'm trying to get a website's source data after logging in, but am having trouble with the login itself. The URL is the page I see after logging in; i.e., if I log in on Chrome, I can use that URL to go straight to the data I need.
I keep getting multiple errors, primarily handshake errors: "sslv3 alert handshake failure", "bad handshake", and "urllib3.exceptions.MaxRetryError". I think the root error is:
Traceback (most recent call last):
File "C:\Users\bwayne\AppData\Local\Programs\Python\Python36-32\lib\site-packages\urllib3\contrib\pyopenssl.py", line 441, in wrap_socket
cnx.do_handshake()
File "C:\Users\bwayne\AppData\Local\Programs\Python\Python36-32\lib\site-packages\OpenSSL\SSL.py", line 1716, in do_handshake
self._raise_ssl_error(self._ssl, result)
File "C:\Users\bwayne\AppData\Local\Programs\Python\Python36-32\lib\site-packages\OpenSSL\SSL.py", line 1456, in _raise_ssl_error
_raise_current_error()
File "C:\Users\bwayne\AppData\Local\Programs\Python\Python36-32\lib\site-packages\OpenSSL_util.py", line 54, in exception_from_error_queue
raise exception_type(errors)
OpenSSL.SSL.Error: [('SSL routines', 'ssl3_read_bytes', 'sslv3 alert handshake failure')]
During handling of the above exception, another exception occurred:

My code:
import requests, sys
import ssl
from requests.adapters import HTTPAdapter
from requests.packages.urllib3.poolmanager import PoolManager

ctx = ssl.SSLContext(ssl.PROTOCOL_TLSv1_2)
ctx.options |= ssl.OP_NO_SSLv2
ctx.options |= ssl.OP_NO_SSLv3
ctx.options |= ssl.OP_NO_TLSv1
ctx.options |= ssl.OP_NO_TLSv1_1

class Ssl3HttpAdapter(HTTPAdapter):
    def init_poolmanager(self, connections, maxsize, block=False):
        self.poolmanager = PoolManager(num_pools=connections,
                                       maxsize=maxsize,
                                       block=block,
                                       ssl_version=ssl.PROTOCOL_TLSv1)

url = "https://www.thewebsite.com"

def do_requests(url):
    payload = {'Username': 'myName', 'Password': 'myPass'}
    headers = {'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Mobile Safari/537.36'}
    with requests.Session() as s:
        s.mount(url, Ssl3HttpAdapter())
        p = s.post(url, headers=headers, data=payload, verify=False)

def main(url):
    do_requests(url)

main(url)
How can I log in? I've double- and triple-checked that the HTML field names are correct.
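One thing worth noting in the code above: the ctx context is built but never used, and the PoolManager is instead pinned to ssl_version=ssl.PROTOCOL_TLSv1, which many servers now reject with exactly this kind of handshake alert. A minimal sketch of an adapter that actually passes the TLS 1.2 context through (the URL and credentials are the question's placeholders):
import ssl
import requests
from requests.adapters import HTTPAdapter
from urllib3.poolmanager import PoolManager

class Tls12Adapter(HTTPAdapter):
    # Hand the SSLContext to urllib3 so the handshake really uses TLS 1.2
    def init_poolmanager(self, connections, maxsize, block=False):
        ctx = ssl.SSLContext(ssl.PROTOCOL_TLSv1_2)
        self.poolmanager = PoolManager(num_pools=connections,
                                       maxsize=maxsize,
                                       block=block,
                                       ssl_context=ctx)

url = "https://www.thewebsite.com"  # placeholder from the question
with requests.Session() as s:
    s.mount("https://", Tls12Adapter())
    p = s.post(url, data={'Username': 'myName', 'Password': 'myPass'}, verify=False)
    print(p.status_code)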
I am in China. I use a proxy to connect to the internet, and when I want to proxy a program I tunnel it with proxychains. Now, the question: I have this code, which is a simple auth against the YouTube API:
import httplib2
import os
import logging
from oauth2client import tools
from oauth2client.client import AccessTokenCredentials
#from oauth2client.client import AccessTokenRefreshError
from googleapiclient.discovery import build
from googleapiclient.http import MediaFileUpload
from googleapiclient.errors import HttpError
import urllib

logger = logging.getLogger()
logger.setLevel(logging.DEBUG)

def authenticate():
    httplib2.debuglevel = 4
    acc_token = "ya29.dgLFP1i6jTuc-hnaC9D704i2jbQ2HOHweSqxjL9GxSFBg8QgvU"
    user_agent = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'
    flow = AccessTokenCredentials(acc_token, user_agent)
    http = flow.authorize(httplib2.Http())
    service = build('youtube', 'v3', http=http)
    return service

def initialize_upload(youtube):
    tags = 'classical music', 'yehudi mehunin'
    body = dict(
        snippet=dict(
            title='some title',
            description='a description',
            tags=tags,
            categoryId='4'
        ),
        status=dict(
            privacyStatus='Private'
        )
    )
    youtube.videos().insert(part=",".join(body.keys()), body=body, media_body=MediaFileUpload(
        '1977.mp4', mimetype='video/mp4', chunksize=1024 * 1024, resumable=False))

def first():
    youtube = authenticate()
    initialize_upload(youtube)

first()
When I first turn on my computer, I activate my virtualenv and execute the script from a terminal without proxying it; I get a timeout (I have to break it manually to exit) and this output:
^CTraceback (most recent call last):
File "youtubeconnect.py", line 48, in <module>
first()
File "youtubeconnect.py", line 45, in first
youtube = authenticate()
File "youtubeconnect.py", line 21, in authenticate
service = build('youtube', 'v3', http=http)
File "/home/xavier/Code/autotube/venv/lib/python3.5/site-packages/oauth2client-1.5.2-py3.5.egg/oauth2client/util.py", line 140, in positional_wrapper
File "/home/xavier/Code/autotube/venv/lib/python3.5/site-packages/google_api_python_client-1.4.2-py3.5.egg/googleapiclient/discovery.py", line 196, in build
File "/home/xavier/Code/autotube/venv/lib/python3.5/site-packages/google_api_python_client-1.4.2-py3.5.egg/googleapiclient/discovery.py", line 242, in _retrieve_discovery_doc
File "/home/xavier/Code/autotube/venv/lib/python3.5/site-packages/oauth2client-1.5.2-py3.5.egg/oauth2client/client.py", line 596, in new_request
File "/home/xavier/Code/autotube/venv/lib/python3.5/site-packages/httplib2-0.9.2-py3.5.egg/httplib2/__init__.py", line 1314, in request
(response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
File "/home/xavier/Code/autotube/venv/lib/python3.5/site-packages/httplib2-0.9.2-py3.5.egg/httplib2/__init__.py", line 1064, in _request
(response, content) = self._conn_request(conn, request_uri, method, body, headers)
File "/home/xavier/Code/autotube/venv/lib/python3.5/site-packages/httplib2-0.9.2-py3.5.egg/httplib2/__init__.py", line 987, in _conn_request
conn.connect()
File "/usr/lib/python3.5/http/client.py", line 1229, in connect
super().connect()
File "/usr/lib/python3.5/http/client.py", line 826, in connect
(self.host,self.port), self.timeout, self.source_address)
File "/usr/lib/python3.5/socket.py", line 702, in create_connection
sock.connect(sa)
KeyboardInterrupt
Now, I run it for the first time tunneled with proxychains and I get a response:
|DNS-request| www.googleapis.com
|S-chain|-<>-127.0.0.1:1080-<><>-4.2.2.2:53-<><>-OK
|DNS-response| www.googleapis.com is 74.125.29.95
|S-chain|-<>-127.0.0.1:1080-<><>-74.125.29.95:443-<><>-OK
send: b'GET /discovery/v1/apis/youtube/v3/rest HTTP/1.1\r\nHost: www.googleapis.com\r\nuser-agent: Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36\r\nauthorization: Bearer ya29.dgLFP1i6jTuc-hnaC9D704i2jbQ2HOHweSqxjL9GxSF\r\naccept-encoding: gzip, deflate\r\n\r\n'
reply: 'HTTP/1.1 200 OK\r\n'
header: Expires header: Date header: ETag header: Vary header: Vary header: Content-Type header: Content-Encoding header: X-Content-Type-Options header: X-Frame-Options header: X-XSS-Protection header: Server header: Content-Length header: Age header: Cache-Control header: Alternate-Protocol header: Alt-Svc
(venv) xavier@xavier:~/Code/autotube$ proxychains python3 youtubeconnect.py
ProxyChains-3.1 (http://proxychains.sf.net)
Now, why, when I run it again (both tunneled and not tunneled), does the script execute without giving any output anymore? The script runs without error, and that's it. No output. I only get output again after restarting my computer. Is the API or some library using a cache or something similar? I have also tried deactivating and reactivating the venv, but everything remains the same. Does someone know?
OK, it has to do with how httplib2 handles caching. It caches automatically, so in order to get a copy of the response you have to specify a cache directory in the httplib2.Http('.cache') object. There is no way to avoid the automatic caching unless you specifically modify the cache headers so the cache is ignored and the request is sent to the server again, bypassing it. Bear in mind that if you are proxying, your proxy also has a cache. For good information about HTTP, httplib2 and Python 3, check this free resource: http://www.diveintopython3.net/http-web-services.html
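A minimal sketch of the two knobs mentioned above (the '.cache' directory name is arbitrary, and the discovery URL is the one from the debug output):
import httplib2

# Give httplib2 a directory so cached responses are stored on disk
h = httplib2.Http('.cache')

# On repeat runs this may be served straight from the cache
response, content = h.request('https://www.googleapis.com/discovery/v1/apis/youtube/v3/rest')

# Force revalidation with the origin server, bypassing the cache
response, content = h.request(
    'https://www.googleapis.com/discovery/v1/apis/youtube/v3/rest',
    headers={'cache-control': 'no-cache'}
)
print(response.status, response.fromcache)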