urllib cannot read https - python-3.x

(Python 3.4.2)
Would anyone be able to help me fetch https pages with urllib? I've spent hours trying to figure this out.
Here's what I'm trying to do (pretty basic):
import urllib.request
url = "".join((baseurl, other_string, midurl, query))
response = urllib.request.urlopen(url)
html = response.read()
Here's my error output when I run it:
File "./script.py", line 124, in <module>
response = urllib.request.urlopen(url)
File "/usr/lib/python3.4/urllib/request.py", line 153, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python3.4/urllib/request.py", line 455, in open
response = self._open(req, data)
File "/usr/lib/python3.4/urllib/request.py", line 478, in _open
'unknown_open', req)
File "/usr/lib/python3.4/urllib/request.py", line 433, in _call_chain
result = func(*args)
File "/usr/lib/python3.4/urllib/request.py", line 1244, in unknown_open
raise URLError('unknown url type: %s' % type)
urllib.error.URLError: <urlopen error unknown url type: 'https>
I've also tried using data=None to no avail:
response = urllib.request.urlopen(url, data=None)
I've also tried this:
import urllib.request, ssl
https_sslv3_handler = urllib.request.HTTPSHandler(context=ssl.SSLContext(ssl.PROTOCOL_SSLv3))
opener = urllib.request.build_opener(https_sslv3_handler)
urllib.request.install_opener(opener)
resp = opener.open(url)
html = resp.read().decode('utf-8')
print(html)
A similar error occurs with this script: it is raised on the "resp = ..." line and complains that 'https' is an unknown url type.
Python was compiled with SSL support on my computer (Arch Linux). I've tried reinstalling python3 and openssl a few times, but that doesn't help. I haven't tried to uninstall python completely and then reinstall because I would also need to uninstall a lot of other programs on my computer.
Anyone know what's going on?
-----EDIT-----
I figured it out, thanks to help from Andrew Stevlov's answer. My url had a ":" in it, and I guess urllib didn't like that. I replaced it with "%3A" and now it's working. Thanks so much guys!!!
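For anyone landing here with the same symptom, here is a small sketch of the percent-encoding fix described in the edit. It assumes the ":" sat in the query part of the URL; the base URL and parameter names are made up, since the original baseurl/query values are not shown.

```python
from urllib.parse import quote, urlencode

# Hypothetical pieces; the real baseurl/query values are not in the question.
base_url = "https://example.com/search"
params = {"q": "time:12:30", "page": "1"}

# urlencode() percent-encodes reserved characters such as ":" in each value.
url = base_url + "?" + urlencode(params)
print(url)

# quote() does the same for a single component; safe="" also encodes "/".
encoded = quote("time:12:30", safe="")
print(encoded)
```

Encoding only the variable parts, rather than the whole URL, keeps the "://" after the scheme intact.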

This may help:
import ssl
import urllib.request
# Ignore SSL certificate errors (insecure; use for debugging only)
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
url = input('Enter - ')
html = urllib.request.urlopen(url, context=ctx).read()

Double-check your compilation options; it looks like something is wrong with your box.
At least the following code works for me:
from urllib.request import urlopen
resp = urlopen('https://github.com')
print(resp.read())

urllib.error.URLError: <urlopen error unknown url type: 'https>
The 'https (with a leading quote) rather than https in the error message indicates that you did not request an https:// URL but a 'https:// URL, a scheme that of course does not exist. Check how you construct your URL.
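A quick way to catch this kind of mistake before calling urlopen is to parse the URL and look at its scheme. This is only a sketch: the helper name clean_url is made up, and it assumes the stray characters are quotes or whitespace at the ends of the string.

```python
from urllib.parse import urlsplit

def clean_url(raw):
    """Strip stray whitespace and quote characters, then validate the scheme."""
    url = raw.strip().strip("'\"")
    scheme = urlsplit(url).scheme
    if scheme not in ("http", "https"):
        raise ValueError("unexpected URL scheme %r in %r" % (scheme, raw))
    return url

# A leading quote means the parser finds no valid scheme at all.
print(repr(urlsplit("'https://github.com").scheme))
print(clean_url("'https://github.com'"))
```

Printing repr(url) just before the request is often enough to spot a leading quote or space.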

I had the same error when I tried to open a url with https, but no errors with http.
>>> from urllib.request import urlopen
>>> urlopen('http://google.com')
<http.client.HTTPResponse object at 0xb770252c>
>>> urlopen('https://google.com')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.7/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/usr/local/lib/python3.7/urllib/request.py", line 525, in open
response = self._open(req, data)
File "/usr/local/lib/python3.7/urllib/request.py", line 548, in _open
'unknown_open', req)
File "/usr/local/lib/python3.7/urllib/request.py", line 503, in _call_chain
result = func(*args)
File "/usr/local/lib/python3.7/urllib/request.py", line 1387, in unknown_open
raise URLError('unknown url type: %s' % type)
urllib.error.URLError: <urlopen error unknown url type: https>
This was done on Ubuntu 16.04 using Python 3.7. Ubuntu's native Python is 3.5 in /usr/bin; I had previously downloaded the 3.7 source and installed it in /usr/local/bin. The fact that 3.5 gave no error pointed to OpenSSL support not being set up correctly in the 3.7 build, which is also evident below:
>>> import ssl
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/local/lib/python3.7/ssl.py", line 98, in <module>
import _ssl # if we can't import it, let the error propagate
ModuleNotFoundError: No module named '_ssl'
By consulting this link, I changed SSL=/usr/local/ssl to SSL=/usr in Modules/Setup.dist in the 3.7 source directory, copied that file to Setup, and then rebuilt Python 3.7.
$ ./configure
$ make
$ make install
Now it is fixed:
>>> import ssl
>>> ssl.OPENSSL_VERSION
'OpenSSL 1.0.2g 1 Mar 2016'
>>> urlopen('https://www.google.com')
<http.client.HTTPResponse object at 0xb74c4ecc>
>>> urlopen('https://www.google.com').read()
b'<!doctype html>...
and 3.7 has been compiled with OpenSSL support successfully. Note that a working "openssl version" command in the Ubuntu shell is not enough on its own; the check is only complete once the ssl module loads in Python.
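To see the effect of the rebuild from urllib's side, you can list the handlers that a default opener installs. This is a sketch, not part of the fix itself: when the ssl module cannot be imported, urllib.request does not define HTTPSHandler, so https URLs fall through to unknown_open, which is where the traceback above ends.

```python
import urllib.request

# build_opener() with no arguments installs the default handler set.
opener = urllib.request.build_opener()
handler_names = [type(h).__name__ for h in opener.handlers]

# HTTPSHandler is only present when Python was built with SSL support.
has_https = "HTTPSHandler" in handler_names

print(handler_names)
print("https handler installed:", has_https)
```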

Related

Unable to run this Python script in Command Prompt for web scraping

I've been following a web scraping tutorial on YouTube and I'm getting this error because of the URL. It works perfectly in Spyder, but all hell breaks loose when I try to run it in the Command Prompt.
I tried various answers from here (quote(), which converts ":" to "%3A", and a few others like urlencode()) but they didn't work. Or maybe I don't know how to make them work.
from bs4 import BeautifulSoup as soup
from urllib.request import urlopen as uReq
#from urllib.parse import quote
my_url = 'https://www.newegg.com/Video-Cards-Video-Devices/Category/ID-38?Tpk=graphics%20card'
# opening connection, grabbing information and closing connection
uClient = uReq(my_url)
This is the error message -
Traceback (most recent call last):
File "my_first_webscrape.py", line 8, in <module>
uClient = uReq(my_url)
File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 525, in open
response = self._open(req, data)
File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 548, in _open
'unknown_open', req)
File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 503, in _call_chain
result = func(*args)
File "C:\ProgramData\Anaconda3\lib\urllib\request.py", line 1387, in unknown_open
raise URLError('unknown url type: %s' % type)
urllib.error.URLError: <urlopen error unknown url type: https>
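One way to narrow this down, sketched under the assumption that Spyder and the Command Prompt start Python with different environments: run the same report in both places and compare the output. The function name ssl_report is made up for illustration.

```python
import sys

def ssl_report():
    """Collect which interpreter is running and whether ssl can be imported."""
    report = {"executable": sys.executable, "python": sys.version.split()[0]}
    try:
        import ssl
    except ImportError as exc:
        # Import fails when the _ssl extension or its OpenSSL libraries are missing.
        report["openssl"] = None
        report["error"] = str(exc)
    else:
        report["openssl"] = ssl.OPENSSL_VERSION
    return report

for key, value in ssl_report().items():
    print(key, "=", value)
```

If the Command Prompt run shows an import error while Spyder does not, the URL is fine and the difference lies in how the interpreter finds its SSL libraries.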

python3.6.5 urllib.error.URLError: <urlopen error unknown url type: https>

I installed Python 3.6.5 on CentOS 7, and make finished without errors. But when I test urlopen it raises the error below:
urllib.error.URLError: <urlopen error unknown url type: https>
I suspect it may be related to HTTPS, and I double-checked that I have installed openssl
->zlib-dev1
Any advice on it? Thanks
The python3 code:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import urllib.request
x = urllib.request.urlopen('https://www.google.com/')
print(x.read())
full error:
./url_test.py
Traceback (most recent call last):
File "./url_test.py", line 6, in <module>
x = urllib.request.urlopen('https://www.google.com/')
File "/usr/local/lib/python3.6/urllib/request.py", line 223, in urlopen
return opener.open(url, data, timeout)
File "/usr/local/lib/python3.6/urllib/request.py", line 526, in open
response = self._open(req, data)
File "/usr/local/lib/python3.6/urllib/request.py", line 549, in _open
'unknown_open', req)
File "/usr/local/lib/python3.6/urllib/request.py", line 504, in _call_chain
result = func(*args)
File "/usr/local/lib/python3.6/urllib/request.py", line 1388, in unknown_open
raise URLError('unknown url type: %s' % type)
urllib.error.URLError: <urlopen error unknown url type: https>
I finally figured out what caused the problem: I had not installed openssl-devel.
sudo yum install openssl-devel
After the installation completed, I double-checked:
yum list installed|grep openssl
openssl-devel.x86_64 1:1.0.2k-12.el7 #rhui-REGION-rhel-server-releases
Then rebuild:
./configure
make
Test it again. Done.
I got the same issue when installing poetry. Fixed that by just copying these two dll files
libcrypto-1_1-x64.dll
libssl-1_1-x64.dll
from ..\Anaconda3\Library\bin to ..\Anaconda3\DLLs directory.

Cannot run bind() with ldap3 in python3

Below is my code to try and query my ldap server. However, for some reason, I cannot seem to make the bind command work to progress in my code. I continue to get the following message and am not sure what it means or how to fix it. I apologize about the error message formatting. I tried my best to make it as readable as possible.
Any help would be greatly appreciated!
>>> from ldap3 import Server, Connection, SUBTREE
>>> server = Server('<server_name>')
>>> conn = Connection(server, user='<username>', password='<password>')
>>> conn.open()
>>> conn.bind()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3.5/site-packages/ldap3/core/connection.py", line 427, in bind
response = self.post_send_single_response(self.send('bindRequest', request, controls))
File "/usr/lib/python3.5/site-packages/ldap3/strategy/sync.py", line 122, in post_send_single_response
responses, result = self.get_response(message_id)
File "/usr/lib/python3.5/site-packages/ldap3/strategy/base.py", line 298, in get_response
responses = self._get_response(message_id)
File "/usr/lib/python3.5/site-packages/ldap3/strategy/sync.py", line 168, in _get_response
dict_response = self.decode_response(ldap_resp)
File "/usr/lib/python3.5/site-packages/ldap3/strategy/base.py", line 403, in decode_response
result = bind_response_to_dict(component)
File "/usr/lib/python3.5/site-packages/ldap3/operation/bind.py", line 119, in bind_response_to_dict
'saslCreds': bytes(response['serverSaslCreds']) if response['serverSaslCreds'] is not None else None}
File "/usr/lib/python3.5/site-packages/pyasn1/type/univ.py", line 984, in bytes
return bytes(self._value)
File "/usr/lib/python3.5/site-packages/pyasn1/type/base.py", line 164, in plug
raise error.PyAsn1Error('Uninitialized ASN.1 value ("%s" attribute looked up)' % name)
pyasn1.error.PyAsn1Error: Uninitialized ASN.1 value ("len" attribute looked up)

TypeError: Can't convert 'bytes' object to str implicitly for tweepy

from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener
ckey=''
csecret=''
atoken=''
asecret=''
class listener(StreamListener):
    def on_data(self,data):
        print(data)
        return True
    def on_error(self,status):
        print(status)
auth = OAuthHandler(ckey,csecret)
auth.set_access_token(atoken, asecret)
twitterStream = Stream(auth, listener())
twitterStream.filter(track="cricket")
This code filters the Twitter stream based on the track keyword, but I am getting the following traceback after running it. Can somebody please help?
Traceback (most recent call last):
File "lab.py", line 23, in <module>
twitterStream.filter(track="car".strip())
File "C:\Python34\lib\site-packages\tweepy\streaming.py", line 430, in filter
self._start(async)
File "C:\Python34\lib\site-packages\tweepy\streaming.py", line 346, in _start
self._run()
File "C:\Python34\lib\site-packages\tweepy\streaming.py", line 286, in _run
raise exception
File "C:\Python34\lib\site-packages\tweepy\streaming.py", line 255, in _run
self._read_loop(resp)
File "C:\Python34\lib\site-packages\tweepy\streaming.py", line 298, in _read_loop
line = buf.read_line().strip()
File "C:\Python34\lib\site-packages\tweepy\streaming.py", line 171, in read_line
self._buffer += self._stream.read(self._chunk_size)
TypeError: Can't convert 'bytes' object to str implicitly
I'm assuming you're using tweepy 3.4.0. The issue you've raised is open on GitHub (https://github.com/tweepy/tweepy/issues/615).
Two workarounds:
1)
In streaming.py:
I changed line 161 to
self._buffer += self._stream.read(read_len).decode('UTF-8', 'ignore')
and line 171 to
self._buffer += self._stream.read(self._chunk_size).decode('UTF-8', 'ignore')
and then reinstalled via python3 setup.py install on my local copy of tweepy.
2)
remove the tweepy 3.4.0 module, and install 3.3.0 using command: pip install -I tweepy==3.3.0
Hope that helps,
-A
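To illustrate what the first workaround changes, here is a standalone sketch that does not use tweepy at all. The byte chunks are made up, and the incremental decoder is an alternative I am suggesting, not something from the tweepy issue; it matters when a multi-byte character is split across two reads.

```python
import codecs

# Made-up stream data; the two bytes of "é" are split across the chunks.
chunks = [b"caf\xc3", b"\xa9\r\n"]

# Reproduce the failure: adding bytes to a str is a TypeError on Python 3.
try:
    "" + chunks[0]
except TypeError as exc:
    print("TypeError:", exc)

# Workaround from the answer: decode every chunk on its own.
per_chunk = "".join(c.decode("UTF-8", "ignore") for c in chunks)

# Alternative: an incremental decoder keeps partial characters between chunks.
decoder = codecs.getincrementaldecoder("utf-8")(errors="ignore")
incremental = "".join(decoder.decode(c) for c in chunks)

print(repr(per_chunk))
print(repr(incremental))
```

With 'ignore', per-chunk decoding silently drops a character that straddles a chunk boundary, while the incremental decoder preserves it.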
You can't do twitterStream.filter(track="car".strip()). Why are you adding strip()? It serves no purpose there.
track must be a str type before you invoke a connection to Twitter's Streaming API, and tweepy is preventing that connection because you're trying to add strip().
If for some reason you need it, you can do track_word='car'.strip() and then track=track_word, but even that is unnecessary because:
>>> print('car'.strip())
car
Also, the error you're getting does not match the code you have listed; the code that's in your question should work fine.

python3 github authorizations Oauth2 not working

I'm trying to implement an oauth2 client in Python3 so that I can upload files to github. For a very basic start I'm trying to get a list of authorizations using the API.
This code works:
from subprocess import Popen,PIPE
user = 'MYUSERNAME'
pw = 'MYPASSWORD'
git_url = "https://api.github.com/authorizations"
res = Popen(['curl','--user',user + ':' + pw,git_url],stdout=PIPE,stderr=PIPE).communicate()[0]
print(res)
This code does not work:
user = 'MYUSERNAME'
pw = 'MYPASSWORD'
git_url = "https://api.github.com/authorizations"
import urllib.request
# Create an OpenerDirector with support for Basic HTTP Authentication...
auth_handler = urllib.request.HTTPBasicAuthHandler()
auth_handler.add_password(realm=None,
uri=git_url,
user=user,
passwd=pw)
opener = urllib.request.build_opener(auth_handler)
f = opener.open(git_url)
print(f.read())
In fact, it generates this error:
Traceback (most recent call last):
File "demo.py", line 18, in <module>
f = opener.open("https://api.github.com/authorizations")
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/urllib/request.py", line 375, in open
response = meth(req, response)
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/urllib/request.py", line 487, in http_response
'http', request, response, code, msg, hdrs)
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/urllib/request.py", line 413, in error
return self._call_chain(*args)
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/urllib/request.py", line 347, in _call_chain
result = func(*args)
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.2/lib/python3.2/urllib/request.py", line 495, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found
I know that there is an existing Oauth2 implementation in python, but it's python2, not python3, and it does a lot more than I need.
I also know that I could just have my Python program call curl, and that's my fallback.
I'd really like to know what I'm doing wrong.
Thanks.
I have just posted an answer to another question with a full example using urllib2 from python2. Obviously you are interested in python3, but it shouldn't be too difficult to migrate the code.
Hope that helps,
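A likely explanation, offered as an assumption rather than a confirmed diagnosis: HTTPBasicAuthHandler only sends credentials after the server answers with a 401 challenge, while curl --user sends them on the first request. If GitHub answers the unauthenticated request with 404 instead of 401, the handler never gets a chance to retry. The sketch below sends the Authorization header up front; the helper name basic_auth_request is made up, and the credentials are placeholders.

```python
import base64
import urllib.request

def basic_auth_request(url, user, password):
    """Build a Request that carries HTTP Basic credentials on the first try."""
    token = base64.b64encode(("%s:%s" % (user, password)).encode("utf-8"))
    request = urllib.request.Request(url)
    request.add_header("Authorization", "Basic " + token.decode("ascii"))
    return request

# Placeholder credentials; nothing is sent over the network here.
req = basic_auth_request("https://api.github.com/authorizations", "user", "pw")
print(req.get_header("Authorization"))

# To actually send it: urllib.request.urlopen(req).read()
```

This mirrors what curl --user does, so it is a reasonable first thing to try before pulling in a full OAuth2 library.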

Resources