urlopen error [Errno 11001] getaddrinfo failed? - python-3.x

Hello everyone. I am a beginner Python programmer and I need help.
This is my code. It raises the following error; please help me fix it:
urllib.error.URLError: urlopen error [Errno 11001] getaddrinfo failed
# -*- coding: utf-8 -*-
import urllib.request
from lxml.html import parse

WEBSITE = 'http://allrecipes.com'
URL_PAGE = 'http://allrecipes.com/recipes/110/appetizers-and-snacks/deviled-eggs/?page='

def correct_str(s):
    return s.encode('utf-8').decode('ascii', 'ignore').strip()

for i in range(START_PAGE, END_PAGE+1):
    URL = URL_PAGE + str(i)
    HTML = urllib.request.urlopen(URL)
    page = parse(HTML).getroot()
    for elem in page.xpath('//*[@id="grid"]/article[not(contains(@class, "video-card"))]/a[1]'):
        href = WEBSITE + elem.get('href')
        title = correct_str(elem.find('h3').text)
        recipe_page = parse(urllib.request.urlopen(href)).getroot()
        photo_url = recipe_page.xpath('//img[@class="rec-photo"]')[0].get('src')
        print('\nName: |', title)
        print('Photo: |', photo_url)
When I run the script with python from the command prompt, I get this error:
Traceback (most recent call last):
File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 1240, in do_open
h.request(req.get_method(), req.selector, req.data, headers)
Name: | Crab-Stuffed Deviled Eggs
File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 1083, in request
Photo: | http://images.media-allrecipes.com/userphotos/720x405/1091564.jpg
self._send_request(method, url, body, headers)
File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 1128, in _send_request
File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 1079, in endheaders
File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 911, in _send_output
File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 854, in send
File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\http\client.py", line 826, in connect
(self.host,self.port), self.timeout, self.source_address)
File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\socket.py", line 693, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\socket.py", line 732, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 11001] getaddrinfo failed
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:/Users/Ivan/Dropbox/parser/test.py", line 27, in <module>
recipe_page = parse(urllib.request.urlopen(href)).getroot()
File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 162, in urlopen
return opener.open(url, data, timeout)
File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 465, in open
response = self._open(req, data)
File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 483, in _open
'_open', req)
File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 443, in _call_chain
result = func(*args)
File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 1268, in http_open
return self.do_open(http.client.HTTPConnection, req)
File "C:\Users\Ivan\AppData\Local\Programs\Python\Python35-32\lib\urllib\request.py", line 1242, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 11001] getaddrinfo failed>
Process finished with exit code 1

I'll attempt to explain three main ways to dig into a programming problem:
(1) Use a debugger. You could walk through your code and examine variables before they are used and before they throw an exception. Python comes with pdb. In this problem you would step through the code and print out the href before urlopen().
(2) Assertions. Use Python's assert to assert assumptions in your code. You could, for example, assert not href.startswith('http')
(3) Logging. Log relevant variables before they are used. This is what I used:
I added the following to your code...
href = WEBSITE + elem.get('href')
And got...
Photo: | http://images.media-allrecipes.com/userphotos/720x405/1091564.jpg
From here you can see your getaddrinfo problem: Your system is trying to open a url at a host named allrecipes.comhttp.
This looks to be a problem based upon your assumption that WEBSITE must be prepended to every href you pull from the html.
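The failure can be reproduced without touching the network. Below is a minimal sketch of the logging approach, assuming a hypothetical href that is already absolute (raw_href is made up for illustration; the real value comes from the page). It logs the concatenated URL and the host name that would be handed to getaddrinfo:

```python
import logging
from urllib.parse import urlsplit

logging.basicConfig(level=logging.DEBUG)

WEBSITE = 'http://allrecipes.com'
# Hypothetical value: an href that is already an absolute URL.
raw_href = 'http://example.com/some-recipe/'

# Same concatenation as in the question.
href = WEBSITE + raw_href
logging.debug('href before urlopen: %s', href)

# Everything between '//' and the next '/' is treated as host[:port],
# so this is the name the resolver is asked to look up.
bad_host = urlsplit(href).hostname
logging.debug('host that would be resolved: %s', bad_host)
```

Seeing the host printed this way makes it clear that the error is a malformed URL rather than a network outage.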
You can handle absolute and relative hrefs separately with a small function that determines whether a URL is absolute:
from urllib.parse import urlparse  # Python 3; the standalone urlparse module is Python 2 only

def is_absolute(url):
    # See https://stackoverflow.com/questions/8357098/how-can-i-check-if-a-url-is-absolute-using-python
    return bool(urlparse(url).netloc)

href = elem.get('href')
if not is_absolute(href):
    href = WEBSITE + href

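A better way to do this is to use urljoin from urllib.parse:
from urllib.parse import urljoin
href = urljoin(base_url, href)
This returns a complete URL when the href is relative and leaves it unchanged when it is already absolute. Importing urljoin directly, rather than using from urllib import parse, avoids shadowing the parse function that the question imports from lxml.html.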
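To illustrate the behaviour, here is a small sketch that joins the question's WEBSITE with one relative and one absolute href. Both href values are made up for the example:

```python
from urllib.parse import urljoin

WEBSITE = 'http://allrecipes.com'

# Hypothetical hrefs: one relative to the site root, one already absolute.
relative = urljoin(WEBSITE, '/recipe/example/')
absolute = urljoin(WEBSITE, 'http://images.example.com/photo.jpg')

print(relative)  # the base host is prepended
print(absolute)  # returned as-is, the base is ignored
```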


Using urllib in a loop

I am trying to write a simple Python script that checks the status code of a specific URL and takes the necessary action based on the return code. I am using the urllib module for this. The issue is that I have several departments defined in a list, which I need to iterate over in the urllib call, and I can't find a way to substitute the list elements into the URL inside the for loop.
import getopt, sys
import urllib.request
depts = [ 'support', 'edelivery', 'docs']
for dept in depts:
    res = urllib.request.urlopen('https://dept.oracle.com').getcode()
I get the error below. Is it actually possible to iterate over a list in a loop with the urllib module?
#python3 reg_c_with_all.py
Traceback (most recent call last):
File "/usr/lib64/python3.7/urllib/request.py", line 1350, in do_open
File "/usr/lib64/python3.7/http/client.py", line 1277, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/lib64/python3.7/http/client.py", line 1323, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/usr/lib64/python3.7/http/client.py", line 1272, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/lib64/python3.7/http/client.py", line 1032, in _send_output
File "/usr/lib64/python3.7/http/client.py", line 972, in send
File "/usr/lib64/python3.7/http/client.py", line 1439, in connect
File "/usr/lib64/python3.7/http/client.py", line 944, in connect
(self.host,self.port), self.timeout, self.source_address)
File "/usr/lib64/python3.7/socket.py", line 707, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
File "/usr/lib64/python3.7/socket.py", line 752, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "reg_c_with_all.py", line 6, in <module>
res = urllib.request.urlopen('https://dept.oracle.com').getcode()
File "/usr/lib64/python3.7/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib64/python3.7/urllib/request.py", line 525, in open
response = self._open(req, data)
File "/usr/lib64/python3.7/urllib/request.py", line 543, in _open
'_open', req)
File "/usr/lib64/python3.7/urllib/request.py", line 503, in _call_chain
result = func(*args)
File "/usr/lib64/python3.7/urllib/request.py", line 1393, in https_open
context=self._context, check_hostname=self._check_hostname)
File "/usr/lib64/python3.7/urllib/request.py", line 1352, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [Errno -2] Name or service not known>
When I run it individually it returns the response code as expected.
>>> urllib.request.urlopen('https://support.oracle.com').getcode()
>>> urllib.request.urlopen('https://docs.oracle.com').getcode()
>>> urllib.request.urlopen('https://edelivery.oracle.com').getcode()
Your code iterates over the departments, but it does not use them to build the URL, so every iteration requests the literal host dept.oracle.com. Try using an f-string to insert the value of dept into the URL:
import getopt, sys
import urllib.request
depts = [ 'support', 'edelivery', 'docs']
for dept in depts:
    url = f'https://{dept}.oracle.com'
    res = urllib.request.urlopen(url).getcode()
    print(f'{url} => {res}')
Output:
https://support.oracle.com => 200
https://edelivery.oracle.com => 200
https://docs.oracle.com => 200
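If one of the hosts might be unreachable, the loop can be made more forgiving. The following is only a sketch: build_urls and status_for are hypothetical helper names, and the opener parameter exists purely so the function can be exercised without a network connection.

```python
import urllib.error
import urllib.request


def build_urls(depts):
    """Turn department names into the URLs to check."""
    return [f'https://{dept}.oracle.com' for dept in depts]


def status_for(url, opener=urllib.request.urlopen):
    """Return the HTTP status code, or None if the host cannot be reached."""
    try:
        with opener(url, timeout=10) as response:
            return response.getcode()
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except urllib.error.URLError:
        # DNS failure, refused connection, and similar problems.
        return None
```

A loop such as for url in build_urls(depts): print(url, status_for(url)) then keeps going even when a single lookup fails.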

Another exception occurred when Checking Connection status with a try catch in urllib.error

I tried to write a simple program to check whether the PC has an internet connection.
When the PC is connected to the internet, the program runs correctly and prints "Cool, thank you for connecting me to the internet". But when I run the program without a connection, an error occurs with the note: During handling of the above exception, another exception occurred.
What am I missing?
import urllib.request
import urllib.error
loop_value = 1
while (loop_value == 1):
except urllib.error as e:
print ("Run me again, after connecting")
print ("Cool, thank you for connecting me to internet")
loop_value = 0
the error-
Traceback (most recent call last):
File "C:\Python\Python37\lib\urllib\request.py", line 1317, in do_open
File "C:\Python\Python37\lib\http\client.py", line 1229, in request
self._send_request(method, url, body, headers, encode_chunked)
File "C:\Python\Python37\lib\http\client.py", line 1275, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "C:\Python\Python37\lib\http\client.py", line 1224, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "C:\Python\Python37\lib\http\client.py", line 1016, in _send_output
File "C:\Python\Python37\lib\http\client.py", line 956, in send
File "C:\Python\Python37\lib\http\client.py", line 1384, in connect
File "C:\Python\Python37\lib\http\client.py", line 928, in connect
(self.host,self.port), self.timeout, self.source_address)
File "C:\Python\Python37\lib\socket.py", line 707, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
File "C:\Python\Python37\lib\socket.py", line 748, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno 11001] getaddrinfo failed
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "F:/py/weathr.py", line 10, in <module>
File "C:\Python\Python37\lib\urllib\request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "C:\Python\Python37\lib\urllib\request.py", line 525, in open
response = self._open(req, data)
File "C:\Python\Python37\lib\urllib\request.py", line 543, in _open
'_open', req)
File "C:\Python\Python37\lib\urllib\request.py", line 503, in _call_chain
result = func(*args)
File "C:\Python\Python37\lib\urllib\request.py", line 1360, in https_open
context=self._context, check_hostname=self._check_hostname)
File "C:\Python\Python37\lib\urllib\request.py", line 1319, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 11001] getaddrinfo failed>
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "F:/py/weathr.py", line 12, in <module>
except urllib.error as e:
TypeError: catching classes that do not inherit from BaseException is not allowed
Your error means that you're trying to catch something that isn't an exception class (i.e. it is not a subclass of BaseException). urllib.error is a module, not an exception. Only exceptions can be caught, because only exceptions can be raised.
According to the docs, urllib.request.urlopen raises URLError if something goes wrong, so catch that in order to react to an error.
Example implementation:
except urllib.error.URLError as e:
    print("Couldn't open foo.com. Error:")
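For completeness, here is a minimal sketch of the whole check as a function. The name is_online, the timeout value, and the opener parameter are assumptions added for illustration (the opener only makes the function testable offline); the URL to probe is left to the caller.

```python
import urllib.error
import urllib.request


def is_online(url, timeout=5, opener=urllib.request.urlopen):
    """Return True if the URL can be opened, False on a URLError."""
    try:
        # Open and immediately close the response; only reachability matters.
        opener(url, timeout=timeout).close()
    except urllib.error.URLError:
        # Catch the exception class, not the urllib.error module.
        return False
    return True
```

The original while loop can then call is_online(...) and print the "Run me again" message until it returns True.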

Selenium - urllib.error.URLError: <urlopen error [Errno 61] Connection refused>

Note: I spent more than an hour trying to solve this issue and found no solution that worked for me.
In the end it turned out to be a very simple mistake, but I decided to post the question anyway so that anybody else with the same issue can find a solution quickly.
I was trying to scrape a site with the following code:
phantomjs_path = '/Users/xxx/xxx/phantomjs-2.1.1-macosx/bin/phantomjs'
driver = webdriver.PhantomJS(executable_path=phantomjs_path)
driver.set_window_size(1024, 768) #optional
# wait
element = WebDriverWait(driver, 20).until(
EC.presence_of_element_located((By.CLASS_NAME, "flightrow")))
response = driver.find_element_by_css_selector('table[class="flighttable"]')
html = response.get_attribute('outerHTML') #pass from webdrive object to string
And was getting the following error:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 1254, in do_open
h.request(req.get_method(), req.selector, req.data, headers)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", line 1106, in request
self._send_request(method, url, body, headers)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", line 1151, in _send_request
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", line 1102, in endheaders
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", line 934, in _send_output
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", line 877, in send
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/http/client.py", line 849, in connect
(self.host,self.port), self.timeout, self.source_address)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/socket.py", line 711, in create_connection
raise err
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/socket.py", line 702, in create_connection
ConnectionRefusedError: [Errno 61] Connection refused
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "my_script.py", line 1251, in <module>
File "my_script.py", line 1232, in script_main
File "my_script.py", line 1202, in parse_js
print('response:', response.text)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/selenium/webdriver/remote/webelement.py", line 68, in text
return self._execute(Command.GET_ELEMENT_TEXT)['value']
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/selenium/webdriver/remote/webelement.py", line 461, in _execute
return self._parent.execute(command, params)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/selenium/webdriver/remote/webdriver.py", line 234, in execute
response = self.command_executor.execute(driver_command, params)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/selenium/webdriver/remote/remote_connection.py", line 401, in execute
return self._request(command_info[0], url, body=data)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/selenium/webdriver/remote/remote_connection.py", line 471, in _request
resp = opener.open(request, timeout=self._timeout)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 466, in open
response = self._open(req, data)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 484, in _open
'_open', req)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 444, in _call_chain
result = func(*args)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 1282, in http_open
return self.do_open(http.client.HTTPConnection, req)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/urllib/request.py", line 1256, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 61] Connection refused>
Loading the url manually in the Chrome browser was working.
Anyway, I tried switching the url from https to http, but I still got the same error.
In addition, during the previous day I did not get any error, so I assumed it could not be a problem with firewalls, as I read in some other questions.
See answer for the solution...
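It turned out that I had accidentally moved the line driver.quit() upwards, so the error was raised when calling get_attribute. Element objects send their commands through the driver, so once the driver has quit there is nothing left to connect to, hence the refused connection.
Just move driver.quit() downwards: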
driver = webdriver.PhantomJS(executable_path=phantomjs_path)
driver.set_window_size(1024, 768) #optional
# wait
element = WebDriverWait(driver, 20).until(
EC.presence_of_element_located((By.CLASS_NAME, "flightrow")))
response = driver.find_element_by_css_selector('table[class="flighttable"]')
html = response.get_attribute('outerHTML') #pass from webdrive object to string
# Do not move quit() upwards! Even though 'driver' is not named in the get_attribute() call,
# that call raises an error if the driver has already been closed.
driver.quit()
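One way to make this mistake harder to repeat is to let a context manager call quit() for you. This is only a sketch: quitting is a hypothetical helper, not part of Selenium, and it works with any object that has a quit() method.

```python
from contextlib import contextmanager


@contextmanager
def quitting(driver):
    """Yield the driver and call driver.quit() only when the block ends."""
    try:
        yield driver
    finally:
        # Runs after the with-block, even if the block raised an exception.
        driver.quit()


# Intended usage, with the driver and element lookups from the code above:
#
# with quitting(webdriver.PhantomJS(executable_path=phantomjs_path)) as driver:
#     response = driver.find_element_by_css_selector('table[class="flighttable"]')
#     html = response.get_attribute('outerHTML')
```

Everything that needs the live driver goes inside the with-block, so quit() cannot end up above an element call by accident.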

Python - How to open urls (html files) which are located on an internal network with urlopen?

The addresses I am trying to open look like this:
"//usa-qa/QAEngr/QA_Team_VSripathi/Negar/Help files/broken/header_test/main.htm"
They are located on a company's internal network.
I tried this piece of code:
import urllib
import urllib.request as ur
address="//usa-qa/QAEngr/QA_Team_VSripathi/Negar/Help files/broken/header_test/main.htm"
which results in the following error:
Traceback (most recent call last):
File "C:/Users/naref/Desktop/networktest.py", line 14, in <module>
File "C:\Python33\lib\urllib\request.py", line 160, in urlopen
return opener.open(url, data, timeout)
File "C:\Python33\lib\urllib\request.py", line 458, in open
req = Request(fullurl, data)
File "C:\Python33\lib\urllib\request.py", line 279, in __init__
File "C:\Python33\lib\urllib\request.py", line 284, in _parse
raise ValueError("unknown url type: %s" % self.full_url)
ValueError: unknown url type: //usa-qa/QAEngr/QA_Team_VSripathi/Negar/Help files/broken/header_test/main.htm
The same happens when converting the path to url:
Then I tried to add an "http:" to the beginning of the address:
address="http://usa-qa/QAEngr/QA_Team_VSripathi/Negar/Help files/broken/header_test/main.htm"
Which results in the following error:
Traceback (most recent call last):
File "C:/Users/naref/Desktop/networktest.py", line 14, in <module>
File "C:\Python33\lib\urllib\request.py", line 160, in urlopen
return opener.open(url, data, timeout)
File "C:\Python33\lib\urllib\request.py", line 473, in open
response = self._open(req, data)
File "C:\Python33\lib\urllib\request.py", line 491, in _open
'_open', req)
File "C:\Python33\lib\urllib\request.py", line 451, in _call_chain
result = func(*args)
File "C:\Python33\lib\urllib\request.py", line 1272, in http_open
return self.do_open(http.client.HTTPConnection, req)
File "C:\Python33\lib\urllib\request.py", line 1255, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [WinError 10061] No connection could be made because the target machine actively refused it>
Interestingly, the problem goes away when I map the network drive on my computer, which runs Windows 8 (http://windows.microsoft.com/en-us/windows-8/create-shortcut-to-map-network-drive).
The file can also be opened with the built-in open function instead of urlopen. However, I would prefer to use urlopen, because for my purposes the address might also be a website's URL.
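Since open already works for the network path, one possible way to keep a single code path is to dispatch on the URL scheme. This is a sketch under that assumption; is_url and open_address are hypothetical names, and the set of schemes is a guess at what the script needs to support.

```python
import urllib.request
from urllib.parse import urlsplit

# Schemes handed to urlopen. Anything else, including UNC paths like
# //server/share/file.htm and drive paths like C:/dir/file.htm, is
# treated as a file path.
URL_SCHEMES = {'http', 'https', 'ftp', 'file'}


def is_url(address):
    """Return True if the address starts with one of the known URL schemes."""
    return urlsplit(address).scheme.lower() in URL_SCHEMES


def open_address(address):
    """Return a binary file-like object for a web URL or a file path."""
    if is_url(address):
        return urllib.request.urlopen(address)
    return open(address, 'rb')
```

Both branches return an object with read() that can be used in a with-statement, so the calling code does not need to know which kind of address it was given.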

ValueError: invalid literal for int() with base 10: urllib

I am not a Python guru; I am just writing some code to check my API authentication and URL access status. I want to make sure that my API and domain URL are accessible to users.
To that end I am writing a Python script that performs the check, so that a cron job can send me an alert.
Here is my code:
def check(argv):
    # I'm going to use argparse. It makes
    # command-line args a breeze.
    parser = argparse.ArgumentParser()
    parser.add_argument('-H', '--hostname', dest='hostname', required=True)
    parser.add_argument('-a', '--auth_id', dest='authenticationid')
    parser.add_argument('-t', '--auth_token', dest='authenticationtoken')
    parser.add_argument('-r', '--dest_url', dest='dest_url', help="""Path to report relative to root, like /v1/ OR /""", required=True)
    parser.add_argument("-q", "--quiet", action="store_false", dest="verbose", default=True,
                        help="don't print status messages to stdout")
    args = vars(parser.parse_args())
    if args['authenticationid'] and args['authenticationtoken'] and not len(sys.argv) == 7:
        authurl = urllib.request.Request('https://{%s}:{%s}@%s%s/%s/' % (args['authenticationid'], args['authenticationtoken'], args['hostname'], args['dest_url'], args['authenticationid']))
        return (getAuthResponseCode(authurl))
    url = urllib.request.Request("https://%s%s" % (args['hostname'], args['dest_url']))

def getResponseCode(url):
    try:
        conn = urllib.request.urlopen(url, timeout=10)
        code = conn.getcode()
        return (status['OK'], code)
    except timeout:
        return (status['WARNING'], logging.error('socket timed out - URL %s', url))
    except urllib.error.URLError as e:
        return (status['CRITICAL'], e.reason)
    return (status['UNKNOWN'])

def getAuthResponseCode(authurl):
    try:
        authconn = urllib.request.urlopen(authurl, timeout=10)
        authcode = authconn.getcode()
        return (status['OK'], authcode)
    except timeout:
        return (status['WARNING'], logging.error('socket timed out - URL %s'))
    except urllib.error.URLError as err:
        return (status['CRITICAL'], err.reason)
    return (status['UNKNOWN'])
ERROR Message:
G:\Python>python check_http.py -H api.mydomain.com -r /API/Function/ -a 'MAMZMZZGVLMG
Traceback (most recent call last):
File "C:\Python33\lib\http\client.py", line 770, in _set_hostport
port = int(host[i+1:])
ValueError: invalid literal for int() with base 10: "{'DafniisfnsifnsifsbANBBDSDNBISDExODZlODAw
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "check_http.py", line 76, in <module>
print (check(sys.argv[1:]))
File "check_http.py", line 41, in check
return (getAuthResponseCode(authurl))
File "check_http.py", line 61, in getAuthResponseCode
authconn = urllib.request.urlopen(authurl, timeout=10)
File "C:\Python33\lib\urllib\request.py", line 156, in urlopen
return opener.open(url, data, timeout)
File "C:\Python33\lib\urllib\request.py", line 469, in open
response = self._open(req, data)
File "C:\Python33\lib\urllib\request.py", line 487, in _open
'_open', req)
File "C:\Python33\lib\urllib\request.py", line 447, in _call_chain
result = func(*args)
File "C:\Python33\lib\urllib\request.py", line 1283, in https_open
context=self._context, check_hostname=self._check_hostname)
File "C:\Python33\lib\urllib\request.py", line 1219, in do_open
h = http_class(host, timeout=req.timeout, **http_conn_args)
File "C:\Python33\lib\http\client.py", line 1172, in __init__
File "C:\Python33\lib\http\client.py", line 749, in __init__
self._set_hostport(host, port)
File "C:\Python33\lib\http\client.py", line 775, in _set_hostport
raise InvalidURL("nonnumeric port: '%s'" % host[i+1:])
http.client.InvalidURL: nonnumeric port: '{'DafniisfnsifnsifsbANBBDSDNBISDExODZlODAw
I know this is not a "write my code" forum, but I am stuck and looking for some help. If someone can tell me exactly where I went wrong, I can fix it.
I am using Python 3.
You're passing 'https://user:pass@whatever' as the URL.
Python doesn't understand that you're trying to authenticate, and thinks you're passing 'https://domain:port...'
To do basic auth with urllib, you need to use a urllib.request.HTTPBasicAuthHandler.
Sorry I'm not posting links and/or example code, but I'm typing this on my phone, and it makes those a pain.
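For readers who want to see the handler in use, here is a minimal sketch. build_auth_opener is a hypothetical helper name, and the URL and credentials in the usage comment are placeholders, not values from the question.

```python
import urllib.request


def build_auth_opener(top_level_url, user, password):
    """Return an opener that answers HTTP Basic auth challenges for top_level_url."""
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # A realm of None means "use these credentials for any realm at this URL".
    password_mgr.add_password(None, top_level_url, user, password)
    handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
    return urllib.request.build_opener(handler)


# Intended usage (placeholder values):
#
# opener = build_auth_opener('https://api.example.com/', 'my_auth_id', 'my_auth_token')
# code = opener.open('https://api.example.com/v1/', timeout=10).getcode()
```

With this approach the credentials stay out of the URL, so the host is no longer misread as host:port. The handler sends the credentials in response to a 401 challenge from the server.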
