Python script on ubuntu - OSError: [Errno 12] Cannot allocate memory - python-3.x

I am running a script on AWS (Ubunut) EC2 instance. It's a web scraper that uses selenium/chromedriver and headless chrome to scrape some webpages. I've had this script running previously with no problems, but today I'm getting an error. Here's the script:
options = Options()
options.add_argument('--no-sandbox')
options.add_argument('--window-size=1420,1080')
options.add_argument('--headless')
options.add_argument('--disable-dev-shm-usage')
options.add_argument('--disable-gpu')
options.add_argument("--disable-notifications")
options.binary_location='/usr/bin/chromium-browser'
driver = webdriver.Chrome(chrome_options=options)
#Set base url (SAN FRANCISCO)
base_url = 'https://www.bandsintown.com/en/c/san-francisco-ca?page='
events = []
for i in range(1,90):
#cycle through pages in range
driver.get(base_url + str(i))
pageURL = base_url + str(i)
print(pageURL)
When I run this script from ubuntu, I get this error:
Traceback (most recent call last):
File "BandsInTown_Scraper_SF.py", line 91, in <module>
driver = webdriver.Chrome(chrome_options=options)
File "/home/ubuntu/.local/lib/python3.6/site-packages/selenium/webdriver/chrome/webdriver.py", line 73, in __init__
self.service.start()
File "/home/ubuntu/.local/lib/python3.6/site-packages/selenium/webdriver/common/service.py", line 76, in start
stdin=PIPE)
File "/usr/lib/python3.6/subprocess.py", line 729, in __init__
restore_signals, start_new_session)
File "/usr/lib/python3.6/subprocess.py", line 1295, in _execute_child
restore_signals, start_new_session, preexec_fn)
OSError: [Errno 12] Cannot allocate memory
I confirmed that I'm running the same version of Chromedriver/Chromium Browser:
ChromeDriver 79.0.3945.130 (e22de67c28798d98833a7137c0e22876237fc40a-refs/branch-heads/3945#{#1047})
Chromium 79.0.3945.130 Built on Ubuntu , running on Ubuntu 18.04
For what it's worth, I have this running on a mac, and I do have multiple web scraping scripts like this one running on the same EC2 instance (only 2 scripts so far, so not that much).
Update
I'm now getting these errors as well when trying to run this script on ubuntu:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 141, in _new_conn
(self.host, self.port), self.timeout, **extra_kw)
File "/usr/lib/python3/dist-packages/urllib3/util/connection.py", line 60, in create_connection
for res in socket.getaddrinfo(host, port, family, socket.SOCK_STREAM):
File "/usr/lib/python3.6/socket.py", line 745, in getaddrinfo
for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -3] Temporary failure in name resolution
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 601, in urlopen
chunked=chunked)
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 346, in _make_request
self._validate_conn(conn)
File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 852, in _validate_conn
conn.connect()
File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 284, in connect
conn = self._new_conn()
File "/usr/lib/python3/dist-packages/urllib3/connection.py", line 150, in _new_conn
self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.VerifiedHTTPSConnection object at 0x7f90945757f0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/ubuntu/.local/lib/python3.6/site-packages/requests/adapters.py", line 449, in send
timeout=timeout
^[[B File "/usr/lib/python3/dist-packages/urllib3/connectionpool.py", line 639, in urlopen
^[[B^[[A^[[A _stacktrace=sys.exc_info()[2])
File "/usr/lib/python3/dist-packages/urllib3/util/retry.py", line 398, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='www.bandsintown.com', port=443): Max retries exceeded with url: /en/c/san-francisco-ca?page=6 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f90945757f0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "BandsInTown_Scraper_SF.py", line 39, in <module>
res = requests.get(url)
File "/home/ubuntu/.local/lib/python3.6/site-packages/requests/api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "/home/ubuntu/.local/lib/python3.6/site-packages/requests/api.py", line 60, in request
return session.request(method=method, url=url, **kwargs)
File "/home/ubuntu/.local/lib/python3.6/site-packages/requests/sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "/home/ubuntu/.local/lib/python3.6/site-packages/requests/sessions.py", line 646, in send
r = adapter.send(request, **kwargs)
File "/home/ubuntu/.local/lib/python3.6/site-packages/requests/adapters.py", line 516, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='www.bandsintown.com', port=443): Max retries exceeded with url: /en/c/san-francisco-ca?page=6 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f90945757f0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))
Finally, here's my currently monthly AWS usage, which doesn't show any memory quota being exceed.

This error message...
restore_signals, start_new_session, preexec_fn)
OSError: [Errno 12] Cannot allocate memory
...implies that the operating system was unable to allocate memory to initiate/spawn a new session.
Additionally, this error message...
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='www.bandsintown.com', port=443): Max retries exceeded with url: /en/c/san-francisco-ca?page=6 (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f90945757f0>: Failed to establish a new connection: [Errno -3] Temporary failure in name resolution',))
...implies that your program have successfully iterated till Page 5 and while on Page 6 you see this error.
I don't see any issues in your code block as such. I have taken your code, made some minor adjustments and here is the execution result:
Code Block:
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
base_url = 'https://www.bandsintown.com/en/c/san-francisco-ca?page='
for i in range(1,10):
#cycle through pages in range
driver.get(base_url + str(i))
pageURL = base_url + str(i)
print(pageURL)
Console Output:
https://www.bandsintown.com/en/c/san-francisco-ca?page=1
https://www.bandsintown.com/en/c/san-francisco-ca?page=2
https://www.bandsintown.com/en/c/san-francisco-ca?page=3
https://www.bandsintown.com/en/c/san-francisco-ca?page=4
https://www.bandsintown.com/en/c/san-francisco-ca?page=5
https://www.bandsintown.com/en/c/san-francisco-ca?page=6
https://www.bandsintown.com/en/c/san-francisco-ca?page=7
https://www.bandsintown.com/en/c/san-francisco-ca?page=8
https://www.bandsintown.com/en/c/san-francisco-ca?page=9
Deep dive
This error is coming from subprocess.py:
self.pid = _posixsubprocess.fork_exec(
args, executable_list,
close_fds, tuple(sorted(map(int, fds_to_keep))),
cwd, env_list,
p2cread, p2cwrite, c2pread, c2pwrite,
errread, errwrite,
errpipe_read, errpipe_write,
restore_signals, start_new_session, preexec_fn)
However, as per the discussion in OSError: [Errno 12] Cannot allocate memory this error OSError: [Errno 12] Cannot allocate memory is related to RAM / SWAP.
Swap Space
Swap Space is the memory space in the system hard drive that has been designated as a place for the os to temporarily store data which it can no longer hold with in the RAM. This gives you the ability to increase the amount of data your program can keep in its working memory. The swap space on the hard drive will be used primarily when there is no longer sufficient space in RAM to hold in-use application data. However, the information written to I/O will be significantly slower than information kept in RAM, but the operating system will prefer to keep running application data in memory and use swap space for the older data. Deploying swap space as a fall back for when your system’s RAM is depleted is a safety measure against out-of-memory issues on systems with non-SSD storage available.
System Check
To check if the system already has some swap space available, you need to execute the following command:
$ sudo swapon --show
If you don’t get any output, that means your system does not have swap space available currently. You can also verify that there is no active swap using the free utility as follows:
$ free -h
If there is no active swap in the system you will see an output as:
Output
total used free shared buff/cache available
Mem: 488M 36M 104M 652K 348M 426M
Swap: 0B 0B 0B
Creating Swap File
In these cases you need to allocate space for swap to use as a separate partition devoted to the task and you can create a swap file that resides on an existing partition. To create a 1 Gigabyte file you need to execute the following command:
$ sudo fallocate -l 1G /swapfile
You can verify that the correct amount of space was reserved by executing the following command:
$ ls -lh /swapfile
#Output
$ -rw-r--r-- 1 root root 1.0G Mar 08 10:30 /swapfile
This confirms the swap file has been created with the correct amount of space set aside.
Enabling the Swap Space
Once the correct size file is available we need to actually turn this into swap space. Now you need to lock down the permissions of the file so that only the users with specific privileges can read the contents. This prevents unintended users from being able to access the file, which would have significant security implications. So you need to follow the steps below:
Make the file only accessible to specific user e.g. root by executing the following command:
$ sudo chmod 600 /swapfile
Verify the permissions change by executing the following command:
$ ls -lh /swapfile
#Output
-rw------- 1 root root 1.0G Apr 25 11:14 /swapfile
This confirms only the root user has the read and write flags enabled.
Now you need to mark the file as swap space by executing the following command:
$ sudo mkswap /swapfile
#Sample Output
Setting up swapspace version 1, size = 1024 MiB (1073737728 bytes)
no label, UUID=6e965805-2ab9-450f-aed6-577e74089dbf
Next you need to enable the swap file, allowing the system to start utilizing it executing the following command:
$ sudo swapon /swapfile
You can verify that the swap is available by executing the following command:
$ sudo swapon --show
#Sample Output
NAME TYPE SIZE USED PRIO
/swapfile file 1024M 0B -1
Finally check the output of the free utility again to validate the settings by executing the following command:
$ free -h
#Sample Output
total used free shared buff/cache available
Mem: 488M 37M 96M 652K 354M 425M
Swap: 1.0G 0B 1.0G
Conclusion
Once the Swap Space has been set up successfully the underlying operating system will begin to use it as necessary.

Probably what has happened is that Chromium browser is updated, now takes more memory (or perhaps leaks memory worse..you don't say how many urls it gets before dying)
As a work around, launch a larger instance size. Do don't say what instance size you are using but if you have a t3.micro try a t3.medium instead.
There is an easy to understand chart here https://www.ec2instances.info/?region=eu-west-1
If you have launched an instance and want to resize it without rebuilding from scratch then use the console to take it to state stopped, alter the size and start again

Related

Airflow scheduler starts up with exception when parallelism is set to a large number

I am new to Airflow and I am trying to use airflow to build a data pipeline, but it keeps getting some exceptions. My airflow.cfg look like this:
executor = LocalExecutor
sql_alchemy_conn = postgresql+psycopg2://airflow:airflow#localhost/airflow
sql_alchemy_pool_size = 5
parallelism = 96
dag_concurrency = 96
worker_concurrency = 96
max_threads = 96
broker_url = postgresql+psycopg2://airflow:airflow#localhost/airflow
result_backend = postgresql+psycopg2://airflow:airflow#localhost/airflow
When I started up airflow webserver -p 8080 in one terminal and then airflow scheduler in another terminal, the scheduler run will have the following execption(It failed when I set the parallelism number greater some amount, it works fine otherwise, this may be computer-specific but at least we know that it is resulted by the parallelism). I have tried run 1000 python processes on my computer and it worked fine, I have configured Postgres to allow maximum 500 database connections but it is still giving me the errors.
[2019-11-20 12:15:00,820] {dag_processing.py:556} INFO - Launched DagFileProcessorManager with pid: 85050
Process QueuedLocalWorker-18:
Traceback (most recent call last):
File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/managers.py", line 811, in _callmethod
conn = self._tls.connection
AttributeError: 'ForkAwareLocal' object has no attribute 'connection'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/Users/edward/.local/share/virtualenvs/avat-utils-JpGzQGRW/lib/python3.7/site-packages/airflow/executors/local_executor.py", line 111, in run
key, command = self.task_queue.get()
File "<string>", line 2, in get
File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/managers.py", line 815, in _callmethod
self._connect()
File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/managers.py", line 802, in _connect
conn = self._Client(self._token.address, authkey=self._authkey)
File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/connection.py", line 492, in Client
c = SocketClient(address)
File "/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/connection.py", line 619, in SocketClient
s.connect(address)
ConnectionRefusedError: [Errno 61] Connection refused
Thanks
Updated: I tried run in Pycharm, and it worked fine in Pycharm but sometimes failed in the terminal and sometimes it's not
I had the same issue. Turns out I had set max_threads=10 in airflow.cfg in combination with LocalExecutor. Switching max_threads=2 solved the issue.
Found out few days ago, Airflow actually starts up all the parallel process when starting up, I was thinking max_sth and parallelism as the capacity but it is the number of processes it will run when start up. So it looks like this issue is caused by the insufficient resources of the computer.

Selenium urllib3.exceptions.MaxRetryError: [duplicate]

I have one question:I want to test "select" and "input".can I write it like the code below:
original code:
12 class Sinaselecttest(unittest.TestCase):
13
14 def setUp(self):
15 binary = FirefoxBinary('/usr/local/firefox/firefox')
16 self.driver = webdriver.Firefox(firefox_binary=binary)
17
18 def test_select_in_sina(self):
19 driver = self.driver
20 driver.get("https://www.sina.com.cn/")
21 try:
22 WebDriverWait(driver,30).until(
23 ec.visibility_of_element_located((By.XPATH,"/html/body/div[9]/div/div[1]/form/div[3]/input"))
24 )
25 finally:
26 driver.quit()
# #测试select功能
27 select=Select(driver.find_element_by_xpath("//*[#id='slt_01']")).select_by_value("微博")
28 element=driver.find_element_by_xpath("/html/body/div[9]/div/div[1]/form/div[3]/input")
29 element.send_keys("杨幂")
30 driver.find_element_by_xpath("/html/body/div[9]/div/div[1]/form/input").click()
31 driver.implicitly_wait(5)
32 def tearDown(self):
33 self.driver.close()
I want to test Selenium "select" function.so I choose sina website to select one option and input text in textarea.then search it .but when I run this test,it has error:
Traceback (most recent call last):
File "test_sina_select.py", line 32, in tearDown
self.driver.close()
File "/usr/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 688, in close
self.execute(Command.CLOSE)
File "/usr/lib/python2.7/site-packages/selenium/webdriver/remote/webdriver.py", line 319, in execute
response = self.command_executor.execute(driver_command, params)
File "/usr/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 376, in execute
return self._request(command_info[0], url, body=data)
File "/usr/lib/python2.7/site-packages/selenium/webdriver/remote/remote_connection.py", line 399, in _request
resp = self._conn.request(method, url, body=body, headers=headers)
File "/usr/lib/python2.7/site-packages/urllib3/request.py", line 68, in request
**urlopen_kw)
File "/usr/lib/python2.7/site-packages/urllib3/request.py", line 81, in request_encode_url
return self.urlopen(method, url, **urlopen_kw)
File "/usr/lib/python2.7/site-packages/urllib3/poolmanager.py", line 247, in urlopen
response = conn.urlopen(method, u.request_uri, **kw)
File "/usr/lib/python2.7/site-packages/urllib3/connectionpool.py", line 617, in urlopen
release_conn=release_conn, **response_kw)
File "/usr/lib/python2.7/site-packages/urllib3/connectionpool.py", line 617, in urlopen
release_conn=release_conn, **response_kw)
File "/usr/lib/python2.7/site-packages/urllib3/connectionpool.py", line 617, in urlopen
release_conn=release_conn, **response_kw)
File "/usr/lib/python2.7/site-packages/urllib3/connectionpool.py", line 597, in urlopen
_stacktrace=sys.exc_info()[2])
File "/usr/lib/python2.7/site-packages/urllib3/util/retry.py", line 271, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=51379): Max retries exceeded with url: /session/2e64d2a1-3c7f-4221-96fe-9d0b1c102195/window (Caused by ProtocolError('Connection aborted.', error(111, 'Connection refused')))
----------------------------------------------------------------------
Ran 1 test in 72.106s
who can tell me why?thanks
This error message...
MaxRetryError: HTTPConnectionPool(host='127.0.0.1', port=51379): Max retries exceeded with url: /session/2e64d2a1-3c7f-4221-96fe-9d0b1c102195/window (Caused by ProtocolError('Connection aborted.', error(111, 'Connection refused')))
...implies that the call to self.driver.close() method failed raising MaxRetryError.
A couple of things:
First and foremost as per the discussion max-retries-exceeded exceptions are confusing the traceback is somewhat misleading. Requests wraps the exception for the users convenience. The original exception is part of the message displayed.
Requests never retries (it sets the retries=0 for urllib3's HTTPConnectionPool), so the error would have been much more canonical without the MaxRetryError and HTTPConnectionPool keywords. So an ideal Traceback would have been:
ConnectionError(<class 'socket.error'>: [Errno 1111] Connection refused)
But again #sigmavirus24 in his comment mentioned ...wrapping these exceptions make for a great API but a poor debugging experience...
Moving forward the plan was to traverse as far downwards as possible to the lowest level exception and use that instead.
Finally this issue was fixed by rewording some exceptions which has nothing to do with the actual connection refused error.
Solution
Even before self.driver.close() within tearDown(self) is invoked, the try{} block within test_select_in_sina(self) includes finally{} where you have invoked driver.quit()which is used to call the /shutdown endpoint and subsequently the web driver & the client instances are destroyed completely closing all the pages/tabs/windows. Hence no more connection exists.
You can find a couple of relevant detailed discussion in:
PhantomJS web driver stays in memory
Selenium : How to stop geckodriver process impacting PC memory, without calling
driver.quit()?
In such a situation when you invoke self.driver.close() the python client is unable to locate any active connection to initiate a clousure. Hence you see the error.
So a simple solution would be to remove the line driver.quit() i.e. remove the finally block.
tl; dr
As per the Release Notes of Selenium 3.14.1:
* Fix ability to set timeout for urllib3 (#6286)
The Merge is: repair urllib3 can't set timeout!
Conclusion
Once you upgrade to Selenium 3.14.1 you will be able to set the timeout and see canonical Tracebacks and would be able to take required action.
References
A couple of relevent references:
Adding max_retries as an argument
Removed the bundled charade and urllib3.
Third party libraries committed verbatim
Just had the same problem. The solution was to change the owner of the folder with a script recursively. In my case the folder had root:root owner:group and I needed to change it to ubuntu:ubuntu.
Solution: sudo chown -R ubuntu:ubuntu /path-to-your-folder
Use Try and catch block to find exceptions
try:
r = requests.get(url)
except requests.exceptions.Timeout:
#Message
except requests.exceptions.TooManyRedirects:
#Message
except requests.exceptions.RequestException as e:
#Message
raise SystemExit(e)

`pipenv install` timing out on RPi

I'm trying to bring up [pipenv][1] on a Raspberry Pi Zero W. The symptom I'm seeing is that pexpect times out when trying to create a tutorial project.
Admittedly, the RPi is a small machine, but I was monitoring memory usage and swap space during the process, and it wasn't running out of memory or swap.
Any idea what it was trying to do? Or how I should debug this? Here's the stack trace:
pi#blue-server:~/testdir $ pipenv install requests
Creating a virtualenv for this project…
Using /usr/bin/python3 (3.5.3) to create virtualenv…
Traceback (most recent call last):
File "/home/pi/.local/lib/python3.5/site-packages/pipenv/vendor/pexpect/expect.py", line 109, in expect_loop
return self.timeout()
File "/home/pi/.local/lib/python3.5/site-packages/pipenv/vendor/pexpect/expect.py", line 82, in timeout
raise TIMEOUT(msg)
pexpect.exceptions.TIMEOUT: <pexpect.popen_spawn.PopenSpawn object at 0xb555c950>
searcher: searcher_re:
0: EOF
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/pi/.local/bin/pipenv", line 11, in <module>
sys.exit(cli())
File "/home/pi/.local/lib/python3.5/site-packages/pipenv/vendor/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/home/pi/.local/lib/python3.5/site-packages/pipenv/vendor/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/home/pi/.local/lib/python3.5/site-packages/pipenv/vendor/click/core.py", line 1066, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/pi/.local/lib/python3.5/site-packages/pipenv/vendor/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/pi/.local/lib/python3.5/site-packages/pipenv/vendor/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "/home/pi/.local/lib/python3.5/site-packages/pipenv/cli.py", line 478, in uninstall
keep_outdated=keep_outdated,
File "/home/pi/.local/lib/python3.5/site-packages/pipenv/core.py", line 2077, in do_uninstall
ensure_project(three=three, python=python)
File "/home/pi/.local/lib/python3.5/site-packages/pipenv/core.py", line 620, in ensure_project
three=three, python=python, site_packages=site_packages
File "/home/pi/.local/lib/python3.5/site-packages/pipenv/core.py", line 569, in ensure_virtualenv
do_create_virtualenv(python=python, site_packages=site_packages)
File "/home/pi/.local/lib/python3.5/site-packages/pipenv/core.py", line 936, in do_create_virtualenv
click.echo(crayons.blue(c.out), err=True)
File "/home/pi/.local/lib/python3.5/site-packages/pipenv/vendor/delegator.py", line 99, in out
self.__out = self._pexpect_out
File "/home/pi/.local/lib/python3.5/site-packages/pipenv/vendor/delegator.py", line 87, in _pexpect_out
result += self.subprocess.read()
File "/home/pi/.local/lib/python3.5/site-packages/pipenv/vendor/pexpect/spawnbase.py", line 441, in read
self.expect(self.delimiter)
File "/home/pi/.local/lib/python3.5/site-packages/pipenv/vendor/pexpect/spawnbase.py", line 341, in expect
timeout, searchwindowsize, async_)
File "/home/pi/.local/lib/python3.5/site-packages/pipenv/vendor/pexpect/spawnbase.py", line 369, in expect_list
return exp.expect_loop(timeout)
File "/home/pi/.local/lib/python3.5/site-packages/pipenv/vendor/pexpect/expect.py", line 119, in expect_loop
return self.timeout(e)
File "/home/pi/.local/lib/python3.5/site-packages/pipenv/vendor/pexpect/expect.py", line 82, in timeout
raise TIMEOUT(msg)
pexpect.exceptions.TIMEOUT: <pexpect.popen_spawn.PopenSpawn object at 0xb553a710>
searcher: searcher_re:
0: EOF
<pexpect.popen_spawn.PopenSpawn object at 0xb553a710>
searcher: searcher_re:
0: EOF
Here's the environment info:
pi#blue-server:~/foo $ uname -a
Linux blue-server 4.14.34+ #1110 Mon Apr 16 14:51:42 BST 2018 armv6l GNU/Linux
pi#blue-server:~/foo $ lsb_release -a
No LSB modules are available.
Distributor ID: Raspbian
Description: Raspbian GNU/Linux 9.4 (stretch)
Release: 9.4
Codename: stretch
pi#blue-server:~/foo $ cat /etc/os-release
PRETTY_NAME="Raspbian GNU/Linux 9 (stretch)"
NAME="Raspbian GNU/Linux"
VERSION_ID="9"
VERSION="9 (stretch)"
ID=raspbian
ID_LIKE=debian
HOME_URL="http://www.raspbian.org/"
SUPPORT_URL="http://www.raspbian.org/RaspbianForums"
BUG_REPORT_URL="http://www.raspbian.org/RaspbianBugs"
pi#blue-server:~/foo $ free -m
total used free shared buff/cache available
Mem: 433 26 260 3 145 353
Swap: 99 0 99
additional info
I noticed it was timing out inside of a subprocess call. Using pdb, I traced it down to the command:
<Command ['/usr/bin/python3.5', '-m', 'pipenv.pew', 'new', 'foo-su43ObVR', '-d', '-p', '/usr/bin/python3.5']>
I tried replicating that call from the command line and it completed without error:
pi#blue-server:~/foo $ /usr/bin/python3.5 -m pipenv.pew new 'asdf' -d -p /usr/bin/python3.5
Already using interpreter /usr/bin/python3.5
Using base prefix '/usr'
New python executable in /home/pi/.local/share/virtualenvs/asdf/bin/python3.5
Also creating executable in /home/pi/.local/share/virtualenvs/asdf/bin/python
Installing setuptools, pip, wheel...done.
That seems hopeful, but I still don't know what causes the timeout.
TL;DR: the fix is to extend the timeout
Solved it. Because the RPi Zero is slow, it was timing out. Taking a clue from the discussion on github, I noticed that its now possible to extend the default timeout with environment variables. So this solved the problem:
pi#blue-server:~/testdir $ export PIPENV_TIMEOUT=500
pi#blue-server:~/testdir $ pipenv install requests

apache spark "Py4JError: Answer from Java side is empty"

I get this error every time...
I use sparkling water...
My conf-file:
***"spark.driver.memory 65g
spark.python.worker.memory 65g
spark.master local[*]"***
The amount of data is about 5 Gb.
There is no another information about this error...
Does anybody know why it happens? Thank you!
***"ERROR:py4j.java_gateway:Error while sending or receiving.
Traceback (most recent call last):
File "/data/analytics/Spark1.6.1/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 746, in send_command
raise Py4JError("Answer from Java side is empty")
Py4JError: Answer from Java side is empty
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server
Traceback (most recent call last):
File "/data/analytics/Spark1.6.1/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 690, in start
self.socket.connect((self.address, self.port))
File "/usr/local/anaconda/lib/python2.7/socket.py", line 228, in meth
return getattr(self._sock,name)(*args)
error: [Errno 111] Connection refused
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server
Traceback (most recent call last):
File "/data/analytics/Spark1.6.1/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 690, in start
self.socket.connect((self.address, self.port))
File "/usr/local/anaconda/lib/python2.7/socket.py", line 228, in meth
return getattr(self._sock,name)(*args)
error: [Errno 111] Connection refused
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server
Traceback (most recent call last):
File "/data/analytics/Spark1.6.1/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 690, in start
self.socket.connect((self.address, self.port))
File "/usr/local/anaconda/lib/python2.7/socket.py", line 228, in meth
return getattr(self._sock,name)(*args)
error: [Errno 111] Connection refused"***
Have you tried setting spark.executor.memory and spark.driver.memory in your Spark configuration file?
See https://stackoverflow.com/a/22742982/5453184 for more info.
Usually, you'll see this error when the Java process get silently killed by the OOM Killer.
The OOM Killer (Out of Memory Killer) is a Linux process that kicks in when the system becomes critically low on memory. It selects a process based on its "badness" score and kills it to reclaim memory.
Read more on OOM Killer here.
Increasing spark.executor.memory and/or spark.driver.memory values will only make things worse in this case, i.e. you may want to do the opposite!
Other options would be to:
increase the number of partitions if you're working with very big data sources;
increase the number of worker nodes;
add more physical memory to worker/driver nodes;
Or, if you're running your driver/workers using docker:
increase docker memory limit;
set --oom-kill-disable on your containers, but make sure you understand possible consequences!
Read more on --oom-kill-disable and other docker memory settings here.
Another point to note if you are on wsl2 using pyspark. Ensure that your wsl2 config file has an increased memory.
# Settings apply across all Linux distros running on WSL 2
[wsl2]
# Limits VM memory to use no more than 4 GB, this can be set as whole numbers using GB or MB
memory=12GB # This was originally set to 3gb which caused me to fail since spark.executor.memory and spark.driver.memory was only able to MAX of 3gb regardless of how high i set it.
# Sets the VM to use eight virtual processors
processors=8
for reference. your .wslconfig config file should be located in C:\Users\USERNAME

urllib.urlopen() gives socket error: Name or service not known on python2.7

I am just starting to use and learn Python, so this may seem vary naive to ask.
On my Linux system if i try to get a webpage using urllib.urlopen() I get an error
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/urllib.py", line 86, in urlopen
return opener.open(url)
File "/usr/lib/python2.7/urllib.py", line 207, in open
return getattr(self, name)(url)
File "/usr/lib/python2.7/urllib.py", line 344, in open_http
h.endheaders(data)
File "/usr/lib/python2.7/httplib.py", line 954, in endheaders
self._send_output(message_body)
File "/usr/lib/python2.7/httplib.py", line 814, in _send_output
self.send(msg)
File "/usr/lib/python2.7/httplib.py", line 776, in send
self.connect()
File "/usr/lib/python2.7/httplib.py", line 757, in connect
self.timeout, self.source_address)
File "/usr/lib/python2.7/socket.py", line 553, in create_connection
for res in getaddrinfo(host, port, 0, SOCK_STREAM):
IOError: [Errno socket error] [Errno -2] Name or service not known
>>>
If I try to do the same in the Python 2.7 installed in my Windows 7 system, it works fine.
Since i am a novice, its difficult for me to diagnose the problem. I tried to research it but still haven't got any answers.
So my questions are:
What is different in the windows system that urlopen() works there but not on Linux.
What needs to be done to ensure that urlopen() works on the Linux system. Its necessary for me that it works since the the program I am developing has some bash command calls and the program depends extensively on the proper working of urllib.

Resources