Set executable PATH in Jupyter Notebook on Google Cloud cluster (Python 3)

I opened Jupyter Notebook on my Google Cloud cluster by following these steps: https://cloud.google.com/dataproc/docs/tutorials/jupyter-notebook
Now I get an error on this piece of code:
import selenium
from contextlib import closing
from selenium.webdriver import PhantomJS
with closing(PhantomJS()) as browser:
    # some further code
I get the following error message:
WebDriverException: Message: 'phantomjs' executable needs to be in PATH.
When I got this error in my own environment, I fixed it by adding the path to phantomjs.exe to my system variables.
On the Google Cloud cluster, however, I am looking for another way to add the phantomjs.exe path. Any other solution would be appreciated as well.

I have no experience with Selenium or PhantomJS. However, since Dataproc runs on Debian 8 (Jessie) and not Windows, you probably want to run sudo apt-get install phantomjs instead of using an .exe. You could install it manually after SSHing into the cluster, or in an initialization action.
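If phantomjs ends up somewhere that is not already on PATH (for example after a manual download rather than apt-get), one option is to extend PATH from inside the notebook before creating the driver. This is only a minimal sketch: the directory below is a hypothetical placeholder, and prepend_to_path is an illustrative helper, not part of Selenium.

```python
import os
import shutil


def prepend_to_path(directory, env=None):
    """Put `directory` at the front of PATH unless it is already listed."""
    env = os.environ if env is None else env
    entries = [e for e in env.get("PATH", "").split(os.pathsep) if e]
    if directory not in entries:
        entries.insert(0, directory)
    env["PATH"] = os.pathsep.join(entries)
    return env["PATH"]


# Hypothetical location: replace with the directory that actually holds phantomjs.
PHANTOMJS_DIR = "/usr/local/bin"

prepend_to_path(PHANTOMJS_DIR)
# shutil.which returns the full path if the executable can be found, else None.
print("phantomjs found at:", shutil.which("phantomjs"))
```

The change only affects the running kernel process, so it has to be executed in the notebook before PhantomJS() is called.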

Why are these import errors occurring when running python scripts from cmd or windows task scheduler, but not anaconda?

I am encountering import errors, but only when running my Python scripts from cmd or Windows Task Scheduler (effectively the same issue, I assume). I have researched answers already and attempted various solutions (detailed below), but nothing has worked yet. In any case, I need to understand the problem so that I can manage anything like it in the future.
Here is the issue:
Windows 10. Anaconda Python 3.9.7. Virtual environment.
I have a script that works fine if I open an anaconda prompt, activate the virtual environment and run it.
However, this is where the fun starts. If I try to run the script from the non-Anaconda cmd prompt with the command "C:\Users\user\anaconda3\envs\venv\python.exe" "C:\Users\user\scripts\script.py", I get the following error:
ImportError: DLL load failed while importing etree: The specified module could not be found.
Traceback includes:
"C:\Users\user\anaconda3\envs\venv\lib\site-packages\lxml\html\__init__.py", line 53, in <module>
from .. import etree
This is not as simple as one specific module not being installed, because of course running the script from within the anaconda prompt and the virtual environment works. Similar also happens when I run other scripts. Other errors I have seen include, for example:
ImportError: DLL load failed while importing _imaging: The specified module could not be found.
Traceback includes:
"C:\Users\user\anaconda3\envs\venv\lib\site-packages\PIL\Image.py", line 114, in <module>
from . import _imaging as core
Also, I think this may be somehow related. Importing numpy (1.22.3) from within the python interpreter in the virtual environment works fine, but when I try to run a test script that imports numpy it fails both from anaconda and the cmd with the following error:
ImportError: cannot import name SystemRandom
The overall issue was originally noticed when trying to run various scripts from Windows Task Scheduler, with the path to Python "C:\Users\user\anaconda3\envs\venv\python.exe" entered as the Program/script and the script "script.py" entered as an argument. The above errors were produced, then reproduced by running the scripts from a non-Anaconda cmd.
I am looking to understand what is happening here, and for a solution that gets the scripts running reliably from the virtual environment under Windows Task Scheduler.
Update:
I have uninstalled and reinstalled numpy (and pandas) using conda. This has left the venv with numpy==1.20.3 (and pandas=1.4.2). On attempting to re-run one of the scripts, it runs fine from within the venv in anaconda, but produces the following error when attempting to run from cmd or from within Windows Task Scheduler as above:
ImportError: Unable to import required dependencies:
numpy:
IMPORTANT: PLEASE READ THIS FOR ADVICE ON HOW TO SOLVE THIS ISSUE!
Importing the numpy C-extensions failed. This error can happen for many reasons, often due to issues with your setup or how NumPy was installed.
We have compiled some common reasons and troubleshooting tips at:
https://numpy.org/devdocs/user/troubleshooting-importerror.html
Please note and check the following:
* The Python version is: Python3.9 from "C:\Users\user\anaconda3\envs\venv\python.exe"
* The NumPy version is "1.20.3"
and make sure that they are the versions you expect.
Please carefully study the documentation linked above for further help.
Original error was: DLL load failed while importing _multiarray_umath: The specified module could not be found.
I have looked into the solutions suggested, but am still completely at a loss, especially as to why the script runs from the venv in one place, but NOT the other.
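One possible explanation, offered as an assumption rather than a confirmed diagnosis: the Anaconda Prompt activates the environment, which puts directories such as Library\bin on PATH, while calling python.exe directly skips that step, so compiled extensions cannot locate their DLLs. A minimal diagnostic sketch that reports which of those directories are missing from PATH (the list of directories is an assumption about a typical conda layout on Windows, and missing_env_dirs is a hypothetical helper):

```python
import os
import sys


def missing_env_dirs(prefix, path_value):
    """List environment directories under `prefix` that are absent from `path_value`."""
    expected = [
        prefix,
        os.path.join(prefix, "Library", "bin"),
        os.path.join(prefix, "Scripts"),
    ]
    on_path = {
        os.path.normcase(os.path.normpath(p))
        for p in path_value.split(os.pathsep)
        if p
    }
    return [
        d for d in expected
        if os.path.normcase(os.path.normpath(d)) not in on_path
    ]


# sys.prefix is the root of the environment whose interpreter is running.
print("interpreter:", sys.executable)
for directory in missing_env_dirs(sys.prefix, os.environ.get("PATH", "")):
    print("not on PATH:", directory)
```

Running this once from the Anaconda Prompt and once from plain cmd should show whether the two differ. If they do, having Task Scheduler call a batch file that activates the environment before running the script is one way to line them up.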

ModuleNotFoundError when accessing python Flask within jupyter notebook in a virtual env

I have created a virtual env, gcloudenv, on my NVIDIA Nano running Ubuntu. I was able to install Flask and the required libraries, and to deploy my App Engine app to GCP from this virtual env. All my work is in Python, and I was using nano as my editor to get my code up and running. No issues so far.
My virtual env gcloudenv already has all the required packages for Flask, Jinja, etc., and I can see them when I run pip freeze.
Then I tried to work in Jupyter Notebook, as my code was getting a little complicated and I didn't want to write the full code before running it.
I already had Jupyter Notebook installed before creating the virtual env. I also installed Jupyter within the virtual env.
So I followed the instructions to create a new kernel by running the following commands:
(gcloudenv) sunny@my-nano:~gcloudenv/MyApp/mainfolder$ pip install ipykernel
(gcloudenv) sunny@my-nano:~gcloudenv/MyApp/mainfolder$ ipython kernel install --user --name=gcloudenv
Then I ran the notebook as:
(gcloudenv) sunny@my-nano:~gcloudenv/MyApp/mainfolder$ /home/gcloudenv/bin/jupyter notebook
When trying to import flask, I get the following error:
ModuleNotFoundError: No module named 'flask'
I am not sure what is going on; I am drawing a blank.
Add
!pip install flask
at the beginning of your Jupyter notebook.
Finally I managed to solve my problem, thanks to a wonderful post: https://jakevdp.github.io/blog/2017/12/05/installing-python-packages-from-jupyter/
In essence, I had 2 problems:
1. I did not have jupyter-notebook within my virtual env. Originally I thought I had it installed, but that was incorrect, so whenever I tried to launch one, it picked the first jupyter-notebook it could find on the PATH.
A good way to find out which one it is pointing to is to run the which command:
(gcloudenv) sunny@my-nano:~/gcloudenv$ which jupyter-notebook
For me, that was at:
/home/sunny/archiconda3/bin/jupyter-notebook
In fact, I had 3 copies of jupyter-notebook on my system. One was probably installed using sudo pip and therefore went into the root folder, which is probably not a good thing to do.
So I installed a fresh jupyter-notebook with the following command:
(gcloudenv) $ pip install jupyter notebook
2. Next, check the list of available Jupyter kernels by running the following from the Jupyter notebook (or from the command line):
!jupyter kernelspec list (or: (gcloudenv) $ jupyter kernelspec list)
My Jupyter notebook was not able to import the Flask libraries because it was pointing to the wrong kernel config, outside of my virtualenv gcloudenv.
Available kernels:
gcloudenv /home/sunny/.local/share/jupyter/kernels/gcloudenv (correct one)
python3 /home/sunny/gcloudenv/share/jupyter/kernels/python3
You can determine which Python interpreter a kernel picks by running more on its kernel.json file:
(gcloudenv) $ more /home/sunny/.local/share/jupyter/kernels/gcloudenv/kernel.json
Once I changed my kernel to point to python3 from within the notebook, it picked the correct path and all the relevant libraries I needed.
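The same check can be done from Python. A kernel.json file lists the launch command under "argv", and its first entry is the interpreter the kernel starts. Below is a minimal sketch: kernel_interpreter is a hypothetical helper, and the interpreter path in the demonstration file is made up for illustration.

```python
import json
import os
import tempfile


def kernel_interpreter(kernel_json_path):
    """Return the interpreter a Jupyter kernel spec launches (first entry of argv)."""
    with open(kernel_json_path, encoding="utf-8") as handle:
        spec = json.load(handle)
    return spec["argv"][0]


# Demonstration with a throwaway kernel.json; the interpreter path is hypothetical.
example_spec = {
    "argv": ["/home/sunny/gcloudenv/bin/python", "-m", "ipykernel_launcher",
             "-f", "{connection_file}"],
    "display_name": "gcloudenv",
    "language": "python",
}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "kernel.json")
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(example_spec, handle)
    print(kernel_interpreter(path))
```

Pointing the function at a real kernel.json from jupyter kernelspec list shows whether that kernel uses the virtualenv's python or some other installation.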
In summary, when you hit the problem described above, do the following:
Check the path of python (whereis python or which python).
Check whether you are running the 'right' notebook. This is determined by the PATH and by whether you have sourced your virtualenv.
Install Jupyter Notebook using pip from within your virtualenv (do not use sudo).
Check the Jupyter kernel. This is particularly relevant if you have a shared Jupyter Notebook installation and want to work with multiple virtualenvs.
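The first and last checks can also be run from a notebook cell. This is a rough sketch (environment_report is a hypothetical helper) that shows which interpreter the kernel is actually using and whether that interpreter can see a given package:

```python
import importlib.util
import sys


def environment_report(module_name):
    """Describe the running interpreter and whether it can see `module_name`."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "module_found": importlib.util.find_spec(module_name) is not None,
    }


print(environment_report("flask"))
```

If "executable" does not live inside the virtualenv, the notebook is running on a different kernel than expected, which would explain the ModuleNotFoundError even though pip freeze lists the package.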

Unable to import 'scrapy_splash' pylint(import-error)

When trying to import SplashRequest in VS Code, I get the following error message:
Unable to import 'scrapy_splash' pylint(import-error)
Do you know why this is the case? I have Splash up and running and the package is installed in my environment. I am using Python 3.7
Problem solved: I was in the wrong environment. You can change the environment by opening the Command Palette (Ctrl+Shift+P) and entering Python: Select Interpreter.

ModuleNotFoundError: No module named 'gather_keys_oauth2'

I am trying to create a Python script to get my hands on my Fitbit data so that I can ultimately integrate it with another API. I have been following the instructions on this website: https://towardsdatascience.com/collect-your-own-fitbit-data-with-python-ff145fa10873
I have used pip to install Fitbit, Pandas, DateTime and also OAuth. To install OAuth I used the following:
pip install oauth -t fitbitAPI
It installed without any issue.
I put the following lines into my Python Script:
import fitbit
import gather_keys_oauth2 as Oauth2
import pandas as pd
import datetime
When I test the script I get the following error message:
Traceback (most recent call last):
File "fitbitAPI.py", line 2, in <module>
import gather_keys_oauth2 as Oauth2
ModuleNotFoundError: No module named 'gather_keys_oauth2'
I spent hours searching the web but have not been able to find anything that's been helpful. Any ideas? Is there another version or way that I need to install OAuth?
Here's how I was able to resolve the error:
Download the fitbit package as described (https://github.com/orcasgit/python-fitbit).
After you install fitbit as described, navigate to the newly created folder \Lib\site-packages\fitbit.
Copy and paste the gather_keys_oauth2.py file into the fitbit folder.
Then you can use: from fitbit import gather_keys_oauth2 as Oauth2
One clarification on Jacob Miller's solution, which says: "Copy and paste the gather_keys_oauth2.py file into the fitbit folder".
If that still results in the not-found error, copy gather_keys_oauth2.py into the Lib/site-packages folder.
Also, I installed cherrypy (required by gather_keys_oauth2.py) using the Anaconda Navigator.
Good luck.
You should be able to source the module from orcasgit. You can download the python script here and stick it in your directory.
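If you would rather not copy the file into site-packages, another option is to leave it where you downloaded it and add that folder to sys.path before importing. This is only a sketch under assumptions: the download folder below is a hypothetical placeholder, and import_from_directory is an illustrative helper, not part of the fitbit package.

```python
import importlib
import sys
from pathlib import Path


def import_from_directory(module_name, directory):
    """Import `module_name` after making `directory` searchable via sys.path."""
    directory = str(Path(directory).resolve())
    if directory not in sys.path:
        sys.path.insert(0, directory)
    return importlib.import_module(module_name)


# Hypothetical location of the downloaded gather_keys_oauth2.py; adjust as needed.
DOWNLOAD_DIR = Path.home() / "python-fitbit"

if (DOWNLOAD_DIR / "gather_keys_oauth2.py").exists():
    Oauth2 = import_from_directory("gather_keys_oauth2", DOWNLOAD_DIR)
else:
    print("gather_keys_oauth2.py not found in", DOWNLOAD_DIR)
```

The import can still fail if the script's own dependencies (such as cherrypy, mentioned above) are not installed in the active environment.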
I experienced a similar issue and solved it by updating my environment and making sure I was using Python 3. I had activated the wrong environment with the conda activate environmentname command, so the notebook was throwing this same error.
Install Anaconda-Navigator.
Create a new environment with the Anaconda-Navigator tool.
Make sure you are using Python 3.x in this new environment.
Install cherrypy using the Anaconda-Navigator tool.
When you open a terminal and navigate to the directory of your app, activate your environment with conda activate environmentname and then start your notebook with jupyter notebook. The error should disappear.
Note: there may be some additional packages you need to install. Keep using the Anaconda Navigator to install them; the error messages coming from the notebook will guide you to the right packages.

Unable to launch Airflow Webserver in fresh install

I'm trying to launch $ airflow webserver (on a newly installed instance), but the web server does not start (the browser states: This site can’t be reached).
In the terminal I got the following error message:
Running the Gunicorn Server with:
Workers: 4 sync
Host: 0.0.0.0:8080
Timeout: 120 Logfiles:
> Error: cannot import name 'models'
Here are the steps I took, as per the Airflow Quick Start:
Installed Airflow with pip install apache-airflow in a fresh virtualenv (py3.6)
Set the AIRFLOW_HOME env variable: export AIRFLOW_HOME=$(pwd)
Initialized the Airflow DB: airflow initdb
I'm using PyCharm on macOS Mojave 10.14.1.
Thanks a lot for taking a look at it.
UPDATE: a simple statement, from airflow import models, throws ImportError: cannot import name 'models'. However, when I try it in the Python Console, the module seems to import successfully.
RESOLVED: my own Python file was named airflow.py, which caused the name clash. The Python interpreter was looking for the models module in my own airflow.py instead of in the airflow package.
Refrain from naming your own Python module or package airflow.
Hope it helps others.
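This kind of clash happens because the directory of the script being run comes first on sys.path, so a local airflow.py is found before the installed package. A small sketch for spotting such clashes in a project folder follows; local_shadows is a hypothetical helper, and the list of names to check is whatever packages you import.

```python
import os
import tempfile


def local_shadows(directory, package_names):
    """Return the names from `package_names` that a file or folder in `directory` would shadow."""
    shadows = []
    for name in package_names:
        as_file = os.path.join(directory, name + ".py")
        as_package = os.path.join(directory, name)
        if os.path.isfile(as_file) or os.path.isdir(as_package):
            shadows.append(name)
    return shadows


# Demonstration in a throwaway directory that contains a file named airflow.py.
with tempfile.TemporaryDirectory() as project_dir:
    open(os.path.join(project_dir, "airflow.py"), "w").close()
    print(local_shadows(project_dir, ["airflow", "flask"]))  # prints ['airflow']
```

Running it against your own project directory with the names of the third-party packages you import gives a quick list of files to rename.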
