Error while running PySpark DataProc Job due to python version - python-3.x

I created a Dataproc cluster using the following command:
gcloud dataproc clusters create datascience \
--initialization-actions \
gs://dataproc-initialization-actions/jupyter/jupyter.sh \
However, when I submitted my PySpark job, I got the following error:
Exception: Python in worker has different version 3.4 than that in driver 3.7, PySpark cannot run with different minor versions.Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.
Any Thoughts?

This is due to a difference in the Python versions between the master and the workers. By default, the jupyter image installs the latest version of Miniconda, which uses Python 3.7. However, the workers are still using the default Python 3.6.
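The kind of check that produces this error can be sketched in plain Python. This is a simplified illustration, not PySpark's actual source; `minor_version` and `check_python_versions` are hypothetical helpers that compare only the major.minor part of two version strings.

```python
def minor_version(version):
    """Return the 'major.minor' part of a version string such as '3.7.3'."""
    major, minor = version.split(".")[:2]
    return "{}.{}".format(major, minor)


def check_python_versions(driver, worker):
    """Raise if driver and worker differ in their major.minor version."""
    driver_mm = minor_version(driver)
    worker_mm = minor_version(worker)
    if driver_mm != worker_mm:
        raise RuntimeError(
            "Python in worker has different version {} than that in driver {}".format(
                worker_mm, driver_mm
            )
        )


# A patch-level difference passes the check; a minor-level difference would not.
check_python_versions("3.6.5", "3.6.8")
```

Under this sketch, a 3.7 driver with 3.6 workers fails, while 3.6.5 and 3.6.8 are treated as compatible.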
Solution:
- Specify the Miniconda version when creating the cluster, i.e. pin a version that installs Python 3.6 on the master node:
gcloud dataproc clusters create example-cluster --metadata=MINICONDA_VERSION=4.3.30
Note:
This answer may need updating with a more sustainable solution for managing the environment.

UPDATE THE SPARK ENVIRONMENT TO USE PYTHON 3.7:
Open a new terminal and type the following command:
export PYSPARK_PYTHON=python3.7
This will ensure that the worker nodes use Python 3.7 (the same as the driver) and not the default Python 3.4.
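The same idea can be expressed from Python when a job is launched by a script. This is a minimal sketch, assuming the interpreter name `python3.7` resolves on every node; `pyspark_env` is a hypothetical helper, and `job.py` in the comment is a placeholder.

```python
import os


def pyspark_env(interpreter, base=None):
    """Return a copy of the environment with both PySpark variables set."""
    env = dict(os.environ if base is None else base)
    env["PYSPARK_PYTHON"] = interpreter         # interpreter for the workers
    env["PYSPARK_DRIVER_PYTHON"] = interpreter  # interpreter for the driver
    return env


env = pyspark_env("python3.7")
# The dictionary could then be handed to a launcher process, for example:
# subprocess.run(["spark-submit", "job.py"], env=env, check=True)
print(env["PYSPARK_PYTHON"], env["PYSPARK_DRIVER_PYTHON"])
```

Setting both variables to the same value keeps the driver and the workers on the same minor version, which is what the error message asks for.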
DEPENDING ON THE VERSIONS OF PYTHON YOU HAVE, YOU MAY HAVE TO INSTALL OR UPDATE ANACONDA:
(To install, see: https://www.digitalocean.com/community/tutorials/how-to-install-anaconda-on-ubuntu-18-04-quickstart)
Make sure you have Anaconda 4.1.0 or higher. Open a new terminal and check your conda version by typing:
conda --version
If your version is below 4.1.0, type conda update conda
Next, check whether the nb_conda_kernels library is installed by typing
conda list
If you don’t see nb_conda_kernels in the list, type
conda install nb_conda_kernels
If you are using Python 2 and want a separate Python 3 environment, type the following:
conda create -n py36 python=3.6 ipykernel
py36 is the name of the environment. You can name it anything you want.
Alternatively, if you are using Python 3 and want a separate Python 2 environment, type the following:
conda create -n py27 python=2.7 ipykernel
py27 is the name of the environment. It uses Python 2.7.
Make sure the Python versions installed successfully, then close the terminal. Open a new terminal and type pyspark. You should see the new environments appearing.

We fixed it now -- thanks for the intermediate workaround #brotich. Check out the discussion in #300.
PR #306 keeps python at the same version as was already installed (3.6), and installs packages on all nodes to ensure that the master and worker python environments stay identical.
As a side effect, you can choose your Python version by passing an argument to the conda init action, e.g. --metadata 'CONDA_PACKAGES="python==3.5"'.
PR #311 pins miniconda to a particular version (currently 4.5.4), so we avoid issues like this again. You can use --metadata 'MINICONDA_VERSION=latest' to use the old behavior of always downloading the latest miniconda.

Related

Error while trying to run snakemake in a venv on a protected server

For a project I made a virtual environment (venv) using Python 3. After activating the venv, I installed all the necessary dependencies using a simple bash script (see picture below). (I verified the installed packages using pip3 list and concluded that every dependency was installed successfully.)
My project uses snakemake, so I ran this snakemake command:
snakemake --snakefile Snakefile.py all
I get this error:
I know it has something to do with the venv, because without the venv snakemake runs perfectly. I have read the Snakemake installation documents, which say I have to install conda and then create and activate a conda environment. But I do not have the sudo privileges to download and install conda (I work on a protected server).
What is happening, and does anyone know a fix?
One possible reason could be a difference in Python versions. Which version of Python does pip3 install packages for?
As far as I can see from the picture provided, the invalid syntax error may be caused by a Python version that doesn't support f-strings.
Imagine the following scenario: when you run Snakemake manually, you use the latest Python 3 (e.g. 3.9). But if pip3 is configured for an older version (e.g. 3.5), you end up setting up a very different environment, one for Python 3.5, which doesn't support f-strings.
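A quick way to test this theory is to run a short script with the interpreter inside the venv. f-strings were introduced in Python 3.6 (PEP 498), so the sketch below only compares the version; `supports_fstrings` is a hypothetical helper name.

```python
import sys


def supports_fstrings(version_info=sys.version_info):
    """Return True if this Python version understands f-strings (3.6+)."""
    return tuple(version_info[:2]) >= (3, 6)


# Show which interpreter is actually running and whether it is new enough.
print("interpreter:", sys.executable)
print("version:", ".".join(str(part) for part in sys.version_info[:3]))
print("f-strings supported:", supports_fstrings())
```

The script itself avoids f-strings on purpose, so it also runs on the older interpreter it is meant to detect.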

Upgradation to Python 3.7

I have Ubuntu 19.04, and I needed Python 3.6, so I somehow managed to get python3.6 onto my machine without removing python3.7. Now I would like to go back to using python3.7. Can anyone suggest how to do it?
If you've got multiple versions of Python installed, you can choose which one to use as the default with update-alternatives:
sudo update-alternatives --config python3
Then follow the prompts.
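Before switching the default, it can help to see which interpreters are on your PATH and where they really point. This is a minimal sketch using only the standard library; the command names in `names` are assumptions about what is installed, and `find_interpreters` is a hypothetical helper.

```python
import os
import shutil


def find_interpreters(names=("python3", "python3.6", "python3.7")):
    """Map each command name found on PATH to its resolved location."""
    found = {}
    for name in names:
        path = shutil.which(name)
        if path is not None:
            # Resolve symlinks so the real target of each command is visible.
            found[name] = os.path.realpath(path)
    return found


for name, target in find_interpreters().items():
    print(name, "->", target)
```

Commands that are not installed are simply left out of the result.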
Try using virtual environments, e.g. Anaconda, for this kind of thing. I am not aware of any other methods. Anaconda basically creates a virtual environment in which you can specify the version of every package, including Python itself.

Multiple versions of Anaconda & Python installation

I have three questions.
One, can I install multiple versions of Python on my machine? I have a system with 4 GB of RAM.
Two, can I install multiple versions of Anaconda?
Three, what is the difference between jupyter notebook & jupyter lab?
Please help. I am a new user.
can I install multiple versions of Python on my machine?
Yes, and the conda package manager (which comes with the Anaconda distribution) will help with that. You can create a separate conda environment for each Python version you want to use by running:
conda create --name mypy36env python=3.6
conda activate mypy36env
For details on how to create and manage conda environments, take a look at the docs
can I install multiple versions of Anaconda?
You can, but because of the answer above you don't need to and shouldn't. Instead of multiple Anaconda versions, just create multiple environments with the versions of packages you need.
You can create an environment with specific versions of Python and the Anaconda distribution with:
conda create -n anaconda201903 python=3.6 anaconda==2019.03
what is the difference between jupyter notebook & jupyter lab?
From the JupyterLab docs:
JupyterLab is the next-generation web-based user interface for Project Jupyter.
Basically JupyterLab is a new interface that allows you to create and run the same Jupyter notebooks you did in the past, but it also has much more functionality than the old interface.
For a bit more detail, check this blog post.

downgrade python version from 3.8 to lower one in a given conda environment

In an existing conda environment, Python is at 3.8. Is it possible to downgrade the Python version for this specific environment from 3.8 to 3.6 or 3.7?
Check this:
Open your terminal and search for the available versions using the following command.
conda search python
If the Python version you are looking for is available, use the command
conda install python=3.8 (or 3.6 or 3.7, depending on your requirement)
This will change the Python version in the active environment.
Note: This command replaces the Python version currently installed in that environment.
I suggest you create a new conda environment instead, using the following command.
conda create --name py38 python=3.8
This line creates a new environment named py38.
Now you can work in this environment without interfering with the libraries of the other environment.
Hope this helps.

choose python version in an anaconda environment

We have Anaconda installed on our Cloudera cluster via parcels. Python 2.7.13 is available with that version of Anaconda. We wanted to have another version of Python (3.6) across all nodes.
My challenge is that when I followed the conda documentation to create a new environment and install Python 3.6 in it using "conda create -n py36 python=3.6 anaconda", I got Python 3.6.6 on some nodes, 3.6.7 on others, and 3.6.1 on a few.
I would like to know if there is a way to choose the exact version of Python while installing 3.6 in a separate environment. Or am I doing something wrong? Please help me.
Thanks
Kancharlapalli
You can specify the patch version as well:
conda create -n py36 python=3.6.7 anaconda
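After recreating the environments, you may want to confirm that every node ended up on the same patch version. Below is a minimal sketch of that comparison; the node names and versions in `reported` are made-up sample data, and how you collect them from each node is left out.

```python
from collections import defaultdict


def group_nodes_by_version(reported):
    """Group node names by the Python version each one reported."""
    groups = defaultdict(list)
    for node, version in reported.items():
        groups[version].append(node)
    return dict(groups)


# Hypothetical sample data: node name -> version string reported by that node.
reported = {"node1": "3.6.6", "node2": "3.6.7", "node3": "3.6.1", "node4": "3.6.7"}
groups = group_nodes_by_version(reported)
if len(groups) > 1:
    print("Inconsistent Python versions:", groups)
```

A single group in the result means all nodes agree on the exact version.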
