Pyspark command not recognized (Ubuntu) - apache-spark

I have successfully installed pyspark using Anaconda and configured the paths in the .bashrc file.
After typing the pyspark command, it opens a Jupyter notebook in which plain Python code, like print "Hello", works properly.
But when I execute PySpark commands like collect(), take(5), etc., it gives the error "Cannot run program '/usr/bin/Python-3.7.4': Permission denied".
It is referring to the wrong directory, as Python 3.7.4 is installed in the Anaconda directory.
Is there any configuration step I need to perform to resolve this issue?

Try updating the PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON environment variables to point to the correct Python 3 distribution path.
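For example, a minimal sketch, assuming the notebook driver is already running on the Anaconda interpreter you want the executors to use, is to point both variables at that same interpreter before any SparkContext is created:
import os
import sys

# Point the worker and driver Python at the interpreter currently running,
# instead of a hard-coded /usr/bin path that may not exist.
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable
Alternatively, export the same two variables in .bashrc with the full path to the Anaconda python3 binary and restart the pyspark session.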

Related

I cannot run pyinstaller on my computer even though I have installed it

I installed pip and pyinstaller, but I still get this error message when I try to convert my project into an .exe. This is the problem right here; do you have any advice for that?
From Pyinstaller installation guide:
If you cannot use the pyinstaller command due to the scripts directory not being in PATH, you can instead invoke the PyInstaller module, by running python -m PyInstaller (pay attention to the module name, which is case sensitive). This form of invocation is also useful when you have PyInstaller installed in multiple python environments, and you cannot be sure from which installation the pyinstaller command will be run.
So you may run it as e.g.:
python -m PyInstaller some_system.py
Or, since the issue seems to be that the PATH Windows environment variable doesn't include Python's Scripts folder, it'd be better to fix that. From the same guide:
If the command is not found, make sure the execution path includes the proper directory:
Windows: C:\PythonXY\Scripts where XY stands for the major and minor Python version number (for example, C:\Python38\Scripts for Python 3.8)
To fix it, you may run where python to get the exact location of Python on your machine (let's say it shows C:\Python38\). Then add the Scripts folder inside it to the PATH environment variable (in this example it'd be C:\Python38\Scripts\).
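If you are unsure which Scripts folder belongs to the interpreter you actually use, this small standard-library sketch prints it:
import sysconfig

# Print the Scripts directory of the current interpreter; this is the folder
# that must be on PATH for console entry points such as pyinstaller.
print(sysconfig.get_path("scripts"))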

What is the cause of "Bad Interpreter: No such file or directory"?

I have a Python virtual environment on my linux machine. It has been working fine for two weeks, but all of a sudden I woke up today, and while in the environment I can't execute any commands. For example, if I try to use pip list, or jupyter notebook, this is what I get (env is the name of my environment):
~/env/bin$ pip list
-bash: /home/ubuntu/env/bin/pip: /home/ubuntu/env/bin/python: bad interpreter: No such file or directory
The same thing happens with basically any other command, except Python. Typing python brings up the Python shell just fine. Interestingly, it says Anaconda, even though I only ever used pip with this environment.
I've tried to find info on this, but everything I found seems to pertain to running scripts.
Edit: Also want to mention that when I manually look in the environment bin, the packages I installed are all there in green, except Python is in red.
Thank you in advance.
You have a script /home/ubuntu/env/bin/pip whose shebang is #!/home/ubuntu/env/bin/python, but that file is either absent or not executable.
Check whether the file /home/ubuntu/env/bin/python exists and whether it can be executed by the current user (just run it from the command line). If not, find a working executable (for example, it could be /home/ubuntu/env/bin/python3) and edit the first line of /home/ubuntu/env/bin/pip to fix the shebang.
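A quick sketch to check this from Python (the pip path below is the one from the question; adjust it to your environment):
import os

# Read the shebang of the failing script and check whether the interpreter
# it points to exists and is executable for the current user.
script = "/home/ubuntu/env/bin/pip"
with open(script) as f:
    shebang = f.readline().strip()

interpreter = shebang[2:].strip() if shebang.startswith("#!") else shebang
print("shebang points to:", interpreter)
print("exists:", os.path.exists(interpreter))
print("executable:", os.access(interpreter, os.X_OK))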

JupyterLab installation with pip3

I installed JupyterLab with
pip3 install jupyterlab --user
Yet, when I try to launch it (jupyter lab), I get the following error:
Error executing Jupyter command 'lab': [Errno 2] No such file or directory
The JupyterLab installation guide on GitHub says: "If installing using pip install --user, you must add the user-level bin directory to your PATH environment variable in order to launch jupyter lab."
But I don't know what that means; I would greatly appreciate any help. I am using Ubuntu 18.04.
As the guide says, you need to add the user-level bin directory to your PATH environment variable. To do so, first find the bin folder where JupyterLab has been installed; then you can add that path with a simple command:
export PATH=$PATH:/path/to/your/jupyterlab/bin/directory
and it's done. You can check if you added it by running this other command:
echo $PATH
You should see the content of the PATH variable.
This method, though, will only set the variable for the current shell, meaning that when you close the terminal you lose the change. To make it permanent, you need to edit another file, ~/.bashrc.
One thing though: it's really important that you only add this line to the file:
export PATH=$PATH:/path/to/your/jupyterlab/bin/directory
without changing the rest of the file if you don't know what you are doing.
To give you a recap on what to do to make it permanent open a new shell and type:
gedit ~/.bashrc
This will open the file, where you need to add the "export PATH..." command at the very end, on a new line. Then save the changes and open a new terminal (or reboot); from now on you should be able to open JupyterLab directly from a shell with the command:
jupyter lab
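If you are not sure which directory is the user-level bin on your machine, this standard-library sketch prints the usual location (typically ~/.local/bin on Ubuntu):
import os
import site

# pip install --user places console scripts such as jupyter under the user
# base's bin directory; this prints that path so you can add it to PATH.
print(os.path.join(site.getuserbase(), "bin"))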

Can run pyspark.cmd but not pyspark from command prompt

I am trying to get pyspark set up for Windows. I have Java, Python, Hadoop, and Spark all set up, and I believe the environment variables are set as I've been instructed elsewhere. In fact, I am able to run this from the command prompt:
pyspark.cmd
And it will load up the pyspark interpreter. However, I should be able to run pyspark unqualified (without the .cmd), and python importing won't work otherwise. It does not matter whether I navigate directly to spark\bin or not, because I do have spark\bin added to the PATH already.
.cmd is listed in my PATHEXT variable, so I don't get why the pyspark command by itself doesn't work.
Thanks for any help.
While I still don't know exactly why, I think the issue somehow stemmed from how I unzipped the Spark tar file. Within the spark\bin folder, I was unable to run any .cmd programs without including the .cmd extension, but I could do that in basically any other folder. I redid the unzip and the problem no longer existed.

The SPARK_HOME env variable is set but Jupyter Notebook doesn't see it. (Windows)

I'm on Windows 10. I was trying to get Spark up and running in a Jupyter Notebook alongside Python 3.5. I installed a pre-built version of Spark, set the SPARK_HOME environment variable, installed findspark, and ran the code:
import findspark
findspark.init()
I receive a Value error:
ValueError: Couldn't find Spark, make sure SPARK_HOME env is set or Spark is in an expected location (e.g. from homebrew installation).
However, the SPARK_HOME variable is set. Here is a screenshot showing the list of environment variables on my system.
Has anyone encountered this issue or know how to fix it? I only found an old discussion in which someone had set SPARK_HOME to the wrong folder, but I don't think that's my case.
I had the same problem and wasted a lot of time. I found two solutions:
1- copy the downloaded Spark folder somewhere on the C: drive and pass that path explicitly:
import findspark
findspark.init('C:/spark')
2- use findspark's find() function to locate the Spark folder automatically:
import findspark
findspark.find()
The environment variables get updated only after a system reboot. It works after restarting your system.
I had the same problem and solved it by installing "vagrant" and "virtual box". (Note, though, that I use Mac OS and Python 2.7.11.)
Take a look at this tutorial, which is for the Harvard CS109 course:
https://github.com/cs109/2015lab8/blob/master/installing_vagrant.pdf
After "vagrant reload" on the terminal, I am able to run my code without errors.
NOTE the difference between the results of the os.getcwd() command shown in the attached images.
I had the same problem when installing Spark using pip install pyspark findspark in a conda environment.
The solution was to do this:
export SPARK_HOME=/Users/pete/miniconda3/envs/cenv3/lib/python3.6/site-packages/pyspark/
jupyter notebook
You'll have to substitute the name of your conda environment for cenv3 in the command above.
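If you would rather not type the site-packages path by hand, a small sketch like this prints the pyspark install location inside whatever environment is currently active (assuming pyspark was pip-installed there):
import os
import pyspark

# The directory pyspark was installed into is the path SPARK_HOME should
# point to in the export above.
print(os.path.dirname(pyspark.__file__))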
Restarting the system after setting the environment variables worked for me.
I had the same problem and solved it by closing cmd and then opening it again. I had forgotten that after editing an environment variable on Windows you need to restart cmd.
I got the same error. Initially, I had stored my Spark folder in the Documents directory. Later, when I moved it to the Desktop, it suddenly started recognizing all the system variables and it ran findspark.init() without any error.
Try it out once.
This error may occur if you don't set the environment variables in the .bashrc file. Set your Python environment variables as follows:
export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.10.8.1-src.zip:$PYTHONPATH
export PATH=$SPARK_HOME/bin:$SPARK_HOME/python:$PATH
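Note that the py4j version in the zip file name varies between Spark releases; a quick sketch to confirm that SPARK_HOME is visible to Python and to list the py4j zip your installation actually ships:
import glob
import os

# Verify SPARK_HOME and find the bundled py4j zip, so the PYTHONPATH entry
# above matches the real file name.
spark_home = os.environ.get("SPARK_HOME")
print("SPARK_HOME =", spark_home)
if spark_home:
    print(glob.glob(os.path.join(spark_home, "python", "lib", "py4j-*-src.zip")))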
The simplest way I found to use Spark with Jupyter Notebook is:
1- download Spark
2- unzip it to the desired location
3- open Jupyter Notebook in the usual way, nothing special
4- now run the code below
import findspark
findspark.init("location of spark folder")
# in my case it is like
import findspark
findspark.init("C:\\Users\\raj24\\OneDrive\\Desktop\\spark-3.0.1-bin-hadoop2.7")
