SPARK/pyspark - not running hive.HiveSessionStateBuilder - apache-spark

I have a problem with running pyspark from Windows command line. I don't know what is causing the issue.
Spark-shell is running normally.
JAVA_HOME is set to C:\Java, where I have installed the JDK (java version "1.8.0_161").
SPARK_HOME is set to C:\Users\heyde\Anaconda3\Lib\site-packages\pyspark, where I have installed it through pip in Anaconda
I have also added C:\Users\heyde\Anaconda3\Lib\site-packages\pyspark\bin and C:\Java\bin to the system PATH.
(Screenshots of the console and spark-shell output were attached to the original question.)
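One quick sanity check (a minimal sketch, assuming the variables above are meant to be visible to the Python that pyspark starts) is to print them from a Python prompt and make sure none of the paths came through mangled:

import os

# Print the variables the pyspark launcher depends on; a missing or mangled
# path (for example C:Java instead of C:\Java) shows up immediately.
for name in ("JAVA_HOME", "SPARK_HOME", "HADOOP_HOME", "PATH"):
    print(name, "=", os.environ.get(name, "<not set>"))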

Related

Just updated Ubuntu to 22.04, now I can't open Jupyter Notebook

When I try to launch Jupyter Notebook, the browser launches and I get the following error:
Access to the file was denied
The file at /home/benjamin/.local/share/jupyter/runtime/nbserver-11758-open.html is not readable.
It may have been removed, moved, or file permissions may be preventing access.
I tried running
jupyter lab clean --all
pip3 install jupyterlab --force-reinstall
as per the suggestion from here: Jupyter Notebook: Access to the file was denied. The commands ran, but I still get the Access to the file was denied error. Also, on the reinstall command it spits this out:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
spyder 4.2.5 requires pyqt5<5.13, which is not installed.
spyder 4.2.5 requires pyqtwebengine<5.13, which is not installed.
conda-repo-cli 1.0.4 requires pathlib, which is not installed.
anaconda-project 0.9.1 requires ruamel-yaml, which is not installed.
spyder 4.2.5 requires jedi==0.17.2, but you have jedi 0.18.1 which is incompatible.
spyder 4.2.5 requires parso==0.7.0, but you have parso 0.8.3 which is incompatible.
sphinx 4.0.1 requires Jinja2<3.0,>=2.3, but you have jinja2 3.1.1 which is incompatible.
sphinx 4.0.1 requires MarkupSafe<2.0, but you have markupsafe 2.1.1 which is incompatible.
python-language-server 0.36.2 requires jedi<0.18.0,>=0.17.2, but you have jedi 0.18.1 which is incompatible.
fermipy 1.0.1+5.g5a57 requires astropy<4, but you have astropy 4.2.1 which is incompatible.
which may or may not be part of the problem.
Cross posted here: https://discourse.jupyter.org/t/after-updating-to-ubuntu-22-04-i-am-no-longer-able-to-access-jupyter-notebook/13991
here: https://askubuntu.com/questions/1404330/after-updating-to-ubuntu-22-04-i-am-no-longer-able-to-access-jupyter-notebook
and on reddit: https://www.reddit.com/r/learnpython/comments/uaipzo/i_just_updated_my_machine_to_ubuntu_2204_now_i/
UPDATE: I am able to access the notebook now by using the URL printed to the console. (just copy and paste it into the Firefox browser)
I would still like to figure out how to get it to open with just the 'jupyter notebook' command the way it used to work before the update, but for now this is a useful workaround.
Did you try setting
c.NotebookApp.use_redirect_file = False
in the jupyter_notebook_config.py file?
If you tried, did you remove the '#' at the start of the line?
I had the same problem with Ubuntu 22.04 and this fixed it.
I first created the file with:
jupyter notebook --generate-config
then I uncommented and changed the line c.NotebookApp.use_redirect_file as you said,
and now it works!
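For reference, the edited line looks like this (a sketch; by default the generated file lives under ~/.jupyter):

# ~/.jupyter/jupyter_notebook_config.py -- created by `jupyter notebook --generate-config`
# The setting ships commented out; remove the leading '#' and flip it to False:
c.NotebookApp.use_redirect_file = False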
I faced the same issue using Firefox from snap on Ubuntu 22.04.
I noticed that setting c.NotebookApp.use_redirect_file = False in the generated jupyter_notebook_config.py works when launching jupyter notebook, but fails for jupyter-lab (the error you pasted). I found that doing something similar fixes the problem for jupyter-lab:
jupyter server --generate-config
Then edit the generated ~/.jupyter/jupyter_server_config.py and set c.ServerApp.use_redirect_file = False. Now running jupyter-lab works in snap Firefox too. Maybe you can try this route and just run via jupyter-lab if it works for you too.
Btw., none of this is needed if you just use Chrome; somehow the problem only occurs for Firefox from snap on Ubuntu 22.04.
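For reference, the corresponding one-line change in the server config looks like this (a sketch; the generated file also lands under ~/.jupyter by default):

# ~/.jupyter/jupyter_server_config.py -- created by `jupyter server --generate-config`
# Uncomment and set to False so jupyter-lab stops going through the redirect file:
c.ServerApp.use_redirect_file = False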
Otherwise you could check out jupyter notebook --debug and/or reinstalling JupyterLab.
I use the Epiphany web browser instead of Firefox as my default browser and everything works fine.
The problem I found is that the launcher is not using a Jupyter Notebook path that works. From the terminal I get:
$ whereis jupyter-notebook
jupyter-notebook: /home/brombo/miniconda3/bin/jupyter-notebook /home/brombo/.local/bin/jupyter-notebook
The command /home/brombo/miniconda3/bin/jupyter-notebook will start the notebook, but /home/brombo/.local/bin/jupyter-notebook will not. If you use the first one in the Exec line of ~/.local/share/applications/jupyter-notebook.desktop, everything works:
[Desktop Entry]
Name=Jupyter Notebook
Comment=Run Jupyter Notebook
Exec=/home/brombo/miniconda3/bin/jupyter-notebook %f
Terminal=true
Type=Application
Icon=notebook
StartupNotify=true
MimeType=application/x-ipynb+json;
Categories=Development;Education;
Keywords=python;

pyspark to open directly in jupyter-lab

I installed the apps below on Windows 10:
Installed Apache Spark 3.1.3
Installed Hadoop 3.3.2
Installed Jupyter-lab
When I execute pyspark or spark-shell from the command line, I get the output below, which means Apache Spark got installed/configured correctly.
When I execute pyspark from the command line, I want the jupyter-lab interface to open automatically.
When I set the environment variables below, jupyter notebook opens automatically:
PYSPARK_DRIVER = C:\Users\xxxx\AppData\Local\Programs\Python\Python39\Scripts\jupyter.exe
PYSPARK_DRIVER_PYTHON_OPTS = notebook
I tried the settings below, but no luck:
PYSPARK_DRIVER = C:\Users\xxxx\AppData\Local\Programs\Python\Python39\Scripts\jupyter-lab.exe
PYSPARK_DRIVER_PYTHON_OPTS = lab
What environment variables do I need to set in order to open jupyter-lab directly? And how do I specify the kernel among the jupyter kernels?
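For what it's worth, the variables documented by Spark's launcher scripts are PYSPARK_DRIVER_PYTHON and PYSPARK_DRIVER_PYTHON_OPTS (not PYSPARK_DRIVER). A hedged sketch of values that typically open JupyterLab, reusing the jupyter.exe path from the question:

import os

# These should be set as Windows user environment variables; the assignments
# below only document the names and values (setting them inside a running
# script will not affect the pyspark launcher).
os.environ["PYSPARK_DRIVER_PYTHON"] = r"C:\Users\xxxx\AppData\Local\Programs\Python\Python39\Scripts\jupyter.exe"
os.environ["PYSPARK_DRIVER_PYTHON_OPTS"] = "lab"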

Can't start spark-shell on windows 10 Spark 3.2.0 install

Issue
When I try to run spark-shell I get a huge message error that you can see here :
https://pastebin.com/8D6RGxUJ
Install
I used this tutorial, but I already had Python and Java installed. I used Spark 3.2.0 instead.
Config:
Windows 10
HADOOP_HOME : C:\hadoop
downloaded from https://github.com/cdarlint/winutils/tree/master/hadoop-3.2.0/bin
JAVA_HOME : C:\PROGRA~2\Java\jre1.8.0_311
SPARK_HOME : C:\Spark\spark-3.2.0-bin-hadoop3.2
in path :
%SPARK_HOME%\bin
%HADOOP_HOME%\bin
My guess is that you have to put winutils.exe in the %SPARK_HOME%\bin folder. I discovered that after starting from scratch and following this tutorial!
By following this answer for a similar question, I downgraded from spark 3.2.1 to 3.0.3 and this seems to have solved this problem.
I managed to solve the problem with the following configuration:
Spark: spark-3.2.1-bin-hadoop2.7
Hadoop: winutils.exe and hadoop.dll (version 2.7.7 for both)
JDK: jdk-18.0.1
And I recommend that you put the environment variables in User, not System
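Once the winutils.exe/hadoop.dll pieces are in place, a quick smoke test from Python can confirm the setup (a minimal sketch; it assumes pyspark is installed for the same Python, which this question does not mention):

from pyspark.sql import SparkSession

# If this prints a Spark version and the number 5, the Spark/Hadoop/winutils setup is working.
spark = SparkSession.builder.appName("smoke-test").getOrCreate()
print(spark.version)
print(spark.range(5).count())
spark.stop()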

E0401: Unable to import 'pyspark' in VSCode on Windows 10

I have installed the following on my Windows 10 machine to use Apache Spark:
Java,
Python 3.6 and
Spark (spark-2.3.1-bin-hadoop2.7)
I am trying to write pyspark-related code in VSCode. It is showing a red underline under the 'from ' statement and the error message
E0401:Unable to import 'pyspark'
I have also used Ctrl+Shift+P and selected "Python: Update workspace Pyspark libraries". It is showing the notification message
Make sure you have SPARK_HOME environment variable set to the root path of the local spark installation!
What is wrong?
You will need to install the pyspark Python package using pip install pyspark. Actually, this is the only package you'll need for VSCode, unless you also want to run your Spark application on the same machine.
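A quick way to confirm the package and the interpreter agree (a minimal sketch; run it with the same interpreter VSCode has selected):

import sys
import pyspark

# If this runs without an ImportError, pylint's E0401 should disappear once
# VSCode is pointed at this same interpreter (sys.executable shows which one that is).
print(sys.executable)
print(pyspark.__version__)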

how to set up pyspark with zeppelin on windows 10

I have had difficulties installing Zeppelin 0.7.2
Using the version of Spark that comes with Zeppelin 0.7.2, I can run Spark code, but I am unable to run %pyspark code, even after modifying the Python environment variables to point to where Python is installed (Python was installed using Anaconda).
%python code works fine.
If anyone can help resolve this issue I would be grateful. (The odd thing is that I have done the same installation on another Windows 10 laptop and pyspark does execute.)
The error I get is: pyspark is not responding.
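One way to narrow this down, sketched below under the assumption that the Anaconda interpreter is the one %pyspark should use: since %python paragraphs work, print which interpreter Zeppelin resolves and what PYSPARK_PYTHON (the variable the Spark interpreter reads) currently points to.

import os
import sys

# Run this in a %python paragraph to compare the interpreter Zeppelin uses
# with the one the %pyspark interpreter will be handed via PYSPARK_PYTHON.
print(sys.executable)
print(os.environ.get("PYSPARK_PYTHON", "<PYSPARK_PYTHON not set>"))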
