pyspark to open directly in jupyter-lab - apache-spark

I installed the following on Windows 10:
Apache Spark 3.1.3
Hadoop 3.3.2
JupyterLab
When I run pyspark or spark-shell from the command line, I get the output below, which suggests Apache Spark is installed and configured correctly.
When I run pyspark from the command line, I want the JupyterLab interface to open automatically.
When I set the environment variables below, Jupyter Notebook opens automatically:
PYSPARK_DRIVER = C:\Users\xxxx\AppData\Local\Programs\Python\Python39\Scripts\jupyter.exe
PYSPARK_DRIVER_PYTHON_OPTS = notebook
I tried the settings below, but with no luck:
PYSPARK_DRIVER = C:\Users\xxxx\AppData\Local\Programs\Python\Python39\Scripts\jupyter-lab.exe
PYSPARK_DRIVER_PYTHON_OPTS = lab
Which environment variables do I need to set to open JupyterLab directly? And how do I specify the kernel among the Jupyter kernels?
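A possible configuration, offered as a sketch rather than a confirmed fix: Spark's pyspark launcher reads PYSPARK_DRIVER_PYTHON and PYSPARK_DRIVER_PYTHON_OPTS, so pointing the driver at the generic jupyter launcher with lab as the option is a commonly used combination. The question writes the first variable as PYSPARK_DRIVER, possibly as shorthand. With jupyter-lab.exe as the driver, the extra lab option is likely redundant, which may be why the second attempt failed. The helper names jupyterlab_env and launch_pyspark are hypothetical, and the sketch assumes both jupyter and pyspark are on PATH.

```python
import os
import shutil
import subprocess


def jupyterlab_env(base_env=None, driver="jupyter", opts="lab"):
    """Return a copy of the environment with the PySpark driver set to JupyterLab."""
    env = dict(os.environ if base_env is None else base_env)
    # Variable names read by Spark's pyspark launcher script.
    env["PYSPARK_DRIVER_PYTHON"] = driver
    env["PYSPARK_DRIVER_PYTHON_OPTS"] = opts
    return env


def launch_pyspark(env):
    """Start pyspark with the given environment (assumes pyspark is on PATH)."""
    pyspark = shutil.which("pyspark")
    if pyspark is None:
        raise FileNotFoundError("pyspark was not found on PATH")
    return subprocess.run([pyspark], env=env, check=False)


# Build the environment only; nothing is launched here.
env = jupyterlab_env({"PATH": "example"})
print(env["PYSPARK_DRIVER_PYTHON"], env["PYSPARK_DRIVER_PYTHON_OPTS"])  # prints "jupyter lab"
```

To try it for real, call launch_pyspark(jupyterlab_env()). On Windows the same two values can also be set as user environment variables instead of going through Python.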

Related

Spark connects from VSCode but doesn't connect from Jupyter

I have installed databricks-connect and I was able to connect to the clusters and launch some jobs from VSCode.
I saw that it is possible to use databricks-connect from a Jupyter notebook, so from the same terminal I use for my code in VSCode, I launched Jupyter Notebook in the same environment, but Spark did not cooperate.
Here are some screenshots of the problem:
This one from VSCode Notebooks (it works also in .py ):
This one from Jupyter Notebook:
I have tried findspark, but it is not the solution in this case, since I am using Databricks Connect.
It looks more as if Spark is not pointing to the same context.
To repeat, I launched the notebook from a terminal in the same environment, so logically all the environment variables should be the same.
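One way to test the assumption that both front ends see the same environment is to run the same small report in a VSCode notebook cell and in a Jupyter Notebook cell and compare the output. This is only a diagnostic sketch: the function name spark_environment_report and the list of variable prefixes are assumptions chosen for illustration, not something Databricks Connect prescribes.

```python
import os
import sys


def spark_environment_report(environ=None):
    """Collect the interpreter path and Spark-related environment variables."""
    environ = os.environ if environ is None else environ
    # Prefixes are an assumption; extend the tuple if your setup uses others.
    prefixes = ("SPARK", "PYSPARK", "DATABRICKS", "JAVA_HOME", "PYTHONPATH")
    report = {"sys.executable": sys.executable}
    for name in sorted(environ):
        if name.upper().startswith(prefixes):
            # Hide secrets such as access tokens before printing.
            value = "<redacted>" if "TOKEN" in name.upper() else environ[name]
            report[name] = value
    return report


for key, value in spark_environment_report().items():
    print(f"{key} = {value}")
```

If sys.executable differs between the two front ends, the Jupyter kernel is running a different interpreter from the one VSCode uses, even though the server was started from the same terminal.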

E0401: Unable to import 'pyspark' in VSCode on Windows 10

I have installed the following on my Windows 10 machine to use Apache Spark:
Java,
Python 3.6 and
Spark (spark-2.3.1-bin-hadoop2.7)
I am trying to write PySpark code in VSCode. It shows a red underline under 'from' and the error message
E0401:Unable to import 'pyspark'
I have also pressed Ctrl+Shift+P and selected "Python:Update workspace Pyspark libraries". It shows the notification
Make sure you have SPARK_HOME environment variable set to the root path of the local spark installation!
What is wrong?
You will need to install the pyspark Python package using pip install pyspark. This is the only package you'll need for VSCode, unless you also want to run your Spark application on the same machine.
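E0401 is pylint's import-error message, and pylint resolves imports against the interpreter it runs under, so it helps to confirm that the interpreter selected in VSCode can actually find pyspark. A minimal check, assuming it is run with that same interpreter (the function name pyspark_status is hypothetical):

```python
import importlib.util
import os
import sys


def pyspark_status():
    """Report whether this interpreter can locate pyspark, and SPARK_HOME."""
    spec = importlib.util.find_spec("pyspark")
    return {
        "interpreter": sys.executable,
        # None means pyspark is not installed for this interpreter.
        "pyspark_location": spec.origin if spec else None,
        "SPARK_HOME": os.environ.get("SPARK_HOME"),
    }


for key, value in pyspark_status().items():
    print(f"{key}: {value}")
```

If pyspark_location is None, running pip install pyspark with the interpreter shown on the first line should make the import resolvable.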

SPARK/pyspark - not running hive.HiveSessionStateBuilder

I have a problem with running pyspark from Windows command line. I don't know what is causing the issue.
Spark-shell is running normally.
JAVA_HOME is set to C:Java, where I have installed JDK java version "1.8.0_161"
SPARK_HOME is set to C:\Users\heyde\Anaconda3\Lib\site-packages\pyspark, where I have installed it through pip in Anaconda
Also I have added C:\Users\heyde\Anaconda3\Lib\site-packages\pyspark\bin and C:Java\bin to system PATH.
Console
Spark-shell

How to set up pyspark with Zeppelin on Windows 10

I have had difficulties installing Zeppelin 0.7.2.
Using the version of Spark that comes with Zeppelin 0.7.2, I can run Spark code, but I am unable to run %pyspark code, even after modifying the Python environment variables to point to where Python is installed (Python was installed using Anaconda).
%python code works fine.
If anyone can help resolve this issue I would be grateful. (The odd thing is that I have done the same installation on another Windows 10 laptop, and pyspark does execute there.)
The error I get is: pyspark is not responding

Jupyter notebook kernels do not launch in notebook's directory

I run:
Jupyter 4.2.0
notebook 4.2.3
Linux Mint 18
The notebook application starts correctly and in the correct directory. But when I open a notebook, the Python kernel is launched in ~/user and not in the notebook's directory (as it used to be). This problem seems to have started when I encrypted my home folder.
Could this be a permission issue?
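A quick way to gather evidence for or against the permission theory is to ask the kernel where it started and whether it can access the notebook's directory. The sketch below is only a diagnostic; directory_report is a hypothetical helper, and the path passed to it should be the notebook's own directory.

```python
import os


def directory_report(path=None):
    """Describe a directory as seen by the current process (default: cwd)."""
    path = os.getcwd() if path is None else path
    return {
        "path": path,
        "exists": os.path.isdir(path),
        "readable": os.access(path, os.R_OK),
        "writable": os.access(path, os.W_OK),
        # Execute permission on a directory is what allows entering it.
        "traversable": os.access(path, os.X_OK),
    }


# Run in a notebook cell: shows the kernel's actual working directory.
print(directory_report())
```

Calling directory_report with the notebook's directory and finding traversable set to False would point towards permissions; if every flag is True, the cause is more likely elsewhere.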
