Using Spark Kernel on Jupyter - apache-spark

So I am just starting out with Jupyter and the idea of notebooks.
I usually program in Vim and a terminal, so I am still trying to figure out some things.
I am trying to install a kernel that is capable of executing Spark and have come across Toree. I installed Toree, and it appears when I list the kernels. Here is the result:
$ jupyter kernelspec list
Available kernels:
python3 C:\Users\UserName\AppData\Local\Continuum\Anaconda3\lib\site-packages\ipykernel\resources
bash C:\Users\UserName\AppData\Roaming\jupyter\kernels\bash
toree C:\ProgramData\jupyter\kernels\toree
So when I open a toree notebook, the kernel dies and will not restart. Closing the notebook and reopening it results in the kernel changing to Python3.
A large error message gets printed both to the host terminal and in the notebook. The error messages are the same as those in another post that has been put on hold.
I followed this page for the install:
https://github.com/apache/incubator-toree
These instructions appear to be mostly for Linux/Mac.
Any thoughts on how to get a Spark notebook on Jupyter?
I understand there is not a lot of information here; if more is needed, let me know.
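A kernel that dies immediately can mean that Jupyter could not start the program named in the kernelspec. Each kernelspec directory (such as the toree one listed above) contains a kernel.json whose argv list is the command Jupyter runs, so argv[0] is the launcher. The Python sketch below reads that file; the path is copied from the kernelspec list output, and the `.sh` check is only a heuristic for "this launcher needs a POSIX shell", not something Toree documents.

```python
import json
from pathlib import Path

def launcher_from_spec(kernel_json_text):
    """argv[0] of a kernel.json: the program Jupyter runs to start the kernel."""
    return json.loads(kernel_json_text)["argv"][0]

def needs_posix_shell(launcher):
    """Heuristic: a .sh launcher cannot be run directly by cmd.exe or PowerShell."""
    return launcher.lower().endswith(".sh")

# Kernelspec directory copied from the `jupyter kernelspec list` output above.
spec_file = Path(r"C:\ProgramData\jupyter\kernels\toree") / "kernel.json"
if spec_file.exists():
    launcher = launcher_from_spec(spec_file.read_text(encoding="utf-8"))
    print("launcher:", launcher)
    print("needs a POSIX shell:", needs_posix_shell(launcher))
```

If the launcher turns out to be a shell script, that would be consistent with the Windows explanation given in the answer below.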

I posted a similar question to Gitter and they replied saying (paraphrased) that:
Toree is the future of Spark programming on Jupyter, and it will appear to have installed correctly on a Windows machine, but the .jar and .sh files will not operate correctly on Windows.
Knowing this, I tried it on my Linux (Fedora) machine and on a borrowed Mac. Once Jupyter (and Anaconda) was installed, I entered these commands:
$ SparkHome="$HOME/spark/spark1.5.5-bin.hadoop2.6"
$ sudo pip install toree
Password: **********
$ sudo jupyter toree install --spark_home=$SparkHome
Jupyter ran the Toree notebook on both machines. I presume that a VM might work as well. I want to see whether the Windows 10 Bash shell will also work with this, since I am currently running Windows 7.
Thanks for the other Docs!

The answer from @user3025281 solved the issue for me as well. I had to make the following adjustment for my environment (an Ubuntu 16.04 Linux distro running Spark 2.2.0 and Hadoop 2.7). The downloads are direct file downloads from the hosting sites or a mirror site.
You'll mostly be configuring your environment variables and then calling jupyter, assuming it has been installed through Anaconda. That's pretty much it.
export SPARK_HOME="$HOME/spark/spark-2.2.0-bin-hadoop2.7"
Write this line into your ~/.bashrc file and then reload it:
# reload environment variables
source ~/.bashrc
Install
sudo pip install toree
sudo jupyter toree install --spark_home=$SPARK_HOME
Optional: On Windows 10, you could use "Bash on Ubuntu on Windows" to configure Jupyter on a Linux distro.
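Bash does not expand `~` inside double quotes, so a path written as "~/spark/..." can reach the installer with a literal tilde. The Python sketch below, assuming only that the installer goes into the python/lib sub-directory of the Spark home, expands the path and checks it before running jupyter toree install. The function names are illustrative.

```python
import os
from pathlib import Path

def expand_path(raw):
    """Expand ~ and $VARS the way an unquoted shell word would."""
    return Path(os.path.expandvars(os.path.expanduser(raw)))

def looks_like_spark_home(path):
    """Toree's installer reads <spark_home>/python/lib, so require that directory."""
    return (Path(path) / "python" / "lib").is_dir()

def checked_spark_home(raw):
    """Return the expanded Spark home, or raise if it lacks python/lib."""
    home = expand_path(raw)
    if not looks_like_spark_home(home):
        raise FileNotFoundError(f"{home / 'python' / 'lib'} not found; is this a Spark home?")
    return home
```

For example, `checked_spark_home("~/spark/spark-2.2.0-bin-hadoop2.7")` returns the absolute path to pass to `--spark_home`, or raises if the download was unpacked somewhere else.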

Related

jupyter-lab installed but jupyter doesn't see it

I have jupyter-lab installed via pip3 with Python 3.8.10 on Ubuntu 20.04. I've been using it for months, but noticed some problems getting KeplerGL to render maps in it. While trouble-shooting this, I ran jupyter --version to see which version of jupyterlab I have installed (KeplerGL setup varies based on jupyter-lab version), and mysteriously, it says:
jupyter lab : not installed
Quite odd, because I have it open right now; I launched it from the command line like this:
$ jupyter-lab
I ran pip3 install jupyterlab for good measure, and got a bunch of "requirement already satisfied" messages. I suspect this has something to do with my inability to render KeplerGL, but my main purpose for this inquiry is figuring out why Jupyter isn't seeing jupyter-lab.
Did you make sure everything is installed in the same environment?
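To check the same-environment suggestion, a sketch like the following compares what the running interpreter can import with the jupyter and jupyter-lab executables found on PATH. The helper names are illustrative; if the two executables live in different directories, they most likely come from different Python installations.

```python
import importlib.util
import shutil
import sys
from pathlib import Path

def same_bin_dir(a, b):
    """True if two executable paths sit in the same directory (same install)."""
    return a is not None and b is not None and Path(a).parent == Path(b).parent

def jupyter_env_report():
    """Collect where Python, jupyter and jupyter-lab resolve from."""
    jupyter = shutil.which("jupyter")        # same PATH lookup the shell does
    jupyter_lab = shutil.which("jupyter-lab")
    return {
        "python": sys.executable,
        # find_spec returns None if the package is not importable from THIS Python.
        "jupyterlab importable": importlib.util.find_spec("jupyterlab") is not None,
        "jupyter": jupyter,
        "jupyter-lab": jupyter_lab,
        "same directory": same_bin_dir(jupyter, jupyter_lab),
    }

for key, value in jupyter_env_report().items():
    print(f"{key:22} {value}")
```

Running it with the same python3 that pip3 belongs to makes the "jupyterlab importable" line meaningful for that installation.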

Trying to make sense of Python/Jupyter environment on MacOS

Background: while running Jupyter Notebook, a new import was failing even though the library installed successfully using pip3. Some of the setup for the code I was running was done in PyCharm, which was using a virtual Python 3.8.2 environment. The library that fails to import is installed in the virtual environment, so why isn't Jupyter seeing it?
I went looking and found that there are multiple versions of Python installed:
/Library/Python/2.7
/Library/Frameworks/Python.framework/Versions/3.8
/usr/local/bin/python3
/usr/local/bin/python3.8
/usr/local/bin/jupyter (included this in case it clarifies things)
/usr/bin/python
/usr/bin/python3
/usr/local/Cellar/python/3.7.6_1
/Users/xxx/anaconda3/bin/python3.7 (anaconda was uninstalled months ago so why is this still here?)
/Users/xxx/git/moat-ds/venv/lib/python3.8
I have installed pyenv and virtualenv and tried (unsuccessfully) to sort things out through this and similar articles. But all of this has only left me with questions:
what are these different directories doing?
when launched what is Jupyter notebook using for 'python 3' kernel?
where are the Python packages stored when I run pip3 at the CLI? (In PyCharm they are put in the venv folder, but otherwise?)
Installing jupyter with pip from pyenv fixed my problem:
brew uninstall jupyter
pip install jupyter
After restarting your console, jupyter should be pyenv's jupyter.
After trying @Akbar30bill's answer without success, I ran brew doctor, restarted my terminal, tried again, and it worked. Apparently something wasn't linked correctly.
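To answer the "which Python is this?" questions directly, the snippet below prints the interpreter, whether it is running inside a virtual environment, and the site-packages directories associated with it. Running it once with python3 in a terminal and once in a notebook cell, then comparing the executable lines, shows whether Jupyter's "Python 3" kernel is the same interpreter that pip3 installs into (`pip3 --version` also prints the site-packages directory pip belongs to).

```python
import site
import sys
import sysconfig

def describe_interpreter():
    """Where this interpreter lives and where packages for it are installed."""
    return {
        "executable": sys.executable,
        "version": sys.version.split()[0],
        # Inside a venv, sys.prefix is the venv and sys.base_prefix the Python it was made from.
        "in venv": sys.prefix != sys.base_prefix,
        # Default (non --user) site-packages directory for this interpreter.
        "site-packages": sysconfig.get_paths()["purelib"],
        # Where `pip install --user` puts packages for this interpreter.
        "user site-packages": site.getusersitepackages(),
    }

for key, value in describe_interpreter().items():
    print(f"{key:20} {value}")
```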

Will installing IRkernel via CRAN work in my conda environment?

I'm trying to set up an R kernel to work in Jupyter Notebook and Jupyter Lab.
I have miniconda3 installed, and when I activate the base environment and then type jupyter-kernelspec list, I see:
python3 C:/path/to/miniconda3/share/jupyter/kernels/python3
I want the R kernel so I can use it in Jupyter Lab and Jupyter Notebook.
I already have RStudio installed. Is there a difference between installing IRkernel into the kernels directory above via CRAN in RStudio and forking it from GitHub (assuming I can find it) and then cloning it into the kernels directory?
Is this what I need to do, or is it possible that all I need to do is alter my PATH environment variable?
If I download and install it via CRAN in RStudio, is that kernel going to be available in my (base) environment?
If R is installed outside Conda (more common), then install through CRAN.
If R is installed in a Conda env (less common), then follow the nb_conda_kernels instructions.
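Either way, an IRkernel installed from CRAN is usually registered with `IRkernel::installspec()`, which names the kernel `ir` by default. Whether Jupyter then sees it can be checked with `jupyter kernelspec list --json`, run from the activated environment; the helper below is a sketch that parses that JSON output, and its function names are illustrative.

```python
import json
import subprocess

def kernel_names(kernelspec_json):
    """Names of the kernels in `jupyter kernelspec list --json` output."""
    return sorted(json.loads(kernelspec_json)["kernelspecs"])

def installed_kernels():
    """Run the `jupyter` found on PATH (so activate the conda env first)."""
    result = subprocess.run(
        ["jupyter", "kernelspec", "list", "--json"],
        capture_output=True, text=True, check=True,
    )
    return kernel_names(result.stdout)
```

With the base environment active, `"ir" in installed_kernels()` should be True once the R kernel has been registered.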

How to run Jupyter Notebook without Anaconda on Ubuntu?

There is a lot of information available on running Jupyter Notebook with Anaconda, but I could not find any on running Jupyter without Anaconda.
Any pointer would be much appreciated!
Basically the process is as follows:
pip3 install --upgrade pip
pip3 install jupyter
jupyter notebook # run notebook
Run a specific notebook:
jupyter notebook notebook.ipynb
Using a custom IP or port:
jupyter notebook --ip 127.0.0.1 --port 9999
No browser:
jupyter notebook --no-browser
Help:
jupyter notebook --help
Answer from the following sources:
SOURCE 1
SOURCE 2
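As an alternative to installing into the system Python with pip3, Jupyter can be kept in its own virtual environment using only the standard library's venv module. The sketch below assumes Python 3 with venv and ensurepip available (on Debian/Ubuntu the python3-venv package provides them); the function names and environment location are illustrative.

```python
import subprocess
import sys
import venv
from pathlib import Path

def env_bin_dir(env_dir, platform=sys.platform):
    """venv puts executables in Scripts on Windows and in bin elsewhere."""
    return Path(env_dir) / ("Scripts" if platform == "win32" else "bin")

def make_jupyter_env(env_dir):
    """Create a virtual environment at env_dir and pip-install Jupyter into it."""
    env_dir = Path(env_dir).expanduser()
    venv.create(env_dir, with_pip=True)  # bootstraps pip via ensurepip
    python = env_bin_dir(env_dir) / ("python.exe" if sys.platform == "win32" else "python")
    subprocess.run([str(python), "-m", "pip", "install", "jupyter"], check=True)
    return env_bin_dir(env_dir)
```

After `make_jupyter_env("~/jupyter-env")`, running `~/jupyter-env/bin/jupyter notebook` starts the notebook from that environment, and the options listed above apply unchanged.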
See Gordon Ball's Jupyter PPA, the most actively maintained Jupyter PPA as of this writing with support for both Ubuntu 16.04 and 18.04 (LTS).
Installation: It's a Ball
Thanks to Gordon's intrepid efforts, installation of Jupyter under Ubuntu trivially reduces to:
sudo add-apt-repository ppa:chronitis/jupyter
sudo apt-get update
sudo apt-get install jupyter
Doing so installs the jupyter metapackage, providing:
A standard collection of Jupyter packages published by this PPA, including:
Jupyter's core and client libraries.
Jupyter's console interface.
Jupyter's web-based notebook.
Tools for working with and converting notebook (ipynb) files.
The Python3 computational kernel.
The /usr/bin/jupyter executable.
As W. Dodge's pip-based solution details, the browser-based Jupyter Notebook UI may then be launched from a terminal as follows – where '/home/my_username/my_notebooks' should be replaced with the absolute path of the top-level directory containing all of your Jupyter notebook files:
jupyter notebook --notebook-dir='/home/my_username/my_notebooks'
Why Not Anaconda or pip?
For Debian-based Linux distributions, the optimal solution is a Debian-based personal package archive (PPA). All other answers propose Debian-agnostic solutions (e.g., Anaconda, pip), which circumvent the system-wide APT package manager and hence are non-ideal.
Installing Jupyter via this or another PPA guarantees automatic updates to both Jupyter and its constellation of dependencies. Installing Jupyter via either Anaconda or pip requires manual updates to both on an ongoing basis – a constant thorn in the side that you can probably do without.
In short, PPA >>>> Anaconda >> pip.
There are two ways to install Jupyter Notebook on Ubuntu: one uses Anaconda, the other uses pip. See the link below for details.
http://jupyter.readthedocs.io/en/latest/install.html

How is Apache Toree installed on Mac OS X with Spark installed via Homebrew?

Apache Toree looks for the Spark home directory (defaulting to "/usr/local/spark"), but when it can't find that directory because Spark was installed via Homebrew, it throws an exception:
jupyter toree install
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/spark/python/lib'
Where is the Spark home when Spark is installed via Homebrew?
When Spark is installed via Homebrew, the directory Apache Toree is looking for is under /usr/local/Cellar:
jupyter toree install --spark_home /usr/local/Cellar/apache-spark/2.1.0/libexec
That is, /usr/local/Cellar/apache-spark/2.1.0/libexec/. Toree specifically wants the "libexec" directory, from which it can go into the "python/lib" sub-directory.
If that doesn't work, you might additionally need to pass in a --user flag.
jupyter toree install --user --spark_home /usr/local/Cellar/apache-spark/2.2.0/libexec
This is similar to this GitHub issue, this Jupyter documentation, and this other Stack Overflow question.
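If the installed version is not known in advance, the libexec directory can be located by scanning the Cellar, as in the sketch below. It assumes the Intel default Cellar at /usr/local/Cellar (Homebrew on Apple Silicon uses /opt/homebrew/Cellar) and Homebrew's version or version_revision folder names such as 3.7.6_1. `$(brew --prefix apache-spark)/libexec`, which goes through Homebrew's opt symlink, should point at the same place for the linked version.

```python
from pathlib import Path

def version_key(name):
    """Sort key for Homebrew version folders such as '2.1.0' or '3.7.6_1'."""
    base, _, revision = name.partition("_")
    numbers = [int(part) if part.isdigit() else 0 for part in base.split(".")]
    return (numbers, int(revision) if revision.isdigit() else 0)

def brew_spark_home(cellar="/usr/local/Cellar"):
    """Newest apache-spark .../libexec in the Cellar that has python/lib, or None."""
    formula_dir = Path(cellar) / "apache-spark"
    if not formula_dir.is_dir():
        return None
    versions = [p for p in formula_dir.iterdir() if p.is_dir()]
    for version in sorted(versions, key=lambda p: version_key(p.name), reverse=True):
        libexec = version / "libexec"
        # Toree looks inside <spark_home>/python/lib, so require that sub-directory.
        if (libexec / "python" / "lib").is_dir():
            return libexec
    return None

spark_home = brew_spark_home()
if spark_home is not None:
    print(f"jupyter toree install --spark_home {spark_home}")
```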
