Jupyter + Apache Toree - Scala kernel is busy

I've installed Jupyter Notebook on Python 3.5.2 on Ubuntu Server 16.04.
I have also installed Apache Toree to run Spark jobs from Jupyter.
I ran:
pip3 install toree
jupyter toree install --spark_home=/home/arik/spark-2.0.1-bin-hadoop2.7/ # My Spark directory
The installation reported success:
[ToreeInstall] Installing Apache Toree version 0.1.0.dev8
[ToreeInstall] Apache Toree is an effort undergoing incubation at the
Apache Software Foundation (ASF), sponsored by the Apache Incubator
PMC.
Incubation is required of all newly accepted projects until a further
review indicates that the infrastructure, communications, and decision
making process have stabilized in a manner consistent with other
successful ASF projects.
While incubation status is not necessarily a reflection of the
completeness or stability of the code, it does indicate that the
project has yet to be fully endorsed by the ASF.
Additionally, this release is not fully compliant with Apache release
policy and includes a runtime dependency that is licensed as LGPL v3
(plus a static linking exception). This package is currently under an
effort to re-license (https://github.com/zeromq/jeromq/issues/327).
[ToreeInstall] Creating kernel Scala
[ToreeInstall] Removing existing kernelspec in /usr/local/share/jupyter/kernels/apache_toree_scala
[ToreeInstall] Installed kernelspec apache_toree_scala in /usr/local/share/jupyter/kernels/apache_toree_scala
I thought that everything was successful, but every time I create an Apache Toree notebook I see the following:
It says "Kernel busy" and all of my commands are ignored.
I couldn't find anything about this issue online.
Alternatives to Toree would also be accepted.
Thank you.

The released Toree 0.1.x unfortunately does not work with Scala 2.11, which is what the prebuilt Spark 2.x packages use. You can either downgrade to a Spark build that uses Scala 2.10 or use a more recent version of Toree (still in beta). This is how I made it work with Spark 2.1 and Scala 2.11:
#!/bin/bash
pip install -i https://pypi.anaconda.org/hyoon/simple toree
jupyter toree install --spark_home=$SPARK_HOME --user #will install scala + spark kernel
jupyter toree install --spark_home=$SPARK_HOME --interpreters=PySpark --user
jupyter kernelspec list
jupyter notebook #launch jupyter notebook
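Before choosing between the two options, it can help to confirm which Scala version a given Spark installation actually ships. The following is only a sketch: it assumes a Spark 2.x binary distribution that keeps a scala-library-<version>.jar file in its jars/ directory, and the function names are made up for illustration.

```python
import os
import re
from pathlib import Path
from typing import Iterable, Optional

# Matches e.g. "scala-library-2.11.8.jar" and captures "2.11.8".
_SCALA_JAR = re.compile(r"^scala-library-(\d+\.\d+\.\d+)\.jar$")


def scala_version_from_jars(jar_names: Iterable[str]) -> Optional[str]:
    """Return the Scala version encoded in a scala-library jar name, if any."""
    for name in jar_names:
        match = _SCALA_JAR.match(name)
        if match:
            return match.group(1)
    return None


def scala_version_of_spark(spark_home: str) -> Optional[str]:
    """Look in <spark_home>/jars (assumed Spark 2.x layout) for the Scala jar."""
    jars_dir = Path(spark_home) / "jars"
    if not jars_dir.is_dir():
        return None
    return scala_version_from_jars(p.name for p in jars_dir.iterdir())


if __name__ == "__main__":
    home = os.environ.get("SPARK_HOME", "")
    print(scala_version_of_spark(home) if home else "SPARK_HOME is not set")
```

If this reports a 2.11.x version, the Scala 2.10-only Toree release is the likely reason the kernel stays busy.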
Look at this post and this post for more info.
It will eventually look like this:

Related

Problem to Install pyspark of version 2.3

I have been trying to install PySpark 2.3 for the last couple of days, but so far I have only found versions 3.0.1 and 2.4.7. I need to run code that was written for PySpark 2.3 as part of my project. Is that version still available? If so, please point me to the resources needed to install it, as porting the code to version 3.0.1 looks difficult.
PySpark 2.3 should still be available via conda-forge.
Please check out https://anaconda.org/conda-forge/pyspark/files?version=2.3.2
There you will find the following and more packages for a direct download:
linux-64/pyspark-2.3.2-py36_1000.tar.bz2
win-64/pyspark-2.3.2-py36_1000.tar.bz2
If you don't want the raw packages, you can also install it via conda:
conda install -c conda-forge pyspark=2.3.2
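After installing, a quick check from Python can confirm that the environment really picked up a 2.3.x release. This is a minimal sketch; the helper names are hypothetical, and it relies only on the package exposing a `__version__` attribute, which PySpark does.

```python
import importlib
from typing import Optional


def module_version(module_name: str) -> Optional[str]:
    """Return module.__version__, or None if the module cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def matches_series(version: Optional[str], series: str) -> bool:
    """True if version belongs to the given major.minor series, e.g. '2.3'."""
    return version is not None and (
        version == series or version.startswith(series + ".")
    )


if __name__ == "__main__":
    found = module_version("pyspark")
    print("pyspark version:", found)
    print("in 2.3 series:", matches_series(found, "2.3"))
```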

How to run Jupyter Notebook without Anaconda on Ubuntu?

There is a lot of information available on running Jupyter Notebook with Anaconda, but I could not find any on running Jupyter without Anaconda.
Any pointers would be much appreciated!
Basically the process is as follows:
pip3 install --upgrade pip
pip3 install jupyter
jupyter notebook # run notebook
Run a specific notebook:
jupyter notebook notebook.ipynb
Using a custom port (a custom IP can be set the same way with --ip):
jupyter notebook --port 9999
No browser:
jupyter notebook --no-browser
Help:
jupyter notebook --help
Answer from the following sources:
SOURCE 1
SOURCE 2
See Gordon Ball's Jupyter PPA, the most actively maintained Jupyter PPA as of this writing with support for both Ubuntu 16.04 and 18.04 (LTS).
Installation: It's a Ball
Thanks to Gordon's intrepid efforts, installation of Jupyter under Ubuntu trivially reduces to:
sudo add-apt-repository ppa:chronitis/jupyter
sudo apt-get update
sudo apt-get install jupyter
Doing so installs the jupyter metapackage, providing:
A standard collection of Jupyter packages published by this PPA, including:
Jupyter's core and client libraries.
Jupyter's console interface.
Jupyter's web-based notebook.
Tools for working with and converting notebook (ipynb) files.
The Python3 computational kernel.
The /usr/bin/jupyter executable.
As W. Dodge's pip-based solution details, the browser-based Jupyter Notebook UI may then be launched from a terminal as follows – where '/home/my_username/my_notebooks' should be replaced with the absolute path of the top-level directory containing all of your Jupyter notebook files:
jupyter notebook --notebook-dir='/home/my_username/my_notebooks'
Why Not Anaconda or pip?
For Debian-based Linux distributions, the optimal solution is a Debian-based personal package archive (PPA). All other answers propose Debian-agnostic solutions (e.g., Anaconda, pip), which circumvent the system-wide APT package manager and hence are non-ideal.
Installing Jupyter via this or another PPA guarantees automatic updates to both Jupyter and its constellation of dependencies. Installing Jupyter via either Anaconda or pip requires manual updates to both on an ongoing basis – a constant thorn in the side that you can probably do without.
In short, PPA >>>> Anaconda >> pip.
There are two ways to install Jupyter Notebook on Ubuntu: one uses Anaconda, the other uses pip. See the link below for details.
http://jupyter.readthedocs.io/en/latest/install.html

How to change python version in apache toree pyspark notebook?

I am running Apache Toree for a PySpark notebook. I have Anaconda 3.5 and JupyterHub installed on Unix machines. When I invoke PySpark from a Jupyter notebook, it starts with Python 2.7 instead of Anaconda 3.5.
I would appreciate help with changing the Python version.
Note that I have already tried changing the Python version via os.environ, but it didn't work.
I followed the steps below to configure Toree with Python 3:
Installed a new kernel with the Spark home and Python path.
jupyter toree install --spark_home="spark_path" --kernel_name=tanveer_kernel1 --interpreters=PySpark,SQL --python="python_path"
After that, there were mismatches between the driver Python version and the executor Python version. I corrected the Python version in spark-env.sh by adding
export PYSPARK_PYTHON="/usr/lib/anaconda3/bin/python"
export PYSPARK_DRIVER_PYTHON="/usr/lib/anaconda3/bin/python"
Restarted spark services.
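To confirm from inside a notebook cell which interpreter the kernel is running and whether the two variables agree, something like the following can be used. It is a sketch only: the helper name is hypothetical, and it just inspects the environment of the current process rather than anything Toree-specific.

```python
import os
import sys
from typing import Mapping


def interpreters_agree(env: Mapping[str, str]) -> bool:
    """True only if both PySpark interpreter variables are set and identical."""
    worker = env.get("PYSPARK_PYTHON")
    driver = env.get("PYSPARK_DRIVER_PYTHON")
    return worker is not None and worker == driver


if __name__ == "__main__":
    # The interpreter actually executing this cell or script.
    print("running:", sys.executable, sys.version_info[:3])
    print("PYSPARK_PYTHON:", os.environ.get("PYSPARK_PYTHON"))
    print("PYSPARK_DRIVER_PYTHON:", os.environ.get("PYSPARK_DRIVER_PYTHON"))
    print("driver and executor settings agree:", interpreters_agree(os.environ))
```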

How is Apache Toree installed on Mac OS X with Spark installed via Homebrew?

Apache Toree looks for the Spark home directory (it defaults to "/usr/local/spark"), but when it can't find that directory because Spark was installed via Homebrew, it throws an exception.
jupyter toree install
FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/spark/python/lib'
Where is the spark home when spark is installed via homebrew?
The directory Apache Toree is looking for when spark is installed via homebrew is in /usr/local/Cellar:
jupyter toree install --spark_home /usr/local/Cellar/apache-spark/2.1.0/libexec
/usr/local/Cellar/apache-spark/2.1.0/libexec/
It specifically wants the "libexec" directory where it can go into the "python/lib" sub-directory.
If that doesn't work, you might additionally need to pass in a --user flag.
jupyter toree install --user --spark_home /usr/local/Cellar/apache-spark/2.2.0/libexec
This is based on this GitHub issue, this Jupyter documentation, and this other Stack question.
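Because the version number in the Cellar path changes with each Homebrew upgrade, the candidate directories can be listed rather than typed by hand. The sketch below assumes the /usr/local/Cellar/apache-spark/<version>/libexec layout described above; the function name is hypothetical.

```python
import glob
import os
from typing import List


def find_spark_homes(cellar: str = "/usr/local/Cellar") -> List[str]:
    """List libexec directories that contain the python/lib folder Toree needs."""
    pattern = os.path.join(cellar, "apache-spark", "*", "libexec")
    # sorted() orders the paths lexicographically, not by version number.
    return sorted(
        path
        for path in glob.glob(pattern)
        if os.path.isdir(os.path.join(path, "python", "lib"))
    )


if __name__ == "__main__":
    for candidate in find_spark_homes():
        print(candidate)
```

Any path it prints is a candidate value for --spark_home.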

Using Spark Kernel on Jupyter

I am just starting out with Jupyter and the idea of notebooks.
I usually program in Vim and a terminal, so I am still trying to figure some things out.
I am trying to install a kernel that is capable of executing Spark and have come across Toree. I installed Toree and it appears when I list the kernels. Here is the result:
$ jupyter kernelspec list
Available kernels:
python3 C:\Users\UserName\AppData\Local\Continuum\Anaconda3\lib\site-packages\ipykernel\resources
bash C:\Users\UserName\AppData\Roaming\jupyter\kernels\bash
toree C:\ProgramData\jupyter\kernels\toree
So when I open a toree notebook, the kernel dies and will not restart. Closing the notebook and reopening it results in the kernel changing to Python3.
There is a large error message that gets printed to the host terminal and the notebook error message. There is another post that has been put on hold; they are the same error messages.
I followed this page for the install:
https://github.com/apache/incubator-toree
These instructions are mostly for Linux/Mac, it appears.
Any thoughts on how to get a Spark notebook on Jupyter?
I understand there is not a lot of information here. If more is needed, let me know.
I posted a similar question to Gitter and they replied saying (paraphrased) that:
Toree is the future of spark programming on Jupyter and will appear to have installed correctly on a windows machine but the .jar and .sh files will not operate correctly on the windows machine.
Knowing this, I tried it on my Linux (Fedora) and a borrowed Mac. Once jupyter was installed (and Anaconda) I entered these commands:
$ SparkHome="$HOME/spark/spark1.5.5-bin.hadoop2.6"
$ sudo pip install toree
Password: **********
$ sudo jupyter toree install --spark_home=$SparkHome
Jupyter ran the Toree notebook on both machines. I presume that a VM might work as well. I want to see if the Windows 10 bash shell will also work with this, as I am currently running Windows 7.
Thanks for the other Docs!
The answer from @user3025281 solved the issue for me as well. I had to make the following adjustments for my environment (Ubuntu 16.04 running Spark 2.2.0 and Hadoop 2.7). The downloads are direct file downloads from the hosting sites or a mirror site.
You'll mostly be configuring your environment variables and then calling jupyter, assuming it was installed through Anaconda. That's pretty much it.
SPARK_HOME="$HOME/spark/spark-2.2.0-bin-hadoop2.7"
Write this into your ~/.bashrc file and then source the file:
# reload environment variables
source ~/.bashrc
Install
sudo pip install toree
sudo jupyter toree install --spark_home=$SPARK_HOME
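One detail worth checking with any Spark home value: a path that still starts with a literal ~ (for example because it was quoted in the shell or pasted into a config file) is not expanded and will not point at a real directory. The sketch below shows one way to normalize such a value before passing it on; the helper name is hypothetical.

```python
import os


def resolve_spark_home(raw: str) -> str:
    """Expand a leading ~ and any $VARS, then return an absolute path."""
    return os.path.abspath(os.path.expandvars(os.path.expanduser(raw)))


if __name__ == "__main__":
    raw = os.environ.get("SPARK_HOME", "~/spark")
    resolved = resolve_spark_home(raw)
    print(resolved, "(exists)" if os.path.isdir(resolved) else "(missing)")
```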
Optional: On Windows 10, you could use "Bash on Ubuntu on Windows" to configure Jupyter on a Linux distro.

Resources