install cudf on databricks

I am trying to use cudf on databricks.
I started following https://medium.com/rapids-ai/rapids-can-now-be-accessed-on-databricks-unified-analytics-platform-666e42284bd1, but the init script link is broken.
Then I followed this link (https://github.com/rapidsai/spark-examples/blob/master/getting-started-guides/csp/databricks/databricks.md#start-a-databricks-cluster), which installs the cudf jars on the cluster. I still could not import cudf.
I also tried:
%sh conda install -c rapidsai -c nvidia -c numba -c conda-forge cudf=0.13 python=3.7 cudatoolkit=10.1
which also failed with a long error ending with:
active environment : /databricks/python
active env location : /databricks/python
shell level : 2
user config file : /root/.condarc
populated config files : /databricks/conda/.condarc
conda version : 4.8.2
conda-build version : not installed
python version : 3.7.6.final.0
virtual packages : __cuda=10.2
__glibc=2.27
base environment : /databricks/conda (writable)
channel URLs : https://conda.anaconda.org/nvidia/linux-64
https://conda.anaconda.org/nvidia/noarch
https://conda.anaconda.org/rapidsai/linux-64
https://conda.anaconda.org/rapidsai/noarch
https://conda.anaconda.org/numba/linux-64
https://conda.anaconda.org/numba/noarch
https://conda.anaconda.org/conda-forge/linux-64
https://conda.anaconda.org/conda-forge/noarch
https://conda.anaconda.org/pytorch/linux-64
https://conda.anaconda.org/pytorch/noarch
https://repo.anaconda.com/pkgs/main/linux-64
https://repo.anaconda.com/pkgs/main/noarch
https://repo.anaconda.com/pkgs/r/linux-64
https://repo.anaconda.com/pkgs/r/noarch
package cache : /databricks/python/pkgs
/local_disk0/conda/pkgs
envs directories : /databricks/conda/envs
/root/.conda/envs
platform : linux-64
user-agent : conda/4.8.2 requests/2.22.0 CPython/3.7.6 Linux/4.4.0-1114-aws ubuntu/18.04.5 glibc/2.27
UID:GID : 0:0
netrc file : None
offline mode : False
An unexpected error has occurred. Conda has prepared the above report.
Upload successful.
Any idea how to use cudf on a Databricks cluster?

I remember helping write that blog a while ago :). It's out of date now.
Karthik and team have made some great updates since then with spark-rapids. Here is the newest implementation of RAPIDS with Databricks in Spark: https://nvidia.github.io/spark-rapids/docs/get-started/getting-started-databricks.html. That should get you running with the newest version of cudf.
I'll ask that someone add a disclaimer with that link on that specific blog, so that others don't get confused either. Thanks for alerting us through this question!

Perhaps you need cudatoolkit=10.2? You have virtual packages : __cuda=10.2 in that report.
I am investigating install issues on a databricks GPU cluster (different issue though) and noted that the version of CUDA was 10.2 and not the 10.1 that I expected.
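To double-check before pinning a cudatoolkit version, you can compare what the driver and conda each report. A minimal sketch from a notebook cell:
%sh
nvidia-smi                 # driver-reported CUDA version appears in the header line
conda info | grep __cuda   # the virtual package conda will solve against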

I think the OP wants to use Python with cudf.
If so, that is not covered in the documentation.
But I added the lines below to the generate-init-script.ipynb to make it work:
# Use mamba to install packages, to speed up conda resolve time
base=$(conda info --base)
conda create -y -n mamba -c conda-forge mamba
# Remove packages that would conflict with the cudf install
pip uninstall -y pyarrow
${base}/envs/mamba/bin/mamba remove -y c-ares zstd libprotobuf pandas
${base}/envs/mamba/bin/mamba install -y "pyarrow=1.0.1" -c "conda-forge"
${base}/envs/mamba/bin/mamba install -y -c "rapidsai" -c "nvidia" -c "conda-forge" -c "defaults" "cudf=0.18" "cudatoolkit=10.1"
# Clean up the helper environment
conda env remove -n mamba
Note: Change the cudf and cudatoolkit versions to match your environment.
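Once the cluster restarts with the init script, a quick way to confirm the install (a minimal sketch, assuming the packages landed in the cluster's default /databricks/python environment shown in the report above) is:
%sh
/databricks/python/bin/python -c "import cudf; print(cudf.__version__)"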

Related

On el8/el9/newer, how do you get newer versions of software like python3, gcc, java, etc?

For example, on el7:
to develop an NVIDIA CUDA application you need a newer gcc than the default gcc 4.8.x, and to get the newer version you would use a software repo called "Software Collections" (SCL)
the base python3 is 3.6, and when you need newer python modules you install python3.8 from SCL
Starting with el8 and el9, SCL is deprecated, so there is a different method for installing and configuring newer versions of gcc and python3.
In a nutshell, here are some examples of how to install and configure newer versions:
for python3 to get python3.9: dnf install -y python39 && alternatives --set python3 $(command -v python3.9)
for gcc to get gcc-12: dnf install gcc-toolset-12 && source scl_source enable gcc-toolset-12
for java to get java-17: dnf install java-17 && bin_java_filename=$(rpm -qa|grep java-17|xargs rpm -ql|grep "bin\/java$"|head -1) && alternatives --set java ${bin_java_filename}
tested on rocky8, rocky9
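To confirm the switches took effect, a quick check (a minimal sketch) looks like:
python3 --version   # expect Python 3.9.x after the alternatives switch
gcc --version       # expect gcc 12.x once the toolset is sourced
java -version       # expect openjdk 17.x after the alternatives switch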
which repo has the newer software versions?
the old method using "SCL" was deprecated
the new method is to use a repo called "appstream"
here is a post written by the distro maintainers explaining the change https://developers.redhat.com/blog/2018/11/15/rhel8-introducing-appstreams
the repo is enabled by default
how to: install newer software versions?
for python3: dnf install python39
for gcc: dnf install gcc-toolset-12
how to: change the system default?
for python3: alternatives --set python3 $(command -v python3.9)
for gcc:
edit your user .bashrc or .bash_profile or create a new file under /etc/profile.d/ with the following: source scl_source enable gcc-toolset-12
I thought scl_source would go away in el8/el9, but apparently not
for more info on scl_source go to this link https://unix.stackexchange.com/a/195219/5510 or Permanently enable RHEL scl
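For example, to enable the toolset for all users you could drop a one-liner under /etc/profile.d/ (the file name here is just an example):
echo 'source scl_source enable gcc-toolset-12' > /etc/profile.d/enable-gcc-toolset-12.sh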
p.s. what is the difference between alternatives and update-alternatives?
the original tool is called update-alternatives and comes from the Debian Linux distro
in Enterprise Linux, Red Hat rewrote the tool and called it alternatives; when you install alternatives, the package also installs a symlink named update-alternatives on your PATH to help you find the tool
the two are similar but not the same, because their source code is different

conda info showing the same old version after conda update in Linux

I followed the conda docs to update conda on a Google Cloud server with this:
conda update -n base -c defaults conda
After this, it shows:
Solving environment: done
==> WARNING: A newer version of conda exists. <==
current version: 4.5.11
latest version: 4.12.0
Please update conda by running
$ conda update -n base -c defaults conda
# All requested packages already installed.
Then I tried this:
conda update --all
That gave the same output, plus some packages and progress:
Solving environment: done
==> WARNING: A newer version of conda exists. <==
current version: 4.5.11
latest version: 4.12.0
Please update conda by running
$ conda update -n base -c defaults conda
## Package Plan ##
environment location: <path>
The following packages will be downloaded:
<packages and progress>
Then I ran conda info:
active environment : None
shell level : 0
user config file : <path>/.condarc
populated config files :
conda version : 4.5.11
conda-build version : 2.0.2
python version : 3.5.6.final.0
base environment : <path>/anaconda3 (writable)
channel URLs : https://repo.anaconda.com/pkgs/main/linux-64
https://repo.anaconda.com/pkgs/main/noarch
https://repo.anaconda.com/pkgs/free/linux-64
https://repo.anaconda.com/pkgs/free/noarch
https://repo.anaconda.com/pkgs/r/linux-64
https://repo.anaconda.com/pkgs/r/noarch
https://repo.anaconda.com/pkgs/pro/linux-64
https://repo.anaconda.com/pkgs/pro/noarch
package cache : <path>/anaconda3/pkgs
<path>/.conda/pkgs
envs directories : <path>//anaconda3/envs
<path>//.conda/envs
platform : linux-64
user-agent : conda/4.5.11 requests/2.25.1 CPython/3.5.6 Linux/5.13.0-1019-gcp ubuntu/20.04 glibc/2.31
UID:GID : <the ID>
netrc file : None
offline mode : False
There was no error during the last installation, but it seems conda is still not updated. I was not in any environment when I ran it; I had downloaded and installed conda version 4.2.0 on this Google Cloud server. I want to know whether my conda was updated to 4.12 or not, and if not, how can I update it properly?
Thanks
While it is normally not recommended to update Python in-place, that Python version (3.5) is quite outdated and is likely what is preventing the conda package from being updated.
Try:
conda install -n base --dry-run python=3.9 conda=4.12
to see if updating is possible. If so, try again without the --dry-run flag.
⚠️ Note this is a risky update - if the conda package does not upgrade correctly with python, the installation could fail. I'd recommend a backup first.
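If the dry run solves cleanly, the follow-up (a minimal sketch of the same command plus a version check) would be:
conda install -n base python=3.9 conda=4.12   # the same command without --dry-run
conda --version                               # should now report 4.12.0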

ffmpeg: error while loading shared libraries: libopenh264.so.5

I am using ffmpeg and getting this error:
ffmpeg: error while loading shared libraries: libopenh264.so.5: cannot open shared object file: No such file or directory
I have already checked if the library exists and it does. I added it to /etc/ld.so.conf as mentioned in this previous question but it doesn't work.
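For reference, the kind of check that verifies the loader can actually find the library (a minimal sketch) is:
ldconfig -p | grep libopenh264   # is the library in the loader cache?
sudo ldconfig                    # rebuild the cache after editing /etc/ld.so.conf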
Another approach that seemed easier and worked for me on Ubuntu 16.04 and python 3.8 was just calling:
conda update ffmpeg
from this post.
I faced this error when I ran ffmpeg 4.2.2 under a Python 3.8 environment.
The root cause is that libopenh264.so from the Python 3.8 environment is too new for ffmpeg 4.2.2.
I can find
libopenh264.so.6 under ~/anaconda3/envs/py38/lib/ (py38 is my conda virtual environment), but we need the older version libopenh264.so.5.
To solve the problem, I just made a symlink from my existing Anaconda environment (Python 3.7) as follows, and it works:
ln -s ~/anaconda3/lib/libopenh264.so ~/anaconda3/envs/py38/lib/libopenh264.so.5
I resolved this by:
Downloading the openh264 binary from GitHub
Copying/renaming the binary to my conda env, e.g. ~/anaconda3/envs/py38/lib/libopenh264.so.5 where py38 is the env name
I had the same issue. To fix it, I removed all installs of ffmpeg:
sudo apt-get remove ffmpeg
sudo apt-get purge ffmpeg
After doing this, the error was still the same; which ffmpeg showed me I was using the one from Anaconda, so I removed that one (renamed it).
Then I could do a clean install and now it works again:
sudo apt-get install ffmpeg
I copied ~/anaconda3/lib/libopenh264.so, pasted it into the same folder, and renamed it to libopenh264.so.5. And it works.
I recently encountered this issue with a system-installed ffmpeg and a pip-installed ffmpeg-python within a conda environment.
The work-around for me was uninstalling the system ffmpeg and installing ffmpeg as a conda package within my conda environment:
# Uninstall ffmpeg system install (assumes Ubuntu)
sudo apt-get remove ffmpeg -y
sudo apt-get purge ffmpeg -y
# Install ffmpeg in conda env
conda install -c conda-forge ffmpeg
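A quick sanity check afterwards (a minimal sketch) confirms the conda binary is the one being picked up:
which ffmpeg      # should point into your conda env's bin directory
ffmpeg -version   # should now run without the libopenh264 error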
I did what Synthesis did, that is
sudo apt-get remove ffmpeg
sudo apt-get purge ffmpeg
However, instead of renaming the Anaconda binary, I removed the Anaconda ffmpeg package:
conda remove ffmpeg
The clean install then did the trick:
sudo apt-get install ffmpeg

unable to install tensorflow model server

I am trying to deploy my model on TensorFlow Serving, but I am facing an issue with the installation of tensorflow-model-server itself. Do I need to install anything else before the model server can be installed? I am currently using Python v3.6 and tensorflow 1.12.0 on a VM.
Below are the two ways I am trying to install it:
conda install tensorflow-model-server
pip install tensorflow-model-server
Using conda install gives me the error below:
Solving environment: failed
PackagesNotFoundError: The following packages are not available from current channels:
tensorflow-model-server
Using pip says:
Collecting tensorflow-model-server
Could not find a version that satisfies the requirement tensorflow-model-server (from versions: )
No matching distribution found for tensorflow-model-server
Did you try to follow the instructions provided in the documentation?
First, you should add TensorFlow Serving as a package source using the instructions below:
echo "deb [arch=amd64] http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | sudo tee /etc/apt/sources.list.d/tensorflow-serving.list && \
curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | sudo apt-key add -
# then install
apt-get update && apt-get install tensorflow-model-server
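Once installed, a minimal check that the binary is on your PATH:
tensorflow_model_server --version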
For more information, please look at the link below:
Tensorflow Serving doc

Python 3.6 in tensorflow gpu docker images

How can I get python3.6 in the tensorflow Docker images?
All the images I tried (latest, nightly) use python3.5, and I don't want to modify all my scripts.
The Tensorflow images are based on Ubuntu 16.04, as you can see from the Dockerfile. This release ships with Python 3.5 as standard.
So you'll have to re-build the image, and the Dockerfile will need editing, even though you need to do the actual build with the parameterized_docker_build.sh script.
This answer on Ask Ubuntu covers how to get Python 3.6 on Ubuntu 16.04.
The simplest way would probably be just to change the FROM line in the Dockerfile to FROM ubuntu:16.10, and python to python3.6 in the initial apt-get install line.
Of course, this may break some other Ubuntu version-specific thing, so an alternative would be to keep Ubuntu 16.04 and install one of the alternative ppa's also listed in the linked answer:
RUN add-apt-repository ppa:deadsnakes/ppa && \
    apt-get update && \
    apt-get install -y python3.6
Note that you'll need this after the initial apt-get install, because that installs software-properties-common, which you need to add the ppa.
Note also, as in the comments to the linked answer, that you will need to symlink to Python 3.6.
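A hedged sketch of that symlink step (exact paths may differ in your image):
RUN ln -sf /usr/bin/python3.6 /usr/bin/python3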
Finally, note that I haven't tried any of this. There may be gotchas, and you may need to make another change to ensure that the correct version of Python is used by the running container.
You can use stable images which are supplied by third parties, like ufoym/deepo.
One that fits TensorFlow, python3.6 and cuda10 can be found here or you can pull it directly using the command docker pull ufoym/deepo:py36-cu100
I use their images all the time, never had problems
With this answer, I just wanted to specify how I solved this problem (SiHa's previous answer helped me a lot, but I had to add a few steps so that it worked completely).
Context:
I'm using a package (segmentation model for unet++) that requires tensorflow==1.4.0 and keras==2.2.2.
I tried to use the Docker image for tensorflow 1.4.0; however, the default Python version of this image is 3.5, which is not compatible with my package.
I managed to install python3.6 on the Docker image thanks to the following files:
My Dockerfile contains the following lines:
Dockerfile:
FROM tensorflow/tensorflow:1.4.0-gpu-py3
RUN mkdir /AI_PLATFORM
WORKDIR /AI_PLATFORM
COPY ./install.sh ./install.sh
COPY ./requirements.txt ./requirements.txt
COPY ./computer_vision ./computer_vision
COPY ./config.ini ./config.ini
RUN bash install.sh
Install.sh:
#!/usr/bin/env bash
pip install --upgrade pip
apt-get update
apt-get install -y python3-pip
add-apt-repository ppa:deadsnakes/ppa &&
apt-get update &&
apt-get install python3.6 --assume-yes
apt-get install -y libpython3.6
python3.6 -m pip install --upgrade pip
python3.6 -m pip install -r requirements.txt
Three things are important:
use python3.6 -m pip instead of pip, else the packages are installed for Python 3.5, the default version on Ubuntu 16.04
use docker run python3.6 <command> to run your containers with python==3.6 (see the usage sketch after this list)
in the requirements.txt file, I had to specify the following things:
h5py==2.10.0
tensorflow-gpu==1.4.1
keras==2.2.2
keras-applications==1.0.4
keras-preprocessing==1.0.2
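As a usage sketch (the image tag my-ai-platform is just a placeholder name):
docker build -t my-ai-platform .
docker run --rm my-ai-platform python3.6 -c "import sys; print(sys.version)"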
I hope that this answer will be useful.
Maybe the image I created will help you. It is based on the cuda-10.0-devel image and has tensorflow 2.0a-gpu installed.
You can use it as a base image for your own implementation. The image itself doesn't do anything. I put the image on Docker Hub: https://cloud.docker.com/repository/docker/patientzero/tensorflow2.0a-gpu-py3.6
The github repo is located here: https://github.com/patientzero/tensorflow2.0-python3.6-Docker
Pulling it won't do much, but for completeness:
$ docker pull patientzero/tensorflow2.0-gpu-py3.6
edit: changed to general tensorflow 2.0x image.
Also, as mentioned here, the official image for the 2.0 beta release now comes with python 3.6 support.
