How can I Dockeries a python script which contains spark dependencies? - python-3.x

I have a Python file, in which I tried to import Spark libraries.
When I built it with the Docker File it is giving me error as 'JAVA_HOME' is not set.
I tried to install Java through Docker file, but it is giving error as well.
Below is the Dockerfile I tried to execute.
FROM python:3.6.4
RUN apt-get update && \
apt-get upgrade -y && \
apt-get install -y software-properties-common && \
add-apt-repository ppa:webupd8team/java -y && \
apt-get update && \
echo oracle-java7-installer shared/accepted-oracle-license-v1-1 select true | /usr/bin/debconf-set-selections && \
apt-get install -y oracle-java8-installer && \
apt-get clean
ENV JAVA_HOME /usr/lib/jvm/java-8-oracle
ADD Samplespark.py /
COPY Samplespark.py /opt/ml/Samplespark.py
RUN pip install pandas
RUN pip install numpy
RUN pip install pyspark
RUN pip install sklearn
RUN pip install sagemaker_pyspark
RUN pip install sagemaker
CMD [ "python", "./Samplespark.py" ]
ENTRYPOINT ["python","/opt/ml/Samplespark.py"]
Please help me to install the Java dependencies for PySpark in Docker.

You have Debian os, not ubuntu os. These ppas are for ubuntu os. According to this, article oracle java8 is not available in Debian due to licensing issues.
You have following options-
1. Use an Ubuntu docker image which comes with preinstalled oracle java8 like this one
2. Follow this tutorial on how to install Oracle java8 on Debian Jessie
3. Install open_jdk sudo apt-get install openjdk-8-jre

Related

Dockerfile for Python3 and OpenCV

I need to have a Dockerfile with Python3 and the latest version of OpenCV. The Dockerfile I have written is described below:
FROM ubuntu
#Install OpenCV
RUN apt-get update &&\
apt-get install -y cmake
RUN apt-get install -y gcc g++
RUN apt-get install -y python3-dev python3-numpy
# To support GUI features
RUN apt-get install -y libavcodec-dev libavformat-dev libswscale-dev
# To support GTK 2
RUN apt-get install -y libgtk2.0-dev
# To support GTK 3
RUN apt-get install -y libgtk-3-dev
#Optional dependencies
RUN apt-get install -y libpng-dev
RUN apt-get install -y libjpeg-dev
RUN apt-get install -y libopenexr-dev
RUN apt-get install -y libtiff-dev
RUN apt-get install -y libwebp-dev
# Clone OpenCV repo
RUN apt-get install -y git
RUN git clone https://github.com/opencv/opencv.git
#Compile
RUN mkdir /opencv/build && \
cd /opencv/build
RUN cmake ..
RUN make
However, when i run it, it gives me the following error with cmake.
Step 17/27 : RUN cmake ..
---> Running in 3dca32df2036
CMake Error: The source directory "/" does not appear to contain CMakeLists.txt.
Specify --help for usage, or press the help button on the CMake GUI.
The command '/bin/sh -c cmake ..' returned a non-zero code: 1
Do you know what am i doing wrong?
You just need to chain the last RUN instructions:
#Compile
RUN mkdir /opencv/build && \
cd /opencv/build && \
cmake .. && \
make
Explanation:
A RUN statement creates a new image layer and executes the specified shell command. This means that each RUN statement will run the command in a separate shell, so the current directory from the previous RUN will not be preserved (you can see that the CMake current directory is / by looking closely at the error message).
You can find more info about RUN statement in the Docker documentation and I also recommend reading Best practices for writing Dockerfiles.

Install python 3.5 inside docker with a base image centos7

I am trying to install python 3.5 inside docker with a base image centos7. This is our Dockerfile
FROM base-centos7:0.0.8
# Install basic tools
RUN yum install -y which vim wget git gcc
# Install python 3.5
RUN yum install -y https://repo.ius.io/ius-release-el7.rpm \
&& yum update -y \
&& yum install -y python35u python35u-libs python35u-devel python35u-pip
RUN python3.5 -m pip install --upgrade pip
But during the build, docker build image is failing with the following errors
executor failed running [/bin/sh -c yum install -y https://repo.ius.io/ius-release-el7.rpm
&& yum update -y
&& sudo yum install -y python35u python35u-libs python35u-devel python35u-pip]: exit code: 127.
Can anyone guide me in resolving this issue. and why am I seeing this issue in very first place.
You can use python image from docker hub
https://hub.docker.com/_/python
Example of dockerfile :
FROM python:3.6
RUN mkdir /code
WORKDIR /code
ADD . /code/
RUN pip install -r requirements.txt
EXPOSE 5000
CMD ["python", "/code/app.py"]
i think it's easy , isn't ?
the centos repo uses:
FROM centos/s2i-base-centos7
EXPOSE 8080
ENV PYTHON_VERSION=3.5 \
PATH=$HOME/.local/bin/:$PATH \
PYTHONUNBUFFERED=1 \
PYTHONIOENCODING=UTF-8 \
LC_ALL=en_US.UTF-8 \
LANG=en_US.UTF-8 \
PIP_NO_CACHE_DIR=off
RUN INSTALL_PKGS="rh-python35 rh-python35-python-devel rh-python35-python-setuptools rh-python35-python-pip nss_wrapper \
httpd24 httpd24-httpd-devel httpd24-mod_ssl httpd24-mod_auth_kerb httpd24-mod_ldap \
httpd24-mod_session atlas-devel gcc-gfortran libffi-devel libtool-ltdl enchant" && \
yum install -y centos-release-scl && \
yum -y --setopt=tsflags=nodocs install --enablerepo=centosplus $INSTALL_PKGS && \
rpm -V $INSTALL_PKGS && \
# Remove centos-logos (httpd dependency) to keep image size smaller.
rpm -e --nodeps centos-logos && \
yum -y clean all --enablerepo='*'
source here
The problem is not difficult, I build the image changing
FROM base-centos7:0.0.8 ====> FROM centos:7
You can consult the images version of centos in https://hub.docker.com/_/centos
PD: The container showed: errro exited(1), you should focus on the main process.

docker ERROR: Could not find a version that satisfies the requirement apturl==0.5.2

I am using windows 10 OS. I want to build an container based on linux so I can replicate code and dependencies developed from ubuntu. When I try to build it outputs Error message as above.
From my understanding docker for desktop runs linux OS kernel under-the-hood therefore allowing window users to run linux based containers, not sure why it is outputting this error.
My dockerfile looks like this:
FROM ubuntu:18.04
ENV PATH="/root/miniconda3/bin:${PATH}"
ARG PATH="/root/miniconda3/bin:${PATH}"
RUN apt update \
&& apt install -y htop python3-dev wget
RUN wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
&& mkdir root/.conda \
&& sh Miniconda3-latest-Linux-x86_64.sh -b \
&& rm -f Miniconda3-latest-Linux-x86_64.sh
RUN conda create -y -n ml python=3.7
COPY . src/
RUN /bin/bash -c "cd src \
&& source activate ml \
&& pip install -r requirements.txt"
requirements.txt contains:
apturl==0.5.2
asn1crypto==0.24.0
bleach==2.1.2
Brlapi==0.6.6
certifi==2020.11.8
chardet==3.0.4
click==7.1.2
command-not-found==0.3
configparser==5.0.1
cryptography==2.1.4
cupshelpers==1.0
dataclasses==0.7
When I run docker build command it outputs:
1.649 ERROR: Could not find a version that satisfies the requirement apturl==0.5.2 1.649 ERROR: No matching distribution found for apturl==0.5.2 Deleting it and running it lead to another error. All error seem to be associated with ubuntu packages.
Am I not running a ubuntu container? why aren't I allowed to install ubuntu packages?
Thanks!
You try to install ubuntu packages with pip (which is for python packages")
try apt install -y apturl
If you want to install python packages write pip install package_name

pip3 install doesn't install anything in my docker image

So, here's the problem:
I am trying to build and run a jupyter notebook inside the docker container. At some point I need to install a few python packages with pip3 install. The build looks fine, all the modules are being installed, but when I import any of those modules inside the jupyter notebook, it says "No module named <module_name>", as if it wasn't installed.
Here's the Dockerfile:
FROM jupyter/base-notebook
LABEL maintainer="Jupyter Project <jupyter#googlegroups.com>"
USER root
RUN apt-get update && apt-get install -yq --no-install-recommends \
build-essential \
python3-pip
RUN sudo -H pip3 install matplotlib \
pillow \
numpy
USER $NB_UID
I can't figure out why this doesn't work.

How to install PYODBC in Databricks

I have to install pyodbc module in Databricks.
I have tried using this command (pip install pyodbc) but it is failed due to below error.
Error message
I was having the same issue for installation. This is what I tried and it worked.
Databricks does not have default ODBC Driver. Run following commands in a single cell to install MS SQL ODBC driver
%sh
curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
curl https://packages.microsoft.com/config/ubuntu/16.04/prod.list > /etc/apt/sources.list.d/mssql-release.list
sudo apt-get update
sudo ACCEPT_EULA=Y apt-get -q -y install msodbcsql17
Run this in notebook
dbutils.fs.put("/databricks/init/<YourClusterName>/pyodbc-install.sh","""
#!/bin/bash
sudo apt-get update
sudo apt-get -q -y install unixodbc unixodbc-dev
sudo apt-get -q -y install python3-dev
/databricks/python/bin/pip install pyodbc
""", True)
Restart the cluster
Import pyodbc in Code
I had some problems a while back with connecting using pyobdc, details of my fix are here: https://datathirst.net/blog/2018/10/12/executing-sql-server-stored-procedures-on-databricks-pyspark
I think the problem stems from PYTHONPATH on the databricks clusters being set to the Python 2 install.
I suspect the lines:
%sh
apt-get -y install unixodbc-dev
/databricks/python/bin/pip install pyodbc
Will work for you.
Update: Even simpler (though you will still need unixodbc-dev from above):
%sh
sudo apt-get install python3-pip -y
pip3 install --upgrade pyodbc
Right-click the Workspace folder where you want to store the library.
Select Create > Library.
Look this https://docs.databricks.com/user-guide/libraries.html for detailed information

Resources