Docker make Nvidia GPUs visible during docker build process - python-3.x

I want to build a docker image in which custom pytorch kernels are compiled. Therefore I need access to the available GPUs in order to compile the custom kernels during the docker build process. On the host machine everything is set up, including nvidia-container-runtime, nvidia-docker, the Nvidia drivers, CUDA, etc. The following command shows the docker runtime information on the host system:
$ docker info|grep -i runtime
Runtimes: nvidia runc
Default Runtime: runc
As you can see, the default docker runtime in my case is runc. I think changing the default runtime from runc to nvidia would solve this problem, as noted here (see the sketch below).
The proposed solution doesn't work in my case because:
I have no permissions to change the default runtime on the system I use
I have no permissions to make changes to the daemon.json file
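For reference, the change described above would look roughly like this in /etc/docker/daemon.json on the host (root access and a daemon restart required); with nvidia as the default runtime, the GPUs are also exposed to the build steps:
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}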
Is there a way to get access to the gpus during the build process in the Dockerfile in order to compile custom pytorch kernels for CPU and GPU (in my case DCNv2)?
Here is the minimal example of my Dockerfile to reproduce this problem. In this image, DCNv2 is only compiled for CPU and not for GPU.
FROM nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04
RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y tzdata && \
    apt-get install -y --no-install-recommends software-properties-common && \
    add-apt-repository ppa:deadsnakes/ppa && \
    apt update && \
    apt install -y --no-install-recommends python3.6 && \
    apt-get install -y --no-install-recommends \
        build-essential \
        python3.6-dev \
        python3-pip \
        python3.6-tk \
        pkg-config \
        software-properties-common \
        git
RUN ln -s /usr/bin/python3 /usr/bin/python && \
    ln -s /usr/bin/pip3 /usr/bin/pip
RUN python -m pip install --no-cache-dir --upgrade pip setuptools && \
    python -m pip install --no-cache-dir torch==1.4.0 torchvision==0.5.0
RUN git clone https://github.com/CharlesShang/DCNv2/
#Compile DCNv2
WORKDIR /DCNv2
RUN bash ./make.sh
# clean up
RUN apt-get clean && \
rm -rf /var/lib/apt/lists/*
#Build: docker build -t my_image .
#Run: docker run -it my_image
A non-optimal solution which worked would be the following:
Comment out the line RUN bash ./make.sh in the Dockerfile
Build the image: docker build -t my_image .
Run the image in interactive mode: docker run --gpus all -it my_image
Compile DCNv2 manually: root@1cd02fd62461:/DCNv2# ./make.sh
This way DCNv2 is compiled for CPU and GPU, but it does not seem like an ideal solution to me, because I must compile DCNv2 every time I start the container.
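A middle-ground sketch (not part of the original setup): a small hypothetical entrypoint.sh that compiles DCNv2 only on the container's first start and then hands off to the requested command, so the manual step disappears while the image can still be built without GPU access:
#!/bin/bash
# Hypothetical entrypoint.sh: build DCNv2 once, then run whatever command was requested.
set -e
if [ ! -f /DCNv2/.built ]; then
    (cd /DCNv2 && bash ./make.sh && touch .built)
fi
exec "$@"
This would be copied into the image with COPY entrypoint.sh / and wired up with ENTRYPOINT ["/entrypoint.sh"]; it still needs docker run --gpus all, and the compilation is redone whenever the container's writable layer is discarded.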

Related

Why would Python not be available to a Docker Entrypoint Script?

This Python 3.9 project has a Dockerfile that builds successfully. The file makes use of an ENTRYPOINT script, a bash script, to create some directories and handle some clean-up at run time. The ENTRYPOINT script runs fine until the very end, where it is expected to execute the CMD that is passed next. I should say this behavior only happens when Kaniko builds the image; when the image is built locally, no such problem occurs, though I am willing to chalk that up to the fact that the local build is on a Windows machine. That shouldn't matter here, because the error thrown is:
/opt/project/conf/entrypoint.sh: /usr/bin/supervisord: /usr/bin/python3: bad interpreter: No such file or directory
/opt/project/conf/entrypoint.sh: line 8: /usr/bin/supervisord: Success
I have looked at many "bad interpreter" questions. They all seem to revolve around the interpreter being in a custom place. I am relying on the default location of the Python 3.9 interpreter, which on Debian Bullseye (the OS behind the base image) should be /usr/local/bin/python or /usr/local/bin/python3. So I am completely stumped as to why it is unable to find or use it.
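A quick check that can narrow this down, assuming the built image can be run locally (my-image is a placeholder tag): look at the shebang the installed supervisord script actually carries, and where python3 really lives in the image:
docker run --rm --entrypoint /bin/bash my-image -c \
  'head -n1 "$(command -v supervisord)"; command -v python3; ls -l /usr/local/bin/python3'
If the shebang says /usr/bin/python3 but only /usr/local/bin/python3 exists, the interpreter path baked into the script and the interpreter shipped by the image simply don't match.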
Here are the implementation details:
Dockerfile:
FROM python:3.9-slim-bullseye
# Minimum Required Environment Variables
ENV SHELL=/bin/bash
ENV CC /usr/bin/gcc
ENV CXX /usr/bin/g++
ENV LANG=C.UTF-8
ENV DEBIAN_FRONTEND=noninteractive
ENV PYMSSQL_BUILD_WITH_BUNDLED_FREETDS=1
ENV PIP_CONFIG_FILE=/etc/pip.conf
ENV TZ=America/Los_Angeles
# Project Specific Environment Variables
ENV PROJECT_LOGFILE=/var/log/project/project.log
ENV PROJECT_CONFIG_DIRECTORY=/opt/project/conf
ENV PROJECT_SETTINGS_MODULE="project.settings"
# Files Needed for Dependency Installation
COPY dev/.pip.conf /etc/pip.conf
COPY dev/dev-requirements.txt /usr/local/requirements.txt
# Dependency Installation
WORKDIR /tmp
RUN apt-get update \
&& apt-get upgrade -y \
&& apt-get install musl-dev g++ bash curl gnupg -y \
&& curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add - \
&& curl https://packages.microsoft.com/config/debian/11/prod.list > /etc/apt/sources.list.d/mssql-release.list \
&& apt-get update \
&& apt-get install --no-install-recommends libfreetype-dev freetds-dev python-dev git libpng-dev libxml2-dev \
libxslt-dev libssl-dev libopenblas-dev rsyslog supervisor tini tzdata libghc-zlib-dev libjpeg-dev cron \
libgssapi-krb5-2 unixodbc-dev -y \
&& ACCEPT_EULA=Y apt-get install -y msodbcsql18 \
&& ln -s /usr/include/locale.h /usr/include/xlocale.h \
&& pip install --no-cache-dir --upgrade pip setuptools wheel \
&& pip install matplotlib --no-cache-dir \
&& pip install --no-cache-dir -r /usr/local/requirements.txt
# Setting Up For Install
COPY conf/ /opt/project/conf/
RUN mkdir -p /var/log/project /conf \
&& cp /opt/project/conf/supervisord.conf /conf/supervisord.conf \
&& cp /opt/project/conf/rsyslog.conf /conf/rsyslog.conf
WORKDIR /opt
# Copy Over Packages
COPY project-db-migrations /opt/project/project-db-migrations
COPY infrastructure /opt/infrastructure
COPY project /opt/project/src
COPY README.md /opt/project/README.md
# Install Infrastructure
RUN cd /opt/infrastructure && python3 setup.py install
# Install Project Service
RUN cd /opt/project/src && python3 setup.py install
RUN ["chmod", "+x", "/opt/project/conf/entrypoint.sh"]
WORKDIR /
EXPOSE 80
ENTRYPOINT ["tini", "--", "/opt/project/conf/entrypoint.sh"]
CMD ["supervisord", "-c", "/conf/supervisord.conf"]
entrypoint.sh
#!/bin/bash
set -eu
echo "Setting Up Project Service"
# Adding Temp Directory
mkdir -p /opt/project/tmp
echo "Service has been setup"
exec "$@"
supervisord.conf
[supervisord]
nodaemon=true
logfile=/var/log/project/supervisord.log
childlogdir=/var/log/project
[program:rsyslogd]
command=/usr/sbin/rsyslogd -n -f /conf/rsyslog.conf
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0
[program:crond]
command=/usr/sbin/cron -f -l 15
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0
[program:project]
command=python -m project.run --server
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0
The image is run and deployed without changes to the user, so it should be running as root.
In this case, after some digging, I found the cause was an issue with the Kaniko version DevOps had running: because the image wasn't being flattened correctly, Python could not start properly.

Install python 3.5 inside docker with a base image centos7

I am trying to install Python 3.5 inside Docker with a CentOS 7 base image. This is our Dockerfile:
FROM base-centos7:0.0.8
# Install basic tools
RUN yum install -y which vim wget git gcc
# Install python 3.5
RUN yum install -y https://repo.ius.io/ius-release-el7.rpm \
&& yum update -y \
&& yum install -y python35u python35u-libs python35u-devel python35u-pip
RUN python3.5 -m pip install --upgrade pip
But during the build, the docker image build is failing with the following errors:
executor failed running [/bin/sh -c yum install -y https://repo.ius.io/ius-release-el7.rpm
&& yum update -y
&& sudo yum install -y python35u python35u-libs python35u-devel python35u-pip]: exit code: 127.
Can anyone guide me in resolving this issue, and explain why I am seeing it in the first place?
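A side note on the error itself: exit code 127 from /bin/sh means "command not found", which points at yum itself (or something it invokes) not existing in the base image, rather than at a failure inside yum. A quick sanity check against the base image (a sketch):
docker run --rm base-centos7:0.0.8 /bin/sh -c 'which yum && yum --version'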
You can use the python image from Docker Hub:
https://hub.docker.com/_/python
Example Dockerfile:
FROM python:3.6
RUN mkdir /code
WORKDIR /code
ADD . /code/
RUN pip install -r requirements.txt
EXPOSE 5000
CMD ["python", "/code/app.py"]
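To build and run this example (assuming an app.py that listens on port 5000, as the CMD and EXPOSE imply):
docker build -t my-python-app .
docker run -p 5000:5000 my-python-app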
I think it's easy, isn't it?
The centos repo uses:
FROM centos/s2i-base-centos7
EXPOSE 8080
ENV PYTHON_VERSION=3.5 \
PATH=$HOME/.local/bin/:$PATH \
PYTHONUNBUFFERED=1 \
PYTHONIOENCODING=UTF-8 \
LC_ALL=en_US.UTF-8 \
LANG=en_US.UTF-8 \
PIP_NO_CACHE_DIR=off
RUN INSTALL_PKGS="rh-python35 rh-python35-python-devel rh-python35-python-setuptools rh-python35-python-pip nss_wrapper \
httpd24 httpd24-httpd-devel httpd24-mod_ssl httpd24-mod_auth_kerb httpd24-mod_ldap \
httpd24-mod_session atlas-devel gcc-gfortran libffi-devel libtool-ltdl enchant" && \
yum install -y centos-release-scl && \
yum -y --setopt=tsflags=nodocs install --enablerepo=centosplus $INSTALL_PKGS && \
rpm -V $INSTALL_PKGS && \
# Remove centos-logos (httpd dependency) to keep image size smaller.
rpm -e --nodeps centos-logos && \
yum -y clean all --enablerepo='*'
source here
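One caveat with the SCL approach above: rh-python35 does not end up on the default PATH, so the collection has to be enabled before python3 is usable, for example (a sketch):
# run a single command under the collection
scl enable rh-python35 -- python3 --version
# or, in a Dockerfile, put the collection's binaries on PATH for later layers
ENV PATH=/opt/rh/rh-python35/root/usr/bin:$PATH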
The problem is not difficult; I built the image after changing
FROM base-centos7:0.0.8 ====> FROM centos:7
You can consult the available CentOS image versions at https://hub.docker.com/_/centos
PS: The container showed "error exited(1)"; you should focus on the main process.

docker ERROR: Could not find a version that satisfies the requirement apturl==0.5.2

I am using Windows 10. I want to build a Linux-based container so I can replicate code and dependencies developed on Ubuntu. When I try to build it, it outputs the error message above.
From my understanding, Docker Desktop runs a Linux kernel under the hood, which is what allows Windows users to run Linux-based containers, so I am not sure why it is outputting this error.
My Dockerfile looks like this:
FROM ubuntu:18.04
ENV PATH="/root/miniconda3/bin:${PATH}"
ARG PATH="/root/miniconda3/bin:${PATH}"
RUN apt update \
&& apt install -y htop python3-dev wget
RUN wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
&& mkdir root/.conda \
&& sh Miniconda3-latest-Linux-x86_64.sh -b \
&& rm -f Miniconda3-latest-Linux-x86_64.sh
RUN conda create -y -n ml python=3.7
COPY . src/
RUN /bin/bash -c "cd src \
&& source activate ml \
&& pip install -r requirements.txt"
requirements.txt contains:
apturl==0.5.2
asn1crypto==0.24.0
bleach==2.1.2
Brlapi==0.6.6
certifi==2020.11.8
chardet==3.0.4
click==7.1.2
command-not-found==0.3
configparser==5.0.1
cryptography==2.1.4
cupshelpers==1.0
dataclasses==0.7
When I run the docker build command it outputs:
1.649 ERROR: Could not find a version that satisfies the requirement apturl==0.5.2
1.649 ERROR: No matching distribution found for apturl==0.5.2
Deleting that entry and running it again leads to another error. All the errors seem to be associated with Ubuntu packages.
Am I not running an Ubuntu container? Why am I not allowed to install Ubuntu packages?
Thanks!
You are trying to install Ubuntu system packages with pip (which is for Python packages).
Try apt install -y apturl instead.
If you want to install Python packages, write pip install package_name.
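Concretely, one way to deal with a requirements.txt produced by pip freeze on a desktop Ubuntu install (a sketch; the grep list just names the obvious system packages from the file above):
# keep only the PyPI-installable entries
grep -vE '^(apturl|command-not-found|cupshelpers|Brlapi)==' requirements.txt > requirements-pip.txt
pip install -r requirements-pip.txt
# the dropped entries are Ubuntu system packages; if actually needed, install them with apt, e.g.
apt install -y apturl command-not-found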

Docker build for Mattermost, /bin/sh dnf not found

I've got a CentOS 8 install, and I'm trying to use a docker container to run Mattermost to set up a local node for my family to use. I've been searching a lot online, but my google-fu appears to be weak as I can't get answers that address my issue.
I've installed docker and docker-compose using the following guide, again tailoring it to CentOS: https://docs.mattermost.com/install/prod-docker.htm. I've successfully run the "Hello World" container.
I'm using this guide and trying to tailor the Mattermost container install: https://wiki.archlinux.org/index.php/Ma ... ith_Docker
I've edited the ~/mattermost-docker/db/Dockerfile to remove the references to apk and put in yum and then dnf, and I've tried executing with sudo in the script as well as running the script from a root (su) account. Latest Dockerfile:
FROM postgres:9.4-alpine
ENV DEFAULT_TIMEZONE UTC
# Install some packages to use WAL
RUN echo "azure<5.0.0" > pip-constraints.txt
RUN dnf install -y \
build-base \
curl \
libc6-compat \
libffi-dev \
linux-headers \
python-dev \
py-pip \
py-cryptography \
pv \
libressl-dev \
&& pip install --upgrade pip \
&& pip --no-cache-dir install -c pip-constraints.txt 'wal-e<1.0.0' envdir \
&& rm -rf /tmp/* /var/tmp/* \
&& dnf clean all
# Add wale script
COPY setup-wale.sh /docker-entrypoint-initdb.d/
#Healthcheck to make sure container is ready
HEALTHCHECK CMD pg_isready -U $POSTGRES_USER -d $POSTGRES_DB || exit 1
# Add and configure entrypoint and command
COPY entrypoint.sh /
ENTRYPOINT ["/entrypoint.sh"]
CMD ["postgres"]
VOLUME ["/var/run/postgresql", "/usr/share/postgresql/", "/var/lib/postgresql/data", "/tmp", "/etc/wal-e.d/env"]
However, it still fails on docker-compose build.
Error:
Building db
Step 1/10 : FROM postgres:9.4-alpine
---> 4e66908aa630
Step 2/10 : ENV DEFAULT_TIMEZONE UTC
---> Using cache
---> 03d176f9f783
Step 3/10 : RUN echo "azure<5.0.0" > pip-constraints.txt
---> Using cache
---> 35dbc995f705
Step 4/10 : RUN sudo dnf install -y build-base curl libc6-compat libffi-dev linux-headers python-dev py-pip py-cryptography pv libressl-dev && pip install --upgrade pip && pip --no-cache-dir install -c pip-constraints.txt 'wal-e<1.0.0' envdir && rm -rf /tmp/* /var/tmp/* && dnf clean all
---> Running in 4b89205fdca3
/bin/sh: dnf: not found
ERROR: Service 'db' failed to build : The command '/bin/sh -c sudo dnf install -y build-base curl libc6-compat libffi-dev linux-headers python-dev py-pip py-cryptography pv libressl-dev && pip install --upgrade pip && pip --no-cache-dir install -c pip-constraints.txt 'wal-e<1.0.0' envdir && rm -rf /tmp/* /var/tmp/* && dnf clean all' returned a non-zero code: 127
I've confirmed dnf and yum are present in /bin and /usr/bin, and confirmed /bin/sh -> /bin/bash. I'm not even sure what question I should be asking, so I'd appreciate some assistance in figuring out how I can get this container stood up.
Thanks.
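For what it's worth, postgres:9.4-alpine is an Alpine Linux image, so the only package manager inside the container is apk; dnf and yum exist on the CentOS host but not in this image, which is exactly what /bin/sh: dnf: not found is saying. A rough apk equivalent of the RUN above (package names carried over from the question; a sketch, not the exact upstream mattermost-docker file):
RUN apk add --no-cache \
        build-base \
        curl \
        libc6-compat \
        libffi-dev \
        linux-headers \
        python-dev \
        py-pip \
        py-cryptography \
        pv \
        libressl-dev \
    && pip install --upgrade pip \
    && pip --no-cache-dir install -c pip-constraints.txt 'wal-e<1.0.0' envdir \
    && rm -rf /tmp/* /var/tmp/*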

Dockerized Tensorflow script can't see GPUs

I use docker containers to train deep learning models. The containers run on a Linux server with several GPUs, where the models are trained.
The problem is that Tensorflow does not recognize the GPUs inside the container. The Dockerfile looks like this:
FROM nvidia/cuda:10.2-runtime-ubuntu18.04
RUN apt-get update && apt-get install -y apt-utils
RUN apt-get install -y \
git \
pkg-config \
python3-pip \
python3.6 \
nano \
wget \
yasm
FROM python:3.6
COPY requirements.txt ./
# Here tensorflow-gpu == 2.1 is installed
RUN pip install --upgrade pip && \
pip install --no-cache-dir -r requirements.txt
COPY . /
ENTRYPOINT ["python", "./main.py"]
If you comment out the Python-specific lines in the Dockerfile and replace them with CMD ["nvidia-smi"], you can see that the GPUs are visible inside the container (sketched below). So the only remaining question is how to get Tensorflow to detect the GPUs as well.
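For illustration, the nvidia-smi variant just described would look roughly like this (second FROM and Python lines removed):
FROM nvidia/cuda:10.2-runtime-ubuntu18.04
CMD ["nvidia-smi"]
Running it with docker run --gpus all ... then lists the GPUs.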
In Python code the GPUs are included as follows:
physical_devices = tf.config.experimental.list_physical_devices('GPU')
for physical_device in physical_devices:
tf.config.experimental.set_memory_growth(physical_device, True)
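A quick way to see what Tensorflow actually detects at runtime (a minimal check):
import tensorflow as tf

# prints an empty list when no GPU (i.e. no usable CUDA runtime libraries) is visible
print("GPUs visible to Tensorflow:", tf.config.experimental.list_physical_devices('GPU'))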
