OpenCV in AWS Lambda container image - python-3.x

I am trying to build a Docker image that will be deployed as a function on AWS Lambda. I am able to build and test the image successfully, but I run into an issue when I try to import OpenCV in the function.
I do not face this issue when I remove the import statement from app.py.
The error I am facing -
{"errorMessage": "Unable to import module 'app': libGL.so.1: cannot open shared object file: No such file or directory", "errorType": "Runtime.ImportModuleError"}
My Dockerfile -
# Define custom function directory
ARG FUNCTION_DIR="/function"
FROM python:3.9 as build-image
# Include global arg in this stage of the build
ARG FUNCTION_DIR
# Install aws-lambda-cpp build dependencies
RUN apt-get update && \
apt-get install -y \
g++ \
make \
cmake \
unzip \
libcurl4-openssl-dev
RUN apt-get install -y --fix-missing \
build-essential \
cmake \
gfortran \
git \
wget \
curl \
graphicsmagick \
libgraphicsmagick1-dev \
libatlas-base-dev \
libavcodec-dev \
libavformat-dev \
libgtk2.0-dev \
libjpeg-dev \
liblapack-dev \
libswscale-dev \
pkg-config \
python3-dev \
python3-numpy \
software-properties-common \
zip \
&& apt-get clean && rm -rf /tmp/* /var/tmp/*
# Copy function code
RUN mkdir -p ${FUNCTION_DIR}
COPY app/* ${FUNCTION_DIR}
WORKDIR ${FUNCTION_DIR}
RUN pip install -r requirements.txt --target ${FUNCTION_DIR}
# Install the function's dependencies
RUN pip install \
--target ${FUNCTION_DIR} \
awslambdaric
FROM python:3.9
# Include global arg in this stage of the build
ARG FUNCTION_DIR
# Set working directory to function root directory
WORKDIR ${FUNCTION_DIR}
# Copy in the built dependencies
COPY --from=build-image ${FUNCTION_DIR} ${FUNCTION_DIR}
ENTRYPOINT [ "/usr/local/bin/python", "-m", "awslambdaric" ]
CMD [ "app.handler" ]
My requirements.txt -
mediapipe<=0.8.3.1
numpy<=1.19.4
opencv-python<=4.4.0.46
boto3<=1.17.64
My app.py
import cv2
def handler(event, context):
return cv2.__version__

I ran into the same error when trying to use a mediapipe container. pip installing opencv-python-headless solved the issue for me; I did not need to install any additional system dependencies.
FROM public.ecr.aws/lambda/python:3.8
# Copy function code
COPY app.py ${LAMBDA_TASK_ROOT}
# Install the function's dependencies using file requirements.txt
# from your project folder.
COPY requirements.txt .
RUN pip3 install mediapipe opencv-python-headless --target "${LAMBDA_TASK_ROOT}"
# Set the CMD to your handler (could also be done as a parameter override outside of the Dockerfile)
CMD [ "app.handler" ]

Managed to solve it by adding these packages to the Dockerfile -
RUN apt-get install ffmpeg libsm6 libxext6 -y
The new Dockerfile looks like this -
# Define custom function directory
ARG FUNCTION_DIR="/function"
FROM python:3.9 as build-image
# Include global arg in this stage of the build
ARG FUNCTION_DIR
# Install aws-lambda-cpp build dependencies
RUN apt-get update && \
apt-get install -y \
g++ \
make \
cmake \
unzip \
libcurl4-openssl-dev
RUN apt-get install -y --fix-missing \
build-essential \
cmake \
gfortran \
git \
wget \
curl \
ffmpeg \
libsm6 \
libxext6 \
graphicsmagick \
libgraphicsmagick1-dev \
libatlas-base-dev \
libavcodec-dev \
libavformat-dev \
libgtk2.0-dev \
libjpeg-dev \
liblapack-dev \
libswscale-dev \
pkg-config \
python3-dev \
python3-numpy \
software-properties-common \
zip \
&& apt-get clean && rm -rf /tmp/* /var/tmp/*
# Copy function code
RUN mkdir -p ${FUNCTION_DIR}
COPY app/* ${FUNCTION_DIR}
WORKDIR ${FUNCTION_DIR}
RUN pip install -r requirements.txt --target ${FUNCTION_DIR}
# Install the function's dependencies
RUN pip install \
--target ${FUNCTION_DIR} \
awslambdaric
FROM python:3.9
# Include global arg in this stage of the build
ARG FUNCTION_DIR
# Set working directory to function root directory
WORKDIR ${FUNCTION_DIR}
# Copy in the built dependencies
COPY --from=build-image ${FUNCTION_DIR} ${FUNCTION_DIR}
ENTRYPOINT [ "/usr/local/bin/python", "-m", "awslambdaric" ]
CMD [ "app.handler" ]

Related

Unable to import module 'app': No module named 'distutils.util' - Dockerfile GDAL and geopandas python 3

We are going to have an AWS Lambda function (Python 3) created from a Docker container image (Elastic Container Registry).
When we test the Lambda function, we get this error:
Unable to import module 'app': No module named 'distutils.util'
What is wrong with the Dockerfile?
ARG FUNCTION_DIR="/function"
FROM osgeo/gdal:ubuntu-small-latest as build-image
RUN apt-get update && \
apt-get install -y \
g++ \
make \
cmake \
unzip \
libcurl4-openssl-dev
ARG FUNCTION_DIR
RUN mkdir -p ${FUNCTION_DIR}
COPY app/* ${FUNCTION_DIR}
RUN apt-get update && apt-get install -y software-properties-common gcc && \
add-apt-repository -y ppa:deadsnakes/ppa
RUN apt-get update && apt-get install -y python3.6 python3-distutils python3-pip python3-apt
RUN python3 -m pip install --target ${FUNCTION_DIR} awslambdaric
RUN python3 -m pip install --target ${FUNCTION_DIR} geopandas
FROM osgeo/gdal:ubuntu-small-latest
ARG FUNCTION_DIR
WORKDIR ${FUNCTION_DIR}
COPY --from=build-image ${FUNCTION_DIR} ${FUNCTION_DIR}
ENTRYPOINT [ "python3", "-m", "awslambdaric" ]
CMD [ "app.handler" ]
There seems to be (or at least there was) a bug in setuptools.
I'm not 100% sure what caused it or whether it has been fixed, but I resolved this problem by using the workaround proposed in the GitHub issue, which is to set the following environment variable:
SETUPTOOLS_USE_DISTUTILS=stdlib
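If you want to bake the workaround into the image rather than configure it on the function, one option is an ENV instruction in the final stage of the Dockerfile above (it could equally be set as a Lambda environment variable); a sketch of that final stage with the variable added:
FROM osgeo/gdal:ubuntu-small-latest
ARG FUNCTION_DIR
# Workaround for the missing distutils.util module (see the setuptools issue above)
ENV SETUPTOOLS_USE_DISTUTILS=stdlib
WORKDIR ${FUNCTION_DIR}
COPY --from=build-image ${FUNCTION_DIR} ${FUNCTION_DIR}
ENTRYPOINT [ "python3", "-m", "awslambdaric" ]
CMD [ "app.handler" ]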

install PyTorch CPU-only in Dockerfile

I am fairly new to Docker and containerisation. I want to decrease the size of the my_proj Docker container in production.
I prefer installing packages and managing dependencies via Poetry.
How can I specify using CPU-only PyTorch in a Dockerfile?
To do this via a bash terminal, it would be:
poetry add pytorch-cpu torchvision-cpu -c pytorch
(or conda install...)
My existing Dockerfile:
FROM python:3.7-slim as base
RUN apt-get update -y \
&& apt-get -y --no-install-recommends install curl wget\
&& rm -rf /var/lib/apt/lists/*
ENV ROOT /home/worker/python/my_proj
WORKDIR $ROOT
ARG ATLASSIAN_TOKEN
ARG POETRY_HTTP_BASIC_AZURE_PASSWORD
ARG ACCESS_KEY
ENV AWS_ACCESS_KEY_ID=$ACCESS_KEY
ARG SECRET_KEY
ENV AWS_SECRET_ACCESS_KEY=$SECRET_KEY
ARG REPO
ENV REPO_URL=$REPO
ENV PYPIRC_PATH=$ROOT/.pypirc
ENV \
PYTHONFAULTHANDLER=1 \
POETRY_VERSION=1.1.4 \
POETRY_HOME=/etc/poetry \
XDG_CACHE_HOME=/home/worker/.cache \
POETRY_VIRTUALENVS_IN_PROJECT=true \
MPLCONFIGDIR=/home/worker/matplotlib \
PATH=/home/worker/python/my_proj/.venv/bin:/usr/local/bin:/etc/poetry/bin:$PATH
ADD https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py ./
RUN python get-poetry.py && chmod +x /etc/poetry/bin/poetry
RUN --mount=type=cache,target=/root/.cache pip install twine keyring artifacts-keyring
RUN --mount=type=cache,target=/root/.cache apt update && apt install gcc -y
FROM base as ws
ARG WS_APIKEY
ARG WS_PROJECTVERSION=
ARG WS_PROJECTNAME=workers-python-my_proj
ARG WS_PRODUCTNAME=HALO
COPY --chown=worker:worker . .
RUN --mount=type=cache,uid=1000,target=/home/worker/.cache poetry install --no-dev
COPY --from=openjdk:15-slim-buster /usr/local/openjdk-15 /usr/local/openjdk-15
ENV JAVA_HOME /usr/local/openjdk-15
ENV PATH $JAVA_HOME/bin:$PATH
RUN --mount=type=cache,uid=1000,target=/home/worker/.cache ./wss_agent.sh
FROM base as test
COPY . .
RUN poetry config experimental.new-installer false
RUN poetry install
RUN cd my_proj && poetry run invoke deployconfluence_server_pass=$ATLASSIAN_TOKEN
FROM base as package
COPY . .
RUN poetry build
RUN python -m pip install --upgrade pip && \
pip install twine keyring artifacts-keyring && \
twine upload -r $REPO_URL --config-file $PYPIRC_PATH dist/* --skip-existing
FROM base as build
COPY . .
RUN poetry config experimental.new-installer false
RUN poetry install --no-dev
RUN pip3 --no-cache-dir install --upgrade awscli
RUN aws s3 cp s3://....tar.gz $ROOT/my_proj # censored url
RUN mkdir $ROOT/my_proj/bert-base-cased && cd $ROOT/my_proj/bert-base-cased && \
wget https://huggingface.co/bert-base-cased/resolve/main/config.json && \
wget https://huggingface.co/bert-base-cased/resolve/main/tokenizer.json && \
wget https://huggingface.co/bert-base-cased/resolve/main/tokenizer_config.json
FROM python:3.7-slim as production
ENV ROOT=/home/worker/python/my_proj \
VIRTUAL_ENV=/home/worker/python/my_proj/.venv\
PATH=/home/worker/python/my_proj/.venv/bin:/home/worker/python/my_proj:$PATH
COPY --from=build /home/worker/python/my_proj/pyproject.toml /home/worker/python/
COPY --from=build /home/worker/python/my_proj/.venv /home/worker/python/my_proj/.venv
COPY --from=build /home/worker/python/my_proj/my_proj /home/worker/python/my_proj
WORKDIR $ROOT
ENV PYTHONPATH=$ROOT:/home/worker/python/
ENTRYPOINT [ "primary_worker", "--mongo" ]
Installing it via pip should work:
RUN pip3 install torch==1.9.0+cpu torchvision==0.10.0+cpu torchaudio==0.9.0 -f https://download.pytorch.org/whl/torch_stable.html
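To check that the CPU-only wheel actually ends up in the final image, a quick sanity check is to import torch inside the container (a sketch; my_proj:latest is a placeholder tag, and --entrypoint overrides the image's primary_worker entrypoint):
docker run --rm --entrypoint python my_proj:latest \
  -c "import torch; print(torch.__version__, torch.cuda.is_available())"
# Expected output along the lines of: 1.9.0+cpu False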

Optimize Dockerfile

Our frontends have a Dockerfile like this:
FROM node:13.12.0
RUN apt-get update && apt-get install -y --no-install-recommends \
apt-utils \
git \
xvfb \
libgtk-3-0 \
libxtst6 \
libgconf-2-4 \
libgtk2.0-0 \
libnotify-dev \
libnss3 \
libxss1 \
libasound2 \
tzdata && \
rm -rf /var/lib/apt/lists/* && \
ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && \
echo $TZ > /etc/timezone
COPY ./docker-entrypoint.sh /docker-entrypoint.sh
ENTRYPOINT ["/docker-entrypoint.sh"]
WORKDIR /code
COPY ./ /code
RUN npm set registry <registry-url> && \
npm cache clean --force && npm install && npm run bootstrap
As far as I can see it is not optimized, because the code is copied before the dependencies are installed, right? A better way would be to copy package.json first, install the dependencies, and only then copy the code? Something like this:
FROM node:13.12.0
RUN apt-get update && apt-get install -y --no-install-recommends \
apt-utils \
git \
xvfb \
libgtk-3-0 \
libxtst6 \
libgconf-2-4 \
libgtk2.0-0 \
libnotify-dev \
libnss3 \
libxss1 \
libasound2 \
tzdata && \
rm -rf /var/lib/apt/lists/* && \
ln -snf /usr/share/zoneinfo/$TZ /etc/localtime && \
echo $TZ > /etc/timezone
COPY ./docker-entrypoint.sh /docker-entrypoint.sh
WORKDIR /code
COPY package*.json ./
RUN npm set registry <registry-url> && \
npm cache clean --force && npm install && npm run bootstrap
COPY ./ /code
ENTRYPOINT ["/docker-entrypoint.sh"]
I think one of the most important things in Dockerfile optimization is to put the elements that are likely to change in future versions of your container as late as possible: that way, a change to the code will not force the earlier layers to be rebuilt.
I think that's the reason the Dockerfile looks the way it does in your first example.
There are other considerations regarding Dockerfile optimization that you can read about, for example, here:
https://linuxhint.com/optimizing-docker-images/
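A quick way to see the effect of the ordering is to rebuild after touching a source file: with the dependencies installed before COPY ./ /code, only the final layers are rebuilt. A sketch, assuming the second Dockerfile from the question:
# First build populates the layer cache
docker build -t frontend .
# Change a source file (but not package.json) and rebuild:
# the apt-get and npm install layers show "Using cache",
# only the COPY ./ /code layer and anything after it are rebuilt
docker build -t frontend .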
The hadolint/hadolint Dockerfile linter (the Haskell Dockerfile Linter) is a good starting point. Linting your Dockerfile with it, i.e. docker run --rm -i hadolint/hadolint < Dockerfile, reports:
/dev/stdin:5 SC2086 info: Double quote to prevent globbing and word splitting.
/dev/stdin:5 DL3008 warning: Pin versions in apt get install. Instead of `apt-get install <package>` use `apt-get install <package>=<version>`
... after fixing the issues, sorting the packages alphanumerically following the best practices, and making a couple of minor modifications, your Dockerfile might look like this:
FROM node:13.12.0
ARG NPM_REGISTRY
RUN apt-get update && \
apt-get install -y --no-install-recommends \
apt-utils=1.4.11 \
git=1:2.11.0-3+deb9u7 \
libasound2=1.1.3-5 \
libgconf-2-4=3.2.6-4+b1 \
libgtk2.0-0=2.24.31-2 \
libgtk-3-0=3.22.11-1 \
libnotify-dev=0.7.7-2 \
libnss3=2:3.26.2-1.1+deb9u2 \
libxss1=1:1.2.2-1 \
libxtst6=2:1.2.3-1 \
tzdata=2021a-0+deb9u1 \
xvfb=2:1.19.2-1+deb9u7 && \
rm -rf /var/lib/apt/lists/* && \
ln -snf "/usr/share/zoneinfo/$TZ" /etc/localtime && \
echo "$TZ" > /etc/timezone
COPY docker-entrypoint.sh /docker-entrypoint.sh
WORKDIR /code
COPY package*.json ./
RUN npm set registry "${NPM_REGISTRY}" && \
npm cache clean --force && \
npm install && \
npm run bootstrap
COPY . .
ENTRYPOINT ["/docker-entrypoint.sh"]
Note: the minor changes are a matter of preference, e.g. COPY . . copies from the build context into the /code directory, which is already set by the WORKDIR instruction.
Build the image, passing NPM_REGISTRY as a build arg, i.e.: docker build --rm --build-arg NPM_REGISTRY=https://yarn.npmjs.org -t so:66493910 .

Dockerfiles build performance issues

The following Dockerfile takes more than 30 minutes to build.
FROM python:3.7-alpine
COPY . /app
WORKDIR /app
RUN apk add --no-cache python3-dev libstdc++ && \
apk add --no-cache g++ && \
ln -s /usr/include/locale.h /usr/include/xlocale.h && \
pip install --upgrade pip && \
pip3 install --upgrade pip && \
pip3 install allure-behave && \
pip3 install -r requirements.txt
entrypoint ["sh", "testsuite.sh"]
Requirements file:
behave==1.2.6
boto3==1.8.2
botocore==1.11.9
pandas==0.25.0
Is that normal?

Is sklearn compatible with Linux-alpine?

I get an error when I try to build an Alpine-based Docker image that includes the sklearn package.
I've tried a few variations of pip installation, different package combinations, and outdated versions of sklearn to see if they are compatible. I've also run the container in -it mode and tried to install the package manually from there. When I remove the sklearn line, the Dockerfile builds and the container runs just fine. Sklearn works in an Ubuntu:latest Dockerfile I've built, but I'm trying to reduce my footprint, so I was hoping to get it to work on alpine...
Here's my Dockerfile code:
FROM alpine:latest
RUN apk upgrade --no-cache \
&& apk update \
&& apk add --no-cache \
musl \
build-base \
python3 \
python3-dev \
postgresql-dev \
bash \
git \
&& pip3 install --no-cache-dir --upgrade pip \
&& pip3 install sklearn \
&& rm -rf /var/cache/* \
&& rm -rf /root/.cache/*
And here's the error I'm getting:
ERROR: Command "/usr/bin/python3.6 /usr/lib/python3.6/site-packages/pip/_vendor/pep517/_in_process.py prepare_metadata_for_build_wheel /tmp/tmpqjsz0004" failed with error code 1 in /tmp/pip-install-xlvbli9u/scipy
Alpine Linux doesn't support PEP 513 (manylinux binary wheels), so scipy has to be built from source. I found that something like this works:
RUN apk add --no-cache gcc g++ gfortran lapack-dev libffi-dev libressl-dev musl-dev && \
mkdir scipy && cd scipy && \
wget https://github.com/scipy/scipy/releases/download/v1.3.2/scipy-1.3.2.tar.gz && \
tar -xvf scipy-1.3.2.tar.gz && \
cd scipy-1.3.2 && \
python3 -m pip --no-cache-dir install .
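With scipy built, installing scikit-learn itself should then work in the same Dockerfile (sklearn on PyPI is only a shim around the scikit-learn package); a sketch, reusing the build tools installed by the apk line above:
RUN python3 -m pip install --no-cache-dir scikit-learn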
