I am training a Tensorflow 2 model on Google Cloud Platform using the Vertex AI service. I build a docker image, upload it to Container Registry, then run it on Vertex.
Vertex automatically submits the output to Cloud Logging. Unfortunately it frequently loses some of the output. It seems to most often lose the output if the total output is short, or at the end of the output stream. It also seems that different log levels are treated differently, so for example level INFO might be collected but level WARNING lost. When I run the image locally I get everything.
It seems I can make the logs come through by dumping a lot of extra output, but I don't really want to do that for every log level. I tried various ways to flush the logs, which are shown in my script.
Is anyone aware of this issue, or does anyone know why it is happening?
Some test scripts that recreate the issue are below for completeness.
Update: I tested the image on Google Cloud Run and the logs are collected correctly there (I still have to use Vertex though).
script.py
import logging
import time
# basicConfig() installs a StreamHandler that writes to stderr;
# raise the root logger to INFO so logging.info() is emitted
logging.basicConfig()
logging.getLogger().setLevel(logging.INFO)
print('Test message')
logging.info("Info message")
logging.warning("Warning message")
# attempts to flush all output before the container exits
print('Flushing print buffer.', flush=True)
logging.shutdown()
time.sleep(3.)
Dockerfile
FROM tensorflow/tensorflow:latest-gpu
WORKDIR /root
RUN apt-get update && apt-get install -y wget
RUN wget -nv \
https://dl.google.com/dl/cloudsdk/release/google-cloud-sdk.tar.gz && \
mkdir /root/tools && \
tar xvzf google-cloud-sdk.tar.gz -C /root/tools && \
rm google-cloud-sdk.tar.gz && \
/root/tools/google-cloud-sdk/install.sh --usage-reporting=false \
--path-update=false --bash-completion=false \
--disable-installation-options && \
rm -rf /root/.config/* && \
ln -s /root/.config /config && \
# Remove the backup directory that gcloud creates
rm -rf /root/tools/google-cloud-sdk/.install/.backup
ENV PATH $PATH:/root/tools/google-cloud-sdk/bin
RUN echo '[GoogleCompute]\nservice_account = default' > /etc/boto.cfg
COPY script.py /root/script.py
ENTRYPOINT ["python3", "script.py"]
Related
I have written a Python script that runs nerdctl shell commands using subprocess:
res = subprocess.run(
f"nerdctl --host '/host/run/containerd/containerd.sock' --namespace k8s.io commit {container} {image}",
shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE, executable='/bin/bash')
I'm running this Python script inside an Ubuntu container. When I sh into the container and run the script by passing arguments
python3 run_script.py b3425e7a0d1e image1
it executes properly, but when I run it using debug mode
kubectl debug node/pool-93oi9uqaq-mfs8b -it --image=registry.digitalocean.com/test-registry-1/nerdctl@sha256:56b2e5690e21a67046787e13bb690b3898a4007978187800dfedd5c56d45c7b2 -- python3 run_script.py b3425e7a0d1e image1
I'm getting the error
b'/bin/bash: line 1: nerdctl: command not found\n'
Can someone help/suggest where it is going wrong?
run_script.py
import subprocess
import sys
container = sys.argv[1]
image = sys.argv[2]
res = subprocess.run(
f"nerdctl --host '/host/run/containerd/containerd.sock' --namespace k8s.io commit {container} {image}",
shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE, executable='/bin/bash')
print(res)
Dockerfile
FROM ubuntu:latest
RUN rm /bin/sh && ln -s /bin/bash /bin/sh
LABEL version="0.1.0"
RUN apt-get -y update
RUN apt-get install wget curl -y
RUN wget -q "https://github.com/containerd/nerdctl/releases/download/v1.0.0/nerdctl-full-1.0.0-linux-amd64.tar.gz" -O /tmp/nerdctl.tar.gz
RUN mkdir -p ~/.local/bin
RUN tar -C ~/.local/bin/ -xzf /tmp/nerdctl.tar.gz --strip-components 1 bin/nerdctl
RUN echo -e '\nexport PATH="${PATH}:~/.local/bin"' >> ~/.bashrc
RUN source ~/.bashrc
Mechanically: the binary you're installing isn't in $PATH anywhere. You unpack it into probably /root/.local/bin in the container filesystem but never add that to $PATH. The final RUN source line has no effect since each RUN command runs in a new shell (and technically a new container) and so the changes it makes are lost immediately. The preceding line tries to change a shell dotfile, but most paths to running things in Docker don't read shell dotfiles at all.
The easiest solution here is to unpack the binary into a directory that's already in $PATH, like /usr/local/bin.
FROM ubuntu:latest
LABEL version="0.1.0"
RUN apt-get update \
&& DEBIAN_FRONTEND=noninteractive \
apt-get install --no-install-recommends --assume-yes \
wget
RUN wget -q "https://github.com/containerd/nerdctl/releases/download/v1.0.0/nerdctl-full-1.0.0-linux-amd64.tar.gz" -O /tmp/nerdctl.tar.gz \
&& tar -C /usr/local -xzf /tmp/nerdctl.tar.gz bin/nerdctl \
&& rm /tmp/nerdctl.tar.gz
WORKDIR /app
...
CMD ["./run_script.py"]
You'll have a second bigger problem running this, though. A container doesn't normally have access to the host's container runtime to be able to manipulate containers. In standard Docker you can trivially root the host system if you can launch a container; it's possible to mount the Docker socket into a container, but doing so requires thinking hard about the security implications.
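In plain Docker, that socket mount looks like the following (shown only to illustrate the mechanism, using the official docker CLI image; as noted, whoever can reach the socket effectively has root on the host):
docker run --rm \
  -v /var/run/docker.sock:/var/run/docker.sock \
  docker:cli docker ps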
Your question has several hints at Kubernetes and I'd expect a responsible cluster administrator to make it hard-to-impossible to bypass the cluster container runtime and potentially compromise nodes this way. If you're using Kubernetes you probably can't access the host container runtime at all, whether it's Docker proper or something else.
Philosophically, it looks like you're trying to script a commit command. Using commit at all is almost never a best practice. Again, there are several practical problems with it around Kubernetes (which replica would you be committing? how would you save the resulting image? how would you reuse it?) but having an image you can't recreate from source can lead to later problems around for example taking security updates.
I am using smbnetfs within a Docker container (running on Ubuntu 22.04) to write files from my application to a mounted Windows Server share. Reading files from the share works properly, but writing files via smbnetfs gives me a headache. My Haskell application crashes with an Input/output error while writing files to the mounted share; only 0KB files without any content are written. Apart from the application, I have the same problem if I try to write files from the container's bash terminal or from Ubuntu 22.04 directly. So I assume that the problem is not related to Haskell and/or Docker. Therefore, let's focus in this question on creating files via bash within a Docker container.
Within the container I've tried the following ways to write files, some successful and some not:
This works:
Either touch <mount-dir>/file.txt => a 0KB file is generated. Editing the file with nano works properly.
Or echo "demo content" > <mount-dir>/file.txt also works.
(Hint: note the redirection operator, a single >)
Creating directories with mkdir -p <mount-dir>/path/to/file/ is also working without any problems.
These steps do not work:
touch <mount-dir>/file.txt => 0KB file is generated properly.
echo "demo-content" >> <mount-dir>/file.txt => Input/output error
(Hint: note the redirection operator, >> to append)
Configuration
Here is my configuration:
smbnetfs
smbnetfs.conf
...
show_$_shares "true"
...
include "smbnetfs.auth"
...
include "smbnetfs.host"
smbnetfs.auth
auth "<windows-server-fqdn>/<share>" "<domain>/<user>" "<password>"
smbnetfs.host
host <windows-server-fqdn> visible=true
Docker
Here is the Docker configuration.
Docker run arguments:
...
--device=/dev/fuse \
--cap-add SYS_ADMIN \
--security-opt apparmor:unconfined \
...
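Put together, the container is started roughly like this (the image name my-smbnetfs-image is a placeholder; the elided flags are unchanged):
docker run \
  --device=/dev/fuse \
  --cap-add SYS_ADMIN \
  --security-opt apparmor:unconfined \
  my-smbnetfs-image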
Dockerfile:
FROM debian:bullseye-20220711-slim@sha256:f52f9aebdd310d504e0995601346735bb14da077c5d014e9f14017dadc915fe5
ARG DEBIAN_FRONTEND=noninteractive
# Prerequisites
RUN apt-get update && \
apt-get install -y --no-install-recommends \
fuse=2.9.9-5 \
locales=2.31-13+deb11u3 \
locales-all=2.31-13+deb11u3 \
libcurl4=7.74.0-1.3+deb11u1 \
libnuma1=2.0.12-1+b1 \
smbnetfs=0.6.3-1 \
tzdata=2021a-1+deb11u4 \
jq=1.6-2.1 && \
rm -rf /var/lib/apt/lists/*
# Set the locale
ENV LANG en_US.UTF-8
ENV LANGUAGE en_US:en
ENV LC_ALL en_US.UTF-8
# Copy runtime artifacts
WORKDIR /app
COPY --from=build /home/vscode/.local/bin/Genesis-exe .
COPY entrypoint.sh .
## Prepare smbnetfs configuration files and create runtime user
ARG MOUNT_DIR=/home/moduleuser/mnt
ARG SMB_CONFIG_DIR=/home/moduleuser/.smb
RUN useradd -ms /bin/bash moduleuser && mkdir ${SMB_CONFIG_DIR}
# Set file permissions so that smbnetfs.auth and smbnetfs.host can be created later
RUN chmod -R 700 ${SMB_CONFIG_DIR} && chown -R moduleuser ${SMB_CONFIG_DIR}
# Copy smbnetfs.conf and restrict file permissions
COPY smbnetfs.conf ${SMB_CONFIG_DIR}/smbnetfs.conf
RUN chmod 600 ${SMB_CONFIG_DIR}/smbnetfs.conf && chown moduleuser ${SMB_CONFIG_DIR}/smbnetfs.conf
# Switch to the module user and create the mount directory
USER moduleuser
RUN mkdir ${MOUNT_DIR}
ENTRYPOINT ["./entrypoint.sh"]
Hint: The problem is not related to Docker, because I have the same problem on Ubuntu 22.04 directly.
Updates:
Update 1:
If I start smbnetfs in debug mode and run the command echo "demo-content" >> <mount-dir>/file.txt the following log is written:
open flags: 0x8401 /<windows-server-fqdn>/share/sub-dir/file.txt
2022-07-25 07:36:32.393 srv(26)->smb_conn_srv_open: errno=6, No such device or address
2022-07-25 07:36:34.806 srv(27)->smb_conn_srv_open: errno=6, No such device or address
2022-07-25 07:36:37.229 srv(28)->smb_conn_srv_open: errno=6, No such device or address
unique: 12, error: -5 (Input/output error), outsize: 16
Update 2:
If I use a Linux based smb-server, then I can write the files properly with the command echo "demo-content" >> <mount-dir>/file.txt
SMB-Server's Dockerfile
FROM alpine:3.7@sha256:92251458088c638061cda8fd8b403b76d661a4dc6b7ee71b6affcf1872557b2b
RUN apk add --no-cache --update \
samba-common-tools=4.7.6-r3 \
samba-client=4.7.6-r3 \
samba-server=4.7.6-r3
RUN mkdir -p /Shared && \
chmod 777 /Shared
COPY ./conf/smb.conf /etc/samba/smb.conf
EXPOSE 445/tcp
CMD ["smbd", "--foreground", "--log-stdout", "--no-process-group"]
SMB-Server's smb.conf
[global]
map to guest = Bad User
log file = /var/log/samba/%m
log level = 2
[guest]
public = yes
path = /Shared/
read only = no
guest ok = yes
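For reference, I build and start this test server like so (the tag test-smb-server is just a placeholder):
docker build -t test-smb-server .
docker run --rm -p 445:445 test-smb-server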
Update 3:
It also works:
if I create the file locally in the container and then move it to the <mount-dir>;
if I remove a file that I created earlier (rm <mount-dir>/file.txt);
if I rename a file that I created earlier (mv <mount-dir>/file.txt <mount-dir>/fileMv.txt).
Update 4:
Found identical problem description here.
The issue with my current setup is that in my entrypoint.sh file, I have to change the ownership of my entire project directory to the non-administrative user (chown -R node /node-servers). However, when a lot of npm packages are installed, this takes a lot of time. Is there a way to avoid having to chown the node_modules directory?
Background: The reason I create everything as root in the Dockerfile is that this way I can match the UID and GID of a developer's local user, which makes mounting volumes easier. The downside is that I have to step down from root in an entrypoint.sh file and ensure that the permissions of all the project files have been changed to the non-administrative user.
My Dockerfile:
FROM node:10.24-alpine
#image already has user node and group node which are 1000, that's what we will use
# grab gosu for easy step-down from root
# https://github.com/tianon/gosu/releases
ENV GOSU_VERSION 1.14
RUN set -eux; \
\
apk add --no-cache --virtual .gosu-deps \
ca-certificates \
dpkg \
gnupg \
; \
\
dpkgArch="$(dpkg --print-architecture | awk -F- '{ print $NF }')"; \
wget -O /usr/local/bin/gosu "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-$dpkgArch"; \
wget -O /usr/local/bin/gosu.asc "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-$dpkgArch.asc"; \
\
# verify the signature
export GNUPGHOME="$(mktemp -d)"; \
gpg --batch --keyserver hkps://keys.openpgp.org --recv-keys B42F6819007F00F88E364FD4036A9C25BF357DD4; \
gpg --batch --verify /usr/local/bin/gosu.asc /usr/local/bin/gosu; \
command -v gpgconf && gpgconf --kill all || :; \
rm -rf "$GNUPGHOME" /usr/local/bin/gosu.asc; \
\
# clean up fetch dependencies
apk del --no-network .gosu-deps; \
\
chmod +x /usr/local/bin/gosu; \
# verify that the binary works
gosu --version; \
gosu nobody true
COPY ./ /node-servers
# Setting the working directory
WORKDIR /node-servers
# Install app dependencies
# Install openssl
RUN apk add --update openssl ca-certificates && \
apk --no-cache add shadow && \
apk add libcap && \
npm install -g && \
chmod +x /node-servers/entrypoint.sh && \
setcap cap_net_bind_service=+ep /usr/local/bin/node
# Entrypoint used to load the environment and start the node server
#ENTRYPOINT ["/bin/sh"]
my entrypoint.sh
#!/bin/sh
# In Prod, this may be configured with a GID already matching the container
# allowing the container to be run directly as Jenkins. In Dev, or on unknown
# environments, run the container as root to automatically correct docker
# group in container to match the docker.sock GID mounted from the host
set -x
if [ -z ${HOST_UID+x} ]; then
    echo "HOST_UID not set, so we are not changing it"
else
    echo "HOST_UID is set, so we are changing the container UID to match"
    # get group of notadmin inside container
    usermod -u ${HOST_UID} node
    CUR_GID=`getent group node | cut -f3 -d: || true`
    echo ${CUR_GID}
    # if they don't match, adjust
    if [ ! -z "$HOST_GID" -a "$HOST_GID" != "$CUR_GID" ]; then
        groupmod -g ${HOST_GID} -o node
    fi
    if ! groups node | grep -q node; then
        usermod -aG node node
    fi
fi
# gosu drops from root to node user
set -- gosu node "$@"
[ -d "/node-servers" ] && chown -v -R node /node-servers
exec "$@"
You shouldn't need to run chown at all here. Leave the files owned by root (or by the host user). So long as they're world-readable the application will still be able to run; but if there's some sort of security issue or other bug, the application won't be able to accidentally overwrite its own source code.
You can then go on to simplify this even further. For most purposes, users in Unix are identified by their numeric user ID; there isn't actually a requirement that the user be listed in /etc/passwd. If you don't need to change the node user ID and you don't need to chown files, then the entrypoint script reduces to "switch user IDs and run the main script"; but then Docker can provide an alternate user ID for you via the docker run -u option. That means you don't need to install gosu either, which is a lot of the Dockerfile content.
All of this means you can reduce the Dockerfile to:
FROM node:10.24-alpine
# Install OS-level dependencies (before you COPY anything in)
RUN apk add openssl ca-certificates
# (Do not install gosu or its various dependencies)
# Set (and create) the working directory
WORKDIR /node-servers
# Copy language-level dependencies in
COPY package.json package-lock.json ./
RUN npm ci
# Copy the rest of the application in
# (make sure `node_modules` is in .dockerignore)
COPY . .
# (Do not call setcap here)
# Set the main command to run
USER node
CMD npm run start
Then when you run the container, you can use Docker options to specify the current user and additional capability.
# Run in the background, as an alternate user, mounting a data directory
# and publishing a port
docker run \
  -d \
  -u $(id -u) \
  -v "$PWD/data:/node-servers/data" \
  -p 8080:80 \
  my-image
Docker grants the NET_BIND_SERVICE capability by default so you don't need to specially set it.
This same permission setup will work if you're using bind mounts to overwrite the application code; again, without a chown call.
# Run the application code from the host, not the image; the anonymous
# volume keeps the image's node_modules, but its contents will never be updated
docker run ... \
  -u $(id -u) \
  -v "$PWD:/node-servers" \
  -v /node-servers/node_modules \
  ...
I thought I understood the docs, but maybe I didn't. I was under the impression that the -v /HOST/PATH:/CONTAINER/PATH flag is bi-directional: if we have files or directories in the container, they would be mirrored on the host, giving us a way to retain the directories and files even after removing a Docker container.
In the official MySQL Docker images, this works. The /var/lib/mysql directory can be bound to the host and survives restarts and replacement of the container while maintaining the data on the host.
I wrote a docker file for sphinxsearch-2.2.9 just as a practice and for the sake of learning and understanding, here it is:
FROM debian
ENV SPHINX_VERSION=2.2.9-release
RUN apt-get update -qq && DEBIAN_FRONTEND=noninteractive apt-get install -yqq \
    build-essential \
    wget \
    curl \
    mysql-client \
    libmysql++-dev \
    libmysqlclient15-dev \
    checkinstall
RUN wget http://sphinxsearch.com/files/sphinx-${SPHINX_VERSION}.tar.gz && tar xzvf sphinx-${SPHINX_VERSION}.tar.gz && rm sphinx-${SPHINX_VERSION}.tar.gz
RUN cd sphinx-${SPHINX_VERSION} && ./configure --prefix=/usr/local/sphinx
EXPOSE 9306 9312
RUN cd sphinx-${SPHINX_VERSION} && make
RUN cd sphinx-${SPHINX_VERSION} && make install
RUN rm -rf sphinx-${SPHINX_VERSION}
VOLUME /usr/local/sphinx/etc
VOLUME /usr/local/sphinx/var
Very simple and easy to get your head wrapped around while learning. I am declaring the /etc & /var directories from the sphinx build with the VOLUME instruction, thinking that it will allow me to do something like -v ~/dev/sphinx/etc:/usr/local/sphinx/etc -v ~/dev/sphinx/var:/usr/local/sphinx/var, but it doesn't; instead the mounts overwrite the directories inside the container and leave them blank. When I remove the -v flags and create the container, the directories have the expected files and are not overwritten.
This is what I run to build the image after navigating to the directory the Dockerfile is in: docker build -t sphinxsearch .
And once I have that created, I do the following to create a container based on that image: docker run -it --hostname some-sphinx --name some-sphinx --volume ~/dev/docker/some-sphinx/etc:/usr/local/sphinx/etc -d sphinxsearch
I really would appreciate any help and insight on how to get this to work. I looked at the MySQL images and don't see anything magical that they did to make the directory bindable; they used VOLUME.
Thank you in advance.
After countless hours of research, I decided to extend my image with the following Dockerfile:
FROM sphinxsearch
VOLUME /usr/local/sphinx/etc
VOLUME /usr/local/sphinx/var
RUN mkdir -p /sphinx && cd /sphinx && cp -avr /usr/local/sphinx/etc . && cp -avr /usr/local/sphinx/var .
ADD docker-entrypoint.sh /
RUN chmod +x /docker-entrypoint.sh
ENTRYPOINT ["/docker-entrypoint.sh"]
Extending it benefited me in that I didn't have to build the entire image from scratch while testing, only the parts that were relevant.
I created an ENTRYPOINT to execute a shell script that copies the files back to the required destination for sphinx to run properly; here is that code:
#!/bin/sh
set -e
target=/usr/local/sphinx/etc
# check if directory exists
if [ -d "$target" ]; then
    # check if we have files
    if find "$target" -mindepth 1 -print -quit | grep -q .; then
        # the directory already has files, don't do anything
        # we may use this if condition for something else later
        echo not empty, don\'t do anything...
    else
        # we don't have any files, let's copy the
        # files from etc and var to the right locations
        cp -avr /sphinx/etc/* /usr/local/sphinx/etc && cp -avr /sphinx/var/* /usr/local/sphinx/var
    fi
else
    # directory doesn't exist, we will have to do something here
    echo need to create the directory...
fi
exec "$@"
Having access to the /etc & /var directories on the host allows me to adjust the files while keeping them preserved on the host in between restarts and so forth... I also have the data saved on the host which should survive the restarts.
I know it's a debated topic on data containers vs. storing on the host, at this moment I am leaning towards storing on the host, but will try the other method later. If anyone has any tips, advice, etc... to improve what I have or a better way, please share.
Thank you @h3nrik for suggestions and for offering help!
Mounting container directories to the host goes against Docker's concepts; it would break the process/resource encapsulation principle.
The other way around - mounting a host folder into a container - is possible. But I would rather suggest using volume containers instead.
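Note that a named volume (the modern successor to the data-volume-container pattern) is pre-populated with the image's content the first time it is used, which is exactly the behaviour the question is after; the data then lives under Docker's own storage rather than in a host directory of your choosing. A rough sketch, reusing the sphinxsearch image from the question (the volume names are arbitrary):
docker volume create sphinx-etc
docker volume create sphinx-var
docker run -it -d --name some-sphinx \
  -v sphinx-etc:/usr/local/sphinx/etc \
  -v sphinx-var:/usr/local/sphinx/var \
  sphinxsearch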
Because the MySQL image initializes /var/lib/mysql only after the mapping is in place, there is no data at /var/lib/mysql in the image before the mapping happens.
So if the image already has data at that path before you start the container, the -v bind mount will override (hide) it.
See the MySQL image's entrypoint.sh.
I am using the docker-solr image with Docker, and I need to mount a directory inside it, which I achieve using the -v flag.
The problem is that the container needs to write to the directory that I have mounted into it, but doesn't appear to have the permissions to do so unless I do chmod 777 on the entire directory. I don't think setting the permissions to allow all users to read and write to it is the solution; it's just a temporary workaround.
Can anyone guide me in finding a more canonical solution?
Edit: I've been running docker without sudo because I added myself to the docker group. I just found that the problem is solved if I run docker with sudo, but I am curious if there are any other solutions.
More recently, after looking through some official Docker repositories, I've realized the more idiomatic way to solve these permission problems is using something called gosu in tandem with an entrypoint script. For example, take an existing Docker project such as solr, the same one I was having trouble with earlier.
The dockerfile on Github very effectively builds the entire project, but does nothing to account for the permission problems.
So to overcome this, first I added the gosu setup to the dockerfile (if you implement this, notice that version 1.4 is hardcoded; you can check for the latest releases here).
# grab gosu for easy step-down from root
RUN mkdir -p /home/solr \
&& gpg --keyserver pool.sks-keyservers.net --recv-keys B42F6819007F00F88E364FD4036A9C25BF357DD4 \
&& curl -o /usr/local/bin/gosu -SL "https://github.com/tianon/gosu/releases/download/1.4/gosu-$(dpkg --print-architecture)" \
&& curl -o /usr/local/bin/gosu.asc -SL "https://github.com/tianon/gosu/releases/download/1.4/gosu-$(dpkg --print-architecture).asc" \
&& gpg --verify /usr/local/bin/gosu.asc \
&& rm /usr/local/bin/gosu.asc \
&& chmod +x /usr/local/bin/gosu
Now we can use gosu, which is basically the exact same as su or sudo, but works much more nicely with docker. From the description for gosu:
This is a simple tool grown out of the simple fact that su and sudo have very strange and often annoying TTY and signal-forwarding behavior.
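In practice it's used as gosu <user> <command>; for example, inside a container that has a solr user (as the official image does), you can sanity-check it with:
gosu solr whoami    # prints: solr
gosu solr id -u     # prints solr's numeric UID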
Now the other changes I made to the dockerfile were adding these lines:
COPY solr_entrypoint.sh /sbin/entrypoint.sh
RUN chmod 755 /sbin/entrypoint.sh
ENTRYPOINT ["/sbin/entrypoint.sh"]
just to add my entrypoint file to the docker container.
and removing the line:
USER $SOLR_USER
so that by default you are the root user (which is why we have gosu, to step down from root).
Now as for my own entrypoint file, I don't think it's written perfectly, but it did the job.
#!/bin/bash
set -e
export PS1="\w:\u docker-solr-> "
# step down from root when just running the default start command
case "$1" in
start)
chown -R solr /opt/solr/server/solr
exec gosu solr /opt/solr/bin/solr -f
;;
*)
exec $#
;;
esac
A docker run command takes the form:
docker run <flags> <image-name> <passed in arguments>
Basically the entrypoint says if I want to run solr as per usual we pass the argument start to the end of the command like this:
docker run <flags> <image-name> start
and otherwise run the commands you pass as root.
The start option first gives the solr user ownership of the directories and then runs the default command. This solves the ownership problem because unlike the dockerfile setup, which is a one time thing, the entry point runs every single time.
So now if I mount directories using the -v flag, before the entrypoint actually runs solr, it will chown the files inside of the docker container for you.
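A typical invocation with a host directory mounted in would then look roughly like this (the image name and host path are placeholders):
docker run -d \
  -v "$PWD/solr-cores:/opt/solr/server/solr" \
  my-solr-image start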
As for what this does to your files outside the container I've had mixed results because docker acts a little weird on OSX. For me, it didn't change the files outside of the container, but on another OS where docker plays more nicely with the filesystem, it might change your files outside, but I guess that's what you'll have to deal with if you want to mount files inside the container instead of just copying them in.