Why doesn't Docker COPY change file permissions? (--chmod)

Given this Dockerfile:
FROM docker.io/alpine
RUN mkdir test
# RUN umask 0022
COPY --chmod=777 README /test/README-777
COPY --chmod=755 README /test/README-755
COPY --chmod=777 FORALL /test/FORALL-777
COPY --chmod=755 FORALL /test/FORALL-755
RUN ls -la /test
I'd expect Docker to set the read, write, and execute permissions accordingly during the build (docker build ./).
But the last command returns:
total 8
drwxr-xr-x 1 root root 4096 Jun 9 19:20 .
drwxr-xr-x 1 root root 4096 Jun 9 19:20 ..
-rwxrwxrwx 1 root root 0 Jun 9 19:19 FORALL
-rwxrwxrwx 1 root root 0 Jun 9 19:19 FORALL-755
-rwxrwxrwx 1 root root 0 Jun 9 19:19 FORALL-777
-rw-rw-r-- 1 root root 0 Jun 9 19:19 README
-rw-rw-r-- 1 root root 0 Jun 9 19:19 README-755
-rw-rw-r-- 1 root root 0 Jun 9 19:19 README-777
No file permission was changed, and no error was raised.
Why doesn't it work?
How can I fix this?

I figured it out:
the --chmod flag is a new feature from Docker BuildKit, so the build has to be run with BuildKit enabled:
DOCKER_BUILDKIT=1 docker build ./
However, it is really not clear why Docker swallows the --chmod option without any error or warning about the unsupported option 😕.

This is fixed in Docker 20.10.6 (pull request, tracking issue). The classic builder now stops with an error instead of silently ignoring the flag:
$ cat df.chmod
FROM busybox as base
RUN touch /test
FROM busybox as release
COPY --from=base --chmod=777 /test /test-777
COPY --from=base --chmod=555 /test /test-555
CMD ls -l /test*
$ DOCKER_BUILDKIT=0 docker build -t test-chmod-classic -f df.chmod .
Sending build context to Docker daemon 22.02kB
Step 1/6 : FROM busybox as base
---> a9d583973f65
Step 2/6 : RUN touch /test
---> Running in ed48f45a5dca
Removing intermediate container ed48f45a5dca
---> 5606d2d23861
Step 3/6 : FROM busybox as release
---> a9d583973f65
Step 4/6 : COPY --from=base --chmod=777 /test /test-777
the --chmod option requires BuildKit. Refer to https://docs.docker.com/go/buildkit/ to learn how to build images with BuildKit enabled
And if the build is run with BuildKit, the expected result occurs:
$ DOCKER_BUILDKIT=1 docker build -t test-chmod-buildkit -f df.chmod .
[+] Building 1.0s (8/8) FINISHED
=> [internal] load build definition from df.chmod 0.0s
=> => transferring dockerfile: 214B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 49B 0.0s
=> [internal] load metadata for docker.io/library/busybox:latest 0.0s
=> CACHED [base 1/2] FROM docker.io/library/busybox 0.0s
=> [base 2/2] RUN touch /test 0.6s
=> [release 2/3] COPY --from=base --chmod=777 /test /test-777 0.1s
=> [release 3/3] COPY --from=base --chmod=555 /test /test-555 0.1s
=> exporting to image 0.0s
=> => exporting layers 0.0s
=> => writing image sha256:a4df92175046e36a72a769f9c7b297bc04a825708c5f6ca5873428b55c340036 0.0s
=> => naming to docker.io/library/test-chmod-buildkit 0.0s
$ docker run --rm test-chmod-buildkit
-r-xr-xr-x 1 root root 0 Jun 10 13:00 /test-555
-rwxrwxrwx 1 root root 0 Jun 10 13:00 /test-777
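Because the behavior depends on the engine version, a build script can check the version up front rather than discovering the problem from the resulting file modes. A minimal sketch, assuming a POSIX shell; `version_ge` is a hypothetical helper, not part of Docker, and it compares only the numeric dotted fields (a suffix such as `-ce` is ignored):

```shell
#!/bin/sh
# version_ge A B: succeed when dotted version A is >= B.
# Hypothetical helper for illustration; anything after "-" or "+" is ignored.
version_ge() {
    a=${1%%[-+]*}
    b=${2%%[-+]*}
    while [ -n "$a" ] || [ -n "$b" ]; do
        x=${a%%.*}
        y=${b%%.*}
        # Drop the field just read; clear the string once the last field is consumed.
        if [ "$a" = "$x" ]; then a=; else a=${a#*.}; fi
        if [ "$b" = "$y" ]; then b=; else b=${b#*.}; fi
        # Missing fields count as 0, so "20.10" is treated as "20.10.0".
        if [ "${x:-0}" -gt "${y:-0}" ]; then return 0; fi
        if [ "${x:-0}" -lt "${y:-0}" ]; then return 1; fi
    done
    return 0
}
```

With a running daemon, the engine version could be read with `docker version --format '{{.Server.Version}}'` and passed as the first argument, with 20.10.6 as the second.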


chown not working when copying a file in a Dockerfile

I'm running Docker Engine on Windows and am trying to add my own file to the image. The problem is that when I copy the file, its ownership is always root:root, but it needs to be heartbeat:heartbeat (an existing user on the image). Mounting a single file with the -v parameter of docker run doesn't seem to be possible on Windows at the moment. That's why I tried to create my own image with a Dockerfile:
FROM docker.elastic.co/beats/heartbeat:7.16.3
USER root
COPY --chown=heartbeat:heartbeat yml/heartbeat.yml /usr/share/heartbeat/heartbeat.yml
RUN chown -R heartbeat:heartbeat /usr/share/heartbeat
The --chown parameter on the COPY does nothing. The file is still owned by root when I check, and the RUN chown command results in an error. Here is the output:
docker image build ./ -t custom/heartbeat:7.16.3
Sending build context to Docker daemon 10.75kB
Step 1/4 : FROM docker.elastic.co/beats/heartbeat:7.16.3
---> b64ad4b42006
Step 2/4 : USER root
---> Using cache
---> 922a9121e51b
Step 3/4 : COPY --chown=heartbeat:heartbeat yml/heartbeat.yml /usr/share/heartbeat/heartbeat.yml
---> Using cache
---> f30eb4934dca
Step 4/4 : RUN chown -R heartbeat:heartbeat /usr/share/heartbeat
---> [Warning] The requested image's platform (linux/amd64) does not match the detected host platform (windows/amd64) and no specific platform was requested
---> Running in 2ae3bfdd5422
The command '/bin/sh -c chown -R heartbeat:heartbeat /usr/share/heartbeat' returned a non-zero code: 4294967295: failed to shutdown container: container 2ae3bfdd5422e81461a14896db0908e4cd67af1a6f99c629abff1e588f62fc32 encountered an error during hcsshim::System::waitBackground: failure in a Windows system call: The virtual machine or container with the specified identifier is not running. (0xc0370110): subsequent terminate failed container 2ae3bfdd5422e81461a14896db0908e4cd67af1a6f99c629abff1e588f62fc32 encountered an error during hcsshim::System::waitBackground: failure in a Windows system call: The virtual machine or container with the specified identifier is not running. (0xc0370110)
All help is welcome...
Running with --platform:
PS C:\SynteticMonitoring> docker image build ./ -t custom/heartbeat:7.16.3
Sending build context to Docker daemon 9.728kB
Step 1/4 : FROM --platform=linux/amd64 docker.elastic.co/beats/heartbeat:7.16.3
---> b64ad4b42006
Step 2/4 : USER root
---> Using cache
---> 922a9121e51b
Step 3/4 : COPY --chown=heartbeat:heartbeat yml/heartbeat.yml /usr/share/heartbeat/heartbeat.yml
---> Using cache
---> f30eb4934dca
Step 4/4 : RUN chmod +r /usr/share/heartbeat/heartbeat.yml
---> Using cache
---> e9a075d2ab53
Successfully built e9a075d2ab53
Successfully tagged custom/heartbeat:7.16.3
PS C:\SynteticMonitoring> docker run --interactive --tty --entrypoint /bin/sh custom/heartbeat:7.16.3
sh-4.2# ls -l
total 106916
-rw-r--r-- 1 root root 13675 Jan 7 00:47 LICENSE.txt
-rw-r--r-- 1 root root 1964303 Jan 7 00:47 NOTICE.txt
-rw-r--r-- 1 root root 851 Jan 7 00:47 README.md
drwxrwxr-x 2 root root 4096 Jan 7 00:48 data
-rw-r--r-- 1 root root 374197 Jan 7 00:47 fields.yml
-rwxr-xr-x 1 root root 107027952 Jan 7 00:47 heartbeat
-rw-r--r-- 1 root root 69196 Jan 7 00:47 heartbeat.reference.yml
-rw-rw-rw- 1 root root 1631 Jan 26 06:49 heartbeat.yml
drwxr-xr-x 2 root root 4096 Jan 7 00:47 kibana
drwxrwxr-x 2 root root 4096 Jan 7 00:48 logs
drwxr-xr-x 2 root root 4096 Jan 7 00:47 monitors.d
sh-4.2# pwd
You can't chown a file to a user that does not exist. It seems that the heartbeat user and group do not exist in your base image.
That would explain why COPY --chown appears to do nothing and you get files owned by root.
You can fix this by creating the user before the COPY. To do this, add a line before your COPY statement, such as:
RUN addgroup heartbeat && adduser -S -H heartbeat -G heartbeat
If your base image doesn't have addgroup and adduser, try this alternative:
RUN useradd -rUM -s /usr/sbin/nologin heartbeat
This creates the heartbeat group and user, so that chown can then change the ownership.
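Whether the account actually exists can be checked inside the image before relying on this diagnosis. A minimal sketch, assuming the image ships a POSIX shell and the `id` utility; `user_exists` is a hypothetical helper name:

```shell
#!/bin/sh
# user_exists NAME: succeed when NAME resolves to an account in this environment.
# Hypothetical helper; relies on the standard `id` utility.
user_exists() {
    id -u "$1" >/dev/null 2>&1
}

# Report on the two accounts that matter in this question.
for name in root heartbeat; do
    if user_exists "$name"; then
        echo "$name: present"
    else
        echo "$name: missing"
    fi
done
```

Run inside the image, for example from the shell started with `docker run --interactive --tty --entrypoint /bin/sh custom/heartbeat:7.16.3`, this reports whether `heartbeat` is defined there.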
According to the Dockerfile documentation:
The optional --platform flag can be used to specify the platform of the image in case FROM references a multi-platform image. For example, linux/amd64, linux/arm64, or windows/amd64. By default, the target platform of the build request is used.
I suggest trying something like:
FROM [--platform=<platform>] <image> [AS <name>]
FROM --platform=linux/amd64 docker.elastic.co/beats/heartbeat:7.16.3

Execing docker image entrypoint, which is a compiled go app, fails with "not found"

I have built a small Go app and done local testing of it on my Linux VM.
I'm now trying to build a prototype Docker image for it and test running the image. The Dockerfile structure is pretty simple. I base it on Alpine, copy the executable to the root directory and my entrypoint is running the executable.
It fails with "not found".
Now for more details.
Here is the Dockerfile, with some information elided:
FROM <registry>/<namespace>/alpine-base:3.12.3
COPY target/dist/linux-amd64/<appname> /
RUN echo hello
RUN ls -ltd .
RUN ls -lt
RUN whoami
#ENTRYPOINT ["./<appname>"]
ENTRYPOINT ./<appname>
This is approximately what I do when I build the image:
chmod 777 target/dist/linux-amd64/<appname>
docker build --no-cache -f Dockerfile -t <registry>/<namespace>/<appname>:dev-latest .
This is the output of that:
Sending build context to Docker daemon 14.48MB
Step 1/8 : FROM <registry>/<namespace>/alpine-base:3.12.3
---> d7eec24f3d29
Step 2/8 : COPY target/dist/linux-amd64/<appname> /
---> e056bbe44bd6
Step 3/8 : EXPOSE 8080
---> Running in 921cc1fe8804
Removing intermediate container 921cc1fe8804
---> 00b30c5a2770
Step 4/8 : RUN echo hello
---> Running in 9fb08d924d3c
Removing intermediate container 9fb08d924d3c
---> 6788feafae4b
Step 5/8 : RUN ls -ltd .
---> Running in 78e6d4aea09f
drwxr-xr-x 1 root root 4096 Jan 10 23:02 .
Removing intermediate container 78e6d4aea09f
---> 711f3d247efe
Step 6/8 : RUN ls -lt
---> Running in 32e703a9d480
total 14200
drwxr-xr-x 5 root root 340 Jan 10 23:02 dev
drwxr-xr-x 1 root root 4096 Jan 10 23:02 etc
dr-xr-xr-x 324 root root 0 Jan 10 23:02 proc
dr-xr-xr-x 13 root root 0 Jan 10 23:02 sys
-rwxrwxrwx 1 root root 14480384 Jan 10 22:39 <appname>
drwxr-xr-x 1 root root 4096 Jan 12 2021 home
drwxr-xr-x 1 root root 4096 Jan 12 2021 opt
drwxr-xr-x 2 root root 4096 Dec 16 2020 bin
drwxr-xr-x 2 root root 4096 Dec 16 2020 sbin
drwxr-xr-x 1 root root 4096 Dec 16 2020 lib
drwxr-xr-x 5 root root 4096 Dec 16 2020 media
drwxr-xr-x 2 root root 4096 Dec 16 2020 mnt
drwx------ 2 root root 4096 Dec 16 2020 root
drwxr-xr-x 2 root root 4096 Dec 16 2020 run
drwxr-xr-x 2 root root 4096 Dec 16 2020 srv
drwxrwxrwt 2 root root 4096 Dec 16 2020 tmp
drwxr-xr-x 1 root root 4096 Dec 16 2020 usr
drwxr-xr-x 1 root root 4096 Dec 16 2020 var
Removing intermediate container 32e703a9d480
---> 68871e80b517
Step 7/8 : RUN whoami
---> Running in 40b2460bc349
Removing intermediate container 40b2460bc349
---> 4cf57c0b5f10
Step 8/8 : ENTRYPOINT ./<appname>
---> Running in 3c57717800ab
Removing intermediate container 3c57717800ab
---> eaafc953da46
Successfully built eaafc953da46
Successfully tagged <registry>/<namespace>/<appname>:dev-latest
And this is what I run to test it:
docker rm <appname>-1
docker run -P --name=<appname>-1 -d -t <registry>/<namespace>/<appname>:dev-latest
docker logs <appname>-1
And this is the output:
docker rm <appname>-1
docker run -P --name=<appname>-1 -d -t <registry>/<namespace>/<appname>:dev-latest
docker logs <appname>-1
/bin/sh: ./<appname>: not found
It says "not found". I don't understand that. I showed the contents of the root directory. The file is clearly there. Is this error saying that some OTHER file is not found, like if it thought it was a shell script and the shebang pointed to a shell that doesn't exist?
So the one tiny little detail that I realized I didn't mention in the original post is that disabling CGO is not going to be possible. The entire reason for this app is to link with a C library and call functions in it, so I have to use Cgo.
What I conclude from these helpful comments and other threads, such as "Go-compiled binary won't run in an alpine docker container on Ubuntu host", is that my "workaround" of changing to an Ubuntu base image is actually the only reasonable solution.
If disabling cgo is not an option, you can pass the "-static" flag to the external linker.
package main
/*
#include <stdio.h>
void test_puts() {
    puts("puts() called");
}
*/
import "C"
func main() {
    C.test_puts()
}
Build it with:
go build --ldflags '-extldflags "-static"'
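On the question of which file is "not found": for a dynamically linked executable, the kernel also has to load the interpreter (dynamic loader) named inside the binary. A glibc-linked binary names a loader that a musl-based Alpine image does not normally ship, so the shell reports the binary itself as missing. A minimal sketch for reading that name, assuming binutils' `readelf` is available wherever the binary is inspected; `parse_interp` is a hypothetical helper that filters `readelf -l` output:

```shell
#!/bin/sh
# parse_interp: read `readelf -l` output on stdin and print the requested
# program interpreter, if any. Hypothetical helper for illustration.
parse_interp() {
    sed -n 's/.*\[Requesting program interpreter: \([^]]*\)].*/\1/p'
}

# Inspect a binary only when a path is given and readelf is installed.
if [ -n "${1:-}" ] && command -v readelf >/dev/null 2>&1; then
    readelf -l "$1" | parse_interp
fi
```

Saved as a script and given the path of the executable, it prints the loader path. No output suggests a binary without a dynamic loader, which is what the -static link above aims for.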

Permission for singularity

I ran into an issue when running the whole ChIP-seq pipeline with the singularity profile on my local PC (Windows, using the Windows Subsystem for Linux):
Error executing process > 'output_documentation'
Caused by:
Failed to pull singularity image
command: singularity pull --name nfcore-chipseq-1.2.2.img.pulling.1630098407814 docker://nfcore/chipseq:1.2.2 > /dev/null
status : 255
INFO: Using cached SIF image
FATAL: While making image from oci registry: error copying image out of cache: could not open temporary file for copy: failed to change permission of ./tmp-copy-2575820807: chmod ./tmp-copy-2575820807: operation not permitted
I'm using Singularity 3.8.2.
I have also pointed NXF_SINGULARITY_CACHEDIR at a hard drive instead of /home/.singularity.
I also checked the folder to make sure all the files can be accessed:
total 0
drwxrwxrwx 1 root root 4096 Aug 28 05:06 .
drwxrwxrwx 1 root root 4096 Aug 28 04:47 ..
-rwxrwxrwx 1 root root 0 Aug 28 04:53 tmp-copy-2299332276
-rwxrwxrwx 1 root root 0 Aug 28 05:06 tmp-copy-2575820807
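The error is a failed chmod inside the cache directory, and a permissive `ls` listing does not show whether the filesystem accepts mode changes at all. A minimal sketch for testing that directly, assuming a shell with `mktemp` available; `supports_chmod` is a hypothetical helper name:

```shell
#!/bin/sh
# supports_chmod DIR: succeed when a file created in DIR can be chmod-ed.
# Hypothetical helper; returns 2 when no file can be created in DIR at all.
supports_chmod() {
    probe=$(mktemp "$1/chmod-probe.XXXXXX") || return 2
    chmod 600 "$probe"
    status=$?
    rm -f "$probe"
    return "$status"
}

# Check the configured cache directory, falling back to the temp directory.
dir=${NXF_SINGULARITY_CACHEDIR:-${TMPDIR:-/tmp}}
if supports_chmod "$dir"; then
    echo "$dir: chmod works"
else
    echo "$dir: chmod failed"
fi
```

If the check fails for the cache directory but passes for a directory on the Linux side of the subsystem, the mounted drive is the likely place to look.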

yarn doesn't create node_modules folder in docker container

I am trying to build an application using a Dockerfile and docker-compose. Everything seems OK until it comes to the yarn command, which does not create node_modules and does not install the dependencies.
docker-compose file:
version: '2'
env_file: .env
context: .
- APP_USER=app
FROM node:10.15.2-alpine
RUN adduser -D -s /bin/false -h /home/$APP_USER $APP_USER $APP_USER
COPY . .
RUN whoami
RUN yarn install
RUN ls -la
RUN pwd
RUN yarn build
CMD ["node", "./dist/index.js"]
Building mdisassistant
Step 1/15 : FROM node:10.15.2-alpine
---> 072459fe4d8a
Step 2/15 : ARG APP_NAME
---> Using cache
---> 4c30b08b312d
Step 3/15 : ARG APP_USER
---> Using cache
---> 9631ef748cd7
Step 4/15 : RUN adduser -D -s /bin/false -h /home/$APP_USER $APP_USER $APP_USER
---> Using cache
---> f5d045ca5282
Step 5/15 : ENV HOME=/home/$APP_USER
---> Using cache
---> 32c4f9457b9e
---> Using cache
---> a8d71a9f563f
Step 7/15 : COPY . .
---> d4ef17f02a9f
Step 8/15 : RUN chown $APP_USER:$APP_USER -R $HOME
---> Running in f6c194316e12
Removing intermediate container f6c194316e12
---> f6742a0c10df
Step 9/15 : USER $APP_USER
---> Running in ec22ed655aa5
Removing intermediate container ec22ed655aa5
---> af800732027d
Step 10/15 : RUN whoami
---> Running in ba9fa81a95a3
Removing intermediate container ba9fa81a95a3
---> ebf0f6a4f8a7
Step 11/15 : RUN yarn install
---> Running in 4d5e76dd1508
yarn install v1.13.0
[1/4] Resolving packages...
[2/4] Fetching packages...
Removing intermediate container 4d5e76dd1508
---> 1785eec9829e
Step 12/15 : RUN ls -la
---> Running in b0f3f1b1e5fc
total 92
drwxr-sr-x 1 app app 4096 Mar 21 02:52 .
drwxr-sr-x 1 app app 4096 Mar 21 02:52 ..
-rw-r--r-- 1 app app 1610 Mar 21 02:05 .dockerignore
-rw-r--r-- 1 app app 62 Mar 21 00:56 .env.example
drwxr-xr-x 1 app app 4096 Mar 21 02:52 .git
-rw-r--r-- 1 app app 1610 Mar 20 23:56 .gitignore
-rw-r--r-- 1 app app 333 Mar 21 02:52 Dockerfile
-rw-r--r-- 1 app app 1060 Mar 20 23:56 LICENSE
-rw-r--r-- 1 app app 681 Mar 20 23:56 README.md
-rw-r--r-- 1 app app 280 Mar 21 02:35 docker-compose.yml
-rw-r--r-- 1 app app 614 Mar 21 01:57 package.json
drwxr-xr-x 1 app app 4096 Mar 21 01:05 src
-rw-r--r-- 1 app app 44193 Mar 21 01:20 yarn.lock
Removing intermediate container b0f3f1b1e5fc
---> eb6fbee4548f
Step 13/15 : RUN pwd
---> Running in 92b3d6d20201
Removing intermediate container 92b3d6d20201
---> 853d9879da99
Step 14/15 : RUN yarn build
---> Running in 205ef8079386
yarn run v1.13.0
$ ncc build src/index.js -o dist -m
/bin/sh: ncc: not found
error Command failed with exit code 127.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
ERROR: Service 'mdisassistant' failed to build: The command '/bin/sh -c yarn build' returned a non-zero code: 1
How can I fix this problem?
Can you try it without creating the user?
FROM node:10.15.2-alpine
COPY package.json .
RUN whoami
RUN yarn install
RUN ls -la
RUN pwd
COPY . .
RUN yarn build
CMD ["node", "./dist/index.js"]
The problem was in the versions of Docker and the OS that I was using to build the container.
I was running docker-compose with Docker 18.x on Deepin 15.11.
I switched to elementary OS 5.1.2 Hera with Docker 19.03.6 installed, and now everything works.
I don't know why this happens on Deepin.
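In the failing log, `yarn install` stops after the fetch step without reporting an error, and the build only fails later at `ncc`. A minimal sketch of a guard that could run right after the install step, assuming the usual `node_modules/.bin` layout for locally installed tools; `has_local_bin` is a hypothetical helper name:

```shell
#!/bin/sh
# has_local_bin PROJECT_DIR TOOL: succeed when TOOL is an executable entry
# in PROJECT_DIR/node_modules/.bin. Hypothetical helper for illustration.
has_local_bin() {
    [ -x "$1/node_modules/.bin/$2" ]
}

# Check the current project for the tool that the build script calls.
if has_local_bin . ncc; then
    echo "ncc is installed"
else
    echo "ncc is missing: the install step may not have completed" >&2
fi
```

Turning the else branch into a non-zero exit would make the build stop at the install step, where the problem actually occurs.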

convert spring boot tomcat azure k8s deployment to standalone application

I have created an Azure DevOps project for Java, Spring Boot and Kubernetes as a way to learn about the Azure technology set. It works: the simple Spring Boot web application is deployed and runs, and it is rebuilt if I make code changes.
However, the application uses a very old version of Spring Boot, 1.5.7.RELEASE, and it is deployed in a Tomcat server in k8s.
I am looking for some guidance on how to run it as a standalone Spring Boot 2 application in Kubernetes. My attempts so far have resulted in the deployment timing out after 15 minutes in the Helm Upgrade step.
The existing Dockerfile:
FROM maven:3.5.2-jdk-8 AS build-env
COPY . /app
RUN mvn package
FROM tomcat:8
RUN rm -rf /usr/local/tomcat/webapps/ROOT
COPY --from=build-env /app/target/*.war /usr/local/tomcat/webapps/ROOT.war
How do I change the Dockerfile to build the image of a standalone Spring Boot app?
I changed the pom to generate a jar file, then modified the Dockerfile to this:
FROM maven:3.5.2-jdk-8 AS build-env
COPY . /app
RUN mvn package
FROM openjdk:8-jdk-alpine
COPY --from=build-env /app/target/ROOT.jar .
RUN ls -la
ENTRYPOINT ["java","-jar","ROOT.jar"]
This builds; see the output from the log for the 'Build an image' step:
2019-06-25T23:33:38.0841365Z Step 9/20 : COPY --from=build-env /app/target/ROOT.jar .
2019-06-25T23:33:41.4839851Z ---> b478fb8867e6
2019-06-25T23:33:41.4841124Z Step 10/20 : RUN ls -la
2019-06-25T23:33:41.6653383Z ---> Running in 4618c503ac5c
2019-06-25T23:33:42.2022890Z total 50156
2019-06-25T23:33:42.2026590Z drwxr-xr-x 1 root root 4096 Jun 25 23:33 .
2019-06-25T23:33:42.2026975Z drwxr-xr-x 1 root root 4096 Jun 25 23:33 ..
2019-06-25T23:33:42.2027267Z -rwxr-xr-x 1 root root 0 Jun 25 23:33 .dockerenv
2019-06-25T23:33:42.2027608Z -rw-r--r-- 1 root root 51290350 Jun 25 23:33 ROOT.jar
2019-06-25T23:33:42.2027889Z drwxr-xr-x 2 root root 4096 May 9 20:49 bin
2019-06-25T23:33:42.2028188Z drwxr-xr-x 5 root root 340 Jun 25 23:33 dev
2019-06-25T23:33:42.2028467Z drwxr-xr-x 1 root root 4096 Jun 25 23:33 etc
2019-06-25T23:33:42.2028765Z drwxr-xr-x 2 root root 4096 May 9 20:49 home
2019-06-25T23:33:42.2029376Z drwxr-xr-x 1 root root 4096 May 11 01:32 lib
2019-06-25T23:33:42.2029682Z drwxr-xr-x 5 root root 4096 May 9 20:49 media
2019-06-25T23:33:42.2029961Z drwxr-xr-x 2 root root 4096 May 9 20:49 mnt
2019-06-25T23:33:42.2030257Z drwxr-xr-x 2 root root 4096 May 9 20:49 opt
2019-06-25T23:33:42.2030537Z dr-xr-xr-x 135 root root 0 Jun 25 23:33 proc
2019-06-25T23:33:42.2030937Z drwx------ 2 root root 4096 May 9 20:49 root
2019-06-25T23:33:42.2031214Z drwxr-xr-x 2 root root 4096 May 9 20:49 run
2019-06-25T23:33:42.2031523Z drwxr-xr-x 2 root root 4096 May 9 20:49 sbin
2019-06-25T23:33:42.2031797Z drwxr-xr-x 2 root root 4096 May 9 20:49 srv
2019-06-25T23:33:42.2032254Z dr-xr-xr-x 12 root root 0 Jun 25 23:33 sys
2019-06-25T23:33:42.2032355Z drwxrwxrwt 2 root root 4096 May 9 20:49 tmp
2019-06-25T23:33:42.2032656Z drwxr-xr-x 1 root root 4096 May 11 01:32 usr
2019-06-25T23:33:42.2032945Z drwxr-xr-x 1 root root 4096 May 9 20:49 var
2019-06-25T23:33:43.0909881Z Removing intermediate container 4618c503ac5c
2019-06-25T23:33:43.0911258Z ---> 0d824ce4ae62
2019-06-25T23:33:43.0911852Z Step 11/20 : ENTRYPOINT ["java","-jar","ROOT.jar"]
2019-06-25T23:33:43.2880002Z ---> Running in bba9345678be
The build completes, but the deployment fails in the Helm Upgrade step, timing out after 15 minutes. This is the log:
2019-06-25T23:38:06.6438602Z ##[section]Starting: Helm upgrade
2019-06-25T23:38:06.6444317Z ==============================================================================
2019-06-25T23:38:06.6444448Z Task : Package and deploy Helm charts
2019-06-25T23:38:06.6444571Z Description : Deploy, configure, update a Kubernetes cluster in Azure Container Service by running helm commands
2019-06-25T23:38:06.6444648Z Version : 0.153.0
2019-06-25T23:38:06.6444927Z Author : Microsoft Corporation
2019-06-25T23:38:06.6445006Z Help : https://learn.microsoft.com/azure/devops/pipelines/tasks/deploy/helm-deploy
2019-06-25T23:38:06.6445300Z ==============================================================================
2019-06-25T23:38:09.1285973Z [command]/opt/hostedtoolcache/helm/2.14.1/x64/linux-amd64/helm upgrade --tiller-namespace dev2134 --namespace dev2134 --install --force --wait --set image.repository=stephenacr.azurecr.io/stephene991 --set image.tag=20 --set applicationInsights.InstrumentationKey=643a47f5-58bd-4012-afea-b3c943bc33ce --set imagePullSecrets={stephendockerauth} --timeout 900 azuredevops /home/vsts/work/r1/a/Drop/drop/sampleapp-v0.2.0.tgz
2019-06-25T23:53:13.7882713Z UPGRADE FAILED
2019-06-25T23:53:13.7883396Z Error: timed out waiting for the condition
2019-06-25T23:53:13.7885043Z Error: UPGRADE FAILED: timed out waiting for the condition
2019-06-25T23:53:13.7967270Z ##[error]Error: UPGRADE FAILED: timed out waiting for the condition
2019-06-25T23:53:13.7976964Z ##[section]Finishing: Helm upgrade
I have had another look at this as I now am more familiar with all the technologies, and I have located the problem.
The helm upgrade command times out waiting for the newly deployed pod to become live, but this never happens because the k8s liveness probe defined for the pod keeps failing. This can be seen with this command:
kubectl get po -n dev5998 -w
sampleapp-86869d4d54-nzd9f 0/1 CrashLoopBackOff 17 48m
sampleapp-c8f84c857-phrrt 1/1 Running 0 1h
sampleapp-c8f84c857-rmq8w 1/1 Running 0 1h
tiller-deploy-79f84d5f-4r86q 1/1 Running 0 2h
The new pod is repeatedly restarted then killed. It seems to repeat forever or until another deployment is run.
The events for the pod show why:
kubectl describe po sampleapp-86869d4d54-nzd9f -n dev5998
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 39m default-scheduler Successfully assigned sampleapp-86869d4d54-nzd9f to aks-agentpool-24470557-1
Normal SuccessfulMountVolume 39m kubelet, aks-agentpool-24470557-1 MountVolume.SetUp succeeded for volume "default-token-v72n5"
Normal Pulling 39m kubelet, aks-agentpool-24470557-1 pulling image "devopssampleacreg.azurecr.io/devopssamplec538:52"
Normal Pulled 39m kubelet, aks-agentpool-24470557-1 Successfully pulled image "devopssampleacreg.azurecr.io/devopssamplec538:52"
Normal Created 37m (x3 over 39m) kubelet, aks-agentpool-24470557-1 Created container
Normal Started 37m (x3 over 39m) kubelet, aks-agentpool-24470557-1 Started container
Normal Killing 37m (x2 over 38m) kubelet, aks-agentpool-24470557-1 Killing container with id docker://sampleapp:Container failed liveness probe.. Container will be killed and recreated.
Warning Unhealthy 36m (x6 over 38m) kubelet, aks-agentpool-24470557-1 Liveness probe failed: HTTP probe failed with statuscode: 404
Warning Unhealthy 34m (x12 over 38m) kubelet, aks-agentpool-24470557-1 Readiness probe failed: HTTP probe failed with statuscode: 404
Normal Pulled 9m25s (x12 over 38m) kubelet, aks-agentpool-24470557-1 Container image "devopssampleacreg.azurecr.io/devopssamplec538:52" already present on machine
Warning BackOff 4m10s (x112 over 34m) kubelet, aks-agentpool-24470557-1 Back-off restarting failed container
So the URLs served by the application must differ depending on how it is deployed, in Tomcat or standalone, which now seems obvious.
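The probe events follow from how Kubernetes evaluates HTTP probes: a response status of at least 200 and below 400 counts as success, and anything else counts as failure, so a 404 on the probe path fails both probes. A minimal sketch of that rule, with `probe_ok` as a hypothetical helper name:

```shell
#!/bin/sh
# probe_ok STATUS: succeed when an HTTP status would pass a Kubernetes
# HTTP probe (200 <= STATUS < 400). Hypothetical helper for illustration.
probe_ok() {
    [ "$1" -ge 200 ] && [ "$1" -lt 400 ]
}

# Show how a few common status codes are classified.
for status in 200 302 404 500; do
    if probe_ok "$status"; then
        echo "$status: probe passes"
    else
        echo "$status: probe fails"
    fi
done
```

The status for a given path can be read with `curl -s -o /dev/null -w '%{http_code}' URL`, for instance against a port opened with `kubectl port-forward`, and compared between the Tomcat and standalone images.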
