GitLab Runner cannot run `docker` commands in `docker` executor

This has been driving me nuts. As far as I can tell, I'm following GitLab's documentation exactly for setting up DIND via the Docker socket in GitLab Runner so that I can run docker commands in a GitLab CI job, but the job keeps failing with the following error:
Running with gitlab-runner 14.0.0 (3b6f852e)
on Gitlab-HiddenLayer-Group-Runner GosSpAyH
Preparing the "kubernetes" executor
00:00
Using Kubernetes namespace: gitlab
Using Kubernetes executor with image docker:19.03.12 ...
Using attach strategy to execute scripts...
Preparing environment
00:07
Waiting for pod gitlab/runner-gosspayh-project-27874308-concurrent-0qkp2h to be running, status is Pending
Waiting for pod gitlab/runner-gosspayh-project-27874308-concurrent-0qkp2h to be running, status is Pending
ContainersNotReady: "containers with unready status: [build helper]"
ContainersNotReady: "containers with unready status: [build helper]"
Running on runner-gosspayh-project-27874308-concurrent-0qkp2h via gitlab-runner-gitlab-runner-6984874897-l9z5z...
Getting source from Git repository
00:02
Fetching changes with git depth set to 50...
Initialized empty Git repository in /builds/GosSpAyH/0/hiddenlayer/hl-tech-blog/.git/
Created fresh repository.
Checking out c48b6257 as master...
Skipping Git submodules setup
Executing "step_script" stage of the job script
00:00
$ docker info
Client:
Debug Mode: false
Server:
ERROR: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
errors pretty printing info
Cleaning up file based variables
00:01
ERROR: Job failed: command terminated with exit code 1
Here is my toml configuration in values.yaml for GitLab Runner installation in my private Kubernetes cluster.
config: |
  [[runners]]
    url = "https://gitlab.com/"
    executor = "docker"
    privileged = true
    [runners.docker]
      tls_verify = false
      image = "docker:19.03.12"
      privileged = true
      disable_cache = false
      volumes = ["/var/run/docker.sock:/var/run/docker.sock", "/cache"]
    [runners.cache]
      Insecure = false
and my .gitlab-ci.yml is the following:
image: docker:19.03.12
variables:
  DOCKER_DRIVER: overlay2
before_script:
  - docker info
  - echo "$CI_REGISTRY_USER | $CI_REGISTRY_PASSWORD | $CI_REGISTRY"
  - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" $CI_REGISTRY
build:
  stage: build
  # Default branch leaves tag empty (= latest tag)
  # All other branches are tagged with the escaped branch name (commit ref slug)
  script:
    - |
      if [[ "$CI_COMMIT_BRANCH" == "$CI_DEFAULT_BRANCH" ]]; then
        tag=""
        echo "Running on default branch '$CI_DEFAULT_BRANCH': tag = 'latest'"
      else
        tag=":$CI_COMMIT_REF_SLUG"
        echo "Running on branch '$CI_COMMIT_BRANCH': tag = $tag"
      fi
    - docker build --pull -t "$CI_REGISTRY_IMAGE${tag}" -f deploy/Dockerfile .
    - docker push "$CI_REGISTRY_IMAGE${tag}"
Note: I'm intentionally leaving the docker:dind service out of the .gitlab-ci.yml file because the documentation says it is not needed.
Additional Information:
Kubernetes Version: 1.20
Gitlab Runner Version: 14.0.0
Running docker commands in CI is a pretty common workflow, and I'm starting to think that if it's this difficult to set up, I may as well go back to the old ways and use Jenkins.

See if upgrading to GitLab 14.3 (September 2021) would help:
Support for Kubernetes 1.20
In GitLab 14.3, we added support to Kubernetes version 1.20.
GitLab users can benefit from having recent cluster versions in many features, such as the GitLab Kubernetes Agent, Auto DevOps and Cluster Management Project.
You can find the list of supported versions and related timelines in our documentation.
See Documentation and Epic.
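Separately from the version question, the job log in the question reports Preparing the "kubernetes" executor, while the posted config sets executor = "docker" and mounts the socket under [runners.docker]. Settings in [runners.docker] apply only to the Docker executor, so that socket mount never reaches the job pod. Below is a minimal sketch of the equivalent mount for the Kubernetes executor. It assumes the Helm chart reads the runner TOML from runners.config, and that the cluster nodes actually run a Docker daemon at /var/run/docker.sock (nodes that use containerd will not have that socket). The volume name docker-sock is a placeholder, not something from the original post.

```yaml
# values.yaml sketch for the gitlab-runner Helm chart (layout is an assumption)
runners:
  config: |
    [[runners]]
      [runners.kubernetes]
        image = "docker:19.03.12"
        # Mount the node's Docker socket into every job pod.
        # Assumes the node runs Docker; "docker-sock" is a placeholder name.
        [[runners.kubernetes.volumes.host_path]]
          name = "docker-sock"
          mount_path = "/var/run/docker.sock"
          host_path = "/var/run/docker.sock"
```

If the nodes do not expose a Docker socket, this mount cannot help, and a docker:dind service with a privileged Kubernetes runner would be the other route to try.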

Related

Gitlab-ci stage image not pulled

I'm facing a strange behaviour.
Below is my .gitlab-ci.yml:
image: node:latest
stages:
  - release
release:
  image: registry.gitlab.com/gitlab-org/release-cli
  stage: release
  script:
    - release-cli create --name release-branch-$CI_JOB_ID --description "desc" --tag-name job-$CI_JOB_ID --ref $CI_COMMIT_SHA
The pipeline is finishing with the following error:
Running with gitlab-runner 14.6.0 (5316d4ac)
on shell-runner gmyChsa1
Preparing the "shell" executor
Using Shell executor...
Preparing environment
Running on myserver.com...
Getting source from Git repository
Fetching changes with git depth set to 50...
Reinitialized existing Git repository in /home/gitlab-runner/builds/gmyChsa1/0/kleyson-sr/changelog-test/.git/
Checking out 21ab4cac as main...
Removing desc.md
Skipping Git submodules setup
Executing "step_script" stage of the job script
$ release-cli create --name release-branch-$CI_JOB_ID --description "desc" --tag-name job-$CI_JOB_ID --ref $CI_COMMIT_SHA
bash: line 137: release-cli: command not found
Cleaning up project directory and file based variables
ERROR: Job failed: exit status 1
The release stage does not pull and use the release-cli image in place of the global node:latest image.
How can I fix that?
P.S.: I'm running the pipeline on a local GitLab server.
Running with gitlab-runner 14.6.0 (5316d4ac)
on shell-runner gmyChsa1
Your job is using a GitLab runner configured with the shell executor. This means that the image: directive is ignored entirely.
To use an image: you will need to use a runner configured with a docker/kubernetes/custom executor that supports running jobs in an image.
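As a rough sketch of what that could look like on the job side, the release job can be routed to a runner that supports images by using tags. The tag name docker-runner is hypothetical; it stands for whatever tag is assigned to a runner registered with the docker or kubernetes executor on your instance.

```yaml
release:
  stage: release
  image: registry.gitlab.com/gitlab-org/release-cli
  tags:
    - docker-runner   # hypothetical tag of a runner using the docker/kubernetes executor
  script:
    - release-cli create --name release-branch-$CI_JOB_ID --description "desc" --tag-name job-$CI_JOB_ID --ref $CI_COMMIT_SHA
```

Without such a runner available, the alternative is to install release-cli on the shell runner's host so the command is found on its PATH.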

GitLab CI Timeout with Kaniko and EKS

I am trying to follow the GitLab example code for using kaniko as outlined here. The only thing I have changed is that I am using the v1.7.0-debug tag instead of simply debug.
build:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:v1.7.0-debug
    entrypoint: [""]
  script:
    - mkdir -p /kaniko/.docker
    - echo "{\"auths\":{\"${CI_REGISTRY}\":{\"auth\":\"$(printf "%s:%s" "${CI_REGISTRY_USER}" "${CI_REGISTRY_PASSWORD}" | base64 | tr -d '\n')\"}}}" > /kaniko/.docker/config.json
    - >-
      /kaniko/executor
      --context "${CI_PROJECT_DIR}"
      --dockerfile "${CI_PROJECT_DIR}/Dockerfile"
      --destination "${CI_REGISTRY_IMAGE}:${CI_COMMIT_TAG}"
My build job is stalling out at the following line:
Running with gitlab-runner 14.4.0 (4b9e985a)
on gitlab-runner-gitlab-runner-84d476ff5c-mkt4s HMty8QBu
Preparing the "kubernetes" executor
00:00
Using Kubernetes namespace: gitlab-runner
Using Kubernetes executor with image gcr.io/kaniko-project/executor:v1.7.0-debug ...
Using attach strategy to execute scripts...
Preparing environment
00:03
Waiting for pod gitlab-runner/runner-hmty8qbu-project-31186441-concurrent-0bbt8x to be running, status is Pending
Running on runner-hmty8qbu-project-31186441-concurrent-0bbt8x via gitlab-runner-gitlab-runner-84d476ff5c-mkt4s...
Getting source from Git repository
00:01
Fetching changes with git depth set to 50...
Initialized empty Git repository in /builds/...
Created fresh repository.
Checking out 4d05d22b as ci...
Skipping Git submodules setup
Executing "step_script" stage of the job script
It just stops at Executing "step_script" and never moves on. I've researched all over and read through as much documentation as I can find but am unable to troubleshoot this issue.
Setup
Amazon EKS version 1.21
GitLab Runner Helm Chart version 0.34.0
kaniko executor image v1.7.0-debug
This ended up being an issue with how the Kubernetes runner itself was configured inside of the runner configuration toml. The default container image we were using for our runners required a modification to the PATH environment variable so we were using the environment configuration setting to do this as outlined here. It seems that this PATH variable did not include the busybox shell defined in the kaniko debug image. We have since moved that PATH change inside our Docker image where it should've been in the first place and things are working as expected.
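To illustrate the kind of setting involved, here is a hedged sketch of a Helm values.yaml in which a runner-wide environment entry overrides PATH. The PATH value and the default image name are made up for illustration; the original values were not posted. Because the environment setting is applied to every job container, a PATH defined there replaces the one baked into the kaniko debug image, and the job can then no longer find that image's shell.

```yaml
# values.yaml sketch (layout of the gitlab-runner Helm chart is an assumption)
runners:
  config: |
    [[runners]]
      # Problematic: a runner-wide PATH replaces the PATH of every job image,
      # including the kaniko debug image. The value is a made-up example.
      # environment = ["PATH=/opt/tools/bin:/usr/local/bin:/usr/bin:/bin"]
      [runners.kubernetes]
        # Hypothetical default image that now sets its own PATH in its Dockerfile.
        image = "registry.example.com/ci/base:latest"
```

Keeping PATH changes inside the image that needs them leaves other job images, such as kaniko, with their own defaults.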

gitlab job failed - image pull failed

I am trying to scan a Docker image with Trivy and integrate the scan into GitLab. The pipeline passes,
but the job itself fails and I'm not sure why.
The Docker image is valid.
Update: added the new error I get after enabling the shared runner.
gitlab.yml
Trivy_container_scanning:
  stage: test
  image: docker:stable-git
  variables:
    # Override the GIT_STRATEGY variable in your `.gitlab-ci.yml` file and set it to `fetch` if you want to provide a `clair-whitelist.yml`
    # file. See https://docs.gitlab.com/ee/user/application_security/container_scanning/index.html#overriding-the-container-scanning-template
    # for details
    GIT_STRATEGY: none
    IMAGE: "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA"
  allow_failure: true
  before_script:
    - export TRIVY_VERSION=${TRIVY_VERSION:-v0.20.0}
    - apk add --no-cache curl docker-cli
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" $CI_REGISTRY
    - curl -sfL https://raw.githubusercontent.com/aquasecurity/trivy/main/contrib/install.sh | sh -s -- -b /usr/local/bin ${TRIVY_VERSION}
    - curl -sSL -o /tmp/trivy-gitlab.tpl https://github.com/aquasecurity/trivy/raw/${TRIVY_VERSION}/contrib/gitlab.tpl
  script:
    - trivy --exit-code 0 --cache-dir .trivycache/ --no-progress --format template --template "@/tmp/trivy-gitlab.tpl" -o gl-container-scanning-report.json $IMAGE
    #- ./trivy --exit-code 0 --severity HIGH --no-progress --auto-refresh trivy-ci-test
    #- ./trivy --exit-code 1 --severity CRITICAL --no-progress --auto-refresh trivy-ci-test
  cache:
    paths:
      - .trivycache/
  artifacts:
    reports:
      container_scanning: gl-container-scanning-report.json
  dependencies: []
  only:
    refs:
      - branches
Dockerfile
FROM composer:1.7.2
RUN git clone https://github.com/aquasecurity/trivy-ci-test.git && cd trivy-ci-test && rm Cargo.lock && rm Pipfile.lock
CMD apk add --no-cache mysql-client
ENTRYPOINT ["mysql"]
job error:
Running with gitlab-runner 13.2.4 (264446b2)
on gitlab-runner-gitlab-runner-76f48bbd84-8sc2l GCJviaG2
Preparing the "kubernetes" executor
30:00
Using Kubernetes namespace: gitlab-managed-apps
Using Kubernetes executor with image docker:stable-git ...
Preparing environment
30:18
Waiting for pod gitlab-managed-apps/runner-gcjviag2-project-1020-concurrent-0pgp84 to be running, status is Pending
Waiting for pod gitlab-managed-apps/runner-gcjviag2-project-1020-concurrent-0pgp84 to be running, status is Pending
Waiting for pod gitlab-managed-apps/runner-gcjviag2-project-1020-concurrent-0pgp84 to be running, status is Pending
Waiting for pod gitlab-managed-apps/runner-gcjviag2-project-1020-concurrent-0pgp84 to be running, status is Pending
Waiting for pod gitlab-managed-apps/runner-gcjviag2-project-1020-concurrent-0pgp84 to be running, status is Pending
Waiting for pod gitlab-managed-apps/runner-gcjviag2-project-1020-concurrent-0pgp84 to be running, status is Pending
ERROR: Job failed (system failure): prepare environment: image pull failed: Back-off pulling image "docker:stable-git". Check https://docs.gitlab.com/runner/shells/index.html#shell-profile-loading for more information
another error:
Running with gitlab-runner 13.2.4 (264446b2)
on gitlab-runner-gitlab-runner-76f48bbd84-8sc2l GCJviaG2
Preparing the "kubernetes" executor
30:00
Using Kubernetes namespace: gitlab-managed-apps
Using Kubernetes executor with image $CI_REGISTRY/devops/docker-alpine-sdk:19.03.15 ...
Preparing environment
30:03
Waiting for pod gitlab-managed-apps/runner-gcjviag2-project-1020-concurrent-0t7plc to be running, status is Pending
ERROR: Job failed (system failure): prepare environment: image pull failed: Failed to apply default image tag "/devops/docker-alpine-sdk:19.03.15": couldn't parse image reference "/devops/docker-alpine-sdk:19.03.15": invalid reference format. Check https://docs.gitlab.com/runner/shells/index.html#shell-profile-loading for more information
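The root cause was that the required variables were not defined in the GitLab CI/CD variables; with $CI_REGISTRY empty, the image reference in the second error collapsed to "/devops/docker-alpine-sdk:19.03.15".
After defining the registry credentials, everything works.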
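One way to surface this earlier is a small guard job that fails with a readable message when the variables are empty. This is only a sketch under assumptions: the job name and the alpine:3.15 image are my own choices, not part of the original pipeline, and the guard job must itself be able to pull its image.

```yaml
check-registry-variables:
  stage: .pre
  image: alpine:3.15   # fixed image, so this job does not depend on the variables it checks
  script:
    # Each line exits non-zero with a message if the variable is unset or empty.
    - test -n "$CI_REGISTRY" || { echo "CI_REGISTRY is not set"; exit 1; }
    - test -n "$CI_REGISTRY_USER" || { echo "CI_REGISTRY_USER is not set"; exit 1; }
    - test -n "$CI_REGISTRY_PASSWORD" || { echo "CI_REGISTRY_PASSWORD is not set"; exit 1; }
```

Because the job runs in the .pre stage, later jobs whose image reference depends on $CI_REGISTRY are not started when the check fails.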
The image pull failure is also discussed in gitlab-org/gitlab-runner issue 27664, where the cause is given as
either a GitLab infrastructure issue
or, per a comment from Bruce Lau:
After some trial and errors, me and our team figured out the issue is due to the runner failed to use service account secret to pull images.
In order to solve this issue, we use a custom config which specify image_pull_secrets in .dockercfg format in order to pull images successfully.
Content of runner-custom-config-map:
kind: ConfigMap
apiVersion: v1
metadata:
  name: runner-custom-config-map
  namespace: runner-namespace
data:
  config.toml: |-
    [[runners]]
      [runners.kubernetes]
        image_pull_secrets = ["secret_to_docker_cfg_file_with_sa_token"]
Used in the runner operator spec:
spec:
  concurrent: 1
  config: runner-custom-config-map
  gitlabUrl: 'https://example.gitlab.com'
  imagePullPolicy: Always
  serviceaccount: kubernetes-service-account
  token: gitlab-runner-registration-secret
With secret_to_docker_cfg_file_with_sa_token:
kind: Secret
apiVersion: v1
metadata:
  name: secret_to_docker_cfg_file_with_sa_token
  namespace: plt-gitlab-runners
data:
  .dockercfg: >-
    __docker_cfg_file_with_pull_token__
type: kubernetes.io/dockercfg
June 2022: the issue is closed by MR 3399 for GitLab 15.0:
"Check serviceaccount and imagepullsecret availability before creating pod"
To prevent the pod creation when needed resources are not available.

GitLab Container to GKE (Kubernetes) deployment

Hello, I have a problem with GitLab CI/CD. I'm trying to deploy a container to Kubernetes on GKE, but I'm getting an error:
This job failed because the necessary resources were not successfully created.
I created a service account with kube-admin rights and created the cluster via the GitLab GUI, so it's fully integrated. But when I run the job, it still doesn't work.
By the way, I use kubectl get pods in the gitlab-ci file just to test whether Kubernetes is responding.
stages:
  - build
  - deploy
docker-build:
  # Use the official docker image.
  image: docker:latest
  stage: build
  services:
    - docker:dind
  before_script:
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" $CI_REGISTRY
  # Default branch leaves tag empty (= latest tag)
  # All other branches are tagged with the escaped branch name (commit ref slug)
  script:
    - docker build --pull -t "$CI_REGISTRY_IMAGE${tag}" .
    - docker push "$CI_REGISTRY_IMAGE${tag}"
deploy-prod:
  stage: deploy
  image: bitnami/kubectl:latest
  script:
    - kubectl get pods
  environment:
    name: production
    kubernetes:
      namespace: test1
Any Ideas?
Thank you
The namespace setting should be removed.
GitLab creates its own namespace for every project.
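Applied to the job from the question, that would look roughly like the sketch below: the same deploy-prod job with the kubernetes block dropped from environment. Nothing else is changed, and whether any further adjustment is needed for your cluster is not covered here.

```yaml
deploy-prod:
  stage: deploy
  image: bitnami/kubectl:latest
  script:
    - kubectl get pods
  environment:
    name: production
    # No kubernetes.namespace here: the GitLab-managed cluster supplies the namespace.
```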

build and push docker images with GitLab CI

I would like to build and push docker images to my local nexus repo with GitLab CI
This is my current CI file:
image: docker:latest
services:
  - docker:dind
before_script:
  - docker info
  - docker login -u some_user -p nexus-rfit some_host
stages:
  - build
build-deploy-ubuntu-image:
  stage: build
  script:
    - docker build -t some_host/dev-image:ubuntu ./ubuntu/
    - docker push some_host/dev-image:ubuntu
  only:
    - master
  when: manual
I also have a job for an alpine docker image, but whenever I run either of them, it fails with the following error:
Checking out 13102ac4 as master...
Skipping Git submodules setup
$ docker info
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
ERROR: Job failed: exit code 1
So technically the docker daemon in the image isn't running, but I have no idea why.
GitLab's docs have a reference on using docker build inside Docker-based jobs: https://docs.gitlab.com/ce/ci/docker/using_docker_build.html#use-docker-in-docker-executor. Since you seem to have everything in place (i.e. the right image for the job and the additional docker:dind service), it's most likely a runner configuration issue.
If you look at the second step in the docs:
Register GitLab Runner from the command line to use docker and privileged mode:
[...]
Notice that it's using the privileged mode to start the build and service containers. If you want to use docker-in-docker mode, you always have to use privileged = true in your Docker containers.
You are probably using a runner that was not configured in privileged mode and therefore can't run the Docker daemon inside the job. You can edit /etc/gitlab-runner/config.toml on your registered runner directly to add that option.
(Also, read the section of the docs on how the storage driver you choose, and that your runner supports, affects performance when using dind.)
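Once the runner is privileged, the job still has to talk to the daemon of the docker:dind service rather than to a local socket. The sketch below shows one job-side setup described in GitLab's docs, with TLS turned off for the dind daemon. The two variables are not in the original question, and disabling TLS is an assumption made here to keep the example short.

```yaml
image: docker:latest
services:
  - docker:dind
variables:
  # Assumption: TLS is disabled, so the dind daemon listens on plain TCP port 2375.
  DOCKER_HOST: tcp://docker:2375
  DOCKER_TLS_CERTDIR: ""
before_script:
  - docker info
```

If docker info still reports that it cannot connect, the runner's privileged setting is the first thing to re-check.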

Resources