Service runner-XXX probably didn't start properly - gitlab

There are many answers on this topic, but none of them solves my problem. Here is my log:
Waiting for services to be up and running...
*** WARNING: Service runner-hgz7smm8-project-3-concurrent-0-c2b622f72cceadc3-docker-0 probably didn't start properly.
Health check error:
service "runner-hgz7smm8-project-3-concurrent-0-c2b622f72cceadc3-docker-0-wait-for-service" timeout
Health check container logs:
Service container logs:
2021-12-07T16:13:47.326235886Z mount: permission denied (are you root?)
2021-12-07T16:13:47.326275450Z Could not mount /sys/kernel/security.
2021-12-07T16:13:47.326284427Z AppArmor detection and --privileged mode might break.
My docker version inside the runner:
root@gitlab-runner-2:~# docker -v
Docker version 20.10.7, build 20.10.7-0ubuntu5.1
Gitlab-runner:
root@gitlab-runner-2:~# gitlab-runner -v
Version: 14.5.1
Git revision: de104fcd
Git branch: 14-5-stable
GO version: go1.13.8
Built: 2021-12-01T15:41:35+0000
OS/Arch: linux/amd64
The runner is an LXD container running inside Proxmox and is configured with the "docker" executor like this:
concurrent = 1
check_interval = 0
[session_server]
  session_timeout = 1800
[[runners]]
  name = "gitlab-runner-2"
  url = "http://gitlab.XXXXXX.com"
  token = "XXXXXXXXXX"
  executor = "docker"
  pre_build_script = "export DOCKER_HOST=tcp://docker:2375"
  [runners.custom_build_dir]
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]
    [runners.cache.azure]
  [runners.docker]
    tls_verify = false
    image = "alpine:latest"
    privileged = true
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    shm_size = 0
Any advice?

For GitLab 14.10, I resolved these warnings and errors with the following changes.
In the gitlab-runner config.toml:
concurrent = 1
check_interval = 0
[session_server]
  session_timeout = 1800
[[runners]]
  name = "runnertothehills"
  url = "https://someexample.com/"
  token = "aRunnerToken"
  executor = "docker"
  [runners.docker]
    image = "docker:20.10.14"
    privileged = true
    disable_cache = false
    volumes = ["/cache:/cache", "/var/run/docker.sock:/var/run/docker.sock", "/builds:/builds"]
    group = 1000
    environment = ["DOCKER_AUTH_CONFIG={\"auths\":{\"some.docker.registry.com:12345\":{\"auth\":\"AdockerLoginToken=\"}}}"]
    extra_hosts = ["one.extra.host.com:100.111.120.231"]
The key settings here are the docker executor and the volume mount "/var/run/docker.sock:/var/run/docker.sock", which exposes the host's Docker daemon to the job containers.
In .gitlab-ci.yml, instead of using
services:
  - docker:dind
use docker commands directly.
Example:
deploy:
  stage: deploy
  script:
    - docker login -u myuser -p my_password
This solved the following problems:
** WARNING: Service runner-3tm987o3-project-131-concurrent-0-ce49f8u8c582bf56-docker-0 probably didn't start properly.
The problem of docker group not found
2022-05-23T14:24:57.167991289Z time="2022-05-23T14:24:57.167893989Z" level=warning msg="could not change group /var/run/docker.sock to docker: group docker not found"
and
2022-05-23T14:24:57.168164288Z failed to load listeners: can't create unix socket /var/run/docker.sock: device or resource busy
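As a slightly fuller sketch of the socket-binding approach described in this answer, a build job without a docker:dind service might look like the following. The job name, the image path, and the REGISTRY_HOST, REGISTRY_USER and REGISTRY_PASSWORD variables are assumptions for illustration and do not come from the configuration above; `--password-stdin` is used instead of `-p` so the password does not appear on the command line.

```yaml
# Hypothetical .gitlab-ci.yml job. Assumes the runner's config.toml mounts
# /var/run/docker.sock into job containers, as in the answer above.
build-image:
  stage: build
  image: docker:20.10.14
  # No "services:" entry for docker:dind: the docker CLI in this job talks
  # to the host daemon through the mounted socket.
  script:
    # REGISTRY_HOST, REGISTRY_USER and REGISTRY_PASSWORD are assumed CI/CD variables.
    - echo "$REGISTRY_PASSWORD" | docker login -u "$REGISTRY_USER" --password-stdin "$REGISTRY_HOST"
    - docker build -t "$REGISTRY_HOST/my-group/my-app:$CI_COMMIT_REF_SLUG" .
    - docker push "$REGISTRY_HOST/my-group/my-app:$CI_COMMIT_REF_SLUG"
```

If DOCKER_HOST is still set to tcp://docker:2375 (as in the question's pre_build_script), the CLI would keep looking for a dind service, so that setting would probably have to be removed for this to work.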

Related

gitlab-ci docker-in-docker with unsecure registry

I am trying to push an image built in GitLab CI/CD to the registry.
The runner and GitLab containers are both attached to the gitlab network.
However, I could not make it work.
Here are my configs:
# gitlab-runner.toml
concurrent = 1
check_interval = 0
[[runners]]
  name = "runner1"
  url = "http://gitlab"
  token = "t4ihZ8Tc4Kxy5i5EgHYt"
  executor = "docker"
  [runners.docker]
    host = ""
    tls_verify = false
    image = "ruby:2.1"
    privileged = false
    disable_cache = false
    shm_size = 0
    network_mode = "gitlab"
# gitlab.rb
external_url 'https://gitlab.domain.com/'
gitlab_rails['initial_root_password'] = File.read('/run/secrets/gitlab_root_password')
nginx['listen_https'] = false
nginx['listen_port'] = 80
nginx['redirect_http_to_https'] = false
letsencrypt['enable'] = false
gitlab_rails['smtp_enable'] = true
gitlab_rails['smtp_address'] ="smtp.domain.com"
gitlab_rails['smtp_port'] = 587
gitlab_rails['smtp_user_name'] = "gitlab"
gitlab_rails['smtp_password'] = "password"
gitlab_rails['smtp_domain'] = "domain.com"
gitlab_rails['smtp_authentication'] = "login"
gitlab_rails['smtp_enable_starttls_auto'] = true
gitlab_rails['smtp_openssl_verify_mode'] = 'peer'
gitlab_rails['gitlab_email_from'] = 'gitlab@domain.com'
gitlab_rails['gitlab_email_reply_to'] = 'noreply@domain.com'
# gitlab-compose.yml
version: "3.6"
services:
  gitlab:
    image: gitlab/gitlab-ce:latest
    volumes:
      - gitlab_data:/var/opt/gitlab
      - gitlab_logs:/var/log/gitlab
      - gitlab_config:/etc/gitlab
    shm_size: '256m'
    environment:
      GITLAB_OMNIBUS_CONFIG: "from_file('/omnibus_config.rb')"
      GITLAB_ROOT_EMAIL: "contact@domain.com"
      GITLAB_ROOT_PASSWORD: "password"
    configs:
      - source: gitlab
        target: /omnibus_config.rb
    secrets:
      - gitlab_root_password
    deploy:
      placement:
        constraints:
          - node.labels.role == compute
      labels:
        - "traefik.enable=true"
        - "traefik.docker.network=traefik-public"
        - traefik.constraint-label=traefik-public
        - "traefik.http.services.gitlab.loadbalancer.server.port=80"
        - "traefik.http.routers.gitlab.rule=Host(`gitlab.domain.com`)"
        - "traefik.http.routers.gitlab.entrypoints=websecure"
        - "traefik.http.routers.gitlab.tls.certresolver=lets-encrypt"
    networks:
      - gitlab
      - traefik-public
configs:
  gitlab:
    file: ./gitlab.rb
secrets:
  gitlab_root_password:
    file: ./root_password.txt
volumes:
  gitlab_data:
    driver: local
  gitlab_logs:
    driver: local
  gitlab_config:
    driver: local
networks:
  gitlab:
    external: true
  traefik-public:
    external: true
#gitlab-ci
stages:
  - gulp_build
  - docker_build_deploy
cache:
  paths:
    - node_modules/
variables:
  DEPLOY_USER: $DEPLOY_USER
  DEPLOY_TOKEN: $DEPLOY_TOKEN
build app:
  stage: gulp_build
  image: node:14.17
  before_script:
    - npm install
  script:
    - ./node_modules/.bin/gulp build -production
  artifacts:
    paths:
      - public
docker deploy:
  stage: docker_build_deploy
  image: docker:latest
  services:
    - name: docker:dind
      command: ["--insecure-registry=gitlab"]
  before_script:
    - docker login -u $DEPLOY_USER -p $DEPLOY_TOKEN gitlab
  script:
    - echo $CI_REGISTRY_IMAGE:$CI_COMMIT_REF_SLUG
    - docker build -t gitlab/laurene/domain.com:$CI_COMMIT_REF_SLUG -t gitlab/laurene/domain.com:latest .
    - docker push gitlab/laurene/domain.com:$CI_COMMIT_REF_SLUG
    - docker push gitlab/laurene/domain.com:latest
Deployment logs:
Running with gitlab-runner 14.9.1 (bd40e3da)
  on runner1 t4ihZ8Tc
section_start:1650242087:prepare_executor
Preparing the "docker" executor
Using Docker executor with image docker:latest ...
Starting service docker:dind ...
Pulling docker image docker:dind ...
Using docker image sha256:a072474332af3e4cf06e349685c4cea8f9e631f0c5cab5b582f3a3ab4cff9b6a for docker:dind with digest docker@sha256:210076c7772f47831afaf7ff200cf431c6cd191f0d0cb0805b1d9a996e99fb5e ...
Waiting for services to be up and running...
*** WARNING: Service runner-t4ihz8tc-project-2-concurrent-0-2cd68d823b0d9914-docker-0 probably didn't start properly.
Health check error:
service "runner-t4ihz8tc-project-2-concurrent-0-2cd68d823b0d9914-docker-0-wait-for-service" timeout
Health check container logs:
Service container logs:
2022-04-18T00:34:50.436194142Z Generating RSA private key, 4096 bit long modulus (2 primes)
2022-04-18T00:34:50.490663718Z ............++++
2022-04-18T00:34:50.549517108Z ...............++++
2022-04-18T00:34:50.549802329Z e is 65537 (0x010001)
2022-04-18T00:34:50.562099799Z Generating RSA private key, 4096 bit long modulus (2 primes)
2022-04-18T00:34:50.965975282Z .....................................................................................................................++++
2022-04-18T00:34:51.033998142Z ..................++++
2022-04-18T00:34:51.034281623Z e is 65537 (0x010001)
2022-04-18T00:34:51.056355164Z Signature ok
2022-04-18T00:34:51.056369034Z subject=CN = docker:dind server
2022-04-18T00:34:51.056460584Z Getting CA Private Key
2022-04-18T00:34:51.065394153Z /certs/server/cert.pem: OK
2022-04-18T00:34:51.067347859Z Generating RSA private key, 4096 bit long modulus (2 primes)
2022-04-18T00:34:51.210090561Z ........................................++++
2022-04-18T00:34:51.491331619Z .................................................................................++++
2022-04-18T00:34:51.491620790Z e is 65537 (0x010001)
2022-04-18T00:34:51.509644008Z Signature ok
2022-04-18T00:34:51.509666918Z subject=CN = docker:dind client
2022-04-18T00:34:51.509757628Z Getting CA Private Key
2022-04-18T00:34:51.519103998Z /certs/client/cert.pem: OK
2022-04-18T00:34:51.594873133Z ip: can't find device 'ip_tables'
2022-04-18T00:34:51.595519686Z ip_tables 32768 3 iptable_mangle,iptable_filter,iptable_nat
2022-04-18T00:34:51.595526296Z x_tables 40960 14 xt_REDIRECT,xt_ipvs,xt_state,xt_policy,iptable_mangle,xt_mark,xt_u32,xt_nat,xt_tcpudp,xt_conntrack,xt_MASQUERADE,xt_addrtype,iptable_filter,ip_tables
2022-04-18T00:34:51.595866717Z modprobe: can't change directory to '/lib/modules': No such file or directory
2022-04-18T00:34:51.597027030Z mount: permission denied (are you root?)
2022-04-18T00:34:51.597064490Z Could not mount /sys/kernel/security.
2022-04-18T00:34:51.597067880Z AppArmor detection and --privileged mode might break.
2022-04-18T00:34:51.597608422Z mount: permission denied (are you root?)
*********
Pulling docker image docker:latest ...
Using docker image sha256:7417809fdb730b60c1b903077030aacc708677cdf02f2416ce413f38e81ec7e0 for docker:latest with digest docker@sha256:41978d1974f05f80e1aef23ac03040491a7e28bd4551d4b469b43e558341864e ...
section_end:1650242124:prepare_executor
section_start:1650242124:prepare_script
Preparing environment
Running on runner-t4ihz8tc-project-2-concurrent-0 via fed5cebcc8e6...
section_end:1650242125:prepare_script
section_start:1650242125:get_sources
Getting source from Git repository
Fetching changes with git depth set to 20...
Reinitialized existing Git repository in /builds/laurene/oelabs.co/.git/
Checking out a63a1f2a as master...
Removing node_modules/
Skipping Git submodules setup
section_end:1650242128:get_sources
section_start:1650242128:restore_cache
Restoring cache
Checking cache for default...
No URL provided, cache will not be downloaded from shared cache server. Instead a local version of cache will be extracted.
Successfully extracted cache
section_end:1650242129:restore_cache
section_start:1650242129:download_artifacts
Downloading artifacts
Downloading artifacts for build app (97)...
Downloading artifacts from coordinator... ok id=97 responseStatus=200 OK token=Uvp--J3i
section_end:1650242131:download_artifacts
section_start:1650242131:step_script
Executing "step_script" stage of the job script
Using docker image sha256:7417809fdb730b60c1b903077030aacc708677cdf02f2416ce413f38e81ec7e0 for docker:latest with digest docker@sha256:41978d1974f05f80e1aef23ac03040491a7e28bd4551d4b469b43e558341864e ...
$ docker login -u $DEPLOY_USER -p $DEPLOY_TOKEN gitlab
WARNING! Using --password via the CLI is insecure. Use --password-stdin.
time="2022-04-18T00:35:33Z" level=info msg="Error logging in to endpoint, trying next endpoint" error="Get \"https://gitlab/v2/\": dial tcp 10.0.18.4:443: connect: connection refused"
Get "https://gitlab/v2/": dial tcp 10.0.18.4:443: connect: connection refused
section_end:1650242133:step_script
ERROR: Job failed: exit code 1

Error: Missing required provider in next stage even after init

I have the following CI configuration:
...
cache:
  key: ${CI_PROJECT_NAME}
  paths:
    - ${TF_ROOT}/.terraform
before_script:
  - echo -e "credentials \"$CI_SERVER_HOST\" {\n token = \"$CI_JOB_TOKEN\"\n}" > $TF_CLI_CONFIG_FILE
  - cd ${TF_ROOT}
  - export TF_LOG_CORE=TRACE
  - export TF_LOG_PATH=terraform_logs.txt
stages:
  - initialize
  - validate
init:
  stage: initialize
  script:
    - terraform -v
    - terraform init
    #- terraform validate
validate:
  stage: validate
  script:
    - terraform validate
My init stage runs fine; however, I get the following error in the next stage, validate:
$ terraform validate
╷
│ Error: Missing required provider
│
│ This configuration requires provider registry.terraform.io/datadog/datadog,
│ but that provider isn't available. You may be able to install it
│ automatically by running:
│ terraform init
in provider.tf:
terraform {
  required_version = ">= 0.14"
  required_providers {
    datadog = {
      source = "DataDog/datadog"
      version = "2.24.0"
    }
  }
}
in config.toml:
concurrent = 1
check_interval = 0
[session_server]
  session_timeout = 1800
[[runners]]
  name = "some rummer"
  url = "****"
  token = "***"
  executor = "shell"
  [runners.custom_build_dir]
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]
    [runners.cache.azure]
If I run validate as a subsequent command in the init stage itself, it works fine, just not in a separate stage.
If I run ls -al in the next stage before validate, I can see that the .terraform folder is present, which should contain the providers.
My second guess was a caching issue; however, I believe I have specified the cache path correctly: ${TF_ROOT}/.terraform.
I am running gitlab-runner with the shell executor.
Any idea what is wrong here?

Private GitLab Runner: 403 Forbidden

When running my CI-Pipeline, my GitLab runner shows that the access to the repository is denied (although it is internal and all users of the server are maintainers - including the admin)!
remote: You are not allowed to download code.
fatal: unable to access 'https://gitlab.<omitted>.me/S0urC10ud/eaglesheetmusicbackend.git/': The requested URL returned error: 403
I noticed that there is no token in the URL above, although there is one in the requests before:
21:29:18.702836 git.c:439 trace: built-in: git fetch origin +38682fb8a487f8dca7baa5107a5a021b6f8391c7:refs/pipelines/12 +refs/heads/master:refs/remotes/origin/master --depth 50 --prune --quiet
21:29:18.702963 run-command.c:663 trace: run_command: GIT_DIR=.git git-remote-https origin https://gitlab-ci-token:<omitted>@gitlab.<omitted>.me/S0urC10ud/eaglesheetmusicbackend.git
Is any special configuration needed for the Auth to be set? My runner config looks like the following:
concurrent = 1
check_interval = 0
[session_server]
  session_timeout = 1800
[[runners]]
  name = "shared-runner"
  url = "https://gitlab.<omitted>.me"
  token = "<omitted>"
  executor = "docker"
  clone_url = "https://gitlab.<omitted>.me"
  [runners.custom_build_dir]
  [runners.cache]
    [runners.cache.s3]
    [runners.cache.gcs]
    [runners.cache.azure]
  [runners.docker]
    network_mode = "br0"
    tls_verify = false
    image = "ruby:2.6"
    privileged = false
    disable_entrypoint_overwrite = false
    oom_kill_disable = false
    disable_cache = false
    volumes = ["/cache"]
    shm_size = 0
    dns = ["192.168.1.251"]
Before you ask: yes, I am accessing the GitLab backend via an NGINX reverse proxy, but my config should not yield a 403.
I ended up needing to create a loopback in our firewall/DNS, and that resolved the issue.

KVM with Terraform: SSH permission denied (Cloud-Init)

I have a KVM host. I'm using Terraform to create some virtual servers using KVM provider. Here's the relevant section of the Terraform file:
provider "libvirt" {
  uri = "qemu+ssh://root@192.168.60.7"
}
resource "libvirt_volume" "ubuntu-qcow2" {
  count = 1
  name = "ubuntu-qcow2-${count.index+1}"
  pool = "default"
  source = "https://cloud-images.ubuntu.com/bionic/current/bionic-server-cloudimg-amd64.img"
  format = "qcow2"
}
resource "libvirt_network" "vm_network" {
  name = "vm_network"
  mode = "bridge"
  bridge = "br0"
  addresses = ["192.168.60.224/27"]
  dhcp {
    enabled = true
  }
}
# Use CloudInit to add our ssh-key to the instance
resource "libvirt_cloudinit_disk" "commoninit" {
  name = "commoninit.iso"
  pool = "default"
  user_data = "data.template_file.user_data.rendered"
  network_config = "data.template_file.network_config.rendered"
}
data "template_file" "user_data" {
  template = file("${path.module}/cloud_config.yaml")
}
data "template_file" "network_config" {
  template = file("${path.module}/network_config.yaml")
}
The cloud_config.yaml file contains the following info:
manage_etc_hosts: true
users:
  - name: ubuntu
    sudo: ALL=(ALL) NOPASSWD:ALL
    groups: users, admin
    home: /home/ubuntu
    shell: /bin/bash
    lock_passwd: false
    ssh-authorized-keys:
      - ${file("/path/to/keyfolder/homelab.pub")}
ssh_pwauth: false
disable_root: false
chpasswd:
  list: |
    ubuntu:linux
  expire: False
package_update: true
packages:
  - qemu-guest-agent
growpart:
  mode: auto
  devices: ['/']
The server gets created successfully, and I can ping the device from the host on which I ran the Terraform script. However, I cannot log in through SSH, even though I pass my SSH key through the cloud-init file.
From the folder where all my keys are stored I run:
homecomputer:keyfolder wim$ ssh -i homelab ubuntu@192.168.80.86
ubuntu@192.168.60.86: Permission denied (publickey).
In this command, homelab is my private key.
Any reasons why I cannot log in? Is there any way to debug this? I cannot log in to the server to debug. I tried setting the password in the cloud-config file, but that does not work either.
*** Additional information
1) the rendered template is as follows:
> data.template_file.user_data.rendered
manage_etc_hosts: true
users:
  - name: ubuntu
    sudo: ALL=(ALL) NOPASSWD:ALL
    groups: users, admin
    home: /home/ubuntu
    shell: /bin/bash
    lock_passwd: false
    ssh-authorized-keys:
      - ssh-rsa AAAAB3NzaC1y***Homelab_Wim
ssh_pwauth: false
disable_root: false
chpasswd:
  list: |
    ubuntu:linux
  expire: False
package_update: true
packages:
  - qemu-guest-agent
growpart:
  mode: auto
  devices: ['/']
I faced the same problem because I was missing the first line,
#cloud-config
in the cloudinit.cfg file.
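To illustrate the point about the header, here is a minimal sketch of a user-data file that cloud-init would recognise as cloud-config. cloud-init uses the first line of the user data to decide how to interpret it, so the header has to come before anything else. The key shown is a placeholder, and the file is a trimmed-down assumption rather than the asker's actual configuration; `ssh_authorized_keys` is the underscore spelling of the key the question writes as `ssh-authorized-keys`.

```yaml
#cloud-config
# The "#cloud-config" header above must be the very first line of the file.
users:
  - name: ubuntu
    sudo: ALL=(ALL) NOPASSWD:ALL
    shell: /bin/bash
    lock_passwd: false
    ssh_authorized_keys:
      # Placeholder value: substitute the contents of your own public key file.
      - ssh-rsa AAAA...placeholder user@example
ssh_pwauth: false
```

When the file is rendered through a Terraform template, the header still needs to be the first line of the rendered output, not just of the template source.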
You need to add a libvirt_cloudinit_disk resource to add the SSH key to the VM.
Code from my TF script:
# Use CloudInit ISO to add ssh-key to the instance
resource "libvirt_cloudinit_disk" "commoninit" {
  count = length(var.hostname)
  name = "${var.hostname[count.index]}-commoninit.iso"
  #name = "${var.hostname}-commoninit.iso"
  # pool = "default"
  user_data = data.template_file.user_data[count.index].rendered
  network_config = data.template_file.network_config.rendered
}
Hi, I had the same problem. I resolved it this way:
user_data = data.template_file.user_data.rendered
without the double quotes!

gitlab-runner cache file doubles in size on every run

I am using a distributed cache (S3) for GitLab Runner. It works fine, but every time the runner executes a job and stores the cache file in S3, the file doubles in size, because the new cache file includes the older cache.zip file.
gitlab-ci.yml file:
cache:
  key: "$CI_COMMIT_REF_NAME"
  untracked: true
  paths:
    - .m2/repository/
runner cache configuration file config.toml:
[runners.cache]
  Type = "s3"
  Path = "runners-cache"
  [runners.cache.s3]
    ServerAddress = "s3.amazonaws.com"
    AccessKey = "***"
    SecretKey = "***"
    BucketName = "***"
    BucketLocation = "***"

Resources