Gitlab autoscaling: "Failed to request job: runner requestConcurrency meet"

Gitlab autoscaling: "Failed to request job: runner requestConcurrency meet" - gitlab

We've set up a Gitlab autoscaling master machine as described in the official docs.
The gitlab-runner instance works, is recognized by gitlab.org, and also successfully spawns runners to execute the jobs.
However, the jobs don't get really started on the spawned runners. They stick at this point.
We have debug-level logging turned on, and the only unhappy-looking messages are these repeated ones:
msg="Failed to request job: runner requestConcurrency meet"
config.toml looks like this:
concurrent = 20
check_interval = 10
log_level = "debug"
log_format = "text"
[session_server]
session_timeout = 1800
[[runners]]
name = "gitlab-runner-master"
url = "https://gitlab.com/"
token = "blabla"
executor = "docker+machine"
limit = 25
[runners.custom_build_dir]
[runners.cache]
Type = "s3"
Shared = true
[runners.cache.s3]
ServerAddress = "s3.amazonaws.com"
AccessKey = "blabla"
SecretKey = "blabla"
BucketName = "gitlab.cache.dyynamo.net"
BucketLocation = "us-east-1"
[runners.cache.gcs]
[runners.cache.azure]
[runners.docker]
tls_verify = false
image = "amd64/ubuntu:16.04"
privileged = true
disable_entrypoint_overwrite = false
oom_kill_disable = false
disable_cache = true
volumes = ["/cache"]
shm_size = 0
[runners.machine]
IdleCount = 0
IdleTime = 600
MaxBuilds = 100
MachineDriver = "amazonec2"
MachineName = "gitlab-docker-machine-%s"
Any ideas what the problem can be?
Have also noted this still-open similar issue.

Related

Parallel Matrix Blocking Gitlab Pipeline Execution

My company uses self-managed AWS auto-scaling Docker runners, via Docker Machine. This configuration is documented here
We have a single runner/runner-manager EC2 instance whose config.toml contains several different runner configs, all with different tags so that different groups in our Gitlab org get a dedicated runner, by use of runner tags, all from a single runner which spins up the appropriate executor for the corresponding tag in the job definition.
The runner for my group has been working flawlessly for months. Today I created a job using the parallel:matrix: keyword
Build Images:
image: myimage
stage: build
script:
- docker build -f $DOCKERFILE -t $IMAGE_TAG
- docker push $IMAGE_TAG
parallel:
matrix:
- DOCKERFILE: $CI_PROJECT_DIR/Dockerfile
IMAGE_TAG: myrepo/myimage:standard
- DOCKERFILE: $CI_PROJECT_DIR/super.Dockerfile
IMAGE_TAG: myrepo/myimage:super
rules:
- when: always
When I push a commit neither this job or any others which should run are getting triggered. No error message or anything. The CI/CD->Jobs page does not show any jobs either.
This is the config.toml used on the runner manager. The runner I am attempting to run this job with is the first runner "my-runner"
concurrent = 100
check_interval = 0
[session_server]
session_timeout = 1800
[[runners]]
name = "my-runner"
limit = 6
url = "https://gitlab.com"
token = "XYZABC"
executor = "docker+machine"
[runners.custom_build_dir]
[runners.cache]
Type = "s3"
Path = "cache"
Shared = true
[runners.cache.s3]
ServerAddress = "s3.amazonaws.com"
BucketName = "mybucket"
BucketLocation = "us-east-1"
[runners.cache.gcs]
[runners.cache.azure]
[runners.docker]
tls_verify = false
image = "alpine:latest"
privileged = true
disable_entrypoint_overwrite = false
oom_kill_disable = false
disable_cache = false
volumes = ["/var/run/docker.sock:/var/run/docker.sock", "/cache"]
shm_size = 0
[runners.machine]
IdleCount = 0
IdleTime = 600
MaxBuilds = 10
MachineDriver = "amazonec2"
MachineName = "gitlab-docker-machine-%s"
MachineOptions = ["amazonec2-instance-type=t3.medium", "amazonec2-vpc-id=vpc-xxxxxxxx", "amazonec2-security-group=my-security-group", "amazonec2-iam-instance-profile=xxxxxx", "amazonec2-root-size=32", "amazonec2-ami=ami-218k65t87w8b6posq", "amazonec2-subnet-id=subnet-xxxxxxxxx", "amazonec2-zone=a"]
[[runners.machine.autoscaling]]
Periods = ["* * 13-23 * * mon-fri *"]
Timezone = "UTC"
IdleCount = 1
IdleTime = 600
[[runners.machine.autoscaling]]
Periods = ["* * 2-11 * * * *"]
Timezone = "UTC"
IdleCount = 0
IdleTime = 300
[[runners]]
name = "other-runner"
limit = 6
url = "https://gitlab.com"
token = "LMNOP"
executor = "docker+machine"
...
...
...
There are several more runners defined in this config, but they are all very similar. Each is registered with different tags.
My Question: In the Gitlab CI docs it says
Multiple runners must exist, or a single runner must be configured to run multiple jobs concurrently
and to me it seems like multiple runners do exist, since the runner I am using has a limit of 6. Do the executors need to actually be spun up and sitting idle for this to work? Is there any way that I can get these parallel jobs to run without increasing my runner idle count?
Edit: Some additional information
This is just one job of about a dozen in this file(load-tests.yml)
My gitlab-ci.yml file imports jobs from about 10 other files via
include:
- local: .gitlab/load-test.yml
The pipeline never get created. If I comment out this job then the pipeline runs, including the other jobs in this file.
I can provide the entire file verbatim, but everything works fine if this job is not included. I'm fairly experienced with Gitlab-CI so I'm sure that the issue lies with this job and/or the runner config when using these keywords.
Tag and other keys are set in defaults in the .gitlab-ci.yml file. None of them are of significance, things like default variables, default before_script, cache, etc.
Edit: We are using Gitlab SAAS(premium I believe, but not sure) with the runner manager using Gitlab-Runner v14.1.0

having trouble configuring gitlab runner with docker machine from azure availability set

I wish to configure a GitLab runner AutoScale using Docker Machine and Azure Availability sets. I followed this guide https://docs.gitlab.com/runner/executors/docker_machine.html . the docker machine seems to work and new vms are created using the availability set but I get this errorwhen I use docker-machine ls Unable to query docker version: Cannot connect to the docker engine endpoint .
Also I configured the config.toml file and registered with gitlab-runner register and I do not get the runner in the gitlab runners list.
this is the config.toml
concurrent = 30
check_interval = 0
[session_server]
session_timeout = 1800
[[runners]]
name = "gitlab-runners-bastion"
url = "#############"
token = "#############"
executor = "docker+machine"
[runners.custom_build_dir]
[runners.cache]
[runners.cache.s3]
[runners.cache.gcs]
[runners.cache.azure]
AccountName = "gitlabrunners#############"
AccountKey = "#############"
ContainerName = "runners-cache"
StorageDomain = "blob.core.windows.net"
[runners.docker]
tls_verify = false
image = "alpine:latest"
privileged = true
disable_entrypoint_overwrite = false
oom_kill_disable = false
disable_cache = false
volumes = ["/cache", "/var/run/docker.sock:/var/run/docker.sock"]
shm_size = 0
[runners.machine]
IdleCount = 1
IdleTime = 300
MachineDriver = "azure"
MachineName = "gitlab-runner-%s"
MachineOptions = [
"azure-subscription-id=#############",
"azure-availability-set=",
"azure-client-id=c3f1c633-eb2e-4fb0-9b45-8089ed4d8809",
"azure-client-secret=Cmi8Q~AFbCpPYZV4Uz7FjwrM66Z8Fj3Pd~QQPc2i",
"azure-location=northeurope",
"azure-no-public-ip",
"azure-use-private-ip",
"azure-vnet=BASMACH-NE-PROD-GITLAB-RUNNERS-VNET",
"azure-subnet=default",
"azure-size=Standard_B2s" ,
"azure-resource-group=BASMACH-NE-PROD-GITLAB-RUNNERS"
]

gitlab parallel:matrix does not run concurrently

I am trying to get a job to run in parallel within gitlab. I have registered 8 runners on one server and set concurrent=16 in /etc/gitlab-runner/config.toml. I see all 8 runners configured within that file. Below is the beginning of the file:
concurrent = 16
check_interval = 0
[session_server]
session_timeout = 1800
[[runners]]
name = "gitlab-runner-02-#1"
url = "http://gitlab.example.com"
token = "redacted"
executor = "docker"
[runners.custom_build_dir]
[runners.cache]
[runners.cache.s3]
[runners.cache.gcs]
[runners.cache.azure]
[runners.docker]
tls_verify = false
image = "almalinux:latest"
privileged = false
disable_entrypoint_overwrite = false
oom_kill_disable = false
disable_cache = false
volumes = ["/cache"]
shm_size = 0
[[runners]]
name = "gitlab-runner-02-#2"
url = "http://gitlab.example.com"
token = "redacted"
executor = "docker"
[runners.custom_build_dir]
[runners.cache]
[runners.cache.s3]
[runners.cache.gcs]
[runners.cache.azure]
[runners.docker]
tls_verify = false
image = "almalinux:latest"
privileged = false
disable_entrypoint_overwrite = false
oom_kill_disable = false
disable_cache = false
volumes = ["/cache"]
shm_size = 0
.gitlab-ci.yml:
image: almalinux:latest
stages:
- build
foo:
stage: build
script: sleep 60
parallel:
matrix:
- foo: [a, b, c, d, e, f, g, h]
From everything I've read, this should be enough for it to run in parallel. And yet, all the jobs run sequentially. What am I missing here?
Edit:
I've crossposted on gitlab's forum. I will make sure both gitlab's forum and stackoverflow have the answer clearly listed when I find it. https://forum.gitlab.com/t/parallel-matrix-isnt-executing-in-parallel/72095
Update:
I seem to have resolved the issue. I need to do more experimenting, but it appears to be related to another registered runner on a different server not allowing more than one job. I also increased the limit for each individual runner. I need to do more experimenting to understand what actually fixed it still.

InfluxDB write failure node-influx library

I am writing data to influxdb using the node-influx library.
https://github.com/node-influx/node-influx
It writes about 500,000 records and then I start seeing this error following which there are no more writes. It looks like a dns issue but I am running it inside a docker container on ubuntu 18.04 host.
Error: getaddrinfo EAGAIN influxdb influxdb:8086 |at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:56:26)
I have the logging level set to debug but I am not seeing any other errors. Any idea what might be causing this?
Update
tried with a different influx version
increased ulimit of host
used ip address of docker-container instead of service name, no error is thrown, writes stop after sometime silently
tried to call the write API with curl
curl -i -XPOST 'http://localhost:8086/write?db=mydb' --data-binary 'snmp,hostipv4=172.16.102.82,oidname=cpu_idle,site=gotham value=1000 1574751020815819489'
This works and a record is inserted in the DB.
Update2
It seems to be a dns issue on the docker network. I am not able to ping the influxdb container from the worker container. The writes are not reaching influx.
Update3
As a workaround for now, I am forcing a process.exit(1) on catching the error on my worker and using docker-compose restart: on-failure to restart the service. This resumes the writes.
retention policy is set for 2 days on the DB
influxdb.conf
reporting-disabled = true
[meta]
dir = "/var/lib/influxdb/meta"
retention-autocreate = false
logging-enabled = true
[logging]
format = "auto"
level = "debug"
[data]
engine = "tsm1"
dir = "/var/lib/influxdb/data"
wal-dir = "/var/lib/influxdb/wal"
wal-fsync-delay = "200ms"
index-version = "inmem"
wal-logging-enabled = true
query-log-enabled = true
cache-max-memory-size = "2g"
cache-snapshot-memory-size = "256m"
cache-snapshot-write-cold-duration = "20m"
compact-full-write-cold-duration = "24h"
max-concurrent-compactions = 0
compact-throughput = "48m"
max-points-per-block = 0
max-series-per-database = 1000000
trace-logging-enabled = false
[coordinator]
write-timeout = "10s"
max-concurrent-queries = 0
query-timeout = "0s"
log-queries-after = "0s"
max-select-point = 0
max-select-series = 0
max-select-buckets = 0
[retention]
enabled = true
check-interval = "30m0s"
[shard-precreation]
enabled = true
check-interval = "10m0s"
advance-period = "30m0s"
[monitor]
store-enabled = true
store-database = "_internal"
store-interval = "10s"
[http]
enabled = true
bind-address = ":8086"
auth-enabled = false
log-enabled = true
max-concurrent-write-limit = 0
max-enqueued-write-limit = 0
enqueued-write-timeout = 0
[continuous_queries]
enabled = false
log-enabled = true
run-interval = "10s"

How to use GitLab and Gitlab-Mattermost on one machine?

My conf-file:
external_url "http://192.168.3.23" # note the use of a dotted ip
gitlab_rails['gitlab_email_enabled'] = true
gitlab_rails['gitlab_email_from'] = 'gitlab#myhome.com'
gitlab_rails['gitlab_email_display_name'] = 'gitlab'
#gitlab_rails['gitlab_email_reply_to'] = 'gitlab#myhome.com'
gitlab_rails['smtp_enable'] = true
gitlab_rails['smtp_address'] = "mail.home"
gitlab_rails['smtp_port'] = 25
gitlab_rails['smtp_domain'] = "myhome.com"
mattermost_external_url 'http://192.168.3.23'
mattermost['gitlab_enable'] = true
mattermost['gitlab_secret'] = "4d1e<***>bdbfe"
mattermost['gitlab_id'] = "1c441<***>092df"
mattermost['gitlab_scope'] = ""
mattermost['gitlab_auth_endpoint'] = "http://192.168.3.23/oauth/authorize"
mattermost['gitlab_token_endpoint'] = "http://192.168.3.23/oauth/token"
mattermost['gitlab_user_api_endpoint'] = "http://192.168.3.23/api/v3/user"
# Shut down GitLab services on the Mattermost server
#gitlab_rails['enable'] = false
But now by the address 192.168.3.23 loading only gitlab.
GitLab Community Edition 8.4.4 9c31cc6!
How to start use gitlab and mattermost together?

Need use different url-address for GitLab and Mattermost.
extermanl_url "http://192.168.3.23"
...
mattermost_external_url "http://192.168.3.23:8065"
Solve here.

Develop Reference

node.js excel linux python-3.x azure haskell apache-spark rust .htaccess string

Gitlab autoscaling: "Failed to request job: runner requestConcurrency meet" - gitlab

Related

Parallel Matrix Blocking Gitlab Pipeline Execution

having trouble configuring gitlab runner with docker machine from azure availability set

gitlab parallel:matrix does not run concurrently

InfluxDB write failure node-influx library

How to use GitLab and Gitlab-Mattermost on one machine?

Categories

Resources