I want to configure an autoscaling GitLab Runner using Docker Machine and Azure availability sets. I followed this guide: https://docs.gitlab.com/runner/executors/docker_machine.html . Docker Machine seems to work, and new VMs are created in the availability set, but when I run docker-machine ls I get this error: Unable to query docker version: Cannot connect to the docker engine endpoint.
I also configured the config.toml file and registered the runner with gitlab-runner register, but the runner does not appear in the GitLab runners list.
This is the config.toml:
concurrent = 30
check_interval = 0
[session_server]
session_timeout = 1800
[[runners]]
name = "gitlab-runners-bastion"
url = "#############"
token = "#############"
executor = "docker+machine"
[runners.custom_build_dir]
[runners.cache]
[runners.cache.s3]
[runners.cache.gcs]
[runners.cache.azure]
AccountName = "gitlabrunners#############"
AccountKey = "#############"
ContainerName = "runners-cache"
StorageDomain = "blob.core.windows.net"
[runners.docker]
tls_verify = false
image = "alpine:latest"
privileged = true
disable_entrypoint_overwrite = false
oom_kill_disable = false
disable_cache = false
volumes = ["/cache", "/var/run/docker.sock:/var/run/docker.sock"]
shm_size = 0
[runners.machine]
IdleCount = 1
IdleTime = 300
MachineDriver = "azure"
MachineName = "gitlab-runner-%s"
MachineOptions = [
  "azure-subscription-id=#############",
  "azure-availability-set=",
  "azure-client-id=#############",
  "azure-client-secret=#############",
  "azure-location=northeurope",
  "azure-no-public-ip",
  "azure-use-private-ip",
  "azure-vnet=BASMACH-NE-PROD-GITLAB-RUNNERS-VNET",
  "azure-subnet=default",
  "azure-size=Standard_B2s",
  "azure-resource-group=BASMACH-NE-PROD-GITLAB-RUNNERS"
]
My company uses self-managed AWS auto-scaling Docker runners, via Docker Machine. This configuration is documented here
We have a single runner manager EC2 instance whose config.toml contains several runner configs, each registered with different tags. This gives each group in our GitLab org a dedicated runner: the manager spins up the appropriate executor for the tag in the job definition.
The runner for my group has been working flawlessly for months. Today I created a job using the parallel:matrix: keyword:
Build Images:
  image: myimage
  stage: build
  script:
    - docker build -f $DOCKERFILE -t $IMAGE_TAG
    - docker push $IMAGE_TAG
  parallel:
    matrix:
      - DOCKERFILE: $CI_PROJECT_DIR/Dockerfile
        IMAGE_TAG: myrepo/myimage:standard
      - DOCKERFILE: $CI_PROJECT_DIR/super.Dockerfile
        IMAGE_TAG: myrepo/myimage:super
  rules:
    - when: always
When I push a commit, neither this job nor any of the others that should run is triggered. There is no error message of any kind, and the CI/CD -> Jobs page does not show any jobs either.
This is the config.toml used on the runner manager. The runner I am trying to run this job with is the first one, "my-runner":
concurrent = 100
check_interval = 0
[session_server]
session_timeout = 1800
[[runners]]
name = "my-runner"
limit = 6
url = "https://gitlab.com"
token = "XYZABC"
executor = "docker+machine"
[runners.custom_build_dir]
[runners.cache]
Type = "s3"
Path = "cache"
Shared = true
[runners.cache.s3]
ServerAddress = "s3.amazonaws.com"
BucketName = "mybucket"
BucketLocation = "us-east-1"
[runners.cache.gcs]
[runners.cache.azure]
[runners.docker]
tls_verify = false
image = "alpine:latest"
privileged = true
disable_entrypoint_overwrite = false
oom_kill_disable = false
disable_cache = false
volumes = ["/var/run/docker.sock:/var/run/docker.sock", "/cache"]
shm_size = 0
[runners.machine]
IdleCount = 0
IdleTime = 600
MaxBuilds = 10
MachineDriver = "amazonec2"
MachineName = "gitlab-docker-machine-%s"
MachineOptions = ["amazonec2-instance-type=t3.medium", "amazonec2-vpc-id=vpc-xxxxxxxx", "amazonec2-security-group=my-security-group", "amazonec2-iam-instance-profile=xxxxxx", "amazonec2-root-size=32", "amazonec2-ami=ami-218k65t87w8b6posq", "amazonec2-subnet-id=subnet-xxxxxxxxx", "amazonec2-zone=a"]
[[runners.machine.autoscaling]]
Periods = ["* * 13-23 * * mon-fri *"]
Timezone = "UTC"
IdleCount = 1
IdleTime = 600
[[runners.machine.autoscaling]]
Periods = ["* * 2-11 * * * *"]
Timezone = "UTC"
IdleCount = 0
IdleTime = 300
[[runners]]
name = "other-runner"
limit = 6
url = "https://gitlab.com"
token = "LMNOP"
executor = "docker+machine"
...
...
...
There are several more runners defined in this config, but they are all very similar. Each is registered with different tags.
My question: the GitLab CI docs say
"Multiple runners must exist, or a single runner must be configured to run multiple jobs concurrently"
and to me it seems like multiple runners do exist, since the runner I am using has a limit of 6. Do the executors need to actually be spun up and sitting idle for this to work? Is there any way that I can get these parallel jobs to run without increasing my runner idle count?
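For what it's worth, here is a rough sketch of how the two settings in the config above bound parallelism, going by the documented meaning of `concurrent` (a global cap across all registered runners) and `limit` (a per-runner cap, where 0 means no limit). The function name `job_slots` is made up for illustration; this is not GitLab Runner code. As far as I understand, IdleCount only controls how many machines are kept warm, so it should not appear in this calculation.

```python
def job_slots(concurrent, limits):
    """Upper bound on simultaneously running jobs for one config.toml.

    concurrent: the global `concurrent` value.
    limits: one `limit` value per [[runners]] entry; 0 means "no limit".
    """
    # A runner without a limit can still never exceed the global cap.
    per_runner = [limit if limit > 0 else concurrent for limit in limits]
    return min(concurrent, sum(per_runner))


# Values from the question: concurrent = 100, "my-runner" has limit = 6.
print(job_slots(100, [6]))     # slots available to "my-runner" alone: 6
print(job_slots(100, [6, 6]))  # "my-runner" plus "other-runner": 12
```

By this reading, the two matrix jobs fit well within the 6 slots of "my-runner", which suggests the runner limits are not what prevents the pipeline from being created.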
Edit: Some additional information
This is just one job of about a dozen in this file (load-tests.yml).
My gitlab-ci.yml file imports jobs from about 10 other files via
include:
  - local: .gitlab/load-test.yml
The pipeline never gets created. If I comment out this job, the pipeline runs, including the other jobs in this file.
I can provide the entire file verbatim, but everything works fine if this job is not included. I'm fairly experienced with GitLab CI, so I'm sure that the issue lies with this job and/or the runner config when using these keywords.
Tags and other keys are set as defaults in the .gitlab-ci.yml file. None of them are significant: default variables, a default before_script, cache, and so on.
Edit: We are using GitLab SaaS (Premium I believe, but I'm not sure), and the runner manager uses GitLab Runner v14.1.0.
I am trying to get a job to run in parallel within gitlab. I have registered 8 runners on one server and set concurrent=16 in /etc/gitlab-runner/config.toml. I see all 8 runners configured within that file. Below is the beginning of the file:
concurrent = 16
check_interval = 0
[session_server]
session_timeout = 1800
[[runners]]
name = "gitlab-runner-02-#1"
url = "http://gitlab.example.com"
token = "redacted"
executor = "docker"
[runners.custom_build_dir]
[runners.cache]
[runners.cache.s3]
[runners.cache.gcs]
[runners.cache.azure]
[runners.docker]
tls_verify = false
image = "almalinux:latest"
privileged = false
disable_entrypoint_overwrite = false
oom_kill_disable = false
disable_cache = false
volumes = ["/cache"]
shm_size = 0
[[runners]]
name = "gitlab-runner-02-#2"
url = "http://gitlab.example.com"
token = "redacted"
executor = "docker"
[runners.custom_build_dir]
[runners.cache]
[runners.cache.s3]
[runners.cache.gcs]
[runners.cache.azure]
[runners.docker]
tls_verify = false
image = "almalinux:latest"
privileged = false
disable_entrypoint_overwrite = false
oom_kill_disable = false
disable_cache = false
volumes = ["/cache"]
shm_size = 0
.gitlab-ci.yml:
image: almalinux:latest
stages:
  - build
foo:
  stage: build
  script: sleep 60
  parallel:
    matrix:
      - foo: [a, b, c, d, e, f, g, h]
From everything I've read, this should be enough for it to run in parallel. And yet, all the jobs run sequentially. What am I missing here?
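To make the expected job count concrete, here is a small sketch of how a parallel:matrix definition expands into jobs: each matrix entry contributes the Cartesian product of its variable values. The helper `expand_matrix` is a hypothetical name and only mimics the expansion; it is not how GitLab implements it.

```python
from itertools import product


def expand_matrix(matrix):
    """Return one dict of variables per job generated by parallel:matrix."""
    jobs = []
    for entry in matrix:
        names = list(entry)
        # A scalar value behaves like a one-element list.
        value_lists = [v if isinstance(v, list) else [v] for v in entry.values()]
        for combo in product(*value_lists):
            jobs.append(dict(zip(names, combo)))
    return jobs


# The matrix from the .gitlab-ci.yml above.
matrix = [{"foo": ["a", "b", "c", "d", "e", "f", "g", "h"]}]
jobs = expand_matrix(matrix)
print(len(jobs))  # 8
print(jobs[0])    # {'foo': 'a'}
```

So the pipeline should contain 8 jobs in the build stage, which is below concurrent = 16; on paper nothing in this file forces them to run one at a time.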
Edit:
I've crossposted on gitlab's forum. I will make sure both gitlab's forum and stackoverflow have the answer clearly listed when I find it. https://forum.gitlab.com/t/parallel-matrix-isnt-executing-in-parallel/72095
Update:
I seem to have resolved the issue. I need to do more experimenting, but it appears to be related to another registered runner on a different server not allowing more than one job. I also increased the limit for each individual runner. I need to do more experimenting to understand what actually fixed it still.
We've set up a GitLab autoscaling master machine as described in the official docs.
The gitlab-runner instance works, is recognized by gitlab.com, and successfully spawns runners to execute the jobs.
However, the jobs never really start on the spawned runners. They get stuck at this point.
We have debug-level logging turned on, and the only unhappy-looking messages are these repeated ones:
msg="Failed to request job: runner requestConcurrency meet"
config.toml looks like this:
concurrent = 20
check_interval = 10
log_level = "debug"
log_format = "text"
[session_server]
session_timeout = 1800
[[runners]]
name = "gitlab-runner-master"
url = "https://gitlab.com/"
token = "blabla"
executor = "docker+machine"
limit = 25
[runners.custom_build_dir]
[runners.cache]
Type = "s3"
Shared = true
[runners.cache.s3]
ServerAddress = "s3.amazonaws.com"
AccessKey = "blabla"
SecretKey = "blabla"
BucketName = "gitlab.cache.dyynamo.net"
BucketLocation = "us-east-1"
[runners.cache.gcs]
[runners.cache.azure]
[runners.docker]
tls_verify = false
image = "amd64/ubuntu:16.04"
privileged = true
disable_entrypoint_overwrite = false
oom_kill_disable = false
disable_cache = true
volumes = ["/cache"]
shm_size = 0
[runners.machine]
IdleCount = 0
IdleTime = 600
MaxBuilds = 100
MachineDriver = "amazonec2"
MachineName = "gitlab-docker-machine-%s"
Any ideas what the problem can be?
I have also noted this similar, still-open issue.
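As far as I can tell, the debug line quoted above is logged when a runner entry already has as many job requests in flight as its request_concurrency setting allows. The config shown does not set it, so the default would apply (1, to my knowledge). The Python below is only a toy model of such a gate, not GitLab Runner's code; the class and method names are made up for illustration.

```python
import threading


class RequestGate:
    """Toy model of a per-runner cap on in-flight job requests."""

    def __init__(self, request_concurrency=1):
        # Assumption: non-positive values fall back to 1.
        self.limit = request_concurrency if request_concurrency > 0 else 1
        self.in_flight = 0
        self._lock = threading.Lock()

    def acquire(self):
        """Try to start a job request; False means the cap is reached."""
        with self._lock:
            if self.in_flight >= self.limit:
                return False  # this is where the debug message would appear
            self.in_flight += 1
            return True

    def release(self):
        """Mark one job request as finished."""
        with self._lock:
            if self.in_flight > 0:
                self.in_flight -= 1


gate = RequestGate(request_concurrency=1)
print(gate.acquire())  # True
print(gate.acquire())  # False, one request is already in flight
gate.release()
print(gate.acquire())  # True
```

If this model is right, the message on its own only says that a poll was skipped, so it may be unrelated to the jobs getting stuck.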
We have a big-data Hadoop cluster based on Hortonworks HDP 2.6.4 and Ambari 2.6.1.
All machines run RHEL 7.2.
The cluster has more than 540 machines. ambari-agent is installed on every machine and communicates with the Ambari server, which is installed on only one machine.
Before we used Ansible, everything worked fine when we upgraded and restarted ambari-agent.
Recently we started using Ansible (ansible-playbook) to automate the installation,
and Ansible runs against all machines.
When a task restarts ambari-agent, we immediately see the Ansible execution stop and get killed.
After some investigation we saw that ambari-agent uses the following ports:
url_port = 8440
secured_url_port = 8441
ping_port = 8670
But I do not see any Ansible process using these ports, so we do not think they are related.
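One way to at least confirm whether anything is listening on those ports before and after the restart is a plain TCP connect test. This is a minimal sketch: the helper name `port_is_open` and the host 127.0.0.1 are assumptions, and it only shows that a port accepts connections, not which process owns it.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # Ports taken from the ambari-agent settings listed above.
    for port in (8440, 8441, 8670):
        state = "open" if port_is_open("127.0.0.1", port) else "closed"
        print(port, state)
```

Running it on a node right before and right after the ambari-agent restart task would show whether the port state changes at the moment Ansible dies.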
Still, the basic issue is clear:
when an Ansible task restarts ambari-agent on a remote machine, Ansible is interrupted and killed.
The ambari-agent configuration looks like this:
[server]
hostname = datanode02.gtfactory.com
url_port = 8440
secured_url_port = 8441
connect_retry_delay = 10
max_reconnect_retry_delay = 30
[agent]
logdir = /var/log/ambari-agent
piddir = /var/run/ambari-agent
prefix = /var/lib/ambari-agent/data
loglevel = INFO
data_cleanup_interval = 86400
data_cleanup_max_age = 2592000
data_cleanup_max_size_mb = 100
ping_port = 8670
cache_dir = /var/lib/ambari-agent/cache
tolerate_download_failures = true
run_as_user = root
parallel_execution = 0
alert_grace_period = 5
status_command_timeout = 5
alert_kinit_timeout = 14400000
system_resource_overrides = /etc/resource_overrides
[security]
keysdir = /var/lib/ambari-agent/keys
server_crt = ca.crt
passphrase_env_var_name = AMBARI_PASSPHRASE
ssl_verify_cert = 0
credential_lib_dir = /var/lib/ambari-agent/cred/lib
credential_conf_dir = /var/lib/ambari-agent/cred/conf
credential_shell_cmd = org.apache.hadoop.security.alias.CredentialShell
[network]
use_system_proxy_settings = true
[services]
pidlookuppath = /var/run/
[heartbeat]
state_interval_seconds = 60
dirs = /etc/hadoop,/etc/hadoop/conf,/etc/hbase,/etc/hcatalog,/etc/hive,/etc/oozie,
  /etc/sqoop,
  /var/run/hadoop,/var/run/zookeeper,/var/run/hbase,/var/run/templeton,/var/run/oozie,
  /var/log/hadoop,/var/log/zookeeper,/var/log/hbase,/var/run/templeton,/var/log/hive
log_lines_count = 300
idle_interval_min = 1
idle_interval_max = 10
[logging]
syslog_enabled = 0
For now we are considering the following:
maybe Ansible crashes because TLSv1 (Transport Layer Security) is restricted; by default, ambari-agent connects using TLSv1.
So we are thinking of setting force_https_protocol=PROTOCOL_TLSv1_2 in the ambari-agent configuration, but this is only an assumption.
Could the new configuration below help?
[security]
force_https_protocol=PROTOCOL_TLSv1_2 <------ the new update
keysdir = /var/lib/ambari-agent/keys
server_crt = ca.crt
passphrase_env_var_name = AMBARI_PASSPHRASE
ssl_verify_cert = 0
credential_lib_dir = /var/lib/ambari-agent/cred/lib
credential_conf_dir = /var/lib/ambari-agent/cred/conf
credential_shell_cmd = org.apache.hadoop.security.alias.CredentialShell
My conf-file:
external_url "http://192.168.3.23" # note the use of a dotted ip
gitlab_rails['gitlab_email_enabled'] = true
gitlab_rails['gitlab_email_from'] = 'gitlab@myhome.com'
gitlab_rails['gitlab_email_display_name'] = 'gitlab'
#gitlab_rails['gitlab_email_reply_to'] = 'gitlab@myhome.com'
gitlab_rails['smtp_enable'] = true
gitlab_rails['smtp_address'] = "mail.home"
gitlab_rails['smtp_port'] = 25
gitlab_rails['smtp_domain'] = "myhome.com"
mattermost_external_url 'http://192.168.3.23'
mattermost['gitlab_enable'] = true
mattermost['gitlab_secret'] = "4d1e<***>bdbfe"
mattermost['gitlab_id'] = "1c441<***>092df"
mattermost['gitlab_scope'] = ""
mattermost['gitlab_auth_endpoint'] = "http://192.168.3.23/oauth/authorize"
mattermost['gitlab_token_endpoint'] = "http://192.168.3.23/oauth/token"
mattermost['gitlab_user_api_endpoint'] = "http://192.168.3.23/api/v3/user"
# Shut down GitLab services on the Mattermost server
#gitlab_rails['enable'] = false
But now only GitLab loads at the address 192.168.3.23.
GitLab Community Edition 8.4.4 9c31cc6!
How can I use GitLab and Mattermost together?
You need to use different URLs for GitLab and Mattermost.
external_url "http://192.168.3.23"
...
mattermost_external_url "http://192.168.3.23:8065"
Solve here.
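A quick way to see why the original settings clash is to compare the scheme, host and port of the two URLs. This is only an illustrative sketch; `origin` and `urls_collide` are made-up helper names and nothing here is part of GitLab or Mattermost.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def origin(url):
    """Return (scheme, host, port), filling in the scheme's default port."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)


def urls_collide(gitlab_url, mattermost_url):
    """True if both services would be served from the same origin."""
    return origin(gitlab_url) == origin(mattermost_url)


# The question's settings, then the corrected ones from the answer.
print(urls_collide("http://192.168.3.23", "http://192.168.3.23"))       # True
print(urls_collide("http://192.168.3.23", "http://192.168.3.23:8065"))  # False
```

With identical URLs both services claim the same address, which matches the symptom that only GitLab loads; giving Mattermost its own port (or hostname) removes the clash.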