I am trying to reduce the GitLab pipeline execution time by running one of my bottleneck build jobs in multiple parallel instances using the “parallel: 4” keyword. With this I can see that my job has 4 instances named 1/4 to 4/4, but the execution is not parallel; they execute sequentially. Can someone help me understand why this is happening, and is there any other way to reduce the build time by running things in parallel?
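For reference, the relevant part of my .gitlab-ci.yml looks roughly like this (the job and script names are placeholders):

build:
  stage: build
  parallel: 4
  script:
    - ./build.sh   # placeholder for the actual build command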
This is a tricky question to answer, because the best answer is: it depends.
Also, the documentation about the parallel keyword states:
Jobs can only run in parallel if there are multiple runners, or a single runner is configured to run multiple jobs concurrently.
It is also mentioned again at https://docs.gitlab.com/ee/ci/yaml/README.html#use-your-own-runners
Do you run your own runners or do you use shared runners?
The runners must be configured to allow concurrent execution; if they are not configured that way, you will not be able to run jobs in parallel. With your own runners you might be able to change this.
concurrent: limits how many jobs globally can be run concurrently. The most upper limit of jobs using all defined runners. 0 does not mean unlimited
What is the limit of parallel execution on the runner?
Besides the concurrent setting, there is also the limit setting on the runner.
limit: Limit how many jobs can be handled concurrently by this token. 0 (default) simply means don’t limit.
All this information about runners can be checked at the GitLab Documentation
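As a minimal sketch, assuming a single self-managed runner, a config.toml that lets the four parallel: 4 instances actually run side by side could look like this (name, URL and token are placeholders):

concurrent = 4              # global upper limit: 4 jobs at once across all runners

[[runners]]
  name = "my-runner"        # placeholder
  url = "https://gitlab.example.com/"
  token = "REGISTRATION_TOKEN"   # placeholder
  executor = "shell"
  limit = 4                 # this runner may handle up to 4 jobs concurrently

With the default concurrent = 1, the 1/4 … 4/4 instances are picked up one after another, which is exactly the sequential behaviour described in the question.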
Related
I need to limit the amount of parallel jobs because they depend on an external service that only has N available slots for building, but I have more than N jobs in the GitLab pipeline, so I wanted a way to limit the amount of parallel jobs to be equal to N.
Basically:
- I have 1 stage with 3 parallel jobs (it has to be only one stage).
- Each job depends on an external building service (outside of GitLab) that has 2 parallel "slots" (I cannot raise the number of slots in the external service).
- I need to run the 3 parallel jobs, but never more than 2 at a time, so they do not fail due to unavailable slots in the external service (independent of the other jobs' success or failure status).
I already read the documentation about "needs", but I don't find a way to run the next job even if the "needed" job fails.
I also read the documentation for "rules", but I don't understand whether rules let me check the number of parallel jobs that are running for the current stage.
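For reference, the setup looks roughly like this (job and script names are placeholders):

stages:
  - build

build-1:
  stage: build
  script: ./submit-to-external-service.sh 1   # occupies one of the 2 external slots

build-2:
  stage: build
  script: ./submit-to-external-service.sh 2

build-3:
  stage: build
  script: ./submit-to-external-service.sh 3   # must not start before a slot is free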
Thanks
Q1: What's the difference between
concurrent = 3
[[runners]]
...
executor = "shell"
and
concurrent = 3
[[runners]]
...
executor = "shell"
[[runners]]
...
executor = "shell"
[[runners]]
...
executor = "shell"
Q2: Does it make sense to have 3 executors (workers) of the same type on a single runner with global concurrent = 3? Or can a single executor with global concurrent = 3 do multiple jobs in parallel safely?
Q3: How are they related: runners.limit, runners.request_concurrency, and concurrent?
Thanks
GitLab's documentation on runners describes them as:
(...) isolated (virtual) machines that pick up jobs through the coordinator API of GitLab CI
Therefore, each runner is an isolated process responsible for picking up requests for job executions and for dealing with them according to pre-defined configurations. As an isolated process, each runner has the capability of creating 'sub-processes' (also called machines) in order to run jobs.
When you define a [[runners]] section in your config.toml, you're configuring a runner and setting how it should deal with job execution requests.
In your questions, you mentioned two of those "how to deal with job execution requests" settings:
limit: "Limit how many jobs can be handled concurrently". In other words, how many 'sub-processes' can be created by a runner in order to execute jobs simultaneously;
request_concurrency: "Limit number of concurrent requests for new jobs from GitLab". In other words, how many job execution requests can a runner take from GitLab CI job queue simultaneously.
Also, there are some settings that apply to a machine globally. In your question you mentioned one of them:
concurrent: "Limit how many jobs globally can be run concurrently. This is the most upper limit of number of jobs using all defined runners". In other words, it limits the maximum amount of 'sub-processes' that can run jobs simultaneously.
Thus, keeping in mind the difference between a runner and its sub-processes, and also the difference between specific runner settings and global machine settings:
Q1:
The difference is that in your 1st example you have one runner and in your 2nd example you have three runners. It's worth mentioning that in both examples your machine would only allow running 3 jobs simultaneously.
Q2:
Not only can a single runner run multiple jobs concurrently and safely, but it is also possible to control how many jobs you want it to handle (using the aforementioned limit setting).
Also, there is no problem with having similar runners running on the same machine. How you define your runners' configurations is up to you and your infrastructure capabilities.
Also, please notice that an executor only defines how to run your job. It isn't the only thing that defines a runner, and it isn't a synonym for "worker". The ones doing the work are your runners and their sub-processes.
Q3:
To summarize: you can define one or many runners on the same machine. Each one is an isolated process. A runner's limit is how many sub-processes of a runner process can be created to run jobs concurrently. A runner's request_concurrency is how many requests a runner can handle from the GitLab CI job queue. Finally, setting a value for concurrent limits how many jobs can be executed on your machine at the same time, across the one or more runners running on the machine.
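A sketch of how the three settings sit together in config.toml (all values and runner names are illustrative):

concurrent = 4                # machine-global: at most 4 jobs at once across all runners

[[runners]]
  name = "runner-a"           # placeholder
  executor = "shell"
  limit = 2                   # runner-a spawns at most 2 job sub-processes
  request_concurrency = 2     # and requests up to 2 new jobs from the queue at a time

[[runners]]
  name = "runner-b"           # placeholder
  executor = "shell"
  limit = 3                   # runner-b alone could run 3 jobs

Even though the limit values add up to 5, concurrent = 4 still caps the whole machine at 4 simultaneous jobs.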
References
For better understanding, I really recommend you read about Autoscaling algorithm and parameters.
Finally, I think you might find this question on how to run runners in parallel on the same server useful.
When one GitLab runner serves multiple projects, it can only run one CI pipeline at a time while the other projects' pipelines have to queue.
Is it possible to make a GitLab runner run pipelines from all projects in parallel?
I don't seem to find anywhere a configuration explanation for this.
I believe the configuration options you are looking for are concurrent and limit, which you'd change in the GitLab Runner's config.toml file.
From the documentation:
concurrent: limits how many jobs globally can be run concurrently. The most upper limit of jobs using all defined runners. 0 does not mean unlimited
limit: limit how many jobs can be handled concurrently by this token.
The location of the config.toml file:
/etc/gitlab-runner/config.toml on *nix systems when GitLab Runner is executed as root (this is also the path for service configuration)
~/.gitlab-runner/config.toml on *nix systems when GitLab Runner is executed as non-root
./config.toml on other systems
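A sketch for the multi-project case described in the question (runner name and executor are assumptions):

concurrent = 4              # up to 4 jobs at once on this machine

[[runners]]
  name = "shared-runner"    # placeholder; registered to several projects
  executor = "docker"       # executor type is an assumption
  limit = 0                 # 0 (default): don't limit this runner

With this, pipelines from different projects can run side by side instead of queueing behind one another.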
Useful issue as well.
We have 4 deploy jobs in the same stage that can be run concurrently. From the GitLab docs:
The ordering of elements in stages defines the ordering of jobs' execution:
Jobs of the same stage are run in parallel.
Jobs of the next stage are run after the jobs from the previous stage complete successfully.
What happens, however, is that only one of the jobs runs at a time and the others stay pending. Is there perhaps something else I need to do in order to get them to execute in parallel? I'm using a runner with a shell executor hosted on an Ubuntu 16.04 instance.
Your runner should be configured to enable concurrent jobs (see https://docs.gitlab.com/runner/configuration/advanced-configuration.html):
concurrent = 4
or you may want to set up several runners.
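With concurrent = 4 in place (note that concurrent sits at the top level of config.toml, outside any [[runners]] section), four same-stage jobs such as these can start together (names and scripts are placeholders):

stages:
  - deploy

deploy-1:
  stage: deploy
  script: ./deploy.sh target1   # placeholder

deploy-2:
  stage: deploy
  script: ./deploy.sh target2

deploy-3:
  stage: deploy
  script: ./deploy.sh target3

deploy-4:
  stage: deploy
  script: ./deploy.sh target4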
I also ran into this problem. I needed to run several tasks at the same time, and I used everything I could find (from needs to parallel); however, my tasks were still performed sequentially, and every one of them was pending. The solution turned out to be very simple: open the file /etc/gitlab-runner/config.toml and set concurrent to the number of parallel tasks you need.
I have multiple Spark jobs. Normally I submit my Spark jobs to YARN, and I have an option --yarn_queue which tells them which YARN queue to enter.
But the jobs seem to run in parallel in the same queue. Sometimes the results of one Spark job are the inputs for the next Spark job. How do I run my Spark jobs sequentially rather than in parallel in the same queue?
I have looked at this page for the Capacity Scheduler. But the closest thing I can see is the property yarn.scheduler.capacity.<queue>.maximum-applications. However, this only sets the number of applications that can be in both the PENDING and RUNNING states. I'm interested in limiting the number of applications that can be in the RUNNING state; I don't care about the total number of applications in PENDING (or ACCEPTED, which is the same thing).
How do I limit the number of applications in state=RUNNING to 1 for a single queue?
You can configure the appropriate queue to run one task at a time in the Capacity Scheduler configuration. My suggestion is to use Ambari for that purpose. If you don't have that option, apply the instructions from the guide.
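As a hedged sketch: newer Hadoop releases support a per-queue max-parallel-apps property in capacity-scheduler.xml that limits RUNNING applications directly (check that your version has it; the queue name below is a placeholder):

<property>
  <!-- limit this queue to one RUNNING application at a time;
       assumes a Hadoop release that supports max-parallel-apps -->
  <name>yarn.scheduler.capacity.root.sample_queue.max-parallel-apps</name>
  <value>1</value>
</property>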
From https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html:
The Fair Scheduler lets all apps run by default, but it is also possible to limit the number of running apps per user and per queue through the config file. This can be useful when a user must submit hundreds of apps at once, or in general to improve performance if running too many apps at once would cause too much intermediate data to be created or too much context-switching. Limiting the apps does not cause any subsequently submitted apps to fail, only to wait in the scheduler’s queue until some of the user’s earlier apps finish.
Specifically, you need to configure:
maxRunningApps: limit the number of apps from the queue to run at once
E.g.
<?xml version="1.0"?>
<allocations>
  <queue name="sample_queue">
    <maxRunningApps>1</maxRunningApps>
    <other options>
  </queue>
</allocations>