GitLab CI: Limit parallel jobs in one stage to a specific amount

I need to limit the number of parallel jobs because they depend on an external service that only has N available slots for building, but I have more than N jobs in the GitLab pipeline, so I want a way to cap the number of jobs running in parallel at N.
Basically:
- I have 1 stage with 3 parallel jobs (it has to be only one stage).
- Each job depends on an external building service (outside of GitLab) that has 2 parallel "slots" (I cannot raise the number of slots in the external service).
- I need to run the 3 parallel jobs, but never more than 2 at a time, so they don't fail due to unavailable slots in the external service (independent of the other jobs' success or failure status).
I already read the documentation about "needs", but I don't find a way to run the next job even if the "needed" job fails.
I also read the documentation for "rules", but I don't understand whether rules can check the number of parallel jobs currently running in the stage.
Thanks
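One way this kind of cap is commonly approached is to route all three jobs to a dedicated runner whose own concurrency is limited to 2, rather than trying to express the limit in the pipeline itself. A minimal sketch, assuming a hypothetical tag external-build and a hypothetical script trigger-external-build.sh that calls the external service:

# .gitlab-ci.yml — sketch only; job names, tag and script are illustrative
stages:
  - build

.external_build:
  stage: build
  tags:
    - external-build                  # routes the job to the dedicated, capped runner
  script:
    - ./trigger-external-build.sh     # hypothetical call to the external service

build_a: { extends: .external_build }
build_b: { extends: .external_build }
build_c: { extends: .external_build }

# config.toml of the dedicated runner (sketch) — caps it at 2 jobs at once
concurrent = 2

[[runners]]
  executor = "shell"   # or whichever executor you use
  limit = 2

With this setup, GitLab queues the third job until one of the two slots frees up, regardless of whether the other jobs succeed or fail. Runner concurrency settings are discussed in more detail in the answers below.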

Related

Reducing build pipeline time using parallel keyword in Gitlab?

I am trying to reduce the GitLab pipeline execution time by running multiple parallel instances of a job using the "parallel: 4" keyword on one of my bottleneck build jobs. With this I can see that my job has 4 instances named 1/4 to 4/4, but the execution is not parallel; they execute sequentially. Can someone help me understand why this is happening, and is there any other way to reduce the build time by running things in parallel?
This is a tricky question to answer, because the best answer is: it depends.
Also, the documentation about the parallel directive states:
Jobs can only run in parallel if there are multiple runners, or a single runner is configured to run multiple jobs concurrently.
Additionally, it is also mentioned here: https://docs.gitlab.com/ee/ci/yaml/README.html#use-your-own-runners
Do you run your own runners or do you use shared runners?
The runners must be configured to allow concurrent execution; if they are not configured like that, you will not be able to run jobs in parallel. With your own runners you might be able to change this.
concurrent: limits how many jobs globally can be run concurrently. This is the upper limit on the number of jobs using all defined runners. 0 does not mean unlimited.
What is the limit of parallel execution on the runner?
Besides the concurrent setting, there is also the limit setting on the runner.
limit: Limit how many jobs can be handled concurrently by this token. 0 (default) simply means don’t limit.
All this information about runners can be checked at the GitLab Documentation
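As a minimal sketch of where these settings live (runner name, URL and token are placeholders), a self-managed runner's config.toml might look like this:

# /etc/gitlab-runner/config.toml — illustrative values only
concurrent = 4                # global cap: at most 4 jobs across all runners defined in this file

[[runners]]
  name = "build-runner"       # hypothetical name
  url = "https://gitlab.example.com/"
  token = "REDACTED"
  executor = "shell"
  limit = 4                   # this runner may handle up to 4 jobs at the same time

With concurrent = 1 (or limit = 1), the four job instances 1/4 to 4/4 created by parallel: 4 are queued and run one after another, which matches the sequential behaviour described in the question.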

What is the concept of concurrent pipelines (One parallel job in Azure Pipeline lets you run a single build or release job at any given time)?

I am reading about concurrent pipelines in Azure.
Concurrent pipelines
You can run concurrent pipelines (also called parallel jobs) in Azure Pipelines. One parallel job in Azure Pipelines lets you run a single build or release job at any given time. This rule is true whether you run the job on Microsoft-hosted or self-hosted agents. Parallel jobs are purchased at the organization level, and they are shared by all projects in an organization.
My understanding is that the Azure build pipeline is organized into jobs (either agent or agentless jobs). Each job contains tasks. On an automatic or manual trigger the build pipeline runs, and I thought that the number of pipelines that can run in parallel (assuming each pipeline has only 1 job in it) depends on the availability of build agents (machines, either Microsoft-hosted or self-hosted).
So what exactly is the concept of concurrent pipelines? What is the meaning of "One parallel job in Azure Pipelines lets you run a single build or release job at any given time"? In plain English, buying one parallel job should allow us to either a) run 2 build pipelines (assuming each pipeline contains only 1 job) or b) run 1 pipeline with 2 jobs in parallel simultaneously. But this depends on the availability of build agents, as 2 pipelines (with 1 job each), or 1 pipeline with 2 jobs, would need 2 machines to run in parallel. Does it also mean that by default (free of charge) only one build pipeline can run at a time? There seems to be confusion between a parallel job and a parallel pipeline, because one pipeline can have parallel jobs.
I need clarity on this topic with respect to pipeline/job/parallel pipeline/parallel job/count of build agents/count of parallel jobs.
Check Relationship between jobs and parallel jobs:
1. When you define a pipeline, you can define it as a collection of jobs. When a pipeline runs, you can run multiple jobs as part of that pipeline.
2. Each job consumes a parallel job that runs on an agent. When there aren't enough parallel jobs available for your organization, the jobs are queued up and run one after the other.
So if we have a pipeline with two jobs: when I queue the pipeline, these two jobs can't run at the same time if we only have one parallel job.
There are different numbers of parallel jobs available for Microsoft-hosted and self-hosted agents; you can follow View available parallel jobs to check the parallel jobs you have.
And for the count of build agents, there is no count limit for Microsoft-hosted agents. If you mean self-hosted agents, you can have many agents in your agent pool (the count limit is not something we hit in a normal situation). We can also install more than one agent on the same local machine; see Can I install multiple self-hosted agents on the same machine?.
Hope all above helps :)
Well, the agents don't run pipelines.
They run jobs.
So if you are allowed to run "2 concurrent pipelines", it must mean 2 parallel jobs.
These jobs can be in a single pipeline if they are allowed to run in parallel (i.e. the other is not dependent on the first one).
Yes, on the free tier it seems only one job can run at a time.
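As an illustration, a minimal azure-pipelines.yml sketch (job names and scripts are made up): the two jobs below have no dependsOn, so they are eligible to run side by side, but they will only actually run in parallel if the organization has at least two parallel jobs and two agents available.

# azure-pipelines.yml — illustrative sketch
jobs:
- job: Build
  pool:
    vmImage: 'ubuntu-latest'
  steps:
  - script: echo "building..."

- job: Test                      # no dependsOn, so it may run alongside Build;
  pool:                          # with only one parallel job purchased it queues instead
    vmImage: 'ubuntu-latest'
  steps:
  - script: echo "testing..."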
I'm not sure when this was released, but there is a setting in the Pre-deployment conditions of an environment that fixed this for me. (Same place you'd find Triggers, Pre-deployment approvals, Gates.)
Pre-deployment conditions >> Deployment queue settings >> Number of parallel deployments = Specific >> Maximum number of parallel deployments = 1.

How does GitLab runner concurrency work?

Q1: What's the difference between
concurrent = 3
[[runners]]
..
executor = "shell"
and
concurrent = 3
[[runners]]
...
executor = "shell"
[[runners]]
...
executor = "shell"
[[runners]]
...
executor = "shell"
Q2: Does it make sense to...
have 3 executors (workers) of the same type on a single runner with global concurrent = 3? Or can a single executor with global concurrent = 3 do multiple jobs in parallel safely?
Q3: How are they related...
runners.limit with runners.request_concurrency and concurrent
Thanks
Gitlab's documentation on runners describes them as:
(...) isolated (virtual) machines that pick up jobs through the coordinator API of GitLab CI
Therefore, each runner is an isolated process responsible for picking up requests for job executions and for dealing with them according to pre-defined configurations. As an isolated process, each runner has the capability of creating 'sub-processes' (also called machines) in order to run jobs.
When you define a [[runners]] section in your config.toml, you're configuring a runner and setting how it should deal with job execution requests.
In your questions, you mentioned two of those "how to deal with job execution requests" settings:
limit: "Limit how many jobs can be handled concurrently". In other words, how many 'sub-processes' can be created by a runner in order to execute jobs simultaneously;
request_concurrency: "Limit number of concurrent requests for new jobs from GitLab". In other words, how many job execution requests can a runner take from GitLab CI job queue simultaneously.
Also, there are some settings that apply to a machine globally. In your question you mentioned one of them:
concurrent: "Limit how many jobs globally can be run concurrently. This is the most upper limit of number of jobs using all defined runners". In other words, it limits the maximum amount of 'sub-processes' that can run jobs simultaneously.
Thus, keeping in mind the difference between a runner and its sub-processes, and also the difference between runner-specific settings and global machine settings:
Q1:
The difference is that in your 1st example you have one runner and in your 2nd example you have three runners. It's worth mentioning that in both examples your machine would only allow running 3 jobs simultaneously.
Q2:
Not only can a single runner run multiple jobs concurrently and safely, but it is also possible to control how many jobs you want it to handle (using the aforementioned limit setting).
Also, there is no problem with having similar runners running on the same machine. How you define your runners' configurations is up to you and your infrastructure capabilities.
Also, please notice that an executor only defines how to run your job. It isn't the only thing that defines a runner, and it isn't a synonym for "worker". The ones doing the work are your runners and their sub-processes.
Q3:
To summarize: you can define one or many workers on the same machine. Each one is an isolated process. A runner's limit is how many sub-processes of that runner process can be created to run jobs concurrently. A runner's request_concurrency is how many requests a runner can take from the GitLab CI job queue at once. Finally, setting a value for concurrent limits how many jobs can be executed on your machine at the same time, across the one or more runners running on that machine.
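Putting the three settings side by side, a config.toml sketch with two runners (values are arbitrary and url/token are omitted, just to show where each knob lives):

concurrent = 3                 # machine-wide cap: never more than 3 jobs at once, across both runners below

[[runners]]
  name = "runner-a"            # hypothetical
  executor = "shell"
  limit = 2                    # runner-a alone may run at most 2 jobs at once
  request_concurrency = 1      # how many new-job requests it makes to GitLab at a time

[[runners]]
  name = "runner-b"            # hypothetical
  executor = "shell"
  limit = 2

Even though the limit values sum to 4 here, the global concurrent = 3 still wins, so at most 3 jobs run on this machine at any moment.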
References
For better understanding, I really recommend you read about Autoscaling algorithm and parameters.
Finally, I think you might find this question on how to run runners in parallel on the same server useful.

Gitlab pipeline jobs in the same stage are not running in parallel

We have 4 deploy jobs in the same stage that can be run concurrently. From the Gitlab docs:
The ordering of elements in stages defines the ordering of jobs' execution:
Jobs of the same stage are run in parallel.
Jobs of the next stage are run after the jobs from the previous stage complete successfully.
What happens, however, is that only one of the jobs runs at a time and the others stay pending. Are there perhaps other things that I need to do in order to get them to execute in parallel? I'm using a runner with a shell executor hosted on an Ubuntu 16.04 instance.
Your runner should be configured to enable concurrent jobs (see https://docs.gitlab.com/runner/configuration/advanced-configuration.html):
concurrent = 4
Or you may want to set up several runners.
I also ran into this problem. I needed to run several tasks at the same time, and I used everything I could find (from needs to parallel); however, my tasks were still performed sequentially, and every task was on standby. The solution turned out to be very simple: open the file /etc/gitlab-runner/config.toml and set concurrent to the number of parallel tasks you need.

How do I limit the number of spark applications in state=RUNNING to 1 for a single queue in YARN?

I have multiple spark jobs. Normally I submit my spark jobs to yarn and I have an option that is --yarn_queue which tells it which yarn queue to enter.
But the jobs seem to run in parallel in the same queue. Sometimes the results of one spark job are the inputs for the next spark job. How do I run my spark jobs sequentially rather than in parallel in the same queue?
I have looked at this page for the capacity scheduler. But the closest thing I can see is the property yarn.scheduler.capacity.<queue>.maximum-applications. However, this only sets the number of applications that can be in both the PENDING and RUNNING states. I'm interested in limiting the number of applications that can be in the RUNNING state; I don't care about the total number of applications in PENDING (or ACCEPTED, which is the same thing).
How do I limit the number of applications in state=RUNNING to 1 for a single queue?
You can configure the appropriate queue to run one task at a time in the Capacity Scheduler configuration. My suggestion is to use Ambari for that purpose. If you don't have that option, apply the instructions from the guide.
From https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html:
The Fair Scheduler lets all apps run by default, but it is also possible to limit the number of running apps per user and per queue through the config file. This can be useful when a user must submit hundreds of apps at once, or in general to improve performance if running too many apps at once would cause too much intermediate data to be created or too much context-switching. Limiting the apps does not cause any subsequently submitted apps to fail, only to wait in the scheduler’s queue until some of the user’s earlier apps finish.
Specifically, you need to configure:
maxRunningApps: limit the number of apps from the queue to run at once
E.g.
<?xml version="1.0"?>
<allocations>
  <queue name="sample_queue">
    <maxRunningApps>1</maxRunningApps>
    <!-- other options -->
  </queue>
</allocations>
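Note that this only takes effect if the cluster is actually running the Fair Scheduler and picking up that allocation file. A sketch of the relevant yarn-site.xml properties (the file path is an assumption for your environment):

<!-- yarn-site.xml — only needed if the Fair Scheduler is not already enabled -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
  <name>yarn.scheduler.fair.allocation.file</name>
  <value>/etc/hadoop/conf/fair-scheduler.xml</value>  <!-- assumed location -->
</property>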
