Keep Gitlab CI pipeline on single runner to ensure cache is shared - gitlab

I have a local GitLab server running with a few GitLab CI runners. In the past, we've had each runner configured with concurrent = 1, and when a pipeline runs, any available runner picks up any job in each stage.
However, I'd like to start caching dependencies between stages. This means that I must ensure an entire pipeline is run within a single runner instance (I'm trying to avoid uploading caches).
Is it possible for an entire pipeline to be assigned to one runner, while still letting 2+ pipelines run concurrently on multiple runners?

The cache is always stored in the same location where the runner is installed and running[1]. So to share a cache across all your runners, you need to set up an S3-compatible cache server such as MinIO[2] and configure your runners to use it.
Without uploading (and downloading) the cache to central storage, it is not possible for one runner to access another runner's cache.
[1]https://docs.gitlab.com/ce/ci/caching/#cache-vs-artifacts
[2]https://docs.gitlab.com/runner/install/registry_and_cache_servers.html#install-your-own-cache-server
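For context, the cache itself is declared in .gitlab-ci.yml regardless of which backend stores it; a minimal sketch (job names and cached paths are only illustrative):

stages:
  - build
  - test

# Cache the dependency directory between stages. Without a shared cache
# backend (e.g. MinIO), this cache is only reused when both jobs happen
# to land on the same runner host.
cache:
  key: "$CI_COMMIT_REF_SLUG"
  paths:
    - node_modules/      # illustrative dependency directory

install_deps:
  stage: build
  script:
    - npm ci

run_tests:
  stage: test
  script:
    - npm test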
Is it possible for an entire pipeline to be assigned a runner?
Yes. Just give every runner a unique tag, then tag every job in your pipeline with the tag of one runner. This ensures that your pipeline is executed by only one runner. For more see https://docs.gitlab.com/ce/ci/runners/#using-tags
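For example (the tag name runner-a is just a placeholder):

# Every job carries the same runner-specific tag, so only the runner
# registered with that tag will pick up any job of this pipeline.
build:
  stage: build
  tags:
    - runner-a
  script:
    - make build

test:
  stage: test
  tags:
    - runner-a
  script:
    - make test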

What you want is currently (GitLab 11.7) not possible (at least on Windows, it seems) without the significant administrative overhead of assigning each runner specifically to each of your jobs. Pinning a specific runner to your project and disabling all shared ones would work too.
There are a handful of issues that prevent this use case, as it is not possible to share the runners' cache even with an S3 blob storage configuration (we tried MinIO).
One of them is a race condition that prevents the cache from being extracted correctly when subsequent jobs are executed on different nodes. This is especially the case for parallel jobs.
What we tried:
Sharing the cache using an SMB folder available on all runner machines
Using MinIO to share the cache
You can find our bug ticket here:
https://gitlab.com/gitlab-org/gitlab-runner/issues/3920

Related

How to define parallelism for MS hosted agents with 2 pipelines

We've got two YAML Pipelines, pull-request.yml and main.yml. As the names suggest, pull-request.yml runs on every PR, and main.yml runs once deployed to main.
I've configured two MS hosted parallel jobs.
In main.yml using deployment jobs, I'm deploying to various Environments. It all works well, except when main.yml is executed twice, in parallel. Then, it will deploy to the same environment in each pipeline, causing issues with our IAC scripts.
Looking at the documentation, it doesn't seem possible to restrict this behavior with YAML pipelines.
My workaround for now is to switch back to 1 parallel job, but I still want parallel jobs for my pull-request.yml pipeline. Then I thought, let's create another agent pool, but that only allows me to add self-hosted agents, which I want to avoid since MS-hosted agents are very convenient.
How can I have parallel jobs for my pull-request.yml but only a single instance for main.yml with MS hosted agents only?
It's not supported to have parallel jobs for one pipeline (pull-request.yml) but only a single run at a time for another (main.yml) with MS-hosted agents, since Microsoft automatically allocates an agent to any pipeline whose requirements are met and uses it to run the job.
But for your main.yml, which deploys to environments, you may be able to use the "Exclusive deployment lock policy" on the environment.
As the docs mention:
With this update, you can ensure that only a single run deploys to an environment at a time. By choosing the "Exclusive lock" check on an environment, only one run will proceed. Subsequent runs which want to deploy to that environment will be paused. Once the run with the exclusive lock completes, the latest run will proceed. Any intermediate runs will be canceled.
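The Exclusive lock check itself is configured on the environment in the UI; on the pipeline side you just need a deployment job that targets that environment. A minimal sketch (the environment name staging is only an example):

stages:
- stage: Deploy
  jobs:
  - deployment: DeployInfra
    pool:
      vmImage: 'ubuntu-latest'
    # The "Exclusive lock" check added to this environment in the UI
    # ensures only one run deploys to it at a time.
    environment: 'staging'
    strategy:
      runOnce:
        deploy:
          steps:
          - script: echo "run IaC scripts here"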

Redundancy in GitLab runners

I want to implement redundancy in my GitLab runners.
Before creating a new server I am trying with my local machine.
The current setup on my repository:
Working runner (from server)
Non working runner (from local machine)
I want GitLab to choose the other runner when the selected one is not working.
The thing is that GitLab is selecting the non-working runner and failing the pipeline without trying to run it on the other runner.
How can I make this work?
Both runners are added:
But as the local (non-working) runner is chosen, the pipeline fails:
This is an interesting edge case since the runner process itself is still healthy, but something fails while running a job. The runner process won't know this until it retrieves a job and tries to run it, so it will keep trying to run jobs, and keep failing.
Since neither the Runner process nor GitLab can catch this edge case, the only option I can see is that when you see a job fail for this reason, you pause the Runner in the project (like in your screenshot) or, if you're an admin (or can ask an admin), pause it for the entire instance, assuming you're self-hosting GitLab. This will prevent any new jobs from running on that Runner so you can troubleshoot the issue.
You can still run multiple Runner processes on different hosts (or even on the same host by specifying separate config.toml files), so you keep redundancy and faster pipelines while the broken Runner is paused.
Some quick searching shows that common causes of this kind of failure are the runner's host running out of disk space, or a Docker issue that might be solved by updating to the latest version. Making sure GitLab and the runners are on the latest available versions wouldn't hurt either.
Another option you have is to submit a new Issue with Gitlab to see if they can address it. The desired situation would be that in the event of a runner system failure, the runner should become unhealthy and not process further jobs.

Azure web app deployment using vscode is faster than devops pipeline

Currently, I am working on a Django-based project which is deployed to an Azure App Service. There are two ways to deploy to the App Service: via DevOps and via the VS Code plugin. Both scenarios work fine, but strangely, deploying to the App Service via DevOps is slower than the VS Code deployment. Via DevOps it usually takes around 17-18 minutes, whereas via VS Code it takes less than 14 minutes.
Any reason behind this?
Assuming you're using Microsoft hosted build agents, the following statements are true:
With Microsoft-hosted agents, maintenance and upgrades are taken care of for you. Each time you run a pipeline, you get a fresh virtual machine. The virtual machine is discarded after one use.
and
Parallel jobs represents the number of jobs you can run at the same time in your organization. If your organization has a single parallel job, you can run a single job at a time in your organization, with any additional concurrent jobs being queued until the first job completes. To run two jobs at the same time, you need two parallel jobs.
Microsoft provides a free tier of service by default in every organization that includes at least one parallel job. Depending on the number of concurrent pipelines you need to run, you might need more parallel jobs to use multiple Microsoft-hosted or self-hosted agents at the same time.
This first statement might cause an Azure Pipeline to be slower because it does not have any cached information about your project. If you're only talking about deploying, the pipeline first needs to download (and extract?) an artifact to be able to deploy it. If you're also building, it might need to bring in the entire source code and/or external packages before being able to build.
The second statement might make it slower because there might be less parallelization possible than on the local machine.
Next to these two possible reasons, the agents will most probably not have the specs of your development machine, causing them to run tasks slower than they can on your local machine.
You could look into hosting your own agents to eliminate these possible reasons.
Do self-hosted agents have any performance advantages over Microsoft-hosted agents?
In many cases, yes. Specifically:
If you use a self-hosted agent, you can run incremental builds. For example, if you define a pipeline that does not clean the repo and does not perform a clean build, your builds will typically run faster. When you use a Microsoft-hosted agent, you don't get these benefits because the agent is destroyed after the build or release pipeline is completed.
A Microsoft-hosted agent can take longer to start your build. While it often takes just a few seconds for your job to be assigned to a Microsoft-hosted agent, it can sometimes take several minutes for an agent to be allocated depending on the load on our system.
More information: Azure Pipelines Agents
When you deploy via a DevOps pipeline, you go through a lot more steps. See below:
Process the pipeline --> Request an agent (wait for an available agent to be allocated to run the jobs) --> Download all the tasks needed to run the job --> Run each step in the job (download source code, restore, build, publish, deploy, etc.).
If you deploy the project in a release pipeline, the above process has to be repeated again for the release pipeline.
You can check the document Pipeline run sequence for more information.
However, when you deploy via the VS Code plugin, your project gets restored and built on your local machine and is then deployed to the Azure web app directly from there. So deploying via the VS Code plugin is faster, since far fewer steps are needed.
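For comparison, a single pipeline that covers those DevOps steps for a Django app might look roughly like this (the service connection and app name below are placeholders, not taken from your setup):

trigger:
- main

pool:
  vmImage: 'ubuntu-latest'

steps:
# Fresh VM on every run: dependencies are installed from scratch.
- task: UsePythonVersion@0
  inputs:
    versionSpec: '3.8'
- script: pip install -r requirements.txt
  displayName: 'Install dependencies'
# Package the app so it can be pushed to the App Service.
- task: ArchiveFiles@2
  inputs:
    rootFolderOrFile: '$(Build.SourcesDirectory)'
    includeRootFolder: false
    archiveFile: '$(Build.ArtifactStagingDirectory)/app.zip'
- task: AzureWebApp@1
  inputs:
    azureSubscription: 'my-azure-service-connection'  # placeholder
    appName: 'my-django-app'                          # placeholder
    package: '$(Build.ArtifactStagingDirectory)/app.zip'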

About gitlab CI runners

I am new to GitLab CI and I am fascinated by it. I have already managed to get pipelines working, even using Docker containers, so I am familiar with the flow of setting up jobs and artifacts. Now I just want to understand how this works under the hood. My questions are about the following:
Runners
Where is everything actually happening? I mean, which computer is running my builds and executables? I understand that GitLab has its own shared runners that are available to users; does this mean that if a shared runner grabs my jobs, they will run wherever those runners are hosted? And if I register my own runner on my laptop and use that specific runner, will my builds and binaries run on my computer?
Artifacts
In order to run/test code, we need the binaries, which are grabbed as artifacts from the build stage. For the build part, if I use CMake, for example, in the script part of the CI .yml file I create a build directory, call cmake .. and so on. Once my job is successful, if I want the binary I have to go into GitLab and retrieve it myself. So my question is, where is everything saved? I noticed that the runner, within my project, creates something like refs/pipeline/, but where is this actually? How could I get those files and new directories onto my laptop?
Working space
Pretty much: where is everything happening? The runners, the execution, the artifacts?
Thanks for your time
Everything that happens in each job/step of a pipeline happens on the runner host itself and depends on the executor you're using (shell, docker, etc.); nothing is executed on the GitLab server directly.
If you're using gitlab.com, they have a number of shared runners that the Gitlab team maintains and you can use for your project(s), but as they are shared with everyone on gitlab.com, it can be some time before your jobs are run. However, no matter if you self host or use gitlab.com, you can create your own runners specific for your project(s).
If you're using the shell executor, while the job is running you could see the files on the filesystem somewhere, but they are cleaned up after that job finishes. It's not really intended for you to access the filesystem while the job is running. That's what the job script is for.
If you're using the docker executor, the gitlab-runner service will start a docker instance from the image you specify in .gitlab-ci.yml (or use the default that is configurable). Then the job is run inside that docker instance, and it's deleted immediately after the job finishes.
You can add your own runners anywhere -- AWS, spare machine lying around, even your laptop, and jobs would be picked up by any of them. You can also turn off shared runners and force it to be run on one of your runners if needed.
In cases where you need an artifact after a build/preparatory step, it's created on the runner as part of the job as above, but then the runner automatically uploads the artifact to the gitlab server (or another service that implements the S3 protocol like AWS S3 or Minio). Unless you're using S3/minio, it will only be accessible through the gitlab UI interface, or through the API. In the UI however, it will show up on any related MR's, and also the Pipeline page, so it's fairly accessible.
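To make that concrete, a .gitlab-ci.yml along these lines would cover the CMake build/test case you describe (binary name and paths are just examples):

build:
  stage: build
  script:
    - mkdir -p build
    - cd build
    - cmake ..
    - make
  artifacts:
    # Uploaded to the GitLab server (or S3/MinIO) when the job succeeds,
    # then downloadable from the UI/API or by later jobs.
    paths:
      - build/my_app        # illustrative binary path
    expire_in: 1 week

test:
  stage: test
  script:
    # Artifacts from earlier stages are downloaded automatically.
    - ./build/my_app --version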

Limit azure pipeline to only run one after the other rather than in parallel

I have set up a PR Pipeline in Azure. As part of this pipeline I run a number of regression tests. These run against a regression test database - we have to clear out the database at the start of the tests so we are certain what data is in there and what should come out of it.
This is all working fine until the pipeline runs multiple times in parallel - then the regression database is being written to multiple times and the data returned from it is not what is expected.
How can I stop a pipeline running in parallel - I've tried Google but can't find exactly what I'm looking for.
If the pipeline is already running, the next build should wait (not for all pipelines - I want to set this on a single pipeline). Is this possible?
Depending on your exact use case, you may be able to control this with the right trigger configuration.
In my case, I had a pipeline scheduled to kick off every time a Pull Request is merged to the main branch in Azure. The pipeline deployed the code to a server and kicked off a suite of tests. Sometimes, when two merges occurred just minutes apart, the builds would fail because they both used a shared resource that required synchronisation.
I fixed it by Batching CI Runs
I changed my basic config
trigger:
- main
to use the more verbose syntax allowing me to turn batching on
trigger:
  batch: true
  branches:
    include:
    - main
With this in place, a new build will only be triggered for main once the previous one has finished, no matter how many commits are added to the branch in the meantime.
That way, I avoid having too many builds being kicked off and I can still use multiple agents where needed.
One way to solve this is to model your test regression database as an "environment" in your pipeline, then use the "Exclusive Lock" check to prevent concurrent "deployment" to that "environment".
Unfortunately this approach comes with several disadvantages inherent to "environments" in YAML pipelines:
you must set up the check manually in the UI, it's not controlled in source code.
it will only prevent that particular deployment job from running concurrently, not an entire pipeline.
the fake "environment" you create will appear in alongside all other environments, cluttering the environment view if you happen to use environments for "real" deployments. This is made worse by this view being a big sack of all environments, there's no grouping or hierarchy.
Overall the initial YAML reimplementation of Azure Pipelines mostly ignored the concepts of releases, deployments, environments. A few piecemeal and low-effort aspects have subsequently been patched in, but without any real overarching design or apparent plan to get to parity with the old release pipelines.
You can use "Trigger Azure DevOps Pipeline" extension by Maik van der Gaag.
It needs to add to you DevOps and configure end of the main pipeline and point to your test pipeline.
Can find more details on Maik's blog.
According to your description, you could use your own self-hosted agent.
Simply deploy your own self-hosted agent and make sure its environment is the same as your local development environment.
In this situation, since your agent pool only has one available build agent, only one build will run at a time when multiple builds are triggered. The others stay in the queue in a specific order and will not start until the prior build has finished.
For your other pipeline, just keep using the hosted agent pool.
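In the pipeline that must not run in parallel you would then point the jobs at that single-agent pool; a minimal sketch (the pool name below is a placeholder):

# main.yml - targets the single self-hosted agent, so concurrent runs queue up
pool:
  name: 'SingleAgentPool'   # placeholder pool name

steps:
- script: echo "deploy to the shared environment"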
