GitLab CI: running the same set of tests on several machines

In high performance computing it is crucial to have code tested against many different architectures and compilers: from a laptop to a supercomputer.
Assuming that we have
N testing machines/workers (each one running gitlab-ci-runner);
M tests,
what is the correct layout of .gitlab-ci.yml to ensure that each of the N machines runs all the tests?
It looks to me like just adding more workers results in a round-robin-like assignment of the jobs.
Thanks for your help.

You could use tags in your .gitlab-ci.yml and on the runners to distribute your tests to all the machines you want. GitLab CI is very open to this kind of use case. I assume that you don't use Docker for testing.
To accomplish your goal, take the following steps:
Install gitlab-ci-multi-runner on each of your N machines, BUT when the runner asks you for tags, tag the machine with a specific name, e.g. "MachineWithManyCPUsWhatever". Use the gitlab-ci-multi-runner register command to do so. Alternatively, you can change the tags in GitLab in the administration view after registration.
The type of the runners should be "shared", not "specific".
Once you have installed and tagged every runner on your machines, tag your jobs.
Example:
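A minimal sketch, assuming two machines whose runners are tagged machine-a and machine-b and a test script ./run_tests.sh (all three names are illustrative); each test job is duplicated once per machine tag, so every machine runs the full test set:

test-on-machine-a:
  tags:
    - machine-a        # only the runner tagged machine-a picks this job up
  script:
    - ./run_tests.sh

test-on-machine-b:
  tags:
    - machine-b        # only the runner tagged machine-b picks this job up
  script:
    - ./run_tests.sh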
Every job will now be executed on the specific machine with the specific tag.
For more information, see the GitLab CI documentation.

Related

Dynamically create an environment and run Cypress in parallel

We are using Cypress to run our end-to-end tests in GitLab. Before we run the tests, we create a dynamic environment: an environment created with docker-compose inside the GitLab runner that executes the Cypress tests. After the dynamic environment is up, we fire the tests against it. Everything happens in one GitLab runner, so no external deployment to a test environment takes place.
Now we want to move forward and parallelize the Cypress run. It's documented here: https://docs.cypress.io/guides/guides/parallelization and it works under the assumption that the environment is already there. It creates several GitLab runners, and Cypress takes care of distributing the scenarios between the runners.
The question is: how do we set up a dynamic environment with GitLab which can be shared between GitLab runners? Is it only possible with a dummy deployment to a Kubernetes environment prepared for this use case? Do I need to create a dynamic environment in each runner? Or any other hints?
Our solution now looks like this.
We start the environment from docker-compose in each running GitLab job. This of course reduces the time saving of parallel test execution, since GitLab runners are blocked longer. But since it happens in parallel and we are not short on runners, it's OK for us. Another advantage was that we could keep our current setup with only minimal modifications. In the end we reduced our "run all end-to-end tests" execution time by about 30%.
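As a sketch, one such job could look like the following, assuming a docker-compose.yml in the repository, a Cypress Dashboard record key stored in a CYPRESS_RECORD_KEY CI/CD variable, and GitLab's parallel keyword; all of these names are assumptions, not taken from our actual setup:

e2e-tests:
  stage: test
  parallel: 4                   # run four copies of this job as separate concurrent jobs
  script:
    - docker-compose up -d      # bring up the dynamic environment inside this runner
    - npx cypress run --record --parallel --key "$CYPRESS_RECORD_KEY"
    - docker-compose down       # tear the environment down again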

gitlab cloud CI: how to increase memory for shared runner

My Gitlab CI jobs fail because of RAM limitations.
Page https://docs.gitlab.com/ee/user/gitlab_com/index.html says:
All your CI/CD jobs run on n1-standard-1 instances with 3.75GB of RAM, CoreOS and the latest Docker Engine installed.
Below it says:
The gitlab-shared-runners-manager-X.gitlab.com fleet of runners are dedicated for GitLab projects as well as community forks of them. They use a slightly larger machine type (n1-standard-2) and have a bigger SSD disk size. They don’t run untagged jobs and unlike the general fleet of shared runners, the instances are re-used up to 40 times.
So, how do I enable these n1-standard-2 runners (which have 7.5 GB RAM)? I've read the docs over and over but can't seem to find any instructions.
Disclaimer: I did not check whether you can use these runners with your project, or whether they get picked up for your GitLab CI/CD, but this is how you can check for available runners and their tags, and how to use them. The wording "GitLab projects as well as community forks of them" reads as if this applies only to official GitLab projects and their forks, not to arbitrary projects on GitLab.
You can check all the available runners in your project's CI/CD settings under Runners, where you will see a list of them.
As you can see there, some runners are tagged with gitlab-org. Based on the description, you cannot run jobs on them without using a tag. Hence you need to adapt your .gitlab-ci.yml file with the appropriate tags.
E.g.:
job:
  tags:
    - gitlab-org
See the GitLab documentation for tags.

About gitlab CI runners

I am new to GitLab CI and I am fascinated by it. I have already managed to get pipelines working, even using Docker containers, so I am familiar with the flow for setting up jobs and artifacts. I now wish to understand how this works. My questions are about the following:
Runners
Where is everything actually happening? I mean, which computer is running my builds and executables? I understand that GitLab has its own shared runners that are available to users; does this mean that if a shared runner grabs my jobs, they will run wherever those runners are hosted? If I register my own runner on my laptop and use that specific runner, will my builds and binaries run on my computer?
Artifacts
In order to run/test code, we need the binaries, which are grabbed as artifacts from the build stage. For the build part, if I use cmake, for example, in the script part of the CI .yml file I create a build directory, call cmake .. and so on. Once my job is successful, if I want the binary I have to go into GitLab and retrieve it myself. So my question is: where is everything saved? I notice that the runner, within my project, creates something like refs/pipeline/, but where is this actually located? How could I get those files and new directories onto my laptop?
Working space
Pretty much: where is everything happening? The runners, the execution, the artifacts?
Thanks for your time
Everything that happens in each job/step of a pipeline happens either on the runner host itself, depending on the executor you're using (shell, docker, etc.), or on the GitLab server directly.
If you're using gitlab.com, they have a number of shared runners that the GitLab team maintains and that you can use for your project(s), but as they are shared with everyone on gitlab.com, it can take some time before your jobs are run. However, whether you self-host or use gitlab.com, you can create your own runners specific to your project(s).
If you're using the shell executor, while the job is running you could see the files on the filesystem somewhere, but they are cleaned up after that job finishes. It's not really intended for you to access the filesystem while the job is running. That's what the job script is for.
If you're using the docker executor, the gitlab-runner service will start a docker instance from the image you specify in .gitlab-ci.yml (or use the default that is configurable). Then the job is run inside that docker instance, and it's deleted immediately after the job finishes.
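For illustration, a minimal job that pins the Docker image it runs in; the image and command are placeholders:

job:
  image: alpine:latest    # the runner starts a fresh container from this image
  script:
    - echo "this runs inside the container, which is removed after the job"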
You can add your own runners anywhere -- AWS, spare machine lying around, even your laptop, and jobs would be picked up by any of them. You can also turn off shared runners and force it to be run on one of your runners if needed.
In cases where you need an artifact after a build/preparatory step, it's created on the runner as part of the job as above, but then the runner automatically uploads the artifact to the GitLab server (or another service that implements the S3 protocol, like AWS S3 or Minio). Unless you're using S3/Minio, it will only be accessible through the GitLab UI or through the API. In the UI, however, it will show up on any related MRs and on the pipeline page, so it's fairly accessible.
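As a sketch of the project side of that flow, loosely following the cmake build from the question (the directory layout and expiry are illustrative):

build:
  stage: build
  script:
    - mkdir -p build
    - cd build
    - cmake ..
    - make
  artifacts:
    paths:
      - build/          # uploaded to the GitLab server when the job succeeds
    expire_in: 1 week   # keep the artifact around for a week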

Keep Gitlab CI pipeline on single runner to ensure cache is shared

I have a local GitLab server running with a few GitLab CI runners. In the past, we've had each runner set up with concurrent = 1, and when a pipeline runs, any available runner takes any job in each stage.
However, I'd like to start caching dependencies between stages. This means that I must ensure an entire pipeline runs within a single runner instance (I'm trying to avoid uploading caches).
Is it possible for an entire pipeline to be assigned a runner? But have 2+ pipelines run concurrently on multiple runners?
The cache is always stored in the same location where the runner is installed and running [1]. So to share a cache across all your runners, you need to set up an S3 replacement like minio [2] and configure your runners to use it as their cache.
Without uploading (and downloading) the cache to central storage, it is not possible for one runner to access the cache of another runner.
[1] https://docs.gitlab.com/ce/ci/caching/#cache-vs-artifacts
[2] https://docs.gitlab.com/runner/install/registry_and_cache_servers.html#install-your-own-cache-server
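For completeness, the project side only declares what to cache in .gitlab-ci.yml; whether two jobs ever see the same cache still depends on the runner-side storage described above. The key and path here are illustrative:

cache:
  key: "$CI_COMMIT_REF_SLUG"   # one cache per branch
  paths:
    - node_modules/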
Is it possible for an entire pipeline to be assigned a runner?
Yes. Just give every runner a unique tag, then tag every job in your pipeline with the tag of one runner. This ensures that your pipeline is executed by only one runner. For more, see https://docs.gitlab.com/ce/ci/runners/#using-tags
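A minimal sketch, assuming a single runner carries the unique tag runner-1 and the two scripts exist in the repository (all names are illustrative):

build:
  stage: build
  tags:
    - runner-1      # only the runner tagged runner-1 may take this job
  script:
    - ./build.sh

test:
  stage: test
  tags:
    - runner-1      # the same tag pins the whole pipeline to that runner
  script:
    - ./test.sh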
What you want is currently (GitLab 11.7) not possible (at least on Windows, it seems) without the significant administrative overhead of assigning each runner specifically to each of your jobs. Pinning a specific runner to your project and disabling all shared ones would work too.
There are a handful of issues that prevent this use case, as it is not possible to share the runner's cache even with an S3 blob storage configuration (we tried minio).
One of them is a race condition that prevents the cache from being extracted correctly when subsequent jobs are executed on different nodes. This is especially the case for parallel jobs.
What we tried:
Sharing the cache using an SMB folder available on all runner machines
Using minio to share the cache
You can find our bug ticket here:
https://gitlab.com/gitlab-org/gitlab-runner/issues/3920

How does gitlab decide which runner to use for a job

If there are more than one available runner for a project, how does gitlab ci decide which runner to use?
I have an omnibus GitLab 8.6.6-ee installation with 2 runners configured. The runners are identical (docker images, config, etc.) except that they are running on different computers.
If they are both idle and a job comes in that either of them could run, which one will run?
To add to Rubinum's answer: the 'first' runner is whichever runner that meets all the criteria checks in first. For example, tags could limit which runners certain jobs run on.
Runners query the GitLab server every X seconds to check whether there are builds. If there's a build queued and multiple runners meet the criteria, the first to ask will win.
Update to answer comments:
Runners communicate with the GitLab server through the CI API (http://docs.gitlab.com/ce/ci/api/builds.html) to get build status. This effectively means the choice of runner is more or less random, based on when each runner finished its last job and the X amount of milliseconds it waits before checking again.
To completely answer the question:
Credit goes to BM5k, who dug through the code and found that x = 3 seconds, based on this and this. He also found that which machine a docker+machine runner will use (once that runner has been selected) is more or less (effectively) random as well.
GitLab CI assigns jobs to whichever runners are available. If a runner is not available because it is busy, GitLab CI assigns the job to other runners that are available. In your case it will always assign the jobs to the first available runner (whichever that is).
If you want to force execution on a specific runner/machine, have a look at my post here.
In my opinion it's good not to know which runner will run your build if you are using Docker, because that's the benefit of the runner/Docker architecture.
