GitLab multiple runners, exchanging artifacts - gitlab

I'm already using GitLab CI on smaller projects, but now I'm looking into using GitLab as CI for a larger project.
How can I pass build artifacts (a bunch of binary files, etc.) between two gitlab-runners running on two different physical machines?
Context:
I have a large repository, which produces a lot of artifacts during the build. Obviously this takes time, so I'd like to build on a beefy multi-core machine. If the build passes, I want to test in parallel across many other (smaller) machines. These test-machines are hooked up to many different kinds of equipment. Equipment that I don't want to bother the beefy machine with.
I understand artifacts: and dependencies: should address this, but that uses a local cache as far as I can tell.
The build artifacts weigh in at ~4 GB, so somehow that data must be transferred.
Can GitLab help with this natively, or do I need a pattern of build+push followed by fetch+test? (Say, Artifactory, Ceph, NFS, etc.)
I imagine my needs aren't unique so something must already exist for this.

You are on the right path: artifacts is what you are looking for. Runners do not store the artifacts they build, but they upload them to the GitLab instance.
Now, where GitLab stores them is a different topic, and if you manage your GitLab installation, you can take a look at the administration documentation: https://docs.gitlab.com/ee/administration/job_artifacts.html
You can also retrieve artifacts through APIs, if you have any special need, but artifacts and dependencies should be more than enough for your use case: https://docs.gitlab.com/ee/api/job_artifacts.html#get-job-artifacts
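As a rough sketch (the job names, runner tags, and paths below are made up for illustration), the build job on the beefy machine declares the binaries as artifacts, and each test job on the smaller machines pulls them in via dependencies, with runner tags steering the jobs to the right machines:
build:
  stage: build
  tags:
    - beefy-builder          # assumed tag of the runner on the big build machine
  script:
    - make -j"$(nproc)"
  artifacts:
    paths:
      - build/               # uploaded to the GitLab instance when the job finishes
    expire_in: 1 week

test-equipment-a:
  stage: test
  tags:
    - equipment-a            # assumed tag of a runner on one of the test machines
  dependencies:
    - build                  # downloads the build job's artifacts before the script runs
  script:
    - ./run_tests.sh build/
Note that a ~4 GB artifact still travels through the GitLab instance (or its object storage backend) on every upload and download, so check the maximum artifact size your administrator has configured.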

Related

gitlab cloud CI: how to increase memory for shared runner

My Gitlab CI jobs fail because of RAM limitations.
Page https://docs.gitlab.com/ee/user/gitlab_com/index.html says:
All your CI/CD jobs run on n1-standard-1 instances with 3.75GB of RAM, CoreOS and the latest Docker Engine installed.
Below it says:
The gitlab-shared-runners-manager-X.gitlab.com fleet of runners are dedicated for GitLab projects as well as community forks of them. They use a slightly larger machine type (n1-standard-2) and have a bigger SSD disk size. They don’t run untagged jobs and unlike the general fleet of shared runners, the instances are re-used up to 40 times.
So, how do I enable these n1-standard-2 runners (which have 7.5 GB RAM)? I've read the docs over and over but can't seem to find any instructions.
Disclaimer: I did not check whether you can use them with your own project, or whether they would pick up your GitLab CI/CD jobs - but this is how you can check which runners and tags are available, and how to use them. The wording GitLab projects as well as community forks of them reads as if this applies only to official GitLab projects and their forks, and not to arbitrary projects on GitLab.
You can check all the available runners in your project's CI/CD Settings under Runners, where you will see the list of runners available to your project.
Among them are runners tagged with gitlab-org. Based on their description, they will not pick up your job unless it uses a tag, so you need to adapt your .gitlab-ci.yml file with the appropriate tags.
E.g.:
job:
  tags:
    - gitlab-org
See the GitLab documentation for tags.

About gitlab CI runners

I am new to GitLab CI and I am fascinated by it. I have already managed to get pipelines working, even using Docker containers, so I am familiar with the flow of setting up jobs and artifacts. Now I just wish to understand how this works. My questions are about the following:
Runners
Where is everything actually happening? I mean, which computer is running my builds and executables? I understand that GitLab has its own shared runners that are available to users; does this mean that if a shared runner grabs my jobs, they will run wherever those runners are hosted? And if I register my own runner on my laptop and use that specific runner, will my builds and binaries run on my computer?
Artifacts
In order to run/test code, we need the binaries, which are grabbed as artifacts from the build stage. For the build part, if I use cmake, for example, in the script part of the CI.yml file I create a build directory, call cmake .. and so on. Once my job is successful, if I want the binary I have to go into GitLab and retrieve it myself. So my question is: where is everything saved? I notice that the runner, within my project, creates something like refs/pipeline/, but where is this actually? How could I get those files and new directories onto my laptop?
Working space
Pretty much, where is everything happening? The runners, the execution, the artifacts?
Thanks for your time
Everything that happens in each job/step of a pipeline happens on the runner host itself, and depends on the executor you're using (shell, docker, etc.); it does not happen on the GitLab server directly.
If you're using gitlab.com, they have a number of shared runners that the GitLab team maintains and that you can use for your project(s), but as they are shared with everyone on gitlab.com, it can be some time before your jobs are run. However, whether you self-host or use gitlab.com, you can create your own runners specific to your project(s).
If you're using the shell executor, while the job is running you could see the files on the filesystem somewhere, but they are cleaned up after that job finishes. It's not really intended for you to access the filesystem while the job is running. That's what the job script is for.
If you're using the docker executor, the gitlab-runner service will start a docker instance from the image you specify in .gitlab-ci.yml (or use the default that is configurable). Then the job is run inside that docker instance, and it's deleted immediately after the job finishes.
You can add your own runners anywhere -- AWS, spare machine lying around, even your laptop, and jobs would be picked up by any of them. You can also turn off shared runners and force it to be run on one of your runners if needed.
In cases where you need an artifact after a build/preparatory step, it's created on the runner as part of the job as above, but then the runner automatically uploads the artifact to the GitLab server (or another service that implements the S3 protocol, like AWS S3 or Minio). Unless you're using S3/Minio, it will only be accessible through the GitLab UI or through the API. In the UI, however, it will show up on any related MRs and also on the Pipeline page, so it's fairly accessible.
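For illustration (the image name and binary path are placeholders), a minimal .gitlab-ci.yml for the cmake case could look like the sketch below; the artifacts: section is what makes the binary show up on the job and pipeline pages, where you can download it to your laptop or fetch it via the job artifacts API:
build:
  image: gcc:12              # placeholder image for the docker executor; pick one that ships cmake or install it first
  stage: build
  script:
    - mkdir -p build && cd build
    - cmake ..
    - cmake --build .
  artifacts:
    paths:
      - build/my_app         # placeholder binary name; archived and uploaded to GitLab when the job ends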

Keep Gitlab CI pipeline on single runner to ensure cache is shared

I have a local GitLab server running with a few GitLab CI runners. In the past, we've had each runner set up with concurrent = 1, and then when a pipeline runs, any available runner takes any job in each stage.
However, I'd like to start caching dependencies between stages. This means that I must ensure an entire pipeline is run within a single runner instance (I'm trying to avoid uploading caches).
Is it possible for an entire pipeline to be assigned a runner? But have 2+ pipelines run concurrently on multiple runners?
The cache is always stored on the machine where the runner is installed and running [1]. So to share a cache across all your runners, you need to set up an S3 replacement like minio [2] and configure your runners to use it as their cache.
Without uploading (and downloading) the cache to central storage, it is not possible for every runner to access the cache of another runner.
[1]https://docs.gitlab.com/ce/ci/caching/#cache-vs-artifacts
[2]https://docs.gitlab.com/runner/install/registry_and_cache_servers.html#install-your-own-cache-server
Is it possible for an entire pipeline to be assigned a runner?
Yes. Just give every runner a unique tag, then tag every job in your pipeline with the tag of one runner. This will ensure that your pipeline is executed by only one runner. For more, see https://docs.gitlab.com/ce/ci/runners/#using-tags
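As a sketch under those assumptions (the tag and paths are illustrative), every job carries the same runner-specific tag, and the cache is keyed per pipeline so later stages reuse what the first stage produced on that runner:
fetch-deps:
  stage: build
  tags:
    - runner-01              # assumed unique tag registered on exactly one runner
  cache:
    key: "$CI_PIPELINE_ID"   # one cache per pipeline
    paths:
      - .deps/
  script:
    - ./fetch_dependencies.sh

test:
  stage: test
  tags:
    - runner-01              # same tag, so the job lands on the same runner and finds its local cache
  cache:
    key: "$CI_PIPELINE_ID"
    paths:
      - .deps/
    policy: pull             # restore only, don't re-upload
  script:
    - ./run_tests.sh
The catch is that with static tags you choose the runner per pipeline yourself, so several concurrent pipelines only spread across runners if you pick a different tag for each pipeline.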
What you want is currently (GitLab 11.7) not possible (at least on Windows, it seems) without the significant administrative overhead of assigning each runner specifically to each of your jobs. Pinning a specific runner to your project and disabling all shared ones would work too.
There are a handful of issues that prevent this use case, as it is not possible to share the runners' cache even with an S3 blob storage configuration (we tried minio).
One of them is a race condition that prevents the cache from being extracted correctly if subsequent jobs are executed on different nodes. This is especially the case for parallel jobs.
What we tried:
Sharing cache using a SMB folder available on all runner machines
Using minio to share the cache
You can find our bug ticket here:
https://gitlab.com/gitlab-org/gitlab-runner/issues/3920

Jenkins - Artifact handling

I have a Jenkins set-up consisting of one master and two slaves. I have Jenkins jobs (which run only on the slaves) that create binaries on every commit. Currently, Jenkins archives these artifacts into some place within the Jenkins master. When I wish to download the binaries using a bash shell script, I use wget url_link_to_particular_artifact. I wish to change this. I want to copy all the generated artifacts into one common location on the master node, so the URL would remain the same and only the last part would change with respect to the generated binary name. I label my binaries with tags so it is easy to retrieve them later on. Now, is there a plugin which will copy artifacts onto the master node, but to a location that I can provide? The master and slave nodes are all Red Hat Linux machines.
I have already gone through the Artifactory Plugin and I do not wish to use it. I want something really simple to implement. Is there really a need for a web server to be running at the location on the master where I wish to copy the artifacts to? Can I transfer the artifacts from slave to master over SSH? If yes, how?
EDIT:
I have made some progress and I am sort of stuck now. Assuming we have a web server running on the Jenkins master node: is it possible for the slave nodes to send the artifacts to this location, and for the web server to write them into the file system at that location on the master?
This, of course, is possible, but let me explain why it is a bad idea.
Jenkins is not your artifact repository. Indeed, you can store your artifacts in Jenkins, but it was not designed to do so. If you do that for most of your jobs, you will run into problems with disk space, etc., or even race conditions with names.
Not to mention that you don't want to have hundreds or thousands of files in one directory.
A better approach would be to use an artifact repository, such as Nexus, to store your artifacts. You can manage and retrieve them easily through different channels.
Keep in mind that it is nice to keep your Jenkins stateless and to keep your configuration in version control for easy restoration.
If you still want to store your artifacts in one web location, I'd suggest setting up an nginx server and proxying /jenkins calls to Jenkins and /artifacts to your artifacts directory.

GitLab CI and Distributed Build Confusion

I'm relatively new to continuous integration servers. I've been using GitLab (v6.5) for a while to manage projects, but I'd like to begin using the GitLab CI to ensure tests pass and builds succeed.
My testing setup consists of two virtual machines: one machine for GitLab and another machine for the GitLab CI (and runners). However, in production I only have a single machine, which is running GitLab. The GitLab team posted an interesting blog post a while back that emphasized:
If you are running tests on the CI server you are doing it wrong!
It was a very informative post, but I didn't come away feeling like I understood this specific point. Does this mean one shouldn't run GitLab and GitLab CI on the same server? Does it mean one shouldn't run GitLab CI and GitLab CI runners on the same server? Or both? Do I need three servers, one for each task?
From the same post:
Anybody who can push to a branch that is tested on a CI server can easily own that server.
This implies to me that the runners are the security risk, since they can run stuff contained in a commit. If that's the case, what's the typical implementation? Put GitLab and GitLab CI on the same machine, but the runners on a separate machine? Wouldn't it still suck if the runner machine was compromised? So people are okay losing their runner machine as long as their code machine is safe?
I would really like to understand this a bit more, definitely before I implement it in production. Is there any possible yet safe way to implement GitLab, GitLab CI, and GitLab CI runners all on the same machine?
Ideally you're fine running GitLab CI and GitLab on the same host. Others may disagree with me, but the orchestrator (the GitLab CI node) doesn't do any of the heavy lifting. It's strictly job metadata IO and warehousing the results.
With that being said, I would not put the runners on the same machine. GitLab CI runners are resource-intensive and will be executing at full tilt on whichever machine you place them on. If you're running in production, it's a good idea to put these on spot instances to help curb some of the costs of the often CPU/memory-hungry builds - but that can be impractical, as your instances are not always on at that point.
I've had some success with putting my GitLab CI runners in DigitalOcean on small instances. I'm not doing HUGE builds, but the idea is to distribute the workload across several servers so that your CI server:
Is responsive
Can build multiple project builds at once
Can exercise isolation (this is kind of arbitrary in this list)
and a few other things that don't come to mind right away.
Hope this helps!
