GitLab CI and Distributed Build Confusion

I'm relatively new to continuous integration servers. I've been using GitLab (v6.5) for a while to manage projects, but I'd like to begin using the GitLab CI to ensure tests pass and builds succeed.
My testing setup consists of two virtual machines: one machine for GitLab and another machine for the GitLab CI (and runners). However, in production I only have a single machine, which is running GitLab. The GitLab team posted an interesting blog post a while back that emphasized:
If you are running tests on the CI server you are doing it wrong!
It was a very informative post, but I didn't come away feeling like I understood this specific point. Does this mean one shouldn't run GitLab and GitLab CI on the same server? Does it mean one shouldn't run GitLab CI and GitLab CI runners on the same server? Or both? Do I need three servers, one for each task?
From the same post:
Anybody who can push to a branch that is tested on a CI server can easily own that server.
This implies to me that the runners are the security risk, since they can run stuff contained in a commit. If that's the case, what's the typical implementation? Put GitLab and GitLab CI on the same machine, but the runners on a separate machine? Wouldn't it still suck if the runner machine were compromised? So people are okay with losing their runner machine as long as their code machine is safe?
I would really like to understand this a bit more, definitely before I implement it in production. Is there any possible yet safe way to implement GitLab, GitLab CI, and GitLab CI runners all on the same machine?

Ideally you're fine running GitLab CI and GitLab on the same host. Others may disagree with me, but the orchestrator (the GitLab CI node) doesn't do any of the heavy lifting. It's strictly job metadata I/O and warehousing the results.
With that being said, I would not put the runners on the same machine. GitLab CI runners are resource intensive and will be executing at full tilt on whichever machine you place them on. If you're running in production, it's a good idea to put these on spot instances to help curb some of the costs of the often CPU/memory-hungry builds, though that can be impractical since your instances are not always on at that point.
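For what it's worth, with a present-day gitlab-runner the separation described above mostly amounts to registering the runner from the dedicated build machine and pointing it at your GitLab instance. A rough sketch, assuming the classic registration-token flow; the URL, token, description, and tags are placeholders:

```sh
# Run on the dedicated build machine, not on the GitLab/CI host.
# Values below are placeholders -- take the real URL and token from your
# project's or instance's runner settings page.
gitlab-runner register \
  --non-interactive \
  --url "https://gitlab.example.com/" \
  --registration-token "REGISTRATION_TOKEN" \
  --executor "docker" \
  --docker-image "alpine:latest" \
  --description "dedicated-build-box" \
  --tag-list "dedicated,docker"
```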
I've had some success with putting my GitLab CI runners on small DigitalOcean instances. I'm not doing HUGE builds, but the idea is to distribute the workload across several servers so your CI server:
Is responsive
Can build multiple project builds at once
Can exercise isolation (this is kind of arbitrary in this list)
and a few other things that don't come to mind right away.
Hope this helps!

How often does Gitlab Runner connect to gitlab.com?

I'm new to CI/CD.
I installed Gitlab Runner on my VPS for my project.
The first pipeline passed successfully after a push to the master branch.
Questions:
Gitlab Runner does not listen to any port. Instead, it connects to the Gitlab server itself and checks if there are tasks for it. Is that true?
If so, how often does Gitlab Runner connect to gitlab.com?
Is it dangerous to keep Gitlab Runner and the project on the same VPS?
Gitlab Runner does not listen to any port. Instead, it connects to the Gitlab server itself and checks if there are tasks for it. Is that true?
Yes. As far as jobs go, it is exclusively a "pull" mechanism, though the runner may open ports for additional functionality, such as the session server.
If so, how often does Gitlab Runner connect to gitlab.com?
By default, every 3 seconds to check for new jobs (assuming the concurrency limit has not been reached). This can be configured through the check_interval setting in the runner's config.toml.
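For reference, both the polling interval and the concurrency limit live in the runner's config.toml. A minimal sketch with illustrative placeholder values:

```toml
# /etc/gitlab-runner/config.toml -- illustrative excerpt, placeholder values
concurrent = 4       # global cap on how many jobs this runner process handles at once
check_interval = 3   # seconds between job requests (0 or lower falls back to the default of 3)

[[runners]]
  name = "my-vps-runner"        # placeholder
  url = "https://gitlab.com/"
  token = "RUNNER_TOKEN"        # placeholder; written by `gitlab-runner register`
  executor = "docker"
  [runners.docker]
    image = "alpine:latest"

# (The optional [session_server] block, used for interactive web terminals,
#  is the main case where the runner itself listens on a port.)
```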
Is it dangerous to keep Gitlab Runner and the project on the same VPS?
It can be, so this should be avoided where possible. The degree of danger depends somewhat on which executor you use and how you configure your server.
The file system of your GitLab server is extremely sensitive; it contains many critical secrets and operational components that should not be exposed to anyone (including CI jobs). Besides this, jobs would also have the potential to cause performance problems: if a job uses too much memory/CPU/IO, it could cause resource contention and harm or crash your other GitLab components on the server.
You might be able to get away with a single VPS if you deploy the server and the runner as separate containers (for example with a Docker or Helm deployment of GitLab, and the runner alongside it on the VPS host); you can use separate logical Docker volumes and set resource constraints via Docker (see the sketch below)... but there is still some additional risk in such a scenario, and configuration mistakes (which may be easier to make than one might think) can have serious consequences.
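As a rough illustration of that "separate containers with resource constraints" idea, the runner container could be started something like this; the image tag, paths, and limits are placeholders rather than recommendations:

```sh
# Run gitlab-runner in its own container with its own config volume,
# capped so a runaway job cannot starve the GitLab container on the same VPS.
docker run -d --name gitlab-runner \
  --restart always \
  --memory 2g --cpus 1.5 \
  -v /srv/gitlab-runner/config:/etc/gitlab-runner \
  -v /var/run/docker.sock:/var/run/docker.sock \
  gitlab/gitlab-runner:latest
```

Note that mounting the Docker socket is itself part of the extra risk mentioned above: anything that can talk to that socket effectively controls the host's Docker daemon.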

Gitlab Runners when Gitlab server is down

I have a question regarding GitLab runners. I've searched quite a bit but haven't found any relevant information yet.
I'm going to update the Gitlab server for the company I work for, and we have many Gitlab runners.
There will be downtime during the GitLab server update, and I was wondering what will happen to the active GitLab runners during it?
Will the Gitlab runners function as normal after the update or will I need to register all runners again?
Anyone familiar with this?
Thanks :)
They should work as normal again, but it depends on how exactly the runners are set up. Where are the runners deployed?
E.g. for local development purposes I have deployed some runners locally via Docker and mix their usage with the shared "company" runners.
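If you want to double-check after the upgrade, the runner CLI can confirm that the existing registrations are still accepted without re-registering anything. A quick sketch:

```sh
# On each runner host, once the GitLab server is back up:
gitlab-runner list     # show the runners configured in config.toml
gitlab-runner verify   # ask the server whether each registration is still valid
```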

About gitlab CI runners

I am new to GitLab CI and I am fascinated with it. I have already managed to get pipelines working, even using Docker containers, so I am familiar with the flow of setting up jobs and artifacts. I would now like to understand how this works. My questions are about the following:
Runners
Where is everything actually happening? I mean, which computer is running my builds and executables? I understand that GitLab has its own shared runners that are available to users; does this mean that if a shared runner grabs my job, it runs wherever those runners are hosted? If I register my own runner on my laptop and use that specific runner, will my builds and binaries run on my computer?
Artifacts
In order to run/test code, we need the binaries, which are grabbed as artifacts from the build stage. For the build part, if I use cmake, for example, in the script part of the CI.yml file I create a build directory, call cmake .. and so on. Once my job is successful, if I want the binary I have to go into GitLab and retrieve it myself. So my question is, where is everything saved? I notice that the runner, within my project, creates something like refs/pipeline/, but where is this actually? How can I get those files and new directories onto my laptop?
Working space
Pretty much: where is everything happening? The runners, the execution, the artifacts?
Thanks for your time
Everything that happens in each job/step of a pipeline happens on the runner host itself, and how it happens depends on the executor you're using (shell, docker, etc.); it does not happen on the GitLab server directly.
If you're using gitlab.com, there are a number of shared runners that the GitLab team maintains and that you can use for your project(s), but as they are shared with everyone on gitlab.com, it can take some time before your jobs are run. However, whether you self-host or use gitlab.com, you can create your own runners specific to your project(s).
If you're using the shell executor, while the job is running you could see the files on the filesystem somewhere, but they are cleaned up after that job finishes. It's not really intended for you to access the filesystem while the job is running. That's what the job script is for.
If you're using the docker executor, the gitlab-runner service will start a Docker container from the image you specify in .gitlab-ci.yml (or the configurable default). The job then runs inside that container, and the container is deleted immediately after the job finishes.
You can add your own runners anywhere: AWS, a spare machine lying around, even your laptop, and jobs would be picked up by any of them. You can also turn off shared runners and force jobs to run on one of your own runners if needed (see the sketch below).
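For example, pinning a job to your own laptop runner is typically done with tags; a small sketch of a .gitlab-ci.yml, assuming your runner was registered with a (hypothetical) "laptop" tag:

```yaml
# .gitlab-ci.yml (sketch) -- "laptop" must match a tag on your own runner
build:
  stage: build
  tags:
    - laptop
  script:
    - echo "This job only runs on runners tagged 'laptop'"
```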
In cases where you need an artifact after a build/preparatory step, it's created on the runner as part of the job as above, but then the runner automatically uploads the artifact to the GitLab server (or another service that implements the S3 protocol, like AWS S3 or MinIO). Unless you're using S3/MinIO, it will only be accessible through the GitLab UI or through the API. In the UI, however, it will show up on any related MRs and on the pipeline page, so it's fairly accessible.
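Tying this back to the cmake example in the question: declaring the build output as an artifact makes the runner upload it to the GitLab server, and you can then pull it onto your laptop through the UI or the jobs API. A sketch with placeholder IDs and token:

```yaml
# .gitlab-ci.yml (sketch) -- roughly the cmake flow from the question
build:
  stage: build
  script:
    - mkdir -p build
    - cd build
    - cmake ..
    - make
  artifacts:
    paths:
      - build/          # uploaded to the GitLab server when the job succeeds
    expire_in: 1 week
```

Once the job has finished, the artifacts archive can be downloaded from anywhere that can reach the GitLab API:

```sh
# <project_id>, <job_id>, and the token are placeholders.
curl --header "PRIVATE-TOKEN: <your_access_token>" \
     --output artifacts.zip \
     "https://gitlab.com/api/v4/projects/<project_id>/jobs/<job_id>/artifacts"
```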

Gitlab CI Runner without routable IP

I'd like to use GitLab CI in my workflow, but my project relies on licensed software, so I need it to run on my machine, which does not have a public, routable IP. My thought is that I can create a simple server on Heroku to accept the webhooks and put the requests into a message queue (a Redis DB, for instance), which my local machine can poll to actually run the CI job. It appears, however, that the entire GitLab CI system is written assuming that the gitlab.com server can directly speak to the runner. Does anybody know of a proof of concept for proxying the CI build trigger through a webhook, or for making gitlab-runner pull build jobs rather than accept push events? I could roll my own runner if necessary that polls for build events and runs the commands I need, but it'd be really nice to use the existing, documented infrastructure/file format rather than reinventing the wheel. Thanks for any suggestions.
It turns out that I had misinterpreted the documentation: this already works out of the box with the standard runner, because gitlab-runner pulls jobs from the server over outbound requests (as described above) rather than requiring the server to reach it.

What are the security risks of using Gitlab CI shared test runners?

I am trying to host a new project on GitLab. It is a private Python project. I was able to run some initial tests with GitLab CI.
I don't use a cache while running tests.
While exploring the Runners section in the settings, there is a warning shown:
GitLab Runners do not offer secure isolation between projects that they do builds for. You are TRUSTING all GitLab users who can push code to project A, B or C to run shell scripts on the machine hosting runner X.
What are the security risks of using a shared test runner? Is it safe to run private projects on a shared runner? What precautions can be taken while running tests on a shared runner?
Thank you for any insight.
GitLab CI runner offers the following executor types:
shell
docker
ssh
docker-ssh
parallels
virtualbox
The security concerns you should have are mainly from using ssh and shell runners.
shell is unsafe unless you're in a controlled environment.
This is because it's, literally, a simple shell. The user running your build will have access to everything else going on for that user, and that includes other projects.
ssh is susceptible to man-in-the-middle attacks.
If you're dealing with private crypto keys in your builds, beware that they may be stolen.
Fortunately, http://gitlab.com seems to be sharing only docker runners.
docker runners are generally safe* because every build runs in a new container, so there's nothing to worry about.
You can read further about GitLab CI Runner security in the runner's security documentation.
* unless you're doing the nasty privileged mode!
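For completeness, privileged mode is a per-runner switch in the runner's config.toml; a minimal excerpt (illustrative values):

```toml
# config.toml excerpt (illustrative)
[runners.docker]
  image = "alpine:latest"
  privileged = false   # leave false unless you genuinely need e.g. Docker-in-Docker builds
```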
Using GitLab shared runners does not come without risk. Since GitLab caches the contents of the repositories the runners build, your code is still stored somewhere on GitLab's instances. Whoever can access those instances as a user able to read the cache folder can read yours too.
If you're already running GitLab as SaaS, I guess it doesn't make a big difference.
