I am new to GitLab CI and I am fascinated by it. I have already managed to get pipelines working, even using Docker containers, so I am familiar with the flow of setting up jobs and artifacts. Now I would like to understand how this actually works. My questions are about the following:
Runners
Where is everything actually happening? I mean, which computer is running my builds and executables? I understand that GitLab has its own shared runners available to users. Does this mean that if a shared runner grabs my jobs, they run wherever those runners are hosted? And if I register my own runner on my laptop and use that specific runner, will my builds and binaries run on my computer?
Artifacts
In order to run/test code, we need the binaries, which are grabbed as artifacts from the build stage. For the build part, if I use CMake for example, in the script section of the .gitlab-ci.yml file I create a build directory, call cmake .. and so on. Once my job is successful, if I want the binary I have to go into GitLab and retrieve it myself. So my question is: where is everything saved? I notice that the runner, within my project, creates something like refs/pipeline/, but where is this actually? How could I get those files and new directories onto my laptop?
Working space
Pretty much: where is everything happening? The runners, the execution, the artifacts?
Thanks for your time
Everything that happens in each job/step of a pipeline happens on the host where the runner is installed (which could even be the GitLab server itself), and exactly how it happens depends on the executor you're using (shell, docker, etc.).
If you're using gitlab.com, there are a number of shared runners that the GitLab team maintains and that you can use for your project(s), but since they are shared with everyone on gitlab.com, it can take some time before your jobs are run. However, whether you self-host or use gitlab.com, you can register your own runners specific to your project(s).
If you're using the shell executor, while the job is running you could see the files on the filesystem somewhere, but they are cleaned up after that job finishes. It's not really intended for you to access the filesystem while the job is running. That's what the job script is for.
If you're using the docker executor, the gitlab-runner service will start a Docker container from the image you specify in .gitlab-ci.yml (or from a configurable default). The job is run inside that container, and the container is deleted immediately after the job finishes.
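For illustration, a minimal .gitlab-ci.yml using the docker executor might look like this (the image name and commands are just placeholders, not anything your project requires):

    # The runner pulls this image, runs the script inside a fresh container,
    # and removes the container when the job ends.
    build:
      image: gcc:12            # illustrative image; any image name works
      script:
        - gcc --version
        - echo "this runs inside a throwaway container on the runner host"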
You can add your own runners anywhere -- AWS, a spare machine lying around, even your laptop -- and jobs will be picked up by any of them. You can also turn off shared runners and force jobs to run on one of your runners if needed.
In cases where you need an artifact after a build/preparatory step, it's created on the runner as part of the job as above, but the runner then automatically uploads the artifact to the GitLab server (or to another service that implements the S3 protocol, like AWS S3 or Minio). Unless you're using S3/Minio, it will only be accessible through the GitLab UI or through the API. In the UI, however, it will show up on any related MRs and on the pipeline page, so it's fairly accessible.
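To make the CMake example from the question concrete, here is a rough sketch of a build job that uploads its binary as an artifact (the ubuntu image, the package list, and the build/my_app path are assumptions; adjust them to your project):

    build:
      stage: build
      image: ubuntu:22.04                  # assumed image; anything with a toolchain works
      before_script:
        - apt-get update -qq && apt-get install -y -qq cmake g++ make
      script:
        - mkdir -p build && cd build
        - cmake .. && make
      artifacts:
        paths:
          - build/my_app                   # hypothetical binary name
        expire_in: 1 week                  # optional; keeps the server tidy

On your laptop you would then download build/my_app from the job page in the UI or via the jobs artifacts API, rather than looking for it on the runner's filesystem.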
I'm new to CI/CD.
I installed Gitlab Runner on my VPS for my project.
The first pipeline passed successfully after a push to the master branch.
Questions:
Gitlab Runner does not listen to any port. Instead, it connects to the Gitlab server itself and checks if there are tasks for it. Is that true?
If so, how often does Gitlab Runner connect to gitlab.com?
Is it dangerous to keep Gitlab Runner and the project on the same VPS?
Gitlab Runner does not listen to any port. Instead, it connects to the Gitlab server itself and checks if there are tasks for it. Is that true?
Yes. As far as jobs go, it is exclusively a "pull" mechanism, though the runner may open ports for additional functionality, such as the session server.
If so, how often does Gitlab Runner connect to gitlab.com?
By default, every 3 seconds, to check for new jobs (assuming the concurrency limit has not been reached). This can be configured through the check_interval setting in the runner's config.toml.
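For reference, check_interval lives in the global section of the runner's config.toml; a minimal sketch (the values here are just examples):

    # /etc/gitlab-runner/config.toml -- global settings
    concurrent = 4        # how many jobs this runner process may run at once
    check_interval = 10   # seconds between polls to the GitLab server for new jobs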
Is it dangerous to keep Gitlab Runner and the project on the same VPS?
It can be, so this should be avoided where possible. The degree of danger depends somewhat on which executor you use and how you configure your server.
The file system of your GitLab server is extremely sensitive; it contains many critical secrets and operational components that should not be exposed to anyone (including CI jobs). Besides this, jobs would also have the potential to cause performance problems -- if a job uses too much memory/cpu/IO/etc., it could cause resource contention and harm/crash your other GitLab components on the server.
You might be able to get away with a single VPS if you deploy the server and the runner as separate containers (for example, a Docker or Helm deployment of GitLab plus the runner on the VPS host), using separate logical Docker volumes and setting resource constraints via Docker. But there is still some additional risk in such a setup, and configuration mistakes (which may be easier to make than you might think) can have serious consequences.
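If you do go down that road, a rough docker-compose sketch of the "separate containers with resource limits" idea could look like the following. The image tags, volume names, and memory/CPU numbers are placeholders rather than sizing advice, and note that mounting the Docker socket into the runner (needed for the docker executor) is itself a security trade-off:

    # docker-compose.yml (compose file v2.x so mem_limit/cpus apply without Swarm)
    version: "2.4"
    services:
      gitlab:
        image: gitlab/gitlab-ce:latest
        ports:
          - "80:80"
          - "443:443"
        volumes:
          - gitlab-config:/etc/gitlab
          - gitlab-logs:/var/log/gitlab
          - gitlab-data:/var/opt/gitlab
        mem_limit: 8g          # placeholder limits to contain resource contention
        cpus: 4
      runner:
        image: gitlab/gitlab-runner:latest
        volumes:
          - runner-config:/etc/gitlab-runner
          - /var/run/docker.sock:/var/run/docker.sock   # security trade-off, see above
        mem_limit: 2g
        cpus: 2
    volumes:
      gitlab-config:
      gitlab-logs:
      gitlab-data:
      runner-config: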
I want to implement redundancy in my GitLab runners.
Before creating a new server I am trying with my local machine.
The current setup on my repository:
Working runner (from server)
Non-working runner (from local machine)
I want GitLab to choose the other runner when the selected one is not working.
The problem is that GitLab keeps selecting the non-working runner and failing the pipeline without trying the other runner.
How can I make this work?
Both runners are added:
But as the local (non-working) runner is chosen, the pipeline fails:
This is an interesting edge case, since the runner process itself is still healthy but something fails while running a job. The runner process won't know this until it retrieves a job and tries to run it, so it will keep picking up jobs and keep failing.
Since neither the runner process nor GitLab can catch this edge case, the only option I can see is: when you see a job fail for this reason, pause the runner in the project (as in your screenshot) or, if you're an admin (or can ask one) and you're self-hosting GitLab, pause it for the entire instance. This will prevent any new jobs from running on that runner while you troubleshoot the issue.
You can still run multiple runner processes on different hosts (or even on the same host, by pointing each process at its own config.toml), so you keep redundancy and speed up your pipelines; pausing a broken runner just takes it out of rotation temporarily.
Some quick searching shows that common causes of this problem are the runner's host running out of disk space, or a Docker issue that might be solved by updating to the latest version. Making sure GitLab and the runners are on the latest available versions wouldn't hurt either.
Another option is to open a new issue with GitLab to see if they can address it. The desired behaviour would be that, in the event of a system failure on the runner's host, the runner becomes unhealthy and does not process further jobs.
I have a local GitLab server running with a few GitLab CI runners. In the past, we've had each runner configured with concurrent = 1, and when a pipeline runs, any available runner takes any job in each stage.
However, I'd like to start caching dependencies between stages. This means that I must ensure an entire pipeline is run within a single runner instance (I'm trying to avoid uploading caches).
Is it possible for an entire pipeline to be assigned a runner? But have 2+ pipelines run concurrently on multiple runners?
The cache is always stored on the machine where the runner is installed and running [1]. So to share a cache across all your runners, you need to set up an S3 replacement such as Minio [2] and configure your runners to use it as their cache.
Without uploading (and downloading) the cache to central storage, it is not possible for one runner to access another runner's cache.
[1] https://docs.gitlab.com/ce/ci/caching/#cache-vs-artifacts
[2] https://docs.gitlab.com/runner/install/registry_and_cache_servers.html#install-your-own-cache-server
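For completeness, the runner-side configuration for such a shared cache lives in config.toml and looks roughly like this (the Minio endpoint, credentials, and bucket name are placeholders, and the exact layout varies slightly between runner versions):

    [[runners]]
      name = "my-runner"                             # placeholder
      url = "https://gitlab.example.com/"            # placeholder
      executor = "docker"
      [runners.cache]
        Type = "s3"
        Shared = true                                # let other runners reuse this cache
        [runners.cache.s3]
          ServerAddress = "minio.example.com:9000"   # placeholder Minio endpoint
          AccessKey = "MINIO_ACCESS_KEY"             # placeholder credentials
          SecretKey = "MINIO_SECRET_KEY"
          BucketName = "runner-cache"
          Insecure = false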
Is it possible for an entire pipeline to be assigned a runner?
Yes. Just give every runner a unique tag, then tag every job in your pipeline with the tag of one runner. This ensures that your pipeline will be executed by that one runner only. For more, see https://docs.gitlab.com/ce/ci/runners/#using-tags
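A sketch of what that looks like in .gitlab-ci.yml, assuming you registered one of your runners with the (made-up) tag runner-a:

    stages:
      - build
      - test

    # Every job carries the same tag, so only the runner registered with
    # "runner-a" will pick up any job in this pipeline.
    build:
      stage: build
      tags:
        - runner-a
      script:
        - make build          # illustrative command

    test:
      stage: test
      tags:
        - runner-a
      script:
        - make test           # illustrative command

Note that this pins the whole pipeline to one particular runner rather than to whichever runner happens to start it, so you trade some redundancy for cache locality.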
What you want is currently (GitLab 11.7) not possible (at least on Windows, it seems) without the significant administrative overhead of assigning a specific runner to each of your jobs. Pinning a specific runner to your project and disabling all shared ones would work too.
There are a handful of issues that prevent this use case, as it is not possible to share the runners' cache even with an S3 blob storage configuration (we tried Minio).
One of them is a race condition that prevents the cache from being extracted correctly when subsequent jobs are executed on different nodes. This is especially the case for parallel jobs.
What we tried:
Sharing the cache using an SMB share available on all runner machines
Using Minio to share the cache
You can find our bug ticket here:
https://gitlab.com/gitlab-org/gitlab-runner/issues/3920
I have a Jenkins setup consisting of one master and two slaves. I have Jenkins jobs (which run only on the slaves) that create binaries on every commit. Currently, Jenkins archives these artifacts somewhere within the Jenkins master. When I wish to download the binaries from a bash script, I use wget url_link_to_particular_artifact.
I want to change this. I want to copy all the generated artifacts into one common location on the master node, so that the URL stays the same and only the last part changes with the name of the generated binary. I label my binaries with tags so they are easy to retrieve later on. Now, is there a plugin which will copy artifacts onto the master node, but to a location that I can provide? The master and slave nodes are all Red Hat Linux machines.
I have already gone through the Artifactory plugin and I do not wish to use it; I want something really simple to implement. Is there really a need for a web server running at the location on the master where I wish to copy the artifacts to? Can I transfer the artifacts from slave to master over SSH? If yes, how?
EDIT:
I have made some progress and I am now sort of stuck. Assuming we have a web server running on the Jenkins master node: is it possible for the slave nodes to send the artifacts to it, with the web server writing them into the file system at that location on the master?
This is, of course, possible, but let me explain why it is a bad idea.
Jenkins is not your artifact repository. You can indeed store your artifacts in Jenkins, but it was not designed for that. If you do this for most of your jobs, you will run into problems with disk space and the like, or even race conditions around names.
Not to mention that you don't want to have hundreds or thousands of files in one directory.
A better approach would be to use an artifact repository, such as Nexus, to store your artifacts. You can then manage and retrieve them easily through different channels.
Keep in mind that it is good practice to keep your Jenkins as stateless as possible and to version-control your configuration for easy restoration.
If you still want to serve your artifacts from one web location, I'd suggest setting up an nginx server, proxying /jenkins calls to Jenkins and serving /artifacts from your artifacts directory.
Please note: I am not interested in any enterprise/for-pay (Tower?) solutions here, only solutions available via Ansible's OSS offering.
OK so I've got my Ansible project configured and working perfectly, woo hoo! Looks something like this:
myansible01.example.com:/opt/ansible/
    site.yml
    fizz.yml
    buzz.yml
    group_vars/
    roles/
        common/
            tasks/
                main.yml
            handlers/
                main.yml
        foos/
            tasks/
                main.yml
            handlers/
                main.yml
There's several things I need to accomplish to get this working in a production environment:
I need to be able to automate the deployment of changes to this project
I need to schedule playbooks to be run, say, every 30 seconds (to ensure all managed nodes are always in compliance)
So my concerns:
How are changes usually deployed to live Ansible projects? Say the project is located at myansible01.example.com:/opt/ansible (my Ansible server). Is it sufficient to simply delete the Ansible project root (rm -rf /opt/ansible) and then copy the latest version (containing the changes) back to the same location? What happens if Ansible is currently running plays while I perform this "drop-n-swap"?
It looks like the commercial offering (Ansible Tower) has a scheduling feature built in, but the OSS offering does not. How can I schedule Ansible OSS to run plays at certain times? For instance, I might want certain plays to be run every 30 seconds, to ensure nodes are always in compliance. Is cron sufficient for this, or is there a more standard approach?
For this kind of task you typically want an orchestration engine such as Jenkins to do all your, well, orchestration.
You can set Jenkins to run playbooks on a timer or on other events, such as a push to an SCM like Git.
Typically a job starts by checking out a tag/branch of your Ansible code base and then applying it to all of your specified servers, so you always know exactly what is being run. If you want, this can simply be the head of master (in Git terms) so it always applies the most recent changes. If you also hook this into your SCM repo, then a simple push will cause those changes to be applied to all of your servers.
Because of that immediacy you might want to consider only doing this on some test servers that then have some form of testing done against them (such as Serverspec) to verify that your changes are good before rolling them out to a production environment.
Jenkins, by default, will not run a job while the same job is already running (or if you are maxed out on executor slots), so you can be sure that it will only pull the repo (including any changes) after your Ansible run is complete. If you have multiple jobs, you can use blocking to prevent them running at the same time (and potentially applying different configurations to the servers), but you don't have to worry about a newly started job pulling the repo into an already running job, because Jenkins separates these into separate workspaces.
We use Jenkins for manual runs of Ansible against our environment, but we also have a "self-healing" Jenkins job that simply runs a tagged commit of our Ansible code base against the environment, forcing it back to an idempotent state to prevent natural configuration drift. When we need to do something different to the environment, or are running a slightly newer commit of the code base against it, we can easily disable the self-healing job until we're happy with things, and then either re-enable the job to put things back or advance the tag that Jenkins uses to the more recent commit.