I have a continuous integration setup running on Jenkins where every time I push changes to a specific branch, Jenkins starts 50 EC2 slaves. Each slave checks out the branch and run some tests against the branch.
The problem is that the 50 EC2 slaves start all at once and they check out the branch from my GitLab server, which starts a git process for each slave and quickly runs out of memory (despite the fact that the server has 7.5GB of RAM.)
If I'm looking at top, I can see that as soon as the slaves get going, many git processes appear and start to quickly consume all the memory. The kern.log tells me that the kernel has to periodically kill a git process because the system runs out of memory.
My question is how do I limit the number of git processes that GitLab starts in such a way that the slaves don't think GitLab has gone away. (I'd like GitLab to put the requests in a queue and serve them 10 at a time, for example.)
I've also considered some other ideas. For example, I could make just a few of the slaves (seed slaves) pull the branch from GitLab and then have the rest of the slaves pull the branch from those seed slaves. But that seems like it might involve some work.
Another idea is that I could stagger the EC2 launches by a few seconds, to spread the requests over a couple of minutes. Or, I could get more memory.
I welcome ideas for any other approaches for dealing with this.
I think your last idea would be the best :
try to spread the load for your gitlab instance.
For example, you could start your CI by building the artifact :
Assuming you work with say java, you could build the war file on the CI server and distribute it through a shared S3 bucket.
This way, you query gitlab only once per build and run as many copies as you want. Each instance downloads the build and runs its tests.
Related
I want to implement redundancy in my GitLab runners.
Before creating a new server I am trying with my local machine.
The current setup on my repository:
Working runner (from server)
Non working runner (from local machine)
I want GitLab to chose the other runner when the selected is not working.
The thing is that GitLab is selecting the non-working runner and fail the pipeline without trying to run with the other runner.
How can I make this works?
Both runner are added:
But as the local runner (not working) is chosen, the pipeline fails:
This is an interesting edge case since the runner process itself is still healthy, but something while running a job is failing. The runner process won't know this happens until it retrieves a job and tries to run it, so it will keep try to run jobs, and keep failing.
Since neither the Runner process nor Gitlab can catch this edge case, the only option I can see is that when you see a failed job for this reason, pause the Runner in the project (like in your screenshot) or if you're an admin (or can ask an admin), pause it for the entire instance, assuming you're self-hosting Gitlab. This will prevent any new jobs from running on that Runner so you can troubleshoot the issue.
This will let you run multiple Runner processes on different hosts (or even on the same host by specifying separate config.toml files) so you can still get redundancy and speed up your pipelines.
Some quick searching shows that common issues causing this issue are the runner's host running out of disk space, or a Docker issue that might be solved by updating to the latest version. Making sure Gitlab and the runners are the latest available version wouldn't hurt either.
Another option you have is to submit a new Issue with Gitlab to see if they can address it. The desired situation would be that in the event of a runner system failure, the runner should become unhealthy and not process further jobs.
I am new to gitlab CI and I am fascinated with it. I managed already to get the pipelines working even using docker containers, so I am familiar with the flow for setting jobs and artifacts. I just wish now to understand how this works. My questions are about the following:
Runners
Where is actually everything happening? I mean, which computer is running my builds and executables? I understand that Gitlab has its own shared runners that are available to the users, does this mean that if a shared runner grabs my jobs, is it going to run wherever those runners are hosted? If I register my own runner in my laptop, and use that specific runner, my builds and binaries will be run in my computer?
Artifacts
In order to run/test code, we need the binaries, which from the build stage they are grabbed as artifacts. For the build part if I use cmake, for example, in the script part of the CI.yml file I create a build directory and call cmake .. and so on. Once my job is succesful, if I want the binary i have to go in gitlab and retrieve it myself. So my question is, where is everything saved? I notice that the runner, withing my project, creates something like refs/pipeline/, but where is this actually? how could I get those files and new directories in my laptop
Working space
Pretty much, where is everything happening? the runners, the execution, the artifacts?
Thanks for your time
Everything that happens in each job/step in a pipeline happens on the runner host itself, and depends on the executor you're using (shell, docker, etc.), or on the Gitlab server directly.
If you're using gitlab.com, they have a number of shared runners that the Gitlab team maintains and you can use for your project(s), but as they are shared with everyone on gitlab.com, it can be some time before your jobs are run. However, no matter if you self host or use gitlab.com, you can create your own runners specific for your project(s).
If you're using the shell executor, while the job is running you could see the files on the filesystem somewhere, but they are cleaned up after that job finishes. It's not really intended for you to access the filesystem while the job is running. That's what the job script is for.
If you're using the docker executor, the gitlab-runner service will start a docker instance from the image you specify in .gitlab-ci.yml (or use the default that is configurable). Then the job is run inside that docker instance, and it's deleted immediately after the job finishes.
You can add your own runners anywhere -- AWS, spare machine lying around, even your laptop, and jobs would be picked up by any of them. You can also turn off shared runners and force it to be run on one of your runners if needed.
In cases where you need an artifact after a build/preparatory step, it's created on the runner as part of the job as above, but then the runner automatically uploads the artifact to the gitlab server (or another service that implements the S3 protocol like AWS S3 or Minio). Unless you're using S3/minio, it will only be accessible through the gitlab UI interface, or through the API. In the UI however, it will show up on any related MR's, and also the Pipeline page, so it's fairly accessible.
I have a Jenkins set-up consisting of one Master and two Slaves. I have Jenkins jobs (which run only on the slaves) which will create binaries on every commit. Currently, Jenkins archives these artifacts into some place within the Jenkins Master. When i wish to download the binaries using a bash shell script, i use wget url_link_to_particular_artifact. I wish to change this. I want to copy all the generated artifacts into one common location on the master node. So, the url would remain the same and only the last part would change with respect to the generated binary name. I label my binaries with tags so it is easy to retrieve them later on. Now, is there a plugin which will copy artifacts into the master node but to the location that I can provide. The master and slave nodes are all redhat linux machines.
I have already gone through the Artifactory Plugin and I do not wish to use it. I want something really simple to implement. Is there really a need for a web server to be running at the location on the master where I wish to copy the artifacts into? Can i transfer the artifacts from slave to master over SSH? If yes, how?
EDIT:
I have made some progress and I am sort of stuck now: Assuming we have a web-server on the Jenkins master node that is running. Is it possible for the slave nodes to send the artifacts to this location and the web-server sort of writes it into the file system at that location on the Master??
This, of course, is possible, but let me explain to you, why this is a bad idea.
Jenkins is not your artifact repository. Indeed you can store your artifacts in Jenkins, but it was not designed to do so. If you will do that for most of your jobs, you will run into problems with disk space, etc. or even race condition with names.
Not to mention that you don't want to have hundreds or thousands of files in one directory.
Better approach would be to use an artifact repository, such as Nexus to store your artifacts. You can manage and retrieve them easily thru different channels.
Keep in mind that it would be nice to keep your Jenkins in stateless mode and version control your configuration for easy restoration.
If you still want to store your artifacts in one web location, I'd suggest to setup an nginx server, proxy /jenkins calls to jenkins and /artifacts to your artifacts directory.
If there are more than one available runner for a project, how does gitlab ci decide which runner to use?
I have an omnibus gitlab 8.6.6-ee installation, with 2 runners configured. The runners are identical (docker images, config, etc) except that they are running on different computers.
If they are both idle and a job comes in that either of them could run, which one will run?
To add to Rubinum's answer the 'first' runner would be whichever runner that checks in first that meets all criteria. For example, labels could limit which runners certain jobs run on.
Runners query the gitlab server every X seconds to check if there are builds. if there's a build queued and multiple meet criteria, the first to ask will win
Update to answer comments:
Runners communicate through the CI API http://docs.gitlab.com/ce/ci/api/builds.html to get build status. This will eventually imply that it will become a more or less random choosing of the runner based on when it finished the last job and the xamount of msit is waiting to check.
To completely answer the question:
Credit goes to BM5k after digging through the code and finding that x = 3 seconds based on this and this. Also found that:
which machine a docker+machine runner will use once that runner has been selected) reveals that the machine selection is more or less (effectively) random as well
Gitlab CI assign jobs to those runners which are available. If a runner is not availble because of beeing busy then Gitlab CI assign the job to other runners which are available. In your case it will always assign the jobs to the first runner (whoever it is).
If you want to specify the execution on a specific runner/mashine then have a look at my post here.
In my opinion its good to not know which runner will run your build if you are using docker because thats the benefit of the runner/docker archeticture.
Please note: I am not interested in any enterprise/for-pay (Tower?) solutions here, only solutions available via Ansible's OSS offering.
OK so I've got my Ansible project configured and working perfectly, woo hoo! Looks something like this:
myansible01.example.com:/opt/ansible/
site.yml
fizz.yml
buzz.yml
group_vars/
roles/
common/
tasks/
main.yml
handlers
main.yml
foos/
tasks/
main.yml
handlers/
main.yml
There's several things I need to accomplish to get this working in a production environment:
I need to be able to automate the deployment of changes to this project
I need to schedule playbooks to be ran, say, every 30 seconds (to ensure all managed nodes are always in compliance)
So my concerns:
How are changes usually deployed to live Ansible projects? Say the project is located at myansible01.example.com:/opt/ansible (my Ansible server). Is it sufficient to simply delete the Ansible project root (rm -rf /opt/ansible) and then copy the latest (containing changes) Ansible project back to the same location? What happens if Ansible is currently running any plays while I perform this "drop-n-swap"?
It looks like the commercial offering (Ansible Tower) has a scheduling feature built into it, but not the OSS offering. How can I schedule Ansible OSS to run plays at certain times? For instance, I might want certain plays to be ran every 30 seconds, so as to ensure nodes are always within compliance. Is cron sufficient to do this, or is there a more standard approach?
For this kind of task you typically want an orchestration engine such as Jenkins to do all your, well, orchestration.
You can set Jenkins to run playbooks on timers or other events such as a push to an SCM such as git.
Typically a job starts by checking out a tag/branch of our Ansible code base and then applying it to all of our specified servers so you always know what is being run. If you want, this can simply be the head on master (in git terms) so it's always applying the most recent changes. If you were also to have this to hook into your SCM repo then a simple push will force those changes to be applied to all of your servers.
Because of that immediacy you might want to consider only doing this on some test servers that then have some form of testing done against them (such as Serverspec) to verify that your changes are good before rolling them out to a production environment.
Jenkins, by default, will not run a job while the same job is running (or if you are maxed out on executor slots) so you can always be sure that it will only pull the repo (including any changes) after your Ansible run is complete. If you have multiple jobs running you can use blocking to prevent jobs running at the same time (both trying to apply potentially different configurations to the servers) but you don't have to worry about a new job starting and pulling the repo into the already running job as Jenkins separates these into separate work spaces.
We use Jenkins for manual runs of Ansible against our environment but we also have a "self healing" Jenkins job that simply runs a tagged commit of our Ansible code base against our environment, forcing it to an idempotent state to prevent natural drift of configurations. When we need to do something different to the environment or are running a slightly further ahead commit of our code base in to it we can easily disable the self healing job until we're happy with things and then either just re-enable the job to put things back or advance the tag that Jenkins is using to now use the more recent commit.