We are currently migrating our CI/CD pipelines to GitLab. For testing purposes we are using a runner deployed on OpenShift 4 via the GitLab Runner operator. With the default configuration (we are still at a very early stage) we can spin up runners normally and without issues, but we are noticing long delays between jobs. For example, a build job whose actual build takes about 2 minutes needs almost 8 minutes in total to finish, and the delay occurs even after the job has successfully finished (as is evident from the logs). This in turn means there are long delays between the jobs of a single pipeline.
I took a look at the runner's configuration properties but I am unable to figure out whether we have something misconfigured. For reference, we are using GitLab CE 13.12.15 and the runner in question is version 15.0.0.
Does anyone know how to mitigate this problem?
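In case it helps others, the first things we are checking are how often the runner polls for jobs and whether the build/helper images are re-pulled for every job, since image pulls often account for this kind of per-job overhead. A minimal config.toml sketch (values are illustrative, not recommendations; with the operator these settings map onto the runner's configuration resource):

```toml
concurrent = 4          # how many jobs this runner process may run at once
check_interval = 3      # seconds between polls of GitLab for new jobs

[[runners]]
  name = "openshift-runner"        # placeholder name
  executor = "kubernetes"
  [runners.kubernetes]
    pull_policy = "if-not-present" # avoid re-pulling images on every job
    poll_interval = 3              # seconds between pod status checks
    poll_timeout = 360             # give up waiting for the pod after this long
```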
I have a long-running Java/Gradle process and an Azure Pipelines job to run it.
It's perfectly fine and expected for the process to run for several days, potentially over a week. The Azure Pipelines job is run on a self-hosted agent (to rule out any timeout issues) and the timeout is set to 0, which in theory means that the job can run forever.
Sometimes the Azure Pipelines job fails after a day or two with the error message "We stopped hearing from agent". Even when this happens, the job may still be running, as is evident when SSH-ing into the machine that hosts the agent.
When I discuss investigating these failures with DevOps, I often hear that Azure Pipelines is a CI tool that is not designed for long-running jobs. Is there evidence to support this claim? Does Microsoft commit to supporting jobs only up to a certain duration limit?
Based on the troubleshooting guide and timeout documentation page referenced above, there's a duration limit applicable to Microsoft-hosted agents, but I fail to see anything similar for self-hosted agents.
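For reference, this is how I have declared the unlimited timeout; a minimal sketch of the relevant YAML (the pool name and script are placeholders):

```yaml
jobs:
- job: long_running_process
  pool:
    name: MySelfHostedPool   # placeholder self-hosted pool name
  timeoutInMinutes: 0        # on self-hosted agents, 0 means effectively unlimited
  steps:
  - script: ./gradlew run --no-daemon   # placeholder for the long-running Gradle process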
Agree with @Daniel Mann.
It's not common to run long-running jobs, but as per the docs it should be supported.
"We stopped hearing from agent" can be caused by a network problem on the agent, or by an agent issue such as high CPU, storage, or RAM usage. You can check the agent diagnostic logs to troubleshoot.
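To expand on the diagnostic logs: a self-hosted agent writes them under the `_diag` folder of its install directory (`Agent_*.log` for the agent host process, `Worker_*.log` per job). A small sketch for surfacing connectivity-related lines; the install path is an assumption, adjust it to yours:

```shell
#!/bin/sh
# Scan an agent's _diag folder for lines that typically accompany
# "We stopped hearing from agent" (network drops, socket errors).
scan_agent_logs() {
  # $1 = agent install directory (path is an assumption; adjust to yours)
  grep -hiE 'disconnect|socket|heartbeat|connection' "$1"/_diag/*.log 2>/dev/null
}

# Example (default install path is a guess):
#   scan_agent_logs "$HOME/myagent"
```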
Normally in my project the pipeline job for tests takes about 20 minutes to finish on GitLab (GitLab runners), versus about 2 minutes locally on macOS. About 2 months ago there was a problem running this job for no apparent reason (no change in the code at all): the job took over 59 minutes and was automatically cancelled by GitLab.
I then spent some time running it locally with gitlab-runner using Docker, and it worked fine. After a few days of investigation, when I ran the same pipeline it simply finished with success, and it stayed that way until last week. Now I again cannot run the pipeline, because the job just hangs for an hour and is then cancelled.
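For reference, the job-level timeout can be raised while investigating, so the job fails on its own terms instead of being cancelled at the default limit; a sketch of the relevant .gitlab-ci.yml keyword (job name and script are placeholders):

```yaml
test:
  stage: test
  timeout: 2h            # job-level timeout; overrides the project default
  script:
    - bundle exec rspec  # placeholder test command
```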
The deployment of the company's product involves several tasks. For example,
task 1 to copy some build files to server A
task 2 to copy some build files to server B
Task 1 or task 2 could fail, and we need to redeploy only the failed task, because each task takes a long time to finish.
I can split the tasks into different stages, but we have a long task list, and once we include staging and production it will be difficult to manage.
So my question is:
Is there an easy way to redeploy individual tasks without editing and disabling the other tasks in the stage?
Or is there a better way to organize multiple stages into one group like 'Staging' or 'Production', so I can get a better visualization of the release stages?
thanks
Update:
Thanks @jessehouwing.
Found there is an option when I click redeploy. See screenshot below.
You can group each stage with one or more jobs, and you can easily retry individual jobs without having to rerun the whole stage. You do get the overhead of each job fetching sources or downloading artifacts, and to use the output of a previous job you need to publish the result. One advantage is that jobs can run in parallel, so your overall duration may actually be shorter that way.
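A sketch of what that layout looks like in YAML (stage, job, and script names are placeholders); each job below can be retried individually from the run summary:

```yaml
stages:
- stage: Staging
  jobs:
  - job: copy_to_server_a
    steps:
    - script: ./deploy.sh serverA   # placeholder deploy step
  - job: copy_to_server_b           # runs in parallel with the job above
    steps:
    - script: ./deploy.sh serverB
- stage: Production
  dependsOn: Staging
  jobs:
  - job: copy_to_prod
    steps:
    - script: ./deploy.sh prod
```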
I'm building multiple projects using a single docker build, generating an image and pushing it to AWS ECR. I've recently noticed that builds that used to take 6-7 minutes now take on the order of 25 minutes. The portion of the Docker build that checks out git repos and builds the projects takes ~5 minutes, but what is really slow are the individual Docker build commands such as COPY, ARG, RUN, ENV, LABEL, etc. Each one takes a very long time, adding roughly 18 minutes. The timings vary quite a bit, even though the build remains generally the same.
When I first noticed this degradation Azure was reporting that their pipelines were impacted by "abuse", which I took as a DDOS against the platform (early April 2021). Now, that issue has apparently been resolved, but the slow performance continues.
Are Azure DevOps builds assigned random agents? Should we be running some kind of cleanup process such as docker system prune etc?
Based on your description:
The timings vary quite a bit, even though the build remains generally the same.
This issue is most likely a performance problem on the hosted agents.
With Azure DevOps, every time you run a pipeline on a hosted agent, the system randomly assigns a fresh qualified agent. Because each build gets a brand-new agent, there is no need to run any cleanup process such as docker system prune.
To verify this, you could set up your own private agent and check whether the build time differs as much from run to run (the first build may take a bit longer because there is no local cache yet).
By the way, if you still want to determine whether a decline in hosted-agent performance is causing your problem, you should contact the product team directly; they can check the region where your organization is located to determine whether there is degradation in that region.
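If you do move to a private agent, a periodic cleanup step is worth having, since a self-hosted agent's Docker cache grows between builds. A hedged sketch of such a step (whether you also want the --volumes flag depends on your caching strategy):

```yaml
steps:
- script: docker system prune -af   # remove unused images, containers, and networks
  displayName: Clean Docker build cache (self-hosted agents only)
  condition: always()               # run even if earlier steps failed
```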
I have two pipelines (also called "build definitions") in azure pipelines, one is executing system tests and one is executing performance tests. Both are using the same test environment. I have to make sure that the performance pipeline is not triggered when the system test pipeline is running and vice versa.
What I've tried so far: I can query the Azure DevOps REST API to check whether a build is running for a certain definition. So it would be possible to add a job that runs a script before the actual pipeline: the script polls the other pipeline's build status via the REST API every second and times out after e.g. 1 hour.
However, this seems quite hacky to me. Is there a better way to block a build pipeline while another one is running?
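For completeness, the polling approach I have in mind can be sketched like this (the organization, project, definition ID, and the AZURE_PAT variable are placeholders; the endpoint is the Builds list REST API):

```shell
#!/bin/sh
# Poll the Azure DevOps Builds REST API until no build of the given
# definition is in progress, then let the pipeline continue.

builds_in_progress() {
  # Extract the "count" field from a Builds API JSON response on stdin.
  sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

wait_for_pipeline() {
  org=$1; project=$2; def_id=$3
  url="https://dev.azure.com/$org/$project/_apis/build/builds?definitions=$def_id&statusFilter=inProgress&api-version=6.0"
  tries=0
  while [ "$tries" -lt 3600 ]; do           # give up after ~1 hour
    count=$(curl -s -u ":$AZURE_PAT" "$url" | builds_in_progress)
    [ "$count" = "0" ] && return 0
    tries=$((tries + 1))
    sleep 1
  done
  return 1
}

# Example: wait_for_pipeline myorg myproject 42
```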
If your project is private, the Microsoft-hosted CI/CD parallel job limit is one free parallel job that can run for up to 60 minutes each time, until you've used 1,800 minutes (30 hours) per month.
The self-hosted CI/CD parallel job limit is one self-hosted parallel job. Additionally, for each active Visual Studio Enterprise subscriber who is a member of your organization, you get one additional self-hosted parallel job.
And currently there is no setting to control a per-agent-pool parallel job limit. However, there is a similar question in the community, and an answer has been marked as accepted; I recommend checking whether it helps you. Here is the link.