Azure DevOps pipeline stuck

I am facing the below error when I run the pipeline:
This agent request is not running because you have reached the maximum number of requests that can run for parallelism type 'Microsoft-Hosted Private'. Current position in queue: 1
Note: this is my first job, and I am not running any additional pipelines in the same project.
Please help me sort out this issue.

First of all, you can view in-progress jobs under Parallel jobs in Organization Settings to check whether there is only one running job.
If your organization is newly created, there may be no agent pool available.
Since March, we have temporarily disabled the free grant of parallel jobs for public projects and for certain private projects in new organizations. However, you can request this grant by sending an email to azpipelines-freetier@microsoft.com.
Related release note
For more information about parallel jobs and free grants, see our documentation.

You can create your own private build server to overcome this limit.


Are Azure Pipelines designed to support long-running jobs (running for days, potentially weeks)?

I have a long-running Java/Gradle process and an Azure Pipelines job to run it.
It's perfectly fine and expected for the process to run for several days, potentially over a week. The Azure Pipelines job is run on a self-hosted agent (to rule out any timeout issues) and the timeout is set to 0, which in theory means that the job can run forever.
Sometimes the Azure Pipelines job fails after a day or two with an error message that says "We stopped hearing from agent". Even when this happens, the job may still be running, as evident when SSH-ing into the machine that hosts the agent.
When I discuss investigating these failures with DevOps, I often hear that Azure Pipelines is a CI tool that is not designed for long-running jobs. Is there evidence to support this claim? Does Microsoft commit to only support running jobs within a certain duration limit?
Based on the troubleshooting guide and timeout documentation page referenced above, there's a duration limit applicable to Microsoft-hosted agents, but I fail to see anything similar for self-hosted agents.
Agree with @Daniel Mann.
It's not common to run long-running jobs, but per the documentation it should be supported.
"We stopped hearing from agent" can be caused by a network problem on the agent, or by an agent issue such as high CPU, storage, or RAM pressure. You can check the agent diagnostic logs to troubleshoot.

GitLab analyze job queue waiting time

Is there a built-in way to get an overview of how long jobs, by tag, spend in the queue, to check whether the runners are over- or under-committed? I checked the Admin Area, but did not find anything; have I overlooked something?
If not, are there any existing solutions you are aware of? I tried searching, but my keywords are too broad and so are the results; I have found nothing yet.
Edit: I see the jobs REST API can return all of a runner's jobs and includes created_at, started_at and finished_at; maybe I'll have to analyze that.
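For illustration, a rough sketch of that analysis, assuming python-requests, a token with read_api scope, and a placeholder project ID of 123 (none of these specifics are from the original post):

# Compute how long jobs sat in the queue, using the gap between
# created_at and started_at that the jobs API exposes.
from datetime import datetime
import requests

GITLAB = "https://gitlab.example.com"   # placeholder instance URL
HEADERS = {"PRIVATE-TOKEN": "<token>"}  # placeholder token, read_api scope

def parse(ts):
    # GitLab returns ISO 8601 timestamps such as "2022-11-01T12:34:56.789Z"
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

resp = requests.get(f"{GITLAB}/api/v4/projects/123/jobs",
                    headers=HEADERS, params={"per_page": 100})
resp.raise_for_status()

waits = [(parse(j["started_at"]) - parse(j["created_at"])).total_seconds()
         for j in resp.json() if j.get("started_at")]
if waits:
    print(f"jobs sampled: {len(waits)}, "
          f"mean queue wait: {sum(waits) / len(waits):.1f}s")

Since each job object also carries a tag_list field, grouping the same numbers by tag would answer the by-tag part of the question.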
One way to do this is to use the runners API to list the status of all your active runners. So, at any given moment in time, you can assess how many active jobs you have running by listing each runner's 'running' jobs (use the ?status=running filter).
So, from the above, you should be able to arrive at the number of currently running jobs. You can compare that against your maximum capacity of jobs (the sum of the configured job limits for all your runners).
However, if you are at full capacity, this won't tell you how far over capacity you are or how big the job backlog is. For that, you can look at the pending queue under /admin/jobs. I'm not sure whether there's an official API endpoint to list all the pending jobs, however.
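A minimal sketch of that runners-API approach, assuming python-requests and an admin-level token (the instance URL is a placeholder, and /runners/all needs administrator access):

# Count currently running jobs per runner, and in total, via the runners API.
import requests

GITLAB = "https://gitlab.example.com"   # placeholder instance URL
HEADERS = {"PRIVATE-TOKEN": "<token>"}  # placeholder admin token

runners = requests.get(f"{GITLAB}/api/v4/runners/all",
                       headers=HEADERS, params={"per_page": 100}).json()

total_running = 0
for runner in runners:
    jobs = requests.get(f"{GITLAB}/api/v4/runners/{runner['id']}/jobs",
                        headers=HEADERS, params={"status": "running"}).json()
    print(f"runner {runner['id']} ({runner.get('description', '')}): "
          f"{len(jobs)} running")
    total_running += len(jobs)

print("total running jobs:", total_running)

Comparing total_running against the sum of your configured runner job limits gives the capacity picture described above.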
Another way to do this is through GitLab's Prometheus metrics. Go to the /-/metrics endpoint for your GitLab server (the full URL can be found at /admin/health_check). These are the Prometheus metrics exposed by GitLab for monitoring. Among them is a metric called job_queue_duration_seconds, which you can use to query the queue duration for jobs -- that is, how long jobs are queued before they begin running. You can even break this down project by project, or by whether the runners are shared or not.
To get an average of this time per minute, I've used the following Prometheus query:
sum(rate(job_queue_duration_seconds_sum{jobs_running_for_project=~".*",shard="default",shared_runner="true"}[1m]))
Although the feature is currently deprecated, you could even chart this metric in the built-in metrics monitoring feature, using localhost:9000 (the GitLab server itself) as the configured Prometheus server and the query above.
Of course, any tool that can view Prometheus metrics will work (e.g., Grafana, or your own Prometheus server).
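And if you do run your own Prometheus server against GitLab, the same query can be issued programmatically through the Prometheus HTTP API; a sketch, with a placeholder server URL:

# Ask Prometheus for the current value of the queue-duration rate query.
import requests

QUERY = ('sum(rate(job_queue_duration_seconds_sum{jobs_running_for_project=~".*",'
         'shard="default",shared_runner="true"}[1m]))')

resp = requests.get("https://prometheus.example.com/api/v1/query",
                    params={"query": QUERY})
resp.raise_for_status()
for result in resp.json()["data"]["result"]:
    # Each result value is a [unix_timestamp, value-as-string] pair.
    print(result["value"])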
GitLab 15.6 (November 2022) starts implementing your request:
Admin Area Runners - job queued and duration times
When GitLab administrators get reports from their development team that a CI job is either waiting for a runner to become available or is slower than expected, one of the first areas they investigate is runner availability and queue times for CI jobs.
While there are various methods to retrieve this data from GitLab, those options could be more efficient.
They should provide what users need - a view that makes it more evident if there is a bottleneck on a specific runner.
The first iteration of solving this problem is now available in the GitLab UI.
GitLab administrators can now use the runner details view in Admin Area > Runners to view the queue time for CI jobs and the job execution duration metrics.
See Documentation and Issue.

Azure DevOps self-hosted agent connectivity issues

We are using Azure DevOps self-hosted agents to build and release our application. We often see
the error below, and the agent recovers automatically. Does anyone know what this error is, how to tackle it, and where exactly to check the logs for it?
We stopped hearing from agent <agent name>. Verify the agent machine is running and has a healthy network connection. Anything that terminates an agent process, starves it for CPU, or blocks its network access can cause this error. For more information, see: https://go.microsoft.com/fwlink?Linkid=846610
This seems to be a known issue with both self-hosted and Microsoft-hosted agents that many people have been reporting.
Quoting the reply from @zachariahcox from the Azure Pipelines Product Group:
To provide some context, the Azure Pipelines agent is composed of two processes: agent.listener and agent.worker (one of these per step in the job). The listener is responsible for reporting that workers are still making progress. If the agent.listener is unable to communicate with the server for 10 minutes (we attempt to communicate every minute), we assume something has Gone Wrong and abandon the job.
So, if you're running a private machine, anything that can interfere with the listener's ability to communicate with our server is going to be a problem.
Among the issues I've seen are anti-virus programs identifying it as a threat, local proxies acting up in various ways, the physical machine running out of memory or disk space (quite common), the machine rebooting unexpectedly, someone ctrl+c'ing the whole listener process, the work payload being run at a way higher priority than the listener (thus "starving" the listener out), unit tests shutting down network adapters (quite common), having too many agents at normal priority on the same machine so they starve each other out, etc.
If you think you're seeing an issue that cannot be explained by any of the above (and nothing jumps out at you from the _diag logs folder), please file an issue at https://azure.microsoft.com/en-us/support/devops/
If everything seems to be perfectly alright with your agent and none of the steps mentioned in the Pipeline troubleshooting guide help, please report it on the Developer Community, where the Azure DevOps team and the DevOps community are actively answering questions.
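As a practical aid (this is not from the quoted reply), a small watchdog along the following lines can run on the agent machine to correlate these failures with the resource and network issues listed above; it assumes the psutil package, and the one-minute interval mirrors the listener's reporting cadence:

# Log CPU, memory, disk, and connectivity to dev.azure.com once a minute,
# so an agent dropout can later be matched against resource starvation
# or a network outage in this log.
import socket
import time
import psutil

def azure_reachable(timeout=5):
    try:
        socket.create_connection(("dev.azure.com", 443), timeout=timeout).close()
        return True
    except OSError:
        return False

while True:  # run until killed; point stdout at a log file
    print(f"{time.strftime('%Y-%m-%dT%H:%M:%S')} "
          f"cpu={psutil.cpu_percent(interval=1):.0f}% "
          f"mem={psutil.virtual_memory().percent:.0f}% "
          f"disk={psutil.disk_usage('/').percent:.0f}% "
          f"azure_reachable={azure_reachable()}",
          flush=True)
    time.sleep(60)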

Azure Pipelines: How to block pipeline A if pipeline B is running

I have two pipelines (also called "build definitions") in Azure Pipelines, one executing system tests and one executing performance tests. Both use the same test environment. I have to make sure that the performance pipeline is not triggered while the system test pipeline is running, and vice versa.
What I've tried so far: I can access the Azure DevOps REST API to check whether a build is running for a certain definition. So it would be possible for me to implement a job that executes a script before the actual pipeline runs. The script then just polls the REST API for the build status of the other pipeline every second and times out after e.g. 1 hour.
However, this seems quite hacky to me. Is there a better way to block a build pipeline while another one is running?
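For reference, the polling approach described in the question might look like the sketch below; the organization, project, definition ID, and PAT are placeholders, and it polls every 30 seconds rather than every second to be gentler on the API:

# Block until the other pipeline has no builds in progress, or give up
# after one hour, using the Builds - List REST endpoint.
import base64
import time
import requests

ORG, PROJECT, OTHER_DEFINITION_ID = "myorg", "myproject", 42  # placeholders
PAT = "<personal-access-token>"                               # placeholder
AUTH = {"Authorization": "Basic "
        + base64.b64encode(f":{PAT}".encode()).decode()}

url = (f"https://dev.azure.com/{ORG}/{PROJECT}/_apis/build/builds"
       f"?definitions={OTHER_DEFINITION_ID}"
       f"&statusFilter=inProgress&api-version=6.0")

deadline = time.time() + 3600
while time.time() < deadline:
    resp = requests.get(url, headers=AUTH)
    resp.raise_for_status()
    if resp.json()["count"] == 0:
        break            # other pipeline is idle; let this one proceed
    time.sleep(30)
else:
    raise TimeoutError("other pipeline still running after 1 hour")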
If your project is private, the Microsoft-hosted CI/CD parallel job limit is one free parallel job that can run for up to 60 minutes each time, until you've used 1,800 minutes (30 hours) per month.
The self-hosted CI/CD parallel job limit is one self-hosted parallel job. Additionally, for each active Visual Studio Enterprise subscriber who is a member of your organization, you get one additional self-hosted parallel job.
And currently there is no setting to control a per-agent-pool parallel job limit. But there is a similar problem on the community forum, where an answer has been marked as accepted; I recommend checking whether that answer helps you. Here is the link.

Azure Batch integration with Bitbucket

Is there any way to build and deploy Azure Batch application packages when changes are pushed to a Bitbucket repository?
I'm looking for the same deployment approach as for Azure Functions, or something like this.
To start with, this is what I think off the top of my head.
Cool, I will share some information and thoughts around this; I am sure you can use the information to help shape your idea.
There are two levels of application package:
Pool level; and
Task level
Detailed information here: https://learn.microsoft.com/en-us/azure/batch/batch-application-packages
The pool-level package is set at the pool level and is available to any task joining the pool, whereas the task-level package gets unpacked when the task is created.
Please be aware of the maximum limits on packages etc.:
* https://learn.microsoft.com/en-us/azure/batch/batch-quota-limit#other-limits
Key
AFAIK, there is no flag which can tell the VM that the current package has been updated, hence in your scenario two things can happen:
Pool-level scenario (if you are joining the pool every time): if you can afford to recreate the pool, you can keep the package name, and every time the code is updated you recreate the pool, which ends up creating the whole thing again, i.e. the new package gets picked up.
Task level: if you don't want to recreate the pool all the time, you can create a new task every time your code changes; note the caveat here is the maximum limit described at the link above.
Both ways you can do it via user code, but deciding which scenario suits you best depends on the grand architecture of your case.
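To make the pool-level scenario concrete, here is a hypothetical sketch using the azure-batch Python SDK; the account name, key, VM image, pool settings, and package name/version are all placeholders, and the package is assumed to have been uploaded and activated already:

# Create a pool that references application package "myapp" version "1.0";
# nodes joining this pool will download and unpack that package.
from azure.batch import BatchServiceClient
from azure.batch.batch_auth import SharedKeyCredentials
import azure.batch.models as batchmodels

credentials = SharedKeyCredentials("mybatchaccount", "<account-key>")
client = BatchServiceClient(
    credentials, batch_url="https://mybatchaccount.westeurope.batch.azure.com")

pool = batchmodels.PoolAddParameter(
    id="build-pool",
    vm_size="standard_d2_v3",
    target_dedicated_nodes=1,
    virtual_machine_configuration=batchmodels.VirtualMachineConfiguration(
        image_reference=batchmodels.ImageReference(
            publisher="canonical",
            offer="0001-com-ubuntu-server-focal",
            sku="20_04-lts",
            version="latest"),
        node_agent_sku_id="batch.node.ubuntu 20.04"),
    application_package_references=[
        batchmodels.ApplicationPackageReference(
            application_id="myapp", version="1.0")],
)
client.pool.add(pool)

A Bitbucket Pipelines step could run a script like this (after deleting the old pool) whenever the repository changes, which is what "recreate the pool so the new package gets picked up" amounts to.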
Information flow possibility at the user end:
Some resource lives in Bitbucket.
On any change to that resource, the user packs it in *.zip format and then carries on with the Batch side of things.
The user creates the pool or specifies task-level packages (depending on the detail above); they can also add versions for the same package (be aware of the max limits here).
The package is then available on the VM.
Alternative approach:
There is another way this can be done, which is a non-package way:
Mount a drive to the node at the start task, and have user code make sure that the drive always gets updated with the latest version of the files.
I hope this helps your scenario / design :), thanks!
