Job is pending, waiting for approval, then becomes skipped - Azure

I have an Azure pipeline where the last stage needs approval from an authorised person. The pipeline seems to work well, and when this last stage is reached the status is "Job is pending..." as expected.
The problem is that after a certain time, the job eventually turns to "skipped" status automatically, so the person who should approve doesn't have time to do so.
Unfortunately I can't find what's causing this. How would I go about debugging this issue? Is there any log I can look at that would tell us why the job is being skipped (couldn't find any such log)? If not, any idea what can transition a job from "waiting for approval" to "skipped" without us doing anything?

The problem is that after a certain time, the job eventually turns to "skipped" status automatically.
According to your screenshot, you are using approvals and checks. If the approvers do not approve or reject the request before the specified timeout expires, it is expected behavior that the stage is marked as skipped.
You can check the timeout setting on your resources. By default, it is set to 30 days. You define the timeout in the same place where you define the approvals and checks.
Please note: the maximum timeout is 30 days.
For your reference, you can find more details in the official doc: Define approvals and checks.
Azure Pipelines pauses the execution of a pipeline prior to each stage, and waits for all pending checks to be completed. Checks are re-evaluated based on the retry interval specified in each check. If all checks are not successful until the timeout specified, then that stage is not executed. If any of the checks terminally fails (for example, if you reject an approval on one of the resources), then that stage is not executed.
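For context, here is a minimal YAML sketch of a deployment stage gated by such a check. The stage, job, and environment names are placeholders; the approval itself (and its timeout) is configured on the environment under Approvals and checks, not in the YAML.
stages:
- stage: DeployProduction
  jobs:
  - deployment: DeployProd
    # The approval check (and its timeout, up to 30 days) is attached to this
    # environment in the portal, not defined here.
    environment: production
    strategy:
      runOnce:
        deploy:
          steps:
          - script: echo "Deploying after the approval is granted"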

Related

Is there a way to keep the Azure DataFactory from reporting a pipeline as failed when only one activity has failed?

I have created a pipeline in Azure DataFactory that comprises multiple activities, some of which are used as fallbacks if certain activities fail. Unfortunately, the pipeline is always reported as "failed" in the monitor tab, even if the fallback activities succeed. Can pipelines be set to appear as "succeeded" in the monitoring tab even if one or more activities fail?
Can pipelines be set to appear as "succeeded" in the monitoring tab even if one or more activities fail?
There are three ways to handle this:
Try Catch block: this approach renders the pipeline as succeeded if the Upon Failure path succeeds.
Do If Else block: this approach renders the pipeline as failed, even if the Upon Failure path succeeds.
Do If Skip Else block: this approach renders the pipeline as succeeded if the Upon Failure path succeeds.
Using the above approach you can get a success status even if one or more activities fail.
Reference - https://learn.microsoft.com/en-us/azure/data-factory/tutorial-pipeline-failure-error-handling

Azure DevOps - Continue on Time out error in Pipeline Deployment

If an Azure deployment pipeline stage fails with a timeout error, are there any options available to continue with the next stage? I have tried a few options like continue on error, partial error, etc.; they all work when an ordinary error occurs, but not on a timeout error.
Thank you
For the stage trigger option "Trigger even when the selected stages partially succeed", it requires that the previous stage partially succeeded. If all the tasks of the first stage fail, it won't trigger the second stage; the first stage needs at least one successful task.
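If you are on YAML pipelines rather than classic releases, a rough equivalent is a dependency condition on the next stage. This is a minimal sketch under that assumption; the stage names and deploy script are placeholders.
stages:
- stage: Deploy
  jobs:
  - job: DeployJob
    timeoutInMinutes: 30        # a timeout here marks the job (and stage) as failed
    steps:
    - script: ./deploy.sh       # placeholder deployment step

- stage: PostDeploy
  dependsOn: Deploy
  condition: succeededOrFailed()   # run even if Deploy failed, e.g. on a timeout
  jobs:
  - job: Cleanup
    steps:
    - script: echo "Runs regardless of the Deploy outcome"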

How to cause gitlab to retry on purpose?

From this link,
https://docs.gitlab.com/ee/ci/yaml/#retry
it shows that it is possible to make GitLab retry a job based on certain circumstances. Those circumstances are listed in the 'when' section. How do we cause a script to trigger one of those retry conditions?
Do we return a number? How do we find out which number?
For some reason, a service we're using is sometimes never recognized as ready to be used, so what I want to do is check for readiness for about 10 minutes and, if it's still failing, fail the script with a reason of "stuck_or_timeout_failure" and then have:
retry:
  max: 5
  when:
    - stuck_or_timeout_failure
How do I get there?
This should be possible with GitLab 14.6 (December 2021):
Job failure reason returned in API response
It can be hard to use the API to gather data about why a job failed.
For example, you might want exact failure reasons to make better use of the retry:when keyword.
Now, the failure_reason is exposed in responses from the Jobs API, and it is much easier to gather job failure data.
Thanks to #albert.vacacintora for this contribution!
See Documentation and Issue.
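For completeness, here is a sketch of a job using the retry keyword from the docs linked in the question; the job name and readiness script are hypothetical. Note that stuck_or_timeout_failure is assigned by GitLab itself when a job hits its timeout (a script cannot set the failure reason directly), and retry:max is capped at 2.
wait_for_service:
  timeout: 10 minutes              # let GitLab mark the job as timed out
  script:
    - ./check_service_ready.sh     # hypothetical probe that polls until the service is ready
  retry:
    max: 2                         # GitLab allows at most 2 retries
    when:
      - stuck_or_timeout_failure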

How to send Alert Notification for Failed Job in Google Dataproc?

I am wondering if there is a way to hook in some notifications for jobs submitted in Dataproc. We are planning to use Dataproc to run a streaming application 24/7, but Dataproc doesn't seem to have a way to notify about failed jobs.
Just wondering if Google StackDriver can be used by any means.
Thanks
Suren
Sure, StackDriver can be used to set an alert policy on a defined log-metric.
For example, you can set a Metric Absence policy which will monitor for successful job completion and alert if it's missing for a defined period of time.
Go to Logging in your console and set a filter:
resource.type="cloud_dataproc_cluster"
jsonPayload.message:"completed with exit code 0"
Click on Create Metric; after filling in the details you'll be redirected to the log-metrics page, where you'll be able to create an alert from the metric.
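As a rough sketch of what that alert could look like if you manage it as configuration rather than through the console (for example via the Cloud Monitoring API or gcloud), a metric-absence policy might resemble the following. The log-based metric name dataproc_job_success is an assumption; it would be created first from the filter shown above.
displayName: Dataproc successful job completion absent
combiner: OR
conditions:
- displayName: No successful Dataproc job completion
  conditionAbsent:
    # Assumes a user-defined log-based metric named 'dataproc_job_success'
    # created from the log filter shown above.
    filter: >
      metric.type="logging.googleapis.com/user/dataproc_job_success" AND
      resource.type="cloud_dataproc_cluster"
    duration: 3600s                 # alert if no successful completion for 1 hour
    aggregations:
    - alignmentPeriod: 300s
      perSeriesAligner: ALIGN_COUNT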
As noted in the answer above, log-based metrics can be coerced into providing the functionality the OP requires. But metric absence for long-running jobs implies waiting longer than your best guess at the longest job running time (and you might still get an alert if a job simply takes a bit longer without failing). What 'we' really want is a way of monitoring and alerting on a job status of failed, or on a service completion message indicating failure (like your example), so that we are alerted immediately. Yes, you can define a Stackdriver log-based metric looking for specific strings or values indicating failure, and this 'works', but metrics are measures that are counted (for example, 'how many jobs failed') and require inconvenient workarounds to turn an alert-from-metric into a simple 'this job failed' alert. To make this work, for example, the alert filters on a metric and also needs to specify a mean aggregator over an interval to fire the alert. Nasty :(

What happens to "in-progress" jobs when you deploy a webjob?

Subject says it all really :) Say I've got a pretty busy Azure continuous WebJob that is processing from an Azure queue:
public static void ProcessQueue([QueueTrigger("trigger")] Info info)
{ .... }
If I re-deploy the webjob, I can see that any currently executing job seems to be aborted (I see a "Never Finished" status). Is that job replayed after I release or is it lost forever?
Also, is there a nice way to make sure that no jobs are running when we deploy webjobs, or is it up to the developer to code a solution to that (such as a config flag that is checked every run).
Thanks
When a WebJob that uses the WebJobs SDK picks up a message from a queue, it acquires it with a 10-minute lease. If the job process dies while processing the message, the lease expires after 10 minutes and the message goes back into the queue. If the WebJob is restarted, it will pick up that message again. The message is only deleted if the function completes successfully.
Therefore, if the job dies and restarts immediately, as in the case of a redeploy, it might take up to 10 minutes to pick up the message again. Also, because of this, it is recommended to either save state yourself or make the function idempotent.
In the WebJobs Dashboard you will see two invocations for the same message. One of them will be marked as Never Finished because the function execution never completed.
Unfortunately, there is no out-of-the-box solution to prevent jobs from running during a deploy. You would have to create your own logic that notifies (through a queue message?) that a deploy is about to start and then aborts the host. The host abort will wait for any existing function to stop and will prevent new ones from starting. However, this is a very tricky situation if you have multiple instances of the WebJob because only one of them will get the notification.
