We are currently experiencing an issue with Azure Pipelines where a job appears to be stuck running, which stops other jobs from being processed. The running job has been cancelled, yet the agent still says it is running. Are there any solutions to this? We've tried deleting the pipeline and turning the agent off and back on again, but no luck. Is this likely to be an Azure bug? We have not hit any caps or limits.
Below you can see there is one running job.
When I click into Azure Pipelines, no processes are running.
But the agent thinks it is running Job 938, even though, as can be seen, it is not running.
Any help appreciated, thanks
Related
I need to know how to stop an Azure Databricks cluster through configuration, without stopping it manually, when it runs indefinitely while executing a job. I would also like to create an email alert for when the job's running time exceeds its usual running time.
You can do this in the Jobs UI: select your job and, under Advanced, edit the Alerts and Timeout values.
This Databricks docs page may help you: https://docs.databricks.com/jobs.html
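If you would rather set the same values through the Jobs API than through the UI, here is a minimal sketch. It assumes the Jobs API 2.1 settings fields `timeout_seconds` and `email_notifications.on_failure`; the helper name, job name, and email address are made up for illustration, so check the field names against the docs for your workspace before relying on it.

```python
import json


def build_job_settings(job_name, timeout_seconds, alert_emails):
    """Return a job settings fragment with a timeout and failure alerts.

    Assumption: field names follow the Databricks Jobs API 2.1; verify
    them against the docs for your workspace.
    """
    if timeout_seconds <= 0:
        raise ValueError("timeout_seconds must be positive")
    return {
        "name": job_name,
        # A run that exceeds this many seconds is stopped.
        "timeout_seconds": int(timeout_seconds),
        "email_notifications": {
            # Addresses notified when a run ends unsuccessfully.
            "on_failure": list(alert_emails),
        },
    }


if __name__ == "__main__":
    # Hypothetical job name and address, two-hour timeout.
    settings = build_job_settings("nightly-etl", 2 * 60 * 60, ["ops@example.com"])
    print(json.dumps(settings, indent=2))
```

The printed dictionary is only the settings fragment; you would merge it into the rest of your job definition before sending it. As far as I know a timed-out run counts as unsuccessful for the failure notification, but confirm that in the docs.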
I'm trying to install Giraph on an HDInsight cluster with Hadoop, using script actions.
About 30 minutes into deploying the cluster, an error shows up.
Deployment failed
Deployment to resource group 'graphs' failed. Additional details from
the underlying API that might be helpful: At least one resource
deployment operation failed. Please list deployment operations for
details. Please see https://aka.ms/arm-debug for usage details.
Thanks in advance.
Thanks a lot for reporting this issue. We found what the issue is and fixed it.
Issue: There’s a deadlock when the Giraph script is provided during cluster creation. The Giraph script waits for /example/jars to be created in DFS (Wasb/ADLS), but /example/jars can only be created after the Giraph script completes. This issue doesn’t repro for runtime scripts, since /example/jars already exists by the time the script is run.
Note: We have created and deployed the fix for the scripts. And I have also tested creating a cluster with the updated version, which works fine. Please test on your side and let me know.
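To make the failure mode concrete, here is a minimal sketch of a bounded wait. It is not the fix that was deployed, just an illustration: a script that polls for a path with no deadline hangs forever if the path can only appear after the script exits, while the same wait with a deadline surfaces as a clear error. `wait_for` and `path_exists` are hypothetical names.

```python
import time


def wait_for(condition, timeout_s, interval_s=1.0,
             sleep=time.sleep, clock=time.monotonic):
    """Poll condition() until it is true or timeout_s elapses.

    Returns True if the condition became true, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


if __name__ == "__main__":
    # Hypothetical stand-in for "does /example/jars exist in DFS yet?".
    # It never becomes true here, which mimics the deadlock.
    def path_exists():
        return False

    if not wait_for(path_exists, timeout_s=0.3, interval_s=0.1):
        print("gave up waiting for /example/jars instead of hanging")
```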
I have enabled the Always On property in the configuration, but long-running jobs are still aborting.
I am running 10 long-running jobs concurrently in one Web App, on the Standard plan. As I understand it, the Standard plan allows 50 scheduled jobs in one Web App, yet I am still seeing aborts. Not all of the jobs abort; only the 3 to 4 jobs that use the most CPU do. It would be great if anybody could come up with an answer. Thanks in advance.
I have been using Google Dataproc for a few weeks now, and ever since I started I have had a problem with canceling and stopping jobs.
It seems like there must be some server other than those created on cluster setup, that keeps track of and supervises jobs.
I have never had a process that does its job without error actually stop when I hit stop in the dev console. The spinner just keeps spinning and spinning.
Cluster restart or stop does nothing, even if stopped for hours.
Only when the cluster is entirely deleted will the jobs disappear... (But wait there's more!) If you create a new cluster with the same settings, before the previous cluster's jobs have been deleted, the old jobs will start on the new cluster!!!
I have seen jobs that terminate on their own due to OOM errors restart themselves after cluster restart! (with no coding for this sort of fault tolerance on my side)
How can I forcefully stop Dataproc jobs? (gcloud beta dataproc jobs kill does not work)
Does anyone know what is going on with these seemingly related issues?
Is there a special way to shutdown a Spark job to avoid these issues?
Jobs keep running
In some cases, errors have not been successfully reported to the Cloud Dataproc service. Thus, if a job fails, it appears to run forever even though it has probably failed on the back end. This should be fixed by a soon-to-be-released version of Dataproc in the next 1-2 weeks.
Job starts after restart
This would be unintended and undesirable. We have tried to replicate this issue and cannot. If anyone can replicate this reliably, we'd like to know so we can fix it! This is probably related to the issue above, where the job has failed but appears to be running, even after a cluster restart.
Best way to shutdown
Ideally, the best way to shutdown a Cloud Dataproc cluster is to terminate the cluster and start a new one. If that will be problematic, you can try a bulk restart of the Compute Engine VMs; it will be much easier to create a new cluster, however.
I am trying out the Kubernetes setup from here. The problem is that it hangs with the following output:
Waiting for cluster initialization.
This will continually check to see if the API for kubernetes is reachable.
This might loop forever if there was some uncaught error during start
up.
...................................................................................................................................................................
And it just hangs there?
According to https://github.com/chanezon/azure-linux/tree/master/coreos/kubernetes, Kubernetes for GIT is currently broken. Unfortunately, I wasn't able to get it running either :(