Basic question
How can I skip an activity within a pipeline in Azure Data Factory if the pipeline runs in debug mode?
Background information
I have a complex pipeline setup (one master pipeline that triggers multiple sub-pipelines) which also sends failure messages if some activities fail. When testing things in debug mode, the failure messages are also sent. This should not happen, to avoid spamming the recipients.
Current approach
I could use the system variable @pipeline().TriggerType, which has the value Manual for debug runs, pass that information as a parameter from the master pipeline through every single sub-pipeline, and check the trigger type before sending the message (if triggerType != Manual). But this would mean a lot of changes and more things to consider when creating new pipelines, because that parameter would then always need to be there.
Does anyone have a better idea? Any idea how I can check in a sub-pipeline if the whole process was initially triggered via a scheduled trigger or as a debug run?
Currently we can't disable or skip an activity in an ADF pipeline during its run
Please submit the feedback for this feature here:
https://feedback.azure.com/forums/270578-data-factory/suggestions/33111574-ability-to-disable-an-activity
You can follow one of these workarounds for now:
Manually delete the activity and click Debug to execute, but don't publish the change
Clone the original pipeline, delete the activities that you need to skip, and save the copy with a DEBUG suffix so it's easy to identify; then run that pipeline whenever you need to debug
Perform the steps using a parameter, as you mentioned (a rough sketch follows below)
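For option 3, a minimal sketch of the check, assuming the master pipeline passes the trigger type down as a triggerType parameter (as in your example): wrap the fail-message activity in an If Condition whose expression is something like

@not(equals(pipeline().parameters.triggerType, 'Manual'))

so the message is only sent for triggered (non-manual, non-debug) runs.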
Thanks
Related
I have a simple pipeline in ADF that is triggered by a Logic App every time someone submits a file as a response in a Microsoft Forms form. The pipeline creates a cluster based on a Docker image and then uses a Databricks notebook to run some calculations that can take several minutes.
The problem is that every time the pipeline is running and someone submits a new response to the form, it triggers another pipeline run that, for some reason, makes the previous runs fail.
The last pipeline will always work fine, but earlier runs will show this error:
> Operation on target "notebook" failed: Cluster 0202-171614-fxvtfurn does not exist
However, checking the parameters of the last pipeline run, it uses a different cluster ID, for example 0202-171917-e616dsng.
It seems that, for some reason, the compute resources for the first run are reallocated to the new pipeline run, even though the cluster IDs are different.
I have set the concurrency to 5 in the pipeline's general settings tab, but I am still getting the same error.
Concurrency setup screenshot
Also, in the first connector, which looks up the Docker image files, I have the concurrency set to 15, but this doesn't fix the issue
look up concurrency screenshot
To me, it seems a very simple and common task when it comes to automation and data workflows, but I cannot figure it out.
I really appreciate any help and suggestions, thanks in advance
The best way would be to use an existing pool rather than recreating the pool every time
I have an Azure Pipeline, A, that executes a deployment to my Salesforce org in the event of a PR merge.
My problem statement is:
I am not able to restrict the execution of this pipeline such that it executes only after the previous execution of the same pipeline has completed.
In other words, if this pipeline is triggered by multiple PRs, then I would want only one instance of the pipeline to run at a time. The next one should wait until the previous run has completed.
Is there a way to achieve this?
You can enable the "Batch changes while a build is in progress" option to execute one run of the pipeline at a time.
If your question is about a release pipeline, you can achieve this by specifying the number of parallel deployments in the "Deployment queue settings" under the pre-deployment conditions for the particular stage.
If you are using YAML you should be able to use the following trigger:
trigger:
  batch: boolean # batch changes if true; start a new build for every push if false (default)
https://learn.microsoft.com/en-us/azure/devops/pipelines/yaml-schema?view=azure-devops&tabs=schema%2Cparameter-schema#triggers
Is there a parameter or a setting for running pipelines in sequence in Azure DevOps?
I currently have a single dev pipeline in my Azure DevOps project. I use this for infrastructure because I build, test, and deploy using scripts in multiple stages in my pipeline.
My issue is that my stages are sequential, but my pipelines are not. If I run my pipeline multiple times back-to-back, agents will be assigned to every run and my deploy scripts will therefore run in parallel.
This is an issue if our developers commit close together because each commit kicks off a pipeline run.
You can reduce the number of parallel jobs to 1 in your project settings.
I swear there was a setting on the pipeline as well but I can't find it. You could do an API call as part of your build/release to pause and start the pipeline as well. Pause as the first step and start as the last step. This will ensure the active pipeline is the only one running.
There is a new update to Azure DevOps that allows sequential pipeline runs. All you need to do is add a lockBehavior parameter to your YAML.
https://learn.microsoft.com/en-us/azure/devops/release-notes/2021/sprint-190-update
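A minimal sketch of what that could look like, assuming the deploy stage targets a resource (for example an environment) that has an Exclusive Lock check configured; all names below are illustrative:

lockBehavior: sequential                   # queue runs for the lock in order instead of only the latest run acquiring it

stages:
- stage: DeployInfrastructure
  jobs:
  - deployment: Deploy
    environment: dev-infrastructure        # hypothetical environment with an Exclusive Lock check
    strategy:
      runOnce:
        deploy:
          steps:
          - script: ./deploy.sh            # hypothetical deployment script

With the default runLatest behavior, only the most recent waiting run acquires the lock instead.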
Bevan's solution can achieve what you want, but it has a disadvantage: you need to change the parallel-job number manually back and forth, depending on whether you need parallel jobs or sequential runs. This is a little inconvenient.
As of now, there's no direct configuration to forbid a pipeline from running in parallel. But there is a workaround that uses a demand to limit which agent is used. You can set the demand in the pipeline.
After setting it, you won't need to change the parallel-job number back and forth any more. Just define the demand to limit the agent used; when the pipeline runs, it will pick up the matching agent to execute the run.
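For example, a demand that pins the pipeline to a single named agent could look like this (pool and agent names are hypothetical; demands only apply to self-hosted pools):

pool:
  name: Default                       # hypothetical self-hosted agent pool
  demands:
  - Agent.Name -equals BuildAgent01   # hypothetical: only this one agent satisfies the demand

Because only that one agent can satisfy the demand, additional runs wait in the queue until it is free.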
But, as well, this still has a disadvantage: it also limits job parallelism.
I think this feature should be added to Azure DevOps so that users can have a better experience. You can raise the suggestion in our official suggestion forum and then vote for it. Our product group and PMs will review it and consider taking it into the next quarter's roadmap.
I have set up a PR Pipeline in Azure. As part of this pipeline I run a number of regression tests. These run against a regression test database - we have to clear out the database at the start of the tests so we are certain what data is in there and what should come out of it.
This is all working fine until the pipeline runs multiple times in parallel - then the regression database is being written to multiple times and the data returned from it is not what is expected.
How can I stop a pipeline from running in parallel? I've tried Google but can't find exactly what I'm looking for.
If the pipeline is running, the next build should wait (not for all pipelines; I want to set it on a single pipeline). Is this possible?
Depending on your exact use case, you may be able to control this with the right trigger configuration.
In my case, I had a pipeline scheduled to kick off every time a Pull Request is merged to the main branch in Azure. The pipeline deployed the code to a server and kicked off a suite of tests. Sometimes, when two merges occurred just minutes apart, the builds would fail because they both used a shared resource that required synchronisation.
I fixed it by Batching CI Runs
I changed my basic config
trigger:
- main
to use the more verbose syntax allowing me to turn batching on
trigger:
  batch: true
  branches:
    include:
    - main
With this in place, a new build will only be triggered for main once the previous one has finished, no matter how many commits are added to the branch in the meantime.
That way, I avoid having too many builds being kicked off and I can still use multiple agents where needed.
One way to solve this is to model your test regression database as an "environment" in your pipeline, then use the "Exclusive Lock" check to prevent concurrent "deployment" to that "environment".
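A rough sketch of the pipeline side, assuming a deployment job that targets the fake environment (names and script are illustrative; the Exclusive Lock check itself is added to the environment in the UI):

jobs:
- deployment: RegressionTests
  environment: regression-test-db            # hypothetical environment with an Exclusive Lock check
  strategy:
    runOnce:
      deploy:
        steps:
        - script: ./run-regression-tests.sh  # hypothetical script that clears the DB and runs the tests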
Unfortunately this approach comes with several disadvantages inherent to "environments" in YAML pipelines:
you must set up the check manually in the UI; it's not controlled in source code.
it will only prevent that particular deployment job from running concurrently, not an entire pipeline.
the fake "environment" you create will appear alongside all other environments, cluttering the environment view if you happen to use environments for "real" deployments. This is made worse by the view being a big sack of all environments, with no grouping or hierarchy.
Overall the initial YAML reimplementation of Azure Pipelines mostly ignored the concepts of releases, deployments, environments. A few piecemeal and low-effort aspects have subsequently been patched in, but without any real overarching design or apparent plan to get to parity with the old release pipelines.
You can use the "Trigger Azure DevOps Pipeline" extension by Maik van der Gaag.
It needs to be added to your DevOps organization and configured at the end of the main pipeline, pointing to your test pipeline.
You can find more details on Maik's blog.
According to your description, you could use your own self-hosted agent.
Simply deploy your own self-hosted agents.
Just make sure your self-hosted agent environment is the same as your local development environment.
In this situation, since your agent pool only has one available build agent, only one build will run at a time when multiple builds are triggered. The others will stay in the queue in order and will not start until the prior build has finished.
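As a sketch, pointing the pipeline at such a single-agent self-hosted pool would just be (the pool name is hypothetical):

pool:
  name: SingleAgentPool   # hypothetical self-hosted pool that contains exactly one agent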
For other pipelines, just keep using the hosted agent pool.
I need to spawn a variable number of jobs from one upstream job.
AFAIK, there is no plugin that can do this. The closest one is MultiJob plugin (https://wiki.jenkins-ci.org/display/JENKINS/Multijob+Plugin).
So I thought to create a build step that would use one of the Jenkins APIs (REST, groovy or jenkins-cli) to trigger those builds.
However, if I do that, those builds become "detached" (meaning they do not have an upstream job) and the main job has no linkage with those builds.
So it boils down to this: is it possible to start a job build and tell it who is its upstream?
There is the Build Result Trigger plugin. It is literally the inverse of the Parameterized Trigger plugin. Instead of triggering downstream jobs, like the latter does, the Build Result Trigger lets your "downstream" jobs watch/monitor the progress of an upstream job and trigger based on that result.
This way, your "upstream" job is actually not aware of downstream jobs that are watching it.
Check out the Groovy Plugin.
It'll let you fire as many jobs as you want, and set the upstream cause.
Code Example: http://wiki.jenkins-ci.org/display/JENKINS/Groovy+plugin#Groovyplugin-Retrievingparametersandtriggeringanotherbuild
// Schedule a build of the target job with no quiet period, recording the current build as the upstream cause and passing parameters
job.scheduleBuild2(0, new Cause.UpstreamCause(build), new ParametersAction(params))
A related post is here:
How do I dynamically trigger downstream builds in jenkins?
However, from all the answers that I have read it's clear that using the Groovy/Java class hudson.model.Cause.UpstreamCause(currentBuild) does achieve the goal of programmatically triggering another job, but it does not fully establish an upstream/downstream relationship.
When you examine the builds you do not see any upstream/downstream information. The only way to see those is to open the console output of each.