How to set up an ADF pipeline that isolates every pipeline run and creates its own compute resources? (Azure)

I have a simple pipeline in ADF that is triggered by a Logic App every time someone submits a file as a response in Microsoft Forms. The pipeline creates a cluster based on a Docker image and then uses a Databricks notebook to run some calculations that can take several minutes.
The problem is that whenever the pipeline is running and someone submits a new response to the form, another pipeline run is triggered that, for some reason, causes the previous runs to fail.
The last pipeline run will always work fine, but earlier runs show this error:
 > Operation on target "notebook" failed: Cluster 0202-171614-fxvtfurn does not exist
However, checking the parameters of the last pipeline run, it uses a different cluster ID (0202-171917-e616dsng, for example).
It seems that, for some reason, the compute resources of the first run are reallocated to the new pipeline run, even though the cluster IDs are different.
I have set the concurrency to 5 in the pipeline's general settings tab, but I am still getting the same error.
[Screenshot: pipeline concurrency setting]
Also, the first Lookup activity, which looks up the Docker image files, has its concurrency set to 15, but this doesn't fix the issue either.
[Screenshot: Lookup activity concurrency setting]
To me, this seems like a very simple and common task when it comes to automation and data workflows, but I cannot figure it out.
I would really appreciate any help or suggestions; thanks in advance.

The best way would be to use an existing pool rather than recreating the cluster every time.
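As a rough sketch of that idea, a job cluster can draw its nodes from a pre-created Databricks instance pool by passing instance_pool_id to the Clusters API create call, so each pipeline run still gets its own cluster but reuses warm pool nodes. Everything here (cluster name, runtime version, pool ID, workspace URL) is a placeholder you would substitute with your own values:

```python
import json
import urllib.request

def pool_cluster_request(pool_id: str, num_workers: int = 2) -> dict:
    """Build a Clusters API 2.0 create payload that draws nodes from an
    existing instance pool instead of provisioning fresh VMs each run."""
    return {
        "cluster_name": "adf-job-cluster",    # hypothetical name
        "spark_version": "13.3.x-scala2.12",  # assumption: pick your runtime
        "instance_pool_id": pool_id,          # reuse pre-warmed pool nodes
        "num_workers": num_workers,
    }

payload = pool_cluster_request("pool-0123456789")  # hypothetical pool ID
body = json.dumps(payload).encode()

# Sending the request (requires a real workspace URL and PAT token):
# req = urllib.request.Request(
#     "https://<workspace>.azuredatabricks.net/api/2.0/clusters/create",
#     data=body,
#     headers={"Authorization": "Bearer <token>",
#              "Content-Type": "application/json"},
# )
# cluster_id = json.load(urllib.request.urlopen(req))["cluster_id"]
```

In ADF itself, the equivalent is pointing the Databricks linked service at an instance pool (or an existing interactive cluster) instead of a plain new job cluster.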

Related

Azure Data Factory: skip activity in debug mode

Basic question
How can I skip an activity within a pipeline in Azure Data Factory if the pipeline runs in debug mode?
Background information
I have a complex pipeline setup (one master pipeline that triggers multiple sub-pipelines) which also sends failure messages if some activities fail. When testing things in debug mode, the failure messages are also triggered. This should not happen, to avoid spam.
Current approach
I could use the system variable @pipeline().TriggerType, which has the value Manual for debug runs, pass that information as a parameter from the master pipeline through every single sub-pipeline, and check the trigger type before sending the message (if triggerType != Manual). But this would mean a lot of changes and more things to keep in mind when creating new pipelines, because that parameter would always need to be there.
Does anyone have a better idea? Any idea how I can check, in a sub-pipeline, whether the whole process was initially started by a scheduled trigger or as a debug run?
Currently, we can't disable or skip an activity in an ADF pipeline during its run.
Please submit the feedback for this feature here:
https://feedback.azure.com/forums/270578-data-factory/suggestions/33111574-ability-to-disable-an-activity
For now, you can follow one of these approaches:
1. Manually delete the activity and click Debug to execute, but don't publish it.
2. Clone the original pipeline, delete the activities you need to skip, and save the copy with a DEBUG suffix so it is easy to identify; then run that pipeline whenever you need to debug.
3. Use a parameter, as you mentioned.
Thanks

azure selenium script randomly fails

I have a Selenium script that is triggered every hour in an Azure pipeline to test whether some web pages are working. The weird thing is that this script randomly fails, at least twice a day, because it doesn't find an element. I believe this might happen because the pipeline worker is not fast enough to load the pages.
So I was wondering whether there is a way to solve this issue; as it stands, when the script fails it returns a false positive, which I would like to avoid.
Thank you so much for any help or advice you can offer.
To wait until the page is fully loaded, you can check this similar ticket for the details.
In addition, to make the Azure DevOps pipeline more stable, you can set up a self-hosted agent for the Selenium test.
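The usual fix for "element not found" flakiness is an explicit wait: instead of failing on the first lookup, poll for the element until a timeout. Selenium provides this as WebDriverWait(driver, 10).until(EC.presence_of_element_located(...)); the underlying poll-with-timeout pattern is sketched below in plain Python, with a stand-in callable playing the role of the driver lookup on a slow page:

```python
import time

def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Poll `condition` until it returns a truthy value or `timeout` elapses.
    This mirrors what Selenium's WebDriverWait.until() does internally."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1fs" % timeout)
        time.sleep(poll_interval)

# Stand-in for a slow page: the "element" only appears on the third poll.
attempts = {"n": 0}
def find_element():
    attempts["n"] += 1
    return "element" if attempts["n"] >= 3 else None

found = wait_until(find_element, timeout=5.0, poll_interval=0.01)
```

With a real driver, the equivalent call waits for the page to render before the assertion runs, so a slow agent no longer produces a spurious failure.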

How to determine if a Databricks cluster is ready using the API?

I'm calling the /clusters/events API with PowerShell to check if my Databricks cluster is up and ready for the next step in my setup process. Is this the best approach?
Currently, I grab the array of ClusterEvent and check the most recent ClusterEvent for its ClusterEventType. If it's RUNNING, we're good to go and we move on to the next step.
Recently, I discovered my release pipeline was hanging while checking the cluster status. It turns out that the cluster was in fact running, but its status was DRIVER_HEALTHY, not RUNNING. So I changed my script and everyone is happy again.
Is there an official API call I can make that returns a simple yes/no or true/false, so I don't have to work out which ClusterEventType means the cluster is running?
There is no such API that gives a yes/no answer about the cluster status. You can use the Get command of the Clusters REST API - it returns information about the current state of the cluster, so you just need to wait until it gets to the RUNNING state.
P.S. If you're doing this as part of a release pipeline or something similar, you can look at the Terraform provider for Databricks - it handles waiting for the cluster to be running, among other things, automatically, and you can combine it with provisioning of other Azure resources, etc.
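A minimal sketch of that polling loop, assuming the cluster's `state` string comes from GET /api/2.0/clusters/get?cluster_id=... (the HTTP call itself is injected as a callable here so the loop logic is shown without a live workspace; the question used PowerShell, but the shape is the same):

```python
import time

# Lifecycle states from the Clusters API; these mean the cluster will
# never reach RUNNING, so polling should stop early instead of hanging.
TERMINAL_STATES = {"TERMINATED", "ERROR", "UNKNOWN"}

def wait_for_running(get_state, timeout=600, poll_interval=10):
    """Poll get_state() (a callable returning the cluster's `state` field,
    e.g. parsed from /clusters/get) until it reports RUNNING."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = get_state()
        if state == "RUNNING":
            return True
        if state in TERMINAL_STATES:
            raise RuntimeError(f"cluster reached terminal state {state}")
        time.sleep(poll_interval)
    raise TimeoutError("cluster did not reach RUNNING in time")

# Simulated state sequence standing in for repeated API calls:
states = iter(["PENDING", "PENDING", "RUNNING"])
ready = wait_for_running(lambda: next(states), timeout=5, poll_interval=0.01)
```

Polling the cluster's state field this way avoids the original problem of pattern-matching on whichever ClusterEvent happens to be most recent.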

Setting for running pipelines in sequence - Azure Devops

Is there a parameter or a setting for running pipelines in sequence in Azure DevOps?
I currently have a single dev pipeline in my Azure DevOps project. I use this for infrastructure because I build, test, and deploy using scripts in multiple stages of my pipeline.
My issue is that my stages are sequential, but my pipeline runs are not. If I run my pipeline multiple times back-to-back, agents will be assigned to every run, and my deploy scripts will therefore run in parallel.
This is an issue if our developers commit close together because each commit kicks off a pipeline run.
You can reduce the number of parallel jobs to 1 in your project settings.
I swear there was a setting on the pipeline as well, but I can't find it. You could also make an API call as part of your build/release to pause and start the pipeline: pause it as the first step and start it as the last step. This ensures the active pipeline is the only one running.
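The pause/resume idea above can be done through the "Definitions - Update" REST API: fetch the build definition, flip its queueStatus, and PUT it back. A hedged sketch of the payload change (organization, project, definition contents, and API version below are placeholders, and the actual GET/PUT calls are left as comments):

```python
# Pausing/resuming a pipeline via the Azure DevOps build definitions API.
# URL template with placeholder org/project/id values:
BASE = "https://dev.azure.com/{org}/{project}/_apis/build/definitions/{id}?api-version=6.0"

def with_queue_status(definition: dict, status: str) -> dict:
    """Return a copy of the build definition with queueStatus set.
    'paused' lets new builds queue without starting; 'enabled' resumes."""
    assert status in ("enabled", "paused", "disabled")
    updated = dict(definition)
    updated["queueStatus"] = status
    return updated

# definition = GET BASE (with a PAT in the Authorization header)
definition = {"id": 42, "revision": 7, "queueStatus": "enabled"}  # stubbed
paused = with_queue_status(definition, "paused")
# PUT BASE with `paused` as the JSON body pauses the pipeline as the first
# step; a final step sending "enabled" resumes it.
```

Note that this is fragile if a run fails before the resume step, which is why the lockBehavior approach mentioned below is usually preferable.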
There is a new update to Azure DevOps that will allow sequential pipeline runs. All you need to do is add a lockBehavior parameter to your YAML.
https://learn.microsoft.com/en-us/azure/devops/release-notes/2021/sprint-190-update
Bevan's solution can achieve what you want, but it has the disadvantage that you need to manually change the parallel-job count back and forth, since you sometimes need parallel jobs and other times need runs in sequence. This is a little inconvenient.
As of now, there is no direct configuration to forbid parallel pipeline runs. But there is a workaround: use a demand to limit which agent is used. You can set the demand in the pipeline.
After setting it, you no longer need to change the parallel-job count back and forth. Just define the demand to limit the agent used; when the pipeline runs, it will pick up the matching agent to execute it.
But this, too, has a disadvantage: it also limits job parallelism.
I think this feature should be added to Azure DevOps so users can have a better experience. You can raise the suggestion in the official suggestion forum and vote for it; the product group and PMs will review it and consider taking it onto the next quarter's roadmap.

Limit azure pipeline to only run one after the other rather than in parallel

I have set up a PR Pipeline in Azure. As part of this pipeline I run a number of regression tests. These run against a regression test database - we have to clear out the database at the start of the tests so we are certain what data is in there and what should come out of it.
This is all working fine until the pipeline runs multiple times in parallel - then the regression database is being written to multiple times and the data returned from it is not what is expected.
How can I stop a pipeline running in parallel - I've tried Google but can't find exactly what I'm looking for.
If the pipeline is running, the next build should wait (not for all pipelines - I want to set it on a single pipeline). Is this possible?
Depending on your exact use case, you may be able to control this with the right trigger configuration.
In my case, I had a pipeline scheduled to kick off every time a Pull Request is merged to the main branch in Azure. The pipeline deployed the code to a server and kicked off a suite of tests. Sometimes, when two merges occurred just minutes apart, the builds would fail due to a shared resource that required synchronisation being used.
I fixed it by batching CI runs.
I changed my basic config:
trigger:
- main

to use the more verbose syntax, allowing me to turn batching on:

trigger:
  batch: true
  branches:
    include:
    - main
With this in place, a new build will only be triggered for main once the previous one has finished, no matter how many commits are added to the branch in the meantime.
That way, I avoid having too many builds being kicked off and I can still use multiple agents where needed.
One way to solve this is to model your test regression database as an "environment" in your pipeline, then use the "Exclusive Lock" check to prevent concurrent "deployment" to that "environment".
Unfortunately this approach comes with several disadvantages inherent to "environments" in YAML pipelines:
you must set up the check manually in the UI, it's not controlled in source code.
it will only prevent that particular deployment job from running concurrently, not an entire pipeline.
the fake "environment" you create will appear in alongside all other environments, cluttering the environment view if you happen to use environments for "real" deployments. This is made worse by this view being a big sack of all environments, there's no grouping or hierarchy.
Overall the initial YAML reimplementation of Azure Pipelines mostly ignored the concepts of releases, deployments, environments. A few piecemeal and low-effort aspects have subsequently been patched in, but without any real overarching design or apparent plan to get to parity with the old release pipelines.
You can use "Trigger Azure DevOps Pipeline" extension by Maik van der Gaag.
It needs to add to you DevOps and configure end of the main pipeline and point to your test pipeline.
Can find more details on Maik's blog.
According to your description, you could deploy your own self-hosted agent.
Just make sure your self-hosted agent's environment is the same as your local development environment.
In this situation, since your agent pool only has one available build agent, when multiple builds are triggered only one build will run at a time; the others will stay in the queue in order, waiting for the agent. The next build will not run until the prior build has finished.
For other pipelines, just keep using the hosted agent pool.
