Azure Data Factory V1 - azure

Is it possible to trigger a pipeline in ADF v1 using a PowerShell script?
I found the command "Resume-AzureRmDataFactoryPipeline" to trigger the pipeline, but it does not actually start the pipeline.
Please advise.

It really depends on what your pipeline does, but an alternative method is setting the status of a slice to waiting, with the following powershell cmdlet:
$StartDateTime = (Get-Date).AddDays(-7)
$ResourceGroupName = "YourRGName"
$DSName = "YourDatasetName"
$DataFactoryV1Name = "YourDFv1Name"
Set-AzureRmDataFactorySliceStatus -DataFactoryName $DataFactoryV1Name -DatasetName $DSName -ResourceGroupName $ResourceGroupName -StartDateTime $StartDateTime -Status Waiting
Replace the placeholders with your values and run the script after logging in and selecting a subscription. This sets the affected slices to Waiting; if their start time is in the past, Data Factory runs them immediately.
Hope this helped!

Resume-AzureRmDataFactoryPipeline works only on pipelines that are suspended, because all it does is resume a suspended pipeline in Data Factory.
If you want to start a pipeline, begin with New-AzureRmDataFactoryPipeline, which creates a pipeline for you; if the pipeline already exists, it asks for confirmation before replacing the existing one.
Once that succeeds, use Set-AzureRmDataFactoryPipelineActivePeriod to configure the active period, that is, the time window in which the pipeline's data slices are processed. These cmdlets work only if the data factory already exists.
You can also run Set-AzureRmDataFactoryPipelineActivePeriod on its own to define the active period of an existing pipeline and get it running.

You can use the Set-AzureRmDataFactorySliceStatus cmdlet to reset a slice to the "Pending Execution" state. It also lets you set the same status for upstream slices so that the entire pipeline re-runs.
For reference, see https://learn.microsoft.com/en-us/powershell/module/azurerm.datafactories/set-azurermdatafactoryslicestatus?view=azurermps-5.4.0
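As a rough sketch of the slice-reset approach, the helper below only builds the parameter set for Set-AzureRmDataFactorySliceStatus, so it can be inspected before anything is changed. The function name New-SliceResetParameter and the resource names are made up for illustration. It assumes that the cmdlet's Waiting status is what the portal shows as "Pending Execution" and that -UpdateType UpstreamInPipeline is the option that also resets upstream slices, as the cmdlet reference describes.

```powershell
# Hypothetical helper: builds a splattable hashtable; it does not call Azure itself.
function New-SliceResetParameter {
    param(
        [Parameter(Mandatory)] [string]   $ResourceGroupName,
        [Parameter(Mandatory)] [string]   $DataFactoryName,
        [Parameter(Mandatory)] [string]   $DatasetName,
        [Parameter(Mandatory)] [datetime] $StartDateTime,
        [Parameter(Mandatory)] [datetime] $EndDateTime,
        [switch] $IncludeUpstream
    )

    # Guard against an empty or inverted time window.
    if ($EndDateTime -le $StartDateTime) {
        throw 'EndDateTime must be later than StartDateTime.'
    }

    # Assumed values of -UpdateType: Individual or UpstreamInPipeline.
    $updateType = if ($IncludeUpstream) { 'UpstreamInPipeline' } else { 'Individual' }

    return @{
        ResourceGroupName = $ResourceGroupName
        DataFactoryName   = $DataFactoryName
        DatasetName       = $DatasetName
        StartDateTime     = $StartDateTime
        EndDateTime       = $EndDateTime
        Status            = 'Waiting'
        UpdateType        = $updateType
    }
}

# Placeholder names; replace them with your own.
$sliceParams = New-SliceResetParameter -ResourceGroupName 'YourRGName' `
    -DataFactoryName 'YourDFv1Name' -DatasetName 'YourDatasetName' `
    -StartDateTime (Get-Date).AddDays(-7) -EndDateTime (Get-Date) -IncludeUpstream

# After logging in and selecting a subscription, the actual reset would be:
# Set-AzureRmDataFactorySliceStatus @sliceParams
```

Keeping the call itself commented out makes it easy to print $sliceParams and check the window before any slice is touched.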

Related

How to pass parameters to pipeline during trigger run in Azure Data Factory?

As far as I know, I can pass parameters in a manual run (Trigger now). But what if I want the pipeline to run automatically every day and still pass a parameter without opening the Trigger now page?
Another question: while designing my pipeline, I set up a few parameters with logic linked to them, such as "if the parameter is null, run all tables; if there is a value, run only that table". That is so a user can request a re-run for a specific table.
However, I noticed the message "Parameters that are not provided a value will not be included in the trigger." Does that mean the logic in my pipeline cannot be set up this way if I want to trigger it automatically every day?
Thanks a lot!
Implementing heavy logic in ADF can be difficult. You can set default values for parameters, but I assume those need to be set dynamically?
You could also use pipeline variables: put a "Set variable" activity at the beginning of your pipeline and use expressions so that your activities run based on variables that are set from the parameters.
In our project we did something even more complicated: we deploy and trigger a pipeline once a week from Azure DevOps. So it is not ADF itself that triggers the pipeline but a scheduled Azure DevOps run.
PowerShell:
$parameters = @{
"parameterName1" = $parameterValue
"parameterName2" = $ParameterValue
}
Invoke-AzDataFactoryV2Pipeline -DataFactoryName $DataFactoryName -ResourceGroupName $ResourceGroupName -PipelineName $pipelineName -Parameter $parameters
With PowerShell you can implement any logic you want at this point and pass the resulting values to ADF.
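One way to handle the "run all tables unless a table name is given" case from the question is to build the parameter hashtable conditionally. This is only a sketch: the helper name New-PipelineParameter and the pipeline parameter name tableName are assumptions, not something taken from the original pipeline.

```powershell
# Hypothetical helper: only non-empty values end up in the hashtable, so the
# pipeline's own default applies when no table name is given.
function New-PipelineParameter {
    param([string] $TableName)

    $parameters = @{}
    if (-not [string]::IsNullOrWhiteSpace($TableName)) {
        # 'tableName' is an assumed pipeline parameter name.
        $parameters['tableName'] = $TableName
    }
    return $parameters
}

$allTables = New-PipelineParameter                              # empty hashtable
$oneTable  = New-PipelineParameter -TableName 'dbo.Customers'   # one entry

# Re-run for a single table (needs the Az.DataFactory module and a login):
# Invoke-AzDataFactoryV2Pipeline -ResourceGroupName $ResourceGroupName `
#     -DataFactoryName $DataFactoryName -PipelineName $pipelineName -Parameter $oneTable
```

When the hashtable is empty, leaving out -Parameter altogether is probably the safer call, so that the pipeline falls back to its default values.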

Automation account rerun: a job schedule already exists

I have created a CD pipeline in Azure DevOps that deploys an Azure Automation account together with a runbook, a schedule, and a job schedule through ARM templates.
Everything works fine except when the template is rerun. My template is part of a larger deployment process that is still under construction, so until the total scope is finished, the ARM template that creates the runbook, schedule, and job schedule will rerun with every release.
The problem right now is the following: whenever I rerun the template with a new release pipeline, I receive the following error:
A job schedule for the specified runbook and schedule already exists.
At first I tried to be smart and added a GUID before the name of my job schedule, but the job schedule links the runbook to the schedule, and the deployment was smart enough to figure out that the schedule was already connected to the runbook. Is there a way to solve this within the DevOps mindset and process, so that I can rerun my templates without problems?
The workaround I have created for now is to delete the schedule at every deployment, but that seems like a very bad workaround.
There is a related feature request on the UserVoice / feedback forum that is currently in the triaged state.
According to the Azure documentation, the job schedule ID needs to be unique for each deployment.

How to run a remote command (powershell/bash) against an existing Azure VM in Azure Data Factory V2?

I've been trying to find a way to run a simple command against one of my existing Azure VMs using Azure Data Factory V2.
Options so far:
Custom Activity/Azure Batch won't let me add existing VMs to the pool
Azure Functions - I have not played with this but I have not found any documentation on this using AZ Functions.
Azure Cloud Shell - I've tried this using the browser UI and it works, however I cannot find a way of doing this via ADF V2
The use case is the following:
There are a few tasks running locally (on an Azure VM) in Task Scheduler that I'd like to orchestrate using ADF, as everything else is in ADF. These tasks are usually Python applications that restore a SQL backup and/or purge some folders.
i.e. sqldb-restore -r myDatabase
where sqldb-restore is a command that is recognized locally after installing my local Python library. Unfortunately the Python app needs to live locally on the VM.
Any suggestions? Thanks.
Thanks to @martin-esteban-zurita, his answer helped me to get to what I needed and this was a beautiful and fun experiment.
It is important to understand that Azure Automation is used for many things regarding resource orchestration in Azure (VMs, Services, DevOps), this automation can be done with Powershell and/or Python.
In this particular case I did not need to modify/maintain/orchestrate any Azure resource, I needed to actually run a Bash/Powershell command remotely into one of my existing VMs where I have multiple Powershell/Bash commands running recurrently in "Task Scheduler".
"Task Scheduler" was adding unnecessary overhead to my data pipelines because it was unable to talk to ADF.
In addition, Azure Automation natively only runs PowerShell/Python commands in Azure Cloud Shell. That is very useful for orchestrating resources, such as turning Azure VMs on and off, adding or removing permissions on other Azure services, or running maintenance and purge processes, but I was still unable to run commands locally on an existing VM. This is where the Hybrid Runbook Worker came into the picture. A Hybrid worker group
These are the steps to accomplish this use case.
1. Create an Azure Automation Account
2. Install the Windows Hybrid Worker in my existing VM . In my case it was tricky because my proxy was giving me some errors. I ended up downloading the Nuget Package and manually installing it.
.\New-OnPremiseHybridWorker.ps1 -AutomationAccountName <NameofAutomationAccount> -AAResourceGroupName <NameofResourceGroup> `
-OMSResourceGroupName <NameofOResourceGroup> -HybridGroupName <NameofHRWGroup> `
-SubscriptionId <AzureSubscriptionId> -WorkspaceName <NameOfLogAnalyticsWorkspace>
Keep in mind that you need to supply your own values for the parameters above. The only value that does not have to exist beforehand is HybridGroupName: the script creates it, and it defines the name of the Hybrid Worker group.
3. Create a PowerShell Runbook
[CmdletBinding()]
Param
(
    [object]$WebhookData  # This parameter must be named WebhookData, otherwise the webhook does not work as expected.
)
$VerbosePreference = 'continue'
#region Verify if Runbook is started from Webhook.
# If the runbook was called from a webhook, WebhookData will not be null.
if ($WebhookData) {
    # Collect properties of WebhookData
    $WebhookName = $WebhookData.WebhookName
    # $WebhookHeaders = $WebhookData.RequestHeader
    $WebhookBody = $WebhookData.RequestBody
    # Convert the JSON body into an object.
    # ($Input is an automatic variable in PowerShell, so a different name is used here.)
    $RequestData = ConvertFrom-Json -InputObject $WebhookBody
    # Write-Verbose "WebhookBody: $RequestData"
    # Write-Output -InputObject ('Runbook started from webhook {0} by {1}.' -f $WebhookName, $From)
}
else {
    Write-Error -Message 'Runbook was not started from Webhook' -ErrorAction stop
}
#endregion
# This is where I run the commands that were in task scheduler
$callBackUri = $RequestData.callBackUri
# This is extremely important for ADF
Invoke-WebRequest -Uri $callBackUri -Method POST
4. Create a Runbook Webhook pointing to the Hybrid Worker's VM
5. Create a webhook activity in ADF where the above PowerShell runbook script will be called via a POST Method
Important Note: When I created the webhook activity it was timing out after 10 minutes (default), so I noticed in the Azure Automation Account that I was actually getting INPUT data (WEBHOOKDATA) that contained a JSON structure with the following elements:
WebhookName
RequestBody (This one contains whatever you add in the Body plus a default element called callBackUri)
All I had to do was to invoke the callBackUri from Azure Automation. And this is why in the PowerShell runbook code I added Invoke-WebRequest -Uri $callBackUri -Method POST. With this, ADF was succeeding/failing instead of timing out.
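The callback handling can be sketched in isolation as follows. The helper name Get-CallbackUri, the sample body, and its URI are invented for illustration; the only thing taken from the description above is that RequestBody is JSON containing a callBackUri element.

```powershell
# Hypothetical helper: pulls the callback URI out of the webhook request body.
function Get-CallbackUri {
    param([Parameter(Mandatory)] [string] $RequestBody)

    $body = ConvertFrom-Json -InputObject $RequestBody
    if (-not $body.callBackUri) {
        # Without this element the webhook activity has nothing to be notified on.
        throw 'The request body does not contain a callBackUri element.'
    }
    return $body.callBackUri
}

# Sample body shaped like the one described above; the URI is a placeholder.
$sampleBody  = '{"task":"sqldb-restore","callBackUri":"https://example.invalid/callback"}'
$callBackUri = Get-CallbackUri -RequestBody $sampleBody

# Inside the runbook, once the real work has finished:
# Invoke-WebRequest -Uri $callBackUri -Method POST
```

Failing early when callBackUri is missing makes a misconfigured activity show up as a runbook error instead of a silent timeout in ADF.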
There are many other details that I struggled with when installing the hybrid worker in my VM but those are more specific to your environment/company.
This looks like a use case that is supported with Azure Automation, using a hybrid worker. Try reading here: https://learn.microsoft.com/en-us/azure/automation/automation-hybrid-runbook-worker
You can call runbooks with webhooks in ADFv2, using the web activity.
Hope this helped!

Pass a value from inside the Azure ADF pipeline to a PowerShell where the pipeline invoked

I want to do some steps in my PowerShell based on a value from an Azure ADF(Azure Data Factory) pipeline. How can I pass a value from an ADF pipeline to the PowerShell, where I invoked this ADF Pipeline? So that, I can do the appropriate steps in the PowerShell based on a value I received from ADF pipeline.
NOTE: I am not looking for the run-status of the pipeline (success, failure etc), but I am looking for some variable-value that we get inside a pipeline - say, a flag-value we obtained from a table using a Lookup activity etc.
Any thoughts?
KPK, the requirement you describe can definitely be fulfilled, though I do not know where your PowerShell scripts run.
You could write your PowerShell script in an HTTP-triggered Azure Function. You can then get the output of the pipeline in PowerShell:
https://learn.microsoft.com/en-us/powershell/module/azurerm.datafactoryv2/invoke-azurermdatafactoryv2pipeline?view=azurermps-4.4.1#outputs
Then pass the value you want to the HTTP-triggered Azure Function as a parameter.
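If the script that invoked the pipeline runs where the Az.DataFactory module is available, another option is to read the activity output directly once the run has finished. The sketch below is an assumption-laden illustration: Select-ActivityOutput and the activity name LookupFlag are hypothetical, the stand-in objects only mimic the ActivityName and Output properties of real activity runs, and it assumes the Lookup output can be converted to JSON text with a firstRow element.

```powershell
# Hypothetical helper: picks the Output of one named activity from a list of activity runs.
function Select-ActivityOutput {
    param(
        [Parameter(Mandatory)] [object[]] $ActivityRun,
        [Parameter(Mandatory)] [string]   $ActivityName
    )

    $match = $ActivityRun |
        Where-Object { $_.ActivityName -eq $ActivityName } |
        Select-Object -First 1
    if (-not $match) {
        throw "No activity run named '$ActivityName' was found."
    }
    return $match.Output
}

# Stand-in objects so the sketch runs without Azure. Real runs would come from:
# $runId = Invoke-AzDataFactoryV2Pipeline -ResourceGroupName $rg -DataFactoryName $df -PipelineName $pipeline
# $runs  = Get-AzDataFactoryV2ActivityRun -ResourceGroupName $rg -DataFactoryName $df `
#              -PipelineRunId $runId -RunStartedAfter (Get-Date).AddHours(-1) -RunStartedBefore (Get-Date).AddHours(1)
$runs = @(
    [pscustomobject]@{ ActivityName = 'CopyData';   Output = '{"rowsCopied":10}' }
    [pscustomobject]@{ ActivityName = 'LookupFlag'; Output = '{"firstRow":{"flag":"Y"}}' }
)

$output = Select-ActivityOutput -ActivityRun $runs -ActivityName 'LookupFlag'
# Assumption: the output converts to JSON text when cast to a string.
$flag = (ConvertFrom-Json -InputObject "$output").firstRow.flag
```

The pipeline run has to be complete before its activity output is read, so in practice the script would first poll Get-AzDataFactoryV2PipelineRun until the status is no longer InProgress.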

trigger azure data factory pipeline

Is there any way to manually trigger an Azure Data Factory pipeline? I would like to have this feature for a demo.
I know that we can suspend and resume a pipeline using PowerShell scripts.
Thanks.
Here is what I would do.
create everything without pipeline active periods
When you want to run the demo update active periods to dates in the past
If you want to run again, update to another date in the past
Updating dates via powershell would look something like this
Set-AzureDataFactoryPipelineActivePeriod -DataFactoryName $DataFactoryName -PipelineName $PipelineName -StartDateTime $DateInPast -EndDateTime $DateOneDayLessInPast -ResourceGroupName $ResourceGroupName -Force
Try sending a POST request to the following URL, for example via Postman:
https://management.azure.com/subscriptions/{subId}/resourceGroups/{resourceGroupName}/providers/Microsoft.DataFactory/factories/{factoryName}/pipelines/{pipelineName}/createRun?api-version=2018-06-01
Replace the {} placeholders with your values.
Remember to add an OAuth token. :)
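The same request can also be made from PowerShell instead of Postman. The following is a minimal sketch rather than a definitive implementation: Get-CreateRunUri is a hypothetical helper, all resource names are placeholders, and $token is assumed to hold a valid OAuth bearer token obtained separately. Note that the factories path in this URL belongs to Data Factory V2.

```powershell
# Hypothetical helper: fills the createRun URL template with concrete values.
function Get-CreateRunUri {
    param(
        [Parameter(Mandatory)] [string] $SubscriptionId,
        [Parameter(Mandatory)] [string] $ResourceGroupName,
        [Parameter(Mandatory)] [string] $FactoryName,
        [Parameter(Mandatory)] [string] $PipelineName,
        [string] $ApiVersion = '2018-06-01'
    )

    # Escape each path segment so unusual characters cannot break the URL.
    $segments = $SubscriptionId, $ResourceGroupName, $FactoryName, $PipelineName |
        ForEach-Object { [uri]::EscapeDataString($_) }

    $template = 'https://management.azure.com/subscriptions/{0}/resourceGroups/{1}' +
                '/providers/Microsoft.DataFactory/factories/{2}/pipelines/{3}' +
                '/createRun?api-version={4}'
    return $template -f ($segments + $ApiVersion)
}

# Placeholder values; replace them with your own.
$uri = Get-CreateRunUri -SubscriptionId '00000000-0000-0000-0000-000000000000' `
    -ResourceGroupName 'YourRGName' -FactoryName 'YourDFv2Name' -PipelineName 'YourPipelineName'

# With a valid bearer token in $token:
# Invoke-RestMethod -Method Post -Uri $uri -Headers @{ Authorization = "Bearer $token" } `
#     -ContentType 'application/json' -Body '{}'
```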
