Trigger an Azure Data Factory pipeline

Is there any way to manually trigger an Azure Data Factory pipeline? I would like to have this feature for a demo.
I know that we can suspend and resume a pipeline using PowerShell scripts.
Thanks.

Here is what I would do.
Create everything without pipeline active periods.
When you want to run the demo, update the active periods to dates in the past.
If you want to run it again, update them to another date in the past.
Updating the dates via PowerShell would look something like this:
Set-AzureDataFactoryPipelineActivePeriod -DataFactoryName $DataFactoryName -PipelineName $PipelineName -StartDateTime $DateInPast -EndDateTime $DateOneDayLessInPast -ResourceGroupName $ResourceGroupName -Force

Try the following URL via Postman:
https://management.azure.com/subscriptions/{subId}/resourceGroups/{resourceGroupName}/providers/Microsoft.DataFactory/factories/{factoryName}/pipelines/{pipelineName}/createRun?api-version=2018-06-01
Replace the {} placeholders with your values.
Remember to add an OAuth token. :)
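If you would rather script that same REST call than use Postman, here is a minimal PowerShell sketch; it assumes the Az module is installed and you are already signed in, and the {subId}/{resourceGroupName}/{factoryName}/{pipelineName} placeholders still need your values:
# Minimal sketch: call the createRun endpoint with an ARM token from the Az module
# (newer Az.Accounts versions may return the token as a SecureString that needs converting)
$token = (Get-AzAccessToken).Token
$uri = "https://management.azure.com/subscriptions/{subId}/resourceGroups/{resourceGroupName}/providers/Microsoft.DataFactory/factories/{factoryName}/pipelines/{pipelineName}/createRun?api-version=2018-06-01"
Invoke-RestMethod -Method Post -Uri $uri -Headers @{ Authorization = "Bearer $token" } -ContentType "application/json" -Body "{}"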

Related

How to pass parameters to pipeline during trigger run in Azure Data Factory?

As far as I know, I can pass parameters in a manual run (Trigger Now). But what if I want to set the pipeline to run automatically every day and still be able to pass a parameter without going to the Trigger Now page?
Another question: during the design of my pipeline I set up a few parameters and logic linked to them, such as "if the parameter is null then run all tables, if there is a value then only run that table", so that a user can re-run a specific table.
However, I noticed the message "Parameters that are not provided a value will not be included in the trigger." Does that mean my logic cannot be set up this way if I want to trigger the pipeline automatically every day?
Thanks a lot!
Implementing heavy ADF logic can be difficult. You can set default values for parameters, but I assume those need to be set dynamically?
You could also use pipeline variables: put a "Set variable" activity at the beginning of your pipeline and drive your activities with expressions based on variables that are initialised from the parameters.
In our project we did something even more complicated: we deploy and trigger a pipeline once a week from Azure DevOps, so it is not ADF itself that triggers the pipeline but a scheduled Azure DevOps run.
PowerShell:
$parameters = @{
    "parameterName1" = $parameterValue
    "parameterName2" = $parameterValue
}
Invoke-AzDataFactoryV2Pipeline -DataFactoryName $DataFactoryName -ResourceGroupName $ResourceGroupName -PipelineName $pipelineName -Parameter $parameters
With PowerShell you can implement whatever logic you want at this point when passing values to ADF.
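As an example of such logic, here is a hedged sketch of the "only run one table when a name is supplied" behaviour from the question, built up before the call; $tableName and the "tableName" parameter are hypothetical names, not something defined by ADF:
# Hypothetical: include the table parameter only when a value was supplied,
# so the pipeline's "null means run all tables" logic still applies
$parameters = @{}
if (-not [string]::IsNullOrEmpty($tableName)) {
    $parameters["tableName"] = $tableName
}
Invoke-AzDataFactoryV2Pipeline -DataFactoryName $DataFactoryName -ResourceGroupName $ResourceGroupName -PipelineName $pipelineName -Parameter $parameters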

Dynamically get KeyVault secret in Azure DevOps PowerShell script

We have an Azure Key Vault task in our release pipeline which downloads some secrets for use in the stage.
In an Inline Azure PowerShell script you can just use the following to get the secret value:
$secretValue = $(nameOfTheSecretInKeyVault)
This works fine.
However, we want to move to using scripts in the repo, i.e. pointing the DevOps task to a file path such as /somePath/myScript.ps1.
So I would need to parameterise the above line of code, since I can't just change the name in the script the way I do with the inline version, but I can't get it to work.
I have tried:
$compositeName = "${someParameter}-Application"
$secretValue1 = $($compositeName)
$secretValue2 = $("${compositeName}")
$secretValue3 = env:$compositeName
$secretValue4 = $(${compositeName})
The first line just builds up the name of the secret it needs to look for. Unfortunately none of these work: attempts #1, #2 and #4 come back with the string name only (they never actually retrieve the secret value), and #3 errors saying the variable doesn't exist.
Is there a way to achieve this, or do I simply need to parameterise the secret and pass it into the script from the ADO task?
Like you, I couldn't figure out a way to access the variables that the log says are loaded by the Download secrets task of the job. It did work in inline mode, but not with a script file.
So instead I leveraged the existing wiring (a variable group linked to my Key Vault) and just run the command myself at the start of my script:
$mySecretValue = (Get-AzKeyVaultSecret -VaultName "myVault" -Name "mySecret").SecretValueText
From there I could use it as any other variable.
Either run your KeyVault tasks first, before your PowerShell script, or do it all in PowerShell.
You will need to create a service connection to your Azure subscription from Azure DevOps. Allow the service connection to access the KeyVault. Access the KeyVault from PowerShell or Azure CLI.
E.g. for PowerShell:
(Get-AzKeyVaultSecret -vaultName "Contosokeyvault" -name "ExamplePassword").SecretValueText
Here is a detailed walk through.
There is also native Key Vault integration now, so you can have your secrets read in as a variable group transparently, with no Key Vault-specific PowerShell code required.
https://learn.microsoft.com/en-us/azure/devops/pipelines/library/variable-groups?view=azure-devops&tabs=yaml#link-secrets-from-an-azure-key-vault
One way to tackle this would be to add a parameter to your script and pass the release variable in when you call it, something like -secretValue $(nameOfTheSecretInKeyVault)
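A minimal sketch of that approach, using hypothetical names (the ADO task would pass -secretValue "$(nameOfTheSecretInKeyVault)" in its arguments field):
# /somePath/myScript.ps1 -- hypothetical script that receives the secret as a parameter
param(
    [Parameter(Mandatory = $true)]
    [string]$secretValue
)
# from here on, use $secretValue like any other variable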
You should be able to use $env:nameOfTheSecretInKeyVault, but remember that . becomes _.
EDIT: Looking at your question again, if you used env:$nameOfTheSecretInKeyVault you would have had an issue; the syntax is $env:<variable_name>.
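For the dynamically built name from the question, a hedged sketch could look like the following; note this assumes the secret is actually exposed as an environment variable (secret variables usually have to be mapped explicitly in the task's env section), and that pipeline variables are upper-cased with . replaced by _ when mapped:
# Hypothetical: build the variable name and read it from the environment
$compositeName = ("${someParameter}-Application" -replace '\.', '_').ToUpper()
$secretValue = [Environment]::GetEnvironmentVariable($compositeName)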
If anyone comes across this in the future and is looking for a Bash alternative, I ended up being able to do it with the following command:
$(az keyvault secret show --name "${secret_name}" --vault-name "${vault_name}" --query "value" | sed "s/\"//g")
This lets you get the value of the vault secret and use it wherever you need it. The sed at the end is needed to drop the " characters that come back from the query.
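Note: adding --output tsv to the az keyvault secret show command should also return the raw value without the surrounding quotes, which would make the sed step unnecessary.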

How to run a remote command (powershell/bash) against an existing Azure VM in Azure Data Factory V2?

I've been trying to find a way to run a simple command against one of my existing Azure VMs using Azure Data Factory V2.
Options so far:
Custom Activity/Azure Batch won't let me add existing VMs to the pool
Azure Functions - I have not played with this, and I have not found any documentation on doing it with Azure Functions.
Azure Cloud Shell - I've tried this using the browser UI and it works, however I cannot find a way of doing it via ADF V2.
The use case is the following:
There are a few tasks running locally (on an Azure VM) in Task Scheduler that I'd like to orchestrate using ADF, as everything else is in ADF. These tasks are usually Python applications that restore a SQL backup and/or purge some folders.
i.e. sqldb-restore -r myDatabase
where sqldb-restore is a command that is recognized locally after installing my local Python library. Unfortunately the Python app needs to live locally on the VM.
Any suggestions? Thanks.
Thanks to @martin-esteban-zurita; his answer helped me get to what I needed, and this was a beautiful and fun experiment.
It is important to understand that Azure Automation is used for many resource orchestration tasks in Azure (VMs, services, DevOps), and this automation can be done with PowerShell and/or Python.
In this particular case I did not need to modify/maintain/orchestrate any Azure resource; I needed to actually run a Bash/PowerShell command remotely on one of my existing VMs, where I have multiple PowerShell/Bash commands running recurrently in Task Scheduler.
Task Scheduler was adding unnecessary overhead to my data pipelines because it was unable to talk to ADF.
In addition, Azure Automation natively only runs PowerShell/Python commands in Azure, which is very useful for orchestrating resources (turning Azure VMs on/off, adding/removing permissions from other Azure services, running maintenance or purge processes, etc.), but I was still unable to run commands locally on an existing VM. This is where the Hybrid Runbook Worker came into the picture.
These are the steps to accomplish this use case.
1. Create an Azure Automation Account
2. Install the Windows Hybrid Worker on the existing VM. In my case it was tricky because my proxy was giving me some errors; I ended up downloading the NuGet package and installing it manually.
.\New-OnPremiseHybridWorker.ps1 -AutomationAccountName <NameofAutomationAccount> -AAResourceGroupName <NameofResourceGroup> `
    -OMSResourceGroupName <NameofOResourceGroup> -HybridGroupName <NameofHRWGroup> `
    -SubscriptionId <AzureSubscriptionId> -WorkspaceName <NameOfLogAnalyticsWorkspace>
Keep in mind that you will need to supply your own values for the parameters above; the only one that does not have to exist yet and will be created for you is HybridGroupName, which defines the name of the Hybrid Worker group.
3. Create a PowerShell Runbook
[CmdletBinding()]
Param
([object]$WebhookData) # this parameter needs to be called WebhookData, otherwise the webhook does not work as expected
$VerbosePreference = 'continue'

#region Verify if Runbook is started from Webhook.
# If the runbook was called from a Webhook, WebhookData will not be null.
if ($WebhookData) {
    # Collect properties of WebhookData
    $WebhookName = $WebhookData.WebhookName
    # $WebhookHeaders = $WebhookData.RequestHeader
    $WebhookBody = $WebhookData.RequestBody
    # Collect individual headers. Input converted from JSON.
    $Input = (ConvertFrom-Json -InputObject $WebhookBody)
    # Write-Verbose "WebhookBody: $Input"
    # Write-Output -InputObject ('Runbook started from webhook {0} by {1}.' -f $WebhookName, $From)
}
else {
    Write-Error -Message 'Runbook was not started from Webhook' -ErrorAction stop
}
#endregion

# This is where I run the commands that were in Task Scheduler

$callBackUri = $Input.callBackUri
# This is extremely important for ADF
Invoke-WebRequest -Uri $callBackUri -Method POST
4. Create a Runbook webhook pointing to the Hybrid Worker's VM
5. Create a Webhook activity in ADF where the above PowerShell runbook script will be called via a POST method
Important note: when I created the Webhook activity it was timing out after 10 minutes (the default). I then noticed in the Azure Automation account that I was actually getting input data (WEBHOOKDATA) containing a JSON structure with the following elements:
WebhookName
RequestBody (This one contains whatever you add in the Body plus a default element called callBackUri)
All I had to do was invoke the callBackUri from Azure Automation, and this is why I added Invoke-WebRequest -Uri $callBackUri -Method POST to the PowerShell runbook code. With this, ADF succeeded/failed instead of timing out.
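If it helps while debugging, the webhook can also be exercised outside of ADF with a manual POST; the URL and callback value below are placeholders, and note that when ADF calls the webhook it injects callBackUri into the body for you:
# Hypothetical manual test of the runbook webhook (both values are placeholders)
$webhookUri = "https://<your-automation-webhook-url>"
$body = @{ callBackUri = "https://example.invalid/callback" } | ConvertTo-Json
Invoke-WebRequest -Uri $webhookUri -Method POST -ContentType "application/json" -Body $body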
There are many other details that I struggled with when installing the hybrid worker in my VM but those are more specific to your environment/company.
This looks like a use case that is supported with Azure Automation, using a hybrid worker. Try reading here: https://learn.microsoft.com/en-us/azure/automation/automation-hybrid-runbook-worker
You can call runbooks with webhooks in ADFv2, using the web activity.
Hope this helped!

How to get the creation date of Azure RM Resources including all resources from Azure

I need to use the command Get-AzureRMResource and return resources created after a particular date. Is it possible to filter the resources by creation date? Can someone please help?
Get-AzureRMResource cannot return the creation date of Azure RM resources. There seems to be no way to get the creation date other than the Activity Log.
But even then, the Activity Log only covers the past 90 days.
For this issue, you could archive the Azure Activity Log; this option is useful if you would like to retain your Activity Log longer than 90 days (with full control over the retention policy) for audit, static analysis, or backup.
Update:
If you want to get resources created after a particular date, try the command below; it returns the resource group deployments with a timestamp after 11/20/2018 1:57:19 AM.
Get-AzureRmResourceGroupDeployment -ResourceGroupName "<ResourceGroupName>" | Where-Object {$_.Timestamp -gt '11/20/2018 1:57:19 AM'}
This information is available via ARM, but you have to call the API directly rather than the PS Get-AzureRMResource (or Get-AzResource) cmdlets.
See Deleting all resources in an Azure Resource Group with age more than x days.
Essentially, you need to add $expand=createdTime to your query parameters, i.e.:
GET
https://management.azure.com/subscriptions/1237f4d2-3dce-4b96-ad95-677f764e7123/resourcegroups?api-version=2019-08-01&%24expand=createdTime
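To turn that into the filter asked for in the question, a rough PowerShell sketch could call the same endpoint and filter on createdTime; this assumes the Az module and an existing sign-in, uses a placeholder cut-off date, and does not handle paging via nextLink:
# Rough sketch: list resources with createdTime and keep those created after a given date
$token = (Get-AzAccessToken).Token
$subId = (Get-AzContext).Subscription.Id
$uri = "https://management.azure.com/subscriptions/$subId/resources?api-version=2019-08-01&`$expand=createdTime"
$resources = (Invoke-RestMethod -Uri $uri -Headers @{ Authorization = "Bearer $token" }).value
$resources | Where-Object { [datetime]($_.createdTime) -gt (Get-Date '2018-11-20') }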
As @kwill suggested, this site can also help you run the command interactively via your browser and return the results for you:
https://learn.microsoft.com/en-us/rest/api/resources/resources/list#code-try-0
Steps below:
Click on the try it now button
Enter your subscription ID
For the parameter name use: $expand
For the parameter value use: createdTime
Then run the query and it should produce a JSON response for you

Azure Data Factory V1

Is it possible to trigger a pipeline in ADF v1 using Powershell script?
I found the command Resume-AzureRmDataFactoryPipeline to trigger the pipeline, but it does not actually start the pipeline.
Please advise.
It really depends on what your pipeline does, but an alternative method is setting the status of a slice to Waiting, with the following PowerShell cmdlet:
$StartDateTime = (Get-Date).AddDays(-7)
$ResourceGroupName = "YourRGName"
$DSName = "YourDatasetName"
$DataFactoryV1Name = "YourDFv1Name"
Set-AzureRmDataFactorySliceStatus -DataFactoryName $DataFactoryV1Name -DatasetName $DSName -ResourceGroupName $ResourceGroupName -StartDateTime $StartDateTime -Status Waiting
Replace these with your own values and run the script after logging in and selecting a subscription. What this does is set some slices to Waiting; if their start date/time is in the past, Data Factory will run them immediately.
Hope this helped!
Resume-AzureRmDataFactoryPipeline will only work on pipelines that are suspended, as it does nothing more than resume a suspended pipeline in Data Factory.
Now, if you want to start a pipeline, begin with New-AzureRmDataFactoryPipeline, which creates the pipeline for you; if the pipeline already exists, it asks for confirmation to replace the existing one.
Once that succeeds, you can use Set-AzureRmDataFactoryPipelineActivePeriod to configure the active period for the data slices. In other words, after you create the pipeline, you specify the period in which data processing occurs by setting the active period for the pipeline, during which the data slices are processed. These cmdlets only run when the data factory has already been created.
You could also choose to run Set-AzureRmDataFactoryPipelineActivePeriod independently to define the active periods of the pipeline and run your data factory.
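For example, a hedged sketch with placeholder names and dates (in v1, setting an active period in the past makes the slices eligible to run right away):
# Hypothetical values; sets a one-day active period in the past so the slices are processed immediately
Set-AzureRmDataFactoryPipelineActivePeriod -ResourceGroupName "YourRGName" -DataFactoryName "YourDFv1Name" -PipelineName "YourPipelineName" -StartDateTime "2017-01-01" -EndDateTime "2017-01-02"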
You can use the cmdlet Set-AzureRmDataFactorySliceStatus. Through this, you can reset a slice to the "Pending Execution" state. You also get the option to set the same status for the upstream slices so that the entire pipeline can re-run.
See this for reference: https://learn.microsoft.com/en-us/powershell/module/azurerm.datafactories/set-azurermdatafactoryslicestatus?view=azurermps-5.4.0
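A rough sketch of what that could look like, reusing the variables from the earlier snippet; the -UpdateType value is my assumption based on the linked documentation, so verify the accepted -Status and -UpdateType values there:
# Hedged sketch: reset a slice and its upstream slices so the pipeline re-runs (verify parameter values against the docs)
Set-AzureRmDataFactorySliceStatus -ResourceGroupName $ResourceGroupName -DataFactoryName $DataFactoryV1Name -DatasetName $DSName -StartDateTime $StartDateTime -Status Waiting -UpdateType UpstreamInPipeline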
