Executing custom activity on Azure Data Factory Pipeline

I am creating a simple pipeline in the data factory that should only run a custom activity. The deployment template for the pipeline looks like this:
{
    "type": "pipelines",
    "name": "MyCustomActivityPipeline",
    "dependsOn": [
        "DataFactoryName",
        "AzureBatchLinkedService"
    ],
    "apiVersion": "[variables('api-version')]",
    "properties": {
        "description": "Custom activity sample",
        "activities": [
            {
                "type": "Custom",
                "name": "MyCustomActivity",
                "linkedServiceName": {
                    "referenceName": "AzureBatchLinkedService",
                    "type": "LinkedServiceReference"
                },
                "typeProperties": {
                    "command": "cmd /c echo hello world"
                }
            }
        ]
    }
}
Additionally, I have created all the resources needed: the batch account with pools and the storage account. All the resources are in the same resource group and subscription. I try to trigger the pipeline with this PowerShell command:
Invoke-AzureRmDataFactoryV2Pipeline -DataFactoryName "DataFactory" -PipelineName "PipelineName" -ResourceGroupName "ResourceGroupName"
I am getting this error:
Activity MyCustomActivity failed: Can not access user batch account, please check batch account settings.
Has anyone ever experienced such an error from ADF execution of a pipeline? The weird part is that all the resources have access to each other and are within the same resource group and subscription.

Please check the settings of the storage linked service used by the batch linked service. Make sure the connection string type is SecureString.
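For reference, a minimal sketch of what such a storage linked service definition could look like with the connection string typed as SecureString. All names, paths, and the connection string value are placeholders, and the Az.DataFactory module is assumed (Set-AzureRmDataFactoryV2LinkedService is the AzureRM-era equivalent):

# Sketch only: a storage linked service whose connectionString is a SecureString.
# Account name and key below are placeholders, not real values.
$definition = @'
{
    "name": "AzureStorageLinkedService",
    "properties": {
        "type": "AzureStorage",
        "typeProperties": {
            "connectionString": {
                "type": "SecureString",
                "value": "DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>"
            }
        }
    }
}
'@
Set-Content -Path .\AzureStorageLinkedService.json -Value $definition

# Register the definition with the data factory.
Set-AzDataFactoryV2LinkedService -ResourceGroupName "ResourceGroupName" `
    -DataFactoryName "DataFactory" `
    -Name "AzureStorageLinkedService" `
    -DefinitionFile ".\AzureStorageLinkedService.json"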

Related

ARM template - bad request failed when deploying a linked service

I am trying to deploy a Synapse instance via an ARM template, and the deployment is successful via the Azure DevOps portal, but when I try to deploy the same template with an Azure Key Vault linked service I encounter the following error:
##[error]At least one resource deployment operation failed. Please list deployment
operations for details. Please see https://aka.ms/DeployOperations for usage details.
##[error]Details:
##[error]BadRequest:
After inspecting the activity logs from the Synapse instance I found out the following:
"resourceGroupName": "platform-test-group",
"resourceProviderName": {
"value": "Microsoft.Synapse",
"localizedValue": "Microsoft.Synapse"
},
"resourceType": {
"value": "Microsoft.Synapse/workspaces/linkedservices",
"localizedValue": "Microsoft.Synapse/workspaces/linkedservices"
},
"resourceId": "/subscriptions/xxxx-xxxx-xxxx-xxxx/resourcegroups/platform-test-group/providers/Microsoft.Synapse/workspaces/synapsedataapp/linkedservices/AzureKeyVault",
"status": {
"value": "Failed",
"localizedValue": "Failed"
},
"subStatus": {
"value": "NotFound",
"localizedValue": "Not Found (HTTP Status Code: 404)"
},
"submissionTimestamp": "2022-02-01T02:30:31.1471914Z",
"subscriptionId": "xxxx-xxxx-xxxx-xxxx",
"tenantId": "0f44c5d4-xxxx-xxxx-xxxxx",
"properties": {
"statusCode": "NotFound",
"serviceRequestId": null,
"statusMessage": "{\"error\":{\"code\":\"BadRequest\",\"message\":\"\"}}",
"eventCategory": "Administrative",
"entity": "/subscriptions/xxxx-xxxx-xxxx-xxxx/resourcegroups/platform-test-group/providers/Microsoft.Synapse/workspaces/synapsedataapp/linkedservices/AzureKeyVault",
"message": "Microsoft.Synapse/workspaces/linkedservices/write",
"hierarchy": "xxxx-xxxx-xxxx-xxxx/Enterprise/Group/Group-Test/xxxx-xxxx-xxxx-xxxx"
},
"relatedEvents": []
}
As you can see, the 404 error appears when the template tries to deploy to the tenant ID, which is not found; however, when I deploy the key vault linked service via the Synapse UI I encounter no error.
Below is the code snippet that I use in my ARM template to deploy the keyvault to the synapse instance:
{
    "name": "[concat(variables('workspaceName'), '/AzureKeyVault')]",
    "type": "Microsoft.Synapse/workspaces/linkedservices",
    "apiVersion": "2021-06-01-preview",
    "properties": {
        "annotations": [],
        "type": "AzureKeyVault",
        "typeProperties": {
            "baseUrl": "https://data-test-kv.vault.azure.net/"
        }
    },
    "dependsOn": [
        "[variables('workspaceName')]"
    ]
}
Am I missing some kind of permission or connection that I need to enable? Why am I able to deploy successfully through the UI but not through the ARM template? Any comment or suggestion is greatly valued, so please feel free to comment or improve this question.
I had to contact Microsoft support and their reply was the following:
ARM templates cannot be used to create a linked service. This is because linked services are not ARM resources, unlike, for example, Synapse workspaces, storage accounts, and virtual networks; instead, a linked service is classified as an artifact. To still complete the task at hand, you will need to use the Synapse REST API or PowerShell. Below is the link that provides guidance on how to use the API: https://learn.microsoft.com/en-us/powershell/module/az.synapse/set-azsynapselinkedservice?view=azps-7.1.0
This limitation in ARM applies only to Synapse, and they might fix it in the future.
Additional references:
https://feedback.azure.com/d365community/idea/05e41bf1-0925-ec11-b6e6-000d3a4f07b8
https://feedback.azure.com/d365community/idea/48f1bf78-2985-ec11-a81b-6045bd7956bb
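For completeness, a minimal sketch of the PowerShell route the support reply points to (the workspace name and file path are placeholders; assumes the Az.Synapse module is installed and you are signed in):

# Sketch: create the linked service as a Synapse artifact via Az.Synapse
# instead of ARM. The definition reuses the JSON body from the snippet above.
$definition = @'
{
    "name": "AzureKeyVault",
    "properties": {
        "annotations": [],
        "type": "AzureKeyVault",
        "typeProperties": {
            "baseUrl": "https://data-test-kv.vault.azure.net/"
        }
    }
}
'@
Set-Content -Path .\AzureKeyVault.json -Value $definition
Set-AzSynapseLinkedService -WorkspaceName "synapsedataapp" `
    -Name "AzureKeyVault" `
    -DefinitionFile ".\AzureKeyVault.json"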
In Synapse, unlike ADF, linked services are not part of ARM templates. They are called artifacts, which include notebooks, Spark job definitions, linked services, pipelines, etc.
You can find the full article here: https://techcommunity.microsoft.com/t5/azure-synapse-analytics-blog/how-to-use-ci-cd-integration-to-automate-the-deploy-of-a-synapse/ba-p/2248060
In short, first deploy Synapse using ARM templates, and then set up the linked services:
- task: Synapse workspace deployment@1
  displayName: 'Setup: Synapse KeyVault Linked Service'
  inputs:
    TemplateFile: '$(Build.Repository.LocalPath)/TemplateForWorkspace.json'
    ParametersFile: '$(Build.Repository.LocalPath)/TemplateParametersForWorkspace.json'
    azureSubscription: '${{ parameters.environments.serviceConnectionId }}'
    ResourceGroupName: '$(computeResourceGroupName)'
    TargetWorkspaceName: '$(synapseWorkspaceName)'
    DeleteArtifactsNotInTemplate: true
    OverrideArmParameters: |
      synapseLinkedServiceKV: $(synapseLinkedServiceKV)
      workspaceName: $(synapseWorkspaceName)
    Environment: 'prod'
TemplateForWorkspace.json:
{
    "$schema": "http://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "workspaceName": {
            "type": "string"
        },
        "synapseLinkedServiceKV": {
            "type": "string"
        }
    },
    "variables": {
        "workspaceId": "[concat('Microsoft.Synapse/workspaces/', parameters('workspaceName'))]"
    },
    "resources": [
        {
            "name": "[concat(parameters('workspaceName'), '/', parameters('synapseLinkedServiceKV'))]",
            "type": "Microsoft.Synapse/workspaces/linkedServices",
            "apiVersion": "2019-06-01-preview",
            "properties": {
                "type": "AzureKeyVault",
                "typeProperties": {
                    "baseUrl": "[concat('https://', parameters('synapseLinkedServiceKV'), '.vault.azure.net/')]"
                },
                "annotations": [],
                "description": "Linked Service to Azure KeyVault. KeyVault is used to primarily fetch secrets"
            },
            "dependsOn": []
        }
    ]
}
TemplateParametersForWorkspace.json:
{
    "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "workspaceName": {
            "value": ""
        },
        "synapseLinkedServiceKV": {
            "value": ""
        }
    }
}
It deletes the existing artifacts and deploys the one above. You would first need to install the Synapse workspace deployment task extension in your Azure DevOps organization.
Note that the above template was auto-generated. In Synapse Studio, go to Git Configuration and point it to your repo; it will publish the changes to the workspace_publish branch. You can copy and build on top of the specific artifact code.

Pass Service Principal Client Id and Secret to ARM Template

I have an ARM template that creates an Azure Key Vault followed by an Azure Kubernetes Service. The problem is that the Azure Kubernetes Service needs a Service Principal's Client ID and Client Secret passed in the first time I create it. So I run application.json without the kubernetes_servicePrincipalClientId and kubernetes_servicePrincipalClientSecret parameters in the production.parameters.json file:
application.json
{
    "comments": "Kubernetes Service Principal Client ID",
    "type": "Microsoft.KeyVault/vaults/secrets",
    "name": "[concat(parameters('key_vault_name'), '/KubernetesServicePrincipalClientId')]",
    "apiVersion": "2018-02-14",
    "properties": {
        "contentType": "text/plain",
        "value": "[parameters('kubernetes_servicePrincipalClientId')]"
    }
},
{
    "comments": "Kubernetes Service Principal Client Secret",
    "type": "Microsoft.KeyVault/vaults/secrets",
    "name": "[concat(parameters('key_vault_name'), '/KubernetesServicePrincipalClientSecret')]",
    "apiVersion": "2018-02-14",
    "properties": {
        "contentType": "text/plain",
        "value": "[parameters('kubernetes_servicePrincipalClientSecret')]"
    }
}
The second time I run the ARM template, I add the following lines to my production.parameters.json file, so that the Client ID and Client Secret are retrieved from Azure Key Vault where they were stored the first time I ran the ARM template.
production.parameters.json
"kubernetes_servicePrincipalClientId": {
"reference": {
"keyVault": {
"id": "/subscriptions/[Subscription Id]/resourcegroups/[Resource Group Name]/providers/Microsoft.KeyVault/vaults/[Vault Name]"
},
"secretName": "KubernetesServicePrincipalClientId"
}
},
"kubernetes_servicePrincipalClientSecret": {
"reference": {
"keyVault": {
"id": "/subscriptions/[Subscription Id]/resourcegroups/[Resource Group Name]/providers/Microsoft.KeyVault/vaults/[Vault Name]"
},
"secretName": "KubernetesServicePrincipalClientSecret"
}
}
Unfortunately it looks like you can't create a service principal in an ARM template. Is there a better way to configure all this in an automated way, so that regardless of whether or not I'm running the template the first time or second time, I don't have to perform any manual steps?
No, those APIs are not exposed to ARM; there is no way to manage a service principal with an ARM template. You can, however, create a script that provisions the service principal and passes its details to the ARM template, or you can use a tool that handles all of this for you (Pulumi/Terraform/Ansible).
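As a rough sketch of the scripted approach (all names are placeholders; the property names on the returned service principal object vary between Az module versions, so treat them as assumptions to verify):

# Sketch: provision the service principal once, then feed its credentials to
# the ARM deployment as parameters. Assumes a recent Az module; in older
# versions the properties are named differently (e.g. ApplicationId/Secret).
$sp = New-AzADServicePrincipal -DisplayName "aks-cluster-sp"
$clientId = $sp.AppId
# Assumes the template declares the secret parameter as securestring.
$clientSecret = ConvertTo-SecureString $sp.PasswordCredentials.SecretText -AsPlainText -Force

New-AzResourceGroupDeployment -ResourceGroupName "MyResourceGroup" `
    -TemplateFile ".\application.json" `
    -kubernetes_servicePrincipalClientId $clientId `
    -kubernetes_servicePrincipalClientSecret $clientSecret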

How to get custom output from an executed pipeline?

I would like to be able to get custom output from an "Execute Pipeline Activity". During the execution of the invoked pipeline, I capture some information in a variable using the "Set Variable" activity. I would like to be able to use that value in the master pipeline.
I know that the master pipeline can read the invoked pipeline's name and runId using "@activity('InvokedPipeline').output", but those are the only properties available.
I have the invokable pipeline because it's configurable to be used by multiple other pipelines, assuming we can get the output from it. It currently consists of 8 activities; I would hate to have to duplicate them all across multiple pipelines just because we can't get the output from an invoked pipeline.
Reference: Execute Pipeline Activity
[
    {
        "name": "MasterPipeline",
        "type": "Microsoft.DataFactory/factories/pipelines",
        "properties": {
            "description": "Uses the results of the invoked pipeline to do some further processing",
            "activities": [
                {
                    "name": "ExecuteChildPipeline",
                    "description": "Executes the child pipeline to get some value.",
                    "type": "ExecutePipeline",
                    "dependsOn": [],
                    "userProperties": [],
                    "typeProperties": {
                        "pipeline": {
                            "referenceName": "InvokedPipeline",
                            "type": "PipelineReference"
                        },
                        "waitOnCompletion": true
                    }
                },
                {
                    "name": "UseVariableFromInvokedPipeline",
                    "description": "Uses the variable returned from the invoked pipeline.",
                    "type": "Copy",
                    "dependsOn": [
                        {
                            "activity": "ExecuteChildPipeline",
                            "dependencyConditions": [
                                "Succeeded"
                            ]
                        }
                    ]
                }
            ],
            "parameters": {},
            "variables": {}
        }
    },
    {
        "name": "InvokedPipeline",
        "type": "Microsoft.DataFactory/factories/pipelines",
        "properties": {
            "description": "The child pipeline that makes some HTTP calls, gets some metadata, and sets a variable.",
            "activities": [
                {
                    "name": "SetMyVariable",
                    "description": "Sets a variable after some processing from other activities.",
                    "type": "SetVariable",
                    "dependsOn": [
                        {
                            "activity": "ProcessingActivity",
                            "dependencyConditions": [
                                "Succeeded"
                            ]
                        }
                    ],
                    "userProperties": [],
                    "typeProperties": {
                        "variableName": "MyVariable",
                        "value": {
                            "value": "@activity('ProcessingActivity').output",
                            "type": "Expression"
                        }
                    }
                }
            ],
            "parameters": {},
            "variables": {
                "MyVariable": {
                    "type": "String"
                }
            }
        }
    }
]
Hello Heather, and thank you for your inquiry. Custom outputs are not an inbuilt feature at this time; you can request/upvote the feature in the Azure feedback forum. For now, I have two workarounds.
Utilizing the invoked pipeline's runId, we can query the REST API (using a Web Activity) for the activity run logs and, from there, the activity outputs. However, before making the query, it is necessary to authenticate.
REST call to get the activities of a pipeline
For authentication I recommend using a Web Activity to get an OAuth2 token. The URL would be https://login.microsoftonline.com/tenantid/oauth2/token, with header "Content-Type": "application/x-www-form-urlencoded" and body "grant_type=client_credentials&client_id=xxxx&client_secret=xxxx&resource=https://management.azure.com/". Since this request is to get credentials, the Authentication setting for this request is type 'None'. These credentials correspond to an app you create via Azure Active Directory > App Registrations. Do not forget to assign the app RBAC in the Data Factory's Access Control (IAM).
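To illustrate the call itself, here is the equivalent query issued from PowerShell rather than a Web Activity (a sketch; the subscription, resource group, factory, and run ID values are placeholders, and Az.Accounts is assumed for the token):

# Sketch: query the activity runs of a pipeline run via the ADF REST API.
# Inside ADF you would issue the same POST from a Web Activity using the
# OAuth2 token obtained above; here Get-AzAccessToken stands in for it.
$token = (Get-AzAccessToken -ResourceUrl "https://management.azure.com/").Token
$uri = "https://management.azure.com/subscriptions/<subId>/resourceGroups/<rg>" +
       "/providers/Microsoft.DataFactory/factories/<factory>" +
       "/pipelineruns/<runId>/queryActivityruns?api-version=2018-06-01"
$body = @{
    lastUpdatedAfter  = (Get-Date).AddDays(-1).ToString("o")
    lastUpdatedBefore = (Get-Date).ToString("o")
} | ConvertTo-Json
$response = Invoke-RestMethod -Method Post -Uri $uri -Body $body `
    -Headers @{ Authorization = "Bearer $token" } -ContentType "application/json"
# Each entry in $response.value carries that activity's output object,
# including the value set by the Set Variable activity.
$response.value | ForEach-Object { $_.output }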
Another workaround has the child pipeline write its output. It can write to a database table, or it can write to a blob (I passed the Data Factory variable to a Logic App which wrote it to blob storage), or to something else of your choice. Since you are planning to use the child pipeline from many different parent pipelines, I would recommend passing the child pipeline a parameter that it uses to identify the output to the parent. That could mean a blob name, or writing the parent runId to a SQL table. This way the parent pipeline knows where to look to get the output.
I just had a chat with the ADF team; here is the response:
[10:11 PM] Mark Kromer
Brajesh Jaishwal: any plans on custom output from execute pipeline activity?
Yes, this work is on the engineering work plan

Azure Logic App - Update Blob API Connection through PowerShell

I've searched online and browsed the available PowerShell cmdlets to try and find a solution for this problem, but have been unsuccessful. Essentially, I have a few Data Factory pipelines that copy/archive incoming files and use a Web HTTP POST component that invokes a Logic App, which connects to a Blob container and deletes the incoming file. The issue I'm facing is that we have several automation runbooks that reset Blob access keys every X days. When the Blob keys get reset, the Logic App fails because the connection was created manually in the designer itself, and I can't specify a connection string that could pull from the Key Vault, for example. Inside of {Logic App > API Connections > Edit API Connection} we can manually update the connection string/key, but obviously for an automated process we should be able to do this programmatically.
Is there a PowerShell cmdlet or other method I'm not seeing that would allow me to update/edit the API Connections that get created when using a Blob component inside a Logic App?
Any insight is appreciated!
Once you've rotated your key in the storage account, you can use an ARM template to update your API connection. In this ARM template, the connection API is created referencing the storage account internally, so you don't have to provide the key:
azuredeploy.json file:
{
    "$schema": "http://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "azureBlobConnectionAPIName": {
            "type": "string",
            "metadata": {
                "description": "The name of the connection api to access the azure blob storage."
            }
        },
        "storageAccountName": {
            "type": "string",
            "metadata": {
                "description": "The Storage Account Name."
            }
        }
    },
    "variables": {},
    "resources": [
        {
            "type": "Microsoft.Web/connections",
            "name": "[parameters('azureBlobConnectionAPIName')]",
            "apiVersion": "2016-06-01",
            "location": "[resourceGroup().location]",
            "scale": null,
            "properties": {
                "displayName": "[parameters('azureBlobConnectionAPIName')]",
                "parameterValues": {
                    "accountName": "[parameters('storageAccountName')]",
                    "accessKey": "[listKeys(resourceId('Microsoft.Storage/storageAccounts', parameters('storageAccountName')),'2015-05-01-preview').key1]"
                },
                "api": {
                    "id": "[concat('subscriptions/', subscription().subscriptionId, '/providers/Microsoft.Web/locations/', resourceGroup().location, '/managedApis/azureblob')]"
                }
            },
            "dependsOn": []
        }
    ]
}
azuredeploy.parameters.json file:
{
    "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "azureBlobConnectionAPIName": {
            "value": "myblobConnectionApiName"
        },
        "storageAccountName": {
            "value": "myStorageAccountName"
        }
    }
}
You can then execute the ARM template like this:
Connect-AzureRmAccount
Select-AzureRmSubscription -SubscriptionName <yourSubscriptionName>
New-AzureRmResourceGroupDeployment -Name "ExampleDeployment" -ResourceGroupName "MyResourceGroupName" `
-TemplateFile "D:\Azure\Templates\azuredeploy.json" `
-TemplateParameterFile "D:\Azure\Templates\azuredeploy.parameters.json"
To get started with ARM templates and PowerShell, you can have a look at this article:
Deploy resources with Resource Manager templates and Azure PowerShell

Azure Batch Pool

I am creating a custom activity pipeline in Azure Data Factory, and as part of that I need to create an AzureBatchLinkedService using the JSON below:
{
    "name": "AzureBatchLinkedService",
    "properties": {
        "hubName": "name_hub",
        "type": "AzureBatch",
        "typeProperties": {
            "accountName": "accountname",
            "accessKey": "**********",
            "poolName": "?????????",
            "batchUri": "https://northeurope.batch.azure.com",
            "linkedServiceName": "AzureStorageLinkedService"
        }
    }
}
When I created this the first time, I didn't have to create a batch pool; it was created for me automatically.
My question is: how do I get Azure to create this "pool" for me? If I go into the batch account and click "Add+" to add a pool manually, it presents all these config options and I don't know what to put for any of those fields. (A minimal scripted alternative is sketched below.)
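For reference, a pool can also be created from PowerShell with only a handful of required settings. The following is a sketch under assumed values: the pool ID, VM size, node count, and image are illustrative placeholders, and the Az.Batch module is assumed.

# Sketch: create a minimal Batch pool from PowerShell (Az.Batch assumed).
# The image/SKU/size/node values below are placeholders, not requirements.
$context = Get-AzBatchAccountKey -AccountName "accountname"
$imageReference = New-Object -TypeName Microsoft.Azure.Commands.Batch.Models.PSImageReference `
    -ArgumentList @("WindowsServer", "MicrosoftWindowsServer", "2019-Datacenter", "latest")
$vmConfig = New-Object -TypeName Microsoft.Azure.Commands.Batch.Models.PSVirtualMachineConfiguration `
    -ArgumentList @($imageReference, "batch.node.windows amd64")
New-AzBatchPool -Id "adf-custom-activity-pool" `
    -VirtualMachineSize "Standard_D2_v3" `
    -VirtualMachineConfiguration $vmConfig `
    -TargetDedicatedComputeNodes 1 `
    -BatchContext $context

The pool ID you choose here is what would go into the linked service's poolName property above.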