Accessing pipeline activity status from an Azure Function

I have an Azure Function which triggers a Pipeline and I'm able to poll the pipeline status to check when it completes using: Pipeline.Properties.RuntimeInfo.PipelineState
My pipeline uses several parallel Copy activities, and I'd like to be able to access the status of these activities in case they fail. The Azure documentation describes how to access the pipeline's activities, but that only exposes static properties like name and description, not dynamic properties like Status (the way you can for the pipeline via its RuntimeInfo property).
For completeness, I've accessed the activity list using:
IList<Microsoft.Azure.Management.DataFactories.Models.Activity> activityList = plHandle.Pipeline.Properties.Activities;
Is it possible to check individual activity statuses programmatically?

It's certainly possible.
I use the ADF PowerShell cmdlets in the Azure module to monitor our data factories.
Maybe do something like the below with the Get-AzureRmDataFactoryActivityWindow cmdlet.
Eg:
# $ADFName and $ResourceGroup are assumed to be set earlier (e.g. via Get-AzureRmDataFactory).
# Set $Now to whatever window-start cutoff you need:
$Now = Get-Date
$ActivityWindows = Get-AzureRmDataFactoryActivityWindow `
    -DataFactoryName $ADFName.DataFactoryName `
    -ResourceGroupName $ResourceGroup `
    | ? {$_.WindowStart -ge $Now} `
    | Select-Object ActivityName, ActivityType, WindowState, RunStart, InputDatasets, OutputDatasets `
    | Sort-Object ActivityName
This gives you the activity-level details, including the status, which can be:
Ready
In Progress
Waiting
Failed
... I list them because they differ slightly from what you see in the portal blades.
The datasets are also arrays if you have multiple inputs and outputs for particular activities.
More ADF cmdlets available here: https://learn.microsoft.com/en-gb/powershell/module/azurerm.datafactories/?view=azurermps-3.8.0
Hope this helps

I've managed to resolve this by accessing the DataSliceRuns (i.e. the activity runs) for the pipeline as follows:
var datasets = client.Datasets.ListAsync(<resourceGroupName>, <dataFactoryName>).Result;
foreach (var dataset in datasets.Datasets)
{
    // Check the activity statuses for the pipeline's activities.
    var datasliceRunlistResponse = client.DataSliceRuns.List(
        <resourceGroupName>, <dataFactoryName>, <dataSetName>,
        new DataSliceRunListParameters()
        {
            DataSliceStartTime = PipelineStartTime.ConvertToISO8601DateTimeString()
        });
    foreach (DataSliceRun run in datasliceRunlistResponse.DataSliceRuns)
    {
        // Do stuff...
    }
}
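Inside the inner loop you can then inspect each run, for example to surface failures. A minimal sketch, with the caveat that the property names (Status, ActivityName) are what I recall from the v1 SDK's DataSliceRun and should be verified against your SDK version:
// Hedged sketch: flag failed activity runs. Property names are assumptions
// about the ADF v1 SDK's DataSliceRun; verify against your version.
if (string.Equals(run.Status, "Failed", StringComparison.OrdinalIgnoreCase))
{
    Console.WriteLine($"Activity '{run.ActivityName}' failed for dataset '{dataset.Name}'.");
}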

How can I run a search job periodically in Azure Log Analytics?

I'm trying to visualize the browser statistics of our app hosted in Azure.
For that I use the nginx logs and run an Azure Log Analytics query like this:
ContainerLog
| where LogEntrySource == "stdout" and LogEntry has "nginx"
| extend logEntry=parse_json(LogEntry)
| extend userAgent=parse_user_agent(logEntry.nginx.http_user_agent, "browser")
| extend browser=parse_json(userAgent)
| summarize count=count() by tostring(browser.Browser.Family)
| sort by ['count']
| render piechart with (legend=hidden)
This query produces exactly the pie chart I want.
But it is very slow: if I set the time range to more than just the last few hours, it takes several minutes or doesn't complete at all.
My solution is to use a search job like this:
ContainerLog
| where LogEntrySource == "stdout" and LogEntry has "nginx"
| extend d=parse_json(LogEntry)
| extend user_agent=parse_user_agent(d.nginx.http_user_agent, "browser")
| extend browser=parse_json(user_agent)
It creates a new table BrowserStats_SRCH on which I can do this search query:
BrowserStats_SRCH
| summarize count=count() by tostring(browser.Browser.Family)
| sort by ['count']
| render piechart with (legend=hidden)
This is much faster now and only takes a few seconds.
But my problem is: how can I keep this up to date? Ideally this search job would run automatically once a day and refresh the BrowserStats_SRCH table, so that queries on that table always run against the most recent logs. Is this possible? Right now I can't even trigger the search job again manually, because I get the error "A destination table with this name already exists".
In the end I would like a deep link to the pie chart with the browser stats, without the need for any further clicks. Any help would be appreciated.
But my problem is: how can I keep this up to date? Ideally this search job would run automatically once a day and refresh the BrowserStats_SRCH table, so that queries on that table always run against the most recent logs. Is this possible?
You can use the API to create a search job, then use a timer-triggered Azure Function or Logic App to call that API on a schedule (a sketch follows at the end of this answer).
PUT https://management.azure.com/subscriptions/00000000-0000-0000-0000-00000000000/resourcegroups/testRG/providers/Microsoft.OperationalInsights/workspaces/testWS/tables/Syslog_suspected_SRCH?api-version=2021-12-01-preview
with a request body containing the query
{
  "properties": {
    "searchResults": {
      "query": "Syslog | where * has 'suspected.exe'",
      "limit": 1000,
      "startSearchTime": "2020-01-01T00:00:00Z",
      "endSearchTime": "2020-01-31T00:00:00Z"
    }
  }
}
Or you can use the Azure CLI:
az monitor log-analytics workspace table search-job create --subscription ContosoSID --resource-group ContosoRG --workspace-name ContosoWorkspace --name HeartbeatByIp_SRCH --search-query 'Heartbeat | where ComputerIP has "00.000.00.000"' --limit 1500 --start-search-time "2022-01-01T00:00:00.000Z" --end-search-time "2022-01-08T00:00:00.000Z" --no-wait
Right now I can't even trigger the search job again manually, because I get the error "A destination table with this name already exists".
Before you start the job as described above, remove the old results table using an API call:
DELETE https://management.azure.com/subscriptions/{subscriptionId}/resourcegroups/{resourceGroupName}/providers/Microsoft.OperationalInsights/workspaces/{workspaceName}/tables/{tableName}?api-version=2021-12-01-preview
Optionally, you could check the status of the job using this API before you delete the table, to make sure it is not InProgress or Deleting.
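If you go the Azure Function route, here is a minimal timer-triggered sketch; it is an assumption-laden illustration, not the one true implementation. It assumes the function's managed identity has sufficient rights on the workspace, reuses the question's BrowserStats_SRCH table name, and does not poll for the delete to complete (which you may need to add per the status check above):
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;
using Azure.Core;
using Azure.Identity;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class RefreshBrowserStats
{
    private static readonly HttpClient Http = new HttpClient();

    // Runs daily at 03:00 UTC.
    [FunctionName("RefreshBrowserStats")]
    public static async Task Run([TimerTrigger("0 0 3 * * *")] TimerInfo timer, ILogger log)
    {
        // Placeholder ids; note the table name must end in _SRCH.
        var tableUrl = "https://management.azure.com/subscriptions/<subId>/resourcegroups/<rg>" +
                       "/providers/Microsoft.OperationalInsights/workspaces/<ws>" +
                       "/tables/BrowserStats_SRCH?api-version=2021-12-01-preview";

        var token = await new DefaultAzureCredential().GetTokenAsync(
            new TokenRequestContext(new[] { "https://management.azure.com/.default" }));
        Http.DefaultRequestHeaders.Authorization =
            new AuthenticationHeaderValue("Bearer", token.Token);

        // Remove yesterday's result table first; in practice, poll the table
        // status until the delete completes before re-creating it.
        await Http.DeleteAsync(tableUrl);

        var body = $@"{{ ""properties"": {{ ""searchResults"": {{
            ""query"": ""ContainerLog | where LogEntrySource == 'stdout' and LogEntry has 'nginx'"",
            ""limit"": 1000,
            ""startSearchTime"": ""{DateTime.UtcNow.AddDays(-1):o}"",
            ""endSearchTime"": ""{DateTime.UtcNow:o}"" }} }} }}";

        var response = await Http.PutAsync(tableUrl,
            new StringContent(body, Encoding.UTF8, "application/json"));
        log.LogInformation($"Search job request returned {response.StatusCode}");
    }
}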

How to generate reports for Azure activity log data using Python for any resource, along with 'Tags'?

My colleague used the PowerShell query below to retrieve log data for the past 4 days excluding today, matching operations of the resources and collecting fields such as EventTimeStamp, Caller, SubscriptionId, etc.
Get-AzureRmLog -StartTime (Get-Date).AddDays(-4) -EndTime (Get-Date).AddDays(-1) |
    Where-Object {$_.OperationName.LocalizedValue -match "Start|Stop|Restart|Create|Update|Delete"} |
    Select-Object EventTimeStamp, Caller, SubscriptionId, @{name="Operation"; Expression = {$_.operationname.LocalizedValue}},
I am new to Azure and want to generate a report where I can also fetch the 'Tags' names and values for each resource for the past 90 days. What would the PowerShell query for this be? Can I also use Python to query this data? I tried searching the documentation and was unable to dig it up, so if anybody could redirect me to the right place it would be helpful.
First of all, you should know that not all Azure resources can specify tags, so you should account for this in your code. Please refer to "Tag support for Azure resources" to check which Azure resources support tags.
For the PowerShell query, I suggest using the new Azure PowerShell Az module instead of the old AzureRM module.
Here is a simple PowerShell example using the Az module. For testing purposes it just shows how to fetch tags and add them to the output; feel free to change it to fit your requirements.
# For testing purposes, this just gets the activity logs of a specified resource group.
$mylogs = Get-AzLog -ResourceGroupName "a resource group name"
foreach($log in $mylogs)
{
    if($log.Properties.Content.Values -ne $null)
    {
        # The tags are contained in the Properties of the log entry.
        $s = $log.Properties.Content.Values -as [string]
        if($s.StartsWith("{"))
        {
            $log | Select-Object EventTimeStamp, Caller, SubscriptionId, @{name="Operation"; Expression = {$_.operationname.LocalizedValue}}, @{name="tags"; Expression = {($s | ConvertFrom-Json).tags}}
        }
        # If it does not contain tags.
        else
        {
            $log | Select-Object EventTimeStamp, Caller, SubscriptionId, @{name="Operation"; Expression = {$_.operationname.LocalizedValue}}, @{name="tags"; Expression = {""}}
        }
    }
    # If it does not contain tags.
    else
    {
        $log | Select-Object EventTimeStamp, Caller, SubscriptionId, @{name="Operation"; Expression = {$_.operationname.LocalizedValue}}, @{name="tags"; Expression = {""}}
    }
    Write-Output "************************"
}
For Python, you can take a look at this GitHub issue, which shows how to fetch entries from the Azure activity log, but you will need to do some research on how to add tags to the output.
Hope it helps.

Creating Topic Filter rule via CorrelationFilter with Azure Functions App

I want to create filter rules via CorrelationFilter for subscriptions associated with a topic, as it is faster than a SqlFilter.
The rule: any message that contains a header equal to one string will go to one subscription, and another string will go to a different subscription. For example:
Topic: order
Subcription1: header_orderType: orderPlaced
Subcription2: header_orderType: orderPaid
Similar to the correlation rules you can set up via Service Bus Explorer.
Below are other ways that can achieve that.
SQLFilter in code
https://dzone.com/articles/everything-you-need-know-about-5
SQLFilter
https://github.com/Azure/azure-service-bus/tree/master/samples/DotNet/Microsoft.Azure.ServiceBus/TopicFilters
PS
https://learn.microsoft.com/en-us/powershell/module/azurerm.servicebus/New-AzureRmServiceBusRule?view=azurermps-6.13.0
The TopicFilters sample covers correlation filters too, which are set up there using an ARM template. The same should be possible in C# and PS as well.
C#
You will have to first create a Microsoft.Azure.ServiceBus.CorrelationFilter object
var orderPlacedFilter = new CorrelationFilter();
orderPlacedFilter.Properties["header_orderType"] = "orderPlaced";
And then add it to your subscription client object by calling Microsoft.Azure.ServiceBus.SubscriptionClient.AddRuleAsync()
await subsClient.AddRuleAsync("orderPlacedFilter", orderPlacedFilter);
Similarly for the other subscription and its filter; a complete sketch putting both together follows.
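A minimal end-to-end sketch (hedged: the connection string and the subscription names Subscription1/Subscription2 are placeholders from the question; note that the default $Default rule must be removed, or every message will still match it):
using System.Threading.Tasks;
using Microsoft.Azure.ServiceBus;

public static class FilterSetup
{
    public static async Task ConfigureAsync(string connectionString)
    {
        var placedClient = new SubscriptionClient(connectionString, "order", "Subscription1");

        // Remove the catch-all default rule so only matching messages arrive.
        await placedClient.RemoveRuleAsync(RuleDescription.DefaultRuleName);

        var orderPlacedFilter = new CorrelationFilter();
        orderPlacedFilter.Properties["header_orderType"] = "orderPlaced";
        await placedClient.AddRuleAsync("orderPlacedFilter", orderPlacedFilter);

        // Same pattern for the second subscription.
        var paidClient = new SubscriptionClient(connectionString, "order", "Subscription2");
        await paidClient.RemoveRuleAsync(RuleDescription.DefaultRuleName);

        var orderPaidFilter = new CorrelationFilter();
        orderPaidFilter.Properties["header_orderType"] = "orderPaid";
        await paidClient.AddRuleAsync("orderPaidFilter", orderPaidFilter);
    }
}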
PowerShell
The documentation isn't really great on this one, but I believe this should work:
$rule = New-AzServiceBusRule -ResourceGroupName prvalav-common -Namespace prvalav-common -Topic test -Subscription test -Name SBRule -SqlExpression "test = 0"
$rule.FilterType = 1
$rule.SqlFilter = $null
$rule.CorrelationFilter.Properties["header_orderType"] = "orderPlaced"
Set-AzServiceBusRule -ResourceGroupName prvalav-common -Namespace prvalav-common -Topic test -Subscription test -Name SBRule -InputObject $rule
If you were wondering about the FilterType = 1, check the FilterType enum.
After setting this up, in your function app, you would just use the Service Bus Trigger with the topic/subscription details.
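For example, a hedged sketch of such a function (the topic, subscription, and connection names are placeholders):
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public static class OrderPlacedFunction
{
    [FunctionName("OrderPlacedFunction")]
    public static void Run(
        [ServiceBusTrigger("order", "Subscription1", Connection = "ServiceBusConnection")] string message,
        ILogger log)
    {
        // Only messages whose header_orderType property equals "orderPlaced"
        // reach this subscription, because of the correlation filter rule.
        log.LogInformation($"orderPlaced message: {message}");
    }
}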

How do I use ARM 'outputs' values in another release task?

I have an ARM template that has an outputs section like the following:
"outputs": {
"sqlServerFqdn": {
"type": "string",
"value": "[reference(concat('Microsoft.Sql/servers/', variables('sqlserverName'))).fullyQualifiedDomainName]"
},
"primaryConnectionString": {
"type": "string",
"value": "[concat('Data Source=tcp:', reference(concat('Microsoft.Sql/servers/', variables('sqlserverName'))).fullyQualifiedDomainName, ',1433;Initial Catalog=', variables('databaseName'), ';User Id=', parameters('administratorLogin'), '#', variables('sqlserverName'), ';Password=', parameters('administratorLoginPassword'), ';')]"
},
"envResourceGroup": {
"type": "string",
"value": "[parameters('hostingPlanName')]"
}
}
I have an Azure Resource Group Deployment task that uses the template. I then want to use the variable $(sqlServerFqdn) in the next task for configuration. The variable doesn't seem to just populate, and I cannot find anything that explains how to use 'outputs' values in a release.
What do I need to do to get the variable to populate for use in configuring tasks after this ARM template runs? An example would be in the parameters to a PowerShell script task or another ARM template.
The VSTS Azure Resource Group Deployment task has an outputs section now (since January 2018). You can set the variable name in the task's Deployment outputs setting to, for example, ResourceGroupDeploymentOutputs, and add a PowerShell Script task with the following inline script:
# Make outputs from resource group deployment available to subsequent tasks
$outputs = ConvertFrom-Json $($env:ResourceGroupDeploymentOutputs)
foreach ($output in $outputs.PSObject.Properties) {
    Write-Host "##vso[task.setvariable variable=RGDO_$($output.Name)]$($output.Value.value)"
}
In subsequent tasks you can then use your template variables. For example, if you have a sqlServerFqdn variable in your template, it will be available as $(RGDO_sqlServerFqdn) after the PowerShell Script task has completed.
Capturing this answer because I always end up at this question when searching for the solution.
There is a marketplace task which makes ARM template output parameters available further down the pipeline, but in some cases you don't have permission to purchase marketplace items for your subscription, so the following PowerShell will do the same thing. To use it, add it as a PowerShell script step immediately following the ARM template resource group deployment step. It will look at the last deployment and pull the output variables into pipeline variables.
param(
    [string] $resourceGroupName
)
$lastDeployment = Get-AzureRmResourceGroupDeployment -ResourceGroupName $resourceGroupName | Sort Timestamp -Descending | Select -First 1
if(!$lastDeployment) {
    throw "Deployment could not be found for Resource Group '$resourceGroupName'."
}
if(!$lastDeployment.Outputs) {
    throw "No output parameters could be found for the last deployment of Resource Group '$resourceGroupName'."
}
foreach ($key in $lastDeployment.Outputs.Keys){
    $type = $lastDeployment.Outputs.Item($key).Type
    $value = $lastDeployment.Outputs.Item($key).Value
    if ($type -eq "SecureString") {
        Write-Host "##vso[task.setvariable variable=$key;issecret=true]$value"
    }
    else {
        Write-Host "##vso[task.setvariable variable=$key;]$value"
    }
}
Note that the environment variables won't be available in the context of this script, but will be in subsequent tasks.
The output value shown on the UI for the Visual Studio Team Services task for Azure Resource Group Deployment only seems to work for the scenario described in Eddie's answer, which is for VMs. In fact, if your deployment doesn't include VMs, you will get an error something like:
No VMs found in resource group: 'MY-RESOURCE-GROUP-NAME'. Could not register environment in the output variable: 'myVariableName'.
For non-VM deployments, I created a PowerShell script that runs after the RG deployment. As an example, this script takes input variables for the resource group ($resourceGroupName) and the name of the output variable you need ($rgDeploymentOutputParameterName). You could customize and use something similar:
# Get the most recent deployment for the resource group
$lastRgDeployment = Get-AzureRmResourceGroupDeployment -ResourceGroupName $resourceGroupName |
    Sort Timestamp -Descending |
    Select -First 1
if(!$lastRgDeployment)
{
    throw "Resource Group Deployment could not be found for '$resourceGroupName'."
}
$deploymentOutputParameters = $lastRgDeployment.Outputs
if(!$deploymentOutputParameters)
{
    throw "No output parameters could be found for the last deployment of '$resourceGroupName'."
}
$outputParameter = $deploymentOutputParameters.Item($rgDeploymentOutputParameterName)
if(!$outputParameter)
{
    throw "No output parameter could be found with the name of '$rgDeploymentOutputParameterName'."
}
$outputParameterValue = $outputParameter.Value
# From here, use $outputParameterValue, for example:
Write-Host "##vso[task.setvariable variable=$rgDeploymentOutputParameterName;]$outputParameterValue"
In Nov 2020, after this commit - https://github.com/microsoft/azure-pipelines-tasks/commit/1173324604c3f61ce52cdcc999f6d4d7ea9ab8f9 - the variables became directly usable in subsequent tasks in the pipeline (no PowerShell scripts required!).
This is what the steps look like:
In the ARM template deployment task, give any reference name in the Deployment outputs section under the Advanced drop-down. In my case I used armOutputVariable.
Now, to use the value of sqlServerFqdn in subsequent tasks, simply reference it in this manner: $(armOutputVariable.sqlServerFqdn.value)
For example, you could use it to override a parameter in a test task that follows the deployment.
To summarize, all the outputs in the ARM template can be used in later steps directly in this manner (make sure you assign a reference name in the ARM template deployment step):
$(armOutputVariable.sqlServerFqdn.value)
$(armOutputVariable.sqlServerFqdn.type)
$(armOutputVariable.primaryConnectionString.value)
$(armOutputVariable.primaryConnectionString.type)
$(armOutputVariable.envResourceGroup.value)
$(armOutputVariable.envResourceGroup.type)
First you define the Azure Resource Group Deployment task and, within it, the Deployment outputs reference name.
In the next step you create a PowerShell task that takes the deployment outputs defined above as an input argument.
The PowerShell script looks as follows and assigns, for each output defined in the ARM template's output section, a separate VSTS environment variable with the same name. These variables can then be used in subsequent tasks.
param (
    [Parameter(Mandatory=$true)]
    [string]
    $armOutputString
)
Write-Host $armOutputString
$armOutputObj = $armOutputString | ConvertFrom-Json
Write-Host $armOutputObj
$armOutputObj.PSObject.Properties | ForEach-Object {
    $type = ($_.value.type).ToLower()
    $key = $_.name
    $value = $_.value.value
    if ($type -eq "securestring") {
        Write-Host "##vso[task.setvariable variable=$key;issecret=true]$value"
        Write-Host "Create VSTS variable with key '$key' and value '$value' of type '$type'!"
    } elseif ($type -eq "string") {
        Write-Host "##vso[task.setvariable variable=$key]$value"
        Write-Host "Create VSTS variable with key '$key' and value '$value' of type '$type'!"
    } else {
        Throw "Type '$type' not supported!"
    }
}
In a subsequent task you can access the environment variables either by passing them as arguments via '$(varName)' (this works for SecureString too), or e.g. in a PowerShell script via $env:varName (this does not work for SecureString).
VSTS allows setting variables in PowerShell scripts which you can use in other tasks.
The syntax is
Write-Host "##vso[task.setvariable variable=myvariable;]myvalue"
You can have an inline PowerShell script which sets the required variable for consumption in tasks that have yet to execute. You can access it like $(myvariable).
You may need to set the system.debug variable to true to use this.
Read more details here.
You just need to add an output variable name on the "Azure Resource Group Deployment" task, and then use that variable in the "PowerShell on Target Machines" task.
The "PowerShell on Target Machines" task will then use the resources configured by the "Azure Resource Group Deployment" task.
Output variables:
Create/update action of the Azure Resource Group task now produces an output variable during execution. The output variable can be used to refer to the resource group object in the subsequent tasks. For example, the "PowerShell on Target Machine" task can now refer to the resource group output variable as '$(variableName)' so that it can execute the PowerShell script on the resource group VM targets.
Limitation: the output variable produced during execution will have details about VM hostname(s) and (public) ports, if any. Credentials to connect to the VM host(s) are to be provided explicitly in the subsequent tasks.
Refer to this link for more details: Azure Resource Group Deployment Task

Dynamic selection of storage table in azure data factory

I've got an existing set of Azure Storage tables, one per client, that hold events in a multi-tenant cloud system.
E.g., there might be 3 tables to hold sign-in information:
ClientASignins
ClientBSignins
ClientCSignins
Is there a way to dynamically loop through these as part of either a copy operation or in something like a Pig script?
Or is there another way to achieve this result?
Many thanks!
If you keep track of these tables in another location, like Azure Storage, you could use PowerShell to loop through each of them and create a Hive table over each. For example:
# Assumes $tableList holds the table names as strings.
foreach($t in $tableList) {
    $hiveQuery = "CREATE EXTERNAL TABLE $t(IntValue int)
        STORED BY 'com.microsoft.hadoop.azure.hive.AzureTableHiveStorageHandler'
        TBLPROPERTIES(
            ""azure.table.name""=""$t"",
            ""azure.table.account.uri""=""http://$storageAccount.table.core.windows.net"",
            ""azure.table.storage.key""=""$((Get-AzureStorageKey $storageAccount).Primary)"");"

    Out-File -FilePath .\HiveCreateTable.q -InputObject $hiveQuery -Encoding ascii
    $hiveQueryBlob = Set-AzureStorageBlobContent -File .\HiveCreateTable.q -Blob "queries/HiveCreateTable.q" `
        -Container $clusterContainer.Name -Force
    $createTableJobDefinition = New-AzureHDInsightHiveJobDefinition -QueryFile /queries/HiveCreateTable.q
    $job = Start-AzureHDInsightJob -JobDefinition $createTableJobDefinition -Cluster $cluster.Name
    Wait-AzureHDInsightJob -Job $job
    # INSERT YOUR OPERATIONS FOR EACH TABLE HERE
}
Research:
http://blogs.msdn.com/b/mostlytrue/archive/2014/04/04/analyzing-azure-table-storage-data-with-hdinsight.aspx
How can manage Azure Table with Powershell?
In the end I opted for a couple of Azure Data Factory custom activities written in C#, and now my workflow is:
Custom activity: aggregate the data for the current slice into a single blob file for analysis in Pig.
HDInsight: Analyse with Pig
Custom activity: disperse the data to the array of target tables from blob storage to table storage.
I did this to keep the pipelines as simple as possible and remove the need for any duplication of pipelines/scripts.
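For reference, here is a minimal sketch of what one of those custom activities could look like. It assumes the ADF v1 Microsoft.Azure.Management.DataFactories SDK's IDotNetActivity interface, and the extended property name signinTables is hypothetical, standing in for however you configure the client table list on the pipeline activity:
using System.Collections.Generic;
using Microsoft.Azure.Management.DataFactories.Models;
using Microsoft.Azure.Management.DataFactories.Runtime;

public class AggregateSigninsActivity : IDotNetActivity
{
    public IDictionary<string, string> Execute(
        IEnumerable<LinkedService> linkedServices,
        IEnumerable<Dataset> datasets,
        Activity activity,
        IActivityLogger logger)
    {
        // Read the list of client tables from the activity's extended properties,
        // e.g. "ClientASignins,ClientBSignins,ClientCSignins".
        var properties = ((DotNetActivity)activity.TypeProperties).ExtendedProperties;
        var tableNames = properties["signinTables"].Split(',');

        foreach (var tableName in tableNames)
        {
            logger.Write("Aggregating table {0}", tableName);
            // ...read the client table and append its rows to a single blob
            // that the Pig script consumes in the next pipeline step...
        }
        return new Dictionary<string, string>();
    }
}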
References:
Use Custom Activities In Azure Data Factory pipeline
HttpDataDownloader Sample
