Start Azure Data Factory (v2) Trigger

I have a ScheduleTrigger in Azure Data Factory. I cannot change its runtimeState to Started. I have tried StartWithHttpMessageAsync (and what feels like every other start method in the API).
The JSON for the trigger looks like this:
{
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Day",
                "startTime": "2017-12-15T12:00:00Z",
                "endTime": "2099-12-31T00:00:00Z"
            }
        },
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "DynamicFlowMaster",
                    "name": "StartMasterPipeline",
                    "type": "PipelineReference"
                },
                "parameters": {}
            }
        ]
    }
}

OK, I finally have an answer.
While the documentation explicitly says:
"interval": <>, // optional, how often to fire (default to 1)
the Activity log in the Azure portal states:
'The template trigger is not valid: Required property 'interval' not found in JSON.'
Adding "interval": 1 allowed the Client.Triggers.Start() call to work; previously it only returned "Bad Request".
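For reference, here is a minimal sketch of the recurrence block that works (only the "interval" line is new; everything else matches the trigger JSON above):

"typeProperties": {
    "recurrence": {
        "frequency": "Day",
        "interval": 1,
        "startTime": "2017-12-15T12:00:00Z",
        "endTime": "2099-12-31T00:00:00Z"
    }
}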

Azure Data Factory CI/CD for Schedule Triggers Not Working

For triggers, it looks like only the pipelines and typeProperties blocks can be overridden, based on the documentation.
What I want to achieve, using my CI/CD process and the override parameters functionality, is to have a schedule trigger disabled in the target ADF even though it is enabled in the source ADF.
If I inspect the JSON of a trigger, it looks like the following field could do the trick: "runtimeState": "Started".
{
    "name": "name_daily",
    "properties": {
        "description": " ",
        "annotations": [],
        "runtimeState": "Started",
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "name",
                    "type": "PipelineReference"
                }
            }
        ],
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Day",
                "interval": 1,
                "startTime": "2020-05-05T13:01:00.000Z",
                "timeZone": "UTC",
                "schedule": {
                    "minutes": [
                        1
                    ],
                    "hours": [
                        13
                    ]
                }
            }
        }
    }
}
But if I attempt to add it to arm-template-parameters-definition.json like this:
"Microsoft.DataFactory/factories/triggers": {
"properties": {
"runtimeState": "-",
"typeProperties": {
"recurrence": {
"interval": "=",
"frequency": "="
}
}
}
}
it never shows up in the Override section in Azure Pipeline Releases.
Does this ADF CI/CD functionality exist for triggers? How can I achieve my target here?
It turns out that runtimeState for triggers is not honored in arm-template-parameters-definition.json.
The path is clearer after some more research: I can achieve what I want either by editing the PowerShell script Microsoft has provided or by using an ADF custom task from the Azure DevOps marketplace.
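As an illustration, here is a minimal sketch of such a post-deployment step (assuming the Az.DataFactory module; the resource group, factory, and trigger names are placeholders): it stops the triggers that should stay disabled in the target factory instead of relying on arm-template-parameters-definition.json.

# Hypothetical names for the target environment.
$resourceGroupName = "rg-target"
$dataFactoryName   = "adf-target"
$triggersToDisable = @("name_daily")

foreach ($triggerName in $triggersToDisable) {
    # Read the trigger's current state in the target factory.
    $trigger = Get-AzDataFactoryV2Trigger -ResourceGroupName $resourceGroupName `
        -DataFactoryName $dataFactoryName -Name $triggerName
    if ($trigger.RuntimeState -eq "Started") {
        # Stop it so it stays disabled after the release.
        Stop-AzDataFactoryV2Trigger -ResourceGroupName $resourceGroupName `
            -DataFactoryName $dataFactoryName -Name $triggerName -Force
    }
}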

Stream Analytics Job deployed as Azure Resource Manager (ARM) template

I am trying to set up an Event Hub output for a Stream Analytics Job defined as a JSON template. Without the output section the template deploys successfully; however, when I add the output definition it fails with:
Deployment failed. Correlation ID: <SOME_UUID>. {
    "code": "BadRequest",
    "message": "The JSON provided in the request body is invalid. Property 'eventHubName' value 'parameters('eh_name')' is not acceptable.",
    "details": {
        "code": "400",
        "message": "The JSON provided in the request body is invalid. Property 'eventHubName' value 'parameters('eh_name')' is not acceptable.",
        "correlationId": "<SOME_UUID>",
        "requestId": "<SOME_UUID>"
    }
}
I've defined the ARM template as:
{
    "$schema": "http://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "location": {
            "type": "string",
            "defaultValue": "westeurope"
        },
        "hubName": {
            "type": "string",
            "defaultValue": "fooIotHub"
        },
        "eh_name": {
            "defaultValue": "fooEhName",
            "type": "String"
        },
        "eh_namespace": {
            "defaultValue": "fooEhNamespace",
            "type": "String"
        },
        "streamAnalyticsJobName": {
            "type": "string",
            "defaultValue": "fooStreamAnalyticsJobName"
        }
    },
    "resources": [{
        "type": "Microsoft.StreamAnalytics/StreamingJobs",
        "apiVersion": "2016-03-01",
        "name": "[parameters('streamAnalyticsJobName')]",
        "location": "[resourceGroup().location]",
        "properties": {
            "sku": {
                "name": "standard"
            },
            "outputErrorPolicy": "Drop",
            "eventsOutOfOrderPolicy": "adjust",
            "eventsOutOfOrderMaxDelayInSeconds": 0,
            "eventsLateArrivalMaxDelayInSeconds": 86400,
            "inputs": [{
                "Name": "IoTHubInputLable",
                "Properties": {
                    "DataSource": {
                        "Properties": {
                            "iotHubNamespace": "[parameters('hubName')]",
                            "sharedAccessPolicyKey": "[listkeys(resourceId('Microsoft.Devices/IotHubs/IotHubKeys',parameters('hubName'), 'iothubowner'),'2016-02-03').primaryKey]",
                            "sharedAccessPolicyName": "iothubowner",
                            "endpoint": "messages/events"
                        },
                        "Type": "Microsoft.Devices/IotHubs"
                    },
                    "Serialization": {
                        "Properties": {
                            "Encoding": "UTF8"
                        },
                        "Type": "Json"
                    },
                    "Type": "Stream"
                }
            }],
            "transformation": {
                "name": "Transformation",
                "properties": {
                    "streamingUnits": 1,
                    "query": "<THE SQL-LIKE CODE FOR THE JOB QUERY>"
                }
            },
            "outputs": [{
                "name": "EventHubOutputLable",
                "properties": {
                    "dataSource": {
                        "type": "Microsoft.ServiceBus/EventHub",
                        "properties": {
                            "eventHubName": "parameters('eh_name')",
                            "serviceBusNamespace": "parameters('eh_namespace')",
                            "sharedAccessPolicyName": "RootManageSharedAccessKey"
                        }
                    },
                    "serialization": {
                        "Properties": {
                            "Encoding": "UTF8"
                        }
                    }
                }
            }]
        }
    }]
}
Checking https://learn.microsoft.com/en-us/azure/templates/microsoft.streamanalytics/streamingjobs,
it looks like the structure of the JSON for the output is the expected one (a properties field along with the type).
I figured out those "Event Hub properties" in Chrome using Developer Tools, by checking the details of the "GetOutputs" HTTP request; otherwise I am not sure where I could see how to specify those properties. The structure looks quite similar to the one for the IoT Hub input (which is working), which uses different labels for the properties related to the IoT Hub details.
Checking the blog post https://blogs.msdn.microsoft.com/david/2017/07/20/building-azure-stream-analytics-resources-via-arm-templates-part-2-the-template/,
the output part relates to Power BI, and it looks like a different structure is used for the properties: outputPowerBISource. So for the Event Hub I tried the field outputEventHubSource (taken from the checks with Chrome Developer Tools) instead of properties, but then I get this error:
Deployment failed. Correlation ID: <SOME_UUID>. {
    "code": "BadRequest",
    "message": "The JSON provided in the request body is invalid. Required property 'properties' not found in JSON. Path '', line 1, position 1419.",
    "details": {
        "code": "400",
        "message": "The JSON provided in the request body is invalid. Required property 'properties' not found in JSON. Path '', line 1, position 1419.",
        "correlationId": "<SOME_UUID>",
        "requestId": "<SOME_UUID>"
    }
}
I am deploying this template with the Azure CLI (from a Debian Linux machine):
az group deployment create \
--name "deployStreamAnalyticsJobs" \
--resource-group "MyRGName" \
--template-file ./templates/stream-analytics-jobs.json
How do I specify an output in an Azure Resource Manager (ARM) template for a Stream Analytics Job?
Any property that contains a parameter (or any other expression that needs to be evaluated) must be wrapped in square brackets, e.g.
"eventHubName": "[parameters('eh_name')]",
"serviceBusNamespace": "[parameters('eh_namespace')]",
Otherwise the literal value in the quotes is used.
Does that help?
I found out that all parameters need to be wrapped in square brackets (as pointed out in the other answer to this question).
Also, to dynamically retrieve the shared access policy key (or any other property of an existing resource such as the Event Hub), a combination of functions like listKeys and resourceId must be used; see below for a full example of an Event Hub described as an output for a Stream Analytics Job.
In detail:
- the parameters defined for eventHubName and serviceBusNamespace must be evaluated using square brackets (see how I defined those parameters in the JSON example in the body of the question above);
- the shared access policy can be either a hardcoded string (or a parameter, as before) for sharedAccessPolicyName, or dynamically retrieved for sharedAccessPolicyKey using "sharedAccessPolicyKey": "[listKeys(resourceId('Microsoft.EventHub/namespaces/eventhubs/authorizationRules', parameters('eh_namespace'), parameters('eh_name'), 'RootManageSharedAccessKey'),'2017-04-01').primaryKey]" (this is sensitive data, so it should be protected rather than hardcoded as a plain string).
The following JSON configuration shows an existing Event Hub defined as an output for the Stream Analytics Job:
"outputs": [{
"Name": "EventHubOutputLable",
"Properties": {
"DataSource": {
"Type": "Microsoft.ServiceBus/EventHub",
"Properties": {
"eventHubName": "[parameters('eh_name')]",
"serviceBusNamespace": "[parameters('eh_namespace')]",
"sharedAccessPolicyKey": "[listKeys(resourceId('Microsoft.EventHub/namespaces/eventhubs/authorizationRules', parameters('eh_namespace'), parameters('eh_name'), 'RootManageSharedAccessKey'),'2017-04-01').primaryKey]",
"sharedAccessPolicyName": "RootManageSharedAccessKey"
}
},
"Serialization": {
"Properties": {
"Encoding": "UTF8"
},
"Type": "Json"
}
}
}]

Azure Data Factory v2 using utcnow() as a pipeline parameter

For context, I currently have a Data Factory v2 pipeline with a ForEach Activity that calls a Copy Activity. The Copy Activity simply copies data from an FTP server to a blob storage container.
Here is the pipeline JSON file:
{
    "name": "pipeline1",
    "properties": {
        "activities": [
            {
                "name": "ForEach1",
                "type": "ForEach",
                "typeProperties": {
                    "items": {
                        "value": "@pipeline().parameters.InputParams",
                        "type": "Expression"
                    },
                    "isSequential": true,
                    "activities": [
                        {
                            "name": "Copy1",
                            "type": "Copy",
                            "policy": {
                                "timeout": "7.00:00:00",
                                "retry": 0,
                                "retryIntervalInSeconds": 30,
                                "secureOutput": false
                            },
                            "typeProperties": {
                                "source": {
                                    "type": "FileSystemSource",
                                    "recursive": true
                                },
                                "sink": {
                                    "type": "BlobSink"
                                },
                                "enableStaging": false,
                                "cloudDataMovementUnits": 0
                            },
                            "inputs": [
                                {
                                    "referenceName": "FtpDataset",
                                    "type": "DatasetReference",
                                    "parameters": {
                                        "FtpFileName": "@item().FtpFileName",
                                        "FtpFolderPath": "@item().FtpFolderPath"
                                    }
                                }
                            ],
                            "outputs": [
                                {
                                    "referenceName": "BlobDataset",
                                    "type": "DatasetReference",
                                    "parameters": {
                                        "BlobFileName": "@item().BlobFileName",
                                        "BlobFolderPath": "@item().BlobFolderPath"
                                    }
                                }
                            ]
                        }
                    ]
                }
            }
        ],
        "parameters": {
            "InputParams": {
                "type": "Array",
                "defaultValue": [
                    {
                        "FtpFolderPath": "/Folder1/",
                        "FtpFileName": "@concat('File_',formatDateTime(utcnow(), 'yyyyMMdd'), '.txt')",
                        "BlobFolderPath": "blobfolderpath",
                        "BlobFileName": "blobfile1"
                    },
                    {
                        "FtpFolderPath": "/Folder2/",
                        "FtpFileName": "@concat('File_',formatDateTime(utcnow(), 'yyyyMMdd'), '.txt')",
                        "BlobFolderPath": "blobfolderpath",
                        "BlobFileName": "blobfile2"
                    }
                ]
            }
        }
    },
    "type": "Microsoft.DataFactory/factories/pipelines"
}
The issue I am having is that when specifying pipeline parameters, it seems I cannot use system variables and functions the same way I can when, for example, specifying folder paths for a blob storage dataset.
The consequence is that formatDateTime(utcnow(), 'yyyyMMdd') is not interpreted as a function call but rather as the literal string formatDateTime(utcnow(), 'yyyyMMdd').
To counter this, I am guessing I should use a trigger to execute my pipeline and pass the trigger's execution time to the pipeline as a parameter, like trigger().startTime. But is this the only way? Am I simply doing something wrong in my pipeline's JSON?
This should work:
File_@{formatDateTime(utcnow(), 'yyyyMMdd')}
Or complex paths as well:
rootfolder/subfolder/@{formatDateTime(utcnow(),'yyyy')}/@{formatDateTime(utcnow(),'MM')}/@{formatDateTime(utcnow(),'dd')}/@{formatDateTime(utcnow(),'HH')}
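For example, a sketch based on the pipeline above: the interpolated string goes into the dataset parameters of the copy activity rather than into a parameter default value:

"inputs": [
    {
        "referenceName": "FtpDataset",
        "type": "DatasetReference",
        "parameters": {
            "FtpFileName": "File_@{formatDateTime(utcnow(), 'yyyyMMdd')}.txt",
            "FtpFolderPath": "@item().FtpFolderPath"
        }
    }
]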
You can't put a dynamic expression in the default value. You should define the expression either when you create a trigger, or when you define the dataset parameters in the sink/source of the copy activity.
So either you create the dataset property FtpFileName with some default value in the source dataset, and then specify the dynamic expression in the source section of the copy activity;
or you define a pipeline parameter and then add the dynamic expression to that pipeline parameter when you define a trigger. Hope this is a clear answer to you. :)
Default values of parameters cannot be expressions; they must be literal strings.
You could use a trigger to achieve this. Or you could extract the common part of your expressions and put only literal values into the ForEach items, as sketched below.
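A sketch of the trigger route (the parameter name RunDate is hypothetical, not part of the pipeline above): the expression is attached to the pipeline reference inside the trigger definition, where it is evaluated each time the trigger fires.

"pipelines": [
    {
        "pipelineReference": {
            "referenceName": "pipeline1",
            "type": "PipelineReference"
        },
        "parameters": {
            "RunDate": "@{formatDateTime(trigger().startTime, 'yyyyMMdd')}"
        }
    }
]

The pipeline can then build its file names from the single RunDate value instead of repeating the expression in every ForEach item.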

Error while running U-SQL Activity in Pipeline in Azure Data Factory

I am getting the following error while running a U-SQL Activity in a pipeline in ADF:
Error in Activity:
{"errorId":"E_CSC_USER_SYNTAXERROR","severity":"Error","component":"CSC",
"source":"USER","message":"syntax error.
Final statement did not end with a semicolon","details":"at token 'txt', line 3\r\nnear the ###:\r\n**************\r\nDECLARE @in string = \"/demo/SearchLog.txt\";\nDECLARE @out string = \"/scripts/Result.txt\";\nSearchLogProcessing.txt ### \n",
"description":"Invalid syntax found in the script.",
"resolution":"Correct the script syntax, using expected token(s) as a guide.","helpLink":"","filePath":"","lineNumber":3,
"startOffset":109,"endOffset":112}].
Here is the code for the output dataset, the pipeline, and the U-SQL script that I am trying to execute in the pipeline.
OutputDataset:
{
    "name": "OutputDataLakeTable",
    "properties": {
        "published": false,
        "type": "AzureDataLakeStore",
        "linkedServiceName": "LinkedServiceDestination",
        "typeProperties": {
            "folderPath": "scripts/"
        },
        "availability": {
            "frequency": "Hour",
            "interval": 1
        }
    }
}
Pipeline:
{
    "name": "ComputeEventsByRegionPipeline",
    "properties": {
        "description": "This is a pipeline to compute events for en-gb locale and date less than 2012/02/19.",
        "activities": [
            {
                "type": "DataLakeAnalyticsU-SQL",
                "typeProperties": {
                    "script": "SearchLogProcessing.txt",
                    "scriptPath": "scripts\\",
                    "degreeOfParallelism": 3,
                    "priority": 100,
                    "parameters": {
                        "in": "/demo/SearchLog.txt",
                        "out": "/scripts/Result.txt"
                    }
                },
                "inputs": [
                    {
                        "name": "InputDataLakeTable"
                    }
                ],
                "outputs": [
                    {
                        "name": "OutputDataLakeTable"
                    }
                ],
                "policy": {
                    "timeout": "06:00:00",
                    "concurrency": 1,
                    "executionPriorityOrder": "NewestFirst",
                    "retry": 1
                },
                "scheduler": {
                    "frequency": "Minute",
                    "interval": 15
                },
                "name": "CopybyU-SQL",
                "linkedServiceName": "AzureDataLakeAnalyticsLinkedService"
            }
        ],
        "start": "2017-01-03T12:01:05.53Z",
        "end": "2017-01-03T13:01:05.53Z",
        "isPaused": false,
        "hubName": "denojaidbfactory_hub",
        "pipelineMode": "Scheduled"
    }
}
Here is my U-SQL script, which I am trying to execute using the "DataLakeAnalyticsU-SQL" activity type.
@searchlog =
    EXTRACT UserId int,
            Start DateTime,
            Region string,
            Query string,
            Duration int?,
            Urls string,
            ClickedUrls string
    FROM @in
    USING Extractors.Text(delimiter:'|');

@rs1 =
    SELECT Start, Region, Duration
    FROM @searchlog
    WHERE Region == "kota";

OUTPUT @rs1
TO @out
USING Outputters.Text(delimiter:'|');
Please suggest how to resolve this issue.
Your script is missing the scriptLinkedService attribute. You also (currently) need to place the U-SQL script in Azure Blob Storage to run it successfully. Therefore you also need an AzureStorage Linked Service, for example:
{
    "name": "StorageLinkedService",
    "properties": {
        "description": "",
        "type": "AzureStorage",
        "typeProperties": {
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=myAzureBlobStorageAccount;AccountKey=**********"
        }
    }
}
Create this linked service, replacing the Blob storage name myAzureBlobStorageAccount with your relevant Blob Storage account, then place the U-SQL script (SearchLogProcessing.txt) in a container there and try again. In my example pipeline below, I have a container called adlascripts in my Blob store and the script is in there:
Make sure the scriptPath is complete, as Alexandre mentioned. Start of the pipeline:
{
    "name": "ComputeEventsByRegionPipeline",
    "properties": {
        "description": "This is a pipeline to compute events for en-gb locale and date less than 2012/02/19.",
        "activities": [
            {
                "type": "DataLakeAnalyticsU-SQL",
                "typeProperties": {
                    "scriptPath": "adlascripts\\SearchLogProcessing.txt",
                    "scriptLinkedService": "StorageLinkedService",
                    "degreeOfParallelism": 3,
                    "priority": 100,
                    "parameters": {
                        "in": "/input/SearchLog.tsv",
                        "out": "/output/Result.tsv"
                    }
                },
                ...
The input and output .tsv files can be in the Data Lake and use the AzureDataLakeStoreLinkedService linked service.
I can see you are trying to follow the demo at https://learn.microsoft.com/en-us/azure/data-factory/data-factory-usql-activity#script-definition. It is not the most intuitive demo and there seem to be some issues, like: where is the definition for StorageLinkedService? Where is SearchLogProcessing.txt? OK, I found it by googling, but there should be a link in the web page. I got it to work, but felt a bit like Harry Potter in the Half-Blood Prince.
Remove the script attribute in your U-SQL activity definition and provide the complete path to your script (including filename) in the scriptPath attribute.
Reference: https://learn.microsoft.com/en-us/azure/data-factory/data-factory-usql-activity
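For example, the activity's typeProperties could then look like this (a minimal sketch; the adlascripts container and StorageLinkedService names follow the other answer):

"typeProperties": {
    "scriptPath": "adlascripts\\SearchLogProcessing.txt",
    "scriptLinkedService": "StorageLinkedService",
    "degreeOfParallelism": 3,
    "priority": 100,
    "parameters": {
        "in": "/demo/SearchLog.txt",
        "out": "/scripts/Result.txt"
    }
}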
I had a similar issue, where Azure Data Factory would not recognize my script files. A way to avoid the whole issue, while not having to paste a lot of code, is to register a stored procedure. You can do it like this:
DROP PROCEDURE IF EXISTS master.dbo.sp_test;
CREATE PROCEDURE master.dbo.sp_test()
AS
BEGIN
    @searchlog =
        EXTRACT UserId int,
                Start DateTime,
                Region string,
                Query string,
                Duration int?,
                Urls string,
                ClickedUrls string
        FROM @in
        USING Extractors.Text(delimiter:'|');

    @rs1 =
        SELECT Start, Region, Duration
        FROM @searchlog
        WHERE Region == "kota";

    OUTPUT @rs1
    TO @out
    USING Outputters.Text(delimiter:'|');
END;
After running this, you can use
"script": "master.dbo.sp_test()"
in your JSON pipeline definition. Whenever you update the U-SQL script, simply re-run the definition of the procedure. Then there will be no need to copy script files to Blob Storage.

How to define a Table in Azure Data Factory

When creating an HDInsight on-demand linked service, Data Factory creates a new container for the HDInsight cluster. I would like to know how I can create a table that points to that container. Here is my table definition:
{
    "name": "AzureBlobLocation",
    "properties": {
        "published": false,
        "type": "AzureBlob",
        "linkedServiceName": "AzureBlobLinkedService",
        "typeProperties": {
            "folderPath": "????/Folder1/Folder2/Output/Aggregation/",
            "format": {
                "type": "TextFormat"
            }
        },
        "availability": {
            "frequency": "Day",
            "interval": 1
        }
    }
}
What should go in place of the '????' that I put there? The keyword is not accepted.
I should use the keyword 'container' in order to point to the working container.
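Taking the answer literally, the '????' placeholder is replaced with container, so the dataset's typeProperties become (a sketch; only the first path segment changes):

"typeProperties": {
    "folderPath": "container/Folder1/Folder2/Output/Aggregation/",
    "format": {
        "type": "TextFormat"
    }
}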
