I am running a Python notebook on Azure Data Factory, which has failed and is giving me the following output:
{
  "effectiveIntegrationRuntime": "DefaultIntegrationRuntime (Central India)",
  "executionDuration": 260,
  "durationInQueue": {
    "integrationRuntimeQueue": 0
  },
  "billingReference": {
    "activityType": "ExternalActivity",
    "billableDuration": [
      {
        "meterType": "AzureIR",
        "duration": 0.08333333333333333,
        "unit": "Hours"
      }
    ]
  }
}
What is the meaning of this output?
In my experience, this is the pipeline run consumption. It gives you the values needed to calculate the cost of the pipeline, regardless of whether the pipeline failed or succeeded.
Ref this: https://azure.microsoft.com/en-us/pricing/calculator/?service=data-factory%2F
HTH.
It means your activity ran on an Azure Integration Runtime (DefaultIntegrationRuntime in Central India): the execution took 260 seconds, and you were billed 0.0833 hours (about 5 minutes) of AzureIR external-activity usage.
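As an illustration, the billable duration can be turned into a cost estimate by multiplying it by the per-hour meter rate. The rate below is a made-up placeholder; look up the real rate for your region and meter type in the pricing calculator linked above:

```python
import json

# Example activity-run output (abridged from the question).
output = json.loads("""
{
  "executionDuration": 260,
  "billingReference": {
    "activityType": "ExternalActivity",
    "billableDuration": [
      {"meterType": "AzureIR", "duration": 0.08333333333333333, "unit": "Hours"}
    ]
  }
}
""")

# Hypothetical per-hour rate for external activities on an Azure IR;
# substitute the real figure from the Azure pricing calculator.
RATE_PER_HOUR = 0.00025

billed_hours = sum(e["duration"] for e in output["billingReference"]["billableDuration"])
print(f"Execution: {output['executionDuration']} s, billed: {billed_hours * 60:.0f} min")
print(f"Estimated cost: ${billed_hours * RATE_PER_HOUR:.6f}")
```

Note that the billed duration (5 minutes) can exceed the raw execution time (260 seconds) because billing is rounded up to the meter's granularity.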
Related
I have a stored procedure that accepts a single input parameter and returns a data set. I want to invoke this stored procedure from my ADF pipeline and, with the stored proc's data, call another proc whose result I want to use for further processing.
I tried the Stored Procedure activity, but its output doesn't contain the actual data set:
{
  "effectiveIntegrationRuntime": "AutoResolveIntegrationRuntime (Australia East)",
  "executionDuration": 0,
  "durationInQueue": {
    "integrationRuntimeQueue": 0
  },
  "billingReference": {
    "activityType": "ExternalActivity",
    "billableDuration": [
      {
        "meterType": "AzureIR",
        "duration": 0.016666666666666666,
        "unit": "Hours"
      }
    ]
  }
}
I also tried the Lookup activity, but its result only contains the first row of the resultant data set:
{
  "firstRow": {
    "CountryID": 1411,
    "CountryName": "Maldives",
    "PresidentName": "XXXX"
  },
  "effectiveIntegrationRuntime": "AutoResolveIntegrationRuntime (Australia East)",
  "billingReference": {
    "activityType": "PipelineActivity",
    "billableDuration": [
      {
        "meterType": "AzureIR",
        "duration": 0.016666666666666666,
        "unit": "DIUHours"
      }
    ]
  },
  "durationInQueue": {
    "integrationRuntimeQueue": 0
  }
}
My main intention behind using ADF is to reduce the huge amount of time otherwise taken by an existing API (.NET Core) for the same steps. What else can be done? Should I consider any other Azure service(s)?
I have a pipeline that pulls data from an external source and sinks it into a SQL Server table as staging. Getting the raw data has already succeeded using four 'Copy data' activities; because there are so many columns (250), I split them.
The next requirement is to validate those four 'Copy data' activities by checking for a succeeded status. The output of a 'Copy data' activity looks like this:
{
  "dataRead": 4772214,
  "dataWritten": 106918,
  "sourcePeakConnections": 1,
  "sinkPeakConnections": 1,
  "rowsRead": 1366,
  "rowsCopied": 1366,
  "copyDuration": 8,
  "throughput": 582.546,
  "errors": [],
  "effectiveIntegrationRuntime": "AutoResolveIntegrationRuntime (Southeast Asia)",
  "usedDataIntegrationUnits": 4,
  "billingReference": {
    "activityType": "DataMovement",
    "billableDuration": [
      {
        "meterType": "AzureIR",
        "duration": 0.016666666666666666,
        "unit": "DIUHours"
      }
    ]
  },
  "usedParallelCopies": 1,
  "executionDetails": [
    {
      "source": {
        "type": "RestService"
      },
      "sink": {
        "type": "AzureSqlDatabase",
        "region": "Southeast Asia"
      },
      "status": "Succeeded",
      "start": "2022-04-13T07:16:48.5905628Z",
      "duration": 8,
      "usedDataIntegrationUnits": 4,
      "usedParallelCopies": 1,
      "profile": {
        "queue": {
          "status": "Completed",
          "duration": 4
        },
        "transfer": {
          "status": "Completed",
          "duration": 4,
          "details": {
            "readingFromSource": {
              "type": "RestService",
              "workingDuration": 1,
              "timeToFirstByte": 1
            },
            "writingToSink": {
              "type": "AzureSqlDatabase",
              "workingDuration": 0
            }
          }
        }
      },
      "detailedDurations": {
        "queuingDuration": 4,
        "timeToFirstByte": 1,
        "transferDuration": 3
      }
    }
  ],
  "dataConsistencyVerification": {
    "VerificationResult": "NotVerified"
  },
  "durationInQueue": {
    "integrationRuntimeQueue": 0
  }
}
Now, I want to get "status": "Succeeded" (from the JSON output) for validation in the 'If Condition'. So I set Value from variable in the dynamic content to @activity('copy_data_Kobo_MBS').output, but when it ran, I got this error:
The variable 'copy_Kobo_MBS' of type 'Boolean' cannot be initialized
or updated with value of type 'Object'. The variable 'copy_Kobo_MBS'
only supports values of types 'Boolean'.
And the question is: how do I get "status": "Succeeded" (from the JSON output) as the 'Variable' value, so the 'If Condition' can examine it?
You can use the below expression to pull the run status from the copy data activity. As your variable is of Boolean type, you need to evaluate it using the @equals() function, which returns true or false:
@equals(activity('Copy data1').output.executionDetails[0].status, 'Succeeded')
To my knowledge, you don't have to extract the status from the copy data activity at all, as you are connecting your copy activity to the set variable activity on its success path.
That means your set variable activity runs only when your copy data activity has completed successfully.
Also, note that:
If the copy data activity (or any other activity) fails, then activities added on the success output of that activity will not run.
If you connect the output of more than one activity to a single activity, it runs only when all of the connected activities have run.
You can add activities on the failure or completion paths for further processing.
Example:
In the snip below, the Set Variable activity did not run because the copy data activity was not successful, and the Wait2 activity did not run because not all of its input activities ran successfully.
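Putting the above together, here is a minimal sketch of what the Set Variable activity's JSON might look like, combining the Boolean expression with a success-only dependency on the copy activity (the activity and variable names are placeholders):

```json
{
  "name": "Set variable1",
  "type": "SetVariable",
  "dependsOn": [
    {
      "activity": "Copy data1",
      "dependencyConditions": [ "Succeeded" ]
    }
  ],
  "typeProperties": {
    "variableName": "copy_Kobo_MBS",
    "value": {
      "value": "@equals(activity('Copy data1').output.executionDetails[0].status, 'Succeeded')",
      "type": "Expression"
    }
  }
}
```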
I have an Azure Data Factory (v2) pipeline that I use to back up the contents of a database to a blob store nightly (2am UTC). However, I expect the name of the file to contain the day of the month (dd) on which the backup was generated, but it's always the day before.
The file name is generated using an expression:
@{formatDateTime(pipeline().parameters.windowStart,'dd')}.json
So, for example, the run at 3am today should have been called 23.json, but it was actually called 22.json. 3am is the expected run time, as I'm in the UK, which is currently on BST (UTC+1).
Looking at the parameters of the run, I can see that windowStart is indeed a day out. For example, today's run, which was triggered at 2am on the 23rd, had 9/22/2020 2:00:00 AM.
Is anybody able to explain why Data Factory behaves this way, and hopefully how I can make it work as expected?
Here is the trigger as exported from the Data Factory.
{
  "name": "Trigger_Copy_Transactions",
  "properties": {
    "annotations": [],
    "runtimeState": "Started",
    "pipeline": {
      "pipelineReference": {
        "referenceName": "Copy_Transactions",
        "type": "PipelineReference"
      },
      "parameters": {
        "windowStart": "@trigger().outputs.windowStartTime"
      }
    },
    "type": "TumblingWindowTrigger",
    "typeProperties": {
      "frequency": "Hour",
      "interval": 24,
      "startTime": "2020-08-24T02:00:00Z",
      "delay": "00:00:00",
      "maxConcurrency": 50,
      "retryPolicy": {
        "intervalInSeconds": 30
      },
      "dependsOn": []
    }
  }
}
One thing you could try is to force the file name to be generated in the same time zone your IR runs in. For example, we have a self-hosted IR, so the files we generated would not match EST times. In that case I did the following:
@concat('File_name', formatDateTime(
    convertFromUtc(utcnow(), 'Eastern Standard Time'), 'yyyy-MM-dd'), '.txt')
Perhaps doing that would force the proper date?
Are you using the auto-resolve Azure IR or a self-hosted IR when running this job?
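For what it's worth, a tumbling window trigger fires when its window closes, so the run at 02:00 on the 23rd covers the window that started at 02:00 on the 22nd, and windowStartTime therefore reports the previous day. If the file should be stamped with the day the run actually fires, one option (a sketch, assuming the UK time zone) is to use the window end instead, converted from UTC:

```
@{formatDateTime(convertFromUtc(trigger().outputs.windowEndTime, 'GMT Standard Time'), 'dd')}.json
```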
I'm using a stored procedure activity in an ADF v2 pipeline. The issue is that whenever the pipeline fails at the stored procedure activity, I don't get the complete error details. Below is the JSON output of that stored procedure activity:
{
  "effectiveIntegrationRuntime": "DefaultIntegrationRuntime (West Europe)",
  "executionDuration": 416,
  "durationInQueue": {
    "integrationRuntimeQueue": 0
  },
  "billingReference": {
    "activityType": "ExternalActivity",
    "billableDuration": [
      {
        "meterType": "AzureIR",
        "duration": 0.11666666666666667,
        "unit": "Hours"
      }
    ]
  }
}
Please let me know how I can get the error details for the stored procedure activity in an ADF v2 pipeline.
You should throw the exception in your stored procedure code:
https://learn.microsoft.com/en-us/sql/t-sql/language-elements/throw-transact-sql?view=sql-server-ver15
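For example, wrapping the body of the procedure in TRY...CATCH and re-throwing makes the full SQL error surface in the activity's failure output (the procedure and table names below are placeholders):

```sql
CREATE OR ALTER PROCEDURE dbo.usp_LoadStaging
AS
BEGIN
    BEGIN TRY
        -- the procedure's actual work goes here
        INSERT INTO dbo.Staging (Id) VALUES (1);
    END TRY
    BEGIN CATCH
        -- Re-throw the original error so ADF records the real message,
        -- error number, and line instead of a generic failure.
        THROW;
    END CATCH
END
```

Once the procedure throws, the details become available in the pipeline, for example via @activity('Stored Procedure1').error.message on a failure path (the activity name here is a placeholder).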
I am trying to make a periodic copy of all the data returned by an OData query into a DocumentDB collection, on a daily basis.
The copy works fine using the copy wizard, which is a really great option for simple tasks. Thanks for that.
What isn't working for me, though: the copy just adds data each time, and I can see no way with a DocumentDB sink to "pre-delete" the data in the collection (compare with the SQL sink, which has sqlWriterCleanupScript, which I could set to something like DELETE FROM [table]).
I know I can create an Azure Batch custom activity and do what I need, but at this point I'm not sure it wouldn't be better to write a Function and forgo Azure Data Factory (ADF) for this move. I'm using ADF to replicate on-prem SQL just fine, because it has the writer cleanup script.
At this point, I'd like to just use DocumentDB but I don't see a way to do it given the way my data works.
Here's a look at my pipeline:
{
  "name": "R-------ProjectToDocDB",
  "properties": {
    "activities": [
      {
        "type": "Copy",
        "typeProperties": {
          "source": {
            "type": "RelationalSource",
            "query": " "
          },
          "sink": {
            "type": "DocumentDbCollectionSink",
            "nestingSeparator": ".",
            "writeBatchSize": 0,
            "writeBatchTimeout": "00:00:00"
            /// this is where a cleanup script would be great.
          },
          "translator": {
            "type": "TabularTranslator",
            "columnMappings": "ProjectId:ProjectId,.....:CostClassification"
          }
        },
        "inputs": [
          {
            "name": "InputDataset-shc"
          }
        ],
        "outputs": [
          {
            "name": "OutputDataset-shc"
          }
        ],
        "policy": {
          "timeout": "1.00:00:00",
          "concurrency": 1,
          "executionPriorityOrder": "NewestFirst",
          "style": "StartOfInterval",
          "retry": 3,
          "longRetry": 0,
          "longRetryInterval": "00:00:00"
        },
        "scheduler": {
          "frequency": "Day",
          "interval": 1
        },
        "name": "Activity-0-_Custom query_->---Project"
      }
    ],
    "start": "2017-04-26T20:13:27.683Z",
    "end": "2099-12-31T05:00:00Z",
    "isPaused": false,
    "hubName": "r-----datafactory01_hub",
    "pipelineMode": "Scheduled"
  }
}
Perhaps there's an update in the pipeline that creates parity between SQL output and DocumentDB
Azure Data Factory does not support a cleanup script for DocumentDB today; it's in our backlog. If you can describe the end-to-end scenario in a little more detail, it would help us prioritize. For example, why does appending to the same collection not work? Is it because there's no way to identify the incremental records after each run? For the cleanup requirement, will it always be delete everything, or might it be based on a timestamp, etc.? Thanks. Until cleanup-script support arrives, a custom activity is the only workaround, sorry.
You could use a Logic App that runs on a Timer Trigger.
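If you do move the pre-delete into a Function (or the Logic App above), a minimal sketch of the cleanup step might look like this. It assumes the azure-cosmos Python SDK and a partition key property named pk; the database and collection names are placeholders, and the container object would be built with CosmosClient from your account endpoint and key:

```python
def clear_collection(container):
    """Delete every document so the next nightly copy starts from an empty collection.

    `container` is an azure-cosmos ContainerProxy, e.g.:
        from azure.cosmos import CosmosClient
        container = (CosmosClient(endpoint, credential=key)
                     .get_database_client("mydb")
                     .get_container_client("projects"))
    """
    deleted = 0
    # Read only the id and partition key of each document, then delete it.
    for item in container.query_items(
            query="SELECT c.id, c.pk FROM c",
            enable_cross_partition_query=True):
        container.delete_item(item["id"], partition_key=item["pk"])
        deleted += 1
    return deleted
```

Deleting document-by-document is slow for large collections; dropping and recreating the collection before the copy runs is a coarser but faster alternative.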