Stored Procedure activity in ADF v2 - Azure

I'm using a Stored Procedure activity in an ADF v2 pipeline. The issue is that whenever the pipeline fails at the Stored Procedure activity, I don't get the complete error details. Below is the JSON output of that Stored Procedure activity:
{
    "effectiveIntegrationRuntime": "DefaultIntegrationRuntime (West Europe)",
    "executionDuration": 416,
    "durationInQueue": {
        "integrationRuntimeQueue": 0
    },
    "billingReference": {
        "activityType": "ExternalActivity",
        "billableDuration": [
            {
                "meterType": "AzureIR",
                "duration": 0.11666666666666667,
                "unit": "Hours"
            }
        ]
    }
}
How do I get the error details for the Stored Procedure activity in an ADF v2 pipeline?

You should raise the error from within your stored procedure code using THROW:
https://learn.microsoft.com/en-us/sql/t-sql/language-elements/throw-transact-sql?view=sql-server-ver15
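For example, wrapping the procedure body in TRY/CATCH and re-raising the error makes the full SQL error message appear in the activity's error output. A minimal sketch (the procedure and table names are illustrative):
CREATE OR ALTER PROCEDURE dbo.usp_LoadStaging
AS
BEGIN
    BEGIN TRY
        -- the actual work; this insert fails if Col1 is NOT NULL
        INSERT INTO dbo.StagingTable (Col1) VALUES (NULL);
    END TRY
    BEGIN CATCH
        -- re-raise the original error so ADF surfaces its number and message
        THROW;
    END CATCH
END;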

Related

Get Stored Procedure Output DataSet from Azure Data Factory

I have a stored procedure which accepts a single input parameter and returns a data set. I want to invoke this stored procedure from my ADF pipeline and, using the stored proc's data, call another proc whose result I then want to process further.
I tried with the Stored Procedure activity, but its output doesn't contain the actual data set:
{
    "effectiveIntegrationRuntime": "AutoResolveIntegrationRuntime (Australia East)",
    "executionDuration": 0,
    "durationInQueue": {
        "integrationRuntimeQueue": 0
    },
    "billingReference": {
        "activityType": "ExternalActivity",
        "billableDuration": [
            {
                "meterType": "AzureIR",
                "duration": 0.016666666666666666,
                "unit": "Hours"
            }
        ]
    }
}
I also tried the Lookup activity, but its result only contains the first row of the resultant data set:
{
    "firstRow": {
        "CountryID": 1411,
        "CountryName": "Maldives",
        "PresidentName": "XXXX"
    },
    "effectiveIntegrationRuntime": "AutoResolveIntegrationRuntime (Australia East)",
    "billingReference": {
        "activityType": "PipelineActivity",
        "billableDuration": [
            {
                "meterType": "AzureIR",
                "duration": 0.016666666666666666,
                "unit": "DIUHours"
            }
        ]
    },
    "durationInQueue": {
        "integrationRuntimeQueue": 0
    }
}
My main intention behind using ADF is to reduce the huge amount of time otherwise taken by an existing API (.NET Core) for the same steps. What else can be done? Should I consider any other Azure service(s)?
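One note on the Lookup attempt: the Lookup activity returns only the first row because its firstRowOnly setting defaults to true. With "firstRowOnly": false, the output carries the whole result set (up to 5,000 rows / 4 MB) in a value array, shaped roughly like this (row values are illustrative):
{
    "count": 2,
    "value": [
        {
            "CountryID": 1411,
            "CountryName": "Maldives",
            "PresidentName": "XXXX"
        },
        {
            "CountryID": 1412,
            "CountryName": "Fiji",
            "PresidentName": "YYYY"
        }
    ]
}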

ADF get property "status": "Succeeded" and IF for validation

I have a pipeline that pulls data from an external source and sinks it into a SQL Server table as staging. Getting the raw data already succeeds using four 'Copy data' activities; because there are so many columns (250), I split them.
The next requirement is to validate those four 'Copy data' activities by checking for a succeeded status. The output of 'Copy data' looks like this:
{
    "dataRead": 4772214,
    "dataWritten": 106918,
    "sourcePeakConnections": 1,
    "sinkPeakConnections": 1,
    "rowsRead": 1366,
    "rowsCopied": 1366,
    "copyDuration": 8,
    "throughput": 582.546,
    "errors": [],
    "effectiveIntegrationRuntime": "AutoResolveIntegrationRuntime (Southeast Asia)",
    "usedDataIntegrationUnits": 4,
    "billingReference": {
        "activityType": "DataMovement",
        "billableDuration": [
            {
                "meterType": "AzureIR",
                "duration": 0.016666666666666666,
                "unit": "DIUHours"
            }
        ]
    },
    "usedParallelCopies": 1,
    "executionDetails": [
        {
            "source": {
                "type": "RestService"
            },
            "sink": {
                "type": "AzureSqlDatabase",
                "region": "Southeast Asia"
            },
            "status": "Succeeded",
            "start": "2022-04-13T07:16:48.5905628Z",
            "duration": 8,
            "usedDataIntegrationUnits": 4,
            "usedParallelCopies": 1,
            "profile": {
                "queue": {
                    "status": "Completed",
                    "duration": 4
                },
                "transfer": {
                    "status": "Completed",
                    "duration": 4,
                    "details": {
                        "readingFromSource": {
                            "type": "RestService",
                            "workingDuration": 1,
                            "timeToFirstByte": 1
                        },
                        "writingToSink": {
                            "type": "AzureSqlDatabase",
                            "workingDuration": 0
                        }
                    }
                }
            },
            "detailedDurations": {
                "queuingDuration": 4,
                "timeToFirstByte": 1,
                "transferDuration": 3
            }
        }
    ],
    "dataConsistencyVerification": {
        "VerificationResult": "NotVerified"
    },
    "durationInQueue": {
        "integrationRuntimeQueue": 0
    }
}
Now I want to get "status": "Succeeded" from the JSON output for validation in the 'If Condition'. So I set the variable's value in the dynamic content to @activity('copy_data_Kobo_MBS').output,
but when it ran, I got this error:
The variable 'copy_Kobo_MBS' of type 'Boolean' cannot be initialized
or updated with value of type 'Object'. The variable 'copy_Kobo_MBS'
only supports values of types 'Boolean'.
The question is: how do I get "status": "Succeeded" from the JSON output into the variable's value, so the 'If Condition' can examine it?
You can use the expression below to pull the run status from the Copy data activity. As your variable is of Boolean type, you need to evaluate it using the @equals() function, which returns true or false.
@equals(activity('Copy data1').output.executionDetails[0].status,'Succeeded')
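Alternatively, you can skip the variable entirely and put the same expression straight into the If Condition activity. A minimal sketch of the activity JSON (the activity name is illustrative):
{
    "name": "Check copy status",
    "type": "IfCondition",
    "typeProperties": {
        "expression": {
            "value": "@equals(activity('Copy data1').output.executionDetails[0].status, 'Succeeded')",
            "type": "Expression"
        },
        "ifTrueActivities": [],
        "ifFalseActivities": []
    }
}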
Note that you don't really have to extract the status from the Copy data activity at all, since you are connecting your copy activity to the Set Variable activity upon success.
That means your Set Variable activity runs only when your Copy data activity ran successfully.
Also note that:
If the Copy data activity (or any other activity) fails, then activities added to the success output of that activity will not run.
If you connect the outputs of more than one activity to a single activity, it runs only when all the connected activities have run.
You can add activities upon failure or upon completion to process further.
Example:
In the below snip, the Set Variable activity does not run because the copy data activity was not successful, and the Wait2 activity does not run because not all of its input activities ran successfully.
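For reference, these success/failure/completion connections show up in the pipeline JSON as dependencyConditions on the downstream activity, roughly like this (activity and variable names are illustrative):
{
    "name": "Set variable1",
    "type": "SetVariable",
    "dependsOn": [
        {
            "activity": "Copy data1",
            "dependencyConditions": [
                "Succeeded"
            ]
        }
    ],
    "typeProperties": {
        "variableName": "copy_Kobo_MBS",
        "value": {
            "value": "@equals(activity('Copy data1').output.executionDetails[0].status, 'Succeeded')",
            "type": "Expression"
        }
    }
}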

What is the meaning of the following ADF output?

I am running a Python notebook via Azure Data Factory, which has failed and is giving me the following output.
{
    "effectiveIntegrationRuntime": "DefaultIntegrationRuntime (Central India)",
    "executionDuration": 260,
    "durationInQueue": {
        "integrationRuntimeQueue": 0
    },
    "billingReference": {
        "activityType": "ExternalActivity",
        "billableDuration": [
            {
                "meterType": "AzureIR",
                "duration": 0.08333333333333333,
                "unit": "Hours"
            }
        ]
    }
}
What is the meaning of this output?
In my experience, this is the pipeline run consumption. It gives values that help you calculate the cost of the pipeline, regardless of whether the pipeline failed or succeeded.
Ref this: https://azure.microsoft.com/en-us/pricing/calculator/?service=data-factory%2F
HTH.
It means your activity ran on an Azure Integration Runtime (DefaultIntegrationRuntime in Central India) for 260 seconds, and you were billed for that usage: 260 s rounds up to 5 minutes, which matches the billable duration of 0.0833 hours.

Error while running U-SQL Activity in Pipeline in Azure Data Factory

I am getting the following error while running a U-SQL activity in an ADF pipeline:
Error in Activity:
{"errorId":"E_CSC_USER_SYNTAXERROR","severity":"Error","component":"CSC",
"source":"USER","message":"syntax error.
Final statement did not end with a semicolon","details":"at token 'txt', line 3\r\nnear the ###:\r\n**************\r\nDECLARE @in string = \"/demo/SearchLog.txt\";\nDECLARE @out string = \"/scripts/Result.txt\";\nSearchLogProcessing.txt ### \n",
"description":"Invalid syntax found in the script.",
"resolution":"Correct the script syntax, using expected token(s) as a guide.","helpLink":"","filePath":"","lineNumber":3,
"startOffset":109,"endOffset":112}].
Here is the code of the output dataset, the pipeline, and the U-SQL script which I am trying to execute in the pipeline.
OutputDataset:
{
    "name": "OutputDataLakeTable",
    "properties": {
        "published": false,
        "type": "AzureDataLakeStore",
        "linkedServiceName": "LinkedServiceDestination",
        "typeProperties": {
            "folderPath": "scripts/"
        },
        "availability": {
            "frequency": "Hour",
            "interval": 1
        }
    }
}
Pipeline:
{
    "name": "ComputeEventsByRegionPipeline",
    "properties": {
        "description": "This is a pipeline to compute events for en-gb locale and date less than 2012/02/19.",
        "activities": [
            {
                "type": "DataLakeAnalyticsU-SQL",
                "typeProperties": {
                    "script": "SearchLogProcessing.txt",
                    "scriptPath": "scripts\\",
                    "degreeOfParallelism": 3,
                    "priority": 100,
                    "parameters": {
                        "in": "/demo/SearchLog.txt",
                        "out": "/scripts/Result.txt"
                    }
                },
                "inputs": [
                    {
                        "name": "InputDataLakeTable"
                    }
                ],
                "outputs": [
                    {
                        "name": "OutputDataLakeTable"
                    }
                ],
                "policy": {
                    "timeout": "06:00:00",
                    "concurrency": 1,
                    "executionPriorityOrder": "NewestFirst",
                    "retry": 1
                },
                "scheduler": {
                    "frequency": "Minute",
                    "interval": 15
                },
                "name": "CopybyU-SQL",
                "linkedServiceName": "AzureDataLakeAnalyticsLinkedService"
            }
        ],
        "start": "2017-01-03T12:01:05.53Z",
        "end": "2017-01-03T13:01:05.53Z",
        "isPaused": false,
        "hubName": "denojaidbfactory_hub",
        "pipelineMode": "Scheduled"
    }
}
Here is my U-SQL script which I am trying to execute using the "DataLakeAnalyticsU-SQL" activity type.
@searchlog =
    EXTRACT UserId int,
            Start DateTime,
            Region string,
            Query string,
            Duration int?,
            Urls string,
            ClickedUrls string
    FROM @in
    USING Extractors.Text(delimiter:'|');

@rs1 =
    SELECT Start, Region, Duration
    FROM @searchlog
    WHERE Region == "kota";

OUTPUT @rs1
TO @out
USING Outputters.Text(delimiter:'|');
Please suggest how to resolve this issue.
Your activity definition is missing the scriptLinkedService attribute. You also (currently) need to place the U-SQL script in Azure Blob Storage to run it successfully, so you also need an AzureStorage linked service, for example:
{
    "name": "StorageLinkedService",
    "properties": {
        "description": "",
        "type": "AzureStorage",
        "typeProperties": {
            "connectionString": "DefaultEndpointsProtocol=https;AccountName=myAzureBlobStorageAccount;AccountKey=**********"
        }
    }
}
Create this linked service, replacing the Blob storage name myAzureBlobStorageAccount with your relevant Blob Storage account, then place the U-SQL script (SearchLogProcessing.txt) in a container there and try again. In my example pipeline below, I have a container called adlascripts in my Blob store and the script is in there:
Make sure the scriptPath is complete, as Alexandre mentioned. Start of the pipeline:
{
    "name": "ComputeEventsByRegionPipeline",
    "properties": {
        "description": "This is a pipeline to compute events for en-gb locale and date less than 2012/02/19.",
        "activities": [
            {
                "type": "DataLakeAnalyticsU-SQL",
                "typeProperties": {
                    "scriptPath": "adlascripts\\SearchLogProcessing.txt",
                    "scriptLinkedService": "StorageLinkedService",
                    "degreeOfParallelism": 3,
                    "priority": 100,
                    "parameters": {
                        "in": "/input/SearchLog.tsv",
                        "out": "/output/Result.tsv"
                    }
                },
                ...
The input and output .tsv files can be in the data lake and use the AzureDataLakeStoreLinkedService linked service.
I can see you are trying to follow the demo from https://learn.microsoft.com/en-us/azure/data-factory/data-factory-usql-activity#script-definition. It is not the most intuitive demo, and there seem to be some issues: where is the definition for StorageLinkedService? Where is SearchLogProcessing.txt? I found it by googling, but there should be a link on the page. I got it to work, but felt a bit like Harry Potter in the Half-Blood Prince.
Remove the script attribute in your U-SQL activity definition and provide the complete path to your script (including filename) in the scriptPath attribute.
Reference: https://learn.microsoft.com/en-us/azure/data-factory/data-factory-usql-activity
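Applied to the pipeline in the question, the activity's typeProperties would then look something like this (combining both answers; the container and file names are the ones from the question):
"typeProperties": {
    "scriptPath": "scripts\\SearchLogProcessing.txt",
    "scriptLinkedService": "StorageLinkedService",
    "degreeOfParallelism": 3,
    "priority": 100,
    "parameters": {
        "in": "/demo/SearchLog.txt",
        "out": "/scripts/Result.txt"
    }
}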
I had a similar issue, where Azure Data Factory would not recognize my script files. A way to avoid the whole issue, while not having to paste a lot of code, is to register a stored procedure. You can do it like this:
DROP PROCEDURE IF EXISTS master.dbo.sp_test;
CREATE PROCEDURE master.dbo.sp_test()
AS
BEGIN
    @searchlog =
        EXTRACT UserId int,
                Start DateTime,
                Region string,
                Query string,
                Duration int?,
                Urls string,
                ClickedUrls string
        FROM @in
        USING Extractors.Text(delimiter:'|');

    @rs1 =
        SELECT Start, Region, Duration
        FROM @searchlog
        WHERE Region == "kota";

    OUTPUT @rs1
    TO @out
    USING Outputters.Text(delimiter:'|');
END;
After running this, you can use
"script": "master.dbo.sp_test()"
in your JSON pipeline definition. Whenever you update the U-SQL script, simply re-run the definition of the procedure. Then there will be no need to copy script files to Blob Storage.

Call stored procedure using ADF

I am loading a SQL Server table using ADF, and after the insertion is over I have to do a little manipulation. I tried the approaches below:
Trigger (after insert) - failed; SQL Server is not able to detect the inserted records that I push using ADF. Seems to be a bug.
Stored procedure using a user-defined table type - getting this error:
Error Number '156'. Error message from database execution : Incorrect
syntax near the keyword 'select'. Must declare the table variable
"@a".
I have created the below pipeline
{
    "name": "CopyPipeline-xxx",
    "properties": {
        "activities": [
            {
                "type": "Copy",
                "typeProperties": {
                    "source": {
                        "type": "AzureDataLakeStoreSource",
                        "recursive": false
                    },
                    "sink": {
                        "type": "SqlSink",
                        "sqlWriterStoredProcedureName": "sp_xxx",
                        "storedProcedureParameters": {
                            "stringProductData": {
                                "value": "str1"
                            }
                        },
                        "writeBatchSize": 0,
                        "writeBatchTimeout": "00:00:00"
                    },
                    "translator": {
                        "type": "TabularTranslator",
                        "columnMappings": "col1:col1,col2:col2"
                    }
                },
                "inputs": [
                    {
                        "name": "InputDataset-3jg"
                    }
                ],
                "outputs": [
                    {
                        "name": "OutputDataset-3jg"
                    }
                ],
                "policy": {
                    "timeout": "1.00:00:00",
                    "concurrency": 1,
                    "executionPriorityOrder": "NewestFirst",
                    "style": "StartOfInterval",
                    "retry": 3,
                    "longRetry": 0,
                    "longRetryInterval": "00:00:00"
                },
                "scheduler": {
                    "frequency": "Hour",
                    "interval": 8
                },
                "name": "Activity-0-xxx_csv->[dbo]_[xxx_staging]"
            }
        ],
        "start": "2017-01-09T21:48:53.348Z",
        "end": "2099-12-30T18:30:00Z",
        "isPaused": false,
        "hubName": "hub",
        "pipelineMode": "Scheduled"
    }
}
and am using the below stored procedure:
create procedure [dbo].[sp_xxx] @xxx1 [dbo].[ut_xxx] READONLY, @str1 varchar(100)
AS
MERGE xxx_dummy AS a
USING @xxx1 AS b
    ON (a.col1 = b.col1)
WHEN NOT MATCHED THEN
    INSERT (col1, col2)
    VALUES (b.col1, b.col2)
WHEN MATCHED THEN
    UPDATE SET a.col2 = b.col2;
Please help me to resolve the issue.
I can reproduce your first error. Inserting to a SQL Server table with Azure Data Factory (ADF) appears to use a bulk insert method (similar to BULK INSERT, bcp, SSIS etc) and by default these methods do not fire triggers:
insert bulk [dbo].[testADF] ([col1] Int, [col2] Int, [col3] Int, [col4] Int)
with (TABLOCK, CHECK_CONSTRAINTS)
With bcp and BULK INSERT there is a FIRE_TRIGGERS option to make triggers fire, but there appears to be no way to change this setting for ADF. As a workaround, move the logic from your trigger into the stored proc, as sketched below.
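For instance, if the trigger stamped an audit column on newly inserted rows, that logic can run after the MERGE inside the same proc. A sketch based on the proc in the question (the audit column loaded_at is illustrative):
ALTER PROCEDURE [dbo].[sp_xxx] @xxx1 [dbo].[ut_xxx] READONLY, @str1 varchar(100)
AS
BEGIN
    MERGE xxx_dummy AS a
    USING @xxx1 AS b
        ON (a.col1 = b.col1)
    WHEN NOT MATCHED THEN
        INSERT (col1, col2) VALUES (b.col1, b.col2)
    WHEN MATCHED THEN
        UPDATE SET a.col2 = b.col2;

    -- former trigger logic: stamp an audit column for the rows just merged
    UPDATE d
    SET d.loaded_at = SYSUTCDATETIME()
    FROM xxx_dummy AS d
    INNER JOIN @xxx1 AS b
        ON d.col1 = b.col1;
END;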
If you believe this flag is important, consider creating a feedback item.
