Azure Data Factory can't handle an empty JSON array in blob

In an Azure Data Factory dataset, I am using the Copy activity to load a JSON blob into SQL DB. When the JSON blob is an empty array "[]", the copy activity fails with this error:
{
"errorCode": "2200",
"message": "Failure happened on 'Source' side. ErrorCode=UserErrorTypeInSchemaTableNotSupported,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Failed to get the type from schema table. This could be caused by missing Sql Server System CLR Types.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=System.InvalidCastException,Message=Unable to cast object of type 'System.DBNull' to type 'System.Type'.,Source=Microsoft.DataTransfer.ClientLibrary,'",
"failureType": "UserError",
"target": "BP_acctset_Blob2SQL",
"details": []
}

Use a Get Metadata activity to get the file size.
Use an If Condition activity to check whether the size is greater than 2 (an empty array "[]" is only 2 bytes). If true, execute the Copy activity.
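A minimal sketch of the If Condition expression, assuming the Get Metadata activity is named 'Get Metadata1' (a placeholder name) and has Size selected in its field list:
@greater(activity('Get Metadata1').output.size, 2)
Place the Copy activity inside the True branch so it only runs for non-empty files.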

Related

Azure Data Factory REST API returns an invalid JSON file with pagination

I'm building a pipeline that copies a response from an API into a file in my storage account. There is also an element of pagination; however, that works like a charm and I get all my data from all the pages.
My result is something like this:
{"data": {
"id": "Something",
"value": "Some other thing"
}}
The problem is that the copy activity just appends each response to the file, thereby making it invalid JSON, which is a big problem further down the line. The final output looks like:
{"data": {
"id": "22222",
"value": "Some other thing"
}}
{"data": {
"id": "33333",
"value": "Some other thing"
}}
I have tried everything I could think of and Google my way to, but nothing changes how the data is appended to the file, and I'm stuck with an invalid JSON file :(
As a backup plan, I'll just make a loop and create a JSON file for each page, but that seems a bit janky and really slow.
Does anyone have an idea or a solution for my problem?
When you copy data from a REST API to Blob storage, it writes the data in the 'set of objects' file pattern by default.
Example:
sample data
{ "time": "2015-04-29T07:12:20.9100000Z", "callingimsi": "466920403025604"}
sink data
{"time":"2015-04-29T07:12:20.9100000Z","callingimsi":"466920403025604"}
{"time":"2015-04-29T07:13:21.0220000Z","callingimsi":"466922202613463"}
{"time":"2015-04-29T07:13:21.4370000Z","callingimsi":"466923101048691"}
This is not valid JSON.
To work around this, set the File pattern in the copy activity's sink settings to 'Array of objects'; this writes a single array containing all the objects.
Output: the sink file then contains one valid JSON array of all the objects.
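For reference, a rough sketch of how the sink section of the copy activity's JSON could look once the file pattern is switched (the exact property names here are an assumption; compare with the JSON your pipeline generates):
"sink": {
  "type": "JsonSink",
  "storeSettings": { "type": "AzureBlobStorageWriteSettings" },
  "formatSettings": {
    "type": "JsonWriteSettings",
    "filePattern": "arrayOfObjects"
  }
}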

Azure Synapse Get Metadata

I am trying to get a list of all files in a folder with the Get Metadata activity, so that I can pass the list to a ForEach activity, which in turn executes a notebook.
I have a binary dataset, and the field list is set to Child items.
The pipeline fails every time with the error:
{
"errorCode": "2011",
"message": "Blob operation Failed. ContainerName: tmp, path: /tmp/folder/folder1/.",
"failureType": "UserError",
"target": "Get Metadata",
"details": []
}
The files are in 'folder/folder1'.
It's not my first time working with the Get Metadata activity, and so far it has always worked (in ADF), but this is my first time using it in Synapse. Are there differences? Do you have any ideas what this could be or how I can solve the problem?
Usage of the Get Metadata activity to retrieve metadata is the same in Azure Data Factory and Azure Synapse pipelines.
Create a binary dataset with a dataset parameter for the file name.
Connect the binary dataset to the Get Metadata activity.
Pass '*' as the filename parameter value.
Select Child items under Field list to get the list of files/subfolders in the folder.
The output gives the list of files from the folder.
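A minimal sketch of the relevant expressions, assuming the Get Metadata activity is named 'Get Metadata1' and the dataset parameter is called 'filename' (both names are placeholders):
File name in the dataset connection:  @dataset().filename
ForEach Items:  @activity('Get Metadata1').output.childItems
Current file name inside the ForEach:  @item().name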

Azure Data Factory Copy Activity on Failure | Expression not evaluated

I'm trying to run a copy activity in ADF and purposely trying to fail this activity to test my failure logging.
Here is what the pipeline looks like (please note that this copy activity sits inside a ForEach activity and, inside that, an If Condition activity).
I'm expecting the copy to fail, but not the "LOG FAILURE" stored procedure, since I want to log the copy activity details in a SQL DB table. Here is what the error says:
In the LOG FAILURE activity:
"errorCode": "InvalidTemplate",
"message": "The expression 'activity('INC_COPY_TO_ADL').output.rowsCopied' cannot be evaluated because property 'rowsCopied' doesn't exist, available properties are 'dataWritten, filesWritten, sourcePeakConnections, sinkPeakConnections, copyDuration, errors, effectiveIntegrationRuntime, usedDataIntegrationUnits, billingReference, usedParallelCopies, executionDetails, dataConsistencyVerification, durationInQueue'.",
"failureType": "UserError",
"target": "LOG_FAILURE"
In the Copy activity INC_COPY_TO_ADL (this is expected, since the SQL query is wrong):
"errorCode": "2200",
"message": "Failure happened on 'Source' side. ErrorCode=SqlOperationFailed,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=A database operation failed with the following error: 'Invalid object name 'dbo.CustCustomerV3Staging123'.',Source=,''Type=System.Data.SqlClient.SqlException,Message=Invalid object name 'dbo.CustCustomerV3Staging123'.,Source=.Net SqlClient Data Provider,SqlErrorNumber=208,Class=16,ErrorCode=-2146232060,State=1,Errors=[{Class=16,Number=208,State=1,Message=Invalid object name 'dbo.CustCustomerV3Staging123'.,},],'",
"failureType": "UserError",
"target": "INC_COPY_TO_ADL"
I wonder why the LOG FAILURE activity failed (i.e. why the expression was not evaluated)? Please note that when the copy activity is correct, the "LOG SUCCESS" stored procedure works fine.
Many thanks.
RA
@rizal activity('INC_COPY_TO_ADL').output.rowsCopied is not part of the copy activity's output when it fails. Try setting a default value, for example -1, for that parameter in LOG_FAILURE, and keep LOG_SUCCESS as it is.
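If you still want to capture rowsCopied when it is present, one possible expression for the LOG_FAILURE parameter (an alternative to hardcoding -1, reusing the activity name from the question and combining the null-safe ? operator with coalesce) is:
@coalesce(activity('INC_COPY_TO_ADL').output?['rowsCopied'], -1)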

Azure: How to write a path to get a file from a time-series-partitioned folder using Azure Logic Apps

I am trying to retrieve a CSV file from Azure Blob storage using Logic Apps.
I set the Azure Storage Explorer path in the parameters, and in the Get blob content action I am using that parameter.
In the Parameters I have set the value as:
concat('Directory1/','Year=',string(int(substring(utcNow(),0,4))),'/Month=',string(int(substring(utcnow(),5,2))),'/Day=',string(int(substring(utcnow(),8,2))),'/myfile.csv')
So during the run time this path should form as:
Directory1/Year=2019/Month=12/Day=30/myfile.csv
but during execution the action fails with the following error message:
{
"status": 400,
"message": "The specifed resource name contains invalid characters.\r\nclientRequestId: 1e2791be-8efd-413d-831e-7e2cd89278ba",
"error": {
"message": "The specifed resource name contains invalid characters."
},
"source": "azureblob-we.azconn-we-01.p.azurewebsites.net"
}
So my question is: how do I write the path to get data from the time-series-partitioned folder?
The answer from Joy Wang was partially correct.
Parameters in Logic Apps treat values as plain strings and will not evaluate functions such as concat().
The correct way to use the concat function is through expressions.
My solution to the problem is:
concat('container1/','Directory1/','Year=',string(int(substring(utcNow(),0,4))),'/Month=',string(int(substring(utcnow(),5,2))),'/Day=',string(int(substring(utcnow(),8,2))),'/myfile.csv')
You should not use that in the parameters. When you put the line concat('Directory1/','Year=',string(int(substring(utcNow(),0,4))),'/Month=',string(int(substring(utcnow(),5,2))),'/Day=',string(int(substring(utcnow(),8,2))),'/myfile.csv') in the parameters, its type is String, so the Logic App treats it as a literal string and the function will not take effect.
You also need to include the container name in the concat(). There is also no need to use string(int()), because utcNow() and substring() both return a String.
To fix the issue, use the line below directly in the Blob option (my container name is container1):
concat('container1/','Directory1/','Year=',substring(utcNow(),0,4),'/Month=',substring(utcnow(),5,2),'/Day=',substring(utcnow(),8,2),'/myfile.csv')
Update:
As mentioned in Stark's answer, if you want to drop the leading 0 from the left, you can convert it from String to int and then back to String:
concat('container1/','Directory1/','Year=',string(int(substring(utcNow(),0,4))),'/Month=',string(int(substring(utcnow(),5,2))),'/Day=',string(int(substring(utcnow(),8,2))),'/myfile.csv')
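For example, assuming a run date of 2019-12-05 (UTC), the plain substring version produces Directory1/Year=2019/Month=12/Day=05/myfile.csv, while the string(int(...)) version produces Directory1/Year=2019/Month=12/Day=5/myfile.csv, matching partition folders that drop the leading zero.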

Issue with Azure Blob Indexer

I have come across a scenario where I want to index all the files that are present in blob storage.
But if a file uploaded to the blob is password protected, the indexer fails, and it is then not able to index the remaining files.
[
{
"key": null,
"errorMessage": "Error processing blob 'url' with content type ''. Status:422, error: "
}
]
Is there a way to ignore the password-protected files, or a way to continue with the indexing process even if there is an error in some file?
See the 'Dealing with unsupported content types' section in 'Controlling which blobs are indexed'. Use the failOnUnsupportedContentType configuration setting:
PUT https://[service name].search.windows.net/indexers/[indexer name]?api-version=2016-09-01
Content-Type: application/json
api-key: [admin key]
{
... other parts of indexer definition
"parameters" : { "configuration" : { "failOnUnsupportedContentType" : false } }
}
Is there a way to ignore the password protected files or a way to continue with the indexing process even if there is an error in some file.
One possible way to do it is to define metadata on the blob with the name AzureSearch_Skip and set its value to true. In this case, Azure Search will ignore this blob and move on to the next blob in the list.
You can read more about this here: https://learn.microsoft.com/en-us/azure/search/search-howto-indexing-azure-blob-storage#controlling-which-parts-of-the-blob-are-indexed.
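For illustration, a minimal sketch of setting that metadata through the Set Blob Metadata REST operation (the account, container, blob name, and service version shown are placeholders/assumptions; the same metadata can also be set from Azure Storage Explorer or an SDK):
PUT https://[account name].blob.core.windows.net/[container name]/[blob name]?comp=metadata
x-ms-version: 2019-12-12
x-ms-meta-AzureSearch_Skip: true
Authorization: [shared key or SAS]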
