I am trying to implement a Get Metadata activity to return the column count of files I have in a single blob storage container.
The Get Metadata activity is returning this error:
[screenshot of the (404) Not Found error]
I'm fairly new to Azure Data Factory and cannot solve this. Here's what I have:
Dataset: Source dataset
Name - ten_eighty_split_CSV
Connection - Blob storage
Schema - imported from blob storage file
Parameters - "FileName"; string; "@pipeline().parameters.SourceFile"
Pipeline:
Name: ten eighty split
Parameters: "SourceFile"; string; "#pipeline().parameters.SourceFile"
Settings: Concurrency: 1
Get Metadata activity: Get Metadata
Only argument is "Column count"
It throws the error upon debugging. I am not sure what to do; (404) Not Found is so broad that I could not find a specific solution. Thanks!
The error occurs because you have given an incorrect file name, or the name of a file that does not exist.
Since you are trying to use a blob-created event trigger to find the column count, you can use the procedure below:
After configuring the Get Metadata activity, create a storage event trigger. Go to Add trigger -> Choose trigger -> Create new.
Click Continue and you will get a Trigger Run Parameters tab. In it, give the value @triggerBody().fileName.
Complete the trigger creation and publish the pipeline. Now, whenever a file is uploaded into your container (the one on which you created the storage event trigger), the pipeline is triggered automatically (no need to debug). If the container is empty and you try to debug by giving some value for the sourceFile parameter, you will get the same error.
Upload a sample file to your container. It will trigger the pipeline and give the desired result.
The following is the trigger JSON that I created for my container:
{
"name": "trigger1",
"properties": {
"annotations": [],
"runtimeState": "Started",
"pipelines": [
{
"pipelineReference": {
"referenceName": "pipeline1",
"type": "PipelineReference"
},
"parameters": {
"sourceFile": "#triggerBody().fileName"
}
}
],
"type": "BlobEventsTrigger",
"typeProperties": {
"blobPathBeginsWith": "/data/blobs/",
"blobPathEndsWith": ".csv",
"ignoreEmptyBlobs": true,
"scope": "/subscriptions/b83c1ed3-c5b6-44fb-b5ba-2b83a074c23f/resourceGroups/<user>/providers/Microsoft.Storage/storageAccounts/blb1402",
"events": [
"Microsoft.Storage.BlobCreated"
]
}
}
}
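For completeness, the Get Metadata activity then passes the pipeline parameter through to the dataset parameter. A rough sketch of its JSON (only the activity, dataset, and parameter names from the question are reused; the rest is an assumed minimal configuration):
{
  "name": "Get Metadata",
  "type": "GetMetadata",
  "typeProperties": {
    "dataset": {
      "referenceName": "ten_eighty_split_CSV",
      "type": "DatasetReference",
      "parameters": {
        "FileName": {
          "value": "@pipeline().parameters.SourceFile",
          "type": "Expression"
        }
      }
    },
    "fieldList": [
      "columnCount"
    ]
  }
}
Inside the dataset, the file name field should reference the parameter as @dataset().FileName, so that whatever file name the trigger supplies is the file the activity actually reads.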
So, I have a logic app that looks like below
[screenshot of the logic app]
The main idea of the app is to get the items of a SharePoint list and copy the contents into a CSV file in blob storage.
The site name and list name are passed through the HTTP request body.
However, I would like to also define the Select operation's column mapping dynamically.
The body looks like this:
{
"listName" : "The list name",
"siteAddress" : "SharepointSiteAddress",
"columns" : {
"Email": " #item()?['Employee']?['Email']",
"Region": " #item()?['Region']?['Value']"
}
}
In the 'Map' section of the 'Select' Operation I use the 'columns' property as shown below
[screenshot of the Select action's Map configuration]
However, in the output of the 'Select' operation, the Email and Region column values resolve to the literal strings that were passed in, instead of the actual item values I am trying to reference.
Can I somehow create the CSV table dynamically through the HTTP request while also being able to access the items' values?
Using expressions, you can create a CSV file with dynamic data. I reproduced the issue on my side, and below are the steps I followed.
I created the logic app as shown below.
In the HTTP trigger, I defined a sample payload as shown below:
{
"listName" : "The list name",
"siteAddress" : "SharepointSiteAddress",
"columns" : {
"Email": " Email",
"DisplayName": "DisplayName"
}
}
In the Select action, From is taken from the Get items value. In the Map row, the key is taken from the HTTP trigger and the value from the SharePoint item, as shown below.
Map:
Key - triggerBody()?['columns']?['Email']
Value - item()?['Editor']?['Email']
The output of the Get items action in my case is like below, hence I wrote the expressions accordingly:
"value": [
{
"#odata.etag": "\"1\"",
"ItemInternalId": "3",
"ID": 3,
"Modified": "2022-11-15T10:49:47Z",
"Editor": {
"#odata.type": "#Microsoft.Azure.Connectors.SharePoint.SPListExpandedUser",
"Claims": "i:0#.f|membership|xyzt.com",
"DisplayName": "Test",
"Email": "v#mail.com",
"JobTitle": ""
}
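For reference, this is roughly how the Select (and the follow-up Create CSV table) actions could look in the workflow code view; the action names and the @{...} key interpolation are assumptions based on my setup rather than an exact export:
"Select": {
  "type": "Select",
  "inputs": {
    "from": "@body('Get_items')?['value']",
    "select": {
      "@{triggerBody()?['columns']?['Email']}": "@item()?['Editor']?['Email']",
      "@{triggerBody()?['columns']?['DisplayName']}": "@item()?['Editor']?['DisplayName']"
    }
  },
  "runAfter": { "Get_items": [ "Succeeded" ] }
},
"Create_CSV_table": {
  "type": "Table",
  "inputs": {
    "format": "CSV",
    "from": "@body('Select')"
  },
  "runAfter": { "Select": [ "Succeeded" ] }
}
The important part is that the values on the right-hand side are real expressions over item(), not strings passed in through the HTTP body, which is why they resolve to the item values instead of literal text.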
I tested the logic app. It ran successfully and the CSV file was generated as below.
CSV file:
I have a daily export set up for several subscriptions - the files export like so,
with 7 different directories within daily. I'm simply trying to rename the files to get rid of the underscore for data flows.
My parent pipeline looks like so:
Get Metadata gets the folder names and ForEach invokes the child pipeline, like so.
Here are the screen grabs of the child pipeline.
Copy data within the ForEach1 -- the source:
and now the sink - this is where I want to rename the file. The first time I debugged, it simply copied the files to the correct place with a .txt extension; the next time it got the extension right, but it is not renaming the file.
I replaced @replace(item().name, '_', '-') with @replace(activity('FileInfo').output.itemName, '_','-') and got the following error:
The expression '@replace(activity('FileInfo').output.itemName, '_','-')' cannot be evaluated because property 'itemName' doesn't exist, available properties are 'childItems, effectiveIntegrationRuntime, executionDuration, durationInQueue, billingReference'.
So then I replaced that with
@replace(activity('FileInfo').output.childItems, '_', '-')
but that gives the following error
Cannot fit childItems return type into the function parameter string
I'm not sure where to go from here.
Edit 7/14
Making the change from the answer below:
Here is my linked service for the sink dataset with the parameter renamedFile.
Here is the sink on the Copy data1 for the child_Rename pipeline; it grayed out the file extension, as was mentioned.
Now here is the sink container after running the pipeline.
This is the directory structure of the source data; it's dynamically created from the scheduled daily Azure exports.
Here is the output of Get Metadata - FileInfo from the child pipeline:
{
"childItems": [
{
"name": "daily",
"type": "Folder"
}
],
"effectiveIntegrationRuntime": "integrationRuntime1 (Central US)",
"executionDuration": 0,
"durationInQueue": {
"integrationRuntimeQueue": 0
},
"billingReference": {
"activityType": "PipelineActivity",
"billableDuration": [
{
"meterType": "AzureIR",
"duration": 0.016666666666666666,
"unit": "Hours"
}
]
}
}
allsubs - source container
daily - directory created by the scheduled export
sub1 - subN - the different subs with scheduled exports
previous-month -> this-month - monthly folders are created automatically
this_fileXX.csv -- files are automatically generated with the underscore in the name - it is my understanding that data flows cannot handle these characters in the file name
allsubs/
└── daily/
    ├── sub1/
    |   ├── previous-month/
    |   |   ├── this_file.csv
    |   |   └── this_file1.csv
    |   ├── previous-month/
    |   |   ├── this_file11.csv
    |   |   └── this_file12.csv
    |   └── this-month/
    └── subN/
        ├── previous-month/
        ├── previous-month/
        └── this-month/
            └── this_fileXX.csv
Edit 2 - July 20
I think I'm getting closer, but there are still some small errors I do not see.
The pipeline now moves all the files from the container allsubs to the container renamed-files, but it is not renaming the files - it looks like so.
Get Metadata - from the dataset allContainers it retrieves the folders with Child Items.
Dataset allContainers shown (preview works, linked service works, no parameters in this dataset).
Next, the ForEach activity calls the output of Get Metadata
for the items: @activity('Get Metadata1').output.childItems
Next shown is the Copy data within the ForEach.
The source is the allContainers dataset with the wildcard file path selected, recursive selected, and - due to the following error - max concurrent connections set to 1, but this did not resolve the error.
Error message:
Failure happened on 'Sink' side.
ErrorCode=AzureStorageOperationFailedConcurrentWrite,
'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,
Message=Error occurred when trying to upload a file.
It's possible because you have multiple concurrent copy activities
runs writing to the same file 'renamed-files/rlcosts51122/20220601-20220630/rlcosts51122_082dd29b-95b2-4da5-802a-935d762e89d8.csv'.
Check your ADF configuration.
,Source=Microsoft.DataTransfer.ClientLibrary,
''Type=Microsoft.WindowsAzure.Storage.StorageException,
Message=The remote server returned an error: (400) Bad
Request.,Source=Microsoft.WindowsAzure.Storage,StorageExtendedMessage=The specified block list is invalid.
RequestId:b519219f-601e-000d-6c4c-9c9c5e000000
Time:2022-07-
20T15:23:51.4342693Z,,''Type=System.Net.WebException,
Message=The remote server returned an error: (400) Bad
Request.,Source=Microsoft.WindowsAzure.Storage,'
Copy data source:
Copy data sink - the dataset is dsRenamesink, simply another container in a different storage account. The linked service is set up correctly. It has the parameter renamedFile, but I suspect this is the source of my error; still testing that.
Sink dataset dsRenamesink:
Parameter page:
Here's the sink in the Copy data where the renamed file is passed the iterator from ForEach1, like so:
@replace(item().name,'_','renameworked')
so the underscore would be replaced with 'renameworked' - easy enough to test.
Debugging the pipeline:
The errors look to be consistent for the 7 failures, which were shown above as 'failure happened on the sink side'.
However, going into the storage account sink I can see that all of the files from the source were copied over to the sink, but the files were not renamed, like so:
Pipeline output:
Error messages:
{
"dataRead": 28901858,
"dataWritten": 10006989,
"filesRead": 4,
"filesWritten": 0,
"sourcePeakConnections": 1,
"sinkPeakConnections": 1,
"copyDuration": 7,
"throughput": 4032.067,
"errors": [
{
"Code": 24107,
"Message": "Failure happened on 'Sink' side. ErrorCode=AzureStorageOperationFailedConcurrentWrite,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Error occurred when trying to upload a file. It's possible because you have multiple concurrent copy activities runs writing to the same file 'renamed-files/rlcosts51122/20220601-20220630/rlcosts51122_082dd29b-95b2-4da5-802a-935d762e89d8.csv'. Check your ADF configuration.,Source=Microsoft.DataTransfer.ClientLibrary,''Type=Microsoft.WindowsAzure.Storage.StorageException,Message=The remote server returned an error: (400) Bad Request.,Source=Microsoft.WindowsAzure.Storage,StorageExtendedMessage=The specified block list is invalid.\nRequestId:b519219f-601e-000d-6c4c-9c9c5e000000\nTime:2022-07-20T15:23:51.4342693Z,,''Type=System.Net.WebException,Message=The remote server returned an error: (400) Bad Request.,Source=Microsoft.WindowsAzure.Storage,'",
"EventType": 0,
"Category": 5,
"Data": {
"FailureInitiator": "Sink"
},
"MsgId": null,
"ExceptionType": null,
"Source": null,
"StackTrace": null,
"InnerEventInfos": []
}
],
"effectiveIntegrationRuntime": "AutoResolveIntegrationRuntime (Central US)",
"usedDataIntegrationUnits": 4,
"billingReference": {
"activityType": "DataMovement",
"billableDuration": [
{
"meterType": "AzureIR",
"duration": 0.06666666666666667,
"unit": "DIUHours"
}
]
},
"usedParallelCopies": 1,
"executionDetails": [
{
"source": {
"type": "AzureBlobFS",
"region": "Central US"
},
"sink": {
"type": "AzureBlobStorage"
},
"status": "Failed",
"start": "Jul 20, 2022, 10:23:44 am",
"duration": 7,
"usedDataIntegrationUnits": 4,
"usedParallelCopies": 1,
"profile": {
"queue": {
"status": "Completed",
"duration": 3
},
"transfer": {
"status": "Completed",
"duration": 2,
"details": {
"listingSource": {
"type": "AzureBlobFS",
"workingDuration": 0
},
"readingFromSource": {
"type": "AzureBlobFS",
"workingDuration": 0
},
"writingToSink": {
"type": "AzureBlobStorage",
"workingDuration": 0
}
}
}
},
"detailedDurations": {
"queuingDuration": 3,
"transferDuration": 2
}
}
],
"dataConsistencyVerification": {
"VerificationResult": "NotVerified"
},
"durationInQueue": {
"integrationRuntimeQueue": 0
}
}
All I wanted to do was remove the underscore from the file name to work with data flows.... I'm not sure what else to try next.
Next attempt, July 20
It appears that now I have been able to copy and rename some of the files by
changing the sink dataset as follows:
@concat(replace(dataset().renamedFile,'_','-'),'',formatDateTime(utcnow(),'yyyyMMddHHmmss'),'.csv')
and removing this parameter from the sink in the copy activity.
Upon debugging this pipeline I get 1 file in the sink and it is named correctly, but there is still something wrong.
Third attempt, 7/20
Further updating to be closer to the original answer:
Sink dataset:
Copy data activity in the sink - the concat works.
Now after debugging I'm left with 1 file for each of the subs, so there is still something not quite correct.
I reproduced the same thing in my environment.
Go to the sink dataset and open it. First create a parameter and add dynamic content; I used this expression: @dataset().sinkfilename
In the copy activity sink, under the dataset properties, pass the filename value using the expression @replace(item().name,'_','-') to replace _ with -.
When you create a dataset parameter to pass the filename, the File extension property is automatically disabled.
When the pipeline runs, you can see that the file name has been renamed accordingly.
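A rough JSON sketch of such a parameterized sink dataset (the dataset and linked service names here are placeholders; the container matches the sink container mentioned in the question):
{
  "name": "SinkDataset",
  "properties": {
    "linkedServiceName": {
      "referenceName": "AzureBlobStorage1",
      "type": "LinkedServiceReference"
    },
    "parameters": {
      "sinkfilename": { "type": "string" }
    },
    "type": "DelimitedText",
    "typeProperties": {
      "location": {
        "type": "AzureBlobStorageLocation",
        "fileName": {
          "value": "@dataset().sinkfilename",
          "type": "Expression"
        },
        "container": "renamed-files"
      },
      "firstRowAsHeader": true
    }
  }
}
In the copy activity, the ForEach item then supplies the value, e.g. @replace(item().name,'_','-'), so each file is written under its renamed file name.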
Another Azure Data Factory question.
I'm trying to use a 'Copy Data' activity within a ForEach, setting the destination sink to an item of the foreach.
My setup is as follows:
Lookup activity to read a json file.
The format of the json file:
{
"OutputFolders":[
{
"Source": "aaa/bb1/Output",
"Destination": "Dest002/bin"
},
{
"Source": "aaa/bbb2/Output",
"Destination": "Dest002/bin"
},
{
"Source": "aaa/bb3/Output",
"Destination": "Dest002/bin"
}
]
}
ForEach activity with items set to @activity('Read json config').output.value[0].OutputFolders
Within the ForEach activity, a 'Copy Data' activity.
This Sink has the following Sink dataset:
When I run this pipeline however I get the following error message:
{
"errorCode": "2200",
"message": "Failure happened on 'Sink' side. ErrorCode=SftpPermissionDenied,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Permission denied to access '/#item().Destination'.,Source=Microsoft.DataTransfer.ClientLibrary.SftpConnector,''Type=Renci.SshNet.Common.SftpPermissionDeniedException,Message=Permission denied,Source=Renci.SshNet,'",
"failureType": "UserError",
"target": "Copy output files",
"details": []
}
So Message=Permission denied to access '/@item().Destination' seems to indicate that the destination folder is not resolved. Since this folder does not exist, I get an SftpPermissionDenied.
I used the same method to copy files to a file share and there it seemed to work.
Does somebody have an idea how to make this destination resolve correctly?
What you would usually do in this type of situation is create a parameter in the dataset, which you then reference in the file path you are trying to construct.
This way, you can pass your '@item().Destination' to this parameter in your Copy activity, as it will appear on the dataset in the pipeline.
There is also an example here: https://www.mssqltips.com/sqlservertip/6187/azure-data-factory-foreach-activity-example/
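For example (a sketch, not the exact setup; the dataset name 'SftpSinkDataset' and parameter name 'DestinationFolder' are made up here), the sink dataset would declare the parameter and use it in its folder path:
"parameters": {
  "DestinationFolder": { "type": "string" }
},
"typeProperties": {
  "location": {
    "type": "SftpLocation",
    "folderPath": {
      "value": "@dataset().DestinationFolder",
      "type": "Expression"
    }
  }
}
and the Copy activity's sink would then pass the ForEach item into it:
"outputs": [
  {
    "referenceName": "SftpSinkDataset",
    "type": "DatasetReference",
    "parameters": {
      "DestinationFolder": {
        "value": "@item().Destination",
        "type": "Expression"
      }
    }
  }
]
Because the expression is evaluated when the parameter is bound, the folder resolves to e.g. 'Dest002/bin' instead of the literal text '@item().Destination'.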
Ok, I tried some more and apparently if I use a concat function it works.
So @concat(item().Destination)
I do get a warning that 'item' is not a recognized function, but it does the trick.
Not very straightforward, and I wonder why the initial approach doesn't work.
I'm using the Get Metadata activity in my pipelines to get all the folders, child items, and item types. But this activity gives its output in JSON format, which I'm unable to store in a variable so that I can iterate through the values. I need to store the folders' metadata in a SQL table.
A sample output of the Get Metadata activity is shown below.
{
"itemName": "ParentFolder",
"itemType": "Folder",
"childItems": [
{
"name": "ChildFolder1",
"type": "Folder"
},
{
"name": "ChildFolder2",
"type": "Folder"
},
{
"name": "ChildFolder3",
"type": "Folder"
}
],
"effectiveIntegrationRuntime": "DefaultIntegrationRuntime (North Europe)",
"executionDuration": 187
}
Can someone help me store the above JSON output of the Get Metadata activity in a SQL table like the one below?
The easiest way to do this is to pass the Get Metadata output as a string to a stored procedure and parse it in your SQL database using OPENJSON.
This is how to convert the output to a string:
@string(activity('Get Metadata').output)
Now you just pass that to a stored proc and then use OPENJSON to parse it.
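For example, a sketch of how the Stored Procedure activity could be wired up in the pipeline JSON (the activity, linked service, stored procedure, and parameter names are made up for illustration):
{
  "name": "Save folder metadata",
  "type": "SqlServerStoredProcedure",
  "linkedServiceName": {
    "referenceName": "AzureSqlDatabase1",
    "type": "LinkedServiceReference"
  },
  "typeProperties": {
    "storedProcedureName": "[dbo].[usp_SaveFolderMetadata]",
    "storedProcedureParameters": {
      "MetadataJson": {
        "value": {
          "value": "@string(activity('Get Metadata').output)",
          "type": "Expression"
        },
        "type": "String"
      }
    }
  }
}
Inside the procedure, OPENJSON over the childItems path (for example OPENJSON(@MetadataJson, '$.childItems') with name and type columns) shreds the array into rows that you can insert into your table.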
I have seen many others do this using an ADF ForEach; however, if you have thousands of files/folders you will end up paying a lot for that method over time (each loop iteration counts as an activity run).