Get Metadata, ForEach and Copy activity in Azure Data Factory

I want to copy 2 tables (CSV files) from Blob storage to an Azure SQL Database.
I created a pipeline like this:
Get Metadata: captures the files (2 CSV files) in the input container.
ForEach: iterates over the files in the input container.
Copy activity: inside the ForEach, copies each file to the SQL database.
When I started debugging, the Copy activity failed with error 2200, saying that the source blob does not exist (UserErrorSourceBlobNotExist).
Here is the output of the Copy activity:
{
"copyDuration": 3,
"errors": [
{
"Code": 9013,
"Message": "ErrorCode=UserErrorSourceBlobNotExist,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=The required Blob is missing. ContainerName: https://employeestorage1.blob.core.windows.net/employeeinput, path: employeeinput/workdetail.csv.,Source=Microsoft.DataTransfer.ClientLibrary,'",
"EventType": 0,
"Category": 5,
"Data": {},
"MsgId": null,
"ExceptionType": null,
"Source": null,
"StackTrace": null,
"InnerEventInfos": []
}
],
"effectiveIntegrationRuntime": "DefaultIntegrationRuntime (East US)",
"usedDataIntegrationUnits": 4,
"billingReference": {
"activityType": "DataMovement",
"billableDuration": [
{
"meterType": "AzureIR",
"duration": 0.06666666666666667,
"unit": "DIUHours"
}
]
},
"usedParallelCopies": 1,
"executionDetails": [
{
"source": {
"type": "AzureBlobStorage",
"region": "East US"
},
"sink": {
"type": "AzureSqlDatabase",
"region": "East US"
},
"status": "Failed",
"start": "2021-06-24T17:28:09.4507134Z",
"duration": 3,
"usedDataIntegrationUnits": 4,
"usedParallelCopies": 1,
"profile": {
"queue": {
"status": "Completed",
"duration": 2
},
"transfer": {
"status": "Completed",
"duration": 0
}
},
"detailedDurations": {
"queuingDuration": 2,
"transferDuration": 0
}
}
],
"dataConsistencyVerification": {
"VerificationResult": "Unsupported"
},
"durationInQueue": {
"integrationRuntimeQueue": 0
}
}
And here is the code for the pipeline:-
{
"name": "pipeline1",
"properties": {
"activities": [
{
"name": "inputfolder",
"type": "GetMetadata",
"dependsOn": [],
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"dataset": {
"referenceName": "employeeinputdataset",
"type": "DatasetReference"
},
"fieldList": [
"childItems"
],
"storeSettings": {
"type": "AzureBlobStorageReadSettings",
"enablePartitionDiscovery": false
},
"formatSettings": {
"type": "DelimitedTextReadSettings"
}
}
},
{
"name": "for each table in input folder",
"type": "ForEach",
"dependsOn": [
{
"activity": "inputfolder",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"items": {
"value": "#activity('inputfolder').output.Childitems",
"type": "Expression"
},
"activities": [
{
"name": "Copy data1",
"type": "Copy",
"dependsOn": [],
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"source": {
"type": "DelimitedTextSource",
"storeSettings": {
"type": "AzureBlobStorageReadSettings",
"recursive": true,
"wildcardFolderPath": "employeeinput",
"wildcardFileName": {
"value": "#item().name",
"type": "Expression"
},
"enablePartitionDiscovery": false
},
"formatSettings": {
"type": "DelimitedTextReadSettings"
}
},
"sink": {
"type": "AzureSqlSink",
"tableOption": "autoCreate",
"disableMetricsCollection": false
},
"enableStaging": false,
"translator": {
"type": "TabularTranslator",
"typeConversion": true,
"typeConversionSettings": {
"allowDataTruncation": true,
"treatBooleanAsNumber": false
}
}
},
"inputs": [
{
"referenceName": "employeeinputdataset",
"type": "DatasetReference"
}
],
"outputs": [
{
"referenceName": "employeeoutputsql",
"type": "DatasetReference",
"parameters": {
"OutputTableName": {
"value": "#item().name",
"type": "Expression"
}
}
}
]
}
]
}
}
],
"annotations": []
}
}

Don't choose Wildcard file path as the File path type in the Copy activity source; choose File path in dataset instead.
You also need to create a parameter in your source dataset. In the File path of the source dataset, enter the expression @dataset().fileName as the file name. Finally, pass @item().name to that dataset parameter from the Copy activity.
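As a rough illustration of the parameterized source dataset described above, the following Python sketch builds the dataset definition as a dictionary and prints it as JSON, together with the dataset reference the Copy activity would use. The linked service name AzureBlobStorage1, the delimiter and the header setting are assumptions, not taken from the question; the container name comes from the error message.

```python
import json


def build_source_dataset(container: str, linked_service: str) -> dict:
    """Return a DelimitedText dataset definition with a fileName parameter."""
    return {
        "name": "employeeinputdataset",
        "properties": {
            # Assumed linked service name; use the one defined in your factory.
            "linkedServiceName": {
                "referenceName": linked_service,
                "type": "LinkedServiceReference",
            },
            # The dataset parameter that receives the file name per iteration.
            "parameters": {"fileName": {"type": "string"}},
            "type": "DelimitedText",
            "typeProperties": {
                "location": {
                    "type": "AzureBlobStorageLocation",
                    "container": container,
                    "fileName": {
                        "value": "@dataset().fileName",
                        "type": "Expression",
                    },
                },
                # Assumed format settings for a plain CSV file with a header.
                "columnDelimiter": ",",
                "firstRowAsHeader": True,
            },
        },
    }


dataset = build_source_dataset("employeeinput", "AzureBlobStorage1")

# Dataset reference for the Copy activity "inputs": passes the current file name.
copy_input = {
    "referenceName": dataset["name"],
    "type": "DatasetReference",
    "parameters": {
        "fileName": {"value": "@item().name", "type": "Expression"}
    },
}

print(json.dumps(dataset, indent=2))
print(json.dumps(copy_input, indent=2))
```

With this shape, the wildcardFolderPath and wildcardFileName settings are no longer needed in the Copy activity source, because the dataset itself resolves to one file per iteration.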


Azure ForEach activity failing with: The function 'length' expects its parameter to be an array or a string. The provided value is of type 'Integer'

How do I convert my value to an integer?
Here's added context if helpful:
My pipeline should get the column count of a blob CSV and pass that count to a ForEach activity. A Switch activity is embedded in the ForEach, but the pipeline is failing at the ForEach with this error: "The function 'length' expects its parameter to be an array or a string. The provided value is of type 'Integer'."
Metadata output:
{
"columnCount": 52,
"effectiveIntegrationRuntime": "AutoResolveIntegrationRuntime (Central US)",
"executionDuration": 1,
"durationInQueue": {
"integrationRuntimeQueue": 0
},
"billingReference": {
"activityType": "PipelineActivity",
"billableDuration": [
{
"meterType": "AzureIR",
"duration": 0.016666666666666666,
"unit": "Hours"
}
]
}
}
ForEach input:
{
"items": "#activity('Get Metadata1').output.columnCount",
"activities": [
{
"name": "Switch1",
"type": "Switch",
"dependsOn": [],
"userProperties": [],
"typeProperties": {
"on": "#item()",
"cases": [
{
"value": "44",
"activities": [
{
"name": "Copy data1_copy1",
"type": "Copy",
"dependsOn": [],
"policy": {
"timeout": "0.12:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"source": {
"type": "DelimitedTextSource",
"storeSettings": {
"type": "AzureBlobStorageReadSettings",
"recursive": false,
"enablePartitionDiscovery": false
},
"formatSettings": {
"type": "DelimitedTextReadSettings"
}
},
"sink": {
"type": "AzureSqlSink",
"writeBehavior": "insert",
"sqlWriterUseTableLock": false
},
"enableStaging": false,
"translator": {
"type": "TabularTranslator",
"typeConversion": true,
"typeConversionSettings": {
"allowDataTruncation": true,
"treatBooleanAsNumber": false
}
}
},
"inputs": [
{
"referenceName": "ten_eighty_split_CSV",
"type": "DatasetReference",
"parameters": {
"FileName": "#pipeline().parameters.SourceFile"
}
}
],
"outputs": [
{
"referenceName": "ten_eighty_split_10_15_SQL",
"type": "DatasetReference",
"parameters": {}
}
]
}
]
},
{
"value": "52",
"activities": [
{
"name": "Copy data2_copy1",
"type": "Copy",
"dependsOn": [],
"policy": {
"timeout": "0.12:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"source": {
"type": "DelimitedTextSource",
"storeSettings": {
"type": "AzureBlobStorageReadSettings",
"recursive": false,
"enablePartitionDiscovery": false
},
"formatSettings": {
"type": "DelimitedTextReadSettings"
}
},
"sink": {
"type": "AzureSqlSink",
"writeBehavior": "insert",
"sqlWriterUseTableLock": false
},
"enableStaging": false,
"translator": {
"type": "TabularTranslator",
"typeConversion": true,
"typeConversionSettings": {
"allowDataTruncation": true,
"treatBooleanAsNumber": false
}
}
},
"inputs": [
{
"referenceName": "ten_eighty_split_CSV",
"type": "DatasetReference",
"parameters": {
"FileName": "#pipeline().parameters.SourceFile"
}
}
],
"outputs": [
{
"referenceName": "ten_eighty_split_15_20_SQL",
"type": "DatasetReference",
"parameters": {}
}
]
}
]
},
{
"value": "60",
"activities": [
{
"name": "Copy data3_copy1",
"type": "Copy",
"dependsOn": [],
"policy": {
"timeout": "0.12:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"source": {
"type": "DelimitedTextSource",
"storeSettings": {
"type": "AzureBlobStorageReadSettings",
"recursive": false,
"enablePartitionDiscovery": false
},
"formatSettings": {
"type": "DelimitedTextReadSettings"
}
},
"sink": {
"type": "AzureSqlSink",
"writeBehavior": "insert",
"sqlWriterUseTableLock": false
},
"enableStaging": false,
"translator": {
"type": "TabularTranslator",
"typeConversion": true,
"typeConversionSettings": {
"allowDataTruncation": true,
"treatBooleanAsNumber": false
}
}
},
"inputs": [
{
"referenceName": "ten_eighty_split_CSV",
"type": "DatasetReference",
"parameters": {
"FileName": "#pipeline().parameters.SourceFile"
}
}
],
"outputs": [
{
"referenceName": "ten_eighty_split_25_30_SQL",
"type": "DatasetReference",
"parameters": {}
}
]
}
]
},
{
"value": "68",
"activities": [
{
"name": "Copy data4_copy1",
"type": "Copy",
"dependsOn": [],
"policy": {
"timeout": "0.12:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"source": {
"type": "DelimitedTextSource",
"storeSettings": {
"type": "AzureBlobStorageReadSettings",
"recursive": false,
"enablePartitionDiscovery": false
},
"formatSettings": {
"type": "DelimitedTextReadSettings"
}
},
"sink": {
"type": "AzureSqlSink",
"writeBehavior": "insert",
"sqlWriterUseTableLock": false
},
"enableStaging": false,
"translator": {
"type": "TabularTranslator",
"typeConversion": true,
"typeConversionSettings": {
"allowDataTruncation": true,
"treatBooleanAsNumber": false
}
}
},
"inputs": [
{
"referenceName": "ten_eighty_split_CSV",
"type": "DatasetReference",
"parameters": {
"FileName": "#pipeline().parameters.SourceFile"
}
}
],
"outputs": [
{
"referenceName": "ten_eighty_split_30_35_SQL",
"type": "DatasetReference",
"parameters": {}
}
]
}
]
}
]
}
}
]
}
ForEach output:
{}
Not sure how to resolve this error. Thanks!
Failure type: User configuration issue
Details: The function 'length' expects its parameter to be an array or a string. The provided value is of type 'Integer'.
You are getting this error because you pass an integer value (columnCount) as the input of the ForEach activity. ForEach iterates over an array, so it is the right choice only when you have an array of values and want to run the inner activities once per value. In this case, you can use the Switch activity directly after the Get Metadata activity. In the Switch activity, give the expression within braces, @{...}, so that it is evaluated with string interpolation and returns a string.
Expression:
@{activity('Get Metadata1').output.columnCount}
I tried this in my environment and got the same error when I gave the expression without the braces {..}. When the {} were added, it worked. Below are the steps.
Add a Get Metadata activity and select Column count as its argument.
Output of MetaData activity:
{
"columnCount": 2,
"effectiveIntegrationRuntime": "AutoResolveIntegrationRuntime (West US)",
"executionDuration": 1,
"durationInQueue": {
"integrationRuntimeQueue": 0
},
"billingReference": {
"activityType": "PipelineActivity",
"billableDuration": [
{
"meterType": "AzureIR",
"duration": 0.016666666666666666,
"unit": "Hours"
}
]
}
}
Then add a Switch activity and give it the expression and the cases.
Expression: @{activity('Get Metadata1').output.columnCount}
When the pipeline was debugged, it executed successfully without error.
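To make the type issue above concrete, here is a small Python model of the two behaviours. It is only a sketch of the idea, not ADF's implementation: the function names are invented for illustration, and the assumption is that ForEach needs an array while the Switch expression needs a string that is compared with the case values.

```python
def interpolate(value) -> str:
    """Model of string interpolation @{...}: the result is always a string."""
    return str(value)


def run_for_each(items) -> int:
    """Model of ForEach: it needs something with a length (an array)."""
    if not isinstance(items, (list, str)):
        raise TypeError(
            "The function 'length' expects its parameter to be an array or a "
            f"string. The provided value is of type '{type(items).__name__}'."
        )
    return len(items)


def run_switch(on, cases: dict, default: str = "default") -> str:
    """Model of Switch: 'on' must be a string matched against case values."""
    if not isinstance(on, str):
        raise TypeError(f"Switch 'on' must be a string, got {type(on).__name__}")
    return cases.get(on, default)


# Case values and activity names taken from the pipeline in the question.
cases = {
    "44": "Copy data1_copy1",
    "52": "Copy data2_copy1",
    "60": "Copy data3_copy1",
    "68": "Copy data4_copy1",
}

column_count = 52  # integer, as returned by Get Metadata

try:
    run_for_each(column_count)
except TypeError as error:
    print("ForEach fails:", error)

selected = run_switch(interpolate(column_count), cases)
print("Switch selects:", selected)
```

The point of the sketch is that the integer never needs to be iterated at all; it only needs to be turned into a string so that it can be compared with the case values "44", "52", "60" and "68".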

How to get modified date as column in table while ingesting all files from year/month/date directories of storage account?

I have some JSON files in an ADLS account. The files are ingested into a multi-level Year/Month/Day directory structure. I want to copy all the files from ADLS to Azure SQL DB using an Azure data flow.
I am able to ingest the data using a data flow, but I want to include the file path, the file ingestion date and the file name in three separate columns, and I do not know how to get these values.
Please note that each Day directory has more than one file, as follows:
container_name/Dataset/Year/Month/Day/file1.json, file2.json, file3.json
Could anyone help me with how to ingest the modified date as a column in the table along with the data of each file?
I tried using Get Metadata to copy each file one by one, and also a derived column in the data flow, to get the modified date.
I reproduced the above and was able to get the desired file by using a combination of the additional columns option in the Copy activity, a Lookup activity and a Get Metadata activity.
These are the datasets, with dataset parameters, that I used in the various activities: Source_files_wild_path, temporary_filepaths, Each_file, intermediate and target_folder.
AFAIK, in ADF we can get the last modified date of files either through REST APIs or with Get Metadata. But Get Metadata won't work with dynamic file paths in a folder structure like yours.
Also, we can get the file path of a blob file only from triggers or from the additional columns option of the Copy activity. As no triggers are used here, I used the second method.
So, first I used a Copy activity with a wildcard path to all source files, added $$FILEPATH as an additional column, and copied the result to a temporary file, temp1.csv, with Merge files as the copy behavior.
Then I used a Lookup activity on temp1.csv to get the file as an array of objects, from which I can get the list of file paths.
Here I created two variables of Array type.
Because the Lookup output is an array of objects, use a ForEach loop to append each @item().filepath to the path_list array; this gives an array that contains only the file paths.
Then use the expression below to get the unique list of all file paths in the unique_path_list array.
@union(variables('path_list'),variables('path_list'))
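The de-duplication step can be pictured with a short Python model. Since $$FILEPATH is added to every row, path_list holds one entry per data row, so the same path repeats once for each row of a file. The sketch below assumes that union returns every distinct item once; keeping first-seen order is a choice made here for readability, and the sample paths are hypothetical.

```python
def union(*collections):
    """Model of union(): all items from the inputs, without duplicates."""
    seen = set()
    result = []
    for collection in collections:
        for item in collection:
            if item not in seen:
                seen.add(item)
                result.append(item)
    return result


# Hypothetical paths: one entry per data row, so paths repeat per file.
path_list = [
    "2023/01/26/file1.csv",
    "2023/01/26/file1.csv",
    "2023/01/27/file2.csv",
    "2023/01/27/file2.csv",
    "2023/01/27/file2.csv",
]

# Taking the union of the list with itself leaves each path exactly once.
unique_path_list = union(path_list, path_list)
print(unique_path_list)
```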
Now, use this array in a ForEach. Inside the ForEach, use a Get Metadata activity with the Each_file dataset, pass @item() as the filename parameter, and add Item name and Last modified to the field list.
Then use a Copy activity inside the ForEach with the same dataset. Here, add additional columns for the file name, file path and last modified date, and give them those values.
In the sink of this Copy activity, use another temporary staging folder (dataset intermediate). Give each output file a unique name using a date function.
After the ForEach, use another Copy activity with the intermediate dataset as the source (use the wildcard path *.csv and give any empty string to the dataset parameter) and the target_folder dataset as the sink, with Merge files as the copy behavior, to get the result file.
My pipeline JSON:
{
"name": "last_modifed_pipeline_copy1",
"properties": {
"activities": [
{
"name": "for_paths_columns",
"type": "Copy",
"dependsOn": [],
"policy": {
"timeout": "0.12:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"source": {
"type": "DelimitedTextSource",
"additionalColumns": [
{
"name": "filepath",
"value": "$$FILEPATH"
}
],
"storeSettings": {
"type": "AzureBlobFSReadSettings",
"recursive": true,
"wildcardFolderPath": "*/*/*",
"wildcardFileName": "*.csv",
"enablePartitionDiscovery": false
},
"formatSettings": {
"type": "DelimitedTextReadSettings"
}
},
"sink": {
"type": "DelimitedTextSink",
"storeSettings": {
"type": "AzureBlobFSWriteSettings",
"copyBehavior": "MergeFiles"
},
"formatSettings": {
"type": "DelimitedTextWriteSettings",
"quoteAllText": true,
"fileExtension": ".txt"
}
},
"enableStaging": false,
"translator": {
"type": "TabularTranslator",
"typeConversion": true,
"typeConversionSettings": {
"allowDataTruncation": true,
"treatBooleanAsNumber": false
}
}
},
"inputs": [
{
"referenceName": "Source_files_wild_card_path",
"type": "DatasetReference"
}
],
"outputs": [
{
"referenceName": "temporary_filepaths",
"type": "DatasetReference"
}
]
},
{
"name": "Lookup1",
"type": "Lookup",
"dependsOn": [
{
"activity": "for_paths_columns",
"dependencyConditions": [
"Succeeded"
]
}
],
"policy": {
"timeout": "0.12:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"source": {
"type": "DelimitedTextSource",
"storeSettings": {
"type": "AzureBlobFSReadSettings",
"recursive": true,
"enablePartitionDiscovery": false
},
"formatSettings": {
"type": "DelimitedTextReadSettings"
}
},
"dataset": {
"referenceName": "temporary_filepaths",
"type": "DatasetReference"
},
"firstRowOnly": false
}
},
{
"name": "append filepaths array",
"type": "ForEach",
"dependsOn": [
{
"activity": "Lookup1",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"items": {
"value": "#activity('Lookup1').output.value",
"type": "Expression"
},
"isSequential": true,
"activities": [
{
"name": "Append variable1",
"type": "AppendVariable",
"dependsOn": [],
"userProperties": [],
"typeProperties": {
"variableName": "path_list",
"value": {
"value": "#item().filepath",
"type": "Expression"
}
}
}
]
}
},
{
"name": "get_unique_paths array",
"type": "SetVariable",
"dependsOn": [
{
"activity": "append filepaths array",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"variableName": "unique_path_list",
"value": {
"value": "#union(variables('path_list'),variables('path_list'))",
"type": "Expression"
}
}
},
{
"name": "adds_last modifed column",
"type": "ForEach",
"dependsOn": [
{
"activity": "get_unique_paths array",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"items": {
"value": "#variables('unique_path_list')",
"type": "Expression"
},
"isSequential": true,
"activities": [
{
"name": "Get Metadata1",
"type": "GetMetadata",
"dependsOn": [],
"policy": {
"timeout": "0.12:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"dataset": {
"referenceName": "Each_file",
"type": "DatasetReference",
"parameters": {
"filename": {
"value": "#item()",
"type": "Expression"
}
}
},
"fieldList": [
"itemName",
"lastModified"
],
"storeSettings": {
"type": "AzureBlobFSReadSettings",
"enablePartitionDiscovery": false
},
"formatSettings": {
"type": "DelimitedTextReadSettings"
}
}
},
{
"name": "Copy data2",
"type": "Copy",
"dependsOn": [
{
"activity": "Get Metadata1",
"dependencyConditions": [
"Succeeded"
]
}
],
"policy": {
"timeout": "0.12:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"source": {
"type": "DelimitedTextSource",
"additionalColumns": [
{
"name": "file_path",
"value": "$$FILEPATH"
},
{
"name": "file_name",
"value": {
"value": "#activity('Get Metadata1').output.itemName",
"type": "Expression"
}
},
{
"name": "last_modifed",
"value": {
"value": "#activity('Get Metadata1').output.lastModified",
"type": "Expression"
}
}
],
"storeSettings": {
"type": "AzureBlobFSReadSettings",
"recursive": true,
"enablePartitionDiscovery": false
},
"formatSettings": {
"type": "DelimitedTextReadSettings"
}
},
"sink": {
"type": "DelimitedTextSink",
"storeSettings": {
"type": "AzureBlobFSWriteSettings"
},
"formatSettings": {
"type": "DelimitedTextWriteSettings",
"quoteAllText": true,
"fileExtension": ".txt"
}
},
"enableStaging": false,
"translator": {
"type": "TabularTranslator",
"typeConversion": true,
"typeConversionSettings": {
"allowDataTruncation": true,
"treatBooleanAsNumber": false
}
}
},
"inputs": [
{
"referenceName": "Each_file",
"type": "DatasetReference",
"parameters": {
"filename": {
"value": "#item()",
"type": "Expression"
}
}
}
],
"outputs": [
{
"referenceName": "intermediate",
"type": "DatasetReference",
"parameters": {
"file_name": {
"value": "#concat(utcNow(),'.csv')",
"type": "Expression"
}
}
}
]
}
]
}
},
{
"name": "Copy data3",
"type": "Copy",
"dependsOn": [
{
"activity": "adds_last modifed column",
"dependencyConditions": [
"Succeeded"
]
}
],
"policy": {
"timeout": "0.12:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"source": {
"type": "DelimitedTextSource",
"storeSettings": {
"type": "AzureBlobFSReadSettings",
"recursive": true,
"wildcardFileName": "*.csv",
"enablePartitionDiscovery": false
},
"formatSettings": {
"type": "DelimitedTextReadSettings"
}
},
"sink": {
"type": "DelimitedTextSink",
"storeSettings": {
"type": "AzureBlobFSWriteSettings",
"copyBehavior": "MergeFiles"
},
"formatSettings": {
"type": "DelimitedTextWriteSettings",
"quoteAllText": true,
"fileExtension": ".txt"
}
},
"enableStaging": false,
"translator": {
"type": "TabularTranslator",
"typeConversion": true,
"typeConversionSettings": {
"allowDataTruncation": true,
"treatBooleanAsNumber": false
}
}
},
"inputs": [
{
"referenceName": "intermediate",
"type": "DatasetReference",
"parameters": {
"file_name": "No value"
}
}
],
"outputs": [
{
"referenceName": "target_folder",
"type": "DatasetReference"
}
]
}
],
"variables": {
"path_list": {
"type": "Array"
},
"unique_path_list": {
"type": "Array"
}
},
"annotations": [],
"lastPublishTime": "2023-01-27T12:40:51Z"
},
"type": "Microsoft.DataFactory/factories/pipelines"
}
NOTE:
If you want to run this on a regular basis, use a Storage event trigger, which gives you trigger parameters such as @triggerBody().folderPath and @triggerBody().fileName. You can pass these to Get Metadata to get the last modified time, and then pass that to a Copy activity or data flow to add it as an additional column, as per your requirement.

How to use Lookup activity and If activity together

I have a file emp.csv in a storage account with the contents below. I want file a and file b from the storage account to go to a database table.
emp_id,filename
1,a
2,b
3,anubhav
For this, I pass the emp.csv file to a Lookup activity as the source dataset and then use a ForEach activity.
Inside the ForEach activity, I use an If Condition with this expression:
@equals(item().filename,'anubhav' )
If this expression is true, a Wait activity runs and waits for 1 second. If this expression is false, a Copy activity runs.
But this pipeline is failing.
The pipeline is failing because the dataset property inside the Copy activity should be a string, but you have given it the array value @activity('Lookup1').output.value. That is why you are getting an error.
Replace the array value with the string @item().filename. I reproduced the same thing in my environment this way and the pipeline ran as expected.
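The control flow can be sketched in Python as follows. This is only a model of the Lookup, ForEach and If Condition combination, using the rows of emp.csv from the question; the function name plan_actions is made up for the illustration.

```python
# Rows as a Lookup with firstRowOnly=false would return them for emp.csv
# (values read from delimited text come back as strings).
lookup_output_value = [
    {"emp_id": "1", "filename": "a"},
    {"emp_id": "2", "filename": "b"},
    {"emp_id": "3", "filename": "anubhav"},
]


def plan_actions(rows, skip_name="anubhav"):
    """Return the action each ForEach iteration would take."""
    actions = []
    for item in rows:  # ForEach over @activity('Lookup1').output.value
        if item["filename"] == skip_name:  # @equals(item().filename, 'anubhav')
            actions.append(("wait", 1))  # ifTrueActivities: Wait for 1 second
        else:
            # ifFalseActivities: Copy, with the dataset parameter set to the
            # single string item().filename, not to the whole Lookup array.
            actions.append(("copy", item["filename"]))
    return actions


for action in plan_actions(lookup_output_value):
    print(action)
```

Each Copy iteration receives one file name string, which is what a dataset parameter of type string expects.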
You can use this pipeline JSON:
{
"name": "pipeline1",
"properties": {
"activities": [
{
"name": "Lookup1",
"type": "Lookup",
"dependsOn": [],
"policy": {
"timeout": "0.12:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"source": {
"type": "DelimitedTextSource",
"storeSettings": {
"type": "AzureBlobStorageReadSettings",
"recursive": true,
"enablePartitionDiscovery": false
},
"formatSettings": {
"type": "DelimitedTextReadSettings"
}
},
"dataset": {
"referenceName": "DelimitedText1",
"type": "DatasetReference"
},
"firstRowOnly": false
}
},
{
"name": "ForEach1",
"type": "ForEach",
"dependsOn": [
{
"activity": "Lookup1",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"items": {
"value": "#activity('Lookup1').output.value",
"type": "Expression"
},
"isSequential": true,
"activities": [
{
"name": "If Condition1",
"type": "IfCondition",
"dependsOn": [],
"userProperties": [],
"typeProperties": {
"expression": {
"value": "#equals(item().filename, 'anubhav.csv')",
"type": "Expression"
},
"ifFalseActivities": [
{
"name": "Copy data1",
"type": "Copy",
"dependsOn": [],
"policy": {
"timeout": "0.12:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"source": {
"type": "DelimitedTextSource",
"storeSettings": {
"type": "AzureBlobFSReadSettings",
"recursive": true,
"enablePartitionDiscovery": false
},
"formatSettings": {
"type": "DelimitedTextReadSettings"
}
},
"sink": {
"type": "DelimitedTextSink",
"storeSettings": {
"type": "AzureBlobFSWriteSettings"
},
"formatSettings": {
"type": "DelimitedTextWriteSettings",
"quoteAllText": true,
"fileExtension": ".txt"
}
},
"enableStaging": false,
"translator": {
"type": "TabularTranslator",
"typeConversion": true,
"typeConversionSettings": {
"allowDataTruncation": true,
"treatBooleanAsNumber": false
}
}
},
"inputs": [
{
"referenceName": "abcsv",
"type": "DatasetReference",
"parameters": {
"file": {
"value": "#item().filename",
"type": "Expression"
}
}
}
],
"outputs": [
{
"referenceName": "DelimitedText2",
"type": "DatasetReference",
"parameters": {
"file": {
"value": "#item().filename",
"type": "Expression"
}
}
}
]
}
],
"ifTrueActivities": [
{
"name": "Wait1",
"type": "Wait",
"dependsOn": [],
"userProperties": [],
"typeProperties": {
"waitTimeInSeconds": 1
}
}
]
}
}
]
}
}
],
"annotations": []
}
}
The pipeline executed successfully.

Copy Files from a folder to multiple folders based on the file name in Azure Data Factory

I have a parent folder in ADLS Gen2 called Source which has a number of subfolders, and these subfolders contain the actual data files, as shown in the example below...
Source:
Folder Name: 20221212
A_20221212.txt B_20221212.txt C_20221212.txt
Folder Name: 20221219
A_20221219.txt B_20221219.txt C_20221219.txt
Folder Name: 20221226
A_20221226.txt B_20221226.txt C_20221226.txt
How can I copy the files from the subfolders to name-specific folders (creating a new folder if it does not exist) using Azure Data Factory? Please see the example below...
Target:
Folder Name: A
A_20221212.txt A_20221219.txt A_20221226.txt
Folder Name: B
B_20221212.txt B_20221219.txt B_20221226.txt
Folder Name: C
C_20221212.txt C_20221219.txt C_20221226.txt
I'd really appreciate your help.
I reproduced the above and got the results below.
You can follow the procedure below using the Get Metadata activity if your folder directories are all at the same level.
This is my source folder structure.
data
    20221212
        A_20221212.txt
        B_20221212.txt
        C_20221212.txt
    20221219
        A_20221219.txt
        B_20221219.txt
        C_20221219.txt
    20221226
        A_20221226.txt
        B_20221226.txt
        C_20221226.txt
Give the source dataset to the Get Metadata activity and select Child items in the field list.
Then give the childItems array from the Get Metadata activity to a ForEach activity. Inside the ForEach, I used a Set variable activity to store the folder name:
@split(item().name,'_')[0]
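As a quick check of what this expression yields, the same split can be mirrored in Python on the file names from the question. The helper name target_folder is made up for the illustration, and the grouping at the end only shows which files would land in which folder.

```python
def target_folder(file_name: str) -> str:
    """Mirror of split(item().name, '_')[0]: the text before the first underscore."""
    return file_name.split("_")[0]


# File names from the source folders in the question.
files = [
    "A_20221212.txt", "B_20221212.txt", "C_20221212.txt",
    "A_20221219.txt", "B_20221219.txt", "C_20221219.txt",
    "A_20221226.txt", "B_20221226.txt", "C_20221226.txt",
]

# Group the files by the folder name the expression would produce.
grouped = {}
for name in files:
    grouped.setdefault(target_folder(name), []).append(name)

for folder, names in grouped.items():
    print(folder, names)
```

A file name without an underscore would be returned unchanged by this split, so such files would each get a folder named after the whole file name.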
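Now, add a Copy activity and use a wildcard path in its source: * as the wildcard folder path and @item().name as the wildcard file name.
For the sink, create the dataset parameters folder_name and file_name, and pass @variables('folder_name') and @item().name to them in the Copy activity sink.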
My pipeline JSON:
{
"name": "pipeline1",
"properties": {
"activities": [
{
"name": "Get Metadata1",
"type": "GetMetadata",
"dependsOn": [],
"policy": {
"timeout": "0.12:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"dataset": {
"referenceName": "sourcetxt",
"type": "DatasetReference"
},
"fieldList": [
"childItems"
],
"storeSettings": {
"type": "AzureBlobFSReadSettings",
"enablePartitionDiscovery": false
},
"formatSettings": {
"type": "DelimitedTextReadSettings"
}
}
},
{
"name": "ForEach1",
"type": "ForEach",
"dependsOn": [
{
"activity": "Get Metadata1",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"items": {
"value": "#activity('Get Metadata1').output.childItems",
"type": "Expression"
},
"isSequential": true,
"activities": [
{
"name": "Copy data1",
"type": "Copy",
"dependsOn": [
{
"activity": "Set variable1",
"dependencyConditions": [
"Succeeded"
]
}
],
"policy": {
"timeout": "0.12:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"source": {
"type": "DelimitedTextSource",
"storeSettings": {
"type": "AzureBlobFSReadSettings",
"recursive": true,
"wildcardFolderPath": "*",
"wildcardFileName": {
"value": "#item().name",
"type": "Expression"
},
"enablePartitionDiscovery": false
},
"formatSettings": {
"type": "DelimitedTextReadSettings"
}
},
"sink": {
"type": "DelimitedTextSink",
"storeSettings": {
"type": "AzureBlobFSWriteSettings"
},
"formatSettings": {
"type": "DelimitedTextWriteSettings",
"quoteAllText": true,
"fileExtension": ".txt"
}
},
"enableStaging": false,
"translator": {
"type": "TabularTranslator",
"typeConversion": true,
"typeConversionSettings": {
"allowDataTruncation": true,
"treatBooleanAsNumber": false
}
}
},
"inputs": [
{
"referenceName": "sourcetxt",
"type": "DatasetReference"
}
],
"outputs": [
{
"referenceName": "targettxts",
"type": "DatasetReference",
"parameters": {
"folder_name": {
"value": "#variables('folder_name')",
"type": "Expression"
},
"file_name": {
"value": "#item().name",
"type": "Expression"
}
}
}
]
},
{
"name": "Set variable1",
"type": "SetVariable",
"dependsOn": [],
"userProperties": [],
"typeProperties": {
"variableName": "folder_name",
"value": {
"value": "#split(item().name,'_')[0]",
"type": "Expression"
}
}
}
]
}
}
],
"variables": {
"folder_name": {
"type": "String"
}
},
"annotations": []
}
}

Azure Data Factory - Export data to sub container/blob

Hi, I have an ADF pipeline that exports Azure SQL data as CSV files to Blob storage.
How can I direct the files to a 'sub' container as the destination?
I have a blob container named 'SQLdata', and I want the files to be created in a sub-container/folder called Customers:
SQLdata/Customers
SQLdata/Customers/Cust1.csv
SQLdata/Customers/Cust2.csv
I have tried:
"destination": {
"fileName": "Customers//Cust1.csv"
What is wrong with the following?
"activities": [
{
"name": "Export",
"type": "Copy",
"dependsOn": [],
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [
{
"name": "Source",
"value": "dbo.#{item().source.table}"
},
{
"name": "Destination",
"value": "#{item().destination.fileName}"
}
],
"parameters": {
"cw_items": {
"type": "Array",
"defaultValue": [
{
"source": {
"table": "Cust1"
},
"destination": {
"fileName": "Cust1.csv"
}
},
{
"source": {
"table": "Cust2"
},
"destination": {
"fileName": "Cust2.csv"
}
},
I tried the same export and it works well; all the CSV files are stored in containerleon/csv.
JSON code reference:
{
"name": "CopyPipeline_fls",
"properties": {
"activities": [
{
"name": "ForEach_fls",
"type": "ForEach",
"dependsOn": [],
"userProperties": [],
"typeProperties": {
"items": {
"value": "#pipeline().parameters.cw_items",
"type": "Expression"
},
"activities": [
{
"name": "Copy_fls",
"type": "Copy",
"dependsOn": [],
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [
{
"name": "Source",
"value": "dbo.#{item().source.table}"
},
{
"name": "Destination",
"value": "containerleon/csv/#{item().destination.fileName}"
}
],
"typeProperties": {
"source": {
"type": "AzureSqlSource"
},
"sink": {
"type": "DelimitedTextSink",
"storeSettings": {
"type": "AzureBlobStorageWriteSettings"
},
"formatSettings": {
"type": "DelimitedTextWriteSettings",
"quoteAllText": true,
"fileExtension": ".txt"
}
},
"enableStaging": false
},
"inputs": [
{
"referenceName": "SourceDataset_fls",
"type": "DatasetReference",
"parameters": {
"cw_table": "#item().source.table"
}
}
],
"outputs": [
{
"referenceName": "DestinationDataset_fls",
"type": "DatasetReference",
"parameters": {
"cw_fileName": "#item().destination.fileName"
}
}
]
}
]
}
}
],
"parameters": {
"cw_items": {
"type": "Array",
"defaultValue": [
{
"source": {
"table": "test"
},
"destination": {
"fileName": "dbotest.csv"
}
},
{
"source": {
"table": "test3"
},
"destination": {
"fileName": "dbotest3.csv"
}
}
]
}
},
"annotations": []
},
"type": "Microsoft.DataFactory/factories/pipelines"
}
Hope this helps.
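The pipeline JSON above does not show DestinationDataset_fls itself, which is where the target folder is normally set. As a hedged sketch of one plausible definition, the Python snippet below builds such a sink dataset with a fixed folder path and a parameterized file name. The linked service name AzureBlobStorage1, the format settings and the lowercase container name sqldata are assumptions (Azure container names must be lowercase); the folder Customers and the parameter cw_fileName come from the thread.

```python
import json


def build_sink_dataset(container: str, folder: str, linked_service: str) -> dict:
    """Return a DelimitedText sink dataset that writes into container/folder."""
    return {
        "name": "DestinationDataset_fls",
        "properties": {
            # Assumed linked service name; use the one defined in your factory.
            "linkedServiceName": {
                "referenceName": linked_service,
                "type": "LinkedServiceReference",
            },
            "parameters": {"cw_fileName": {"type": "String"}},
            "type": "DelimitedText",
            "typeProperties": {
                "location": {
                    "type": "AzureBlobStorageLocation",
                    "container": container,
                    # The "sub-container" is a folder path inside the container.
                    "folderPath": folder,
                    "fileName": {
                        "value": "@dataset().cw_fileName",
                        "type": "Expression",
                    },
                },
                # Assumed format settings for CSV output with a header row.
                "columnDelimiter": ",",
                "firstRowAsHeader": True,
            },
        },
    }


sink = build_sink_dataset("sqldata", "Customers", "AzureBlobStorage1")
print(json.dumps(sink, indent=2))
```

With a dataset like this, the fileName values in cw_items can stay as plain names such as Cust1.csv, and the files are written to sqldata/Customers/Cust1.csv and so on.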
