I am getting the following error during a pipeline run.
Operation on target ac_ApplyMapping failed: Column name or path 'StudentId'
duplicated in 'source' under 'mappings'. Please check it in 'mappings'.
In the Copy activity we are applying the following mappings.
{
"type": "TabularTranslator",
"mappings": [{
"source": {
"name": "StudentId",
"type": "string"
},
"sink": {
"name": "StudentId_Primary",
"type": "string"
}
}, {
"source": {
"name": "StudentId",
"type": "string"
},
"sink": {
"name": "StudentId_Secondary",
"type": "string"
}
}
]
}
Is there any way to handle this scenario?
You can use a Derived Column transformation in a Mapping Data Flow to change the column name in the source, and then map it to your sink.
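For example, a minimal Mapping Data Flow script sketch (assuming a single string column StudentId in the source; the stream names here are illustrative) duplicates the column under two new names before the sink:
source(output(
        StudentId as string
    ),
    allowSchemaDrift: true,
    validateSchema: false) ~> StudentSource
StudentSource derive(StudentId_Primary = StudentId,
        StudentId_Secondary = StudentId) ~> RenameStudentId
RenameStudentId sink(allowSchemaDrift: true,
    validateSchema: false) ~> StudentSink
Each derived column can then be mapped to its own sink column (StudentId_Primary and StudentId_Secondary), so the same source name is never mapped twice.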
Related
I need to maintain a folder structure to store files in yyyy/MM/dd format. I am getting a date like "2021-12-01T00:00:00Z". I need to extract the year from the date and store it in one variable, extract the month into another variable, and the day as well, so that I can concat these variables in the Copy activity Sink section.
Yes, you can maintain the folder structure by using split().
First create a parameter (fileName) and add a value.
Create three Set Variable activities for year, month and date.
Inside the year, month and date activities, add these dynamic content values:
Year: @split(pipeline().parameters.fileName,'-')[0]
Month: @split(pipeline().parameters.fileName,'-')[1]
Date: @split(split(pipeline().parameters.fileName,'-')[2],'T')[0]
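For example, with fileName = "2021-12-01T00:00:00Z.csv", split by '-' returns ["2021", "12", "01T00:00:00Z.csv"], so year = "2021", month = "12", and the second split on 'T' gives date = "01".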
For more information, refer to this JSON representation of the pipeline:
{
"name": "pipeline1",
"properties": {
"activities": [
{
"name": "year",
"type": "SetVariable",
"dependsOn": [],
"userProperties": [],
"typeProperties": {
"variableName": "year",
"value": {
"value": "#split(pipeline().parameters.fileName,'-')[0]",
"type": "Expression"
}
}
},
{
"name": "month",
"type": "SetVariable",
"dependsOn": [
{
"activity": "year",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"variableName": "month",
"value": {
"value": "#split(pipeline().parameters.fileName,'-')[1]",
"type": "Expression"
}
}
},
{
"name": "date",
"type": "SetVariable",
"dependsOn": [
{
"activity": "month",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"variableName": "date",
"value": {
"value": "#split(split(pipeline().parameters.fileName,'-')[2],'T')[0]",
"type": "Expression"
}
}
}
],
"parameters": {
"fileName": {
"type": "string",
"defaultValue": "2021-12-01T00:00:00Z.csv"
}
},
"variables": {
"year": {
"type": "String"
},
"month": {
"type": "String"
},
"date": {
"type": "String"
}
},
"annotations": [],
"lastPublishTime": "2023-02-15T05:42:23Z"
},
"type": "Microsoft.DataFactory/factories/pipelines"
}
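To build the folder structure in the Copy activity sink, as the question asks, you can then concatenate the three variables. A minimal sketch, assuming the sink dataset exposes a folder path parameter (not shown in the original post), would pass this expression as that parameter's value:
@concat(variables('year'),'/',variables('month'),'/',variables('date'))
This produces the yyyy/MM/dd hierarchy in the destination.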
Use a Get Metadata activity to get the file list, and then a ForEach activity to copy each file to the appropriate folder:
Get Metadata activity settings:
On the ForEach activity, in Items, loop through all the child items:
@activity('Get files Metadata').output.childItems
Inside the ForEach loop, add a Copy activity to copy each file:
Source:
Source dataset (with a parameter on the file name, to copy one file only):
Sink settings:
Expression to pass to the folder parameter:
@concat(
formatDateTime(item().name,'yyyy'),
'/',
formatDateTime(item().name,'MM'),
'/',
formatDateTime(item().name,'dd')
)
Sink dataset, with a parameter on the folder name to create the hierarchy:
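A sketch of what such a parameterized sink dataset could look like (the dataset name, container and folder parameter are illustrative assumptions); the Copy activity sink then passes the concat expression above as the folder value:
{
    "name": "DestinationFiles",
    "properties": {
        "linkedServiceName": {
            "referenceName": "AzureBlobStorage1",
            "type": "LinkedServiceReference"
        },
        "parameters": {
            "folder": {
                "type": "string"
            }
        },
        "annotations": [],
        "type": "DelimitedText",
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "folderPath": {
                    "value": "@dataset().folder",
                    "type": "Expression"
                },
                "container": "output"
            }
        },
        "schema": []
    }
}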
I'm trying to get data from Azure Table Storage using Azure Data Factory. I have a table called orders which has 30 columns. I want to take only 3 columns from this table (PartitionKey, RowKey and DeliveryDate). The DeliveryDate column has mixed values such as "DateTime.Null" (a string value) and actual datetime values. When I try to preview the data I get the following error:
The DataSource looks like this:
{
"name": "Orders",
"properties": {
"linkedServiceName": {
"referenceName": "AzureTableStorage",
"type": "LinkedServiceReference"
},
"annotations": [],
"type": "AzureTable",
"structure": [
{
"name": "PartitionKey",
"type": "String"
},
{
"name": "RowKey",
"type": "String"
},
{
"name": "DeliveryDate",
"type": "String"
}
],
"typeProperties": {
"tableName": "Orders"
}
},
"type": "Microsoft.DataFactory/factories/datasets"}
I tested your scenario and it works. Can you share more details about it, or is there something different in my test?
Below is my test data:
The dataset code:
{
"name": "Order",
"properties": {
"linkedServiceName": {
"referenceName": "AzureTableStorage",
"type": "LinkedServiceReference"
},
"annotations": [],
"type": "AzureTable",
"structure": [
{
"name": "PartitionKey",
"type": "String"
},
{
"name": "RowKey",
"type": "String"
},
{
"name": "DeliveryDate",
"type": "String"
}
],
"typeProperties": {
"tableName": "Table7"
}
},
"type": "Microsoft.DataFactory/factories/datasets"
}
I am trying to load data from an on-premises SQL Server to a SQL Server on a VM. I need to do it every day, so I have created a trigger. The trigger is inserting the data properly. But now I need to insert the trigger ID into a destination column for every run.
I don't know what mistake I am making. I found many blogs on this, but they all cover extracting the data from a blob, not from SQL Server.
I was trying to insert the value like this, but it's giving an error:
"Activity Copy Data1 failed: Please choose only one of the three property "name", "path" and "ordinal" to reference columns for "source" and "sink" under "mappings" property. "
Pipeline details below. Please suggest.
{
"name": "pipeline1",
"properties": {
"activities": [
{
"name": "Copy Data1",
"type": "Copy",
"dependsOn": [],
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"source": {
"type": "AzureSqlSource"
},
"sink": {
"type": "SqlServerSink"
},
"enableStaging": false,
"translator": {
"type": "TabularTranslator",
"mappings": [
{
"source": {
"name": "Name",
"type": "String"
},
"sink": {
"name": "Name",
"type": "String"
}
},
{
"source": {
"type": "String",
"name": "#pipeline().parameters.triggerIDVal"
},
"sink": {
"name": "TriggerID",
"type": "String"
}
}
]
}
},
"inputs": [
{
"referenceName": "AzureSqlTable1",
"type": "DatasetReference"
}
],
"outputs": [
{
"referenceName": "SqlServerSQLDEV02",
"type": "DatasetReference"
}
]
}
],
"parameters": {
"triggerIDVal": {
"type": "string"
}
},
"annotations": []
}
}
I want the trigger ID to be populated into the destination column TriggerID each time the trigger is executed.
Firstly, please see the limitations of copy activity column mapping:
Source data store query result does not have a column name that is specified in the input dataset "structure" section.
Sink data store (if with pre-defined schema) does not have a column name that is specified in the output dataset "structure" section.
Either fewer columns or more columns in the "structure" of sink dataset than specified in the mapping.
Duplicate mapping.
So I don't think you can copy the data plus a trigger ID that is not contained in the source columns. My idea is:
1. First use a Set Variable activity to get the trigger ID value.
2. Then connect it to the Copy activity and pass the value as a parameter.
3. In the sink of the Copy activity, you can invoke a stored procedure to combine the trigger ID with the other columns before each row is inserted into the table (see the sketch below). For more details, please see this document.
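As a sketch of step 3 (the stored procedure name, table type and parameter names are illustrative assumptions, not from the original post), the Copy activity sink could look like this; the trigger ID itself is available through the pipeline().TriggerId system variable:
"sink": {
    "type": "SqlServerSink",
    "sqlWriterStoredProcedureName": "[dbo].[spInsertWithTriggerId]",
    "sqlWriterTableType": "OrdersType",
    "storedProcedureTableTypeParameterName": "Orders",
    "storedProcedureParameters": {
        "TriggerId": {
            "type": "String",
            "value": {
                "value": "@pipeline().TriggerId",
                "type": "Expression"
            }
        }
    }
}
The stored procedure receives each batch of source rows through the table type parameter and can insert them into the destination table together with the TriggerId value.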
I am looking at the link below.
https://azure.microsoft.com/en-us/updates/data-factory-supports-wildcard-file-filter-for-copy-activity/
We are supposed to have the ability to use wildcard characters in folder paths and file names. If we click on the 'Activity' and click 'Source', we see this view.
I would like to loop through months and days, so it should be something like this view.
Of course that doesn't actually work. I'm getting errors that read: ErrorCode: 'PathNotFound'. Message: 'The specified path does not exist.'. How can I get the tool to recursively iterate through all files in all folders, given a specific pattern of strings in a file path and file name? Thanks.
I would like to loop through months and days
In order to do this, you can pass two parameters to the activity from your pipeline so that the path can be built dynamically based on those parameters. ADF V2 allows you to pass parameters.
Let's start the process one by one:
1. Create a pipeline and pass two parameters in it for your month and day.
Note: These parameters can be passed from the output of other activities as well, if needed. Reference: Parameters in ADF
2. Create two datasets.
2.1 Sink dataset - Blob Storage here. Link it with your linked service and provide the container name (make sure it exists). Again, if needed, it can be passed as a parameter.
2.2 Source dataset - Blob Storage here again, or whatever fits your need. Link it with your linked service and provide the container name (make sure it exists). Again, if needed, it can be passed as a parameter.
Note:
1. The folder path decides where to copy the data. If the container does not exist, the activity will create it for you, and if the file already exists it will be overwritten by default.
2. Pass the parameters in the dataset if you want to build the output path dynamically. Here I have created two parameters on the dataset, named monthcopy and datacopy.
3. Create a Copy activity in the pipeline.
Wildcard Folder Path:
@{concat(formatDateTime(adddays(utcnow(),-1),'yyyy'),'/',string(pipeline().parameters.month),'/',string(pipeline().parameters.day),'/*')}
where:
The path will resolve to: <yyyy of yesterday's date>/<month passed>/<day passed>/* (the * matches any folder at that level).
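For example, if utcnow() falls on 2023-02-15 and the pipeline is run with month = "01" and day = "20", the wildcard folder path evaluates to 2023/01/20/*, and the activity copies every *.csv file one folder level below that path.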
Test Result:
JSON Template for the pipeline:
{
"name": "pipeline2",
"properties": {
"activities": [
{
"name": "Copy Data1",
"type": "Copy",
"dependsOn": [],
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"source": {
"type": "DelimitedTextSource",
"storeSettings": {
"type": "AzureBlobStorageReadSettings",
"recursive": true,
"wildcardFolderPath": {
"value": "#{concat(formatDateTime(adddays(utcnow(),-1),'yyyy'),'/',string(pipeline().parameters.month),'/',string(pipeline().parameters.day),'/*')}",
"type": "Expression"
},
"wildcardFileName": "*.csv",
"enablePartitionDiscovery": false
},
"formatSettings": {
"type": "DelimitedTextReadSettings"
}
},
"sink": {
"type": "DelimitedTextSink",
"storeSettings": {
"type": "AzureBlobStorageWriteSettings"
},
"formatSettings": {
"type": "DelimitedTextWriteSettings",
"quoteAllText": true,
"fileExtension": ".csv"
}
},
"enableStaging": false
},
"inputs": [
{
"referenceName": "DelimitedText1",
"type": "DatasetReference"
}
],
"outputs": [
{
"referenceName": "DelimitedText2",
"type": "DatasetReference",
"parameters": {
"monthcopy": {
"value": "#pipeline().parameters.month",
"type": "Expression"
},
"datacopy": {
"value": "#pipeline().parameters.day",
"type": "Expression"
}
}
}
]
}
],
"parameters": {
"month": {
"type": "string"
},
"day": {
"type": "string"
}
},
"annotations": []
}
}
JSON Template for the source dataset (DelimitedText1, the Copy activity input):
{
"name": "DelimitedText1",
"properties": {
"linkedServiceName": {
"referenceName": "AzureBlobStorage1",
"type": "LinkedServiceReference"
},
"annotations": [],
"type": "DelimitedText",
"typeProperties": {
"location": {
"type": "AzureBlobStorageLocation",
"container": "corpdata"
},
"columnDelimiter": ",",
"escapeChar": "\\",
"quoteChar": "\""
},
"schema": []
}
}
JSON Template for the sink dataset (DelimitedText2, the Copy activity output, with parameters to build the folder hierarchy):
{
"name": "DelimitedText2",
"properties": {
"linkedServiceName": {
"referenceName": "AzureBlobStorage1",
"type": "LinkedServiceReference"
},
"parameters": {
"monthcopy": {
"type": "string"
},
"datacopy": {
"type": "string"
}
},
"annotations": [],
"type": "DelimitedText",
"typeProperties": {
"location": {
"type": "AzureBlobStorageLocation",
"folderPath": {
"value": "#concat(formatDateTime(adddays(utcnow(),-1),'yyyy'),dataset().monthcopy,'/',dataset().datacopy)",
"type": "Expression"
},
"container": "copycorpdata"
},
"columnDelimiter": ",",
"escapeChar": "\\",
"quoteChar": "\""
},
"schema": []
}
}
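For example (illustrative values), debugging the pipeline on 2023-02-15 (UTC) with the parameters below makes the source wildcard path resolve to 2023/01/20/* with *.csv files in the corpdata container, and the sink dataset write under 2023/01/20 in the copycorpdata container:
{
    "month": "01",
    "day": "20"
}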
How do I get the subject property from the payload below?
I've got an HTTP-triggered Logic App:
I want to be able to grab the contents of the subject property.
The schema used in the trigger looks like this:
{
"type": "array",
"items": {
"type": "object",
"properties": {
"topic": {
"type": "string"
},
"subject": {
"type": "string"
},
"eventType": {
"type": "string"
},
"eventTime": {
"type": "string"
},
"id": {
"type": "string"
},
"data": {
"type": "object",
"properties": {
"api": {
"type": "string"
},
"clientRequestId": {
"type": "string"
},
"requestId": {
"type": "string"
},
"eTag": {
"type": "string"
},
"contentType": {
"type": "string"
},
"contentLength": {
"type": "integer"
},
"blobType": {
"type": "string"
},
"url": {
"type": "string"
},
"sequencer": {
"type": "string"
},
"storageDiagnostics": {
"type": "object",
"properties": {
"batchId": {
"type": "string"
}
}
}
}
},
"dataVersion": {
"type": "string"
},
"metadataVersion": {
"type": "string"
}
},
"required": [
"topic",
"subject",
"eventType",
"eventTime",
"id",
"data",
"dataVersion",
"metadataVersion"
]
}
}
How do I get the subject property from this payload?
Go to your Logic App designer in the Azure portal, where you can assign the JSON properties to variables in your flow.
Here is the link on how to do this.
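A minimal sketch of such a variable assignment in the workflow definition, assuming the request body has already been parsed by a Parse JSON action named Parse_JSON (the action and variable names here are illustrative):
"Initialize_subject": {
    "type": "InitializeVariable",
    "runAfter": {
        "Parse_JSON": [
            "Succeeded"
        ]
    },
    "inputs": {
        "variables": [
            {
                "name": "subject",
                "type": "string",
                "value": "@{body('Parse_JSON')[0]['subject']}"
            }
        ]
    }
}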
With the Request trigger, if you want to get the property, you need to parse the request body as JSON, because the triggerBody() value is of String type and doesn't support selecting a property. Set up the Parse JSON action on the request body.
Then, since your JSON stores the data as an array, that's another thing you will run into: when you select the property, you need to add the index, using the expression body('Parse_JSON')[0]['subject'].
I tested with a short JSON containing just the two properties subject and topic.
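For example, with an illustrative event payload like the one below posted to the trigger (the values are made up, and the other fields required by the schema are omitted for brevity), body('Parse_JSON')[0]['subject'] evaluates to "/blobServices/default/containers/input/blobs/file1.csv":
[
    {
        "topic": "/subscriptions/xxxx/resourceGroups/rg1/providers/Microsoft.Storage/storageAccounts/storage1",
        "subject": "/blobServices/default/containers/input/blobs/file1.csv"
    }
]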