I'm trying to create a generic CSV dataset with a parameterized filename and schema so that I can use it in ForEach loops over file lists. I'm having trouble publishing it, and I don't know whether I'm doing something wrong or the documentation is incorrect.
According to the documentation, the description of the schema property is:
Columns that define the physical type schema of the dataset. Type: array (or Expression with resultType array), itemType: DatasetSchemaDataElement.
I have a dataset with a parameter named Schema of type Array, and the dataset's "schema" property set to an expression that returns this parameter:
{
"name": "GenericCSVFile",
"properties": {
"linkedServiceName": {
"referenceName": "LinkedServiceReferenceName",
"type": "LinkedServiceReference"
},
"parameters": {
"Schema": {
"type": "array"
},
"TableName": {
"type": "string"
},
"TableSchema": {
"type": "string"
}
},
"folder": {
"name": "Folder"
},
"type": "DelimitedText",
"typeProperties": {
"location": {
"type": "AzureDataLakeStoreLocation",
"fileName": {
"value": "#concat(dataset().TableSchema,'.',dataset().TableName,'.csv')",
"type": "Expression"
},
"folderPath": "Path"
},
"columnDelimiter": ",",
"escapeChar": "\\",
"firstRowAsHeader": true,
"quoteChar": "\""
},
"schema": {
"value": "#dataset().Schema",
"type": "Expression"
}
},
"type": "Microsoft.DataFactory/factories/datasets"
}
However, when I publish, I get the following error:
Error code: BadRequest
Inner error code: InvalidPropertyValue
Message: Invalid value for property 'schema'
Am I doing something wrong? Are the docs wrong?
Yes, this is the expected behavior. If you need to set a dynamic value for column mapping, ignore the schema in the DelimitedText dataset: it is only a visual display of the physical schema information and has no effect on copy activity column mapping. Setting it to an expression is also not allowed. Instead, configure the copy activity's mapping as an expression and pass it a proper value when you trigger the run.
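For illustration, here is a minimal sketch of a Copy activity's typeProperties where the translator is supplied as dynamic content; the Object pipeline parameter name Mapping and the source/sink types are hypothetical placeholders:

"typeProperties": {
    "source": { "type": "DelimitedTextSource" },
    "sink": { "type": "AzureSqlSink" },
    "translator": {
        "value": "@pipeline().parameters.Mapping",
        "type": "Expression"
    }
}

At run time, Mapping would then be passed an object such as { "type": "TabularTranslator", "mappings": [ ... ] } that matches the file being processed in that ForEach iteration.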
I want to pass an array inside form data, but I am getting the whole array as a string in the NodeJS console, like this:
{
targetUniversity: "['613e3ecfefa725074cb17968', '613e3ecfefa725074cb17969']",
targetBusinessType: "['freelancer','sw dev']",
}
The swagger file looks something like this:
"/announce": {
"post": {
"tags": ["Announcement"],
"description": "Make an announcement",
"parameters": [
{
"name": "targetUniversity",
"in": "formData",
"type": "array",
"description": "University ID in array []- from DD"
},
{
"name": "targetBusinessType",
"in": "formData",
"type": "array",
"description": "Business type (string - name) in array []"
}
],
"produces": ["application/json"],
"responses": {
"201": {
"description": "announced successfully"
}
}
}
}
I just want the array itself, not the array in string format.
Array parameters need the items keyword to define the type of array items. Also, the operation must specify what MIME type(s) it consumes:
"consumes": [
"application/x-www-form-urlencoded"
],
"parameters": [
{
"name": "targetUniversity",
"in": "formData",
"type": "array",
"items": { // <------
"type": "string"
},
"description": "University ID in array []- from DD"
},
{
"name": "targetBusinessType",
"in": "formData",
"type": "array",
"items": { // <------
"type": "string"
},
"description": "Business type (string - name) in array []"
}
],
In Swagger UI, enter the array items one per line and without quotes.
You can specify arrays in OAS3 as follows:
"parameters": [
{
"name": "targetUniversity",
"in": "formData", -->> according to comments this should be replaced by requestBody in OAS3
"schema": {
"type": "array",
"items": {
"type": "string"
}
...
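For reference, a sketch of how the same two array fields could be expressed in a complete OAS3 operation, where formData parameters are replaced by a requestBody with form encoding (the property names come from the question; the surrounding structure is standard OpenAPI 3):

"requestBody": {
    "content": {
        "application/x-www-form-urlencoded": {
            "schema": {
                "type": "object",
                "properties": {
                    "targetUniversity": {
                        "type": "array",
                        "items": { "type": "string" }
                    },
                    "targetBusinessType": {
                        "type": "array",
                        "items": { "type": "string" }
                    }
                }
            }
        }
    }
}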
I am getting the following error during a pipeline run.
Operation on target ac_ApplyMapping failed: Column name or path 'StudentId'
duplicated in 'source' under 'mappings'. Please check it in 'mappings'.
In the Copy activity we are applying the following mappings:
{
"type": "TabularTranslator",
"mappings": [{
"source": {
"name": "StudentId",
"type": "string"
},
"sink": {
"name": "StudentId_Primary",
"type": "string"
}
}, {
"source": {
"name": "StudentId",
"type": "string"
},
"sink": {
"name": "StudentId_Secondary",
"type": "string"
}
}
]
}
Is there any way to handle this scenario?
You can use a Derived Column transformation to rename the column in the source, and then map it to your sink.
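As a rough sketch (a Mapping Data Flow script fragment; the stream names are hypothetical), the Derived Column transformation would emit two distinct columns from the single source column, which can then be mapped one-to-one in the sink:

source1 derive(StudentId_Primary = StudentId,
               StudentId_Secondary = StudentId) ~> RenameStudentId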
I am trying to load data from an on-premises SQL Server to SQL Server on a VM, and I need to do it every day, so I have created a trigger. The trigger is inserting the data properly, but now I also need to insert the trigger ID into a destination column on every run.
I don't know what mistake I am making. I found many blogs on this, but they all describe extracting the data from a blob, not from SQL Server.
I was trying to insert the value like this, but it gives the following error:
"Activity Copy Data1 failed: Please choose only one of the three property "name", "path" and "ordinal" to reference columns for "source" and "sink" under "mappings" property. "
Pipeline details are below. Please suggest.
{
"name": "pipeline1",
"properties": {
"activities": [
{
"name": "Copy Data1",
"type": "Copy",
"dependsOn": [],
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"source": {
"type": "AzureSqlSource"
},
"sink": {
"type": "SqlServerSink"
},
"enableStaging": false,
"translator": {
"type": "TabularTranslator",
"mappings": [
{
"source": {
"name": "Name",
"type": "String"
},
"sink": {
"name": "Name",
"type": "String"
}
},
{
"source": {
"type": "String",
"name": "#pipeline().parameters.triggerIDVal"
},
"sink": {
"name": "TriggerID",
"type": "String"
}
}
]
}
},
"inputs": [
{
"referenceName": "AzureSqlTable1",
"type": "DatasetReference"
}
],
"outputs": [
{
"referenceName": "SqlServerSQLDEV02",
"type": "DatasetReference"
}
]
}
],
"parameters": {
"triggerIDVal": {
"type": "string"
}
},
"annotations": []
}
}
I want the trigger ID to be populated into the destination column TriggerID each time the trigger is executed.
Firstly, please see the limitations of copy activity column mapping:
Source data store query result does not have a column name that is
specified in the input dataset "structure" section.
Sink data store (if with pre-defined schema) does not have a column
name that is specified in the output dataset "structure" section.
Either fewer columns or more columns in the "structure" of sink
dataset than specified in the mapping.
Duplicate mapping.
So I don't think you can copy the data plus a trigger ID that is not contained in the source columns. My idea is:
1. First use a Set Variable activity to get the trigger ID value.
2. Then connect it to the copy activity and pass the value as a parameter.
3. In the sink of the copy activity, invoke a stored procedure to combine the trigger ID with the other columns before each row is inserted into the table (a sketch follows below). For more details, please see this document.
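A minimal sketch of such a sink, assuming a hypothetical stored procedure usp_InsertWithTriggerId and table type StudentTableType on the destination database; the existing pipeline parameter triggerIDVal is forwarded as a stored procedure parameter:

"sink": {
    "type": "SqlServerSink",
    "sqlWriterStoredProcedureName": "usp_InsertWithTriggerId",
    "sqlWriterTableType": "StudentTableType",
    "storedProcedureTableTypeParameterName": "InputData",
    "storedProcedureParameters": {
        "TriggerID": {
            "value": "@pipeline().parameters.triggerIDVal",
            "type": "String"
        }
    }
}

Inside the stored procedure, an INSERT ... SELECT from the @InputData table-valued parameter can then append the @TriggerID value as the extra column.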
In my MongoDB I have several arrays, but when I load those arrays from the database they are, strangely, objects instead of arrays.
This strange behaviour started a couple of days ago; before that everything worked fine and I got arrays out of the database.
Does LoopBack have some strange flags which are set automatically and transform my arrays into objects, or something like that?
Currently I have the newest versions of all my packages and have already tried older versions, but nothing changes this behaviour.
At first there was also a problem with saving the arrays: sometimes they were saved as objects, but since I removed all null objects from the database, only arrays have been saved.
The problem occurs with the sections array. My model JSON is:
{
"name": "Track",
"plural": "Tracks",
"base": "PersistedModel",
"idInjection": true,
"options": {
"validateUpsert": true
},
"properties": {
"alreadySynced": {
"type": "boolean"
},
"approved": {
"type": "boolean"
},
"deletedByClient": {
"type": "boolean",
"default": false
},
"sections": {
"type": "object",
"required": true
},
"type": {
"type": "string"
},
"email": {
"type": "string",
"default": ""
},
"name": {
"type": "string",
"default": "Neuer Track"
},
"reason": {
"type": "string",
"default": ""
},
"date": {
"type": "date"
},
"duration": {
"type": "number",
"default": 0
},
"correctnessscore": {
"type": "number",
"default": 0
},
"evaluation": {
"type": "object"
}
},
"validations": [],
"relations": {},
"acls": [],
"methods": {}
}
I have also already tried to change the type from object to array, but without success.
Well, I am not seeing any array type in your model, and I am not sure what exactly your problem is.
Does LoopBack have some strange flags which are set automatically and
transform my arrays into objects, or something like that?
No, LoopBack has no such flags and doesn't transform any data type unless you tell it to!
So if you define a property as object and then pass an array without validating the data type, your data will be saved as an object instead of an array.
Let's define an array in your Track model:
"property": {
"type": "array"
}
Do you need an array of objects?
"property": {
"type": ["object"]
}
Strings?
"property": {
"type": ["string"]
}
Numbers?
"property": {
"type": ["number"]
}
Read more about LoopBack types here.
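Applied to your Track model, and assuming each section is itself an object, the sections property would become:

"sections": {
    "type": ["object"],
    "required": true
}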
I am trying to serialize generic records (expressed as JSON strings) as Avro objects using the Microsoft.Hadoop.Avro library.
I've been following the tutorial for Generic Records HERE. However, the records I am trying to serialize are more complex than the sample code provided by Microsoft (Location), with nested properties inside the JSON.
Here is a sample of a record I want to serialize in Avro:
{
"deviceId": "UnitTestDevice01",
"serializationFormat": "avro",
"messageType": "state",
"messageVersion": "avrov2.0",
"arrayProp": [
{
"itemProp1": "arrayValue1",
"itemProp2": "arrayValue2"
},
{
"itemProp1": "arrayValue3",
"itemProp2": "arrayValue4"
}
]
}
For info, here is the Avro schema I can extract:
{
"type": "record",
"namespace": "xxx.avro",
"name": "MachineModel",
"fields": [{
"name": "deviceId",
"type": ["string", "null"]
}, {
"name": "serializationFormat",
"type": ["string", "null"]
}, {
"name": "messageType",
"type": ["string", "null"]
}, {
"name": "messageVersion",
"type": ["string", "null"]
}, {
"name": "array",
"type": {
"type": "array",
"items": {
"type": "record",
"name": "array_record",
"fields": [{
"name": "arrayProp1",
"type": ["string", "null"]
}, {
"name": "arrayProp2",
"type": ["string", "null"]
}]
}
}
}]
}
I have managed to extract the correct schema for this object, but I can't get the code right to take the schema and create a correct Avro record.
Can someone provide some pointers on how I can use the AvroSerializer or AvroContainer classes to produce a valid Avro object from this JSON payload and this Avro schema? The samples from Microsoft are too simple to work for complex objects, and I have not been able to find relevant samples online either.
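A possible starting point, sketched from the generic-record pattern in the Microsoft tutorial and not tested against this exact schema: build an AvroRecord for the root record, build one AvroRecord per array item from the item record schema resolved off the array field, then serialize. Field names follow the schema above (which differs slightly from the JSON sample's arrayProp/itemProp names); the schema navigation via GetField/ItemSchema and the MemoryStream usage are my assumptions.

using System;
using System.IO;
using Microsoft.Hadoop.Avro;
using Microsoft.Hadoop.Avro.Schema;

class GenericAvroSketch
{
    static void Main()
    {
        // Writer schema from the question, trimmed to two fields for brevity;
        // in practice paste the full schema string shown above.
        const string schemaJson = @"{
            ""type"": ""record"", ""namespace"": ""xxx.avro"", ""name"": ""MachineModel"",
            ""fields"": [
                { ""name"": ""deviceId"", ""type"": [""string"", ""null""] },
                { ""name"": ""array"", ""type"": {
                    ""type"": ""array"",
                    ""items"": {
                        ""type"": ""record"", ""name"": ""array_record"",
                        ""fields"": [
                            { ""name"": ""arrayProp1"", ""type"": [""string"", ""null""] },
                            { ""name"": ""arrayProp2"", ""type"": [""string"", ""null""] }
                        ]
                    }
                } }
            ]
        }";

        // Schema-driven (generic) serializer, as in the Microsoft tutorial.
        var serializer = AvroSerializer.CreateGeneric(schemaJson);
        var rootSchema = (RecordSchema)serializer.WriterSchema;

        // Resolve the record schema of the array items so nested AvroRecords can be built.
        var arraySchema = (ArraySchema)rootSchema.GetField("array").TypeSchema;
        var itemSchema = (RecordSchema)arraySchema.ItemSchema;

        // One AvroRecord per element of the "array" field.
        dynamic item = new AvroRecord(itemSchema);
        item.arrayProp1 = "arrayValue1";
        item.arrayProp2 = "arrayValue2";

        // Root record with the scalar field and the nested array.
        dynamic root = new AvroRecord(rootSchema);
        root.deviceId = "UnitTestDevice01";
        root.array = new[] { (AvroRecord)item };

        using (var stream = new MemoryStream())
        {
            // Writes the Avro-encoded bytes of the record into the stream.
            serializer.Serialize(stream, root);
            Console.WriteLine($"Serialized {stream.Length} bytes");
        }
    }
}

If the records need to go into an Avro container file rather than a bare stream, the same AvroRecord instances could be written through AvroContainer.CreateGenericWriter plus a SequentialWriter<object>, as shown in the later samples of the same tutorial.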