Azure ADF sliceIdentifierColumnName is not populating correctly - azure

I've set up a ADF pipeline using a sliceIdentifierColumnName which has worked well as it populated the field with a GUID as expected. Recently however this field stopped being populated, the refresh would work but the sliceIdentifierColumnName field would have a value of null, or occasionally the load would fail as it attempted to populate this field with a value of 1 which causes the slice load to fail.
This change occurred at a point in time, before it worked perfectly, after it repeatedly failed to populate the field correctly. I'm sure no changes were made to the Pipeline which caused this to suddenly fail. Any pointers where I should be looking?
Here an extract of the pipeline source, I'm reading from a table in Amazon Redshift and writing to an Azure SQL table.
"activities": [
{
"type": "Copy",
"typeProperties": {
"source": {
"type": "RelationalSource",
"query": "$$Text.Format('select * from mytable where eventtime >= \\'{0:yyyy-MM-ddTHH:mm:ssZ}\\' and eventtime < \\'{1:yyyy-MM-ddTHH:mm:ssZ}\\' ' , SliceStart, SliceEnd)"
},
"sink": {
"type": "SqlSink",
"sliceIdentifierColumnName": "ColumnForADFuseOnly",
"writeBatchSize": 0,
"writeBatchTimeout": "00:00:00"
}
},
"inputs": [
{
"name": "AmazonRedshiftSomeName"
}
],
"outputs": [
{
"name": "AzureSQLDatasetSomeName"
}
],
"policy": {
"timeout": "1.00:00:00",
"concurrency": 10,
"style": "StartOfInterval",
"longRetry": 0,
"longRetryInterval": "00:00:00"
},
"scheduler": {
"frequency": "Hour",
"interval": 2
},
"name": "Activity-somename2Hour"
}
],
Also, here is the error output text
Copy activity encountered a user error at Sink:.database.windows.net side: ErrorCode=UserErrorInvalidDataValue,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Column 'ColumnForADFuseOnly' contains an invalid value '1'.,Source=Microsoft.DataTransfer.Common,''Type=System.ArgumentException,Message=Type of value has a mismatch with column typeCouldn't store <1> in ColumnForADFuseOnly Column.
Expected type is Byte[].,Source=System.Data,''Type=System.ArgumentException,Message=Type of value has a mismatch with column type,Source=System.Data,'.
Here is part of the source dataset, it's a table with all datatypes as Strings.
{
"name": "AmazonRedshiftsomename_2hourly",
"properties": {
"structure": [
{
"name": "eventid",
"type": "String"
},
{
"name": "visitorid",
"type": "String"
},
{
"name": "eventtime",
"type": "Datetime"
}
}
Finally, the target table is identical to the source table, mapping each column name to its counterpart in Azure, with the exception of the additional column in Azure named
[ColumnForADFuseOnly] binary NULL,
It is this column which is now either being populated with NULLs or 1.
thanks,

You need to define [ColumnForADFuseOnly] as binary(32), binary with no length modifier is defaulting to a length of 1 and thus truncating your sliceIdentifier...
When n is not specified in a data definition or variable declaration statement, the default length is 1. When n is not specified with the CAST function, the default length is 30. See here

Related

Unable to fetch the entire column index based on the value using JSONPath finder in npm

I have the below response payload and I just want to check the amount == 1000 if it's matching then I just want to get the entire column as output.
Sample Input:
{
"sqlQuery": "select SET_UNIQUE, amt as AMOUNT from transactionTable where SET_USER_ID=11651 ",
"message": "2 rows selected",
"row": [
{
"column": [
{
"value": "22621264",
"name": "SET_UNIQUE"
},
{
"value": "1000",
"name": "AMOUNT"
}
]
},
{
"column": [
{
"value": "226064213",
"name": "SET_UNIQUE"
},
{
"value": "916",
"name": "AMOUNT"
}
]
}
]
}
Expected Output:
"column": [
{
"value": "22621264",
"name": "SET_UNIQUE"
},
{
"value": "1000",
"name": "AMOUNT"
}
]
The above sample I just want to fetch the entire column if the AMOUNT value will be 1000.
I just tried below to achieve this but no luck.
1. row[*].column[?(#.value==1000)].column
2. row[*].column[?(#.value==1000)]
I don't want to do this by using index. Because It will be change.
Any ideas please?
I think you'd need nested expressions, which isn't something that's widely supported. Something like
$.row[?(#.column[?(#.value==1000)])]
The inner expression returns matches for value==1000, then the outer expression checks for existence of those matches.
Another alternative that might work is
$.row[?(#.column[*].value==1000)]
but this assumes some implicit type conversions that may or may not be supported.

Can I index an array in a composite index in Azure Cosmos DB?

I have a problem indexing an array in Azure Cosmos DB
I am trying to save this indexing policy via the portal
{
"indexingMode": "consistent",
"automatic": true,
"includedPaths": [
{
"path": "/*"
}
],
"excludedPaths": [
{
"path": "/\"_etag\"/?"
}
],
"compositeIndexes": [
[
{
"path": "/DeviceId",
"order": "ascending"
},
{
"path": "/TimeStamp",
"order": "ascending"
},
{
"path": "/Items/[]/Name/?",
"order": "ascending"
},
{
"path": "/Items/[]/DoubleValue/?",
"order": "ascending"
}
]
]
}
I get the error "Failed to update container DeviceEvents:
Message: {"code":"BadRequest","message":"Message: {"Errors":["The indexing path '\/Items\/[]\/Name\/?' could not be accepted, failed near position '8'."
This seems to be the array [] syntax that is giving an error.
On a side note I am not sure what I am doing makes sense at all but I have a query that looks like this
SELECT SUM(de0["DoubleValue"])
FROM root JOIN de0 IN root["Items"]
WHERE root["ApplicationId"] = 57 AND root["DeviceId"] = 126 AND root["TimeStamp"] >= "2021-02-21T17:55:29.7389397Z" AND de0["Name"] = "Use Case"
Where ApplicationId is the partition key and the item saved looks like this
{
"id": "59ab9323-26ca-436f-8d29-e1ddd826f025",
"DeviceId": 3,
"ApplicationId": 3,
"RawData": "640F7A000A00E30142000000",
"TimeStamp": "2021-02-20T18:36:52.833174Z",
"Items": [
{
"Name": "Battery Status",
"StringValue": "Full",
"DoubleValue": null
},
{
"Name": "Use Case",
"StringValue": null,
"DoubleValue": 12
},
{
"Name": "Battery Voltage",
"StringValue": null,
"DoubleValue": 3.962
},
{
"Name": "Rain Gauge Count",
"StringValue": null,
"DoubleValue": 10
}
],
"_rid": "CgdVAO7B0DNkAAAAAAAAAA==",
"_self": "dbs/CgdVAA==/colls/CgdVAO7B0DM=/docs/CgdVAO7B0DNkAAAAAAAAAA==/",
"_etag": "\"61008771-0000-0d00-0000-603156c50000\"",
"_attachments": "attachments/",
"_ts": 1613846213
}
I need to aggregate on some of these items in the array like say get MAX on temperature or something like this (using Use Case for test although it doesn't make sense). I reasoned that if all the data in the query is in a single composite index the database would be able to do the aggregation without reading the documents themselves. However I can't seem to add a composite index containing an array at all.
Yes, composite index can't contain an array path. It should be a scalar value.
Unlike with included or excluded paths, you can't create a path with
the /* wildcard. Every composite path has an implicit /? at the end of
the path that you don't need to specify. Composite paths lead to a
scalar value and this is the only value that is included in the
composite index.
Reference:https://learn.microsoft.com/en-us/azure/cosmos-db/index-policy#composite-indexes

How to insert data to bigquery table with custom fields with NodeJS?

I'm using npm BigQuery module for inserting data into bigquery. I have a custom field say params which is of type RECORD and accept any int,float or string value as a key value pair. How can I insert to such fields?
Looked into this, but could not find anything useful
[https://cloud.google.com/nodejs/docs/reference/bigquery/1.3.x/Table#insert]
If I understand correctly, you are asking for a map with ANY TYPE value, which is not support in BigQuery.
You may have a map with value type info with a record like below schema.
Your insert code needs to pick correct type_value to set.
{
"name": "map_field",
"type": "RECORD",
"mode": "REPEATED",
"fields": [
{
"name": "key",
"type": "STRING",
},
{
"name": "int_value",
"type": "INTEGER"
},
{
"name": "string_value",
"type": "STRING"
},
{
"name": "float_value",
"type": "FLOAT"
}
]
}

Azure Data Factory v2 using utcnow() as a pipeline parameter

For context, I currently have a Data Factory v2 pipeline with a ForEach Activity that calls a Copy Activity. The Copy Activity simply copies data from an FTP server to a blob storage container.
Here is the pipeline json file :
{
"name": "pipeline1",
"properties": {
"activities": [
{
"name": "ForEach1",
"type": "ForEach",
"typeProperties": {
"items": {
"value": "#pipeline().parameters.InputParams",
"type": "Expression"
},
"isSequential": true,
"activities": [
{
"name": "Copy1",
"type": "Copy",
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false
},
"typeProperties": {
"source": {
"type": "FileSystemSource",
"recursive": true
},
"sink": {
"type": "BlobSink"
},
"enableStaging": false,
"cloudDataMovementUnits": 0
},
"inputs": [
{
"referenceName": "FtpDataset",
"type": "DatasetReference",
"parameters": {
"FtpFileName": "#item().FtpFileName",
"FtpFolderPath": "#item().FtpFolderPath"
}
}
],
"outputs": [
{
"referenceName": "BlobDataset",
"type": "DatasetReference",
"parameters": {
"BlobFileName": "#item().BlobFileName",
"BlobFolderPath": "#item().BlobFolderPath"
}
}
]
}
]
}
}
],
"parameters": {
"InputParams": {
"type": "Array",
"defaultValue": [
{
"FtpFolderPath": "/Folder1/",
"FtpFileName": "#concat('File_',formatDateTime(utcnow(), 'yyyyMMdd'), '.txt')",
"BlobFolderPath": "blobfolderpath",
"BlobFileName": "blobfile1"
},
{
"FtpFolderPath": "/Folder2/",
"FtpFileName": "#concat('File_',formatDateTime(utcnow(), 'yyyyMMdd'), '.txt')",
"BlobFolderPath": "blobfolderpath",
"BlobFileName": "blobfile2"
}
]
}
}
},
"type": "Microsoft.DataFactory/factories/pipelines"
}
The issue I am having is that when specifying pipeline parameters, it seems I cannot use system variables and functions the same way I can when for example specifying folder paths for a blob storage dataset.
The consequence of this is that formatDateTime(utcnow(), 'yyyyMMdd') is not being interpreted as function calls but rather the actual string with value formatDateTime(utcnow(), 'yyyyMMdd').
To counter this I am guessing I should be using a trigger to execute my pipeline and pass the trigger's execution time as a parameter to the pipeline like trigger().startTime but is this the only way? Am I simply doing something wrong in my pipeline's JSON?
This should work:
File_#{formatDateTime(utcnow(), 'yyyyMMdd')}
Or complex paths as well:
rootfolder/subfolder/#{formatDateTime(utcnow(),'yyyy')}/#{formatDateTime(utcnow(),'MM')}/#{formatDateTime(utcnow(),'dd')}/#{formatDateTime(utcnow(),'HH')}
You can't put a dynamic expression in the default value. You should define this expression and function either when you creating a trigger, or when you define dataset parameters in sink/source in copy activity.
So either you create dataset property FtpFileName with some default value in source dataset, and then in copy activity, you can in source category to specify that dynamic expression.
Another way is to define pipeline parameter, and then to add dynamic expression to that pipeline parameter when you are defining a trigger. Hope this is a clear answer to you. :)
Default value of parameters cannot be expressions. They must be literal strings.
You could use trigger to achieve this. Or you could extract the common part of your expressions and just put literal values into the foreach items.

Call stored procedure using ADF

I am loading SQL server table using ADF and after insertion is over, I have to do little manipulation using below approach
Trigger (After insert) - Failed, SQL server not able to detect inserted record that I push using ADF.. **Seems to be a bug**.
Stored procedure using user defined table type - Getting error
Error Number '156'. Error message from database execution : Incorrect
syntax near the keyword 'select'. Must declare the table variable
"#a".
I have created below pipeline
{
"name": "CopyPipeline-xxx",
"properties": {
"activities": [
{
"type": "Copy",
"typeProperties": {
"source": {
"type": "AzureDataLakeStoreSource",
"recursive": false
},
"sink": {
"type": "SqlSink",
"sqlWriterStoredProcedureName": "sp_xxx",
"storedProcedureParameters": {
"stringProductData": {
"value": "str1"
}
},
"writeBatchSize": 0,
"writeBatchTimeout": "00:00:00"
},
"translator": {
"type": "TabularTranslator",
"columnMappings": "col1:col1,col2:col2"
}
},
"inputs": [
{
"name": "InputDataset-3jg"
}
],
"outputs": [
{
"name": "OutputDataset-3jg"
}
],
"policy": {
"timeout": "1.00:00:00",
"concurrency": 1,
"executionPriorityOrder": "NewestFirst",
"style": "StartOfInterval",
"retry": 3,
"longRetry": 0,
"longRetryInterval": "00:00:00"
},
"scheduler": {
"frequency": "Hour",
"interval": 8
},
"name": "Activity-0-xxx_csv->[dbo]_[xxx_staging]"
}
],
"start": "2017-01-09T21:48:53.348Z",
"end": "2099-12-30T18:30:00Z",
"isPaused": false,
"hubName": "hub",
"pipelineMode": "Scheduled"
}
}
and using below stored procedure
create procedure [dbo].[sp_xxx] #xxx1 [dbo].[ut_xxx] READONLY, #str1 varchar(100) AS
MERGE xxx_dummy AS a
USING #xxx1 AS b
ON (a.col1 = b.col1)
WHEN NOT MATCHED
THEN INSERT(col1, col2)
VALUES(b.col1, b.col2)
WHEN MATCHED
THEN UPDATE SET a.col2 = b.col2;
Please help me to resolve the issue.
I can reproduce your first error. Inserting to a SQL Server table with Azure Data Factory (ADF) appears to use a bulk insert method (similar to BULK INSERT, bcp, SSIS etc) and by default these methods do not fire triggers:
insert bulk [dbo].[testADF] ([col1] Int, [col2] Int, [col3] Int, [col4] Int)
with (TABLOCK, CHECK_CONSTRAINTS)
With bcp, BULK INSERT there is a flag to change to say 'fire triggers' but it appears there is no way to change this setting for ADF. As a workaround, move the logic from your trigger into the stored proc.
If you believe this flag is important, consider creating a feedback item.

Resources