Azure Data Factory Dynamic content parameter

I am trying to incrementally load data from source tables in Azure Data Factory, filtering on lastmodifieddate using the time of the last run.
This works fine:
@concat(' SELECT * FROM dbo.',
item().TABLE_list ,
' WHERE modifieddate > DATEADD(day, -1, GETDATE())')
When I use:
@concat(' SELECT * FROM dbo.',
item().TABLE_list ,
' WHERE modifieddate > @{formatDateTime(
addhours(pipeline().TriggerTime-24)),
''yyyy','-','MM','-','ddTHH',':','mm',':','ssZ''}')
I get this error:
"errorCode": "2200",
"message": "Failure happened on 'Source' side. 'Type=System.Data.SqlClient.SqlException,Message=Must declare the scalar variable \"@\".,Source=.Net SqlClient Data Provider,SqlErrorNumber=137,Class=15,ErrorCode=-2146232060,State=2,Errors=[{Class=15,Number=137,State=2,Message=Must declare the scalar variable \"@\".,},],'",
"failureType": "UserError",
"target": "Copy Data1"
}
What mistake am I making?
I need to dynamically pass the pipeline's last run time after the > in the WHERE condition.

SELECT *
FROM dbo.@{item().TABLE_LIST}
WHERE modifieddate >
'@{formatDateTime(addhours(pipeline().TriggerTime, -24), 'yyyy-MM-ddTHH:mm:ssZ')}'
You could use a string interpolation expression; concat makes things complicated.
https://learn.microsoft.com/en-us/azure/data-factory/control-flow-expression-language-functions#expressions
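For illustration, assuming a hypothetical table name Orders and a trigger time of 2021-10-28T09:00:00Z, the interpolated expression above renders to:
SELECT *
FROM dbo.Orders
WHERE modifieddate > '2021-10-27T09:00:00Z'
The single quotes around the interpolated value matter: SQL Server receives a quoted string literal it can implicitly convert to a datetime, rather than a bare token it tries to parse as SQL.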

Related

How to convert an Azure Data Factory string type variable into datetime format

I have a string-type variable in Azure Data Factory which stores a datetime value coming from a Lookup activity,
but I then need to compare the value inside that variable with a datetime. How can I convert it into datetime format?
I tried the following but I am getting an error; the code and error are below.
Variable: string(activity('Lookup1').output.value[1].CREATED_DATE) - the variable I created, which converts the datetime into a string.
Query: select * from sampletable where modified_date >= formatDateTime(variables('createddate'), 'o')
This is the code I tried for the comparison and the datetime conversion.
ERROR
Failure happened on 'Source' side. ErrorCode=SqlOperationFailed,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=A database operation failed with the following error: ''variables' is not a recognized built-in function name.',Source=,''Type=System.Data.SqlClient.SqlException,Message='variables' is not a recognized built-in function name.,Source=.Net SqlClient Data Provider,SqlErrorNumber=195,Class=15,ErrorCode=-2146232060,State=10,Errors=[{Class=15,Number=195,State=10,Message='variables' is not a recognized built-in function name.,},],'
You can try something like the sample below:
@{concat('SELECT TOP (10) * FROM [SalesLT].[Customer] WHERE ModifiedDate <=', formatDateTime(variables('createddate'),'yyyy-MM-dd'))}
Equivalent to:
SELECT TOP (10) * FROM [SalesLT].[Customer] WHERE ModifiedDate <=2021-10-27
See official doc: Functions in expressions
But if you use the default format 'o' in the formatDateTime() function:
@{concat('SELECT TOP (10) * FROM [SalesLT].[Customer] WHERE ModifiedDate <=', formatDateTime(variables('createddate'),'o'))}
you might see a syntax error instead, because the round-trip format (e.g. 2021-10-27T10:00:00.0000000Z) is not valid as an unquoted SQL literal.
Refer to formatDateTime and generate a query suitable for the datetime format in your database.
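A minimal sketch of the quoting pattern (table and variable names taken from the question above; the formatted value is wrapped in doubled single quotes so the database receives a string literal):
@{concat('select * from sampletable where modified_date >= ''', formatDateTime(variables('createddate'), 'yyyy-MM-dd HH:mm:ss'), '''')}
This renders to something like select * from sampletable where modified_date >= '2021-10-27 00:00:00'.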

Kusto/Azure Data Explorer - How can I partition an external table using a timespan field?

Hoping someone can help.
I am new to Kusto and need to get an external table working that reads data from an Azure Blob storage account, but this table is unique in that the data for the timestamp column is split into two separate columns, i.e. LogDate and LogTime (see script below).
My data is stored in the following structure in the Azure Storage account container (container is named "employeedata", for example):
{employeename}/{year}/{month}/{day}/{hour}/{minute}.csv, in a simple CSV format.
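For example, a row logged for a hypothetical employee JohnDoe at 2020-01-01 10:30 would live at employeedata/JohnDoe/2020/01/01/10/30.csv.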
I know the CSV is good because if I import it into a normal Kusto table, it works perfectly.
My KQL script for the external table creation looks as follows:
.create-or-alter external table EmpLogs (Employee: string, LogDate: datetime, LogTime:timestamp)
kind=blob
partition by (EmployeeName:string = Employee, yyyy:datetime = startofday(LogDate), MM:datetime = startofday(LogDate), dd:datetime = startofday(LogDate), HH:datetime = todatetime(LogTime), mm:datetime = todatetime(LogTime))
pathformat = (EmployeeName "/" datetime_pattern("yyyy", yyyy) "/" datetime_pattern("MM", MM) "/" datetime_pattern("dd", dd) "/" substring(HH, 0, 2) "/" substring(mm, 3, 2) ".csv")
dataformat=csv
(
h@'************************'
)
with (folder="EmployeeInfo", includeHeaders="All")
I am constantly getting the error below, which is not very helpful (redacted from the full error; it basically comes down to a syntax error somewhere):
Syntax error: Query could not be parsed: {
"error": {
"code": "BadRequest_SyntaxError",
"message": "Request is invalid and cannot be executed.",
"#type": "Kusto.Data.Exceptions.SyntaxException",
"#message": "Syntax error: Query could not be parsed: . Query: '.create-or-alter external table ........
I know the todatetime() function works on timespans; I tested it with another table and it produced a date similar to the following: 0001-01-01 20:18:00.0000000.
I have tried using the bin() function on the timestamp/LogTime columns, but I get the same error as above. I even tried importing the time value as a string and doing some string manipulation on it; no luck, the same syntax error.
Any help/guidance would be greatly appreciated.
Thank you!!
Currently, there's no way to define an external table partition based on more than one column. If your dataset timestamp is split between two columns, LogDate:datetime and LogTime:timespan, then the best you can do is use a virtual column for the partition by time:
.create-or-alter external table EmpLogs(Employee: string, LogDate:datetime, LogTime:timespan)
kind=blob
partition by (EmployeeName:string = Employee, PartitionDate:datetime)
pathformat = (EmployeeName "/" datetime_pattern("yyyy/MM/dd/HH/mm", PartitionDate))
dataformat=csv
(
//h@'************************'
)
with (folder="EmployeeInfo", includeHeaders="All")
Now you can filter by the virtual column and fine-tune using LogTime:
external_table("EmpLogs")
| where Employee in ("John Doe", ...)
| where PartitionDate between(datetime(2020-01-01 10:00:00) .. datetime(2020-01-01 11:00:00))
| where LogTime ...
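As a further sketch (using the same column names), you can also reconstruct the full timestamp at query time, since adding a timespan to a datetime yields a datetime in KQL:
external_table("EmpLogs")
| extend FullTimestamp = LogDate + LogTime
| where FullTimestamp between (datetime(2020-01-01 10:15:00) .. datetime(2020-01-01 10:45:00))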

I want to use SQL_VARIANT datatype in external table Azure SQL and I get the "Index was out of range error."

I have two SQL Azure databases - DatabaseA and DatabaseB on a server hosted in Azure.
I need to access a view on DatabaseA from DatabaseB; namely, I need sys.identity_columns in DatabaseA to be available to me on DatabaseB. So I am creating an external table on DatabaseB that links to this information like this (I didn't include all the columns, but I did include the one causing the problem):
CREATE EXTERNAL TABLE [SOURCE_SYS].[identity_columns](
[object_id] int not null
,[name] nvarchar(128) null
,[column_id] int not null
,[system_type_id] tinyint not null
,[seed_value] sql_variant null
)
WITH
(
DATA_SOURCE = MyElasticDBQueryDataSrc,
SCHEMA_NAME = 'sys',
OBJECT_NAME = 'identity_columns'
);
When I run this, it works. But when I try to use the result (select * from [SOURCE_SYS].[identity_columns]) I get this error:
Msg 46823, Level 16, State 1, Line 50
Error retrieving data from MyServer.database.windows.net.DatabaseA. The underlying error message received was: 'Index was out of range. Must be non-negative and less than the size of the collection.
Parameter name: index'.
If I comment out the fields in this table that have the sql_variant datatype, it works fine, but I do need the information in that field and in the other two sql_variant fields that exist in the same table. MyElasticDBQueryDataSrc works fine on other similar tables without the sql_variant type.
Can anyone suggest what I might be doing wrong, or suggest a workaround? I tried using bigint, as the seed values are mostly integers or null, but that didn't work because it told me it wasn't the same datatype.
Any help much appreciated.
Well, after a weekend of sleep I figured out the answer!
If you use nvarchar(30) in the external table definition, you can then convert it to a bigint in any query you use it in:
CREATE EXTERNAL TABLE [SOURCE_SYS].[identity_columns](
[object_id] int not null
,[name] nvarchar(128) null
,[column_id] int not null
,[system_type_id] tinyint not null
,[seed_value] nvarchar(30) null
)
WITH
(
DATA_SOURCE = MyElasticDBQueryDataSrc,
SCHEMA_NAME = 'sys',
OBJECT_NAME = 'identity_columns'
);
Now I can access the value like this:
select cast(isnull([seed_value], 0) as bigint) from SOURCE_SYS.identity_columns
Beware that if you do a select * from the external table, you will need to handle the variant columns separately from the rest of the query, or you'll get this error:
Msg 46825, Level 16, State 1, Line 58
The data type of the column 'seed_value' in the external table is different than the column's data type in the underlying standalone or sharded table present on the external source.
Hope this is helpful to someone!
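For example, a sketch of the "handle the variants separately" advice: select the non-variant columns in one query, and the variant column (cast via its nvarchar(30) external definition) alongside a key in another:
SELECT [object_id], [name], [column_id], [system_type_id]
FROM SOURCE_SYS.identity_columns;

SELECT [object_id], CAST(ISNULL([seed_value], 0) AS bigint) AS seed_value
FROM SOURCE_SYS.identity_columns;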

Data Factory V2 Query Azure Table Storage but use a lookup Value

I have a SQL watermark table which contains the last date in my destination table.
My source data is coming from an Azure Storage Table, and the datetime is a string.
I set up the datetime in the watermark table to match the format in the Azure Table storage.
I create a Lookup and a Copy task.
If I hard-code the date into the source query, this works fine: CreatedAt ge '2019-03-06T14:03:11.000Z'
But obviously I don't want to hard-code this value; I want to use the date from the Lookup.
But when I replace the hard-coded date with the lookup value
CreatedAt ge 'activity('LookupWatermarkOld').output'
I get an error
{
"errorCode": "2200",
"message":"ErrorCode=FailedStorageOperation,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=A
storage operation failed with the following error 'The remote server returned an error: (400) Bad Request.'.,Source=,
''Type=Microsoft.WindowsAzure.Storage.StorageException,Message=The remote server returned an error: (400) Bad Request.,
Source=Microsoft.WindowsAzure.Storage,StorageExtendedMessage=Syntax
error at position 42 in 'CreatedAt ge 'activity('LookupWatermarkOld').output''.\nRequestId:8c65ced9-b002-0051-79d9-d41d49000000\nTime:2019-03-07T11:35:39.0640233Z,,''Type=System.Net.WebException,Message=The remote server returned an error: (400) Bad Request.,Source=Microsoft.WindowsAzure.Storage,'",
"failureType": "UserError",
"target": "CopyMentions"
}
Can anyone help me with this? How do you use the Lookup value in an Azure Table query?
Check this out:
1) Lookup activity. Query field:
SELECT MAX(WatermarkColumnName) as LastId FROM TableName;
Also, make sure that you checked "First row only" option.
2) In Copy Data activity use query. Query field:
@concat('SELECT * FROM TableName as s WHERE s.WatermarkColumnName > ''', activity('LookupActivity').output.firstRow.LastId, '''')
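Assuming the Lookup returned LastId = 42 (a hypothetical value), the expression above renders to:
SELECT * FROM TableName as s WHERE s.WatermarkColumnName > '42'
The doubled single quotes ('') inside the concat are how a literal quote character is escaped in ADF expressions.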
Finally I got some help on this, and it works with:
CreatedAt gt '@{activity('LookupWatermarkOld').output.firstRow.WaterMarkValue}'
WaterMarkValue is the column name from the SQL lookup table.
The Lookup returns an array, so you have to specify firstRow to take the first element,
and wrap it in '' so it is used as a string value.
For recent ADF v2:
Use the watermark/lookup output value in a parameter.
Example: ParamUserCount = @{activity('LookupActivity').output.count}
(or whichever output property you need)
and you can use it in the query as:
Example: "select * from userDetails where usercount = {$ParamUserCount}"
Make sure you enclose the query in " " so it is treated as a string, and that the parameter in the query is enclosed in { }.
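If the {$ParamUserCount} form doesn't match your setup, the standard ADF string-interpolation syntax for referencing a pipeline parameter (a sketch, assuming a parameter named ParamUserCount) would be:
select * from userDetails where usercount = @{pipeline().parameters.ParamUserCount}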

Azure Data Factory v2 copy activity: source value is null, sink does not allow null

My question is about how and where to perform a type conversion during a Copy activity.
I have an Azure Data Factory pipeline which imports data from a TSV file in Data Lake Gen1 into a SQL Server database table.
The schema of the TSV file is: {QueryDate, Count1, Count2}
Count1 and Count2 may have no value.
Example data in the TSV file:
20180717 10
20180717 5 5
20180717 7 1
20180717 7
The schema of the SQL Server table is:
{QueryDate (datetime2(7)), UserNumber (int), ActiveNumber (int)}
Both UserNumber and ActiveNumber have a NOT NULL constraint.
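For reference, a minimal sketch of the sink table (the table name is hypothetical; the types and constraints are as described above):
CREATE TABLE dbo.QueryStats (
    QueryDate    datetime2(7),
    UserNumber   int NOT NULL,
    ActiveNumber int NOT NULL
);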
When I use a Copy activity in my pipeline to copy the TSV data to the table, I get an error like this:
"errorCode": "2200", "message":
"'Type=System.InvalidOperationException, does not allow
DBNull.Value...
When Count1 or Count2 has no value, I want to use 0 to replace the null value.
I think that would stop the error.
But I don't know where and how to perform this conversion.
Should the conversion be in the source dataset, the sink dataset, or the Copy activity?
And I haven't figured out the correct syntax for the conversion either.
I tried setting the nullValue of the source dataset's format settings
to values like null, 0, and NULL, but none of them works; I still get the error.
"typeProperties": {
"format": {
"type": "TextFormat",
"columnDelimiter": "\t",
"nullValue": "NULL",
"treatEmptyAsNull": true,
"skipLineCount": 0,
"firstRowAsHeader": false},
...
}
I also saw the question Azure Data Factory - can't convert from "null" to datetime field,
but it does not solve my problem.
