Time function in Azure Data Factory - Expression Builder

I only need to take the time part from a source attribute of type timestamp and load it into a dedicated SQL pool table (a time datatype column), but I can't find a time function in the ADF expression builder. Is there a way to do this?
What did I try?
I took the time part from the source attribute using substring and then tried to load it into the destination table. When I did, the destination table was filled with null values, because the destination column is of the time datatype.

I tried to reproduce this and got the same issue; the following demonstrates it. I have a table called mydemo as shown below.
CREATE TABLE [dbo].[mydemo]
(
id int NOT NULL,
my_date date,
my_time time
)
WITH
(
DISTRIBUTION = HASH (id),
CLUSTERED COLUMNSTORE INDEX
)
GO
The following is my source data in my dataflow.
time is not a recognized datatype in Azure dataflows (date and timestamp are accepted), so the dataflow fails to convert the string produced by substring(<timestamp_col>,12,5) into the time type.
For a better understanding, you can load your sink table as a source in the dataflow. The time column will be read as 1900-01-01 12:34:56 when the time value in the table row is 12:34:56.
-- my table row
insert into mydemo values(200,'2022-08-18','12:34:56')
So, instead of using substring(<timestamp_col>,12,5), which returns only 12:34, use concat('1900-01-01 ',substring(<timestamp_col>,12,8)), which returns 1900-01-01 12:34:56.
Configure the sink and the mapping, and look at the resulting data in the data preview. Azure dataflow will now be able to insert the values successfully and give the desired results.
The following is the output after successful insertion of the record into the dedicated pool table.
NOTE: You can construct any valid yyyy-MM-dd hh:mm:ss value using concat('yyyy-MM-dd ',substring(<timestamp_col>,12,8)) (where yyyy-MM-dd is any valid date) in place of 1900-01-01 hh:mm:ss in the derived column transformation; since the sink column is of type time, the date portion is discarded on insert either way.
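As a sanity check, a minimal T-SQL sketch (not part of the original flow) showing that the dedicated pool itself drops the date prefix when such a value is converted to time:
-- casting the datetime value to time drops the 1900-01-01 date prefix
SELECT CAST(CAST('1900-01-01 12:34:56' AS datetime2) AS time) AS my_time;  -- returns 12:34:56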

Related

Incremental load in Azure Data Factory

I am replicating my data from Azure SQL DB to Azure SQL DB. Some of my tables have date columns, and some have only an ID column that serves as the primary key. When performing an incremental load in ADF, I can select the date column as the watermark column for the tables that have one, and the ID column for the tables that don't. The issue is that my IDs are GUID values. Can I use a GUID as my watermark column? When I try, the copy activity fails with the error shown in the image above.
How can I overcome this issue? Help is appreciated.
Thank you,
Gp
I have tried dynamic mapping from https://martinschoombee.com/2022/03/22/dynamic-column-mapping-in-azure-data-factory/ but it does not work; it still gives me the same error.
Regarding your question about the watermark:
A watermark is a column that holds the last-updated timestamp or an ever-incrementing key,
so a GUID column would not be a good fit.
Try to find a date column, or an integer identity that is ever-incrementing, to use as the watermark.
Since your source is SQL Server, you can also use change data capture.
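If you go the CDC route, a minimal sketch of enabling it on a SQL Server source (the schema and table names here are assumptions):
-- enable CDC at the database level, then for the table to be tracked
EXEC sys.sp_cdc_enable_db;
EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',      -- assumed schema
    @source_name   = N'MyTable',  -- assumed table
    @role_name     = NULL;        -- no gating role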
Links:
Incremental loading in ADF
Change data capture
Regards,
Chen
The watermark logic takes advantage of the fact that only the records inserted after the last saved watermark need to be considered when copying from source A to B; basically, we are using the ">=" operator to our advantage here.
With a GUID you cannot use that logic: a GUID is certainly unique, but ">=" and "<=" comparisons on it are meaningless.
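A minimal sketch of the watermark pattern, assuming a LastModified column and a previously saved watermark value (both names are assumptions):
-- copy only the rows changed since the previous run
DECLARE @LastWatermark datetime2 = '2022-08-18 00:00:00';  -- value saved by the last load

SELECT *
FROM dbo.SourceTable            -- assumed source table
WHERE LastModified >= @LastWatermark;

-- after the copy succeeds, persist MAX(LastModified) as the new watermark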

Not able to change datatype of Additional Column in Copy Activity - Azure Data Factory

I am facing a very simple problem: I am not able to change the datatype of an additional column in a copy activity in an ADF pipeline from String to Datetime.
I am trying to change the source datatype for the additional column in the mapping using JSON, but it still doesn't work with the PolyBase command.
When I run my pipeline, it gives the same error.
Is it not possible to change the datatype of an additional column? By default it takes String only.
Dynamic columns return string.
Try to put the value (e.g. utcnow()) in the dynamic content of the query and cast it to the required target datatype.
Otherwise you can use a data flow derived column:
https://learn.microsoft.com/en-us/azure/data-factory/data-flow-derived-column
Since your source is a query, you can choose to bring the current date in the source SQL query itself, in the desired format, rather than adding it as an additional column.
Thanks
Try using formatDateTime as shown below and define the desired date format:
Here, since the format given is 'yyyy-dd-MM', the result will look as below:
Note: the output here will be of string format only, as in the Copy activity we cannot cast the datatype to date.
We could either create the current date in the source SQL query or use the above approach, so that the data loads into the sink in the expected format; a sketch of the source-query option follows.
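A minimal sketch of that source-query approach, assuming a SQL source (the table and column names are assumptions):
-- cast once in the source query so the sink receives a real datetime,
-- not the string that an additional column would produce
SELECT t.*,
       CAST(GETUTCDATE() AS datetime2) AS LoadDateTime  -- assumed column name
FROM dbo.SourceTable AS t;                              -- assumed table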

Incremental load without date or primary key column using azure data factory

I have a source, let's say a SQL DB or an Oracle database, and I want to pull the table data into an Azure SQL database. The problem is that I have neither a date column recording when data is inserted nor a primary key column. Is there any other way to perform this operation?
One way of doing it semi-incrementally is to partition the table by a fairly stable column in the source table. You can then use a mapping data flow to compare the partitions (which can be done with row counts, aggregations, hashbytes, etc.). On each load you store the comparison output in the partition metadata somewhere, so that you can compare against it the next time you load. That way you reload only the partitions that changed since your last load; see the sketch below.
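A minimal T-SQL sketch of one such per-partition comparison, assuming a stable PartitionKey column (all names are assumptions; an Oracle source would need ORA_HASH or similar instead):
-- one row per partition: a row count plus an order-insensitive checksum;
-- a changed partition shows a different count or checksum than the last run
SELECT PartitionKey,
       COUNT(*)                  AS RowCnt,
       CHECKSUM_AGG(CHECKSUM(*)) AS PartitionChecksum
FROM dbo.SourceTable
GROUP BY PartitionKey;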

Adding Extraction DateTime in Azure Data Factory

I want to write a generic Data Factory (V2) pipeline for the following scenario:
Source ---> extracted data (from Salesforce or some other source), which doesn't have an extraction timestamp ---> written to Blob with an extraction timestamp.
I want it to be generic, so I don't want to specify a column mapping anywhere.
Is there any way to use an expression or system variable in a custom activity to append a column to the output dataset? I would like a very simple solution to keep the implementation realistic.
To do that, you should change the query to add the column you need, using the query property in the copy activity of the pipeline. https://learn.microsoft.com/en-us/azure/data-factory/connector-salesforce#copy-activity-properties
I don't know much about Salesforce, but in SQL Server you can do the following:
SELECT *, CURRENT_TIMESTAMP AS AddedTimeStamp FROM [schema].[table]
This will give you every field in your table and add a column named AddedTimeStamp with the CURRENT_TIMESTAMP value in every row of the result.
Hope this helped!

Excel Power Query - Incremental Load from Query and adding date

I am trying to do something quite simple which I am failing to understand:
take the output from a query, add a date-time stamp, and write it into an Excel table.
Run the logic again and you get the same output, but the generated date-time has moved forward in time.
Query 1 comes from SQL and yields two columns: category and count.
I take this and add a generated date to it using DateTime.LocalNow().
Query 2 is the target table.
How can I construct a query which appends to an existing table and doesn't require me to load the result into a new table?
I have seen this blog.oraylis.de post, and I can't make it work, since the DateTime.LocalNow() call runs for both source and target and I end up with the same datetime throughout the query.
I think I am missing something obvious.
EDIT:
= Table.Combine({SOURCE_DATA, TARGET_DATA})
This loads into a third, new table and doesn't take that third table into account when loading, so you just end up with a new version of the first two tables with a new timestamp.
These steps should work:
1. Create a query Q1 based on the SQL statement, add your timestamp using DateTime.LocalNow(), and load it into a new Excel table (execute the query).
2. Create a new query Q2 based on this new Excel table (just like that, no transforms).
3. Modify the first query Q1 by adding Table.Combine with Q2 as the last step.
In other words, Q2 loads the existing data from the Excel table into which Q1 writes. The Excel table is always rewritten completely, but since the existing data is preserved, the net effect is that new data is appended to the table. Hope this helps.
Good luck, Hilmar
