How to insert data into computed columns using ADF V1 - azure

How can we insert data into computed columns using ADF V1? Currently we cannot insert the data into the Azure SQL Database.

Suppose the source system is a JDBC connection and the target system is an Azure SQL database table that has a computed column (say, extracting the year from a date string like "20181022"). When ADF tries to load the table, it complains about a column mismatch with the target system.
Is there a way to get around this?

I have a similar situation and am also looking for a better solution.
I am copying data from a prod db and updating the QA db with it.
Here's how I have been handling it so far: modify the HASHBYTES column in the target QA db table so that it is a varchar. This solution is less than ideal, though, because it has to be done manually every single time a code change is pushed to the affected table.
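For illustration, the manual change described above might look roughly like this; dbo.AffectedTable and RowHash are made-up names:

```sql
-- Drop the computed HASHBYTES column and re-add it as a plain varchar
-- so the copy activity can write into it directly (hypothetical names).
ALTER TABLE dbo.AffectedTable DROP COLUMN RowHash;
ALTER TABLE dbo.AffectedTable ADD RowHash varchar(64) NULL;
```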
Your situation is different.
If it were me, I would do a straight copy of the data into the Azure SQL Database. No computed columns. No modifications. Then I would stage the data into another table that had the computed columns I was after.
It is an Extract, Load, Transform approach, compared to the traditional Extract, Transform, Load method.
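As a rough sketch of that ELT pattern (all table and column names below are hypothetical), the copy activity lands the data into a plain staging table with no computed columns, and the final table derives the year itself:

```sql
-- Staging table: plain columns only, so the ADF copy maps one-to-one.
CREATE TABLE stg.Orders (
    OrderId   int        NOT NULL,
    OrderDate varchar(8) NOT NULL   -- e.g. '20181022'
);

-- Final table: the computed column extracts the year from the date string.
CREATE TABLE dbo.Orders (
    OrderId   int        NOT NULL,
    OrderDate varchar(8) NOT NULL,
    OrderYear AS CAST(LEFT(OrderDate, 4) AS int)
);

-- Transform step after the copy: list only the non-computed columns.
INSERT INTO dbo.Orders (OrderId, OrderDate)
SELECT OrderId, OrderDate
FROM stg.Orders;
```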
This question is really old, but hey, I found it just now; maybe others will, too.

Related

How to delete records from a SQL database using Azure Data Factory

I am setting up a pipeline in Data Factory where the first part of the pipeline needs some pre-processing cleaning. I currently have a script set up to query the rows that need to be deleted and export those results into a CSV.
What I am looking for is essentially the opposite of an upsert copy activity. I would like the procedure to delete the rows in my table based on a matching row.
Apologies in advance if this is an easy solution; I am fairly new to Data Factory and just need help looking in the right direction.
Assuming the source from which you are initially getting the rows is different from the sink, there are multiple ways to achieve it.
If the number of rows is small, you can leverage a Script activity or a Lookup activity to delete the records from the destination table.
For a larger dataset (given the limitations of the Lookup activity), you can copy the data into a staging table within the destination and use a Script activity to delete the matching rows, as sketched below.
If your org supports the use of Data Flows, you can use those to achieve it.
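For the staging-table route, the Script activity could run something along these lines; dbo.TargetTable, stg.RowsToDelete and KeyColumn are hypothetical names, so adjust them to your schema:

```sql
-- Delete rows in the target that match the keys landed in the staging table,
-- then clear the staging table for the next run.
DELETE t
FROM dbo.TargetTable AS t
INNER JOIN stg.RowsToDelete AS s
    ON s.KeyColumn = t.KeyColumn;

TRUNCATE TABLE stg.RowsToDelete;
```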

Azure Data Factory DataFlow Filter is taking a lot of time

I have an ADF pipeline which executes a DataFlow.
The Dataflow has a Source table which has around 1 million rows,
a Filter which has a query to select only yesterday's records from the source table,
Alter Row settings which use upsert,
and a Sink, which is the archival table where the records are upserted.
This whole pipeline takes around 2 hours or so, which is not acceptable. In fact, only around 3,000 records are actually being transferred/upserted.
The core count is 16. I tried partitioning with round robin and 20 partitions.
Similar archival doesn't take more than 15 minutes for another table which has around 100K records.
I thought of creating a source which would select only yesterday's records, but in the dataset we can select only a table.
Please suggest if I am missing anything to optimize it.
The table of the Data Set really doesn't matter. Whichever activity you use to access that Data Set can be toggled to use a query instead of the whole table, so that you can pass in a value to select only yesterday's data from the database.
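For example, the source could be switched to a query along these lines instead of the full table; ModifiedDate is a hypothetical column standing in for whatever marks yesterday's rows in your table:

```sql
-- Pull only yesterday's rows from the source instead of scanning the whole table.
SELECT *
FROM dbo.SourceTable
WHERE ModifiedDate >= CAST(DATEADD(day, -1, GETDATE()) AS date)
  AND ModifiedDate <  CAST(GETDATE() AS date);
```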
Of course, if you have the ability to create a stored procedure on the source, you could also do that.
When migrating really large sets of data, you'll get much better performance using a Copy activity to stage the data into an Azure Storage Blob before using another Copy activity to pull from that Blob into the destination. But for what you're describing here, that doesn't seem necessary.

Azure SQL External Table alternatives

Azure external tables between two Azure SQL databases on the same server don't perform well. This is known. I've been able to improve performance by defining a view from which the external table is defined. This works if the view can limit the data set returned, but this partial solution isn't enough. I'd love a way to, at least nightly, move all the data that has been inserted or updated from the full set of tables in one database (dbo schema) to the second database (pushing into the altdbo schema). I think Azure Data Factory will let me do this, but I haven't figured out how. Any thoughts / guidance? The copy option doesn't copy over table schemas or updates.
Data Factory Mapping Data Flow can help you achieve that.
Use the Alter Row transformation and select an update method in the Sink:
This can help you copy the newly inserted or updated data to the other Azure SQL database based on the key column.
Alter Row: Use the Alter Row transformation to set insert, delete, update, and upsert policies on rows.
Update method: Determines what operations are allowed on your database destination. The default is to only allow inserts. To update, upsert, or delete rows, an alter-row transformation is required to tag rows for those actions. For updates, upserts and deletes, a key column or columns must be set to determine which row to alter.
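For what it's worth, the net effect of an upsert keyed on a column is roughly what a MERGE on that key would do; this is only a sketch with hypothetical table and column names, not what the Data Flow literally executes:

```sql
-- Upsert keyed on CustomerId: update matching rows, insert new ones.
MERGE dbo.Customer AS tgt
USING stg.Customer AS src
    ON tgt.CustomerId = src.CustomerId
WHEN MATCHED THEN
    UPDATE SET tgt.Name = src.Name,
               tgt.UpdatedAt = src.UpdatedAt
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerId, Name, UpdatedAt)
    VALUES (src.CustomerId, src.Name, src.UpdatedAt);
```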
Hope this helps.

Is there a way to use a join query in Azure Data Factory when copying data from a Sybase source

I am trying to ingest data from a Sybase source into Azure Data Lake. I am ingesting several tables using a watermark table that has the table names from the Sybase source. The process works fine for a full import; however, we are trying to import the tables every 15 minutes to feed a dashboard. We don't need to ingest the whole table, as we don't need all the data from it.
The table doesn't have a dateModified column or any kind of incremental id to perform an incremental load. The only way of filtering out unwanted data is to perform a join onto another lookup table at the source and then use a "filter" value in the "Where" clause.
Is there a way we can perform this in Azure Data Factory? I have attached my current pipeline screenshot just to make it a bit clearer.
Many thanks for looking into this. I have managed to find a solution. I was using a watermark table to ingest about 40 tables using one pipeline. My only issue was how to use the join and "Where" filter in my query without hard-coding it in the pipeline. I achieved this by adding "Join" and "Where" fields to my watermark table and then passing them into the "Query" as @{item().Join} @{item().Where}. It worked like magic.
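As a sketch, the dynamic query in the copy activity could be assembled from those watermark columns roughly like this; TableName is a hypothetical column name for the table list, while Join and Where are the fields described above:

```sql
-- Dynamic content for the copy activity's Query, built per watermark row.
SELECT t.*
FROM @{item().TableName} t
@{item().Join}
@{item().Where}
```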

How to perform Incremental Load with date or key column using Azure Data Factory

I wanted to achieve an incremental load from Oracle to Azure SQL Data Warehouse using Azure Data Factory. The issue I am facing is that I don't have any date column or any key column to perform the incremental load. Is there any other way to achieve this?
You will either have to:
A. Identify a field in each table you want to use to determine if the row has changed
B. Implement some kind of change capture feature on the source data
Those are really the only two ways to limit the amount of data you pull from the source.
It wouldn't be very efficient, but if you are just trying not to update rows that haven't changed in your destination, you can hash your source values and hash the values in the destination, and only insert/update rows where the hashes don't match. Here's an example of how this works in T-SQL.
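A minimal sketch of that pattern, assuming a hypothetical staging table stg.MyTable and target dbo.MyTable keyed on Id with two payload columns:

```sql
-- Update rows whose hashed payload differs between staging and target.
UPDATE t
SET t.Col1 = s.Col1,
    t.Col2 = s.Col2
FROM dbo.MyTable AS t
INNER JOIN stg.MyTable AS s
    ON s.Id = t.Id
WHERE HASHBYTES('SHA2_256', CONCAT(s.Col1, '|', s.Col2))
   <> HASHBYTES('SHA2_256', CONCAT(t.Col1, '|', t.Col2));

-- Insert rows that don't exist in the target yet.
INSERT INTO dbo.MyTable (Id, Col1, Col2)
SELECT s.Id, s.Col1, s.Col2
FROM stg.MyTable AS s
WHERE NOT EXISTS (SELECT 1 FROM dbo.MyTable AS t WHERE t.Id = s.Id);
```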
There is a section of the Data Factory documentation dedicated to incrementally loading data. Please check it out if you haven't.
