Why won't my rows get deleted in Azure Data Factory?

I am trying to do some data transformations on a dataset in Data Factory. I wanted to delete a set of rows based on certain conditions. This is the data flow so far:
So in AlterRow1 I marked the rows I wanted to delete, and this is the result when I click on data preview:
As you can see, 6 rows get deleted, which is exactly what I wanted. However, in sink1 this is the data preview I'm getting:
The rows I wanted to delete are back and won't get deleted when I run this pipeline. I'll add that the source is an Excel file in Blob Storage and the sink is a CSV file, also in my Blob Storage.
What am I doing wrong?
EDIT:
There are no settings in the sink to allow deletion.

Although you can see the result in the data preview, the Alter Row transformation can only result in rows being inserted, updated, deleted, or upserted (DDL & DML actions) against a database sink.
See: Alter row transformation in mapping data flow.
I tried to reproduce your exact scenario and I see the same behavior: the Alter Row transformation's data preview shows the rows marked with an X for deletion, but the sink's preview ignores this and shows all the rows from the source.
I could not find any documentation that explains this behavior in more detail; you can reach out here and here for an official response.
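Since file-based sinks like CSV ignore delete policies, a common workaround is to invert the condition and use a Filter transformation instead of Alter Row, so the unwanted rows never reach the sink (the answer on CSV datasets further down takes the same approach). Below is a minimal sketch of what such a data flow definition could look like; the data flow name, dataset references, and the OrderStatus column used in the condition are placeholders for illustration, not taken from the original question:

{
    "name": "RemoveRowsDataFlow",
    "properties": {
        "type": "MappingDataFlow",
        "typeProperties": {
            "sources": [
                { "name": "source1", "dataset": { "referenceName": "SourceExcel", "type": "DatasetReference" } }
            ],
            "sinks": [
                { "name": "sink1", "dataset": { "referenceName": "SinkCsv", "type": "DatasetReference" } }
            ],
            "transformations": [
                { "name": "Filter1" }
            ],
            "scriptLines": [
                "source(allowSchemaDrift: true, validateSchema: false) ~> source1",
                "source1 filter(OrderStatus != 'Cancelled') ~> Filter1",
                "Filter1 sink(allowSchemaDrift: true, validateSchema: false) ~> sink1"
            ]
        }
    }
}

The Filter transformation keeps only rows whose condition evaluates to true, so the expression is simply the negation of the delete condition you would have used in Alter Row.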

Related

How to delete records from a SQL database using Azure Data Factory

I am setting up a pipeline in Data Factory where the first part of the pipeline needs some pre-processing cleaning. I currently have a script set up to query the rows that need to be deleted and export the results into a CSV.
What I am looking for is essentially the opposite of an upsert copy activity: I would like the procedure to delete the rows in my table based on a matching row.
Apologies in advance if this is an easy solution; I am fairly new to Data Factory and just need help looking in the right direction.
Assuming the source from which you are initially getting the rows is different from the sink, there are multiple ways to achieve this:
If the number of rows is small, you can use a Script activity or a Lookup activity to delete the records from the destination table.
For larger datasets, where the limitations of the Lookup activity become a problem, copy the data into a staging table in the destination and use a Script activity to delete the matching rows (see the sketch below).
If your org supports the use of data flows, you can use a mapping data flow to achieve the same result.
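For the staging-table approach, the Script activity runs a DELETE that joins the staging table to the target on the matching key. Here is a rough sketch of such an activity; the linked service name, table names, and the Id join column are placeholders rather than details from the original question, so adjust them to your own schema:

{
    "name": "DeleteMatchingRows",
    "type": "Script",
    "linkedServiceName": {
        "referenceName": "AzureSqlDatabase1",
        "type": "LinkedServiceReference"
    },
    "typeProperties": {
        "scripts": [
            {
                "type": "NonQuery",
                "text": "DELETE t FROM dbo.TargetTable t INNER JOIN dbo.StagingTable s ON t.Id = s.Id;"
            }
        ]
    }
}

Run it after the Copy activity that loads the staging table, and truncate or drop the staging table afterwards if you no longer need it.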

Overwrite SQL table with new data in Azure Data Flow

Here is my situation: I am using the Alteryx ETL tool, where we append new records to Tableau by using the 'Overwrite the file' option.
What it does is capture any incoming data into the target, delete the old data, and then publish the results in the Tableau visualisation tool.
So whatever data comes in from the source must overwrite the existing data in the sink table.
How can we achieve this in Azure Data Flow?
If you are writing to a database table, you'll see a sink setting for "truncate table" which will remove all previous rows and replace them with your new rows. Or if you are trying to overwrite just specific rows based on a key, then use an Alter Row transformation and use the "Update" option.
If your requirement is just to copy data from your source to the target and truncate the table before the latest data is copied, then you can simply use a Copy activity in Azure Data Factory. The Copy activity has an option called Pre-copy script, in which you can specify a query that truncates the table before the copy of the latest data proceeds.
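As a rough sketch, the sink side of such a Copy activity could look like the following; the dataset references, the source type, and the dbo.TargetTable name are placeholders for illustration only:

{
    "name": "CopyWithTruncate",
    "type": "Copy",
    "inputs": [
        { "referenceName": "SourceDataset", "type": "DatasetReference" }
    ],
    "outputs": [
        { "referenceName": "TargetSqlDataset", "type": "DatasetReference" }
    ],
    "typeProperties": {
        "source": {
            "type": "DelimitedTextSource"
        },
        "sink": {
            "type": "AzureSqlSink",
            "preCopyScript": "TRUNCATE TABLE dbo.TargetTable"
        }
    }
}

The pre-copy script runs against the sink before any rows are written, so the table ends up containing only the newly copied data.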
Here is an article by a community volunteer where a similar requirement has been discussed with various approaches - How to truncate table in Azure Data Factory
If your requirement is to do the data transformation first and then copy the data to your target SQL table, truncating the table before the latest transformed data is copied, then you will have to use a mapping data flow activity.

How to parse each row of an Excel file using Azure Data Factory

Here is my requirement:
I have an Excel file with a few columns and a few rows of data.
I have uploaded this Excel file to Azure Blob Storage.
Using ADF, I need to read this Excel file, parse the records in it one by one, and create dynamic folders in Azure Blob Storage.
This needs to be done for each and every record present in the Excel file.
Each record in the Excel file has some information that is going to help me create the folders dynamically.
Could someone help me in choosing the right set of activities or data flow in ADF to do this work?
Thanks in advance!
This is my Excel file as a source.
I created folders in Blob Storage based on the Country column.
I used a Data Flow activity.
Go to the Optimize tab of the sink configuration.
Select the Partition option as Set partitioning, the Partition type as Key, and the Unique value per partition as the Country column.
Now run the pipeline.
Expected output:
Inside these folders you will get files with the corresponding data.

Altering CSV Rows in Azure Data Factory

I've tried to use the Alter Row transformation within a Data Flow in Azure Data Factory to remove rows that match a condition from a CSV dataset.
The data preview shows that the matched rows will be deleted; however, in the next step, the sink seems to ignore that and writes the original rows to the CSV file output.
Is it not possible to use Alter Row on a CSV dataset, and if not, is there a workaround?
Firstly, use a Union transformation to combine your CSV files as the source.
Then use a Filter transformation to filter your data on the date-time stamps at the source, so that the unwanted rows never reach the sink.

Azure Data Pipeline Copy Activity loses column names when copying from SAP Hana to Data Lake Store

I am trying to copy data from SAP Hana to Azure Data Lake Store (DLS) using a Copy Activity in a Data Pipeline via Azure Data Factory.
Our copy activity runs fine and we can see that rows made it from Hana to the DLS, but they don't appear to have column names (instead they are just given 0-indexed numbers).
This link says, “For structured data sources, specify the structure section only if you want to map source columns to sink columns, and their names are not the same.”
We are fine using the original column names from the SAP Hana table, so it seems like we shouldn't need to specify the structure section in our dataset. However, even when we do, we still just see numbers for column names.
We have also seen the translator property at this link, but are not sure if that is the route we need to go.
Can anyone tell me why we aren't seeing the original column names copied into DLS and how we can change that? Thank you!
UPDATE
Setting the firstRowAsHeader property of the format section on our dataset to true basically solved the problem. The console still shows the numerical indices, but now includes the headers we are after as the first row. Upon downloading and opening the file, we can see the numbers are not there (the console just shows them for whatever reason), and it is a standard comma-delimited file with a header row and one row entry per line.
Example:
COLUMNA,COLUMNB
aVal1,bVal1
aVal2,bVal2
We can now tell our sources and sinks to write and expect this format when reading.
BONUS UPDATE:
To get rid of the numerical indices and see the proper column headers in the console, click Format in the top-left corner, and then check the "First row is a header" box toward the bottom of the resulting blade.
See the update above.
The format.firstRowAsHeader property needed to be set to true.
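For reference, here is a minimal sketch of what such an (ADF v1-style) output dataset could look like with the format section set; the dataset name, linked service, folder path, and availability schedule are placeholders for illustration:

{
    "name": "OutputDataLakeStoreDataset",
    "properties": {
        "type": "AzureDataLakeStore",
        "linkedServiceName": "AzureDataLakeStoreLinkedService",
        "typeProperties": {
            "folderPath": "output/",
            "format": {
                "type": "TextFormat",
                "columnDelimiter": ",",
                "firstRowAsHeader": true
            }
        },
        "availability": {
            "frequency": "Day",
            "interval": 1
        }
    }
}

With firstRowAsHeader set to true, the column names are written as the first row of the file (and read back as headers) instead of being replaced by numerical indices.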
