Azure Data Factory - Data Flow - After completion - move - azure

I am using the ADF v2 Data Flow activity to load data from a CSV file in Blob Storage into a table in an Azure SQL database. In the data flow source (Blob Storage), under Source options, there is an option 'After Completion (No Action / Delete Source file / Move)'. I want to use the Move option to save those CSV files in a container, renaming each file by appending today's date. How do I frame the logic for this? Can someone please help?

You can define the file name explicitly in both the From and To fields. This is poorly documented (if at all); I found it by trying different approaches.
You can also add dynamic content such as timestamps. Here's an example:
concat('incoming/archive/', toString(currentUTC(), 'yyyy-MM-dd_HH.mm.ss_'), 'target_file.csv')

You could parameterize the source file to achieve that. Please refer to my example.
Data Flow parameter settings:
Set the source file and move expression in Source Options:
Expression to rename the source to "name + current date":
concat(substring($filename, 1, length($filename)-4), toString(currentUTC(), 'yyyy-MM-dd'))
My full file name is "word.csv", and the output file name is "word2020-01-26".
HTH.
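To check what the rename expression above should produce before running the data flow, its string logic can be mirrored locally. This is a Python sketch, not ADF code: archive_name is a hypothetical helper, and it assumes a four-character extension such as ".csv", exactly as the expression does.

```python
from datetime import datetime, timezone


def archive_name(filename: str, now: datetime) -> str:
    """Mirror concat(substring($filename, 1, length($filename)-4),
    toString(currentUTC(), 'yyyy-MM-dd')) from the data flow."""
    # The data flow substring() is 1-based; keeping length-4 characters
    # drops a four-character extension such as ".csv".
    stem = filename[: len(filename) - 4]
    # 'yyyy-MM-dd' in the data flow corresponds to '%Y-%m-%d' in strftime.
    return stem + now.strftime("%Y-%m-%d")


example = archive_name("word.csv", datetime(2020, 1, 26, tzinfo=timezone.utc))
print(example)  # word2020-01-26
```

Note that the result carries no extension. If the archived file should keep it, passing '.csv' as a further argument to concat is one option.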

Related

How to give dynamic expression path for file location (Wildcard file paths) in Azure data factory?

I get a single Excel file in my data lake every day. My container name is 'odoo'. The Excel files are stored in a folder called 'odoo', and the file names look like this:
report_2022-01-20.xlsx
I am using a data flow and want to pick up each day's file using a wildcard path. Below is the dynamic expression I am trying, but with no success:
/odoo/@concat('report_', string(formatDateTime(utcNow(), 'yyyy-MM-dd')), '.xlsx')
Can anyone advise me how to write the correct expression? I am a newbie to ADF.
Your expression looks fine. In your dataset, browse the container/folder and add the expression in the file path to get the file dynamically.
Source file:
@concat('report_', string(formatDateTime(utcNow(), 'yyyy-MM-dd')), '.xlsx')
Dataflow:
Can you try: @concat('/odoo/report_', formatDateTime(utcNow(), 'yyyy-MM-dd'), '.xlsx')
The string() call might be causing the issue.
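To confirm which path the suggested expression should resolve to on a given day, the same concatenation can be reproduced outside ADF. This is a Python sketch of the string logic only; daily_report_path is a hypothetical name, and the folder and file prefix are taken from the question.

```python
from datetime import datetime, timezone


def daily_report_path(day: datetime) -> str:
    """Mirror concat('/odoo/report_', formatDateTime(utcNow(), 'yyyy-MM-dd'), '.xlsx')."""
    # 'yyyy-MM-dd' in formatDateTime corresponds to '%Y-%m-%d' in strftime.
    return "/odoo/report_" + day.strftime("%Y-%m-%d") + ".xlsx"


# The sample file from the question.
print(daily_report_path(datetime(2022, 1, 20, tzinfo=timezone.utc)))
# Today's path, using UTC as utcNow() does.
print(daily_report_path(datetime.now(timezone.utc)))
```

Because utcNow() is in UTC, a file named after the local date may not match near midnight, which is worth checking if the expression resolves to a file that does not exist.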

Writing in sub directory - Azure Mapping Data Flow

In my Data flow, the sink is ADLS.
My source files are in SoureDump/Data, and I am reading from that path. I apply a few transformations and am trying to write the output files to SoureDump/Rawzone.
The output file name is created from the data.
When I trigger the pipeline, the output files are generated as expected but are written to the parent directory, SoureDump.
My work:
DataSet screenshot
Please let me know if I have given anything wrong.
Thanks.
"As data in column" defaults back to your dataset container object. Check the info bubble next to the column name field:
Just set your target folder with a Derived Column prior to your sink and append the value like this:
tgt_file_name_w_path = '/mypath/output/'+tgt_file_name
Then use tgt_file_name_w_path instead of tgt_file_name in the sink's "column with file name" property.
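The effect of that Derived Column can be pictured as a per-row transformation that prefixes the folder to the file name. The following Python sketch only illustrates the idea; with_target_path and the sample rows are hypothetical, while the column names and the '/mypath/output/' folder come from the answer above.

```python
def with_target_path(rows, folder="/mypath/output/"):
    """Add tgt_file_name_w_path to every row, as the Derived Column step does."""
    return [
        # Same concatenation as '/mypath/output/' + tgt_file_name.
        {**row, "tgt_file_name_w_path": folder + row["tgt_file_name"]}
        for row in rows
    ]


# Hypothetical rows; in the data flow these come from the transformed stream.
rows = [
    {"id": 1, "tgt_file_name": "a.csv"},
    {"id": 2, "tgt_file_name": "b.csv"},
]
for row in with_target_path(rows):
    print(row["tgt_file_name_w_path"])
```

Each row then carries its full relative path, so the sink no longer falls back to the dataset's container root.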

Azure data factory file creation

I have a basic requirement: I want to append a timestamp to a file extracted from a SQL database and put it in Blob Storage. I use utcnow(), and it creates a timestamp with a 'T' and other parts that I don't need.
Is there a format expression to get just the date and time?
I am new to JavaScript-style expressions, as I come from an SSIS background.
Help appreciated.
The only way to do that is to copy the blob and create a new one whose name is concatenated with the timestamp.
Data Factory doesn't support renaming a blob.
I only succeeded with a single file.
You can follow my steps:
Use a Lookup activity to get the timestamp from the SQL database.
Use a Get Metadata activity to get the blob name from Storage.
Use a Copy data activity to copy the blob under the new file name.
Pipeline preview:
Lookup preview:
Get metadata and Source Dataset:
Copy data activity Source setting:
Copy data activity Sink setting:
Add a parameter to set the new file name in the source dataset:
Use an expression to build the new file name from the original name and the timestamp:
@concat(split(activity('Get Metadata1').output.itemName,'.')[0],activity('Lookup1').output.firstRow.tt)
Then check the output file in the Blob Storage:
Hope this helps.
You can use an expression for the destination file name in the sink:
@formatDateTime(utcnow(), 'yyyyMMdd_HHmm_ss')
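The 'yyyyMMdd_HHmm_ss' pattern above yields a date and time with no 'T' separator. To preview that shape without running a pipeline, the same layout can be produced locally. This is a Python sketch, not ADF code; stamped_name, the "extract" base name, and the ".csv" extension are assumptions for illustration.

```python
from datetime import datetime, timezone


def stamped_name(base: str, ext: str, now: datetime) -> str:
    """Build a file name with a timestamp laid out as 'yyyyMMdd_HHmm_ss'."""
    # 'yyyyMMdd_HHmm_ss' corresponds to '%Y%m%d_%H%M_%S' in strftime.
    return f"{base}_{now.strftime('%Y%m%d_%H%M_%S')}{ext}"


now = datetime(2020, 1, 26, 14, 5, 9, tzinfo=timezone.utc)
print(now.isoformat())                       # ISO form, which contains the 'T'
print(stamped_name("extract", ".csv", now))  # extract_20200126_1405_09.csv
```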

Error using data factory for copyactivity from blob storage as source

Why do I keep getting this error while using a folder from a blob container as source (which contains only one GZ compressed file) in copy activity in data factory v2 and as sink another blob storage (but I want the file decompressed)?
"message":"ErrorCode=UserErrorFormatIsRequired,
'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,
Message=Format setting is required for file based store(s) in this scenario.,Source=Microsoft.DataTransfer.ClientLibrary,'",
I know it means I need to specify explicitly the format for my sink dataset, but I am not sure how to do that.
I suggest using the copy data tool.
step 1
step 2
Following your comment, I tried many times: unless you choose the compressed file as the source dataset and import its schema, the Azure Data Factory copy activity will not decompress the file.
If the files inside the compressed file don't share the same schema, the copy activity can also fail.
Hope this helps.
The easiest way to do this: go to the dataset, click the Schema tab, then Import Schema.
Hope this helped!

How to include blob metadata in copy data mapping

I'm working on an ADF v2 pipeline that copies data from a CSV blob to an Azure SQL database table. For each load I would like to collect source metadata, such as the source blob name, and save it to the target table as part of a data lineage framework.
My blob source has the following schema:
StoreName,
StoreLocation,
StoreTaxId.
My destination table has the following schema:
StoreName,
StoreLocation,
DwhProcessDate,
DwhSourceName.
I do not know how to properly include the source name in the mapping section of the Copy Data activity.
For the moment I have:
defined a [Get Metadata1] activity to get references to all blobs available in Azure Blob Storage
defined a [ForEach1] activity, iterating through the output of the expression @activity('Get Metadata1').output.childitems
placed a [Copy Data1] activity inside [ForEach1], with its source and sink sections defined.
What I'm looking for is a way to add an extra line to the mapping section that will somehow bind @item().name to the destination column [DwhSourceName].
Thanks for any suggestions on how to achieve this.
Actually, based on my test, you can specify dynamic content for the column key, but you can't set blob metadata as a column value in the copy data mapping at pipeline run time. Please see the rules mentioned in this document.
You still need to add the FileName column to your source data before the copy activity. Maybe you could use a blob-triggered Azure Function to get the blob file name, so that you can add the FileName column whenever data arrives in the blob. (Please refer to this case: How Do I get the Name of The inputBlob That Triggered My Azure Function With Python)
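Whatever component runs before the copy activity, the step it performs is small: append the lineage columns to each CSV row. The following Python sketch shows only that transformation, using the standard csv module. add_lineage_columns and the sample data are hypothetical, the column names come from the destination schema in the question, and wiring this into a blob-triggered function is left out.

```python
import csv
import io
from datetime import date


def add_lineage_columns(csv_text: str, source_name: str, process_date: date) -> str:
    """Return the CSV with DwhProcessDate and DwhSourceName appended to every row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or []) + ["DwhProcessDate", "DwhSourceName"]
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # Stamp each row with the load date and the blob it came from.
        row["DwhProcessDate"] = process_date.isoformat()
        row["DwhSourceName"] = source_name
        writer.writerow(row)
    return out.getvalue()


# Hypothetical blob content and blob name.
sample = "StoreName,StoreLocation,StoreTaxId\nNorth,Oslo,123\n"
print(add_lineage_columns(sample, "stores.csv", date(2020, 1, 26)))
```

With the columns already present in the file, the copy activity can map them to the destination table like any other source column.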
