Azure data factory file creation - azure

I have a basic requirement where I want to append time stamp to file extracted from sql db and put it in blob.i use utcnow() and it creates a timestamp with T and all which I dont need.
any format expression to get date and just time??
New to javascript expressions as I am from ssis background
Help appreciated

The only way you can do that is copy and create a new blob with a new name concat with the timestamp.
Data Factory doesn't support rename the blob.
I only succeed with one file.
You can follow my steps:
Using lookup activity to get the timestamp from SQL database.
Using Get metadata to get the blob name from Storage.
Using Copy data activity to copy and create new file name blob.
Pileline preview:
Lookup preview:
Get metadata and Source Dataset:
Copy data activity Source setting:
Copy data activity Sink setting:
Add parameter to set the new file name in source datasaet:
Using expression to create the new file with the filename and timestamp:
#concat(split(activity('Get Metadata1').output.itemName,'.')[0],activity('Lookup1').output.firstRow.tt)
Then check the output file in the Blob Storage:
Hope this helps.

You can use expression in the destination file name, in the sink.
toTimestamp(utcnow(), 'yyyyMMdd_HHmm_ss')

Related

How do I pull the last modified file with data flow in azure data factory?

I have files that are uploaded into an onprem folder daily, from there I have a pipeline pulling it to a blob storage container (input), from there I have another pipeline from blob (input) to blob (output), here is were the dataflow is, between those two blobs. Finally, I have output linked to sql. However, I want the blob to blob pipeline to pull only the file that was uploaded that day and run through the dataflow. The way I have it setup, every time the pipeline runs, it doubles my files. I've attached images below
[![Blob to Blob Pipeline][1]][1]
Please let me know if there is anything else that would make this more clear
[1]: https://i.stack.imgur.com/24Uky.png
I want the blob to blob pipeline to pull only the file that was uploaded that day and run through the dataflow.
To achieve above scenario, you can use Filter by last Modified date by passing the dynamic content as below:
#startOfDay(utcnow()) : It will take start of the day for the current timestamp.
#utcnow() : It will take current timestamp.
Input and Output of Get metadata activity: (Its filtering file for that day only)
If the files are multiple for particular day, then you have to use for each activity and pass the output of Get metadata activity to foreach activity as
#activity('Get Metadata1').output.childItems
Then add Dataflow activity in Foreach and create source dataset with filename parameter
Give filename parameter which is created as dynamic value in filename
And then pass source parameter filename as #item().name
It will run dataflow for each file get metadata is returning.
I was able to solve this by selecting "Delete source files" in dataflow. This way the the first pipeline pulls the new daily report into the input, and when the second pipeline (with the dataflow) pulls the file from input to output, it deletes the file in input, hence not allowing it to duplicate

How to set and get variable value in Azure Synapse or Data Factory pipeline

I have created a pipeline with Copy Activity, say, activity1in Azure Synapse Analytics workspace that loads the following JSON to Azure Data Lake Storage Gen2 (ADLSGen2) using source as a REST Api and Sink (destination) as ADLSGen2. Ref.
MyJsonFile.json (stored in ADLSGen2)
{"file_url":"https://files.testwebsite.com/Downloads/TimeStampFileName.zip"}
In the same pipeline, I need to add an activity2 that reads the URL from the above JSON, and activity3 that loads the zip file (mentioned in that URL) to the same Gen2 storage.
Question: How can we add an activity2 to the existing pipeline that will get the URL from the above JSON and then pass it to activity3? Or, are there any better suggestions/solutions to achieve this task.
Remarks: I have tried Set Variable Activity (shown below) by first declaring a variable in the pipeline and the using that variable, say, myURLVar in this activity, but I am not sure how to dynamically set the value of myURLVar to the value of the URL from the above JSON. Please NOTE the Json file name (MyJsonFile.json) is a constant, but zip file name in the URL is dynamic (based on timestamp), hence we cannot just hard code the above url.
As #Steve Zhao mentioned in the comments, use lookup activity to get the data from the JSON file and extract the required URL from the lookup output value using set variable activity.
Connect the lookup activity to the sink dataset of previous copy data activity.
Output of lookup activity:
I have used the substring function in set activity to extract the URL from the lookup output.
#replace(substring(replace(replace(replace(string(activity('Lookup1').output.value),'"',''),'}',''),'{',''),indexof(replace(replace(replace(string(activity('Lookup1').output.value),'"',''),'}',''),'{',''),'http'),sub(length(string(replace(replace(replace(string(activity('Lookup1').output.value),'"',''),'}',''),'{',''))),indexof(replace(replace(replace(string(activity('Lookup1').output.value),'"',''),'}',''),'{',''),'http'))),']','')
Check the output of set variable:
Set variable output value:
There is a way to do this without needing complex string manipulation to parse the JSON. The caveat is that the JSON file needs to be formatted such that there are no line breaks (or that each line break represents a new record).
First setup a Lookup activity that loads the JSON file in the same way as #NiharikaMoola-MT's answer shows.
Then for the Set Variable activity's Value setting, use the following dynamic expression: #activity('<YourLookupActivityNameHere>').output.firstRow.file_url

Azure Data Factory- Data Flow - After completion - move

I am using ADF v2 DataFlow ativity to load data from a csv file in a Blob Storage into a table in Azure SQL database. In the Dataflow (Source - Blob storage), in Source options, there is an option 'After Completion(No Action/Delete Source file/ Move)'. I am looking to utilize the move option to save those csv files in a container renaming those files in concatenation with with today's date. How do I frame the logic for this? Can someone please help?
You can define the file name explicitly in both From and To-fields. This is not so well (if at all) documented, and I found it just trying different approaches.
You can also add dynamic content such as timestamps. Here's an example:
concat('incoming/archive/', toString(currentUTC(), 'yyyy-MM-dd_HH.mm.ss_'), 'target_file.csv')
You could parameter the source file to achieve that. Please ref my example.
Data Flow parameter settings:
Set the source file and move expression in Source Options:
Expressions to rename the source with "name + current date":
concat(substring($filename, 1, length($filename)-4),toString(currentUTC(),'yyyy-MM-dd') )
My full file name is "word.csv", the output file name is "word2020-01-26",
HTH.

azure data factory v2 copy data activity recursive

I am new to azure Data factory v2
I have a folder having 2 files F1.csv and F2.csv in a blob storage.
I have a created a copy data pipeline activity to load the data from file to a table in azure DWH with 3 parameters and copy recursively was made to false.
Parameter1: container
Parameter2: directory
Parameter3: F1.csv
executed successfully when used the above parameters for the copy data activity.
But the data has been loaded from two files, only one file has given as parameter for the activity
Can you please check the "wildcard file name" parameter ( Select the pipeline and look under the source tab ) ?
If it is set as . please remove that and it should work .

How to include blob metadata in copy data mapping

I'm working on a ADF v2 pipeline, which copies data from csv blob to Azure SQL database table. For each load I would like to collect source metadata, like source blob name, and save it to a target table as a part of data lineage framework.
My blob source run the following schema:
StoreName,
StoreLocation,
StoreTaxId.
My destination table run the following schema:
StoreName,
StoreLocation,
DwhProcessDate,
DwhSourceName.
I do not know, how to properly include name of the source in the mapping section of Copy Data activity.
For the moment I have:
defined a [Get Metadata1] activity to get references to all blobs that are available from Azure Blob Storage
defined a [ForEach1] activity, iterating through the output of an expression #activity('Get Metadata1').output.childitems
inside the [ForEach1] activity, I have placed [Copy Data1] activity, where I have source and sink sections defined.
What I'm looking for is a way to add extra line to the mapping section, which will samehow bind #item().name to destination column [DwhSourceName]
Thanks for all suggestion on how to achieve this.
Actually,based on my test,you can specify the dymatic content of column key,but you can't set blob metadata as value of columns in copy data mapping at the pipeline run time. Please see the rules mentioned in this document.
You still need to add the FileName column in your source data before the copy activity.Maybe you could use Azure Blob Trigger Function to get the blob file name so hat you could add the FileName column when any data stream into the blob.(Please refer to this case:How Do I get the Name of The inputBlob That Triggered My Azure Function With Python)

Resources