Dynamic file name with custom outputter - Azure

I'm trying to process images (create thumbnails) using U-SQL with a custom outputter, and I want to output files with dynamic file names.
My U-SQL script looks like this:
REFERENCE ASSEMBLY [USQLAssemblies];

@image_out =
    SELECT USQLAssemblies.ImageOps.scaleImageTo(ImgData, 480, 480) AS thumbnail_image,
           FileName + "480" AS FileName
    FROM dbo.ThumbnailImages;

OUTPUT @image_out
TO @"D:\Test\{FileName}.gif"
USING new USQLAssemblies.ImageOutputter();
The script returns this error:
Error: Data partitioned output is not supported for user-defined outputters.
Does U-SQL support custom outputters with dynamic file names, or is this still in preview?
Any suggestions for a workaround?

To use partitioned output you need to enable it as a preview feature.
Try adding this line to the beginning of your script:
SET @@FeaturePreviews = "DataPartitionedOutput:on";
If that doesn't work, you will need to contact the U-SQL team to have the feature activated on your account.
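Putting it together, the script with the preview flag enabled would look something like this (a sketch only; the assembly members are the asker's own and the feature must be available on the account):

```
SET @@FeaturePreviews = "DataPartitionedOutput:on";

REFERENCE ASSEMBLY [USQLAssemblies];

@image_out =
    SELECT USQLAssemblies.ImageOps.scaleImageTo(ImgData, 480, 480) AS thumbnail_image,
           FileName + "480" AS FileName
    FROM dbo.ThumbnailImages;

OUTPUT @image_out
TO @"D:\Test\{FileName}.gif"
USING new USQLAssemblies.ImageOutputter();
```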

Related

How to give dynamic expression path for file location (Wildcard file paths) in Azure data factory?

I get a single Excel file in my data lake every day. My container in the data lake is named 'odoo'. The Excel files are stored in a folder called 'odoo', and the file name looks like this:
report_2022-01-20.xlsx
I am using a data flow and want to pick up each day's file with a wildcard path. Below is the dynamic expression I am trying, with no success:
/odoo/@concat('report_', string(formatDateTime(utcNow(), 'yyyy-MM-dd')), '.xlsx')
Can anyone advise me how to write the correct expression? I am a newbie to ADF.
Your expression looks fine. In your dataset, browse the container/folder and add the expression in the file path to get the file dynamically.
Source file:
@concat('report_', string(formatDateTime(utcNow(), 'yyyy-MM-dd')), '.xlsx')
Dataflow:
Can you try: @concat('/odoo/report_', formatDateTime(utcNow(), 'yyyy-MM-dd'), '.xlsx')
The string() call might be causing the issue.
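The expression simply builds the path from the current UTC date. A rough Python mirror (container and file prefix taken from the question) shows the string it produces:

```python
from datetime import datetime, timezone

# Python mirror of the pipeline expression:
#   @concat('/odoo/report_', formatDateTime(utcNow(), 'yyyy-MM-dd'), '.xlsx')
today = datetime.now(timezone.utc).strftime("%Y-%m-%d")
path = "/odoo/report_" + today + ".xlsx"
print(path)  # e.g. /odoo/report_2022-01-20.xlsx
```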

Azure Data Factory: output dataset file name from input dataset folder name

I'm trying to solve following scenario in Azure Data Factory:
I have a large number of folders in Azure Blob Storage. Each folder contains a varying number of files in Parquet format. The folder name contains the date when the data in the folder was generated, something like this: DATE=2021-01-01. I need to filter the files and save them into another container in delimited format, and each file should have the date from the source folder name in its file name.
So when my input looks something like this...
DATE=2021-01-01/
data-file-001.parquet
data-file-002.parquet
data-file-003.parquet
DATE=2021-01-02/
data-file-001.parquet
data-file-002.parquet
...my output should look something like this:
output-data/
data_2021-01-01_1.csv
data_2021-01-01_2.csv
data_2021-01-01_3.csv
data_2021-01-02_1.csv
data_2021-01-02_2.csv
Reading files from subfolders, filtering them, and saving them is easy. The problems start when I try to set the output dataset file name dynamically. I can get the folder names with a Get Metadata activity and then set them into variables with a ForEach activity. However, I can't figure out how to use this variable in the sink dataset of my filtering data flow.
Update:
In my Get Metadata1 activity, I set the container as the input (screenshots of the settings and debug output omitted).
I think I've found the solution. I'm using CSV files as an example.
My input looks something like this
container:input
2021-01-01/
data-file-001.csv
data-file-002.csv
data-file-003.csv
2021-01-02/
data-file-001.csv
data-file-002.csv
My debug result is as follows (screenshot omitted).
Use a Get Metadata1 activity to get the folder list, then use a ForEach1 activity to iterate over that list.
Inside the ForEach1 activity, use a data flow to move the data.
Set the source dataset to the container and declare a parameter FolderName.
Then add the dynamic content @dataset().FolderName to the source dataset.
Back in the ForEach1 activity, add the dynamic content @item().name to the parameter FolderName.
Key in File_Name in the 'Column to store file name' field of the source settings. It will store the file name as a column, e.g. /2021-01-01/data-file-001.csv.
Then we can process this column to get the file name we want via a DerivedColumn1 transformation.
Add the expression concat('data_',substring(File_Name,2,10),'_',split(File_Name,'-')[5]).
In the sink settings, select 'Name file as column data' and pick the File_Name column.
That's all.
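Data flow substring() and array indexes are 1-based, which can make the derived-column expression hard to read. A quick Python mirror (0-based), applied to the sample file name above, shows what it produces:

```python
# Sample value of the File_Name column from the answer above
file_name = "/2021-01-01/data-file-001.csv"

# substring(File_Name, 2, 10): 10 characters starting at 1-based position 2
date_part = file_name[1:11]
# split(File_Name, '-')[5]: data flow arrays are 1-based, so Python index 4
seq_part = file_name.split("-")[4]

new_name = "data_" + date_part + "_" + seq_part
print(new_name)  # data_2021-01-01_001.csv
```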

Data Factory Data Flow sink file name

I have a data flow that merges multiple pipe-delimited files into one file and stores it in an Azure Blob container. I'm using a file pattern for the output file name: concat('myFile' + toString(currentDate('PST')), '.txt').
How can I grab the file name that's generated after the data flow completes? I have other activities that log the file name into a database, but I'm not able to figure out how to get the file name.
I tried @{activity('Data flow1').output.filePattern} but it didn't help.
Thank you
You can use a Get Metadata activity to get the file name that is generated after the data flow runs.

Azure Data Factory- Data Flow - After completion - move

I am using an ADF v2 Data Flow activity to load data from a CSV file in Blob Storage into a table in an Azure SQL database. In the data flow source (Blob Storage), under Source options, there is an 'After completion' option (No action / Delete source file / Move). I want to use the Move option to archive those CSV files in a container, renaming each file with today's date appended. How do I frame the logic for this? Can someone please help?
You can define the file name explicitly in both the From and To fields. This is not well documented (if at all); I found it just by trying different approaches.
You can also add dynamic content such as timestamps. Here's an example:
concat('incoming/archive/', toString(currentUTC(), 'yyyy-MM-dd_HH.mm.ss_'), 'target_file.csv')
You could parameterize the source file name to achieve that. Please refer to my example.
Data Flow parameter settings (screenshot omitted):
Set the source file and the move expression in Source Options (screenshot omitted).
Expression to rename the source with "name + current date":
concat(substring($filename, 1, length($filename)-4),toString(currentUTC(),'yyyy-MM-dd') )
My full file name is "word.csv"; the output file name is "word2020-01-26".
HTH.
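A Python mirror of that rename expression (remembering that data flow substring() is 1-based), using the "word.csv" example:

```python
from datetime import datetime, timezone

filename = "word.csv"
# substring($filename, 1, length($filename) - 4) keeps everything but ".csv"
stem = filename[: len(filename) - 4]
renamed = stem + datetime.now(timezone.utc).strftime("%Y-%m-%d")
print(renamed)  # e.g. word2020-01-26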

Azure Data Factory - CSV to Parquet - Changing file extension

I have created a Data Factory pipeline to convert a CSV file to Parquet format. As I needed to retain the original file name, I am using 'Preserve hierarchy' as the copy behavior in the pipeline. The conversion works fine, but the output file is generated with the .csv extension (an expected outcome). Is there any out-of-the-box option I could use to generate the file name without the .csv extension? I scanned through the system variables currently supported by ADF, and the list doesn't include the input file name as an option to mask the file extension: https://learn.microsoft.com/en-us/azure/data-factory/control-flow-system-variables. Is writing a custom component the only option?
Thanks for your inputs.
You could use a Get Metadata activity to get the file name of your source dataset and then pass it to both the input and output datasets of your copy activity, if you are using Azure Data Factory v2.
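The extension swap itself is simple string handling; in the pipeline expression language it would be something like @replace(item().name, '.csv', '.parquet'). A small Python sketch of that swap, with a hypothetical file name standing in for the Get Metadata result:

```python
import os

# Hypothetical file name as returned by Get Metadata
src = "data-file-001.csv"
stem, _ext = os.path.splitext(src)
dst = stem + ".parquet"
print(dst)  # data-file-001.parquet
```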
