Can we extract a zip file in copy activity - Azure Data Factory

I have a zip file and I would like to uncompress it, get the CSV file, and push it to the blob. I can achieve this with .gz, but with .zip we are not able to.
Could you please assist here?
Thanks,
Richard

You could set the Binary format as the source and sink dataset in the ADF Copy activity, and select Compression type as ZipDeflate, following this link: https://social.msdn.microsoft.com/Forums/en-US/a46a62f2-e211-4a5f-bf96-0c0705925bcf/working-with-zip-files-in-azure-data-factory
Source: (screenshot)
Sink: (screenshot)
Test result in sink path: (screenshot)
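If you want to sanity-check locally that the archive really contains the CSV before wiring up the datasets, here is a quick check in plain Python (assuming you have a local copy of the zip; the file name below is a placeholder):

import zipfile

# List the members of the archive to confirm the CSV is inside
# before pointing the ZipDeflate source dataset at it.
with zipfile.ZipFile("archive.zip") as zf:  # hypothetical local copy
    print(zf.namelist())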

Related

Azure data factory - convert YYYYDDMMmmHHss to DDMMYYYYHHmmSS

How to convert YYYYDDMMmmHHss to DDMMYYYYmmHHSS in an Azure Data Factory file name output?
Regards,
Ravi
I tried this and my source and sink are both csv.
This is my file name: 20201706170905.csv.
First: create a Get Metadata activity like this:
Then: create a Copy activity, and set the sink dataset file name like this:
The expression for the file name (you can use concat and substring to convert it to what you want):
@concat(substring(activity('Get Metadata1').output.itemName,4,4),substring(activity('Get Metadata1').output.itemName,0,4),substring(activity('Get Metadata1').output.itemName,10,2),substring(activity('Get Metadata1').output.itemName,8,2),substring(activity('Get Metadata1').output.itemName,12,6))
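As a sanity check of the index arithmetic, here is what the expression produces for the sample file name, sketched in plain Python (slicing is used only to mirror ADF's substring(value, startIndex, length)):

name = "20201706170905.csv"  # sample file name from the question

converted = (
    name[4:8]      # substring(itemName, 4, 4)  -> "1706"
    + name[0:4]    # substring(itemName, 0, 4)  -> "2020"
    + name[10:12]  # substring(itemName, 10, 2) -> "09"
    + name[8:10]   # substring(itemName, 8, 2)  -> "17"
    + name[12:18]  # substring(itemName, 12, 6) -> "05.csv"
)
print(converted)  # 17062020091705.csv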
Finally: run the pipeline (if you don't need the original file, you can use a Delete activity to delete it).
Result: (screenshot)
Hope this can help you.

Unzip files from a ZIP file and use them as source in an ADF Copy activity

I see there is a way to deflate a ZIP file, but when there are multiple .csv files within a ZIP, how do I specify which one to use as my source for the Copy activity? It is now parsing both CSV files and returning them as a single file, and I'm not able to select the file I want as the source.
According to my test, we can't unzip a .zip file in ADF to get the list of file names in the ADF dataset, so I provide the workaround below for your reference.
Firstly, you could use an Azure Function activity to trigger a function that decompresses your zip file. You only need to get the list of file names and return it as an array.
Secondly, use a ForEach activity to loop over the result and get your desired file name.
Finally, inside the ForEach activity, use @item() in the dataset to configure the specific file path so that you can refer to it in the Copy activity.
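For the first step, here is a minimal sketch of such a function in Python (the container name, blob name, app setting, and activity name are placeholders, not part of the original answer):

import io
import json
import os
import zipfile

import azure.functions as func
from azure.storage.blob import BlobServiceClient

def main(req: func.HttpRequest) -> func.HttpResponse:
    # Which zip blob to inspect; both defaults are hypothetical.
    container = req.params.get("container", "input")
    blob_name = req.params.get("blob", "archive.zip")

    # Download the zip and collect its member file names in memory.
    service = BlobServiceClient.from_connection_string(os.environ["STORAGE_CONNECTION"])
    data = service.get_blob_client(container, blob_name).download_blob().readall()
    names = zipfile.ZipFile(io.BytesIO(data)).namelist()

    # The Azure Function activity expects a JSON object back, so wrap the array;
    # the ForEach items can then be @activity('Azure Function1').output.fileNames.
    return func.HttpResponse(json.dumps({"fileNames": names}),
                             mimetype="application/json")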

write data to text file in azure data factory version 2

It seems ADF v2 does not support writing data to a TEXT file (.TXT).
After selecting File System:
But I don't see TextFormat on the next screen.
So do we have any method to write data to a TEXT file?
Thanks,
Thai
Data Factory only supports these six file formats (Avro, Binary, DelimitedText, JSON, ORC, and Parquet).
Please see: Supported file formats and compression codecs in Azure Data Factory.
If we want to write data to a .txt file, the only format we can use is DelimitedText; when the pipeline finishes, you will get a .txt file.
Reference: Delimited text: follow this article when you want to parse delimited text files or write the data in delimited text format.
For example, I created a pipeline to copy data from Azure SQL to Blob and chose the DelimitedText format as the sink dataset:
The .txt file I get in Blob Storage: (screenshot)
Hope this helps
I think what you are looking for is the DelimitedText dataset. You can specify the extension as part of the file name.

Error using Data Factory for copy activity from blob storage as source

Why do I keep getting this error while using a folder from a blob container as the source (which contains only one GZ-compressed file) in a copy activity in Data Factory v2, with another blob storage as the sink (but I want the file decompressed)?
"message":"ErrorCode=UserErrorFormatIsRequired,
'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,
Message=Format setting is required for file based store(s) in this scenario.,Source=Microsoft.DataTransfer.ClientLibrary,'",
I know it means I need to specify explicitly the format for my sink dataset, but I am not sure how to do that.
I suggest using the copy data tool.
Step 1: (screenshot)
Step 2: (screenshot)
According to your comment, I tried many times; unless you choose the compressed file as the source dataset and import the schemas, the Azure Data Factory Copy activity will not decompress the file.
If the files inside the compressed file don't have the same schema, the Copy activity could also fail.
Hope this helps.
The easiest way to do this: go to the dataset, click on the Schema tab, then Import Schema.
Hope this helped!!

copy and decompress .tar file with Azure Data Factory

I'm trying to copy and decompress a .tar file from FTP to Azure Data Lake Store.
The .tar file contains HTML files. In the copy activity, on a dataset, I select the compression type GZipDeflate, but I wonder what file format I need to use. Is it supported to do such a thing without a custom activity?
Unfortunately, Data Factory doesn't support decompression of .tar files. The supported types for FTP are GZip, Deflate, BZip2, and ZipDeflate (as seen here: https://learn.microsoft.com/en-us/azure/data-factory/supported-file-formats-and-compression-codecs#compression-support).
A solution may be to save the files in one of the supported formats, or to try a custom activity as explained here, although I'm not sure whether it was for Data Factory v1 or v2: Import .tar file using Azure Data Factory
Hope this helped!
So it's true that there is no way to simply decompress .tar files with ADF or ADL Analytics, but there is an option to take the content from every file in the .tar file and save it as an output in U-SQL.
I have a scenario where I need to take the content from HTML files inside the .tar file, so I created an HTML extractor that streams the content of each HTML file in the .tar file and saves it into a U-SQL output variable.
Maybe this can help someone who has a similar use case.
I used SharpCompress.dll for extracting and looping over .tar files in C#.
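If U-SQL is not a requirement and you just want to see the extraction loop itself, here is the same idea in plain Python with the standard tarfile module (this is not the answer's SharpCompress/C# code; the archive name is a placeholder):

import tarfile

# Stream the content of each HTML member of the .tar archive,
# which is roughly what the custom extractor does per file.
with tarfile.open("pages.tar") as archive:  # hypothetical archive name
    for member in archive.getmembers():
        if member.isfile() and member.name.endswith(".html"):
            html = archive.extractfile(member).read().decode("utf-8")
            print(member.name, len(html))  # replace with whatever processing you need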
