Azure Data Factory Compression Type

I am working on a project trying to zip files in Azure Blob Storage. I know Azure Data Factory supports a compression type option, but I cannot find any reference describing how this compression process behaves.
If I want to generate a *.zip file:
Origin Files:
ParentFolder
    Image1.jpeg
    Txt1.txt
    ChildFolder
        Image2.jpeg
        Txt.txt
Is it going to zip only the ParentFolder? Or is it going to zip every single file recursively?

The compression type option does not seem to support .zip; it only supports GZIP, Deflate, BZIP2, and ZipDeflate, see this link.
I tested copying files like the sample you mentioned from one storage account to another, using GZIP.
After being copied (with the Copy file recursively option selected), the files end up like this:
ParentFolder
    Image1.jpeg.gz
    Txt1.txt.gz
    ChildFolder
        Image2.jpeg.gz
        Txt.txt.gz
If the Copy file recursively option is not selected, the result looks like this:
ParentFolder
    Image1.jpeg.gz
    Txt1.txt.gz
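To illustrate the per-file behaviour, here is a minimal local sketch (hypothetical paths, using Python's gzip module rather than ADF itself) of what the GZIP option effectively does: each file is compressed individually into a .gz blob, rather than a single archive being produced.

```python
import gzip
import shutil
from pathlib import Path

def gzip_each_file(src_root: str, dst_root: str, recursive: bool = True) -> list[str]:
    """Compress every file under src_root into dst_root as <name>.gz,
    mirroring how ADF's GZIP option handles each blob separately."""
    src = Path(src_root)
    results = []
    pattern = "**/*" if recursive else "*"
    for f in src.glob(pattern):
        if not f.is_file():
            continue
        rel = f.relative_to(src)
        target = Path(dst_root) / rel.parent / (rel.name + ".gz")
        target.parent.mkdir(parents=True, exist_ok=True)
        with f.open("rb") as fin, gzip.open(target, "wb") as fout:
            shutil.copyfileobj(fin, fout)
        results.append(str(target.relative_to(dst_root)))
    return sorted(results)
```

With `recursive=True` the ChildFolder hierarchy is preserved and every file becomes a .gz; with `recursive=False` only the top-level files are compressed, matching the two listings above.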

Related

Unzip a file containing multiple text files using copy activity in Azure Data Factory

I have an issue while unzipping a file that contains multiple text files. I used a copy activity to unzip the file, but it creates a folder named after the source zip file, and I can see my text files inside that folder. My requirement is that the text files should be placed directly in the folder I want.
I tried the copy sink properties below, but nothing worked:
flatten hierarchy + #{item().name}
none + #{item().name}
preserve hierarchy + #{item().name}
Please unselect Preserve zip file name as folder on the source tab; ADF will then not create the xxx.zip folder.
On the source dataset, select ZipDeflate as the Compression type.
On the sink dataset, select none as the Compression type.
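For reference, these source settings correspond roughly to the following fragment of a copy activity source in the pipeline JSON (property names taken from ADF's Binary read settings; verify them against your own pipeline definition, as this is a sketch rather than a complete activity):

```json
{
    "type": "BinarySource",
    "formatSettings": {
        "type": "BinaryReadSettings",
        "compressionProperties": {
            "type": "ZipDeflateReadSettings",
            "preserveZipFileNameAsFolder": false
        }
    }
}
```

Setting `preserveZipFileNameAsFolder` to `false` is the JSON equivalent of unselecting the checkbox in the UI.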

Unzip files from a ZIP file and use them as source in an ADF Copy activity

I see there is a way to deflate a ZIP file, but when there are multiple .csv files within a ZIP, how do I specify which one to use as the source for my copy activity? It is currently parsing both csv files and returning them as a single file, and I'm not able to select the file I want as the source.
According to my test, we can't unzip a .zip file in ADF to get the file name list in an ADF dataset. So, I provide the workaround below for your reference.
Firstly, you could use an Azure Function activity to trigger a function that decompresses your zip file. You only need to get the file name list and return it as an array.
Secondly, use a ForEach activity to loop over the result and get your desired file name.
Finally, inside the ForEach activity, use @item() in the dataset to configure the specific file path so that you can reference it in the copy activity.
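The core of such a function, turning the zip into a file-name array for the ForEach activity, can be sketched with Python's zipfile module (the Azure Function request/response wiring is omitted; function and parameter names here are illustrative):

```python
import zipfile

def list_zip_entries(zip_path: str) -> list[str]:
    """Return the file names inside a zip archive; an Azure Function
    wrapping this could return the list as a JSON array for ForEach."""
    with zipfile.ZipFile(zip_path) as zf:
        # Skip directory entries, which end with "/" in zip name lists.
        return [name for name in zf.namelist() if not name.endswith("/")]
```

The ForEach activity would then iterate over the returned array, with each @item() carrying one file name to plug into the copy activity's dataset path.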

Azure Data Factory unzip and move files

I know this has been asked before (in another question as well as here), and those are exactly my cases. I have downloaded (via ADF) a zip file to Azure Blob and I am trying to decompress it and move the files to another location within the Azure Blob container.
However, having tried both of those approaches, I only end up with the zipped file moved to another location without it being unzipped.
Trying to understand your question: was your outcome a zip file, or did the folder name have .zip in it? It sounds crazy, so let me explain in detail. In ADF, decompressing a zip file using a copy activity creates a folder (which has .zip in its name) that contains the actual files.
Example: Let's say you have sample.txt inside abc.zip
Blob sourcepath: container1/abc.zip [Here abc.zip is a compressed file]
Output path will be: container2/abc.zip/sample.txt [Here, abc.zip is the decompressed folder name]
This is achieved when the copy behaviour of the sink is "none". Hope it helps :)

Is there any way to convert the encoding of json files in Azure Blob Storage?

I have copied the files from remote server to Azure Blob Storage using Azure Data Factory Copy Activity (Binary file copy). Those files are json files and txt files. I would like to change the encoding of the files to UTF-16.
I know it's possible to change the encoding while copying text files from the remote server by just specifying UTF-16 as the encoding on the sink side of the Copy Activity. I had implemented the copy activity to treat every file as a txt file and it was working fine. Sometimes I got errors related to the row delimiter, so I changed the implementation to a binary copy. Now I would like to change the encoding of those files from UTF-8 to UTF-16, but I couldn't find any way to do it.
Any help/suggestions would be appreciated.
If a file is stored in blob storage, you cannot directly change its content encoding, even if you set the blob's content-encoding property.
The way to do this (via code or manually) is to download the file, re-encode it as UTF-16, and then upload it again.
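The re-encoding step itself is a one-liner in Python; here is a minimal sketch (the blob client calls in the trailing comment assume the azure-storage-blob SDK, which you would need to wire up yourself):

```python
def reencode_utf8_to_utf16(data: bytes) -> bytes:
    # Decode the downloaded bytes as UTF-8, then re-encode as UTF-16
    # (Python's utf-16 codec prepends a byte-order mark).
    return data.decode("utf-8").encode("utf-16")

# With the azure-storage-blob SDK (assumed), the round trip is roughly:
#   raw = blob_client.download_blob().readall()
#   blob_client.upload_blob(reencode_utf8_to_utf16(raw), overwrite=True)
```

Because the transformation is download-transform-upload, it works the same for the json and txt files, since both are treated as raw bytes until decoded.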

copy and decompress .tar file with Azure Data Factory

I'm trying to copy and decompress a .tar file from FTP to Azure Data Lake Store.
The .tar file contains HTML files. In the copy activity, on the dataset, I selected Compression type GZipDeflate, but I wonder what file format I need to use. Is it supported to do such a thing without a custom activity?
Unfortunately, Data factory doesn't support decompression of .tar files. The supported types for ftp are GZip, Deflate, BZip2, and ZipDeflate. (as seen here: https://learn.microsoft.com/en-us/azure/data-factory/supported-file-formats-and-compression-codecs#compression-support).
A solution may be to save the files in one of the supported formats, or try a custom activity as was explained here, although I'm not sure if it was for data factory v1 or v2: Import .tar file using Azure Data Factory
Hope this helped!
So it's true that there is no way to simply decompress .tar files with ADF or ADL Analytics, but there is an option to take the content from every file in the .tar file and save it as output in U-SQL.
I have a scenario where I need to take content from the HTML files inside the .tar file, so I created an HTML extractor that takes the stream content of each HTML file in the .tar file and saves it in a U-SQL output variable.
Maybe this can help someone who has a similar use case.
I used SharpCompress.dll for extracting and looping over .tar files in C#.
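Outside U-SQL, the same loop-over-archive-members idea can be sketched with Python's tarfile module (a local illustration of the approach, not the U-SQL extractor itself):

```python
import tarfile

def extract_html_contents(tar_path: str) -> dict[str, bytes]:
    """Read each .html member of a tar archive into memory,
    keyed by its name inside the archive."""
    contents = {}
    with tarfile.open(tar_path) as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.endswith(".html"):
                f = tar.extractfile(member)
                if f is not None:
                    contents[member.name] = f.read()
    return contents
```

Each member is streamed out individually, which is the same pattern the SharpCompress-based C# extractor follows.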
