Zipped File in Azure Data Factory Pipeline adds extra files - azure

I have a Data Factory pipeline (copy activity) that zips an entire folder and adds it to an Archive Folder.
The folder structure in Blob storage is:
Main/Network/data.csv
The source and sink use binary datasets:
source location: wildcard path -> container/Main*
sink location: container/Archive/
compression type -> ZipDeflate
I zip the entire Main folder and copy it to another Archive folder
Archive Folder: Main.zip
When I download this file and unzip it, it contains the Main folder, but alongside the Network folder there is also an extra file named Network.
Is there a way to avoid that extra Network file in the pipeline? When I unzip the archive, the Network folder gets deleted because the extra file has the same name as the folder.
Thank you.

I tried the same option and there isn't an extra duplicate file. Please refer to my steps:
Source dataset:
Source settings:
Sink dataset:
Sink settings:
Output:
I downloaded it and everything is OK:
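If you want to see exactly what the copy activity wrote into the archive, you can list the entries of the downloaded zip locally, e.g. with a quick Python check (Main.zip is the file name from the question):

import zipfile

# List every entry in the downloaded archive, so any duplicate
# file/folder name (e.g. an extra Network entry) shows up.
with zipfile.ZipFile("Main.zip") as archive:
    for name in archive.namelist():
        print(name)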

Related

Can I copy files from SharePoint to Azure Blob Storage using a dynamic file path?

I am building a pipeline to copy files from SharePoint to Azure Blob Storage at work.
After reading some documentation, I was able to create a pipeline that only copies certain files.
However, I would like to automate this pipeline by using dynamic file paths to specify the source files in Sharepoint.
In other words, when I run the pipeline on 2022/07/14, I want to get the files from the SharePoint folder named for that day, such as "Data/2022/07/14/files".
I know how to do this with Power Automate, but my company does not want me to use Power Automate.
The current pipeline looks like the attached image.
Do I need to use parameters in the URL of the source dataset?
Any help would be appreciated.
Thank you.
Try this approach.
You can create a parameterized dataset as below:
Then, from the copy activity, you can pass the file path to this parameter as:
@concat('Data/',formatDateTime(utcnow(),'yyyy'),'/',formatDateTime(utcnow(),'MM'),'/',formatDateTime(utcnow(),'dd'))
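For reference, on a run date of 2022/07/14 that expression resolves to Data/2022/07/14, matching the folder naming in the question; the equivalent value in Python, just to illustrate:

from datetime import datetime, timezone

# Builds the same date-based folder path the ADF expression above produces.
path = datetime.now(timezone.utc).strftime("Data/%Y/%m/%d")
print(path)  # e.g. Data/2022/07/14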

How to delete specific blob file when that file has been removed from the source location via Azure Data Factory (self-hosted)

I created a Copy Data task in Azure Data Factory which periodically copies modified files from my file system (self-hosted integration runtime) to an Azure Blob location. That works great when an existing file is modified or a new file is created in the source; however, it won't delete a file from the blob destination when the corresponding file is deleted from the source path, since the deleted file no longer has a modified date. Is there a way to keep the source and destination in sync via Azure Data Factory for individually deleted files, as in the scenario described above? Is there a better way to do this? Thanks.
I'm afraid Data Factory can't do that with activities; the pipeline only supports reading existing files and copying them to the sink. The sink side also doesn't support deleting a file.
You can achieve that at the code level, for example with Azure Functions or a notebook. After the copy finishes, build logic to compare the source and destination file lists and delete the files that do not exist in the source list.
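For example, a minimal sketch in Python with the azure-storage-blob package, assuming a hypothetical connection string, container name, and local source root:

import os
from azure.storage.blob import ContainerClient

# Hypothetical values -- replace with your own storage account and paths.
CONNECTION_STRING = "<storage-connection-string>"
CONTAINER_NAME = "backup"
LOCAL_ROOT = r"C:\data\source"

container = ContainerClient.from_connection_string(CONNECTION_STRING, CONTAINER_NAME)

# Build the set of relative paths that still exist on the source file system.
local_files = set()
for root, _dirs, files in os.walk(LOCAL_ROOT):
    for name in files:
        rel = os.path.relpath(os.path.join(root, name), LOCAL_ROOT)
        local_files.add(rel.replace(os.sep, "/"))  # blob names use forward slashes

# Delete every blob that no longer has a matching source file.
for blob in container.list_blobs():
    if blob.name not in local_files:
        container.delete_blob(blob.name)

This could run on a schedule in an Azure Function or a notebook after the copy activity finishes.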
HTH.

Azure File Copy Task in pipeline creating $tf directory

I have an Azure File Copy task as part of my build. A directory needs to be recursively copied to a blob container; basically, the "cdn" directory from sources should be copied to the cdn blob container.
So, as "Source" for the task, I specified "$/Website/AzureWebsite/www.die.de/cdn-content/cdn/*".
As "Container Name" I specified "cdn".
The task works: my files do get copied. However, after the copying ends, I also see a directory named "$tf" which has various subdirectories with numbers as names (0, 1, 2, etc.). All of those contain files named "*.gz" or ".rw".
Where is this coming from, and how do I get rid of it?
I found this thread: https://developercommunity.visualstudio.com/content/problem/391618/tf-file-is-still-created-in-a-release-delete-all-s.html. The $tf folder is generated when mapping sources for a TFVC repository; it's by design. A temporary workspace is created and the sources are mapped when you queue a build.
If you want to get rid of it, set the workspace type to a server workspace, but you lose the advantages of local workspaces. See "TFS creates a $tf folder with gigabytes of .gz files. Can I safely delete it?" for guidance.

How to unzip and move files in Azure?

Problem: I get an email with a zip file. In that zip are two files. I want to extract one of those files and place it in a folder in ADL.
I've automated this before using Logic Apps, but the zip and the extra file are throwing a wrench in the gears here. So far I've managed to get one Logic App going to download the zip into a blob container and another Logic App to extract the files to another container. I don't know how to proceed from there. Should I be using Data Factory? I want this automated and to run every week, every time I receive an email from a specific sender.
Update:
I am sorry, I didn't notice your source is ADL; the steps below only need the source changed to ADL. The key is to select the Compression type of your source; it will unzip the file for you.
Original Answer:
1. Create a pipeline.
2. Create a copy data activity.
3. After you create the copy data activity, you need to choose the source and the sink. From your description, you need to unzip a file in one storage container into another container, so please follow these steps:
The sink is similar: also choose Azure Blob Storage with the same linked service, and select the container that you want to copy to.
4. Then let's validate all. If there is no problem, we can publish them.
5. Now please trigger your pipeline:
6. After that, your zip file will be successfully unzipped and copied to the other container. :)
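If you prefer to do the extraction in code instead (for example from an Azure Function the Logic App calls), here is a minimal sketch with azure-storage-blob and Python's zipfile; the container and file names are hypothetical:

import io
import zipfile
from azure.storage.blob import BlobServiceClient

# Hypothetical names -- replace with your own account, containers, and files.
CONNECTION_STRING = "<storage-connection-string>"
service = BlobServiceClient.from_connection_string(CONNECTION_STRING)

# Download the zip the first Logic App dropped into the staging container.
zip_blob = service.get_blob_client("staging", "mail-attachment.zip")
zip_bytes = io.BytesIO(zip_blob.download_blob().readall())

# Pull out just the one member you need and upload it to the destination.
with zipfile.ZipFile(zip_bytes) as archive:
    data = archive.read("report.csv")  # hypothetical member name
service.get_blob_client("extracted", "report.csv").upload_blob(data, overwrite=True)

For an ADLS Gen2 destination you would upload with the azure-storage-file-datalake client instead, but the unzip step stays the same.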

How to do an incremental backup that archives files into destination sub-folders

I am using Debian. I need to run a daily incremental backup for files that were modified since the last backup. This is a local backup from one disk to another in the same system.
The directory structure will be the same as the source.
Each folder and subfolder has its own compressed archive containing files only.
On the first backup, the folder structure will be created as in the source, and the files will be archived.
On each subsequent backup, newly created folders will be created and new files will be archived to the destination. Files modified since the last backup will be updated in the relevant archive in the destination.
Is it possible with rsync or any other efficient method? Please help.
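rsync alone syncs files rather than maintaining per-folder archives, so one option is a small script driven by cron. A minimal sketch in Python, with hypothetical source and destination paths, that keeps one files.zip per source folder and rebuilds it whenever any file in that folder is newer than the archive:

import os
import zipfile

# Hypothetical mount points -- adjust to your disks.
SRC = "/mnt/data"
DST = "/mnt/backup"

for root, _dirs, files in os.walk(SRC):
    if not files:
        continue
    rel = os.path.relpath(root, SRC)
    dest_dir = os.path.join(DST, rel)
    os.makedirs(dest_dir, exist_ok=True)           # mirror the source folder structure
    archive = os.path.join(dest_dir, "files.zip")  # one files-only archive per folder

    # Rebuild the archive only if it is missing or any source file is newer.
    newest = max(os.path.getmtime(os.path.join(root, f)) for f in files)
    if not os.path.exists(archive) or newest > os.path.getmtime(archive):
        with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
            for f in files:
                zf.write(os.path.join(root, f), arcname=f)

This rebuilds a folder's archive rather than updating individual members in place, which keeps the logic simple; scheduling it daily with cron gives the incremental behaviour described above.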
