I am using Debian. I need to run a daily incremental backup of files that have been modified since the last backup. This is a local backup from one disk to another in the same system.
The directory structure will be the same as the source.
Each folder and subfolder has its own compressed archive containing only the files in that folder.
On the first backup, the folder structure will be created to match the source, and the files will be archived.
On each subsequent backup, newly created folders will be created and new files will be archived to the destination. Files modified since the last backup will be updated in the relevant archive in the destination.
Is this possible with rsync or any other efficient method? Please help.
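One possible reading of the requirements above, sketched in Python rather than rsync (rsync copies files but does not maintain per-folder archives). Everything here is an assumption for illustration: the function name `backup_tree`, the archive name `files.zip`, the choice of zip as the archive format, and the use of modification times to detect changes. Because Python's `zipfile` cannot replace a member in place, the sketch rebuilds a folder's archive whenever any file in that folder is newer than the archive or the set of file names has changed.

```python
import os
import zipfile


def backup_tree(src_root, dst_root, archive_name="files.zip"):
    """Mirror src_root's folders under dst_root; each folder gets one zip
    holding only that folder's own files (no subfolders).

    Returns the list of archives that were written on this run.
    Assumes dst_root is not inside src_root and that entries are regular files.
    """
    written = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        dst_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(dst_dir, exist_ok=True)  # recreate new folders in the destination
        if not filenames:
            continue
        archive = os.path.join(dst_dir, archive_name)
        if os.path.exists(archive):
            archive_mtime = os.path.getmtime(archive)
            with zipfile.ZipFile(archive) as zf:
                archived = set(zf.namelist())
            # Changed if a file was added/removed, or modified after the archive was written
            changed = archived != set(filenames) or any(
                os.path.getmtime(os.path.join(dirpath, name)) > archive_mtime
                for name in filenames
            )
            if not changed:
                continue
        # Rebuild this folder's archive from its current files
        with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
            for name in sorted(filenames):
                zf.write(os.path.join(dirpath, name), arcname=name)
        written.append(archive)
    return written
```

A daily cron job could call `backup_tree` with the source and destination mount points. Rebuilding a whole folder's archive is simple but wasteful for folders with many large files; a tool that can update archive members in place may suit those better.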
I have a Data Factory pipeline (copy activity) that zips an entire folder and adds it to an Archive Folder.
The folder Structure in Blob storage
Main/Network/data.csv.
The source and sink use binary datasets
source location: wildcard path -> container/Main*
sink location: container/Archive/
compression type -> ZipDeflate
I zip the entire Main folder and copy it to another Archive folder
Archive Folder: Main.zip
When I download this file and unzip it, the Main folder it contains
Is there a way to avoid the network file in the pipeline?
When I unzip the file, the network folder gets deleted because a file and a folder have the same name.
Thank you
I tried the same settings and did not get an extra file with the same name. Please refer to my steps:
Source dataset:
Source settings:
Sink dataset:
Sink settings:
Output:
I download it and everything is ok:
I created a Copy Data task in Azure Data Factory which will periodically copy modified files from my file system (self-hosted integration runtime) to an Azure Blob location. That works great when an existing file is modified or when a new file is created in the source, however, it won't delete a file from the blob destination when the corresponding file is deleted from the source file path location - since the file is gone, there is no modified date. Is there a way to keep the source and destination in sync via Azure Data Factory with individually deleted files such as in the scenario described above? Is there a better way to do this? Thanks.
I'm afraid Data Factory can't do that with activities: the pipeline only supports reading the existing files and copying them to the sink, and the sink side doesn't support deleting a file.
You can achieve that at the code level, for example with Functions or a notebook. After the copy finishes, build logic that compares the source and destination file lists and deletes any file that does not exist in the source list.
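The compare-and-delete logic described above could be sketched as follows. This is a minimal illustration, not Data Factory or Azure SDK code: `files_to_delete`, `sync_deletions`, and the `delete_file` callback are hypothetical names, and the two listings are assumed to be relative paths that are directly comparable between source and destination. In a real Function or notebook, `delete_file` would wrap whatever delete call your storage SDK provides.

```python
def files_to_delete(source_paths, destination_paths):
    """Return destination paths that no longer exist in the source listing."""
    return sorted(set(destination_paths) - set(source_paths))


def sync_deletions(source_paths, destination_paths, delete_file):
    """Delete every destination file missing from the source.

    delete_file is a caller-supplied function taking one path (an assumption
    of this sketch); it is where the actual storage delete would happen.
    Returns the list of paths that were passed to delete_file.
    """
    stale = files_to_delete(source_paths, destination_paths)
    for path in stale:
        delete_file(path)
    return stale
```

Running this after the copy finishes keeps the destination from accumulating files that were removed at the source. Passing the delete operation in as a callback also makes a dry run easy: pass `print` to list what would be removed.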
HTH.
I have an Azure File Copy task as part of my build. A directory needs to be copied recursively to a blob container. Basically, the "cdn" directory from the sources should be copied to the cdn blob container.
So, as "Source" for the task, I specified "$/Website/AzureWebsite/www.die.de/cdn-content/cdn/*"
As "Container Name" I specified "cdn".
The task works: my files do get copied. However, after the copying ends, I also see a directory named "$tf" which has various subdirectories with numbers as names (0, 1, 2, etc.). All of those contain files named "*.gz" or ".rw".
Where is this coming from and how do I get rid of it?
I found this thread: https://developercommunity.visualstudio.com/content/problem/391618/tf-file-is-still-created-in-a-release-delete-all-s.html. The $tf folder is generated when mapping sources for a TFVC repository; this is by design. When you queue a build, a temporary workspace is created and the sources are mapped first.
If you want to get rid of these files, set the Workspace type to a server workspace, but you will lose the advantages of local workspaces. See "TFS creates a $tf folder with gigabytes of .gz files. Can I safely delete it?" for guidance.
We have a code base which is downloaded from the internet (a GitHub repository). The update process is as follows:
p4 Checkout existing version
Download new version from internet and extract it over old version
p4 Revert unchanged files
p4 Submit changes
The problem with this approach is that files which are not present in the downloaded repo (because they were removed from the GitHub repo) are still present in the file system and are considered unchanged. Reverting unchanged files reverts them and keeps them in the depot and workspace. This is a particular problem for Java files, since we compile by specifying the root folder: the leftover file is unreferenced in the new source, but you can't see that.
p4 clean has the option -d:
Deleted files: Find those files in the depot that do not exist in your workspace and add them to the workspace.
but I am looking for the opposite:
Find those files in the workspace that do not exist in your depot and delete them from the depot.
If I delete the whole folder structure from the file system, the workspace goes out of sync.
How can I find and mark for delete the files which are not present in the new folder structure?
This is my typically recommended workflow for this use case:
1. Start with an empty workspace
2. Extract the current version of the tree into the workspace
3. p4 flush to the revision you want to use as the base (if you've made no changes to this branch on the Perforce side, you can just use the default #head)
4. p4 reconcile to open all files for the appropriate action
5. p4 submit
To elaborate on step 3: the "base" should be whatever revision the two trees were last in sync at. If this is a one-way operation, it's always just the latest revision (which came directly from github). If you're making changes on both sides, you should have a separate branch on the Perforce side for your github imports, and only use it for imports; then do one-way merges from there into your development mainline so you can resolve differences with all the right history tracking.
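As a rough sketch, the p4 portion of the workflow above (flush, reconcile, submit) could be scripted like this. The depot path `//depot/vendor/...`, the helper names `import_commands` and `run_import`, and the changelist description are all assumptions for illustration; the script also assumes the workspace is already empty and the new tree has been extracted into it.

```python
import shlex
import subprocess


def import_commands(depot_path, description, base_revision="#head"):
    """Build the p4 command lines for the flush / reconcile / submit steps.

    depot_path is a hypothetical example such as "//depot/vendor/...".
    """
    return [
        # Tell the server the workspace holds base_revision without transferring files
        ["p4", "flush", depot_path + base_revision],
        # Open files for add, edit, or delete based on what is actually on disk
        ["p4", "reconcile", depot_path],
        # Submit the resulting changelist
        ["p4", "submit", "-d", description],
    ]


def run_import(commands, execute=False):
    """Print each command; run it only when execute is True."""
    for cmd in commands:
        print(shlex.join(cmd))
        if execute:
            subprocess.run(cmd, check=True)
```

Calling `run_import(import_commands("//depot/vendor/...", "Import upstream snapshot"))` only prints the commands, which is a cheap way to review them before passing `execute=True`. Because reconcile opens missing files for delete, files removed upstream are removed from the depot on submit.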
I have synced around 1 TB of files from one server to another and set up P4, but I have a different workspace on the new server. If I initiate a p4 sync on the new server, will it add only the difference, or will it duplicate the files? (The source remains the same.)
If the workspace map is the same as the old client, and there have been no changes, you can do a p4 sync -k. This updates the metadata on the server, without doing the file transfers. If there have been changes, you'll have to force sync those files to get the correct revision.
Since you created a new workspace, and P4 doesn't know that you have the files already existing on your new environment, it will pull over all of those files.
If you kept the same workspace name, (by default) it would pull over only "new" files (because it thinks you already have the files you previously synced).