Azure Blob storage AzCopy replace container contents

Let's say I have container one with 3 files, and container two with those same 3 files plus 1 extra, for a total of 4.
Is it possible to do a copy/replace with AzCopy so that container two ends up containing only the 3 files?

AzCopy currently doesn't support deleting blobs; it focuses on transferring blobs from source to destination.
You could write your own code to compare the two containers and do the cleanup.
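A minimal sketch of such a cleanup with the Azure CLI, assuming placeholder account/container names and that credentials are supplied via the AZURE_STORAGE_ACCOUNT / AZURE_STORAGE_KEY environment variables:

# List the blob names in each container, sorted for comparison
az storage blob list --account-name myaccount --container-name container-one --query "[].name" -o tsv | sort > one.txt
az storage blob list --account-name myaccount --container-name container-two --query "[].name" -o tsv | sort > two.txt

# comm -13 prints names that appear only in two.txt; delete those extra blobs
comm -13 one.txt two.txt | while read -r blob; do
    az storage blob delete --account-name myaccount --container-name container-two --name "$blob"
done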

Related

Is there possibility to synchronize data between two Azure blob storages

I have a task to copy blob storage to another location and keep the two in sync. Unfortunately, I haven't found a solution for it. Is there a simple way to do this?
You can use the AzCopy utility to synchronize files, or replicate a source location to a destination location.
The azcopy sync command identifies all files at the destination, and then compares file names and last modified timestamps before starting the sync operation. If you set the --delete-destination flag to true, AzCopy deletes files without providing a prompt. If you want a prompt to appear before AzCopy deletes a file, set the --delete-destination flag to prompt.
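For example, a sketch of syncing one container to another, assuming AzCopy v10 and placeholder account names, container name, and SAS tokens:

azcopy sync "https://srcaccount.blob.core.windows.net/mycontainer?<SAS>" "https://dstaccount.blob.core.windows.net/mycontainer?<SAS>" --recursive --delete-destination=true

With --delete-destination=true, blobs that exist only at the destination are removed, so the destination ends up mirroring the source.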
Also check the az storage blob sync from Azure CLI.
Also consider the newer Object Replication for Block Blobs:
Object replication asynchronously copies block blobs between a source storage account and a destination account.

Use the AzureCLI task to delete containers

I know that az storage blob delete-batch lets you delete blobs from a blob container recursively. I need to delete containers instead of single blobs. In particular, I need to delete containers older than two years. Is there any way to accomplish this?
As I see it, there are two parts to your problem:
Deleting Multiple Containers: For this you can write a script that first lists the containers using az storage container list, then loops over that list to delete each container individually using az storage container delete (see the sketch below).
Find Containers Older Than 2 Years: This is going to be tricky because currently there's no way to find out when a blob container was created. It does have a Last Modified Date property, but that gets changed every time an operation is performed on that blob container (not including operations performed on the blobs inside that container).
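A sketch of the scripted part, assuming the Azure CLI, credentials supplied via AZURE_STORAGE_ACCOUNT / AZURE_STORAGE_KEY, GNU date, and (given the caveat above) the container's Last Modified Date as an imperfect stand-in for its age:

# Cutoff timestamp: two years ago, in the same ISO 8601 layout the CLI returns
CUTOFF=$(date -u -d "2 years ago" +%Y-%m-%dT%H:%M:%S+00:00)

# Emit "name <tab> lastModified" pairs and delete containers older than the cutoff
az storage container list --query "[].[name, properties.lastModified]" -o tsv |
while IFS=$'\t' read -r name modified; do
    # ISO 8601 timestamps compare correctly as strings
    if [[ "$modified" < "$CUTOFF" ]]; then
        echo "Deleting container: $name"
        az storage container delete --name "$name"
    fi
done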

Unix commands on Azure csv files in blob

How do I run a sed command on some CSV files I have in Azure Blob Storage?
I am using an Azure Data Factory copy activity to copy data from a CSV file to Postgres, but my CSV is a big 20 GB file and contains a NULL character (\x000 or something similar) which is not recognized by the Postgres text data type. The ADF copy activity cannot convert CSV string columns to Postgres bytea, so the only option is to use text. I thought of a workaround: run a sed command on my CSV to substitute the null character with some other character like -. So I need to know how to run sed commands on CSV files that are in Azure Blob Storage. Should I copy them first to a new Linux VM? Note also that the ADF copy activity does not show an option to copy binary files from blob storage to a Linux VM.
You can't treat blobs as local files. You'll have to download them first, to local storage (local can be in your vm or anywhere else that your machine has access to). As for Data Factory: You definitely can copy content from a VM, as long as you create an appropriate file share (e.g. samba share), along with Integration Runtime, if the VM in question is locked down to a particular VNet.
I simply added a resource, i.e. a Linux VM, in my Azure subscription, copied the files from Azure Blob Storage to the VM, ran the sed command, and copied the files back to blob storage.
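A sketch of that round trip on the VM, assuming AzCopy v10, GNU sed, placeholder URLs/SAS tokens, and that the offending character is a literal NUL byte (\x00):

# Download the CSV from blob storage to the VM
azcopy copy "https://myaccount.blob.core.windows.net/data/big.csv?<SAS>" ./big.csv

# Replace NUL bytes with '-' (GNU sed understands \x00)
sed 's/\x00/-/g' big.csv > big_clean.csv

# Upload the cleaned file back to blob storage
azcopy copy ./big_clean.csv "https://myaccount.blob.core.windows.net/data/big_clean.csv?<SAS>"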

Can we copy Azure blobs from one storage account to other storage accounts in parallel from same machine?

In Microsoft Azure, I have a source storage account in one region and 3 destination storage accounts in 3 different regions. I want to copy blob data from the source storage account to all 3 destination storage accounts. Currently I am using the azcopy (version 6) command in a bash script to do it. First it completes for one region, then starts for another. It takes almost an hour every day due to the geographical distance between the regions. I wanted to know if azcopy has any option to copy blobs from the source to multiple destinations in parallel. Any other suggestions to reduce the time are also invited :)
Generalization of azcopy command being used in my bash script:
/usr/bin/azcopy --source https://[srcaccount].blob.core.windows.net/[container]/[path/to/blob] --source-key $SOURCE_KEY --destination https://[destaccount].blob.core.windows.net/[container]/[path/to/blob] --dest-key $DEST_KEY --recursive --quiet --exclude-older
AzCopy can only copy data from one source to one destination at a time. But since you mention that you need to do this every day, I would probably go for a scheduled pipeline in Azure Data Factory instead. There you can also set up the three different copy jobs as parallel activities.
Just spawn a separate instance of your script for each destination. That way your copies will happen in parallel.
Here is a simple guide for doing this in bash: https://www.slashroot.in/how-run-multiple-commands-parallel-linux
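A minimal sketch of that approach in one bash script, reusing the AzCopy v6 command from the question with placeholder destination accounts (each destination would normally use its own key, and concurrent AzCopy v6 runs may need separate journal folders; check the docs for your version):

for dest in destaccount1 destaccount2 destaccount3; do
    # Launch one copy per destination in the background
    /usr/bin/azcopy --source https://srcaccount.blob.core.windows.net/container/path --source-key $SOURCE_KEY --destination https://$dest.blob.core.windows.net/container/path --dest-key $DEST_KEY --recursive --quiet --exclude-older &
done
wait   # block until all background copies have finished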

Understanding azcopy resumption when downloading blob objects from Azure storage

I have a scenario where I need to download millions of blobs from Azure storage. I'm using azcopy.
More blobs are constantly being added to this storage (~10K/day).
Imagine my download was disrupted. I try again 30 mins later. By then, ~200 more blobs have already been added to the storage. On my command line, I see:
Incomplete operation with same command line detected at the journal directory "/home/myuser/Microsoft/Azure/AzCopy", do you want to resume the operation?
Which of the following 2 scenarios will happen if I enter "Yes"?
1) It will download the remaining blob files, including the 200 new ones that were added
2) It will download the remaining blob files, excluding the new ones.
Please confirm.
It depends on the names of your new blobs. AzCopy records the last listed blob name in its journal file as a checkpoint, and after resuming it continues listing the blobs page by page from that checkpoint. If the names of your new blobs sort before the checkpoint, they won't be included in this download job; otherwise, they will be included.
