Azure CLI storage delete-batch with exclude pattern - azure

Consider this list of blobs (or any storage data):
backup-2018-08-29-0000.archive
backup-2018-08-29-0100.archive
backup-2018-08-29-0200.archive
backup-2018-08-29-0300.archive
backup-2018-08-29-0400.archive
backup-2018-08-29-0500.archive
backup-2018-08-29-0600.archive
backup-2018-08-29-0700.archive
backup-2018-08-29-0800.archive
backup-2018-08-29-0900.archive
backup-2018-08-29-1000.archive
backup-2018-08-29-1100.archive
backup-2018-08-29-1200.archive
backup-2018-08-29-1300.archive
backup-2018-08-29-1400.archive
backup-2018-08-29-1500.archive
backup-2018-08-29-1600.archive
backup-2018-08-29-1700.archive
backup-2018-08-29-1800.archive
backup-2018-08-29-1900.archive
backup-2018-08-29-2000.archive
backup-2018-08-29-2100.archive
backup-2018-08-29-2200.archive
backup-2018-08-29-2300.archive
I want to delete all files except one, so my initial idea is to use the --pattern flag.
--pattern
The pattern used for globbing files or blobs in the source. The
supported patterns are '*', '?', '[seq]', and '[!seq]'.
But I cannot find any information on how '*', '?', '[seq]', and '[!seq]' work.
In the command below, what pattern will match all files except backup-2018-08-29-0000.archive?
$ az storage blob delete-batch --source mycontainer --pattern <pattern>
Update
An additional issue is that I have about 10,000 backups collected over more than a year, so non-batch operations seem impractical.

I doubt there is an easy way to do that with wildcards (it would be easy with regex).
[seq] and [!seq] work like this:
--pattern backup-2018-08-29-[01]???.archive
would delete all files where the first digit after 29- is either 0 or 1:
backup-2018-08-29-0000.archive
backup-2018-08-29-0100.archive
backup-2018-08-29-0200.archive
backup-2018-08-29-0300.archive
backup-2018-08-29-0400.archive
backup-2018-08-29-0500.archive
backup-2018-08-29-0600.archive
backup-2018-08-29-0700.archive
backup-2018-08-29-0800.archive
backup-2018-08-29-0900.archive
backup-2018-08-29-1000.archive
backup-2018-08-29-1100.archive
backup-2018-08-29-1200.archive
backup-2018-08-29-1300.archive
backup-2018-08-29-1400.archive
backup-2018-08-29-1500.archive
backup-2018-08-29-1600.archive
backup-2018-08-29-1700.archive
backup-2018-08-29-1800.archive
backup-2018-08-29-1900.archive
[!seq] just negates that:
--pattern backup-2018-08-29-[!01]???.archive
This would delete:
backup-2018-08-29-2000.archive
backup-2018-08-29-2100.archive
backup-2018-08-29-2200.archive
backup-2018-08-29-2300.archive
To answer your question: I would rename (copy) the blob you want to keep to, e.g., backup-keep.archive and then delete the remaining backups using the pattern backup-2018-08-29-????.archive
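As a rough local check of the bracket patterns described above, the sketch below runs a few of the sample names through the shell's own case matching, which accepts the same '*', '?', '[seq]' and '[!seq]' syntax. That the CLI matches names the same way is an assumption here; glob_filter is a hypothetical helper and no Azure call is made.

```shell
# glob_filter PATTERN NAME...: print each NAME that matches PATTERN.
# Local illustration only; nothing here talks to Azure.
glob_filter() {
    gf_pattern=$1
    shift
    for gf_name in "$@"; do
        # An unquoted variable in a case label is treated as a glob pattern.
        case $gf_name in
            $gf_pattern) printf '%s\n' "$gf_name" ;;
        esac
    done
}

# A four-name sample of the listing from the question.
set -- \
    backup-2018-08-29-0000.archive \
    backup-2018-08-29-0900.archive \
    backup-2018-08-29-1900.archive \
    backup-2018-08-29-2300.archive

echo "[01]??? matches:"
glob_filter 'backup-2018-08-29-[01]???.archive' "$@"
echo "[!01]??? matches:"
glob_filter 'backup-2018-08-29-[!01]???.archive' "$@"
```

The first call prints the 0000, 0900 and 1900 names; the second prints only the 2300 name, because [!01] accepts any single character other than 0 or 1 in that position.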

You could acquire a lease on the blob (in the portal or with az storage blob lease acquire), then use az storage blob delete-batch to delete the other blobs. A leased blob cannot be deleted; if you later want to delete it, just break the lease in the portal or with az storage blob lease break.
My test command (I specified a lease duration of 15 seconds):
az storage blob lease acquire --blob-name "azureProfile.txt" --container-name "testdel" --account-key "accountkey" --account-name "storagename" --lease-duration "15"
az storage blob delete-batch --source "testdel" --account-key "accountkey" --account-name "storagename"
It gives a warning, but it works fine on my side.

I solved the problem by doing two batch-delete commands:
#!/bin/bash
set -e
# AZURE_CONNECTION_STRING is taken from the environment
CONTAINER=backups
DATES="201[78]-??-??"
# delete blobs with timestamps in the 1000-2300 range
az storage blob delete-batch \
--connection-string "$AZURE_CONNECTION_STRING" \
--source "$CONTAINER" \
--pattern "$DATES-[1-2][0-9]00--mongo.archive"
# delete blobs with timestamps in the 0100-0900 range
az storage blob delete-batch \
--connection-string "$AZURE_CONNECTION_STRING" \
--source "$CONTAINER" \
--pattern "$DATES-0[1-9]00--mongo.archive"
With this script, I'm deleting all backups excluding backups made at midnight (with 0000 timestamp).
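To see why those two patterns leave only the midnight backups, here is a small local check. It assumes the CLI's matcher behaves like the shell's case matching; the date in the sample name is made up, covered is a hypothetical helper, and nothing is deleted.

```shell
# Local check only: which hourly timestamps do the two delete patterns cover?
DATES="201[78]-??-??"

# covered HHMM: succeed if either delete pattern matches a blob name
# built from that timestamp (the date part is an invented example).
covered() {
    case "2018-08-29-$1--mongo.archive" in
        $DATES-[1-2][0-9]00--mongo.archive) return 0 ;;
        $DATES-0[1-9]00--mongo.archive) return 0 ;;
    esac
    return 1
}

kept=""
for hour in 00 01 02 03 04 05 06 07 08 09 10 11 12 \
            13 14 15 16 17 18 19 20 21 22 23; do
    # Collect every timestamp that neither pattern would delete.
    covered "${hour}00" || kept="$kept ${hour}00"
done
echo "timestamps left undeleted:$kept"
```

Only 0000 is reported as left undeleted: its second digit falls outside [1-9] and its first digit falls outside [1-2].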

Related

How to check if Azure blob exists anywhere in storage account in multiple containers?

I have an Azure Storage account and in it are multiple Containers. How can I check all of the Containers to see if a specific named blob is in any of them? Also the blobs have multiple directories.
I know there's the az storage blob exists command, but that requires a Container name parameter. Will I have to use the List Containers command first?
Yes, you need to get the list of containers first. I reproduced this in my environment, following the Microsoft documentation, and got the expected results shown below.
First, I ran the code below to get the list of containers:
$storage_account_name="rithemo"
$key="T0M65s8BOi/v/ytQUN+AStFvA7KA=="
$containers=az storage container list --account-name $storage_account_name --account-key $key
$x=$containers | ConvertFrom-json
$x.name
Here, $key is the storage account key.
Next, get every blob present in all containers of the storage account:
$Target = @()
foreach($emo in $x.name )
{
$y=az storage blob list -c $emo --account-name $storage_account_name --account-key $key
$y=$y | ConvertFrom-json
$Target += $y.name
}
$Target
Then check whether the given blob exists:
$s="Check blob name"
if($Target -contains $s){
Write-Host("Blob Exists")
}else{
Write-Host("Blob Not Exists")
}
Or, after getting the list of containers, you can call az storage blob exists directly for each container:
foreach($emo in $x.name )
{
az storage blob exists --account-key $key --account-name $storage_account_name --container-name $emo --name "xx"
}
Yes, you will need to use the List Containers command to get a list of all the containers in your storage account, and then you can loop through the list of containers and check each one for the specific blob that you are looking for.
Here's an example of how you can accomplish this using the Azure CLI:
# First, get a list of all the containers in your storage account
containers=$(az storage container list --account-name YOUR_STORAGE_ACCOUNT_NAME --output tsv --query '[].name')
# Loop through the list of containers
for container in $containers
do
# Ask whether the blob exists in the current container; the command reports {"exists": true} or {"exists": false}
exists=$(az storage blob exists --account-name YOUR_STORAGE_ACCOUNT_NAME --container-name "$container" --name YOUR_BLOB_NAME --query exists --output json)
# The exit status is 0 whenever the call itself succeeds, so test the reported value instead
if [ "$exists" = "true" ]; then
echo "Blob found in container: $container"
break
fi
done

Azure CLI - How to Undelete a Storage Blob with Soft-Delete and Versioning enabled

I am using the Azure CLI to add blobs to my storage account. Via the Azure CLI, I am successfully able to soft delete blobs; I can confirm this by viewing the soft-deleted blobs on the Azure Portal. I want to restore a blob that I delete via the Azure CLI again, but I am having trouble. I have attempted to use the az storage blob undelete command to do this. It is reportedly successful - I know this by adding the --verbose flag and seeing the 200 HTTP Status returned from the API call that the CLI triggers. The response is:
{
"undeleted": null
}
And when I look at the list of blobs in the Azure Portal again, there is no indication that the blob was actually restored/undeleted. Has anyone else had success using the undelete Azure CLI command previously?
Here is some terminal output; hopefully it is helpful in understanding what I'm trying to do:
PS C:\Users\admin> az storage blob list --account-name azartbackupstore01 -c backupcontainer01 -o table
Name IsDirectory Blob Type Blob Tier Length Content Type Last Modified Snapshot
------------------------------------------------------------------- ------------- ----------- ----------- -------- ------------------------ ------------------------- ----------
20/20162F8E84F43EEAAEC0DB0010545C32D8D1A0CF60284CA2E9A57884B55C2445 BlockBlob 47 application/octet-stream 2021-08-05T15:25:59+00:00
92/92D536261E45E93DB4A8F063A98102BF443DD7EC16B1075F7D13A1A326544035 BlockBlob 11458 application/octet-stream 2021-08-05T15:22:47+00:00
PS C:\Users\admin> az storage blob delete --account-name azartbackupstore01 -c backupcontainer01 --name 20/20162F8E84F43EEAAEC0DB0010545C32D8D1A0CF60284CA2E9A57884B55C2445
PS C:\Users\admin> az storage blob undelete --account-name azartbackupstore01 -c backupcontainer01 --name 20/20162F8E84F43EEAAEC0DB0010545C32D8D1A0CF60284CA2E9A57884B55C2445
{
"undeleted": null
}
PS C:\Users\admin> az storage blob list --account-name azartbackupstore01 -c backupcontainer01 -o table
Name IsDirectory Blob Type Blob Tier Length Content Type Last Modified Snapshot
------------------------------------------------------------------- ------------- ----------- ----------- -------- ------------------------ ------------------------- ----------
92/92D536261E45E93DB4A8F063A98102BF443DD7EC16B1075F7D13A1A326544035 BlockBlob 11458 application/octet-stream 2021-08-05T15:22:47+00:00
Apparently when you have soft-delete and versioning enabled on blobs, something weird is happening (even in the Azure portal where the blob is shown as deleted but the blob state is null, and the versions of the deleted blob are still shown as active).
But I found some kind of a workaround.
In short:
Get the (latest) versionId of the blob you want to undelete
Get the blob URI and add the versionId and SAS token as query parameters to the URI.
Copy the blob where source URI is the deleted blob including versionId (I found this solution in the code here)
When a versioned blob is soft-deleted, it will show up with the command:
az storage blob list --account-name azartbackupstore01 -c backupcontainer01 -o table --include v
I only added --include v (versions) at the end, which shows all versions of the blobs. --include d (deleted) will not work, because the blob somehow does not have a deleted state.
Here is how I've done it:
$blobName="20/20162F8E84F43EEAAEC0DB0010545C32D8D1A0CF60284CA2E9A57884B55C2445"
$containerName="backupcontainer01"
$accountName="azartbackupstore01"
$sas="replace with your sas token"
# query all blobs where name equals $blobName, and reverse sort by versionId (which is the date) so most recent will be the first in the list
$versionId=az storage blob list --account-name $accountName -c $containerName --include v -o json --query "reverse(sort_by([?name=='$blobName'], &versionId))[0].versionId"
$blobUriRoot=az storage blob url --account-name $accountName -c $containerName --name $blobName
# The blobUriRoot and versionId variables are outputted with additional quotes, so these need to be replaced.
$blobUri=$($blobUriRoot + "?versionId=" + $versionId).Replace('"', "")
$blobUriWithSas = $blobUri + "&" + $sas
az storage blob copy start --account-name $accountName --destination-blob $blobName --destination-container $containerName --source-uri $blobUriWithSas
After running the above commands, the specified blob is active again.

Azure CLI - az storage blob delete-batch pattern

I have a container called container1 in my Storage Account storageaccount1, with the following files:
blobs/tt-aa-rr/data/0/2016/01/03/02/01/20.txt
blobs/tt-aa-rr/data/0/2016/01/03/02/02/12.txt
blobs/tt-aa-rr/data/0/2016/01/03/02/03/13.txt
blobs/tt-aa-rr/data/0/2016/01/03/03/01/10.txt
I would like to delete the first 3, for that I use the following command:
az storage blob delete-batch --source container1 --account-key XXX --account-name storageaccount1 --pattern 'blobs/tt-aa-rr/data/0/2016/01/03/02/*' --debug
The files are not deleted and I see the following log:
urllib3.connectionpool : Starting new HTTPS connection (1): storageaccount1.blob.core.windows.net:443
urllib3.connectionpool : https://storageaccount1.blob.core.windows.net:443 "GET /container1?restype=container&comp=list HTTP/1.1" 200 None
What is wrong with my pattern?
If I try to delete file by file it works.
As stated in the comments, you cannot apply patterns to subfolders, only to first-level folders, as documented here. But if you want, you can easily write a script that lists the blobs in your container with az storage blob list, using the prefix to filter them, and then deletes each of the resulting blobs.
Here is what just worked for me — applied to the command you listed above.
az storage blob delete-batch --source container1 --account-key XXX --account-name storageaccount1 --pattern blobs/tt-aa-rr/data/0/2016/01/03/02/\* --debug
I didn't quote the pattern argument and I added an escape before the *. Using iTerm2 on a Mac. I didn't try --debug but the --dryrun argument was really helpful in getting it to tell me what it had matched (or not!).

How to correctly grab blob names with whitespaces in azure from az cli

I have a user that is putting a lot of whitespaces in their filenames and this is causing a download script to go bad.
To get the names of the blobs I use this:
BLOBS=$(az storage blob list --container-name $c \
--account-name $AZURE_STORAGE_ACCOUNT --account-key $AZURE_STORAGE_KEY \
--query "[].{name:name}" --output tsv)
For a blob like blob with space.pdf, the name ends up stored as [blob\twith\tspace.pdf], where \t is a tab. When I iterate over the names to download them, I obviously can't get at the file.
How can I do this correctly?
You can use the az storage blob download-batch command.
I tested it in the Azure portal; all the blobs, including those whose names contain whitespace, were downloaded.
The command:
c=container_name
AZURE_STORAGE_ACCOUNT=xx
AZURE_STORAGE_KEY=xx
# download the blobs to clouddrive
cd clouddrive
az storage blob download-batch -d . -s $c --account-name $AZURE_STORAGE_ACCOUNT --account-key $AZURE_STORAGE_KEY
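If the names are still needed one at a time, a common shell approach is to read the listing line by line instead of letting the shell split it into words. The sketch below is an illustration only: list_blob_names is a hypothetical stub standing in for the az storage blob list call from the question (assumed to print one name per line), and download_all just echoes what it would fetch.

```shell
# Stub for the real listing command (hypothetical); it prints one blob
# name per line, which is what the tsv listing is assumed to produce.
list_blob_names() {
    printf '%s\n' 'blob with space.pdf' 'report.txt'
}

# Read one name per line. IFS= keeps leading and inner whitespace intact,
# and -r stops backslashes from being treated as escapes.
download_all() {
    while IFS= read -r blob; do
        # Replace this printf with the real per-blob download command,
        # keeping "$blob" quoted so the name stays a single argument.
        printf 'would download: [%s]\n' "$blob"
    done
}

list_blob_names | download_all
```

Each name arrives as a single value, so blob with space.pdf is handled as one file rather than three words.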

Azure CLI: storage blob delete-batch to delete all blobs excluding two directories

I have a PowerShell script that currently deletes all blobs in my $web container.
az storage blob delete-batch --account-name myaccountname --source $web
This works great, but now I want to exclude two directories from being deleted. I've looked over the documentation and I'm still not sure how the exclusion syntax is supposed to look.
I'm certain that I have to use the --pattern parameter.
The pattern used for globbing files or blobs in the source. The supported patterns are '*', '?', '[seq]', and '[!seq]'.
I'm hoping someone can let me know what the value of the --pattern param should look like so that I can delete everything in the $web container except the blobs in the /aaa and /bbb directories.
az storage blob delete-batch --account-name myaccountname --source $web --pattern ???
According to my test, if you want to use the --pattern parameter to exclude a directory from being deleted, you can use a '[!...]' expression:
az storage blob delete-batch --source <container_name> --account-name <storage_account_name> --pattern '[!<folder name>]*'
For example:
The structure of my container is as below before I run the command.
Run the command
az storage blob delete-batch --source test --account-name blobstorage0516 --pattern '[!user&&!people]*'
The structure of my container is as below after I run the command.
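One caveat is worth checking before relying on this. In shell-style globbing, '[!seq]' is a set of single characters, not a folder name, so a pattern such as '[!ab]*' only tests the first character of the blob name. The sketch below illustrates this locally with the shell's own case matching, assuming the CLI's matcher follows the same rules; matches is a hypothetical helper and the blob names are made up.

```shell
# matches PATTERN NAME: report whether NAME matches the glob PATTERN.
# Local illustration only; nothing here talks to Azure.
matches() {
    case $2 in
        $1) echo "match" ;;
        *) echo "no match" ;;
    esac
}

matches '[!ab]*' 'aaa/index.html'   # first character is "a"
matches '[!ab]*' 'bbb/app.js'       # first character is "b"
matches '[!ab]*' 'css/site.css'     # first character is neither
matches '[!ab]*' 'about.html'       # not in aaa/ or bbb/, but starts with "a"
```

Under that assumption, only css/site.css matches. For the /aaa and /bbb case in the question, the pattern would leave both directories alone, but it would also skip any other blob whose name begins with a or b, so running the command with --dryrun first is a sensible precaution.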
