How to clean an Azure storage Blob container?

I just want to clean out (dump, zap, delete everything in) an Azure Blob container. How can I do that?
Note: The container is used by IIS (running Webrole) logs (wad-iis-logfiles).

A one liner using the Azure CLI 2.0:
az storage blob delete-batch --account-name <storage_account_name> --source <container_name>
Substitute <storage_account_name> and <container_name> with the appropriate values for your case.
You can see the help of the command by running:
az storage blob delete-batch -h
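For the scenario in the question, a concrete invocation might look like this (mystorageaccount is a placeholder account name; wad-iis-logfiles is the container mentioned in the question):
az storage blob delete-batch --account-name mystorageaccount --source wad-iis-logfiles
This removes every blob but keeps the container itself, so there is no delay before the container can be used again.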

There is only one way to bulk delete blobs and that is by deleting the entire container. As you've said, there is a delay between deleting the container and when you can use that container name again.
Your only other choice is to delete the blobs one at a time. If you can do the deleting from the same data centre where the blobs are stored, it will be faster than running the delete locally. This probably means writing code (or you could RDP into one of your instances and install a cloud explorer tool). If you're writing code then you can speed up the overall process by deleting the items in parallel. Something similar to this would work:
Parallel.ForEach(myCloudBlobClient.GetContainerReference(myContainerName).ListBlobs(), x => ((CloudBlob) x).Delete());

Update: The easier way to do it now (in 2018) is to use the Azure CLI. Check joanlofe's answer :)
The easiest way to do it in 2016 is using Microsoft Azure Storage Explorer, IMO.
Download Azure Storage Explorer and install it
Sign in with the appropriate Microsoft Account
Browse to the container you want to empty
Click on the Select All button
Click on the Delete button

Try using the CloudBerry Explorer product for Windows Azure.
This is the link: http://www.cloudberrylab.com/free-microsoft-azure-explorer.aspx
You can search the blob container for a specific extension, select multiple blobs, and delete them.

If you mean you want to delete a container, I would suggest you check http://msdn.microsoft.com/en-us/library/windowsazure/dd179408.aspx to see whether the Delete Container operation (the container and any blobs contained within it are later deleted during garbage collection) fulfills the requirement.

If you are interested in a CLI way, then the following piece of code will help you out:
for i in `az storage blob list -c "Container-name" --account-name "Storage-account-name" --account-key "Storage-account-access-key" --output table | awk '{print $1}' | sed '1,2d' | sed '/^$/d'`; do
    az storage blob delete --name $i -c "Container-name" --account-name "Storage-account-name" --account-key "Storage-account-access-key" --output table
done
It first fetches the list of blobs in the container and deletes them one by one.
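A slightly tidier variant (a sketch with the same placeholder account, key, and container names) asks the CLI for just the blob names instead of parsing the table output with awk and sed:
az storage blob list -c "Container-name" --account-name "Storage-account-name" --account-key "Storage-account-access-key" --query "[].name" -o tsv | while read -r name; do
    az storage blob delete --name "$name" -c "Container-name" --account-name "Storage-account-name" --account-key "Storage-account-access-key"
done
For very large containers, the delete-batch command from the first answer is still the simpler option.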

If you are using a Spark (HDInsight) cluster which has access to that storage account, then you can use HDFS commands on the command line:
hdfs dfs -rm -r wasbs://container_name@account_name.blob.core.windows.net/path_goes_here
The real benefit is that the cluster is unlikely to go down, and if you have screen running on it, then you won't lose your session whilst you delete away.

For this case, the better option is to list the items found in the container and then delete each item from the container. If you delete the container itself, you may get a runtime error the next time you try to use it.

You can use Cloud Combine to delete all the blobs in your Azure container.

Related

Azure container copy only changes

I would like to update static website assets from GitHub repos. The documentation suggests using an action based on
az storage blob upload-batch --account-name <STORAGE_ACCOUNT_NAME> -d '$web' -s .
If I see this correctly, this copies all files regardless of changes, even if only one file was altered. Is it possible to transfer only the files that have changed, like rsync does?
Otherwise I would try to determine the changed files from the git history and transfer only those. Please also answer if you know of an existing solution in this direction.
You can use azcopy sync to achieve that. That is a different tool, though.
https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-blobs-synchronize?toc=/azure/storage/blobs/toc.json
https://learn.microsoft.com/en-us/azure/storage/common/storage-ref-azcopy-sync
Based on the suggestion by @4c74356b41, I discovered that the mentioned tool was recently integrated into the az tool.
It can be used the same way as az storage blob upload-batch. The base command is:
az storage blob sync
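Mirroring the upload-batch example from the question, a sync call might look like the following sketch (same account-name placeholder; '$web' is quoted so the shell does not expand it as a variable):
az storage blob sync --account-name <STORAGE_ACCOUNT_NAME> -c '$web' -s .
Only files that differ from what is already in the container should be transferred, which is the rsync-like behaviour asked about.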

How to copy one storage account's container's blobs to another storage account's container's blobs

I have two storage accounts (storage1 and storage2) and both of them have a container called data.
Now, storage1's data container has a folder called database-files which contains lots of folders recursively. I mean, it's kind of huge.
What I am trying to do is copy database-files and everything in it from storage1's data container to storage2's data container. Note: both storage accounts are in the same resource group and subscription.
Here is what I've tried:
az storage blob copy start-batch --source-account-name "storage1" --source-container "data" --account-name "storage2" --destination-container "data"
This worked fine, but the problem is that it takes a ridiculously long time, and I can't wait that long because I want to run this command as part of one of my releases, which means I need it to be as fast as possible so that my deployment happens quickly.
Is there any way to make it faster? Maybe zip it, copy it, and unzip it? Even if I use AzCopy, I have no idea how it would help with timing; all I know is that it avoids a single point of failure, and I also have no idea how to use it via the Azure CLI.
How can I proceed?
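For reference, AzCopy v10 can perform the same server-side copy from the command line; a rough sketch with placeholder SAS tokens (both tokens would need to be generated first):
azcopy copy "https://storage1.blob.core.windows.net/data/database-files?<source-sas>" "https://storage2.blob.core.windows.net/data?<destination-sas>" --recursive
The copy is still performed by the storage service, so the main gain over copy start-batch is AzCopy's progress reporting, retries, and request parallelism rather than a fundamentally different transfer path.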

azure blob storage cli list directories

I'm writing a PowerShell script that uses the Azure Storage CLI. I'm looking for a way to list only the directories under a certain path. I can easily list blobs using az storage blob list but I'm hoping I can just get the directories without post-processing the result. Given that Azure blob storage has a flat file structure, I'm not really sure this is possible but thought someone might know something I don't.
Thanks.
You can use the --delimiter option to get closer to what you want -- top level folders (and blobs), but not nested blobs:
# suppress nested blobs
$ az storage blob list -c foo --delimiter '/' --output table
Name       Blob Type    Content Type
---------  -----------  -------------------------
dir1/
dir2/
dir3/
dir4/
file.txt   BlockBlob    text/plain; charset=utf-8
This suppresses all the blobs at deeper levels. From there, you can filter down to just the "folders", because the folder entries have no metadata while the real blobs do. The folder entries also end in a /. The az CLI can filter that for you, too, if you provide the --query option:
az storage blob list -c foo --delimiter '/' --query '[].name | [?ends_with(@, `/`)]'
[ "dir1/",
"dir2/",
"dir3/",
"dir4/" ]
Edit:
Another flag, --auth-mode login, might be needed if you have trouble fetching resources due to authentication issues:
az storage blob list -c foo --auth-mode login --delimiter '/' --query '[].name | [?ends_with(@, `/`)]'
As mentioned, Azure Blob Storage is flat. For a hierarchical filesystem, check out Azure File Storage.
Check out the quickstart for the CLI: https://learn.microsoft.com/en-us/azure/storage/files/storage-how-to-use-files-cli
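If you do move to Azure Files, directories are real objects, so a listing is a single call; a minimal sketch with placeholder share and account names:
az storage directory list --share-name myshare --account-name mystorageaccount --output table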
I ended up deciding that this wasn't really an option since the file system is flat and the "paths" are just virtual. I decided to go a different direction with my script that didn't require batching based on these virtual directories.

AzCopy - breaks on location with $ (dollar sign)

Goal is to copy straight into my blob container named "$web".
Problem is, dollar signs seem to break AzCopy's location parsing...
AzCopy.exe /Source:"C:\temp\" /Dest:"https://mystorage.blob.core.windows.net/$web" /DestKey:"..." /SetContentType /V
Invalid location 'https://mystorage.blob.core.windows.net/$web', address could not be parsed.
I don't get to choose the container name. Escaping the $ as \$ didn't work.
How can I workaround this? Insights appreciated. Thanks!
@Gaurav has pointed out the problem. For now, AzCopy only recognizes the dollar sign in the $root container name. Testing in PowerShell, there is no error, but the files are simply uploaded to $root, regardless of the name after the $.
The new feature that generates this $web container, static website hosting for Azure Storage, has only just been released. It may take time for AzCopy to catch up with the change.
I have opened an issue; you can subscribe to it for progress.
Update
The latest AzCopy v7.3.0 now supports this feature, and for VSTS users, the Azure File Copy v2 task (2.0.7) works with this latest version as well.
To future readers who may be tempted to use pre-baked VSTS tasks like File Copy (which uses AzCopy under the hood), I recommend considering the Azure CLI task instead, e.g.
az storage blob upload-batch --account-name myAccountName --source mySource -d $web
My client wasn't willing to wait for a schedule they didn't control so switching to the CLI path moved our dependency one level upstream & removed having to wait on the VSTS release cadence (looks like ~6 weeks this time).
Thanks Jerry for posting back, kudos! In my VSTS I see File Copy v2.0 Preview seems to be available and ostensibly fixes this issue. Static website hosting direct from Azure storage is a nice feature and I'm happy Azure offers it.
(I hope in the future MS may be able to improve cross-org communication so savvy users keen to checkout new feature releases can have a more consistent experience across all the public-facing surface area.)
The accepted answer is a viable workaround suggesting az storage blob upload-batch, but the blob destination argument $web needs to be single-quoted to work in PowerShell. Otherwise PowerShell will treat it as a reference to a variable named "web".
E.g. Upload the current directory: az storage blob upload-batch --account-name myaccountname --source . -d '$web'
The dollar sign works fine if you execute azcopy via cmd. If you use PowerShell, you have to escape the $ sign with a backtick (`).
so instead of:
azcopy list "https://mystorage.blob.core.windows.net/$web?..."
# or
azcopy copy "c:\temp" "https://mystorage.blob.core.windows.net/$web?..."
use:
azcopy list "https://mystorage.blob.core.windows.net/`$web?..."
# or
azcopy "c:\temp" "https://mystorage.blob.core.windows.net/`$web?..."
btw: I received the following error when I did not escape the dollar sign:
failed to traverse container: cannot list files due to reason -> github.com/Azure/azure-storage-blob-go/azblob.newStorageError, /home/vsts/go/pkg/mod/github.com/!azure/azure-storage-blob-go@v0.15.0/azblob/zc_storage_error.go:42
===== RESPONSE ERROR (ServiceCode=OutOfRangeInput) =====
Description=The specified resource name length is not within the permissible limits.

How to sync between two Azure storage (blobs) hosted on two different data centers

We are planning to deploy our Azure web application to two separate data centers (one located in West Europe and the other located in Southeast Asia) for purely performance reasons. We allow users to upload files, which means we need to keep the blob storage of the two data centers in sync. I know Azure provides support for synchronizing structured data, but there seems to be no such support for blob synchronization. My question is:
Is there a service that provides blob synchronization between different data centers? if not, how can I implement one? I see many samples on the web to sync between Azure blob storage and local file system and vice versa but not between data centers.
Is there a service that provides blob synchronization between different data centers?
No. Currently no such service exists out of the box which would synchronize content between 2 data centers.
if not, how can I implement one?
Although all the necessary infrastructure is available for you to implement this, the actual implementation would be tricky.
First you would need to decide whether you want real-time synchronization or whether batched synchronization would do.
For real-time synchronization you could rely on Async Copy Blob. Using async copy blob you can instruct the storage service to copy a blob from one storage account to another instead of manually downloading the blob from the source and uploading it to the target. Assuming all uploads happen from your application, as soon as a blob is uploaded you know which datacenter it was uploaded to. What you could do is create a SAS URL for this blob and initiate an async copy to the other datacenter.
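A minimal sketch of that per-upload async copy using the CLI wrapper around the same Copy Blob operation (the account, container, and blob names are placeholders, and the SAS URL would be generated for the source blob):
az storage blob copy start --account-name accountsoutheastasia --destination-container uploads --destination-blob images/photo1.png --source-uri "https://accountwesteurope.blob.core.windows.net/uploads/images/photo1.png?<sas>"
The call returns immediately and the storage service performs the copy in the background, which is why it suits the per-upload, near-real-time scenario.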
For batched synchronization, you would need to query both storage accounts and list the blobs in each blob container. If a blob is available in just one storage account and not the other, you could simply create the blob in the destination storage account by initiating an async copy blob. Things become trickier if a blob with the same name is present in both storage accounts. In this case you would need to define some rules (like comparing modified dates) to decide whether the blob should be copied from the source to the destination storage account.
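A rough sketch of the listing comparison for the batched approach (placeholder account and container names; az storage blob list returns a limited page of results by default, so very large containers need paging):
az storage blob list -c uploads --account-name accountwesteurope --query "[].name" -o tsv | sort > westeurope.txt
az storage blob list -c uploads --account-name accountsoutheastasia --query "[].name" -o tsv | sort > southeastasia.txt
# blobs present in West Europe but missing in Southeast Asia, i.e. candidates for an async copy
comm -23 westeurope.txt southeastasia.txt
Blobs present in both listings would then go through whatever modified-date comparison rule you define.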
For scheduling the batch synchronization, you could make use of Windows Azure Scheduler Service. Even with this service, you would need to write code for synchronization logic. Scheduler service will only take care of scheduling part. It won't do the actual synchronization.
I would recommend making use of a worker role to implement the synchronization logic. Another alternative is Web Jobs, which were announced recently, though I don't know much about them.
If your goals are just about performance and the content is public, use Azure CDN for this. Point it at your primary blob storage container and it will copy the files around the world for best performance.
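A rough sketch of wiring that up with the CLI (resource group, profile, endpoint, and storage account names are all placeholders):
az cdn profile create -g myresourcegroup -n mycdnprofile --sku Standard_Microsoft
az cdn endpoint create -g myresourcegroup --profile-name mycdnprofile -n myendpoint --origin mystorageaccount.blob.core.windows.net
The blobs still need to be publicly readable for the CDN to serve them, which matches the "content is public" assumption above.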
I know this is an old question and much has changed in the recent past. I ended up at this link while searching for a similar task, so I thought I would add the latest from AzCopy v10, which has a sync option:
Synchronizes file systems to Azure Blob storage or vice versa.
Use azcopy sync. Ideal for incremental copy scenarios.
https://learn.microsoft.com/en-us/azure/storage/common/storage-use-azcopy-v10
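As a sketch, syncing a container to the same-named container in the other region's account might look like this (placeholder account names and SAS tokens):
azcopy sync "https://accountwesteurope.blob.core.windows.net/uploads?<sas>" "https://accountsoutheastasia.blob.core.windows.net/uploads?<sas>" --recursive
By default only blobs that are missing or older at the destination are copied, which is what makes it suitable for incremental runs.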
You can automate this task with powershell:
Download all Blobs (with Snapshots) from One Windows Azure Storage Account
http://gallery.technet.microsoft.com/scriptcenter/all-Blobs-with-Snapshots-3b184a79
Using PowerShell to move files to Azure Storage
http://www.scarydba.com/2013/06/03/using-powershell-to-move-files-to-azure-storage/
Copy all VHDs in Blob Storage from One Windows Azure Subscription to Another
http://gallery.technet.microsoft.com/scriptcenter/Copy-all-VHDs-in-Blog-829f316e
Old question I know, but the Microsoft.Azure.Storage.DataMovement library is good for this.
https://learn.microsoft.com/en-us/azure/storage/common/storage-use-data-movement-library
Using Bash with the Azure CLI and AzCopy. The code is on GitHub, and there is an associated video on YouTube showing how to get it working.
https://github.com/J0hnniemac/yt-blobsync
#!/bin/bash
cd /home
app_id=""
tenant=""
sourceurl="https://<>.blob.core.windows.net"
destinationurl="https://<>.blob.core.windows.net"
pemfile="/home/service-principal.pem"
sourceaccount=$(echo $sourceurl | awk -F/ '{print $3}' | awk -F. '{print $1}')
destinationaccount=$(echo $destinationurl | awk -F/ '{print $3}' | awk -F. '{print $1}')
echo $app_id
echo $tenant
echo $sourceurl
echo $destinationurl
echo $sourceaccount
echo $destinationaccount
az login --service-principal --password $pemfile --username $app_id --tenant $tenant
# list storage containers
az storage container list --auth-mode login --account-name $sourceaccount -o=table | awk 'NR>1 {print $1}' | grep networking-guru > src.txt
az storage container list --auth-mode login --account-name $destinationaccount -o=table | awk 'NR>1 {print $1}' | grep networking-guru > dst.txt
grep -vf dst.txt src.txt > diff.txt
for blob_container in $(cat diff.txt);
do
echo $blob_container;
newcmd="az storage container create --auth-mode login --account-name $destinationaccount -n $blob_container --fail-on-exist"
echo "---------------------------------"
echo $newcmd
eval $newcmd
done
echo "performing AZCOPY login"
azcopy login --service-principal --certificate-path $pemfile --application-id $app_id --tenant-id $tenant
echo "performing AZCOPY sync for each container"
for blob_container in $(cat src.txt);
do
# Create timestamp + 30 minutes for the SAS token expiry
end=`date -u -d "30 minutes" '+%Y-%m-%dT%H:%MZ'`
sourcesas=`az storage container generate-sas --account-name $sourceaccount --as-user --auth-mode login --name $blob_container --expiry $end --permissions acdlrw`
echo $sourcesas
# remove leading and trailing quotes from SAS Token
sourcesas=$(eval echo $sourcesas)
echo $sourcesas
src="$sourceurl/$blob_container?$sourcesas"
dst="$destinationurl/$blob_container"
echo $src
echo $dst
synccmd="azcopy sync \"$src\" \"$dst\" --recursive --delete-destination=true"
echo $synccmd
eval $synccmd
done
